When deploying with Terraform I wanted zip files to be reproducible. By that I mean that the hashes of the zip files are identical: every byte matches.
The contents of two zip files can be identical, but various things will cause their hashes to be different.
These are the things that were required to make this happen. Each has more detail below. The attached Python function puts all this together to take a folder and make it into a reproducible zip file.
- Files must be added to the zip archive in the same order
- Files must have identical last modified timestamps
- Files must have identical permissions
If the order is different, the hash is different. To solve this, you just have to sort the files before adding them to a zip file. To zip directory INPUT_DIR into zip file ZIP_FILE:
cd "$INPUT_DIR"
find . -print | sort | zip -X "$ZIP_FILE" -@
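The same sorting step can be sketched in Python. This is a minimal illustration, not the attached function; `sorted_relative_paths` is a hypothetical helper name. Python's `sorted` compares strings by code point, so the order does not depend on locale settings.

```python
import os


def sorted_relative_paths(input_dir):
    """Return every file under input_dir as a relative path, in sorted order.

    Hypothetical helper for illustration only.
    """
    paths = []
    for root, _dirs, files in os.walk(input_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, input_dir)
            # Zip archives use forward slashes, whatever the host OS uses.
            paths.append(rel.replace(os.sep, "/"))
    # os.walk makes no promise about ordering, so sort explicitly.
    return sorted(paths)
```

Adding files to the archive in the order this returns means the directory listing order of the file system no longer affects the result.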
Even if the hashes of all the individual file contents in the zip are identical, their metadata may not be. Assuming we do not care about preserving modification time and that we just want the zip to be reproducible, this is achievable by running the command
touch -t 1701011215 FILE_NAME
on every file in the source directory before adding them to the zip archive. The argument is in YYMMDDhhmm form, so this sets their modified time to 12:15 on 1st January 2017.
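When building the archive from Python instead, the stored timestamp can be set directly on each entry, so the files on disk do not need to be touched. The sketch below assumes the standard `zipfile` module; `write_with_fixed_time` and `FIXED_DATE_TIME` are hypothetical names chosen for this example.

```python
import zipfile

# Same instant as the touch example: 12:15 on 1st January 2017.
FIXED_DATE_TIME = (2017, 1, 1, 12, 15, 0)


def write_with_fixed_time(zf, path, arcname):
    """Add the file at path to the open ZipFile zf with a fixed timestamp."""
    # Building the ZipInfo by hand means the file's real mtime is never read.
    info = zipfile.ZipInfo(arcname, date_time=FIXED_DATE_TIME)
    info.compress_type = zipfile.ZIP_DEFLATED
    with open(path, "rb") as f:
        zf.writestr(info, f.read())
```

Any fixed date from 1980 onwards would do; the zip format cannot store earlier dates.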
Zip files also store "external file permissions", 4 bytes for each file. These four bytes are explained in this excellent StackOverflow answer. Bits 5-16 are most likely to be different, but bits 24-32 may get affected on Windows. So additionally, before adding files and folders to the zip file, run
chmod 777 DIR_NAME
on every directory, and
chmod 666 FILE_NAME
on every file.
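The three requirements above can be combined in one Python sketch. This is not the attached function, only an illustration under stated assumptions: `make_reproducible_zip` and `FIXED_DATE_TIME` are hypothetical names, the Unix mode is written into the upper 16 bits of each entry's `external_attr`, and `create_system` is pinned to 3 (Unix) because `zipfile` would otherwise record a different value on Windows.

```python
import os
import stat
import zipfile

FIXED_DATE_TIME = (2017, 1, 1, 12, 15, 0)  # 12:15 on 1st January 2017


def make_reproducible_zip(input_dir, zip_path):
    """Zip input_dir so the output depends only on names and contents."""
    entries = []  # (archive name, path on disk, or None for a directory)
    for root, dirs, files in os.walk(input_dir):
        for name in dirs:
            rel = os.path.relpath(os.path.join(root, name), input_dir)
            entries.append((rel.replace(os.sep, "/") + "/", None))
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, input_dir)
            entries.append((rel.replace(os.sep, "/"), full))
    entries.sort(key=lambda entry: entry[0])  # requirement 1: fixed order

    with zipfile.ZipFile(zip_path, "w") as zf:
        for arcname, full in entries:
            # Requirement 2: identical timestamp for every entry.
            info = zipfile.ZipInfo(arcname, date_time=FIXED_DATE_TIME)
            info.create_system = 3  # always record "Unix", even on Windows
            if full is None:
                # Requirement 3: mode 777 for directories, plus the
                # MS-DOS directory flag in the low byte.
                info.external_attr = ((stat.S_IFDIR | 0o777) << 16) | 0x10
                zf.writestr(info, b"")
            else:
                # Requirement 3: mode 666 for regular files.
                info.external_attr = (stat.S_IFREG | 0o666) << 16
                info.compress_type = zipfile.ZIP_DEFLATED
                with open(full, "rb") as f:
                    zf.writestr(info, f.read())
```

Because the permissions and timestamp are set on the archive entries rather than read from disk, the source directory is left unmodified. The compressed bytes still come from the local zlib, so comparing hashes across machines assumes that zlib produces the same output on each.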
The Terraform archive provider's archive_file seems to give reproducible results, provided you use source blocks and not the source_file parameter. It looks like source_file just zips up that file including permissions and timestamps, whereas source blocks seem to be independent of them. In code, use this:
data "archive_file" "zip_file" {
  type        = "zip"
  output_path = "archive.zip"
  source {
    content  = file("myfile.txt")
    filename = "myfile.txt"
  }
}
Not this:
data "archive_file" "init" {
  type        = "zip"
  source_file = "myfile.txt"
  output_path = "archive.zip"
}