@lydell
Created March 18, 2022 21:52
GitHub source zipball reproduction attempt
#!/usr/bin/env bash
# Example:
#
# 1. git clone git@github.com:elm/core.git
# 2. cd core
# 3. ./zip.bash 1.0.5 out.zip elm/core
# 4. Compare outputs of:
# - `unzip -l out.zip` and `unzip -l github-zipball.zip`
# - `wc -c out.zip` and `wc -c github-zipball.zip`
# Exit on errors.
set -e
# Read command line arguments.
tag="${1:?'You must provide a tag as the first argument.'}"
new_zip="${2:?'You must provide the output zip file as the second argument.'}"
repo="${3:?'You must provide the repo (user/reponame) as the third argument.'}"
# Turn the tag into a full commit hash.
commit="$(git rev-list -n 1 "$tag")"
# Get the short hash of the tag object (for an annotated tag).
# For a lightweight tag this resolves to the commit's short hash instead.
directory_suffix="$(git rev-parse --short "$tag")"
# Switch to the given commit.
git switch --detach "$commit"
# Format commit date for use with `touch` below.
date="$(TZ=UTC0 git show --quiet --date='format-local:%Y-%m-%dT%H:%M:%SZ' --format="%cd")"
# Create the directory to zip. Remove one created from previous runs first.
dir="${repo/\//-}-$directory_suffix"
rm -rf "$dir"
mkdir "$dir"
# Keep track of files and directories to put in the zip.
files=()
dirs=("$dir/")
# Go through all files and folders checked into git.
# -z makes git print NUL-terminated, unquoted paths, so names with spaces work.
while IFS= read -r -d '' original; do
  f="$dir/$original"
  if test -d "$original"; then
    # Give a trailing slash to directories. This is needed for sorting later.
    dirs+=("$f/")
    # Create directory.
    mkdir -p "$f"
  else
    files+=("$f")
    # Copy file to the zip directory.
    cp "$original" "$f"
    # Set mtime to the commit date.
    touch -d "$date" "$f"
  fi
done < <(git ls-tree -r -t -z --full-tree --name-only "$commit")
for f in "${dirs[@]}"; do
  # Set mtime to the commit date for all folders.
  # This has to be done after all files are copied, since writing a file
  # updates the mtime of folders.
  touch -d "$date" "$f"
done
# Remove zip created from previous runs of this script, just in case.
rm -f "$new_zip"
# Create the new zip.
# -z creates a comment for the whole zip. The contents for the comment are passed on stdin via echo.
# GitHub seems to use the commit hash as comment.
# -9 is max compression. However, GitHub’s zips are even smaller. Not sure how they do that.
# The sort order is important – it’s preserved in the zip file.
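all=("${dirs[@]}" "${files[@]}")
# Sort into an array so that paths containing spaces are passed to zip intact.
sorted=()
while IFS= read -r entry; do
  sorted+=("$entry")
done < <(printf '%s\n' "${all[@]}" | sort)
echo "$commit" | zip -z9 "$new_zip" "${sorted[@]}" >/dev/null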
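
The script comments say that directories need a trailing slash "for sorting later". The sketch below shows one case where the slash changes the order. It makes two assumptions that are not in the script: the entry names are hypothetical, and `LC_ALL=C` is set so the result is plain byte order (the script itself uses the locale's default collation, which may order things differently).

```shell
#!/usr/bin/env bash
# Sketch: effect of the trailing slash on sort order.
# Assumption: LC_ALL=C (byte order) is used here for reproducible output only.

sort_entries() {
  # Print each argument on its own line and sort bytewise.
  printf '%s\n' "$@" | LC_ALL=C sort
}

# Hypothetical entries: a directory "src", a file inside it, and a sibling
# file whose name starts with "src-".
without_slash="$(sort_entries 'src' 'src/Main.elm' 'src-extra.txt')"
with_slash="$(sort_entries 'src/' 'src/Main.elm' 'src-extra.txt')"

printf 'without slash:\n%s\n\n' "$without_slash"
printf 'with slash:\n%s\n' "$with_slash"
```

Without the slash, `src-extra.txt` lands between the directory and its contents, because `-` sorts before `/` in byte order. With the slash, the directory entry sits directly in front of the files it contains.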

lydell commented Mar 22, 2022

I got a tip that https://diffoscope.org/ is kind of made for this problem. It shows that, for example, file permissions differ between the two zips.
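
A minimal sketch of how such a comparison could be scripted, assuming both zip files already exist on disk. The helper name `compare_zips` is hypothetical. When `diffoscope` is not installed, the sketch falls back to diffing the zipinfo-style listings from `unzip -Z`, which include per-entry permissions.

```shell
#!/usr/bin/env bash
# Sketch only: compare_zips is a hypothetical helper, not part of the gist.

compare_zips() {
  # Both arguments must be existing zip files.
  for zip_file in "$1" "$2"; do
    if [ ! -f "$zip_file" ]; then
      echo "No such file: $zip_file" >&2
      return 2
    fi
  done

  if command -v diffoscope >/dev/null 2>&1; then
    # diffoscope prints a nested diff of archive metadata and contents.
    diffoscope "$1" "$2"
  else
    # Fallback: `unzip -Z` prints a zipinfo-style listing that includes
    # permissions, sizes, and timestamps for every entry.
    tmp_dir="$(mktemp -d)"
    unzip -Z "$1" >"$tmp_dir/a.txt"
    unzip -Z "$2" >"$tmp_dir/b.txt"
    diff_status=0
    diff "$tmp_dir/a.txt" "$tmp_dir/b.txt" || diff_status=$?
    rm -rf "$tmp_dir"
    return "$diff_status"
  fi
}

# Run only when two zip paths are given on the command line.
if [ "$#" -eq 2 ]; then
  compare_zips "$1" "$2"
fi
```

Saved under a name such as `compare.bash` (an assumed filename), it could be run as `./compare.bash out.zip github-zipball.zip`. A non-zero exit status means the archives differ or an input file is missing.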