
@bdjackson
Last active March 3, 2016 22:19
RemoveBigFilesFromGitRepo

This solution is a combination of two sources

First, create a list of every object reachable in the repository history, together with its path, sorted by path:

git rev-list --objects --all | sort -k 2 > allfileshas.txt

Get the SHA of every committed blob and sort the blobs from biggest to smallest (`git gc` runs first so that loose objects are packed and visible to `git verify-pack`):

git gc && git verify-pack -v .git/objects/pack/pack-*.idx | \
    grep -E "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | \
    sort -k 3 -n -r > bigobjects.txt

Look up the file name for each object in bigobjects.txt, and write the SHA, size, and file name to bigtosmall.txt:

for SHA in `cut -f 1 -d\  < bigobjects.txt`; do
    echo $(grep $SHA bigobjects.txt) $(grep $SHA allfileshas.txt) | \
        awk '{print $1,$3,$7}' >> bigtosmall.txt
done;
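
The loop above runs grep twice for every object, which can get slow on a large repository. As a rough sketch of an alternative, the same join can be done in a single awk pass. The function name `join_sizes_with_paths` is hypothetical and not part of the original recipe. Unlike the loop, it keeps the whole path rather than only its first word, and if a blob appears under several paths it keeps only the last one listed.

```shell
# join_sizes_with_paths BIGOBJECTS ALLFILES   (hypothetical helper)
# BIGOBJECTS: lines of "SHA blob SIZE SIZE_IN_PACK OFFSET" (bigobjects.txt)
# ALLFILES:   lines of "SHA PATH"                          (allfileshas.txt)
# Prints "SHA SIZE PATH" for each blob, in BIGOBJECTS order.
join_sizes_with_paths() {
    awk '
        NR == FNR {                  # first input: ALLFILES
            sha = $1
            sub(/^[^ ]+ /, "")       # drop the SHA column, keep the full path
            path[sha] = $0           # a later path for the same SHA overwrites
            next
        }
        ($1 in path) {               # second input: BIGOBJECTS
            print $1, $3, path[$1]
        }
    ' "$2" "$1"
}

# Possible usage:
#   join_sizes_with_paths bigobjects.txt allfileshas.txt > bigtosmall.txt
```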

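Now, edit bigtosmall.txt so that it contains only the file names that should be deleted from the history, one per line. Remove the SHA and size columns as well as any files you want to keep, because the next step treats every word in the file as a path to delete.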
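
As a starting point for that review, the path column can be extracted for every object above a size threshold. This is only a sketch: the function name `paths_over_size` and the 1 MiB threshold in the usage line are assumptions, and the result should still be checked by hand before anything is deleted.

```shell
# paths_over_size FILE MIN_BYTES   (hypothetical helper)
# FILE holds lines of "SHA SIZE PATH"; prints PATH for every line whose
# SIZE is at least MIN_BYTES.
paths_over_size() {
    awk -v min="$2" '$2 + 0 >= min + 0 {
        sub(/^[^ ]+ [^ ]+ /, "")     # strip the SHA and size columns
        print
    }' "$1"
}

# Possible usage (1 MiB threshold chosen only for illustration):
#   paths_over_size bigtosmall.txt 1048576 > candidates.txt
```

After reviewing the output, copy the paths you really want to remove into bigtosmall.txt.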

Loop over each file listed in bigtosmall.txt and remove it from the repository history. This is a destructive change: it rewrites every affected commit on all branches and tags!

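# Runs one full history rewrite per path.
# Word splitting means paths containing whitespace are not handled.
for MY_BIG in $(cat bigtosmall.txt) ; do
  echo "$MY_BIG"
  git filter-branch -f \
                    --prune-empty \
                    --index-filter "git rm -rf --cached --ignore-unmatch $MY_BIG" \
                    --tag-name-filter cat -- --all
done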
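
Because the loop above rewrites the whole history once per file, it may be worth removing all paths in a single pass instead. The following is a minimal sketch of that idea, not the original author's method: the hypothetical helper `build_index_filter` turns the list of paths into one quoted `git rm` command that can be handed to `--index-filter`. It assumes bigtosmall.txt contains at least one path.

```shell
# build_index_filter FILE   (hypothetical helper)
# Prints a single "git rm" command covering every path in FILE (one path per
# line). Each path is wrapped in single quotes so that spaces survive.
build_index_filter() {
    printf 'git rm -rf --cached --ignore-unmatch --'
    while IFS= read -r path; do
        [ -n "$path" ] || continue   # skip empty lines
        # Escape embedded single quotes as '\'' before wrapping the path.
        escaped=$(printf '%s' "$path" | sed "s/'/'\\\\''/g")
        printf " '%s'" "$escaped"
    done < "$1"
    printf '\n'
}

# Possible usage (one rewrite for all paths; still destructive):
#   git filter-branch -f --prune-empty \
#       --index-filter "$(build_index_filter bigtosmall.txt)" \
#       --tag-name-filter cat -- --all
```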

Sometimes the above procedure leaves blobs in the repository that are no longer reachable from the history. Delete them using the following:

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git fsck --full --unreachable
git repack -A -d
git gc --aggressive --prune=now
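
To check whether the cleanup actually shrank the repository, the packfile size can be compared before and after. A small sketch, assuming a hypothetical helper name `pack_size_kib`; it reads the `size-pack` line that `git count-objects -v` reports in KiB.

```shell
# pack_size_kib   (hypothetical helper)
# Prints the total size of the repository's packfiles in KiB.
pack_size_kib() {
    git count-objects -v | awk '/^size-pack:/ { print $2 }'
}

# Possible usage: run once before the rewrite and once after the cleanup.
#   pack_size_kib
```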