Accepts a list of 'bad' blob ids and replaces every file whose blob id is on that list with an 'xxx.REMOVED.sha' placeholder containing the id.

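#!/usr/bin/env sh
# Usage: <script> <bad-blob-ids-file> <commit>
# TREEDATA holds one "<id><TAB><path>" line per blob in the commit's tree.
TREEDATA=$(git ls-tree -r "$2" | grep '^.......blob' | cut -c13-)
while IFS= read -r line ; do
  # For each path with this id: write the id to '<path>.REMOVED.sha', then delete the file.
  echo "$TREEDATA" | grep "^$line" | cut -c42- | xargs -I X sh -c "echo $line > 'X.REMOVED.sha' && rm 'X'" &
done < "$1"
wait  # let the background jobs finish before git snapshots the tree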
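
The two `cut` offsets in the script depend on the fixed-width layout of `git ls-tree -r` output. As a rough illustration, here is the same slicing applied to a single hand-written line. The path is made up, and the id is just git's well-known empty-blob id, used as a stand-in.

```shell
# A sample line in the format printed by `git ls-tree -r`:
#   <6-digit mode> SP <type> SP <40-hex id> TAB <path>
# The path is hypothetical; the id is git's empty-blob id.
sample=$(printf '100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391\tsrc/big file.bin')

# Same filter as the script: keep blob entries, drop the 12-character
# "100644 blob " prefix so each line becomes "<id><TAB><path>".
treedata=$(printf '%s\n' "$sample" | grep '^.......blob' | cut -c13-)

# Columns 1-40 are the id, column 41 is the tab, so the path starts at 42.
blob_id=$(printf '%s\n' "$treedata" | cut -c1-40)
path=$(printf '%s\n' "$treedata" | cut -c42-)

printf 'id:   %s\npath: %s\n' "$blob_id" "$path"
```

Because the path is taken as "everything from column 42 on", spaces in file names survive the slicing.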

This script is run with git filter-branch like this:

git filter-branch --tree-filter '/home/roberto/guardian/ /home/roberto/guardian/top-50-biggest-blobs.txt $GIT_COMMIT' -- --all
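
The command above expects a file with one blob id per line. The gist does not show how top-50-biggest-blobs.txt was produced, so the following is only one possible sketch: a hypothetical helper, `biggest_blobs`, that lists the ids of the largest blobs reachable from any ref using standard git plumbing.

```shell
# Hypothetical helper (not part of the original gist).
# Print the ids of the N largest blobs reachable from any ref, largest first.
# Usage: biggest_blobs [N]   (run inside the repository; N defaults to 50)
biggest_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk '$1 == "blob" { print $3, $2 }' |   # keep blobs as "<size> <id>"
        sort -rn |                              # biggest first
        head -n "${1:-50}" |
        cut -d' ' -f2                           # ids only, one per line
}

# Example, run before filter-branch:
#   biggest_blobs 50 > top-50-biggest-blobs.txt
```

Generate the list before starting the rewrite, and review it by hand: every file matching one of these ids is replaced in every commit.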

Using a ramdisk on Ubuntu (big speed increase):

$ mkdir repo-in-ram
$ sudo mount -t tmpfs -o size=2048M tmpfs repo-in-ram
$ cd repo-in-ram
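
A tmpfs lives in memory, so the repository has to fit within the `size=` given to `mount`. It also needs headroom, since `--tree-filter` checks out each commit's tree while it works. A minimal sketch of a pre-flight check follows, assuming a hypothetical helper name, `fits_in_ramdisk`, and a hypothetical repository path.

```shell
# Hypothetical helper (not part of the original gist).
# Succeed if directory $1 occupies less than $2 MiB on disk,
# e.g. less than the 2048 passed as size= to the tmpfs mount.
fits_in_ramdisk() {
    used_kb=$(du -sk "$1" | cut -f1)      # du prints "<KiB><TAB><path>"
    [ "$used_kb" -lt $(( $2 * 1024 )) ]
}

# Example flow from inside repo-in-ram (the source path is an assumption):
#   fits_in_ramdisk ~/my-repo 2048 && cp -R ~/my-repo .
```

Anything left in a tmpfs is lost on unmount or reboot, so copy the rewritten repository back to disk once filter-branch has finished.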

Since writing this gist I've created The BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch for cleansing bad data out of Git repository history:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

The BFG is 10-720x faster than git-filter-branch, turning an overnight job into one that takes less than ten minutes.
