Accepts a list of 'bad' blob ids and replaces each file holding one of those blobs with an 'xxx.REMOVED.sha' placeholder
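#!/usr/bin/env sh
# Arguments: $1 = file listing bad blob ids (one per line), $2 = commit to inspect.
# List every blob in the commit as "<id><TAB><path>" (the "<mode> blob " prefix is cut off).
TREEDATA=$(git ls-tree -r "$2" | grep '^.......blob' | cut -c13-)
while IFS= read -r line ; do
  # For each path whose blob id matches, write the id to a placeholder file and delete the original.
  echo "$TREEDATA" | grep "^$line" | cut -c42- | xargs -I X sh -c "echo $line > 'X.REMOVED.sha' && rm 'X'" &
done < "$1"
wait  # let the background jobs finish before the tree filter returns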
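The two `cut` offsets in the script depend on the fixed layout of `git ls-tree -r` output for a regular file: a six-digit mode, the word `blob`, a 40-character id, a tab, then the path. The sketch below walks one made-up entry through both cuts; the id and path are invented for illustration and do not come from any real repository.

```shell
# A made-up `git ls-tree -r` line: "<mode> blob <40-char id><TAB><path>".
id=0123456789abcdef0123456789abcdef01234567
entry=$(printf '100644 blob %s\tassets/big video.mp4' "$id")

# Columns 1-12 hold "100644 blob ", so cut -c13- leaves "<id><TAB><path>".
id_and_path=$(printf '%s\n' "$entry" | cut -c13-)

# The id fills columns 1-40 and the tab is column 41, so cut -c42- leaves the path.
path=$(printf '%s\n' "$id_and_path" | cut -c42-)

echo "$path"
```

Because the path is everything after column 41, spaces in file names survive the cuts.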
rtyley commented Dec 4, 2012

This script is run with git filter-branch like this:

git filter-branch --tree-filter '/home/roberto/guardian/ /home/roberto/guardian/top-50-biggest-blobs.txt $GIT_COMMIT' -- --all
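The command above expects a file of blob ids such as `top-50-biggest-blobs.txt`, but the gist does not show how that file was produced. One possible way to build it, as a rough sketch: list every object reachable from any ref, ask Git for each object's type and size, and keep the ids of the largest blobs. The function name `biggest_blobs` is hypothetical and is not part of the original gist.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not from the original gist): print the ids of the
# N largest blobs reachable from any ref, one id per line, biggest first.
biggest_blobs() {
  # rev-list emits "<id> <path>"; %(rest) makes cat-file treat only the
  # first word of each input line as the object name.
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    awk '$1 == "blob" { print $2, $3 }' |   # keep "<size> <id>" for blobs only
    sort -k1,1nr |                          # largest size first
    head -n "${1:-50}" |                    # default to the top 50
    cut -d' ' -f2                           # drop the size, keep the id
}
```

Run from inside the repository, `biggest_blobs 50 > top-50-biggest-blobs.txt` would write a list in the one-id-per-line form the script reads.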

rtyley commented Dec 4, 2012

Running the filter on a repository held in a ramdisk on Ubuntu gives a big speed increase:

$ mkdir repo-in-ram
$ sudo mount -t tmpfs -o size=2048M tmpfs repo-in-ram
$ cd repo-in-ram

rtyley commented Feb 4, 2013

Since writing this gist I've created The BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch for cleansing bad data out of Git repository history:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

The BFG is 10-720x faster than git-filter-branch, turning an overnight job into one that takes less than ten minutes.
