Accepts a list of 'bad' blob ids and replaces each matching file with a 'xxx.REMOVED.sha' placeholder containing the blob id
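#!/usr/bin/env sh
# Arguments: $1 = file listing one bad blob id per line, $2 = commit to inspect.
# List every blob in the commit as "<sha><TAB><path>".
TREEDATA=$(git ls-tree -r "$2" | grep '^.......blob' | cut -c13-)
while IFS= read -r line ; do
  # For each path whose blob id matches, write the placeholder and delete the file.
  echo "$TREEDATA" | grep "^$line" | cut -c42- | xargs -I X sh -c "echo $line > 'X.REMOVED.sha' && rm 'X'" &
done < "$1"
wait  # let the background jobs finish before filter-branch records the tree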
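
The `cut` offsets in the script follow from the layout of a `git ls-tree -r` line: a 6-character mode, a space, the type, a space, a 40-character SHA-1, a tab, then the path. As a rough illustration, the sketch below builds one such line by hand (the blob id and path are made-up sample values) and applies the same two cuts:

```shell
#!/usr/bin/env sh
# Made-up sample values, shaped like one line of `git ls-tree -r` output.
sha="0123456789abcdef0123456789abcdef01234567"
entry=$(printf '100644 blob %s\t%s' "$sha" "src/big file.bin")

# "100644 blob " is 12 characters, so column 13 is where the blob id starts.
sha_and_path=$(printf '%s\n' "$entry" | cut -c13-)

# The 40-character id plus the tab take 41 columns, so the path starts at 42.
path=$(printf '%s\n' "$sha_and_path" | cut -c42-)

printf '%s\n' "$path"
```

These offsets assume 40-character SHA-1 object ids; a repository using a longer hash would need different column numbers.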

rtyley commented Dec 4, 2012

This script is run with git filter-branch like this:

git filter-branch --tree-filter '/home/roberto/guardian/ /home/roberto/guardian/top-50-biggest-blobs.txt $GIT_COMMIT' -- --all
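
The gist doesn't show how the list of blob ids was produced. One possible way to build such a file, sketched here with a hypothetical helper named `biggest_blobs` (not part of the original script), is to ask Git for the size of every reachable object and keep the ids of the largest blobs:

```shell
#!/usr/bin/env sh
# Hypothetical helper: print the ids of the N largest blobs reachable from
# any ref, one per line, largest first. N defaults to 50.
biggest_blobs() {
  n="${1:-50}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob" { print $3, $2 }' |  # keep blobs only, as "<size> <sha>"
    sort -rn |                             # biggest first
    head -n "$n" |
    cut -d' ' -f2                          # drop the size, keep the id
}
```

Run inside the repository, something like `biggest_blobs 50 > top-50-biggest-blobs.txt` would then give a file in the one-id-per-line form the script reads.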


rtyley commented Dec 4, 2012

Using a ramdisk on Ubuntu (big speed increase):

$ mkdir repo-in-ram
$ sudo mount -t tmpfs -o size=2048M tmpfs repo-in-ram
$ cd repo-in-ram


rtyley commented Feb 4, 2013

Since writing this gist I've created The BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch for cleansing bad data out of Git repository history:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

The BFG is 10 - 720x faster than git-filter-branch, turning an overnight job into one that takes less than ten minutes.
