Accepts a list of 'bad' blob ids and replaces each file that has one of those ids with an 'xxx.REMOVED.sha' placeholder containing the blob id
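#!/usr/bin/env sh
# Arguments: $1 = file listing one bad blob id per line, $2 = commit to inspect.
# List every blob in the commit's tree as "<sha><TAB><path>".
TREEDATA=$(git ls-tree -r "$2" | grep '^.......blob' | cut -c13-)
while IFS= read -r line ; do
  # For each path whose blob id matches, write the placeholder and delete the file.
  echo "$TREEDATA" | grep "^$line" | cut -c42- | xargs -I X sh -c "echo $line > 'X.REMOVED.sha' && rm 'X'" &
done < "$1"
# Wait for the background jobs so the tree is fully rewritten before the filter returns.
wait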
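
The column offsets in the script are easier to follow on a single line of `git ls-tree -r` output. The sketch below is an illustration rather than part of the gist: the sample sha and path are made up, and the function names are hypothetical. It relies only on the documented output layout, which is mode, space, type, space, 40-character sha, tab, path.

```shell
# Hypothetical sample of one `git ls-tree -r` line:
# <mode> SP <type> SP <40-char sha> TAB <path>
sample_line() {
  printf '100644 blob %s\t%s\n' "0123456789abcdef0123456789abcdef01234567" "assets/big.bin"
}

# Keep blob entries and drop "100644 blob " (12 characters),
# leaving "<sha><TAB><path>".
sha_and_path() {
  grep '^.......blob' | cut -c13-
}

# Keep entries for the given blob id and strip the sha plus the tab
# (41 characters), leaving only the path.
paths_for_blob() {
  grep "^$1" | cut -c42-
}

sample_line | sha_and_path | paths_for_blob 0123456789abcdef0123456789abcdef01234567
```

The final pipeline prints the path of the sample entry, which is the value the script substitutes for X in the xargs command.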

This script is run with git filter-branch like this:

git filter-branch --tree-filter '/home/roberto/guardian/ /home/roberto/guardian/top-50-biggest-blobs.txt $GIT_COMMIT' -- --all
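
The gist does not say how top-50-biggest-blobs.txt was produced. One possible way, sketched here as an assumption, is to rank blobs by size using `git cat-file --batch-check`. The helper name `biggest_blob_ids` is hypothetical; it expects lines of the form "<type> <sha> <size> <path>" on stdin.

```shell
# Reads "<type> <sha> <size> <path>" lines on stdin and prints the ids
# of the N largest blobs, biggest first. N is the first argument.
biggest_blob_ids() {
  awk '$1 == "blob" { print $3, $2 }' | sort -k1,1nr | head -n "$1" | cut -d' ' -f2
}

# Intended use inside a repository (an assumption, not taken from the gist):
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     biggest_blob_ids 50 > top-50-biggest-blobs.txt
```

The resulting file has one blob id per line, which is the format the while loop in the script reads.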


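Running the rewrite from a ramdisk on Ubuntu gives a big speed increase. Mount one, then put the repository inside it and run git filter-branch from there: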

$ mkdir repo-in-ram
$ sudo mount -t tmpfs -o size=2048M tmpfs repo-in-ram
$ cd repo-in-ram


Since writing this gist I've created The BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch for cleansing bad data out of Git repository history:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

The BFG is 10-720x faster than git-filter-branch, turning an overnight job into one that takes less than ten minutes.
