@rtyley
Created December 4, 2012 12:18
Accepts a list of 'bad' blob ids, and replaces them with a 'xxx.REMOVED.sha' placeholder
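#! /usr/bin/env sh
# Usage: replace-with-sha.sh <file-of-bad-blob-ids> <commit>
# List every blob in the commit as "<id><TAB><path>" (drops the "<mode> blob " prefix).
TREEDATA=$(git ls-tree -r "$2" | grep '^.......blob' | cut -c13-)
while IFS= read -r line ; do
  # For each path whose blob id matches, write the id to a placeholder file and delete the original.
  echo "$TREEDATA" | grep "^$line" | cut -c42- | xargs -I X sh -c "echo $line > 'X.REMOVED.sha' && rm 'X'" &
done < "$1"
wait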
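
The two `cut` offsets in the script depend on the fixed-width layout of `git ls-tree -r` output. As a rough illustration, here is the same slicing applied to a single hard-coded sample line, so no repository is needed. The sample path is made up, and the offsets assume 40-character SHA-1 ids; a repository using the SHA-256 object format has 64-character ids and would need different column numbers.

```shell
# A sample line in the format printed by `git ls-tree -r`:
# mode, space, type, space, 40-character id, TAB, path. (Path is illustrative.)
sample=$(printf '100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391\tREADME.md')

# "100644 blob " is 12 characters, so the blob id starts at column 13.
id_and_path=$(printf '%s\n' "$sample" | cut -c13-)

# The id fills columns 1-40 and the TAB is column 41, so the path starts at column 42.
path=$(printf '%s\n' "$id_and_path" | cut -c42-)

printf '%s\n' "$path"
```

Because the script matches ids with `grep "^$line"`, each line of the input file should be a full blob id; a shortened prefix would match every blob whose id starts with it.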

rtyley commented Dec 4, 2012

This script is run with git filter-branch like this:

git filter-branch --tree-filter '/home/roberto/guardian/replace-with-sha.sh /home/roberto/guardian/top-50-biggest-blobs.txt $GIT_COMMIT' -- --all
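
The gist does not say how `top-50-biggest-blobs.txt` was produced. One possible way, offered only as a sketch, is to list every object with its size and keep the ids of the largest blobs. The helper name `biggest_blobs` is hypothetical, and the commented-out pipeline assumes a Git version whose `cat-file --batch-check` accepts a format string.

```shell
# Hypothetical helper (not part of the original gist): reads lines of the form
# "<type> <id> <size> <path>" on stdin and prints the ids of the N largest blobs.
biggest_blobs() {
  awk '$1 == "blob"' |     # keep blobs, drop commits and trees
    sort -k3,3nr |         # largest size first
    head -n "$1" |         # keep the top N
    cut -d' ' -f2          # print only the blob id
}

# Intended use inside a repository (assumption, not run here):
# git rev-list --objects --all |
#   git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#   biggest_blobs 50 > top-50-biggest-blobs.txt
```

The output is one full blob id per line, which is the input format the script above expects as its first argument.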


rtyley commented Dec 4, 2012

Using a ramdisk on Ubuntu (big speed increase):

$ mkdir repo-in-ram
$ sudo mount -t tmpfs -o size=2048M tmpfs repo-in-ram
$ cd repo-in-ram


rtyley commented Feb 4, 2013

Since writing this gist I've created The BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch for cleansing bad data out of Git repository history:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

The BFG is 10-720x faster than git-filter-branch, turning an overnight job into one that takes less than ten minutes.

http://rtyley.github.io/bfg-repo-cleaner/
