Accepts a list of 'bad' blob ids and replaces each matching file with an 'xxx.REMOVED.sha' placeholder containing the blob id

replace-with-sha.sh:
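#!/usr/bin/env sh
# Usage: replace-with-sha.sh <file-of-bad-blob-ids> <commit>
# Must run in the checked-out tree of <commit>, as under filter-branch --tree-filter.

# List every blob in the commit as "<sha><TAB><path>" (drops the "<mode> blob " prefix).
TREEDATA=$(git ls-tree -r "$2" | grep '^.......blob' | cut -c13-)

# For each bad blob id, write the id to '<path>.REMOVED.sha' and delete the file.
while IFS= read -r line ; do
  [ -n "$line" ] || continue  # an empty pattern would match every blob
  echo "$TREEDATA" | grep "^$line" | cut -c42- | xargs -I X sh -c "echo $line > 'X.REMOVED.sha' && rm 'X'" &
done < "$1"

# Wait for all background jobs before returning control to filter-branch.
wait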
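
The column offsets in the script depend on the layout of `git ls-tree -r` output, which is `<mode> <type> <sha><TAB><path>`. The sketch below runs the same `grep` and `cut` steps on one made-up line, so the blob id and path here are hypothetical sample values, not data from a real repository.

```shell
# Hypothetical sample line in the format printed by `git ls-tree -r`.
sample_sha=0123456789abcdef0123456789abcdef01234567
sample=$(printf '100644 blob %s\tdocs/big.bin' "$sample_sha")

# "100644 blob " is 12 characters, so the sha starts at column 13.
sha_and_path=$(printf '%s\n' "$sample" | grep '^.......blob' | cut -c13-)

# The sha is 40 characters and the tab is column 41, so the path starts at column 42.
path=$(printf '%s\n' "$sha_and_path" | cut -c42-)

printf '%s\n' "$path"
```

These offsets assume 40-character SHA-1 object ids. A repository using a longer hash would need different column numbers.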

This script is run with git filter-branch like this:

git filter-branch --tree-filter '/home/roberto/guardian/replace-with-sha.sh /home/roberto/guardian/top-50-biggest-blobs.txt $GIT_COMMIT' -- --all
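
The command above expects a file with one blob id per line. The gist does not show how top-50-biggest-blobs.txt was produced, so the following is only one possible sketch, using `git rev-list` and `git cat-file --batch-check`. The function name `biggest_blobs` is made up for illustration.

```shell
# Print the ids of the N largest blobs reachable from any ref, largest first.
# Hypothetical helper; not part of the original gist.
biggest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob" { print $3, $2 }' |  # keep blobs only, as "<size> <sha>"
    sort -rn |                             # largest size first
    head -n "$1" |
    cut -d' ' -f2                          # keep just the sha
}

# Example: biggest_blobs 50 > top-50-biggest-blobs.txt
```

The size reported here is the uncompressed object size, which may rank blobs differently from their packed size on disk.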


Using a ramdisk on Ubuntu (big speed increase):

$ mkdir repo-in-ram
$ sudo mount -t tmpfs -o size=2048M tmpfs repo-in-ram
$ cd repo-in-ram


Since writing this gist I've created The BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch for cleansing bad data out of Git repository history:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

The BFG is 10 - 720x faster than git-filter-branch, turning an overnight job into one that takes less than ten minutes.

http://rtyley.github.io/bfg-repo-cleaner/
