@mattlong
Created September 28, 2016 00:20
Purge git history of old files
# From http://stackoverflow.com/questions/17901588/new-repo-with-copied-history-of-only-currently-tracked-files
Delete everything and restore what you want
Rather than deleting a list of files one at a time, do almost the opposite. Delete everything, then restore only the files you want to keep:
$ git checkout master
$ git ls-files > keep-these.txt
$ git filter-branch --force --index-filter \
"git rm --ignore-unmatch --cached -qr . ; \
cat $PWD/keep-these.txt | xargs git reset -q \$GIT_COMMIT --" \
--prune-empty --tag-name-filter cat -- --all
This may be faster to execute than removing the unwanted files one at a time.
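Before cleaning up, it can be worth checking that no unexpected paths survive in the rewritten history. The following is a minimal sketch, not part of the original recipe: `unexpected_paths` and `history-paths.txt` are hypothetical names introduced here, and it assumes file names contain no characters that git would quote in its output.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): print every line of
# the first file that does not appear, as a whole line, in the second file.
unexpected_paths() {
    history_list=$1
    keep_list=$2
    # -v inverts the match, -x matches whole lines, -F treats patterns as
    # fixed strings, -f reads the patterns from a file.
    # grep exits 1 when it selects no lines, so mask that exit status.
    grep -vxFf "$keep_list" "$history_list" || true
}

# Example usage inside the rewritten repository (commented out so the
# snippet is safe to source anywhere). --branches --tags is used instead of
# --all so the refs/original/ backup refs are not included:
#   git log --branches --tags --pretty=format: --name-only \
#       | sort -u | sed '/^$/d' > history-paths.txt
#   unexpected_paths history-paths.txt keep-these.txt
```

If the rewrite worked as intended, the helper should print nothing.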
Cleanup steps
Once the whole process has finished, clean up:
$ rm -rf .git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --prune=now
# optional extra gc. Slow and may not further-reduce the repo size
$ git gc --aggressive --prune=now
Comparing the repository size before and after should show a sizable reduction. The history will now contain only commits that touch the kept files, plus merge commits, which are retained even when empty (because that's how --prune-empty works).
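To put a number on the reduction, something like the following could be used. It is a rough sketch under stated assumptions: `dir_size_kib` and `percent_saved` are hypothetical helper names, and the size is whatever `du` reports for the `.git` directory, which is only an approximation of what git itself stores.

```shell
#!/bin/sh
# Hypothetical helper: report the on-disk size of a directory in KiB.
dir_size_kib() {
    # du -s summarises the whole tree and -k reports 1024-byte units;
    # the size is the first whitespace-separated field of the output.
    du -sk "$1" | awk '{print $1}'
}

# Hypothetical helper: integer percentage saved going from size $1 to size $2.
percent_saved() {
    before=$1
    after=$2
    echo $(( (before - after) * 100 / before ))
}

# Example usage (commented out; run from the top of the repository):
#   before=$(dir_size_kib .git)    # record this before running filter-branch
#   ... filter-branch and cleanup steps ...
#   after=$(dir_size_kib .git)
#   echo "saved $(percent_saved "$before" "$after")%"
```

For git's own view of the object store, `git count-objects -vH` run before and after is an alternative to `du`.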
$GIT_COMMIT?
The use of $GIT_COMMIT seems to have caused some confusion. From the git filter-branch documentation (emphasis added):
The argument is always evaluated in the shell context using the eval command (with the notable exception of the commit filter, for technical reasons). Prior to that, the $GIT_COMMIT environment variable will be set to contain the id of the commit being rewritten.
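That means git filter-branch sets the variable itself at run time; you do not supply it beforehand. The backslash in \$GIT_COMMIT stops the calling shell from expanding it inside the double quotes, so the literal text $GIT_COMMIT reaches the filter. If there's any doubt, this can be demonstrated with a no-op filter-branch command: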
$ git filter-branch --index-filter "echo current commit is \$GIT_COMMIT"
Rewrite d832800a85be9ef4ee6fda2fe4b3b6715c8bb860 (1/xxxxx)current commit is d832800a85be9ef4ee6fda2fe4b3b6715c8bb860
Rewrite cd86555549ac17aeaa28abecaf450b49ce5ae663 (2/xxxxx)current commit is cd86555549ac17aeaa28abecaf450b49ce5ae663
...
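The same quoting behaviour can be reproduced without git at all. The sketch below only imitates what filter-branch does (set GIT_COMMIT, then eval the filter string): `list_dir`, `run_filter` and the commit id are made-up placeholders, with `list_dir` standing in for $PWD.

```shell
#!/bin/sh
# Hypothetical stand-in for $PWD, fixed so the output is predictable.
list_dir=/path/to/repo

# Inside double quotes, $list_dir is expanded now by the calling shell,
# while \$GIT_COMMIT is kept as the literal text $GIT_COMMIT.
filter="echo list=$list_dir/keep-these.txt commit=\$GIT_COMMIT"

# Imitation of what filter-branch does for each commit: set GIT_COMMIT,
# then evaluate the filter string with eval.
run_filter() {
    GIT_COMMIT=$1
    eval "$filter"
}

printf '%s\n' "$filter"
# prints: echo list=/path/to/repo/keep-these.txt commit=$GIT_COMMIT

run_filter 0123abc
# prints: list=/path/to/repo/keep-these.txt commit=0123abc
```

Without the backslash, the calling shell would substitute its own (normally empty) GIT_COMMIT before the filter ever ran.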