Created
September 28, 2016 00:20
Purge git history of old files
# From http://stackoverflow.com/questions/17901588/new-repo-with-copied-history-of-only-currently-tracked-files
Delete everything and restore what you want
Rather than deleting a list of unwanted files one at a time, do almost the opposite: delete everything, then restore only the files you want to keep:
$ git checkout master
$ git ls-files > keep-these.txt
$ git filter-branch --force --index-filter \
  "git rm --ignore-unmatch --cached -qr . ; \
  cat $PWD/keep-these.txt | xargs git reset -q \$GIT_COMMIT --" \
  --prune-empty --tag-name-filter cat -- --all
This may also be faster to execute than removing the files individually.
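To see the effect without touching a real repository, the same command can be tried in a throwaway one. The following is only a sketch: the scratch directory, the file names kept.txt and old.txt, and the demo identity are made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING is set on the assumption that a recent Git is installed (it skips the notice those versions print before filter-branch starts).

```shell
#!/bin/sh
set -e
# Skip the notice newer Git versions print before running filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Hypothetical scratch repository with one file to keep and one to purge.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name demo

echo keep > kept.txt
echo junk > old.txt
git add kept.txt old.txt
git commit -qm "add both files"
git rm -q old.txt
git commit -qm "drop old.txt"

# Same procedure as above: record the tracked files, then rewrite history.
git ls-files > keep-these.txt
git filter-branch --force --index-filter \
  "git rm --ignore-unmatch --cached -qr . ; \
   cat $PWD/keep-these.txt | xargs git reset -q \$GIT_COMMIT --" \
  --prune-empty --tag-name-filter cat -- --all

# Paths that still appear in the rewritten history of the current branch.
# (--all is avoided here because refs/original/ still points at the old commits.)
git log --format= --name-only HEAD | sort -u
```

In this sketch old.txt should no longer show up in the final listing, while the backup refs under refs/original/ keep the old commits reachable until the cleanup steps are run.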
Cleanup steps
Once the whole process has finished, clean up:
$ rm -rf .git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --prune=now
# optional extra gc. Slow and may not further reduce the repo size
$ git gc --aggressive --prune=now
Comparing the repository size before and after should show a sizable reduction. Only commits that touch the kept files will remain in the history, plus merge commits, even empty ones (because that's how --prune-empty works).
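One possible way to make that before/after comparison is git count-objects. The helper name repo_size below is made up for this sketch; it just filters the command's output down to the two size lines.

```shell
# Hypothetical helper: print the size of loose objects ("size") and of
# packed objects ("size-pack") for the repository in the current directory.
repo_size() {
  git count-objects -vH | grep -E '^size(-pack)?:'
}
```

Run repo_size once before the rewrite and again after the cleanup steps, then compare the numbers.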
$GIT_COMMIT?
The use of $GIT_COMMIT seems to have caused some confusion. From the git filter-branch documentation:
The argument is always evaluated in the shell context using the eval command (with the notable exception of the commit filter, for technical reasons). Prior to that, the $GIT_COMMIT environment variable will be set to contain the id of the commit being rewritten.
That means git filter-branch provides the variable at run time; you do not set it beforehand. If there is any doubt, this can be demonstrated with this no-op filter-branch command:
$ git filter-branch --index-filter "echo current commit is \$GIT_COMMIT"
Rewrite d832800a85be9ef4ee6fda2fe4b3b6715c8bb860 (1/xxxxx)current commit is d832800a85be9ef4ee6fda2fe4b3b6715c8bb860
Rewrite cd86555549ac17aeaa28abecaf450b49ce5ae663 (2/xxxxx)current commit is cd86555549ac17aeaa28abecaf450b49ce5ae663
...
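The quoting is what makes this work: inside double quotes the calling shell expands $PWD right away, while the backslash keeps \$GIT_COMMIT literal until the filter string is evaluated. The sketch below imitates that with plain shell and no Git at all; the variable name filter and the manual assignment of GIT_COMMIT are stand-ins for what git filter-branch does internally, not its actual implementation.

```shell
#!/bin/sh
# $PWD is expanded now, by this shell; \$GIT_COMMIT is stored literally.
filter="echo commit=\$GIT_COMMIT dir=$PWD"

# Show the string exactly as it would be handed over as the filter argument.
printf '%s\n' "$filter"

# Stand-in for the per-commit step: set the variable, then eval the filter.
GIT_COMMIT=d832800a85be9ef4ee6fda2fe4b3b6715c8bb860
eval "$filter"
```

The first line printed still contains the text $GIT_COMMIT, and only the eval step replaces it with the commit id.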