@rodluger
Last active June 20, 2018 17:40
Removing large files from git history

Cleaning the vplanet repo

Most of what we need to know is here. First we download BFG and create an alias:

alias bfg='java -jar bfg-1.13.0.jar'

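This only works when the jar is in the current directory, so use its full path if you'll call bfg from anywhere else. Note that BFG needs a recent Java Runtime Environment installed.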
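Because bash only expands aliases in interactive shells (not in scripts, unless `expand_aliases` is set), a shell function is a slightly sturdier way to wrap the jar. A minimal sketch, assuming the jar was saved to `$HOME/bfg-1.13.0.jar`; that path is hypothetical, so point `BFG_JAR` at wherever you downloaded it:

```shell
# Hypothetical location of the downloaded BFG jar; change to match your download.
BFG_JAR="${BFG_JAR:-$HOME/bfg-1.13.0.jar}"

# Wrapper function: works in scripts as well as interactive shells,
# and passes every argument straight through to BFG.
bfg() {
  java -jar "$BFG_JAR" "$@"
}

# BFG needs a Java runtime on the PATH; warn early if it is missing.
if ! command -v java >/dev/null 2>&1; then
  echo "warning: java not found on PATH; install a JRE before running bfg" >&2
fi
```

Usage is the same as with the alias, e.g. `bfg --strip-blobs-bigger-than 100M vplanet.git`.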

Let's create a mirror (bare clone) of the Bitbucket repo:

git clone --mirror https://bitbucket.org/bitbucket_vpl/vplanet.git

This might take a while! Our repo is pretty big. When that's done, cd into vplanet.git. I found that I had to run

git gc

before doing anything else, to make git repack the repo. Now we can run the commands described in the examples here. For instance, to remove all files larger than 100 MB from the history (except any that are still in the latest commit, which BFG protects by default), cd out of the repository and run

bfg --strip-blobs-bigger-than 100M vplanet.git
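Before settling on a threshold, it can help to see which blobs are actually big. A minimal sketch using only plain git plumbing (`rev-list --objects` piped into `cat-file --batch-check`), run from inside `vplanet.git`; `largest_blobs` is just an illustrative helper name, not part of git or BFG:

```shell
# List the N largest blobs reachable from any ref, biggest first.
# Output columns: blob id, size in bytes, a path the blob appears under.
largest_blobs() {
  local n="${1:-20}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |
    cut -d' ' -f2- |
    sort -k2,2nr |
    head -n "$n"
}
```

For example, `largest_blobs 50` shows the top 50, which makes it easy to spot files that a size cutoff alone would miss or over-catch.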

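BFG has now rewritten the commits and updated the refs, but nothing has actually been deleted yet: the old objects are still sitting in the repository. Check the report BFG writes to vplanet.git.bfg-report/ to make sure nothing bad happened. Once you're happy, run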

cd vplanet.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
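To confirm that the cleanup actually shrank the repository, one option is to compare the `size-pack` line of `git count-objects -v -H` before and after the gc step. A sketch wrapping the two commands above in a hypothetical `shrink_repo` helper:

```shell
# Expire reflogs and garbage-collect, printing the packed size before and
# after so you can see how much the BFG run saved.
# Usage (hypothetical helper): shrink_repo path/to/vplanet.git
shrink_repo() {
  local repo="$1"
  echo "before:"
  git -C "$repo" count-objects -v -H | grep size-pack
  git -C "$repo" reflog expire --expire=now --all &&
    git -C "$repo" gc --prune=now --aggressive --quiet
  echo "after:"
  git -C "$repo" count-objects -v -H | grep size-pack
}
```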

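and then, the super dangerous and final step (since this is a mirror clone, git push overwrites every branch and tag on Bitbucket with the rewritten history):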

git push
@rodluger (author) commented Jun 20, 2018

Things I've been finding along the way:

  • There are several large files (tens to hundreds of MB) called prob_space.txt in git history
  • There's a file called 4.inv that's 82 MB
  • Why is vollay.h 41 MB??
  • There are a couple of PostScript (*.ps) files that amount to ~20 MB
  • There are several files called plan*.dat that are a few MB each
  • I committed a bunch of MCMC outputs in the form *.mcmc.npz that are 20 MB each and can be safely deleted
  • There are dozens of *.png files that can probably be deleted
  • There are 130 *.pdf files that can probably be deleted
  • We can delete palatino-linotype.zip (1.2 MB)
  • The executable vplanet has been committed and modified over a dozen times, and that's adding ~30 MB to the repo size
  • Everything under examples/spinbody/HNBody_Comparison in the Families branch
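Several of these are better targeted by name than by size. A sketch using BFG's `--delete-files` and `--delete-folders` options, which match a glob against the file or folder name rather than its path in the repo. `run_bfg`, `delete_known_junk`, and the jar path are hypothetical, and the list below is only illustrative, drawn from items flagged above as deletable; edit it before running. Anything still present in the protected latest commit is left alone.

```shell
# Hypothetical jar location and wrapper; adjust to your setup.
BFG_JAR="${BFG_JAR:-$HOME/bfg-1.13.0.jar}"
run_bfg() { java -jar "$BFG_JAR" "$@"; }

# One BFG pass per pattern; stop at the first failure.
# Globs are quoted so the shell passes them to BFG unexpanded.
delete_known_junk() {
  local repo="$1"
  run_bfg --delete-files '*.mcmc.npz' "$repo" &&
    run_bfg --delete-files 'palatino-linotype.zip' "$repo" &&
    run_bfg --delete-folders 'HNBody_Comparison' "$repo"
}
```

Run it from the directory containing the mirror, e.g. `delete_known_junk vplanet.git`, then do the reflog/gc step as before.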

@rodluger (author) commented Jun 20, 2018

Important: I just realized that by default BFG only protects files present in the latest commit on HEAD (master, in our case), not on any of the other branches. So we really should try to get as much as possible onto master before we do this. We can tell BFG to protect files on other branches, but there's going to be so much junk there...
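BFG's `--protect-blobs-from` option takes a comma-separated list of refs whose latest commits should be protected (the default is `HEAD`). A sketch of protecting every branch instead of just master; the helper names and jar path are hypothetical:

```shell
# Hypothetical jar location and wrapper; adjust to your setup.
BFG_JAR="${BFG_JAR:-$HOME/bfg-1.13.0.jar}"
run_bfg() { java -jar "$BFG_JAR" "$@"; }

# Comma-separated list of every local branch in the (mirror) repo.
all_branches_csv() {
  git -C "$1" for-each-ref --format='%(refname:short)' refs/heads/ | paste -sd, -
}

# Strip big blobs, protecting the tips of the given refs rather than just HEAD.
strip_big_blobs_protecting() {
  local repo="$1" refs="$2"
  run_bfg --strip-blobs-bigger-than 100M --protect-blobs-from "$refs" "$repo"
}
```

For example, `strip_big_blobs_protecting vplanet.git "$(all_branches_csv vplanet.git)"`, or pass just `master,Families`. Protecting every branch also protects whatever junk sits at their tips, which is why consolidating onto master first is the cleaner route.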
