This Git script lists the files in a branch's history that are larger than the size you specify. Example use: python git-find-big-files.py fix-remove-files 1000000 (file size in bytes; 1000000 bytes is about 1 MB)
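#!/usr/bin/env python3
# Run the script: python git-find-big-files.py <branch> <size_in_bytes>
# Example use: python git-find-big-files.py fix-remove-files 1000000 (1000000 bytes is about 1 megabyte)
# Here fix-remove-files is the branch that you are cleaning.
import subprocess
import sys


def getOutput(args):
    # Run a git command (given as an argument list) and return its output as text.
    return subprocess.check_output(args).decode("utf-8", errors="replace")


if len(sys.argv) != 3:
    print("usage: %s <branch> <size_in_bytes>" % sys.argv[0])
    sys.exit(1)

branch = sys.argv[1]
maxSize = int(sys.argv[2])
# Walk every commit reachable from the requested branch.
revisions = getOutput(["git", "rev-list", branch]).split()
bigfiles = set()
for revision in revisions:
    # -z: NUL-terminated entries, -r: recurse into subtrees, -l: include object size
    entries = getOutput(["git", "ls-tree", "-zrl", revision]).split("\0")
    for entry in entries:
        if entry == "":
            continue
        # Fields: mode, type, object hash, size, path (the path may contain spaces)
        splitdata = entry.split(None, 4)
        blob = splitdata[2]
        if splitdata[3] == "-":
            # Non-blob entries such as submodules report "-" instead of a size.
            continue
        size = int(splitdata[3])
        path = splitdata[4]
        if size > maxSize:
            bigfiles.add("%10d %s %s" % (size, blob, path))
# Largest files first; the fixed-width size field lets a string sort follow the size.
for f in sorted(bigfiles, reverse=True):
    print(f)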
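The script depends on the field layout that `git ls-tree -l` prints for each entry: mode, object type, object hash, size, then a tab and the path. The sketch below isolates that parsing step so it can be checked without a repository. The function name `parse_ls_tree_entry` and the sample hashes are hypothetical and used only for illustration.

```python
def parse_ls_tree_entry(entry):
    """Parse one `git ls-tree -l` entry into (object_hash, size, path).

    Returns None for entries that report "-" instead of a size
    (for example submodule entries).
    """
    # The path follows the first tab, so splitting there keeps spaces in paths intact.
    metadata, path = entry.split("\t", 1)
    mode, obj_type, object_hash, size = metadata.split()
    if size == "-":
        return None
    return object_hash, int(size), path


# Hypothetical sample entries in the layout described above.
sample_blob = "100644 blob abc123  1048576\tassets/big file.bin"
sample_submodule = "160000 commit def456       -\tvendor/lib"

print(parse_ls_tree_entry(sample_blob))
print(parse_ls_tree_entry(sample_submodule))
```

Splitting on the tab first is one way to keep paths that contain spaces in one piece; it is a design choice for this sketch rather than something the original script does.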
# Run the following command in a shell (not as part of the Python script) while inside the Git repo. Be sure to supply the path of every file you want removed from the repo's history.
git filter-branch --force --index-filter 'git rm --cached -r --ignore-unmatch test- <Enter your file & path here>' --prune-empty --tag-name-filter cat -- --all
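When several files need to be removed, typing every path into the command by hand is error-prone. As a minimal sketch, the helper below assembles the same `git filter-branch` invocation for a list of paths and shell-quotes each one. The function name `build_filter_branch_command` and the example paths are hypothetical; the flags are the ones used in the command above. The sketch only builds the argument list and does not run anything.

```python
import shlex


def build_filter_branch_command(paths):
    """Return the filter-branch argument list that drops every path in `paths`."""
    if not paths:
        raise ValueError("at least one path is required")
    # The index filter is itself a shell command, so each path is quoted for the shell.
    quoted_paths = " ".join(shlex.quote(p) for p in paths)
    index_filter = "git rm --cached -r --ignore-unmatch " + quoted_paths
    return [
        "git", "filter-branch", "--force",
        "--index-filter", index_filter,
        "--prune-empty", "--tag-name-filter", "cat",
        "--", "--all",
    ]


# Hypothetical paths, for illustration only.
cmd = build_filter_branch_command(["big file.bin", "data/dump.sql"])
print(cmd[4])
```

If the result looks right, the list could be passed to `subprocess.run(cmd, check=True)` from the repository root. Because this rewrites history, it is safer to try it on a fresh clone first.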