@punkdata
Last active September 25, 2022 00:36
This script lists the files in a Git repository's history that are larger than the size you specify. Example use: python git-find-big-files.py fix-remove-files 1000000 (file size in bytes; 1000000 is about 1 MB)
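#!/usr/bin/env python3
# Run the script: python git-find-big-files.py <branch> <size in bytes>
# Example use: python git-find-big-files.py fix-remove-files 1000000 (1000000 bytes is about 1 megabyte)
# fix-remove-files is the branch that you are cleaning
import subprocess
import sys


def get_output(args):
    # Run a git command without a shell and return its stdout as text.
    return subprocess.check_output(args).decode("utf-8", "replace")


if len(sys.argv) != 3:
    print("usage: %s <branch> <size_in_bytes>" % sys.argv[0])
    sys.exit(1)

branch = sys.argv[1]
max_size = int(sys.argv[2])
# Walk every commit reachable from the branch given on the command line.
revisions = get_output(["git", "rev-list", branch]).split()
bigfiles = set()
for revision in revisions:
    # -z: NUL-terminated entries, -r: recurse into subtrees, -l: include object size
    entries = get_output(["git", "ls-tree", "-zrl", revision]).split("\0")
    for entry in entries:
        if entry == "":
            continue
        # Each entry is "<mode> <type> <object> <size>\t<path>"; the path may contain spaces.
        meta, path = entry.split("\t", 1)
        mode, obj_type, object_id, size = meta.split()
        if size == "-":
            # Entries without a size (for example submodules) are skipped.
            continue
        size = int(size)
        if size > max_size:
            bigfiles.add("%10d %s %s" % (size, object_id, path))
# The size is right-aligned to 10 digits, so a reverse string sort lists the largest files first.
for line in sorted(bigfiles, reverse=True):
    print(line)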
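
The per-entry parsing step can be checked on its own, without a repository. The sketch below assumes the `git ls-tree -l` record layout of `<mode> <type> <object> <size>`, a tab, then the path. The function name `parse_ls_tree_entry` and the sample record (including its made-up object ID) are hypothetical and only serve as illustration.

```python
def parse_ls_tree_entry(entry):
    """Parse one record of `git ls-tree -zrl` output.

    Returns (size, object_id, path), or None when the entry has no size.
    """
    # Split on the first tab only, so paths containing spaces stay intact.
    meta, path = entry.split("\t", 1)
    mode, obj_type, object_id, size = meta.split()
    if size == "-":
        return None
    return int(size), object_id, path


# Illustrative record; the object ID is made up.
sample = "100644 blob " + "a" * 40 + " 1048577\tdocs/big file.bin"
print(parse_ls_tree_entry(sample))
```

Splitting on the tab rather than on any whitespace is what keeps a path such as `docs/big file.bin` in one piece.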
# Run this command from inside the Git repo, and give the path of every file you want removed from the repo's history
git filter-branch --force --index-filter 'git rm --cached -r --ignore-unmatch test- <Enter your file & path here>' --prune-empty --tag-name-filter cat -- --all
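
When several files need to be removed, typing each path into the command by hand is error-prone. The following is a minimal sketch that only builds the argument list for the command above, using the same flags, and does not run it. The helper name `build_filter_branch_command` and the example paths are hypothetical.

```python
import shlex


def build_filter_branch_command(paths):
    """Return the argument list for a filter-branch run that removes `paths`."""
    if not paths:
        raise ValueError("at least one path is required")
    # Quote each path so spaces survive inside the --index-filter shell string.
    quoted = " ".join(shlex.quote(p) for p in paths)
    index_filter = "git rm --cached -r --ignore-unmatch " + quoted
    return [
        "git", "filter-branch", "--force",
        "--index-filter", index_filter,
        "--prune-empty", "--tag-name-filter", "cat",
        "--", "--all",
    ]


# Hypothetical paths, for illustration only.
cmd = build_filter_branch_command(["logs/old dump.sql", "build/app.zip"])
print(" ".join(shlex.quote(part) for part in cmd))
```

The printed line can be reviewed before it is pasted into a shell inside the repository; the list form can also be passed to `subprocess.run` unchanged.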