@jesserobertson
Last active August 29, 2015 14:05
Get the big blobs from your git repository, in Python
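import subprocess


def get_big_blobs(git_repo, nitems=10):
    # Query the git repo for large items (cwd= leaves the caller's directory unchanged)
    big_blobs = subprocess.check_output(
        ('git verify-pack -v .git/objects/pack/pack-*.idx | '
         'grep -v chain | sort -k3nr | head -n {0}').format(int(nitems)),
        shell=True, cwd=git_repo, universal_newlines=True)
    # List every reachable object once, rather than re-running rev-list per sha
    all_objects = subprocess.check_output(
        ['git', 'rev-list', '--all', '--objects'],
        cwd=git_repo, universal_newlines=True)
    paths = {}
    for entry in all_objects.splitlines():
        sha, _, path = entry.partition(' ')
        paths[sha] = path
    # Parse the result
    for line in big_blobs.splitlines():
        # Line gives us sha, object type, uncomp_size, comp_size etc...
        tokens = line.split()
        if len(tokens) < 4 or tokens[1] not in ('blob', 'tree', 'commit', 'tag'):
            continue  # skip verify-pack summary lines such as "non delta: ..."
        sha = tokens[0]
        uncompressed_size = int(tokens[2]) // 1024  # KiB
        compressed_size = int(tokens[3]) // 1024  # KiB
        # Find the item in the git repository tree; objects without a path
        # (commits, unreachable objects) fall back to the sha
        item = paths.get(sha) or sha
        print(uncompressed_size, item)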
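The `grep | sort | head` stage of the pipeline could also be done in Python, which removes the dependency on those tools and makes the parsing easy to check without a repository. The following is a minimal sketch, assuming each object line of `git verify-pack -v` is laid out as sha, type, size, size-in-pack, offset. The helper name `parse_verify_pack` and the sample text are hypothetical, made up for illustration.

```python
def parse_verify_pack(output, nitems=10):
    """Return (sha, size_bytes, packed_bytes) for the largest blobs in `output`."""
    blobs = []
    for line in output.splitlines():
        tokens = line.split()
        # Object lines look like: <sha> <type> <size> <size-in-pack> <offset> ...
        # Summary lines ("non delta: ...", "chain length ...") fail this check.
        if len(tokens) >= 5 and tokens[1] == 'blob':
            blobs.append((tokens[0], int(tokens[2]), int(tokens[3])))
    # Largest uncompressed size first
    blobs.sort(key=lambda entry: entry[1], reverse=True)
    return blobs[:nitems]


# Made-up sample in the assumed layout (not real git output)
sample = """\
aaaa blob   2048 512 12
bbbb tree   100 80 524
cccc blob   4096 1024 604
non delta: 3 objects
.git/objects/pack/pack-example.pack: ok
"""

for sha, size, packed in parse_verify_pack(sample, nitems=2):
    print(size // 1024, packed // 1024, sha)
```

Unlike the shell pipeline, this keeps only blobs, so trees and commits never take up one of the `nitems` slots.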