Last active Aug 29, 2015
Ruby script to clean up large files from your git repository
#!/usr/bin/env ruby
# Shows the largest objects in your git repo's pack files and offers to remove them automatically
# Based on a script by Antony Stubbs
# Use this to fetch all branches locally first:
#   for branch in `git branch -a | grep remotes | grep -v HEAD | grep -v master`; do
#     git branch --track ${branch##*/} $branch
#   done
require 'shellwords'
# list all objects including their size, sort by size, take top 20
puts " > Listing objects"
objects = `git verify-pack -v .git/objects/pack/pack-*.idx | grep -Ev "non delta|chain length|git/objects" | tr -s " " | sort -k3gr | head -n 20`
puts " > Building file index"
# map each object SHA to its path (commits have no path and map to nil)
files = Hash[`git rev-list --all --objects`.lines.map { |l| l.chomp.split(' ', 2) }]
to_remove = []
puts " > largest files:"
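objects.each_line do |object|
  sha, _, size, compressedSize = object.split(' ')
  file = files[sha]
  next if file.nil? # commits and unknown objects have no path to remove
  exists = File.file?(file)
  # a trailing "*" marks files that still exist in the working tree
  puts " %5d kB (compressed: %5d kB) - %s%s" % [size.to_i/1024, compressedSize.to_i/1024, file, ("*" if exists)]
  to_remove << file unless exists
end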
print " > do you want to remove these #{to_remove.size} files (not present in current branch)? (y/N): "
if gets.chomp == 'y'
  # shell-escape each path before splicing it into the command line
  files = to_remove.map { |f| Shellwords.escape(f) }.join(' ')
  # rewrite every branch and tag, dropping the selected paths from each commit
  cmd = "git filter-branch --tag-name-filter cat --index-filter 'git rm -r --cached --ignore-unmatch #{files}' --prune-empty -f -- --all"
  puts " > #{cmd}"
  system cmd
  # drop the backup refs and reflog entries so gc can actually delete the objects
  cmd = "rm -rf .git/refs/original && git reflog expire --expire=now --all && git gc --aggressive --prune=now"
  puts " > #{cmd}"
  system cmd
  puts " > now you need to 'git push origin --force --all && git push origin --force --tags'"
  puts " > and tell your teammates to 'git rebase'"
end
puts ' > ok, bye.'
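
The listing step in the script relies on a grep/tr/sort/head shell pipeline. As a rough sketch only, the same selection could be done in plain Ruby, which makes the column layout of `git verify-pack -v` explicit. It assumes each object row has the form `SHA type size size-in-pack offset`, optionally followed by delta depth and base SHA. The names `PackEntry`, `parse_pack_entries`, `largest_entries`, and the `SAMPLE` data are hypothetical and not part of the original script.

```ruby
# Sketch: pick the largest packed objects in Ruby instead of grep/tr/sort/head.
# PackEntry, parse_pack_entries and largest_entries are hypothetical helper names.
PackEntry = Struct.new(:sha, :type, :size, :packed_size)

OBJECT_TYPES = %w[blob tree commit tag].freeze

def parse_pack_entries(text)
  text.each_line.map do |line|
    fields = line.split
    # Keep only object rows: a hex SHA followed by a known object type.
    # Summary rows ("non delta: ...", "chain length = ...", "...pack: ok") are skipped.
    next unless fields.size >= 5
    next unless fields[0] =~ /\A\h{40,64}\z/ && OBJECT_TYPES.include?(fields[1])
    PackEntry.new(fields[0], fields[1], fields[2].to_i, fields[3].to_i)
  end.compact
end

def largest_entries(text, limit = 20)
  # Sort by uncompressed size, largest first, like `sort -k3gr | head -n 20`.
  parse_pack_entries(text).sort_by { |e| -e.size }.first(limit)
end

# Made-up sample data for illustration; real input would come from
# `git verify-pack -v .git/objects/pack/pack-*.idx`.
SAMPLE = <<~OUT
  #{'a' * 40} blob 2048 512 12
  #{'b' * 40} blob 4096 1024 524
  #{'c' * 40} commit 230 150 1548
  non delta: 3 objects
  .git/objects/pack/pack-example.pack: ok
OUT

largest_entries(SAMPLE, 2).each do |e|
  puts " %5d kB (compressed: %5d kB) - %s" % [e.size / 1024, e.packed_size / 1024, e.sha]
end
```

Filtering on the SHA and object type, rather than excluding known summary lines by pattern, means unexpected extra output is ignored instead of being parsed as an object.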