@victor-homyakov
Created February 20, 2015 09:24
Find big files in git history that don't exist in head. Usage: `ruby find-big-files-not-in-head.rb [size in MB] [rev]`
#!/usr/bin/env ruby -w
threshold, head = ARGV
head ||= 'HEAD'
Megabyte = 1000 ** 2
# Size threshold in bytes; defaults to 0.1 MB when no argument is given.
threshold = (threshold || 0.1).to_f * Megabyte
big_files = {} # blob sha => [path, size, commit]
commit_number = 0
commits_count = 0

# First pass: count the commits so that progress can be reported.
IO.popen("git rev-list #{head}", 'r') do |rev_list|
  commits_count = rev_list.each_line.count
  puts "%s commits" % [commits_count]
end

# Second pass: list the full tree of each commit, newest first.
IO.popen("git rev-list #{head}", 'r') do |rev_list|
  rev_list.each_line do |commit|
    commit_number += 1
    # Stop before the 1000th commit, so at most 999 commits are scanned.
    break if commit_number >= 1000
    commit.chomp!
    puts "commit %6d / %d %s" % [commit_number, commits_count, commit]
    # Each NUL-terminated record is "<mode> <type> <sha> <size>\t<path>".
    `git ls-tree -zrl #{commit}`.split("\0").each do |object|
      _mode, _type, sha, size, path = object.split(/\s+/, 5)
      size = size.to_i
      next unless size > threshold
      if big_files.has_key?(sha) && big_files[sha][0] != path
        warn "Another path for #{sha} is #{path}"
      else
        big_files[sha] = [path, size, commit]
      end
    end
  end
end
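
# Report every big blob together with the oldest scanned commit that contains it.
big_files.each do |sha, (path, size, commit)|
  # An earlier attempt used a custom format:
  #   where = `git show -s #{commit} --format='%h: %cr'`.chomp
  # It failed with:
  #   fatal: ambiguous argument '%cr'': unknown revision or path not in the working tree.
  # The message suggests the shell split the argument at the space instead of
  # honoring the single quotes (probably a shell that does not treat them as quotes).
  where = `git show -s #{commit} --format=oneline`.chomp
  puts "%4.1fM\t%s\t%s" % [size.to_f / Megabyte, path, where]
end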
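
The description promises files that "don't exist in head", but the script as listed reports every blob above the threshold and never compares the result against the head tree. Below is a minimal sketch of one way that filter might be added. It assumes "exists in head" means the blob's sha appears somewhere in the tree of `head`; comparing by path would be an equally plausible reading. The helper names `shas_in_tree`, `not_in_head` and `read_head_shas` are hypothetical and not part of the original gist.

```ruby
# Hypothetical helpers, not part of the original script.

# Parse NUL-separated `git ls-tree -r -z` output into a list of object shas.
# Each record looks like "<mode> <type> <sha>\t<path>".
def shas_in_tree(ls_tree_output)
  ls_tree_output.split("\0").map { |record| record.split(/\s+/, 4)[2] }
end

# Keep only the entries of big_files (sha => [path, size, commit])
# whose sha is absent from the given list of head shas.
def not_in_head(big_files, head_shas)
  present = {}
  head_shas.each { |sha| present[sha] = true }
  big_files.reject { |sha, _info| present.key?(sha) }
end

# Read the shas in the tree of `head`. The array form of IO.popen
# runs git directly, so no shell quoting is involved.
def read_head_shas(head = 'HEAD')
  shas_in_tree(IO.popen(['git', 'ls-tree', '-r', '-z', head], &:read))
end
```

If this reading of the description is the intended one, the filter could be applied just before the report loop, for example with `big_files = not_in_head(big_files, read_head_shas(head))`.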