@ediweissmann
Created February 3, 2013 15:14
Find big files in a git repository.
Usage: ruby big_file.rb [rev] [size in MB]
$ ruby big_file.rb master 0.3
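#!/usr/bin/env ruby -w
head, threshold = ARGV
head ||= 'HEAD'
Megabyte = 1000 ** 2
# The threshold is given in MB on the command line (default 0.1); convert it to bytes.
threshold = (threshold || 0.1).to_f * Megabyte

big_files = {}

# Walk every commit reachable from head.
IO.popen("git rev-list #{head}", 'r') do |rev_list|
  rev_list.each_line do |commit|
    commit.chomp!
    # -z: NUL-separated records, -r: recurse into subtrees, -l: include object size.
    for object in `git ls-tree -zrl #{commit}`.split("\0")
      _mode, _type, sha, size, path = object.split(/\s+/, 5)
      size = size.to_i
      # Keyed by blob SHA, so each large blob is reported once;
      # commits visited later overwrite the entry recorded earlier.
      big_files[sha] = [path, size, commit] if size >= threshold
    end
  end
end

big_files.each do |sha, (path, size, commit)|
  where = `git show -s #{commit} --format='%h: %cr'`.chomp
  puts "%4.1fM\t%s\t(%s)" % [size.to_f / Megabyte, path, where]
end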
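
The script relies on the record layout of `git ls-tree -l`, which is `<mode> <type> <object> <size>`, then a TAB, then the path. The following is a minimal sketch of parsing one such record in isolation, so the logic can be checked without a repository. `TreeEntry` and `parse_ls_tree_record` are hypothetical names, not part of the original script, and the sample record is made up. Splitting on the TAB first is a small variation on the script's `split(/\s+/, 5)`: it keeps a path intact even if it begins with whitespace.

```ruby
# Hypothetical helper, not part of the original script.
# Parses one NUL-separated record from `git ls-tree -r -l -z`.
TreeEntry = Struct.new(:mode, :type, :sha, :size, :path)

def parse_ls_tree_record(record)
  # Metadata and path are separated by a single TAB; the path may contain spaces.
  meta, path = record.split("\t", 2)
  # The size column is space-padded, so split on runs of whitespace.
  mode, type, sha, size = meta.split(' ')
  # Submodule (commit) entries report "-" as their size, which to_i turns into 0.
  TreeEntry.new(mode, type, sha, size.to_i, path)
end

# Made-up sample record for illustration only.
sample = "100644 blob 0123456789abcdef0123456789abcdef01234567  524288\tassets/big file.bin"
entry = parse_ls_tree_record(sample)
puts "#{entry.path}: #{entry.size} bytes"
```

Entries whose size is 0 after conversion can never reach a positive threshold, so submodules are skipped by the size comparison.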
@bienstock

I suggest a minor fix, to be able to run the file from anywhere in the filesystem on a specific repo:

#!/usr/bin/env ruby -w
head, threshold, repo = ARGV
head ||= 'HEAD'
GIT_DIR = repo || Dir.pwd
Megabyte = 1000 ** 2
threshold = (threshold || 0.1).to_f * Megabyte

big_files = {}

IO.popen("git --git-dir=#{GIT_DIR}/.git rev-list #{head}", 'r') do |rev_list|
  rev_list.each_line do |commit|
    commit.chomp!
    for object in `git --git-dir=#{GIT_DIR}/.git ls-tree -zrl #{commit}`.split("\0")
      bits, type, sha, size, path = object.split(/\s+/, 5)
      size = size.to_i
      big_files[sha] = [path, size, commit] if size >= threshold
    end
  end
end

big_files.each do |sha, (path, size, commit)|
  where = `git --git-dir=#{GIT_DIR}/.git show -s #{commit} --format='%h: %cr'`.chomp
  puts "%4.1fM\t%s\t(%s)" % [size.to_f / Megabyte, path, where]
end

The new usage is:

big_file.rb [rev] [size in MB] [repository_directory(absolute/relative)]

If [repository_directory] is omitted, the current directory's absolute path is used.
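
One caveat with interpolating the directory straight into the command string is that a path containing spaces would be split by the shell. Below is a minimal sketch of building the `git --git-dir=...` prefix with shell escaping. `git_prefix` is a hypothetical helper name, the example path is made up, and the sketch assumes a non-bare repository whose `.git` directory sits directly inside the given directory, as the script above does.

```ruby
require 'shellwords'

# Hypothetical helper, not part of the script above.
# Builds the `git --git-dir=...` prefix from an optional repository directory.
def git_prefix(repo = nil)
  # Fall back to the current directory and resolve relative paths.
  work_tree = File.expand_path(repo || Dir.pwd)
  # Assumes a non-bare repository with a .git directory inside the work tree.
  git_dir = File.join(work_tree, '.git')
  # Escape the path so spaces and shell metacharacters survive interpolation.
  "git --git-dir=#{Shellwords.escape(git_dir)}"
end

# Made-up path for illustration only.
puts git_prefix('/tmp/my repo')
```

The returned prefix could then replace the repeated `git --git-dir=#{GIT_DIR}/.git` fragments, for example `"#{git_prefix(repo)} rev-list #{head}"`.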
