@asheplyakov
Created August 19, 2019 19:03
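#!/bin/sh
# Report how much space blobs take across the whole git history and list the
# blobs larger than size_limit. Sizes are uncompressed object sizes
# (%(objectsize)), not packed on-disk sizes.
set -e

# Print "<objectname> <objecttype> <objectsize> <path>" for every reachable object.
get_all_objects () {
    git rev-list --all --objects |
        git cat-file --batch-check='%(objectname) %(objecttype) %(objectsize) %(rest)'
}

get_all_objects | awk '
BEGIN {
    size_limit = 10*1024*1024;   # "big file" threshold in bytes (10 MB)
}
$2 == "blob" {
    c += 1; s += $3
    if ($3 > size_limit) {
        tops += $3;
        topc += 1;
        # Everything after the third field is the path; it may contain spaces.
        name = $0
        sub(/^[^ ]+ [^ ]+ [^ ]+ /, "", name)
        # Key by object name so several big versions of one path are all kept.
        bf[$1] = $3;
        path[$1] = name;
    }
}
END {
    # Avoid division by zero when the repository has no blobs.
    big_file_percentage = (s > 0) ? 100.0*tops/s : 0;
    s = s/(1024.0*1024.0);
    tops = tops/(1024.0*1024.0);
    printf "file count: %d\nfile size: %.1f MB\n", c, s
    printf "big file count: %d\nbig file size: %.1f MB (%.1f%%)\n", topc, tops, big_file_percentage
    printf "big file definition: size > %.1f MB\n", size_limit/(1024.0*1024.0)
    # List big blobs, largest first; the first column is the size in KB.
    sort_cmd = "sort -k1 -n -r"
    for (id in bf)
        printf "%-10d %s\n", bf[id]/1024, path[id] | sort_cmd
    close(sort_cmd)
}'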

asheplyakov commented Aug 19, 2019

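Find out which files take up the most space in a git repository.
Unlike du, this explores the whole history, not just the current working copy.
The sizes are uncompressed blob sizes as reported by %(objectsize), so the packed on-disk footprint is usually smaller.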
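To see how the aggregation behaves without running it against a real repository, here is a minimal sketch that feeds a few made-up lines in the `<objectname> <objecttype> <objectsize> <path>` format to a simplified version of the awk program. The function names `summarize_blobs` and `run_sample`, the byte limit passed as an argument, and all object names and paths in the sample are assumptions for illustration only; the script above hard-codes a 10 MB limit and prints sizes in MB.

```shell
#!/bin/sh
# Simplified sketch of the blob aggregation; not the script above verbatim.
# $1 is a hypothetical size limit in bytes.
summarize_blobs () {
    awk -v size_limit="$1" '
    $2 == "blob" {
        count += 1; total += $3
        # A blob is "big" only when it is strictly larger than the limit.
        if ($3 > size_limit) { big_count += 1; big_total += $3 }
    }
    END {
        printf "blobs: %d, bytes: %d\n", count, total
        printf "big blobs: %d, bytes: %d\n", big_count, big_total
    }'
}

# Made-up sample input: object names and paths are placeholders.
# Commits and trees are skipped because only blobs are counted.
run_sample () {
    summarize_blobs 1000 <<'EOF'
aaaa111 commit 250
bbbb222 tree 64
cccc333 blob 400 README
dddd444 blob 5000 assets/demo video.mp4
eeee555 blob 1500 data/dump.sql
EOF
}

run_sample
```

With a limit of 1000 bytes, two of the three sample blobs count as big. The sample also includes a path with a space in it, which is why the full script takes everything after the third field as the path rather than relying on a single awk field.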