@Maxattax97
Forked from magnetikonline/README.md
Last active November 13, 2023 12:10

List all Git repository objects by size

Summary

Bash script to:

  • Iterate all commits made within a Git repository.
  • List every object at each commit.
  • Order unique objects in descending size order.

Useful for finding large files to purge from a Git repository, for instance before a migration to GitHub, where individual files are limited to a maximum of 100 MB.
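
As a rough illustration of the size check described above, the sketch below filters `git ls-tree -r --long` output for entries above a byte limit. The function name `objects_over_limit` is hypothetical, and the default of 100 * 1024 * 1024 bytes is an assumed reading of "100 MB"; pass a different limit if the target host counts differently.

```bash
# Hypothetical helper: read `git ls-tree -r --long` lines on stdin and print
# "<size><TAB><path>" for every entry larger than the given limit in bytes.
# The default limit (104857600) is an assumption, not a value from the script.
objects_over_limit() {
	limit="${1:-104857600}"
	# ls-tree --long separates the metadata from the path with a tab
	awk -F '\t' -v limit="$limit" '{
		split($1, meta, " ")   # meta[1..4] = mode, type, SHA1, size
		if (meta[4] + 0 > limit + 0) print meta[4] "\t" $2
	}'
}
```

It could be used as `git ls-tree -r --long HEAD | objects_over_limit` for a single commit.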

Example

$ ./gitlistobjectbysize.sh

100644 blob de6bdeaefebec0bff53d4859833caddba635609c    123452290	something/really/large.iso
100644 blob 946488f3c2ab8abf5d36b88f9018af77dceda12d         2290	path/to/script.js
100644 blob 2e234e61460f2fa087f9aebbfee2f6b524bc38fe         1724	README.md
100644 blob 1807d789603ae1038985f76c54e6de3b093da761         1710	README.md
100644 blob 7b5071e880f1abed9191fb34425157901c0a51a7         1083	LICENSE
100755 blob ef377e40d54365c814b9324ab4001455f4b5d4d8          651	bashscript.sh
100644 blob 08ca429f5434247f12f503dd69df244399d4ef83           19	.gitignore
100644 blob 8a52f946a9aed2c242cbe8891b3510f750527bb2           18	.gitignore
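
The listing above has one row per blob, so README.md and .gitignore appear once for each stored revision. A minimal sketch, assuming only the largest revision of each path is of interest, collapses such a listing to one row per path. The function name `largest_per_path` is hypothetical.

```bash
# Hypothetical helper: read `git ls-tree -r --long` lines on stdin and print
# "<size><TAB><path>", largest first, keeping one row per path.
largest_per_path() {
	# reduce each line to "<size><TAB><path>"
	awk -F '\t' '{ split($1, meta, " "); print meta[4] "\t" $2 }' |
		# order by size, largest first
		sort -k1,1nr |
		# keep only the first (largest) row seen for each path
		awk -F '\t' '!seen[$2]++'
}
```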

If we now wish to remove something/really/large.iso, we can rewrite the history of every branch and tag using git filter-branch:

$ git filter-branch \
	--tree-filter "rm -f something/really/large.iso" \
	-- --all

Ref 'refs/heads/master' was rewritten
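
After the rewrite, the old commits stay reachable through the backup refs that filter-branch stores under refs/original/ and through the reflog, so the repository does not shrink straight away. The sketch below follows the usual clean-up sequence; the function name `purge_rewritten_objects` is hypothetical, and it assumes it is run from inside the rewritten repository.

```bash
# Hypothetical helper: discard the pre-rewrite history left behind by
# git filter-branch. This cannot be undone, so check the rewrite first.
purge_rewritten_objects() {
	# delete the backup refs filter-branch keeps under refs/original/
	git for-each-ref --format='%(refname)' refs/original/ |
		while IFS= read -r ref; do
			git update-ref -d "$ref"
		done
	# expire reflog entries that still point at the old commits
	git reflog expire --expire=now --all
	# repack and drop objects that are no longer reachable
	git gc --prune=now --quiet
}
```

Run `purge_rewritten_objects` only once the rewritten history has been verified.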
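
gitlistobjectbysize.sh

#!/bin/bash -e

# work over each commit and append all files in tree to $tempFile
# (the temp file is removed on exit, even if a command fails)
tempFile=$(mktemp)
trap 'rm -f "$tempFile"' EXIT

# split the rev-list output on newlines only
IFS=$'\n'
for commitSHA1 in $(git rev-list --all); do
	git ls-tree -r --long "$commitSHA1" >>"$tempFile"
done

# ls-tree --long prints "<mode> <type> <SHA1> <size>", a tab, then the path.
# sort files by SHA1, de-dupe list and re-sort by filesize (largest first),
# keep the largest entry per path (split on the tab so paths with spaces survive),
# print human-readable sizes, then reverse so the largest objects are listed last
sort --key 3 "$tempFile" | \
	uniq | \
	sort --key 4 --numeric-sort --reverse | \
	awk -F '\t' '!visited[$2]++' | \
	awk -F '\t' '{ split( "B KiB MiB GiB" , v , " " ); split( $1 , f , " " ); n=f[4]; s=1; while( n>=1024 && s<4 ){ n/=1024; s++ } print int(n) " " v[s] "\t\t" $2 }' | \
	tac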
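
The unit conversion in the final awk stage can be tried on its own. This is a minimal sketch with a hypothetical function name, `humanize_bytes`. It divides by 1024 until the value drops below 1024, truncates to a whole number, and never goes past GiB. It treats exactly 1024 bytes as 1 KiB.

```bash
# Hypothetical helper: print a byte count as a truncated whole number
# with a binary unit suffix (B, KiB, MiB or GiB).
humanize_bytes() {
	awk -v bytes="$1" 'BEGIN {
		split("B KiB MiB GiB", unit, " ")
		s = 1
		# step up one unit per division, stopping at GiB
		while (bytes >= 1024 && s < 4) { bytes /= 1024; s++ }
		print int(bytes) " " unit[s]
	}'
}

humanize_bytes 123452290   # prints "117 MiB"
humanize_bytes 651         # prints "651 B"
```

Because the value is truncated rather than rounded, 123452290 bytes (about 117.7 MiB) is shown as 117 MiB.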