@jspiro
Last active June 16, 2020 22:32
Horrific but working way to summarize file sizes by extension in a directory, for evaluating what's tracked by Git LFS
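$ find . -type f -name '*.*' |
  # keep only the text after the last period, then de-duplicate
  sed 's/.*\.//' |
  sort -u |
  while read -r ext; do
    (
      # one size in bytes per matching file; the literal period keeps "d"
      # from also matching names such as "notes.md" (see @kenthoward's comment)
      gfind . -type f -name "*.$ext" -printf "%s\n" |
        # join the sizes into "1+2+3+" and let the trailing 0 close the sum
        sed 's/$/+/' |
        tr -d '\n'
      echo 0
    ) | bc | tr -d '\n'
    echo -e "\t.$ext"
  done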
1003744 .exe
911328 .zip
806448 .dc
623008 .pck
564880 .mp4
497016 .d
351848 .efz
323112 .g
...
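
For comparison, here is a rough single-pass sketch of the same summary. It is not the gist author's method: it swaps the per-extension loop and bc for one find piped into awk. The function name sizes_by_ext is made up for illustration, it assumes GNU find (gfind when Homebrew findutils is installed, otherwise find), and it assumes file names contain no tabs or newlines.

```shell
# Sum file sizes (in bytes) per extension under a directory, largest first.
# Hypothetical helper; assumes GNU find and no tabs/newlines in file names.
sizes_by_ext() {
  dir=${1:-.}
  # Prefer gfind when present, otherwise assume find is the GNU version.
  if command -v gfind >/dev/null 2>&1; then finder=gfind; else finder=find; fi
  "$finder" "$dir" -type f -name '*.*' -printf '%s\t%f\n' |
    awk -F'\t' '
      {
        ext = $2
        sub(/.*\./, "", ext)   # keep only the text after the last period
        total[ext] += $1       # accumulate bytes for this extension
      }
      END {
        for (ext in total) printf "%.0f\t.%s\n", total[ext], ext
      }' |
    sort -rn
}
```

Calling sizes_by_ext . from the repository root should print one "size, tab, extension" line per extension, the same shape as the output above.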
@jspiro
Author

jspiro commented Jun 16, 2020

tr '[ \t\n]' ',' | sed -n -E -e 's!([^,]+),([^,]+),([^,]+),!\2%\1@!gp' | tr '%' '\t' | tr '@' '\n' | sort -rn

is my way of working around sed's lack of... everything. The equivalent regex replace in a modern IDE is:

s@^(\..*)\n(.*)\s+total$@$2\t$1@

@kenthoward

You can condense this part of your bash pipeline into a single sed command and avoid all the character translation (tr). Note, some extra complexity was required to avoid embedding a literal tab, which I know you hate. 😉

from

... | tr -d '[ \t]total' | tr '\n' ',' | sed -n -E -e 's!([^,]+),([^,]+),!\2%\1@!gp' | tr '%' '\t' | tr '@' '\n' | ...

to

... | sed -n -E "/^\\..+/ {
N
s/([^[:space:]]*)[[:space:]]*\\n([^[:space:]]*)[[:space:]]*total/\\2$(echo -ne '\t')\\1/
p
}" | ...
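
To see what the sed version does, here is a small sketch that wraps it in a function and feeds it sample input. The input shape (an extension line followed by a "SIZE total" line) is an assumption inferred from the regex, not something shown in the thread, and the name join_ext_totals is made up. It uses printf instead of echo -ne to produce the tab.

```shell
# Join each ".ext" line with the "SIZE total" line after it into "SIZE<TAB>.ext".
# Hypothetical wrapper around the sed script above; the input shape is assumed.
join_ext_totals() {
  tab=$(printf '\t')
  sed -n -E "/^\\..+/ {
    N
    s/([^[:space:]]*)[[:space:]]*\\n([^[:space:]]*)[[:space:]]*total/\\2${tab}\\1/
    p
  }"
}

# Sample input: two extensions, each followed by its total line.
printf '%s\n' '.exe' '1003744 total' '.zip' '911328 total' | join_ext_totals
```

N appends the next input line to the pattern space, so the substitution sees both lines at once and can swap them around the tab.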

@jspiro
Author

jspiro commented Jun 16, 2020

:trollface:

@jspiro
Author

jspiro commented Jun 16, 2020

Credit for a WORKING solution to @kenthoward

@jspiro
Author

jspiro commented Jun 16, 2020

Notes:

  • Switch to -iname if extensions should be matched case-insensitively
  • But for Git LFS, case matters, so either lowercase all extensions, or make the LFS patterns case-insensitive (*.[eE][xX][eE]), or:
      - sort by size and find a cutoff below which files are too small to bother with
      - swap the file size and the extension using `s/(.*) (.*)/$2 $1/g`
      - sort naturally
      - visually find all related extensions and add each case variant to .gitattributes
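
The case-insensitive pattern option can be sketched as a small generator. This is only an illustration: lfs_pattern is a made-up helper name, and the attribute string is the one git lfs track normally writes into .gitattributes. It assumes the extension contains no glob metacharacters.

```shell
# Print a .gitattributes line whose pattern matches an extension in any case.
# Hypothetical helper; assumes the extension has no glob metacharacters.
lfs_pattern() {
  printf '%s\n' "$1" | awk '{
    out = "*."
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      lo = tolower(c)
      up = toupper(c)
      # letters become a [xX] class; digits and other characters stay as-is
      out = out (lo == up ? c : "[" lo up "]")
    }
    print out " filter=lfs diff=lfs merge=lfs -text"
  }'
}

lfs_pattern exe   # prints: *.[eE][xX][eE] filter=lfs diff=lfs merge=lfs -text
```

Appending the output to .gitattributes for each large extension would avoid listing every case variant by hand.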

@kenthoward

One little bug, which probably didn't affect your results: when simplifying the extension extraction to just sed 's/.*\.//', the gfind -name test should have been updated to include a literal period.

gfind . -name "*.$ext" -printf "%s\n"
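
A quick way to see the difference, as a sketch with made-up file names: without the period, the pattern for extension d also matches any name that merely ends in "d". The helper count_matches is hypothetical, and plain find is enough here since -printf is not needed.

```shell
# Count regular files under DIR whose name matches PATTERN.
# Hypothetical helper for illustration only.
count_matches() {
  find "$1" -type f -name "$2" | wc -l | tr -d ' '
}

demo=$(mktemp -d)
: > "$demo/foo.d"
: > "$demo/notes.md"

ext=d
loose=$(count_matches "$demo" "*$ext")     # also counts notes.md
strict=$(count_matches "$demo" "*.$ext")   # counts only foo.d
echo "loose=$loose strict=$strict"         # prints: loose=2 strict=1

rm -rf "$demo"
```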
