@n8henrie
Last active August 3, 2022 20:08
Well, this ended up being easier than I'd expected to implement with coreutils.
Wrapped it up into a little script that sorts by count and removes anything with only 1 result (like files).
Should be pretty easy to also add in a `du -sh` to get sizes if one wanted. Currently it runs in <2s on that 500,000-line file on my M1 Mac. Sharing in case it's useful for anyone else.
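A minimal sketch of that `du -sh` idea, assuming the script's `COUNT PATH` lines are piped in on stdin. `add_sizes` is a hypothetical helper name, not part of the script, and paths that no longer exist on disk (likely for files that were removed between snapshots) just get a `-` in the size column:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of treecount.sh): reads "COUNT PATH" lines on
# stdin and prints "COUNT<TAB>SIZE<TAB>PATH", where SIZE comes from `du -sh`.
add_sizes() {
  local count path size
  # `read -r count path` puts everything after the first space into `path`,
  # so paths containing spaces survive intact.
  while read -r count path; do
    if [ -e "${path}" ]; then
      # du prints "SIZE<TAB>PATH"; keep only the size column
      size=$(du -sh -- "${path}" 2>/dev/null | cut -f1)
    else
      # Assumption: paths missing from disk are reported with "-" rather than skipped
      size="-"
    fi
    printf '%s\t%s\t%s\n' "${count}" "${size:--}" "${path}"
  done
}
```

Usage would look something like `./treecount.sh changes.txt | add_sizes`. Running `du` once per line is slow for long lists, so it probably makes sense to `tail` the sorted output first.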
```bash
#!/usr/bin/env bash
# treecount.sh https://gist.github.com/a7c3b48eb971f662c03e9da17ecb9ea4
#
# Given an input file of paths as $1, counts the number of subfiles for each
# directory. Useful for determining which directories change most frequently
# and might be good candidates for exclusion from restic backups (like caches
# that don't have a `CACHEDIR.TAG`)
#
# USAGE: `$ ./treecount.sh changes.txt`
#
# changes.txt should be a list of file paths without duplicates (`sort -u` is
# your friend) and no other content. For my use case, I use `restic snapshots` to
# get a list of snapshots, and with a little processing run `restic diff` on
# each of those snapshots to get a list of modified files. I then filter out
# lines that do not start with `-`, `+`, or `M` (which indicate removals,
# additions, and modifications, respectively) and then deduplicate the
# resulting output.
#
# By default anything with fewer than 2 results is not included in the output
# of this script.
#
# nb4 I know the grep and sort could be done in awk, but grep and sort sure
# make it easy, don't they?
set -Eeuf -o pipefail
shopt -s inherit_errexit
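main() {
  local infile=$1

  # Split each path on "/". For absolute paths field 1 is empty, so start at
  # field 2 and count every leading prefix (/a, /a/b, /a/b/c, ...) once per line.
  awk < "${infile}" -F/ '{
    path=""
    for (idx=2; idx<=NF; idx++) {
      path = path "/" $idx
      paths[path]++
    }
  }
  END {
    for (path in paths) {
      print paths[path], path
    }
  }' |
    # Drop anything seen only once (mostly individual files)
    grep -v '^1 ' |
    # Sort ascending by count so the busiest directories end up last
    sort -n
}

main "$@"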
```
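To illustrate what the awk stage actually counts, here is a small sketch run on three made-up paths. `count_prefixes` is a hypothetical name wrapping the same pipeline as `main`, reading stdin instead of a file; the sample paths are invented for the example:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the same awk | grep | sort pipeline, reading
# paths from stdin so it can be tried without a changes.txt file.
count_prefixes() {
  awk -F/ '{
    path = ""
    # Field 1 is empty for absolute paths, so start at field 2
    for (idx = 2; idx <= NF; idx++) {
      path = path "/" $idx
      paths[path]++
    }
  }
  END {
    for (path in paths) {
      print paths[path], path
    }
  }' |
    grep -v '^1 ' |
    sort -n
}

# Made-up sample input: /a appears in 3 paths and /a/b in 2; every full file
# path appears once and is filtered out by the grep.
printf '%s\n' /a/b/c /a/b/d /a/e | count_prefixes
# prints:
# 2 /a/b
# 3 /a
```

Note that a directory's count includes everything beneath it, so parent directories always score at least as high as their children.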