@mihalicyn
Last active February 28, 2023 21:30
#!/bin/sh
#
# This script helps verify that a file tree copied from one machine
# to another is not corrupted.
# It first computes the total size of all files in a directory
# (recursively) using du; -b reports the apparent size in bytes, that is,
# the st_size field of struct stat.
# As a second check, it runs cksum on every file and, with the help of awk,
# computes a hash over all the CRC checksums that cksum reports.
# It also sums the file sizes reported by cksum as a cross-check;
# this number should equal the first one.
# Example:
# $ ./calc.sh /mnt/alex/Files
# 59994260119
# ed623
# 59994260119
#
# Implementation of hash() function in gawk was borrowed from
# https://riptutorial.com/awk/example/12547/computing-a-hash-of-a-string
# We use busybox here because I used this script on my NAS,
# but you can simply set busybox_path="" and everything should work just fine
# in a standard Linux/*BSD environment.
busybox_path="busybox"
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
BEGIN {
    for (n = 0; n < 256; n++) {
        ord[sprintf("%c", n)] = n
    }
    res = ""
    total = 0;
}
function hash(text, _prime, _modulo, _ax, _chars, _i) {
    _prime = 104729;
    _modulo = 1048576;
    _ax = 0;
    split(text, _chars, "");
    for (_i = 1; _i <= length(text); _i++) {
        _ax = (_ax * _prime + ord[_chars[_i]]) % _modulo;
    };
    return sprintf("%05x", _ax)
}
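{
    # cksum prints "CRC SIZE NAME": $1 is the CRC, $2 is the size in bytes.
    # Chain each CRC into the running hash. Note that the result depends on
    # the order in which find lists the files.
    tmp = $1;
    res = hash(sprintf("%s%s", tmp, res));
    total += $2;
}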
END {
    print res;
    print total;
}
EOF
$busybox_path find "$@" -type f -exec $busybox_path du -sb {} \; |\
$busybox_path awk 'BEGIN {total = 0;} { total += $1; } END { print total; }'
$busybox_path find "$@" -type f -exec $busybox_path cksum {} \; |\
$busybox_path awk -f "$tmpfile"
rm "$tmpfile"
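
The script prints its three numbers for one tree and leaves the comparison to the reader. Below is a minimal sketch of how that comparison could be automated in the same spirit. It is not part of the script above: `tree_fingerprint` and `same_tree` are hypothetical helper names, and instead of the chained hash it sums the CRCs modulo 2^32, a swapped-in, weaker but order-independent combination, so the result does not depend on the order in which `find` lists files. It assumes a standard `find`, `cksum`, and `awk`.

```sh
#!/bin/sh
# Hypothetical helpers, not part of the original script.

# tree_fingerprint DIR
# Print "<file count> <total bytes> <sum of CRCs mod 2^32>" for all
# regular files under DIR.
tree_fingerprint() {
    # cksum prints "CRC SIZE NAME" for each file. Addition is commutative,
    # so the totals are the same whatever order find uses.
    find "$1" -type f -exec cksum {} \; |
        awk '{ n++; bytes += $2; crc = (crc + $1) % 4294967296 }
             END { printf "%d %.0f %.0f\n", n, bytes, crc }'
}

# same_tree SRC DST
# Succeed (exit status 0) if both trees have the same fingerprint.
same_tree() {
    [ "$(tree_fingerprint "$1")" = "$(tree_fingerprint "$2")" ]
}
```

With these functions defined, something like `same_tree /mnt/alex/Files /backup/Files && echo OK` (the second path is made up for illustration) would report whether the two copies agree. Because file names are not part of the fingerprint, a renamed file goes unnoticed, just as in the script above.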