#!/bin/bash
# Compute the Amazon S3 multipart-upload ETag for a local file (uses macOS md5/dd).
if [ $# -ne 2 ]; then
    echo "Usage: $0 file partSizeInMb"
    exit 1
fi
file=$1
partSizeInMb=$2
if [ ! -f "$file" ]; then
    echo "Error: $file not found."
    exit 1
fi
fileSizeInMb=$(du -m "$file" | cut -f 1)
# Number of parts = ceiling(fileSizeInMb / partSizeInMb).
parts=$((fileSizeInMb / partSizeInMb))
if [[ $((fileSizeInMb % partSizeInMb)) -gt 0 ]]; then
    parts=$((parts + 1))
fi
checksumFile=$(mktemp -t s3md5)
# MD5 each part and append its hex digest to the checksum file.
for (( part=0; part<parts; part++ )); do
    skip=$((partSizeInMb * part))
    dd bs=1m count=$partSizeInMb skip=$skip if="$file" 2>/dev/null | md5 >> "$checksumFile"
done
# The ETag is the MD5 of the concatenated binary part digests, plus "-<part count>".
echo "$(xxd -r -p "$checksumFile" | md5)-$parts"
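A usage sketch (the script name s3etag.sh and the file name backup.tar are just placeholders); the part size argument must match the part size used for the multipart upload:

    ./s3etag.sh backup.tar 15
    # prints "<md5 of the concatenated part digests>-<part count>",
    # which should equal the ETag S3 reports for the object.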
Very cool. I have a patch to make it work for Linux, if that's of interest. I'll fork and PR if so.
One file I have doesn't match S3's MD5 sum, even after multiple downloads. Chunk size is rather big (512 MB).
Not that a hash tells us much, but Amazon says it's:
Whereas your script says:
File size is 609865657 bytes.
Different algorithm for big files? Doesn't really make sense.
We should use this, if uploaded with
Thanks for this, it helped me validate a heap of files I had in S3.
Note that AWS S3 supports a maximum of 10,000 parts. I recently exceeded this on a project with a 54GB file and a 5MB part size. In that case the AWS SDK adjusts the part size so the upload fits within 10,000 parts. If you happen to exceed 10,000 parts, I used the expression below to get the part size that makes the ETag calculation come out right. I also specified the part size in bytes for better accuracy.
partsize = (filesize / 10000) + 1
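A small shell sketch of that calculation, assuming the file size is taken in bytes with stat and that integer arithmetic is fine (the file name is hypothetical):

    file="my-54gb-backup.tar"             # hypothetical file name
    filesize=$(stat -f %z "$file")        # size in bytes (macOS); use `stat -c %s` on Linux
    partsize=$(( filesize / 10000 + 1 ))  # smallest part size that keeps the upload at <= 10,000 parts
    echo "Part size: $partsize bytes"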