@emersonf
Last active May 16, 2024 12:30
A Bash script to compute ETag values for S3 multipart uploads on OS X.
#!/bin/bash

if [ $# -ne 2 ]; then
    echo "Usage: $0 file partSizeInMb"
    exit 1
fi

file=$1
if [ ! -f "$file" ]; then
    echo "Error: $file not found."
    exit 1
fi

partSizeInMb=$2
fileSizeInMb=$(du -m "$file" | cut -f 1)

parts=$((fileSizeInMb / partSizeInMb))
if [[ $((fileSizeInMb % partSizeInMb)) -gt 0 ]]; then
    parts=$((parts + 1))
fi

checksumFile=$(mktemp -t s3md5)

for (( part=0; part<parts; part++ ))
do
    skip=$((partSizeInMb * part))
    # Hash each part; append the hex digest to the checksum file.
    dd bs=1m count="$partSizeInMb" skip="$skip" if="$file" 2>/dev/null | md5 >> "$checksumFile"
done

# The ETag is the MD5 of the concatenated binary part digests, plus the part count.
echo "$(xxd -r -p "$checksumFile" | md5)-$parts"
rm "$checksumFile"
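For a multipart upload, S3 sets the ETag to the MD5 of the concatenation of each part's *binary* MD5 digest, followed by `-` and the part count; that is what the `xxd -r -p | md5` step computes. A minimal sketch of the single-part case (using the Linux `md5sum`/`xxd` names so it runs outside OS X; the file name and contents are examples):

```shell
# For a file smaller than one part, the multipart ETag is
# md5(binary-md5(file)) followed by "-1" -- not the plain md5 of the file.
printf 'hello world' > example.txt

# Hex digest of the single part.
partDigest=$(md5sum example.txt | awk '{print $1}')

# Convert the hex digest back to binary, hash again, append the part count.
etag=$(printf '%s' "$partDigest" | xxd -r -p | md5sum | awk '{print $1}')-1

echo "$etag"   # 32 hex characters followed by -1
```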
@shntnu
shntnu commented Jul 20, 2022

Thank you @skchronicles

I think there's an error in the parts calculation in the original script; it's fixed in the version below.

https://gist.github.com/emersonf/7413337?permalink_comment_id=3244707#gistcomment-3244707

#!/bin/bash
set -euo pipefail

if [ $# -ne 2 ]; then
    echo "Usage: $0 file partSizeInMb"
    exit 1
fi

file=$1
if [ ! -f "$file" ]; then
    echo "Error: $file not found."
    exit 1
fi

partSizeInMb=$2
partSizeInB=$((partSizeInMb * 1024 * 1024)) ### I added this
fileSizeInB=$(du -b "$file" | cut -f 1)     ### I edited this

parts=$((fileSizeInB / partSizeInB))        ### I edited this and the next line
if [[ $((fileSizeInB % partSizeInB)) -gt 0 ]]; then
    parts=$((parts + 1))
fi

checksumFile=$(mktemp -t s3md5.XXXXXXXXXXXXX)

for (( part=0; part<parts; part++ ))
do
    skip=$((partSizeInMb * part))
    dd bs=1M count="$partSizeInMb" skip="$skip" if="$file" 2>/dev/null | md5sum >> "$checksumFile"
done

etag=$(echo $(xxd -r -p "$checksumFile" | md5sum)-$parts | sed 's/ --/-/')
echo -e "${1}\t${etag}"
rm "$checksumFile"
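One note on the `sed 's/ --/-/'` at the end: `md5sum` appends `  -` when reading from stdin, and the sed strips the stray dash once `-$parts` has been appended. An arguably clearer alternative (a sketch, not part of the script above) is to keep only the digest field up front so no cleanup is needed:

```shell
# md5sum on stdin prints "<hex>  -"; take just the first field.
digest=$(printf 'hello' | md5sum | awk '{print $1}')
echo "$digest"   # 5d41402abc4b2a76b9719d911017c592
```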

@rajivnarayan

rajivnarayan commented Jul 23, 2022

Thanks, this is quite useful.

I modified the script to speed up the hash computation and avoid generating temporary files. Link to script
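In that spirit, the per-part digests can also be streamed straight into the final `md5sum`, avoiding the temporary file entirely. A sketch assuming GNU coreutils (`md5sum`, `stat -c`, `xxd`); the file name and part size are example values:

```shell
#!/bin/bash
set -euo pipefail

# Example inputs (placeholders): a 10 MiB file hashed with 8 MiB parts.
file=example.bin
partSizeInB=$((8 * 1024 * 1024))
dd if=/dev/zero bs=1M count=10 of="$file" 2>/dev/null

fileSizeInB=$(stat -c%s "$file")
parts=$(( (fileSizeInB + partSizeInB - 1) / partSizeInB ))   # ceiling division

# Pipe each part's binary MD5 digest into one final md5sum -- no temp file.
etag=$(
    for (( part=0; part<parts; part++ )); do
        dd if="$file" bs="$partSizeInB" skip="$part" count=1 2>/dev/null \
            | md5sum | awk '{print $1}'
    done | xxd -r -p | md5sum | awk '{print $1}'
)-$parts

echo "$etag"   # <32 hex characters>-2 for this example
```

Using `bs="$partSizeInB"` with `skip="$part"` reads exactly one part per iteration, since `skip` counts blocks of `bs` bytes.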
