A Bash script to compute ETag values for S3 multipart uploads on OS X.
#!/bin/bash
# Compute the S3 multipart-upload ETag for a file on OS X.
if [ $# -ne 2 ]; then
    echo "Usage: $0 file partSizeInMb"
    exit 1
fi
file=$1
if [ ! -f "$file" ]; then
    echo "Error: $file not found."
    exit 1
fi
partSizeInMb=$2
# du -m rounds the size up to whole megabytes, matching dd's 1 MB blocks below.
fileSizeInMb=$(du -m "$file" | cut -f 1)
parts=$((fileSizeInMb / partSizeInMb))
if [[ $((fileSizeInMb % partSizeInMb)) -gt 0 ]]; then
    parts=$((parts + 1))
fi
checksumFile=$(mktemp -t s3md5)
# MD5 each part, collecting one hex digest per line.
for (( part=0; part<parts; part++ )); do
    skip=$((partSizeInMb * part))
    dd bs=1m count="$partSizeInMb" skip="$skip" if="$file" 2>/dev/null | md5 >> "$checksumFile"
done
# The ETag is the MD5 of the concatenated *binary* part digests, plus "-<parts>".
echo "$(xxd -r -p "$checksumFile" | md5)-$parts"
rm "$checksumFile"
@jbekas jbekas commented Dec 21, 2014

Awesome script! Thank you.

@Graham-M Graham-M commented Oct 2, 2015

Thanks for this, really helped me!

@bitwombat bitwombat commented Dec 17, 2015

Very cool. I have a patch to make it work for Linux, if that's of interest. I'll fork and PR if so.

One file I have doesn't match S3's MD5 sum, even after multiple downloads. Chunk size is rather big (512 MB).
Any ideas what this could be?

Not that a hash tells us much, but Amazon says it's:
29fd5af267ee59b66273451bc0f549e8-2

Whereas your script says:
f209c8604d57297b0e06ca84fafeac00-2

File size is 609865657 bytes.

Different algorithm for big files? Doesn't really make sense.

@RichardBronosky RichardBronosky commented Jan 9, 2017

How do you know what part size was used/to use?
(Size: 9476171423 ETag: 44dab9123b49dab2c2b3b10c360ceda1-1130)
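One way to narrow it down: the number after the dash in the ETag is the part count, so you can brute-force which whole-megabyte part size reproduces that count for your file size. A rough sketch, using the size and ETag suffix from this comment:

```shell
# Sketch: infer the part size from the file size and the ETag's part-count
# suffix by testing whole-MB candidates. Values are from the comment above.
filesize=9476171423   # bytes
parts=1130            # from the ETag suffix "-1130"
candidates=""
for mb in $(seq 1 64); do
    bytes=$((mb * 1024 * 1024))
    # does ceil(filesize / bytes) equal the part count?
    if [ $(( (filesize + bytes - 1) / bytes )) -eq "$parts" ]; then
        candidates="$candidates $mb"
    fi
done
echo "candidate part size(s) in MB:$candidates"
```

For this particular file only 8 MB fits, which happens to match the aws-cli default chunk size.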

@komiyak komiyak commented Aug 4, 2017

@RichardBronosky
I finally understand.
https://stackoverflow.com/questions/12186993/what-is-the-algorithm-to-compute-the-amazon-s3-etag-for-a-file-larger-than-5gb#answer-19896823

Note: If you uploaded with aws-cli via aws s3 cp, then you most likely have an 8 MB chunk size. According to the docs, that is the default.

So if the file was uploaded with aws-cli via aws s3 cp, we should run:

$ ./s3etag.sh something.zip 8
@jocot jocot commented Apr 17, 2018

Thanks for this, it helped me validate a heap of files I had in S3.

Note that AWS S3 supports a maximum of 10,000 parts. I recently exceeded this on a project with a 54 GB file (5 MB part size); in that case the AWS SDK adjusts the part size to fit within 10,000 parts. If you happen to exceed 10,000 parts, this expression gives the part size that calculates the ETag correctly (I specified the part size in bytes for better accuracy):

partsize = (filesize / 10000) + 1
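Worked through for the 54 GB case above (a sketch; the final rounding to whole megabytes is an assumption, made only so the result can be fed to this script's MB-based interface):

```shell
# Sketch of the part-size adjustment for uploads that would exceed 10,000
# parts, per the formula above. 54 GB is the figure from the comment.
filesize=57982058496                                # 54 GB in bytes
partsize=$(( filesize / 10000 + 1 ))                # smallest size fitting 10,000 parts
partsize_mb=$(( (partsize + 1048575) / 1048576 ))   # round up to whole MB
echo "use part size: ${partsize_mb} MB"
```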

@veenits veenits commented May 15, 2018

Thank you. This is helpful. Are there any alternatives to xxd on Linux?

@cyb3rz3us cyb3rz3us commented Feb 28, 2019

Awesome script. It doesn't work for SSE-KMS-encrypted files, though, so if you happen to uncover any intel on how AWS generates the MD5 for that scenario, please share. Again, awesome job here.

@rfraimow rfraimow commented Dec 9, 2019

Thanks for the script, this is incredibly helpful and we're incorporating it into our workflows!

@skchronicles skchronicles commented Apr 7, 2020

Linux users

Here is an equivalent script if you are not using OS X. I hope this helps!

#!/bin/bash
# Linux equivalent: uses md5sum and GNU dd's bs=1M instead of OS X's md5 and bs=1m.
set -euo pipefail
if [ $# -ne 2 ]; then
    echo "Usage: $0 file partSizeInMb"
    exit 1
fi
file=$1
if [ ! -f "$file" ]; then
    echo "Error: $file not found."
    exit 1
fi
partSizeInMb=$2
# du -m rounds the size up to whole megabytes, matching dd's 1 MB blocks below.
fileSizeInMb=$(du -m "$file" | cut -f 1)
parts=$((fileSizeInMb / partSizeInMb))
if [[ $((fileSizeInMb % partSizeInMb)) -gt 0 ]]; then
    parts=$((parts + 1))
fi
checksumFile=$(mktemp -t s3md5.XXXXXXXXXXXXX)
# MD5 each part; cut strips md5sum's trailing " -" so only hex digests are kept.
for (( part=0; part<parts; part++ )); do
    skip=$((partSizeInMb * part))
    dd bs=1M count="$partSizeInMb" skip="$skip" if="$file" 2>/dev/null | md5sum | cut -d' ' -f1 >> "$checksumFile"
done
# The ETag is the MD5 of the concatenated binary part digests, plus "-<parts>".
etag="$(xxd -r -p "$checksumFile" | md5sum | cut -d' ' -f1)-$parts"
echo -e "${1}\t${etag}"
rm "$checksumFile"
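As a sanity check of the construction itself (MD5 each part, concatenate the binary digests, MD5 the result), here is a minimal self-contained run on Linux against a synthetic 2 MB file split into 1 MB parts. The file name and contents are arbitrary; the point is the shape of the result, 32 hex digits followed by "-2":

```shell
# Minimal demonstration of the multipart-ETag construction on a throwaway file.
tmpfile=$(mktemp)
head -c 2097152 /dev/zero > "$tmpfile"        # 2 MB test file of zero bytes
digests=$(mktemp)
for part in 0 1; do
    # MD5 one 1 MB part at a time, keeping only the hex digest.
    dd bs=1M count=1 skip=$part if="$tmpfile" 2>/dev/null \
        | md5sum | cut -d' ' -f1 >> "$digests"
done
# Convert the hex digests back to binary, MD5 the concatenation, append "-2".
etag="$(xxd -r -p "$digests" | md5sum | cut -d' ' -f1)-2"
echo "$etag"
rm "$tmpfile" "$digests"
```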