Comparing two files via MD5 hash on Amazon S3 using Ruby
require 'digest/md5'
require 'aws/s3'
# Set your AWS credentials
AWS::S3::Base.establish_connection!(
  :access_key_id => 'XXX',
  :secret_access_key => 'XXX'
)
# Fetch the S3 object
object = AWS::S3::S3Object.find('02185773dcb5a468df6b.pdf', 'your_bucket')
# Read the ETag from the object's metadata and strip the surrounding double quotes
etag = object.about['etag'].gsub('"', '')
# Hash the local file (read in binary mode so the bytes are not altered)
f = '/Users/matt/Desktop/02185773dcb5a468df6b.pdf'
digest = Digest::MD5.hexdigest(File.binread(f))
# Print both values
puts digest + ' vs ' + etag
# Compare the two hex strings
if digest == etag
  puts 'same file!'
else
  puts 'different files.'
end
@nictrix commented Oct 17, 2013

Thanks, this was helpful. I didn't know where the ETag value was located in the object.

However, I did run into a problem with memory usage. If it's a large file, you may want to use this instead:

digest = Digest::MD5.file(f).to_s

This keeps Ruby's memory usage from growing with the size of the file, because the file is hashed in chunks instead of being loaded whole.
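
To make the difference concrete, here is a minimal sketch of chunked hashing written by hand. The helper name `streaming_md5` and the 1 MiB chunk size are illustrative assumptions, not part of the gist; in practice `Digest::MD5.file` should give the same result.

```ruby
require 'digest/md5'
require 'tempfile'

# Hypothetical helper: hash a file one chunk at a time so that only a single
# chunk is held in memory. The chunk size is an arbitrary illustrative value.
def streaming_md5(path, chunk_size = 1024 * 1024)
  md5 = Digest::MD5.new
  File.open(path, 'rb') do |io|
    # IO#read with a length returns nil at end of file, which ends the loop.
    while (chunk = io.read(chunk_size))
      md5.update(chunk)
    end
  end
  md5.hexdigest
end

# Usage: compare the hand-rolled version with Digest::MD5.file on a temp file.
Tempfile.create('md5-demo') do |tmp|
  tmp.binmode
  tmp.write('hello world')
  tmp.flush
  puts streaming_md5(tmp.path) == Digest::MD5.file(tmp.path).to_s
end
```

Either form can replace the `File.read`-based line in the gist without changing the comparison against the ETag.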

@Dan2552 commented Feb 3, 2015

The ETag doesn't appear to always be the MD5 of the file. From the SDK documentation:

- (String) etag

Returns the object's ETag.

Generally the ETag is the MD5 of the object. If the object was uploaded using multipart upload, then this is the MD5 of all of the upload-part MD5s.
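
For multipart uploads, the comparison in the gist therefore needs a different local digest. The sketch below follows the scheme commonly observed for S3 multipart ETags: the MD5 of the concatenated binary MD5s of each part, followed by `-` and the part count. This is an assumption based on observed behavior, not something the quoted documentation spells out. The helper names `multipart_etag` and `multipart_etag?` are hypothetical, and `part_size` must match the part size that was used for the upload.

```ruby
require 'digest/md5'
require 'tempfile'

# Hypothetical helper: a multipart ETag is assumed to end in "-<part count>".
def multipart_etag?(etag)
  etag.match?(/-\d+\z/)
end

# Hypothetical helper: rebuild the assumed multipart ETag for a local file.
# Assumes every part except the last was uploaded with the same part_size.
def multipart_etag(path, part_size)
  part_digests = []
  File.open(path, 'rb') do |io|
    # Hash each part separately, keeping the raw 16-byte digests.
    while (part = io.read(part_size))
      part_digests << Digest::MD5.digest(part)
    end
  end
  # MD5 over the concatenated raw digests, plus the number of parts.
  "#{Digest::MD5.hexdigest(part_digests.join)}-#{part_digests.size}"
end

# Usage: a 10-byte file split into 4-byte parts yields three parts.
Tempfile.create('etag-demo') do |tmp|
  tmp.binmode
  tmp.write('a' * 10)
  tmp.flush
  etag = multipart_etag(tmp.path, 4)
  puts etag
  puts multipart_etag?(etag)
end
```

If the ETag returned by S3 has no `-<count>` suffix, the plain MD5 comparison from the gist should still apply.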