@joshuadfranklin
Last active May 19, 2021 00:43
#!/usr/bin/python
import boto
import math
# Use boto to Copy an Object greater than 5 GB Using S3 Multipart Upload API
# probably could be made more pythonesque, based directly off the AWS Java example
# Copy an Object [greater than 5 GB] Using the AWS SDK for Java [S3] Multipart Upload API
# http://docs.aws.amazon.com/AmazonS3/latest/dev/CopyingObjctsUsingLLJavaMPUapi.html
# copy in same bucket as a simple test
bucket_name = 'btest1234'
source_bucket = bucket_name
destination_bucket = bucket_name
orig_key_name = 'foo.gz'
dest_key_name = 'copy' + orig_key_name
s3 = boto.connect_s3(debug=1)
sb = s3.get_bucket(source_bucket)
ky = sb.lookup(orig_key_name)
objectSize = ky.size
print "found objectSize of %d" % objectSize
b = s3.get_bucket(destination_bucket)
mp = b.initiate_multipart_upload(dest_key_name, reduced_redundancy=True)
psize = 50 * math.pow(2.0, 20.0) # 2^20 = 1 MiB
bytePosition = 0
i = 1
while bytePosition < objectSize:
    lastbyte = bytePosition + psize - 1
    if lastbyte > objectSize:
        lastbyte = objectSize - 1
    print "mp.copy_part_from_key part %d (%d %d)" % (i, bytePosition, lastbyte)
    mp.copy_part_from_key(source_bucket, orig_key_name, i, int(bytePosition), int(lastbyte))
    i = i + 1
    bytePosition += psize
mp.complete_upload()
print "done"
@davetarbox

I found this useful, but shouldn't line 32 (the "if lastbyte > objectSize:" test) use ">=" instead of just ">"?
For psize = 50 and objectSize = 99, the current logic would transfer bytes 0-49 as the first part and bytes 50-99 as the second part, but there is no byte 99 (the last valid offset is 98).
I haven't read the code for copy_part_from_key; I expect it defends against this case.
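
The off-by-one described above can be checked without touching S3. Here is a minimal sketch, assuming a hypothetical helper `part_ranges` (not part of boto) that yields the inclusive byte ranges the loop would pass to copy_part_from_key, with the last byte clamped to objectSize - 1:

```python
def part_ranges(object_size, part_size):
    """Yield (part_number, first_byte, last_byte) with inclusive byte offsets.

    Hypothetical helper for illustration only; it is not part of boto.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    part_number = 1
    first = 0
    while first < object_size:
        # Clamp so the final part never points past the last byte,
        # which sits at offset object_size - 1.
        last = min(first + part_size - 1, object_size - 1)
        yield part_number, first, last
        part_number += 1
        first += part_size


# The example from the comment: psize = 50, objectSize = 99.
ranges = list(part_ranges(99, 50))
```

Each tuple could then be handed to `mp.copy_part_from_key(source_bucket, orig_key_name, part_number, first, last)`. Keeping the part size an integer (for example `50 * 2**20`) would also make the `int()` casts in the script unnecessary.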

@russellpierce

get_key is now preferred over lookup. I'm given to understand that get_key uses a HEAD request, which is a bit cheaper and faster.
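
A rough sketch of that suggestion: boto's `Bucket.get_key` returns `None` when the key does not exist, so the size lookup in the script could be guarded as below. `source_object_size` is a hypothetical helper name, and `bucket` is assumed to be whatever `s3.get_bucket(...)` returned.

```python
def source_object_size(bucket, key_name):
    """Return the size in bytes of key_name in a boto (v2) bucket.

    Hypothetical helper; `bucket` is assumed to behave like a
    boto.s3.bucket.Bucket (only its get_key method is used).
    """
    key = bucket.get_key(key_name)  # returns None if the key is missing
    if key is None:
        raise KeyError("no such key: %s" % key_name)
    return key.size
```

In the script this would stand in for the `sb.lookup(orig_key_name)` and `ky.size` lines, for example `objectSize = source_object_size(sb, orig_key_name)`.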

@bronius

bronius commented Mar 23, 2017

Old post, I know, but boto3 now supports a simple managed copy (S3.Client.copy) that uses multipart under the hood when necessary. See http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.copy
Thanks for your original share here!
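
A minimal sketch of that approach, assuming boto3 is installed and credentials are already configured. `copy_object_managed` is a hypothetical wrapper name; the only boto3 call it makes is the `copy` method from the linked docs.

```python
def copy_object_managed(client, src_bucket, src_key, dst_bucket, dst_key):
    """Copy one S3 object using boto3's managed copy.

    Hypothetical wrapper; `client` is assumed to be a boto3 S3 client,
    e.g. the result of boto3.client("s3").
    """
    copy_source = {"Bucket": src_bucket, "Key": src_key}
    # The managed copy decides on its own whether a multipart copy is needed.
    client.copy(copy_source, dst_bucket, dst_key)
    return copy_source
```

With the names from the original script, the call might look like `copy_object_managed(boto3.client("s3"), "btest1234", "foo.gz", "btest1234", "copyfoo.gz")`.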
