boto3 S3 Multipart Upload
import argparse
import os

import boto3


class S3MultipartUpload(object):
    # AWS throws EntityTooSmall error for non-final parts smaller than 5 MiB
    PART_MINIMUM = 5 * 1024 * 1024

    def __init__(self,
                 bucket,
                 key,
                 local_path,
                 part_size=int(15e6),  # assumed default; the original value is missing here
                 profile_name=None,
                 region_name="eu-west-1",
                 verbose=False):
        self.bucket = bucket
        self.key = key
        self.path = local_path
        self.total_bytes = os.stat(local_path).st_size
        self.part_bytes = part_size
        assert part_size >= self.PART_MINIMUM
        # Stricter than S3 requires: S3 accepts a final part below the minimum
        assert (self.total_bytes % part_size == 0
                or self.total_bytes % part_size > self.PART_MINIMUM)
        self.s3 = boto3.session.Session(
            profile_name=profile_name, region_name=region_name).client("s3")
        if verbose:
            # Assumed body (missing in the source): log botocore activity
            boto3.set_stream_logger(name="botocore")

    def abort_all(self):
        # Abort every in-progress multipart upload in the bucket
        mpus = self.s3.list_multipart_uploads(Bucket=self.bucket)
        uploads = mpus.get("Uploads", [])
        aborted = []
        print("Aborting", len(uploads), "uploads")
        for u in uploads:
            # Each upload carries its own key, which may differ from self.key
            aborted.append(
                self.s3.abort_multipart_upload(
                    Bucket=self.bucket, Key=u["Key"], UploadId=u["UploadId"]))
        return aborted

    def create(self):
        mpu = self.s3.create_multipart_upload(Bucket=self.bucket, Key=self.key)
        mpu_id = mpu["UploadId"]
        return mpu_id

    def upload(self, mpu_id):
        parts = []
        uploaded_bytes = 0
        with open(self.path, "rb") as f:
            i = 1  # part numbers start at 1
            while True:
                data = f.read(self.part_bytes)
                if not len(data):
                    break
                part = self.s3.upload_part(
                    Body=data, Bucket=self.bucket, Key=self.key,
                    UploadId=mpu_id, PartNumber=i)
                # S3 needs each part's number and ETag to assemble the object
                parts.append({"PartNumber": i, "ETag": part["ETag"]})
                uploaded_bytes += len(data)
                print("{0} of {1} uploaded ({2:.3f}%)".format(
                    uploaded_bytes, self.total_bytes,
                    as_percent(uploaded_bytes, self.total_bytes)))
                i += 1
        return parts

    def complete(self, mpu_id, parts):
        result = self.s3.complete_multipart_upload(
            Bucket=self.bucket,
            Key=self.key,
            UploadId=mpu_id,
            MultipartUpload={"Parts": parts})
        return result


# Helper
def as_percent(num, denom):
    return float(num) / float(denom) * 100.0


def parse_args():
    parser = argparse.ArgumentParser(description='Multipart upload')
    parser.add_argument('--bucket', required=True)
    parser.add_argument('--key', required=True)
    parser.add_argument('--path', required=True)
    parser.add_argument('--region', default="eu-west-1")
    parser.add_argument('--profile', default=None)
    return parser.parse_args()


def main():
    args = parse_args()
    mpu = S3MultipartUpload(
        args.bucket,
        args.key,
        args.path,
        profile_name=args.profile,
        region_name=args.region)
    # abort all multipart uploads for this bucket (optional, for starting over)
    # mpu.abort_all()
    # create new multipart upload
    mpu_id = mpu.create()
    # upload parts
    parts = mpu.upload(mpu_id)
    # complete multipart upload
    print(mpu.complete(mpu_id, parts))


if __name__ == "__main__":
    main()

teasherm commented Mar 1, 2017

had to roll a custom multipart upload (awscli erroring out on long upload with faulty network), and found boto3 multipart upload poorly documented so storing example code here

lbrent2k commented Dec 27, 2018

We are working off your code for a Lambda function that pulls data from an FTP site, caches it in memory, and uploads it in chunks as a multipart upload. We notice that for large files (1 GB, for example) the upload process repeats. We are thinking maybe a part fails?
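
If a part does fail intermittently, one option is to retry that part on its own rather than restarting the whole upload. The following is a minimal sketch, not something from the gist: `upload_part_with_retry`, `max_attempts`, and `base_delay` are hypothetical names, and `s3` is assumed to be a boto3 S3 client. Whether a failing part is actually what makes the upload repeat cannot be told from this thread.

```python
import time


def upload_part_with_retry(s3, data, bucket, key, upload_id, part_number,
                           max_attempts=3, base_delay=1.0):
    """Upload one part, retrying with exponential backoff on any exception.

    `s3` is assumed to be a boto3 S3 client (anything exposing upload_part).
    Returns the {"PartNumber", "ETag"} dict needed to complete the upload.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            part = s3.upload_part(
                Body=data, Bucket=bucket, Key=key,
                UploadId=upload_id, PartNumber=part_number)
            return {"PartNumber": part_number, "ETag": part["ETag"]}
        except Exception:
            # Broad catch keeps the sketch short; narrow it in real code
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before retrying this part
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Inside the gist's `upload` loop, the direct `self.s3.upload_part(...)` call could be swapped for this helper, appending its return value to `parts`.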

amiantos commented Feb 28, 2019

Thanks a lot for this gist, it has been a fantastic resource for me.

rfschroeder commented Jul 24, 2019

Great job! It will be very helpful to me.

aarongooch commented Dec 18, 2019

super, A+++, 👍
used this script to upload 50GB files from a kube pod to S3.

holyjak commented Jan 6, 2020

Thanks a lot, this has been most useful!

I have created a modified version able to resume the upload after a failure, useful if the network fails or your session credentials expire.
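
The modified version itself is not reproduced here. As a rough illustration of how resuming can work in general, S3 can report which parts it already holds for an upload ID via `list_parts`, and the local file can be read from the matching offset. This sketch is an assumption, not holyjak's code: `list_uploaded_parts` and `resume_point` are hypothetical helpers, `s3` is assumed to be a boto3 S3 client, and every part is assumed to have been uploaded with the same fixed `part_size`.

```python
def list_uploaded_parts(s3, bucket, key, upload_id):
    """Return the parts S3 already holds for an in-progress multipart upload."""
    parts = []
    kwargs = {"Bucket": bucket, "Key": key, "UploadId": upload_id}
    while True:
        page = s3.list_parts(**kwargs)
        for p in page.get("Parts", []):
            parts.append({"PartNumber": p["PartNumber"], "ETag": p["ETag"]})
        if not page.get("IsTruncated"):
            return parts
        # More parts remain: continue listing after the last one seen
        kwargs["PartNumberMarker"] = page["NextPartNumberMarker"]


def resume_point(parts, part_size):
    """Return (next_part_number, byte_offset) for continuing the upload.

    Assumes parts 1..n were uploaded without gaps, each of part_size bytes.
    """
    numbers = sorted(p["PartNumber"] for p in parts)
    if numbers != list(range(1, len(numbers) + 1)):
        raise ValueError("uploaded parts are not contiguous from 1")
    return len(numbers) + 1, len(numbers) * part_size
```

To resume, seek the file to `byte_offset`, keep uploading from `next_part_number`, and pass the already-listed parts plus the new ones to `complete_multipart_upload`.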

shentonfreude commented Jan 31, 2020

This is super helpful and very clean, thanks.
However, while searching for this, I also found a dead simple way of doing it where you can force multipart by setting a size threshold, in the AWS docs, just 2 lines of code:
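
The snippet from this comment did not survive, so the following is only a sketch of what that approach usually looks like with boto3's managed transfer: `TransferConfig(multipart_threshold=...)` passed to `upload_file`. The wrapper name `upload_with_threshold` and its default threshold are assumptions for illustration.

```python
def upload_with_threshold(path, bucket, key, threshold_bytes=8 * 1024 * 1024):
    """Upload a file, letting boto3 switch to multipart at the given size."""
    # Imported here so the sketch can be loaded without boto3 installed
    import boto3
    from boto3.s3.transfer import TransferConfig

    # Files that reach the threshold are sent through the multipart API;
    # smaller files go up as a single PUT
    config = TransferConfig(multipart_threshold=threshold_bytes)
    boto3.client("s3").upload_file(path, bucket, key, Config=config)
```

The trade-off is control: the managed transfer handles splitting and completion itself, while the explicit class in the gist exposes each part's ETag and lets you decide what happens when a part fails.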

vsoch commented Apr 5, 2020

How would this be modified to generate a presigned URL? I'm able to generate one, but it has a signature verification error, so I'm thinking that I'm missing something that sets the algorithm / version. Here are details if anyone can help! I'm trying to use the s3 boto3 client for a minio server for multipart upload with a presigned url because minio-py doesn't support that.

Update - I think I figured out how to add the key - the config parameter below is newly added

from botocore.client import Config
s3_external = session.client(

The signed url generated now has the (previously missing) algorithm, etc. headers, however the signature doesn't match, so I'm wondering if the key generated by the client (Singularity / Sylabs scs-library-client) is different than what I am specifying - that almost must be it...

Update: I think the issue is that the signature includes the host, which is different inside (minio:9000) as opposed to outside ( the container, reading this post. boto/boto3#1982 (comment)
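
For reference, presigning individual parts with boto3 generally goes through `generate_presigned_url` with the `upload_part` client method. This is a minimal sketch under assumptions, not the configuration used above: `presign_part_urls` is a hypothetical helper, and `s3` is assumed to be a boto3 S3 client already configured for the endpoint the uploader will actually contact.

```python
def presign_part_urls(s3, bucket, key, upload_id, part_count, expires_in=3600):
    """Return {part_number: url} so each part can be PUT without credentials.

    `s3` is assumed to be a boto3 S3 client; `upload_id` comes from
    create_multipart_upload on that same client.
    """
    urls = {}
    for part_number in range(1, part_count + 1):
        urls[part_number] = s3.generate_presigned_url(
            ClientMethod="upload_part",
            Params={"Bucket": bucket, "Key": key,
                    "UploadId": upload_id, "PartNumber": part_number},
            ExpiresIn=expires_in)
    return urls
```

Because the host is part of what gets signed, the client that generates the URLs needs the same `endpoint_url` the uploader will use. The uploader PUTs each part's bytes to its URL and keeps the `ETag` response header, which is then passed to `complete_multipart_upload`.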

balukrishnans commented May 15, 2020

Nice brother, great great job

vsoch commented May 15, 2020

@balukrishnans are you talking to me or @teasherm? To follow up with my question above for future lurkers, it was a non-trivial thing that wound up needing a PR to the Minio Python client. Details about my particular implementation are here. And if you are referencing @teasherm, I agree, great job and thank you for posting this!

woodsyb commented Aug 30, 2020

Thank you for writing/posting this. I'm pretty sure this is the only way to nicely do a multipart upload and also have Amazon verify the MD5 sum (if you add that bit to the upload, that is). One point:

assert (self.total_bytes % part_size == 0 or self.total_bytes % part_size > self.PART_MINIMUM)

isn't quite right, though, as the last part can certainly be under the AWS minimum part size. You can verify that the CLI does this often by checking the ETag against the combined MD5 of each part.
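
Both checks mentioned here can be sketched with the standard library. `content_md5` produces the base64 digest that `upload_part` accepts as `ContentMD5`, so S3 rejects a corrupted part. `multipart_etag` computes the commonly observed multipart ETag form: the MD5 of the concatenated per-part MD5 digests, followed by `-` and the part count. Both function names are hypothetical, and the ETag form is an observed convention that may not hold for every configuration (for example, some server-side encryption modes).

```python
import base64
import hashlib


def content_md5(data):
    """Base64-encoded MD5 digest, the form the ContentMD5 parameter expects."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def multipart_etag(path, part_size):
    """Expected multipart ETag for a file split into part_size-byte parts."""
    digests = []
    with open(path, "rb") as f:
        while True:
            data = f.read(part_size)
            if not data:
                break
            # Raw 16-byte digest of each part, not the hex string
            digests.append(hashlib.md5(data).digest())
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return "{0}-{1}".format(combined, len(digests))
```

In the gist's loop this would mean adding `ContentMD5=content_md5(data)` to the `upload_part` call, then comparing `multipart_etag(self.path, self.part_bytes)` with the `ETag` in the completion result after stripping its surrounding quotes.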

OmarAlashqar commented Nov 12, 2020

This is a gem. It's crazy how there's barely any documentation on this stuff 💎

ybonda commented Mar 4, 2021

> This is a gem. It's crazy how there's barely any documentation on this stuff 💎

Exactly! I dug through tons of explanations and code samples until I found this one: Python S3 Multipart File Upload with Metadata and Progress Indicator

baryluk commented Apr 21, 2022

Easy. Thanks for a nice example. Code is cleaner than documentation.
