Optimizing S3 file transfer speed

There is a nifty tool, https://github.com/larrabee/s3sync, which can transfer files to and from S3 very quickly, but it reads entire files into memory. If you're transferring lots of large files, the process will quickly be killed by the kernel's OOM killer.

The next best thing I have found is to tune the AWS CLI so that `aws s3 sync` runs as fast as possible.

```ini
# ~/.aws/config
[default]
s3 =
  # 500 is a usable number if you're only running one process,
  # but 100 is more reasonable if you're running multiple
  max_concurrent_requests = 100

  max_queue_size = 10000
  multipart_threshold = 64MB
  multipart_chunksize = 32MB
```
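If you'd rather not edit the file by hand, the same values can be written with `aws configure set` (shown here for the default profile; adjust the profile name if you use a named one):

```sh
# Write the equivalent settings into ~/.aws/config
aws configure set default.s3.max_concurrent_requests 100
aws configure set default.s3.max_queue_size 10000
aws configure set default.s3.multipart_threshold 64MB
aws configure set default.s3.multipart_chunksize 32MB
```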

```sh
# Run multiple syncs in parallel, one per top-level prefix
aws s3 sync s3://some-bucket/some-directory /data/some-directory &
aws s3 sync s3://some-bucket/some-other-dir /data/some-other-dir &
wait
```
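The same idea generalizes to many prefixes. Here's a minimal sketch, assuming the bucket is laid out as one sub-directory per dataset (the bucket and destination names are placeholders):

```sh
#!/bin/sh
# List the top-level prefixes in the bucket and run one `aws s3 sync`
# per prefix, capped at 4 at a time so max_concurrent_requests isn't
# multiplied past what the instance can handle.
bucket=some-bucket
dest=/data
aws s3 ls "s3://$bucket/" | awk '/PRE/ {print $2}' | \
  xargs -P 4 -I{} aws s3 sync "s3://$bucket/{}" "$dest/{}"
```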

On a t2.large with this configuration I get about 70 MiB/s. It incurs serious load (a load average around 30), so pick an instance type that gives you sustained CPU for better performance, or you'll see a ton of CPU steal once the instance runs out of CPU credits. Memory usage stays comparatively low.
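To see whether you're being throttled, watch the `%steal` column while a sync runs (`mpstat` comes from the sysstat package; I'm assuming it's installed):

```sh
# Print CPU stats every 5 seconds; a rising %steal means the
# hypervisor is throttling you (e.g. burst credits exhausted).
mpstat 5
```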
