@pedro
Created November 13, 2014 21:46
Maximizing S3 Performance

  • 137% increase in AWS usage in the last year

  • Factors to keep in mind when picking a region

    • Performance: Proximity to users and to other resources in AWS
    • Compliance
    • Cost
  • Naming scheme is the single most important factor

    • Necessary for consistent performance
    • Refer to their guide on Request Rate and Performance Considerations
    • Do not give all objects in the bucket a common prefix (e.g. 2013-01-02-myfile)
    • Instead, make sure names start with something random so transactions are distributed evenly across partitions (see the sketch after this list)
      • E.g. hash the object name, or prefix it with a random sequence
      • You can also reverse the epoch timestamp, so the fastest-changing digits come first
    • Use "folders" if you need to keep meaningful information in the object name (e.g. movies/)
    • When renaming objects to comply, keep in mind it might take some time before the benefits apply
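
A minimal sketch (not from the talk) of the two naming tricks above, in Python; the object name is illustrative:

```python
import hashlib
import time

def hashed_key(name):
    # Prefix the key with the first characters of its own hash so
    # writes spread evenly across S3 partitions.
    prefix = hashlib.md5(name.encode()).hexdigest()[:4]
    return "%s-%s" % (prefix, name)

def reversed_epoch_key(name):
    # Reversing the epoch timestamp puts the fastest-changing digits
    # first, which has the same spreading effect.
    return "%s-%s" % (str(int(time.time()))[::-1], name)

print(hashed_key("2013-01-02-myfile"))          # e.g. a1b2-2013-01-02-myfile
print(reversed_epoch_key("2013-01-02-myfile"))  # e.g. 0841234141-2013-01-02-myfile
```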
  • TPS: transactions per second

    • All reads and writes count. Keep in mind even simple apps doing uploads/image processing can rack up several TPS per user action
  • Optimizing PUTs with multipart uploads

    • Gives you better performance: parts upload in parallel, using more of your available bandwidth
    • Increases resiliency: no need to re-upload everything on errors, only the failed parts
    • The final result is a single object in S3 (see the sketch after this list)
    • Picking the size of each part:
      • 25-50 MB parts recommended on high-bandwidth connections, 10 MB on mobile networks
      • Strike a balance between part size and the number of parts
        • Lots of small parts suffer from connection overhead
        • A few big parts don't give you the benefits above
    • When using SSL to upload, watch out for CPU usage
      • Use AES-NI hardware acceleration when available
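
A minimal sketch of a multipart upload using boto3's transfer manager, which splits a file into parts and uploads them in parallel; the bucket name, key, file name, and 25 MB part size are illustrative assumptions:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# 25 MB parts (the high-bandwidth recommendation above), several in
# flight at once. All values here are assumptions, not hard rules.
config = TransferConfig(
    multipart_threshold=25 * 1024 * 1024,  # switch to multipart above 25 MB
    multipart_chunksize=25 * 1024 * 1024,  # size of each part
    max_concurrency=8,                     # parallel part uploads
)

# Failed parts are retried individually; S3 reassembles the parts
# server-side into a single object.
s3.upload_file("movie.mp4", "my-example-bucket", "movies/movie.mp4", Config=config)
```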
  • Optimizing GETs

    • Use CloudFront
      • Lower latency, higher transfer rates, fewer GETs hitting S3
      • Also supports on-demand video
    • Do range-based GETs to read files in parallel, getting benefits similar to multipart uploads (see the sketch after this list)
    • LIST can also be heavy/slow. Cache your object list in Dynamo/CloudSearch/RDS/etc. to avoid it
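
A minimal sketch of range-based GETs with boto3, downloading one object as parallel byte ranges; the bucket, key, chunk size, and worker count are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-example-bucket", "movies/movie.mp4"  # assumptions
CHUNK = 25 * 1024 * 1024  # 25 MB per range

def fetch_range(start, end):
    # Each worker fetches one byte range of the same object.
    resp = s3.get_object(Bucket=BUCKET, Key=KEY, Range="bytes=%d-%d" % (start, end))
    return resp["Body"].read()

# Find the object size, then split it into inclusive byte ranges.
size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]
ranges = [(start, min(start + CHUNK, size) - 1) for start in range(0, size, CHUNK)]

with ThreadPoolExecutor(max_workers=8) as pool:
    parts = list(pool.map(lambda r: fetch_range(*r), ranges))

data = b"".join(parts)  # reassemble the object in order
```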