Maximizing S3 Performance

  • 137% increase in AWS usage in the last year

  • Factors to keep in mind when picking a region

    • Performance: Proximity to users and to other resources in AWS
    • Compliance
    • Cost
  • The object naming scheme is the most important factor

    • Necessary for consistent performance
    • Refer to AWS's guide on Request Rate and Performance Considerations
    • Do not give all objects in the bucket a common prefix (e.g. 2013-01-02-myfile)
    • Instead, make sure names start with something random so transactions are distributed evenly across partitions
      • E.g. hash the object name, or prefix it with a random sequence (see the sketch after this list)
      • You can also reverse the epoch timestamp
    • Use "folders" if you need to keep information from the original object name (e.g. movies/)
    • When renaming objects to comply, keep in mind it may take some time before the benefits apply
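
A minimal sketch of the key-randomization idea above, in Python. The helper names (`randomized_key`, `reversed_epoch_key`) and the 4-character hash prefix length are illustrative choices, not anything prescribed by AWS:

```python
import hashlib
import time

def randomized_key(original_name: str) -> str:
    # Prefix the key with a few hex characters of a hash so keys spread
    # across S3's index partitions instead of piling up behind one prefix.
    prefix = hashlib.md5(original_name.encode("utf-8")).hexdigest()[:4]
    return f"{prefix}/{original_name}"

def reversed_epoch_key(original_name: str) -> str:
    # Alternative from the notes: reverse the epoch timestamp so the
    # fastest-changing digits come first.
    return f"{str(int(time.time()))[::-1]}-{original_name}"

print(randomized_key("2013-01-02-myfile"))  # e.g. "4d19/2013-01-02-myfile"
print(reversed_epoch_key("myfile"))         # e.g. "0846514141-myfile"
```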
  • TPS: transactions per second

    • All reads and writes count. Keep in mind that even a simple app doing uploads and image processing can rack up several TPS per user action
  • Optimizing PUTs with multipart uploads

    • Gives you better performance because parallel part uploads let you use more of your available bandwidth
    • Increases resiliency: on an error you only re-upload the failed part, not the whole file
    • The final result is a single file in S3
    • Picking the size of each part (see the sketch after this list):
      • 25-50 MB recommended on high-bandwidth connections, 10 MB for mobile networks
      • Strike a balance between part size and number of parts
        • Lots of small parts suffer from connection overhead
        • A few big parts don't give you the benefits above
    • When uploading over SSL, watch out for CPU overhead
      • Use AES-NI hardware acceleration when available
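
A sketch of a multipart upload using the boto3 SDK's transfer configuration. The bucket, key, and file names are placeholders; the part size follows the guidance above:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

config = TransferConfig(
    multipart_threshold=25 * 1024 * 1024,  # switch to multipart above ~25 MB
    multipart_chunksize=25 * 1024 * 1024,  # size of each part
    max_concurrency=8,                     # parallel part uploads use more bandwidth
    use_threads=True,
)

# upload_file drives the CreateMultipartUpload / UploadPart / CompleteMultipartUpload
# calls and retries failed parts individually, so an error doesn't restart the whole file.
s3.upload_file("movie.mp4", "my-example-bucket", "a1b2/movies/movie.mp4", Config=config)
```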
  • Optimizing GETs

    • Use CloudFront
      • Lower latency, higher transfer rates, fewer S3 GETs
      • Also supports on-demand video
    • Do range-based GETs to read files in parallel, getting benefits similar to multipart uploads (see the sketch after this list)
    • LIST can also be heavy/slow. Cache your object list in DynamoDB/CloudSearch/RDS/etc. to avoid it
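
A minimal sketch of range-based parallel GETs, again assuming boto3; the bucket/key names, 25 MB part size, and worker count are placeholders:

```python
import boto3
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client("s3")
BUCKET, KEY = "my-example-bucket", "a1b2/movies/movie.mp4"  # hypothetical names
PART_SIZE = 25 * 1024 * 1024

def fetch_range(start: int, end: int) -> bytes:
    # Range GETs let several connections pull different byte ranges of the
    # same object in parallel, mirroring multipart uploads on the read side.
    resp = s3.get_object(Bucket=BUCKET, Key=KEY, Range=f"bytes={start}-{end}")
    return resp["Body"].read()

size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]
ranges = [(i, min(i + PART_SIZE, size) - 1) for i in range(0, size, PART_SIZE)]

with ThreadPoolExecutor(max_workers=8) as pool:
    parts = list(pool.map(lambda r: fetch_range(*r), ranges))

data = b"".join(parts)
```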