AWS provides a lot of different options for running compute.
In the context of a data pipeline, I was wondering how these stack up, especially in terms of price.
Glue
- $0.44 per DPU-hour; 1 DPU = 4 vCPU and 16 GB
- only Python 2.7 shell or Spark jobs: https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html
- limited Python library support for shell jobs: https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html#python-shell-supported-library
- billed per second, with a 1-minute minimum for spark-shell jobs
- pricing is the same in all regions
Lambda
- $0.00001667 for every GB-second
- For the equivalent of 16 GB for one hour: $0.00001667 * 16 * 60 * 60 = $0.9602
- $0.20 per 1M requests
- duration is rounded up to the nearest 100ms
- https://docs.aws.amazon.com/lambda/latest/dg/limits.html
- max memory is ~3 GB
- max runtime is 15 min
- pricing is the same in all regions
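
To make the Lambda arithmetic above easy to replay, here is a minimal sketch using only the rates quoted in these notes. The function name `lambda_cost` and its parameters are my own (hypothetical), and it does not enforce the ~3 GB and 15 min limits, so the 16 GB-hour figure is purely a normalization for comparison, not something a single Lambda could run.

```python
import math

GB_SECOND_USD = 0.00001667        # per GB-second, from the notes above
REQUEST_USD = 0.20 / 1_000_000    # $0.20 per 1M requests


def lambda_cost(memory_gb: float, duration_ms: float, invocations: int = 1) -> float:
    """Estimate Lambda cost in USD (hypothetical helper, not an AWS API)."""
    # Duration is rounded up to the nearest 100 ms, per the notes.
    billed_ms = math.ceil(duration_ms / 100) * 100
    compute = GB_SECOND_USD * memory_gb * (billed_ms / 1000) * invocations
    return compute + REQUEST_USD * invocations


if __name__ == "__main__":
    # The normalized "16 GB for one hour" figure; the request charge is negligible.
    print(f"${lambda_cost(16, 3_600_000):.4f}")
```

At this scale the per-request charge barely registers; it only starts to matter for very short, very frequent invocations.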
Fargate
- $0.04048 per vCPU-hour
- $0.004445 per GB-hour
- For 16 GB, 4 vCPU: $0.04048 * 4 + $0.004445 * 16 = $0.2330 per hour
- billed per second, 1 min minimum
- prices are for us-east-1
- "Duration is calculated from the time you start to download your container image (docker pull) until the Task terminates, rounded up to the nearest second"
- largest instance is 4 CPU, 30 GB
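
The Fargate numbers above can be sketched the same way. This assumes the us-east-1 rates quoted in these notes, per-second billing, and the 1-minute minimum; `fargate_task_cost` is a hypothetical helper name, not part of any AWS SDK.

```python
import math

VCPU_HOUR_USD = 0.04048   # per vCPU-hour, from the notes above (us-east-1)
GB_HOUR_USD = 0.004445    # per GB-hour, from the notes above (us-east-1)
MINIMUM_SECONDS = 60      # 1-minute minimum


def fargate_task_cost(vcpus: float, memory_gb: float, runtime_seconds: float) -> float:
    """Estimate the cost in USD of one Fargate task run (hypothetical helper)."""
    # Duration is rounded up to the nearest second, then the minimum applies.
    billed_seconds = max(math.ceil(runtime_seconds), MINIMUM_SECONDS)
    hourly = VCPU_HOUR_USD * vcpus + GB_HOUR_USD * memory_gb
    return hourly * billed_seconds / 3600


if __name__ == "__main__":
    print(f"4 vCPU / 16 GB for 1 hour: ${fargate_task_cost(4, 16, 3600):.4f}")
    print(f"4 vCPU / 16 GB for 5 sec:  ${fargate_task_cost(4, 16, 5):.6f}")
```

Note that the runtime you pass in should include the image pull time, since billing starts at `docker pull`.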
EC2
- https://aws.amazon.com/ec2/pricing/on-demand/
- https://aws.amazon.com/ec2/spot/pricing/
- https://aws.amazon.com/ec2/pricing/reserved-instances/pricing/
On-demand pricing, per hour (all 16 GB, 4 vCPU; us-east-2)
- t3.xlarge = $0.1664
- t2.xlarge = $0.1856
- m5.xlarge = $0.192
- m5a.xlarge = $0.172
- m5ad.xlarge = $0.206
- m5d.xlarge = $0.226
- m4.xlarge = $0.20
Spot pricing, per hour (all 16 GB, 4 vCPU; us-east-2)
- t3.xlarge = $0.0501
- t2.xlarge = $0.0557
- m5.xlarge = $0.0401
- m5a.xlarge = $0.0403
- m5ad.xlarge = $0.0399
- m5d.xlarge = $0.0399
- m4.xlarge = $0.038
- For ECS, "There is no additional charge for EC2 launch type"
- For Batch, "There is no additional charge for AWS Batch. You pay for AWS resources (e.g. EC2 instances or AWS Lambda functions) you create to store and run your application."
- "With On-Demand instances, you pay for compute capacity by per hour or per second depending on which instances you run."
- "Defined Duration" and "Reserved Instances" options offer prices that fall between on-demand and spot
- "EC2 usage are billed on one second increments, with a minimum of 60 seconds"
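
To see how large the spot discount is per instance type, here is a small sketch that divides the on-demand price by the spot price, using only the us-east-2 figures listed above. Spot prices fluctuate, so treat these as a snapshot; `spot_ratio` is a hypothetical helper name.

```python
# Hourly prices in USD copied from the lists above (us-east-2, 4 vCPU / 16 GB).
ON_DEMAND = {
    "t3.xlarge": 0.1664, "t2.xlarge": 0.1856, "m5.xlarge": 0.192,
    "m5a.xlarge": 0.172, "m5ad.xlarge": 0.206, "m5d.xlarge": 0.226,
    "m4.xlarge": 0.20,
}
SPOT = {
    "t3.xlarge": 0.0501, "t2.xlarge": 0.0557, "m5.xlarge": 0.0401,
    "m5a.xlarge": 0.0403, "m5ad.xlarge": 0.0399, "m5d.xlarge": 0.0399,
    "m4.xlarge": 0.038,
}


def spot_ratio(instance_type: str) -> float:
    """How many times cheaper spot is than on-demand for one instance type."""
    return ON_DEMAND[instance_type] / SPOT[instance_type]


if __name__ == "__main__":
    for name in ON_DEMAND:
        print(f"{name:12s} on-demand ${ON_DEMAND[name]:.4f}  "
              f"spot ${SPOT[name]:.4f}  {spot_ratio(name):.1f}x cheaper")
```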
EMR
Tends to add ~25% on top of the EC2 base price, e.g.:
- m5.xlarge = $0.192 per Hour (EC2) + $0.048 per Hour (EMR)
- m5a.xlarge = $0.172 per Hour (EC2) + $0.043 per Hour (EMR)
- m4.xlarge = $0.20 per Hour (EC2) + $0.06 per Hour (EMR)
- you pay a per-second rate for every second you use, with a one-minute minimum
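
As a quick check of the "~25%" figure, this sketch computes the EMR surcharge as a percentage of the EC2 price for the three examples above. The prices are the ones listed in these notes; `emr_surcharge_pct` is a hypothetical helper name.

```python
# (EC2 hourly price, EMR hourly surcharge) in USD, from the examples above.
EMR_PRICES = {
    "m5.xlarge": (0.192, 0.048),
    "m5a.xlarge": (0.172, 0.043),
    "m4.xlarge": (0.20, 0.06),
}


def emr_surcharge_pct(instance_type: str) -> float:
    """EMR surcharge as a percentage of the EC2 on-demand price."""
    ec2, emr = EMR_PRICES[instance_type]
    return 100 * emr / ec2


def emr_total_hourly(instance_type: str) -> float:
    """Combined EC2 + EMR hourly price in USD."""
    ec2, emr = EMR_PRICES[instance_type]
    return ec2 + emr


if __name__ == "__main__":
    for name in EMR_PRICES:
        print(f"{name:11s} total ${emr_total_hourly(name):.3f}/hr "
              f"(+{emr_surcharge_pct(name):.0f}% over EC2)")
```

The m4.xlarge example works out closer to 30%, so "~25%" is a rule of thumb rather than a fixed rate.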
Observations
- Lambda is by far the most expensive (roughly twice as much as Glue: $0.96 vs. $0.44 per hour).
- Glue is more than 2x the price of EC2 on-demand ($0.44 vs. $0.192 per hour for m5.xlarge), which is kind of surprising.
- I imagine there will be a price reduction soon to bring Glue within range of Fargate.
- Fargate is close to EC2 on-demand, with a premium of roughly 20% over m5.xlarge ($0.233 vs. $0.192 per hour).
- EMR is ~25% more than EC2 on-demand cost.
- It appears the EMR upcharge stays the same when using Spot instances. Even in this case, this is much more affordable than Glue.
- I imagine it will be possible to launch an ephemeral EMR job from a Lambda step function soon. Even now that can be done with a few extra steps.
- Tying together AWS EMR "steps" and Lambda steps seems like a "good idea".
- EC2 and friends (Batch, ECS) are the cheapest, even at on-demand prices.
- Once you go to the reduced-price options ("defined duration", "reserved instances", or "spot"), prices are much better; spot in particular is often 4x cheaper or more, without having to look very hard.
- Total price ratio of lambda vs spot: $0.96/$0.04 = 24
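
The ratios quoted in these observations can be recomputed from the per-service numbers. The sketch below normalizes everything to one hour of 4 vCPU / 16 GB, using only prices from these notes, with m5.xlarge standing in for EC2 and EMR (my choice of representative; other instance types shift the ratios somewhat).

```python
# Hourly USD price for roughly 4 vCPU / 16 GB, derived from the notes above.
HOURLY_USD = {
    "Lambda": 0.00001667 * 16 * 3600,
    "Glue": 0.44,
    "EMR (m5.xlarge)": 0.192 + 0.048,
    "Fargate": 0.04048 * 4 + 0.004445 * 16,
    "EC2 on-demand (m5.xlarge)": 0.192,
    "EC2 spot (m5.xlarge)": 0.0401,
}


def ratio(a: str, b: str) -> float:
    """How many times more expensive option `a` is than option `b`."""
    return HOURLY_USD[a] / HOURLY_USD[b]


def ranked() -> list:
    """Option names sorted from most to least expensive."""
    return sorted(HOURLY_USD, key=HOURLY_USD.get, reverse=True)


if __name__ == "__main__":
    cheapest = ranked()[-1]
    for name in ranked():
        print(f"{name:27s} ${HOURLY_USD[name]:.4f}/hr  "
              f"{ratio(name, cheapest):5.1f}x {cheapest}")
```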
Quick takes
- If you will be running a lot of compute, you can't beat EC2 Spot.
- In smaller volumes, Fargate is a great value for serverless compute.
- Lambda should be saved for lighter workloads.
- Now that Fargate can be used in Lambda step functions, I don't think I'll have a need for Glue Python shell scripts.
Relevant to the above if you are tying things together with step functions:
Lambda step functions
Links
Price
Notes and caveats