AWS provides many different options for running compute.
In the context of a data pipeline, I wanted to see how they stack up, especially in terms of price.
Glue
- $0.44 per DPU-hour; 1 DPU = 4 vCPU and 16 GB
- only Python 2.7 shell or Spark jobs: https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html
- limited Python library support for shell jobs: https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html#python-shell-supported-library
- billed per second, with a 1-minute minimum for spark-shell jobs
- pricing is the same in all regions
Lambda
- $0.00001667 for every GB-second
- For a 16 GB, 4 CPU equivalent (Lambda bills on memory and duration only): $0.00001667 * 16 * 60 * 60 = $0.9602 per hour
- $0.20 per 1M requests
- duration is rounded up to the nearest 100ms
- https://docs.aws.amazon.com/lambda/latest/dg/limits.html
- max memory is ~3 GB
- max runtime is 15 min
- pricing is the same in all regions
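The Lambda arithmetic above can be sketched in a few lines. This is only a rough estimate using the prices quoted in these notes; the function name `lambda_cost` and its arguments are my own, not an AWS API, and the free tier is ignored.

```python
import math

# Prices as quoted in the notes above (assumed, may be out of date).
PRICE_PER_GB_SECOND = 0.00001667
PRICE_PER_MILLION_REQUESTS = 0.20


def lambda_cost(memory_gb, duration_ms, invocations=1):
    """Estimate the USD cost of `invocations` identical Lambda runs."""
    # Duration is rounded up to the nearest 100 ms for each invocation.
    billed_ms = math.ceil(duration_ms / 100) * 100
    compute = PRICE_PER_GB_SECOND * memory_gb * (billed_ms / 1000) * invocations
    requests = PRICE_PER_MILLION_REQUESTS * invocations / 1_000_000
    return compute + requests


# Hypothetical 16 GB "machine" running for a full hour, compute charge only.
hourly_16gb = PRICE_PER_GB_SECOND * 16 * 60 * 60
print(f"16 GB for one hour: ${hourly_16gb:.4f}")
print(f"1M runs of 250 ms at 3 GB: ${lambda_cost(3, 250, 1_000_000):.2f}")
```

The 16 GB figure is purely for comparison with the other services, since Lambda itself tops out at about 3 GB and 15 minutes per invocation.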
Fargate
- $0.04048 per vCPU hour
- $0.004445 per GB hour
- For 16 GB, 4 vCPU: $0.04048 * 4 + $0.004445 * 16 = $0.2330 per hour
- billed per second, 1 min minimum
- prices are for us-east-1
- "Duration is calculated from the time you start to download your container image (docker pull) until the Task terminates, rounded up to the nearest second"
- largest task size is 4 vCPU, 30 GB
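Here is a minimal sketch of the Fargate calculation, including the per-second billing and the 1-minute minimum. The function name `fargate_cost` is hypothetical and the rates are simply the us-east-1 numbers quoted above.

```python
import math

# us-east-1 rates as quoted in the notes above (assumed, may be out of date).
VCPU_HOUR = 0.04048
GB_HOUR = 0.004445


def fargate_cost(vcpu, memory_gb, runtime_seconds):
    """Estimate the USD cost of one Fargate task."""
    # Duration is rounded up to the nearest second, with a 60-second minimum.
    billed_seconds = max(60, math.ceil(runtime_seconds))
    hourly_rate = VCPU_HOUR * vcpu + GB_HOUR * memory_gb
    return hourly_rate * billed_seconds / 3600


print(f"4 vCPU / 16 GB for one hour: ${fargate_cost(4, 16, 3600):.4f}")
print(f"4 vCPU / 16 GB for 10 seconds: ${fargate_cost(4, 16, 10):.6f}")
```

Keep in mind that the clock starts at the image pull, so a large container image adds to the billed duration.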
EC2 (plus ECS and Batch on EC2)
- https://aws.amazon.com/ec2/pricing/on-demand/
- https://aws.amazon.com/ec2/spot/pricing/
- https://aws.amazon.com/ec2/pricing/reserved-instances/pricing/
On-demand pricing, per hour (all 16 GB, 4 vCPU; us-east-2)
- t3.xlarge = $0.1664
- t2.xlarge = $0.1856
- m5.xlarge = $0.192
- m5a.xlarge = $0.172
- m5ad.xlarge = $0.206
- m5d.xlarge = $0.226
- m4.xlarge = $0.20
Spot pricing, per hour (all 16 GB, 4 vCPU; us-east-2)
- t3.xlarge = $0.0501
- t2.xlarge = $0.0557
- m5.xlarge = $0.0401
- m5a.xlarge = $0.0403
- m5ad.xlarge = $0.0399
- m5d.xlarge = $0.0399
- m4.xlarge = $0.038
- For ECS, "There is no additional charge for EC2 launch type"
- For Batch, "There is no additional charge for AWS Batch. You pay for AWS resources (e.g. EC2 instances or AWS Lambda functions) you create to store and run your application."
- "With On-Demand instances, you pay for compute capacity by per hour or per second depending on which instances you run."
- "Defined Duration" and "Reserved Instances" options offer savings between on demand and spot
- "EC2 usage are billed on one second increments, with a minimum of 60 seconds"
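To see how much spot saves per instance type, the two price lists above can be compared directly. This is just a sketch over the numbers in these notes; spot prices move around, so treat the output as a snapshot, and note that `spot_discount` is my own helper name.

```python
# Hourly prices in USD (us-east-2), copied from the lists above.
ON_DEMAND = {
    "t3.xlarge": 0.1664, "t2.xlarge": 0.1856, "m5.xlarge": 0.192,
    "m5a.xlarge": 0.172, "m5ad.xlarge": 0.206, "m5d.xlarge": 0.226,
    "m4.xlarge": 0.20,
}
SPOT = {
    "t3.xlarge": 0.0501, "t2.xlarge": 0.0557, "m5.xlarge": 0.0401,
    "m5a.xlarge": 0.0403, "m5ad.xlarge": 0.0399, "m5d.xlarge": 0.0399,
    "m4.xlarge": 0.038,
}


def spot_discount(instance):
    """Return (on-demand / spot multiple, fractional saving) for one type."""
    on_demand, spot = ON_DEMAND[instance], SPOT[instance]
    return on_demand / spot, 1 - spot / on_demand


for name in ON_DEMAND:
    multiple, saving = spot_discount(name)
    print(f"{name:12s} {multiple:.1f}x cheaper on spot ({saving:.0%} saving)")
```

With these numbers every instance type comes out between roughly 3x and 6x cheaper on spot.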
EMR
The EMR charge tends to add ~25% to the EC2 base price (30% for m4.xlarge); e.g.
- m5.xlarge = $0.192 per Hour (EC2) + $0.048 per Hour (EMR)
- m5a.xlarge = $0.172 per Hour (EC2) + $0.043 per Hour (EMR)
- m4.xlarge = $0.20 per Hour (EC2) + $0.06 per Hour (EMR)
- you pay a per-second rate for every second you use, with a one-minute minimum
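The EMR uplift for the three examples above works out as follows. Again this is only a sketch over the prices in these notes, and `emr_total_and_uplift` is a hypothetical helper.

```python
# (EC2 on-demand price, EMR surcharge) per hour in USD, from the list above.
EMR_PRICES = {
    "m5.xlarge": (0.192, 0.048),
    "m5a.xlarge": (0.172, 0.043),
    "m4.xlarge": (0.20, 0.06),
}


def emr_total_and_uplift(instance):
    """Return (total hourly price, EMR surcharge as a fraction of EC2 price)."""
    ec2, emr = EMR_PRICES[instance]
    return ec2 + emr, emr / ec2


for name in EMR_PRICES:
    total, uplift = emr_total_and_uplift(name)
    print(f"{name:11s} ${total:.3f}/hr total, EMR adds {uplift:.0%}")
```

If the surcharge stays fixed on spot instances, the same dollar amount becomes a much larger fraction of the (far lower) spot price.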
Observations
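- Lambda is by far the most expensive (about twice as much as Glue: $0.96 vs $0.44 per hour).
- Glue is roughly 2x the price of EC2 on demand ($0.44 vs about $0.17 to $0.23 per hour), which is kind of surprising.
- I imagine there will be a price reduction here soon to bring the price within range of Fargate.
- Fargate is close to EC2 on demand, with a premium that depends on the instance type: about 3% over m5d.xlarge and about 21% over m5.xlarge (note that the Fargate prices are for us-east-1 and the EC2 prices for us-east-2).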
- EMR is ~25% more than EC2 on-demand cost.
- It appears the EMR upcharge stays the same when using Spot instances. Even in this case, this is much more affordable than Glue.
- I imagine it will be possible to launch an ephemeral EMR job from a Lambda step function soon. Even now that can be done with a few extra steps.
- Tying together AWS EMR "steps" and Lambda steps seems like a "good idea".
- EC2 and friends (batch, ECS) are the cheapest, even at on-demand prices
- Once you go to the reduced price options ("defined duration", "reserved instances", or "spot") prices are much better, often 4x cheaper without having to look very hard
- Total price ratio of Lambda vs spot: $0.96/$0.04 = 24
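Pulling the observations together, here is a rough ranking of the hourly cost of a 16 GB / 4 vCPU equivalent on each service. It uses only numbers from these notes, picks m5.xlarge as the representative EC2 instance (my choice), and mixes regions the same way the notes do.

```python
# Hourly cost in USD for roughly 16 GB / 4 vCPU, from the notes above.
HOURLY = {
    "Lambda": 0.00001667 * 16 * 3600,
    "Glue": 0.44,
    "EMR (m5.xlarge on demand)": 0.192 + 0.048,
    "Fargate": 0.04048 * 4 + 0.004445 * 16,
    "EC2 on demand (m5.xlarge)": 0.192,
    "EC2 spot (m5.xlarge)": 0.0401,
}


def ranked(prices):
    """Return (name, price, multiple of cheapest), most expensive first."""
    cheapest = min(prices.values())
    rows = sorted(prices.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, price, price / cheapest) for name, price in rows]


for name, price, multiple in ranked(HOURLY):
    print(f"{name:28s} ${price:.4f}/hr  {multiple:5.1f}x cheapest")
```

Using unrounded prices, the Lambda to spot multiple comes out just under the 24 quoted above.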
Quick takes
- If you will be running a lot of compute, you can't beat AWS Spot.
- In smaller volumes, Fargate is a great value for serverless compute.
- Lambda should be saved for lighter workloads.
- Now that Fargate can be used in Lambda step functions, I don't think I'll have a need for Glue Python scripts.
And comparing SWF and Lambda step functions
And for SWF you also have charges for
So it looks like step functions are strictly less expensive than SWF (at least as I understand what counts as a state transition in Lambda step functions versus a task, signal, etc. in SWF). That is not surprising if Amazon is trying to push customers in that direction so it can put most of its effort into the hot new orchestration tool.