turtlemonvh/aws_prices.md

AWS provides a lot of different options for running compute.
In the context of a data pipeline, I was wondering how these stacked up, especially in terms of price.
Options

Glue

Links


https://aws.amazon.com/glue/pricing/

Price


$0.44 per DPU-hour; 1 DPU = 4 vCPU and 16 GB of memory

Notes and caveats


only Python 2.7 shell or Spark jobs are supported: https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html
limited Python library support for shell jobs: https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html#python-shell-supported-library
billed per second, with a 1-minute minimum for Python shell jobs
pricing is the same in all regions
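A minimal sketch of the Glue arithmetic, assuming the $0.44 per DPU-hour rate and per-second billing noted above. `glue_job_cost` is a hypothetical helper, not an AWS API, and the minimum billed duration is a parameter (defaulting to 60 seconds) because the minimum may differ by job type.

```python
import math

# Rate from the Glue pricing notes (USD per DPU-hour); 1 DPU = 4 vCPU, 16 GB.
GLUE_RATE_PER_DPU_HOUR = 0.44


def glue_job_cost(dpus: float, runtime_seconds: float, minimum_seconds: int = 60) -> float:
    """Estimate a Glue job's cost in USD (hypothetical helper).

    Runtime is rounded up to a whole second, then raised to the minimum
    billed duration before converting to DPU-hours.
    """
    billed_seconds = max(math.ceil(runtime_seconds), minimum_seconds)
    return dpus * GLUE_RATE_PER_DPU_HOUR * billed_seconds / 3600


if __name__ == "__main__":
    print(f"1 DPU for 1 hour: ${glue_job_cost(1, 3600):.4f}")
    print(f"1 DPU for 20 s:   ${glue_job_cost(1, 20):.4f}  (billed as 60 s)")
```

Short jobs are where the minimum matters: anything under the minimum costs the same as a job that runs for the full minimum.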

Lambda

Links


https://aws.amazon.com/lambda/pricing/

Price


$0.00001667 per GB-second
For 16 GB, 4 CPU: $0.00001667 * 16 * 60 * 60 = $0.9602 per hour

Notes and caveats


$0.20 per 1M requests
duration is rounded up to the nearest 100ms
https://docs.aws.amazon.com/lambda/latest/dg/limits.html

max memory is ~3 GB
max runtime is 15 min


pricing is the same in all regions
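A minimal sketch of the Lambda cost model, assuming the per-GB-second rate, the $0.20 per 1M requests fee, and the 100 ms rounding listed above. `lambda_cost` is a hypothetical helper; the 16 GB hourly figure is only a normalized comparison point, since the memory limit is ~3 GB.

```python
import math

# Rates from the Lambda pricing notes (USD).
RATE_PER_GB_SECOND = 0.00001667
RATE_PER_REQUEST = 0.20 / 1_000_000


def lambda_cost(memory_gb: float, duration_ms: float, invocations: int = 1) -> float:
    """Estimate Lambda cost in USD for `invocations` identical calls (hypothetical helper)."""
    # Duration is rounded up to the nearest 100 ms before billing.
    billed_ms = math.ceil(duration_ms / 100) * 100
    compute_per_call = memory_gb * (billed_ms / 1000) * RATE_PER_GB_SECOND
    return invocations * (compute_per_call + RATE_PER_REQUEST)


if __name__ == "__main__":
    # Normalized hourly figure used for comparison (ignores the ~3 GB limit).
    print(f"16 GB for 1 hour: ${RATE_PER_GB_SECOND * 16 * 3600:.4f}")
    # A more realistic workload: 1M calls at 3 GB, each taking 250 ms (billed as 300 ms).
    print(f"1M x 3 GB x 250 ms: ${lambda_cost(3, 250, 1_000_000):.2f}")
```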

Fargate

Links


https://aws.amazon.com/fargate/pricing/

Price


$0.04048 per vCPU-hour
$0.004445 per GB-hour
For 16 GB, 4 vCPU: $0.04048 * 4 + $0.004445 * 16 = $0.2330 per hour

Notes and caveats


billed per second, 1 min minimum
prices are for us-east-1
"Duration is calculated from the time you start to download your container image (docker pull) until the Task terminates, rounded up to the nearest second"
largest task size is 4 vCPU, 30 GB
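A minimal sketch of the Fargate calculation, assuming the us-east-1 rates, per-second billing, and 1-minute minimum listed above. `fargate_hourly_rate` and `fargate_task_cost` are hypothetical helpers; per the quoted note, the runtime passed in should include the time spent pulling the container image.

```python
import math

# us-east-1 rates from the Fargate pricing notes (USD).
RATE_PER_VCPU_HOUR = 0.04048
RATE_PER_GB_HOUR = 0.004445


def fargate_hourly_rate(vcpus: float, memory_gb: float) -> float:
    """Hourly price for a task of the given size (vCPU and memory are billed separately)."""
    return vcpus * RATE_PER_VCPU_HOUR + memory_gb * RATE_PER_GB_HOUR


def fargate_task_cost(vcpus: float, memory_gb: float, runtime_seconds: float,
                      minimum_seconds: int = 60) -> float:
    """Cost of one task run; runtime_seconds should span image pull through task stop."""
    billed_seconds = max(math.ceil(runtime_seconds), minimum_seconds)
    return fargate_hourly_rate(vcpus, memory_gb) * billed_seconds / 3600


if __name__ == "__main__":
    print(f"4 vCPU, 16 GB per hour: ${fargate_hourly_rate(4, 16):.4f}")
    print(f"Same task for 10 minutes: ${fargate_task_cost(4, 16, 600):.4f}")
```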

EC2

Links


https://aws.amazon.com/ec2/pricing/on-demand/
https://aws.amazon.com/ec2/spot/pricing/
https://aws.amazon.com/ec2/pricing/reserved-instances/pricing/

Price

On-demand pricing per hour (all 4 vCPU, 16 GB; us-east-2)

t3.xlarge = $0.1664
t2.xlarge = $0.1856
m5.xlarge = $0.192
m5a.xlarge = $0.172
m5ad.xlarge = $0.206
m5d.xlarge = $0.226
m4.xlarge = $0.20

Spot pricing per hour (all 4 vCPU, 16 GB; us-east-2)

t3.xlarge = $0.0501
t2.xlarge = $0.0557
m5.xlarge = $0.0401
m5a.xlarge = $0.0403
m5ad.xlarge = $0.0399
m5d.xlarge = $0.0399
m4.xlarge = $0.038
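To see how much cheaper Spot is for each instance type, the two lists above can be compared directly. This is a minimal sketch using the figures as written; Spot prices move over time, so the ratios describe this snapshot only, and `spot_savings_factor` is a hypothetical helper.

```python
# Hourly prices (USD, us-east-2, 4 vCPU / 16 GB) copied from the lists above.
ON_DEMAND = {
    "t3.xlarge": 0.1664,
    "t2.xlarge": 0.1856,
    "m5.xlarge": 0.192,
    "m5a.xlarge": 0.172,
    "m5ad.xlarge": 0.206,
    "m5d.xlarge": 0.226,
    "m4.xlarge": 0.20,
}
SPOT = {
    "t3.xlarge": 0.0501,
    "t2.xlarge": 0.0557,
    "m5.xlarge": 0.0401,
    "m5a.xlarge": 0.0403,
    "m5ad.xlarge": 0.0399,
    "m5d.xlarge": 0.0399,
    "m4.xlarge": 0.038,
}


def spot_savings_factor(instance: str) -> float:
    """How many times cheaper Spot is than on-demand for this instance type."""
    return ON_DEMAND[instance] / SPOT[instance]


if __name__ == "__main__":
    # Largest savings first.
    for name in sorted(ON_DEMAND, key=spot_savings_factor, reverse=True):
        print(f"{name:12} on-demand ${ON_DEMAND[name]:.4f}  "
              f"spot ${SPOT[name]:.4f}  {spot_savings_factor(name):.1f}x cheaper")
```

With these numbers the m-family instances come out roughly 4.3 to 5.7x cheaper on Spot, while the burstable t2/t3 types land closer to 3.3x.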

Notes and caveats


For ECS, "There is no additional charge for EC2 launch type"
For Batch, "There is no additional charge for AWS Batch. You pay for AWS resources (e.g. EC2 instances or AWS Lambda functions) you create to store and run your application."
"With On-Demand instances, you pay for compute capacity by per hour or per second depending on which instances you run."
"Defined Duration" and "Reserved Instances" options are priced between on-demand and spot
"EC2 usage are billed on one second increments, with a minimum of 60 seconds"

EMR

Links


https://aws.amazon.com/emr/pricing/

Price

EMR tends to add ~25% on top of the EC2 base price; e.g.:

m5.xlarge = $0.192 per Hour (EC2) + $0.048 per Hour (EMR)
m5a.xlarge = $0.172 per Hour (EC2) + $0.043 per Hour (EMR)
m4.xlarge = $0.20 per Hour (EC2) + $0.06 per Hour (EMR)
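The ~25% figure can be checked against the three examples above. A minimal sketch; `emr_surcharge_pct` is a hypothetical helper and the prices are copied as written.

```python
# (EC2 on-demand, EMR fee) per hour in USD, copied from the examples above.
EMR_EXAMPLES = {
    "m5.xlarge": (0.192, 0.048),
    "m5a.xlarge": (0.172, 0.043),
    "m4.xlarge": (0.20, 0.06),
}


def emr_surcharge_pct(ec2_price: float, emr_fee: float) -> float:
    """EMR fee as a percentage of the underlying EC2 price."""
    return 100 * emr_fee / ec2_price


if __name__ == "__main__":
    for name, (ec2, emr) in EMR_EXAMPLES.items():
        print(f"{name:11} total ${ec2 + emr:.3f}/hr  "
              f"EMR adds {emr_surcharge_pct(ec2, emr):.0f}%")
```

The m4.xlarge example works out to 30%, so ~25% is the typical case among these three rather than a fixed rule.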

Notes and caveats


you pay a per-second rate for every second you use, with a one-minute minimum

Summary

Observations

Lambda is by far the most expensive (about twice as much as Glue).
Glue is roughly 2x the price of EC2 on-demand ($0.44 vs. $0.17 to $0.23 per hour), which is kind of surprising.

I imagine there will be a price reduction soon to bring Glue within range of Fargate.


Fargate is close to EC2 on-demand, with a modest premium ($0.233/hr vs. $0.192/hr for m5.xlarge, about 20%).
EMR is ~25% more than EC2 on-demand cost.

It appears the EMR surcharge stays the same when using Spot instances. Even so, EMR on Spot is much more affordable than Glue.
I imagine it will soon be possible to launch an ephemeral EMR job directly from AWS Step Functions. Even now, that can be done with a few extra steps.
Tying together EMR "steps" and Step Functions/Lambda steps seems like a "good idea".


EC2 and friends (Batch, ECS) are the cheapest, even at on-demand prices.
Once you move to the reduced-price options ("defined duration", "reserved instances", or "spot"), prices are much better, often 4x cheaper without having to look very hard.
Hourly price ratio of Lambda vs. Spot: $0.96 / $0.04 ≈ 24
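The ratios in these observations can be reproduced by collecting each section's hourly figure for roughly 4 vCPU / 16 GB. A minimal sketch under a few assumptions: m5.xlarge stands in for EC2, the Lambda figure is the normalized 16 GB rate (not an allowed configuration), and the EMR-on-Spot line assumes the EMR fee stays flat, as suspected above.

```python
# Hourly cost (USD) for roughly 4 vCPU / 16 GB, using figures from each section.
HOURLY = {
    "Lambda (16 GB, normalized)": 0.00001667 * 16 * 3600,
    "Glue (1 DPU)": 0.44,
    "EMR on m5.xlarge (on-demand)": 0.192 + 0.048,
    "Fargate (4 vCPU, 16 GB)": 0.04048 * 4 + 0.004445 * 16,
    "EC2 m5.xlarge (on-demand)": 0.192,
    "EMR on m5.xlarge (Spot, flat fee assumed)": 0.0401 + 0.048,
    "EC2 m5.xlarge (Spot)": 0.0401,
}

BASELINE = HOURLY["EC2 m5.xlarge (Spot)"]

if __name__ == "__main__":
    # Most expensive first, with each option expressed as a multiple of Spot.
    for name, price in sorted(HOURLY.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:42} ${price:.4f}/hr  {price / BASELINE:5.1f}x Spot")
```

With these inputs Lambda lands at roughly 24x Spot, and Glue at roughly 2.3x EC2 on-demand for m5.xlarge.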

Quick takes

If you will be running a lot of compute, you can't beat AWS Spot.
In smaller volumes, Fargate is a great value for serverless compute.
Lambda should be saved for lighter workloads.
Now that Fargate tasks can be launched from AWS Step Functions, I don't think I'll need Glue Python shell scripts.