Autoscaling on AWS
Project 1 of 15-719 Advanced Cloud Computing.
Zhongyi Tong (firstname.lastname@example.org)
The program automates the workflow of building an autoscaling infrastructure on AWS, testing it against dynamic traffic, and destroying the infrastructure. The workflow also comes with policies to support autoscaling.
Before starting the workflow, make sure the parameters are properly set:
- AWS credentials in
Then, run the shell script:
Building and testing an infrastructure are well separated by
InfraProvider integrates with Terraform to manage the autoscaling infrastructure:
- init reads an infrastructure description from main.tf and initializes Terraform.
- launch sets up these resources on AWS.
- terminate destroys all the resources.
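The three steps above map naturally onto the Terraform CLI. The following is a minimal sketch of that mapping, not the project's actual InfraProvider: the class name `InfraProviderSketch` and the `workdir` and `dry_run` parameters are hypothetical, and it assumes `terraform` is on the PATH and that `workdir` contains main.tf.

```python
import subprocess
from typing import List


class InfraProviderSketch:
    """Hypothetical wrapper around the Terraform CLI (illustration only)."""

    def __init__(self, workdir: str = ".", dry_run: bool = True) -> None:
        self.workdir = workdir  # assumed to be the directory holding main.tf
        self.dry_run = dry_run  # when True, commands are recorded but not executed
        self.history: List[List[str]] = []

    def _run(self, args: List[str]) -> List[str]:
        cmd = ["terraform"] + args
        self.history.append(cmd)
        if not self.dry_run:
            # check=True raises CalledProcessError if Terraform exits non-zero.
            subprocess.run(cmd, cwd=self.workdir, check=True)
        return cmd

    def init(self) -> List[str]:
        # Download providers and prepare the working directory.
        return self._run(["init", "-input=false"])

    def launch(self) -> List[str]:
        # Create the resources described in main.tf without an interactive prompt.
        return self._run(["apply", "-auto-approve", "-input=false"])

    def terminate(self) -> List[str]:
        # Destroy every resource tracked in the Terraform state.
        return self._run(["destroy", "-auto-approve"])
```

With `dry_run=True` the sketch only records the commands it would run, which makes it possible to inspect the workflow before any AWS resources are created.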
TestProvider integrates with the provided testing infrastructure:
- init reads the ALB/LG DNS names from the infra_summary produced by InfraProvider and generates testing access points.
- launch registers and starts the test, then blocks the user process. It periodically checks the testing status until the test is finished.
- upload packs the required files and uploads them for submission.
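The blocking behaviour of launch amounts to a polling loop. The sketch below shows one way to write it under stated assumptions: `wait_until_finished`, the `"finished"` status string, and the default intervals are all hypothetical, and `get_status` stands in for whatever call the real testing infrastructure exposes.

```python
import time
from typing import Callable


def wait_until_finished(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until get_status() returns "finished"; return the number of polls."""
    polls = 0
    waited = 0.0
    while True:
        polls += 1
        if get_status() == "finished":
            return polls
        if waited >= timeout_seconds:
            # Give up rather than block the user process forever.
            raise TimeoutError("test did not finish in time")
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.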
Auto Scaling Policies
The policies are generated based on the traffic patterns, experiments and the following observations:
- CPU utilization is the best metric to reflect server load.
- The desired CPU usage is defined as 40% - 60%. Lower utilization wastes instance hours; higher utilization may affect RPS.
- Terminating and initializing an instance takes a long time (around one minute or more). So when we observe a fast increase in CPU usage, we need to add instances quickly.
After several tests, step adjustments are used:
- CPU Utilization: [0, 20), remove 2 instances
- CPU Utilization: [20, 40), remove 1 instance
- CPU Utilization: [40, 60), do nothing (desired)
- CPU Utilization: [60, 100), add 2 instances
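The step ranges above can be written as a small lookup function. This is an illustrative sketch only: the function names and the `min_size` and `max_size` bounds are hypothetical, and it assumes that a reading of exactly 100% falls into the top range.

```python
def step_adjustment(cpu_percent: float) -> int:
    """Return the change in instance count for a CPU utilization reading."""
    if not 0 <= cpu_percent <= 100:
        raise ValueError("CPU utilization must be between 0 and 100")
    if cpu_percent < 20:
        return -2  # far below the desired range: remove 2 instances
    if cpu_percent < 40:
        return -1  # slightly below the desired range: remove 1 instance
    if cpu_percent < 60:
        return 0   # desired range: do nothing
    return 2       # above the desired range: add 2 instances


def next_capacity(current: int, cpu_percent: float, min_size: int, max_size: int) -> int:
    """Apply the adjustment and clamp to the (hypothetical) group size limits."""
    return max(min_size, min(max_size, current + step_adjustment(cpu_percent)))
```

For example, `next_capacity(3, 75, 1, 4)` asks for two more instances but is capped at the assumed maximum of 4.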
CloudWatch evaluates the metric at one-minute intervals. The Auto Scaling group then decides what to do according to the policies above.
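AWS step scaling policies express each step as an offset from the alarm threshold rather than as an absolute metric value. The sketch below converts the absolute CPU ranges above into that relative form. It assumes two alarms, a scale-out alarm at 60% and a scale-in alarm at 40%; the source does not state how the alarms are configured, so those thresholds and the helper name `to_step_adjustments` are assumptions. The dictionary keys follow the field names of the AWS Auto Scaling StepAdjustment type.

```python
from typing import Dict, List, Optional, Tuple

# (absolute lower bound, absolute upper bound, adjustment); None means unbounded.
Step = Tuple[Optional[float], Optional[float], int]


def to_step_adjustments(threshold: float, steps: List[Step]) -> List[Dict[str, float]]:
    """Convert absolute metric ranges into bounds relative to an alarm threshold."""
    result = []
    for lower, upper, adjustment in steps:
        entry: Dict[str, float] = {"ScalingAdjustment": adjustment}
        if lower is not None:
            entry["MetricIntervalLowerBound"] = lower - threshold
        if upper is not None:
            entry["MetricIntervalUpperBound"] = upper - threshold
        result.append(entry)
    return result


# Assumed scale-out alarm at 60%: everything from 60% upward adds 2 instances.
SCALE_OUT = to_step_adjustments(60, [(60, None, 2)])

# Assumed scale-in alarm at 40%: [20, 40) removes 1, below 20 removes 2.
SCALE_IN = to_step_adjustments(40, [(20, 40, -1), (None, 20, -2)])
```

The [40, 60) range needs no step because neither assumed alarm is in the alarm state there, so no policy is triggered.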
When using a fixed number of instances, the CPU Utilization graph looks much like the traffic pattern.
With auto scaling policies, the CPU utilization is much more adaptive and irregular. Note that the graph below is not using the final policies.