@geeeeeeeeek
Created January 31, 2018 04:08
Autoscaling on AWS

Project 1 of 15-719 Advanced Cloud Computing.

Zhongyi Tong (ztong@andrew.cmu.edu)

Overview

The program automates the workflow of building an autoscaling infrastructure on AWS, testing it against dynamic traffic, and destroying the infrastructure afterwards. The workflow also includes the scaling policies that drive the autoscaling.
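The build, test, destroy sequence can be sketched as a small driver that guarantees teardown even when testing fails. This is a minimal sketch, not the project's actual run.sh: the provider objects are duck-typed stand-ins, and only their method names (init, launch, terminate, upload) come from the Automation Script section below.

```python
def run_workflow(infra, tester):
    """Build the infrastructure, run the test, and always tear it down.

    `infra` and `tester` are hypothetical objects exposing the method
    names used by InfraProvider and TestProvider in this README.
    """
    infra.init()  # read main.tf and initialize Terraform
    try:
        infra.launch()   # create the AWS resources
        tester.init()    # derive test access points from the launched infrastructure
        tester.launch()  # start the test and block until it finishes
        tester.upload()  # package the required files and submit them
    finally:
        # Destroy resources even if launch or testing raised, so a failed
        # run does not keep accruing instance hours.
        infra.terminate()
```

Putting `infra.launch()` inside the `try` matters: a partially applied Terraform plan can still leave resources behind, and `terminate` cleans those up too.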

Usage

Before starting the workflow, make sure the following are set:

  • TPZ_ANDREW_ID as an environment variable;
  • TPZ_SUBMISSION_PASSWORD as an environment variable;
  • AWS credentials in ~/.aws/credentials.

Then, run the shell script:

./run.sh
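The prerequisites listed above can be checked before invoking the script. The following is a hypothetical pre-flight helper, not part of the project; it only checks that the two variables are non-empty and that the credentials file exists, not that the credentials are valid.

```python
import os
from pathlib import Path

# Environment variables this README says must be set before ./run.sh.
REQUIRED_ENV_VARS = ("TPZ_ANDREW_ID", "TPZ_SUBMISSION_PASSWORD")


def missing_prerequisites(env=None, home=None):
    """Return human-readable problems; an empty list means ready to run.

    `env` and `home` default to the real environment and home directory
    but can be overridden for testing.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    problems = [f"environment variable {name} is not set"
                for name in REQUIRED_ENV_VARS if not env.get(name)]
    credentials = home / ".aws" / "credentials"
    if not credentials.is_file():
        problems.append(f"AWS credentials file not found at {credentials}")
    return problems
```

Calling `missing_prerequisites()` with no arguments inspects the current shell; printing each entry and exiting non-zero when the list is non-empty would make it usable as a guard in front of ./run.sh.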

Architecture

Infrastructure

Automation Script

Building and testing the infrastructure are cleanly separated into InfraProvider and TestProvider.

[Figure: InfraProvider and TestProvider]

InfraProvider integrates with Terraform to manage the autoscaling infrastructure:

  • init reads the infrastructure description from main.tf and initializes Terraform.
  • launch sets up these resources on AWS.
  • terminate destroys all the resources.
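A thin Terraform wrapper with this init/launch/terminate interface might look like the sketch below. It is an assumption about how such a class could be built, not the project's InfraProvider: it shells out to the standard `terraform init`, `terraform apply -auto-approve`, and `terraform destroy -auto-approve` commands, and the `runner` hook is a hypothetical addition so the commands can be inspected without running Terraform.

```python
import subprocess


class TerraformInfraProvider:
    """Hypothetical wrapper driving the Terraform CLI in a directory with main.tf."""

    def __init__(self, workdir=".", runner=None):
        self.workdir = workdir  # directory containing main.tf
        # Default runner executes the command and raises on a non-zero exit.
        self.runner = runner or (
            lambda cmd, cwd: subprocess.run(cmd, cwd=cwd, check=True))

    def _terraform(self, *args):
        self.runner(["terraform", *args], self.workdir)

    def init(self):
        # Download providers and prepare the working directory.
        self._terraform("init")

    def launch(self):
        # Create the resources described in main.tf without an interactive prompt.
        self._terraform("apply", "-auto-approve")

    def terminate(self):
        # Destroy every resource tracked in the Terraform state.
        self._terraform("destroy", "-auto-approve")
```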

TestProvider integrates with the provided testing infrastructure:

  • init reads the ALB and LG DNS names from the infra_summary written by InfraProvider and generates the testing access points.
  • launch registers and starts the test, then blocks the calling process, periodically checking the test status until it finishes.
  • upload packs the required files and uploads them for submission.
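The blocking behavior of launch is essentially a polling loop. Below is a minimal sketch, assuming a hypothetical `get_status` callable that returns True once the test has finished; the real status check depends on the provided testing infrastructure, and the 30-second interval and one-hour timeout are illustrative defaults, not values from the project.

```python
import time


def wait_for_test(get_status, interval_s=30.0, timeout_s=3600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Block until get_status() returns True, polling every interval_s seconds.

    Returns the number of polls made. Raises TimeoutError if the test is
    still running after timeout_s seconds. `sleep` and `clock` are
    injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    polls = 0
    while True:
        polls += 1
        if get_status():
            return polls
        if clock() >= deadline:
            raise TimeoutError(f"test not finished after {timeout_s} s")
        sleep(interval_s)
```

A timeout is worth having even if the original only polls: without it, a stuck test would keep the process, and the AWS resources it guards, alive indefinitely.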

Traffic Patterns

[Figure: traffic patterns]

Auto Scaling Policies

The policies were designed from the traffic patterns, experiments, and the following observations:

  • CPU utilization is the metric that best reflects server load.
  • The desired CPU utilization is defined as 40% to 60%. Lower utilization wastes instance hours; higher utilization may reduce RPS.
  • Terminating or initializing an instance takes a long time (around one minute or more), so when CPU usage rises quickly, instances must be added quickly.

After several tests, we settled on the following step adjustments:

  • CPU Utilization: [0, 20), remove 2 instances
  • CPU Utilization: [20, 40), remove 1 instance
  • CPU Utilization: [40, 60), do nothing (desired)
  • CPU Utilization: [60, 100], add 2 instances
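The step table above reduces to a simple mapping from average CPU utilization to a capacity change. This sketch restates it in code and treats 100% as part of the top step; the `min_size` and `max_size` bounds in `next_capacity` are hypothetical group limits not stated in this README.

```python
def step_adjustment(cpu_percent):
    """Capacity change for an average CPU utilization, following the step table."""
    if not 0 <= cpu_percent <= 100:
        raise ValueError(f"CPU utilization must be in [0, 100], got {cpu_percent}")
    if cpu_percent < 20:
        return -2  # [0, 20): heavily underused, remove 2 instances
    if cpu_percent < 40:
        return -1  # [20, 40): remove 1 instance
    if cpu_percent < 60:
        return 0   # [40, 60): desired band, do nothing
    return 2       # [60, 100]: add 2 at once, since a new instance needs about a minute


def next_capacity(current, cpu_percent, min_size=1, max_size=10):
    """Apply one step and clamp to hypothetical group size limits."""
    return max(min_size, min(max_size, current + step_adjustment(cpu_percent)))
```

Scaling out by 2 but in by at most 2 only at very low load is the asymmetric choice the observations argue for: under-provisioning costs RPS immediately, while over-provisioning only costs instance hours.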

CloudWatch evaluates the CPU alarms at a one-minute interval, and the Auto Scaling group then adjusts its capacity according to the step adjustments above.
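The project configures these policies through Terraform. For comparison, here is a sketch of the same steps in the shape expected by boto3's `autoscaling` client `put_scaling_policy` call, under the assumption of two separate alarms (CPU below 40% for scale-in, CPU at or above 60% for scale-out). In AWS step scaling, the interval bounds are offsets from the alarm threshold rather than absolute CPU values, and AWS applies its own inclusive/exclusive rules at the exact boundaries. The group name and policy names are hypothetical.

```python
SCALE_IN_THRESHOLD = 40.0   # alarm: average CPUUtilization < 40
SCALE_OUT_THRESHOLD = 60.0  # alarm: average CPUUtilization >= 60
REMOVE_TWO_BELOW = 20.0     # below this CPU level, remove 2 instead of 1


def step_scaling_policies(group_name):
    """Build kwargs for two put_scaling_policy calls (scale-in, scale-out)."""
    common = {
        "AutoScalingGroupName": group_name,
        "PolicyType": "StepScaling",
        "AdjustmentType": "ChangeInCapacity",
        "MetricAggregationType": "Average",
    }
    low_offset = REMOVE_TWO_BELOW - SCALE_IN_THRESHOLD  # -20.0
    scale_in = dict(common, PolicyName=f"{group_name}-scale-in", StepAdjustments=[
        # CPU between 20% and 40%: offset in (-20, 0], remove 1 instance.
        {"MetricIntervalLowerBound": low_offset, "MetricIntervalUpperBound": 0.0,
         "ScalingAdjustment": -1},
        # CPU at or below 20%: no lower bound, remove 2 instances.
        {"MetricIntervalUpperBound": low_offset, "ScalingAdjustment": -2},
    ])
    scale_out = dict(common, PolicyName=f"{group_name}-scale-out", StepAdjustments=[
        # CPU at or above 60%: no upper bound, add 2 instances.
        {"MetricIntervalLowerBound": 0.0, "ScalingAdjustment": 2},
    ])
    return scale_in, scale_out
```

Each dict could be passed as `client.put_scaling_policy(**kwargs)`; the `PolicyARN` in the response is what the corresponding CloudWatch alarm would list as its alarm action. The 40% to 60% band needs no policy at all, because neither alarm fires there.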

Experiments

When using a fixed number of instances, the CPU Utilization graph looks much like the traffic pattern.

[Figure: CPU utilization with a fixed number of instances]

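With the auto scaling policies, CPU utilization is much more adaptive and fluctuates irregularly instead of tracking the traffic pattern. Note that the graph below was produced with an earlier version of the policies, not the final ones.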

[Figure: CPU utilization with auto scaling (earlier, non-final policies)]

@rahulpragma

Hi, where can I find the run.sh you mentioned here? Thanks.