@ryanwohara
Forked from jrahme-cci/Scenario 3 - CYAI.md
Created September 7, 2021 18:14

This scenario is a more hypothetical one but is representative of a situation that could occur at CircleCI.

You’ve just come online and taken over the on-call shift for a service, VM-scheduler. This service is responsible for starting and maintaining customers’ VMs through the entire VM lifecycle. It runs on a cloud provider and uses two distinct regions to spin up VMs for customer jobs to run on. A page comes in: “VM-Scheduler: High VM boot failure rate.” The alert includes a graph showing a sharp increase in boot failures over the last five minutes.

As in the last scenario, you have access to your laptop and your usual monitoring and observability toolsets. This scenario occurs during a regular work week, but just after regular working hours.

