@jrahme-cci
Last active September 7, 2021 18:14
This scenario is hypothetical, but it is representative of a situation that could occur at CircleCI.

You’ve just come online and have taken over the on-call shift for a service, VM-scheduler. This service is responsible for starting and maintaining customers’ VMs through the entire VM lifecycle. It runs on a cloud provider and uses two distinct regions to spin up the VMs that customer jobs run on. A page comes in: “VM-Scheduler: High VM boot failure rate”. The alert includes a graph that shows a sharp increase in boot failures over the last five minutes.

As in the previous scenario, you have access to your laptop and your usual monitoring and observability tools. This scenario occurs during a regular work week, just after regular working hours.

@jrahme-cci

scenario-3 CYAI