This scenario is more hypothetical, but it is representative of a situation that could occur at CircleCI.
You’ve just come online and taken over the on-call shift for a service, VM-scheduler. This service is responsible for starting and maintaining customers’ VMs through the entire VM lifecycle. It runs on a cloud provider and uses two distinct regions to spin up the VMs that customer jobs run on. A page comes in: “VM-Scheduler: High VM boot failure rate.” A graph included in the alert shows a sharp increase in boot failures over the last five minutes.
As in the last scenario, you have access to your laptop and your usual monitoring and observability tools. This scenario occurs during a regular work week, but just after regular working hours.
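To make the alert in this scenario more concrete, here is a minimal sketch of how a "high VM boot failure rate" check over a five-minute window might be computed per region. Everything in it is an assumption for illustration: the `BootEvent` and `BootFailureMonitor` names, the region labels, and the 50% threshold are hypothetical and do not describe how VM-scheduler or its alerting actually works. It also assumes events are recorded in timestamp order.

```python
from collections import deque
from dataclasses import dataclass

WINDOW_SECONDS = 300            # five minutes, matching the graph in the alert
FAILURE_RATE_THRESHOLD = 0.5    # hypothetical paging threshold


@dataclass
class BootEvent:
    timestamp: float  # seconds since epoch
    region: str       # hypothetical region label, e.g. "region-a"
    succeeded: bool   # True if the VM booted successfully


class BootFailureMonitor:
    """Tracks recent boot attempts and reports per-region failure rates."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.events = deque()  # assumed to be appended in timestamp order

    def record(self, event):
        self.events.append(event)

    def failure_rates(self, now):
        # Discard events that have fallen out of the sliding window.
        cutoff = now - self.window_seconds
        while self.events and self.events[0].timestamp < cutoff:
            self.events.popleft()

        totals, failures = {}, {}
        for event in self.events:
            totals[event.region] = totals.get(event.region, 0) + 1
            if not event.succeeded:
                failures[event.region] = failures.get(event.region, 0) + 1

        # Fraction of boot attempts that failed, per region.
        return {region: failures.get(region, 0) / totals[region] for region in totals}

    def regions_to_page(self, now, threshold=FAILURE_RATE_THRESHOLD):
        # Regions whose failure rate meets or exceeds the threshold.
        rates = self.failure_rates(now)
        return sorted(region for region, rate in rates.items() if rate >= threshold)
```

Breaking the rate down by region matters here because the service spans two regions: a spike confined to one region points to a different cause, and a different mitigation, than a spike in both.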