
This scenario is more hypothetical, but it is representative of a situation that could occur at CircleCI.

You’ve just come online and have taken over the on-call shift for a service, VM-scheduler. This service is responsible for starting and maintaining customers’ VMs through the entire VM lifecycle. It runs on a cloud provider and uses two distinct regions to spin up VMs for customer jobs to run on. A page comes in: “VM-Scheduler: High VM boot failure rate”. A graph included in the alert shows a sharp increase in boot failures over the last five minutes.

Similar to the last scenario, you have access to your laptop, and your expected monitoring and observability toolsets. This scenario occurs during a regular work week, but just after regular working hours.

ryanwohara / Scenario 1 - CYAI.md
Created September 7, 2021 18:13 — forked from jrahme-cci/Scenario 1 - CYAI.md
The first scenario for a choose your own adventure interview

You are on call for a service named “http-router”, a simple HTTP router service. Its sole purpose in life is to take requests from a front-end web application and pass them to backend services named “order-service” and “return-service”. You are on call only for “http-router”; the other services are managed by other teams at the company.

While sitting on your sofa during a lovely holiday weekend and listening to music, you are interrupted by the sounds of your phone telling you that you’ve received a text message. It’s an alert saying “http-router: unable to reach order-service (STATUS: 500)”. It’s now time to turn off the gramophone and face a different kind of music.

For this scenario you have access to your laptop and your expected monitoring tool suite. This scenario occurs during a long weekend, in the evening.

Keybase proof

I hereby claim:

  • I am ryanwohara on github.
  • I am rohara (https://keybase.io/rohara) on keybase.
  • I have a public key ASATh0gdDS3koEx0EnbWmtHu5eR2kuCWf9Ssa-oxfnjpcgo

To claim this, I am signing this object:

2015-03-26 14:03:53,565 [salt.log.setup ][ERROR ] An un-handled exception was caught by salt's global exception handler:
NameError: global name 'RSA' is not defined
Traceback (most recent call last):
File "/usr/bin/salt-minion", line 14, in <module>
salt_minion()
File "/usr/lib/python2.7/site-packages/salt/scripts.py", line 57, in salt_minion
minion.start()
File "/usr/lib/python2.7/site-packages/salt/__init__.py", line 262, in start
self.prepare()
File "/usr/lib/python2.7/site-packages/salt/__init__.py", line 241, in prepare