dmathieu/checklist.md

## checklist.md

      
- What happens when a dependency starts failing? What if it begins failing slowly?
- How can the system degrade in a graceful manner?
- How does the system react to overload? Is it “well conditioned”?
- What’s the worst-case scenario for total failure?
- How quickly can the system recover?
- Is delayable work delayed?
- How do you monitor the system? How do you detect anomalies?
- How do you deploy the system? How do you deploy in an emergency?
- Are you learning from all failures?
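
One possible answer to the first two questions (a dependency failing, or failing slowly, and degrading gracefully when it does) is a circuit breaker. The sketch below is only an illustration, not part of the checklist: `CircuitBreaker`, its parameters, and the `fallback` callable are hypothetical names chosen for this example. It treats a slow success as a failure, and it only measures latency after the fact; it does not interrupt a call that is already running.

```python
import time


class CircuitBreaker:
    """Hypothetical sketch: skip a failing or slow dependency and use a fallback."""

    def __init__(self, max_failures=3, reset_after=30.0, slow_after=1.0,
                 clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.slow_after = slow_after      # calls slower than this count as failures
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None             # None means the breaker is closed

    def call(self, func, fallback):
        # While open and still cooling down, degrade without touching the dependency.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down elapsed: fall through and allow one trial call.

        started = self.clock()
        try:
            result = func()
        except Exception:
            self._record_failure()
            return fallback()

        if self.clock() - started > self.slow_after:
            # "Failing slowly": the call worked, but too slowly to count as healthy.
            self._record_failure()
        else:
            self.failures = 0
            self.opened_at = None
        return result

    def _record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()
```

A caller would wrap each dependency call, for example `breaker.call(fetch_live_prices, use_cached_prices)`, where both function names are placeholders. The thresholds are arbitrary defaults and would need tuning per dependency.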