- What happens when a dependency starts failing? What if it begins failing slowly?
- How can the system degrade gracefully?
- How does the system react to overload? Is it “well conditioned”?
- What’s the worst-case scenario for total failure?
- How quickly can the system recover?
- Is delayable work delayed?
- How do you monitor the system? How do you detect anomalies?
- How do you deploy the system? How do you deploy in an emergency?
- Are you learning from all failures?
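
As one illustration of the first two questions (a failing dependency and graceful degradation), here is a minimal circuit-breaker sketch. It is not part of the original checklist. The class name `CircuitBreaker` and the parameters `max_failures`, `reset_after`, and `clock` are hypothetical, chosen only for this example. It assumes the wrapped call enforces its own timeout, so that a slow dependency surfaces as an exception rather than a hang.

```python
import time


class CircuitBreaker:
    """Hypothetical helper: stop calling a failing dependency for a cool-down period."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so the logic can be tested
        self.failures = 0
        self.opened_at = None             # None means closed (calls allowed)

    def call(self, func, fallback):
        # While open, skip the dependency until the cool-down has elapsed.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            self.opened_at = None  # half-open: allow one trial call through
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()  # degrade instead of propagating the error
        self.failures = 0
        return result
```

A caller might write `breaker.call(fetch_live_price, lambda: cached_price)`, where both names are placeholders: the user gets a possibly stale answer while the dependency is down, and the dependency gets time to recover instead of being retried on every request.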
Created November 3, 2015 20:32
Resiliency checklist