-
Schedule it:
Now!
-
Choose tests:
Kill MySQL!
-
What do we expect to happen?
- Everything explodes
- White screen with 500 error code
- Maybe a maintenance page
- Connection timeout
- We should get an alert that MySQL is down
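The expectations above can be turned into a simple check to run against the site during the experiment. This is a minimal sketch: the `classify_outcome` helper, its labels, and the idea of spotting a maintenance page by the word "maintenance" in the response body are all assumptions for illustration, not part of the original notes.

```python
from typing import Optional


def classify_outcome(status: Optional[int], body: str = "", timed_out: bool = False) -> str:
    """Map one observed HTTP response to one of the expected outcomes.

    Hypothetical helper: the labels and the "maintenance" keyword check
    are assumptions, not something the experiment itself defined.
    """
    # No response at all (or an explicit timeout) counts as a timeout.
    if timed_out or status is None:
        return "timeout"
    # A friendly outage page may still be served with a 5xx status,
    # so check the body before looking at the status code.
    if "maintenance" in body.lower():
        return "maintenance page"
    if status >= 500:
        return "server error"
    return "no visible impact"
```

Recording the label for each probe makes it easier to compare what was expected with what actually happened once the experiment is over.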
-
What are we going to do if things go horribly wrong?
Walk away, go to a bar, and get a beer!
-
Share this!
Last active
March 24, 2019 06:36
-
CodeMotion Rome - Chaos experiment documents
We killed the MySQL pod, but it respawned so quickly that we saw no visible impact, so we deleted the MySQL deployment instead. This produced an error screen stating that the database was unavailable.
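The two actions in this experiment could be scripted roughly as follows. This is a sketch under assumptions: it supposes `kubectl` is the tool in use and that the deployment is named `mysql`; the pod name is a placeholder. By default the `run` helper only returns the command as a string instead of executing it.

```python
import subprocess
from typing import List


def kill_pod_cmd(pod_name: str) -> List[str]:
    # Deleting a single pod; its Deployment will normally recreate it.
    return ["kubectl", "delete", "pod", pod_name]


def delete_deployment_cmd(name: str = "mysql") -> List[str]:
    # Deleting the Deployment itself, so nothing recreates the pods.
    # The name "mysql" is an assumption for illustration.
    return ["kubectl", "delete", "deployment", name]


def run(cmd: List[str], dry_run: bool = True) -> str:
    # With dry_run=True nothing is executed; the command is only returned.
    if not dry_run:
        subprocess.run(cmd, check=True)
    return " ".join(cmd)
```

A pod managed by a Deployment is recreated automatically when it is deleted, which is consistent with the quick respawn seen here. Deleting the Deployment removes the controller that would recreate it.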
- We didn't get a maintenance page or other user-friendly error screen.
- Alerts were misconfigured, so we didn't receive alerts immediately.
- Alerts didn't include action steps or links to relevant dashboards.
Things we should fix/change:
- We should set up a better outage page that keeps the look and feel of the site and gives users other options when the DB is down.
- We need to update the alerts so they are sent sooner when the DB goes down. We also need to add action steps.
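One way to keep alerts from shipping without action steps or dashboard links is to check them before they go live. The sketch below is illustrative only: the `Alert` class, `missing_parts`, and `render` are hypothetical names, and no particular alerting tool is assumed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alert:
    # Hypothetical structure for one alert definition.
    title: str
    action_steps: List[str] = field(default_factory=list)
    dashboard_urls: List[str] = field(default_factory=list)


def missing_parts(alert: Alert) -> List[str]:
    """Return the parts an on-call responder would need but that are absent."""
    problems = []
    if not alert.action_steps:
        problems.append("action steps")
    if not alert.dashboard_urls:
        problems.append("dashboard links")
    return problems


def render(alert: Alert) -> str:
    """Format the alert as plain text with numbered action steps."""
    lines = [alert.title, "Action steps:"]
    lines += [f"  {i}. {step}" for i, step in enumerate(alert.action_steps, start=1)]
    lines.append("Dashboards:")
    lines += [f"  {url}" for url in alert.dashboard_urls]
    return "\n".join(lines)
```

Running `missing_parts` over every alert definition, for example in a review checklist or CI job, would flag alerts like the ones found in this experiment before the next outage.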
-
Make Jira tickets from the "Things we should fix/change"
-
Write a summary
We killed the MySQL pod, but it respawned so quickly that we saw no visible impact, so we deleted the MySQL deployment instead. This produced an error screen stating that the database was unavailable. We learned that the current error screen can be improved and that our existing alerts need both a shorter time window and better messaging.
-
Share this!
-
Celebrate!