- Date: 17/09/2018
- Author: Filipe Mendes
- Team: Android Team
- Severity: Minor/Moderate/Major/Critical
- Status: Work in Progress / Complete
Short description (about 5 sentences). List the duration along with start and end times. State the impact (e.g. most user requests resulted in 500 errors, 100% at peak) and the main root cause.
Go as deep as you can to understand what needs to be improved. Do not sugarcoat.
What triggered the incident.
How the incident was detected. If it was detected by multiple sources (monitoring system, clients, customer support, accidental discovery), list them all.
How many requests failed, and how many users and companies were affected? If not sure, estimate based on historical data.
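When exact counts are unavailable, one rough way to estimate the number of failed requests is to multiply normal traffic for a comparable period by the incident duration and the observed failure rate. The sketch below is only an illustration: the function name and all figures are hypothetical and should be replaced with values from your own monitoring data.

```python
def estimate_failed_requests(baseline_per_minute, duration_minutes, failure_rate):
    """Estimate failed requests as baseline traffic x duration x failure rate.

    baseline_per_minute: typical request rate for a comparable period (assumed
        to come from historical data).
    duration_minutes: length of the incident.
    failure_rate: average fraction of requests that failed, between 0 and 1.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    return round(baseline_per_minute * duration_minutes * failure_rate)


# Hypothetical figures: 1,200 requests/minute on a comparable day,
# a 45-minute incident, and 60% of requests failing on average.
estimate = estimate_failed_requests(1200, 45, 0.6)
print(estimate)
```

State the inputs and their source next to the estimate so readers can judge how reliable it is.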
How did you solve the problem? If an RFC was created, please write it here.
Date | Event |
---|---|
1st January | Describe what happened... |
2nd February | Describe what action was taken... |
- What went well
- What went wrong
- Where we got lucky
Add anything you feel is important about the incident, e.g. screenshots of relevant graphs or links to related resources.
Minor: a minor issue that is not visible to most customers or does not disturb customers' regular workflow. Example: response times of some APIs have doubled.
Moderate: low-importance functionality is not available for all/most customers, or important functionality doesn't work for several companies. Example: some users can't add an activity.
Major: high-importance functionality is not available for all/most users, or the app doesn't work for several companies. Examples: search doesn't work, login takes too long, user 123 can't log in.
Critical: critical functionality is not available for all/most users and is easy to reproduce, or there is data loss. Examples: deals don't load, users can't log in, users can't see the homepage.