@fmendes6
Last active July 8, 2021 14:44

Date: 17/09/2018

  • Author: Filipe Mendes
  • Team: Android Team
  • Severity: Minor/Moderate/Major/Critical
  • Status: Work in Progress / Complete

Summary

Short description (5 sentences). List the duration along with the start and end times. State the impact (e.g. most user requests resulted in 500 errors, 100% at peak) and the main root cause.
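The duration can be derived from the start and end times rather than worked out by hand. A minimal sketch in Python, assuming timestamps written in the same day/month/year style as the Date field above; the function name `incident_duration` and the example times are hypothetical:

```python
from datetime import datetime, timedelta


def incident_duration(start: str, end: str, fmt: str = "%d/%m/%Y %H:%M") -> timedelta:
    """Return the time elapsed between two timestamps given in `fmt`."""
    started = datetime.strptime(start, fmt)
    ended = datetime.strptime(end, fmt)
    if ended < started:
        raise ValueError("end time is earlier than start time")
    return ended - started


# Hypothetical start and end times, for illustration only.
duration = incident_duration("17/09/2018 14:05", "17/09/2018 15:35")
print(f"Duration: {duration}")
```

If the incident spans time zones, it is safer to record both timestamps in UTC before computing the difference.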

Root cause

Go as deep as you can to understand what needs to be improved. Do not sugarcoat.

Trigger

What triggered the incident.

Detection

How the incident was detected. If it was detected by multiple sources (monitoring system, clients, customer support, accidental discovery), list them all.

Impact

How many requests failed, and how many users and companies were affected. If not sure, estimate based on historical data.
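When exact counts are not available, one possible way to estimate is to compare successful requests during the incident against a historical baseline for the same window. A minimal sketch in Python; the function name `estimate_failed_requests` and all numbers are hypothetical and only illustrate the arithmetic:

```python
def estimate_failed_requests(baseline_per_minute, observed_success_per_minute):
    """Estimate failed requests as the per-minute shortfall of successful
    requests against a historical baseline."""
    if len(baseline_per_minute) != len(observed_success_per_minute):
        raise ValueError("baseline and observed series must cover the same minutes")
    shortfall = 0
    for expected, succeeded in zip(baseline_per_minute, observed_success_per_minute):
        # Minutes with more traffic than the baseline do not offset failures.
        shortfall += max(expected - succeeded, 0)
    return shortfall


# Hypothetical data: the same 5-minute window last week vs. during the incident.
baseline = [1000, 1000, 1100, 1050, 1000]
observed = [950, 400, 0, 300, 1020]

failed = estimate_failed_requests(baseline, observed)
share = failed / sum(baseline)
print(f"Estimated failed requests: {failed} ({share:.0%} of expected traffic)")
```

This treats any shortfall against the baseline as failures, so it overestimates when traffic was simply lower than usual. State that assumption next to the number in the report.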

Resolution

How did you solve the problem? If an RFC was created, reference it here.

Timeline

Date            Event
1st January     Describe what happened...
2nd February    Describe what action was taken...

Lessons learned

  • What went well
  • What went wrong
  • Where we got lucky

Other notes

Add anything you feel is important related to the incident, e.g. screenshots of relevant graphs, links to resources, etc.

Minor: a minor issue that is not visible to most customers or does not disturb customers' regular workflow. Example: response times of some APIs have doubled.

Moderate: low-importance functionality is not available for all/most customers, or important functionality doesn't work for several companies. Example: some users can't add an activity.

Major: high-importance functionality is not available for all/most users, or the app doesn't work for several companies. Examples: search doesn't work, login takes too long, user 123 can't log in.

Critical: critical functionality is not available for all/most users, the issue is easy to reproduce, or there is data loss. Examples: deals don't load, users can't log in, users can't see the homepage.
