@aslakknutsen
Last active March 21, 2018 04:06
OSiO: Incident Report

The incident report is intended as a write-up for transparency and cross-team knowledge transfer, and to get us in the habit of thinking in terms of continuous improvement.

An incident report should be created on GitHub for each OSiO outage (small or large), with the label "type/incident": https://github.com/openshiftio/openshift.io/issues
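For teams that prefer to file the report from a script, the following is a minimal sketch of preparing such an issue through GitHub's REST "create an issue" endpoint. The function name `build_incident_issue` and the `token` parameter are assumptions made for illustration; only the repository and the "type/incident" label come from the text above. The sketch builds the request without sending it.

```python
import json
import urllib.request

# Repository named in this document; the endpoint shape follows GitHub's
# REST API for creating an issue.
ISSUES_URL = "https://api.github.com/repos/openshiftio/openshift.io/issues"


def build_incident_issue(title: str, body: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that files an incident report.

    `token` is a hypothetical personal access token supplied by the caller.
    """
    payload = {
        "title": title,
        "body": body,
        "labels": ["type/incident"],  # label required by this template
    }
    return urllib.request.Request(
        ISSUES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the report. Keeping the build step separate makes it easy to review the body for sensitive data first.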

Note: this is a public report, so desensitise the data. No tokens, no users, no internal links, etc.
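As a rough aid for the note above, here is a small sketch of scrubbing log excerpts before pasting them into a report. The patterns, the `internal.example.com` domain, and the function name `desensitise` are all hypothetical placeholders, not part of any OSiO tooling. A pass like this does not replace a manual review.

```python
import re

# Hypothetical patterns; adjust them to the data your logs actually contain.
PATTERNS = [
    # "Bearer abc123" or "token=abc123" style credentials
    (re.compile(r"(?i)\b(bearer|token)[ =:]+[A-Za-z0-9._\-]+"), r"\1 <redacted>"),
    # e-mail addresses, as a stand-in for user identifiers
    (re.compile(r"[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}"), "<user>"),
    # links under an assumed internal domain
    (re.compile(r"https?://[^\s\"']*\.internal\.example\.com[^\s\"']*"), "<internal-link>"),
]


def desensitise(text: str) -> str:
    """Apply each redaction pattern in turn and return the scrubbed text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

For example, `desensitise("token=s3cr3t")` returns `"token <redacted>"`.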

"Incident title"

Period: "From-To date"

System: "OSiO|Build|Idler"

Affected: "Estimated number of affected users"

Story

"Describe the reason for the change and the events leading up to the incident. Include a timeline of events."

"Who was at which bar when the phone rang, etc."

"Links to the feature development issue, reported user errors, etc."

Rundown

"Describe what was found, how it was found, and the effect across services and users. Include a timeline of events."

"Metrics of events, etc."

Takeaway

"What have we learned?"

"What are we doing to avoid it happening again?"

"Short/Medium/Long term"

"Code changes? Build/verification changes? Monitoring changes? Process changes?"

"Links to follow-up changes"
