@wallentx
Created June 13, 2018 23:29
postmortem-template-confluence
Incident <#> - <Post Mortem Title>
Date:
This is a blameless Post Mortem.
We will not dwell on past events in terms of what "could've" or "should've" been done.
All follow-up action items will be assigned to a team/individual before the end of the meeting. If an item is not going to be a top priority when we leave the meeting, don't make it a follow-up item.
Incident Leader:
Name Title Team/Department
Authors:
Name Title Team/Department
Description:
Short explanation of the issue (1 or 2 sentences describing the problematic event).
Example: The production server needed to be updated. During the update procedure, all production servers crashed for an hour.
Impact:
What was the impact of the incident? This should include the total
duration of the outage, if applicable.
Contributing Factor(s):
Technical explanation of the issue. Should define the contributing factor(s) and
why each one is an issue.
Detection:
Stabilization Steps:
What specific steps and actions were taken to stabilize the issue? This
does not always entail a "fix", as further actions should be listed under
"Action Items".
Timeline:
Please note the time to detect and the time to resolve, and add them to the incidents list.
Timeline of events, including the exact duration of downtime.
The timeline should be in chronological order, showing what happened when, but
it should also explain what the team knew at the time.
For example, someone deploys a bad build that triggers an alert, but no one
initially realizes this is what happened. The timeline should first list that the
bad build was deployed, and note that the on-call person was not aware of this at the
time it occurred. Later, the timeline might list an event where the on-call person
becomes aware that this is the case.
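The time to detect and time to resolve can be derived directly from the timeline entries. The following is a minimal sketch, not part of the template itself: the timestamps, the labels ("start", "detected", "aware", "resolved"), and the helper names are all hypothetical, and it assumes both durations are measured from the moment the incident began.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"

# Hypothetical timeline entries: (timestamp, label, description).
# The labels are an assumption of this sketch, not part of the template.
timeline = [
    ("2018-06-13 14:00", "start", "Bad build deployed; on-call not yet aware"),
    ("2018-06-13 14:12", "detected", "Alert fires and on-call acknowledges"),
    ("2018-06-13 14:40", "aware", "On-call learns the bad build is the cause"),
    ("2018-06-13 15:00", "resolved", "Build rolled back; service restored"),
]


def first_time(entries, label):
    """Return the timestamp of the first entry carrying the given label."""
    for stamp, entry_label, _description in entries:
        if entry_label == label:
            return datetime.strptime(stamp, FMT)
    raise ValueError(f"no timeline entry labelled {label!r}")


def incident_durations(entries):
    """Return (time to detect, time to resolve), both measured from the start."""
    start = first_time(entries, "start")
    detected = first_time(entries, "detected")
    resolved = first_time(entries, "resolved")
    return detected - start, resolved - start


time_to_detect, time_to_resolve = incident_durations(timeline)
print(f"Time to detect:  {time_to_detect}")   # 0:12:00
print(f"Time to resolve: {time_to_resolve}")  # 1:00:00
```

Keeping the "aware" entry separate from "detected" preserves the distinction the timeline asks for: when something happened versus when the team understood it.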
Action Items:
Action items going forward to fix the issue and reduce the chance of the contributing factors recurring.
This MUST include the owners/teams assigned to see these actions through, and each action must have an issue tracked in this repository (or otherwise linked to an external team kanban/issue tracker).
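The ownership and tracking rule above can be checked mechanically before the meeting ends. The sketch below is illustrative only: the `ActionItem` fields, the `missing_fields` helper, and the sample items (including the placeholder tracker reference) are hypothetical and do not correspond to any particular issue tracker.

```python
from dataclasses import dataclass


@dataclass
class ActionItem:
    # Hypothetical fields mirroring the template's requirements.
    summary: str
    owner: str = ""          # team or individual responsible
    tracker_link: str = ""   # issue in this repository or an external tracker


def missing_fields(item):
    """List the required fields that are empty on an action item."""
    missing = []
    if not item.owner.strip():
        missing.append("owner")
    if not item.tracker_link.strip():
        missing.append("tracker_link")
    return missing


# Sample data, invented for illustration.
items = [
    ActionItem(
        "Add canary stage to deploy pipeline",
        owner="Platform team",
        tracker_link="TRACKER-PLACEHOLDER-1",
    ),
    ActionItem("Document rollback procedure"),
]

# Map each incomplete item to the fields it still needs.
incomplete = {
    item.summary: missing_fields(item)
    for item in items
    if missing_fields(item)
}
print(incomplete)
```

Any item that shows up in `incomplete` either gets an owner and a tracked issue before the meeting closes, or is dropped from the list.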
Lessons Learned:
What went well:
What went wrong:
Where we got lucky:
Supporting information: