A post-mortem is a write-up of a system or process failure, produced so that causes can be identified and tasks created to prevent this class of issue from happening again. It is blameless, and as such does not dwell on what "could have" or "should have" been done.
A_SHORT_SUMMARY
All related work can be tracked in the canonical bug tracker at:
- BUG_TRACKER_URL_WITH_FILTER_FOR_TICKETS
The impacts to the business that owns the service include:
| Term | Meaning |
| --- | --- |
| Customer | The users of the application or service |
| Project Owner (PO) | The owner of the project |
| Incident Coordinator (IC) | The person or people responsible for managing and documenting the incident |
| Operations (OPS) | The person or people responsible for investigating the issue and providing a temporary solution |
| Communication (COM) | The person designated as the point of contact between all team members |
| Planning (PLN) | The person accountable for ensuring that all follow-up changes are made |
| Contributing Factor | Something that played a role in causing or prolonging the outage. Factors are divided into "primary" and "secondary" factors. |
| Primary Contributing Factor | Something that caused the outage directly |
| Secondary Contributing Factor | Something that prolonged the outage, though it did not directly cause it |
| Mitigating Factor | Something that helped reduce the severity or length of the outage, but was not part of a structured procedure or normal circumstances |
| Ongoing Risk | Something that could have made the outage worse or more likely to occur but, purely by good fortune, did not |
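For teams that record post-mortem findings in a structured form, the factor terms defined above could be modelled roughly as follows. This is only a sketch: the names `FactorKind`, `Factor`, and `group_by_kind`, and the example entries, are hypothetical and not part of this template.

```python
from dataclasses import dataclass
from enum import Enum


class FactorKind(Enum):
    """The four factor categories from the glossary above."""
    PRIMARY_CONTRIBUTING = "Primary Contributing Factor"      # caused the outage directly
    SECONDARY_CONTRIBUTING = "Secondary Contributing Factor"  # prolonged the outage
    MITIGATING = "Mitigating Factor"                          # reduced severity or length
    ONGOING_RISK = "Ongoing Risk"                             # could have made it worse, but did not


@dataclass(frozen=True)
class Factor:
    kind: FactorKind
    description: str


def group_by_kind(factors):
    """Map every FactorKind to the descriptions recorded for it, in input order."""
    grouped = {kind: [] for kind in FactorKind}
    for factor in factors:
        grouped[factor.kind].append(factor.description)
    return grouped


# Hypothetical entries for illustration only, not taken from a real incident.
example_factors = [
    Factor(FactorKind.PRIMARY_CONTRIBUTING, "Deploy removed the connection pool limit"),
    Factor(FactorKind.SECONDARY_CONTRIBUTING, "Alert was routed to an unmonitored channel"),
    Factor(FactorKind.MITIGATING, "An engineer happened to be watching the dashboard"),
]

if __name__ == "__main__":
    for kind, descriptions in group_by_kind(example_factors).items():
        print(f"{kind.value}: {len(descriptions)} recorded")
```

Grouping by kind keeps categories with no entries visible as empty lists, which makes it easier to notice when a post-mortem has not yet considered, for example, ongoing risks.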
(To read & integrate):