Last active
July 18, 2025 18:09
-
-
Save asyrewicze/3fbef48088ac39196eec1fbdd85f0065 to your computer and use it in GitHub Desktop.
Outage Postmortem Markdown Template
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| # IT Outage Postmortem | |
| ## Report by: | |
| > Name, Job Title, Team Name | |
| > | |
| > -INSERT RESPONSE- | |
| ## What Happened | |
| > TL;DR summary of the incident | |
| > | |
| > -INSERT RESPONSE- | |
| ## Impact Summary | |
| > Duration: (Ex. 2h 14m) | |
| > Affected Services: (Ex. Public website, API gateway...etc) | |
| > User Impact: (Ex. 503 errors for 18% of traffic during outage window) | |
| > Customer-Facing? Yes/No ??? | |
| ## Timeline | |
| - `12:03 AM` - Monitoring alert triggered | |
| - `12:10 AM` - Initial triage began | |
| - ... | |
| ## Root Cause | |
| > What actually broke? | |
| > | |
| > -INSERT RESPONSE- | |
| ## Contributing Factors | |
| > Were there other issues that made this worse? | |
| > (Ex. Missing alert, wrong runbook, undertrained staff, legacy system quirks) | |
| > | |
| > -INSERT RESPONSE- | |
| ## (Optional) SLA (Service Level Agreement) Impact | |
| > Outage was / was not within defined SLA window - Explain | |
| > | |
| > -INSERT RESPONSE- | |
| ## (Optional) Compliance Framework Implications (If any) | |
| > What framework violation occured? (Ex. HIPAA control XYZ) | |
| > | |
| > -INSERT RESPONSE- | |
| ## Security Related Incident? | |
| > Yes or No? Explain | |
| > | |
| > -INSERT RESPONSE- | |
| ## Fix Implemented | |
| > What was done to resolve the issue? | |
| > | |
| > -INSERT RESPONSE- | |
| - [ ] Temporary workaround | |
| - [ ] Permanent fix deployed | |
| - [ ] Follow-up change(s) scheduled | |
| > If follow-ups are scheduled what are they? And When? | |
| > | |
| > -INSERT RESPONSE- | |
| ## Lessons Learned | |
| ### Monitoring | |
| - [ ] Alert triggered appropriately? | |
| - [ ] Escalation procedures effective? | |
| ### Communication | |
| - [ ] Stakeholders notified in time? | |
| - [ ] Response team aligned? | |
| ### Documentation | |
| - [ ] Runbook useful? | |
| - [ ] Diagrams or workflows outdated? | |
| ### Process / Human Factors | |
| - [ ] Better onboarding or training needed? | |
| - [ ] Were roles clearly defined? | |
| > Add any additional insights below: | |
| > | |
| > -INSERT RESPONSE- | |
| ## Action Items | |
| | Owner | Task | Due Date | Status | | |
| |-------|------|----------|--------| | |
| | Alice | Patch server config | 2025-07-20 | ☐ | |
| | Bob | Update internal docs | 2025-07-21 | ☐ | |
| | Carol | Run RCA review | 2025-07-25 | ☐ |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment