
@RStankov
Created March 7, 2024 19:38
⛑️ Emergency Kit

🎥 Video walkthrough of emergency handling `[VIDEO IN TOOL LIKE LOOM]`

Tools

Process

Steps to handle an incident:

  1. Communicate
    1. Keep all communication in #engineering-emergency
      1. Post every action you take related to the emergency
    2. Acknowledge the incident in #feedback
  2. Investigate the issue
  3. Fix the issue
  4. Monitor whether the fix worked
  5. Do a postmortem
    1. Implement improvements

Tips for handling issues

  • Revert deploys until you reach the last working deploy
  • In monitoring tools
    • expand the time range to 48 hours and look for spikes
    • watch timings, CPU load, and memory usage
  • Check all recent changes; don't ignore any of them
    • focus first on database, dependency, or infrastructure changes
  • Have a theory for every unusual behavior of the system, then test your theories; together they should explain every one
  • Isolate the issue to the lowest point in the tech stack
  • There is no need to open a pull request for a hotfix; you can merge directly into master

[ADD PROJECT SPECIFIC TIPS]
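If your monitoring tool lets you export raw samples, one way to make "look for spikes over 48 hours" concrete is a median/MAD (median absolute deviation) threshold. The sketch below is a hypothetical helper and is not part of any monitoring tool. The `find_spikes` name, the `(Unix timestamp in seconds, value)` sample format, and the default threshold of 3 MADs above the median are all assumptions you can adjust.

```python
from statistics import median


def find_spikes(samples, window_hours=48, threshold=3.0):
    """Hypothetical helper: flag upward spikes in the most recent window.

    samples: list of (unix_timestamp_seconds, value) tuples, e.g. CPU load.
    A sample is a spike if it exceeds the window's median by more than
    `threshold` times the median absolute deviation (MAD).
    """
    if not samples:
        return []
    # The window is measured back from the newest sample, not from "now".
    latest = max(ts for ts, _ in samples)
    cutoff = latest - window_hours * 3600
    window = [(ts, v) for ts, v in samples if ts >= cutoff]

    values = [v for _, v in window]
    med = median(values)
    mad = median([abs(v - med) for v in values])

    # Only upward deviations count: high CPU, memory, or timings are the concern.
    # If MAD is 0 (a flat baseline), any value above the median is flagged.
    return [(ts, v) for ts, v in window if v - med > threshold * mad]


# Usage: 72 hourly CPU samples with a small oscillation and one spike at hour 60.
samples = [(h * 3600, 20 + h % 3) for h in range(72)]
samples[60] = (60 * 3600, 95)
print(find_spikes(samples))  # → [(216000, 95)]
```

Median and MAD are used instead of mean and standard deviation because a single large spike barely moves them, so the spike does not hide itself by inflating the baseline.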

Common Issues and Solutions

This list is not exhaustive. It is a shortlist of common symptoms of a bad deployment, how to investigate them, and possible fixes.

My changes are not appearing after deploy `[PROJECT SPECIFIC REASON AND HOW TO HANDLE]`
Database is under heavy load `[PROJECT SPECIFIC REASON AND HOW TO HANDLE]`
Site performance is reduced or resulting in 503s `[PROJECT SPECIFIC REASON AND HOW TO HANDLE]`
When in doubt, roll back `[HOW TO ROLLBACK DEPLOY]`

[PROJECT SPECIFIC ISSUES]

How-tos

How to restart services `[LINK TO DOCUMENTATION]`
How to revert a deploy `[LINK TO DOCUMENTATION]`
How to scale the number of servers `[LINK TO DOCUMENTATION]`
How to fast-track emergency fixes `[LINK TO DOCUMENTATION]`

[PROJECT SPECIFIC HOW-TOS]


Postmortem

  1. Write a postmortem using this template.
  2. Share it in the #engineering Slack channel
  3. Add it to the agenda of the next engineering meeting