@MXWest
Last active December 11, 2024 09:06
SRE Antipatterns

By no means a complete list, but rather the ones I think we should focus on in the short term.

Antipattern 2: Humans Staring at Screens

If you have to wait for a human to detect an error, you've already lost.

Any practice in which detecting a problem condition relies on a human noticing that a particular series of data is abnormal. Substitute automated detection: thresholds, correlation engines, velocity metrics, etc.
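As a minimal sketch of what replacing the human might look like, the Python below checks each new sample against a static threshold and a simple velocity (rate-of-change) limit. The names `SeriesWatcher`, `Alert`, `threshold`, `max_rate`, and `window` are all hypothetical and chosen for illustration; a real setup would more likely express these rules in whatever monitoring system is already in place.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Alert:
    kind: str     # "threshold" or "velocity"
    value: float  # the sample that triggered the alert
    detail: str   # human-readable explanation


class SeriesWatcher:
    """Watches one metric series so a human does not have to (illustrative only)."""

    def __init__(self, threshold: float, max_rate: float, window: int = 5):
        self.threshold = threshold  # hypothetical absolute limit for the metric
        self.max_rate = max_rate    # hypothetical max average increase per sample
        self.samples: Deque[float] = deque(maxlen=window)

    def observe(self, value: float) -> List[Alert]:
        """Record a sample and return any alerts it triggers."""
        alerts: List[Alert] = []

        # Threshold check: the value itself is out of bounds.
        if value > self.threshold:
            alerts.append(Alert("threshold", value, f"{value} > {self.threshold}"))

        # Velocity check: once the window is full, compare the new value with
        # the oldest retained sample and average the change per sample.
        if len(self.samples) == self.samples.maxlen:
            rate = (value - self.samples[0]) / len(self.samples)
            if rate > self.max_rate:
                alerts.append(Alert("velocity", value, f"rising {rate:.2f} per sample"))

        self.samples.append(value)
        return alerts
```

The velocity check is there because a series can be climbing fast while still under its threshold; catching the climb gives earlier warning than waiting for the limit to be crossed.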

Antipattern 3: Mob Incident Response

All-hands-on-deck incident handling without thought to coordination of efforts, reserves, OSHT* troubleshooting, sleep cycles, human cognitive limits, or the deleterious effect of interrupts on engineering work.

*OSHT Troubleshooting:

  1. Observe the situation.
  2. State the problem.
  3. Hypothesize the cause/solution.
  4. Test the solution.

Antipattern 9: Speed-Bump Engineering

Prevention of all errors is impossible, costly, and annoying to anyone trying to get things done.

Any process that increases the length of time between the creation of a change and its production release without either adding value to or providing definitive feedback on the production impacts of the change.
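One way to apply this definition is to audit each step between change creation and release against its two tests. The Python below is a sketch under stated assumptions: the `Stage` fields, the `PIPELINE` entries, and their durations are invented for illustration, and whether a given step "adds value" or "gives definitive feedback" is a judgment the team has to make, not something the code can decide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    name: str
    minutes: float        # delay the stage adds between change creation and release
    adds_value: bool      # does the stage improve the change itself?
    gives_feedback: bool  # does it give definitive feedback on production impact?


def speed_bumps(stages: List[Stage]) -> List[Stage]:
    """Return stages that add delay but neither value nor definitive feedback."""
    return [s for s in stages if not (s.adds_value or s.gives_feedback)]


def speed_bump_fraction(stages: List[Stage]) -> float:
    """Fraction of total pipeline delay attributable to speed bumps."""
    total = sum(s.minutes for s in stages)
    if total == 0:
        return 0.0
    return sum(s.minutes for s in speed_bumps(stages)) / total


# Hypothetical pipeline; names, durations, and flags are made up for the example.
PIPELINE = [
    Stage("automated tests", 10, adds_value=False, gives_feedback=True),
    Stage("wait for approval meeting", 120, adds_value=False, gives_feedback=False),
    Stage("canary deploy", 30, adds_value=False, gives_feedback=True),
]
```

In this made-up pipeline, `speed_bumps(PIPELINE)` returns only the approval wait, which accounts for 120 of the 160 minutes of delay.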
