A runbook is a precise list of steps for performing a routine task or debugging a system.
Good candidates for runbooks:
- procedures with a well-defined starting state and ending state
- lists of commands for launching a Kubernetes cluster
- lists of commands for tearing down a Kubernetes cluster
- links to dashboards for troubleshooting problems with a particular system, with some explanation of what to look for, what's normal, and what's not
- lists of commands that produce useful debug output for a particular system
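A list of debug commands can also be kept as data and run in a single pass. The following is a minimal Python sketch that assumes `kubectl` is installed and configured. The entries in `DEBUG_COMMANDS` are only examples of read-only `kubectl` invocations, and `collect_debug_output` and `CommandResult` are hypothetical names, not part of any existing tool.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class CommandResult:
    command: list[str]
    returncode: int
    output: str


# Hypothetical example list: read-only kubectl commands a Kubernetes
# debugging runbook might contain. Adjust for the system in question.
DEBUG_COMMANDS = [
    ["kubectl", "get", "nodes", "-o", "wide"],
    ["kubectl", "get", "pods", "--all-namespaces"],
    ["kubectl", "get", "events", "--all-namespaces",
     "--sort-by=.metadata.creationTimestamp"],
]


def collect_debug_output(commands, timeout_s=30.0):
    """Run each command in order and record its exit code and output.

    Best-effort by design: a failing, missing, or hung command is recorded
    rather than raised, so one broken command does not hide the output of
    the others.
    """
    results = []
    for cmd in commands:
        try:
            proc = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_s
            )
            results.append(
                CommandResult(cmd, proc.returncode, proc.stdout + proc.stderr)
            )
        except FileNotFoundError:
            # Conventional shell exit code for "command not found".
            results.append(CommandResult(cmd, 127, f"command not found: {cmd[0]}"))
        except subprocess.TimeoutExpired:
            results.append(CommandResult(cmd, 124, f"timed out after {timeout_s}s"))
    return results
```

Calling `collect_debug_output(DEBUG_COMMANDS)` and printing each result's command, exit code, and output gives one combined snapshot to paste into an incident channel. The runbook entry should still say what normal and abnormal output look like, since the script only gathers the data.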
These things don't belong in runbooks (but runbooks may link to them):
- system architecture diagrams
- org charts, call trees, names of individuals
- long paragraphs of text
- "overview"-style documentation, even if it includes lists of commands
- instructions for building a software project
There is no such thing as a 100% automated system: at some point, a human will always need to be in the loop, and runbooks are written for that human.
Runbooks are an opportunity to break down silos. A well-written runbook empowers others who know less about a particular system than you do.
Sometimes other documentation is really a half-baked runbook in disguise. If you encounter this, ask a colleague whether it makes sense to break the precise series of steps out into its own runbook. This frees the original docs to take a more architectural, philosophical, or creative style.
Runbooks are docs, and they have all the challenges of keeping docs up to date.
Runbooks shouldn't have too many steps. If a sequence of steps in a runbook can be collapsed into a reliable script, we should strive to do that. This is part of our goal of eliminating toil. Automation scripts can begin their life as a solid runbook. Over time, a truly reliable automation program can emerge from a well-documented series of steps.
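As a rough illustration of a runbook beginning its life as a script, here is a minimal Python sketch. It assumes each prose step maps to a single command, and that the runbook's begin and end states can be expressed as checks that return `True` or `False`. `Step`, `run_runbook`, and `RunbookError` are hypothetical names for illustration, not an existing tool.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Step:
    description: str     # the prose line from the original runbook
    command: list[str]   # the command that line told a human to run


class RunbookError(RuntimeError):
    """Raised when the scripted runbook cannot safely continue."""


def run_runbook(precondition, steps, postcondition):
    """Execute runbook steps in order, verifying the begin and end states.

    precondition and postcondition are zero-argument callables returning
    bool; they encode the runbook's well-defined starting and ending states.
    Execution stops at the first failing step rather than continuing
    blindly, which hands control back to a human.
    """
    if not precondition():
        raise RunbookError("begin state not met; refusing to start")
    for i, step in enumerate(steps, start=1):
        print(f"[{i}/{len(steps)}] {step.description}")
        proc = subprocess.run(step.command, capture_output=True, text=True)
        if proc.returncode != 0:
            raise RunbookError(
                f"step {i} failed (exit {proc.returncode}): {proc.stderr.strip()}"
            )
    if not postcondition():
        raise RunbookError("all steps ran but end state not reached")
```

The explicit begin-state and end-state checks are what make this more than a list of commands pasted into a file: the script refuses to start from an unexpected state and reports when it did not reach the intended one. When it raises, a person picks up from the failing step, so the human stays in the loop.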