@darkuncle
Last active January 17, 2024 23:18
The Rules - guidelines learned through hard experience in operations

(subject to additions, but rarely changes)

rule 0: It has to work.

rule 1: As simple as possible.

rule 2: Use the right tool for the job.

rule 3: Everything is a tradeoff. (see rule 41)

rule 4: A tool you know beats one you don't (but previous rules trump).

rule 5: Learn to use the defaults.

rule 6: Learn to use the classics. (cf. taco bell programming)

rule 7: Assume good intent, if not competence; the common case is benign neglect - and everyone makes mistakes.

rule 8: No one-offs. (see rule 21)

rule 9: If you don't have time to do it right, you will never have time to do it over (but see also rule 10).

rule 10: Good and complete beats perfect and unfinished (related: worse is better; but see also rule 23).

rule 11: If at all possible, do it while you're thinking of it - but in any case, always write it down. If it isn't written down, it doesn't exist.

rule 12: (corollary to previous rule) Telepathy should never be an implicit part of your project planning. Communicate, in writing, early and often, especially with those outside of your immediate team. (If it's written down but nobody else can see it, it doesn't exist for them.)

rule 13: Listen to your gut - if something feels sketchy or vaguely-defined, double-check it before you proceed. Hand-waving is a sure sign of danger ahead. (see also rules 8 and 9)

rule 14: Drive to success, not to a deadline. The timeline should be an output of good project planning, not an input. (see also rules 9 and 12)

rule 15: (corollary to previous rule) Before you begin, be clear on what constitutes success; be clear if that definition changes; be very clear on this point with colleagues and customers.

rule 16: Automate, automate, automate. If you have to do it more than twice, make the computer do it for you. (with obvious caveats)

rule 17: Consider your audience and adjust the level of detail accordingly, lest you lose the interest of upper management or the respect of your technical peers.

rule 18: Understand the difference between a symptom and a cause, and strive to treat the latter. Communicate this distinction (and its importance) clearly.

rule 19: Generally speaking, shallow but broad experience will serve you better than narrow and deep - be an expert on a few things, but be conversant with as many things as possible. (see Steve Simmons' classic description of a systems administrator)

rule 20: Uptime is job one. It doesn't matter how compelling that new feature is if nobody can actually see it. (see rule 0)

rule 21: Consistency is a virtue, but beware monoculture: variations among platforms and software should be few and intentional, especially at scale. (see rule 8)

rule 22: The perfect is the enemy of the good: don't let an inability to do everything become an excuse to do nothing. (see rule 10)

rule 23: (exception to previous rule) When it comes to security, do it right or not at all. Something that has the appearance of security, but is severely flawed (e.g. HTTPS using SSLv3), can be worse than something that is explicitly not secure (HTTP). Enable and encourage decisions (for users, management, and colleagues) based on reality, not appearances.

rule 24: Be the kind of leader you yourself would like to follow.

rule 25: (special travel rule) Pay in advance. Always carry cash. Only use the cash if you can't use your cards.

rule 26: Never promise what someone else will have to deliver without checking with them first. It's easy to say something will be done when you're not the one doing it.

rule 27: Just because an option is the cheapest doesn't make it the least expensive. The biggest costs are often not present in your monthly bill.

rule 28: Automation begins (and oftentimes, ends) with documentation and the elimination of edge cases (see rule 8). If it's not reproducible, you can't automate it.

rule 29: It's a very short step from outage to outrage. Manage those expectations early and often; communication is critical (see rule 12).

rule 30: Service delivery is everyone's job, without exception (see rules 0 and 20).

rule 31: The available work will always grow to equal at least 150% of your available time; an aggressive defense of work/life balance is key to career satisfaction and longevity.

rule 32: Requirements (whether functional or non-functional) specify an outcome, not an implementation. Don't become overly attached to the current way of doing things.

rule 33: Sometimes the best solution to a problem is to change the requirements (see rule 1). However, beware the XY Problem.

rule 34: Make it hard to do the wrong thing: the default thing, the easy thing, and the right thing should all be the same thing. "Build your opponent a golden bridge to retreat across." -- Sun Tzu

rule 35: DNS is always production.

rule 36: The problem is probably DNS.

rule 37: You can’t fix a people problem with a technical solution.

rule 38: Everything fails, all the time - expect it, architect for it, plan accordingly. Retries, failovers, and graceful degradation are critically important architectural principles (but see also rule 1).

rule 39: There are ways to misuse any tool.

rule 40: When you don't know what the problem is, every discrepancy is suspect.

rule 41: Nothing is free. If you think it’s free, you don’t understand the tradeoffs yet. Not all costs are financial (see rule 27).

rule 42: Don't panic.

rule 43: Learn from the mistakes of others; life's too short to make them all yourself.

rule 44: (corollary to previous rule) No matter how many mistakes you avoid, there remain an infinite number that you will not.

rule 45: Every piece of data you store is a liability first and an asset second; don't keep data you don't need. Breaches will happen (see rule 38).

rule 46: Every new capability carries with it a corresponding risk; the newer the technology involved, the harder it is to assess downstream risks.
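
As an illustration of rule 38 (kept short, per rule 1), here is one possible sketch of retries combined with graceful degradation. The names `call_with_retries`, `operation`, and `fallback` are hypothetical and chosen only for this example; a production version would likely catch narrower exceptions, add jitter to the delays, and log each failure.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `operation` up to `attempts` times, then degrade to `fallback`.

    All parameter names are illustrative assumptions, not a standard API.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Back off exponentially between attempts, but do not
            # sleep after the final failed attempt.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    # Every attempt failed: serve a degraded answer instead of an error.
    return fallback()
```

A caller might pass a live lookup as `operation` and a cached or default value as `fallback`, so that a failing dependency degrades the response rather than taking the whole service down (see rule 20).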
