@barelyknown
Created November 9, 2020 20:32
We're not going to rename "Incidents" to "Opportunities", but we probably should!

If half the secret to optimizing operational performance is careful planning of each day's work, the other half is learning from all significant deviations between planned and actual performance so that avoidable issues aren't repeated. XBE calls this process "Incident Management", and it's the key to continuous improvement.

Most teams find incident management challenging for two main reasons:

  1. Strong cultural aversion to placing attention on mistakes and problems
  2. Unfamiliarity with continuous improvement tools

Culture eats strategy for breakfast.

When a company's culture makes incident management uncomfortable or punitive, familiarity with continuous improvement tools won't matter much. Therefore, let's focus first on what norms get in the way and what changes will make continuous improvement possible.

If you want to be great, you can't be good enough.

The only way to continuously improve is to believe you're not good enough in the first place. Otherwise, any suggestion of a significant improvement opportunity either lacks urgency or, even worse, is an identity threat. Unfortunately, it is very common for company cultures to be built (explicitly or implicitly) on the idea that the company is already good enough. Most employees' primary motivation is to stay employed, and the most common non-economic reason for letting an employee go is a culture/values mismatch. As a result, most employees in most companies fear having any attention drawn to incidents they played some part in.

While culture change takes time, it happens most quickly when new norms are made explicit and modeled at the top of the organization. Implicit and inconsistent expectations make for slow and scary change. Once an organization has demonstrated that it sees incidents as valuable assets to be mined for sustained improvement, competence with continuous improvement tools starts to matter.

Division by zero is meaningless.

Efficiency is a measure of the resources consumed to produce a unit of quality output (e.g., $ / ton). As we learned as kids, you can't divide by zero: if we reduce cost (the numerator) but have zero quality output to show for it (the denominator), the ratio is meaningless. Effectiveness first, efficiency second. This applies to the core operation and to the incident management process itself. Don't worry about how efficient you are at managing incidents until you're good at it.
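To make the arithmetic concrete, here is a minimal sketch of the efficiency ratio, assuming cost is measured in dollars and quality output in tons. The function name `cost_per_ton` and the choice to return `None` when there is no quality output are illustrative assumptions, not part of XBE.

```python
from typing import Optional


def cost_per_ton(cost_dollars: float, quality_tons: float) -> Optional[float]:
    """Efficiency as dollars consumed per ton of quality output.

    Hypothetical helper for illustration only. Returns None when there is
    no quality output, because a cost-per-ton ratio is undefined there.
    """
    if quality_tons <= 0:
        # Zero (or negative) quality output: the denominator makes the ratio meaningless.
        return None
    return cost_dollars / quality_tons


# Cutting cost only counts as an improvement if quality output holds up.
print(cost_per_ton(12_000, 400))  # 30.0
print(cost_per_ton(9_000, 0))     # None
```

The second call spends less money than the first, but with no quality output there is no efficiency to report, which is why effectiveness has to come first.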

The secret to problem solving is knowing what problem you're solving.

With the distinction between effectiveness and efficiency in mind, we must study variance to find what actually went wrong in the planning and/or execution of jobs. We call these deviations "incidents", and each one should quantify its impact on both output (effectiveness) and input (efficiency). Record any "significant" incident immediately, whether or not you know why it happened or whether anything could have been done to prevent it. When defining "significant", consider your other policies (safety, purchasing, etc.) and choose something consistent with them. In general, you won't regret the time spent documenting incidents as long as you have dependable downstream triage and improvement processes.
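As an illustration only, the record below captures an incident's impact on both output and input, plus a single "significant" rule applied at recording time, before any root cause is known. The `Incident` fields, the thresholds, and the example data are all hypothetical; they are not XBE's data model, and real thresholds should be chosen to match your other policies.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Incident:
    """Hypothetical incident record; not XBE's actual schema."""
    occurred_at: datetime
    description: str
    output_impact_tons: float    # effectiveness: quality output lost vs. plan
    input_impact_dollars: float  # efficiency: extra resources consumed vs. plan
    root_cause: Optional[str] = None  # often unknown when first recorded


# Illustrative thresholds; pick values consistent with your other policies.
SIGNIFICANT_TONS = 50.0
SIGNIFICANT_DOLLARS = 1_000.0


def is_significant(incident: Incident) -> bool:
    """Flag an incident if either impact crosses its threshold."""
    return (
        incident.output_impact_tons >= SIGNIFICANT_TONS
        or incident.input_impact_dollars >= SIGNIFICANT_DOLLARS
    )


# Made-up example: recorded immediately, root cause still unknown.
late_trucks = Incident(
    occurred_at=datetime(2020, 11, 9, 7, 30),
    description="Trucks arrived 90 minutes after planned start",
    output_impact_tons=120.0,
    input_impact_dollars=2_400.0,
)
print(is_significant(late_trucks))  # True
```

Checking either impact against its threshold keeps the definition of "significant" explicit and consistent, so an incident that costs money but loses no tons (or the reverse) still gets recorded.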

Time moves in one direction, memory another.

While impacts on output and input are self-evident and therefore easy to document, the circumstances and decisions that led to an incident are not always so obvious. As time passes, it becomes harder to accurately collect the information needed for a proper root cause analysis. Therefore, we recommend researching incidents quickly and thoroughly, whether or not you have the resources to make immediate improvements. Recorded information won't fade, but memories certainly will. In XBE, we recommend using the "Root Cause" feature to record this research, which keeps you organized and focused while you drive upstream until you find where action could have changed the outcome (or learn that nothing could have).

Action items convert the heat of incidents into progress.

Without knowing the root cause, you can't take action, but knowing it doesn't mean you will act. It's critical to triage root causes into action items and manage them through completion. Success requires agreement on the solution, delegated ownership, available resources, and a deadline. Organizations that don't act on the root causes of visible incidents not only absorb the opportunity cost of inaction but can also damage their culture.

For everything else, comments.

There is great power in shared context. If something isn't self-evident in the data and might be helpful to posterity, make a note of it.

What can we do to help?

Whether you have a feature idea, a question about existing capabilities, or a process/change management challenge that we could help with, let us know. The XBE community has made tremendous progress in these areas, and we're here to help lift everyone to the next level.
