@rhenning
Last active January 10, 2022 14:17
Notes from The Phoenix Project

The Phoenix Project

Chapters 1-5

  • Bill goes out of his way during his conversation with Dick to understand the real business impact of the payroll outage.
  • IT Operations is frequently viewed as a business cost/liability rather than a valued asset, as evidenced by language used to discuss infrastructure. Steve says "IT ... should be like the toilet ... I don't ever worry about it not working." Note that IT's building workspace has stains on the carpet and peeling paint.
  • Major stakeholders and technology managers are all represented and coordinating in real time during the SAN outage.
  • Bill does not escalate when faced with the arrogance of Wes's "Quite frankly ... I think your head would explode if you had to deal with the relentless pace and complexity of what I deal with every day." Instead he takes a deep breath, counts to three, and empathizes with Wes, explaining that he too is frustrated with the situation.
  • Phoenix is an A-priority project in the company, which has already been communicated by management as critical to business success, and Brent is constantly being pulled off its tasks to work on other things, such as the SAN outage. He needs to be removed from the critical path on those issues, whether by training others or letting them figure it out on their own.
  • Closely control the expectations of coworkers like Sarah, who have the attention of Sr. Management, and are keen manipulators and responsibility evaders. This is someone who has probably read Stanley Bing's Sun Tzu Was A Sissy one too many times.
  • The IT team has invested in virtualization software, which streamlines the process of delivering virtual machines to stakeholders, but they still deliver late due to overutilization of employees and queue time. The takeaway here is that technology is just a tool, so take blue-sky promises with a grain of salt. Technology is unlikely to solve business process or personal relationship issues.
  • Effective root cause analysis requires an accurate timeline of all changes in a system and its dependencies.
  • The SAN failed due to an upgrade that contained years' worth of patches that could not be rolled back. Taking on such large upgrades is fraught with peril. It's better to do small, frequent upgrades. Wes states that IT couldn't get the necessary maintenance window for upgrades until their hand was forced by the vendor. This is asking for trouble - every service should have regular maintenance windows and availability targets.
  • Brent lacks empathy for Ann, who is a financial analyst, not a technologist. He laughs at her attempt to read the corrupted database fields despite her revelation that a single corrupted column is a key piece of evidence. Don't be a jerk - not everyone is an engineer, nor should they be expected to be. Brent probably wouldn't do a very good job at FinOps.
  • Talking through the observed behavior of the SAN post-upgrade could have prevented an outage caused by the rollback. The issue was unlikely to be upgrade-related, as only a single DB column was corrupted. In other words, don't jump to conclusions.
  • InfoSec isn't invited to IT or product meetings because they're seen as impeding progress. Lack of inclusion is used by John as justification for subverting the change management process and rolling out DB encryption on the sly, which was the root cause of a payroll outage. Agree on a change management process that works and isn't onerous. If the change process is overly painful it will be resisted: "It takes twenty minutes to fill out all those fields for a five minute change!"
  • Development and Ops aren't collaborating on test environments and deployment as part of project delivery, causing significant delays to the Dev team and rework on the part of Ops. Hash out testing and delivery as a function of Dev, Ops, and QA if applicable, and include requirements in the product specification.
  • IT was invited to Chris's product architecture and planning meetings but no one from IT attended. This resulted in operational requirements not being factored into delivery commitments and a sense of resentment felt by Chris when Bill accuses his team of throwing things over the wall.
  • Brent is not attending product meetings because he's working outages and escalations. His expertise would add more value if he worked directly with development to address operational requirements in the product itself. Bill tells Wes to pull Brent off of any fire fighting and refer anyone who has a problem with that to him. Note that Bill takes both charge of and responsibility for his edicts throughout.
  • "There's a chain of command: gripes go up, not down." Never complain about colleagues to reports.
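
The point above about root cause analysis needing an accurate timeline of all changes can be sketched in a few lines. This is only an illustration, assuming each system keeps its own time-sorted change log; the system names, timestamps, and entries below are invented for the example and are not taken from the book.

```python
import heapq
from datetime import datetime

def merge_change_logs(*logs):
    """Merge per-system change logs into one chronological timeline.

    Each log is assumed to be a list of (timestamp, system, description)
    tuples that is already sorted by timestamp.
    """
    return list(heapq.merge(*logs, key=lambda entry: entry[0]))

# Hypothetical change logs (all values are made up for illustration).
db_log = [
    (datetime(2024, 1, 9, 18, 30), "payroll-db", "column encryption enabled"),
    (datetime(2024, 1, 9, 23, 15), "payroll-db", "payroll run fails"),
]
san_log = [
    (datetime(2024, 1, 9, 22, 0), "san", "firmware upgrade started"),
]

timeline = merge_change_logs(db_log, san_log)
for when, system, what in timeline:
    print(when.isoformat(), system, what)
```

Reading the merged timeline top to bottom makes it easier to see which change preceded the first symptom, instead of blaming the most recent or most visible change.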

Chapters 6-10

  • Include Testing and Operational Requirements when estimating features.
  • Summarize key insights of data when forwarding it to teammates
  • Better to have situational awareness even if grim than not
  • Interrupts will kill your long-term planning. Eliminate them if possible.
  • Don't be afraid to get your hands dirty. Erik is consulting for the board and is unpacking donuts. He may've even brought them.
  • Better to set up self-service processes than be in the critical path of every request
  • Erik introduces us to the four types of work
    • Business project work
    • IT project work
    • Unplanned work (interrupts)
    • Changes
      • A change
  • Visualize work entering your team or else you can't manage it
    • Visualize WIP to limit it
  • The manufacturing facility is immaculate, obviously seen as a profit center for the company
  • Additional reading
    • Theory of Constraints
    • Lean production
    • Toyota Production System
    • Total Quality Management
  • Release work based on speed of the bottleneck only
    • Improvements made elsewhere are an illusion - they don't speed up the system
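
A rough way to see why work should be released at the speed of the bottleneck: model each station by how many units it can finish per hour, and treat the whole system's throughput as the slowest rate. The station names and numbers below are assumptions made up for this sketch, not figures from the book.

```python
def system_throughput(station_rates):
    """Units per hour the whole line can finish: the slowest station's rate."""
    return min(station_rates.values())

def wip_after(release_rate, station_rates, hours):
    """Work in process that piles up when work is released faster than
    the bottleneck can complete it."""
    backlog_per_hour = max(0, release_rate - system_throughput(station_rates))
    return backlog_per_hour * hours

# Hypothetical stations and rates (units of work per hour).
rates = {"dev": 10, "qa": 8, "brent": 3, "deploy": 12}

before = system_throughput(rates)

# Doubling a non-bottleneck station does not change system throughput.
after_non_bottleneck = system_throughput(dict(rates, dev=20))

# Improving the bottleneck itself does.
after_bottleneck = system_throughput(dict(rates, brent=5))

# Releasing work at dev's pace instead of the bottleneck's pace builds a queue.
queued = wip_after(release_rate=10, station_rates=rates, hours=8)
```

With these made-up numbers, speeding up dev leaves throughput unchanged, and releasing work at dev's pace for a day only grows the queue in front of the constraint.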
@vvchauit
continue with chapter 11-35 right ?
