
@rxgpt
rxgpt / phoenixproject.md
Created June 16, 2024 20:41
The Phoenix Project summary

In my first week at AppDynamics, my manager gave me a copy of The Phoenix Project to help me understand our customers. The book tells the story of a fictional company, Parts Unlimited, that is struggling financially and losing business to competitors that adapted to the shift toward online business by building great online retail experiences.

  • Bill, the newly promoted VP of IT operations, has 3 months to transform the IT organization; otherwise, the entire IT department will be outsourced.
  • Bill meets and talks regularly with Eric, a board member who gradually helps him realize what manufacturing plant work can teach him about IT operations.
  • Bill gradually comes to understand the analogy between IT work and manufacturing work, and makes dramatic improvements in the reliability, efficiency, uptime, and deployment speed of the IT department.
  • The rest of the company eventually comes to realize the importance of IT to the entire business, and the company becomes financially successful again.

Problems in Parts

rxgpt / nodejsinternals.md
Last active December 30, 2019 21:14
The inner workings of Node.js
rxgpt / nodejsperfproblems.md
Last active July 15, 2022 00:51
Sources of latency in Node.js, and how to diagnose them

I used to be responsible for several language agents at AppDynamics, including the Node.js agent. Here's what I learned. (Caveat: these notes came together in 2016-2017, and the Node world moves very quickly! The community is always adding new sources of telemetry and higher-level abstractions that make it easier to write good Node code.)

Diagnosing common performance problems in Node.js services

"Context" means the user request, route handler, middleware, 3rd-party module, helper, or callback where the problem occurs. (Many of the functions involved in a user request are anonymous, which makes it a challenge to narrow down the context further.)

The "best" signal for a particular problem is the one that detects the problem and rules out the most other possibilities. For example, event loop max tick length can indicate several different problems, so more specific signals narrow each of them down more quickly.
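As a rough illustration of an event loop signal, the sketch below samples event loop delay with `monitorEventLoopDelay` from Node's built-in `perf_hooks` module. This API arrived in Node releases newer than these notes, and maximum delay is a related signal rather than exactly the same thing as max tick length. The helper names (`summarize`, `blockFor`, `sampleEventLoopDelay`) and the timing values are assumptions made for the example, not part of any agent.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

const NS_PER_MS = 1e6; // the histogram reports delays in nanoseconds

// Convert the histogram fields we care about into milliseconds.
function summarize(histogram) {
  return {
    maxMs: histogram.max / NS_PER_MS,
    meanMs: histogram.mean / NS_PER_MS,
    p99Ms: histogram.percentile(99) / NS_PER_MS,
  };
}

// Hypothetical stand-in for synchronous work that blocks the event loop.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy wait
  }
}

// Sample event loop delay for durationMs while `work` runs once.
function sampleEventLoopDelay(durationMs, work) {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  return new Promise((resolve) => {
    setTimeout(work, 50);
    setTimeout(() => {
      histogram.disable();
      resolve(summarize(histogram));
    }, durationMs);
  });
}

// Demo: block the loop for roughly 100 ms and print the observed delays.
sampleEventLoopDelay(300, () => blockFor(100)).then((summary) => {
  console.log(summary);
});
```

A large `maxMs` only says that something held the loop. It does not say which request or callback did, which is why a more specific signal is usually needed next.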

rxgpt / distsoftproblems.md
Last active August 18, 2019 17:15
The causes of performance problems in distributed software systems


A performance problem has three parts:

  1. The event that introduces the problem (e.g., application configuration change)
  2. The symptoms of the problem (e.g., CPU usage spike)
  3. The cause (e.g., logging was left in DEBUG mode)
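To make the three parts concrete, here is a minimal sketch of the example used in the list: a configuration change switches the log level to debug (the event), CPU time per batch of requests rises (the symptom), and the extra serialization done for debug messages is the cause. The logger, the payload, and the request count are all hypothetical and chosen only for illustration.

```javascript
const LEVELS = { debug: 10, info: 20 };

// Tiny logger: the message is only built when the debug level is enabled.
function createLogger(level, sink) {
  const threshold = LEVELS[level];
  return {
    debug(makeMessage) {
      if (LEVELS.debug >= threshold) {
        sink(makeMessage());
      }
    },
  };
}

// Hypothetical request handler that logs a large payload at debug level.
function handleRequests(count, logger) {
  const payload = {
    items: Array.from({ length: 200 }, (_, i) => ({ id: i, name: `part-${i}` })),
  };
  for (let i = 0; i < count; i++) {
    logger.debug(() => JSON.stringify(payload));
  }
}

// Measure CPU time spent handling `count` requests at a given log level.
function cpuMillisFor(level, count) {
  let lines = 0;
  const logger = createLogger(level, () => {
    lines += 1;
  });
  const start = process.cpuUsage();
  handleRequests(count, logger);
  const used = process.cpuUsage(start); // microseconds since `start`
  return { level, lines, cpuMs: (used.user + used.system) / 1000 };
}

console.log(cpuMillisFor('info', 2000));
console.log(cpuMillisFor('debug', 2000));
```

The absolute numbers depend on the machine. The point is that the same code path costs more CPU once the configuration change is in place, and the symptom alone does not name the cause.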

This page focuses on the causes of performance problems.