@ciacci1234
Last active August 4, 2022 19:00
Kill It With Fire by Marianne Bellotti: Takeaways and Excerpts

The following is a series of excerpts from a book by Marianne Bellotti on how to revive and maintain a complex software system that has been in operational use for some time and has accrued a significant amount of technical and operational debt. Each header before an excerpt is either my own or a section title from the book.

Preface

"We build our computer systems the way we build our cities: over time, without a plan, on top of ruins." - Ellen Ullman

Introduction

Restoring legacy systems to operational excellence is ultimately about resuscitating an iterative development process so that the systems are being maintained and evolving as time goes on...there is little downside to maintaining all systems as if they are legacy systems. It is easy to build things, but it is difficult to rethink them once they are in place. Legacy modernizations are not hard because they are technically hard - the problems and solutions are usually well understood - it's the people side of the modernization that is hard. Getting the time and resources to actually implement the change, building an appetite for change to happen and keeping that momentum, managing the intra-organizational communication necessary to move a system that any number of other systems connect to or rely upon - those things are hard.

pg.15 - Running the risk of cargo-culting

Sometimes it is difficult to compare your use case to the use case of other seemingly similar organizations. The biggest offender on this front is the commercial cloud, precisely because it adds value to such a broad set of use cases...whether Big Data as a Service saves you any money depends on how long it takes it to get that big in the first place. Having petabytes of data collected over a five-year period is a different situation from having petabytes generated over the course of a few hours.
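
To put rough numbers on that last contrast, here is a back-of-the-envelope sketch of my own (not from the book). It assumes exactly one petabyte in both cases, takes "a few hours" to mean three, and uses the decimal definition of a petabyte; all of those are illustrative assumptions.

```python
# Illustrative arithmetic only: the 1 PB figure and the three-hour window
# are assumptions made for this sketch, not numbers from the book.
PETABYTE = 10**15  # bytes, decimal definition


def average_rate_bytes_per_sec(total_bytes: int, seconds: float) -> float:
    """Average sustained rate needed to accumulate total_bytes in the given time."""
    return total_bytes / seconds


FIVE_YEARS = 5 * 365 * 24 * 3600  # seconds, ignoring leap days
THREE_HOURS = 3 * 3600            # "a few hours", assumed here to be three

slow = average_rate_bytes_per_sec(PETABYTE, FIVE_YEARS)
fast = average_rate_bytes_per_sec(PETABYTE, THREE_HOURS)

print(f"1 PB over five years:  {slow / 1e6:.1f} MB/s")
print(f"1 PB over three hours: {fast / 1e9:.1f} GB/s")
print(f"ratio: {fast / slow:,.0f}x")
```

Under those assumptions the first case averages a few megabytes per second and the second tens of gigabytes per second, which is why the same "petabytes of data" label can describe two very different infrastructure problems.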

pg.31 - How tech spreads

Overall, interfaces and ideas spread through networks of people, not based on merits or success. Exposure to a given configuration creates the perception that it's easier and more intuitive, causing it to be passed down to more generations of technology. The lesson to learn here is the systems that feel familiar to people always provide more value than the systems that have structural elegances but run contrary to expectations.

pg.44 - Divide and Conquer

Large problems are always tackled by breaking them down into smaller problems. Solve enough small problems, and eventually the large problem collapses and can be resolved.

pg.46 - On Large Systems Experiencing Stability Issues

(Emphasis mine) On the other hand, some legacy systems perform their core functions within the parameters the organization needs to be successful, but they are unstable. They are not too slow; they produce the correct result and within the resources the organization has available for the task, but there are frequent "surprises," such as outages with bizarre black-swan style root causes or routine upgrades that sometimes go very poorly. Ongoing development work is stopped because unforeseen technical conflicts pop up and need to be resolved. In 1983, Charles Perrow coined the term normal accidents to describe systems that were so prone to failure, no amount of safety procedures could eliminate accidents entirely. According to Perrow, normal accidents are not the product of bad technology or incompetent staff. Systems that experience normal accidents display two important characteristics...They are tightly coupled...[and] They are complex.

pg.72-73 - Principles of Modernization Projects

Expectation management is really important. Typically organizations...misjudge how long modernization projects take, and they misjudge how much time they can save and how to save it. Modernization projects have better outcomes...with the following guidelines:

  • Keep it simple.
  • Spend some time trying to recover context [in the legacy system].
  • Tools and automation should supplement human effort, not replace it.

pg.77 - The importance of metrics and autonomy

(Emphasis mine) Legacy modernization projects go better when the individuals contributing to them feel comfortable being autonomous and when they can adapt to challenges and surprises as they present themselves because they understand what the priorities are. The more decisions need to go up to a senior group - be that VPs, enterprise architects, or a CEO - the more delays and bottlenecks appear. The more momentum is lost, and people stop believing success is possible. When people stop believing success is possible, they stop bringing their best work. Measurable problems empower team members to make decisions. Everyone has agreed that metric X needs to be better; any actions taken to improve metric X need not be run up the chain of command.

pg.101 - 108 - Mess: Fixing Things That Are Not Broken

Bellotti discusses how small teams inevitably build monoliths and why monoliths work. She also discusses considerations to keep in mind when a team is debating moving away from a monolith.

pg.159 - Takeaways section in Design As Destiny Chapter

  • Design is problem setting. Incorporating it into your process will help your teams become more resilient.
  • By themselves, technical conversations tend to incentivize people to maintain status by criticizing ideas. Design can help mitigate those effects by giving conversations the structure of a game and a path to winning.
  • Legacy modernizations are ultimately transitions and require leaders with high tolerance for ambiguity.
  • Conway's law doesn't mean you should design your organization to look like the technology you want. It means you should pay attention to how the organization structure incentivizes people to behave. These forces will determine what the technology looks like.
  • Don't design the organization; let the organization design itself by choosing a structure that facilitates the communication teams will need to get the job done.

pg.191 - 194 - Working Groups vs Committees

A discussion of groups that enable work to get done versus groups that prevent work from getting done.

pg.207 - 209 - Building Something Wrong the Correct Way

An emphasis on building something simple first and not building for scale before you actually need that scale. Bellotti also offers some rules of thumb on the resources required to implement, maintain, and monitor a service.
