RFC: Measuring the Stability of a Software System
Proposition: How to Measure Software Stability
In an ideal world, a development team only works on adding features for the business. However, defects, regressions, and unplanned work interfere with the team's ability to deliver. The stability of a piece of software lies in its ability to provide functionally correct features to the business without introducing new problems.
Indicators that the software is not delivering the promised value are defects, regressions, and unplanned demands on the engineering team's time. These indicators can be thought of as "drag" on system productivity. Drag can be reduced by investing in internal projects and technical best practices such as Test Driven Development (TDD), Continuous Integration (CI), and Continuous Deployment (CD).
|Type of Work||Symbol||Notes|
|Features||f||This is the work that adds value to the business. Ideally, all of the engineering team's work can be found here.|
|Defects||d||Defects are features that do not work correctly in some range of scenarios.|
|Regressions||r||Regressions differ from defects in that these are features that functioned correctly at one time but now fail in some range of scenarios. The core difference between defects and regressions is that regressions are work that is being paid for again.|
|Unplanned Work||u||This is any demand on the engineering team's time that does not directly relate to planned feature addition. This can include emergency analysis of production failures, ad-hoc reporting, or anything else that is not planned.|
|Drag||D||The cost of d + r + u. Defects, regressions, and unplanned work are not the only potential sources of drag--they are simply the most common in software engineering. The number of potential sources of drag is unbounded. Other examples include hand-offs, waiting for results from other teams, unclear requirements, and ambiguous test output. The definition of "drag" should be adapted to fit the circumstances.|
|Internal Projects||i||Projects undertaken by the engineering team specifically for the purpose of fighting Drag.|
|Overhead||O||Drag + Internal Projects. Some amount of overhead is always necessary so this number will likely never be zero. However, some kinds of overhead are destructive and we should work to minimize or eradicate them.|
|Total Work||T||Overhead + features|
|Instability||I||Overhead / Total Work (O / T)|
|Stability||S||The ability of a system to withstand change without breaking existing functionality--or, the estimated % of effort spent on the system that results in value-add instead of break-fix. Expressed as 1 - Instability.|
Given this terminology, we can represent software stability as a function of time spent on various product-related activities.
D = d + r + u
O = D + i
T = f + O
S = 1 - (O / T)
Using this formula, a score of 1 represents a maximally stable system, and a score near 0 represents a maximally unstable one.
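The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the proposal: the class name `WorkTally`, its field names, and the sample numbers are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkTally:
    """Points (or hours) spent per category in one measurement period.

    Hypothetical helper; field names mirror the symbols in the table.
    """
    features: float            # f
    defects: float = 0.0       # d
    regressions: float = 0.0   # r
    unplanned: float = 0.0     # u
    internal: float = 0.0      # i

    @property
    def drag(self) -> float:
        # D = d + r + u
        return self.defects + self.regressions + self.unplanned

    @property
    def overhead(self) -> float:
        # O = D + i
        return self.drag + self.internal

    @property
    def total(self) -> float:
        # T = f + O
        return self.features + self.overhead

    @property
    def stability(self) -> float:
        # S = 1 - (O / T); undefined when no work was recorded
        if self.total == 0:
            raise ValueError("no work recorded; stability is undefined")
        return 1 - self.overhead / self.total


# Illustrative numbers only: D = 20, O = 40, T = 100.
tally = WorkTally(features=60, defects=10, regressions=5, unplanned=5, internal=20)
print(tally.drag, tally.overhead, tally.total, round(tally.stability, 2))
```

Keeping each intermediate quantity as its own property makes it easy to adapt the definition of drag, as the table suggests, without touching the stability calculation.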
Example: Stable System
The team plans 100 points of work. There are no defects, regressions, or unplanned work to take them away from feature delivery, and no internal projects necessary.
D = 0 + 0 + 0 => 0
O = 0 + 0 => 0
T = f + O => 100 + 0 => 100
S = 1 - (0 / 100) => 1.00
Example: Unstable System
A team completes 157 points of work in total. Of these, 25 points are defects, 10 points are regressions, and 18 points are unplanned work. There are no internal projects, which leaves 104 points of features.
D = 25 + 10 + 18 => 53
O = 53 + 0 => 53
T = 157
S = 1 - (53 / 157) => 0.662
|Features||Defects||Unplanned Work||Regression||Drag||Internal Projects||Overhead||Total Work||Instability||Stability|
|104||25||18||10||53||0||53||157||0.338||0.662|
It is highly unlikely that any system will ever achieve S = 1.00, as any change implies some minimal amount of overhead. However, not all overhead is alike. Some overhead facilitates change in the system (internal projects) while the rest brings rigidity to the system (defects, regressions, unplanned work). The goal is to minimize overhead and confine any necessary overhead to practices that enhance the stability of the system.
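Because T = f + O and O = D + i, the score can be rearranged to show the two kinds of overhead separately. The following is only a restatement of the definitions already given, not an additional metric:

```latex
\begin{aligned}
S     &= 1 - \frac{O}{T} = \frac{T - O}{T} = \frac{f}{T} \\
1 - S &= \frac{O}{T} = \frac{D}{T} + \frac{i}{T} = \frac{d + r + u}{T} + \frac{i}{T}
\end{aligned}
```

Read this way, S is the share of total work that went to features, and the lost share splits into a drag term and an internal-project term that can be tracked independently.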
It is easiest to measure Stability in Scrum or Kanban teams that are wholly responsible for the products they manage end-to-end. In Scrum and Kanban, each completed task can be categorized as it's worked and tallied at the end of the sprint or reporting period.
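As a rough sketch of that tallying step, the snippet below sums points per category from a list of completed tasks and applies S = 1 - (O / T). The category labels, the function name `stability_from_tasks`, and the sample sprint are all hypothetical; a real team would pull these from its own tracker.

```python
from collections import Counter

# Hypothetical category labels applied to each task as it is worked.
DRAG = {"defect", "regression", "unplanned"}
OVERHEAD = DRAG | {"internal"}


def stability_from_tasks(tasks):
    """Compute S = 1 - (O / T) from (category, points) pairs."""
    points = Counter()
    for category, pts in tasks:
        if category != "feature" and category not in OVERHEAD:
            raise ValueError(f"unknown category: {category!r}")
        points[category] += pts

    total = sum(points.values())          # T = f + O
    if total == 0:
        raise ValueError("no completed work to measure")
    overhead = sum(points[c] for c in OVERHEAD)  # O = d + r + u + i
    return 1 - overhead / total


# Made-up sprint: 13 feature points and 7 overhead points out of 20.
sprint = [
    ("feature", 8), ("feature", 5), ("defect", 3),
    ("unplanned", 2), ("internal", 2),
]
print(round(stability_from_tasks(sprint), 2))  # 0.65
```

Rejecting unknown categories up front is a deliberate choice here: an uncategorized task would otherwise silently count toward total work and skew the score.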
There are potential gaps that could make this measurement difficult to obtain. For example, if production escalations are handled by an external escalation team, their work must be counted together with the engineering team's work for the same time period. The metric follows the project--not org-chart divisions.
Drag on the system is a productivity killer. If left unchecked, drag will increase over time to the point that very little work can actually be done on the system. Drag can be reduced with comparatively little investment in internal projects.
Mathematically, an increase in i will result in a decrease in D over time that is far greater than the initial cost of i. However, increasing spending on i will have the short-term effect of reducing S instead of increasing it, both because of the initial time spent and because it requires a reduction in spending on f. It's critically important to understand this and to communicate it so that no one is surprised.
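The short-term dip can be illustrated numerically. The sketch below starts from the unstable example, read as 157 total points of which 104 are features, and moves 20 points from features to internal projects. The third scenario is purely hypothetical: the reduced drag figures are invented to show what a later period might look like if the investment pays off, and are not a prediction.

```python
def stability(f, d, r, u, i):
    """S = 1 - (O / T), with D = d + r + u, O = D + i, T = f + O."""
    overhead = d + r + u + i
    return 1 - overhead / (f + overhead)


# Baseline: the unstable example (T = 157, D = 53, no internal projects).
baseline = stability(f=104, d=25, r=10, u=18, i=0)

# Short term: 20 points move from features to internal projects.
# Drag has not changed yet, so overhead rises and S falls.
investing = stability(f=84, d=25, r=10, u=18, i=20)

# Later period, HYPOTHETICAL numbers: drag has shrunk and only a small
# ongoing internal investment remains, with T still 157.
later = stability(f=127, d=12, r=5, u=8, i=5)

print(f"{baseline:.3f} {investing:.3f} {later:.3f}")  # 0.662 0.535 0.809
```

Holding total work constant across the three scenarios isolates the effect of how the same capacity is allocated, which is the comparison stakeholders are most likely to ask about.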
It is also important that the investment in i not exceed that of f so that we can measure the impact of our investment in i on S.