Can we bury the NIST study once and for all now?

(N.B. This is a blog post I wrote on Google+ in 2014, which has since disappeared from the Web.)

The NIST study concluded that "the impact of inadequate software testing infrastructure on the US economy was between 22.2 and 59.5 billion dollars".

As usual, people mention this figure as if it were an undisputed fact (for instance, you can find it on a couple of Wikipedia pages). It's a good bet that they haven't read the original document carefully and critically. If they had, they might have noticed some red flags in the "study" and would at the very least hedge by emphasizing that it is an estimate.

There are two important aspects to any estimate: precision and accuracy.

Precision is the size of the error bars around the estimate. "Between $50Bn and $70Bn" isn't at all the same as "somewhere between a few hundred million and a few hundred billion, with sixty billion being our best guess". With a narrow spread, it's much easier to justify investing some proportionate amount of money in attempting to solve the problem. If your uncertainty is large, there's a greater risk you'll be wasting money.

Accuracy is about whether we even have a reason to believe that the estimate has landed anywhere near the "true" value. Are we overestimating? Underestimating? Giving an answer that doesn't have anything to do with the question being asked?

The NIST procedure, as I was able to reconstruct it, went something like this (I'm actually simplifying a bit):

  • ask survey respondents the question "how much did minor bugs cost you last year?"
  • average this across all respondents
  • divide the total expense by the number of employees at the respondent's organization to get a "cost of bugs per employee"
  • multiply the cost of bugs per employee by total employment in that sector, based on BLS employment data

(Except that to extrapolate the results of their financial services survey, instead of employees, they scaled by "million dollars in transaction volume".)
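To make the shape of that arithmetic concrete, here is a minimal sketch in Python. Every number in it is a placeholder of my own invention, not a figure from the report, and the exact order of averaging and normalizing is my best guess at an ambiguous description.

```python
# Minimal sketch of the extrapolation procedure as I reconstruct it.
# Every number below is a made-up placeholder, not a figure from the
# report; only the shape of the arithmetic is the point.

# Survey responses: "how much did bugs cost you last year?",
# plus each respondent's headcount.
responses = [
    {"bug_cost": 2_000_000, "employees": 500},
    {"bug_cost": 5_000_000, "employees": 2_000},
    {"bug_cost": 800_000, "employees": 150},
]

# Average the reported costs, then normalize to a per-employee cost.
avg_cost = sum(r["bug_cost"] for r in responses) / len(responses)
avg_employees = sum(r["employees"] for r in responses) / len(responses)
cost_per_employee = avg_cost / avg_employees

# Scale up by total sector employment (a BLS-style figure).
sector_employment = 1_300_000  # placeholder, not an actual BLS number
sector_cost = cost_per_employee * sector_employment

print(f"cost per employee: ${cost_per_employee:,.0f}")
print(f"sector estimate:   ${sector_cost:,.0f}")
```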

Then they "normalized" all that again into a per-employee cost for both the automotive and financial sectors... and scaled it all up again to the entire economy, again by multiplying by X million employees. Now, whatever one thinks of this procedure (I think the heterogeneous scaling factors are at best bizarre), it can't escape the laws of physics.

Specifically, the law that any measurement is subject to uncertainty, including a measurement like "number of employees". And these uncertainties add up as you add estimates together, or multiply one estimate by another.
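To see how quickly that bites, here is a sketch of the standard propagation rules for independent uncertainties: relative uncertainties add in quadrature under multiplication, absolute ones under addition. The error-bar percentages are assumptions of mine, since the report publishes none.

```python
from math import sqrt

# Standard propagation of independent uncertainties. The error-bar
# percentages below are assumptions of mine; the NIST report does not
# publish uncertainties for its inputs.

# Multiplication: relative uncertainties add in quadrature.
cost_per_employee = 500.0   # $/employee (placeholder)
rel_u_cost = 0.30           # assume +/-30% on the survey-based cost
employment = 1_300_000      # sector headcount (placeholder)
rel_u_employment = 0.10     # assume +/-10% on the employment figure

sector_cost = cost_per_employee * employment
rel_u_sector = sqrt(rel_u_cost**2 + rel_u_employment**2)  # about 0.32

# Addition: absolute uncertainties add in quadrature.
developer_half = 30e9       # developer/tester half of the estimate ($)
user_half = 30e9            # end-user half of the estimate ($)
u_half = developer_half * rel_u_sector  # +/- on each half
total = developer_half + user_half
u_total = sqrt(2) * u_half

print(f"sector cost: ${sector_cost:,.0f} +/- {rel_u_sector:.0%}")
print(f"grand total: ${total/1e9:.0f}Bn +/- ${u_total/1e9:.1f}Bn")
```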

To get a grip on the uncertainties involved, I tried to replicate the work of the NIST authors: that is, I tried to reproduce their derivation of the final estimate based on survey responses and estimates from BLS.

For instance, about half of NIST's total estimate can be accounted for by the costs directly incurred in paying developers and testers; the other half by the cost to end users as a consequence of software bugs. These are two distinct estimates that are added up to get the final answer. The sub-estimates are further subdivided into estimates for the automotive sector and for the financial services sector (the two sectors that were surveyed) and subdivided again into estimates for the costs from "major errors" and "minor errors" and other categories, and so on.
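The overall shape, as far as I could reconstruct it, is thus a tree of sub-estimates summed bottom-up, something like the sketch below. The dollar figures are placeholders chosen to sum to $60Bn, not the report's actual numbers.

```python
# Shape of the estimate: a tree of sub-estimates, summed bottom-up.
# Dollar figures are placeholders, not the report's actual numbers.
estimate = {
    "developer/tester costs": {
        "automotive": {"major errors": 5e9, "minor errors": 4e9},
        "financial services": {"major errors": 12e9, "minor errors": 9e9},
    },
    "end-user costs": {
        "automotive": {"major errors": 8e9, "minor errors": 7e9},
        "financial services": {"major errors": 10e9, "minor errors": 5e9},
    },
}

def total(node):
    """Sum the leaves of a nested dict of cost categories."""
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return node

print(f"${total(estimate)/1e9:.0f}Bn")  # -> $60Bn
```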

I eventually gave up because after a few steps, I just couldn't find any way to get their numbers to add up. (A link to the spreadsheet: https://docs.google.com/spreadsheets/d/19YU_FJO122cjc1sI27eQSMwc2ToeheLqDp0xVEelMP8; readers are more than welcome to copy, check, and improve upon my work.)

Though ultimately fruitless, insofar as I wasn't able to reproduce all the steps in the derivation of the final estimate, the exercise was worthwhile: in trying to understand their derivation I got quite familiar with their numbers, and I learned new things.

For instance, the study breaks down costs incurred through bad testing into various categories, including major errors and minor errors.

Apparently, for "minor errors" in the automotive sector, the average cost of one bug in that category was four million dollars.

(Yes, they seem to be claiming an average cost per bug of $4M. This is from Table 6-11. I'm actually hoping someone tells me I'm interpreting that wrong; it's such an embarrassingly absurd result.)
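For what it's worth, one way a figure like that can arise mechanically is by dividing one extrapolated aggregate by another. A sketch with invented inputs, not the actual contents of Table 6-11:

```python
# How a "$4M per minor bug" figure can mechanically fall out of an
# extrapolation: divide one scaled-up aggregate by another. These
# numbers are invented; they are NOT the Table 6-11 inputs.

sector_minor_bug_cost = 4.0e9   # extrapolated sector-wide cost ($)
sector_minor_bug_count = 1_000  # extrapolated sector-wide bug count

cost_per_minor_bug = sector_minor_bug_cost / sector_minor_bug_count
print(f"${cost_per_minor_bug:,.0f} per minor bug")  # -> $4,000,000
```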

Also, whereas "major" errors cost 16 times as much as "minor errors" in small automotive companies, this reverses in large ones, with "minor errors" having a substantially higher cost than "major errors".

So someone who believes the $60Bn number would also have to believe some very counter-intuitive things - since these numbers are inputs to the overall estimate.

The alternative is to believe there are serious problems with the study. Which opens up the question of its accuracy. On that score, two major considerations in academic research are sample size and methodology. NIST's research was survey-based.

How many people did NIST ask? Paragraph 6.2.2 informs us that "four developers completed substantial portions of the entire survey". Section 7 is a bit vaguer about how many people responded for the "developer" portion of the costs, but it looks as if the total sample size was less than 15, which seems like a direly inadequate basis on which to place half of the total estimate.

The surveys of end users seem to have had a more reasonable sample size: 179 respondents in the automotive sector and 98 in financial services. (However, it must be noted that the surveys had rather dismal response rates, 20% and 7% respectively.)

What did NIST ask? They asked for a few people's opinions of how much they spent on bugs, and when. The inputs to the model are quite literally educated guesses. One survey is about 40 questions long, and respondents were told that they could answer the survey in 25 minutes, including time to research the data.

I would argue that most people have no idea how much bugs cost, other than through the "exponential rise" model, which largely predates 2002. If you have less than a minute to answer a question about how much bugs cost, you're probably going to reach for the answer you remember from school or read in articles.

So, this "survey" about the cost of bugs would predictably be largely self-fulfilling. You get the numbers you expect to get. The numbers' connection with reality is tenuous at best.

If you are quoting the $60 billion estimate, you are basically endorsing:

  • odd findings such as a cost of $4M per minor error
  • the idea that minor errors may cost more than major ones
  • the statistical validity of unreasonably small sample sizes
  • most problematically, the validity of opinion over actual measurement

Think about this before spreading the NIST numbers any further.
