@Morendil
Last active October 28, 2023 10:19

The citations game: Wolverton Ratios

Rubric: Software Engineering : Factual Claims : Defect Cost Increase : Wolverton Ratios

Context

See previous note on the IBM Systems Sciences Institute

In absolute numbers, the Wolverton ratios are as follows: 139:455:977:7136:14102, the claimed dollar costs of fixing an "average" defect. (Itself an absurd notion, see Leprechauns; I should perhaps write more on that.)

Normalizing to "if it costs one unit to fix at the requirements stage", these work out to 1:3:7:50:100 (requirements, design, coding, testing, maintenance).
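As a quick sanity check of that arithmetic, here is a minimal Python sketch (the dollar figures are the ones quoted above; the rounding is mine):

```python
# Wolverton's claimed dollar costs per "average" defect, by phase:
# requirements, design, coding, testing, maintenance.
costs = [139, 455, 977, 7136, 14102]

# Normalize so the requirements-stage cost is 1 unit.
ratios = [c / costs[0] for c in costs]
print([round(r) for r in ratios])
# -> [1, 3, 7, 51, 101], i.e. roughly the 1:3:7:50:100 ratios
```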

It pops up in many books and articles, in various forms, for instance as a very 1990s-looking Excel 3D bar chart.

The Big Puzzle is that a bunch of later articles and books attribute these ratios to a paper by Boehm and Basili, "Software Defect Reduction Top 10 List", which, it is easy to verify, does not contain these numbers. (It's a whole two pages long.)

Ergo, these later authors who cite Boehm and Basili actually HAVE NOT READ that paper and have just copied and pasted a citation that flattered their existing biases.

What is investigated here is "what exactly happened", a forensic investigation. The crime is how little attention we are paying, as a profession, to the question "what process of empirical investigation generated the data we are looking at, and how reliable was that process".

Listing the works chronologically kind of spoils it as storytelling, since the investigation actually happened in reverse: coming across the claim in relatively recent articles, asking "where did this come from" - and asking it again, and again, and again. This document is a reference, not a write-up.

PDFs of "Top 10 List" paper: https://www.cs.umd.edu/projects/SoftEng/ESEG/papers/82.78.pdf http://www.cs.cmu.edu/afs/cs/academic/class/17654-f01/www/refs/BB.pdf

Timeline

1977, Wolverton, RW

"Tutorial, Quantitative Management: Software Cost Estimating"

This was a tutorial given at the inaugural COMPSAC conference. A valued friend with access to a university library holding a copy of the book was kind enough to send me a photo of the relevant page:

COMPSAC '77 proceedings, p236

Apparently the origin of the "data" is some software portion of the Safeguard anti-missile program: https://en.wikipedia.org/wiki/Safeguard_Program

The text credits a "W. E. Stephenson" as having collected data on Safeguard, but the chart itself cites a "R. O. Lewis" as the source of the error cost data specifically. These should therefore more properly be called the "Lewis ratios" - Wolverton's name was the one I found in an initial and slightly less tenacious investigation.

We know this is the source because it next appears cited by:

1981, Radatz, Jane

https://apps.dtic.mil/dtic/tr/fulltext/u2/a104249.pdf

"Analysis of IV&V data"

Emphasizing the importance of early detection, Wolverton (Reference 30) cites figures stating that a design change costs, on the average, $977 to correct during code and checkout and $7136 during test and integration (1975 figures).

p87: $195:$489:$977:$7136 or 1:2.5:5:36.5

p89 extrapolates this to add a phase @ $14655
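The same sanity check applied to the Radatz figures, including the extrapolated phase (again a sketch; the one-decimal rounding is mine):

```python
# Radatz's p87 dollar figures, plus the p89 extrapolated phase.
costs = [195, 489, 977, 7136, 14655]

ratios = [round(c / costs[0], 1) for c in costs]
print(ratios)
# -> [1.0, 2.5, 5.0, 36.6, 75.2]; note the quoted 36.5 looks
# truncated rather than rounded.
```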

The exact same method of calculating an ROI for investment in IV&V that we'll see later on in NASA docs (which I've blasted as being "Flaubert math", cf https://www.lesswrong.com/posts/tggnLEXxrTDWQwDL3/rocket-science-and-big-money-a-cautionary-tale-of-math-gone)

1981, Boehm

The name "Safeguard" appears (without citation or explanation, it seems) in Boehm's "Software Engineering Economics", providing some of the data points on this chart:

Boehm's chart

1992, R.O. Lewis

The Safeguard data and some of the process for collecting it are described (over 15 years later!) in Robert O. Lewis' book "Independent Verification and Validation: A Life Cycle Engineering Process for Quality Software"

Google Books Preview of p275

2003, Cigital

http://web.archive.org/web/20030404212123/http://www.cigital.com/solutions/roi-cs2.php

Case Study: Finding Defects Earlier Yields Enormous Savings

Seems to be "patient 0" for the numbers, the Excel chart, and the attribution of the costs to Boehm & Basili (or possibly Capers Jones, or Humphrey)

The attribution to Basili & Boehm is obviously bogus.

The attribution to Jones is bogus when you know the slightest thing about Jones, in particular that he's virulently opposed to cost per defect metrics.

The attribution to Humphrey is quickly disproven by looking inside the book (which admittedly would have required buying the book back then; today I can use Google Books).

The numbers differ from the Radatz numbers but are clearly no coincidence; I suspect they were fudged a bit to avoid the appearance of "round" ratios

2004, Drabick, Roger

"Best Practices for the Formal Software Testing Process"

https://books.google.fr/books?id=bVcUAAAAQBAJ&printsec=frontcover&dq=Best+Practices+for+the+Formal+Software+Testing+Process&hl=fr&sa=X&ved=0ahUKEwi6-93ChqPkAhUNYxoKHYTJDxIQ6AEIKDAA#v=onepage&q=defect%20cost&f=false

Pages 21-22: "data from a study by J.W. Radatz"; this is how I found Radatz in the first place

The numbers are a bit off (194:489:997:7136), possibly honest transcription errors

2004, Gage et al.

"We did nothing wrong"

https://simson.net/ref/2006/csci_e-180/ref/Baseline0304-DissectionNEW.pdf

Influential article? Redrawn version of the Cigital chart

2006, Wagner, Stefan

https://elib.uni-stuttgart.de/bitstream/11682/8977/1/main.pdf

A negative result: comprehensive survey of cost factors in the literature, no mention of the Wolverton ratios

2006-12-07, Everett, Gerald D.

Software Testing: Testing Across the Entire Software Development Life Cycle

https://books.google.fr/books?id=z8UdPmvkBHEC&pg=PA14&lpg=PA14&dq=Basili+and+Boehms+industry+average+cost+to+correct+defects&source=bl&ots=2W4uO1zVKZ&sig=ACfU3U3Z3JUm5TBkrtgNvo-JvZSN2XKRww&hl=fr&sa=X&ved=2ahUKEwiV---k-J3kAhXRz4UKHTJdAd0Q6AEwCnoECAgQAQ#v=onepage&q=455%20977&f=false

p14: ugly curve; "The numbers first published in 1996 were revalidated in 2001"

2008-06, Everett, Gerald D.

https://www.istqb.org/images/Articles/everett_the%20Value%20of%20Software%20Testing%20to%20Business.pdf

No chart but a table of the "cost factors"; same Everett as the 2006 book

2008-08-01, Dallas, Andrew

https://www.mddionline.com/adopting-static-analysis-tools

References Capers Jones' "Software Assessments, Benchmarks, and Best Practices" and Humphrey's "Introduction to the Personal Software Process" in addition to the usual Top 10 list; a semi-honest, "covering all bases" way of citing Cigital indirectly

2008-04, Strickler, John

https://agileelements.wordpress.com/2008/04/22/cost-of-software-defects/

Blog, crediting Capers Jones "Software Assessments, Benchmarks, and Best Practices"

2008-10, Golze, Andreas

http://2008.secrus.org/en/-pageid=4548&submissionid=5480.htm http://2008.secrus.org/en/etc/secr2008_andreas_golze_professional_testing.ppt

"Reduce Project risk through early defect detection", conference presentation

Excel-style chart

2009, KPMG

https://info.kpmg.us/content/dam/institutes/en/government/pdfs/2009/gov-it-projects-need-qa-iv-v.pdf

More modern-looking chart

2010, Pressman, R.

https://books.google.fr/books?hl=fr&id=y4k_AQAAIAAJ&dq=software+engineering+a+practitioner%27s+approach&focus=searchwithinvolume&q=977

Figure 14.2, based on data collected by Boehm and Basili [Boe01b] and illustrated by Cigital Inc. [Cig07], illustrates this phenomenon. The industry average cost to correct a defect during code generation is approximately $977 per error.

This is the Pressman of the "Pressman ratios"; his "Software Engineering: A Practitioner's Approach" is here in its seventh edition. Boe01b is the "Top 10" article.

It is baffling that the editorial process for possibly the foremost book in the field let this through for the 7th edition. It is apparently gone from the 8th edition, without a retraction as far as I can tell.

2011, Shamieh, Cathleen

ftp://ftp.software.ibm.com/software/sk/pdf/SystemsEngineeringforDummies.pdf

p49. "for dummies" means we round them out…

2011-06, Akella et al.

https://www.researchgate.net/publication/228983772_Effective_Independent_Quality_Assessment_using_IVV#pf8

2012-07, Typemock

https://www.typemock.com/wp-content/uploads/2012/07/Infographic-The-Severity-of-Bugs-Are-We-Doomed.pdf

This is notable for corrupting the $977 into $937. So if someone is quoting "$937 during coding" at you, they're most likely referencing this Typemock infographic.

2012-05-12, Oehl, Catherine

https://slideplayer.com/slide/1526223/

slide 11, the Excel-style chart

2015-02, Al-lawatiya et al.

https://pdfs.semanticscholar.org/8b1f/4f33d8a6c39489a47a58f305fdbe25e1a14b.pdf

2016, Menzies et al.

https://arxiv.org/pdf/1609.04886.pdf

"Are Delayed Issues Harder to Resolve?"

Solid negative result, still largely ignored (*)

We found no evidence for the delayed issue effect; i.e. the effort to resolve issues in a later phase was not consistently or substantially greater than when issues were resolved soon after their introduction.

(*) detailed citation analysis needed, but early results not hopeful, see below

2017-06-08, Routh, Jim

https://www.darkreading.com/perimeter/the-economics-of-software-security-what-car-makers-can-teach-enterprises-/a/d-id/1329083

"Source: me."

2017-07-14, Ivers, Jim

https://www.securityweek.com/how-reduce-risk-while-saving-cost-resolving-security-defects

Haha. Jim 2 quotes Jim 1 and adds "If anyone is credible on this, he is… we didn't have empirical data, but now we do."

2017-11-3, McMorrow, Dermot

https://slcontrols.com/justify-early-extra-investment-reduce-late-budget-overruns/

Excel-style chart

2017-11-23, Madou, Matias

https://www.owasp.org/images/0/07/OWASP_BeNeLux-Day_2017_how_to_spend_%243.6_mil_on_one_coding_mistake_by_Matias_Madou.pdf

"Actual data from Routh, Aetna"

2018-01, Hovorushchenko, Tetiana

https://www.researchgate.net/publication/326252812_Methodology_of_Evaluating_the_Sufficiency_of_Information_for_Software_Quality_Assessment_According_to_ISO_25010

Cites the "for dummies" book

2018, Potencier, F.

https://blackfire.io/docs/book/01-introduction

PHP Code Performance Explained (book)

Classic example of quoting the Typemock $937 figure but attributing it to Boehm and Basili.

2019, Agrawal A.

https://repository.lib.ncsu.edu/bitstream/handle/1840.20/36633/etd.pdf?sequence=1

On the Nature of Software Engineering Data (Implications of ε-Dominance in Software Engineering)

It is also useful to be able to predict issue lifetime specifically when the issue is created, since it is found earlier that delaying to resolve issues can become harder and costlier [Men17].

So here we have a PhD student working under the direction of the author of the one negative result, claiming it as his source for the positive version! I despair.

2019 Marques Pereira, H

https://pdfs.semanticscholar.org/ac5c/a4a8daebd1b2a66f3b25976e96c969d7c83a.pdf?_ga=2.79443676.937395100.1566906669-2056108499.1565280346

"Automatização de testes para plataformas Oracle - Xstore"

Figure 1.1 shows that the cost of correcting an error grows quite sharply as a project progresses through its various phases. With this in mind, effective and efficient quality control is essential from the earliest stages, and fundamentally in the phase before the start of production. [Translated from Portuguese by Google Translate]

Yet another. Feels awful to work in an industry where someone can disprove a result yet be cited a few years later as having proved it.

(Unreliable source: undated, no name)

https://shodhganga.inflibnet.ac.in/bitstream/10603/53250/10/10_chapter%201.pdf

@digitalmacgyver commented Oct 26, 2023

Thank you for this writeup - it was a fun read. I thought you might be interested to hear about a similar oft-repeated software engineering figure which is nonsense when traced to its roots.

Sometimes people will claim an exponentially growing cost to fix defects, based on the development stage where they are detected, of:

  • Design: 1x
  • Coding: 5x
  • Integration Testing: 10x
  • Acceptance Testing: 15x
  • Post-release: 30x

The origin of these figures is table 5-1 on page 5-4 of the 2002 NIST Planning Report 02-3, "The Economic Impacts of Inadequate Infrastructure for Software Testing". It is an example table with completely made-up data, there only to illustrate how costs will be attributed later in the report.

This made-up data was cited in a number of places, including in this paper from IBM / Rational Software, which uses it in its introduction as a source, apparently without understanding that the numbers were never supposed to be indicative of reality.

It shows up frequently in various blog posts etc., often without citation: 1, 2, 3

@Morendil (Author) commented

@digitalmacgyver Yes, the NIST "study" is a train wreck. The above are not far from the Pressman ratios covered here and the comments had some discussion of the NIST document as well. I've also exhumed my own critique of the NIST report.
