@Morendil
Last active October 28, 2023 10:19

The citations game: Wolverton Ratios

Rubric: Software Engineering : Factual Claims : Defect Cost Increase : Wolverton Ratios

Context

See previous note on the IBM Systems Sciences Institute

In absolute numbers, the Wolverton ratios are as follows: 139:455:977:7136:14102, the claimed dollar costs of fixing an "average" defect. (Itself an absurd notion, see Leprechauns; I should perhaps write more on that.)

Normalizing to "if it costs one unit to fix at the requirements stage", these work out to 1:3:7:50:100 (requirements, design, coding, testing, maintenance).
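As a quick sanity check of that arithmetic, here is a minimal Python sketch (the dollar figures are the ones quoted above; the rounding is mine):

```python
# Wolverton's claimed dollar costs per "average" defect, by phase:
# requirements, design, coding, testing, maintenance.
costs = [139, 455, 977, 7136, 14102]

# Normalize so the requirements-stage cost is 1 unit.
ratios = [c / costs[0] for c in costs]
print([round(r) for r in ratios])
# -> [1, 3, 7, 51, 101], i.e. roughly the 1:3:7:50:100 ratios
```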

It pops up in many books and articles, in various forms, for instance as a very 1990s-looking Excel 3D bar chart.

The Big Puzzle is that a bunch of later articles and books attribute these ratios to a paper by Boehm and Basili, "Software Defect Reduction Top 10 List", which, it is easy to verify, does not contain these numbers. (It's a whole two pages long.)

Ergo, these later authors who cite Boehm and Basili actually HAVE NOT READ that paper and have just copied and pasted a citation that flattered their existing biases.

What is investigated here is "what exactly happened", a forensic investigation. The crime is how little attention we are paying, as a profession, to the question "what process of empirical investigation generated the data we are looking at, and how reliable was that process".

Listing the works chronologically kind of spoils it as storytelling, since the investigation actually happened in reverse: coming across the claim in relatively recent articles, asking "where did this come from" - and asking it again, and again, and again. This document is a reference, not a write-up.

PDFs of "Top 10 List" paper: https://www.cs.umd.edu/projects/SoftEng/ESEG/papers/82.78.pdf http://www.cs.cmu.edu/afs/cs/academic/class/17654-f01/www/refs/BB.pdf

Timeline

1977, Wolverton, RW

"Tutorial, Quantitative Management: Software Cost Estimating"

This was a tutorial given at the inaugural COMPSAC conference. A valued friend with access to a university library holding a copy of the book was kind enough to send me a photo of the relevant page:

COMPSAC '77 proceedings, p236

Apparently the origin of the "data" is some software portion of the Safeguard anti-missile program: https://en.wikipedia.org/wiki/Safeguard_Program

The text credits a "W. E. Stephenson" as having collected data on Safeguard, but the chart itself cites a "R. O. Lewis" as the source of the error cost data specifically. These should therefore more properly be called the "Lewis ratios" - Wolverton's name was the one I found in an initial and slightly less tenacious investigation.

We know this is the source because it next appears cited by:

1981, Radatz, Jane

https://apps.dtic.mil/dtic/tr/fulltext/u2/a104249.pdf

"Analysis of IV&V data"

Emphasizing the importance of early detection, Wolverton (Reference 30) cites figures stating that a design change costs, on the average, $977 to correct during code and checkout and $7136 during test and integration (1975 figures).

p87: $195:$489:$977:$7136 or 1:2.5:5:36.5

p89 extrapolates this to add a phase @ $14655
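The same sanity check applied to the Radatz figures, including the extrapolated phase (again a sketch; the one-decimal rounding is mine):

```python
# Radatz's p87 dollar figures, plus the p89 extrapolated phase.
costs = [195, 489, 977, 7136, 14655]

ratios = [round(c / costs[0], 1) for c in costs]
print(ratios)
# -> [1.0, 2.5, 5.0, 36.6, 75.2]; note the quoted 36.5 looks
# truncated rather than rounded.
```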

The exact same method of calculating an ROI for investment in IV&V that we'll see later on in NASA docs (which I've blasted as being "Flaubert math", cf https://www.lesswrong.com/posts/tggnLEXxrTDWQwDL3/rocket-science-and-big-money-a-cautionary-tale-of-math-gone)

1981, Boehm

The name "Safeguard" appears (without citation or explanation, it seems) in Boehm's "Software Engineering Economics", providing some of the data points on this chart:

Boehm's chart

1992, R.O. Lewis

The Safeguard data and some of the process for collecting it are described (over 15 years later!) in Robert O. Lewis' book "Independent Verification and Validation: A Life Cycle Engineering Process for Quality Software"

Google Books Preview of p275

2003, Cigital

http://web.archive.org/web/20030404212123/http://www.cigital.com/solutions/roi-cs2.php

Case Study: Finding Defects Earlier Yields Enormous Savings

Seems to be "patient 0" for the numbers, the Excel chart, and the attribution of the costs to Boehm & Basili (or possibly Capers Jones, or Humphrey)

The attribution to Basili & Boehm is obviously bogus.

The attribution to Jones is bogus when you know the slightest thing about Jones, in particular that he's virulently opposed to cost per defect metrics.

The attribution to Humphrey is quickly disproven by looking inside the book (which admittedly would have required buying the book back then; today I can use Google Books).

The numbers differ from the Radatz numbers but are clearly no coincidence; I suspect they were fudged a bit to avoid the appearance of "round" ratios

2004, Drabick, Roger

"Best Practices for the Formal Software Testing Process"

https://books.google.fr/books?id=bVcUAAAAQBAJ&printsec=frontcover&dq=Best+Practices+for+the+Formal+Software+Testing+Process&hl=fr&sa=X&ved=0ahUKEwi6-93ChqPkAhUNYxoKHYTJDxIQ6AEIKDAA#v=onepage&q=defect%20cost&f=false

Pages 21-22: "data from a study by J.W. Radatz"; this is how I found Radatz in the first place

The numbers are a bit off (194:489:997:7136), possibly honest transcription errors

2004, Gage et al.

"We did nothing wrong"

https://simson.net/ref/2006/csci_e-180/ref/Baseline0304-DissectionNEW.pdf

Influential article? Redrawn version of the Cigital chart

2006, Wagner, Stefan

https://elib.uni-stuttgart.de/bitstream/11682/8977/1/main.pdf

A negative result: comprehensive survey of cost factors in the literature, no mention of the Wolverton ratios

2006-12-07, Everett, Gerald D.

Software Testing: Testing Across the Entire Software Development Life Cycle

https://books.google.fr/books?id=z8UdPmvkBHEC&pg=PA14&lpg=PA14&dq=Basili+and+Boehms+industry+average+cost+to+correct+defects&source=bl&ots=2W4uO1zVKZ&sig=ACfU3U3Z3JUm5TBkrtgNvo-JvZSN2XKRww&hl=fr&sa=X&ved=2ahUKEwiV---k-J3kAhXRz4UKHTJdAd0Q6AEwCnoECAgQAQ#v=onepage&q=455%20977&f=false

p14: ugly curve; "The numbers first published in 1996 were revalidated in 2001"

2008-06, Everett, Gerald D.

https://www.istqb.org/images/Articles/everett_the%20Value%20of%20Software%20Testing%20to%20Business.pdf

No chart but a table of the "cost factors"; same Everett as the 2006 book

2008-08-01, Dallas, Andrew

https://www.mddionline.com/adopting-static-analysis-tools

References Capers Jones' "Software Assessments, Benchmarks, and Best Practices" and Humphrey's "Introduction to the Personal Software Process" in addition to the usual Top 10 list; a semi-honest, "covering all bases" way of citing Cigital indirectly

2008-04, Strickler, John

https://agileelements.wordpress.com/2008/04/22/cost-of-software-defects/

Blog, crediting Capers Jones "Software Assessments, Benchmarks, and Best Practices"

2008-10, Golze, Andreas

http://2008.secrus.org/en/-pageid=4548&submissionid=5480.htm http://2008.secrus.org/en/etc/secr2008_andreas_golze_professional_testing.ppt

"Reduce Project risk through early defect detection", conference presentation

Excel-style chart

2009, KPMG

https://info.kpmg.us/content/dam/institutes/en/government/pdfs/2009/gov-it-projects-need-qa-iv-v.pdf

More modern-looking chart

2010, Pressman, R.

https://books.google.fr/books?hl=fr&id=y4k_AQAAIAAJ&dq=software+engineering+a+practitioner%27s+approach&focus=searchwithinvolume&q=977

Figure 14.2, based on data collected by Boehm and Basili [Boe01b] and illustrated by Cigital Inc. [Cig07], illustrates this phenomenon. The industry average cost to correct a defect during code generation is approximately $977 per error.

This is the Pressman of the "Pressman ratios"; his "Software Engineering: A Practitioner's Approach" is here in its seventh edition. Boe01b is the "Top 10" article.

It is baffling that the editorial process for possibly the foremost book in the field let this through for the 7th edition. It is apparently gone from the 8th edition, without a retraction as far as I can tell.

2011, Shamieh, Cathleen

ftp://ftp.software.ibm.com/software/sk/pdf/SystemsEngineeringforDummies.pdf

p49. "for dummies" means we round them out…

2011-06, Akella et al.

https://www.researchgate.net/publication/228983772_Effective_Independent_Quality_Assessment_using_IVV#pf8

2012-07, Typemock

https://www.typemock.com/wp-content/uploads/2012/07/Infographic-The-Severity-of-Bugs-Are-We-Doomed.pdf

This is notable for corrupting the $977 into $937. So if someone is quoting "$937 during coding" at you, they're most likely referencing this Typemock infographic.

2012-05-12, Oehl, Catherine

https://slideplayer.com/slide/1526223/

slide 11, the Excel-style chart

2015-02, Al-lawatiya et al.

https://pdfs.semanticscholar.org/8b1f/4f33d8a6c39489a47a58f305fdbe25e1a14b.pdf

2016, Menzies et al.

https://arxiv.org/pdf/1609.04886.pdf

"Are Delayed Issues Harder to Resolve?"

Solid negative result, still largely ignored (*)

We found no evidence for the delayed issue effect; i.e. the effort to resolve issues in a later phase was not consistently or substantially greater than when issues were resolved soon after their introduction.

(*) detailed citation analysis needed, but early results not hopeful, see below

2017-06-08, Routh, Jim

https://www.darkreading.com/perimeter/the-economics-of-software-security-what-car-makers-can-teach-enterprises-/a/d-id/1329083

"Source: me."

2017-07-14, Ivers, Jim

https://www.securityweek.com/how-reduce-risk-while-saving-cost-resolving-security-defects

Haha. Jim 2 quotes Jim 1 and adds "If anyone is credible on this, he is… we didn't have empirical data, but now we do."

2017-11-3, McMorrow, Dermot

https://slcontrols.com/justify-early-extra-investment-reduce-late-budget-overruns/

Excel-style chart

2017-11-23, Madou, Matias

https://www.owasp.org/images/0/07/OWASP_BeNeLux-Day_2017_how_to_spend_%243.6_mil_on_one_coding_mistake_by_Matias_Madou.pdf

"Actual data from Routh, Aetna"

2018-01, Hovorushchenko, Tetiana

https://www.researchgate.net/publication/326252812_Methodology_of_Evaluating_the_Sufficiency_of_Information_for_Software_Quality_Assessment_According_to_ISO_25010

Cites the "for dummies" book

2018, Potencier, F.

https://blackfire.io/docs/book/01-introduction

PHP Code Performance Explained (book)

Classic example of quoting the Typemock $937 figure but attributing it to Boehm and Basili.

2019, Agrawal A.

https://repository.lib.ncsu.edu/bitstream/handle/1840.20/36633/etd.pdf?sequence=1

On the Nature of Software Engineering Data (Implications of ε-Dominance in Software Engineering)

It is also useful to be able to predict issue lifetime specifically when the issue is created, since it is found earlier that delaying to resolve issues can become harder and costlier [Men17].

So here we have a PhD student working under the direction of the author of the one negative result, claiming it as his source for the positive version! I despair.

2019 Marques Pereira, H

https://pdfs.semanticscholar.org/ac5c/a4a8daebd1b2a66f3b25976e96c969d7c83a.pdf?_ga=2.79443676.937395100.1566906669-2056108499.1565280346

"Automatização de testes para plataformas Oracle - Xstore"

Figure 1.1 shows that the cost of correcting an error grows quite sharply as a project progresses through its various phases. With this in mind, effective and efficient quality control is essential from the earliest stages, and fundamentally in the phase before the start of production. [Translated from Portuguese by Google Translate]

Yet another. Feels awful to work in an industry where someone can disprove a result yet be cited a few years later as having proved it.

(Unreliable source: undated, no name)

https://shodhganga.inflibnet.ac.in/bitstream/10603/53250/10/10_chapter%201.pdf

@digitalmacgyver commented Oct 26, 2023

Thank you for this writeup - it was a fun read. I thought you might be interested to hear about a similar oft-repeated software engineering figure which is nonsense when traced to its roots.

Sometimes people will claim an exponentially growing cost to fix defects, based on the development stage where they are detected, of:

  • Design: 1x
  • Coding: 5x
  • Integration Testing: 10x
  • Acceptance Testing: 15x
  • Post-release: 30x

The origin of these figures is table 5-1 on page 5-4 of the 2002 NIST Planning Report 02-3, "The Economic Impacts of Inadequate Infrastructure for Software Testing". It is an example table with completely made-up data, there only to illustrate how costs will be attributed later in the report.

This made-up data was cited in a number of places, including in this paper from IBM / Rational Software, which uses it in its introduction as a source, apparently without understanding that the numbers were never supposed to be indicative of reality.

It shows up frequently in various blog posts etc., often without citation: 1, 2, 3

@Morendil (Author) commented

@digitalmacgyver Yes, the NIST "study" is a train wreck. The above are not far from the Pressman ratios covered here and the comments had some discussion of the NIST document as well. I've also exhumed my own critique of the NIST report.
