The citations game: Pressman Ratios
Rubric: Software Engineering : Factual Claims : Defect Cost Increase : Pressman Ratios
See previous note on the IBM Systems Sciences Institute
A book, "Software Engineering: A Practitioner's Approach", contains the citation we are interested in:
[IBM81] "Implementing Software Inspections," course notes, IBM Systems Sciences Institute, IBM Corporation, 1981
The author is Roger S. Pressman, "an internationally recognized consultant and author in software engineering" with long experience in the field - "for over 30 years, he has worked as a software engineer, a manager, a professor, an author, and a consultant, focusing on software engineering issues". (Pressman)
The book is a best-seller in the field, with over 3 million copies sold (QAI) across eight editions so far. The first edition was published in 1982.
One question I have not been able to settle is which edition of the book first introduced this citation. I've seen PDF copies of the 5th and 8th editions, both of which have it. A paper from 1995 contains an indirect citation via the 3rd edition (Megen).
Given the proximity in dates I'm assuming it was there from the first edition, but I could be wrong.
In a 2003 article, "Why the Future Belongs to the Quants" (Quants), we find the following form:
IBM Systems Sciences Inst., Implementing Software Inspections, monograph, IBM, 1981.
A monograph is "a specialist work of writing on a single subject or an aspect of a subject, often by a single author, and usually on a scholarly subject" (Wikipedia). (This can be said of many things - the present document is a monograph.)
As discussed in the previous notes, my interpretation of "course notes" is that Pressman attended internal corporate training at IBM, where an instructor presented material, possibly from some other source, containing the claimed ratios. Neither the instructor nor Pressman was implying in any way that the study was done at IBM.
The above form of the citation, however, does claim this implicitly.
Also, unlike Megen's 1995 article, Geer et al.'s 2003 article does not credit Pressman at all.
I can come up with three main hypotheses:
- one of the authors had read Pressman, and chose both to alter the citation and not to credit Pressman;
- one of the authors had independently attended the IBM course, and chose to report it this way;
- one of the authors had read the original IBM source material, and correctly cited it as a "monograph".
The first two would be cases of (mild) scholarly misconduct in my eyes. The last strikes me as implausible given all other evidence on the topic.
The "monograph" form of the citation turns up in one other place: "Security Metrics: Replacing Fear, Uncertainty, and Doubt", a 2007 book authored by Andrew Jaquith, who is one of the "et al." in Geer et al.
An article by Mukesh Soni published in 2006 on iSixSigma.com, "Defect Prevention: Reducing Costs and Enhancing Quality", refers to the claim in the text rather than by providing a formal citation:
The Systems Sciences Institute at IBM has reported that the cost to fix an error found after product release was four to five times as much as one uncovered during design, and up to 100 times more than one identified in the maintenance phase. (Figure 1)
Dating this article is harder than it should be, as it's an informal Web publication rather than a scholarly article; at its current URL it remains undated (Prevention). For example, the book "Agile Testing" by Crispin and Gregory cites this article but incorrectly dates it to 2008. But the Internet Archive has a copy of the article dating back to 2006 (Archive).
Here again, Soni does not reference Pressman, and what's worse, uses an active verb to imply that the data comes directly from IBM.
The quote above accompanies the following picture:
I want to pause at this point and consider how much sense the above sentence makes.
- discard the bit about IBM
- let X be "the cost to fix an error uncovered during design"
- let Y be "the cost to fix an error found after product release"
- let Z be "the cost to fix an error identified in the maintenance phase"
Then the above is saying:
- X equals 1
- Y equals X times 4 to 5
- Y equals Z times up to 100
So far so good, but where are these numbers on the accompanying chart? X, the cost at design, is easy to find; it's the first bar. Z, the cost of the maintenance phase, is the last bar. The chart says that Z is equal to 100 times X, but that's not what the text says.
There is no bar labelled "after product release". There is no bar that has a value 4 to 5 times another bar. There is no bar which has a value greater than that associated with "maintenance". The main topic of the sentence is Y, and that is nowhere to be found in the chart.
In short, the above quoted sentence makes zero sense when we consider the picture it's supposed to describe.
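The mismatch can be checked with a few lines of arithmetic. The sketch below is only an illustration: it takes Y from the quoted sentence (4 to 5 times X) and Z from the chart as described above (100 times X), then computes how Y compares to Z. The function name and parameters are my own, not anything from Soni or Pressman.

```python
def sentence_vs_chart(x=1.0, y_factors=(4, 5), chart_z_factor=100):
    """Combine the sentence's Y with the chart's Z and return the Y/Z ratios.

    x              -- cost to fix at design, normalised to 1
    y_factors      -- "four to five times as much as one uncovered during design"
    chart_z_factor -- the maintenance bar, 100 times the design bar on the chart
    """
    z = chart_z_factor * x
    return tuple(factor * x / z for factor in y_factors)


ratios = sentence_vs_chart()

# The sentence claims Y is "up to 100 times more" than Z, so Y/Z should exceed 1.
sentence_holds = all(ratio > 1 for ratio in ratios)

print(ratios)          # (0.04, 0.05)
print(sentence_holds)  # False
```

Taken together, the sentence and the chart would make a post-release fix cost a twentieth to a twenty-fifth of a maintenance fix, the opposite of "up to 100 times more".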
One of my favorite techniques consists of taking a substantial portion of a sentence and Googling for exact matches. This helps a lot in turning up instances of plagiarism.
Take the phrase, "and up to 100 times more than one identified in the maintenance". Any occurrence on the Web is likely to be either a quotation of Soni's article, or (if Soni is not credited) a copy-paste job plagiarizing the article.
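For readers who want to repeat the search, here is a minimal sketch of building an exact-match query. It assumes the usual convention that wrapping a phrase in double quotes requests an exact match, and that the search page accepts the query in a "q" parameter; the helper name is hypothetical and the base URLs are assumptions that may change.

```python
from urllib.parse import urlencode


def exact_phrase_url(phrase, base="https://www.google.com/search"):
    """Return a search URL whose query is the phrase wrapped in double quotes."""
    quoted = '"' + phrase + '"'
    # urlencode percent-encodes the quotes and turns spaces into '+'.
    return base + "?" + urlencode({"q": quoted})


phrase = "and up to 100 times more than one identified in the maintenance"
web_url = exact_phrase_url(phrase)
# Assumed base URL for the scholarly search; adjust if it differs.
scholar_url = exact_phrase_url(phrase, base="https://scholar.google.com/scholar")

print(web_url)
print(scholar_url)
```

The match counts still have to be read off the result pages by hand; the sketch only builds the queries.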
As of this writing, Google reports about 130 matches on the Web (Web), 2 in published books (Books), and 9 in scholarly articles (Scholar).
The latter is particularly concerning. Academics are copying and pasting a citation incorrectly attributing data to a study by a part of IBM, and at the same time uncritically copying and pasting a nonsensical sentence lifted out of a blog post. It finds its way for instance into a published thesis (Thesis).
The extent of the plagiarism varies; for instance, entire paragraphs of Soni's article are lifted in one conference paper (Plagiarized).
Critical and uncritical
To me, the uncritical stance implied by copying something that doesn't make sense is strong evidence that the people claiming IBM "has reported" these figures have not read one word from IBM to that effect, and probably haven't read Pressman either. It's not a big stretch to infer that they haven't read any serious content on software engineering at all. (If they had, they would recognize, for instance, Boehm's work on the same topic as more credibly sourced, and cite that instead.)
Others seem to at least take one minute to look at the chart and rephrase the text into something that makes more sense. Here's one recent example (Recent), picked at random from the Web:
Also, the Systems Science Institute at IBM reports that bugs found during the testing phase could be up to 15 times more costly than when found during the design phase.
The attribution remains, but at least the text now matches what the chart shows. That's perhaps one glimmer of hope in the whole mess.
Still, if you read any mention of the Pressman ratios, or of the IBM Systems Science Institute, you would be doing the author a favor by notifying them that they're only discrediting themselves by using them.