author: @sleepyfox
title: Demon ex machina
date: 03-Jul-2008

Demon ex Machina

I'm gonna get ya!

Why are speed cameras like software metrics? Speed cameras are (or so we are told in the UK) a 'Road Safety initiative' designed to reduce the number of fatalities and serious injuries in road traffic accidents. A software metric is a measurement of a software system's properties that is used (or so we are told) to increase quality and decrease cost and risk in software development projects.

So what do safety cameras, or 'scameras', and software metrics have in common? Both use measures that are easy to determine, but that extensive research tells us are not strongly correlated with the programmes' stated objectives. If you're interested in the research and background on why so-called 'safety cameras' don't actually improve road safety, see SafeSpeed.

Context (tangentially): at the Google Open Source Code Jam last month I had the pleasure of presenting; the theme of the event was 'Productivity'.

There were many interesting and thoughtful presentations on technical matters; Continuous Integration and Domain Specific Languages, for example, cropped up several times. Predictably there was a lot of talk about the 'what' and the 'how', but little about the higher knowledge levels of 'when' and even 'why'. This is not unexpected in a profession that, although it values abstraction and cognition, consistently hits the top five in evaluations of workplace stress.

I thought it would be interesting to look at what exactly we mean when we use the term 'productivity' in the context of knowledge workers. Productivity is well defined for other industries such as manufacturing: 'units of output per unit of input', or 'widgets per labour hour'. The problem comes when we try to apply the same reasoning to knowledge workers: what exactly constitutes 'output'?

Much research has been done around the LOC measure and how it relates to system complexity, cost and reliability. Unsurprisingly, this research tends to be littered with caveats and fudge factors; if you want to see for yourself, take a look at COCOMO or SMERFS. My own conclusion from investigating this research whilst I was at Symbian (working to improve their code reliability metrics system) is that the only thing it convincingly demonstrates is a positive correlation between LOC added/changed and the defect injection rate, which is hardly an earth-shaking revelation: more code = more defects.
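To see what those fudge factors look like in practice, here is a minimal sketch of the Basic COCOMO effort model using the classic coefficients Boehm published; the 10 KLOC project fed into it is purely hypothetical, and the interesting part is how much the answer swings depending on which 'mode' you claim your project belongs to.

```python
# Minimal sketch of the Basic COCOMO effort model (Boehm, 1981):
# effort in person-months = a * KLOC^b. The (a, b) pairs are the
# classic published coefficients; the 10 KLOC figure is hypothetical.
COEFFICIENTS = {
    "organic":       (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded":      (3.6, 1.20),
}

def basic_cocomo_effort(kloc: float, mode: str) -> float:
    a, b = COEFFICIENTS[mode]
    return a * kloc ** b

for mode in COEFFICIENTS:
    print(f"{mode:>13}: {basic_cocomo_effort(10, mode):5.1f} person-months")
```

The same 10 KLOC comes out at anywhere from roughly 27 to 57 person-months depending solely on that one classification, which is precisely the kind of judgement call the caveats are there to paper over.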

The chief complaint when using LOC as a measure of output is that it measures volume of output, and is representative neither of a) effort nor b) value. Let me explain: let's say Gary Guru has a complex problem to solve, and spends hours thinking through the solution before writing a small, elegant module of 1,000 LOC. Now compare this with Archie Average, who analyses the problem only superficially and spends all his time coding a suite of inter-linked modules that solve the problem in 10,000 LOC.

Both have accumulated the same T&M cost (assuming the same labour rates). If we were to use LOC as the measure, Archie would appear to be an order of magnitude more productive than Gary. However, since LOC is positively correlated with the defect injection rate, we know that we will spend at least ten times as much fixing Archie's code, and the TCO will be much higher after release, as it will take considerably longer to diagnose and fix problems in Archie's rambling code-base than in Gary's.
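As a back-of-the-envelope illustration (the hours and the defect injection rate below are invented, illustrative figures), here is how the two developers look under a naive LOC-per-hour score versus an expected-defect view:

```python
# Hypothetical comparison of Gary (1,000 LOC) and Archie (10,000 LOC).
# Both the 100 hours and the 15 defects/KLOC injection rate are
# illustrative assumptions; the point is which behaviour LOC rewards.
HOURS_WORKED = 100          # same T&M cost for both developers
DEFECTS_PER_KLOC = 15       # assumed constant injection rate

def loc_per_hour(loc: int) -> float:
    return loc / HOURS_WORKED               # the naive 'productivity' score

def expected_defects(loc: int) -> float:
    return loc / 1000 * DEFECTS_PER_KLOC    # defects scale with code volume

for name, loc in [("Gary", 1_000), ("Archie", 10_000)]:
    print(f"{name:>6}: {loc_per_hour(loc):5.1f} LOC/hour, "
          f"~{expected_defects(loc):.0f} expected defects")
```

By the LOC score Archie looks ten times better; by the defect view he has handed the organisation ten times the clean-up bill.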

It gets worse: as soon as developers learn that they are being evaluated on the measure, they will attempt to game it. If a goal of the organisation is to encourage reuse (and what organisation wouldn't want to encourage what has become the 'holy grail' of software development?) then adopting LOC as a productivity measure positively discourages it, because every time I reuse someone else's code I appear less productive!

My own experience over two decades of software development mirrors that of Douglas Hoffman, who wrote the excellent paper 'The Darker Side of Metrics' (Hoffman, 2000)1.

He eloquently states that whenever you introduce software metrics to an organisation, there is a hidden 'dark side' effect, because human beings will game the metric to their own perceived advantage. Hoffman notes that if we predict that testing will find 100 bugs in the next testing period, the prediction becomes self-fulfilling. If fewer than 100 bugs are found, it will appear that the testing team are not working hard enough; if many more than 100 are found, it will appear that the development team are producing shoddy work, and since the testing and development teams work closely together in most organisations, either outcome would be bad for the interpersonal relationships between the teams. Consciously or sub-consciously, the 'bugs found' metric will converge on the prediction: bugs will be split or rolled up arbitrarily, reclassified as 'working as designed' or 'not to be fixed in this release', or subjected to any one of a number of other strategies that allow the prediction to be correct and achieve the 'win-win' situation.

Hoffman notes that the research seems to show that the only thing the 'defects found per unit time' metric actually correlates positively with is 'testing effort'... (surprise, surprise).

If you want to see a wonderful discourse on LOC as a measure and basis for productivity metrics in software development then you have only to look at Steve McConnell's essay "Measuring Productivity of Individual Programmers"2.

This was a piece that I found during my research into improving the 'CodeChurn' system at Symbian, and it was highly topical, as I had just come from GE, an early adopter and keen proponent of Six Sigma. The more experienced and erudite of the contributors voiced an opinion that mirrored my experience at GE: whilst Six Sigma can and does do wonders for manufacturing companies, its employment in software development organisations is problematic and anything but straightforward.

With this in mind, let's return to the theme: measuring the output of knowledge workers. We could posit that some sort of 'function point' or 'use case' based scheme of measurement would surely be better. The difficulty with this approach is that we need to normalise for complexity and granularity, as one 'function' is not necessarily as complex (and thus as easy to implement) as another.
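For what it's worth, function point analysis already attempts a weighted scheme along these lines. The sketch below uses the commonly quoted IFPUG-style complexity weights with an entirely invented example system; note that the normalisation problem simply moves, because somebody still has to judge whether each item is 'simple', 'average' or 'complex'.

```python
# Sketch of an unadjusted function point count using commonly quoted
# IFPUG-style weights. The example counts are invented; the subjective
# step is deciding which complexity band each item falls into.
WEIGHTS = {
    # component:               (simple, average, complex)
    "external_inputs":          (3, 4, 6),
    "external_outputs":         (4, 5, 7),
    "external_inquiries":       (3, 4, 6),
    "internal_logical_files":   (7, 10, 15),
    "external_interface_files": (5, 7, 10),
}
BAND = {"simple": 0, "average": 1, "complex": 2}

def unadjusted_function_points(counts: dict) -> int:
    """counts maps component name -> {complexity band: number of items}."""
    return sum(n * WEIGHTS[component][BAND[band]]
               for component, per_band in counts.items()
               for band, n in per_band.items())

example_system = {
    "external_inputs":        {"simple": 5, "complex": 2},
    "external_outputs":       {"average": 4},
    "internal_logical_files": {"average": 3},
}
print(unadjusted_function_points(example_system))  # 15 + 12 + 20 + 30 = 77
```

Two assessors who band the same functions differently will produce different totals for identical systems, which is exactly the granularity problem described above.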

I remember asking my tutor, during a tutorial in my post-graduate Comp-Sci year, 'How do you measure system (rather than algorithmic) complexity?' His answer: 'in inches'. Seeing my blank expression, he explained that you took a ruler and measured the length of shelf space that the printed documentation took up.

It turns out that we cannot even measure (the much simpler) Algorithmic Complexity accurately for anything above a trivial system size, as the paper 'Large Limits to Software Estimation' (J.P. Lewis, 2001)3 demonstrates.

Lewis uses an analogue of Gödel's incompleteness theorem (Chaitin's incompleteness theorem) for Algorithmic Complexity to prove that it is impossible to objectively measure system complexity. This, at one stroke, makes a mockery of all those management methodology and software methodology zealots whose rationale for why their dogma does not produce verifiable, repeatable results is that 'the process was not followed diligently enough' or 'the measures were not made accurately enough': the snake oil that they are pushing is simply incapable of delivering what it promises. The old adage of 'plan the work and work the plan and the plan will work' is simply not applicable to software development.
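For the mathematically inclined, the result Lewis builds on can be paraphrased as follows (this is my informal restatement, not Lewis's own notation): for any consistent, effectively axiomatised formal system T there is a constant L_T, depending only on T, beyond which T cannot certify the Kolmogorov (algorithmic) complexity K(s) of any particular string s, even though almost all strings exceed that bound.

```latex
% Informal statement of Chaitin's incompleteness theorem:
% there is a constant L_T such that the system T proves no statement
% of the form K(s) > L_T, although all but finitely many s satisfy it.
\exists\, L_T \;\; \forall s : \quad T \nvdash \bigl( K(s) > L_T \bigr)
```

Since objectively measuring a system's complexity would mean certifying such bounds for arbitrarily large systems, no formal process can deliver it, which is the heart of Lewis's argument.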

So the next time someone tries to foist their plan-driven dogma on you, you have some ammunition to counter them with! Hopefully we can move the industry away from the 'target-driven' mentality one project at a time.

References

  1. "The Darker Side of Metrics", Douglas Hoffman, Pacific Northwest Software Quality Conference, 2000

  2. "Measuring Productivity of Individual Programmers", Steve McConnell, Construx, Inc., 2008

  3. "Large Limits to Software Estimation", J.P. Lewis, Stanford University Press, 2001
