
Why estimating software using LOC is bad and what to do about it

Recently, software engineers at Integrity Applications Incorporated were asked about software estimation efforts using source lines of code (SLOC). After much consternation, they've realized what everyone who has ever developed or even used software already knows: this stuff is hard. This article summarizes, with a diverse set of references, why software estimation is difficult and often flawed. It also looks at alternatives to classic SLOC estimation algorithms in the hope that there is a better way to estimate effort on software projects.

Software is not a strict science

Despite degrees in computer science and the "engineer" label attached to those who build software systems, a programmer may be closer to an artist or craftswoman.

Programming is much closer to a craft than a science or engineering discipline. It's a combination of skill and experience expressed through tools. The craftsman chooses specific tools (and sometimes makes their own) and learns to use them to create.

To my mind that's a craft. I think the best programmers are closer to watchmakers than bridge builders or physicists. Sure, it looks like it's science or engineering because of the application of logic and mathematics, but at its core it's taking tools in your hands (almost) and crafting something. - John Graham-Cumming

A design document in software is not quite analogous to a blueprint produced by an architect. A chalk line for a building's foundation rarely moves (California mudslides and earthquakes excepted). Customers and users of a building don't ask for new entrances or exits when the building is half-finished. And if they do, it ends up looking like the Winchester Mystery House.

The Winchester Mystery House near San Jose, California, which saw continual construction from 1884 to 1922. Sounds like Duke Nukem Forever. Credit: Library of Congress, Prints & Photographs Division, 571113

Chaos Reigns

The world is chaotic. So goes software. Software requirements or features desired by users may change, disappear or spring out of nowhere as the project grows.

Even if you managed to find the mythical “optimal” solution within the cross domain constraints of a design problem, there is another reason that optimal design is impossible: things change. On the day your code is finished, your design stops changing, but the world keeps moving. Your competitors may release a new version, or go out of business. Your budget may be cut in half, or the size of your staff may double. New information may be made available to you, or you may find something that invalidates a key assumption you made. Any of the individual variables that you tried to design for can change at any time. And most important of all, your customer’s needs and desires change. - Scott Berkun

Or, as Jeff Gothelf put it succinctly in an opinion piece for Harvard Business Review:

The problem with requirements is that they are often wrong.

If change is one of the few constants you can count on, then perhaps it is impossible to fully design a system before any code is written or any user has given it a try. The Department of Defense (DoD) is starting to realize that designing a finished solution before a project is built may be fraught with error and that a more agile approach is required.

Long development cycles and rapidly changing requirements make it difficult to properly identify the end state of an IT system at the onset of the project.

It seems, then, that it is hard to make rigid, up-front decisions (such as estimates) about software-related tasks.

What's My Line

The classic approach to estimating the effort involved in a software project is lines of code. On the surface, this makes sense - every software application has code, so why not measure and estimate based on how much code we have? Let's look at a list of reasons why this might be problematic in reality.

  1. If the man-month is a myth, then it is likely that code output over time is not equivalent across developers, either. Just ask IBM:

In four months Barnaby wrote 137,000 lines of bullet-proof assembly language code. Rubenstein later checked with some friends from IBM who calculated Barnaby’s output as 42 man-years. - John Dvorak

  2. Lines of code differ across languages, frameworks, and even coding styles (object-oriented versus procedural, functional versus imperative, and so on). This makes apples-to-apples comparisons difficult, or even invalid.

Java has around 1.7 times the LOC of Python from my example. - Stephan Schmidt

  3. You may not know what language or framework works best (or even how to build something correctly) until you start building the application. This is the general idea behind prototypes, or tracer bullets, as described in The Pragmatic Programmer. Not knowing what to build, or how, will obviously affect any estimates, which usually happen before any code is written.

Often, you can't even begin to accurately estimate how long something will take until you start doing it. -Jeff Atwood

  4. As mentioned, SLOC seems like an easy way to measure output. But what happens when we have lots of output, or code?

In general, more code (or lines thereof) equals more bugs. - Chad Perrin, TechRepublic

So we must want less code, then, right? Unfortunately, short, compact code can be tough to understand, or to remember what it does later (take Perl, for example):

sub p{map$s+=$n%$_?0:$_,1..($n=pop)-1;$s==$n}

I think that checks for perfect numbers...
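For contrast, here is the same check written for readability rather than brevity (a minimal sketch in Python, assuming the one-liner above really is testing for perfect numbers):

```python
def is_perfect(n):
    """Return True if n equals the sum of its proper divisors (6, 28, 496, ...)."""
    if n < 2:
        return False
    # Add up every divisor of n that is smaller than n itself.
    divisor_sum = sum(d for d in range(1, n) if n % d == 0)
    return divisor_sum == n

# 6 = 1 + 2 + 3 and 28 = 1 + 2 + 4 + 7 + 14 are perfect; 12 is not.
assert is_perfect(6) and is_perfect(28) and not is_perfect(12)
```

By a raw SLOC count, the Python version is several times more "output" than the one-liner, even though both do exactly the same job. That is precisely the problem with the metric.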

SLOC is unreliable at best and plain deceptive at worst, and it is a similar quagmire when it comes to estimating effort. So if you can't use SLOC from past projects to drive estimates, there must at least be a way to measure the overall productivity of software developers on past projects, right?

Not exactly, according to Martin Fowler:

Productivity, of course, is something you determine by looking at the input of an activity and its output. So to measure software productivity you have to measure the output of software development - the reason we can't measure productivity is because we can't measure output.

Ouch.

Futile. Yeah, Right.

If you've read this far and digested all this bad news, you must be thinking "Software development efforts must be futile."

Not for developers. Stubbornly attempting to solve problems is one of the things we do best. Here are a few approaches, ideas and strategies for estimating and measuring the output and schedules of software development.

  • Measure everything. This is one of the most important suggestions in a series of posts from Jeff Atwood on his Coding Horror blog.

The real art of software estimation, then, is the frantic search for data points to hang your estimates on. Hopefully you're fortunate enough to work for an organization that captures historical data for your projects. - Jeff Atwood

You can get a rough sense of a team's output by looking at how many features they deliver per iteration. It's a crude sense, but you can get a sense of whether a team's speeding up, or a rough sense if one team is more productive than another. - Fowler

  • Joel Spolsky and the gang at Fog Creek Software have formalized this idea into what they call Evidence Based Scheduling (EBS).

You gather evidence, mostly from historical timesheet data, that you feed back into your schedules. What you get is not just one ship date: you get a confidence distribution curve, showing the probability that you will ship on any given date. - Joel Spolsky
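At its core, EBS is a Monte Carlo simulation over each developer's historical estimate-to-actual ratios ("velocities"). The sketch below shows that idea in Python; the velocity and estimate numbers here are made up for illustration, whereas the real method, as Spolsky describes it, pulls them from per-developer timesheet history.

```python
import random

# Historical velocities: estimated hours / actual hours for completed tasks.
# (Hypothetical numbers; EBS derives these from real timesheet data.)
past_velocities = [0.9, 0.5, 1.1, 0.7, 0.6, 1.0, 0.8]

# Estimates (in hours) for the tasks remaining on the schedule.
remaining_estimates = [4, 8, 2, 16, 6, 3]

def simulate_totals(estimates, velocities, rounds=10_000):
    """Monte Carlo: divide each estimate by a randomly drawn past velocity."""
    totals = []
    for _ in range(rounds):
        totals.append(sum(e / random.choice(velocities) for e in estimates))
    return sorted(totals)

totals = simulate_totals(remaining_estimates, past_velocities)

# Instead of a single ship date, report a confidence distribution.
for pct in (50, 75, 95):
    hours = totals[int(len(totals) * pct / 100) - 1]
    print(f"{pct}% chance of finishing within {hours:.0f} hours")
```

The wider the spread in a developer's historical velocities, the wider the resulting curve, which is itself useful information about how much to trust the schedule.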

  • When estimating, use a wide range of opinions. And have some fun with it by trying "planning poker".

The best way I’ve found for agile teams to estimate is by playing planning poker (Grenning 2002). Planning poker combines expert opinion, analogy, and disaggregation into an enjoyable approach to estimating that results in quick but reliable estimates. - Mike Cohn, via Jeff Atwood
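Mechanically, a round of planning poker is simple: everyone reveals a card at once, and if the estimates are far apart the outliers explain their reasoning and the team votes again. A toy sketch of that loop, with an arbitrary 2x spread threshold and hypothetical card values:

```python
from statistics import median

# A Fibonacci-like deck of the kind commonly used for planning poker cards.
DECK = [1, 2, 3, 5, 8, 13, 20, 40, 100]

def poker_round(revealed, max_spread=2.0):
    """One round: accept if the revealed estimates are close, else discuss and re-vote."""
    if max(revealed) <= max_spread * min(revealed):
        # Close enough: snap the median to the nearest card in the deck.
        mid = median(revealed)
        return min(DECK, key=lambda card: abs(card - mid)), True
    # Too far apart: the high and low estimators explain, then everyone re-votes.
    return None, False

print(poker_round([5, 8, 8, 8]))    # -> (8, True): consensus reached
print(poker_round([2, 5, 20, 40]))  # -> (None, False): discuss and deal again
```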

  • Not that we've ever done this, but don't forget the experts in the estimation process. You need them, even if they are busy.

If, as suggested in MacDonell and Shepperd (2003), there is a high degree of independence between estimates based on common effort estimation models and expert judgment, and it is difficult to devise rules for selecting the most accurate estimation method, the solution seems to be to use a combination of models and experts. - Magne Jørgensen
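One simple way to act on that finding is to keep both numbers and blend them rather than letting one silently override the other. A minimal sketch; the equal weighting is an arbitrary choice for illustration, not something Jørgensen prescribes:

```python
def combined_estimate(model_estimate, expert_estimates, model_weight=0.5):
    """Blend a formal model's estimate with the average of independent expert judgments."""
    expert_avg = sum(expert_estimates) / len(expert_estimates)
    return model_weight * model_estimate + (1 - model_weight) * expert_avg

# e.g., a formal model says 120 person-days; three experts say 90, 150 and 100.
print(combined_estimate(120, [90, 150, 100]))  # -> ~116.7 person-days
```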

  • Embrace uncertainty. Chaos is guaranteed, so plan ahead for how you'll handle it when requirements change, when the customer wants a blue button instead of a green one, and so on. After all, without customers and their winds of change, we wouldn't have much to do. This is reflected in the principles behind the Agile Manifesto:

Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.

Welcome changing requirements, even late in development. Agile processes harness change for the customer's competitive advantage.

  • Adapting to change can go beyond estimation and planning and be part of the ongoing design of a software system as well. Evolutionary architecture and emergent design is a series of articles from Neal Ford on IBM Developer Works that covers the idea of letting a design "evolve" as a system or application is built. The following is a small excerpt from the final installment of that series:

Emergent design embodies agile philosophies about design. When design decisions arise, ask yourself: Do I need to make this decision now? Can I safely defer this decision? What can I do to make the decision reversible? If you're in an environment in which you can easily refactor your code, making a temporarily suboptimal decision isn't so scary, because you can fix it without too much pain. If you set up your projects to adapt to change, deferring decisions isn't damaging, because you have optimized for course corrections.

Embracing change requires the ability to look at decisions with ruthless objectivity and to change the ones that are making things worse. - Neal Ford

  • Use advanced metrics to help find potential pain points, areas to refactor, or particularly hard problems to solve. This may include metrics like cyclomatic complexity and code analysis tools such as Sonar.

Metrics and visualizations help you identify important parts of your code, allowing you to extract them as first-class design elements...Cyclomatic complexity is a measure of the relative complexity of one method versus another. Afferent coupling represents the count of how many other classes use the current class. - Ford
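Cyclomatic complexity is also cheap to approximate yourself: start at 1 and add one for every branch point in a function. A rough sketch using Python's standard ast module follows; real analyzers such as Sonar (or Python tools like radon) do this more carefully, counting individual boolean operators and so on:

```python
import ast

# Node types that introduce an extra independent path through the code.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp, ast.comprehension)

def cyclomatic_complexity(source, func_name):
    """Approximate McCabe complexity of one function: 1 + number of branch points."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            return 1 + sum(isinstance(child, BRANCH_NODES) for child in ast.walk(node))
    raise ValueError(f"no function named {func_name!r}")

SAMPLE = """
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "probably prime"
"""

print(cyclomatic_complexity(SAMPLE, "classify"))  # -> 4 (two ifs, one for, plus the base 1)
```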

  • If you can simplify what you are trying to estimate, or build, you may be more accurate in your estimates and more efficient in your coding and design.

Tony Hoare said: "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult." - Graham-Cumming

  • Don't work in isolation. Look at what other developers are doing. How are open source projects run, in terms of estimation or frequency of releases? How do they decide what to work on over the next few months (and know if it's too much to handle)? How are projects from other industries built, run, or estimated? One way to get involved with others is to connect with the local community through user groups or local meetings and events. Social media can also present an avenue to learn from and interact with software advocacy groups or projects.

Software Estimation: Are we there yet?

As one IAI software developer put it, "software estimation is sorcery". And much like sorcery, software estimation has been around a while. It's something we're not likely to permanently solve anytime soon, but we can improve our approaches and try new techniques in order to get more accurate.

Remember that there are no silver bullets in software engineering. We'll never find a single, perfect process to improve software estimation, no matter what the creator of a fancy new process or consultant may claim.

In addition to some of the aforementioned techniques, there are alternatives to classic SLOC estimates and metrics. However, if you're going to use SLOC, at least use COCOMO II or something similar.
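For reference, COCOMO II estimates effort with a formula of the form PM = A × Size^E × ∏EM, where Size is thousands of SLOC, the exponent E comes from scale factors, and the EM terms are effort multipliers. The sketch below uses the commonly published nominal calibration (A ≈ 2.94, B ≈ 0.91) with made-up inputs; a real application would use locally calibrated constants and the full set of cost drivers:

```python
def cocomo2_effort(ksloc, scale_factors, effort_multipliers, A=2.94, B=0.91):
    """Person-months = A * Size**E * product(EM), with E = B + 0.01 * sum(scale factors)."""
    E = B + 0.01 * sum(scale_factors)
    product = 1.0
    for em in effort_multipliers:
        product *= em
    return A * (ksloc ** E) * product

# 40 KSLOC, mid-range scale factors, slightly unfavorable cost drivers (illustrative values).
print(round(cocomo2_effort(40, [3.0, 4.0, 3.0, 4.0, 3.0], [1.1, 0.9, 1.0, 1.2])))  # -> ~188 person-months
```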

Most importantly, talk to the people around you, whether customers or fellow developers, in order to be more efficient in the estimation and delivery of software. Better algorithms and spreadsheet macros pale in comparison to communication across a software team. For all the "science" that goes into computers, the personalities of the people developing both hardware and software matter just as much.

Business people and developers must work together daily throughout the project... The most efficient and effective method of conveying information to and within a development team is face-to-face conversation. -Agile Manifesto

References

Some Things I've Learned About Programming, John Graham-Cumming

The myth of perfect design, Scott Berkun

A Better Project Model than the 'Waterfall', Jeff Gothelf, Harvard Business Review

DoD IT Modernization, DoD CIO (pdf slideshow)

Whatever Happened to Wordstar?, John Dvorak

Comparing Java and Python – is Java 10x more verbose than Python (LOC)? A modest empiric approach, Stephan Schmidt

Steve McConnell in the doghouse, Jeff Atwood

The danger of complexity: More code, more bugs, Chad Perrin, TechRepublic

CannotMeasureProductivity, Martin Fowler

Let's Play Planning Poker!, Jeff Atwood

How Good an Estimator are You? Part III, Jeff Atwood

Evidence Based Scheduling, Joel Spolsky

Forecasting of Software Development Work Effort: Evidence on Expert Judgment and Formal Models, Magne Jørgensen, Simula Research Laboratory

Evolutionary architecture and emergent design (series), Neal Ford, IBM DeveloperWorks

Evolutionary architecture and emergent design: Emergent design in the wild, Neal Ford, IBM DeveloperWorks

Evolutionary architecture and emergent design: Emergent design through metrics, Neal Ford, IBM DeveloperWorks

Agile Manifesto (principles)

Additional References and Suggested Reading or Watching

Software Estimation: Demystifying the Black Art, Steve McConnell (Microsoft Press, 2006)

Neal Ford on Agile Engineering Practices, Neal Ford (O'Reilly Media, 2011)

Bliki, a combined blog and wiki by Martin Fowler, self-described "loud-mouthed pundit on the topic of software development."

System Error: Fixing the flaws in government IT, Justine Stephen, James Page, Jerrett Myers, Adrian Brown, David Watson, Sir Ian Magee (UK Institute for Government, 2011)
