<p>One of the easiest ways for an epithet to lose its value is for it to become over-broad, which causes it to mean
little more than “I don’t like this”. A case in point is the term <strong>“spaghetti code”</strong>, which people
often use interchangeably with “bad code”. The problem is that not all bad code is spaghetti code. Spaghetti
code is an especially virulent but specific <em>kind</em> of bad code, and its particular badness is instructive
in how we develop software. Why? Because individual people rarely write spaghetti code on their own. Rather,
certain styles of development process make it increasingly common as time passes. In order to assess this, it’s
important first to address the original context in which “spaghetti code” was defined: the dreaded (and mostly
archaic)&nbsp;<strong>goto</strong> statement.</p>
<p>The goto statement is a simple and powerful control flow mechanism: jump to another point in the code. It’s what
a compiled program actually does, at the assembly level, in order to transfer control, even if the source code is written using
more modern structures like loops and functions. Using goto, one can implement whatever control flows one needs.
We also generally agree, in 2012, that goto is flat-out inappropriate for source code in most modern programs.
Exceptions to this policy exist, but they’re extremely rare. Most modern languages don’t even have it.</p>
<p>Goto statements can make it difficult to reason about code, because if control can bounce about a program, one
cannot make guarantees about what state a program is in when it executes a specific piece of code. Goto-based
programs can’t easily be broken down into component pieces, because any point in the code can be wormholed to
any other. Instead, they devolve into an “everything is everywhere” mess where to understand a piece of the
program requires understanding all of it, and the latter becomes flat-out impossible for large programs. Hence
the comparison to spaghetti, where following one thread (or noodle) often involves navigating through a large
tangle of pasta. You can’t look at a bowl of noodles and see which end connects to which. You’d have to
laboriously untangle it.</p>
<p>Spaghetti code is code where “everything is everywhere”, and in which answering simple questions, such as (a) where a certain piece of functionality is implemented, (b) where an object is instantiated and how to create it, or (c) whether a critical section is correct, to name a few examples of questions one might want to ask about code, requires understanding the whole program, because of the relentless pinging about the source code that answering them demands. It’s code that is incomprehensible unless one has the discipline to follow each noodle through from one side to the other.&nbsp;<em>That</em> is spaghetti code.</p>
<p>What makes spaghetti code dangerous is that it, unlike other species of bad code, seems to be a common byproduct
of software entropy. If code is properly modular but some modules are of low quality, people will fix the bad
components if those are important to them. Bad or failed or buggy or slow implementations can be replaced with
correct ones while using the same interface. It’s also, frankly, just much easier to define correctness (which
one must do in order to have a firm sense of what “a bug” is) over small, independent functions than over a
giant codeball designed to do too much stuff. Spaghetti code is evil because (a) it’s a very common subcase of
bad code, (b) it’s almost impossible to fix without causing changes in functionality, which will be treated as
breakage if people depend on the old behavior (potentially by abusing “sleep” methods, thus letting a
performance improvement cause seemingly unrelated bugs!), and (c) it seems, for reasons I’ll get to later, not to
be preventable through typical review processes.</p>
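<p>To make the modularity point above concrete, here is a minimal sketch (the <code>Fib</code> interface and both implementations are hypothetical, invented for illustration): a slow implementation is replaced by a faster one behind the same interface, and callers never notice.</p>
<pre><code>// Modularity makes replacement possible: a slow implementation can be
// swapped for a faster one behind the same interface, and callers never
// notice. (Hypothetical names, for illustration only.)
interface Fib {
  at(n: number): number;
}

// Slow, exponential-time implementation.
class NaiveFib implements Fib {
  at(n: number): number {
    return n > 1 ? this.at(n - 1) + this.at(n - 2) : n;
  }
}

// Drop-in replacement: same interface, linear time.
class IterativeFib implements Fib {
  at(n: number): number {
    let a = 0;
    let b = 1;
    for (let i = n; i > 0; i--) {
      const next = a + b;
      a = b;
      b = next;
    }
    return a;
  }
}

// Callers depend only on the interface, so the swap is invisible to them.
function report(fib: Fib): void {
  console.log(fib.at(10)); // 55 for either implementation
}

report(new NaiveFib());
report(new IterativeFib());
</code></pre>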
<p>The reason I consider it important to differentiate spaghetti code from the superset, “bad code”, is that I think
a lot of what makes “bad code” is subjective. A lot of the conflict and flat-out incivility in software
collaboration (or the lack thereof) seems to result from the predominantly male tendency to lash out in the face
of unskilled creativity (or a perception of such, and in code this is often an extremely biased perception): to
beat the pretender to alpha status so badly that he stops pestering us with his incompetent displays. The
problem with this behavior pattern is that, well, it’s not useful and it rarely makes people better at what
they’re trying to do. It’s just being a prick. There are also a lot of anal-retentive wankbaskets out there who
define good and bad programmers based on cosmetic traits so that their definition of “good code” is “code that
looks like I wrote it”. I feel like the spaghetti code problem is better-defined in scope than the larger but
more subjective problem of “bad code”. We’ll never agree on tabs-versus-spaces, but we all know that spaghetti
code is incomprehensible and useless. Moreover, as spaghetti code is an especially common and damaging case of
bad code, assessing causes and preventions for this subtype may be generalizable to other categories of bad
code.</p>
<p>People usually use “bad code” to mean “ugly code”, but if it’s possible to determine&nbsp;<em>why</em> a piece of
code is bad and ugly, and to figure out a plausible fix, it’s already better than most spaghetti code. Spaghetti
code is incomprehensible and often unfixable. If you know <em>why</em> you hate a piece of code, it’s already
above spaghetti code in quality, since the latter is just featureless gibberish.</p>
<p>What causes spaghetti code? Goto statements were the leading cause of spaghetti code at one time, but goto has
fallen so far out of favor that it’s a non-concern. Now the culprit is something else entirely: the modern
bastardization of object-oriented programming. Inheritance is an especially bad culprit, and so is premature
abstraction: using a parameterized generic with only one use case in mind, or adding unnecessary parameters. I
recognize that this claim (that OOP as practiced is spaghetti code) is not a viewpoint without controversy. Nor
was it without controversy, at one time, that <em>goto</em> was considered harmful.</p>
<p>One of the biggest problems in comparative software (that is, the art of comparing approaches, techniques,
languages, or platforms) is that most comparisons focus on simple examples. At 20 lines of code, almost nothing
shows its evilness, unless it’s contrived to be dastardly. A 20-line program written with goto will usually be
quite comprehensible, and might even be easier to reason about than the same program written without goto. At 20
lines, a step-by-step instruction list with some explicit control transfer is a very natural way to envision a
program. For a static program (i.e. a platonic form that need never be changed and incurs no maintenance) that
can be read in one sitting, that might be a fine way to structure it. At 20,000 lines, the goto-driven program
becomes incomprehensible. At 20,000 lines, the goto-driven program has been hacked and expanded and tweaked so
many times that the original vision holding the thing together has vanished, and the fact that control can arrive at a piece of code “from anywhere” means that safely modifying it requires confidence about “everywhere”. Everything is everywhere. Not only does this make the code difficult to comprehend, but it means
that every modification to the code is likely to make it worse, due to unforeseeable chained consequences. Over
time, the software becomes “biological”, by which I mean that it develops behaviors that no one intended but
that other software components may depend on in hidden ways.</p>
<p>Goto failed, as a programming language construct, because of these problems imposed by the unrestricted pinging
about a program that it created. Less powerful, but therefore more specifically targeted, structures such as
procedures, functions, and well-defined data structures came into favor. For the one case where people needed global control-flow transfer (error handling), exceptions were developed. This was a progression from the extreme universality and abstraction of a goto-driven program to the concreteness and specificity of pieces (such as
procedures) solving specific problems. In unstructured programming, you can write a Big Program that does all
kinds of stuff, add features on a whim, and alter the flow of the thing as you wish. It doesn’t have to solve “a
problem” (so pedestrian…) but it can be a meta-framework with an embedded interpreter! Structured programming
encouraged people to factor their programs into specific pieces that solved single problems, and to make those
solutions reusable when possible. It was a precursor of the Unix philosophy (do one thing and do it well) and
functional programming (make it easy to define precise, mathematical semantics by eschewing global state).</p>
<p>Another thing I’ll say about goto is that it’s rarely needed as a language-level primitive.&nbsp;One could
achieve the same effect using a while-loop, a “program counter” variable defined outside that loop which the loop either increments (step) or resets (goto), and a switch-case statement dispatching on it. This could, if one wished, be expanded into a giant program that runs as one such loop, but code like this is almost never written. The fact that it is almost never done suggests that goto is rarely needed. Structured programming thereby exposes the insanity of attempting severely non-local control flows.</p>
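<p>As a rough illustration of that construction (a hypothetical sketch in TypeScript, not from the original text), the following emulates goto with a while-loop, a “program counter” variable, and a switch on it:</p>
<pre><code>// Each case plays the role of a labeled block; the loop either steps to
// the next label or "jumps" by resetting the program counter.
// (Toy example; the logic is invented for illustration.)
function countdown(start: number): void {
  let pc = 0;          // the "program counter"
  let n = start;
  let running = true;

  while (running) {
    switch (pc) {
      case 0:          // initialize
        console.log("starting at", n);
        pc = 1;        // step to the next block
        break;
      case 1:          // loop body
        console.log(n);
        n = n - 1;
        pc = n > 0 ? 1 : 2;   // "goto" the body again, or step to done
        break;
      case 2:          // done
        console.log("done");
        running = false;
        break;
    }
  }
}

countdown(3);
</code></pre>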
<p>Still, there was a time when abandoning goto was extremely controversial, and this structured programming idea
seemed like faddish nonsense. The objection was: why use functions and procedures when goto is strictly more
powerful?</p>
<p>Analogously, why use referentially transparent functions and immutable records when&nbsp;<em>objects</em> are
strictly more powerful? An object, after all, can have a method called <strong>run</strong> or
<strong>call</strong> or <strong>apply</strong> so it can be a function. It can also have static, constant
fields only and be a record. But it can also do a lot more: it can have initializers and finalizers and open
recursion and fifty methods if one so chooses. So what’s the fuss about this functional programming nonsense
that expects people to build their programs out of things that are much less powerful, like records whose fields
never change and whose classes contain no initialization magic?</p>
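<p>A small, hypothetical TypeScript sketch of that contrast: a referentially transparent function and an immutable record next to an object that can impersonate both while also carrying initialization logic and hidden mutable state. The names are invented for illustration.</p>
<pre><code>// The "less powerful" pieces: a referentially transparent function and
// an immutable record. (Illustrative names, not from the original post.)
const area = (width: number, height: number): number => width * height;

interface Point { readonly x: number; readonly y: number; }
const origin: Point = { x: 0, y: 0 };

// The "more powerful" object can impersonate both, and do much more:
// a call-style method, constant fields, initialization logic, hidden
// mutable state, and room for fifty other methods.
class AreaCalculator {
  readonly label = "area";       // record-like constant field
  private callCount = 0;         // hidden mutable state
  constructor(private scale: number) {
    console.log("initializer ran");   // initialization "magic"
  }
  call(width: number, height: number): number {   // function-like method
    this.callCount += 1;
    return width * height * this.scale;
  }
}

console.log(area(3, 4), origin);                // 12, always the same
console.log(new AreaCalculator(2).call(3, 4));  // 24, plus side effects
</code></pre>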
<p>The answer is that power is not always good. <em>Power</em>, in programming, often advantages the “writer” of
code and not the reader, but maintenance (i.e. the need to read code) begins subjectively around 2000 lines or 6
weeks, and <em>objectively</em> once there is more than one developer on a project. On real systems, no one gets
to be just a “writer” of code. We’re readers, of our own code and of that written by others. Unreadable code is
just not acceptable, and it is only accepted because there is so much of it and because “best practices” object-oriented programming, as deployed at many software companies, seems to produce it. A more “powerful”
abstraction is more general, and therefore less specific, and this means that it’s harder to determine exactly
what it’s used for when one has to read the code using it. This is bad enough, but single-writer code usually
remains fairly disciplined: the powerful abstraction&nbsp;<em>might</em> have 18 plausible uses, but only one of
those is actually used. There’s a singular vision (although usually an undocumented one) that prevents the
confusion. The danger sets in when others who are not aware of that vision have to modify the code. Often, their
modifications are hacks that implicitly assume one of the other 17 use cases. This, naturally, leads to
inconsistencies and those usually result in bugs. Unfortunately, people brought in to fix these bugs have even
less clarity about the original vision behind the code, and their modifications are often equally hackish. Spot
fixes may occur, but the overall quality of the code declines. This is the spaghettification process. No one
ever sits down to write himself a bowl of spaghetti code. It happens through a gradual “stretching” process and
there are almost always multiple developers responsible. In software, “slippery slopes” are real and the
slippage can occur rapidly.</p>
<p>Object-oriented programming, originally designed to prevent spaghetti code, has become (through a “design
pattern” ridden misunderstanding of it) one of the worst sources of it. An “object” can mix code and data freely
and conform to any number of interfaces, while a class can be subclassed freely anywhere in the program. There’s a lot
of power in object-oriented programming, and when used with discipline, it can be very effective. But most
programmers don’t handle it well, and it seems to turn to spaghetti over time.</p>
<p>One of the problems with spaghetti code is that it forms incrementally, which makes it hard to catch in code
review, because each change that leads to “spaghettification” seems, on balance, to be a net positive. The plus
is that a change that a manager or customer “needs yesterday” gets in, and the drawback is what looks like a
moderate amount of added complexity. Even in the Dark Ages of goto, no one ever sat down and said, “I’m going to
write an incomprehensible program with 40 goto statements flowing into the same point.” The clutter
accumulated gradually, while the program’s ownership transferred from one person to another. The same is true of
object-oriented spaghetti. There’s no specific point of transition from an original clean design to
incomprehensible spaghetti. It happens over time as people abuse the power of object-oriented programming to
push through hacks that would make no sense to them if they understood the program they were modifying and if
more specific (again, less powerful) abstractions were used. Of course, this also means that fault for
spaghettification is everywhere and nowhere at the same time: any individual developer can make a convincing
case that his changes weren’t the ones that caused the source code to go to hell. This is part of why
large-program software shops (as opposed to small-program Unix philosophy environments) tend to have such
vicious politics: no one knows who’s actually at fault for anything.</p>
<p>Incremental code review is great at catching the obvious bad practices, like mixing tabs and spaces, bad variable
naming practices, and lines that are too long. That’s why the more cosmetic aspects of “bad code” are less
interesting (using a definition of “interesting” synonymous with “worrisome”) than spaghetti code. We already
know how to solve them in incremental code review. We can even configure our continuous-integration servers to
reject such code. Spaghetti code, which admits no such mechanical definition, is difficult if not impossible to catch this way. Whole-program review is necessary to catch it, but I’ve seen very few companies willing to invest the
time and political will necessary to have actionable whole-program reviews. Over the long term (10+ years) I
think it’s next to impossible, except among teams writing life- or mission-critical software, to ensure this
high level of discipline in perpetuity.</p>
<p>The answer, I think, is that Big Code just doesn’t work. Dynamic typing falls down in large programs, but static
typing fails in a different way. The same is true of object-oriented programming and imperative programming, and, to a lesser but still noticeable degree (manifest in the increasing number of threaded state parameters), of functional programming. The problem with “goto” wasn’t that goto was inherently evil, so much as that it allowed code to become Big Code very quickly (i.e. it lowered the threshold at which a program becomes incomprehensibly “big”). On the
other hand, the frigid-earth reality of Big Code is that there’s “no silver bullet”. Large programs just become
incomprehensible. Complexity and bigness aren’t “sometimes undesirable”. They’re always dangerous. <a
href="http://steve-yegge.blogspot.com/2007/12/codes-worst-enemy.html">Steve
Yegge got this one right.</a></p>
<p>This is why I believe the Unix philosophy is inherently right: programs shouldn’t be vague, squishy things that
grow in scope over time and are never really finished. A program should do one thing and do it well. If it
becomes large and unwieldy, it’s refactored into pieces: libraries and scripts and compiled executables and
data. Ambitious software projects shouldn’t be structured as all-or-nothing single programs, because every
programming paradigm and toolset breaks down horribly on those. Instead, such projects should be structured as&nbsp;<em>systems</em>
and given the respect typically given to such. This means that attention is paid to fault-tolerance,
interchangeability of parts, and communication protocols. It requires more discipline than the haphazard sprawl
of big-program development, but it’s worth it. In addition to the obvious advantages inherent in cleaner, more
usable code, another benefit is that <em>people actually read code</em>, rather than hacking it as-needed and
without understanding what they’re doing. This means that they get better as developers over time, and code
quality gets better in the long run.</p>
<p>Ironically, object-oriented programming was originally intended to encourage something looking like small-program
development. The original vision behind object-oriented programming was not that people should go and write
enormous, complex objects, but that they should use object-oriented discipline&nbsp;<em>when</em> complexity is
inevitable. An example of success in this arena is databases. People demand so much of relational databases
in terms of transactional integrity, durability, availability, concurrency and performance that complexity is
outright necessary. Databases are complex beasts, and I’ll comment that it has taken the computing world
literally&nbsp;<em>decades</em> to get them decent, even with enormous financial incentives to do so. But while
a database can be (by necessity) complex, the interface to one (SQL) is much simpler. You don’t usually tell a
database what search strategy to use; you write a declarative SELECT statement (describing what the user wants,
not how to get it) and let the query optimizer take care of it.</p>
<p>Databases, I’ll note, are somewhat of an exception to my dislike of Big Code. Their complexity is well-understood
as necessary, and there are people willing to devote their careers entirely to mastering it. But people should
not have to devote their careers to understanding a typical business application. And they won’t. They’ll leave,
accelerating the slide into spaghettification as the code changes hands.</p>
<p>Why Big Code? Why does it exist, in spite of its pitfalls? And why do programmers so quickly break out the
object-oriented toolset without asking first if the power and complexity are needed? I think there are several
reasons. One is laziness: people would rather learn one set of general-purpose abstractions than study the specific ones and learn when each is appropriate. Why should anyone learn about linked lists and arrays and all those
weird tree structures when we already have ArrayList? Why learn how to program using referentially transparent
functions when objects can do the trick (and so much more)? Why learn how to use the command line when modern
IDEs can protect you from ever seeing the damn thing? Why learn more than one language when Java is already
Turing-complete? Big Code comes from a similar attitude: why break a program down into small modules when modern
compilers can easily handle hundreds of thousands of lines of code? Computers don’t care if they’re forced to
contend with Big Code, so why should we?</p>
<p>More to the point, however, I think the cause is hubris with a smattering of greed. Big Code comes from a belief that a programming project will be so important and successful that people will just swallow the complexity: the idea that one’s own DSL is going to be as monumental as C or SQL. It also comes from an unwillingness to declare a problem solved and a program finished even when the meaningful work is complete. And it comes from a misconception about what programming is. Rather than existing to solve well-defined problems and then get out of the way, as programs written under small-program methodology do, Big Code projects become more than that. They often have an
overarching and usually impractical “vision” that involves generating software for software’s sake. This becomes
a mess, because “vision” in a corporate environment is usually bike-shedding that quickly becomes political. Big
Code programs always reflect the political environment that generated them (<a
href="http://en.wikipedia.org/wiki/Conway%27s_law">Conway’s
Law</a>) and this means that they invariably look more like collections of parochialisms and inside humor
than the more universal languages of mathematics and computer science.</p>
<p>There is another problem in play. Managers love Big Code, because when the programmer-to-program relationship is
many-to-one instead of one-to-many, efforts can be tracked and “headcount” can be allocated. Small-program
methodology is superior, but it requires trusting the programmers to allocate their time appropriately to more
than one problem, and most executive tyrannosaurs aren’t comfortable doing that. Big Code doesn’t actually work,
but it gives managers a sense of control over the allocation of technical effort. It also plays into the
conflation of bigness and success that managers often make (cf. the interview question for executives, “How many
direct reports did you have?”). The long-term spaghettification that results from Big Code is rarely an issue for
such managers. They can’t see it happen, and they’re typically promoted away from the project before this
becomes an issue.</p>
<p>In sum, spaghetti code is bad code, but not all bad code is spaghetti. Spaghetti is a byproduct of industrial
programming that is usually, but not always, an entropic result of too many hands passing over code, and an
inevitable outcome of large-program methodologies and the bastardization of “object-oriented programming” that
has emerged out of these defective, executive-friendly processes. The antidote to spaghetti is an aggressive and
proactive refactoring effort focused on keeping programs small, effective, clean in source code, and most of
all, coherent.</p>