<p>One of the easiest ways for an epithet to lose its value is for it to become over-broad, which causes it to mean
little more than “I don’t like this”. A case in point is the term <strong>“spaghetti code”</strong>, which people
often use interchangeably with “bad code”. The problem is that not all bad code is spaghetti code. Spaghetti
code is an especially virulent but specific <em>kind</em> of bad code, and its particular badness is instructive
in how we develop software. Why? Because individual people rarely write spaghetti code on their own. Rather,
certain styles of development process make it increasingly common as time passes. In order to assess this, it’s
important first to address the original context in which “spaghetti code” was defined: the dreaded (and mostly
archaic)&nbsp;<strong>goto</strong> statement.</p>
<p>The goto statement is a simple and powerful control flow mechanism: jump to another point in the code. It’s what
a compiled program actually does, at the assembly level, in order to transfer control, even if the source code is written using
more modern structures like loops and functions. Using goto, one can implement whatever control flows one needs.
We also generally agree, in 2012, that goto is flat-out inappropriate for source code in most modern programs.
Exceptions to this policy exist, but they’re extremely rare. Most modern languages don’t even have it.</p>
<p>Goto statements can make it difficult to reason about code, because if control can bounce about a program, one
cannot make guarantees about what state a program is in when it executes a specific piece of code. Goto-based
programs can’t easily be broken down into component pieces, because any point in the code can be wormholed to
any other. Instead, they devolve into an “everything is everywhere” mess where to understand a piece of the
program requires understanding all of it, and the latter becomes flat-out impossible for large programs. Hence
the comparison to spaghetti, where following one thread (or noodle) often involves navigating through a large
tangle of pasta. You can’t look at a bowl of noodles and see which end connects to which. You’d have to
laboriously untangle it.</p>
<p>Spaghetti code is code where “everything is everywhere”, and in which answering simple questions, such as (a) where a certain piece of functionality is implemented, (b) where an object is instantiated and how to create it, or (c) whether a critical section is correct, to name a few examples of questions one might want to ask about code, requires understanding the whole program, because of the relentless pinging about the source code that answering them demands. It’s code that is incomprehensible unless one has the discipline to follow each noodle through from one side to the other.&nbsp;<em>That</em> is spaghetti code.</p>
<p>What makes spaghetti code dangerous is that it, unlike other species of bad code, seems to be a common byproduct
of software entropy. If code is properly modular but some modules are of low quality, people will fix the bad
components if those are important to them. Bad or failed or buggy or slow implementations can be replaced with
correct ones while using the same interface. It’s also, frankly, just much easier to define correctness (which
one must do in order to have a firm sense of what “a bug” is) over small, independent functions than over a
giant codeball designed to do too much stuff. Spaghetti code is evil because (a) it’s a very common subcase of
bad code, (b) it’s almost impossible to fix without causing changes in functionality, which will be treated as
breakage if people depend on the old behavior (potentially by abusing “sleep” methods, thus letting a
performance improvement cause seemingly unrelated bugs!), and (c) it seems, for reasons I’ll get to later, not to
be preventable through typical review processes.</p>
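<p>To make the modularity point above concrete, here is a minimal sketch (the <code>Fib</code> interface and both implementations are hypothetical, invented for illustration): a slow implementation is replaced by a faster one behind the same interface, and callers never notice.</p>
<pre><code>// Modularity makes replacement possible: a slow implementation can be
// swapped for a faster one behind the same interface, and callers never
// notice. (Hypothetical names, for illustration only.)
interface Fib {
  at(n: number): number;
}

// Slow, exponential-time implementation.
class NaiveFib implements Fib {
  at(n: number): number {
    return n > 1 ? this.at(n - 1) + this.at(n - 2) : n;
  }
}

// Drop-in replacement: same interface, linear time.
class IterativeFib implements Fib {
  at(n: number): number {
    let a = 0;
    let b = 1;
    for (let i = n; i > 0; i--) {
      const next = a + b;
      a = b;
      b = next;
    }
    return a;
  }
}

// Callers depend only on the interface, so the swap is invisible to them.
function report(fib: Fib): void {
  console.log(fib.at(10)); // 55 for either implementation
}

report(new NaiveFib());
report(new IterativeFib());
</code></pre>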
<p>The reason I consider it important to differentiate spaghetti code from the superset, “bad code”, is that I think
a lot of what makes “bad code” is subjective. A lot of the conflict and flat-out incivility in software
collaboration (or the lack thereof) seems to result from the predominantly male tendency to lash out in the face
of unskilled creativity (or a perception of such, and in code this is often an extremely biased perception): to
beat the pretender to alpha status so badly that he stops pestering us with his incompetent displays. The
problem with this behavior pattern is that, well, it’s not useful and it rarely makes people better at what
they’re trying to do. It’s just being a prick. There are also a lot of anal-retentive wankbaskets out there who
define good and bad programmers based on cosmetic traits so that their definition of “good code” is “code that
looks like I wrote it”. I feel like the spaghetti code problem is better-defined in scope than the larger but
more subjective problem of “bad code”. We’ll never agree on tabs-versus-spaces, but we all know that spaghetti
code is incomprehensible and useless. Moreover, as spaghetti code is an especially common and damaging case of
bad code, assessing causes and preventions for this subtype may be generalizable to other categories of bad
code.</p>
<p>People usually use “bad code” to mean “ugly code”, but if it’s possible to determine&nbsp;<em>why</em> a piece of
code is bad and ugly, and to figure out a plausible fix, it’s already better than most spaghetti code. Spaghetti
code is incomprehensible and often unfixable. If you know <em>why</em> you hate a piece of code, it’s already
above spaghetti code in quality, since the latter is just featureless gibberish.</p>
<p>What causes spaghetti code? Goto statements were the leading cause of spaghetti code at one time, but goto has
fallen so far out of favor that it’s a non-concern. Now the culprit is something else entirely: the modern
bastardization of object-oriented programming. Inheritance is an especially bad culprit, and so is premature
abstraction: using a parameterized generic with only one use case in mind, or adding unnecessary parameters. I
recognize that this claim (that OOP as practiced is spaghetti code) is not a viewpoint without controversy. Nor
was it without controversy, at one time, that <em>goto</em> was considered harmful.</p>
<p>One of the biggest problems in comparative software (that is, the art of comparing approaches, techniques,
languages, or platforms) is that most comparisons focus on simple examples. At 20 lines of code, almost nothing
shows its evilness, unless it’s contrived to be dastardly. A 20-line program written with goto will usually be
quite comprehensible, and might even be easier to reason about than the same program written without goto. At 20
lines, a step-by-step instruction list with some explicit control transfer is a very natural way to envision a
program. For a static program (i.e. a platonic form that need never be changed and incurs no maintenance) that
can be read in one sitting, that might be a fine way to structure it. At 20,000 lines, the goto-driven program
becomes incomprehensible. At 20,000 lines, the goto-driven program has been hacked and expanded and tweaked so
many times that the original vision holding the thing together has vanished, and the fact that control can arrive at a piece of code “from anywhere” means that safely modifying it requires confidence about “everywhere”. Everything is everywhere. Not only does this make the code difficult to comprehend, but it means
that every modification to the code is likely to make it worse, due to unforeseeable chained consequences. Over
time, the software becomes “biological”, by which I mean that it develops behaviors that no one intended but
that other software components may depend on in hidden ways.</p>
<p>Goto failed, as a programming language construct, because of these problems imposed by the unrestricted pinging
about a program that it created. Less powerful, but therefore more specifically targeted, structures such as
procedures, functions, and well-defined data structures came into favor. For the one case where people needed global control-flow transfer (error handling), exceptions were developed. This was a progression from the extreme universality and abstraction of a goto-driven program to the concreteness and specificity of pieces (such as
procedures) solving specific problems. In unstructured programming, you can write a Big Program that does all
kinds of stuff, add features on a whim, and alter the flow of the thing as you wish. It doesn’t have to solve “a
problem” (so pedestrian…) but it can be a meta-framework with an embedded interpreter! Structured programming
encouraged people to factor their programs into specific pieces that solved single problems, and to make those
solutions reusable when possible. It was a precursor of the Unix philosophy (do one thing and do it well) and
functional programming (make it easy to define precise, mathematical semantics by eschewing global state).</p>
<p>Another thing I’ll say about goto is that it’s rarely needed as a language-level primitive.&nbsp;One could
achieve the same effect using a while-loop, a “program counter” variable defined outside that loop which the loop either increments (step) or resets (goto), and a switch-case statement dispatching on it. This could, if one wished, be expanded into a giant program that runs as one such loop, but code like this is almost never written. The fact that it is almost never done suggests that goto is rarely needed. Structured programming thereby exposes the insanity of attempting severely non-local control flows.</p>
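<p>As a rough illustration of that construction (a hypothetical sketch in TypeScript, not from the original text), the following emulates goto with a while-loop, a “program counter” variable, and a switch on it:</p>
<pre><code>// Each case plays the role of a labeled block; the loop either steps to
// the next label or "jumps" by resetting the program counter.
// (Toy example; the logic is invented for illustration.)
function countdown(start: number): void {
  let pc = 0;          // the "program counter"
  let n = start;
  let running = true;

  while (running) {
    switch (pc) {
      case 0:          // initialize
        console.log("starting at", n);
        pc = 1;        // step to the next block
        break;
      case 1:          // loop body
        console.log(n);
        n = n - 1;
        pc = n > 0 ? 1 : 2;   // "goto" the body again, or step to done
        break;
      case 2:          // done
        console.log("done");
        running = false;
        break;
    }
  }
}

countdown(3);
</code></pre>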
<p>Still, there was a time when abandoning goto was extremely controversial, and this structured programming idea
seemed like faddish nonsense. The objection was: why use functions and procedures when goto is strictly more
powerful?</p>
<p>Analogously, why use referentially transparent functions and immutable records when&nbsp;<em>objects</em> are
strictly more powerful? An object, after all, can have a method called <strong>run</strong> or
<strong>call</strong> or <strong>apply</strong> so it can be a function. It can also have static, constant
fields only and be a record. But it can also do a lot more: it can have initializers and finalizers and open
recursion and fifty methods if one so chooses. So what’s the fuss about this functional programming nonsense
that expects people to build their programs out of things that are much less powerful, like records whose fields
never change and whose classes contain no initialization magic?</p>
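<p>A small, hypothetical TypeScript sketch of that contrast: a referentially transparent function and an immutable record next to an object that can impersonate both while also carrying initialization logic and hidden mutable state. The names are invented for illustration.</p>
<pre><code>// The "less powerful" pieces: a referentially transparent function and
// an immutable record. (Illustrative names, not from the original post.)
const area = (width: number, height: number): number => width * height;

interface Point { readonly x: number; readonly y: number; }
const origin: Point = { x: 0, y: 0 };

// The "more powerful" object can impersonate both, and do much more:
// a call-style method, constant fields, initialization logic, hidden
// mutable state, and room for fifty other methods.
class AreaCalculator {
  readonly label = "area";       // record-like constant field
  private callCount = 0;         // hidden mutable state
  constructor(private scale: number) {
    console.log("initializer ran");   // initialization "magic"
  }
  call(width: number, height: number): number {   // function-like method
    this.callCount += 1;
    return width * height * this.scale;
  }
}

console.log(area(3, 4), origin);                // 12, always the same
console.log(new AreaCalculator(2).call(3, 4));  // 24, plus side effects
</code></pre>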
<p>The answer is that power is not always good. <em>Power</em>, in programming, often advantages the “writer” of
code and not the reader, but maintenance (i.e. the need to read code) begins subjectively around 2000 lines or 6
weeks, and <em>objectively</em> once there is more than one developer on a project. On real systems, no one gets
to be just a “writer” of code. We’re readers, of our own code and of that written by others. Unreadable code is
just not acceptable, and it is only accepted because there is so much of it and because “best practices” object-oriented programming, as deployed at many software companies, seems to produce it. A more “powerful”
abstraction is more general, and therefore less specific, and this means that it’s harder to determine exactly
what it’s used for when one has to read the code using it. This is bad enough, but single-writer code usually
remains fairly disciplined: the powerful abstraction&nbsp;<em>might</em> have 18 plausible uses, but only one of
those is actually used. There’s a singular vision (although usually an undocumented one) that prevents the
confusion. The danger sets in when others who are not aware of that vision have to modify the code. Often, their
modifications are hacks that implicitly assume one of the other 17 use cases. This, naturally, leads to
inconsistencies and those usually result in bugs. Unfortunately, people brought in to fix these bugs have even
less clarity about the original vision behind the code, and their modifications are often equally hackish. Spot
fixes may occur, but the overall quality of the code declines. This is the spaghettification process. No one
ever sits down to write himself a bowl of spaghetti code. It happens through a gradual “stretching” process and
there are almost always multiple developers responsible. In software, “slippery slopes” are real and the
slippage can occur rapidly.</p>
<p>Object-oriented programming, originally designed to prevent spaghetti code, has become (through a “design
pattern” ridden misunderstanding of it) one of the worst sources of it. An “object” can mix code and data freely
and conform to any number of interfaces, while a class can be subclassed freely anywhere in the program. There’s a lot
of power in object-oriented programming, and when used with discipline, it can be very effective. But most
programmers don’t handle it well, and it seems to turn to spaghetti over time.</p>
<p>One of the problems with spaghetti code is that it forms incrementally, which makes it hard to catch in code
review, because each change that leads to “spaghettification” seems, on balance, to be a net positive. The plus
is that a change that a manager or customer “needs yesterday” gets in, and the drawback is what looks like a
moderate amount of added complexity. Even in the Dark Ages of goto, no one ever sat down and said, “I’m going to
write an incomprehensible program with 40 goto statements flowing into the same point.” The clutter
accumulated gradually, while the program’s ownership transferred from one person to another. The same is true of
object-oriented spaghetti. There’s no specific point of transition from an original clean design to
incomprehensible spaghetti. It happens over time as people abuse the power of object-oriented programming to
push through hacks that would make no sense to them if they understood the program they were modifying and if
more specific (again, less powerful) abstractions were used. Of course, this also means that fault for
spaghettification is everywhere and nowhere at the same time: any individual developer can make a convincing
case that his changes weren’t the ones that caused the source code to go to hell. This is part of why
large-program software shops (as opposed to small-program Unix philosophy environments) tend to have such
vicious politics: no one knows who’s actually at fault for anything.</p>
<p>Incremental code review is great at catching the obvious bad practices, like mixing tabs and spaces, bad variable
naming practices, and lines that are too long. That’s why the more cosmetic aspects of “bad code” are less
interesting (using a definition of “interesting” synonymous with “worrisome”) than spaghetti code. We already
know how to solve them in incremental code review. We can even configure our continuous-integration servers to
reject such code. Spaghetti code, which admits no such mechanical definition, is difficult if not impossible to catch this way. Whole-program review is necessary to catch it, but I’ve seen very few companies willing to invest the
time and political will necessary to have actionable whole-program reviews. Over the long term (10+ years) I
think it’s next to impossible, except among teams writing life- or mission-critical software, to ensure this
high level of discipline in perpetuity.</p>
<p>The answer, I think, is that Big Code just doesn’t work. Dynamic typing falls down in large programs, but static
typing fails in a different way. The same is true of object-oriented programming and imperative programming, and, to a lesser but still noticeable degree (manifest in the increasing number of threaded state parameters), of functional programming. The problem with “goto” wasn’t that goto was inherently evil, so much as that it allowed code to become Big Code very quickly (i.e. it lowered the threshold at which a program becomes incomprehensibly “big”). On the
other hand, the frigid-earth reality of Big Code is that there’s “no silver bullet”. Large programs just become
incomprehensible. Complexity and bigness aren’t “sometimes undesirable”. They’re always dangerous. <a
href="http://steve-yegge.blogspot.com/2007/12/codes-worst-enemy.html">Steve
Yegge got this one right.</a></p>
<p>This is why I believe the Unix philosophy is inherently right: programs shouldn’t be vague, squishy things that
grow in scope over time and are never really finished. A program should do one thing and do it well. If it
becomes large and unwieldy, it’s refactored into pieces: libraries and scripts and compiled executables and
data. Ambitious software projects shouldn’t be structured as all-or-nothing single programs, because every
programming paradigm and toolset breaks down horribly on those. Instead, such projects should be structured as&nbsp;<em>systems</em>
and given the respect typically given to such. This means that attention is paid to fault-tolerance,
interchangeability of parts, and communication protocols. It requires more discipline than the haphazard sprawl
of big-program development, but it’s worth it. In addition to the obvious advantages inherent in cleaner, more
usable code, another benefit is that <em>people actually read code</em>, rather than hacking it as-needed and
without understanding what they’re doing. This means that they get better as developers over time, and code
quality gets better in the long run.</p>
<p>Ironically, object-oriented programming was originally intended to encourage something looking like small-program
development. The original vision behind object-oriented programming was not that people should go and write
enormous, complex objects, but that they should use object-oriented discipline&nbsp;<em>when</em> complexity is
inevitable. An example of success in this arena is databases. People demand so much of relational databases
in terms of transactional integrity, durability, availability, concurrency and performance that complexity is
outright necessary. Databases are complex beasts, and I’ll comment that it has taken the computing world
literally&nbsp;<em>decades</em> to get them decent, even with enormous financial incentives to do so. But while
a database can be (by necessity) complex, the interface to one (SQL) is much simpler. You don’t usually tell a
database what search strategy to use; you write a declarative SELECT statement (describing what the user wants,
not how to get it) and let the query optimizer take care of it.</p>
<p>Databases, I’ll note, are somewhat of an exception to my dislike of Big Code. Their complexity is well-understood
as necessary, and there are people willing to devote their careers entirely to mastering it. But people should
not have to devote their careers to understanding a typical business application. And they won’t. They’ll leave,
accelerating the slide into spaghettification as the code changes hands.</p>
<p>Why Big Code? Why does it exist, in spite of its pitfalls? And why do programmers so quickly break out the
object-oriented toolset without asking first if the power and complexity are needed? I think there are several
reasons. One is laziness: people would rather learn one set of general-purpose abstractions than study the specific ones and learn when each is appropriate. Why should anyone learn about linked lists and arrays and all those
weird tree structures when we already have ArrayList? Why learn how to program using referentially transparent
functions when objects can do the trick (and so much more)? Why learn how to use the command line when modern
IDEs can protect you from ever seeing the damn thing? Why learn more than one language when Java is already
Turing-complete? Big Code comes from a similar attitude: why break a program down into small modules when modern
compilers can easily handle hundreds of thousands of lines of code? Computers don’t care if they’re forced to
contend with Big Code, so why should we?</p>
<p>More to the point, however, I think the cause is hubris with a smattering of greed. Big Code comes from a belief that a programming project will be so important and successful that people will just swallow the complexity: the idea that one’s own DSL is going to be as monumental as C or SQL. It also comes from an unwillingness to declare a problem solved and a program finished even when the meaningful work is complete. And it comes from a misconception about what programming is. Rather than existing to solve well-defined problems and then get out of the way, as programs written under small-program methodology do, Big Code projects become more than that. They often have an
overarching and usually impractical “vision” that involves generating software for software’s sake. This becomes
a mess, because “vision” in a corporate environment is usually bike-shedding that quickly becomes political. Big
Code programs always reflect the political environment that generated them (<a
href="http://en.wikipedia.org/wiki/Conway%27s_law">Conway’s
Law</a>) and this means that they invariably look more like collections of parochialisms and inside humor
than the more universal languages of mathematics and computer science.</p>
<p>There is another problem in play. Managers love Big Code, because when the programmer-to-program relationship is
many-to-one instead of one-to-many, efforts can be tracked and “headcount” can be allocated. Small-program
methodology is superior, but it requires trusting the programmers to allocate their time appropriately to more
than one problem, and most executive tyrannosaurs aren’t comfortable doing that. Big Code doesn’t actually work,
but it gives managers a sense of control over the allocation of technical effort. It also plays into the
conflation of bigness and success that managers often make (cf. the interview question for executives, “How many
direct reports did you have?”). The long-term spaghettification that results from Big Code is rarely an issue for
such managers. They can’t see it happen, and they’re typically promoted away from the project before this
becomes an issue.</p>
<p>In sum, spaghetti code is bad code, but not all bad code is spaghetti. Spaghetti is a byproduct of industrial
programming that is usually, but not always, an entropic result of too many hands passing over code, and an
inevitable outcome of large-program methodologies and the bastardization of “object-oriented programming” that
has emerged out of these defective, executive-friendly processes. The antidote to spaghetti is an aggressive and
proactive refactoring effort focused on keeping programs small, effective, clean in source code, and most of
all, coherent.</p>