cshirky/itp_debugging.md

## itp_debugging.md

      
    A BRIEF INTRODUCTION TO DEBUGGING

“As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right 
as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that 
a large part of my life from then on was going to be spent in finding mistakes in my own programs.”
— Maurice Wilkes, 1949

This is a guide to debugging your projects at ITP and beyond. It is not a guide to specific techniques for debugging Processing sketches or physical computing projects; it is a guide to the basic ideas and goals of debugging.
We’d love to tell you that having to find and fix your own errors is a phase, that debugging is something only novices have to do, but we’d be lying. Even the pros spend a lot of time figuring out what’s wrong and then fixing it. Our goal here is simply to help you figure out and fix low-level mistakes quickly, so that you can move on to making high-level mistakes instead.
To fix bugs, first you have to understand what they are. A bug is not just an error in your project; errors you can fix. Bugs are different. You have a bug when your project doesn’t behave the way you expect it to, for reasons you don’t understand. In The Cathedral and the Bazaar, Eric Raymond quotes Linus Torvalds as noting that debugging has two steps: finding the problem, and fixing it. “And I'll go on record”, said Torvalds, “as saying that finding it is the bigger challenge.”
Because a bug entails a gap between your assumptions about your project and the project itself, the most important part of debugging is updating your assumptions. You can’t fix what you don’t get. What follows is some brief advice for closing that gap, in four sections:

1. Preparation for Debugging
2. Looking for the Bug
3. Changing Things To Find The Bug
4. Defensive Driving

## Preparation for Debugging

“The main challenge is not to get confused by the complexities of your own making.”
— E. W. Dijkstra

The steps listed in this section aren’t debugging techniques; they are ways to prepare yourself for the task.
### Preparation #1: Don’t panic. (Do take a deep breath.)
Debugging isn’t the process of fixing your project. It is the process of understanding your project well enough that you can fix it. This means that debugging is a form of education, so you need to be in the right frame of mind to do it well. Unfortunately, by the time you’ve decided you have a bug, you are usually tired, anxious, and angry, the worst possible mindset for learning.
Errors tend to show up at deadlines, because that’s when you test most heavily. Deadlines are already stress-inducing events. A deadline with a project that doesn’t work is even more so. And worst of all is a looming deadline with a project that doesn’t work, for reasons you don’t understand. In this situation, the thing you have most control over is your grasp of the problem.
Take a deep breath, both literally and metaphorically. Get up and walk around. Have an unrelated conversation. Maybe even go home and go to sleep. (Bugs that are inexplicable at 2 in the morning often turn out to be trivial at 10.)
Like the Zen parable about the student who is told his education will take longer the more impatient he becomes to finish it, you will debug faster and more effectively if you are willing to slow down and think than if you are pushing to speed up. At best, a rush job will merely take longer; at worst, you will create new problems while you are impatiently attacking the existing ones, which, by definition, you don’t yet understand.
### Preparation #2: Take a snapshot.
Whatever you do, don’t go backwards.
It’s tempting, when trying to figure out a particularly thorny bug, to start altering or removing things willy-nilly, or trying several new things in quick succession, just to see if anything changes. And sometimes when you do this (and we have all done it), you will find that the bug is minor -- a missed semi-colon, a crossed wire -- but the damage you have done while looking for it is major.
Before you begin tracking down a bug, save your code, photograph your installation, and diagram the connections between the parts. Make sure that you can, at the very least, restore your project to the state it was in before you started debugging.
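For the code half of a snapshot, the habit is easier to keep if it takes one command. Here is a minimal sketch in Python, assuming your project lives in a single folder; the function name `take_snapshot` and the folder naming scheme are invented for illustration, and it does nothing about photographs or wiring diagrams.

```python
import shutil
import time
from pathlib import Path


def take_snapshot(project_dir, label="before-debugging"):
    """Copy project_dir to a timestamped sibling folder and return its path."""
    source = Path(project_dir).resolve()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = source.parent / f"{source.name}-{label}-{stamp}"
    # copytree refuses to overwrite an existing folder, so an earlier
    # snapshot can never be silently replaced.
    shutil.copytree(source, target)
    return target
```

Calling `take_snapshot("my_sketch")` (a hypothetical folder name) before you touch anything gives you a copy to return to, whatever happens next.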
### Preparation #3: Write a message asking for help.
Asking for help can seem like a solution of last resort -- after all, it’s your project. However, getting ready to ask for help can be a great first step. As you are contemplating your problem, think of someone who understands the kind of work you’re doing. Then write them a message in as much detail as they need to understand your problem, explaining what you think should happen, and what’s happening instead.
You may well not actually end up sending this message. If you explain your problem to someone else, you will not only end up taking a deep breath by default, you will have to describe your project clearly, and describe what isn’t working clearly. Frequently, when you do this, you get enough distance from your problem to understand what isn’t working and why (which is of course the goal of debugging in the first place).
And of course, if you do end up asking for help, this is the message you’ll want to send. Writing it first just gives you a head start. As Jared Shiffman recommends, “Thou Shalt Not Allow Thyself To Get Stuck For More Than One Hour Without Asking For Help.”
## Looking for the Bug

“Finding your bug is a process of confirming the many things you believe are true, 
until you find one which is not true.”
-- Norman Matloff

Debugging requires a different mindset than designing. It’s a humbler and a more humbling activity than creating, less artist and more plumber. Your goal is to subject your problem to a set of observations and tests that are so simple, and isolate the combined elements so visibly, that you can find the place where your mental picture of the work is wrong. The ideal way to find a bug is by observing the system without changing anything.
That’s important enough to bear repeating: The ideal way to find a bug is by observing the system without changing anything.
This is not always possible, of course, but it should be your first line of defense: Look at the system, look at the code, look at the connections, and see if there is any obvious defect. If not, then you will have to start altering the system to locate the problem, but if you can find it through examination, it will save you time and trouble (as well as the risk of introducing new bugs while looking for the old one.)
What follows is a list of common-sense principles for examining your system for bugs. These are not specific to any technology; these are strategies for helping you find and understand what is keeping your project from doing what you expect.
### Looking #1: Is everything plugged in?
Is everything plugged in, in the largest possible senses of ‘plug’ and ‘in’? Does everything that needs power have power? Are any connecting cables -- USB, Ethernet, ribbon -- actually connected? Are the parts that are connected over a network -- Wifi, XBee, Bluetooth -- actually exchanging data?
Do you have any “black boxes” in your project, components (real or virtual) that you didn’t build and don’t have access to? This could be a local component -- a GPS device, a camera -- or a networked one -- an API, a remote server. Is the black box behaving as it should under ordinary conditions?
This kind of bug is sometimes jokingly called a “high-impedance air gap” (aka, the cable was not plugged in), and because it is a canonically stupid bug, it’s easy to overlook. Since you are a smart person, it is easy to believe that you wouldn't make dumb mistakes, a belief that creates an additional barrier to finding dumb mistakes.
However, for most of the time you are working on a project, and for all of the initial stage, the picture in your head will be more complete than what you have actually built. Before you check anything complicated, be sure you are not imagining that things are as connected as they need to be.
### Looking #2: What just changed?
If your project worked, then stopped, what changed?
This strategy requires two things: first, you need to have a good sense of what you did, in what order. Second, you need to have a good sense of when your project last worked the way you expected.
Figuring out what just changed can be surprisingly hard; one common source of bugs is the belief that changing one part of a system won’t affect other parts, or a belief that “just cleaning up” a program or circuit is somehow different from altering it. (Indeed, the belief that some changes are so small that they don’t count as changes is so common, it has its own entry in the Hacker’s Jargon file: I didn't change anything!)
You also have to know the last time things worked as expected. This can sometimes be hard, either because you waited too long between tests, so that dozens or hundreds of things changed, or because you didn’t test it thoroughly enough prior to finding the bug even to be able to say what was working.
Even with these difficulties, understanding what has changed between working and non-working states is essential. Isolating the problem means identifying the differences between the old (working) state and the new (broken) one; the most recent changes are usually the cause of the bug.
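If you kept a copy of the last working version, finding the differences can be mechanical rather than a feat of memory. The sketch below uses Python's standard `difflib` module to list only the lines that differ between two versions of a file; the helper name `changed_lines` and the sample settings are made up for the example.

```python
import difflib


def changed_lines(working_text, broken_text):
    """Return the lines removed ('- ') or added ('+ ') between two versions."""
    comparison = difflib.ndiff(
        working_text.splitlines(),
        broken_text.splitlines(),
    )
    # ndiff also emits unchanged lines ('  ') and hint lines ('? ');
    # keep only the real removals and additions.
    return [line for line in comparison if line.startswith(("- ", "+ "))]


working = "blink_ms = 500\nled_pin = 13\n"
broken = "blink_ms = 500\nled_pin = 31\n"
for line in changed_lines(working, broken):
    print(line)
```

Here the only difference is the pin number, which is exactly the kind of "I didn't change anything" edit that is easy to forget having made.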
### Looking #3: Read the code like a computer.
Another common source of bugs is typos -- you called a variable redoffset in one place and redOffset in another, or you spelled it redOffest. You forgot a semicolon. You deleted a loop, but not its trailing brace. Whatever. The problem with these kinds of typos is that computers never ever do what you want. They only do what you tell them. Your job is to tell them, in mind-numbing detail, exactly what you want. A simple typo is as program-crashing as a terrible mistake.
When you have a bug like this, you have to read your code like the computer does. The computer doesn’t care, or even understand, what you meant to write. You probably meant to have a semicolon at the end of the line, or to spell redOffset correctly, but that’s not what the computer sees. There’s simply no substitute for picayune examination. Read your code out loud:
if, paren, X, greater than, 3, close paren, brace, return, X, equals, 0, semicolon, return, close brace...
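A small illustration of why this kind of reading matters, sketched in Python with made-up names (`brighten_buggy`, `red_offset`). Python does not require variables to be declared, so a misspelled name on the left of an assignment quietly creates a second variable instead of producing an error. Processing would usually catch this particular slip at compile time, but the principle is the same: the computer runs what is written, not what was meant.

```python
def brighten_buggy(red, steps):
    red_offset = 0
    for _ in range(steps):
        # Typo: "red_offest" is a brand-new variable, so red_offset never changes.
        red_offest = red_offset + 10
    return red + red_offset


def brighten_fixed(red, steps):
    red_offset = 0
    for _ in range(steps):
        red_offset = red_offset + 10
    return red + red_offset


print(brighten_buggy(100, 3))  # 100: the offset was never applied
print(brighten_fixed(100, 3))  # 130
```

The two functions differ by two transposed letters, and nothing complains when the buggy one runs. Only reading it character by character shows the difference.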
### Looking #4: What do the error messages say?
Like Gary Larson’s cartoon of the man berating his dog, who only hears “Blah blah Ginger blah blah blah blah Ginger”, error messages often read like Blah blah Failed blah blah Exiting blah blah blah Permanent Fatal Unrecoverable Shutdown.
Error messages can look like technical gobbledygook, but while they typically over-report the details, and describe internal program state you rarely have access to, the error messages are nevertheless trying to be helpful.
In particular, the first part of an error message -- the line at the top of the list of such messages -- usually describes the first thing that went wrong, and, for simple mistakes like unbalanced braces or missing semicolons, the error messages are often models of clarity and concision, as with Processing’s “Syntax error, maybe a missing semicolon?”
Even if you think you can’t or won’t understand the error messages, give them a careful look. Even messages that are mainly cryptic (as with the Java messages after Processing failures) often contain invaluable information at the top of the list.
### Looking #5: LMGTFY/RTFM
Sadly, many error messages are not sufficiently descriptive to be useful -- they can be verbose yet uninformative, or, worse, brief yet uninformative. If you can’t piece through the error messages on your own, a good next step is to copy the error message exactly and google it.
If you are getting an error message, then other people have had this problem before you, and at least some of those people have talked about it in public, on sites like StackOverflow or Winprog. Even if the error message itself isn’t helpful, the conversations you can find your way to using that message as a clue can be.
Likewise, any programming language has detailed documentation. If you know you have a problem with serial communication from an Arduino, read the Arduino documentation; if it is with an array in Processing, read the Processing documentation. And remember that your goal here is not just finding a simple fix (though it often feels like that), but to improve your understanding of the tools you are using, both so that you don't make that mistake again, and so that you get better at reading and understanding the documentation.
Of all ways of updating your assumptions, and thus of transforming frustrating bugs into fixable errors, improving your familiarity with the available documentation, help resources, and relevant communities is probably the most valuable, since it doesn't just educate you about one specific bug, or even a whole class of them; it makes you better able to both avoid and solve future bugs.
## Changing Things To Find The Bug

“Debugging is twice as hard as writing the code in the first place. Therefore, if you 
write the code as cleverly as possible, you are, by definition, not smart enough to 
debug it.”
— Brian W. Kernighan and P. J. Plauger, in The Elements of Programming Style.

If you’ve looked everything over, and found cables all snugly connected and semicolons happily hugging the ends of statements, then you have to start testing various hypotheses about the source of the bug.
This step -- changing your project to test it -- is the source of the worst debugging problems. 90% of the frustration of debugging comes from the difficulty of understanding bugs. The other 90% comes from things you break as a side-effect of looking for them. You should therefore regard the process of change/test/fix as the method of last resort (even though you will use it frequently.)
When you start changing things, you want to make the minimum number of changes, and to leave the project in the most easily restorable state you can.
### Debugging #1: Can you work in one direction?
Imagine you have a light at home that doesn’t turn on. You could have a burned out bulb, a bad switch, a break in the wire, a bad plug, or an outlet with no electricity. If you test these possibilities willy-nilly, you may remain in a state of confusion -- if you flip the switch to test it without knowing if the power is out, or the bulb is burned out, you’re guessing, not testing.
Can you find a chain of events that need to unfold for the lightbulb to light, and then start testing from one end or the other? Is there power at the outlet? Or: Does the bulb work if you screw it into another lamp? Testing this way, you can at least work methodically -- if there is no power, then you may or may not have problems elsewhere (it's possible for the power to be out and for the bulb to be burned out), but you know at least one thing you have to fix.
Complex systems with internal breakpoints and branches offer a modified version of this method -- start in the middle. If you have a variable coming in from the web, getting acted on, stored, combined with other variables and being returned, you could ask “Is this variable getting stored properly?” If so, then your problem appears after the database has recorded the value. If not, then at least one problem is upstream of the database. The best way to leave your system restorable after understanding your bug is to know which parts you don't have to check.
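The question "is the value right at this point?" can be turned into code by giving each step in the chain its own check. The following is only a sketch: the stage names, the checks, and the helper `first_bad_stage` are all hypothetical, and a real project would substitute its own steps.

```python
def first_bad_stage(value, stages):
    """Run (name, step, check) stages in order; return the first name whose
    output fails its check, or None if every stage looks right."""
    for name, step, check in stages:
        value = step(value)
        if not check(value):
            return name
    return None


stages = [
    ("parse", lambda text: int(text), lambda v: isinstance(v, int)),
    # Bug planted for the demo: the value is stored as a string, not a number.
    ("store", lambda n: {"reading": str(n)},
     lambda row: isinstance(row["reading"], int)),
    ("combine", lambda row: row["reading"] * 2,
     lambda total: isinstance(total, int)),
]

print(first_bad_stage("21", stages))  # store
```

Once a check like this points at "store", you know the parsing step can be left alone, and that nothing downstream can be trusted until storage is fixed.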
### Debugging #2: Process of Elimination
This is the meta-version of starting from one end. Make a list of things that can go wrong. Which of them can you check off? Because we are most focussed on the parts of the system we have been working on most recently, we often assume that other parts of the system are fine. But are they?
What aspects of your system could fail in the first place? How many moving parts do you have, literally or figuratively? Are you getting the input from the environment you need? Are you waiting for output that isn’t appearing? Make your own checklist: Power? Signal? Network? Inputs arriving? Right values? etc. Then start to test the elements one at a time.
Make sure, as you are eliminating possible sources of error, that you do so non-destructively. Comment out code rather than deleting it; unplug only one end of a connecting wire rather than removing the whole thing (or add a switch to allow you to selectively "unplug" the component you suspect.) After you find your bug, make restoring your project to a working state as easy on yourself as you can.
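A checklist like this can also live in code, so that running it is cheap and nothing gets skipped because you "know" it works. Below is a minimal sketch; the helper `run_checklist`, the thresholds, and the stand-in `readings` are assumptions for illustration, and real checks would query your own hardware or services.

```python
def run_checklist(checks):
    """Run each (name, check) pair in order and return {name: passed}.
    A check that raises an exception counts as a failure."""
    results = {}
    for name, check in checks:
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


# Stand-in readings; a real project would measure these.
readings = {"voltage": 4.9, "last_packet_age_s": 42.0, "sensor_value": 512}

checks = [
    ("Power?", lambda: readings["voltage"] > 4.5),
    ("Signal?", lambda: 0 <= readings["sensor_value"] <= 1023),
    ("Network?", lambda: readings["last_packet_age_s"] < 5.0),
]

for name, passed in run_checklist(checks).items():
    print(f"{name:10} {'ok' if passed else 'CHECK THIS'}")
```

Nothing in the project is altered by running the checks, which keeps the elimination non-destructive.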
### Debugging #3: Write your own error reporting
Our technical systems tend to be fairly quiet when they work well, but they also tend to be quiet when they work badly. For many bugs, what you’ve built will go along happily doing something, just not what you want.
In cases like these, you will need to add your own error messages. The canonical (and stupid) error reporting is adding println "got here!" inside a loop or test you think may not be executing. While this method can work, you can't use it twice in the same code (how will you know which here you got?), you can't use it to test anything other than 'works/doesn't work', and you'll have forgotten what it means the next day.
Home-grown error messages are another form of commenting, and, like comments, you want them to be short but informative. Not "got here!" but "loop: i=" + i. This latter will not only tell you whether the initial loop test passed, but what's happening as it runs, which helps test for "Off by 1" errors and the like. You can also put the value of more than one variable on a single line, and have more than one such error message printing at the same time.
Unlike comments, however, error reporting tends to be transient, testing for problems you will later fix. It's best to label these error reports with a specific style of comment, ###BUG CHECK or similar, so you can later find and remove those tests.
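Here is one way the labelled, informative report could look, sketched in Python. The helper `bug_check`, the `DEBUG` switch, and the example function `sum_first` are all invented for illustration; the point is only that each report names where it came from and what the values were, and that every call carries a tag you can search for later.

```python
DEBUG = True  # flip to False (or delete the tagged lines) once the bug is fixed


def bug_check(label, **values):
    """Format a labelled report such as 'BUG CHECK loop: i=0 total=3'."""
    details = " ".join(f"{name}={value}" for name, value in values.items())
    message = f"BUG CHECK {label}: {details}"
    if DEBUG:
        print(message)
    return message


def sum_first(items, count):
    total = 0
    for i in range(count):
        total += items[i]
        bug_check("sum_first loop", i=i, total=total)  # BUG CHECK
    return total


sum_first([3, 4, 5], 2)
```

Because the loop index is printed on every pass, an off-by-one error shows up as one report too many or too few.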
### Debugging #4: Are you looking at what you are thinking about?
Take this pledge:
I, [your name here], promise that when the changes I am making don't show up in the 
program I am trying to fix, I will check whether...

[ ] I have saved the file
[ ] I have uploaded the saved version
[ ] The files I am editing and viewing have the same name

...before I pound my head on the keyboard in frustration.

Sometimes, your bug is that you are looking at something different than you are working on. If you change a file but don’t save it, then the version you are testing will always have the same bug. If you save a file but don’t upload it, the version on the server will always have the same bug. If you rename a file but keep testing the file with the old name, the version you test will always have the same bug. In these situations, 100% of your attempted fixes will fail.
The worst part of this sort of bug isn’t frustration and lost time. The worst part is that this mistake can make you stupider. If you think you see a bug in one of your loops, and you change it, but the change doesn’t fix the bug, you can end up deciding that loops don’t work the way you thought they did. If your actual mistake is about testing the wrong thing, you can end up convincing yourself that you know less about loops (or whatever) than you actually do. You won’t just have wasted time testing the wrong thing, you will have weakened your understanding of core concepts as a dreadful side-effect.
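When you can get at both copies of a file (the one you are editing and the one you are actually testing), comparing fingerprints settles the question in seconds. This is a sketch under that assumption; the helper names are hypothetical, a file on a remote server would have to be downloaded first, and no script can notice changes you have not yet saved.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_contents(edited_path, tested_path):
    """True when the file being edited and the file being tested match exactly."""
    return file_fingerprint(edited_path) == file_fingerprint(tested_path)
```

If `same_contents` returns False for the two paths you believed were the same file, the bug you have been chasing may not be in your code at all.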
### Debugging #5: Can you create a spike test?
One theory of programming, called Agile, suggests that you should never start a task without knowing roughly how long it should take. One method of creating a good estimate for a new problem is something they call a ‘spike solution’, a piece of code that doesn’t solve your problem, but solves a related, smaller one, which will help you estimate how long it will take to solve the big one.
A similar process is available for debugging. If you have a variable that is not updating, or a sensor that is not triggering, you can slowly comment out sections of the code or unplug connections you think might be responsible for the failure, but given that your problem almost certainly has a single, fairly obvious source, you can also simply create a new tiny project that only does one thing -- tests and updates the variable, or watches for sensor input, et cetera.
Creating a parallel test of what you think is happening will make you take a deep breath and explain the problem to yourself. This is likely to be both more educational and less destructive than altering a partially working project. After learning what is wrong from your spike test, you can go back and look at the main body of the code with new eyes.
There is no hard and fast principle for when to change the project in place vs. when to create a spike test, but there are two rules of thumb:

1. You should stop fiddling with the existing project and instead create a spike test earlier than you think.
2. Rule #1 remains in effect even when you already know Rule #1.
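As an illustration of how small a spike test can be, here is a standalone sketch for the "counter that is not updating" case. The function `update_count`, the threshold of 500, and the hand-picked readings are assumptions made up for the example; the idea is to lift the one suspect piece of logic out of the project and feed it inputs whose correct answer you already know.

```python
def update_count(count, previous, current, threshold=500):
    """Add one to count when the reading rises across the threshold."""
    crossed = previous < threshold <= current
    return count + 1 if crossed else count


# Feed hand-picked readings through the one piece of logic under suspicion.
readings = [100, 480, 520, 700, 300, 510]
count = 0
previous = readings[0]
for current in readings[1:]:
    count = update_count(count, previous, current)
    previous = current

print(count)  # 2 rising crossings: 480 -> 520 and 300 -> 510
```

If the spike behaves correctly, the logic is probably fine and the problem lies in how the real project feeds it; if it misbehaves, you have found the bug in a dozen lines instead of several hundred.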

## Defensive Driving

“The cheapest, fastest, and most reliable components are those that aren't there.”
— Gordon Bell
“Deleted code is debugged code.”
— Jeff Sickel
“If you're willing to restrict the flexibility of your approach, you can almost always do something better.”
— John Carmack

In the same way that carrying an umbrella reduces the chance of rain, assuming in advance that you will have to debug anything you build will reduce both the number of bugs and the amount of time you spend on them.
### Defense #1: Assume failure
As you will have discovered by now, the chances of you having a bug approach 100% as the complexity of your project rises to even moderate levels. Things that happen 100% of the time are things you can plan for; you will spend at least some time on your project debugging. Might as well get good at it.
Almost all of the 'tech hygiene' you learn wouldn't be needed if most things mostly worked most of the time. The reason you should write descriptive code and comments, keep wires untangled, and alternate building and testing throughout is precisely because you know you will at some point have to stop making and start fixing, and when that happens, you want to keep the Artist::Plumber ratio high.
### Defense #2: Premature Optimization Is The Root of All Evil
This is a technological version of “The Perfect Is The Enemy of the Good”, first put forward by Don Knuth, the person who has thought harder than anybody about elevating the practice of creating software.
It’s tempting, as you build, to create sophisticated, compact versions of the work, to streamline code and move components off the breadboard. This can seem cost-free when you are doing it -- you keep your project the same, while making it sleeker, or faster, or whatever it is you are optimizing for.
However, optimization is emphatically not cost-free. Optimized code is often more difficult to read -- control flow is less obvious, clever routines are harder to understand -- and optimized hardware is more difficult to see or fix -- tightly bundled components are harder to inspect, hardened hardware can’t be changed as easily. Some optimizations are required -- sometimes things have to go on a custom board to fit in your enclosure, sometimes code has to be fast to be good -- but these instances are rarer than you think, especially in the prototyping stage.
The very seductiveness of optimization -- you’re making things better without making them different -- should be a clue that this kind of work often isn’t helping much. At best, it's a pleasing waste of time, like picking a screensaver. At worst, it can break things.
Ask yourself "What is the simplest thing that could possibly work?" Do that, and only after that should you worry about even essential optimizations, and you should get in the habit of regarding most optimizations, especially for prototype work, as unneeded unless proven otherwise.
### Defense #3: Test often/Only do one thing between tests.
Working on several things at once often feels like progress: “I’m fixing the if/then test, renaming database fields, changing the LED blink rate, and improving the interface! So productive!”
Unfortunately, some work is not progress. If you change more than one kind of thing between tests -- if you change if/then tests and database logic in one work session -- then you don’t just have two places to look for the error, you may in fact have introduced an error in the interaction between two or more other errors, and interacting bugs are far harder to find than isolated ones.
As you work, test everything that could possibly break, and only do one kind of work between tests. If you are changing if/then tests, don’t touch the database until after you test them, and vice-versa.
This will seem annoying at first. If your project has lots of interacting parts, it's tempting to think you can make all of them better in tandem. You can’t. Whatever short-term advantage you gain from working in several places at once, the long-term difficulty of fixing the problems introduced in this way will cost you that much time and more.
Change the program. Test. Change the database. Test. Change the interface. Test. Repeat chorus. In the short-term, you'll catch errors more quickly. In the medium-term, you'll get good at making and running quick tests (an essential skill.) In the long-term, you'll learn as you go.
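A quick test only gets run after every change if it takes seconds. The sketch below shows one possible shape, using two invented helper functions (`blink_interval_ms`, `clamp_brightness`) as stand-ins for whatever you just changed; none of this comes from a particular framework.

```python
def blink_interval_ms(rate_hz):
    """Milliseconds between blinks for a given blink rate."""
    return 1000.0 / rate_hz


def clamp_brightness(value):
    """Keep an LED brightness within the 0-255 range."""
    return max(0, min(255, value))


def run_quick_tests():
    """Rerun after every single change; returns the number of checks passed."""
    cases = [
        (blink_interval_ms(2), 500.0),
        (blink_interval_ms(4), 250.0),
        (clamp_brightness(300), 255),
        (clamp_brightness(-5), 0),
    ]
    for actual, expected in cases:
        assert actual == expected, f"expected {expected}, got {actual}"
    return len(cases)


print(run_quick_tests(), "checks passed")
```

If you change the blink rate logic and a check fails, the culprit is the one thing you touched since the last run.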
### Defense #4: Back up your work. (Use git. Then use github.)
The Circles of Debugging Hell are:
1. I have a bug. 
2. I have a bug, and I introduced new bugs looking for the old one. 
3. I have a bug, I introduced new bugs, and I destroyed my only working version trying to fix them.

If you've never caused major damage while trying to fix a minor bug, it's difficult to convey the feeling -- let's just say it's not a parade of rainbow-flavored unicorns -- but this is exactly what a narrow mindset of 'find and fix' can lead to.
It's easy to perseverate on isolating a minor bug so completely that you forget your original goal: create a flying robot army (or whatever.) As a result, you can end up deleting or disconnecting routines and components that have no easy means of being reconnected short of building the project up from scratch again.
Preparation #2, above, says "Take a snapshot", as a way of ensuring you don't go backwards, but as with testing as you go, not just when a bug makes you stop, the serious way to approach this problem is to take snapshots all the time, to link the acts of writing your code, testing your code, and backing up your code in a tight loop.
There used to be a whole literature on backing up and managing code, with subtle distinctions between different version control systems, but all that is over now. There is one right answer: use git.
## Conclusion: Mindset first. Then tools.

It’s not a secret handed out at institutions of higher education, it’s just how things work: 
you begin with a lack of understanding about a topic, and a need to solve a problem in that 
topic area. The honest, sustainable means of doing so is to improve your understanding. 
This is achieved by:

Formulating a question which, when correctly answered, will improve your understanding in some way; 
then: Attempting to answer it.

Matt Gemmell, “What Have You Tried?”

This document presents strategy, not tactics. Debugging software and debugging hardware require different tools and techniques, as do debugging local versus networked projects, debugging databases vs. interfaces, and so on. There are many specific tools to use in debugging -- oscilloscopes for visualizing electrical signals over time, code debuggers that will allow you to step through your code one line at a time -- and many books written on the particulars of debugging individual programming languages.
There are also more complex strategies for producing fewer bugs in the first place. Pair programming involves two programmers working together on the same code, so they talk through potential issues before they arise. Test-driven development puts debugging at the heart of development; you write tests to check that the program behaves as expected before you start coding. And so on.
These are power tools, and many of them are great, but for most bugs you will face, especially early on, they can be overkill. When you need more complex tools or deeper process, you’ll know. Even the best tools and the most sophisticated techniques, however, require a user in the right frame of mind.
Debugging is more about updating your mind than updating your project. You never have a bug -- while you’re working, it’s always your bug, because it’s always related to the gap between the world and your picture of it. Your job, whatever tools and techniques you end up using, is to update that picture to reflect reality. Debugging is the act of turning mysteries into mere errors, and then fixing those errors.
“If you give someone a program, you will frustrate them for a day. 
If you teach them how to program, you will frustrate them for a lifetime.”
— David Leinweber