jorendorff/how-to-fix-bugs.md

## how-to-fix-bugs.md

      
How To Fix Bugs

A talk for new programmers about debugging.

Introduction

(The talk starts with some discussion about bugs.
This happens before the first slide.)


What is programming like?


How much of the effort is fighting syntax errors
and trying to figure out what error messages mean?
(You'll do less and less of that as you gain experience.)


How much of the effort is fixing bugs?


What are some bugs you've found?


When you find a bug, and you figure out how to fix it, how do you feel?


What not to do

You're writing a program, and it isn't doing what it's supposed to do.
Now what?
Let me first make a list of things you shouldn't do.
(slide: "what not to do" and the three points below, revealed one by one)

Feel discouraged.

Don't feel like you just aren't cut out for programming.
The last thing in the world you should feel discouraged about is having a bug.
We all have bugs.
And the great thing about having a bug is
you can find it.
And you can understand it.
And you can fix it.
And that's going to make you feel good.
Keep at it.

Smush the code around until it goes away.

You'll want to go to the part of your code you understand the least,
and where you therefore think the problem probably is,
and just change stuff around until it works.
We know this is not good, right?
If it ever works, it'll work by chance.
It might take a long time.
And you won't have learned anything.
(If you feel the urge to do this, take a break instead.)

Fix it without understanding.

One step up from that is when you know you've got
a particular chunk of code that's wrong.
(slide: a bunch of code, highlighted region, red arrow, "bug is somewhere in here")
You don't understand exactly why it's wrong,
or exactly which part is wrong,
but you know which part of the program contains the bug.
(slide: animates to illustrate the kind of change described below)
At this point there's a temptation to add some more code
to make it work in this one case,
without first understanding the bug.
You'll just add an if statement. It'll say,
"if we've got the one buggy case, chunk-of-brand-new-code-that-works;
otherwise, all-your-existing-code".
Done!
Why is this bad?
Well: what happened to the bug?
Did you remove it?
No. This code still contains the bug.
That is, there's something wrong in here, and you don't know what it is.
And: what about the new code?
Should you be more confident about the new code than the old code?
Why? Because it's new?
Because you haven't seen it go wrong yet?
And: how many times can you do this?
It makes your code longer and more complex every time.
Sometimes the fix to a bug is new code.
More often the correct fix is to change buggy code.
Either way, if you don't take the time to understand the bug first,
you're headed for trouble.
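
Here is a minimal sketch of that temptation in Python. The page-counting functions and their names are hypothetical, made up for illustration; they are not from any real project. Assume the bug report said "10 items at 3 per page shows 3 pages, but there should be 4".

```python
def page_count_buggy(total_items, per_page):
    # The bug: floor division drops a partly filled last page.
    return total_items // per_page


def page_count_patched(total_items, per_page):
    # The tempting "fix": special-case the one input from the bug report.
    if total_items == 10 and per_page == 3:
        return 4
    # All the existing code, untouched. The bug is still in here.
    return total_items // per_page


def page_count_fixed(total_items, per_page):
    # The fix you get after understanding the bug: round up, not down.
    return (total_items + per_page - 1) // per_page
```

The patched version passes the reported case, but 7 items at 2 per page still comes out as 3 pages instead of 4. The fixed version is no longer than the original, because the fix was to change the buggy code, not to add code around it.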

Debugging techniques

(slide showing the three bad ideas above)
So these are the pitfalls.
The rest of this talk is about what I do,
when I have a bug,
which is pretty much all the time.
(blank slide)
Finding a bug is a detective story.
(random scrolling code, perhaps)
Somewhere in all this code,
these characters supposedly familiar to you,
characters you probably entered yourself,
is somebody who's up to no good.
The murderer is among us, right?
Hiding in plain sight.
We are going to be gathering clues.
We are going to go down a lot of blind alleys.
We will question our assumptions.
We will drink a lot of coffee.
We will reach a deeper understanding of the characters in the case.
We will uncover their dirty secrets,
and in the end we will get our bug.
(blank slide)
Like any good detective, we start at the scene of the crime.

Get steps to reproduce

If you ever have the good fortune to work with a QA engineer
(QA stands for "quality assurance"),
they will send you bug reports
that contain STR, which stands for "steps to reproduce".
What does this mean?
Reproducing a bug is just making it happen again.
A bug is reproducible if there's a simple set of steps
that you can carry out to see the buggy behavior.
That's what steps to reproduce are.
They're the main ingredient of a good bug report.
(The other ingredients are
"what I expected" and "what actually happened".)
Maybe all you have to do is type rake, and there's the bug.
A test fails. It's that simple.
That would be ideal.
But sometimes you're just running your program and poking at things--
you're not running tests at the moment--
or maybe a customer is running the program and sees some weird behavior.
So now you have to do a little bit of work,
not much work,
to make sure there's really a bug,
and understand what the buggy behavior is
and exactly what circumstances cause the bug to happen.
I don't know if this has ever happened to you, but sometimes I see a bug,
and I start looking into it, and then I change some stuff.
And I run the program again, and the bug doesn't happen.
I didn't fix it; I know there's still a bug;
but it isn't happening anymore.
Well, that's not good.
So then I change all the stuff back to the way it was, and run the program again,
and the bug still doesn't happen.
It mysteriously disappeared.
And I don't even really know what exactly I changed.
This isn't good.
In fact, this is stupid.
You never want this to happen.
If you were in a detective novel, this would be like
having the dead body disappear while you're in another room.
You come back, you're like, "...Oh.
Well gee, I think a crime was committed here,
but I didn't take any pictures or collect any evidence..."
Embarrassing, right?
The first order of business is to get steps to reproduce.

Make a test case

Can you make it into a test case?
If you can't, it's because you're not set up for testing all the things.
You need to fix that.
Figure out how you can write an automated test for this bug.
Make a test case.
If nothing else, at least you'll know when it's fixed!
Is the test case minimal? Eliminate inessential parts of the test case
to focus on what exactly is busted.

Get more information

At this point you have a minimal test case. Great!
Now you can start gathering clues.
There are two key things to realize about the situation you're in.


The reason there's still a bug, the reason it's not clear what's
wrong, is that you can't tell what the program is doing.
It's a lack of information.
All you see is the program's output.
You don't see which statements executed in what order,
or what the intermediate values of all the variables are.
Somewhere, something clearly went in an unexpected direction.
If you can just get the right insight
into what your code is doing when it runs,
you'll find the bug.


The other problem that can make it hard to see a bug
is a flawed assumption.
Bugs happen because something unexpected is happening,
and what you think you know about the system
can blind you to the problem.
There's no cure for this, but it helps to have a habit
of checking your assumptions as you go.
We'll talk about how to do that in a minute.


From here, you just have to figure out all the clever tricks
for checking your assumptions,
and getting more information,
and getting the exact information you need.
Here are a few:


Learn to read a stack trace.  A stack trace is information about
what your program is doing. Don't waste it!


Dump the data. Insert a print statement
(or whatever the equivalent is in your preferred language)
and run it again.


Tie off the loose ends.
When you see something funny,
even if it's probably not related to your bug,
make a note of it and try to figure out how it happened.
Something funny happening is a clue, and you are a gumshoe.
Track that down.


Work backwards.
Suppose you've got a bug, you've got a test case,
and you can run the code and see the buggy output.
By the time you're looking at that buggy output,
it is probably too late
to get any useful debugging information out of your code.
Computers are very good at throwing away information;
if the actual error was introduced a millisecond ago,
the trail is pretty cold.
But maybe you can add a print statement
a little earlier in the code and run it again,
and get information about what happened earlier.
("ok -- at this point in the code, is everything correct,
or have we already screwed up?")
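Even better than working backwards:
binary search across the code in question.
Check the state halfway through.
If it's already wrong, the bug is in the first half;
if not, it's in the second half. Repeat.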


Learn the tools. They'll change your life.
This is a whole talk, but let me just show you one thing...
(JS debugger demo)


Stepping and watching variables change is incredibly useful.


Breakpoints and re-running the code from the beginning
make it so much easier to find the point where things go wrong.


Run a little code in the console, see what the key value is.


Watch HTTP traffic.
This is great for checking your assumptions and detecting trouble.
Incidentally it is also a great way to find out
why your web site is so slow to load.


Another powerful tool at your disposal is version control.
Is it a regression? Now that you have a test,
you can use git bisect
to find the exact code change that broke it.
That is a huge clue.
It's like knowing the last person to see the victim alive.
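
The "work backwards" and "binary search" ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pricing pipeline, its planted bug, and every function name here are hypothetical. It also assumes the state is fine before any step runs, broken after all of them, and that once broken it stays broken.

```python
def first_bad_step(steps, initial, looks_ok):
    """Return the index of the first step after which looks_ok(state) is False."""

    def state_after(n):
        # Re-run the first n steps from the beginning.
        state = initial
        for step in steps[:n]:
            state = step(state)
        return state

    good, bad = 0, len(steps)  # fine after `good` steps, broken after `bad`
    while bad - good > 1:
        mid = (good + bad) // 2
        if looks_ok(state_after(mid)):
            good = mid
        else:
            bad = mid
    return bad - 1  # the step that took us from fine to broken


# A made-up pipeline with a planted bug.
def add_tax(prices):
    return [p * 1.1 for p in prices]


def apply_discount(prices):
    # Planted bug: subtracts a flat 100 instead of taking 10% off.
    return [p - 100 for p in prices]


def round_cents(prices):
    return [round(p, 2) for p in prices]


def add_shipping(prices):
    return [p + 5 for p in prices]


def no_negative_prices(prices):
    # The question we ask at each checkpoint: "is everything still correct?"
    return all(p >= 0 for p in prices)


STEPS = [add_tax, apply_discount, round_cents, add_shipping]
culprit = STEPS[first_bad_step(STEPS, [20.0, 50.0], no_negative_prices)]
print(culprit.__name__)  # prints "apply_discount"
```

With four steps this saves nothing over checking each one, but the number of checks grows with the logarithm of the number of steps. `git bisect` applies the same halving idea to commits instead of steps.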


Writing debuggable code

There are a few things you can do to make your code easier to debug.


Check in the test alongside the fix.
That way, if it ever happens again, you'll know right away.


Look for other places where the code might have the same mistake.
Now is a great time to find and fix those!


Write methods to dump useful data.
Check them in, even though your program doesn't call them.
They'll come in handy next time you have a bug.
(There's always a next time.)


Use assertions!
Generally, detect mistakes as soon as possible and throw an error.
This "as soon as possible" is important because
computers are great at destroying all evidence of wrongdoing,
which makes debugging much harder.
"As soon as possible" means "close to the cause";
it means that the stack trace is more likely to point to
the exact line of code where the problem is.
(To be clear: I mean assertions are great in regular code,
not just in tests.)
Nice places for assertions include:


at the top of a method, to check that the arguments are sane.


at the end of any complex code, to make sure the result is what we wanted.
For example: When there are several different cases you have to handle,
but ultimately they're all trying to achieve one thing,
and it's something you can assert,
assert it!


any time you notice there's a common mistake you seem to keep making,
if there's a place where you can put an assertion
to catch that mistake as soon as possible,
do it!


That said, most programmers don't put assertions in every method.
They're visual clutter.
Only assert things that are not obvious.
And work to make your code obviously correct wherever you can.


Make your code as obviously correct as you can.
The key useful thing a program must do is: it has to run.
But there are other useful things a program can do.
A good program helps the reader see that it's correct.
This is easier said than done,
but one thing you can definitely do is,
when you know there's a part of your program
that even you don't understand,
think about how to rewrite it
or otherwise make it clearer.
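
Two of the ideas above, a method that dumps useful data and assertions at the top and end of a method, might look like this in Python. The `Inventory` class is hypothetical and exists only for illustration.

```python
class Inventory:
    """Hypothetical example class: tracks how many of each item are in stock."""

    def __init__(self):
        self._stock = {}  # item name -> quantity on hand (always positive)

    def dump(self):
        # Debugging aid: a readable snapshot of the internal state.
        # Nothing in the program has to call this; it's here for next time.
        lines = [f"{name}: {qty}" for name, qty in sorted(self._stock.items())]
        return "\n".join(lines) or "(empty)"

    def add(self, name, qty):
        # Top of the method: check that the arguments are sane.
        assert qty > 0, f"add() needs a positive quantity, got {qty!r}"
        self._stock[name] = self._stock.get(name, 0) + qty

    def remove(self, name, qty):
        assert qty > 0, f"remove() needs a positive quantity, got {qty!r}"
        assert self._stock.get(name, 0) >= qty, f"not enough {name!r} in stock"
        remaining = self._stock[name] - qty
        if remaining == 0:
            del self._stock[name]
        else:
            self._stock[name] = remaining
        # End of the method: two cases above, one thing both must achieve.
        assert all(q > 0 for q in self._stock.values()), self.dump()
```

If `remove` is ever called with more than is in stock, the failure shows up at that call, with a stack trace pointing at the caller, instead of surfacing later as a mysterious negative quantity. One caveat specific to Python: `assert` statements are skipped when the interpreter runs with the `-O` flag, so they suit catching programmer mistakes, not validating user input.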