slavapestov/CodeReview.md Secret

## CodeReview.md

      
    Raw
  

              CodeReview.md
            
          
    Introduction

The primary goals of code review are the following:


To ensure that coding conventions and defensive programming
practices are actually enforced.


To ensure that as code evolves, a consistent guiding philosophy is
followed even if multiple people work on it or the original authors
leave.


To ensure that each piece of code is understood by at least two
people. One of the worst "anti-patterns" in a team environment is when
all changes to a certain module have to be done by one person, who
invariably ends up overworked and delaying everything else (or worse,
burns out as a result of this, and quits!).


To have an extra pair of eyes comb over each piece of code, to catch
both obvious and subtle problems.


To ensure a correspondence between logical changes and patches. If a
logical change is broken up into too many small patches, or if a
single patch is too large, the reviewer will have a hard time.


It is also important to realize what code review is not. In
particular, code review is not a substitute for proper testing!
Make time for code review

You're up against a deadline and you need to get two more bug fixes
and refactorings in before you can ship. Your coworker is furiously
trying to add that last remaining widget to the dashboard. You don't
have time to wait for a review from them, and they don't want to wait
for a review from you. This is a familiar feeling for many, but you
must resist the temptation to cut corners and skip review or "rubber
stamp" patches without studying them in detail.
If you're waiting on a review, you can go work on the next patch in
the series, or if you feel the review might radically change the
direction of the patch, work on something else altogether.
If you have too many incoming reviews and you don't have time to look
at all of them, nominate another reviewer instead of just letting them
sit in your queue. But really, you should make time to catch up on
code reviews instead of pushing back all the time. It is best to spend
an hour or two reviewing several patches and really thinking them
through.
On the flip side, if you sent a patch for review and have not received
a response in a timely manner, feel free to ping the reviewer, walk
over to their desk, or even escalate to their manager, depending on
the severity of the situation.
How to pick reviewers

Code review systems either ask you to nominate reviewers when sending
a patch out for review, or have a configured set of reviewers for each
module. In both cases, each piece of code should have an owner who can
make decisions about the future direction of the code, and promptly
review any changes. You should also have as reviewers the owners of
any big interfaces your change is consuming or exporting.
Some code review systems have two types of approvals, sometimes called
+1 and +2. +1 means "the part of the code I understand looks good". +2
means "everything looks good". You might give a +1 if you own a small
piece of the code being changed, and a +2 if you are the overall lead
for the project.
If you're working on a module and can't find anyone willing to review
it, this is a major red flag. Perhaps you should take ownership of
this module, and find someone else to mentor and eventually help you
with the reviews! But really part of what makes a good manager is
preventing these situations from occurring by keeping track of who
owns what as people leave and join the team.
Knowing where to draw the line

When reviewing code, do not approach it with a mindset of "if I was to
write this code, what would it look like." and proceed to nitpick
every single subjective decision based off that. Instead, think "if I
was maintaining this code, what changes would I like to see". Don't
get attached to one particular approach. One great thing about code
review is you get to see other ways of solving a problem.
Sometimes, the reviewer may suggest refactorings and changes that the
author feels are beyond the scope of a single patch. In this case, the
author should feel free to push back and address the changes in a
future patch, or just file a bug to look at later. Probably if a
review is taking more than 4 days, or involves more than 4 revisions
of the patch on the part of the author, there is a problem. Either the
patch is too big, or the reviewer is being too pedantic. In either
case, the changes can be addressed in future patches.
Reviewing the commit message

When reviewing a change, it is equally important to review the commit
message itself. Even if the code is beautifully written and clean, you
might not understand why the change was made if you're trying to
bisect an obscure regression a year later. We've all been there.
Here is a bad commit message:
Starting work on YahooBus::barf(), don't forget to clean this
up to fix a tricky race conition later. Note the sutble usage
of 'if'.

This commit message gives no indication as to why the change is being
done in the first place. It doesn't explain what features or
subsequent changes this enables. It alludes to potential problems in
the code, suggesting that this change is not self-contained and
incomplete on its own. Finally, there are two typos in there!
Here is a better commit message:
Initial implementation of YahooBus::barf(), and tests.

Due to an influx of social media iOS app startups, our
neighborhood has been taken over by smartphone-toting hipsters,
driving up the cost of housing and creating demand for $4 toast.
We wish to stage a demonstration where we climb on top of a
Yahoo shuttle bus and proceed to vomit on the windshield of the
bus.

Note that while the tests pass, there is a subtle race condition
in YahooBus::BarfThread::notify(); if the 'complete' flag is set
before this method is called, the BarfThread will hang forever.
Fixing this requires changing the semantics of
ThreadPool::enqueue() and will be done in a subsequent patch.

Tested:
- new unit tests pass
- manually ran integration test and observed vomit on windshield

Note that unlike the first commit message, this one has the following:


First paragraph is a single line with a summary of the change.


Clear justification is provided for the change.


The potential issue is clearly explained.


A summary of how the change was tested.


Proper grammar and spelling.


Not every commit message needs all of these, and some are trivial
enough that a one-line summary is all that is needed.
Reviewing comments

Reviewing comments is very important, because a misleading comment can
give a false sense of security and is worse than no comment at all.
Is the comment in the right place? Sometimes during development we
move code around but forget to move a comment.
Does the comment actually say anything useful? Bad comments allude to
something only the author of the code knows, for example:
// Watch out, subtle logic below!

Or they repeat an obvious property of the code:
/* Add op to retry queue */
list_add_tail(&op->list, &dev->retry_queue);

Can the comment be turned into an assertion?
// Note that @tier must be between -1 and 3

Can the comment be eliminated entirely by cleaning up some code or
renaming things?
// When we say 'frobnitz' below we really mean 'storage device'

Opinions differ on TODO comments -- but in some cases it is clear that
a TODO should really be a bug in a bug tracker or a todo list item in
a document somewhere. The reviewer should point these cases out.
Each major data type should have a comment explaining how it is
implemented, what it is used for, and what the main operations are.
When a new lock is added, comments should clearly document what state
the lock is protecting. Lock order should be documented somewhere
obvious, like at the top of a file.
Header files that are meant for consumption by customers or other
projects should also have more attention paid to documenting the
interface.
Reviewing assertions

Assertions should be a frequently-used tool in any programmer's
toolbox. They are essentially executable comments. In the Linux
kernel, examples of assertions include BUG(), BUG_ON(),
WARN_ON(), lockdep_assert_held(), etc.
Assertions can be used to validate function parameters, return values
from other functions, as well as global state before and after an
operation. The latter is the most useful form of assertion.
Generally, you want to implement your interface in a way that can fail
gracefully when given bad inputs, usually by returning an error code
(of course there are exceptions; in C, you cannot really detect a
"bad" pointer for example). Code should also try to not pass invalid
data to the layers below.
The reviewer should look at each exported function and think through
the control flow. If I pass in an invalid parameter, will it fail in a
deterministic manner or just write past the end of an array somewhere?
In the kernel, the same set of error codes are used for many different
things. For example, a common pattern is:
ret = foo_bar_baz();
if (ret)
    return ret;

ret = bing_bong_bang();
if (ret)
    return ret;

If I call this function and it returns -EIO, how do I konw which one
of foo_bar_baz() and bing_bong_bang() returned the error code? In
general, I don't, and the reviewer should keep an eye out for this
type of ambiguity. It can be solved by adding log messages or
counters. Sometimes it is obvious what went wrong and this type of
thing is overkill. The reviewer should think through all the
possibilities.
The big picture

Before diving down into individual functions, take a look at the
overall changes. Are the new functions in the right place in each
file?
Are any new abstractions or concepts being introduced? Are they
properly named? Can they be simplified or eliminated away? Check for
code duplication, either in the patch itself or between something in
the patch and another piece of code the reviewer knows about.
The author of the patch might be focused on solving their particular
problem but the reviewer should have a more long-term view. Ask
yourself this: if the code were to receive 10 more patches like this
one, would it be a jumble of overlapping concepts with unclear
interactions, or would it actually get simpler over time?
When the reviewer points out anything they don't understand about the
code, the submitter should try to address the questions by refactoring
the code and adding comments, rather than just answering them inline
in the code review. Remember, the reviewer will not be the last person
to look at the code. If they have questions while reviewing the code
that are not answered in the code itself, the next person to maintain
the code will have the same questions (or worse, they won't ask and
instead make potentially incorrect assumptions).
Looking at individual functions

When the reviewer is trying to get a big picture overview of the
patch, it is okay to skim the patch at first and glaze over some of
the details. However at some point you really do need to read the code
line by line and understand it all. If you're not willing to do that,
either nominate another reviewer or give the patch a "+1", ensuring
the author must find someone else willing to do a detailed review who
can give a "+2".
Can you see any functions whose return values are obviously being
ignored incorrectly?
Look at error paths in the function. Does it look like every
reasonable error is handled? Do the error paths perform the right
cleanup steps in the right order? Did the author write sufficient
tests for these? Is there an obvious way to perform fault injection?
Related to this are paired calls. If the function takes a lock, does
it release the lock at the right time? What about memory allocations?
Indeed, is the function taking the correct locks in the correct order?
If it wasn't, would a maintainer be able to figure out which locks to
take by reading comments explaining the data structures in play here?
Is the function called from any special contexts that the author might
not be aware of? For example, in the Linux kernel, watch out for
changes that introduce sleeping to existing functions. They might be
called from atomic context.
If something goes wrong at runtime, does the code give enough
indication? The reviewer should pay close attention to error messages,
counters and so on.
Little things

Keep an eye out for one-off debugging code, formatting errors, or
changes that are not supposed to be part of the patch. These are all
very easy to miss for the author. After several rounds of refactoring
the patch, one's eyes tend to glaze over such details.
Reviewing tests

The reviewer should ensure that any new features being added are
actually covered by tests. Are the new tests sufficiently general? Do
they depend too much on implementation, are they flaky due to timing?
Remember that tests are actually another form of documentation for the
code under test. Test code should be held to the same high standard as
the rest of the codebase when it comes to reusability and abstraction.
As such, they should be reviewed in detail as well. In fact, by
looking at the tests you can often notice if the author has a
misunderstanding about the requirements of the code they are
implementing. If the tests don't make sense, there is a good chance
the code doesn't either.
Parting Thoughts

Code review is an art, not a science. All of the above are based on
the limited experience and subjective preferences of the author of
this article. Depending on the project and the team, you might be able
to disregard most of the advice above and focus on a completely
different set of concerns.
So if you only take away one lesson from this article, I hope it is
the following: code review is critical for ensuring code quality. Make
time for code review. It might appear to be a pointless busywork at
first, but you will learn to appreciate it once as soon as you have to
work on a codebase more than a year old, where the original authors
have moved on to other projects, and customer requirements have
changed several times leading to a build-up of complexity.