Skip to content

Instantly share code, notes, and snippets.

@slavapestov
Forked from erg/CodeReview.md
Last active February 14, 2019 12:44
Show Gist options
  • Star 5 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save slavapestov/9ddc0a3b79b7be8a69e38f50e72ee46f to your computer and use it in GitHub Desktop.
Save slavapestov/9ddc0a3b79b7be8a69e38f50e72ee46f to your computer and use it in GitHub Desktop.
Code Review

Introduction

The primary goals of code review are the following:

  • To ensure that coding conventions and defensive programming practices are actually enforced.

  • To ensure that as code evolves, a consistent guiding philosophy is followed even if multiple people work on it or the original authors leave.

  • To ensure that each piece of code is understood by at least two people. One of the worst "anti-patterns" in a team environment is when all changes to a certain module have to be done by one person, who invariably ends up overworked and delaying everything else (or worse, burns out as a result of this, and quits!).

  • To have an extra pair of eyes comb over each piece of code, to catch both obvious and subtle problems.

  • To ensure a correspondence between logical changes and patches. If a logical change is broken up into too many small patches, or if a single patch is too large, the reviewer will have a hard time.

It is also important to realize what code review is not. In particular, code review is not a substitute for proper testing!

Make time for code review

You're up against a deadline and you need to get two more bug fixes and refactorings in before you can ship. Your coworker is furiously trying to add that last remaining widget to the dashboard. You don't have time to wait for a review from them, and they don't want to wait for a review from you. This is a familiar feeling for many, but you must resist the temptation to cut corners and skip review or "rubber stamp" patches without studying them in detail.

If you're waiting on a review, you can go work on the next patch in the series, or if you feel the review might radically change the direction of the patch, work on something else altogether.

If you have too many incoming reviews and you don't have time to look at all of them, nominate another reviewer instead of just letting them sit in your queue. But really, you should make time to catch up on code reviews instead of pushing back all the time. It is best to spend an hour or two reviewing several patches and really thinking them through.

On the flip side, if you sent a patch for review and have not received a response in a timely manner, feel free to ping the reviewer, walk over to their desk, or even escalate to their manager, depending on the severity of the situation.

How to pick reviewers

Code review systems either ask you to nominate reviewers when sending a patch out for review, or have a configured set of reviewers for each module. In both cases, each piece of code should have an owner who can make decisions about the future direction of the code, and promptly review any changes. You should also have as reviewers the owners of any big interfaces your change is consuming or exporting.

Some code review systems have two types of approvals, sometimes called +1 and +2. +1 means "the part of the code I understand looks good". +2 means "everything looks good". You might give a +1 if you own a small piece of the code being changed, and a +2 if you are the overall lead for the project.

If you're working on a module and can't find anyone willing to review it, this is a major red flag. Perhaps you should take ownership of this module, and find someone else to mentor and eventually help you with the reviews! But really part of what makes a good manager is preventing these situations from occurring by keeping track of who owns what as people leave and join the team.

Knowing where to draw the line

When reviewing code, do not approach it with a mindset of "if I was to write this code, what would it look like." and proceed to nitpick every single subjective decision based off that. Instead, think "if I was maintaining this code, what changes would I like to see". Don't get attached to one particular approach. One great thing about code review is you get to see other ways of solving a problem.

Sometimes, the reviewer may suggest refactorings and changes that the author feels are beyond the scope of a single patch. In this case, the author should feel free to push back and address the changes in a future patch, or just file a bug to look at later. Probably if a review is taking more than 4 days, or involves more than 4 revisions of the patch on the part of the author, there is a problem. Either the patch is too big, or the reviewer is being too pedantic. In either case, the changes can be addressed in future patches.

Reviewing the commit message

When reviewing a change, it is equally important to review the commit message itself. Even if the code is beautifully written and clean, you might not understand why the change was made if you're trying to bisect an obscure regression a year later. We've all been there.

Here is a bad commit message:

Starting work on YahooBus::barf(), don't forget to clean this
up to fix a tricky race conition later. Note the sutble usage
of 'if'.

This commit message gives no indication as to why the change is being done in the first place. It doesn't explain what features or subsequent changes this enables. It alludes to potential problems in the code, suggesting that this change is not self-contained and incomplete on its own. Finally, there are two typos in there!

Here is a better commit message:

Initial implementation of YahooBus::barf(), and tests.

Due to an influx of social media iOS app startups, our
neighborhood has been taken over by smartphone-toting hipsters,
driving up the cost of housing and creating demand for $4 toast.
We wish to stage a demonstration where we climb on top of a
Yahoo shuttle bus and proceed to vomit on the windshield of the
bus.

Note that while the tests pass, there is a subtle race condition
in YahooBus::BarfThread::notify(); if the 'complete' flag is set
before this method is called, the BarfThread will hang forever.
Fixing this requires changing the semantics of
ThreadPool::enqueue() and will be done in a subsequent patch.

Tested:
- new unit tests pass
- manually ran integration test and observed vomit on windshield

Note that unlike the first commit message, this one has the following:

  • First paragraph is a single line with a summary of the change.

  • Clear justification is provided for the change.

  • The potential issue is clearly explained.

  • A summary of how the change was tested.

  • Proper grammar and spelling.

Not every commit message needs all of these, and some are trivial enough that a one-line summary is all that is needed.

Reviewing comments

Reviewing comments is very important, because a misleading comment can give a false sense of security and is worse than no comment at all.

Is the comment in the right place? Sometimes during development we move code around but forget to move a comment.

Does the comment actually say anything useful? Bad comments allude to something only the author of the code knows, for example:

// Watch out, subtle logic below!

Or they repeat an obvious property of the code:

/* Add op to retry queue */
list_add_tail(&op->list, &dev->retry_queue);

Can the comment be turned into an assertion?

// Note that @tier must be between -1 and 3

Can the comment be eliminated entirely by cleaning up some code or renaming things?

// When we say 'frobnitz' below we really mean 'storage device'

Opinions differ on TODO comments -- but in some cases it is clear that a TODO should really be a bug in a bug tracker or a todo list item in a document somewhere. The reviewer should point these cases out.

Each major data type should have a comment explaining how it is implemented, what it is used for, and what the main operations are.

When a new lock is added, comments should clearly document what state the lock is protecting. Lock order should be documented somewhere obvious, like at the top of a file.

Header files that are meant for consumption by customers or other projects should also have more attention paid to documenting the interface.

Reviewing assertions

Assertions should be a frequently-used tool in any programmer's toolbox. They are essentially executable comments. In the Linux kernel, examples of assertions include BUG(), BUG_ON(), WARN_ON(), lockdep_assert_held(), etc.

Assertions can be used to validate function parameters, return values from other functions, as well as global state before and after an operation. The latter is the most useful form of assertion.

Generally, you want to implement your interface in a way that can fail gracefully when given bad inputs, usually by returning an error code (of course there are exceptions; in C, you cannot really detect a "bad" pointer for example). Code should also try to not pass invalid data to the layers below.

The reviewer should look at each exported function and think through the control flow. If I pass in an invalid parameter, will it fail in a deterministic manner or just write past the end of an array somewhere?

In the kernel, the same set of error codes are used for many different things. For example, a common pattern is:

ret = foo_bar_baz();
if (ret)
    return ret;

ret = bing_bong_bang();
if (ret)
    return ret;

If I call this function and it returns -EIO, how do I konw which one of foo_bar_baz() and bing_bong_bang() returned the error code? In general, I don't, and the reviewer should keep an eye out for this type of ambiguity. It can be solved by adding log messages or counters. Sometimes it is obvious what went wrong and this type of thing is overkill. The reviewer should think through all the possibilities.

The big picture

Before diving down into individual functions, take a look at the overall changes. Are the new functions in the right place in each file?

Are any new abstractions or concepts being introduced? Are they properly named? Can they be simplified or eliminated away? Check for code duplication, either in the patch itself or between something in the patch and another piece of code the reviewer knows about.

The author of the patch might be focused on solving their particular problem but the reviewer should have a more long-term view. Ask yourself this: if the code were to receive 10 more patches like this one, would it be a jumble of overlapping concepts with unclear interactions, or would it actually get simpler over time?

When the reviewer points out anything they don't understand about the code, the submitter should try to address the questions by refactoring the code and adding comments, rather than just answering them inline in the code review. Remember, the reviewer will not be the last person to look at the code. If they have questions while reviewing the code that are not answered in the code itself, the next person to maintain the code will have the same questions (or worse, they won't ask and instead make potentially incorrect assumptions).

Looking at individual functions

When the reviewer is trying to get a big picture overview of the patch, it is okay to skim the patch at first and glaze over some of the details. However at some point you really do need to read the code line by line and understand it all. If you're not willing to do that, either nominate another reviewer or give the patch a "+1", ensuring the author must find someone else willing to do a detailed review who can give a "+2".

Can you see any functions whose return values are obviously being ignored incorrectly?

Look at error paths in the function. Does it look like every reasonable error is handled? Do the error paths perform the right cleanup steps in the right order? Did the author write sufficient tests for these? Is there an obvious way to perform fault injection?

Related to this are paired calls. If the function takes a lock, does it release the lock at the right time? What about memory allocations? Indeed, is the function taking the correct locks in the correct order? If it wasn't, would a maintainer be able to figure out which locks to take by reading comments explaining the data structures in play here?

Is the function called from any special contexts that the author might not be aware of? For example, in the Linux kernel, watch out for changes that introduce sleeping to existing functions. They might be called from atomic context.

If something goes wrong at runtime, does the code give enough indication? The reviewer should pay close attention to error messages, counters and so on.

Little things

Keep an eye out for one-off debugging code, formatting errors, or changes that are not supposed to be part of the patch. These are all very easy to miss for the author. After several rounds of refactoring the patch, one's eyes tend to glaze over such details.

Reviewing tests

The reviewer should ensure that any new features being added are actually covered by tests. Are the new tests sufficiently general? Do they depend too much on implementation, are they flaky due to timing?

Remember that tests are actually another form of documentation for the code under test. Test code should be held to the same high standard as the rest of the codebase when it comes to reusability and abstraction. As such, they should be reviewed in detail as well. In fact, by looking at the tests you can often notice if the author has a misunderstanding about the requirements of the code they are implementing. If the tests don't make sense, there is a good chance the code doesn't either.

Parting Thoughts

Code review is an art, not a science. All of the above are based on the limited experience and subjective preferences of the author of this article. Depending on the project and the team, you might be able to disregard most of the advice above and focus on a completely different set of concerns.

So if you only take away one lesson from this article, I hope it is the following: code review is critical for ensuring code quality. Make time for code review. It might appear to be a pointless busywork at first, but you will learn to appreciate it once as soon as you have to work on a codebase more than a year old, where the original authors have moved on to other projects, and customer requirements have changed several times leading to a build-up of complexity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment