@yelirekim
Last active April 10, 2019 05:20
What the commit?

#What is a commit?

In the simplest literal terms possible, a commit represents a change to lines of code in a revision control system, along with a description. The mechanics of most VCSes don't prescribe anything beyond that simple model. It's a concept that has been in use (and misuse) for a little over 40 years, and a central concept in every major version control system ever released. Given that, and the fact that you're reading this document, and the fact that this document is in a revision control system, you're probably already familiar with this concept.

You are probably also familiar with people making "bad commits", but what you might not be familiar with is a team that makes only "good commits", and more particularly the organizational implications of doing so. This document aims to explore those implications.

To build a good conceptual framework for why this helps, we first need to explore some of the problems common to software engineering with version control systems.

##What problems are we aiming to solve?

The ideas presented in this document represent a radical shift in the way most organizations produce and release software. However, these ideas aren't being proposed out of the blue. They fix very real problems. You'll notice that most of these problems are centered around branches. That's because the most radical thing this document describes is a militant aversion to published branches. We'll get to that later, though.

  • Dependency sprawl

    The most common problem that users of version control run into, specifically with branches, is that functionality which exists on one branch is needed on another. It's often infeasible to merge these branches, because one of them may not be ready for release.

  • Implementation divergence

    In long-running branches, it's common for the implementation of classes or functions called from (or by) the changes in the branch to change while the branch is open. This is a different problem from the classic, obvious "I need to resolve these merge conflicts" problem. It means that some of the fundamental assumptions made by the code you just wrote are incorrect, and it often leads to the need for further broad changes to your branch. This problem then compounds itself, because you need to stay branched longer. Even worse, this easily produces bugs which are hard to see at the time. Possibly even worse than that, it creates uncertainty among people working on related swathes of code, and discourages refactoring.

  • Duplication of effort

    The effect of releasing, testing, and working on multiple branches trickles down into a pretty wide variety of the activities central to creating and releasing software. You need to maintain separate machines for deployments and test each one individually. You also need to reintegrate from upstream occasionally, and do this for every branch you maintain. Sometimes an upstream merge to one branch means something different than it does to another, and now those separate reintegration efforts have created an additional problem, which is that they will again conflict when they are pulled back into the upstream.

  • Poor reviewability

    Finally, we get back to the topic of commits. The biggest problem this document will discuss, and the core mechanism by which we can improve the quality of our commits, is code review. Branching, and specifically reviewing the implications of a large merge, is often terribly burdensome and error prone.

##That sounds awful, what can we do about this?

Not to worry. I'm glad you said that, because I have just the thing we need: a plan. It's not a novel plan, but it's a good plan: get rid of published branches.*

We can solve all of the problems above, and more, by just getting rid of branches and working directly on master. However, that obviously raises all sorts of questions, the most notable being: what do you do about releases? Well, it's pretty easy: you just have to make sure that your only branch (master) is always in a releasable state.

When considering the challenges this presents, at first glance, it might be easy to conclude that getting rid of branches isn't such a good idea. But it turns out that all of the steps you need to take in order to make working without branches a reality also directly and substantially improve your software, and improve the processes you use to build and release that software. If you put aside those steps for a moment, and just think about what it means to be highly confident that all of the commits made by everyone in the organization (and the features they contain) are releasable at any time, it's also easy to see the business value in this system.

So, we want to make sure master is in a releasable state at all times, but we also want to commit to master at all times. How do we do that? Well, since master is made up of atomic commits, we just have to make sure that each commit changes the software in such a way that it is still in a releasable state.

*Unless you're doing something really really seriously serious, and will be doing it for a while, and it's impossible to break the work down into smaller releasable chunks.

#What is a good commit?

For our purposes we're going to use the following definition of a "good commit".

A good commit has two qualities about it.

  1. It is the embodiment of a granular "idea" being introduced into the codebase, nothing more than a single idea, and not half of an idea.
  2. Thorough, reproducible effort has been put into ensuring that this new idea is the only thing being added to the codebase, and that it behaves as expected.

This is a purposefully terse definition so that it's easy to remember. The important thing about this definition is that it describes commits which we can be highly confident about applying to master while still maintaining a releasable state. There are a few further implications, and there are many possible ways to support the goal of making a good commit.

##One idea is one commit

The generic concept of an "idea" is nebulous, but so is the generic concept of "any change you might want to make to some software". We'll define what an idea is for our own purposes anyway.

An idea is a tangible, coherent improvement to the software that can be well summarized in a single tweet.

Note that this is somewhat different from a "feature", since traditionally engineering teams talk about features as improvements to the software that are visible to the end user. An idea, on the other hand, may just be an improvement to the code itself, with no visible effect on how the software operates at all. In that case, the tangibility of the improvement is visible to you, the developer. This is a critical distinction, since it allows you to make many small commits as improvements to the capabilities of the codebase, then finally make another small commit that aggregates the capabilities of those previous small commits into a user visible feature.

The mantra of "one idea is one commit" has been shamelessly stolen from Phabricator Flavor Text, which is recommended reading since it inspired a lot of the ideas put forth here.

##Don't break anything

Now we get to the best part. We've been avoiding a pretty obvious elephant in the room: you need to verify that published commits won't introduce regressions. There's an obvious solution to this problem that you're probably not considering: just always write code that works and doesn't break any of the other code, you idiot.

###Unit testing

The best first line of defense here is to have extremely fast, automated testing in place which detects when you've introduced a regression. This is actually obvious, but easier said than done. This will present a pretty big upfront cost to most organizations which attempt to switch from a branching model to the model presented here.

There is an interesting side effect on the behavior of teams who operate like this though: they come to view unit test coverage at a relatively granular level as mandatory, out of sheer necessity. When you hide behind branches you tend to wait until right before you merge to do regression testing. If you have no branches, it forces the process of QA to be a rolling effort, and can really only be efficiently done when unit testing alerts you to the vast majority of regressions automatically.
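
As a minimal sketch of the kind of fast, granular test this section has in mind, the following uses Python's standard `unittest` module. The `slugify` helper and its expected behavior are hypothetical, invented only to have something small to test. The point is that each test pins down one piece of behavior, so a commit that changes it fails within milliseconds rather than at merge time.

```python
import re
import unittest


def slugify(title: str) -> str:
    """Hypothetical helper: turn a title into a lowercase, hyphen-separated slug."""
    # Keep only runs of letters and digits, then join them with hyphens.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


class SlugifyTest(unittest.TestCase):
    """Each test covers one expectation, so a failure points at one regression."""

    def test_collapses_punctuation_and_spaces(self):
        self.assertEqual(slugify("What the commit?"), "what-the-commit")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")
```

Saved as a file whose name starts with `test`, this can be run with `python -m unittest`. Tests of this shape are cheap enough to run before every commit, which is what makes the rolling QA described above practical.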

###Code review

The second line of defense before you push a commit is having another engineer look at its contents. The primary goal of doing this is to get independent verification that you're making a good commit. The reviewer should, at minimum, feel confident that they comprehend the implications of what is being changed, and that the idea described in the commit message is the entirety of what the diff shows being changed.

This is a large leap to make, both conceptually and workflow-wise, for developers who are used to committing and pushing frequently to branches. It means that when you start work on a commit, you really aren't done with it until someone else has looked over it. Logistically, this also requires you to decouple your implementations as much as you can, so that while you're waiting on one commit to be reviewed, you can work on others. It turns out that this kind of upfront decoupling actually produces better software, but it is a discipline that requires practice to get right.

#What the commit?

So, broadly, we're solving problems that software teams begin to run into when they scale. These problems are largely related to the complexity of having many parallel branches open at once. The end solution is to get rid of branches, and you can do that eventually by getting good at writing commits.

The best way to make progress towards actually doing this is to focus on your commits individually, and on making each one good. Here are some guidelines.

  • Keep the commit as small as possible

    When you don't change a whole lot, it's significantly easier to determine if you've broken anything. It's also significantly easier for a reviewer to keep the scope of your changes in their head, and provide meaningful feedback. You should keep commits very small, but not so small that they provide no meaningful benefit to anyone.

  • Keep Jeffry in mind

    Jeffry is going to review all of your code, and he's a really nice guy. Even though you've been working on the same projects with him for years, he doesn't usually have a firm grasp of what sorts of changes might be made to the software, or why. He's also really lazy. You should take extra special care to write reviewable code for him. This means taking time to craft a short summary message for the title of your commit which tersely describes the change, then expounding on that title in the rest of the commit message in order to give Jeffry context as to why the change is being made, or any other pertinent information that he might need in order to understand what's going on. You should also be sure your commit doesn't get too long or complicated, or else Jeffry may take a while to get back to you on it.

  • Verify your ideas yourself

    For changes to code which have no user-facing visibility, this usually means writing a unit test that shows what you're claiming to do is actually happening, or at least checking that your implementation is already partially covered by existing tests, which are all passing. For user-facing changes, this will mean manually going through the steps a user would take in order to see this feature working.
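
The "Keep Jeffry in mind" guideline describes a commit message with a short title followed by a body that gives context. A rough sketch of a checker for that shape might look like the following. The function name, the 72-character title limit, and the wording of the reported problems are all assumptions made for illustration, not rules stated in this document.

```python
from typing import List

# Assumption: 72 characters is a common convention, not a limit from this document.
MAX_TITLE_LENGTH = 72


def check_commit_message(message: str) -> List[str]:
    """Return a list of problems found in a commit message (empty if none)."""
    problems = []
    lines = message.strip().splitlines()
    if not lines:
        return ["message is empty"]

    # The first line is the title: it should tersely describe the change.
    title = lines[0]
    if len(title) > MAX_TITLE_LENGTH:
        problems.append(f"title is longer than {MAX_TITLE_LENGTH} characters")

    # A blank second line keeps the title visually separate from the body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("title and body should be separated by a blank line")

    # The body is where the reviewer learns why the change is being made.
    body = [line for line in lines[1:] if line.strip()]
    if not body:
        problems.append("body is missing: explain why the change is being made")

    return problems
```

For example, `check_commit_message("fix stuff")` reports a missing body, which is exactly the kind of message that leaves Jeffry guessing. A check like this can only confirm the shape of a message; whether the body actually explains the change is still up to you and your reviewer.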
