@technosophos
Last active December 8, 2015 04:16
Three Reasons Not to Check Vendor Into VCS

Losing the VCS

When you check vendored code into VCS, that code is no longer tracked in its own git repo (since you must remove the .git directory to check it in, or else treat it as a submodule). We thus lose all of the assistance the VCS was giving us. This leads to three problems:

  1. Updates require full checkouts every single time. Any time you need to update a dependency, you have to check out that dependency along with everything it requires. Then you must do all the version management, only to remove most of what you checked out at the end. The unfortunate result is that many teams never update their vendored packages, and security and stability issues go unnoticed. One of our core dependencies is Kubernetes, and updating that one dependency requires checking out around 200 additional packages. Updating is thus a non-trivial, quite time-consuming task.

  2. Working with dependent packages is harder. Say I have two packages, A and B, and A depends on B. Working on problems between A and B gets tedious, since I can't just cd into vendor/B and git checkout my working B branch. Instead, I have to make my changes to B, commit them upstream, destroy my local copy, clone B (and any of its dependencies), test, and repeat.

  3. We end up with the same vendoring tree problem we ran into with tree-based versioning: if A depends on B and C, and B vendors its own copy of C, then A must have two versions of C that are incompatible (by GOPATH). Glide solves this by collapsing dependencies for us, but that basically entails not checking in those dependencies (or else dynamically removing subtree dependencies on build). Feel free to test this out. The nesting is a noxious anti-pattern that has some really odd side effects (try mocking etcd's Client library to see why this is frustrating).
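To make problem (3) concrete, here is a minimal Go sketch that spots import paths vendored more than once in a tree. It is not how Glide works internally. The function names vendoredCopies and duplicates, and the directory layout in main, are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// vendoredCopies groups vendored directories by the import path they provide.
// Each input is a slash-separated path relative to the project root, such as
// "vendor/B/vendor/C". The import path is whatever follows the last "vendor"
// path segment.
func vendoredCopies(dirs []string) map[string][]string {
	copies := make(map[string][]string)
	for _, dir := range dirs {
		parts := strings.Split(dir, "/")
		last := -1
		for i, p := range parts {
			if p == "vendor" {
				last = i
			}
		}
		if last < 0 || last == len(parts)-1 {
			continue // not inside a vendor directory, or nothing after it
		}
		pkg := strings.Join(parts[last+1:], "/")
		copies[pkg] = append(copies[pkg], dir)
	}
	return copies
}

// duplicates returns, in sorted order, the import paths vendored more than once.
func duplicates(copies map[string][]string) []string {
	var dups []string
	for pkg, locations := range copies {
		if len(locations) > 1 {
			dups = append(dups, pkg)
		}
	}
	sort.Strings(dups)
	return dups
}

func main() {
	// Hypothetical layout: A vendors B and C, and B also vendors its own C.
	dirs := []string{
		"vendor/B",
		"vendor/C",
		"vendor/B/vendor/C",
	}
	copies := vendoredCopies(dirs)
	for _, pkg := range duplicates(copies) {
		fmt.Printf("%s is vendored %d times: %v\n", pkg, len(copies[pkg]), copies[pkg])
	}
}
```

Any package this reports exists as two distinct copies as far as the Go toolchain is concerned, which is the situation a flattened vendor directory avoids.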

Reasons to Grab the Dependencies as VCS Instances

We track our dependencies as full Git repos built by the developer because:

  • It keeps the repo cleaner and smaller
  • It avoids the nasty dependency nesting issue in (3)
  • It makes it easier to work with dependencies over the long term, since each dependency is managed simply by using the VCS tools to update it, set versions, etc.
  • It provides us with all the underlying capabilities of the VCS, like version checking, bisecting, working with branches and tags, and so on.

As far as I can tell, the only downside to this is that initial checkouts from scratch take longer. Yes, it's a pain (especially if you don't use a local cache), but only that first time. The alternative of checking code in comes with more difficulties, and they are difficulties that aren't solved by grabbing a cup of coffee and waiting.

While Glide supports checking in vendor directories, we often warn people about what they're getting into if they decide to check the code in. It might seem simpler at first, but it tends to lead to longer term maintenance issues.

Ways to Speed Up the Process

The following are a few simple ways to speed up fetching dependencies:

  • Use a local cache (good when many projects use the same dependencies, which is our case)
  • Keep a tarball of a project's core dependency VCS trees (see below)
  • Copy in and out of GOPATH. It's clunky, and not as good as a cache (because it makes your entire Go environment more volatile)... but it can work
  • Prebuilding .a or .so files might make dependencies easier to share. I'm not sure how much you'd gain from this, though.

For large dependency trees, I suggested the possibility of storing a "starter tarball" that contains a big tarred copy of all of the Git repos. The reason this is different from checking these into the VCS is that these are entire repositories that are "reconstituted" in full with no lossy side effects. Really, the tarball idea is a prototype for what Gem/Bundler did to expedite installing the same sets of dependencies for common platforms like Rails.

Once again, it's important to note that the core problem that leads us to this conundrum (just like the monorepo vs. microrepo conundrum) is that our most common VCS tool, Git, foists this choice on us. If it had decent support for nested VCS trees, this would not be a problem.
