"Vendoring" is a vile anti-pattern

"Vendoring" is a vile anti-pattern

What is "vendoring"?

From a comment on StackOverflow:

Vendoring is the moving of all 3rd party items such as plugins, gems and even rails into the /vendor directory. This is one method for ensuring that all files are deployed to the production server the same as the dev environment.

The activity described above, on its own, is fine. It merely describes the deployment location for various resources in an application.

However, many programmers have begun to commit the result of the above procedure to their own source code repositories.

That is to say, they copy all of the files at some version from one version control repository and paste them into a different version control repository.

You should have flinched reading the previous description. This practice is more vile than extending existing code by duplicating whole functions instead of using inheritance and abstraction, and for the same reasons.

Why is "vendoring" bad?

Extracting code from a version control repository to be archived in a compressed file or stored in a particular directory for deployment is not bad.

Extracting code from a version control repository to be re-committed to a different version control repository is evil.

When you copy code between repositories:

  1. all history, branch, and tag information is lost
  2. pulling updates is impossible
  3. it invites modification, divergence, and unintentional forks
  4. it wastes space
  5. it is excruciatingly tedious to discover later which version of the code was the canonical source of the copied version, unless the person doing it went out of their way to document that information

What you should do instead

Use git submodules

Git's submodule mechanism stores a URL and a commit hash in your repository; that reference is itself under version control, so it travels with your project's history.

(TODO: explain more, examples of work-alikes from different VCSs)
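
In the meantime, pinning a third-party library as a submodule looks roughly like this (a minimal sketch; the URL, the vendor/libfoo path, and the v1.2.0 tag are placeholders):

    # record the dependency's URL and pin your project to one of its commits
    git submodule add https://example.com/upstream/libfoo.git vendor/libfoo
    git commit -m "Add libfoo as a submodule"

    # collaborators fetch the exact pinned commit instead of a pasted copy
    git clone --recurse-submodules https://example.com/you/yourproject.git

    # to update, move the submodule to a new upstream ref and record the change
    git -C vendor/libfoo fetch
    git -C vendor/libfoo checkout v1.2.0
    git add vendor/libfoo
    git commit -m "Update libfoo to v1.2.0"

Your repository never contains libfoo's files, only the pointer to them; the upstream history, tags, and URL all stay discoverable.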

Use an approximation of git submodules

If you can't use git submodules, use a script that deploys your third-party resources at the appropriate time, from a canonical source.

(TODO: describe more, provide examples)
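
As a starting point, here is a minimal sketch of such a script, assuming a plain-text manifest (the deps.txt name and its url/ref/directory line format are invented here) and a vendor/ directory that stays out of version control:

    #!/bin/sh
    # each line of deps.txt: <repository-url> <commit-or-tag> <directory>
    # vendor/ is listed in .gitignore; it is rebuilt from canonical sources
    set -e
    while read -r url ref dir; do
        rm -rf "vendor/$dir"
        git clone "$url" "vendor/$dir"
        git -C "vendor/$dir" checkout "$ref"
    done < deps.txt

The commit hashes recorded in the manifest play the same role as a submodule's pinned commit: a careful reference back to the origin repository instead of a pasted copy.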

None of those suggestions work for me

If you have a situation where it seems like "vendoring" is really the best way to deploy your code, contact me, call me an idiot, and describe why. I'm sure there's a better way, and where there isn't, that's a bug. I hope to eventually document all such situations to prevent people from falling into this bad habit.

Guilty parties

Comments

ngrilly commented May 6, 2014

Vendoring is still useful when you want to be able to build and deploy your application even when one dependency is not available (its server is down or the dependency was deleted by its author...). How would you solve that without vendoring?

datagrok commented Jun 15, 2014

@ngrilly create a cache or a mirror between the installer and the dependency's server. Or, if we're talking about a piece of software provided to a third party to install, provide the dependency's packages along with your own.

I suppose if those practices fall under your definition of the term "vendoring," then it's okay.

The practice I'm railing against is the intentional breaking of the association between source code files and their canonical version control repositories, by copying files that are under version control into packages or different version control systems without making careful references back to the origin repos (as git submodules do).
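
For git-hosted dependencies, one hedged sketch of such a mirror (the hostnames and paths below are placeholders) is a bare mirror clone that build machines are pointed at via Git's url.<base>.insteadOf mapping:

    # maintain a bare mirror of the upstream dependency on a host you control
    git clone --mirror https://github.com/upstream/libfoo.git /srv/mirrors/libfoo.git
    git -C /srv/mirrors/libfoo.git remote update

    # on build machines, transparently fetch from the mirror instead of upstream
    git config --global url."https://mirror.example.com/".insteadOf "https://github.com/"

This keeps builds working when upstream is down or deleted, without breaking the association between the code and its canonical repository.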

rbucker commented Jan 27, 2015

There are too many reasons why vendoring is not going to work with golang. In particular, there are (1) versions of the dependencies, (2) cross-repo servers (Bitbucket vs. GitHub), and (3) cross-repo types (bzr vs. git).

gepoch commented Apr 6, 2015

@datagrok

Another guilty party: bower

As far as Go is concerned, you may be aware that the community is taking steps to adopt vendoring in a more official capacity.

But I think the direction Go is taking things may actually address some of your counterarguments. Responding to your list:

  1. True! Kind of. It still exists, just not in your repository. Go would maintain a URL for the repository and ref that your code was retrieved from, so you'd be able to find it if you needed to. And really, isn't this actually the same boat that you would have been in, had you installed it via a package manager? This basically falls under your "using an approximation of git submodules".
  2. I would say the metadata file + godep (or whatever tool is eventually added to the toolchain) makes it quite easy to update. You just change the ref in the metadata to point to the tag you want, run godep, and commit (see the sketch after this list). This is undermined by 3, of course :)
  3. This remains a vulnerable point. Here, you're basically only protected by idioms. (DON'T TOUCH THE VENDOR FOLDER!) Especially worrisome is the laziness of programmers. Rather than go through the trouble of a pull request, they may just fix it locally, and commit it. HOPEfully things are passed up the chain.
  4. Well.... Space is cheap. Git is very clever at packing things over the wire. Installed dependencies "waste" space also. True, but I don't know how much this is really going to hurt me in the real world.
  5. Going back to one, making the metadata encoding explicit from the start on the happy path of the toolchain fixes this, but I'm still afraid of 3.
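
A sketch of the update workflow mentioned in point 2, assuming the godep CLI (the package path and tag below are placeholders):

    # check out the upstream ref you want in your GOPATH copy of the dependency
    git -C "$GOPATH/src/github.com/example/somedep" fetch
    git -C "$GOPATH/src/github.com/example/somedep" checkout v1.2.0

    # rewrite the Godeps metadata and the vendored copy, then commit both
    godep update github.com/example/somedep
    git commit -am "Update somedep to v1.2.0"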

I definitely haven't made my mind up on this. We're working on coming up with Bower usage standards at my company at the moment, and the vendoring debate is looming large.

While I agree that careless vendoring is pretty terrible, careful, well-designed vendoring seems to defeat a number of these concerns. Although you're always going to lack the certainty that you're running an honest-to-god canonical version of something. This strikes me as the only issue that you can't quite work around.

Thoughts?

pote commented Jan 30, 2016

I agree completely. Is this gist a draft for a blog post? I'd very much like to link to it once there's a final version published somewhere.

Some code isn't provided in repository form in the first place. I don't see the advantage in downloading an unchanging zip over HTTP every time the build runs.

capnslipp commented Apr 1, 2017

This post reeks of either theory-based software engineering or software dev inexperience. In the real world, vendoring works far better than any practical alternative at making sure the build works reliably and consistently on every team member's machine, which is an essential component for delivering products that meet or exceed expectations on or ahead of time.

Nearly all your “bad” reasons (1, 2, & 5) apply only to willy-nilly by-hand vendoring. In the real world, vendoring is done with the aid of a dependency tool that tracks the current vendored version and can update based on the original source repo's version tag or SHA. git-subtree, bundler, and Carthage are a few examples of tools that either keep a ….resolved file or store the appropriate source-repo metadata in the vendoring commit. Personally, I highly recommend git-subtree: it's now officially part of Git, it's not dependent on any language/library/OS/toolchain other than Git itself, and it handles updates flawlessly (because it uses the same SHA-diffing algorithm that Git uses for everything).
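
For reference, the basic git-subtree workflow looks something like this (a rough sketch; the URLs, the vendor/libfoo prefix, and the refs are placeholders):

    # vendor the library at a known tag, squashing its history into one commit
    git subtree add --prefix=vendor/libfoo https://example.com/upstream/libfoo.git v1.2.0 --squash

    # later, pull a newer upstream ref into the same prefix
    git subtree pull --prefix=vendor/libfoo https://example.com/upstream/libfoo.git v1.3.0 --squash

    # push local commits that touched vendor/libfoo back to a fork of the upstream repo
    git subtree push --prefix=vendor/libfoo https://example.com/you/libfoo-fork.git local-fixes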

“Bad” reason #4 is a moot issue: the types of things one wishes to vendor are typically small (i.e. code libraries, not images/audio/video/PSDs/etc.), Git in general does a fantastic job of space-saving, and modern computers have tons of space and speed relative to what software projects and libraries require.

“Bad” reason #3 is complete B.S. Vendoring is the packing-in of the source of another repo at a specific commit, not making changes to that code. If this is a real concern to you, seriously, you're doing it wrong. In the real world, if you need to make modifications to the library you're vendoring, you fork the library repo, make the changes in the fork, and then vendor your fork. When it comes time to update the vendored library, the normal habits of maintaining a fork apply: rebase your forked changes forward onto the latest from the origin repo; run all library tests and test against your project; push to your fork; vendor into the project. Furthermore, git-subtree offers the ability to push commits that have affected the vendored code back to the repo they were vendored from, making unruly devs who can't follow rules a non-issue.

So I'm sorry, the drawbacks you've listed aren't actually problems with vendoring at all, and vendoring provides a great solution for multi-person projects (especially ones where some members aren't programming-tech-savvy) to just get the job done. Above all else, vendoring ensures that source control holds to its advertised benefit that one can check out an old version of the codebase and see things exactly as they were at the time. Without vendoring, all too often “going back in time” leads to projects that won't build or have bugs that didn't exist prior, because of incompatible minor version bumps between libs and tools, because of lack of availability of lib versions (e.g. deleted branches), because of API-OS-toolchain-lib incompatibilities, etc. Vendoring is an It Just Works™ solution.

I implore you to educate yourself on how many of the software development dependency managers have implemented reliable vendoring, and the diligent consideration and work those projects have put in to specifically circumvent the problems you're FUDding over.  The only community-harmful vileness present here is your gist-post of misinformation.

P.S. git-submodule is a horrible tool to choose for vendoring. It's been criticized quite a bit for the number of ways it makes SCM more difficult; I'm not going to go into that here. This blog post is specifically oriented toward using git-subtree instead of git-submodule. Seriously, please go and read why git-submodule will cause you more headaches than it's worth. You'll be doing yourself (and everyone else who still believes that hunk of Git legacy is worthwhile) a huge favor by understanding its faults and no longer promoting it as a viable solution.
