@datagrok
Last active November 3, 2023 17:37
"Vendoring" is a vile anti-pattern

What is "vendoring"?

From a comment on StackOverflow:

Vendoring is the moving of all 3rd party items such as plugins, gems and even rails into the /vendor directory. This is one method for ensuring that all files are deployed to the production server the same as the dev environment.

The activity described above, on its own, is fine. It merely describes the deployment location for various resources in an application.

However, many programmers have begun to commit the result of the above procedure to their own source code repositories.

That is to say, they copy all of the files at some version from one version control repository and paste them into a different version control repository.

You should have flinched reading the previous description. This practice is more vile than extending existing code by duplicating whole functions instead of using inheritance and abstraction, and for the same reasons.

Why is "vendoring" bad?

Extracting code from a version control repository to be archived in a compressed file or stored in a particular directory for deployment is not bad.

Extracting code from a version control repository to be re-committed to a different version control repository is evil.

When you copy code between repositories:

  1. all history, branch, and tag information is lost
  2. pulling updates is impossible
  3. it invites modification, divergence, and unintentional forks
  4. it wastes space
  5. it is excruciatingly tedious to discover later which version of the code was the canonical source of the copied version, unless the person doing it went out of their way to document that information

What you should do instead

Use git submodules

Git's submodule mechanism stores a URL and a commit hash in your repository, which itself is under version control.
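
To make the "URL and commit hash" pairing concrete: Git keeps the URL in a version-controlled `.gitmodules` file at the top of the repository, and records the pinned commit as a special "gitlink" entry in the tree at the submodule's path. The sketch below only reads the `.gitmodules` half. The sample content (`vendor/libfoo`, `example.com`) is made up, `parse_gitmodules` is a hypothetical helper, and using Python's `configparser` on git-config syntax is an assumption that holds for simple files like this one, not for every valid git config.

```python
import configparser

# Hypothetical .gitmodules content; the path and URL are invented for illustration.
SAMPLE_GITMODULES = (
    '[submodule "vendor/libfoo"]\n'
    "\tpath = vendor/libfoo\n"
    "\turl = https://example.com/libfoo.git\n"
)


def parse_gitmodules(text):
    """Return {submodule name: {"path": ..., "url": ...}} from .gitmodules text."""
    # interpolation=None so a literal '%' in a URL is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    submodules = {}
    for section in parser.sections():
        # Section headers look like: submodule "vendor/libfoo"
        kind, _, quoted_name = section.partition(" ")
        if kind != "submodule":
            continue
        name = quoted_name.strip('"')
        submodules[name] = {
            "path": parser[section]["path"],
            "url": parser[section]["url"],
        }
    return submodules


if __name__ == "__main__":
    for name, info in parse_gitmodules(SAMPLE_GITMODULES).items():
        print(name, info["path"], info["url"])
```

The commit hash itself is not in this file; `git submodule status` reports it for each submodule path.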

(TODO: explain more, examples of work-alikes from different VCSs)

Use an approximation of git submodules

If you can't use git submodules, use a script that deploys your third-party resources at the appropriate time, from a canonical source.
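
A minimal sketch of such a script, assuming the third-party resource is published as an archive at a stable URL and that you record its expected SHA-256 next to the URL in your own repository. Everything named here (`PINS`, `fetch_pinned`, the `example.com` URL, the placeholder digest) is hypothetical; the point is only that the repository stores a small pin instead of a copy of the code.

```python
import hashlib
import urllib.request
from pathlib import Path

# Hypothetical pin list. In practice this would live in a small
# version-controlled file; the URL and digest below are placeholders.
PINS = [
    {
        "name": "libfoo",
        "url": "https://example.com/libfoo-1.2.3.tar.gz",
        "sha256": "<expected hex digest>",
    },
]


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hex digest of data."""
    return hashlib.sha256(data).hexdigest()


def download(url: str) -> bytes:
    """Fetch the full response body from url."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_pinned(pin, dest_dir, fetch=download):
    """Fetch one pinned archive into dest_dir, rejecting content that does not match the pin."""
    data = fetch(pin["url"])
    actual = sha256_hex(data)
    if actual != pin["sha256"]:
        raise ValueError(f"{pin['name']}: expected {pin['sha256']}, got {actual}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last path segment of the URL.
    target = dest / pin["url"].rsplit("/", 1)[-1]
    target.write_bytes(data)
    return target


if __name__ == "__main__":
    for pin in PINS:
        print("fetched", fetch_pinned(pin, "vendor"))
```

The `fetch` parameter exists so the download step can be swapped for a local cache or a mirror without touching the verification logic; the destination directory would then be listed in your VCS ignore file rather than committed.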

(TODO: describe more, provide examples)

None of those suggestions work for me

If you have a situation where it seems like "vendoring" is really the best way to deploy your code, contact me, call me an idiot, and describe why. I'm sure there's a better way, and where there's not, it's a bug. I hope to eventually document all such situations to prevent people from falling into this bad habit.

Guilty parties

@gepoch

gepoch commented Apr 6, 2015

@datagrok

Another guilty party: bower

As far as Go is concerned, you may be aware that the community is taking steps to adopt vendoring in a more official capacity.

But I think the direction that Go is taking things may actually address some of your counterarguments. Responding to your list:

  1. True! Kind of. It still exists, just not in your repository. Go would maintain a URL for the repository and ref that your code was retrieved from, so you'd be able to find it if you needed to. And really, isn't this actually the same boat that you would have been in, had you installed it via a package manager? This basically falls under your "using an approximation of git submodules".
  2. I would say, the metadata file + godep (or whatever tool is eventually added in the toolchain) makes it quite easy to update. You just change the ref in the metadata to point to the tag you want, run godep, and commit. This is undermined by 3, of course :)
  3. This remains a vulnerable point. Here, you're basically only protected by idioms. (DON'T TOUCH THE VENDOR FOLDER!) Especially worrisome is the laziness of programmers. Rather than go through the trouble of a pull request, they may just fix it locally, and commit it. HOPEfully things are passed up the chain.
  4. Well.... Space is cheap. Git is very clever at packing things over the wire. Installed dependencies "waste" space also. True, but I don't know how much this is really going to hurt me in the real world.
  5. Going back to one, making the metadata encoding explicit from the start on the happy path of the toolchain fixes this, but I'm still afraid of 3.

I definitely haven't made my mind up on this. We're working on coming up with Bower usage standards at my company at the moment, and the vendoring debate is looming large.

While I agree that careless vendoring is pretty terrible, careful, well-designed vendoring seems to defeat a number of these concerns. Although you're always going to lack the certainty that you're running an honest-to-god canonical version of something. This strikes me as the only issue that you can't quite work around.

Thoughts?

@pote

pote commented Jan 30, 2016

I agree completely. Is this gist a draft for a blog post? I'd very much like to link to it once there's a final version published somewhere.

@jsrodman

Some code isn't provided in a repository form in the first place. I don't see the advantage in downloading an unchanging zip over HTTP every time the build runs.

@capnslipp

capnslipp commented Apr 1, 2017

This post reeks of either theory-based software engineering or software dev inexperience.  In the real-world, vendoring works far better than any practical alternative at making sure the build works reliably & consistently on every team member's machine— which is an essential component for delivering products that meet-or-exceed expectations on-or-ahead of time.

Nearly all your “bad” reasons (1, 2, & 5) apply only to willy-nilly by-hand vendoring.  In the real-world, vendoring is done with the aid of a dependency tool that tracks the current vendored version, and can update based on the original source repo's version-tag or SHA.  git-subtree, bundler, Carthage are a few examples of tools that either keep a ….resolved file or store the appropriate source-repo metadata in the vendoring-commit.  Personally, I highly recommend git-subtree— it's now officially part of Git, it's not dependent on any language/library/OS/toolchain other than Git itself, and it handles updates flawlessly (because it uses the same SHA-diffing algorithm that Git uses for everything).

“Bad” reason #4 is a moot issue— the types of things one wishes to vendor are typically small (i.e. code libraries; not images/audio/video/PSDs/etc.), Git in general does a fantastic job of space-saving, and modern computers have tons of space and speed relative to that which is required for software projects & libraries.

“Bad” reason #3 is complete B.S.  Vendoring is the packing-in of the source of another repo at a specific commit, not making changes to that code.  If this is a real concern to you, seriously, you're doing it wrong.  In the real world, if you need to make modifications to the library you're vendoring, you fork the library repo, make the changes in the fork, and then vendor your fork.  When it comes time to update the vendored library, the normal habits of maintaining a fork apply: rebase your forked changes forward onto the latest from the origin repo; run all library tests and test against your project; push to your fork; vendor into the project.  Furthermore, git-subtree offers the ability to push commits that have affected the vendored code back to the repo they were vendored from, making unruly-devs-who-can't-follow-rules a non-issue.

So I'm sorry, the drawbacks you've listed aren't actually problems with vendoring at all, and vendoring provides a great solution for multi-person projects (especially ones where some members aren't programming-tech-savvy) to just get the job done.  Above all else, vendoring ensures that source control holds to its advertised benefit that one can check out an old version of the codebase and see things exactly as they were at the time.  Without vendoring, all too often “going back in time” leads to projects that won't build or have bugs that didn't exist prior because of incompatible minor version bumps in libs between libs & tools, because of lack of availability of lib versions (e.g. deleted branches), because of API-OS-toolchain-lib incompatibilities, etc.  Vendoring is a It Just Works™ solution.

I implore you to educate yourself on how many of the software development dependency managers have implemented reliable vendoring, and the diligent consideration and work those projects have put in to specifically circumvent the problems you're FUDding over.  The only community-harmful vileness present here is your gist-post of misinformation.

P.S. git-submodule is a horrible tool to choose for vendoring.  It's been criticized quite a bit for the number of ways it makes SCM more difficult; I'm not going to go into that here.  This blog post is specifically oriented to using git-subtree instead of a git-submodule.  Seriously, please go and read why git-submodule will cause you more headaches than it's worth.  You'll be doing yourself (and everyone else who still believes that hunk of Git legacy is worthwhile) a huge favor by understanding its faults and quitting promoting it as a viable solution.

@benhardy

benhardy commented Feb 12, 2018

First off, I can confirm that git-submodule is awful to work with. :-)

I've extensively worked with both vendoring and cached repo based build tools like Maven. The OP is clearly frustrated but is also not using best vendoring practices, as pointed out (but that's no excuse for ad-hominem attacks). Still, if I may chip in my 2c this late, I don't think vendoring is an inherently superior method of dependency management.

_vendoring works far better than any practical alternative at making sure the build works reliably & consistently on every team member's machine_

Vendoring is no more effective for achieving build reproducibility than, say, a repository based system with fixed version numbers such as Maven.

Vendoring is no more effective for building offline than any other build system with a cache. Vendoring is a cache, Maven's ~/.m2/repository is a cache, etc. Being offline only matters when you're changing a dependency, and you're hosed with both methods in that case.

Vendoring is not necessary for diagnosing problems in library code. Any decent IDE will have a debugger that will give you the option of pulling down library source when stepping through.

More subjectively speaking -

  • The remote possibility of forking a library is no reason to clutter up my project with not just its source, but that of all its transitive dependencies. They can be unrelated things and that's OK. Vendoring is handy in this case for the code you want to execute being right in place, unless you're using compiled libraries, which is pretty normal outside of the interpreted language miniverse.
  • When I'm doing a code review, the last thing I want to have to deal with is reams of vendor code. I don't want to waste time that could be spent delivering business value looking at irrelevant commits and files.
  • Try to introduce all the fancy git tricks you want, the reality of this industry is that the vast majority of us are working on legacy projects that we didn't start, so we can't usually boil the ocean on dependency management, especially with large projects.

... Vendoring is a It Just Works™ solution.

It's one.

The back-in-time problem has no reasonable panacea, precisely because of the _API-OS-toolchain-lib incompatibilities_ mentioned. If some of my team's machines have all moved on to different OS versions or patch levels, 100% old build reproducibility "just working" cannot be guaranteed. Even so, for me personally, this isn't a big or frequent enough problem to make me want to clutter up my repos with vendor code, because the proffered advantages of doing so aren't worth the cost to me.

Every tool and technique has flaws and tradeoffs, including the ones that you and I are familiar and efficient with. This little cognitive bias is worth considering: Golden Hammer.

@datagrok
Author

Nearly all your “bad” reasons (1, 2, & 5) apply only to willy-nilly by-hand vendoring. In the real-world, vendoring is done with the aid of a dependency tool ... If this is a real concern to you, seriously, you're doing it wrong.

I wrote this four years ago in a fit of frustration because I kept encountering instances of "doing it wrong" -- what @capnslipp calls "willy-nilly by-hand vendoring," complete with local modifications to the vendored code, not in a forked repo, that he says is "B.S." that you don't find in the real world. One such occurrence of this essentially locked my team's primary application to an outdated version with security holes, since we couldn't easily upgrade its dependencies. I was horrified, and since I made the noise about it, I was assigned the ticket to clean up the mess.

I couldn't find any tools at that time that would manage the problem in an automated way (for the language we were using), but I did see many instances of questionable vendoring practices being used in open source projects. I became concerned that if people didn't find a different mechanism or at least do it with some kind of proper record-keeping, I'd keep encountering that sort of mess.

One of the reasons this remained an obviously unfinished gist rant and not properly published anywhere is because I never found the time to do that exhaustive survey of software development dependency managers which might (in 2014?) have quelled my fears. My career went a different direction and I haven't had to deal with the issue since then, so I promptly forgot this existed.

I can't blame @capnslipp too much for the combative attitude and casting aspersions on me, since I set the mood with that antagonistic title and strongly-worded assertions. I regret the tone I used. This whole thing probably could have been: "don't do vendoring by-hand; use a dependency-tracking tool instead, and don't touch the vendor folder."
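
The "don't touch the vendor folder" half of that advice can be checked mechanically rather than left to convention. A minimal sketch, assuming the dependency tool (or a person) records a digest of the vendored tree at the moment it is imported and stores it alongside the source URL and ref; `tree_digest` and `vendor_is_untouched` are hypothetical names, not part of any tool mentioned in this thread.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Return a SHA-256 hex digest covering every file path and file body under root."""
    root = Path(root)
    files = [
        (path.relative_to(root).as_posix(), path)
        for path in root.rglob("*")
        if path.is_file()
    ]
    digest = hashlib.sha256()
    # Sort by relative path so the result does not depend on listing order.
    for relative, path in sorted(files):
        body = path.read_bytes()
        # Length-prefix the body so file boundaries are unambiguous.
        digest.update(relative.encode("utf-8") + b"\0")
        digest.update(str(len(body)).encode("ascii") + b"\0")
        digest.update(body)
    return digest.hexdigest()


def vendor_is_untouched(vendor_dir, recorded_digest):
    """True if the vendored tree still matches the digest recorded at import time."""
    return tree_digest(vendor_dir) == recorded_digest
```

Run as a CI step or pre-commit hook, a check like this turns a local edit to vendored code into a visible failure instead of a silent fork.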

I was recently thinking of updating this gist to that end, and including the helpful advice that commenters have given over the years. But if most shops are now aware of and following vendoring best practices, it's probably better just to delete it.

@kingdonb

I want to thank you for writing this, I wrote a ham-handed proposal for vendoring in our current framework to help enforce a "build-release-run" separation, like you get on 12-factor platforms, and I knew it was a bad idea to cobble this onto our existing legacy bespoke deployment infrastructure, but I couldn't quite bring myself to spend any time articulating why (so I almost didn't even share the idea.)

But then I plugged into the Google and came here, to see all of the reasons I subconsciously already knew that my idea was bad, but neatly articulated and so, I commend you, saved me the time of debunking my own straw-man. Vendoring your gems is not bad. Vendoring your gems in the project repo, definitely bad. That vendor/cache should be in gitignore. What I was proposing was meant to be an example of "how we could improve things slightly, without code changes to our deployment systems, simply by asking developers to change their habits a little bit – but WAIT! Don't actually do this, it's a totally half-baked idea and there are many ways it can go wrong, the quick wins and illusion of forward progress are a tempting oasis, and it isn't real."

We're hoping to adopt a Platform of some kind off-the-shelf and I was intending that people would see, while yes, we could address each problem with our current system, one by one, many have already done this and we should learn from their experience before we repeat those mistakes. We could boil the ocean, or we could pick something off-the-shelf like we planned to do in our project charter.

@crd

crd commented Aug 26, 2019

I can't blame @capnslipp too much for the combative attitude and casting aspersions on me, since I set the mood with that antagonistic title and strongly-worded assertions. I regret the tone I used. This whole thing probably could have been: "don't do vendoring by-hand; use a dependency-tracking tool instead, and don't touch the vendor folder."

Just here to say that I appreciated how well @datagrok handled the criticism -- nice to see.

@LambertGreen

Thanks for not deleting this. I am investigating options in this space, and it is all new to me, and so it was great to find this post and the discussions.

@sparr

sparr commented May 31, 2023

History can be preserved in a few different ways, most of which are given as answers here https://stackoverflow.com/questions/1365541/how-to-move-some-files-from-one-git-repo-to-another-not-a-clone-preserving-hi

@James-E-A

I'd love to use git submodule for this, except none of the major git server software actually mirrors the pinned commit. If the linked remote goes down or erases the commit I depended on for whatever reason, that'll break clone --recurse on my own repo, which is very cringe.
