@sdboyer
Last active September 30, 2019 05:30

When the manifest file of my project/the root project, call it A, contains a dep/constraint statement for another project, say X, that doesn't appear in A's import statements, there are three modes in which tools can interpret that:

  1. X must be present in the result (A's lock+vendor), and it must meet the version constraint.
    • (There's a variant where it works like this when it’s the root manifest giving the dep on X, but if it’s a non-root manifest, things work like 2)
  2. X needn't be present in the result, and even if it is, it needn't meet the constraint. (Basically, the constraint is cruft and we ignore it)
  3. X needn't necessarily be present in the result, but if it is (because some other dep actually does import X), then it must meet A's stated constraint.
    • (Again, there's a variant where it works like this for the root manifest, but like 2 for a non-root manifest.)
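The three modes can be sketched as a small decision function. This is only an illustration: the names `Mode`, `Outcome`, and `Interpret` are made up here and do not come from any existing tool, and the root/non-root variants are left out for brevity.

```go
package main

import "fmt"

// Mode names one of the three interpretations listed above (hypothetical type).
type Mode int

const (
	ModeRequire   Mode = iota + 1 // mode 1: requirement and constraint
	ModeIgnore                    // mode 2: neither; the declaration is cruft
	ModeConstrain                 // mode 3: constraint only
)

// Outcome describes how a manifest entry for X affects the result.
type Outcome struct {
	InResult          bool // X ends up in lock+vendor
	ConstraintApplies bool // the manifest's version constraint must be met
}

// Interpret decides the outcome for a project X that is declared in A's
// manifest but not imported by A. importedElsewhere reports whether some
// other dependency actually imports X.
func Interpret(m Mode, importedElsewhere bool) Outcome {
	switch m {
	case ModeRequire:
		return Outcome{InResult: true, ConstraintApplies: true}
	case ModeConstrain:
		return Outcome{InResult: importedElsewhere, ConstraintApplies: importedElsewhere}
	default: // ModeIgnore
		return Outcome{InResult: importedElsewhere, ConstraintApplies: false}
	}
}

func main() {
	for _, m := range []Mode{ModeRequire, ModeIgnore, ModeConstrain} {
		fmt.Printf("mode %d: alone=%+v, imported elsewhere=%+v\n",
			m, Interpret(m, false), Interpret(m, true))
	}
}
```

Only mode 1 puts X in the result when nothing imports it; modes 2 and 3 differ only in whether the constraint is honored once something else pulls X in.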

The first mode's biggest benefit is up-front intuitiveness for users. As we've discussed, it corresponds nicely to a get or add-type command: having X in the manifest is sufficient to guarantee that X will be in the results, so running get becomes sorta 'fire and forget': X will always be a part of the lock (and vendor), until rm'd. That's probably the closest to Go folks' mental model right now, which I believe is very much oriented towards concrete questions of "what's on my disk." The tradeoff, of course, is that X HAS to be explicitly rm'd in order to go away - no automatic removal when it's no longer in the import graph.

Unfortunately, that's structuring an opportunity for cruft, and dependency hell, right into the tool.

There are a number of lesser arguments for why this might be a problem, but the crucial one is the situation it creates for A's dependers, not A itself. Say we're a project, B, and we depend on A. If A has one of these stale dep declarations on X, and we also have a direct or transitive (not through A) dependency on X, then we have a new failure mode: B's dep resolution could fail due to conflicts on X that literally do not matter, because A doesn't actually need X.
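A minimal sketch of such a phantom conflict, under heavy simplifying assumptions: each constraint is modeled as an explicit set of acceptable versions, and a made-up `honorStale` flag controls whether declarations not backed by an import are counted. The types, the flag, and the version numbers are all hypothetical; no real solver is structured this way.

```go
package main

import "fmt"

// Decl is one manifest's constraint on a project, modeled as the set of
// versions it accepts. Imported records whether the declaring project
// actually imports the target.
type Decl struct {
	From     string
	Accepts  []string
	Imported bool
}

// Solve returns the versions acceptable to every declaration that counts.
// When honorStale is false, declarations not backed by an import are skipped.
// An empty result means no version satisfies everyone (or, in this sketch,
// that no declaration counted at all).
func Solve(decls []Decl, honorStale bool) []string {
	var result []string
	first := true
	for _, d := range decls {
		if !d.Imported && !honorStale {
			continue
		}
		if first {
			result = append([]string(nil), d.Accepts...)
			first = false
			continue
		}
		result = intersect(result, d.Accepts)
	}
	return result
}

// intersect keeps the elements of a that also appear in b.
func intersect(a, b []string) []string {
	seen := make(map[string]bool, len(b))
	for _, v := range b {
		seen[v] = true
	}
	var out []string
	for _, v := range a {
		if seen[v] {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	// Declarations on X: A's is stale (A never imports X), B's is real.
	onX := []Decl{
		{From: "A", Accepts: []string{"1.0.0", "1.1.0"}, Imported: false},
		{From: "B", Accepts: []string{"2.0.0"}, Imported: true},
	}
	fmt.Println(Solve(onX, true))  // stale declaration counted
	fmt.Println(Solve(onX, false)) // stale declaration skipped
}
```

With the stale declaration counted, nothing satisfies both A and B and the result is empty; with it skipped, B's own constraint resolves to 2.0.0.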

To solve this problem, B's author has two choices:

  • Set an override for the constraints on X in order to supersede A. Drawbacks:
    • B's author has extra legwork to do to keep track of whether A ever fixes the stale dep
    • B's override will also mask any new/future deps that might declare a constraint on X
    • Any of B's dependers would have to pull that override on X up for their own dep resolution to work
  • Fork A, fix the bad declaration, then swap in the fork. Drawbacks:
    • All of the drawbacks with overrides
    • Also, less possibility of even notifying the user that A may have fixed its stale dep, because the fork will presumably not automatically chase changes in A

Forcing the user to confront these choices is a circle of dependency hell, even when conflicts are real. Doing so for a "phantom" conflict, like A -> X, makes it even worse. And worse still is that the opportunity for phantom conflicts cascades into any deps that X has: if X -> Q, and B -> Q, then that could also be a conflict.

But wait! It gets even worse!

Let's say A is a project containing a tree of packages - say A and A/foo - with just a single manifest at the root (as I believe our discussion of agrees on). Specifically:

  • A's manifest declares deps/constraints on X, Y, and Z
  • A imports A/foo and Y
  • A/foo imports Z

Not a crazy case, especially if A is a main pkg, and A/foo is a supporting lib.

Now, suppose that B imports A/foo, but not A. X is still clearly in the "constrained but not imported" category, Z in the "constrained and imported," but Y is in between: "constrained and imported in A's tree, but not by a package we actually use." In mode 1 thinking, do we still bring in both X and Y? Or maybe we analyze the whole A tree, see that Y is actually used by A, but B doesn't need that, so...in a tremendously weird outcome, we skip Y, but still get X and its deps?

That seems bizarre. And yet, we're still not done! (Just one more layer of fugliness, I promise)

X, of course, could be a tree of packages, too - X, X/bar, X/baz. If A's manifest declares a dependency on X, but its packages don't import it, then we have no guidance on which of those packages we should actually look in for imports. So, what to do? Follow the imports in all of its packages? (glide used to do something like that; when they fixed it, my dep count on one project went from 17 to 3, and install time from 1m to 6s)

Maybe we ignore all the imports, and only look at the declarations in A's manifest? Well, that's a problem, because if we agree that repo is the unit of exchange, and that we don't allow packages at different versions from one repo, then manifests should only declare deps on the root import path (i.e. A's manifest declares its dep on X, never X/bar and/or X/baz); allowing manifest deps on packages just creates new failure modes. The only other option would be to explicitly include used subpackages in the manifest...at which point we've fully duplicated the import graph's information, effectively meaning Go developers would have to write all their import paths twice.

This is absurd.

--

There are two fundamental problems here, both of which arise from two distinct concepts being complected together. The first pair is requirements and constraints; the second, projects and packages.

Import declarations are requirements. I don't think there can be much debate on that. There is no situation in which a spec-compliant Go compiler will see an import statement, fail to find the corresponding code, then say, "it's cool, we can still proceed." Imports are also not constraints (except gopkg.in, ugh). The question is, what are manifest dep declarations - constraints, or constraints AND requirements? Mode 1 says the latter; modes 2 and 3 say the former. My heavy preference is for the former, because it will make manifest and import declarations orthogonal.

Now, on Thursday Jess (rightly) raised the issue of my sketched-out tool having more than one way of doing things (both add/rm and sync), highlighting that this runs counter to the Go ethos of orthogonality. And I agree - those commands, as proposed, have weird overlap. But I keep returning to the general approach because I'm focused on this deeper orthogonality, not in the user-facing commands, but in system structure and state. It seems to me that we have wiggle room on what the commands are and how to make them intuitive, whereas we have rather immutable choices (unless I've missed something?) when it comes to orthogonality in the system's primitives.

The second part is the conflation of project and package, which IMO just falls out from what we already know. We're accustomed to go get's behavior, which allows you to specify a package name. But, as I pointed out above, if we accept repos/trees as the unit of exchange and versioning, and disallow using packages from the same unit at different versions, then manifest declarations should only point at projects (i.e., their root import path). That makes them a critically inadequate means of expressing requirements.
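As a rough sketch of what "manifest declarations only point at projects" could mean in practice, the check below flags manifest entries that name a subpackage rather than a root. It assumes the set of project roots is already known, which in reality would need repository detection; the `example.com/...` paths and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// ProjectRoot returns the known root that contains importPath, or "" if
// none does. If roots nest, the longest match wins.
func ProjectRoot(roots []string, importPath string) string {
	best := ""
	for _, r := range roots {
		if importPath == r || strings.HasPrefix(importPath, r+"/") {
			if len(r) > len(best) {
				best = r
			}
		}
	}
	return best
}

// CheckManifest returns the declared entries that are not themselves
// project roots, i.e. entries that point at a package inside a project
// or at an unknown path.
func CheckManifest(roots, declared []string) []string {
	var bad []string
	for _, d := range declared {
		if ProjectRoot(roots, d) != d {
			bad = append(bad, d)
		}
	}
	return bad
}

func main() {
	// Hypothetical roots, for illustration only.
	roots := []string{"example.com/x", "example.com/y"}
	fmt.Println(ProjectRoot(roots, "example.com/x/bar"))
	fmt.Println(CheckManifest(roots, []string{"example.com/x", "example.com/x/bar"}))
}
```

Import paths map many-to-one onto roots, so a root-only manifest cannot say which packages are needed; that information stays in the import graph.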

Even if we aim for orthogonality here, there's no reason that a get/add command couldn't still fetch down a repo and update the manifest. It'd just drop the repo in a central cache. Maybe it could even write out the lock and vendor to include that project - perhaps comforting to the user? idk. I wrote an issue on gps back in June specifically about this: sdboyer/gps#42

Also, there's no reason there can't be other directives in the manifest that behave as requirements. In fact, doing so is necessary to meet some of our use cases - for instance, when we need to ensure that a main package from another project, e.g., tinylib/msgp, is present.

Ultimately, this is about choosing between an up-front command UX problem - which we can actually do something about - versus a down-the-road complexity growth and absurdity problem, over which I think we'd have very little control.
