Hare I give a brief overview of three types of dependency resolution algorithms that are common: naive
, backtracking
and multiple versions
. I present a few scenarios that showcase difficulties with dependency resolution, and how the resolution algorithm deals with them.
I've been reading from
- http://technosophos.com/2015/09/02/dont-let-go-be-condemned-to-repeat-past-mistakes.html
- https://iscinumpy.dev/post/bound-version-constraints/
- https://news.ycombinator.com/item?id=12187888
- https://go.dev/blog/versioning-proposal
Further reading https://semver.org/
TL;DR; A package version consists of three parts <major>.<minor>.<patch>
, so package Foo
version 1.2.3
is major version 1
, minor version 2
and patch version 3
. If a new bug is discovered and fixed in Foo 1.7.1
then Foo 1.7.2
is released with the new bug-fix. If instead a new feature is added then Foo 1.8.0
is released. If Foo
is overhauled and the API is changed such that it is no longer compatible with the old one then Foo 2.0.0
is released.
There is more to semver than that, especially worth noting is that behavior is different for versions of the form 0.*.*
If I know that my package works with Foo 1.7.3
then, assuming that Foo
follows semver I can assume that it will be compatible with a future Foo 1.8.0
release as well, in fact, I can assume that it will be compatible with versions >=1.7.3
and <2.0.0
.
Let's say Foo 1.7.3
exposes two functions: bool isEven(int)
and bool isOdd(int)
. Our package depends on Foo
for the isEven
function. In the future, the author of Foo
removes isOdd
(just use ! isEven
instead) and releases Foo 2.0.0
.
According to semver we should depend on Foo >=1.7.3, <2.0.0
only, but in practice, although there was a braking change in Foo 2.0.0
, our code wasn't affected by that, so we can actually depend on Foo >=1.7.3, <3.0.0
. In fact, maybe isEven
is such a core part of Foo
that we can count on it never getting changed, then we can depend on Foo >=1.7.3
! Semver compatibility and actual comparability is not always the same thing!
flat dependency hierarchy's is how pip
based package managers work, imagine our package:
# fpm.toml
name = "my-pkg"
[dependencies]
pkg-a = ...
pkg-b = ...
Let's also assume that both pkg-a
and pkg-b
depend on shared common-utils
pacakge. In a flat dependency hierarchy we have this:
my-pkg
+- pkg-a
+- pkg-b
+- common-utils
This means that pkg-a
and pkg-b
must use the same version of common-utils
.
Cargo
and npm
based package managers use a nested dependency hierarchy, with the example above, it means that our dependencies look like this instead:
my-pkg
+- pkg-a
| +- common-utils
+- pkg-b
+- common-utils
So pkg-a
and pkg-b
can have their own versions of common-utils
. It's still possible though that they end up using a shared version, but the possibility of two different versions exists.
TODO Prompted by this article https://go.dev/blog/versioning-proposal
This is how pip
does it[citation needed]
Let's say we have the following dependency tree
my-pkg
+- pkg-a >=1.0.0 <2.0.0
| +- common-utils >=1.0.0 <2.0.0
+- pkg-b >=1.1.0 <2.0.0
+- common-utils >=2.0.0 <3.0.0
We can see that we cannot pick a version of common-utils
that makes everybody happy: If we pick common-utils 1.2.3
then that won't be compatible with pkg-b
. If we pick common-utils 2.0.1
then that won't be compatible with pkg-a
.
A naive approach will either fail here, leaving it up to the user to manually pick a version of common-utils
, or it would just pick some version of common-utils
anyways, which may or may not work, if it doesn't work then the user is in a thought spot.
This is how conda
does it[citation needed]
If we run into the scenario above a backtracking resolution algorithm will see if there are older versions pkg-a
and pkg-b
such that we can satisfy all the constraint. For instance it could be that the slightly older pkg-b 1.0.0
depends on the older common-utils 1.2.3
. We successfully found versions that all agree with one another and end up with the dependencies
pkg-a >=1.0.0 <2.0.0
pkg-b 1.0.0
common-utis 1.2.3
This is how cargo
and npm
does it.
Instead of backtracking like the earlier example we simply allow pkg-a
and pkg-b
to each have their own version of common-utils
that only they can see. We would en up with the dependencies
pkg-a >=1.0.0 <2.0.0
pkg-b >=1.1.0 <2.0.0
common-utis >=1.0.0 <2.0.0
common-utis >=2.0.0 <3.0.0
Here I will list some common (and uncommon) packaging scenarios, and how the three resolution algorithms handles them.
This is the exact example I used to describe the algorithms above, I defer to that.
We have
my-pkg
+- pkg-a >=1.0.0 <2.0.0
| +- common-utils >=1.0.0 <2.0.0
+- pkg-b >=1.1.0 <2.0.0 +- common-utils >=2.0.0 <3.0.0
but actually pkg-a
is compatible with 2.*.*
versions of common-utils
in practice.
Naive algorithm either does the right thing (pick a shared 2.*.*
version of common-utils
) by accident, or it fails, in which the user has to manually pick a version for common-utils
.
Backtracking might succeed by picking old versions of pkg-a
and/or pkg-b
, or it might fail. This is problematic in the case that old versions of the packages have security issues that haven't been patched (because they're old)
Multiple versions will just pick two different versions of common-utils
.
In some sense the original problem is that the version constraints on common-utils
where too narrow, had they been relaxed, we would have been able to pick a 2.*.*
, or even 3.*.*
version of common-utils
, and it could have just worked, of course it could also have caused problem. But it's generally easier/more intuitive for a user to manually select a version of common-utils
(the naive case) than to override version constraints on indirect dependencies (backtracking and multiple versions case).
Won't go into details here, I don't find it super important, but basically https://xkcd.com/1172/
Imagine my-pkg
depends on common-utils 1.0.0
which exposes a meters
and inches
datatype my-pkg
re-exposes the meters
type and uses it in the functions signature meter measure_distance(A x, B y)
.
Now common-utils 2.0.0
is released, and it as prefixed all SI-units, to now we have si_meters
. Would it be a breaking change for my-pkg
to update it's dependency and expose si_meter measure_distance(A x, B y)
instead?
I don't know how the different resolution schemes handle this.
Imagine the situation above but instead of prefixing SI units we're actually prefixing imperial units, so the 2.0.0
change is renaming inches
to I_inches
, so the signature of my-pkg
's measure_distance
function remains unchanged, is it a breaking change for my-pkg
do upgrade from common-utils 1.0.0
to 2.0.0
?