Draft Spec for Community Dependency Management for the Go Language
Specifically written for the Go team. Feel free to fork.
It's unproductive to have preconceived notions, so the following assumes
"any language" and ignores "current language limitations," though Go and node
among others are referenced quite a bit in examples where real-world
situations are relevant to the point made. Also, the least ideal circumstances
are taken as the "default working circumstances," since that's what serves the
community best.
Community Dependency Management
-------------------------------
A dependency management system should do the following:
- pull all the dependencies
- error out when the dependencies are not met
- provide dependency safe-states (commonly .lock files)
- isolate dependency persistence to the project directory
- read/parse remote resources (this includes git, svn and other repositories)
And, optionally but very desirably, it should:
- "help" easily and reliably get dependencies (including private ones)
- provide a dependency-grabbing syntax that is easy to write out in a blog
post or README (eg. cmd install A --save)
- not force the user to use only one source for dependencies (github is not
the be-all and end-all, same for any sort of "central system"), though having an
official source can be helpful in a lot of situations
Actors
======
The following are responsible for "managing dependencies"
- the "dependency system"
- the language (LANGUAGE)
- the dependencies (LIB)
- the developer trying to manage his dependencies (USER)
Dependency resolution is a problem that affects everyone. It is not something
that any one entity can or should be responsible for on its own. The language is
included because a language is nothing but a toy if it cannot be used in a
real-world production environment, or can only be used under very specific
circumstances.
There is a fifth actor involved in the process,
- the developer that short-circuits/sabotages your system for the purpose of
"getting the job done," be that with good intentions or malicious laziness
(aka. the BADUSER)
I mention this separately as it only matters for one thing: how strict or
permissive you make any part of the dependency system.
The rule of thumb is this: if you are NOT sure it's impossible for a BADUSER
to sabotage a part of your system, make that part of the system permissive
so that the BADUSER stays a (good) USER and control therefore stays in
the hands of the dependency system instead of becoming "undefined," which
doesn't help anyone (except the BADUSER, who doesn't care).
Bad practices, when done by enough users, become de facto standards.
The only ones who stand to lose are the dependency system, which now gets
classified by the community as having inconsistent/undefined behaviour in all
these cases (however much it might defend itself by labeling X and Y as
BADUSERs), and the poor saps who have to come in afterwards to clean up the
mess (the good USERs), because you, the dependency system, cannot help them.
An example: if you strictly disallow two dependencies asking for versions of a
package that differ only in the bugfix number, then a malicious user might just
forge the version of one of them in some way, and your system no longer has
any clue what it is doing. If instead you allow it but show a warning, then
everything works, the BADUSER doesn't have to be a BADUSER, and if in the
future a good USER comes in to pick up the work and wants/knows how to fix it,
they can. The alternative is that they receive a project filled with hacks for
which you, the dependency system, by not offering the option, are partially
responsible.
Real world example 1: consider jslint vs jshint. jshint was more or less
developed as a "version with hacks" of jslint and is now more or less the de
facto standard, with jslint almost never being mentioned as being used.
Consider the implications of that for a packaging tool: your version of the
packaging tool would become unusable if a "hacks version" becomes the de facto
standard by virtue of the official version being "unusable" in real-world
circumstances.
Real world example 2: coffeescript, sass, etc; life becomes complicated when
the community has to fix "your problems."
Problem 1: Project A depends on B, C and D
==========================================
The USER needs to be able to specify within Project A dependencies in a
preferably human readable format (if possible one that supports comments), to
project B, C and D. We'll refer to this as the DEPMAP to avoid confusion with
other mappings.
In addition,
- the method by which the user specifies his dependencies needs to be capable
of getting persisted with other project source files in a source version
control system
Note: only the project directory (the location of go source files) is
safe enough to assume as "sacred Go land," the dependency system should
not assume anything more. It is not even safe to assume that the
sacred Go land is at the root of the source version control; since
there are patterns where it is not.
eg. server + frontend in the same project under the same source
version control, both in distinct directories that are built into
deployable versions in some other directory in the project via a
3rd party build system not controlled by Go
- the USER needs to have precise control over which VERSION of his
dependencies he is pulling. If he is pulling a "non-version" identifier then
said identifier should just be exactly what it is (eg. master, dev,
[commithash], etc)
- the USER needs to be able to distinguish "trusted" from "untrusted" versions,
which is to say specify that they want version 1.2.3 but do not trust the
authors of the library to version properly, or know the authors use a
different versioning system that just looks the same; suggested syntax:
#1.2.3, which resolves the same as 1.2.3 except that no version logic is
executed on it; the version is interpreted verbatim, the same as
"master" or any other symbol
- the USER needs to be able to specify an array of acceptable versions; this
is to empower the user to "manually" achieve single dependency parity by
saying they accept any of a list of symbols (eg. master, dev, etc) which can
be used by the versioning system to mitigate conflicts with
other dependencies
- a LIB needs to be able to specify an array of acceptable versions; this is
to allow for multi-major-version compatibility. For example, a LIB might
depend on a utility library X; in the LIB you would have X -> ^1.0. In time
the version of X advances to 2.0 due to changes to some utility functions,
but LIB doesn't actually break from the changes, since the functions it
depends on haven't changed and its tests all pass against both 1.0 and
2.0 of X. Unfortunately, the LIB author now has to force their users to
do a lot of undesirable things to support both his X v1.0 users and users who
need compatibility with v2.0. Ideally the author should be able to say he is
compatible with both ^1.0 and ^2.0 of X without changing his version or doing
any other unnecessary task.
- the dependency system MUST try as hard as possible to avoid misleading the
user about the "stable" nature of the dependencies he is pulling
Example of (very common) bad behavior:
- pulling master branches by default, as if those are stable
- pulling the last tagged version as if there is never going to be
backwards incompatibility between tags
- making ANY sort of assumption about the behavior/conventions/etc the
dependency is using
It should be noted that forcing some "almighty standard" and then
allowing only what conforms to it is very unproductive in a real-world
environment outside of completely closed ecosystems, such as those of
very large corporations (eg. Google, Facebook, etc). The USER needs to
have the ability to pull from anywhere, since denying that right will
just force them to hack their way around it.
- the dependency system must have the ability to pull all dependencies (in
preferably clear source format if available) into the project directory so
that the USER has the choice of saving his dependencies with the project.
No system on Earth is foolproof and for some users even a small downtime in
the service is catastrophic.
- the dependency system should write an "exact dependency mapping" file
(LOCKFILE) after every "dependency refresh." Whenever another USER on
another machine asks for the dependencies to be resolved (assuming
dependencies weren't pushed with the project altogether), the dependency
system MUST use the LOCKFILE to resolve the dependencies if the state
of the DEPMAP has not changed relative to the state for which the LOCKFILE
was created. The USER of course can explicitly specify to ignore the
LOCKFILE and reprocess all dependencies if she/he wishes.
LOCKFILEs are meant to be committed. By committing them the user
ensures his team members have a consistent copy (unless there's an error
with LIB, which is not entirely the dependency system's problem), as well
as ensuring there is a history of "stable states" of what are possibly very
volatile dependencies.
The dependency system is responsible for storing some basic consistency
information with the LOCKFILE and warning the user when they install
the dependencies but get a slightly different version than what was
recorded in the LOCKFILE as corresponding to the given symbol.
eg. if the LOCKFILE has dependency A as 1.2.4 with a checksum of 1010101,
and it gets a checksum of 100000 after pulling version 1.2.4 into the
project, then it warns the user
- the dependency system should allow the USER to split off the dependencies
into a separate, distinct source path (eg. vendor/ etc); this lets a USER
who pulls 3rd-party dependencies that they only want to receive upstream
changes for (and never change themselves) easily communicate this fact to
colleagues through the project structure
In an ideal world the user would have full control over where each
and every dependency goes, but this is not explicitly required, and many
dependency systems are very bad at supporting this.
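The LOCKFILE consistency check described above might look roughly like the following Go sketch. The LockEntry type, its fields, and the choice of SHA-256 are assumptions made for illustration. The spec only requires that some checksum be stored and compared. A mismatch yields a warning rather than an error, in line with the permissive stance argued for earlier.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// LockEntry is a hypothetical LOCKFILE record: the exact version a symbol
// resolved to, plus a checksum of the contents that were pulled.
type LockEntry struct {
	Name     string
	Version  string
	Checksum string // hex-encoded SHA-256 (an assumption; any stable hash works)
}

// checksum hashes the fetched dependency contents.
func checksum(contents []byte) string {
	sum := sha256.Sum256(contents)
	return hex.EncodeToString(sum[:])
}

// verify compares freshly pulled contents against the LOCKFILE entry.
// It returns a warning message (empty if consistent) instead of failing,
// so the USER is informed but not blocked.
func verify(entry LockEntry, version string, contents []byte) string {
	if version != entry.Version {
		return fmt.Sprintf("warning: %s resolved to %s, LOCKFILE recorded %s",
			entry.Name, version, entry.Version)
	}
	if got := checksum(contents); got != entry.Checksum {
		return fmt.Sprintf("warning: %s %s checksum %s differs from LOCKFILE %s",
			entry.Name, version, got, entry.Checksum)
	}
	return ""
}

func main() {
	original := []byte("package a // v1.2.4 as first pulled")
	entry := LockEntry{Name: "A", Version: "1.2.4", Checksum: checksum(original)}

	fmt.Printf("%q\n", verify(entry, "1.2.4", original))
	fmt.Println(verify(entry, "1.2.4", []byte("package a // v1.2.4 retagged")))
}
```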
Optionally,
- the USER would much appreciate the ability to specify a smart range of
acceptable versions (eg. 1.*, <=2.2, ^1.2.* ie. 1.2.* to <2.0, etc) as
well as "tooling dependencies" (eg. golang >=1.3, optipng >=1.*). If tooling
dependencies are allowed, then it may be wise to force golang version
constraints to always be provided for libraries, so that the standard library
of the language can be managed just like any other dependency (see problem 3).
As a side note, people expect 1.*, but ^1.0 (ie. any version from 1.0 up
to but not including 2.0) is much clearer and does the same thing;
authors of libraries would have a harder time providing incomplete or
incorrect information (ie. 1.* instead of ^1.2, for example) if the star
syntax is just plain not supported (for libraries).
- the USER would much appreciate the ability to specify dependencies that are
environment specific (ie. dev, staging, etc), so as to avoid the workload
when deploying to an environment that doesn't need them (real world: when
using build tools a node project can easily have 30 dev dependencies and 5
actual "production" dependencies). This is even more useful if we consider
such headaches as dependency conflicts: fewer dependencies, fewer conflicts.
- when possible the dependency system should try caching dependencies
globally on the system to avoid network activity (some servers, ironically,
can have crappy behavior when it comes to retrieving files from exotic
sources such as github, etc). The user needs to have the choice of both
ignoring the cache and purging it at will (as caches have been known
to cause problems in other dependency systems)
- it would be nice for the USER to be able to "correct" the dependencies of
his dependencies, whether as a means of applying security hotfixes, applying
a custom version he/she maintains that his dependencies can't officially use,
or just fixing his dependency tree manually
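As a sketch of how the version constraints discussed above could be matched against a candidate tag, here is a minimal Go version. The grammar is an assumption: ^X.Y accepts >=X.Y.0 and <(X+1).0.0, X.* accepts any X.y.z, <=X.Y compares numerically, and a leading # marks a trusted-verbatim symbol that only matches itself. Anything that does not parse as a version (master, dev, a commit hash) is compared as a plain string.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a parsed MAJOR.MINOR.PATCH triple; missing parts default to 0.
type version [3]int

// parse reads "1", "1.2" or "1.2.3". It reports false for anything else,
// e.g. "master" or a commit hash, which are then treated as plain symbols.
func parse(s string) (version, bool) {
	var v version
	parts := strings.Split(s, ".")
	if len(parts) > 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// cmp returns -1, 0 or 1 comparing a and b component by component.
func cmp(a, b version) int {
	for i := range a {
		if a[i] != b[i] {
			if a[i] < b[i] {
				return -1
			}
			return 1
		}
	}
	return 0
}

// matches reports whether candidate (a tag or symbol) satisfies constraint.
func matches(constraint, candidate string) bool {
	// "#1.2.3": trusted-verbatim symbol, compared as an opaque string.
	if strings.HasPrefix(constraint, "#") {
		return candidate == constraint[1:]
	}
	cv, ok := parse(candidate)
	switch {
	case strings.HasPrefix(constraint, "^"):
		lo, okc := parse(constraint[1:])
		hi := version{lo[0] + 1, 0, 0} // exclusive upper bound: next major
		return ok && okc && cmp(cv, lo) >= 0 && cmp(cv, hi) < 0
	case strings.HasSuffix(constraint, ".*"):
		major, err := strconv.Atoi(strings.TrimSuffix(constraint, ".*"))
		return ok && err == nil && cv[0] == major
	case strings.HasPrefix(constraint, "<="):
		limit, okc := parse(constraint[2:])
		return ok && okc && cmp(cv, limit) <= 0
	}
	// Exact version, or a non-version symbol such as "master".
	if want, okc := parse(constraint); okc && ok {
		return cmp(cv, want) == 0
	}
	return candidate == constraint
}

func main() {
	for _, c := range [][2]string{
		{"^1.2", "1.9.0"}, {"^1.2", "2.0.0"}, {"1.*", "1.0.5"},
		{"<=2.2", "2.2.0"}, {"#1.2.3", "1.2.3"}, {"master", "master"},
	} {
		fmt.Printf("%-7s %-6s %v\n", c[0], c[1], matches(c[0], c[1]))
	}
}
```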
Problem 2: Project A depends on B, B depends on C and D
=======================================================
The dependency system should be able to read into B, check if B has a DEPMAP
and resolve B's dependencies (finer points are treated in other problems).
The same applies if C itself depends on E, etc.
If A depends on B, B depends on C, and C depends on A, the dependency
system should error out. The same goes for any other case of circular
dependency. It is very important that the system inform the user what the
circular dependency is and NOT just dump a load of internal variables or
states on the user.
Example of good user presentation:
Error, circular dependency was detected:
A -> B
B -> C
C -> A
Please fix and try again. Bye.
In a more complex case the example would show a tree of dependencies and
highlight in color the key points (ie. A -> B, B -> C and C -> A) so that
the user can visualize the error.
Problem 3: Project A depends on B and C, which both depend on D
===============================================================
There are multiple problems here, depending on the version of D, but most boil
down to the same thing.
It's important to note that it is a good idea (albeit not required if you wish
to just deny all) to have the LANGUAGE understand package versions (as a hidden
part of the package name). This helps both the interoperability of different
versions of the same dependency within the different packages that need it,
and debugging, since the version is useful information there. It's also a good
idea for the dependency system to notify the user of "updated versions" whenever
it can, especially in cases such as the user having a dependency on 1.2.3
(either due to the LOCKFILE or the DEPMAP) and 1.2.4 being available, since
1.2.4 may be a critical security update (this applies to all dependencies, even
dependencies of dependencies, not just root ones).
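An update notice of this kind can be as simple as scanning a dependency's published tags for a higher bugfix release in the same MAJOR.MINOR line. Here is a minimal Go sketch. It assumes tags are plain MAJOR.MINOR.PATCH strings. Branches, hashes, and anything else that does not parse are ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse splits "1.2.3" into its three numeric parts.
func parse(s string) ([3]int, bool) {
	var v [3]int
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// newerBugfix returns the highest tag that shares current's MAJOR.MINOR
// but has a higher PATCH, or "" if there is none.
func newerBugfix(current string, tags []string) string {
	cur, ok := parse(current)
	if !ok {
		return ""
	}
	best, bestPatch := "", cur[2]
	for _, t := range tags {
		v, ok := parse(t)
		if ok && v[0] == cur[0] && v[1] == cur[1] && v[2] > bestPatch {
			best, bestPatch = t, v[2]
		}
	}
	return best
}

func main() {
	tags := []string{"1.2.2", "1.2.3", "1.2.4", "1.3.0", "master"}
	if u := newerBugfix("1.2.3", tags); u != "" {
		fmt.Printf("notice: 1.2.3 is in use but %s is available (possible security fix)\n", u)
	}
}
```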
Before continuing, it's important to first verify the source of D. If the source
of D can't be identified as the same or a compatible source, then the USER should
be notified through a warning and offered help on how they can configure the
DEPMAP to identify the two sources of D as the same source if they believe it
to be identical. D from two sources follows the same case as D with different
symbols.
If D has two different sources then it's treated as two separate entities, so
each version is independent of the other and the symbols in the LANGUAGE are
considered different.
If D is the same version (B -> Dv1.0, C -> Dv1.0) both B and C should just be
linked to the same D (v1.0). The LANGUAGE should see the same D.
If one package can be coerced into the other, ie. one package specifies 1.1 but
the other specifies 1.* or similar, then the 1.* is coerced into 1.1 by the
dependency system. It is the LIB's responsibility to provide accurate
dependencies: if the intention was "any 1.x version so long as it's at least
1.2," then it should have specified ^1.2. It is the dependency system's
responsibility to provide support for the LIB to specify the correct intent.
If D differs in only bugfix versions (B -> Dv1.0.1, C -> Dv1.0.2) then B and C
should be linked to the highest bugfix version, unless the USER specifies in
his DEPMAP he doesn't want that behavior. If the versions cause incompatibility
then it's a LIB problem, not a dependency system problem. A bugfix release
should be considered compatible by default unless otherwise specified.
If D differs in minor version but not major version (B -> Dv1.1.1, C -> Dv1.2.1),
then the dependency system must first check if D specifies it can only exist
as a single dependency (ie. there MUST be a single D). If so, the
system fails and provides the user with a map of how D asks to be unique and
which packages are trying to use incompatible versions. Otherwise, if D accepts
being multiple entities, the dependency system must ask the LANGUAGE to perform a
DEPENDENCY LOGIC MAP (DLM) on the two D versions (explained later). If the DLM
fails the dependency system fails; if the DLM doesn't fail then the LANGUAGE
needs to treat the two as different symbols. The USER may specify that the
dependency system should forcefully coerce "feature versions" into a single
entity, in which case the LANGUAGE just sees v1.2.1 of D and everything is the
same symbol.
If D differs in major version or is just different symbols, then a DEPENDENCY
LOGIC MAP (DLM) is requested from the LANGUAGE. If it fails the dependency
system fails; if it doesn't fail then the LANGUAGE treats both as different
symbols.
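The cases above amount to a small decision table over two requested versions of the same D. Here is a minimal Go sketch of that table, assuming MAJOR.MINOR.PATCH strings. The Resolution values and the flags are hypothetical names: single means D declares it MUST be a single dependency, and coerceMinor means the USER opted into coercing feature versions. The DLM is only reported as required, not run, and the USER's option to opt out of bugfix linking is not modeled. Letting a USER-requested coercion take precedence over single is an interpretation, since a coerced D is already a single D.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Resolution describes how two requested versions of D are reconciled.
type Resolution string

const (
	Share      Resolution = "share one D"
	UseHighest Resolution = "link both to the highest version"
	NeedDLM    Resolution = "run a DEPENDENCY LOGIC MAP"
	Fail       Resolution = "fail: D must be a single dependency"
)

// parse splits "1.2.3" into its numeric parts; anything else is a symbol.
func parse(s string) ([3]int, bool) {
	var v [3]int
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// higher returns whichever of a and b has the greater version triple.
func higher(a, b string, va, vb [3]int) string {
	for i := range va {
		if va[i] != vb[i] {
			if va[i] > vb[i] {
				return a
			}
			return b
		}
	}
	return a
}

// reconcile decides what happens when two packages ask for D at a and b.
func reconcile(a, b string, single, coerceMinor bool) (Resolution, string) {
	va, okA := parse(a)
	vb, okB := parse(b)
	switch {
	case a == b:
		return Share, a
	case !okA || !okB || va[0] != vb[0]:
		// Different majors, or symbols such as "master" vs "dev".
		return NeedDLM, ""
	case va[1] == vb[1]:
		// Only the bugfix part differs: compatible by default.
		return UseHighest, higher(a, b, va, vb)
	case coerceMinor:
		// Assumption: a USER-requested coercion already yields a single D.
		return UseHighest, higher(a, b, va, vb)
	case single:
		return Fail, ""
	default:
		return NeedDLM, ""
	}
}

func main() {
	cases := [][2]string{
		{"1.0.0", "1.0.0"}, {"1.0.1", "1.0.2"}, {"1.1.1", "1.2.1"},
		{"1.0.0", "2.0.0"}, {"master", "dev"},
	}
	for _, c := range cases {
		r, v := reconcile(c[0], c[1], false, false)
		fmt.Println(c[0], c[1], "=>", r, v)
	}
}
```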
### Algorithm for forming a DEPENDENCY LOGIC MAP (DLM) for package X
We'll consider that we have two versions of X, version X1 and version X2.
Start with all .go files.
Naive loop:
1. if a file imports "the package X#" then it is part of the map
2. the package version it imports is considered the SEMANTIC INPUT
3. anything that exports the package out is considered OUTPUT
4. repeat from (1), treating every package that OUTPUTs "the package X#" as
if it were X# itself, until no new packages are added to the map
After doing the above twice (once for X1 and once for X2), you now have
every package marked as receiving X1, X2 or neither, and as either outputting
what it receives or not.
So now you just check if there is a package that accepts both as input. If you
find one, then the algorithm FAILs and you print a map to the user of how the
two would get used by a single package simultaneously. If you don't, the
algorithm has passed, since the two versions X1 and X2 won't ever exist
in the same scope, and hence their not being able to exist in the program
simultaneously is only a LANGUAGE problem.
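The naive loop above could be sketched in Go as follows. This version works per package rather than per file. It assumes a hypothetical Package summary that records what each package imports and whether its exported API passes on (OUTPUTs) the X it receives. Extracting that summary from real Go source is left out. The loop stops once no new package is added. It returns only the clashing package names, whereas a real tool would print the full map.

```go
package main

import (
	"fmt"
	"sort"
)

// Package is a hypothetical summary of one Go package for the DLM:
// what it imports, and whether its exported API passes on ("OUTPUTs")
// the X it receives.
type Package struct {
	Imports []string
	Outputs bool
}

// receivers returns every package that ends up receiving version xv of X,
// either by importing it directly (SEMANTIC INPUT) or by importing a package
// that OUTPUTs it. It iterates until no new package is added.
func receivers(pkgs map[string]Package, xv string) map[string]bool {
	got := map[string]bool{}
	for changed := true; changed; {
		changed = false
		for name, p := range pkgs {
			if got[name] {
				continue
			}
			for _, imp := range p.Imports {
				if imp == xv || (got[imp] && pkgs[imp].Outputs) {
					got[name] = true
					changed = true
					break
				}
			}
		}
	}
	return got
}

// dlm reports the packages that would receive both x1 and x2; an empty
// result means the DLM passes and the two versions never share a scope.
func dlm(pkgs map[string]Package, x1, x2 string) []string {
	r1, r2 := receivers(pkgs, x1), receivers(pkgs, x2)
	var clash []string
	for name := range r1 {
		if r2[name] {
			clash = append(clash, name)
		}
	}
	sort.Strings(clash)
	return clash
}

func main() {
	pkgs := map[string]Package{
		"B":    {Imports: []string{"X#1"}, Outputs: true},
		"C":    {Imports: []string{"X#2"}}, // uses X#2 internally only
		"main": {Imports: []string{"B", "C"}},
	}
	fmt.Println("clashes:", dlm(pkgs, "X#1", "X#2"))

	// Now C exposes X#2 in its API, so main receives both versions.
	pkgs["C"] = Package{Imports: []string{"X#2"}, Outputs: true}
	fmt.Println("clashes:", dlm(pkgs, "X#1", "X#2"))
}
```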
It should be noted that in the real world this theoretical problem is very
rare, as most "shared dependencies" tend to be in the form of "utility libraries,"
and libraries will typically export a "universal" resource. If the PHP
ecosystem is any indication, when it does happen it's not such a "world-ending
problem" that the USER can't simply work around it themselves to an extent,
much like if you had two separate packages with identical package names. A lot
of libraries (in any language) also consider it "sexy" to be able to claim
"we don't depend on anything," either in the spirit of avoiding the problem or
just because it allows them to be consistent (though this depends on the
ecosystem and popularity).
Problem 4: Project A depends on B, C and D, which all depend on E
=================================================================
Same solution as 3. The dependency system needs to be able to apply the logic
of all problems so far both on any number of dependencies as well as any
depth in the dependency tree.
Algorithm-wise, so long as the problem can be solved for 2 it can be solved for
any number greater than 2. The process is as follows if you have 3:
- solve E for B and C
- solve E for B and D (taking into account the solution for B and C)
- solve E for C and D (taking into account both previous steps)
Or if we take a hard example:
Let B be incompatible with both C and D
Let C be compatible with D
- solve E for B and C: Eb, Ec
- solve E for B and D: Ed
- solve E for C and D: Ec, Ed become Ecd
Result: Eb, Ecd
This is just a naive algorithm to prove it is possible; better algorithms may
be applicable in practice.