The primary purpose of a version selection algorithm is to create a transitively complete list of packages@versions (henceforth, atoms), such that a compiler or interpreter can take the list of atoms and produce working software.
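
To make the terminology concrete, here is a minimal sketch (the names and versions are invented, not from any real tool) of what such an output looks like as a data structure:

```go
// Sketch of the output of a version selection algorithm: a transitively
// complete list of atoms, i.e. package@version pairs.
package main

import "fmt"

// Atom pins one package to exactly one version.
type Atom struct {
	Package string
	Version string
}

func main() {
	// A depends on B, and B depends on C, so a complete selection
	// must include an atom for all three.
	selection := []Atom{
		{Package: "A", Version: "1.4.0"},
		{Package: "B", Version: "2.1.3"},
		{Package: "C", Version: "0.9.2"},
	}
	fmt.Println(selection)
}
```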

Version selection algorithms aren't magic. They can only be as good as the information they operate on. And there are lots of different kinds of information we could feed into such an algorithm.

Historically, the main focus has been on information that is more or less provided by humans - for example, negations/exclusions, like min or max version constraints. Relying on humans to make these statements has all the attendant problems of any system reliant on human input: that input is, at times, wrong in both major directions (a toy illustration in code follows the list):

  • False positives: Humans omit important information, wittingly or not. As a result, it is possible to select a non-working set of atoms.
  • False negatives: Humans include incorrect information, usually innocently. As a result, there are working sets of atoms that are impossible to select.
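
Here is a toy sketch of those two failure modes, with entirely made-up packages and versions: suppose A declares it needs B ">=1.2, <2.0", but in reality B 1.4 quietly broke an API A relies on, and B 2.0 still works fine.

```go
package main

import "fmt"

// declaredOK reports what A's manifest claims about a version of B:
// here, a hypothetical range of ">=1.2, <2.0".
func declaredOK(v string) bool {
	return v >= "1.2" && v < "2.0" // naive string comparison is fine for this toy example
}

// actuallyWorks is the ground truth the declaration is trying to capture:
// pretend B 1.4 broke an API that A relies on, and everything else works.
func actuallyWorks(v string) bool {
	return v != "1.4"
}

func main() {
	for _, v := range []string{"1.2", "1.4", "2.0"} {
		fmt.Printf("B@%s  declared=%v  actual=%v\n", v, declaredOK(v), actuallyWorks(v))
	}
	// B@1.4: declared=true,  actual=false -> false positive (a non-working set is selectable)
	// B@2.0: declared=false, actual=true  -> false negative (a working set is unselectable)
}
```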

False negatives are generally much worse than false positives. False positives can generally be resolved by re-running a CLI command, whereas false negatives must either live permanently as a [patch] or [replace] override, or someone has to roll a new release. (This is one core problem with MVS - because it admits nothing but minimum version constraints, it has a strong tendency towards false negatives.)

Cargo currently has some issues with the first item - false positives - in the form of stale minimums. Because there is no part of the day-to-day workflow that checks whether a declared minimum is still accurate, it's easy for the true minimum to quietly drift ahead of the stated minimum over time.

One solution folks have already looked at is to run tests against the bottom of ranges. (Composer facilitates something similar, too). Test results are another interesting bit of information, but like explicit declarations, they can be wrong in both directions:

  • False positives: Testing can never be complete, as Dijkstra would relish reminding us.
  • False negatives: Tests can be flaky. So can testing environments.

If we could devise a check whose only failure mode was false positives, though, it would be in a different category from these other options: a "no" from it could be trusted outright. It could establish real, trustworthy bounding boxes on compatibility ranges, and leave the other methods to operate within those bounds.

Avoidance of false negatives is exactly the sort of lower-bound assurance that type systems are supposed to provide. (Yes, it's type systems 101 that all type checkers will fail some valid programs - false negatives - but our standard here is "works," and something that fails the typechecker does not "work.") The goal, then, would be to devise a lightweight form of type analysis that can make a (fast) assessment of whether a package, A, could possibly work with a particular atom of B - a pairwise check. It should look something like this (a rough sketch in code follows the list):

  • Parse A and find all references to symbols exported by B
  • Check for the existence of a corresponding symbol in the atom of B, and any child symbols
  • Check arities of function calls
  • Do as much type checking as can be done solely with the information available in A and B. (So, if B has a function signature with a type from C, just assume it's right)
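
To make the shape of this concrete, here is a minimal, Go-flavored sketch of the first three steps using only the standard library's go/parser and go/ast. Everything in it - the pkgb import name, the symbols, the arities, the snippet of A's source - is invented for illustration; a real implementation would load both packages from the atoms under consideration, and the fourth step would additionally need real type information (e.g. via go/types).

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// exported models what one hypothetical atom of B ("pkgb") exposes:
// symbol name -> arity for functions, or -1 for non-function symbols.
var exported = map[string]int{
	"Dial": 2,  // pretend: func Dial(addr string, timeout int)
	"Conn": -1, // pretend: type Conn
	// No "DialContext" here - pretend this atom predates that symbol.
}

// srcOfA stands in for package A's source as loaded from its atom.
const srcOfA = `package a

import "example.com/pkgb"

func use() {
	c := pkgb.Dial("host:80", 0)
	_ = c
	pkgb.DialContext(nil, "host:80") // only exists in newer releases of pkgb
}`

// isPkgB reports whether a selector expression references pkgb. Deliberately
// crude: it keys on the import name rather than resolving imports.
func isPkgB(sel *ast.SelectorExpr) bool {
	id, ok := sel.X.(*ast.Ident)
	return ok && id.Name == "pkgb"
}

func main() {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "a.go", srcOfA, 0)
	if err != nil {
		panic(err)
	}

	ast.Inspect(f, func(n ast.Node) bool {
		switch node := n.(type) {
		case *ast.SelectorExpr:
			// Steps 1+2: every pkgb.<Symbol> reference must exist in the atom.
			if isPkgB(node) {
				if _, ok := exported[node.Sel.Name]; !ok {
					fmt.Printf("MISSING: pkgb.%s is not exported by this atom of B\n", node.Sel.Name)
				}
			}
		case *ast.CallExpr:
			// Step 3: if the callee is a pkgb function, check the call's arity.
			if sel, ok := node.Fun.(*ast.SelectorExpr); ok && isPkgB(sel) {
				if want, known := exported[sel.Sel.Name]; known && want >= 0 && len(node.Args) != want {
					fmt.Printf("ARITY: pkgb.%s called with %d args, this atom expects %d\n",
						sel.Sel.Name, len(node.Args), want)
				}
			}
		}
		// Step 4 (type checking with whatever information A and B carry)
		// would need go/types and is omitted from this sketch.
		return true
	})
}
```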

The second item, the existence check, is the bit that is most likely to help with the problem of stale minimums: if A references a symbol that does not exist in an atom of B, it's a near-certainty that that atom is too old - that the minimum is stale.
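
Following on from that, here is a sketch (again with invented packages, versions, and symbols) of how the existence check alone can surface a stale minimum: walk B's releases from oldest to newest and report the first atom that exports everything A references. That version is only a lower bound on the real minimum - the check cannot prove the atom works - but if it is already higher than the declared minimum, the declaration is definitely stale.

```go
package main

import "fmt"

// release pairs a version of B with the set of symbols that atom exports.
// All versions and symbols here are hypothetical.
type release struct {
	version string
	exports map[string]bool
}

// atomsOfB is ordered oldest to newest.
var atomsOfB = []release{
	{"1.0.0", map[string]bool{"Dial": true}},
	{"1.1.0", map[string]bool{"Dial": true, "Conn": true}},
	{"1.2.0", map[string]bool{"Dial": true, "Conn": true, "DialContext": true}},
}

// referencedByA is the output of the parse step: every pkgb symbol A uses.
var referencedByA = []string{"Dial", "DialContext"}

// floorByExistence returns the oldest atom of B that the existence check
// cannot rule out - a lower bound on A's real minimum.
func floorByExistence() (string, bool) {
	for _, atom := range atomsOfB {
		ok := true
		for _, sym := range referencedByA {
			if !atom.exports[sym] {
				ok = false
				break
			}
		}
		if ok {
			return atom.version, true
		}
	}
	return "", false
}

func main() {
	declared := "1.0.0" // what A's manifest says today
	if floor, ok := floorByExistence(); ok && floor != declared {
		fmt.Printf("stale minimum: manifest says >=%s, existence check requires >=%s\n",
			declared, floor)
	}
}
```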

Added bonus: type analysis is static, so there's no need to execute foreign, potentially untrustworthy code to get an answer.

The goal here would be to find an algorithm for this sort of lightweight type analysis that is fast enough to run on the fly - and/or to run centrally, with results cached and reported alongside other compatibility information provided from a registry.

If the analysis is fast, it would be feasible to run this check as part of a release process, in a background thread while running the version selection algorithm during normal operations, or even as part of the selection algorithm itself. If a stale minimum can be unambiguously identified, the user can be informed of the need for corrective action.
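
As a last sketch, here is one way the check could slot into selection itself. pairwiseOK is a stand-in for the analysis above, and the cache and candidate list are invented; the point is just that a definite "no" removes an atom before ordinary constraint solving ever considers it, and that results are memoizable because they depend only on the two atoms involved.

```go
package main

import "fmt"

// atom is a package pinned to one version; all values below are invented.
type atom struct{ pkg, version string }

// cache memoizes check results keyed by the (dependent, dependency) pair,
// since the answer depends on nothing but those two atoms.
var cache = map[[2]atom]bool{}

// pairwiseOK stands in for the lightweight analysis sketched earlier. Its
// contract is the important part: it may wrongly say "yes", but it must only
// say "no" when the pair definitely cannot work.
func pairwiseOK(a, b atom) bool {
	// Hypothetical ground truth: A@1.0.0 references a symbol that B only
	// gained in 2.1.0, so B@2.0.0 is definitely too old for it.
	broken := a.pkg == "A" && a.version == "1.0.0" && b.pkg == "B" && b.version == "2.0.0"
	return !broken
}

// checkedOK wraps pairwiseOK with the cache, as a registry or a background
// thread might.
func checkedOK(a, b atom) bool {
	key := [2]atom{a, b}
	if v, hit := cache[key]; hit {
		return v
	}
	v := pairwiseOK(a, b)
	cache[key] = v
	return v
}

func main() {
	a := atom{"A", "1.0.0"}
	candidates := []atom{{"B", "2.0.0"}, {"B", "2.1.0"}, {"B", "2.2.0"}}

	var viable []atom
	for _, b := range candidates {
		if checkedOK(a, b) {
			viable = append(viable, b) // only these proceed to normal constraint checks
		} else {
			fmt.Printf("%s@%s ruled out for %s@%s by the pairwise check\n",
				b.pkg, b.version, a.pkg, a.version)
		}
	}
	fmt.Println("remaining candidates:", viable)
}
```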
