Those of you at the Haskell Implementors' Workshop this year got the pleasure of seeing Mark Lentczner give a great presentation on GPS (GHC/Platform/Stackage) Haskell. (No, this isn't about global positioning satellites; it's about creating Haskell package sets.) A lot of the ideas we came up with at ICFP required some adjustments to how Stackage works before we're ready for the next stage. Now that those changes are in place, I'd like to start sharing what Mark, Duncan, and I discussed at ICFP this year.
This is the first blog post on the topic, and its goal is to establish what problem we're trying to solve. It's vital that we first agree on the problem itself, so that:
- We can have an intelligent discussion about whether it's the right problem to solve, divorced from any talk about solutions.
- There's a clear standard by which we can measure whether a solution is sufficient.
- We don't mistakenly dismiss a good solution because it ignores a different problem.
I'm going to break this down into three groups: library users, library authors, and distribution maintainers. These three groups will certainly overlap quite a bit (I personally fall into all three), but my claim is that there are different problems present for each group. I'll also try to define what I mean by each of these terms as we move along.
I'm also going to compare how our current solutions stack up against the problems stated. Specifically, I'll discuss the Hackage/cabal-install combination, Haskell Platform, and curated package sets (e.g. Stackage or Linux distribution packages).
A library user is someone who is writing some code that takes advantage of an open source Haskell library. This might be in the context of writing an application, or when writing another library, though we'll be focusing mostly on people writing applications. The user may be writing this application/library for either personal or professional use, and the code may be open source or proprietary.
The first thing the user will need to do is install all dependencies for the code he/she is working on. This is where the most commonly cited problem, "cabal hell", rears its ugly head. As far as solutions go:
- Hackage/cabal-install: The dependency solver does its best to resolve this issue, but the evidence shows that the problem still occurs frequently.
- Haskell Platform: If you're only using packages included in the platform, the problem is solved. Better yet, you don't even have to build any of the dependencies. But as soon as you go beyond the platform's packages, you're back in the Hackage/cabal-install world. In many cases you'll actually be worse off than with plain Hackage, because HP pins some packages to versions older than the ones that the newest releases on Hackage require.
- Curation: I'll claim that curated solutions fully solve this specific problem, with the caveat that it depends on the library coverage provided by the curated set.
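To make the failure mode concrete, here is a toy model of the simplest form of cabal hell, written in Python purely for illustration: two dependencies of an application each constrain a shared package (bar), and no single version of bar satisfies both. The package baz, all version numbers, and the ranges are hypothetical, and this is not how cabal-install's solver works; a real solver also searches over versions of foo and baz, which is why it can often, but not always, find a way out.

```python
# Toy model of "cabal hell": two dependencies each constrain a shared
# package (bar) so that no single version of bar satisfies both.
# Package names (baz), versions, and ranges are hypothetical.

def parse(v):
    """Turn a dotted version string like '2.1' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

# Versions of bar that exist in this toy package index.
AVAILABLE_BAR = ["2.0", "2.1", "2.2", "2.3"]

# Each dependency states a half-open range [lower, upper) for bar.
CONSTRAINTS = {
    "foo-1.1": ("2.2", "3.0"),  # foo 1.1 needs bar >= 2.2 && < 3.0
    "baz-0.5": ("2.0", "2.2"),  # baz 0.5 needs bar >= 2.0 && < 2.2
}

def satisfying_versions(available, constraints):
    """Return every available version that lies inside all ranges."""
    return [
        v for v in available
        if all(parse(lo) <= parse(v) < parse(hi) for lo, hi in constraints.values())
    ]

if __name__ == "__main__":
    # Empty list: no version of bar works for both foo-1.1 and baz-0.5.
    print(satisfying_versions(AVAILABLE_BAR, CONSTRAINTS))
```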
Say our user has written some code which uses package foo, and used version 1.0. All is going well with the application, and in the meantime a new version of foo (1.1) is released, which pulls in newer versions of dependencies as well (suppose it requires bar 2.2, where our user has been using bar 2.1). These updates don't affect our user at all, because he/she is either using curation or some system like cabal freeze.
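A minimal sketch of why pinning protects the user, assuming a deliberately simplified resolver (hypothetical, and not cabal-install's algorithm) that takes the newest version of each package unless a pin says otherwise. The pins play the role of a freeze file or a curated set.

```python
# Toy model: pins, not the package index, decide which versions get used.
# The resolver, index contents, and pins are all hypothetical simplifications
# (for instance, dependency ranges between foo and bar are ignored).

def parse(v):
    """Turn a dotted version string into a comparable tuple."""
    return tuple(int(p) for p in v.split("."))

def resolve(index, pins):
    """For each package, use the pinned version if there is one,
    otherwise the newest version in the index."""
    return {name: pins.get(name, max(versions, key=parse))
            for name, versions in index.items()}

# The index after foo 1.1 and bar 2.2 have been uploaded.
index_after = {"foo": ["1.0", "1.1"], "bar": ["2.1", "2.2"]}

# What a freeze file (or curated set) recorded before those uploads.
pins = {"foo": "1.0", "bar": "2.1"}

if __name__ == "__main__":
    print(resolve(index_after, {}))    # unpinned: picks up the new releases
    print(resolve(index_after, pins))  # pinned: same plan as before
```

Without pins, the plan silently moves to foo 1.1 and bar 2.2 as soon as they are uploaded; with pins, it stays at foo 1.0 and bar 2.1.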
One day, someone discovers a bug that has been in foo since version 1.0. The author quickly releases a patch for this bug on the 1.1 series (1.1.0.1). However, the author doesn't release anything in the 1.0 series.
- Hackage: the user will likely try to upgrade to version 1.1.0.1, which may have breaking API changes. Worse, this will demand pulling in new versions of deeper dependencies, like bar, which may also introduce API changes. This bug fix could end up requiring major changes to the user's codebase. And both those changes and the upgrades themselves may introduce new bugs of their own.
- Haskell Platform/curation: both of these approaches theoretically offer a solution to this problem, in the form of maintainers releasing bugfix patches. In practice, that isn't really done.
Similar to that last point, a package author could elect to release a 1.0.0.1 bug fix release as well. However, practice demonstrates that this rarely happens.
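Under the Package Versioning Policy (PVP), the first two components of a version (A.B) form the major version, and only a major-version bump is allowed to break the API. The following sketch, with hypothetical helper names, checks whether a fix release crosses that boundary relative to the version the user is on.

```python
# Does moving between two versions cross a PVP major-version boundary?
# Under the PVP, the major version is the first two components, A.B.
# The helper names (major, may_break_api) are hypothetical.

def major(version):
    """Return the PVP major version (A, B) of a dotted version string."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (2 - len(parts))  # treat "1" as "1.0"
    return tuple(parts[:2])

def may_break_api(current, candidate):
    """True if upgrading from current to candidate crosses a major version,
    i.e. the PVP permits the candidate to contain breaking API changes."""
    return major(current) != major(candidate)

if __name__ == "__main__":
    print(may_break_api("1.0", "1.1.0.1"))  # the fix as actually released
    print(may_break_api("1.0", "1.0.0.1"))  # a backported fix that rarely appears
```

The first call reports that the only available fix lives in a new major series, while a 1.0.0.1 release would, by the PVP's rules, be a drop-in upgrade.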
Suppose our user wants to add some code to his/her codebase, and wants to use a new feature introduced in version 1.0.1 of the foo package.
- Hackage: this is generally trivial: just increase the lower bound in your cabal file (and hope it all still compiles).
- Haskell Platform: as long as the package isn't in the platform, no problem. For packages in the platform, this is very difficult.
- Curation: usually this means upgrading to a whole new package set, which may introduce other complications. (This isn't universally true; Stackage's cabal.config approach makes it easy to upgrade just a single package.)
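Here is a sketch of the single-package upgrade that a cabal.config-style approach permits: bump one pin in an otherwise fixed set, then check that the new release's own dependency ranges are still satisfied by the rest of the set. The package data (including baz) and the upgrade helper are hypothetical, and the check deliberately ignores reverse dependencies, that is, other packages in the set whose bounds on foo might exclude 1.0.1.

```python
# Upgrade one package inside a curated set and check its dependency ranges
# against the rest of the set. All package data here is hypothetical;
# reverse dependencies (packages that depend on foo) are not checked.

def parse(v):
    """Turn a dotted version string into a comparable tuple."""
    return tuple(int(p) for p in v.split("."))

curated = {"foo": "1.0", "bar": "2.1", "baz": "1.1"}

# Dependency ranges [lower, upper) declared by each release of foo.
FOO_DEPS = {
    "1.0":   {"bar": ("2.0", "2.2")},
    "1.0.1": {"bar": ("2.1", "2.2"), "baz": ("1.1", "1.2")},
}

def upgrade(package_set, name, version, deps):
    """Return a copy of package_set with one package bumped, plus the names
    of dependencies of the new version that the set fails to satisfy."""
    new_set = {**package_set, name: version}
    problems = []
    for dep, (lo, hi) in deps[version].items():
        have = new_set.get(dep)
        if have is None or not (parse(lo) <= parse(have) < parse(hi)):
            problems.append(dep)
    return new_set, problems

new_set, problems = upgrade(curated, "foo", "1.0.1", FOO_DEPS)

if __name__ == "__main__":
    print(new_set, problems)  # everything else in the set stays the same
```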
Let's focus on someone who writes libraries to be used by a large number of users, most likely as an open source library. Today, library authors have most of the same problems as library users, plus a new one: having to support a large number of user configurations. For example:
- Operating system (Linux, Mac, Windows, FreeBSD)
- GHC versions (HEAD, 7.8, 7.6, 7.4 and earlier)
- Haskell Platform or not
- Theoretically any possible combination of dependency versions
In many cases this plethora of choice isn't a problem. Much library code can be completely unaware of the OS. But the GHC version (and especially the libraries shipped with GHC) can have a huge impact. And supporting many versions of each dependency leads to a large maintenance burden.
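A quick count shows how fast these axes multiply. The operating systems and GHC versions come from the list above; the assumption of three dependencies, each with three supported major versions, is invented for illustration.

```python
# Rough count of the configurations a library author might be asked to
# support. The dependency axis (3 deps x 3 versions each) is hypothetical.
from itertools import product

operating_systems = ["Linux", "Mac", "Windows", "FreeBSD"]
ghc_versions = ["HEAD", "7.8", "7.6", "7.4"]
platform = ["Haskell Platform", "no Haskell Platform"]

# Hypothetical: three dependencies, each with three supported major versions.
dependency_versions = [["1.0", "1.1", "1.2"]] * 3

matrix = list(product(operating_systems, ghc_versions, platform, *dependency_versions))

if __name__ == "__main__":
    print(len(matrix))
```

Even this small example yields 4 * 4 * 2 * 3^3 = 864 combinations, before counting GHC 7.2 and earlier.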
- Hackage: Doesn't really do anything to address this problem. Theoretically this is the purview of the Package Versioning Policy.
- Haskell Platform: In theory, HP should solve this problem to a large extent. There should be a few well-defined sets of platforms that a user may be using, and library authors just need to test on those. In practice, it's not so rosy:
  - There are not enough packages in the platform currently to be a complete basis for work. As a result, authors need to depend on other packages retrieved from Hackage, which leads us back to the Hackage situation.
  - Empirically, it seems the majority of Haskell users are not limiting themselves to the package versions provided by the platform.
  - The fact that some packages in the platform are constrained to relatively old versions further encourages people to ignore platform guidelines.
- Curation: If everyone switched over to a single curated system, then the problem would be solved. But:
  - Empirically, there does not yet seem to be large buy-in on using a curated solution.
  - There are a large number of curated sets available, so library authors end up needing to support almost as many combinations as with Hackage.
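What the PVP does offer authors is a way to state which dependency versions should work. A sketch of PVP-style bounds, assuming the library imports its dependencies qualified or with explicit import lists (the condition under which the PVP allows an upper bound at the next major version): if the code was written against A.B.C, it declares >= A.B.C && < A.(B+1). The helper name pvp_range is hypothetical.

```python
# Generate PVP-style version ranges: >= tested && < next major version.
# Assumes qualified/explicit imports; pvp_range is a hypothetical helper.

def pvp_range(tested):
    """Return a cabal-style range string for a dependency tested at `tested`."""
    parts = [int(p) for p in tested.split(".")]
    parts += [0] * (2 - len(parts))  # treat "3" as "3.0"
    upper = f"{parts[0]}.{parts[1] + 1}"  # bump B to get the next major version
    return f">= {tested} && < {upper}"

if __name__ == "__main__":
    for name, version in [("foo", "1.0.1"), ("bar", "2.1")]:
        print(f"{name} {pvp_range(version)}")
```

Such ranges describe what should be compatible, but each one still admits many versions, so they do not shrink the set of combinations an author would need to test.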
The most obvious people in this group are people maintaining Haskell package sets for a Linux distribution or for Stackage. However, there are really far more people in this category. Just about every company doing professional Haskell development has a person or team whose job it is to maintain the toolchain that the developers will use. These maintainers run into all of the same problems as users, but usually at a larger scale: trying to synchronize requirements from many different users on different projects.
To clarify this further: if you're an individual working on a project, and you have one project using foo 1.0 and another using foo 1.1, you'll probably just use a sandbox and be done. However, at the distribution level, this usually isn't an acceptable solution. Instead, we want to be able to have a single set of packages used for all builds.
Furthermore, each team working on a distribution tends to come up with its own set of tooling, redoing work already performed by other groups. And quite a bit of the work we'd like to do, such as backporting bugfixes to old versions of libraries, is simply too large a task for any one of these groups to undertake.
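A sketch of the distribution-level problem, with hypothetical project names and versions: merge each project's pins into one package set, and report every package that different projects pin to different versions. Those conflicts are exactly what a per-project sandbox would hide and what a maintainer has to resolve by hand, for example by upgrading a project or backporting a fix.

```python
# Merge per-project pins into a single package set for all builds.
# Packages pinned to different versions by different projects are reported
# as conflicts. Project names and versions are hypothetical.

def merge_pins(projects):
    """Return (single_set, conflicts), where conflicts maps each contested
    package to {version: [projects pinning that version]}."""
    merged, conflicts = {}, {}
    for project, pins in projects.items():
        for name, version in pins.items():
            merged.setdefault(name, {}).setdefault(version, []).append(project)
    single_set = {}
    for name, by_version in merged.items():
        if len(by_version) == 1:
            single_set[name] = next(iter(by_version))  # everyone agrees
        else:
            conflicts[name] = by_version  # needs a human decision
    return single_set, conflicts

projects = {
    "billing": {"foo": "1.0", "bar": "2.1"},
    "web":     {"foo": "1.1", "bar": "2.2", "baz": "0.5"},
}
single_set, conflicts = merge_pins(projects)

if __name__ == "__main__":
    print(single_set)
    print(conflicts)
```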
I hope this blog post fleshes out the problems GPS Haskell is designed to solve, and kicks off a good community discussion about these issues. Some people likely won't care about some of the issues raised here, and others will think other problems are more important. What matters is that we reach consensus on a subset of problems that covers most people's needs most of the time and leads to a consistent solution.
Mark's presentation gives quite a bit of the details of the solution we're thinking about already; I hope to expand on that in the next blog post.