@snoyberg /part1.md Secret
Created Nov 30, 2014

Two part blog post on GPS Haskell/LTS Haskell/Stable Stackage

Those of you at the Haskell Implementors' Workshop this year got the pleasure of seeing Mark Lentczner give a great presentation on GPS (GHC/Platform/Stackage) Haskell. (No, this isn't about global positioning satellites, it's about creating Haskell package sets.) A lot of the ideas we came up with at ICFP required some adjustments to how Stackage works before we're ready for the next stage. Now that those changes are in place, I'd like to start sharing what Mark, Duncan, and I discussed at ICFP this year.

This is the first blog post on the topic, and the goal here is to establish what problem we're trying to solve. It's vital that we agree on that problem, so that:

  1. We can have an intelligent discussion about whether it's the right problem to solve, divorced from any talks about solutions.
  2. There's a clear standard by which we can measure whether a solution is sufficient.
  3. We don't mistakenly dismiss a good solution because it ignores a different problem.

I'm going to break this down into three groups: library users, library authors, and distribution maintainers. These three groups will certainly overlap quite a bit (I personally fall into all three), but my claim is that there are different problems present for each group. I'll also try to define what I mean by each of these terms as we move along.

I'm also going to compare how our current solutions stack up for the problems stated. In this sense, I'll discuss the Hackage/cabal-install combination, Haskell Platform, and curated package sets (e.g. Stackage or Linux distribution packages).

Library users

A library user is someone who is writing some code that takes advantage of an open source Haskell library. This might be in the context of writing an application, or when writing another library, though we'll be focusing mostly on people writing applications. The user may be writing this application/library for either personal or professional use, and the code may be open source or proprietary.

Finding a cohesive package set

The first thing the user will need to do is install all dependencies for the code he/she is working on. This is where the most commonly cited problem- "cabal hell"- rears its ugly head. As far as solutions go:

  • Hackage/cabal-install: The dependency solver tries its best to resolve this issue, but hard evidence is that there are still many occurrences of the problem.
  • Haskell Platform: If you're only using packages included in the platform, the problem is solved. And better yet: you don't even have to build any of the dependencies. But as soon as you break out of the platform dependencies, you're back to the Hackage/cabal-install world. In fact, in many cases, you'll now be worse off than with plain Hackage, because HP constrains some dependencies to versions older than what the newest versions of packages on Hackage support.
  • Curation: I'll claim that curated solutions fully solve this specific problem, with the caveat that it depends on the library coverage provided by the curated set.

Getting bugfix updates

Say our user has written some code which uses package foo, and used version 1.0. All is going along well with the application, and in the meantime, a new version of foo (1.1) is released, which pulls in newer versions of dependencies as well (suppose it requires bar 2.2, where our user has been using bar 2.1). These updates don't affect our user at all, because he/she is either using curation or some system like cabal freeze.

One day, someone discovers a bug has been in foo since version 1.0. The author quickly releases a patch for this bug on the 1.1 series (1.1.0.1). However, the author doesn't release anything in the 1.0 series.

  • Hackage: the user will likely try to upgrade to version 1.1.0.1, which may have breaking API changes. Worse, this will demand pulling in new versions of deeper dependencies, like bar, which also introduce API changes. This bug fix could end up requiring major changes to the user's codebase. And both those changes and the upgrades themselves may introduce new bugs of their own.
  • Haskell Platform/curation: both of these approaches theoretically offer a solution to this problem, in the form of maintainers releasing bugfix patches. In practice, that isn't really done.

Similar to that last point, a package author could elect to release a 1.0.0.1 bug fix release as well. However, practice demonstrates that this rarely happens.

New features

Suppose our user wants to add some code to his/her codebase, and wants to use a new feature introduced in version 1.0.1 of the foo package.

  • Hackage: this is trivial generally, just increase the lower bound in your cabal file (and hope it all still compiles).
  • Haskell Platform: as long as the package isn't in the platform, no problem. For packages in the platform, this is very difficult.
  • Curation: usually this means upgrading to a whole new package set, which may introduce other complications. (This isn't universally true; Stackage's cabal.config approach makes it easy to upgrade just a single package.)

Library authors

Let's focus on someone who writes libraries to be used by a large number of users, most likely as an open source library. Today, library authors have most of the same problems as library users, plus a new one: having to support a large number of user configurations. For example:

  • Operating system (Linux, Mac, Windows, FreeBSD)
  • GHC versions (HEAD, 7.8, 7.6, 7.4 and earlier)
  • Haskell Platform or not
  • Theoretically any possible combination of dependency versions

In many cases this plethora of choice isn't a problem. Much library code can be completely unaware of the OS. But the GHC version (and especially the libraries shipped with GHC) can have a huge impact. And the versions of dependent libraries lead to a large maintenance burden.

  • Hackage: Doesn't really do anything to address this problem. Theoretically this is the purview of the Package Versioning Policy.
  • Haskell Platform: In theory, HP should solve this problem to a large extent. There should be a few well-defined sets of platforms that a user may be using, and library authors just need to test on those. In practice, it's not so rosy:
    • There are not enough packages in the platform currently to be a complete basis for work. As a result, authors need to depend on other packages retrieved from Hackage, which leads us back to that situation.
    • Empirically, it seems the majority of Haskell users are not limiting themselves to versions of packages provided by the platform.
    • The fact that some packages in the platform are constrained to relatively old versions further encourages people to ignore platform guidelines.
  • Curation: If everyone switched over to a single curated system, then the problem would be solved. But:
    • Empirically, there does not seem to be large buy-in yet on using a curated solution.
    • There are a large number of curated selections available, so library authors end up needing to support almost as many combinations as with Hackage.

Distribution maintainers

The most obvious people in this group are people maintaining Haskell package sets for a Linux distribution or for Stackage. However, there are really far more people in this category. Just about every company doing professional Haskell development has a person or team whose job it is to maintain the toolchain that the developers will use. These maintainers run into all of the same problems as users, but usually at a larger scale: trying to synchronize requirements from many different users on different projects.

To clarify this further: if you're an individual working on a project, and you have one project using foo 1.0 and another using foo 1.1, you'll probably just use a sandbox and be done. However, at the distribution level, this usually isn't an acceptable solution. Instead, we want to be able to have a single set of packages used for all builds.

Furthermore, each team working on a distribution tends to come up with its own set of tooling, redoing work already being performed by another group. And quite a bit of work that we'd like to do- such as backporting bugfixes to old versions of libraries- is simply too large a task for any of these individual groups to undertake.

Next time

I hope this blog post fleshes out the problems GPS Haskell is designed to solve, and kicks off a good community discussion about these issues. It's likely that some people don't care about some of the issues raised here, and others think other problems are more important. What's important is that we come to a consensus on a subset of problems that covers most people's needs most of the time, and that leads to a consistent solution.

Mark's presentation gives quite a bit of the details of the solution we're thinking about already; I hope to expand on that in the next blog post.

Last time, I described a number of problems library users, library authors, and distribution maintainers face when dealing with the Haskell library ecosystem. Today, I'd like to propose a solution to this, which is basically an expansion of Mark's GPS Haskell presentation. I'm going to start with the lower-level mechanisms, and then explain how individuals would interact with the system.

As I explained last time, a number of the problems we're trying to solve are already solved by having some form of curation, whether it be Haskell Platform or Stackage/Linux distributions. Each has an advantage over the other: Haskell Platform gives a single target platform for library users and authors to converge upon. Stackage and distros give a much wider collection of libraries.

There are still two problems that plague both groups: bug fixes and stability. Haskell Platform has not yet made a bug fix release for any of its releases (and not due to lack of bugs in the platform libraries). In the Stackage/distro world, as mentioned previously, your choices are either to stick with an old set of packages with bugs, or upgrade to a new set of packages that may completely break APIs, losing stability. We can't yet have our cake and eat it too.

The first part of this proposal is that we begin minting stable package snapshots. This will be similar in notion to what Haskell Platform provides: a single, universally available set of packages that all distributions can use. However, it will augment the current Haskell Platform process in a few ways:

  • There will be no subjective acceptance criteria to getting a package into this set. Haskell Platform requires certain quality bars to be met. These bars should remain, but only in an advisory role to indicate which packages are recommended. Any package that can build with the rest of the set can be included. (This is basically the Stackage acceptance criteria.)
  • There will be aggressive point releases to allow inclusion of bug fixes.
  • The process will be as automated as possible, avoiding the need for library authors to be involved directly in most cases.

To flesh it out further: imagine we take a Stackage snapshot on January 1, 2015. Using the standard Stackage rules, it will get the newest versions of all packages, unless an explicit upper bound disallows them. We'll take that package set, and now call it a stable release. The nomenclature we've been using is "GPS Haskell 1.0". Let's suppose this includes:

  • foo-2.4.1
  • bar-3.2.2
  • baz-5.1.9

A few days later, the following are released to Hackage:

  • foo-2.4.1.1 is released with a bug fix
  • bar-3.2.3 is released with a new feature, which doesn't break backwards compatibility
  • baz-5.2.0 is released, which is a breaking change

Under the currently available systems, Haskell Platform users will need to wait six months to get any of these changes. Stackage users would be able to upgrade almost immediately, but they'd need to take the baz-5.2.0 breaking change together with the bug fix and non-breaking change. Neither situation is ideal.

Instead, GPS Haskell releases will be supported going forward. We'll have an automated system that detects that foo and bar have been released with backwards-compatible APIs, and therefore bump their versions. baz, on the other hand, had a breaking change, and therefore its new version will not be included.

(You may ask: how do we know about the breaking change? For the short term, I'd say let's just depend on authors to put in sensible version numbers. Long term, we can have tooling that detects PVP version number compliance.)
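To make the rule concrete, here's a minimal sketch of how such a check could look. It is not the actual tooling: versions are modeled as plain lists of integers, and the names `classify` and `nextPin` are made up for illustration. It assumes authors follow the PVP, so that the first two components form the major version and the third signals backwards-compatible additions.

```haskell
-- A version such as 2.4.1.1 is modeled as [2,4,1,1] (a simplification).
type Version = [Int]

data Change = BugFix | NewFeature | Breaking
  deriving (Eq, Show)

-- | Classify a move from one version to another, trusting PVP numbering:
-- a change in the first two components is breaking, a change in the third
-- adds features, and anything deeper is treated as a bug fix.
classify :: Version -> Version -> Change
classify old new
  | major old /= major new = Breaking
  | minor old /= minor new = NewFeature
  | otherwise              = BugFix
  where
    pad v   = v ++ repeat 0          -- missing components count as 0
    major v = take 2 (pad v)
    minor v = take 1 (drop 2 (pad v))

-- | Pick the version for the next point release: the newest available
-- version that is newer than the pinned one and not a breaking change.
-- If nothing qualifies, the pinned version is kept.
nextPin :: Version -> [Version] -> Version
nextPin pinned available =
  maximum (pinned : [v | v <- available, v > pinned, classify pinned v /= Breaking])
```

With the example above, `nextPin [2,4,1] [[2,4,1,1]]` picks up the foo bug fix and `nextPin [3,2,2] [[3,2,3]]` picks up the new bar feature, while `nextPin [5,1,9] [[5,2,0]]` stays at baz-5.1.9.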

After determining these updated version numbers, we'll go through a normal build/test process on these new packages. If there's a successful build, then this new set of packages will be minted as GPS Haskell 1.1. This process will continue for the entire support window for GPS Haskell 1 (see below for those details).

Backporting bug fixes

This system is nice because, for the most part, library authors don't need to do anything new to support it. Simply releasing your code to Hackage is all that's necessary to get bug fixes/feature additions to your users. There is one tweak, however. Going back to that baz example above, suppose that a bug is discovered in baz that affects baz 5.1 and 5.2. In today's world, a library author would likely release a new version in the 5.2 series (e.g., 5.2.0.1) and tell users to upgrade. However, GPS Haskell users wouldn't get that bugfix.

This isn't a problem created by GPS Haskell, it's a problem exposed by GPS Haskell. Many people working on a codebase want to be able to get bug fixes without needing to rewrite their code at all. GPS Haskell is now encouraging this behavior. The goal would be that, in a situation like this, the library author would release 5.2.0.1 with the bugfix, and backport the bugfix to 5.1.9.1 so that all GPS Haskell users get it immediately, without code change on their parts.

GPS Haskell is offering a strong advantage here for library authors. Instead of needing to guess which version of their packages they need to continue to support, there will be a single (or at least small number, see time periods below) stable version to be supported, plus whatever the newest version is. This provides a clear contract between authors and users.
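As a rough sketch of that contract, the releases an author would cut for a bug fix are the version pinned in GPS Haskell and the newest version, each with its patch component bumped. The names `bumpPatch` and `fixReleases` are hypothetical, versions are again modeled as lists of integers, and a single supported GPS Haskell release is assumed.

```haskell
import Data.List (nub)

-- A version such as 5.1.9 is modeled as [5,1,9] (a simplification).
type Version = [Int]

-- | Bump the fourth (patch) component, padding with zeros if needed.
-- Any components beyond the fourth are dropped.
bumpPatch :: Version -> Version
bumpPatch v = take 3 padded ++ [padded !! 3 + 1]
  where
    padded = v ++ repeat 0

-- | Versions to release for a bug fix: one on top of the version pinned
-- in the stable set, one on top of the newest version on Hackage.
-- If the two coincide, only one release is needed.
fixReleases :: Version -> Version -> [Version]
fixReleases pinned latest = map bumpPatch (nub [pinned, latest])
```

For the baz example, `fixReleases [5,1,9] [5,2,0]` gives `[[5,1,9,1],[5,2,0,1]]`, matching the 5.1.9.1 backport and the 5.2.0.1 release described above.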

How does this stack up?

Let's see how this addresses our concerns from the previous blog post.

  1. Cohesive package set: since we'll be using the same curation technique as Stackage/distros use already, this problem is still fully addressed.
  2. Getting bugfix updates: now solved.
  3. New features: new features that come with a breaking API change will not be available. But all other new features will become immediately available. I'd argue this is the best possible balance we can make between stability and features. Some people will still need to take package versions outside of the package set sometimes, and we should fully support that, granting that it will lessen the stability guarantees.
  4. Library authors: a drastically reduced set of library versions to maintain.
  5. Distribution maintainers: regularly released package sets will make it possible for distributions to provide both stable and recent package sets to their users. Distributions wanting to remain on the bleeding edge instead will of course be able to do so, GPS Haskell just gives them an extra option.

How will I use this?

The next few sections are certainly the most fluid, so take them as strawmen, not solid proposals.

Today, we (mostly) recommend that people download the Haskell Platform. This constrains platform packages to specific versions, by including them in the global package database. Instead, Haskell Platform should provide a basic GHC plus cabal-install installation, with an optional binary package set component. One exception would be that, on Windows, it should continue to provide binary versions of packages which are very difficult to install (e.g., network).

cabal-install will have some interactive features for selecting the package set to use. It will default to the latest GPS Haskell set, though you can choose an older set, use Hackage directly, choose an unstable Stackage set, etc. When using GPS Haskell, it will detect when a new point release is available, and provide you with the option to upgrade to it.

Teams working together should always ensure they are using the exact same point release to guarantee identical APIs. Upgrading point releases should be a painless process, but the team should make sure they upgrade in unison.

Library authors won't need to change much of their behavior beyond the bugfixes mentioned above. One other recommended practice that will help the system run smoothly is maximizing the version range of dependencies that your code works with. This will ensure that it's easy to upgrade your package to a new point release without pulling in a new major release of one of its dependencies.

Time periods

The numbers below are purely strawmen. We need to find numbers that work for everyone. I've chosen values on the short side to be closer to the way the Hackage ecosystem works today, but I hope that over time we're able to extend the values a bit with library author assistance.

  • How often are point releases made? We'll try to make a new build every week. Build failures will possibly push out point release dates a bit. If there's a major vulnerability detected, we can always push through a point release faster.
  • How often are major releases made? We should target every 3-4 months. The idea here is that people today are used to getting new major versions fairly frequently, so we want to fill that desire.
  • How long is the support window on a release? The main goal is to have this be a bit longer than the major release window, so that users have a grace period for transitioning. However, the longer the support window, the more burden is placed on library authors to backport bugfixes. Therefore, I'd like to propose:
    • The official support window runs until one month after the release of the next major version.
    • In cases of major vulnerabilities, we can always try to push out an extra point release after the support window closes.
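To see how these strawman numbers fit together, here's a small sketch using the `time` package. The three-month cadence is the short end of the range above, the function names are invented for this example, and I'm reading the support window as running until one month after the next major release.

```haskell
import Data.Time.Calendar (Day, addDays, addGregorianMonthsClip, fromGregorian)

-- | Date of the following major release, assuming a three-month cadence.
nextMajor :: Day -> Day
nextMajor = addGregorianMonthsClip 3

-- | End of the support window: one month past the next major release.
supportEnds :: Day -> Day
supportEnds = addGregorianMonthsClip 1 . nextMajor

-- | Weekly point-release dates that fall inside the support window,
-- assuming no build failures push any of them out.
pointReleases :: Day -> [Day]
pointReleases release =
  takeWhile (<= supportEnds release) (drop 1 (iterate (addDays 7) release))

-- | The hypothetical GPS Haskell 1.0 release date used earlier.
gps1 :: Day
gps1 = fromGregorian 2015 1 1
```

For a release on January 1, 2015, `supportEnds gps1` is May 1, 2015, which leaves room for up to 17 weekly point releases if no builds slip.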

Next steps

That's it, the cards are on the table. The major next step is to discuss this as a community. There are a number of tooling issues to be addressed following decisions on the points above. Duncan very much wants the functionality for specifying package sets to be part of Hackage itself, whereas I don't care much where that functionality is provided. There are lots of nice things that could be done to cabal-install to make GPS Haskell usage very easy, but even without those changes, there are more manual processes we can use to start trying this out. And getting the Haskell Platform installer modified to support all this would be nice, but is not essential as a first step.

So I'll propose- as I'm wont to do- an experimental method for testing out these ideas:

  • Create a new repository for GPS Haskell that has the package list for each release made.
  • Set up the necessary tooling to automatically generate a set of packages to be considered for a point release.
  • Automate the testing of these package sets, and generate new Stackage bundles for each GPS Haskell snapshot it creates.
  • Publish these Stackage snapshot URLs in a single location (maybe with an RSS feed?) so that people can start experimenting with GPS Haskell immediately.

I want to make this clear: I consider the tooling aspect of this change the unimportant one. Figuring out the right set of policies and community standards is far more interesting and complicated. I want to avoid, as much as possible, letting incidental tooling complexity get in the way of what I believe can greatly increase Haskell's appeal to people looking to use it in production and commercial settings.
