snoyberg/part1.md Secret

## part1.md

      
Those of you at the Haskell Implementors' Workshop this year got the pleasure
of seeing Mark Lentczner give a great presentation on GPS (GHC/Platform/Stackage)
Haskell. (No, this isn't about global positioning satellites, it's about creating
Haskell package sets.) A lot of the ideas we came up with at ICFP required some
adjustments to how Stackage works before we're ready for the next stage. Now
that those changes are in place, I'd like to start sharing what Mark, Duncan,
and I discussed at ICFP this year.

This is the first blog post on the topic, and the goal here is to establish what problem we're trying to solve. It's vital that we come to a shared understanding of that problem, so that:

- We can have an intelligent discussion about whether it's the right problem to solve, divorced from any talks about solutions.
- There's a clear standard by which we can measure whether a solution is sufficient.
- We don't mistakenly dismiss a good solution because it ignores a different problem.

I'm going to break this down into three groups: library users, library authors,
and distribution maintainers. These three groups will certainly overlap quite a
bit (I personally fall into all three), but my claim is that there are
different problems present for each group. I'll also try to define what I mean
by each of these terms as we move along.

I'm also going to compare how our current solutions stack up for the problems
stated. In this sense, I'll discuss the Hackage/cabal-install combination,
Haskell Platform, and curated package sets (e.g. Stackage or Linux distribution
packages).
### Library users

A library user is someone who is writing some code that takes advantage of an
open source Haskell library. This might be in the context of writing an
application, or when writing another library, though we'll be focusing mostly
on people writing applications. The user may be writing this
application/library for either personal or professional use, and the code may
be open source or proprietary.
#### Finding a cohesive package set

The first thing the user will need to do is install all dependencies for the
code he/she is working on. This is where the most commonly cited problem,
"cabal hell", rears its ugly head. As far as solutions go:

- Hackage/cabal-install: The dependency solver tries its best to resolve this issue, but the evidence is that there are still many occurrences of the problem.
- Haskell Platform: If you're only using packages included in the platform, the problem is solved. And better yet: you don't even have to build any of the dependencies. But as soon as you break out of the platform dependencies, you're back to the Hackage/cabal-install world. In fact, in many cases, you'll now be worse off than with plain Hackage, because HP constrains some dependencies to versions older than what the newest versions of packages on Hackage support.
- Curation: I'll claim that curated solutions fully solve this specific problem, with the caveat that it depends on the library coverage provided by the curated set.

#### Getting bugfix updates

Say our user has written some code which uses package foo, and used version
1.0. All is going along well with the application, and in the meanwhile, a new
version of foo (1.1) is released, which pulls in newer versions of dependencies
as well (suppose it requires bar 2.2, where our user has been using bar 2.1).
These updates don't affect our user at all, because he/she is either using
curation or some system like cabal freeze.

One day, someone discovers a bug that has been in foo since version 1.0. The author quickly releases a patch for this bug on the 1.1 series (1.1.0.1). However, the author doesn't release anything in the 1.0 series.

- Hackage: the user will likely try to upgrade to version 1.1.0.1, which may have breaking API changes. Worse, this will demand pulling in new versions of deeper dependencies, like bar, which also introduce API changes. This bug fix could end up requiring major changes to the user's codebase. And both those changes and the upgrades themselves may introduce new bugs of their own.
- Haskell Platform/curation: both of these approaches theoretically offer a solution to this problem, in the form of maintainers releasing bugfix patches. In practice, that isn't really done.

Similar to that last point, a package author could elect to release a 1.0.0.1 bug fix release as well. However, practice demonstrates that this rarely happens.
#### New features

Suppose our user wants to add some code to his/her codebase, and wants to use a new feature introduced in version 1.0.1 of the foo package.

- Hackage: this is generally trivial: just increase the lower bound in your cabal file (and hope it all still compiles).
- Haskell Platform: as long as the package isn't in the platform, no problem. For packages in the platform, this is very difficult.
- Curation: usually this means upgrading to a whole new package set, which may introduce other complications. (This isn't universally true; Stackage's cabal.config approach makes it easy to upgrade just a single package.)

### Library authors

Let's focus on someone who writes libraries to be used by a large number of
users, most likely as an open source library. Today, library authors have most
of the same problems as library users, plus a new one: having to support a
large number of user configurations. For example:

- Operating system (Linux, Mac, Windows, FreeBSD)
- GHC versions (HEAD, 7.8, 7.6, 7.4 and earlier)
- Haskell Platform or not
- Theoretically any possible combination of dependency versions

In many cases this plethora of choice isn't a problem. Much library code can be completely unaware of the OS. But GHC version (and especially libraries shipped with GHC) can have a huge impact. And the versions of dependent libraries lead to a large maintenance burden.

- Hackage: Doesn't really do anything to address this problem. Theoretically this is the purview of the Package Versioning Policy.
- Haskell Platform: In theory, HP should solve this problem to a large extent. There should be a few well-defined sets of platforms that a user may be using, and library authors just need to test on those. In practice, it's not so rosy:
    - There are not enough packages in the platform currently to be a complete basis for work. As a result, authors need to depend on other packages retrieved from Hackage, which leads us back to the Hackage situation.
    - Empirically, it seems the majority of Haskell users are not limiting themselves to versions of packages provided by the platform.
    - The fact that some packages in the platform are constrained to relatively old versions further encourages people to ignore platform guidelines.
- Curation: If everyone switched over to a single curated system, then the problem would be solved. But:
    - Empirically, there does not seem to be large buy-in yet on using a curated solution.
    - There are a large number of curated selections available, so library authors end up needing to support almost as many combinations as with Hackage.


### Distribution maintainers

The most obvious people in this group are people maintaining Haskell package
sets for a Linux distribution or for Stackage. However, there are really far
more people in this category. Just about every company doing professional
Haskell development has a person or team whose job it is to maintain the
toolchain that the developers will use. These maintainers run into all of the
same problems as users, but usually at a larger scale: trying to synchronize
requirements from many different users on different projects.

To clarify this further: if you're an individual working on a project, and you
have one project using foo 1.0 and another using foo 1.1, you'll probably just
use a sandbox and be done. However, at the distribution level, this usually
isn't an acceptable solution. Instead, we want to be able to have a single set
of packages used for all builds.
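To make the "single set of packages" requirement concrete, here is a minimal sketch in Haskell. Everything in it is an assumption for illustration: the project names, the pinned versions, and the `conflicts` helper are made up, and versions are modeled as plain lists of integers rather than with any real tooling.

```haskell
import Data.List (nub, sort)

-- The (package, version) pins one project builds against.
-- Versions are modeled as lists of Ints, e.g. [1,0] for 1.0 (an assumption).
type Pins = [(String, [Int])]

-- Every package that is pinned at more than one version across the given
-- projects, together with the distinct versions requested.
conflicts :: [(String, Pins)] -> [(String, [[Int]])]
conflicts projects =
  [ (pkg, versions)
  | pkg <- nub (sort [p | (_, pins) <- projects, (p, _) <- pins])
  , let versions = nub (sort [v | (_, pins) <- projects, (p, v) <- pins, p == pkg])
  , length versions > 1
  ]

-- Hypothetical data mirroring the foo 1.0 / foo 1.1 example in the text.
exampleProjects :: [(String, Pins)]
exampleProjects =
  [ ("project-a", [("foo", [1,0]), ("bar", [2,1])])
  , ("project-b", [("foo", [1,1]), ("bar", [2,1])])
  ]
```

Here `conflicts exampleProjects` evaluates to `[("foo",[[1,0],[1,1]])]`: a sandbox per project hides this disagreement, while a distribution has to resolve it before it can ship one package set.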

Furthermore, each team working on a distribution tends to come up with its own
set of tooling, redoing work already being performed by another group. And quite
a bit of work that we'd like to do, such as backporting bugfixes to old
versions of libraries, is simply too large a task for any of these individual
groups to undertake.
### Next time

I hope this blog post fleshes out the problems GPS Haskell is designed to
solve, and kicks off a good community discussion about these issues. It's
likely that some people don't care about some of the issues raised here, and
others think other problems are more important. What's important is that we
come to a consensus on a subset of problems that covers most people's needs
most of the time, and that leads to a consistent solution.

Mark's presentation gives quite a bit of the details of the solution we're
thinking about already; I hope to expand on that in the next blog post.

  
## part2.md

      
Last time, I described a number of problems
library users, library authors, and distribution maintainers face when dealing
with the Haskell library ecosystem. Today, I'd like to propose a solution to
this, which is basically an expansion of Mark's GPS Haskell presentation. I'm
going to start with the lower-level mechanisms, and then explain how
individuals would interact with the system.

As I explained last time, a number of the problems we're trying to solve are
already solved by having some form of curation, whether it be Haskell Platform
or Stackage/Linux distributions. Each has an advantage over the other: Haskell
Platform gives a single target platform for library users and authors to
converge upon. Stackage and distros give a much wider collection of libraries.

There are still two problems that plague both groups: bug fixes and stability.
Haskell Platform has not yet made a bug fix release for any of its releases
(and not due to lack of bugs in the platform libraries).  In the
Stackage/distro world, as mentioned previously, your choices are either to
stick with an old set of packages with bugs, or upgrade to a new set of
packages that may completely break APIs, losing stability. We can't yet have
our cake and eat it too.

The first part of this proposal is that we begin minting stable package
snapshots. This will be similar in notion to what Haskell Platform provides: a
single, universally available set of packages that all distributions can use.
However, it will augment the current Haskell Platform process in a few ways:

- There will be no subjective acceptance criteria to getting a package into this set. Haskell Platform requires certain quality bars to be met. These bars should remain, but only in an advisory role to indicate which packages are recommended. Any package that can build with the rest of the set can be included. (This is basically the Stackage acceptance criteria.)
- There will be aggressive point releases to allow inclusion of bug fixes.
- The process will be as automated as possible, avoiding the need for library authors to be involved directly in most cases.

To flesh it out further: imagine we take a Stackage snapshot on January 1,
2015. Using the standard Stackage rules, it will get the newest versions of all
packages, unless an explicit upper bound disallows them. We'll take that
package set, and now call it a stable release. The nomenclature we've been
using is "GPS Haskell 1.0". Let's suppose this includes:

- foo-2.4.1
- bar-3.2.2
- baz-5.1.9

A few days later, the following are released to Hackage:

- foo-2.4.1.1 is released with a bug fix
- bar-3.2.3 is released with a new feature, which doesn't break backwards compatibility
- baz-5.2.0 is released, which is a breaking change

Under the currently available systems, Haskell Platform users will need to wait
six months to get any of these changes. Stackage users would be able to upgrade
almost immediately, but they'd need to take the baz-5.2.0 breaking change
together with the bug fix and non-breaking change. Neither situation is ideal.

Instead, GPS Haskell releases will be supported going forward. We'll have an
automated system that detects that foo and bar have been released with
backwards-compatible APIs, and therefore bump their versions. baz, on the other
hand, had a breaking change, and therefore its new version will not be
included.

(You may ask: how do we know about the breaking change? For the short term, I'd
say let's just depend on authors to put in sensible version numbers. Long term,
we can have tooling that detects PVP version number compliance.)
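As a rough illustration of "depend on authors to put in sensible version numbers", the check can be done on version numbers alone. The sketch below assumes the PVP convention that the first two components (A.B) form the major version; the `Change` type and the `classify` and `majorOf` names are invented for this example and are not part of any existing tool.

```haskell
import Data.Version (Version, makeVersion, versionBranch)

-- How a new release relates to the version already in the snapshot,
-- judged purely from the version numbers.
data Change = Compatible | Breaking | NotNewer
  deriving (Eq, Show)

-- The PVP major version: the first two components, padded with zeros.
majorOf :: Version -> [Int]
majorOf v = take 2 (versionBranch v ++ [0, 0])

-- Compare the snapshot's version (old) with a newly released one (new).
classify :: Version -> Version -> Change
classify old new
  | new <= old                 = NotNewer
  | majorOf new == majorOf old = Compatible
  | otherwise                  = Breaking

-- The three releases from the example above.
examples :: [(String, Change)]
examples =
  [ ("foo", classify (makeVersion [2,4,1]) (makeVersion [2,4,1,1]))
  , ("bar", classify (makeVersion [3,2,2]) (makeVersion [3,2,3]))
  , ("baz", classify (makeVersion [5,1,9]) (makeVersion [5,2,0]))
  ]
```

With these inputs, `examples` comes out as `[("foo",Compatible),("bar",Compatible),("baz",Breaking)]`. Note that this trusts the author's numbering completely; it does nothing to detect a breaking change shipped under a minor bump, which is what the longer-term tooling would be for.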

After determining these updated version numbers, we'll go through a normal
build/test process on these new packages. If there's a successful build, then
this new set of packages will be minted as GPS Haskell 1.1. This process will
continue for the entire support window for GPS Haskell 1 (see below for those
details).
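Putting the pieces together, minting a point release could look roughly like the sketch below. This is only a model under stated assumptions: versions are lists of integers, the snapshot is an association list, and the build/test step is a caller-supplied predicate (here named `buildsAndPasses`) standing in for the real Stackage build. None of these names come from actual tooling.

```haskell
type Version  = [Int]
type Snapshot = [(String, Version)]

-- Same PVP major version (first two components, padded with zeros).
sameMajor :: Version -> Version -> Bool
sameMajor a b = take 2 (a ++ [0, 0]) == take 2 (b ++ [0, 0])

-- Move one package to the newest released version that is newer than the
-- current one and stays within the same major version.
bump :: [(String, Version)] -> (String, Version) -> (String, Version)
bump released (pkg, cur) =
  (pkg, maximum (cur : [v | (p, v) <- released, p == pkg, v > cur, sameMajor cur v]))

-- Build a candidate snapshot; accept it only if something changed and the
-- (placeholder) build/test check succeeds.
mintPointRelease :: (Snapshot -> Bool) -> [(String, Version)] -> Snapshot -> Maybe Snapshot
mintPointRelease buildsAndPasses released snapshot
  | candidate == snapshot     = Nothing
  | buildsAndPasses candidate = Just candidate
  | otherwise                 = Nothing
  where
    candidate = map (bump released) snapshot

-- "GPS Haskell 1.0" and the later Hackage releases from the example above.
gps10 :: Snapshot
gps10 = [("foo", [2,4,1]), ("bar", [3,2,2]), ("baz", [5,1,9])]

hackage :: [(String, Version)]
hackage = [("foo", [2,4,1,1]), ("bar", [3,2,3]), ("baz", [5,2,0])]
```

With a build check that always succeeds, `mintPointRelease (const True) hackage gps10` gives `Just [("foo",[2,4,1,1]),("bar",[3,2,3]),("baz",[5,1,9])]`, which is the package set that would be minted as GPS Haskell 1.1. A failing build yields `Nothing`; a real system would presumably retry with a smaller set of updates rather than give up.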
### Backporting bug fixes

This system is nice because, for the most part, library authors don't need to
do anything new to support it. Simply releasing your code to Hackage is all
that's necessary to get bug fixes/feature additions to your users. There is one
tweak, however. Going back to that baz example above, suppose that a bug is
discovered in baz that affects baz 5.1 and 5.2. In today's world, a library
author would likely release a new version in the 5.2 series (e.g., 5.2.0.1) and
tell users to upgrade. However, GPS Haskell users wouldn't get that bugfix.

This isn't a problem created by GPS Haskell; it's a problem exposed by GPS
Haskell. Many people working on a codebase want to be able to get bug fixes
without needing to rewrite their code at all. GPS Haskell is now encouraging
this behavior. The goal would be that, in a situation like this, the library
author would release 5.2.0.1 with the bugfix, and backport the bugfix to
5.1.9.1 so that all GPS Haskell users get it immediately, without code change
on their parts.
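The version arithmetic behind that backport is small enough to sketch. The code below assumes that a bugfix-only release bumps (or adds) the fourth version component, as in the 5.2.0.1 and 5.1.9.1 examples, and that every listed version is affected by the bug. The `nextPatch` and `fixReleases` names are hypothetical.

```haskell
import Data.List (nub)

type Version = [Int]

-- Next bugfix-only version: pad to A.B.C, then add or bump a fourth component.
nextPatch :: Version -> Version
nextPatch v = case splitAt 3 (v ++ replicate (3 - length v) 0) of
  (abc, [])    -> abc ++ [1]
  (abc, d : _) -> abc ++ [d + 1]

-- Versions that need a fix released: the newest release, plus every version
-- pinned by a still-supported stable snapshot (all assumed to be affected).
fixReleases :: Version -> [Version] -> [Version]
fixReleases newest supported = map nextPatch (nub (newest : supported))
```

For the baz example, `fixReleases [5,2,0] [[5,1,9]]` yields `[[5,2,0,1],[5,1,9,1]]`: one release for users on the newest series and one backport for GPS Haskell users.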

GPS Haskell is offering a strong advantage here for library authors. Instead of
needing to guess which version of their packages they need to continue to
support, there will be a single (or at least small number, see time periods
below) stable version to be supported, plus whatever the newest version is.
This provides a clear contract between authors and users.
### How does this stack up?

Let's see how this addresses our concerns from the previous blog post.

- Cohesive package set: since we'll be using the same curation technique as Stackage/distros use already, this problem is still fully addressed.
- Getting bugfix updates: now solved.
- New features: new features that come with a breaking API change will not be available. But all other new features will become immediately available. I'd argue this is the best possible balance we can make between stability and features. Some people will still need to take package versions outside of the package set sometimes, and we should fully support that, granting that it will lessen the stability guarantees.
- Library authors: a drastically reduced set of library versions to maintain.
- Distribution maintainers: regularly released package sets will make it possible for distributions to provide both stable and recent package sets to their users. Distributions wanting to remain on the bleeding edge instead will of course be able to do so; GPS Haskell just gives them an extra option.

### How will I use this?

The next few sections are certainly the most fluid, so take them as strawmen,
not solid proposals.

Today, we (mostly) recommend that people download the Haskell Platform. This
constrains versions of platform packages to specific versions, by including
them in the global package database. Instead, Haskell Platform should provide a
basic GHC plus cabal-install installation, with an optional binary package set
component. One exception would be that, on Windows, it should continue to
provide binary versions of packages which are very difficult to install (e.g.,
network).

cabal-install will have some interactive features for selecting the package set
to use. It will default to the latest GPS Haskell set, though you can choose an
older set, use Hackage directly, choose an unstable Stackage set, etc. When
using GPS Haskell, it will detect when a new point release is available, and
provide you with the option to upgrade to it.
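The "detect when a new point release is available" step could be as simple as the following sketch. It assumes releases are identified by a (major, point) pair, so that "GPS Haskell 1.1" is `(1, 1)`; the `availableUpgrade` function and the list of published releases are made up for illustration, and this is not an existing cabal-install feature.

```haskell
-- A GPS Haskell release identified as (major, point), e.g. (1, 1) for "1.1".
type Release = (Int, Int)

-- The newest published point release in the same major series that is newer
-- than the one currently in use, if there is one.
availableUpgrade :: Release -> [Release] -> Maybe Release
availableUpgrade (major, point) published =
  case [r | r@(m, p) <- published, m == major, p > point] of
    [] -> Nothing
    rs -> Just (maximum rs)

-- Hypothetical list of published releases.
publishedReleases :: [Release]
publishedReleases = [(1, 0), (1, 1), (1, 2), (1, 3), (2, 0)]
```

A user on 1.1 would be offered 1.3 (`availableUpgrade (1, 1) publishedReleases` is `Just (1,3)`), while 2.0 is never suggested automatically because moving to a new major release may break APIs.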

Teams working together should always ensure they are using the exact same point
release to guarantee identical APIs. Upgrading point releases should be a
painless process, but the team should make sure they upgrade in unison.

Library authors won't need to change much of their behavior beyond the bugfixes
mentioned above. One other recommended practice that will help the system run
smoothly is maximizing the version range of dependencies that your code works
with. This will ensure that it's easy to upgrade your package to a new point
release without pulling in a new major release of a dependency.
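To see why wide dependency ranges matter, consider this small model. It assumes a dependency bound has an inclusive lower and exclusive upper limit (like `>= 5.1 && < 5.3` in a cabal file); the `Range` type, the `unsatisfied` helper, and the two sets of bounds are hypothetical.

```haskell
type Version = [Int]

-- A dependency bound: lower limit inclusive, upper limit exclusive.
data Range = Range { lower :: Version, upper :: Version }

inRange :: Range -> Version -> Bool
inRange (Range lo hi) v = v >= lo && v < hi

-- Dependencies of a new release that the snapshot cannot satisfy, either
-- because the pinned version is out of range or the package is missing.
unsatisfied :: [(String, Version)] -> [(String, Range)] -> [String]
unsatisfied snapshot deps =
  [ dep
  | (dep, range) <- deps
  , maybe True (not . inRange range) (lookup dep snapshot)
  ]

-- Part of the stable snapshot from the earlier example.
snapshot :: [(String, Version)]
snapshot = [("bar", [3,2,2]), ("baz", [5,1,9])]

-- Two hypothetical sets of bounds for a new bugfix release of foo.
narrowDeps, wideDeps :: [(String, Range)]
narrowDeps = [("baz", Range [5,2] [5,3])]
wideDeps   = [("baz", Range [5,1] [5,3])]
```

With the narrow bounds, `unsatisfied snapshot narrowDeps` is `["baz"]`, so the new foo release could not enter a point release without dragging in baz 5.2. With the wide bounds the result is `[]` and the update goes through.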
### Time periods

The numbers below are purely strawmen. We need to find numbers that work for
everyone. I've chosen values on the short side to be closer to the way the
Hackage ecosystem works today, but I hope that over time we're able to extend
the values a bit with library author assistance.

- How often are point releases made? We'll try to make a new build every week. Build failures will possibly push out point release dates a bit. If there's a major vulnerability detected, we can always push through a point release faster.
- How often are major releases made? We should target every 3-4 months. The idea here is that people today are used to getting new major versions fairly frequently, so we want to satisfy that desire.
- How long is the support window on a release? The main goal is to have this be a bit longer than the major release window, so that users have a grace period for transitioning. However, the longer the support window, the more burden is placed on library authors to backport bugfixes. Therefore, I'd like to propose:
    - The official support window runs until one month beyond the release of the next major version.
    - In cases of major vulnerabilities, we can always try to push out an extra point release after the support window closes.


### Next steps

That's it, the cards are on the table. The major next step is to discuss this
as a community.  There are a number of tooling issues to be addressed following
decisions on the points above. Duncan very much wants the functionality for
specifying package sets to be part of Hackage itself, whereas I don't care much
where that functionality is provided. There are lots of nice things that could
be done to cabal-install to make GPS Haskell usage very easy, but even without
those changes, there are more manual processes we can use to start trying this
out. And getting the Haskell Platform installer modified to support all this
would be nice, but is also a non-essential first step.

So I'll propose, as I'm wont to do, an experimental method for testing out
these ideas:

- Create a new repository for GPS Haskell that has the package list for each release made.
- Set up the necessary tooling to automatically generate a set of packages to be considered for a point release.
- Automate the testing of these package sets, and generate new Stackage bundles for each GPS Haskell snapshot it creates.
- Publish these Stackage snapshot URLs in a single location (maybe with an RSS feed?) so that people can start experimenting with GPS Haskell immediately.

I want to make this clear: I consider the tooling aspect of this change the
unimportant one. Figuring out the right set of policies and community standards
is far more interesting and complicated. I want to avoid, as much as possible,
letting incidental tooling complexity get in the way of what I believe can
greatly increase Haskell's appeal to people looking to use it in production and
commercial settings.