A rant about pain points in Haskell, written as a response to https://redd.it/7rwuxb

I started writing this polemic to answer your question, but I ended up touching on most of my gripes with Haskell in general, not just in a corporate context.

GHC

GHC is a modern compiler with an amazing RTS and tons of features, but I have some issues with it.

Monolithic and Hard to Contribute To

I think GHC is a really messy codebase that would benefit greatly from being cleaned up, refactored, and split into multiple tools rather than remaining one monolithic compiler (in other words, I want GHC to be written more like LLVM). Ideally, a cleaned-up GHC would be written in "modern Haskell", using the full facilities of Hackage, though I think this might cause some issues when bootstrapping GHC. I also think it might be worth getting rid of some of the more antiquated thesisware GHC extensions, depending on how large the transitive reverse-dependency closure of the Hackage libraries that use them is, and how hard it would be to rewrite those libraries to not use those extensions.

Bootstrapping

I don't think bootstrapping GHC from an environment containing only a C compiler should require multiple versions of GHC. As a Nix user, my view is that any policy about how you build your code should be part of your codebase, and that extends to how you bootstrap a compiler. Versions of GHC are a "meta-level" notion not contained within the codebase, so the fact that you have to manually retrieve them and build them in order is problematic in my view. Personally, I would prefer a situation where we "compile" a reasonably modern GHC into a single file containing GHCi bytecode, write an interpreter for GHCi bytecode in portable C, and then check that GHCi bytecode file and interpreter into the GHC codebase. From an auditability point-of-view, I think that this solution isn't so great for preventing trusting-trust attacks (since the bytecode file is basically an opaque blob), but practically everyone already bootstraps GHC starting from an old binary distribution of GHC, so we're already living in a world susceptible to those kinds of attacks. The real solution to bootstrapping without being as susceptible to trusting-trust is to write a tower of compilers and interpreters for successively more complex languages, with the bottom of the tower written in auditable C, but that's such a huge effort that AFAIK no one ever does it.

Template Haskell

The fact that Template Haskell is built into GHC, as opposed to being a separate preprocessing step executed via another binary (distributed with GHC), is a complete travesty. It really does not make any sense to couple the GHC that is used to evaluate Template Haskell to the GHC that is used to compile the library or executable; this has caused massive problems with cross-compilation and with GHCJS. There is a workaround for this issue called iserv, but I don't think it is philosophically the right way to deal with it.

Admittedly, there is a fairly significant problem with separating TH out this way, which is that it requires some way of pretty-printing the generated Haskell. I think that problem should be dealt with by creating a versioned binary representation of the GHC Haskell AST, and then modifying the GHC frontend so that it can accept this binary format rather than a Haskell source file. The TH execution utility could then just output this binary format rather than Haskell source, sidestepping the issue of pretty-printing a TH-modified AST (though admittedly a separate tool for pretty-printing these binary AST files would also be pretty useful).

Going back to my point about splitting GHC up into smaller tools, this binary AST would also mean that we could make a ghc-parse executable that converts a (potentially TH-containing) Haskell source file into a binary AST file, which would be extremely useful for editor tooling, though as usual the devil is in the details of versioning, stability, and ease of parsing this binary AST format. I would probably advocate shipping a tool along with GHC that, in addition to (slowly) pretty-printing the binary AST as Haskell source, can convert the AST to a few non-binary formats like s-expressions and JSON. All of these tasks would be eminently doable if it weren't for the messiness of the GHC codebase and the general conservatism of the project, which decreases the number of people who are willing to contribute to it.
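To make the proposal concrete, here is a minimal sketch of the interfaces such tools might expose; every name and type below is invented for illustration, not an existing GHC API:

import Data.ByteString (ByteString)

-- Hypothetical versioned binary AST format.
data BinaryAst = BinaryAst
  { astVersion :: Int        -- bumped whenever the AST representation changes
  , astPayload :: ByteString -- serialized GHC AST, opaque to consumers
  }

-- ghc-parse: parse a (possibly TH-using) source file, run the splices,
-- and emit the expanded program in the binary AST format.
parseAndExpand :: FilePath -> IO BinaryAst
parseAndExpand = error "sketch only"

-- Companion converters for tooling that wants a textual view.
toHaskellSource :: BinaryAst -> String
toHaskellSource = error "sketch only"

toSExpression :: BinaryAst -> String
toSExpression = error "sketch only"

The point of the version field is that editor tooling could fail fast on a mismatched format instead of misparsing it.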

Cross-Compilation

I wish GHC were a native cross-compiler. At the very least, I would like to know how to "cross-compile" Haskell programs to Windows binaries (air quotes because it's the same architecture, just a different binary format and libc). I really wish the Nix GHC wrapper machinery supported Windows cross-compilation, assuming it is possible (and if it isn't, why not? The code for generating Windows PE files is in the GHC codebase; there's no good reason to disable it just because you're not building GHC for Windows, is there?).

Miscellaneous

I wish GHC had a WebAssembly backend (this is different from WebGHC, which involves translating GHC's LLVM output to WebAssembly using emscripten).

Infrastructure / Nix

I am fairly satisfied with the Nix/Haskell situation, but I have a few gripes. I think it would be extremely useful if we could build Haskell projects incrementally using Nix, as /u/dmjio mentioned in his comment, and in fact I spent most of summer 2017 working on implementing that using Nix's import-from-derivation (IFD) feature. However, I concluded that although it is in principle possible to implement this feature using IFD, it would be very difficult to do in a way that isn't brittle, and the least brittle way of implementing it would require fairly significant reengineering of Cabal to make it generate a static description of its build plan at configure time. Ultimately, I've come to the conclusion that we will need Recursive Nix, the ability to run nix-build inside a nix-build sandbox à la recursive make, before we can start incrementalizing Nix build processes, including Haskell's. I wrote a long comment on the Recursive Nix issue describing this experience, if you want to read more about it.

As I said in the GHC section, the cross-compilation situation with Nix + Haskell is pretty non-existent, though that's not to imply that it's particularly existent with other toolchains. I know that /u/Sonarpulse is working on fixing the nixpkgs cross-compilation infrastructure in general, and I hope that the Nix GHC wrapper gets the same treatment at some point.

I think the corporate users of Nix + Haskell should probably band together to create a single Hydra build farm that is better suited to our needs than the main NixOS build farm. In particular, I think it would be really nice if we had a fork of nixpkgs in which we could easily fix upstream Haskell packages (e.g.: by adding native dependencies or haskell.lib.dontCheck or whatever), and in which we could be proactive about security issues (since we only need to care about server use-cases). Then peti (or, if he doesn't want to, someone else) could upstream our fixes every so often, reducing the workload on both ends. I also think it'd be really nice to have GHC 8.2 Haskell package sets compiled both with and without profiling/debugging.

Nix itself has some warts, particularly in the performance of its evaluator, but that is worth a whole other rant.

Module system

Haskell has no way to make nested modules, no way to do qualified exports, and not even C++-style namespaces.

Any of these features would allow you to, for example, export a symbol called Text from Data.Text that can be indexed like a module (e.g.: Text.pack), allowing you to simply import Data.Text without any qualification. Moreover, this would allow custom preludes to subsume all the stuff you normally import; as it stands, you can add as much as you want to a custom prelude, but it all has to live in one namespace, or else every use of the prelude requires multiple imports.

I really want a world where you can just import MyPrelude and have BS.*, LBS.*, Text.*, LText.*, Set.*, Map.*, etc. already in scope. Plenty of languages already have this feature; I don't understand why Haskell is so antiquated in this regard.
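As a sketch, here is the kind of thing I want; the as-clause in the export list is invented syntax that no GHC implements today:

-- NOT valid Haskell: "module M as N" in an export list is hypothetical
-- syntax for re-exporting a module under a qualified name.
module MyPrelude
  ( module Data.Text as Text
  , module Data.Map.Strict as Map
  ) where

import qualified Data.Text as Text
import qualified Data.Map.Strict as Map

With something like this, a single import MyPrelude would put Text.pack, Map.lookup, and so on in scope.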

There are three main workarounds for this issue (i.e.: solutions that don't involve modifying GHC).

The first is to make all the functions that are likely to clash polymorphic enough that they can be used in all contexts. This is the approach taken by things like mono-traversable and most custom preludes. It works for some functions, but taken to an extreme I think it weakens type inference, makes code more complicated, slows down compile times, and slows down code at runtime (due to dictionary passing).
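For illustration, this is roughly the shape of the abstraction mono-traversable uses, heavily simplified from the actual package:

{-# LANGUAGE TypeFamilies #-}

import qualified Data.Text as Text

-- Maps each monomorphic container to its element type.
type family Element mono
type instance Element Text.Text = Char
type instance Element [a] = a

-- One omap works for both Text and lists, at the cost of an extra
-- dictionary and weaker type inference at call sites.
class MonoFunctor mono where
  omap :: (Element mono -> Element mono) -> mono -> mono

instance MonoFunctor Text.Text where
  omap = Text.map

instance MonoFunctor [a] where
  omap = map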

The second solution is to maintain a strict import discipline. This is what I currently do, and it results in insanely long import lists that contain a lot of pairs of lines like

import           Data.Foo (Foo)
import qualified Data.Foo as Foo

This is obnoxious and repetitive, but it's the most sensible solution given the current library ecosystem.

The third workaround is to adopt an OCaml-style convention where each module exports one type or typeclass, always named T or C respectively. This means you could just write import qualified Data.Foo as Foo without also having to import the type named Foo, since it would now be named Foo.T rather than the obnoxious Foo.Foo. The only Haskell programmer I know of who has adopted this convention is Henning Thielemann (example: numeric-prelude). The main problem with this approach, besides some people generally not liking it, is that Haddock generates really confusing documentation, since AFAIK Haddock never displays the qualification of a symbol. If the Haddock problem were fixed, I'd be willing to adopt this convention, but it seems like a real uphill battle to convince everyone else to do it too, so it is probably best suited to company-internal code where the style can be enforced.
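Concretely, the convention looks like this (module and field names are mine, for illustration):

-- Data/Foo.hs: the principal type is always named T.
module Data.Foo (T (..), new) where

data T = T
  { bar :: Int
  , baz :: Bool
  }

new :: Int -> Bool -> T
new = T

-- At a use site, one qualified import suffices:
--   import qualified Data.Foo as Foo
--   x :: Foo.T
--   x = Foo.new 42 True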

It is also worth mentioning that nested modules / namespaces would be pretty useful for Haskell's record problem, given that they would allow you to more easily namespace record accessors. For the most part, though, this would only fix Haskell records for the consumer of a record; it would be fairly boilerplatey to write records this way. For example, you would define Data.Foo like:

module Data.Foo (type Foo.Foo, module Foo) where

module Foo (Foo (..), new) where
  data Foo
    = Foo
      { bar  :: Bar
      , baz  :: Baz
      , quux :: Quux
      }

  new :: Bar -> Baz -> Quux -> Foo
  new = Foo

and then you would import Data.Foo to use the following names: Foo (type), Foo.new, Foo.bar, Foo.baz, and Foo.quux. This isn't perfect, but it's better in some ways than the current situation, and I can imagine that the changes that would need to be made in GHC to support nested modules would be fairly conducive to adding Agda-style support for records that automatically generate modules like this.

Records

The current situation with records in Haskell is kind of nightmarish. I really wish we could just have row polymorphism like PureScript. There's been quite a bit of research on the subject, and I don't really understand why the GHC team is so conservative about adding it to the type system, especially given that I'm pretty sure most implementations of it can be completely eliminated into GHC Core; the existence of vinyl and union certainly implies this, although the fact that those necessarily have linear-time accesses in present-day Haskell might mean that we need to extend Core in some way to make an implementation of row polymorphism efficient (I honestly don't know).
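To make the linear-time point concrete, here is a from-scratch miniature of the vinyl/union style of encoding; this is my own simplified sketch, not the actual vinyl API:

{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeOperators #-}

import GHC.TypeLits (Symbol)

-- An anonymous record is a heterogeneous list indexed by a type-level
-- list of label/type pairs.
data Rec :: [(Symbol, *)] -> * where
  RNil :: Rec '[]
  (:&) :: t -> Rec fields -> Rec ('(label, t) ': fields)
infixr 5 :&

-- Field lookup walks the spine one cell at a time, which is why these
-- encodings give linear-time access when compiled to today's Core.
class Has (label :: Symbol) (fields :: [(Symbol, *)]) t where
  get :: Rec fields -> t

instance {-# OVERLAPPING #-} Has label ('(label, t) ': rest) t where
  get (x :& _) = x

instance Has label rest t => Has label ('(other, s) ': rest) t where
  get (_ :& xs) = get xs

-- A row-polymorphic-flavored function: it accepts any record that has
-- at least a "name" field of type String.
greet :: Has "name" fields String => Rec fields -> String
greet r = "Hello, " ++ get @"name" r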

One other thing: in addition to making the record situation easier, row polymorphism can be extremely useful in other ways. For instance, we could have instances of Aeson's FromJSON and ToJSON typeclasses for a generic record type^1 like Rec from vinyl. This is useful because most of the time, when you wrap an API that uses JSON, you want two different "levels" of types: one level is a straightforward translation of the JSON format described in the API documentation (which requires comparatively little effort to completely wrap the API), and the second level is a more high-level, Haskell-appropriate translation of those types (which is almost never an up-to-date, complete description of the API). Since the FromJSON and ToJSON instances for the low-level types are so trivial, you really want them to be automatically generated. Sure, you can do that with GHC.Generics, but then you have to either use DuplicateRecordFields or prefix all your fields, and that ends up being much worse than the situation I'm describing. I know this because there is already a package that uses vinyl for this workflow, called composite-aeson, and I've used it to wrap APIs (e.g.: the Bittrex API).

In general, I think a lot of the things we currently use GHC.Generics for are better served by adding instances for anonymous row-polymorphic records/unions, since I don't think it's reasonable for the semantics of your program to depend on the identifiers you chose for your record accessors (in the anonymous-record world, these are type-level strings or empty data types equipped with instances of an open type family into Symbol, so it is much less surprising that program behavior changes if the type-level string or open type family instance is changed).
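To illustrate the field-prefix tax of the GHC.Generics route (the type and field names here are mine, not from any real API):

{-# LANGUAGE DeriveGeneric #-}

import Data.Aeson (FromJSON (..), ToJSON (..), genericParseJSON, genericToJSON)
import Data.Aeson.Types (Options (..), defaultOptions)
import Data.Char (toLower)
import GHC.Generics (Generic)

-- Low-level wire type: fields must be prefixed with "ticker" so they
-- don't clash with every other record that also has a bid and an ask.
data Ticker = Ticker
  { tickerBid :: Double
  , tickerAsk :: Double
  } deriving (Show, Generic)

-- ...and then the prefix has to be stripped back off to match the JSON.
instance FromJSON Ticker where
  parseJSON = genericParseJSON defaultOptions
    { fieldLabelModifier = lowerFirst . drop (length "ticker") }

instance ToJSON Ticker where
  toJSON = genericToJSON defaultOptions
    { fieldLabelModifier = lowerFirst . drop (length "ticker") }

lowerFirst :: String -> String
lowerFirst (c : cs) = toLower c : cs
lowerFirst []       = []

With anonymous records, the field name is a type-level string like "bid" shared by every record that has such a field, so none of this renaming machinery is needed.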

Type system

Haskell doesn't have quantified class constraints. This means that some typeclasses, like MonadTrans, cannot be written in a way that restricts instances to the desired behavior: the way you would want to write the MonadTrans class is that any instance t should have the property that, for any monad m, t m also has a Monad instance, but you can't express this kind of superclass constraint without quantified class constraints.
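Concretely, using the syntax from the "Quantified Class Constraints" paper (not available in any released GHC at the time of writing), the class could state the law directly:

{-# LANGUAGE QuantifiedConstraints #-}

-- A sketch of MonadTrans with the superclass the prose asks for:
-- every instance t must turn any Monad m into a Monad (t m).
class (forall m. Monad m => Monad (t m)) => MonadTrans t where
  lift :: Monad m => m a -> t m a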

Haskell doesn't have dependent types, though they are on the way. I don't think dependent types are necessarily something you should use all the time, but they are pretty nice to have when needed.
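The closest approximations today go through DataKinds and GADTs; the standard length-indexed vector sketch gives the flavor:

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

-- Type-level naturals, promoted with DataKinds.
data Nat = Z | S Nat

-- A vector whose length is tracked in its type.
data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- The type rules out calling vhead on an empty vector, so no runtime
-- check (and no Maybe) is needed.
vhead :: Vec ('S n) a -> a
vhead (VCons x _) = x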

GUI

The biggest gap in the Haskell library ecosystem is definitely that of a good GUI library. I'm not talking about FRP or high-level wrappers or whatever, I think the situation there is mostly decent (reflex is best-in-class IMO for that, though I think reflex-dom is a bit of a crazy codebase). Instead, I'm talking about the ability to write cross-platform (by which I mean Windows, Mac OS, and Linux; I'm not convinced that writing the same GUI for PCs and mobile devices is satisfactorily possible) GUI applications that look good and perform well.

There are basically four options in that realm currently, listed from least-promising (IMO) to most-promising:

  1. FLTK, which has Haskell bindings in the form of /u/deech's fltkhs package. I haven't had a chance to look at the quality of these bindings, but they aim to be complete, which is admirable. However, I've never gotten fltkhs to build on NixOS, so I can't use them, and as far as I can tell FLTK isn't a very good UI toolkit anyway (much like GTK), though it is at least nominally cross-platform.
  2. The Haskell GTK bindings. These are extremely good, well-maintained bindings, and I have actually managed to build them, which is more than I can say for most of the other options (see the hello-world sketch after this list). However, GTK itself is a really bad UI toolkit, and although it is nominally cross-platform, GTK applications look pretty much the same wherever you run them (by default, at least), so they aren't a solution I'd bet a big project on.
  3. Qt, which has two bindings: hsqml and Qtah. I can pretty much immediately throw out hsqml because Qt Quick is really not very usable for making complicated GUIs (I have tried; it is far less developed and well-documented than the rest of the Qt ecosystem). Qtah, on the other hand, is far more impressive to me. Until today, I hadn't been able to compile it, but it ticks off most of my boxes, so I am generally impressed. Now someone just needs to figure out how to cross-compile Haskell programs using Qtah from Linux to Mac OS and Windows with Nix, and I will be much more satisfied with the state of the Haskell GUI library ecosystem. After that, if someone (perhaps me) writes a reflex-qtah, we will live in a world where it is possible to write a cross-platform, performant Haskell application that can be built entirely through Nix without any Windows or Mac OS licenses in the mix. There could even be a repository like reflex-platform that makes this kind of workflow completely plug-and-play! The only unfortunate thing about this is that making good-looking GUIs with Qt is generally more difficult than making good-looking GUIs with HTML + CSS, owing to the man-decades of effort that have been poured into the latter activity.
  4. Compiling your code to JavaScript using GHCJS and running it in some kind of browser (e.g.: electron). This is the most promising solution in some ways, since I have no doubt about my ability to make HTML + CSS look decent (it is painful, but doable), and it is definitely cross-platform, but there are real issues with the performance of code generated by GHCJS and the memory usage of modern browser engines. My hope is that by compiling to WebAssembly using WebGHC, and by making something like electron that uses Mozilla's Servo browser engine, we can overcome these issues, but I don't know how likely that is to happen. FWIW, this is the approach I have taken most seriously, as I put a bunch of work into GHCJS bindings for the Electron API: ghcjs-electron (they are still a work-in-progress unfortunately).
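For what it's worth, here is roughly what a minimal program against the GTK bindings from option 2 looks like, written from memory against the gtk2hs-style API, so treat it as a sketch:

import Graphics.UI.Gtk

main :: IO ()
main = do
  _ <- initGUI                   -- initialize the toolkit
  window <- windowNew            -- create a top-level window
  set window [windowTitle := "Hello from Haskell"]
  _ <- on window objectDestroy mainQuit
  widgetShowAll window
  mainGUI                        -- hand control to the GTK main loop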

Footnotes

  1. If your system has anonymous record types like vinyl's Rec type, just use those; if it doesn't, you can make instances of those typeclasses on a wrapper type defined like

    newtype Wrapped t = Wrapped (∀ ρ. t ⋄ ρ)

    where t is a type variable representing the required field types and ⋄ is row combination (with vinyl, ⋄ would be a closed type family that does type-level list concatenation).

@andrewthad

I agree with a lot of these pain points. I wanted to mention that, concerning quantified class constraints and row polymorphism, there haven't been any proposals made in the ghc-proposals repo for them. It's possible to write up a proposal without offering to actually implement it (I've done this). Both of these are things where SPJ would want to see the expected impact on Core as a part of the proposal. If anyone wants to write up proposals for these (especially quantified class constraints, since to my understanding it presents far fewer opportunities for bikeshedding), it would be a great service to the community at large.
