Survey of invertible syntax description libraries for Haskell.

Invertible syntax descriptions

An "invertible syntax description" is something that can be used to both parse and generate a syntax. For example, instead of defining separate toJson and fromJson functions for a given data type, an invertible syntax description could be used to provide both. There are many Haskell libraries that provide this functionality or something close to it. As far as I can tell most of them are at least inspired by Invertible syntax descriptions by Tillmann Rendel and Klaus Ostermann.

Personally I am interested in using these for HTTP routing. I frequently want to be able to define a route such as /episodes/:id.json, dispatch based on that route, and generate links to that route. Doing so manually is tedious and error prone. Talking with a friend at Flipstone unveiled their text-routing project, which does exactly that. I am considering adopting their project at ITProTV, but I wanted to survey the options on Hackage first. That's what this document is for.

There are two features that are absolutely critical for me:

  • Being able to have "holes" in the parse result. For example, one way to think about routing is a function that takes a path and returns a handler (that is, String -> Handler). That's how scotty works, for example. I'm not satisfied with that because a route like /episodes/:id.json should really require an episode ID before returning the handler (String -> (EpisodeId -> Handler)). Doing that type of thing with normal routing means you have to be sure to match the identifier in your route (:id) with the identifier in your handler.

  • Being able to carry around data on the side when parsing and generating. For example /episodes/:id.json isn't the entire location of a resource because you also need the HTTP method. When matching on a route I'll need to make sure the method matches in addition to matching the path. And when generating a route I'll need to produce a method along with the path. I would like to do this without resorting to in band signaling like "GET /episodes/123.json".
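To make those two requirements concrete, here's a hand-rolled sketch (every name in it is hypothetical): the EpisodeId "hole" must be filled before you have a path, and the Method travels alongside the path instead of being smuggled inside the string. Keeping renderEpisode and matchEpisode in sync by hand is exactly the tedious, error-prone part an invertible syntax description would eliminate.

```haskell
import Data.List (isSuffixOf, stripPrefix)
import Text.Read (readMaybe)

-- Hypothetical types, for illustration only.
newtype EpisodeId = EpisodeId Int deriving (Eq, Show)
data Method = GET | POST deriving (Eq, Show)

-- Generating: the EpisodeId hole must be filled, and the method rides along.
renderEpisode :: EpisodeId -> (Method, String)
renderEpisode (EpisodeId n) = (GET, "/episodes/" ++ show n ++ ".json")

-- Matching: succeeds only for the right method and the right path shape.
matchEpisode :: Method -> String -> Maybe EpisodeId
matchEpisode GET path = do
  rest <- stripPrefix "/episodes/" path
  if ".json" `isSuffixOf` rest
    then EpisodeId <$> readMaybe (take (length rest - 5) rest)
    else Nothing
matchEpisode _ _ = Nothing
```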

As I survey these libraries, here are the things I'm looking for:

  • Builds with GHC 8.8.
  • Proven support for "real world" formats like CSV, JSON, BSON, and of course web routes.
  • Well maintained, responsive to issues and pull requests, with source available somewhere nice.
  • Well tested, either by lots of users or a robust test suite.
  • Well documented, both at the API level and the package level with worked examples.
  • Small core that's easy to understand and extend, like attoparsec.
  • Avoids crazy extensions and operators.
  • Avoids having a lot of dependencies, and avoids "big" dependencies like lens.
  • Avoids leaning too heavily on generic deriving.
  • Avoids Template Haskell entirely, except maybe as an optional feature.
  • Avoids arrows.

And here are things that I don't care about too much:

  • Faster is better, but ultimately I'm looking for "not slow".
  • Things should be easy to reason about, but I care more about utility than theoretical purity.
  • Being available on Stackage would be nice, but I can do this myself.

After surveying all the options, hopefully I'll find something that suits my needs. If not, I'm comfortable making my own, at which point this document can be used to motivate its existence.

Generally speaking this document is meant to be read top to bottom. It's roughly a stream of consciousness, although it was compiled over several days. And I've made edits here and there.


Right out of the gate I'm surprised that there are so many libraries. Before researching this I was only aware of boomerang and tomland. If this is such a powerful and universal concept, why are all these libraries reinventing the wheel? I suppose things aren't much better for parsing (parsec, Earley, trifecta, ...) and pretty printing (wl-pprint, pretty, prettyprinter, ...) separately, so there's no reason why both of them together would be any better.

Really the most concerning thing is that the language specific libraries like JsonGrammar don't reuse one of the lower level libraries like codec. Are there problems with the low level libraries, or were the authors of the high level libraries simply unaware of them? Hopefully I'll find out.

I will reference "the paper" a lot, which means Invertible syntax descriptions. I'd recommend at least reading the abstract to get a feel for what it's about.


This package's author is one of the authors of the paper. It seems like this is a more or less direct translation of the paper's code into a package. The package depends on partial-isomorphisms, which defines the Iso type from the paper along with some helpful functions. What follows is essentially an overview of the paper through the lens of this package's documentation and exported identifiers.

At its core, this defines a couple data types. The parser is pretty typical: Parser (String -> [(alpha, String)]), like ReadS. The pretty printer is a little stranger: Printer (alpha -> Maybe String). How can printing something fail? I've found it illustrative to consider a parser that consumes exactly two characters of input, like a country code (US, UK, FR, ...). The corresponding pretty printer should fail if it somehow produces any amount of output other than two characters, like USA.
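In code, the two types look roughly like this (the record accessors and the countryPrinter name are my paraphrase, not the package's exact definitions):

```haskell
-- Roughly the paper's two interpretations.
newtype Parser a = Parser { runParser :: String -> [(a, String)] }
newtype Printer a = Printer { runPrinter :: a -> Maybe String }

-- The country-code example: printing fails unless the output is two characters.
countryPrinter :: Printer String
countryPrinter = Printer (\s -> if length s == 2 then Just s else Nothing)
```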

Anyway, both the Parser and Printer implement the Syntax type class, which has three super classes: IsoFunctor, ProductFunctor, and Alternative. It also defines a couple methods, but we'll get back to that.

IsoFunctor is a type class for partial isomorphisms. An isomorphism is a lossless conversion between two types, like Maybe () and Bool. (Because they have the same number of values, Maybe () and Bool can be converted between each other without losing any information.) A partial isomorphism is a lossy conversion. For example, consider String and Rational. You could parse a String as a Rational, like "1.2" as 12 % 10, but that conversion can clearly fail when given values like "twelve". And you could represent a Rational as a String by going the other way, but some numbers can't be represented, like 1 % 3.
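The shape of such a conversion can be sketched as a pair of partial functions (this simplified Iso is my own; partial-isomorphisms represents it a bit differently):

```haskell
import Text.Read (readMaybe)

-- Both directions may fail.
data Iso a b = Iso (a -> Maybe b) (b -> Maybe a)

-- Lossless: Maybe () and Bool carry the same information.
maybeUnitBool :: Iso (Maybe ()) Bool
maybeUnitBool = Iso
  (\m -> Just (m == Just ()))
  (\b -> Just (if b then Just () else Nothing))

-- Lossy: parsing a String as an Int can fail ("twelve"), so the forward
-- direction is partial even though printing always succeeds here.
stringInt :: Iso String Int
stringInt = Iso readMaybe (Just . show)
```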

Unsurprisingly the IsoFunctor type class is a lot like Functor. Given an isomorphism between two types (Iso a b), an IsoFunctor lets you map over a value of one type to produce the other. More concretely, (<$>) :: IsoFunctor f => Iso a b -> f a -> f b. This type class is a necessary addition over the existing Functor class because Parser is covariant but Printer is contravariant.

mapParser  :: (a -> b) -> Parser  a -> Parser  b --     covariant
mapPrinter :: (b -> a) -> Printer b -> Printer a -- contravariant

I've never been good with variance. To put it in terms I can understand, Parser is covariant because it eventually produces an a; Printer is contravariant because it consumes an a and produces something else. An IsoFunctor could be one or the other or both.

Moving on, ProductFunctor is similar to Applicative. This is easier to grok because it doesn't depend on the whole partial isomorphism thing. Instead it's merely used to produce products, which in Haskell are tuples. Hence (<*>) :: ProductFunctor f => f a -> f b -> f (a, b). Compare that to Applicative's (<*>), which has type Applicative f => f (a -> b) -> f a -> f b.
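Here's a sketch of that class with a Parser instance (I've renamed the operator to <.> so this compiles standalone without hiding the Prelude's <*>):

```haskell
newtype Parser a = Parser { runParser :: String -> [(a, String)] }

-- The paper's ProductFunctor, with its <*> renamed to <.> here.
class ProductFunctor f where
  (<.>) :: f a -> f b -> f (a, b)

-- Run the first parser, then the second on the leftover input,
-- and pair up the results.
instance ProductFunctor Parser where
  Parser p <.> Parser q =
    Parser (\s -> [ ((a, b), s'') | (a, s') <- p s, (b, s'') <- q s' ])

-- A parser for any single character, to try the instance out.
item :: Parser Char
item = Parser (\s -> case s of { [] -> []; c : cs -> [(c, cs)] })
```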

Up next is Alternative, which is just like the one in base except it doesn't have any super class constraints. As a refresher:

class Alternative f where
  empty :: f a
  (<|>) :: f a -> f a -> f a

Finally we're ready to tie those three classes together in the Syntax type class. It introduces two new methods: pure and token. Both are easy to understand in the context of parsers and printers.

In a parser, pure x will produce x without consuming input. In a printer, pure x will produce an empty string for values equal to x and otherwise produce nothing. (That's kind of confusing. In code: if actual == expected then Just "" else Nothing.)

In a parser, token x will consume exactly x, which is probably a Char in typical use cases. In a printer, token x will produce exactly x.
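Sketched directly (the standalone names here are mine; the real class overloads these behind pure and token for whichever interpretation you pick):

```haskell
newtype Parser a = Parser { runParser :: String -> [(a, String)] }
newtype Printer a = Printer { runPrinter :: a -> Maybe String }

-- pure: consume nothing / print nothing, but only for the expected value.
pureParser :: a -> Parser a
pureParser x = Parser (\s -> [(x, s)])

purePrinter :: Eq a => a -> Printer a
purePrinter expected =
  Printer (\actual -> if actual == expected then Just "" else Nothing)

-- token: consume exactly the given Char / emit exactly the given Char.
tokenParser :: Char -> Parser Char
tokenParser x = Parser $ \s -> case s of
  c : cs | c == x -> [(c, cs)]
  _ -> []

tokenPrinter :: Char -> Printer Char
tokenPrinter x = Printer (\c -> if c == x then Just [c] else Nothing)
```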

And that's ... it. With these basic building blocks, the package produces a rudimentary parser/printer. It provides some combinators, but that's about it. It feels kind of unsatisfying after walking through that paper.

Practically speaking this seems like it could be a good foundation for a more robust library, but at that point I'm not sure why you wouldn't re-implement these type classes yourself. Sure, you'd save yourself a little work, but you'd be taking on this apparently unmaintained dependency. In short: Great idea, solid implementation, but not really suitable for use as a library. At least it gives us a good foundation for the rest of the packages we're surveying.


For starters, next to no documentation on the package, in the one exposed module, or on GitHub. Latest commit was more than eight years ago. So probably not the greatest package from a software engineering point of view. Regardless, let's dive in.

This package is focused on web routes specifically. I think it's the basis for the more recent boomerang package, so it probably has some good ideas. The main module it exposes is surprisingly short. It exports two types: Router a b and a :- b. I think Router a b is an invertible parser from String to b. Not sure why a is there. Maybe for the Category or Monoid instance? And I think a :- b is for building "functions" out of parsers. Or, to put it differently, it lets you use constructors in Routers.

There aren't any examples in the documentation, but luckily there are a few in the source repository. Unfortunately the examples use Template Haskell, which wasn't mentioned in the documentation at all. Seems like the repo probably drifted from Hackage a bit. Since I'm trying to avoid TH, I'm not going to dig too deep into this library.

Let's quickly break down an example. They start with an ADT for all the routes, which I like.

data Sitemap
   = Home
   | UserOverview
   | UserDetail Int
   | Article Int String

Then comes a TH splice, a type family instance, and a bunch of top level declarations run together. This is gross to me.

$(deriveAll ''Sitemap "PF_Sitemap")
type instance PF Sitemap = PF_Sitemap
Z rHome :& Z rUserOverview :& Z rUserDetail :& Z rArticle = mkRouters

It's not clear to me what "PF" means. I also don't understand why this isn't all done with a single TH call. Why split it up? Every time you add a new route you'll have to change the mkRouters line.

Anyway, moving on. The sitemap itself uses (/) for path separators, which is cute. Also (<>) for combining different routes, which makes sense. And (.) from Control.Category for building up a single route, which is alright but kind of annoying that you can't use (.) from the Prelude.

sitemap :: Router r (Sitemap :- r)
sitemap = id /
    (  rHome
    <> "users" . users
    <> rArticle . ("article" / int . "-" . part)
    )
  where
    users =  rUserOverview
          <> rUserDetail / int

Looks like constructors take tuples, which is why the rArticle line uses parentheses like that. Overall the end result looks relatively clean, but it doesn't seem worth the high entry fee of Template Haskell.


The Happstack documentation says:

Sjoerd Visscher and Martijn van Steenbergen figured out exactly how to do that and published a proof of concept library known as Zwaluw. With permission, I have refactored their original library into two separate libraries: boomerang and web-routes-boomerang.

This package seems to have documentation and was updated less than two weeks ago, so maybe it's the "production ready" version of Zwaluw? Let's dive in.

Looks like Boomerang doesn't require type families and only uses a single Template Haskell splice for the sitemap data type, so that's an improvement over Zwaluw. Constructing data also appears to not require tuples, so that's nice. (It means fewer parens in the route definition.)

Functionally that seems to be about it. There's a lot of documentation and a lot of helpful combinators too, which is nice. I don't see any examples of how to do things without TH though. I could dig through the code to see what the splices generate, but I don't think that's worth it.


TODO: seems similar to boomerang, but hyper focused on HTTP routing, to the point of building a map from the parser. Probably not useful in other contexts.


TODO: also very specific to web routing. Not on Hackage yet. Hopefully simpler than the alternatives and not tightly coupled to Happstack.


TODO: focused on JSON for some reason; also involves TypeScript. Might have some neat stuff under the covers, but none of it is exposed.


TODO: also focused on JSON. Seems to have some cool ideas about documentation, and interacts well with the ecosystem (aeson, attoparsec, pretty).


TODO: focused on TOML. Recent and well documented. Would be nice to dig into this and see what kinds of problems they ran into (if any). Neat ideas: functions instead of type classes, a two phase approach (text <=> AST <=> data), tagged partial bidirectional isomorphisms, and monadic profunctors.


TODO: seems like an entirely different approach; allows for monadic parsers rather than merely applicative ones.


TODO: looks very similar to the paper, but more robust and "production ready", with JSON and XML libraries.


TODO: looks like a big improvement over the paper, but requires arrows.
