Invertible syntax descriptions
An "invertible syntax description" is something that can be used to both parse and generate a syntax.
For example, instead of defining separate toJson and fromJson functions for a given data type, a single invertible syntax description could be used to provide both.
There are many Haskell libraries that provide this functionality or something close to it.
As far as I can tell, most of them are at least inspired by "Invertible syntax descriptions" by Tillmann Rendel and Klaus Ostermann.
Personally I am interested in using these for HTTP routing.
I frequently want to be able to define a route such as
/episodes/:id.json, dispatch based on that route, and generate links to that route.
Doing so manually is tedious and error prone.
Talking with a friend at Flipstone unveiled their
text-routing project, which does exactly that.
I am considering adopting their project at ITProTV, but I wanted to survey the options on Hackage first.
That's what this document is for.
There are two features that are absolutely critical for me:
Being able to have "holes" in the parse result. For example, one way to think about routing is a function that takes a path and returns a handler (that is, String -> Handler). That's how scotty works, for example. I'm not satisfied with that because a route like /episodes/:id.json should really require an episode ID before returning the handler (String -> (EpisodeId -> Handler)). Doing that type of thing with normal routing means you have to be sure to match the identifier in your route (:id) with the identifier in your handler.
Being able to carry around data on the side when parsing and generating. For example, /episodes/:id.json isn't the entire location of a resource because you also need the HTTP method. When matching on a route I'll need to make sure the method matches in addition to matching the path. And when generating a route I'll need to produce a method along with the path. I would like to do this without resorting to in-band signaling.
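To make the first requirement concrete, here's a minimal hand-rolled sketch (the EpisodeId, Handler, and matchEpisode names are mine, not from any library). The point is that the route can only succeed by producing an EpisodeId, which the handler is forced to accept, so the route and the handler can't drift apart.

```haskell
import Data.List (isSuffixOf, stripPrefix)
import Text.Read (readMaybe)

newtype EpisodeId = EpisodeId Int deriving (Eq, Show)

-- Stand-in for a real handler type.
type Handler = String

-- Match "/episodes/<id>.json" and pass the captured id to the handler.
matchEpisode :: (EpisodeId -> Handler) -> String -> Maybe Handler
matchEpisode handler path = do
  rest <- stripPrefix "/episodes/" path
  if ".json" `isSuffixOf` rest
    then do
      n <- readMaybe (take (length rest - length ".json") rest)
      pure (handler (EpisodeId n))
    else Nothing

main :: IO ()
main = print (matchEpisode (\(EpisodeId n) -> "episode " ++ show n) "/episodes/42.json")
-- prints: Just "episode 42"
```

An invertible syntax description should give you this typed hole for free in both directions: the same description that captures the EpisodeId when matching would demand an EpisodeId when generating a link.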
As I survey these libraries, here are the things I'm looking for:
- Builds with GHC 8.8.
- Proven support for "real world" formats like CSV, JSON, BSON, and of course web routes.
- Well maintained, responsive to issues and pull requests, with source available somewhere nice.
- Well tested, either by lots of users or a robust test suite.
- Well documented, both at the API level and the package level with worked examples.
- Small core that's easy to understand and extend.
- Avoids crazy extensions and operators.
- Avoids having a lot of dependencies, and avoids "big" dependencies in particular.
- Avoids leaning too heavily on generic deriving.
- Avoids Template Haskell entirely, except maybe as an optional feature.
- Avoids arrows.
And here are things that I don't care about too much:
- Faster is better, but ultimately I'm looking for "not slow".
- Things should be easy to reason about, but I care more about utility than theoretical purity.
- Being available on Stackage would be nice, but I can do this myself.
After surveying all the options, hopefully I'll find something that suits my needs. If not, I'm comfortable making my own, at which point this document can be used to motivate its existence.
Generally speaking this document is meant to be read top to bottom. It's roughly a stream of consciousness, although it was compiled over several days. And I've made edits here and there.
Right out of the gate I'm surprised that there are so many libraries.
Before researching this I was only aware of a couple of them.
If this is such a powerful and universal concept, why are all these libraries reinventing the wheel?
I suppose things aren't much better for parsing (
trifecta, ...) and pretty printing (
prettyprinter, ...) separately, so there's no reason why both of them together would be any better.
Really the most concerning thing is that the language-specific libraries like JsonGrammar don't reuse one of the lower-level libraries.
Are there problems with the low-level libraries, or were the authors of the higher-level libraries simply unaware of them?
Hopefully I'll find out.
I will reference "the paper" a lot, which means "Invertible syntax descriptions". I'd recommend at least reading the abstract to get a feel for what it's about.
This package's author is one of the authors of the paper.
It seems like this is a more or less direct translation of the paper's code into a package.
The package depends on
partial-isomorphisms, which defines the
Iso type from the paper along with some helpful functions.
What follows is essentially an overview of the paper through the lens of this package's documentation and exported identifiers.
At its core, this defines a couple data types.
The parser is pretty typical: Parser (String -> [(alpha, String)]).
The pretty printer is a little stranger: Printer (alpha -> Maybe String).
How can printing something fail?
I've found it illustrative to consider a parser that consumes exactly two characters of input, like a country code.
The corresponding pretty printer should fail if it somehow produces any amount of output other than two characters.
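In code, the two types look something like this (the record accessor names are my own addition):

```haskell
-- The parser and printer types as described above.
newtype Parser a = Parser { runParser :: String -> [(a, String)] }
newtype Printer a = Printer { runPrinter :: a -> Maybe String }

-- A country-code printer: it only succeeds when the output would be
-- exactly two characters, mirroring a parser that consumes exactly two.
countryCode :: Printer String
countryCode = Printer (\s -> if length s == 2 then Just s else Nothing)
```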
Anyway, both the Parser and the Printer implement the Syntax type class, which has three superclasses: IsoFunctor, ProductFunctor, and Alternative.
It also defines a couple methods, but we'll get back to that.
IsoFunctor is a type class for partial isomorphisms.
An isomorphism is a lossless conversion between two types, like Maybe () and Bool.
(Because they have the same number of values, Maybe () and Bool can be converted between each other without losing any information.)
A partial isomorphism is a lossy conversion.
For example, consider String and Rational.
You could parse a String as a Rational, like turning "1.2" into 12 % 10, but that conversion can clearly fail when given values that aren't numbers.
And you could represent a Rational as a String by going the other way, but some numbers can't be represented as finite decimal strings, like 1 % 3.
The IsoFunctor type class is a lot like Functor.
Given an isomorphism between two types (Iso a b), an IsoFunctor lets you map over a value of one type to produce the other: (<$>) :: IsoFunctor f => Iso a b -> f a -> f b.
This type class is a necessary addition over the existing
Functor class because
Parser is covariant but
Printer is contravariant.
mapParser :: (a -> b) -> Parser a -> Parser b -- covariant
mapPrinter :: (b -> a) -> Printer b -> Printer a -- contravariant
I've never been good with variance.
To put it in terms I can understand: Parser is covariant because it eventually produces an a, and Printer is contravariant because it consumes an a and produces something else.
An IsoFunctor could be one or the other or both.
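A sketch of the Iso type and the IsoFunctor class, roughly as the paper defines them (the apply/unapply helpers and the intIso example are my own illustrations):

```haskell
import Prelude hiding ((<$>))
import Text.Read (readMaybe)

-- A partial isomorphism: both directions can fail.
data Iso a b = Iso (a -> Maybe b) (b -> Maybe a)

apply :: Iso a b -> a -> Maybe b
apply (Iso f _) = f

unapply :: Iso a b -> b -> Maybe a
unapply (Iso _ g) = g

-- Map a partial isomorphism over some context f.
class IsoFunctor f where
  (<$>) :: Iso a b -> f a -> f b

-- Example: Strings and Ints. Parsing can fail; printing always succeeds.
intIso :: Iso String Int
intIso = Iso readMaybe (Just . show)
```

The same intIso can be mapped over a Parser (using the forward direction) or a Printer (using the backward direction), which is exactly why the class takes an Iso instead of a plain function.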
ProductFunctor is similar to Applicative.
This is easier to grok because it doesn't depend on the whole partial isomorphism thing.
Instead it's merely used to produce products, which in Haskell are tuples.
(<*>) :: ProductFunctor f => f a -> f b -> f (a, b).
Compare that to
(<*>), which has type
Applicative f => f (a -> b) -> f a -> f b.
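Here's a sketch of ProductFunctor with a Parser instance, my own illustration of how sequencing pairs up the results (not the package's exact code):

```haskell
import Prelude hiding ((<*>))

newtype Parser a = Parser { runParser :: String -> [(a, String)] }

-- Sequence two values and pair their results, instead of applying
-- a wrapped function the way Applicative's (<*>) does.
class ProductFunctor f where
  (<*>) :: f a -> f b -> f (a, b)

instance ProductFunctor Parser where
  Parser p <*> Parser q = Parser $ \s ->
    [ ((a, b), rest') | (a, rest) <- p s, (b, rest') <- q rest ]

-- A parser for a single character, to demonstrate sequencing.
item :: Parser Char
item = Parser $ \s -> case s of
  c : cs -> [(c, cs)]
  []     -> []
```

Producing a tuple rather than applying a function is what makes this class usable by both parsers and printers: a printer can't contain a function to apply, but it can split a tuple and print each half.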
Up next is
Alternative, which is just like the one in
base except it doesn't have any super class constraints.
As a refresher:
class Alternative f where
  empty :: f a
  (<|>) :: f a -> f a -> f a
Finally we're ready to tie those three classes together in the
Syntax type class.
It introduces two new methods: pure and token.
Both are easy to understand in the context of parsers and printers.
In a parser,
pure x will produce
x without consuming input.
In a printer, pure x will produce an empty string for values equal to x and otherwise produce nothing.
(That's kind of confusing. In code: if actual == expected then Just "" else Nothing.)
In a parser,
token x will consume exactly
x, which is probably a
Char in typical use cases.
In a printer, token x will produce exactly x.
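Here's a sketch of both methods on each side, following the descriptions above (the standalone function names are mine; in the package these are class methods):

```haskell
newtype Parser a = Parser { runParser :: String -> [(a, String)] }
newtype Printer a = Printer { runPrinter :: a -> Maybe String }

-- pure: a parser produces x without consuming any input ...
pureParser :: a -> Parser a
pureParser x = Parser $ \s -> [(x, s)]

-- ... and a printer emits nothing for x, failing on anything else.
purePrinter :: Eq a => a -> Printer a
purePrinter expected = Printer $ \actual ->
  if actual == expected then Just "" else Nothing

-- token: a parser consumes exactly the character x ...
tokenParser :: Char -> Parser Char
tokenParser x = Parser $ \s -> case s of
  c : cs | c == x -> [(c, cs)]
  _               -> []

-- ... and a printer produces exactly the character x.
tokenPrinter :: Char -> Printer Char
tokenPrinter x = Printer $ \c -> if c == x then Just [x] else Nothing
```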
And that's ... it. With these basic building blocks, the package produces a rudimentary parser/printer. It provides some combinators, but that's about it. It feels kind of unsatisfying after walking through that paper.
Practically speaking this seems like it could be a good foundation for a more robust library, but at that point I'm not sure why you wouldn't re-implement these type classes yourself. Sure, you'd save yourself a little work, but you'd be taking on this apparently unmaintained dependency. In short: Great idea, solid implementation, but not really suitable for use as a library. At least it gives us a good foundation for the rest of the packages we're surveying.
For starters, there's next to no documentation on the package, in the one exposed module, or on GitHub. The latest commit was more than eight years ago. So probably not the greatest package from a software engineering point of view. Regardless, let's dive in.
This package is focused on web routes specifically.
I think it's the basis for the more recent
boomerang package, so it probably has some good ideas.
The main module it exposes is surprisingly short.
It exports two types:
Router a b and
a :- b.
Router a b is an invertible parser from a to b.
Not sure why the a is there.
And I think a :- b is for building "functions" out of parsers.
Or, to put it differently, it lets you use constructors in routes.
There aren't any examples in the documentation, but luckily there are a few in the source repository. Unfortunately the examples use Template Haskell, which wasn't mentioned in the documentation at all. Seems like the repo probably drifted from Hackage a bit. Since I'm trying to avoid TH, I'm not going to dig too deep into this library.
Let's quickly break down an example. They start with an ADT for all the routes, which I like.
data Sitemap = Home | UserOverview | UserDetail Int | Article Int String
Then comes a TH splice, a type family instance, and a bunch of top level declarations run together. This is gross to me.
$(deriveAll ''Sitemap "PF_Sitemap")

type instance PF Sitemap = PF_Sitemap

Z rHome :& Z rUserOverview :& Z rUserDetail :& Z rArticle = mkRouters
It's not clear to me what "PF" means.
I also don't understand why this isn't all done with a single TH call.
Why split it up?
Every time you add a new route you'll have to change that mkRouters pattern binding.
Anyway, moving on.
The sitemap itself uses (/) for path separators, which is cute.
It uses (<>) for combining different routes, which makes sense.
And it uses Control.Category for building up a single route, which is alright but kind of annoying because you can't use (.) from the Prelude.
sitemap :: Router r (Sitemap :- r)
sitemap = id /
    (  rHome
    <> "users" . users
    <> rArticle . ("article" / int . "-" . part)
    )
  where
    users = rUserOverview <> rUserDetail / int
Looks like constructors take tuples, which is why the
rArticle line uses parentheses like that.
Overall the end result looks relatively clean, but it doesn't seem worth the high entry fee of Template Haskell.
Sjoerd Visscher and Martijn van Steenbergen figured out exactly how to do that and published a proof of concept library known as Zwaluw. With permission, I have refactored their original library into two separate libraries: boomerang and web-routes-boomerang.
This package seems to have documentation and was updated less than two weeks ago, so maybe it's the "production ready" version of Zwaluw? Let's dive in.
Looks like Boomerang doesn't require type families and only uses a single Template Haskell splice for the sitemap data type, so that's an improvement over Zwaluw. Constructing data also appears to not require tuples, so that's nice. (It means fewer parens in the route definition.)
Functionally that seems to be about it. There's a lot of documentation and a lot of helpful combinators too, which is nice. I don't see any examples of how to do things without TH though. I could dig through the code to see what the splices generate, but I don't think that's worth it.
TODO: seems similar to boomerang, but hyper-focused on HTTP routing, to the point of building a map from the parser. Probably not useful in other contexts.
TODO: also very specific to web routing. Not on Hackage yet. Hopefully simpler than the alternatives, and not tightly coupled to Happstack.
TODO: focused on JSON for some reason. Also involves TypeScript. Might have some neat stuff under the covers, but none of it is exposed.
Also focused on JSON.
Seems to have some cool ideas about documentation.
Interacts well with the rest of the ecosystem.
TODO: focused on TOML. Recent and well documented. Would be nice to dig into this and see what kinds of problems they ran into (if any). Neat ideas: functions instead of type classes, and a two-phase approach (text <=> AST <=> data). Also tagged partial bidirectional isomorphisms and monadic profunctors. https://kowainik.github.io/posts/2019-01-14-tomland
TODO: seems like an entirely different approach. Allows for monadic parsers rather than merely applicative ones. https://www.microsoft.com/en-us/research/publication/functional-pearl-pickler-combinators/ https://blog.poisson.chat/posts/2016-10-12-bidirectional-serialization.html https://blog.poisson.chat/posts/2017-01-01-monadic-profunctors.html
TODO: looks very similar to the paper, but more robust and "production ready", with JSON and XML libraries.
TODO: looks like a big improvement over the paper, but requires arrows.