Legend:
- ➖ poor (impossible or very hard to achieve)
- ➕ good (possible but requires some dancing)
- ✔️ excellent (very easy to achieve/have)
PC
stands for parser combinators and PG
— parser generators.
Property | Generators | Combinators | Description |
---|---|---|---|
Power | ✔️ LALR | ➕ LL(∞) | PC can't handle left-recursive grammars (though, there's some article¹...). |
Research/Age | ✔️ | ✔️ | Both approaches have ~50 years of research behind them. |
Accessibility | ➖ | ➕ | PG is a separate framework you need to learn. As well as some formal language theory. PC is just a library in your PL. |
Debug | ➖ | ➕ | PG generates a lot of unreadable code you can't understand and debug most of the time. Though, debugging in Haskell is another story... |
Visualization | ✔️ | ➖ | ANTLR is capable of visualizing parsed tree on-the-fly as you change input or grammar. |
Composability | ➖ | ✔️ | You can't easily combine HTML grammar and Haskell grammar to get nice templating engine for example. PC are much more reusable. |
Performance | ➕ | ➕ | I didn't see benchmarks. But nobody complained so far. |
Static analysis | ✔️ | ✔️ | PG can analyze your grammar in advance and warn about ambiguity. uu-parsinglib ⁷ also has this feature |
Flexibility | ➕ | ✔️ | PC is just code in your programming language. You can perform any computation there. PG achieve this through attributes and inline code. |
Error messages | ➕ | ✔️² | In PC you have full control on input stream so you can produce excellent error messages |
Error recovery | ✔️ | ➕³ | You may want to report multiple parser errors or do partial code analysis of unparsed code. LALR PG can handle this better than LL. |
Indentation | ➕ | ✔️⁴ | Usually it's much more difficult to parse layout-sensitive languages with PG.⁶ |
Pretty printing | ➕ | ➕⁵ | Some PG allow you to specify formatting rules via attributes. Some Haskell libraries provide you bidirectional parsing/printing. |
References:
- [1]: Parser Combinators for Ambiguous Left-Recursive Grammars
- [2]: megaparsec: How to introduce custom error messages
- [3]: megaparsec: Fun with the recovery feature
- [4]: megaparsec: Indentation-sensitive parsing
- [5]: syntax: Reversible parsing and pretty-printing.
- [6]: GHC and parsing layout it is effectively managed by having the lexer produce a set of «indent, dedent and virtual semicolon» tokens before the parser generator sees the token stream.
- [7]: uu-parsinglib: Fast, online, error-correcting, monadic, applicative, merging, permuting, interleaving, idiomatic parser combinators.
Also see discussion here for more details:
Good for:
- Parser Generators
- Programming languages
- Complex domain specific languages (like SQL)
- Parser Combinators
- Simple DSLs and markup languages (XML, HTML, Markdown)
- Serialisation and configuration formats (JSON, YAML, TOML)
- Creating custom markup languages (like Haskell-specific markdown)
- Parsing arbitrary non-difficult textual data (logging output, org-mode files)
Existing sources with excellent description:
Generalized LL parsing (GLL) algorithms can be used to write parser combinators that can handle left recursive grammars.