chshersh/parser-combinators-vs-parser-generators.md

## parser-combinators-vs-parser-generators.md

      
    Raw
  

              parser-combinators-vs-parser-generators.md
            
          
    Legend:

➖ poor (impossible or very hard to achieve)
➕  good (possible but requires some dancing)
✔️ excellent (very easy to achieve/have)

PC stands for parser combinators and PG — parser generators.


Property
Generators
Combinators
Description


Power
✔️ LALR
➕ LL(∞)
PC can't handle left-recursive grammars (though, there's some article¹...).


Research/Age
✔️
✔️
Both approaches have ~50 years of research behind them.


Accessibility
➖
➕
PG is a separate framework you need to learn. As well as some formal language theory. PC is just a library in your PL.


Debug
➖
➕
PG generates a lot of unreadable code you can't understand and debug most of the time. Though, debugging in Haskell is another story...


Visualization
✔️
➖
ANTLR is capable of visualizing parsed tree on-the-fly as you change input or grammar.


Composability
➖
✔️
You can't easily combine HTML grammar and Haskell grammar to get nice templating engine for example. PC are much more reusable.


Performance
➕
➕
I didn't see benchmarks. But nobody complained so far.


Static analysis
✔️
✔️
PG can analyze your grammar in advance and warn about ambiguity. uu-parsinglib⁷ also has this feature


Flexibility
➕
✔️
PC is just code in your programming language. You can perform any computation there. PG achieve this through attributes and inline code.


Error messages
➕
✔️²
In PC you have full control on input stream so you can produce excellent error messages


Error recovery
✔️
➕³
You may want to report multiple parser errors or do partial code analysis of unparsed code. LALR PG can handle this better than LL.


Indentation
➕
✔️⁴
Usually it's much more difficult to parse layout-sensitive languages with PG.⁶


Pretty printing
➕
➕⁵
Some PG allow you to specify formatting rules via attributes. Some Haskell libraries provide you bidirectional parsing/printing.


References:

[1]: Parser Combinators for Ambiguous Left-Recursive Grammars
[2]: megaparsec: How to introduce custom error messages
[3]: megaparsec: Fun with the recovery feature
[4]: megaparsec: Indentation-sensitive parsing
[5]: syntax: Reversible parsing and pretty-printing.
[6]: GHC and parsing layout it is effectively managed by having the lexer produce a set of «indent, dedent and virtual semicolon» tokens before the parser generator sees the token stream.
[7]: uu-parsinglib: Fast, online, error-correcting, monadic, applicative, merging, permuting, interleaving, idiomatic parser combinators.

Also see discussion here for more details:

Markdown table with parser generators and parser combinators comparison

Good for:

Parser Generators

Programming languages
Complex domain specific languages (like SQL)


Parser Combinators

Simple DSLs and markup languages (XML, HTML, Markdown)
Serialisation and configuration formats (JSON, YAML, TOML)
Creating custom markup languages (like Haskell-specific markdown)
Parsing arbitrary non-difficult textual data (logging output, org-mode files)


Existing sources with excellent description:

SE: When to use a Parser Combinator? When to use a Parser Generator?
SO: Parsec vs Yacc/Bison/Antlr: Why and when to use Parsec?
SO: What is the advantage of using a parser generator like happy as opposed to using parser combinators?
Property	Generators	Combinators	Description
Power	✔️ LALR	➕ LL(∞)	PC can't handle left-recursive grammars (though, there's some article¹...).
Research/Age	✔️	✔️	Both approaches have ~50 years of research behind them.
Accessibility	➖	➕	PG is a separate framework you need to learn. As well as some formal language theory. PC is just a library in your PL.
Debug	➖	➕	PG generates a lot of unreadable code you can't understand and debug most of the time. Though, debugging in Haskell is another story...
Visualization	✔️	➖	ANTLR is capable of visualizing parsed tree on-the-fly as you change input or grammar.
Composability	➖	✔️	You can't easily combine HTML grammar and Haskell grammar to get nice templating engine for example. PC are much more reusable.
Performance	➕	➕	I didn't see benchmarks. But nobody complained so far.
Static analysis	✔️	✔️	PG can analyze your grammar in advance and warn about ambiguity. `uu-parsinglib`⁷ also has this feature
Flexibility	➕	✔️	PC is just code in your programming language. You can perform any computation there. PG achieve this through attributes and inline code.
Error messages	➕	✔️²	In PC you have full control on input stream so you can produce excellent error messages
Error recovery	✔️	➕³	You may want to report multiple parser errors or do partial code analysis of unparsed code. LALR PG can handle this better than LL.
Indentation	➕	✔️⁴	Usually it's much more difficult to parse layout-sensitive languages with PG.⁶
Pretty printing	➕	➕⁵	Some PG allow you to specify formatting rules via attributes. Some Haskell libraries provide you bidirectional parsing/printing.