Skip to content

Instantly share code, notes, and snippets.

@chshersh
Last active June 4, 2024 13:27
Show Gist options
  • Save chshersh/27844477752359735dfa41ac184d3bf2 to your computer and use it in GitHub Desktop.
Save chshersh/27844477752359735dfa41ac184d3bf2 to your computer and use it in GitHub Desktop.
Per criterion comparison of Parser Generators and Parser Combinators

Legend:

  • ➖ poor (impossible or very hard to achieve)
  • ➕ good (possible but requires some dancing)
  • ✔️ excellent (very easy to achieve/have)

PC stands for parser combinators and PGparser generators.

Property Generators Combinators Description
Power ✔️ LALR ➕ LL(∞) PC can't handle left-recursive grammars (though, there's some article¹...).
Research/Age ✔️ ✔️ Both approaches have ~50 years of research behind them.
Accessibility PG is a separate framework you need to learn. As well as some formal language theory. PC is just a library in your PL.
Debug PG generates a lot of unreadable code you can't understand and debug most of the time. Though, debugging in Haskell is another story...
Visualization ✔️ ANTLR is capable of visualizing parsed tree on-the-fly as you change input or grammar.
Composability ✔️ You can't easily combine HTML grammar and Haskell grammar to get nice templating engine for example. PC are much more reusable.
Performance I didn't see benchmarks. But nobody complained so far.
Static analysis ✔️ ✔️ PG can analyze your grammar in advance and warn about ambiguity. uu-parsinglib⁷ also has this feature
Flexibility ✔️ PC is just code in your programming language. You can perform any computation there. PG achieve this through attributes and inline code.
Error messages ✔️² In PC you have full control on input stream so you can produce excellent error messages
Error recovery ✔️ ➕³ You may want to report multiple parser errors or do partial code analysis of unparsed code. LALR PG can handle this better than LL.
Indentation ✔️⁴ Usually it's much more difficult to parse layout-sensitive languages with PG.⁶
Pretty printing ➕⁵ Some PG allow you to specify formatting rules via attributes. Some Haskell libraries provide you bidirectional parsing/printing.

References:

Also see discussion here for more details:

Good for:

  • Parser Generators
    • Programming languages
    • Complex domain specific languages (like SQL)
  • Parser Combinators
    • Simple DSLs and markup languages (XML, HTML, Markdown)
    • Serialisation and configuration formats (JSON, YAML, TOML)
    • Creating custom markup languages (like Haskell-specific markdown)
    • Parsing arbitrary non-difficult textual data (logging output, org-mode files)

Existing sources with excellent description:

@poscat0x04
Copy link

Generalized LL parsing (GLL) algorithms can be used to write parser combinators that can handle left recursive grammars.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment