@jeffreykegler
Last active August 29, 2015

floss.md
These are my notes on Andrew's interview for FLOSS Weekly, #321, which aired 14 January, 2015. Readers may also want to look at Andrew's prepared talk on Marpa.

3:19 Randal talks about ambiguous grammars, which in fact Marpa can handle. But most new users are going to care about grammars that are unambiguous or nearly so. Because of its power, Marpa can handle not just any grammar you're likely to want to parse, but also many grammars you probably never felt a need to get into. Consider it like the ability of a good sports car to run smoothly at 120mph. Not something that's advisable to do on the US highways, but a sign of the kind of power and handling that can be very nice to have.

4:40 Randal is reading from my web site on Marpa. Taking this to the bottom line -- the problem with parser generators in the past is that they'd fail or go non-linear on most useful grammars. Marpa parses in linear time everything they parse in linear time, and a good deal more, so that most users don't have to worry about getting their grammar to work anymore -- it "just parses". If you've used other parser generators, you'll know that that is new.

6:15 "unmarked middle recursion" -- I went back to my web site and explained this a bit more. Bottom line: YAGNI -- "You ain't gonna need it."

6:30 Now comes the practical focus. (This is why I asked Andrew to do this interview.)

10:15 About as good an answer as you can give on a show without visuals. To see what's being talked about, Marpa::R2's top-level page gives a small example of a Marpa parser written in the SLIF (aka the "Scanless interface"). This is the interface that Andrew describes at this point.
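To give a flavor of it here, a SLIF grammar is a string of extended BNF handed to Marpa::R2. The fragment below is my own minimal sketch, not the synopsis example -- the rule and lexeme names are illustrative:

```
:default ::= action => [name,values]
lexeme default = latm => 1

Expression ::= Number
             | Expression '+' Expression

Number     ~ [\d]+
:discard   ~ whitespace
whitespace ~ [\s]+
```

The `::=` rules are the context-free grammar proper; the `~` rules define the lexemes; and `:discard` tells the lexer what to skip between tokens.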

10:41 "Yes" -- perfect answer.

15:00 A good explanation from Andrew at this point.

15:50 At this point, Andrew confuses Ron Savage with Jean-Damien Durand, both major contributors to Marpa. Ron was a very early Marpa user, who runs Marpa's main web site, and who has written quite a number of Marpa parsers -- but not the ones Andrew lists. Those parsers, the very sophisticated Marpa-powered C, SQL and ECMAscript parsers, were all written by Jean-Damien Durand.

15:25 Here Andrew seriously understates his role in Marpa. Marpa's main interface, the SLIF, uses a two-layer approach to parsing which Andrew was the first to implement. Andrew's pioneering work had a major effect on the shape of the SLIF -- Andrew has done a lot more than "fix a few semi-colons".

18:00 "Ruby Slippers": Andrew clearly needs to watch "The Wizard of Oz" again. It's called the Ruby Slippers because Marpa's "left-eideticism" -- its perfect awareness of the state of the parse -- allows Marpa, when it rejects a token, to tell the lexer what token it was actually looking for. This means the lexer can, on the fly, make the parser's "wishes come true", just as Dorothy's Ruby Slippers could do. This trick is amazingly useful. I've described the Ruby Slippers in a few places. Here's [a recent, up-to-date blog post](http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2014/11/delimiter.html), [another post that is a little out of date](http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2011/11/marpa-and-the-ruby-slippers.html), and [my first post on the topic](http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2010/06/parsing-with-ruby-slippers.html), which is very out of date, but perhaps my fullest description.
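In outline, the Ruby Slippers technique looks something like the loop below. This is a sketch against the Marpa::R2 Scanless interface, not runnable as-is: the grammar, the input, the `closing_tag` lexeme, and the virtual-token value are all placeholders, and in practice a pause event on the troublesome lexemes is what returns control to this loop.

```perl
use Marpa::R2;

# Assumed built elsewhere: a Marpa::R2::Scanless::G in $grammar,
# with pause events on the lexemes we may need to rescue.
my $recce = Marpa::R2::Scanless::R->new( { grammar => $grammar } );

my $pos = $recce->read( \$input );
while ( $pos < length $input ) {

    # The read paused: ask the recognizer what it was looking for.
    my $expected = $recce->terminals_expected();

    if ( grep { $_ eq 'closing_tag' } @{$expected} ) {
        # Grant the parser's wish: feed it a zero-length virtual
        # token in place of the input it could not accept.
        $recce->lexeme_read( 'closing_tag', $pos, 0, '</end>' );
    }
    else {
        die "Parse stalled at position $pos";
    }
    $pos = $recce->resume();
}
```

The point is that `terminals_expected()` is cheap and always available -- left-eideticism means the recognizer can answer "what would you accept here?" at any moment.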

22:10 Again, the [Marpa-powered C compiler front-end](https://metacpan.org/release/MarpaX-Languages-C-AST) that Andrew is describing is actually by Jean-Damien Durand.

23:00 Here Andrew overstates Marpa's current capabilities. In the works is "strand parsing", which will allow Marpa to parse in constant space. Currently, however, Marpa is DOM-ish -- it parses by reading everything into memory, unless you resort to very special tricks, such as your own custom scheme for "piece-wise" parsing. So, if your application involves reading all of Wikipedia, with full history, the current version of Marpa is probably not for you.

23:35 Here Gareth plays devil's advocate and asks "What is Marpa not good at?" Here's a fuller answer than Andrew gives:

  • Marpa, in the absence of your own special adaptation of it, will read the entire parse into memory. When we introduce "strand parsing", things will change, but for the time being, if your application is parsing very large files, you may need to look elsewhere.

  • If your application is a good fit for regular expressions, Marpa will handle it easily, but considerably more slowly than a dedicated regex engine. This situation changes if, as is often the case, you're extending your regex engine beyond its capabilities. When it comes to large grammars, complex delimiter matching, recursion, etc., etc., Marpa will usually be easier and faster.

25:07 Re the comparison with yacc/bison, I would add that it's quite likely your grammar will be one of those yacc can't parse and that bison can't parse without switching over to GLR, which is exponential-time and creates issues for the lexer. Worse, both yacc and bison are notoriously bad about informing the user of the troubles they're leading him into, which is why even experts have pretty much abandoned their use at this point.

27:00 "What is the biggest application, by volume, on Marpa to date?" I don't really know. It might be the Tivoli log analyzer [that's in use at IBM](https://www.ibm.com/developerworks/community/blogs/jalvord/entry/sitworld_itm_situation_audit?lang=en).

30:00 Andrew and Randal do this well. Those interested in my blog posts on various approaches to parsing, their tradeoffs, history, etc., may want to start with this post, which was wildly popular, or with this more recent one.

31:00 Joop Leo's first name is pronounced to rhyme with "hope" (he's Dutch). In talking about my blog posts at this point, Randal may be referring to this blog post while Andrew alludes to this one. Other posts to look at if you're interested in the Marpa algorithm and its antecedents are here and here.

36:31 The precedence parsing they are discussing is described [in my "Making DSL's even simpler" blog post](http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2013/01/dsl_simpler2.html). It's "general precedence parsing", not operator precedence -- there's no need to classify or even identify operators, you simply list rules by precedence. General precedence is new with Marpa, and it's made possible by Marpa's ability to parse a broader class of grammars.
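Concretely, a precedenced SLIF rule reads like this -- an illustrative fragment in the style of the Marpa::R2 calculator examples, where `||` separates precedence tiers, tightest-binding first:

```
Expression ::=
     Number
   | '(' Expression ')' assoc => group
  || Expression '**' Expression assoc => right
  || Expression '*' Expression
   | Expression '/' Expression
  || Expression '+' Expression
   | Expression '-' Expression
```

Note that nothing here declares which symbols are "operators" -- precedence falls out of the ordering of the tiers alone.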

40:00 Perl 6's parser, like other top-down parsers, requires you not just to specify the grammar, but also to tell the parser how to parse it. This means you can't necessarily combine its grammars in a clean way. Marpa, on the other hand, works purely from the BNF, with no additional guidance required, so that you get exactly the language specified by the BNF. This means that you can cut and paste pieces of BNF between Marpa grammars, and you will get a parser that describes exactly the newly combined language, nothing more, nothing less. This may not follow functional programming terminology very closely, but it delivers more on the actual vision of composability and reuse.

41:11 "How does Marpa handle something like C typedefs?" There are a few ways to do it. In [his C parser](https://github.com/jddurand/MarpaX-Languages-C-AST), Jean-Damien uses a scheme based on the one described [in a paper by McKeeman](http://www.cs.dartmouth.edu/~mckeeman/references/JCLT/ResolvingTypedefsInAMultipassCCompiler.pdf). To implement it, Jean-Damien used Marpa's "parse events" -- the code can be found here. The interview does not get around to talking about Marpa's event capability, but this allows custom procedural logic inside a syntax-driven parser, where the procedural logic has access to full information about the state of the parse -- a kind of "best of both worlds".
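As an illustration of the event mechanism (the symbol and event names below are invented, not taken from Jean-Damien's code), a SLIF grammar can declare a named event on the completion of a rule:

```
# SLIF grammar source: fire a named event whenever a
# typedef declaration is fully recognized
event 'typedef_completed' = completed <typedef declaration>
```

On the Perl side, after each `read()` or `resume()`, `$recce->events()` returns the events that fired, and the application can -- for example -- update its symbol table before parsing continues.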

42:53 "Is there a minimum-sized project for Marpa?" I'd say the minimum size and complexity at which your regex solution becomes awkward. As said already, if your problem really is a regular expression, yes, Marpa will handle it, but you probably want to use a regex engine instead.

44:00 Pure JSON is on the small side for a Marpa application, but if you're thinking in terms of customizing or extending JSON, then Marpa is the best solution. JSON parsers are also very popular as test examples and experiments in the Marpa community, and Ron Savage has [collected 3 of them in a CPAN module] (https://metacpan.org/release/MarpaX-Demo-JSONParser).

45:00 At this point Randal and Andrew get confused about Marpa's license. They eventually come up with the right answer, but I'll ruin the suspense for you. You can embed Marpa into your proprietary application without problem. Marpa is LGPL, not GPL. I encourage you to make what you do with Marpa open-source, but I do not require it. I believe that Marpa is embedded in several proprietary applications, and the persons doing this had every right to do that.
