martijndwars/notes.md

## notes.md

      
This document explains some aspects of Spoofax that we got questions about. It also describes some common mistakes and pitfalls. Finally, it describes Spoofax's internals in more depth, with the goal of clarifying some of its behaviour.

Disambiguation tests. Disambiguation tests are tests of the form `test ... [[1+2+3]] parse to [[(1+2)+3]]`. SPT parses both fragments and compares the resulting ASTs. A common mistake is to introduce a constructor for parenthesized expressions, e.g. `Exp.Parens = <(<Exp>)>`. With that production, the AST for the second fragment contains `Parens` nodes and therefore differs from the AST for the first fragment, so the test fails.

The solution is to not introduce a constructor. This is semantically cleaner anyway, as a `Parens` node does not convey any semantic content (remember: the AST is a tree, not a string, so its shape already encodes the grouping you would otherwise express with parentheses). Without a constructor, Spoofax shows the error "Missing bracket attribute or constructor name", which is solved by adding the `bracket` attribute. In the end, your production should look like: `Exp = <(<Exp>)> {bracket}`.
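
As a minimal sketch of where the bracket production sits in a grammar, consider the fragment below. The module name, the `INT` sort and the `Int` and `Add` constructors are assumptions made for illustration; only the bracket production itself comes from the text above.

```sdf3
module expressions // hypothetical module name

lexical syntax

  INT    = [0-9]+
  LAYOUT = [\ \t\n\r]

context-free syntax

  // Illustrative constructors, not taken from any particular assignment.
  Exp.Int = INT
  Exp.Add = <<Exp> + <Exp>> {left}

  // Parentheses only group; with {bracket} no AST node is built for them.
  Exp = <(<Exp>)> {bracket}
```

With a grammar along these lines, both `1+2+3` and `(1+2)+3` should yield the same `Add(Add(...), ...)` term, which is what the disambiguation test compares.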


Disambiguation constructs. Many students copy/paste Java's precedence rules into the context-free priorities section and then end up with behaviour they cannot explain. For example, with `Exp.Call > Exp.NewObj` as a context-free priority, parsing `new Foo().m()` fails. It is important to realise what a context-free priority means: `Exp.Call > Exp.NewObj` means that a parse tree in which `Exp.NewObj` is a direct child of `Exp.Call` is not allowed. In `new Foo().m()`, the receiver `new Foo()` is exactly such a direct child of the call, so that parse tree is rejected.
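
The situation can be sketched with a small grammar. The shapes of `Exp.Call` and `Exp.NewObj`, the `ID` sort and the module name are assumptions; your own productions will look different, but the priority behaves the same way.

```sdf3
module calls // hypothetical module name

lexical syntax

  ID     = [a-zA-Z] [a-zA-Z0-9]*
  ID     = "new" {reject}
  LAYOUT = [\ \t\n\r]

context-free syntax

  // Assumed shapes for the two productions named in the text.
  Exp.NewObj = <new <ID>()>
  Exp.Call   = <<Exp>.<ID>()>

context-free priorities

  // Forbids Exp.NewObj as a direct child of Exp.Call.
  // In new Foo().m() the receiver is exactly that child, so parsing fails.
  Exp.Call > Exp.NewObj
```

In this sketch `Exp.NewObj` does not start or end with an `Exp`, so the two productions are never ambiguous with each other and the priority is not needed in the first place.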


Template productions. There seems to be some confusion between SDF2-style productions and template productions (available since SDF3). An SDF2-style production looks like `Exp = Exp "+" Exp`, i.e. literal strings are quoted. A template production looks like `Exp = <<Exp> + <Exp>>`, i.e. everything is literal text except the non-terminals, which are escaped by angle brackets. Template productions are the preferred way of using SDF3, as you get a pretty printer for free.
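
The two notations can be put side by side as follows. The constructor names `Add` and `Sub`, the `INT` sort and the module name are assumptions chosen only so that both productions can live in one module.

```sdf3
module notation // hypothetical module name

lexical syntax

  INT    = [0-9]+
  LAYOUT = [\ \t\n\r]

context-free syntax

  Exp.Int = INT

  // Quoted-literal form: literals are quoted, everything else is a symbol.
  Exp.Add = Exp "+" Exp {left}

  // Template form: everything is literal text except symbols inside <...>.
  Exp.Sub = <<Exp> - <Exp>> {left}
```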


Whitespace. Every symbol in the context-free productions section is implicitly surrounded by `LAYOUT?`. There is an internal definition `LAYOUT = LAYOUT LAYOUT {left}`, which lets you specify just `LAYOUT = [\ \t\n\r]` (note: do not add a repetition to the character class). You might have seen that `common.sdf3` in the initial project contains a context-free restrictions section with the rule `LAYOUT? -/- [\ \t\n\r]`. Read this as "layout may not be followed by a space, tab, newline or carriage return". The effect is that layout is parsed greedily. Without this rule, the parser can split a run of whitespace into layout in many different ways, and the number of parses blows up on all but the most trivial pieces of whitespace. In practice, the grader will then run out of memory and fail to produce a grade.
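
Put together, the two layout rules described above look roughly like this. The module name is an assumption; the rules themselves are the ones quoted in the text.

```sdf3
module layout // hypothetical module name

lexical syntax

  // One character per LAYOUT; repetition comes from the internal
  // LAYOUT = LAYOUT LAYOUT {left} rule, so no * or + here.
  LAYOUT = [\ \t\n\r]

context-free restrictions

  // Layout may not be followed by another layout character,
  // which forces the parser to take the longest run of whitespace.
  LAYOUT? -/- [\ \t\n\r]
```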


Tokenization. By default, template productions tokenize on whitespace and on the characters specified in the `tokenize` option. This option defaults to `tokenize : "()"`, which should be interpreted as the set consisting of `(` and `)`. Tokenization occurs on edges: whenever the input changes from a tokenization character to a non-tokenization character. This means that, given the default option, `()` becomes a single token! A solution is to omit `)` from the option. Most important: check that tokenization is correct in the generated SDF2 in `src-gen/syntax/*`.
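
A sketch of the suggested fix is shown below, assuming the option is set in a `template options` section. The module name, the `ID` sort and the `Exp.Call` production are made up for illustration.

```sdf3
module tokens // hypothetical module name

template options

  // Only "(" is listed, so ")" is no longer merged with "(" into "()".
  tokenize : "("

lexical syntax

  ID     = [a-zA-Z] [a-zA-Z0-9]*
  LAYOUT = [\ \t\n\r]

context-free syntax

  // Illustrative production containing the "()" sequence.
  Exp.Call = <<ID>()>
```

Whether this has the intended effect is best confirmed by inspecting the generated SDF2 rather than by trusting the template.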


Crashing Eclipse. If you experience crashes, there is probably something wrong with your grammar. Most likely the grammar is ambiguous and the parser overflows the stack or runs out of memory. Another pitfall is a production that can match without consuming any input, such as `Exp = Exp*`, or `ExpList = Param*` combined with `Param = Type?`. In both situations, it is best to construct a minimal example that triggers the behaviour and check the related productions.
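
One way to avoid the second pitfall is to make sure only the list can be empty, not its elements. The following is a sketch under that assumption; the constructor names, the `Type.IntT` production and the module name are hypothetical.

```sdf3
module params // hypothetical module name

lexical syntax

  ID     = [a-zA-Z] [a-zA-Z0-9]*
  LAYOUT = [\ \t\n\r]

context-free syntax

  // Problematic (kept as a comment): Param can match the empty string,
  // so Param* can match nothing in unboundedly many ways.
  //   ExpList.Params = Param*
  //   Param.Param    = Type?

  // Alternative: every Param consumes input; only the list may be empty.
  ExpList.Params = Param*
  Param.Param    = <<Type> <ID>>
  Type.IntT      = <int>
```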