This document explains some aspects of Spoofax that we have received questions about. It also describes some common mistakes and pitfalls. Finally, it describes Spoofax's internals in more depth, with the goal of clarifying some of its behaviour.
- Disambiguation tests. Disambiguation tests are tests of the form `test ... [[1+2+3]] parse to [[(1+2)+3]]`. SPT will parse both fragments and compare the ASTs. A common mistake is to introduce a constructor when parsing parenthesized expressions, e.g. `Exp.Parens = <(<Exp>)>`. When you make this mistake, the AST for the second fragment will contain `Parens` nodes and hence be different from the AST for the first fragment.
  The solution is to not introduce a constructor. This is semantically cleaner anyway, as a `Parens` node does not convey any semantic content (remember: the AST is a tree, not a string, and as such it already encodes what you would otherwise achieve by adding parentheses). Spoofax will show an error "Missing bracket attribute or constructor name", which can be solved by adding the `bracket` attribute. In the end, your production should look like: `Exp = <(<Exp>)> {bracket}`.
- Disambiguation constructs. Many students copy/paste Java's precedence rules into the `context-free priorities` section, but then end up with seemingly inexplicable behaviour. For example, when using `Exp.Call > Exp.NewObj` as a context-free priority, parsing `new Foo().m()` fails. It is important to realise what a context-free priority means: `Exp.Call > Exp.NewObj` means that a parse tree in which `Exp.NewObj` is a direct child of `Exp.Call` is not allowed. In `new Foo().m()`, the receiver `new Foo()` is exactly such a direct child of the call, so the intended parse tree is rejected.
- Template productions. There seems to be some confusion between SDF2 productions and template productions (available since SDF3). An SDF2 production looks like `Exp = Exp "+" Exp`, i.e. literal strings are quoted. A template production looks like `Exp = <<Exp> + <Exp>>`, i.e. the non-terminals are escaped by angle brackets. Template productions are the preferred way of using SDF, as you get a pretty printer for free.
- Whitespace. Every symbol in the context-free productions section is surrounded by `LAYOUT?`. There is an internal definition `LAYOUT = LAYOUT LAYOUT {left}`, allowing you to specify `LAYOUT = [\ \t\n\r]` (note: do not add a repetition to the character class). You might have seen that `common.sdf3` in the initial project contains a context-free restriction section with the rule `LAYOUT? -/- [\ \t\n\r]`. You should read this as "layout may not be followed by space, tab, newline, or carriage return". The effect is that layout is parsed greedily. Without this rule, the parser can parse layout in many different ways, and the number of alternatives blows up on all but the most trivial piece of whitespace. Moreover, the grader will run out of memory and fail to produce a grade.
- Tokenization. By default, template productions tokenize on whitespace and on the characters specified in the tokenization option. This option defaults to `tokenize : "()"`. This should be interpreted as the set consisting of `(` and `)`. Tokenization occurs on edges: whenever the input changes from a tokenization character to a non-tokenization character. This means that, given the default tokenization option, `()` becomes a single token! A solution is to omit `)` from the tokenization option. Most important: check that tokenization is correct in the generated SDF2 in `src-gen/syntax/*`.
- Crashing Eclipse. If you experience crashes, there is probably something wrong with your grammar. Most likely the grammar is ambiguous and the stack overflows during parsing, or the parser runs out of memory. Another pitfall is a production that does not consume any input, such as `Exp = Exp*`, or `ExpList = Param*` together with `Param = Type?`. In both situations, it is best to construct a minimal example that triggers this behaviour and check the related productions.
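
As a rough illustration, the fragments discussed in the list above could be combined into a single SDF3 module along the following lines. This is a sketch under assumptions, not the course's reference grammar: the module name `expressions`, the sort `INT`, and the constructors `Int` and `Add` are hypothetical names chosen for the example, and the option name `tokenize` is assumed to be the spelling SDF3 expects under `template options`. Depending on the Spoofax version, the sorts may also need to be declared explicitly. The layout definition, the follow restriction, and the bracket production are the ones quoted in the text.

```sdf3
module expressions      // hypothetical module name

template options

  // Assumption: only "(" is a tokenization character, so that "()" is
  // split into two tokens instead of being kept as one.
  tokenize : "("

lexical syntax

  INT    = [0-9]+       // hypothetical integer literal
  LAYOUT = [\ \t\n\r]   // one character only: no repetition operator

lexical restrictions

  INT -/- [0-9]         // an integer literal extends as far as possible

context-free restrictions

  // Layout may not be followed by more layout, so it is parsed greedily.
  LAYOUT? -/- [\ \t\n\r]

context-free syntax

  Exp.Int = INT

  // Template production: non-terminals are escaped by angle brackets.
  // The left attribute is an assumption that makes 1+2+3 group as (1+2)+3.
  Exp.Add = <<Exp> + <Exp>> {left}

  // No constructor here: parentheses leave no Parens node in the AST.
  Exp = <(<Exp>)> {bracket}
```

With a grammar of this shape, both fragments of the test `[[1+2+3]] parse to [[(1+2)+3]]` should yield the same nesting of `Add` nodes, because the bracket production contributes no node of its own.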