luqui/bidi.mkd

## bidi.mkd

      
    Raw
  

              bidi.mkd
            
          
    Our task is to characterize a relationship between an abstract syntax tree and concrete syntax, much like the development in Invertible Syntax Descriptions by Tillmann and Ostermann.  We have an additional constraint, however.  We aim to make the mapping interactively editable -- that is, convert editing commands on the text to transformations on the abstract syntax.
For example, consider a simple arithmetic grammar with precedence.  Given this abstract syntax tree (in S-expression notation):
(plus 3 4)

which is pretty-printed this way:
3 + 4

and the cursor at the end of the input, if the user presses +, we would like to convert the abstract syntax to one of these:
(plus 3 (plus 4 [])
(plus (plus 3 4) [])
(plus 3 4 [])

corresponding to different choices of how to treat associativity.  The [] stands for a hole, a position in the tree whose value needs to be filled in.  However, if the user presses *, we would instead like to convert the abstract syntax to
(plus 3 (times 4 []))

because * has a tighter precedence than +.
Further, we would like to do something reasonable when inserting and deleting from the middle of the tree. For example, using the same abstract syntax, suppose the cursor in the concrete syntax is given by |.  Then consider this state:
3 | + 4

If the user types * here, we would like to expand the abstract syntax to
(plus (times (3 [])) 4)

And in this state
3 + | 4

If the user types -, we would like to transform the abstract syntax to
(plus 3 (negative 4))

So we may have to modify either side of the cursor depending on the input.
(Even more confusingly, if the user continues to type 5 + then the resulting abstract syntax should be
(plus 3 (negative 5) 4)

a case that I will ignore for now, because of the complexity of its interaction with the surrounding precedence.  Perhaps a clean solution will appear once we get the easy ones figured out.)
This development has one major constraint: if you type a valid string of the language from start to end, the generated tree should be an exact parse of that string.  This also goes at smaller scales; if the cursor is "between nodes with low enough precedence" (to be defined precisely later), then typing a complete string of high enough precedence should insert a correct parse of that string at the cursor.  I hope this constraint will be enough to guide most of our semantics.
To begin speaking formally, we have to define how to map a cursor position to some construction in abstract syntax space.  I will call this construct a Cursor.  To define it, we need some other things first.
A Flat Context is one level of an expression with a single hole in it.  For example,
(plus 3 [] 5)

is a flat context.  It must have exactly one hole.  This is a different kind of hole from the ones above meaning "yet un-written code".  Instead it usually means that we are trying to describe something that goes in the hole and using it as our context.
A Context is a chain of flat contexts, for example:
(times 2 []) (plus 3 [] 5)

which, when "folded together", represent an entire expression with a single hole:
(plus 3 (times 2 []) 5)

Note how the innermost expression comes first in the list.
Finally, a Slice is an expression with its arguments separated into two sets, a left and a right set.  For example:
(plus 3 4 | 5)

is a slice with (3 4) in the left set and (5) in the right set.  A Proper Slice is a slice where neither set is empty (i.e. we are not at the beginning or the end).
Now, finally, a Cursor is a context paired with a proper slice.  It defines a location in the abstract syntax which is inside a context above it and between two symbols below it.  If the leaves of the tree are serialized in order, then any position between two leaves has a unique cursor associated with it (except for the very beginning and end, which need special consideration).
Now our "parser" is a function f :: Token -> Cursor -> Maybe Cursor, which modifies a tree at a given point given a token.  There are some configurations where no token could possibly produce a valid tree, so we add Maybe to the return type.  For a given language, we want to define a syntax description which has a pretty printer and one of these parsers.  The pretty printer is pretty easy, so let's focus on the parser.