cgswords/quasi.md

## quasi.md

      
    Raw
  

              quasi.md
            
          
    Quasiquotation for Rust Macros

Following Scheme, we'll provide a handful of operators to programmers to help them
easily construct new syntactic forms (ie, output tokenstreams). These forms include:

quote!, for quickly creating new syntactic forms/tokenstreams
unquote, which escapes quote (following , from Scheme's quasiquote form), allowing
users to perform general computation inside of a quote! invocation.
unquote-splice, which escapes quasi-quoting and then list-splices the result into the
quasi-syntax term.

Syntax

The basic syntax operator allows users to easily create new syntactic forms. For example,
consider a macro that tests its input and produces an expression based on its value:
#[macro]
fn derive_typical_maybe_copy(input: TokenStream, env: SyntaxEnv) -> TokenStream {
  let maybe = input.expect_bool()?;
  if maybe {
    quote!(env,#[derive(PartialEq, Eq, Hash, Clone, Copy)])
  } else {
    quote!(env,#[derive(PartialEq, Eq, Hash, Clone, Copy)])
  }
}

Then we can use this to conditionally set definitions for configuration:
derive_typical_maybe_copy!(true)
struct Foo { ... }

This will produce:
#[derive(PartialEq, Eq, Hash, Clone, Copy)]
struct Foo { ... }

In general, quote! has the form:
quote!(env, quotable) => tokenstream
During invocation, the quotable term is (1) converted into a tokenstream, (2) marked with the
syntactic (hygiene) information from the macro declaration site (also called recoloring, for short),
and (3) returned. Step two is important; we want to ensure that we don't accidentally change the
macro outcome because of bindings at the expansion site.
For example, consider the following Scheme snippet:
(define y (lambda (x) x))

(define-syntax (foo x) #'y)

(let ((y (lambda (x) (+ x 1)))) ((foo 'bar) 10)) ;; => evaluates to 10

Here, the y produced by expanding foo yields the identity function; that's the correct,
hygienic binding for it. We'd like to preserve this for Rust:
fn y (x : Int) -> Int { x }

#[macro]
fn foo(input: TokenStream, env: SyntaxEnv) -> TokenStream {
  quote!(env, y)
}

fn main() {
  let y = |x| x + 1;
  (foo!())(10);
}

Implementation Comment

If quotable does not contain unquotes, we can simply convert the expression into a
tokenstream and perform mark it with its appropriate hygiene information.
Unquote

Unquote allow programmers to easily construct new tokenstreams out of existing
tokenstreams and newly-constructed tokenstreams, allowing programmers to do
stream analysis and computation in the middle of constructing the output
syntax.
We begin by revisiting our previous example, utilizing unquote as $ to reduce
the number of quote! invocations:
#[macro]
fn derive_typical_maybe_copy(input: TokenStream, env: SyntaxEnv) -> TokenStream {
  let maybe = input.expect_bool()?;
  quote!(env,#[derive(PartialEq,Eq,Hash,Clone
               $(if maybe { quote!(env, ,Copy) } else { empty_stream })
              ]
        )
}

While this example isn't particularly clearer, it gets at the heart of the $ operator:
we can provide and expression that evaluates to a TokenStream inside of $() and,
during expansion, this expression will be executed and its result will be placed in the
output TokenStream. (We also introduce an additional form here, empty_stream, which provides
an empty TokenStream. This is useful for situations such as this, where we'd like to produce
empty output in some cases.)
Cond: a larger quote/unquote example

To move onto a larger example, we begin with a partial implementation of the Scheme
operator cond, a compact conditional form that works as:
(define fib
  (lambda (n)
    (cond
      ((= n 0) 1)
      ((= n 1) 1)
      (else (+ (fib (- n 1)) (fib (- n 2)))))))

To recreate this in Scheme, we can implement it as follows (using #` as quote! and
#, as $):
;; NB There are additional forms, mentioned below, that could simplify
;; this implementation.
#[macro]
(define-syntax (cond x)
  (let ((rhs (cdr (syntax->datum x))))                              ;; (1)
    (if (null? rhs)                                                 ;; (2)
        #`(void)                                                    ;; (3)
        (let ((test (caar rhs))                                     ;; (4)
              (res  (cadar rhs))
              (rest (cdr rhs)))
          #`(if #,(if (eq? 'else test) #`#t (datum->syntax x test)) ;; (4.i)
                  #,(datum->syntax x res)                           ;; (4.ii)
                  #,(if (null? rest)                             
                        #`(void)                                    ;; (4.iii)
                        (datum->syntax x (cons (datum->syntax x 'cond) rest))))))))

The macro algorithm proceeds as follows:

We pull the input syntactic object x into a data object, then destruct it to get at
the right-hand side of the cond (eg the clauses).
We check if there are any clauses left.
If not, we return void.
Otherwise, we pull out the first test, the first res, and the rest of the clauses.
Then we construct an if statement, where the test and consequent branch reflect the
test and res and the alternative branch will handle any remaining clauses as
follows:
check if the rest is else, and, if so, emit #t (true in Scheme) as the test;
otherwise, emit the test.
emit the result of the test as the true branch of the if
if there are no additional clauses, emit void
if there are, construct an invocation of cond on the rest to recur down them.

To summarize, at each step, we pull out the test and result of the next test clause, then
construct a series of nested if expressions that perform the computation (substituting
#t for else).
(In the above example, datum->syntax and syntax->datum are tools for converting from
syntactic forms into list-like structures and back; we elide them in our rust examples
because there the current proposal puns between syntax and datum through TokenStreams,
as discussed below.)
To recreate this sort of behavior in the Rust procedural macro system, we'll
use quote!, the quote operator, and $, the unquote operator, following a similar
algorithm:
To recreate this sort of behavior in the Rust procedural macro system, we'll
use quote!, the quote operator, and $, the unquote operator, following a similar
algorithm:

Check if cond has no arguments; if so, emit {}
Check if there is a pair of a test and rhs as a prefix of the tokenstream
If there is, construct an if statement:
1. If there is, then check if the test is else.

If so, emit true
If not, emit the test.
2. Emit the rhs as the true branch of the if test
3. If the remaining input is empty, emit an empty token stream (to form if test   {rhs})
4. If it is not empty, construct a new invocation of cond! as syntax and add it to
the tokenstream.


If there is not, panic with an error

Implementing this algorithm using the current proposal proceeds as:
#[macro]
fn cond!(input: TokenStream, env: SyntaxEnv) -> TokenStream {
  if input.is_empty() {                                                 // (1)
    quote!(env, {})
  } else if let Some(((test, rhs), rest)) = input.maybe_pair_prefix() { // (2)
    quote!(env, if $(if test.free_id_eq(env.as_ident("else")) {         // (2.i), (2.i.a)
                       quote!(env, true)                                // (2.i.a.a)
                     } else {
                       env.to_syntax(test, input)}) {                   // (2.i.a.b)
                   $( env.to_syntax(rhs, input) )                       // (2.i.b)
                } $(if rest.is_empty() {                                // (2.i.c)
                      empty_stream
                    } else {                                            // (2.i.d)
                      quote!(env, else { cond!($(rest)) } )
                    }))
  } else {                                                              // (2.ii)
    panic("Invalid conditional form: {:?}", input);
  }
}

This macro, when used, would work as:
fn fib(n: Int) -> Int {
  cond!(
    (n == 0, 1)
    (n == 1, 1)
    (else, fib(n-1) + fib(n-2))
  )
}

This would produce the program:
fn fib(n: Int) -> Int {
  if n == 0 {
      1
  } else {
      if n == 1 {
          1
      } else {
          if true {
              fib(n-1) + fib(n-2)
          }
      }
  }
}

While the result is not idiomatic rust, it (a) achieves the goal and (b) could be
transformed into idiomatic rust via a more complex cond! implementation.
(Incidentally, the implementation has a secondary advantage: if there is no else clause,
the control flow analysis will report that not all paths can produce a value, guiding the
programmer to detect bugs by reusing existing compiler infrastructure.)
At this point, we have used a number of additional forms and values in defining cond!. Most notably:

free_id_eq, which checks if two things are identifiers that refer to the same object a
la free-identifer=? from Scheme.) We use it to ask if the test-position token is
actually the keyword else.
as_ident : String -> TokenTree (?), which converts its input into an uncolored
'ident' token (maybe tokenstream?) (We define another form, below, called
as_colored_ident, which also does recoloring)
to_syntax : TokenStream -> &TokenStream -> TokenStream, which takes two tokenstreams
(or token slices / token trees, or better any of the three types) and 'colors' the first
input using the second input's coloring before returning the first one. We use this
twice, recoloring both the test and the rhs to ensure that they will use the bindings from
the macro invocation site (which has input's colors) instead of the macro declaration site.)
as_fn_call_syntax : String -> TokenStream -> &TokenStream, which takes an ident,
arguments, and a coloring object, and constructs a call as ident(args), colored with
the coloring object, as syntax.

These would need to be provided to the programmer as 'quality of life' tools.
Even with all of this, however, we have not explored the really thorny bits of this
design:
The TokenStream Pun

At this point in the design, there's a pretty brutal pun between TokenStreams as syntax
(colored) objects and TokenStreams as non-syntactic (uncolored) objects; for example,
test and rhs are uncolored, but input itself is colored. These two variants of
tokenstreams correspond to the structures produced by syntax->datum and datum->syntax
in the Scheme implementation above.
The Scheme salve for this problem is to provide two different things: datum, in the form
of quoted expressions, and syntax, in the form of syntactic information wrapped around
datum. In Rust, this might manifest as TokenStreams, analogous to data, and
SyntaxStreams, analogous to syntax forms. It is unclear if this would be a 'big win';
the types may provide annoying to macro writers.
The Threaded-Environment Situation

There is another 'problem' with the code we've seen so far: quote must manually handle
its environment. This is in stark contrast to the Scheme code above, which carries its
expansion environment as part of the expander, not the macros themselves. Moreover, env is
already in scope, and quoting with a different environment than the current one seems...
fundamentally incorrect.
It may be possible to completely elide this need by modifying the expander to handle this
context itself . This would require quote!(quotable) to be expander-handled form instead
of providing it as as a stand-alone entity, and, in addition, it would require that
TokenStreams (SyntaxStreams?) know how to recover their colorings so that, eg,
to_syntax could work as to_syntax(test, input) without the need for an environment.
While both of these modifications would pull the Rust expander closer to the expanders
presented in Syntactic Abstractions in Scheme / Macros That Work Together, they would
require serious expander refactoring (and possibly modification) to work.
Unquote Splicing

Finally, we get to unquote splicing, written #,@ in Scheme and $@ in Rust. There
are a few different design proposals for this, each attempting to replicate some of the
style of unquote-splicing from Scheme.
First, to demonstrate how it works in Scheme (and why it is useful there), we present
a few usages of this splicing as Scheme code (eliding # for readability)
(1a)    `(a b ,(reverse '(a b c)))   ;; => (a b (c b a))
(1b)    `(a b ,@(reverse '(a b c)))  ;; => (a b c b a)

(2a)    `(+ ,(1 2))                  ;; => (+ (1 2))
(2b)    `(+ ,@(1 2))                 ;; => (+ 1 2)

This structural tool is of particular utility in Scheme because (e1 e2 ...) applies e1 as a procedure to e2.
That is: the parentheses are critically important, and so an operator that allows programmers to more easily
control them has immense utility. For example, the recursive cond case from before may be rewritten using
syntax splicing as:
(old)    (datum->syntax x (cons (datum->syntax x 'cond) rest))))))))
(new)    #`(cond3 #,@(datum->syntax x rest))

The idea here is that, before, we manually constructed a list of the rest and syntactic cond before
converting it inot syntax, and now we can directly construct the list we want. While this operator may
be directly translated into the system as we have it so far, parentheses play a subdued
role in Rust, and, as a result, the ability to trivially escape them seems to lend less utility
than in s-expression languages.
To this end, we present two alternative design proposals for splicing that attempt to find a
'middle ground' utility for providing similar semantic behavior to Scheme's splicing operator
(instead of similar syntactic behavior).
NB. If you have another idea for a useful unsplicing-like structure for Rust, please share it!
Alternative Splicing 1: One-Layer Unnesting

One-layer splicing is the idea that a programmer is promised that, during splicing, the current
braces/parentheses will be 'extruded outward'. For example:
{a; $@({b; c}); d} ~> {a; b; c; d}

Because {} provide serious meaning in Rust (dealing with Lifetimes), this
scope-extrusion utility more closely semantically mirrors the system in Scheme: macro
writers in Scheme use splicing to restructure the program to ensure a specific structure
around their s-expressions, and this unsplicing will lend itself to a similar approach in
Rust.
We may also provide a nicety that un-extrudable things are simply inlined:
{a; $@(b; c); d} ~> {a; b; c; d}

Alternative Splicing 2: Unravel Unnesting

Finally, taking the previous proposal to the extreme, we might imagine unquote splicing
as a large 'flattening' hammer:
{a; $@({b; {c; d}}); e} ~> {a; b; c; d; e}

The advantage here is that programmers are ensured full unwrapping, and don't have to
worry about any semantic-shifting constructions inside of $@(...)---they will definitely
all be removed in the course of expansion. The downside, of course, is that this prevents
a programmer from only extruding a specific layer.
Summary

It's unclear which of these three designs would be most useful. Any of them might provide
serious utility to the programmer, but it is also imaginable that their particular
semantics may make them generally undesirable as a feature. More work should be done to
discern several good examples for how they might be used, and if any of then provide
'big wins' to a macro writer.
With-Syntax

TBD