xuwei-k/principled-meta-programming.md

## principled-meta-programming.md

      
    Raw
  

              principled-meta-programming.md
            
          
    Principled Meta Programming for Scala

This note outlines a principled way to meta-programming in Scala. It tries to
combine the best ideas from LMS and Scala macros in a minimalistic design.


LMS: Types matter. Inputs, outputs and transformations should all be statically typed.


Macros: Quotations are ultimately more easy to deal with than implicit-based type-lifting


LMS: Some of the most interesting and powerful applications of meta-programming
can be expressed as variants of staging.


Macros: Meta-programming is easier to integrate in the tool-chain if it is done at compile time,
in the standard language, using the standard compiler.


The ideas of LMS and macros can be reconciled as is described in the following.
Quotes and Splices

We start with two well-known fundamental operations: quotation
and splicing. Unlike in standard treatments of meta-programming,
splices are permitted outside of quotes, which makes splices and
quotes true duals of each other.
In Scala programs, quotation is expressed as '(...) or '{...}
for expressions (both forms are equivalent) and as '[...] for
types. Splicing is expressed as a prefix ~ operator.
If e is an expression, then '(e) or '{e} represent the
typed abstract syntax tree representing e. If T is a type, then '[T]
represents the type structure representing T.  The precise
definitions of "typed abstract syntax tree" or "type-structure" do not matter
for the purpose of this note, the names are used only to give
some concrete intuitions. Conversely, ~ e evaluates the expression
e, which must yield a typed abstract syntax tree or type structure,
and embeds the result as an expression (respectively, type) in the enclosing
program.
Quotations can have spliced parts in them; in this case the
embedded splices are evaluated and embedded as part of the
formation of the quotation.
Quotes and splices are duals of each other. For arbitary
expressions e and types T we have:
~'(e) = e
'(~e) = e
~'[T] = T
'[~T] = T

Types for Quotations

The type signatures of quotes and splices can be described using
two fundamental types:

Expr[T]: abstract syntax trees representing expressions of type T
Type[T]: type structures representing type T.

Quoting takes expressions of type T to expressions of type Expr[T] and it takes types T
to expressions of type Type[T]. Splicing takes expressions of type Expr[T] to expressions
of type T and it takes expressions of type Type[T] to types T.
The two types can be thought of being defined as follows:
class Expr[T] {
  def unary_~: T   // splice operation
}
class Type[T] {
  type unary_~ = T  // splice type
}

Note that the prefix type ~ on Type requires SIP-33.
The Phase Consistency Law

A fundamental phase consistency law regulates accesses to free variables in quoted and spliced code:

For any free variable reference x, the number of quoted scopes and the number of spliced scopes between the reference to x and the definition of x must be equal.

Here, this-references count as free variables. On the other
hand, we assume that all imports are fully expanded and that _root_ is
not a free variable. So references to global definitions are
allowed everywhere.
The phase consistency law can be motivated as follows: First, suppose the result
of a program P is some quoted text '{ ... x ... } that refers to a free variable x in P
This can be represented only by referring to original the variable x. Hence, the result
of the program will need to persist the program state itself as one of its parts. We don't
want to do this, hence this situation should be made illegal. Dually, suppose a top-level
part of a program is a spliced text ~{ ... x ... } that refers to a free variable x in P.
This would mean that we refer during construction of P to a value that is available only
during execution of P. This is of course impossible and therefore needs to be ruled out.
Now, the (informal, currently unspecified) small-step evaluation of a program will reduce quotes and
splices in equal measure using the cancellation rules above. But it will neither create nor
remove quotes or splices individually. So the phase consistency law ensures that program elaboration
will lead to neither of the two unwanted situations described above.
Function Expansion

There's also an implicit "expand" decorator that turns a tree describing a function
into a function mapping trees to trees.
implicit inline class Expander[T, U](f: Expr[T => U]) extends AnyVal {
  def apply(x: Expr[T]): Expr[U] = ???
}

The definition of apply above is provided as a primitive. It
follows known value definitions until a closure is found.  If no
closure is found for a function f, Expander(f).apply(x) is
simply '((~f)(~x)).
The expansion operation distributes applications of Expr over
function arrows:
Expander(_).apply: Expr[S => T] => (Expr[S] => Expr[T])

The dual of expansion, let's call it reflect, can be defined as follows:
def reflect(f: Expr[T] => Expr[U]): Expr[T => U] = '{
  (x: T) => ~f('(x))
}

Note how the fundamental phase consistency law works in two different directions here for f and x.
The reference to f is legal because it is quoted, then spliced, whereas the reference
to x is legal because it is spliced, then quoted.
In the definition of reflect above, the types T and U are assumed to refer to global types, so the occurrence of T in the body of reflect does not constitute a violation of the phase consistency law. A polymorphic version of reflect could not use T like this, because there would be a quote but no splice between the parameter binding of T and its usage. But we can define the following alternative:
def reflect[T, U](f: Expr[T] => Expr[U])(implicit t: Type[T]): Expr[T => U] = '{
  (x: ~t) => ~f('(x))
}

Example

Assume an Array class with an inline map method that forwards to a macro implementation.
class Array[T] {
  inline def map[U](f: T => U): Array[U] = ~ Macros.mapImpl[T, U]('[U], '(this), '(f))
}

Here's the definition of the mapImpl macro, which takes quoted types and expressions to a quoted expression:
object Macros {

  def mapImpl[T, U](u: Type[U], arr: Expr[Array[T]], op: Expr[T => U])(implicit ctx: Context): Expr[Array[U]] = '{
    var i = 0
    val xs = ~arr
    var len = xs.length
    val ys = new Array[~u]
    while (i < len) {
      // ys(i) = (~op)(xs(i))   (x => x + 1)(xs(i))
      ys(i) = ~op('(xs(i)))
      i += 1
    }
    ys
  }
}

Here's an application of map and how it rewrites to optimized code:
genSeq[Int]().map(x => x + 1)

==> (inline)
val $this: Seq[Int] = genSeq[Int]()
val f: Int => Int = x => x * x
~ _root_.Macros.mapImpl[Int, Int]('[Int], '($this), '(f))

==> (splice)
val $this: Seq[Int] = genSeq[Int]()
val f: Int => Int = x => x * x

{
  var i = 0
  val xs = ~'($this)
  var len = xs.length
  val ys = new Array[~'[Int]]
  while (i < len) {
    ys(i) = ~('(f)('(xs(i))))
    i += 1
  }
  ys
}

==> (expand and splice inside quotes)
val $this: Seq[Int] = genSeq[Int]()
val f: Int => Int = x => x * x

{
  var i = 0
  val xs = $this
  var len = xs.length
  val ys = new Array[Int]
  while (i < len) {
    ys(i) = xs(i) + 1
    i += 1
  }
  ys
}

==> (elim dead code)
val $this: Seq[Int] = genSeq[Int]()

{
  var i = 0
  val xs = $this
  var len = xs.length
  val ys = new Array[Int]
  while (i < len) {
    ys(i) = xs(i) + 1
    i += 1
  }
  ys
}

Relationship with Inline

The proposal needs inline to move splices in macro libraries to the point where a macro is called. The proposal as is does not depend on the other uses of inline, notably inline values and parameters. It can be combined with inline values and parameters by assuming that they are already substituted out before phase consistency is checked.
For instance, here is an implementation of the power function that makes use of a statically known exponent:
inline def power(inline n: Int, x: Double) = ~powerCode(n, '(x))

private def powerCode(n: Int, x: Expr[Double]): Expr[Double] = 
  if (n == 0) '(1.0)
  else if (n == 1) x
  else if (n % 2 == 0) '{ { val y = ~x * ~x; ~powerCode(n / 2, '(y)) } }
  else '{ ~x * ~powerCode(n - 1, x) }

The usage of n as an argument in ~powerCode(n, '(x)) is not phase-consistent, but since n is an inline parameter the code is still valid.
There's some overlap between splice/quote and inlining, because the current definition of inlining in Dotty also does simplifications of conditionals and beta-reduction. Indeed, Array#map and power could also have been implemented as simple inline methods. Their implementation in terms of quoting and splicing is more involved but also more explicit.
A possible design choice would be to remove language support for conditional rewritings and beta-reduction of inlined code and rely exclusively on quoting and splicing.
Relationship with Staging

The framework expresses at the same time compile-time meta-programming and staging. The phase in which code is run is determined by the difference between the number of splice scopes and quote scopes in which it is embedded.


If there are more splices than quotes, the code is run at "compile-time" i.e. as a macro. In the general case, this means running an interpreter that evaluates the code, which is represented as a typed abstract syntax tree. The interpreter can fall back to reflective calls when evaluating an application of a previously compiled method. If the splice excess is more than one, it would mean that a macro's implementation code (as opposed to the code it expands to) invokes other macros. If macros are realized by interpretation, this would lead to towers of interpreters, where the first interpreter would itself interprete an interpreter code that possibly interpretes another interpreter and so on.


If the number of splices equals the number of quotes, the code is compiled and run as usual.


If the number of quotes exceeeds the number of splices, the code is staged. That is, it produces a typed abstract syntax tree or type structure at run-time. A quote excess of more than one corresponds to multi-staged programming.


Going Further

The presented meta-programming framework is so far quite restrictive in that it does not allow for the inspection of quoted expressions and types. It's possible to work around this by providing all necessary information as normal, unquoted inline parameters. But we would gain more flexibility by allowing for the inspection of quoted code with pattern matching. This opens new possibilites. For instance, here is a version of power that generates the multiplications directly if the exponent is statically known and falls back to the dynamic implementation of power otherwise.
inline def power(n: Int, x: Double): Double = ~{
  '(n) match {
    case Constant(n1) => powerCode(n1, '(x))
    case _ => '{ dynamicPower(n, x) }
  }
}

private def dynamicPower(n: Int, x: Double): Double =
  if (n == 0) 1.0
  else if (n % 2 == 0) dynamicPower(n / 2, x * x)
  else x * dynamicPower(n - 1, x)

This assumes a Constant extractor that maps tree nodes representing constants to their values.
Once we allow for inspection of code, the "expansion" operation that maps expressions over function to a functions over expressions can be implemented in user code:
implicit inline class Expander[T, U](f: Expr[T => U]) extends AnyVal {
  def apply(x: Expr[T]): Expr[U] = 
    f match {
      case Lambda(g) => g(x)
      case _ => '((~f)(~x))
    }

This assumes an extractor
object Lambda {
  def unapply[T, U](x: Expr[T => U]): Option[Expr[T] => Expr[U]]
}

Once we allow inspection of code via extractors, it's natural to also add constructors that construct typed trees
directly without going through quotes. Most likely, those constructors would work over Expr types which lack a known type argument. For instance, an Apply constructor could be typed as follows:
def Apply(fn: Expr[_], args: List[Expr[_]]): Expr[_]

This would allow constructing applications from lists of arguments without having to match the arguments one-by-one with the corresponding formal parameter types of the function. We need then "at the end" a method to convert an Expr[_] to an Expr[T] where T is given from the outside. E.g. if code yields a Expr[_], then code.atType[T] yields an Expr[T]. The atType method has to be implemented as a primitive; it would check that the computed type structure of Expr is a subtype of the type structure representing T.
Implementation Scheme

Quotes and splices are primitive forms in the generated abstract syntax trees. They are eliminated in a macro-expansion phase. Macro-expansion takes place after typing. A natural place for it in the compilation pipeline would be shortly after tree pickling, but other schemes are also possible.
Macro-expansion works outside-in. If the outermost scope is a splice, the spliced AST will be evaluated in an interpreter. A call to a previously compiled method can be implemented as a reflective call to that method. Since this is always available as a fallback we can restrict the kind of trees that can be interpreted as much as we need to. Hence, full interpretation of ASTs can be regarded as an optional feature - we don't lose essential expressiveness if it is absent. Quotes in spliced, interpreted code are kept as they are, after splices nested in the quotes are expanded recursively.
If the outermost scope is a quotation, we generate code that constructs the quoted tree. In Dotty this would be done using calls to the tpd module which provides a range of methods to construct statically typed trees. Splices inside quoted code
insert the spliced tree as is, after expanding any quotes in the spliced code recursively.
Methodology

Meta-programming always made my head hurt (I believe that's the same for many people). But with explicit Expr/Type types and quotes and splices it has become downright pleasant. The method I was following was to first define the underlying quoted or unquoted types using Expr and Type and then inserted quotes and splices to make the types line up. Phase consistency was at the same time a great guideline where to insert a splice or a quote and a vital sanity check that I did the right thing.