
@odersky
Last active March 3, 2023 02:48

Principled Meta Programming for Scala

This note outlines a principled approach to meta-programming in Scala. It tries to combine the best ideas from LMS and Scala macros in a minimalistic design.

  • LMS: Types matter. Inputs, outputs and transformations should all be statically typed.

  • Macros: Quotations are ultimately easier to deal with than implicit-based type-lifting.

  • LMS: Some of the most interesting and powerful applications of meta-programming can be expressed as variants of staging.

  • Macros: Meta-programming is easier to integrate in the tool-chain if it is done at compile time, in the standard language, using the standard compiler.

The ideas of LMS and macros can be reconciled as described in the following.

Quotes and Splices

We start with two well-known fundamental operations: quotation and splicing. Unlike in standard treatments of meta-programming, splices are permitted outside of quotes, which makes splices and quotes true duals of each other.

In Scala programs, quotation is expressed as '(...) or '{...} for expressions (both forms are equivalent) and as '[...] for types. Splicing is expressed as a prefix ~ operator.

If e is an expression, then '(e) or '{e} represent the typed abstract syntax tree representing e. If T is a type, then '[T] represents the type structure representing T. The precise definitions of "typed abstract syntax tree" or "type-structure" do not matter for the purpose of this note, the names are used only to give some concrete intuitions. Conversely, ~ e evaluates the expression e, which must yield a typed abstract syntax tree or type structure, and embeds the result as an expression (respectively, type) in the enclosing program.

Quotations can have spliced parts in them; in this case the embedded splices are evaluated and embedded as part of the formation of the quotation.
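
For instance (a sketch in the note's proposed notation; the helper name is invented here), a quotation can embed splices of a previously constructed tree:

```scala
// Hypothetical sketch in the proposed notation; not tied to any released compiler.
// `e` is a typed abstract syntax tree of type Expr[Int]; the two splices are
// evaluated when the surrounding quotation is formed.
def twice(e: Expr[Int]): Expr[Int] = '{ ~e + ~e }

twice('(21))   // builds the tree '{ 21 + 21 }
```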

Quotes and splices are duals of each other. For arbitrary expressions e and types T we have:

~'(e) = e
'(~e) = e
~'[T] = T
'[~T] = T

Types for Quotations

The type signatures of quotes and splices can be described using two fundamental types:

  • Expr[T]: abstract syntax trees representing expressions of type T
  • Type[T]: type structures representing type T.

Quoting takes expressions of type T to expressions of type Expr[T] and it takes types T to expressions of type Type[T]. Splicing takes expressions of type Expr[T] to expressions of type T and it takes expressions of type Type[T] to types T.

The two types can be thought of as being defined as follows:

class Expr[T] {
  def unary_~: T   // splice operation
}
class Type[T] {
  type unary_~ = T  // splice type
}

Note that the prefix type ~ on Type requires SIP-33.

The Phase Consistency Law

A fundamental phase consistency law regulates accesses to free variables in quoted and spliced code:

  • For any free variable reference x, the number of quoted scopes and the number of spliced scopes between the reference to x and the definition of x must be equal.

Here, this-references count as free variables. On the other hand, we assume that all imports are fully expanded and that _root_ is not a free variable. So references to global definitions are allowed everywhere.
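
As a sketch in the proposed notation, here is how the law classifies some simple references (the helper names are only for illustration):

```scala
// Hypothetical sketches; `x` is a local definition in each case.
def bad(x: Int): Expr[Int]      = '(x)   // error: one quote, zero splices between use and definition
def ok(x: Expr[Int]): Expr[Int] = '(~x)  // ok: one quote and one splice cancel out

val greeting = "hello"                   // a global definition
def alsoOk: Expr[String] = '(greeting)   // ok: globals are accessible at every phase
```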

The phase consistency law can be motivated as follows. First, suppose the result of a program P is some quoted text '{ ... x ... } that refers to a free variable x in P. This can be represented only by referring to the original variable x. Hence, the result of the program would need to persist the program state itself as one of its parts. We don't want this, so the situation should be made illegal. Dually, suppose a top-level part of a program is a spliced text ~{ ... x ... } that refers to a free variable x in P. This would mean that we refer, during construction of P, to a value that is available only during execution of P. This is of course impossible and therefore needs to be ruled out. Now, the (informal, currently unspecified) small-step evaluation of a program reduces quotes and splices in equal measure, using the cancellation rules above, but it neither creates nor removes quotes or splices individually. So the phase consistency law ensures that program elaboration will lead to neither of the two unwanted situations described above.

Function Expansion

There's also an implicit "expand" decorator that turns a tree describing a function into a function mapping trees to trees.

implicit inline class Expander[T, U](f: Expr[T => U]) extends AnyVal {
  def apply(x: Expr[T]): Expr[U] = ???
}

The definition of apply above is provided as a primitive. It follows known value definitions until a closure is found. If no closure is found for a function f, Expander(f).apply(x) is simply '((~f)(~x)).

The expansion operation distributes applications of Expr over function arrows:

Expander(_).apply: Expr[S => T] => (Expr[S] => Expr[T])
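
As a hypothetical usage sketch: when the quoted function is a statically visible closure, expansion beta-reduces at the tree level instead of building a redex:

```scala
// Sketch in the proposed notation.
val inc: Expr[Int => Int] = '( (x: Int) => x + 1 )

// Expander follows `inc` to its closure, so this yields the tree '{ 2 + 1 }
// rather than '{ ((x: Int) => x + 1)(2) }.
val applied: Expr[Int] = Expander(inc).apply('(2))
```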

The dual of expansion, let's call it reflect, can be defined as follows:

def reflect(f: Expr[T] => Expr[U]): Expr[T => U] = '{
  (x: T) => ~f('(x))
}

Note how the fundamental phase consistency law works in two different directions here for f and x. The reference to f is legal because it is quoted, then spliced, whereas the reference to x is legal because it is spliced, then quoted.

In the definition of reflect above, the types T and U are assumed to refer to global types, so the occurrence of T in the body of reflect does not constitute a violation of the phase consistency law. A polymorphic version of reflect could not use T like this, because there would be a quote but no splice between the parameter binding of T and its usage. But we can define the following alternative:

def reflect[T, U](f: Expr[T] => Expr[U])(implicit t: Type[T]): Expr[T => U] = '{
  (x: ~t) => ~f('(x))
}
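
As a usage sketch (assuming an implicit Type[Int] is in scope), reflect turns a transformer on trees into a quoted function:

```scala
// Sketch in the proposed notation.
val addOne: Expr[Int => Int] = reflect((x: Expr[Int]) => '( ~x + 1 ))
// addOne represents the tree '{ (x: Int) => x + 1 }
```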

Example

Assume an Array class with an inline map method that forwards to a macro implementation.

class Array[T] {
  inline def map[U](f: T => U): Array[U] = ~ Macros.mapImpl[T, U]('[U], '(this), '(f))
}

Here's the definition of the mapImpl macro, which takes quoted types and expressions to a quoted expression:

object Macros {

  def mapImpl[T, U](u: Type[U], arr: Expr[Array[T]], op: Expr[T => U])(implicit ctx: Context): Expr[Array[U]] = '{
    var i = 0
    val xs = ~arr
    var len = xs.length
    val ys = new Array[~u]
    while (i < len) {
      // equivalent to: ys(i) = (~op)(xs(i)), which expands to e.g. (x => x + 1)(xs(i))
      ys(i) = ~op('(xs(i)))
      i += 1
    }
    ys
  }
}

Here's an application of map and how it rewrites to optimized code:

genArray[Int]().map(x => x + 1)

==> (inline)

val $this: Array[Int] = genArray[Int]()
val f: Int => Int = x => x + 1
~ _root_.Macros.mapImpl[Int, Int]('[Int], '($this), '(f))

==> (splice)

val $this: Array[Int] = genArray[Int]()
val f: Int => Int = x => x + 1

{
  var i = 0
  val xs = ~'($this)
  var len = xs.length
  val ys = new Array[~'[Int]]
  while (i < len) {
    ys(i) = ~('(f)('(xs(i))))
    i += 1
  }
  ys
}

==> (expand and splice inside quotes)

val $this: Array[Int] = genArray[Int]()
val f: Int => Int = x => x + 1

{
  var i = 0
  val xs = $this
  var len = xs.length
  val ys = new Array[Int]
  while (i < len) {
    ys(i) = xs(i) + 1
    i += 1
  }
  ys
}

==> (elim dead code)

val $this: Array[Int] = genArray[Int]()

{
  var i = 0
  val xs = $this
  var len = xs.length
  val ys = new Array[Int]
  while (i < len) {
    ys(i) = xs(i) + 1
    i += 1
  }
  ys
}

Relationship with Inline

The proposal needs inline to move splices in macro libraries to the point where a macro is called. The proposal as is does not depend on the other uses of inline, notably inline values and parameters. It can be combined with inline values and parameters by assuming that they are already substituted out before phase consistency is checked.

For instance, here is an implementation of the power function that makes use of a statically known exponent:

inline def power(inline n: Int, x: Double) = ~powerCode(n, '(x))

private def powerCode(n: Int, x: Expr[Double]): Expr[Double] = 
  if (n == 0) '(1.0)
  else if (n == 1) x
  else if (n % 2 == 0) '{ { val y = ~x * ~x; ~powerCode(n / 2, '(y)) } }
  else '{ ~x * ~powerCode(n - 1, x) }

The usage of n as an argument in ~powerCode(n, '(x)) is not phase-consistent, but since n is an inline parameter the code is still valid.
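
Tracing the expansion by hand (so the exact shape of the residual code is approximate, and the fresh names y1 and y2 are invented here), a call with a statically known exponent reduces to straight-line multiplications:

```scala
power(4, d)

// ==> (inline, then splice; y1 and y2 stand for compiler-generated fresh names)

{ val y1 = d * d; { val y2 = y1 * y1; y2 } }
```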

There's some overlap between splice/quote and inlining, because the current definition of inlining in Dotty also does simplifications of conditionals and beta-reduction. Indeed, Array#map and power could also have been implemented as simple inline methods. Their implementation in terms of quoting and splicing is more involved but also more explicit.

A possible design choice would be to remove language support for conditional rewritings and beta-reduction of inlined code and rely exclusively on quoting and splicing.

Relationship with Staging

The framework expresses at the same time compile-time meta-programming and staging. The phase in which code is run is determined by the difference between the number of splice scopes and quote scopes in which it is embedded.

  • If there are more splices than quotes, the code is run at "compile-time", i.e. as a macro. In the general case, this means running an interpreter that evaluates the code, which is represented as a typed abstract syntax tree. The interpreter can fall back to reflective calls when evaluating an application of a previously compiled method. If the splice excess is more than one, it means that a macro's implementation code (as opposed to the code it expands to) invokes other macros. If macros are realized by interpretation, this leads to towers of interpreters, where the first interpreter interprets code that is itself an interpreter, possibly interpreting yet another interpreter, and so on.

  • If the number of splices equals the number of quotes, the code is compiled and run as usual.

  • If the number of quotes exceeds the number of splices, the code is staged. That is, it produces a typed abstract syntax tree or type structure at run-time. A quote excess of more than one corresponds to multi-staged programming.
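
A quote excess of one is ordinary run-time staging, sketched in the proposed notation as:

```scala
// Sketch: there is no enclosing splice, so the quote survives to run time
// as a tree value (quote excess = 1); which tree is built is decided at run time.
def residual(b: Boolean): Expr[Int] =
  if (b) '(1) else '(2)
```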

Going Further

The presented meta-programming framework is so far quite restrictive in that it does not allow for the inspection of quoted expressions and types. It's possible to work around this by providing all necessary information as normal, unquoted inline parameters. But we would gain more flexibility by allowing for the inspection of quoted code with pattern matching. This opens new possibilities. For instance, here is a version of power that generates the multiplications directly if the exponent is statically known and falls back to the dynamic implementation of power otherwise.

inline def power(n: Int, x: Double): Double = ~{
  '(n) match {
    case Constant(n1) => powerCode(n1, '(x))
    case _ => '{ dynamicPower(n, x) }
  }
}

private def dynamicPower(n: Int, x: Double): Double =
  if (n == 0) 1.0
  else if (n % 2 == 0) dynamicPower(n / 2, x * x)
  else x * dynamicPower(n - 1, x)

This assumes a Constant extractor that maps tree nodes representing constants to their values.
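
The note does not fix Constant's signature; one plausible shape (a sketch) is:

```scala
object Constant {
  // Matches a tree node that is a literal constant of type T and yields its value;
  // e.g. '(42) matches Constant(42), while a reference to a variable does not.
  def unapply[T](e: Expr[T]): Option[T] = ???
}
```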

Once we allow for inspection of code, the "expansion" operation that maps expressions over functions to functions over expressions can be implemented in user code:

implicit inline class Expander[T, U](f: Expr[T => U]) extends AnyVal {
  def apply(x: Expr[T]): Expr[U] =
    f match {
      case Lambda(g) => g(x)
      case _ => '((~f)(~x))
    }
}

This assumes an extractor

object Lambda {
  def unapply[T, U](x: Expr[T => U]): Option[Expr[T] => Expr[U]]
}

Once we allow inspection of code via extractors, it's natural to also add constructors that construct typed trees directly without going through quotes. Most likely, those constructors would work over Expr types which lack a known type argument. For instance, an Apply constructor could be typed as follows:

def Apply(fn: Expr[_], args: List[Expr[_]]): Expr[_]

This would allow constructing applications from lists of arguments without having to match the arguments one-by-one with the corresponding formal parameter types of the function. We then need "at the end" a method to convert an Expr[_] to an Expr[T] where T is given from the outside. E.g. if code yields an Expr[_], then code.atType[T] yields an Expr[T]. The atType method has to be implemented as a primitive; it would check that the computed type structure of Expr is a subtype of the type structure representing T.
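
Combining the two (a hypothetical sketch; `call` is not part of the proposal), untyped construction is followed by a final atType check:

```scala
// Sketch in the proposed notation.
def call(fn: Expr[_], args: List[Expr[_]]): Expr[Int] =
  Apply(fn, args).atType[Int]  // checked: the computed type structure must be a subtype of Int
```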

Implementation Scheme

Quotes and splices are primitive forms in the generated abstract syntax trees. They are eliminated in a macro-expansion phase. Macro-expansion takes place after typing. A natural place for it in the compilation pipeline would be shortly after tree pickling, but other schemes are also possible.

Macro-expansion works outside-in. If the outermost scope is a splice, the spliced AST will be evaluated in an interpreter. A call to a previously compiled method can be implemented as a reflective call to that method. Since this is always available as a fallback we can restrict the kind of trees that can be interpreted as much as we need to. Hence, full interpretation of ASTs can be regarded as an optional feature - we don't lose essential expressiveness if it is absent. Quotes in spliced, interpreted code are kept as they are, after splices nested in the quotes are expanded recursively.

If the outermost scope is a quotation, we generate code that constructs the quoted tree. In Dotty this would be done using calls to the tpd module which provides a range of methods to construct statically typed trees. Splices inside quoted code insert the spliced tree as is, after expanding any quotes in the spliced code recursively.

Methodology

Meta-programming always made my head hurt (I believe that's the same for many people). But with explicit Expr/Type types and quotes and splices it has become downright pleasant. The method I followed was to first define the underlying quoted or unquoted types using Expr and Type, and then insert quotes and splices to make the types line up. Phase consistency was at the same time a great guideline for where to insert a splice or a quote and a vital sanity check that I did the right thing.

@olafurpg

Thank you for sharing your note Martin. I have some comments:

  • do you propose Expr[T] should be favored over Tree? That sounds like
    going against the experience from scala.reflect where I believe Tree is
    more popular than Expr[T]. Quoting
    https://github.com/milessabin/macro-compat#why-you-should-use-macro-compat

    Two of the biggest improvements of the Scala 2.11.x macro API over 2.10.x
    were [...] the ability to type macro
    implementation method arguments and results as Tree rather than Expr[T]. Both
    of these allow the signatures of macro implementation methods to be written
    significantly more succinctly and readably

  • Can you elaborate the difference between the Array[T].map example from your note
    and a regular Array[T].map with inline?

class Array[T] {
  inline def map[U](f: T => U): Array[U] = {
    var i = 0
    val xs = this
    var len = xs.length
    val ys = new Array[U](len)
    while (i < len) {
      ys(i) = f(xs(i))
      i += 1
    }
    ys
  }
}

On portability, I have concerns about:

  • using inline as an enabler for macros in nsc. I suspect it will require large
    engineering work to pull it off and it still does not address the biggest
    complaints from the community with scala.reflect:
    • ease of implementing ambitious typed tree transformations
    • mixing of untyped/typed trees causes cryptic crashes in later phases.
  • defining a system like this on top of TASTY until we have validated it's
    possible to robustly integrate TASTY with nsc. Efforts so far have been
    unsuccessful.

On the syntax for quotes/splicing:

  • How do you construct a pattern with quotations? For example x @ Some(y: Int).
  • How do you splice lists of trees? Like q"fun(..$args)".
  • SIP-33 prefix operators on types were informally rejected in the last SIP
    meeting, do you propose to reconsider that decision?
  • The '{...} and '(...) syntax will require changes not only to the parser but
    also to tokenization; can we avoid that?
  • Could we support quotations with less intrusive changes to syntax?
    • quote[T](expr: T): Expr[T]
    • quotePat[A](f: PartialFunction[A, Unit]): Pat[???]
    • quoteType(T).type: Type[T]

@odersky

odersky commented Oct 30, 2017

@olafurpg Good points. I elaborated on some of them in my last edit of the gist.

  • Once we construct trees by hand, we need to be able to forget temporarily the subject type, as in Expr[_]. Maybe Tree is a good alias for that. Depends how often these are used.

  • The map example on Array is indeed not more powerful than inline alone. I have added more examples that demand more expressiveness than what can be supported by inline alone.

  • Portability: Yes, this is a far cry from current macros. Regarding inline, I think it's conceivable that under Scala2 mode we would also allow something like

      def map[U](f: T => U): Array[U] = macro mapImpl
    

    and translate it to the code that's given in the note. What's important is that inline, quote and splice decompose the question of meta-programming into three orthogonal, simple concepts, so we need to keep all three.

  • There's no pattern matching using quotes. I would argue that's a feature. I believe we need a system of extractors that is considerably more high-level than simple syntactic quotes. For instance, the Lambda extractor should certainly be insensitive to braces and aliasing. E.g., given

    val f = { (y: Int) => y + 1 }

    we definitely want Lambda.unapply('(f)) to succeed, which means we have to follow the alias and strip the braces.

  • SIP 33: If we have a use case, I see no problem accepting prefix types. We did not have a good use case before, so we leaned towards rejection of that part of SIP 33.

  • Syntax: We could use quote { ... } and splice { ... }, in fact I started out this way. But since phase consistency requires that the typer counts quote and splice braces it should not just be some calls to Predef methods. So we'd need to make quote and splice keywords. Given this, it looks like '{...} and ~... is a more lightweight alternative. But either scheme would be fine with me.

@adriaanm

adriaanm commented Oct 30, 2017

Interesting! Could we lift inline def to inline type and pattern match on Type[T] values? Something like inline type Show[T] = ~{ '[T] match { ... } }

@Ichoran

Ichoran commented Oct 30, 2017

This looks really nice! I'm not completely sold on the syntax, but conceptually, with matching, it is very comprehensible (to me anyway). If one is not allowed to match on types, I suppose that one could just use typeclasses instead? For instance, when trying to abstract across math on primitives, since all operations take place widened at least to Int, you have various bit-masking and casting to do if you want to maintain the original data type.

It looks to me like the scheme doesn't have any safeguards against nontermination (e.g. forgetting the i += 1 in the while loop in the map example would do it), but it shouldn't be too hard to wrap this in a supervisor and bail out with a helpful message.

I suspect if this works well, people are going to run into the same kind of problems as with C++ templates, as this is essentially a code generation mechanism, not a method or class generation mechanism. This means there can be a huge explosion of code size because it's not apparent whether a statement inserts a method call or 1 kb of bytecode. I wouldn't initially say that this is a terrible problem to have, except that (1) the bytecode-per-method size on the JVM is pretty limited, and (2) for a .js target you may want to inline less to reduce transfer times. I guess it's not too hard to get around these by identifying common code to de-inline.

Also, it doesn't seem terribly useful for solving the major problem of going from schema to code (whether it be database or JSON or whatever). This is a big use case for macros presently. But is there any reason the same strategy can't be applied to classes? That is, must the target of a quote necessarily be an expression, or is any language statement fair game? Alternatively, one could "expressionize" class definitions.

@odersky

odersky commented Oct 31, 2017

Interesting! Could we lift inline def to inline type and pattern match on Type[T] values? Something like inline type Show[T] = ~{ '[T] match { ... } }

Yes, quite possibly. That's to some degree orthogonal to this proposal though. It would affect the way we can express whitebox macros. I would like to pursue this one as a separate issue.

@odersky

odersky commented Oct 31, 2017

If one is not allowed to match on types, I suppose that one could just use typeclasses instead?

I think one should be allowed to match on types. If you can match on Expr, then you should also be able to match on Type. Would be good to come up with an example that shows this.

Also, it doesn't seem terribly useful for solving the major problem of going from schema to code (whether it be database or JSON or whatever). This is a big use case for macros presently. But is there any reason the same strategy can't be applied to classes? That is, must the target of a quote necessarily be an expression, or is any language statement fair game? Alternatively, one could "expressionize" class definitions.

Generating class definitions (and other definitions) is potentially possible but will be tricky because of the types. What is the type of a class definition? It would have to tell us about all its members, its parents, and its constructors. There is such a type structure in the compiler (it's called ClassInfo) but it's not exposed to the user currently. Even if we did expose it, we'd need rules how to abstract over it and so on. So this is left for future work.

An alternative to type-provider like macros would be code generation. The proposed framework is actually quite suitable for code generation as well, because it works in exactly the same way for staging and for macros.

@vkuncak

vkuncak commented Nov 4, 2017

Hi Martin,

  1. I would love to see operational semantics rules that would extend e.g. small step CBV lambda calculus semantics with rules for quote and splice. I understand that
    ~'(e) = e
    '(~e) = e
    would be there (as rules or consequences), but there need to be some congruence rules that connect quote and splice to function application (and, equally importantly here, some congruence rules would be absent).

Ignoring the need to enforce left-to-right evaluation, I guess one wants something like:

if e1 -> e2 then '(e1) -> '(e2)

  2. What is the relationship to string interpolation? I am asking because string interpolation also allows splicing, and interpolated strings desugar into certain method calls.

{val x = "foo"; s"x = ${val y = "bar"; s"hello ${x + y} world" }"}

Why not explain the mechanism using such desugaring? Can one separate the splicing and quoting facilities from their application to the data type that represents Scala programs? This would unify with string interpolation and give better support to constructing DSLs that are used for templating other types of things, not just Scala code itself.

@LPTK

LPTK commented Nov 13, 2017

@odersky thank you for sharing the URL.

As I have mentioned earlier, the Squid type-safe metaprogramming framework that I have been working on already does pretty much all of this (in a more or less experimental capacity 😄). So it could be a good testbed for ideas before integrating support directly into the compiler.

For example, the second power macro example in this document is exactly, modulo syntactic differences, the last example given in our SCALA'17 paper (in the Squid repo, the prototype implementation of @macroDef lives under branch squid-macros):

@macroDef(Embedding)
def naivePower(base: Double, exp: Int): Double = {
  // in this scope, base: Code[Double] and exp: Code[Int]
  exp match {
    case Const(exp) =>
      var cur = code"1.0"
      for (i <- 1 to exp) cur = code"$cur * $base"
      cur
    case _ => code"Math.pow($base, $exp.toDouble)"
  }
}

In Squid, we use traditional quasiquotes (a.k.a. string interpolators, as noted by @vkuncak) of the form code"...${xyz}...". We also support quasi-code code{...${xyz}...}, but the latter is more limited due to the macro-based embedding (it cannot be used in patterns, for example).

The main place where first-class compiler support would improve Squid is cross-quotation references. Currently, if we wish to express '{(x:Int) => ~(identity('{x}))} we have to use either an explicit free variable (marked by prefix ?) that is then captured by the lambda binder, as in code"(x:Int) => ${identity(code"?x:Int")}", or use automatic function lifting (which takes a Code[A] => Code[B] and inserts/inlines it as a Code[A => B]), for example code"(x:Int) => ${ (x0:Code[Int]) => identity(x0) }(x)" or equivalently code"(x:Int) => ${ identity(_:Code[Int]) }(x)".
Though it should be noted that only the former approach is scope-safe (see our POPL paper), as the other approach as well as the approach advocated in this design document (which is MetaOCaml's approach) are vulnerable to runtime scope extrusion errors by ways of imperative effects or misplaced code evaluation.

I don't think conflating code evaluation and unquote/splicing with ~ is a good idea, because these are fundamentally two different concepts. As I hinted previously, having code evaluation opens up the possibility of runtime crashes, for example if the user does code"(x:Int) => ${ (x:Code[Int]) => println(code"$x+1".run); x }(x)". If code eval has the same syntax as unquotation, it will make such errors much harder to find –– this would now look like '{(x:Int) => ~{ println(~'{x+1}); '{x} }}.
Note that Template Haskell uses $-splice for both unquotation and final top-level macro expansion as in this document, but I don't think Template Haskell can evaluate any code at compile time.
Relatedly, I do not understand this part:

If there are more splices than quotes, the code is run at "compile-time"

If '{123} has type Expr[Int], then ~'{123} (which has the same number of quotes and splices) should have type Int, and if not surrounded by a quotation it should result in compile-time evaluation (unless it is the direct result of a macro); I don't see how things with more splices than quotes like ~(~'{123}) would type check at all, as unary_~ is defined on Expr[_], not Int.

There's no pattern matching using quotes.

Why not? As we show in our GPCE'17 paper, this can be an immensely useful feature for designing type-safe rewritings/optimizations.

I would argue that's a feature. I believe we need a system of extractors that is considerably more high-level than simple syntactic quotes. For instance, the Lambda extractor should certainly be insensitive to braces and aliasing.

The current version of Scala-reflection quasiquotes already glosses over minor syntactic differences like that –– for instance pattern q"($name:$typ) => $body" matches q"{_:Int => ()}".
Squid takes this much further, as all quotes, including patterns, are normalized (fully-resolved types and implicits, fully-qualified names, etc.), which makes pattern-matching more semantic than syntactic. We can even define extensions to the IR so that, say pattern code"(x:$t) => $body(x)" matches code"(x:Int,y:String) => y * x".

As for the formal part and to partially answer @vkuncak, in our POPL paper we use big-step semantics, and remove all same-level unquotes at once, inside-out (this is the role of all the Q-rules in our operational semantics).

if e1 -> e2 then '(e1) -> '(e2)

We don't want to evaluate quoted code unless it is actually run. code"2.toDouble" is a value, but code"2.toDouble".run evaluates to 2.0.

Can one separate the splicing and quoting facilities from their application to the data type that represents Scala programs?

Squid does something like that, as it separates the type-safe quotation/unquotation mechanisms from the definition of the language's intermediate representation. As such, Squid (which originally stands for the approximate contraction of Scala Quoted DSLs) can be used to help the design of DSLs. This is explained in our SCALA'17 paper.

Finally, note that staging applications (i.e., runtime multi-stage programming) require runtime code compilation, because the whole point is to generate efficient implementations at runtime, so an interpreter doesn't cut it. Squid provides the .compile method (analogous to .run) for this use case. It invokes the Scala compiler at runtime.

@Blaisorblade

Blaisorblade commented Nov 22, 2017

Looks really interesting! A few assorted comments:

  • I’m not sure why Type[T] is indexed by the type.

  • @LPTK’s papers looked rather impressive, so I guess they’re worth checking out. And either way, I suspect expanding later versions into a paper (with further case studies) would be worthwhile to refine the design.

  • @Ichoran:

I suspect if this works well, people are going to run into the same kind of problems as with C++ templates, as this is essentially a code generation mechanism, not a method or class generation mechanism. This means there can be a huge explosion of code size because it's not apparent whether a statement inserts a method call or 1 kb of bytecode. I wouldn't initially say that this is a terrible problem to have, except that (1) the bytecode-per-method size on the JVM is pretty limited, and (2) for a .js target you may want to inline less to reduce transfer times.

In C++ you often don’t have the choice to not generate code (since you lack parametric polymorphism). It’s not clear to me why this proposal would be different than macros, in Scala or elsewhere, so I must be missing something. But it might be indeed cool to make it easy to tell what sort of method you’re calling, either through IDEs highlighting or Rust’s convention of using ! in macro names.

Also helpful: a way to wrap a function call so that it emits a new method; LMS has such a thing, under the type signature for reflect (let’s call it reflectAsMethod). That would avoid having to deinline functions after they’re expanded:

I guess it's not too hard to get around these by identifying common code to de-inline.

You want common-subexpression elimination that extracts functions? Never heard of such a thing; given two expressions, finding if they’re instances of the same function seems a case of higher-order unification, which is undecidable. Worse, the best algorithms for it are all extremely hard to implement on lambda calculus (let’s say there’s maybe 20 people who implemented that paper).

If you allow annotations, the above reflectAsMethod is a nice and well-tested interface.

@lihaoyi

lihaoyi commented Nov 22, 2017

This means there can be a huge explosion of code size because it's not apparent whether a statement inserts a method call or 1 kb of bytecode

Sounds exactly like Scala as it has been for 3 years. You should talk to some of the people using uPickle 😛

@milessabin

milessabin commented Nov 22, 2017

I like the look of this proposal a lot.

One of my long standing concerns with current Scala macros is that they have been predominantly concerned with terms whereas types have been handled almost, it seems, as an afterthought. It's not completely clear to me that your proposal allows for the construction (and inspection) of both terms and types on an equal footing. Could you give some examples that relate to types as well?

@milessabin

milessabin commented Nov 22, 2017

Given the portability and evolution mess we're currently in, I would strongly advocate for a model which makes quote/unquote and (nested) quotation-based extractors primary ... I think that experience has shown that an explicit tree representation is too easily conflated with compiler internals.

An interesting question here is then how do we handle important latent properties, i.e. things which aren't directly observable in source fragments? For instance, the set of subclasses of a sealed type ... is there a natural way that that could be represented as an extraction from a quoted type? And could we extract dependent types to compile-time functions from quoted terms to quoted types?

@olafurpg

olafurpg commented Nov 22, 2017

I can highly recommend taking a look at @LPTK's Squid quasiquotes: https://github.com/epfldata/squid/ (esp. POPL paper https://infoscience.epfl.ch/record/232427) Squid quasiquotes are statically guaranteed (at macro definition time) to be well-typed, well-scoped and hygienic. Squid quasiquotes also support runtime (at macro expansion time) inspection/transformation of terms/types.

@odersky
Author

odersky commented Nov 22, 2017

@LPTK:

... are vulnerable to runtime scope extrusion errors by ways of imperative effects or misplaced code evaluation.

That's an interesting question. I am actually not convinced the proposal has the problems you mention. The problem caused by MetaOCaml's run is absent, simply because there is no run! We use ~ instead, which is checked for phase consistency. Scope extrusion due to imperative effects seems to be prevented by phase consistency as well; at least the example you give in your paper would be rejected for that reason.

@LPTK

LPTK commented Nov 22, 2017

Thank you @Blaisorblade and @olafurpg for the credit.

@Blaisorblade

I’m not sure why Type[T] is indexed by the type.

If you reflect the types of object language terms in your static type system, you need to do the same for type representations as well, as long as you want these two constructs to interact. In Squid, this is especially important since type representations are usually passed around implicitly. For example, in:

def foo[T:CodeType] = code"Option.empty[T]"
// which is equivalent to:
def foo[T](implicit tp: CodeType[T]): Code[T,{}] = code"Option.empty[$tp]"

@odersky

I am actually not convinced the proposal has the problems you mention. The problem caused by MetaOCaml's run is absent, simply because there is no run! We use ~ instead, which is checked for phase consistency.

Ah, I see what you did there 😄
I think that the problem you will face is to actually enforce the consistency rule:

  • For any free variable reference x, the number of quoted scopes and the number of spliced scopes between the reference to x and the definition of x must be equal.

How does the system keep track of which identifiers each piece of code may contain, and at which 'quoted scope' level they are?
For example, consider the following code:

var t = '{0}
'{ (x: Int) => ~{
  if (true) t = '{x} // how to prevent this? it assigns a well-defined value of type Expr[Int] to a variable of the same type
  x
}}
println(~t) // how do I know 't' is not properly scoped here?

It seems that you will need to either reflect scope information in the types of terms (similar to previous work, including our POPL paper; for example, refined environment classifiers), or rely on some necessarily incomplete static program analysis.

@TiarkRompf

This looks like a useful proposal, but I don't see it reconciling "LMS ideas" with anything, other than Expr[T] providing a handle to define implicits that hide the quotation syntax again (as in the definition of class Expander).

I have similar doubts about the phase consistency as @LPTK.

@odersky
Author

odersky commented Nov 23, 2017

@LPTK

How does the system keep track of which identifiers each piece of code may contain, and at which 'quoted scope' level they are?

I was assuming a purely syntactic criterion: just count the number of quotes and splices between a definition and its usage. It's true that this does not prevent scope extrusion via variable assignment. But maybe we can in addition require that spliced expressions be pure? (Or punt on the question as long as we do not have a sufficiently expressive effect system.)
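The counting criterion described here can be sketched as a tiny executable model. This is purely illustrative; the names and encoding below are mine, not part of the proposal:

```scala
object PhaseCheck {
  // Toy model of the phase-consistency check: '{...} raises the quotation
  // level by one, ~{...} lowers it by one. A free-variable reference is
  // phase-consistent iff the net level between its definition and its use
  // is zero.
  sealed trait Ctx
  case object Quote extends Ctx
  case object Splice extends Ctx

  def level(path: List[Ctx]): Int =
    path.map { case Quote => 1; case Splice => -1 }.sum

  def phaseConsistent(path: List[Ctx]): Boolean = level(path) == 0

  def main(args: Array[String]): Unit = {
    // '{ (x: Int) => ~{ ... x ... } }: one quote, then one splice
    println(phaseConsistent(List(Quote, Splice))) // true: the levels cancel
    // (pgrm: Expr[Int]) => Some[Int](~pgrm): a splice under no quote
    println(phaseConsistent(List(Splice)))        // false: rejected
    // def lift(n: Int): Expr[Int] = '{n}: a quote with no splice (CSP)
    println(phaseConsistent(List(Quote)))         // false: rejected
  }
}
```

Under this reading, both the un-quoted splice and the cross-stage-persistence example are rejected for the same reason: the quote/splice counts between definition and use do not balance.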

@odersky
Author

odersky commented Nov 24, 2017

[Edited Nov29th]

I was completely wrong about the following:

There's actually an important aspect to the absence of run. Splice and run are not the same. A top-level splice means the spliced code will be evaluated at compile time, but a top-level run means it will be evaluated at run-time. Absence of run means that with quote and splice alone we can express staged code but cannot force its evaluation at run time. But this is not as bad as it looks! In fact having an eval-like capability in the language can be problematic to implement (e.g., how would you do eval on JS?) and can be problematic for security reasons.

In fact, top-level splices are evals. The compiler is free to reduce them at compile-time if there is sufficient static information to do so. That's just constant folding on steroids. The inline modifier just makes sure that the compiler will do the reductions.

Since top-level splices are both evals and macro-expansions, it follows that the spliced operations themselves must be pure.

What we could do in addition is define a set of translators that map an Expr[T] to code on some platform. E.g. there could be library methods toJVM, toLLVM, toJS, toCuda for the various targets supported in a system. For some platforms there could be added functionality which dynamically loads a generated artefact. That would then implement the full eval capability, but it would be platform-dependent.

@odersky
Author

odersky commented Nov 24, 2017

@milessabin

One of my long standing concerns with current Scala macros is that they have been predominantly concerned with terms whereas types have been handled almost, it seems, as an afterthought. It's not completely clear to me that your proposal allows for the construction (and inspection) of both terms and types on an equal footing. Could you give some examples that relate to types as well?

So far, the whole code and type inspection aspect is under-developed. In the extreme, following Meta OCaml, there would be none. But that's probably too restrictive. If we need some form of inspection, I am on the fence whether that would be quasi-quote based (e.g. as in Squid) or whether we would expose some form of ADT. I agree with you that exposing a very syntactic ADT close to compiler trees is clearly the worst of all options. But if we could come up with something more distilled it could give us a smaller language to deal with and give us exhaustivity warnings to boot. If our language was Haskell, then I think its typed IR which is essentially System F + constraints is much easier to deal with than source-level trees. That's why the Haskell compiler goes as fast as it can into that representation. For Scala we don't have that luxury because our compilation targets are different. So our intermediate representation is more complex than Haskell's, where we need to keep info about methods and classes and things like that. But at least we do have a specification of this IR now in the form of Tasty trees.

For the types I was thinking if we could map them to something like DOT types it might be easier to analyze than source types. E.g. source-level refinement types (which have almost all the complexity of classes), can be mapped to DOT's recursive types and intersections, which are individually much easier to deal with.

@LPTK

LPTK commented Nov 24, 2017

@odersky

I was assuming a purely syntactic criterion: Just count the number of quotes and splices between a definition and its usage.

Note that this imposes several important restrictions. In particular, you cannot abstract over expressions that do code evaluation, as something like (pgrm: Expr[Int]) => Some[Int](~pgrm) is not phase-consistent. This means you cannot even let-bind programs that are to be evaluated!
Another limitation is that you cannot do cross-stage persistence (CSP), of which the most basic example is def lift(n: Int): Expr[Int] = '{n}. That latter limitation is probably fine; Squid does not allow CSP either, because it has problematic semantics in the context of general metaprogramming applications, but maybe CSP could have made sense in this particular system?

It's true that this does not prevent scope extrusion via variable assignment. But maybe we can in addition require that spliced expressions are pure?

Perhaps an appropriate direction would be to make open code values second-class citizens, so as to limit their lifetime and prevent scope extrusion. It would be really nice to support second-class values in general, anyway.

Splice and run are not the same. A top-level splice means the spliced code will be evaluated at compile time

Really? What would be the semantics of this code?

~(if (readInt > 0) '{1} else '{0})

If by "compile time," you mean "macro execution time," then run does the same as your top-level splice.

how would you do eval on JS?

Like this (link) 😄 –– but sure, it would be more problematic on other platforms like Scala Native.

@LPTK

LPTK commented Nov 24, 2017

@odersky

If we need some form of inspection, I am on the fence whether that would be quasi-quote based (e.g. as in Squid) or whether we would expose some form of ADT.

I would like to emphasize that having quoted patterns does not mean that the semantics of matching is necessarily syntax-based (as it is in scala-reflect or scala-meta). There can be an arbitrary amount of normalization, pre-processing and unification going on, which is in large part enabled by the statically typed and hygienic setting we have here (and in Squid, as opposed to scala-reflect).

Notice that the following Squid program correctly matches the expression with its pattern, although they are of widely different syntactic shapes:

code"intWrapper(1).to(10).foreach[scala.Unit](println(_))" match {
  case code"for (i <- 1 to $n) $f(i) : $t" => println(n,f,t)
}

The printed values are (code"10", code"(x$1_0: scala.Int) => scala.Predef.println(x$1_0)", CodeType(Unit)). Note that Squid caches pattern representations, so it does not have to renormalize them on every execution of the match.

On the other hand, I think that the widespread adoption of scala-reflect quasiquotes has already shown that quoted expressions and patterns provide a much nicer user interface than algebraic data types: they're straightforward to understand, and all features are simple translations from the actual syntax of the language (as opposed to some awkward encoding), making them easily discoverable and memorable.
What Squid shows in addition is that we can graft advanced custom static typing guarantees to the quasiquote system that would be difficult to encode directly with normal language features (and such encoding would obscure the reflection interface even further).

@odersky
Author

odersky commented Nov 25, 2017

@LPTK

Note that this imposes several important restrictions. In particular, you cannot abstract over expressions that do code evaluation, as something like (pgrm: Expr[Int]) => Some[Int](~pgrm) is not phase-consistent.

Indeed, you can't do that. You have to write: (pgrm: Expr[Int]) => '{Some[Int](~pgrm)}.

This means you cannot even let-bind programs that are to be evaluated!

I am not sure what you mean by that. Can you show a full, self-contained example that demands this? That would help us better evaluate how serious a restriction this is. Note that the following is phase-consistent:

val x = '{ ... }
'(f ~x)
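To make the let-binding pattern concrete, here is a self-contained toy in which Expr is just a wrapper around source text. This stand-in is introduced purely for illustration; the proposal's Expr is a typed AST, not a string:

```scala
object LetBindExpr {
  // String-based stand-in for the proposal's Expr[T], for illustration only.
  final case class Expr[T](code: String)

  // A toy splice: embedding one code value inside another quotation.
  def splice[T](e: Expr[T]): String = e.code

  def main(args: Array[String]): Unit = {
    val x: Expr[Int] = Expr("1 + 2")            // plays the role of: val x = '{ ... }
    val y: Expr[Int] = Expr(s"f(${splice(x)})") // plays the role of: '(f ~x)
    println(y.code)                             // prints f(1 + 2)
  }
}
```

The point is only that x is defined outside any quote and used under exactly one quote and one splice, so the counts balance and the snippet is phase-consistent.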

About second-class citizens, we'd need to see that fully worked out. I am a bit skeptical it would be lightweight enough in the end. This holds for all more sweeping type-system extensions. My standpoint is to look for something dead-simple, and if there's nothing of the kind, then don't do anything. Scala's type system is rich enough already.

@odersky
Author

odersky commented Nov 25, 2017

Really? What would be the semantics of this code?

    ~(if (readInt > 0) '{1} else '{0})

If that occurs at toplevel, it runs the code at compile-time - the compiler would go read an integer from the command line and insert 1 or 0 depending on what it read (which is probably not what you want, and hopefully the macro sandbox will prevent you from doing it). Remember that macro invocation is just splicing of a call to a macro library - there is no separate mechanism. I would be very surprised if you could show me a way to do evaluation of arbitrary code at runtime within the confines of the system.

@LPTK

LPTK commented Nov 27, 2017

@odersky

Remember that macro invocation is just splicing of a call to a macro library - there is no separate mechanism.

Thanks for the clarifications. I see that I had misunderstood your description of the outer splice. What confused me was the wording "if the outermost scope is a splice, the spliced AST will be evaluated in an interpreter" –– to me, from an MSP perspective, "the spliced AST" referred to the runtime code representation returned by the expression inside the splice. Similarly, I mistook the formulation "the code is run" as referring to runtime code-representation evaluation (the run operation), whereas you were just talking about normal current-stage code execution.

Am I correct that this proposal could be formulated in terms of more traditional multi-stage programming semantics, where the whole user program is in fact a quoted code expression, and top-level splices are not top-level anymore and are not special in any way?
The main difference, of course, being that the code is run "at compile time" (although the meaning of "compile time" is pretty blurry at this point; as we see, it is now also a runtime stage in its own right).

In other words, the following code:

  inline def triple(n: Int) = ~{ 
    println("printed at compile-time; code passed was: " + '{n})
    '{val tmp = n; tmp + tmp + tmp }
  }
  triple(1) + triple(2)

can be understood in terms of traditional staging as:

'{
  inline def triple(n: Int) = ~{ 
    println("printed at compile-time; code passed was: " + '{n})
    '{val tmp = n; tmp + tmp + tmp }
  }
  triple(1) + triple(2)
}

if we assume that the inlining is performed before the splicing (and the evaluation of the expression in the splice), which is the only non-standard behavior here, I would say.

@odersky
Author

odersky commented Nov 29, 2017

@LPTK Trying to do the formalization, I just found out that I was completely wrong about the difference between splice and eval. In fact, top-level splices can express both eval and macro-expansion, depending on whether they are inlined or not. So coming back to your example:

~(if (readInt > 0) '{1} else '{0})

If this was legal, it could be executed at run-time or at compile time. But since the spliced expression is impure, it is in fact not legal.

@LPTK

LPTK commented Nov 30, 2017

@odersky here is what I was expecting, given what you said:

  inline def triple(n: Int) = ~{ 
    println("printed at compile-time; code passed was: " + '{n})
    '{val tmp = n; tmp + tmp + tmp }
  }
  triple(1) + triple(2)

prints printed at compile-time; code passed was: '{1} and printed at compile-time; code passed was: '{2}, and is equivalent to:

  {val tmp = 1; tmp + tmp + tmp } + {val tmp = 2; tmp + tmp + tmp }

and

  def triple(n: Int) = ~{ 
    println("printed at compile-time; code passed was: " + '{n})
    '{val tmp = n; tmp + tmp + tmp }
  }
  triple(1) + triple(2)

prints printed at compile-time; code passed was: '{n} and is equivalent to:

  def triple(n: Int) = {val tmp = n; tmp + tmp + tmp }
  triple(1) + triple(2)

Wouldn't that also work?
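(As a side note, the first expansion above is straightforward to check as plain Scala, independent of any staging machinery; it evaluates to 3*1 + 3*2 = 9:)

```scala
object TripleExpansion {
  // The residual program from the inline example above, with all quotes
  // and splices eliminated.
  def main(args: Array[String]): Unit = {
    val result =
      { val tmp = 1; tmp + tmp + tmp } + { val tmp = 2; tmp + tmp + tmp }
    println(result) // prints 9
  }
}
```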

@odersky
Author

odersky commented Dec 2, 2017

@LPTK Yes, that's the expected behavior, if we disregard that println is a side effect and therefore illegal in a splice.
