xeno-by/scala-meta-high-level-overview.md Secret

Scala.meta: high-level overview

In this document, we will take a tour of scala.meta. First, we will start with the architecture,
overviewing code organization patterns underlying scala.meta.
Afterwards, we will take an example of a metaprogram written using scala.meta,
explaining how it works and how it can be executed in different environments.
During the tour, we will intentionally limit ourselves to basic comments about scala.meta APIs,
leaving further documentation for future work.
This document describes work in progress. Parts of the functionality described below only exist
in experimental branches of scala.meta. Concretely, syntactic APIs, such as parsing, tokenization,
quasiquotes and prettyprinting, have been released in scala.meta 1.0.0 and are available for general use.
Semantic APIs, which involve typechecking, name resolution, etc., are planned for scala.meta 2.0.0
and are currently unavailable.
Table of contents


Architecture
The isImmutable metaprogram
Differences from scala.reflect
Compile-time execution
Runtime execution
Other environments

Architecture

Aware of the major usability downsides of using the cake pattern to expose a public API,
scala.meta implements its language model in a set of top-level definitions.
All that it takes to use the language model is import scala.meta._.
Much like the decision to use the cake pattern in scala.reflect, the decision to use top-level definitions
in scala.meta also has far-reaching consequences. In addition to bringing a lightweight feel of not forcing users
into unconventional idioms, it has a major impact on how scala.meta APIs are organized.
In scala.reflect, universes encompass both the definitions of the language model and
the pieces of state that are required for its operations. For example, when a metaprogram asks a definition
about its signature or a type about the list of its members, the enclosing universe consults the symbol table
that it maintains internally. This happens unbeknownst to the users,
because scala.reflect lives within a cake that hides this detail in its internal state.
In scala.meta, we have to be explicit about state, because the cake is gone. In order to accommodate this design requirement,
we went through the operations supported by scala.reflect
and split them into groups based on the kind of state these operations work with.
As a result, we ended up with three groups of APIs that comprehensively cover the functionality exposed in scala.reflect.
1) Stateless APIs such as manual construction and deconstruction of reflection artifacts.
Unlike in scala.reflect, the language model of scala.meta is stateless, so it can be used in arbitrary situations,
regardless of whether it's compile time, runtime or any other environment.
scala> import scala.meta._
import scala.meta._

scala> Term.Name("x")
res0: Term.Name = x

scala> val Term.Name(x) = res0
x: String = x

2) Syntactic APIs such as parsing, quasiquotes and prettyprinting. These APIs can change behavior
depending on a particular version of the language, so we reified these distinctions into a dedicated entity called Dialect,
and require a dialect in all such APIs. Below you can see a simplified excerpt from scala.meta that illustrates this design.
package meta {
  trait Dialect {
    private[meta] def allowXmlLiterals: Boolean

    ...
  }

  package object dialects {
    implicit object Scala211 extends Dialect { ... }
    implicit object Dotty extends Dialect { ... }
    ...
  }
}

package object meta {
  implicit class XtensionParse[T](inputLike: T) {
    def parse[U](
      implicit convert: Convert[T, Input],
      parse: Parse[U],
      dialect: Dialect): Parsed[U] =
    {
      val input = convert.apply(inputLike)
      parse.apply(input, dialect)
    }
  }

  ...
}

Here, a dialect is an opaque entity that doesn't have public methods, encapsulating
differences between language versions in methods that are only visible to scala.meta. For example,
Dialect.allowXmlLiterals indicates whether a particular language version supports XML literals.
Current versions of the compiler have this feature, but future versions based on Dotty are going to drop support for it.
Syntactic operations like Input.parse take an implicit dialect and use its internal methods
to implement their logic. This particular operation is just a simple proxy that converts its argument
to parser input and then feeds the input along with the dialect into a parser encapsulated in Parse,
but the parser itself makes full use of syntax peculiarities expressed by the dialect.
In order to use a syntactic API, we import an implicit dialect (note the implicit
modifiers next to implementors of Dialect). After an implicit dialect is available in scope,
calls to syntactic operations will automatically use it.
scala> import scala.meta.dialects.Scala211
import scala.meta.dialects.Scala211

scala> "<xml />".parse[Term]
res0: Parsed[scala.meta.Term] = <xml />

In order to improve user experience, the current version of scala.meta features a fallback dialect
that is used if no dialect was explicitly imported by the metaprogrammer. This default dialect corresponds
to the version of the compiler that compiled a given call to a syntactic API.
Therefore, even if in the listing above we didn't import Scala211, the call to parse would still work,
and its result would correspond to the behavior of the particular version of the compiler that underlies the REPL.
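One way to get this kind of fallback in plain Scala is to rely on the order of implicit resolution: implicits that are imported or defined locally win over those found in the companion object of the requested type. The sketch below is a toy model and not scala.meta's implementation (which, as described above, ties the default to the compiler version). The names `DialectFallbackSketch`, `NoXml`, `fallback` and `accepts` are hypothetical and introduced only for illustration.

```scala
object DialectFallbackSketch {
  // Hypothetical toy dialect with a single syntax switch.
  trait Dialect {
    def name: String
    def allowXmlLiterals: Boolean
  }

  object Dialect {
    // Lives in the implicit scope of Dialect, so it is only consulted
    // when no implicit Dialect is available in the lexical scope.
    implicit val fallback: Dialect = new Dialect {
      val name = "fallback"
      val allowXmlLiterals = true
    }
  }

  object dialects {
    // Must be imported explicitly to take effect.
    implicit object NoXml extends Dialect {
      val name = "NoXml"
      val allowXmlLiterals = false
    }
  }

  // Toy "syntactic API": rejects XML-looking input unless the dialect allows it.
  def accepts(code: String)(implicit d: Dialect): Boolean =
    d.allowXmlLiterals || !code.trim.startsWith("<")

  // No dialect imported: the fallback from the companion object is used.
  def withFallback: Boolean = accepts("<xml />")

  // An explicitly imported dialect takes precedence over the fallback.
  def withImport: Boolean = {
    import dialects.NoXml
    accepts("<xml />")
  }
}
```

Here `withFallback` picks up `Dialect.fallback`, whereas `withImport` uses the imported `NoXml` dialect and therefore rejects the XML literal.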
3) Semantic APIs such as name resolution, typechecking, enumeration of members of a given type, etc.
These operations need an index that keeps track of definitions available in the program and its dependencies.
We encapsulated such an index in a dedicated trait called Mirror and
require a mirror in all semantic APIs.
To illustrate this point, here's a simplified excerpt from scala.meta
that demonstrates the definition of Mirror and one of the associated semantic operations.
package meta {
  trait Mirror {
    private[meta] def dialect: Dialect
    private[meta] def defn(ref: Ref): Member
    ...
  }

  object Mirror {
    implicit def mirrorToDialect(mirror: Mirror): Dialect = {
      mirror.dialect
    }
  }
}

package object meta {
  implicit class XtensionTypeRef(tree: Type.Ref) {
    def defn(implicit m: Mirror): Member = {
      m.defn(tree)
    }
  }

  ...
}

Much like a dialect, a mirror is also an opaque trait with all its logic concentrated in internal methods.
Analogously to syntactic operations, semantic operations take an implicit mirror.
Additionally, since a mirror must be aware of its language version,
it has a dialect and can be converted to a dialect enabling syntactic APIs.
Here's an example that creates a mirror from a JVM environment that contains the standard library
and then transparently uses this mirror to resolve the identifier List,
obtaining a scala.meta representation of its definition. In this example, we use quasiquotes,
a convenient notation for abstract syntax trees.
scala> implicit val m = Mirror(".../scala-library-2.11.8.jar")
m: Mirror = ...

scala> q"List".defn
res3: Member.Term =
object List extends SeqFactory[List] with Serializable { ... }

This design of metaprogramming operations requires programs written using scala.meta
to explicitly request capabilities that describe their needs. For example, a formatter
will probably be okay with a dialect, whereas a linter will likely need a mirror.
Replacing universes with capabilities has been a significant improvement of user experience.
First, some metaprograms don't need explicit capabilities, which means that both they and their usages
are going to be more concise than in scala.reflect.
Secondly, in scala.reflect, both writers and callers of metaprograms have to worry about universes,
whereas in scala.meta capabilities are typically implicit, so they can be passed around automatically.
Finally, capabilities present a much smaller cognitive load, only requiring their users to understand implicits,
in contrast to universes that make use of advanced aspects of path-dependent types.
The only place where capabilities cause issues is the universal methods toString, hashCode and equals
that are inherited from Any, the top type of the Scala type system.
These methods have hardcoded signatures that can't accommodate additional implicit parameters,
which presents a serious design problem.
Despite the difficulties, we've been able to ensure sensible behavior for stringification.
On the one hand, we provide a dedicated API for customizable prettyprinting, which does take a dialect.
On the other hand, for every object whose prettyprinting depends on a dialect, we remember if a particular dialect
was used during its creation (e.g. a dialect to parse code into a tree) and then use it when toString is called.
The problem with hashing and equality is more challenging. Maps and sets in both Scala and Java standard libraries
use hashCode and equals, and that makes them unusable for abstract syntax trees whose equality relies
on name resolution. Since, unlike with dialects, there's no mirror that could work as a sensible default, we currently
give up and require metaprogrammers to use customized collections.
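To give a rough idea of what such a customized collection might look like, here is a minimal sketch that wraps keys so that `hashCode` and `equals` delegate to an externally supplied equality. `Equality`, `Key` and `EqMap` are hypothetical names that are not part of scala.meta, and case-insensitive strings merely stand in for trees whose equality depends on name resolution.

```scala
import scala.collection.mutable

object CustomEqualitySketch {
  // Hypothetical: an equality relation together with a compatible hash function.
  trait Equality[A] {
    def equal(x: A, y: A): Boolean
    def hash(x: A): Int
  }

  // Wrapper whose equals/hashCode delegate to the supplied Equality,
  // so that standard hash maps can be reused unchanged.
  final class Key[A](val value: A, equality: Equality[A]) {
    override def hashCode(): Int = equality.hash(value)
    override def equals(other: Any): Boolean = other match {
      case that: Key[_] => equality.equal(value, that.value.asInstanceOf[A])
      case _            => false
    }
  }

  // A thin map facade that wraps every key on the way in.
  final class EqMap[A, B](equality: Equality[A]) {
    private val underlying = mutable.Map.empty[Key[A], B]
    def update(k: A, v: B): Unit = underlying.update(new Key(k, equality), v)
    def get(k: A): Option[B] = underlying.get(new Key(k, equality))
    def size: Int = underlying.size
  }

  // Example equality: strings compared ignoring case.
  val caseInsensitive: Equality[String] = new Equality[String] {
    def equal(x: String, y: String) = x.equalsIgnoreCase(y)
    def hash(x: String) = x.toLowerCase.hashCode
  }
}
```

With this in place, keys that differ only in case share a single entry. For trees, the equality would instead consult a mirror, which is exactly the part that cannot be squeezed into the signatures of `hashCode` and `equals`.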
To put it in a nutshell, the architecture of scala.meta provides tangible benefits over scala.reflect.
From the point of view of plumbing, metaprograms written with scala.meta are more concise and easier to understand,
because the infrastructure makes use of less advanced language mechanisms. The only significant issue
that we observed is the necessity for custom maps and sets to accommodate custom hashing and equality for abstract syntax trees.
The isImmutable metaprogram

We call a value immutable if it cannot be modified and all its fields themselves have immutable types.
In this section, we will use scala.meta to define a method called isImmutable
that checks whether a given type is immutable, i.e. whether all its values are guaranteed to be immutable.
For example, the immutability check on class C(x: Int) will fail,
because someone can create a subclass of C and add mutable state to that subclass.
Continuing our example, if we add final to the definition of class C, the immutability check
will succeed, because now values of type C must be instances of class C,
and the only field of that class is immutable and has primitive type.
Strictly speaking, even immutable fields can be modified by JVM reflection,
which means that isImmutable can only really succeed on primitives.
However, such use of JVM reflection is strongly discouraged in the Scala community,
therefore, without the loss of usefulness of our immutability check,
we will assume that immutable fields are allowed.
import scala.meta._

def isImmutable(t: Type)(implicit m: Mirror): Boolean = {
  val cache =
    scala.collection.mutable.Map[Type, Boolean]().
    withEquality(Type.equality)

  def uncached(t: Type): Boolean = {
    t match {
      case t"$t ..@$annots" =>
        cached(t)
      case t"$t forSome { ..$defns }" =>
        cached(t)
      case t"..$parents { ..$defns }" =>
        parents.exists(cached)
      case t"$_.type" | t"${_: Lit}" =>
        cached(t.widen)
      case t"${ref: Type.Ref}[...$_]" =>
        if (ref.defn.isFinal && (ref.defn.tpe =/= t"Array")) {
          if (t.vars.nonEmpty) return false
          // Avoid naming the lambda parameter `m`: that would shadow the implicit mirror.
          val fieldTypes = (t.vals ++ t.objects).map(_.tpe)
          fieldTypes.forall(cached)
        } else {
          false
        }
      case _ =>
        sys.error("unsupported type: " + t)
    }
  }

  def cached(t: Type) = {
    cache.getOrElseUpdate(t, { cache(t) = true; uncached(t) })
  }

  cached(t)
}

The happy path of our algorithm happens in case t"${ref: Type.Ref}[...$_]".
First, we check whether the given type refers to a final class
or to an object (objects are singletons, so they are implicitly final).
Afterwards, we go through the members of the type.
While iterating through members, we only look into those that define terms,
skipping methods (because methods don't define state), bailing on var
and recursively checking types of vals and nested objects.
To be precise, we could allow definitions that aren't final. Sealed traits and classes
cannot be extended outside of their compilation units. This means that such definitions can be immutable as long as all their
subclasses, which are statically known, are immutable.
However, handling that case correctly requires a workaround or a fix for SI-7046,
which are both beyond the scope of this overview.
Now let's get into nitty-gritty details.
In our experience, metaprogramming in Scala makes comprehensive handling of corner cases unusually hard.
The language model is quite sizeable, so it takes a while to ensure that all possibilities are covered.
isImmutable is no exception - its full code is four times bigger than its happy path.
First, we create the infrastructure to avoid infinite recursion.
We use straightforward memoization with a minor change that postulates that everything is immutable unless proven otherwise
(cache(t) = true). This change is necessary to handle circular dependencies,
i.e. situations when a class A has a field of type B and a class B has a field of type A.
Here, we have to accommodate the fact that semantic equality that is necessary to correctly compare types
doesn't work out of the box with maps from the standard library as described in "Architecture".
The withEquality method is a custom helper that makes maps respect custom equality schemes.
Implementation of this helper is omitted for brevity.
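The memoization strategy described above (assume immutability until proven otherwise) can be tried out independently of scala.meta. The sketch below uses a hypothetical toy model in which a class is described by its finality and the type names of its fields. `ClassInfo` and the string-keyed class table are assumptions made for illustration and have nothing to do with the actual scala.meta trees.

```scala
import scala.collection.mutable

object CycleSafeCheckSketch {
  // Hypothetical toy model of a class: finality plus the type names of its fields.
  final case class ClassInfo(isFinal: Boolean, varTypes: List[String], valTypes: List[String])

  def isImmutable(root: String, classes: Map[String, ClassInfo]): Boolean = {
    val cache = mutable.Map.empty[String, Boolean]

    def uncached(name: String): Boolean = classes.get(name) match {
      case Some(info) =>
        info.isFinal && info.varTypes.isEmpty && info.valTypes.forall(cached)
      case None =>
        false // unknown classes are conservatively treated as mutable
    }

    def cached(name: String): Boolean = cache.get(name) match {
      case Some(known) => known
      case None =>
        cache(name) = true // optimistic assumption that breaks cycles
        val result = uncached(name)
        cache(name) = result
        result
    }

    cached(root)
  }
}
```

Without the optimistic entry, checking a class A with a field of type B, where B in turn has a field of type A, would recur forever. With it, the second visit to A returns immediately.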
Afterwards, we go through all flavors of types that are supported by Scala, and make sure that
our algorithm works correctly on them.
First, we handle types that classify values. Along with such types,
the scala.meta language model also features others, e.g. Type.Method
(a type that encodes method signatures and contains information about parameters and return types)
and Type.Wildcard (a type that encodes an unknown type during type inference).
The immutability check doesn't make sense for types that don't classify values,
so we just error out when encountering them.
Annotated types (T @annotation) and existential types (T forSome { ... })
are trivially unwrapped and processed recursively, ignoring possible annotations
and existential definitions inside curly braces.
Refined types (T with U { ... }) are immutable as long as any parent is immutable.
If, for such a type, one of the parents is immutable (i.e. final), this means that there can't exist any class,
apart from such parent, whose instances conform to such type. Therefore, the refined type is either equivalent
to that parent (if the parent conforms to the type) or is uninhabited.
In both cases, the immutability check succeeds.
Singleton types (x.type, this.type and 42.type) are immutable if and only if
the type of the singleton is immutable. In order to check the underlying type, we call Type.widen
and then recur.
Finally, there's the happy path that consists in references to term or type definitions
and type applications thereof (T, List[T]). We have already described it above.
This analysis covers the full spectrum of Scala types, completing the implementation of isImmutable.
As we will see in later sections, isImmutable can run both at compile time and at runtime.
Differences from scala.reflect


It is enough to simply import scala.meta._
in order to get access to the entire scala.meta API. In our personal experience of using scala.meta
after years of writing scala.reflect metaprograms, this relatively minor aspect is particularly refreshing.


With the cake gone, users of scala.meta have to explicitly request capabilities depending on the needs of their metaprograms.
isImmutable actively uses semantic APIs, e.g. name resolution and member enumeration,
so it requests a mirror.


The syntactic overhead imposed by implicit parameters required by scala.meta
is significantly smaller than the ceremony required to carry around universes in scala.reflect.
Moreover, thanks to these implicit parameters we can easily classify scala.meta metaprograms
to see, for example, in what environments they can execute.

Scala.reflect uses about a dozen different entities to represent Scala programs.
The most important data structures are trees, symbols and types,
but there's a long tail of highly-specialized data structures like modifiers and flags that often overlap
with the main ones.

In contrast, scala.meta represents Scala programs with tokens, which describe low-level syntactic details,
and abstract syntax trees, which describe everything else. This design is remarkably minimalistic.
Therefore, idioms that required symbols and types in scala.reflect are now consistently using abstract syntax trees.

As a result, we're able to take apart the given type using a WYSIWYG notation provided by quasiquotes.
Moreover, instead of working with scala.reflect symbols, which represent an approximation of definitions
with their own dedicated set of operations, we take the given type and
go directly to the definition that it refers to using the Ref.defn operation.

Other differences are not so major, and are mostly the consequence
of scala.meta being optimized for ease of use. For example, one can notice that instead of
going through t.members like in scala.reflect, we use more specific helpers t.vars, t.vals and t.objects.
Also, we don't have to remember things like Symbol.typeSignatureIn to obtain precise type signatures of members.

From the discussion above, we can see that scala.meta is simpler than scala.reflect -
both from the point of view of high-level architecture and from the point of view of the language model.
While these simplifications can't directly influence the inherent complexity of metaprogramming Scala,
they lower the barrier to entry and make metaprograms more robust.
Compile-time execution

In the listing below, we define a trait Immutable[T] and a macro materialize[T]
that generates a value of type Immutable[T] if its type argument is immutable
and fails with an error otherwise.
Since our goal in this section is to highlight the most important aspects of scala.meta,
below we only provide a brief description of how the macro works.
We refer curious readers to
a dedicated write-up
for a full explanation of underlying mechanisms.
import scala.meta._

trait Immutable[T]

object Immutable {
  inline implicit def materialize[T]: Immutable[T] = meta {
    if (isImmutable(T)) q"null"
    else abort(T.origin, T + " is not immutable")
  }
}

The cornerstone of this example is the interaction between the newly introduced mechanisms of inline and meta
that capture the essence of macro expansion. The inline modifier on a method indicates to the compiler that
invocations of that method need to be replaced with the method body having formal parameters substituted with real arguments.
Meta blocks wrap metaprograms written against scala.meta. Having encountered such a block, the compiler
runs the corresponding metaprogram, wrapping itself in a Mirror and passing this mirror to the metaprogram.
After the metaprogram returns, its result replaces the original block.
As a result, in this new system, macro applications such as materialize[MyConfig] expand in two steps.
First, the invocation of the macro gets inlined and occurrences of T in the macro body are replaced with
their actual values, resulting in meta { if (isImmutable(t"MyConfig")) ... }.
Afterwards, the meta block gets evaluated, performing the immutability check.
If the immutability check succeeds, macro expansion succeeds by producing a trivial instance of Immutable.
Since Immutable is just a marker, and there are no methods to call on it, we return null
to avoid runtime performance costs of instantiating and then garbage collecting dummy instances of Immutable.
q"null" used here is a quasiquote, a notation to create abstract syntax trees from snippets of Scala code.
If the immutability check fails, macro expansion fails by calling abort
which will result in a compilation error. This error will be positioned at the callsite of the macro
and will provide a helpful error message. The possibility to produce domain-specific errors
has proven to be one of the strong points of macros.
Going back to the architecture of scala.meta described in "Architecture",
we recall that most scala.meta metaprograms and APIs require certain capabilities to be able to execute.
For example, isImmutable needs a mirror and quasiquotes need a dialect. This contract is successfully met
by meta blocks, because they provide a mirror, which is the most powerful capability in scala.meta.
Note how all this machinery happens transparently for the metaprogrammer thanks to Scala implicits.
Let's put the materialize macro to good use. Suppose we have a slow algorithm
compute that takes an input configuration and then runs for a while, probably launching new threads
running in parallel with the main program.
trait Config {
  def param1: Parameter
  ...
}

def compute(c: Config) = { ... }

Now, we want to make sure that, while the algorithm is running, no one can modify its configuration
from a different thread. In order to guarantee that, we require callers to provide evidence
that the configuration is immutable.
def compute[C <: Config](c: C)(implicit ev: Immutable[C]) = { ... }

Thanks to the mechanism of implicits, whenever the programmer
doesn't provide the evidence manually (which is typical, because writing evidence by hand is very tedious),
the compiler will insert the call to the materialize macro that will validate the fact
that the static type of the configuration is immutable by running isImmutable.
If the immutability check fails, a compile-time error will be issued by the macro.
final case class MyConfig(param1: Parameter) extends Config
val myConfig: MyConfig = obtainConfig()
compute(myConfig)
// equivalent to: compute(myConfig)(Immutable.materialize[MyConfig])
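
The evidence-passing part of this pattern can be tried out without macros. In the sketch below, a hand-written implicit value stands in for what materialize would generate. `FrozenConfig`, `OpenConfig` and the `Int`-typed parameter are hypothetical simplifications introduced for illustration.

```scala
object EvidenceSketch {
  trait Config { def param1: Int }

  // Marker trait: an instance is evidence that T is immutable.
  trait Immutable[T]

  final case class FrozenConfig(param1: Int) extends Config
  object FrozenConfig {
    // Hand-written stand-in for the evidence that a macro would materialize.
    implicit val immutable: Immutable[FrozenConfig] = new Immutable[FrozenConfig] {}
  }

  // Non-final and mutable; deliberately has no Immutable evidence.
  class OpenConfig(var param1: Int) extends Config

  // Callers must supply (or let the compiler find) evidence for C.
  def compute[C <: Config](c: C)(implicit ev: Immutable[C]): Int = c.param1 * 2

  val result: Int = compute(FrozenConfig(21))
  // compute(new OpenConfig(1)) is rejected at compile time:
  // no implicit Immutable[OpenConfig] can be found.
}
```

The difference with the macro-based version is only in where the evidence comes from: here it has to be written by hand for every configuration class, whereas materialize derives it from the structure of the type.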

The technique of programmatic generation of implicit arguments that we explored in this toy example is actually very useful in practice.
Its main applications lie in the area of generic programming,
and it powers several macros
that are practically important to the Scala community.
Runtime execution

Continuing the example from "Compile-time execution",
we suppose that compute no longer knows static types of its input configurations.
For example, let's imagine that configurations are now dynamically deserialized from binary payload.
In such a situation, we can still make use of isImmutable as demonstrated in the listing below.
def compute(c: Config) = {
  import scala.meta._

  def isImmutable(t: Type)(implicit m: Mirror): Boolean = {
    // source code taken from "The isImmutable metaprogram"
    ...
  }

  implicit val m = Mirror(c.getClass.getClassLoader)
  val t = c.getType
  if (!isImmutable(t)) sys.error(t + " is not immutable")

  ...
}

In order to execute isImmutable at runtime, we need to obtain a mirror.
This can be done via calling a straightforward factory method provided by a runtime implementation of scala.meta.
There are minor differences between the functionality available during compilation and at runtime. On the one hand,
abort and some other macro APIs only make sense within a compiler environment.
On the other hand, runtime reflection relies on runtime classes that may be
unavailable during compilation. Nonetheless, most scala.meta APIs,
i.e. everything that is necessary to run isImmutable, are independent of an environment.
Now that we know how to execute isImmutable at runtime, let's figure out how to obtain
a scala.meta type from a given config object in order to get the immutability check going.
When isImmutable was running inside the compiler, the entire environment was working
in terms of a pretty rich metaprogramming API implemented in compiler internals.
At runtime, introspection happens in terms of the JVM object model, so we need to adapt it
to the scala.meta way.
In order to do that, we create a scala.meta mirror based on a class loader,
which is an entity that encapsulates a JVM environment.
Afterwards, we use this mirror to convert the JVM type of the config to a scala.meta type
via a helper extension method provided by scala.meta.
Unfortunately for our use case, Scala and therefore scala.meta use a type system
that is much richer than the type system of the JVM.
The only types available for introspection on the JVM are primitives, generic arrays,
as well as non-generic classes and interfaces. As a result, the type extracted from the config object
is going to be imprecise, with the most important problem of lacking potential type arguments
that are erased at runtime. In the listing below, we illustrate this principle on a series of examples
in a REPL session.
scala> final class Metadata { ... }
defined class Metadata

scala> final case class MyConfig[T](payload: T, metadata: List[Metadata])
defined class MyConfig

scala> val c = new MyConfig(42, Nil)
c: MyConfig[Int] = MyConfig(42,List())

scala> c.getClass
res0: Class[_ <: MyConfig[Int]] = class MyConfig

scala> import scala.meta._
import scala.meta._

scala> implicit val m = Mirror(c.getClass.getClassLoader)
m: Mirror = ...

scala> val t = c.getType
t: Type = MyConfig

scala> t.vals("payload").tpe
res2: Type = => T

scala> t.vals("metadata").tpe
res3: Type = List[Metadata]

As can be seen above, even though the Scala type of the config is MyConfig[Int],
its JVM type is just MyConfig. Therefore, after we go back from a JVM type to the Scala type,
the resulting Scala type is also just MyConfig.
Consequently, the type of c.payload is calculated as => T, meaning "a getter that returns a T",
where T is the type parameter of MyConfig. As a result, the immutability check
for the config will fail, because in the general case T can be anything, including a mutable type.
On the bright side, once we get into the realm of Scala types, we can continue operating in terms of Scala types.
Therefore, the type of c.metadata, which doesn't depend on T, is actually precise, saying List[Metadata].
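The loss of type arguments can be observed with plain JVM reflection, without involving scala.meta at all. The sketch below reuses the class definitions from the REPL session and only relies on `getClass` and `java.lang.reflect.Field`. `ErasureSketch` and the names of its values are hypothetical.

```scala
object ErasureSketch {
  final class Metadata
  final case class MyConfig[T](payload: T, metadata: List[Metadata])

  val ints = MyConfig(42, Nil)
  val strings = MyConfig("forty-two", Nil)

  // Both instances share one runtime class: the type argument is gone.
  val sameRuntimeClass: Boolean = ints.getClass == strings.getClass

  // The JVM field that backs `payload` is declared with the erased type Object.
  val erasedPayloadType: Class[_] =
    classOf[MyConfig[_]].getDeclaredField("payload").getType
}
```

Since `MyConfig[Int]` and `MyConfig[String]` are indistinguishable at this level, a type recovered from a runtime value has to fall back to the unapplied `MyConfig`.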
Type erasure of the JVM can be selectively overcome
with type tags from scala.reflect
by applying manual annotations on a case-by-case basis. We haven't experimented with this mechanism in scala.meta yet,
so we leave discussing it to future work.
This wraps up the overview of scala.meta on the JVM. Even though there's an abstraction gap between
the theoretical language model of Scala and the actual environment of the JVM, scala.meta does its best
to bridge this gap. Information obtained from dynamic values may be incomplete because of type erasure mandated by the JVM,
but static program structure is available with full fidelity.
Other environments

Apart from the JVM, Scala also supports other platforms, namely, JavaScript and native code.
We haven't implemented scala.meta for them and doubt that it will ever be possible to do that,
which means that running isImmutable in those environments is most likely out of the question.