Skip to content

Instantly share code, notes, and snippets.

@awto
Last active May 26, 2019 14:48
Show Gist options
  • Save awto/c170564cc038a812bf239c59d4215f3f to your computer and use it in GitHub Desktop.
Save awto/c170564cc038a812bf239c59d4215f3f to your computer and use it in GitHub Desktop.
Using Generators instead of Visitors simplifies compiler's development

AST transform using Generators

This transformation technique is used in @effectful/js and I use the same for other compilers (not based on babel though).

As the first step, this splits AST into a stream of tokens (simle ECMAScript Iterator), and as the last step, it builds the node back from the argument Iterator. There are a few functions for transforming the streams in the middle (stream transducers). They take an Iterable object and return some other (transformed) Iteratable. In most cases, the transformer function (transducer) is a Generator.

This way, transformations composition is just a plain function's composition. E.g. node => consume(transform1(transform2(produce(node)))) or with any pipe function/operator. And it is just a single pass. All transforms run simultaneously passing the transformed tokens to the next one.

Motivation

No AST mutations and intermediate values

That is the main goal of applying generators.

Mutating AST and traversing at the same time may be painful sometimes. To reduce difficulties someone may split passes and rebuild the whole tree on each pass. But, even not considering performance, this won't work at some transforms which must run in parallel.

E.g. real usage example, where I utilized this approach, but for another language. There is a function overloading by type of arguments. The result type of a function call depends on a type of arguments. And I want to handle the overloading in a separate pass. If passes run separately, it is impossible for expressions like (f(g(a))). The checker needs to resolve g overload before proceeding to f overloads. And there are other passes to implement other aspects of a type checker. It is not a problem if checking passes are in generators. All the aspects implementations are separate (almost)independent passes.

Regarding performance, for generators, no intermediate ASTs are constructed. They just pass lazy streams. The stream may even be infinite.

Mutations isn't strictly Visitor's problem, for example Visitor's interface in TypeScript (unlike babel) copies AST between passes, but generators still have more benefits.

Avoiding nested traversal

Babel merges visitors into a single traversal but sometimes (often) to decide about transform it may need to look further. And maybe far further. E.g. var variable declaration may be even after even some ancestor node. So we have to run nested traversal to query information needed for the transform. The data must be cached to avoid traversals cascading.

This is very error-prone. It requires complex cache management. And it is simple to forget to cache some degrading plugin performance.

With generators pipeline, everything is controlled in a simple way. There is no way to run nested traversal. If transform pass needs to query some info I can simply split it into two passes (query and actual transform) and add Array.from between them. Thus explicitly ensuring the query pass ends before transform pass.

Everything is just much simpler

Because it is just functions and streams they are easy to combine filter etc.

There is a small library to simplify working with hierarchical tokens structure. Here are some examples:

  • skipping all children nodes for(const i of s.sub()) {}
  • copying children nodes unchanged (it is not like path.skip, next transform will still receive them) - yield* s.sub()
  • handling nested structure is just a recursive call same traversal generator with yield*
  • no needs for classes to store state or node's cache for a local state, the state is just local variables of a generator function
  • no needs for stop traversal protocol in Visitors as they just break/continue with a nested loop and possibly with labels.

Easy to add temporal, control nodes

They are needed to pass collected information into next nodes. Here is a bigger example converting expressions like (() => { return X)() into X.

const etaReduce = Kit.pipe(
  Match.run("=(() => { return $$; })()"),
  function(s) {
    s = Kit.auto(s)
    return walk()
    function* walk() {
      for(const i of s.sub()) {
        if (i.type === Match.Root) {
          if (i.enter) {
            for(const j of s.sub()) {
              if (j.enter && j.type === Match.Placeholder)
                yield* Kit.reposeOn(walk(),i.pos)
            }
          }
        } else
          yield i
      }
    }
  })

Here pipe is just a function composition.

Match.pipe matches input tokends and injects Match.Root/Match.Placeholder tokens into output.

Kit.auto - adds helper fuction like sub (returns all children iterator) into input Iterator. Kit.reposeOne changes position of a token. Position is a field name in AST node.

Easy to define combinators, if needed

const map = fun => function* map(inp) {
  for(const i of inp) 
     yield fun(i)
 }

const filter = pred => function* map(inp) {
  for(const i of inp)
     if (pred(i)) 
        yield i
 }

// even stateful

const take = num => async function*(input) {
  let count = 0
  for await(const i of input) {
    yield i
    if (++count === num)
      return
  }
}

More information

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment