This transformation technique is used in @effectful/js and I use the same for other compilers (not based on babel though).
As the first step, this splits AST into a stream of tokens (simle ECMAScript Iterator), and as the last step, it builds the node back from the argument Iterator. There are a few functions for transforming the streams in the middle (stream transducers). They take an Iterable object and return some other (transformed) Iteratable. In most cases, the transformer function (transducer) is a Generator.
This way, transformations composition is just a plain function's composition. E.g. node => consume(transform1(transform2(produce(node))))
or with any pipe function/operator. And it is just a single pass. All transforms run simultaneously passing the transformed tokens to the next one.
That is the main goal of applying generators.
Mutating AST and traversing at the same time may be painful sometimes. To reduce difficulties someone may split passes and rebuild the whole tree on each pass. But, even not considering performance, this won't work at some transforms which must run in parallel.
E.g. real usage example, where I utilized this approach, but for another language. There is a function overloading by type of arguments. The result type of a function call depends on a type of arguments. And I want to handle the overloading in a separate pass. If passes run separately, it is impossible for expressions like (f(g(a))
). The checker needs to resolve g
overload before proceeding to f
overloads. And there are other passes to implement other aspects of a type checker. It is not a problem if checking passes are in generators. All the aspects implementations are separate (almost)independent passes.
Regarding performance, for generators, no intermediate ASTs are constructed. They just pass lazy streams. The stream may even be infinite.
Mutations isn't strictly Visitor's problem, for example Visitor's interface in TypeScript (unlike babel) copies AST between passes, but generators still have more benefits.
Babel merges visitors into a single traversal but sometimes (often) to decide about transform it may need to look further. And maybe far further. E.g. var
variable declaration may be even after even some ancestor node. So we have to run nested traversal to query information needed for the transform. The data must be cached to avoid traversals cascading.
This is very error-prone. It requires complex cache management. And it is simple to forget to cache some degrading plugin performance.
With generators pipeline, everything is controlled in a simple way. There is no way to run nested traversal. If transform pass needs to query some info I can simply split it into two passes (query and actual transform) and add Array.from
between them. Thus explicitly ensuring the query pass ends before transform pass.
Because it is just functions and streams they are easy to combine filter etc.
There is a small library to simplify working with hierarchical tokens structure. Here are some examples:
- skipping all children nodes
for(const i of s.sub()) {}
- copying children nodes unchanged (it is not like
path.skip
, next transform will still receive them) -yield* s.sub()
- handling nested structure is just a recursive call same traversal generator with
yield*
- no needs for classes to store state or node's cache for a local state, the state is just local variables of a generator function
- no needs for stop traversal protocol in Visitors as they just
break
/continue
with a nested loop and possibly with labels.
They are needed to pass collected information into next nodes.
Here is a bigger example converting expressions like (() => { return X)()
into X
.
const etaReduce = Kit.pipe(
Match.run("=(() => { return $$; })()"),
function(s) {
s = Kit.auto(s)
return walk()
function* walk() {
for(const i of s.sub()) {
if (i.type === Match.Root) {
if (i.enter) {
for(const j of s.sub()) {
if (j.enter && j.type === Match.Placeholder)
yield* Kit.reposeOn(walk(),i.pos)
}
}
} else
yield i
}
}
})
Here pipe
is just a function composition.
Match.pipe
matches input tokends and injects Match.Root
/Match.Placeholder
tokens into output.
Kit.auto
- adds helper fuction like sub
(returns all children iterator) into input Iterator. Kit.reposeOne
changes position of a token. Position is a field name in AST node.
const map = fun => function* map(inp) {
for(const i of inp)
yield fun(i)
}
const filter = pred => function* map(inp) {
for(const i of inp)
if (pred(i))
yield i
}
// even stateful
const take = num => async function*(input) {
let count = 0
for await(const i of input) {
yield i
if (++count === num)
return
}
}
- @effectful/transducers - the framework (however I plan a big API rewrite). There is a
src\samples
folder with a few samples. - @effectful/js - the project using the framework, all the transforms are in
core
package. - Simpler Transducers for JavaScript - explaining the techniques with more details
- Decouple Business Logic using Async Generators and Async Generators as an alternative to State Management - the same technique but for UI
- Lazy v. Yield: Incremental, Linear Pretty-printing - the original article where I learned this technique