D4, deconstructed


Heal the World

Let's consider a class declaration inside a quasi block:

quasi {
    class C {}
}

We definitely don't want it to be visible before the quasi block has been applied somewhere. (And we probably want to complain if it's applied twice, since that would amount to a class redeclaration.)

So whatever we do to "register" class C with the main program, we have to delay until macro apply time. Fine. But now let's consider the following:

quasi {
    class C {
        has C $.foo;
    }
}

In regular code, the above is completely fine. It should be completely fine within a macro, too. But if we defer all type registration until macro apply time, the parser won't know what C is when it sees the attribute declaration. And it will fail. Not what we want either.
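To make the failure mode concrete, here is a minimal sketch in Python, used purely as a modeling language. `parse_class`, `UnknownType` and the dict-based symbol table are hypothetical stand-ins, not Rakudo internals. The sketch resolves attribute type names against a symbol table while walking a class body, and shows that `has C $.foo` resolves only if `C` is registered before its body is parsed.

```python
class UnknownType(Exception):
    """Raised when an attribute's type name isn't in the symbol table."""


def parse_class(name, attribute_types, symbols, defer_registration):
    """Hypothetical model of parsing `class name { has T $.x; ... }`.

    `symbols` stands in for the symbol table the parser consults.
    With defer_registration=False the class name is registered before
    its body is parsed, as in ordinary code; with True it is registered
    only afterwards, which is what "defer everything to apply time" means.
    """
    if not defer_registration:
        symbols[name] = {"kind": "class", "attrs": []}
    resolved = []
    for type_name in attribute_types:
        if type_name not in symbols:
            raise UnknownType(type_name)
        resolved.append(type_name)
    if defer_registration:
        symbols[name] = {"kind": "class", "attrs": resolved}
    else:
        symbols[name]["attrs"] = resolved
    return symbols[name]


# `has C $.foo` inside class C: fine when C is registered up front...
print(parse_class("C", ["C"], {}, defer_registration=False)["attrs"])  # ['C']

# ...but it fails when registration is deferred.
try:
    parse_class("C", ["C"], {}, defer_registration=True)
except UnknownType as err:
    print("unknown type:", err)  # unknown type: C
```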

This leads to the following two-step procedure:

  1. Parsing a quasi block: types, names, and other declarations are registered in a sandboxed symbol table. Action methods and meta-objects are invoked as usual, but everything happens within the sealed bubble of the quasi.

  2. Applying the macro: the sandboxed symbol table is unified with the mainline symbol table. Various other fixups are made to compensate for not running the actions at this point.
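The two steps can be sketched as follows, again in Python as a modeling language. `World`, `QuasiWorld`, `register`, `lookup` and `unify` are hypothetical names, not the actual SymbolTable API. The sandbox reads through to the global World, so `Int` resolves inside the quasi, but its own registrations stay local until `unify` runs at apply time. Applying the same quasi twice then surfaces as a redeclaration.

```python
class World:
    """Hypothetical stand-in for Rakudo's SymbolTable/World: name -> meta-object."""

    def __init__(self, parent=None):
        self.parent = parent  # enclosing World, consulted for lookups only
        self.symbols = {}

    def register(self, name, obj):
        if name in self.symbols:
            raise ValueError(f"redeclaration of {name}")
        self.symbols[name] = obj

    def lookup(self, name):
        if name in self.symbols:
            return self.symbols[name]
        if self.parent is not None:
            return self.parent.lookup(name)
        raise KeyError(name)


class QuasiWorld(World):
    """Step 1: a sealed World for one quasi. Reads fall through, writes stay local."""

    def unify(self):
        """Step 2: merge the sandbox into the parent World at macro apply time."""
        clashes = sorted(n for n in self.symbols if n in self.parent.symbols)
        if clashes:
            # Applying the same quasi twice lands here: a class redeclaration.
            raise ValueError("redeclaration of " + ", ".join(clashes))
        self.parent.symbols.update(self.symbols)


world = World()
world.register("Int", "<Int meta-object>")

sandbox = QuasiWorld(world)              # parsing `quasi { class C {} }`
sandbox.register("C", "<C meta-object>")
print(sandbox.lookup("Int"))             # <Int meta-object>  (read-through)
print("C" in world.symbols)              # False: not visible before apply

sandbox.unify()                          # macro applied
print(world.lookup("C"))                 # <C meta-object>
```

Checking all clashes before merging keeps `unify` all-or-nothing, so a failed second application leaves the global World untouched.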

As it happens, the last refactor of Rakudo (nom) gave us an object called SymbolTable, through which all transactions such as type registration go. (We often refer to SymbolTable as World, though the rename hasn't happened in code yet.) It's accessible from the grammar and actions through the contextual variable $*ST. It's becoming increasingly clear that parsing involves not only grammar and actions, as we've commonly believed, but three things:

  • Grammar (for syntax)
  • Actions (imperative aspects)
  • World/SymbolTable (declarative aspects)

The essence of D4 is that quasi blocks need their own little World, which is later incorporated into the global World.

Things get tricky

Now let's combine quasi-quoting placeholders and type unification:

quasi {
    role R[ {{{$type}}} ] {}
}

This example puts the sandboxed World in a position that the global World never has to face: the exact nature of the type isn't known at parse time! (Note that this example is conceptually very different from role R[$type] {}, which is just a normal signature. With the triple braces around $type, we're telling the compiler "don't worry, we'll put in the rest of the AST here in time.")
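One way to picture the difference is to model the placeholder as an AST node of its own, distinct from a signature parameter. The sketch below uses Python as a modeling language; `Placeholder`, `RoleDecl` and `fill` are hypothetical names. At parse time the role's parameter is a hole; at macro execution time `fill` substitutes the macro's value and the quasi collapses into ordinary code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placeholder:
    """A {{{$var}}} hole in a quasi's AST, filled at macro execution time."""
    var: str


@dataclass(frozen=True)
class RoleDecl:
    """`role name[param] {}`; param is a type name or still a Placeholder."""
    name: str
    param: object


def fill(node, bindings):
    """Collapse a quasi AST into ordinary code by substituting its placeholders."""
    if isinstance(node, Placeholder):
        return bindings[node.var]
    if isinstance(node, RoleDecl):
        return RoleDecl(node.name, fill(node.param, bindings))
    return node


# Parse time: `quasi { role R[ {{{$type}}} ] {} }`. R's parameter is unknown.
quasi_ast = RoleDecl("R", Placeholder("$type"))
print(isinstance(quasi_ast.param, Placeholder))  # True

# Macro execution time: the macro has computed $type.
print(fill(quasi_ast, {"$type": "Str"}))  # RoleDecl(name='R', param='Str')
```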

Arguably, this is the essence of the functionality that macros provide. They delay some of the decisions until macro expansion time (which is still during parse time, but possibly "much later").

So now we have three places where things happen:

  1. Parsing a quasi block: we can still do the bare minimum of registering types and things, but it won't be the "real deal" because there are potentially placeholders inside the type longnames.

  2. Executing the macro: the missing bits for the quasi are now known, so the quasi "collapses" into normal code without triple-brace placeholders in it. We can now run the type registrations that normal code runs as soon as possible at parse time.

  3. Applying the macro: same unification as before with the global World. No missing bits remain at this point, so this step isn't any more difficult.
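The three phases can be sketched as a small pipeline, with Python as the modeling language. All function names are hypothetical, declarations are simplified to `(role name, type parameter)` pairs, and plain strings such as `"S[Int]"` stand in for meta-objects. Parsing registers only what has no placeholder, execution runs the deferred registrations, and application is the same unification as before.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placeholder:
    """A {{{$var}}} hole in the quasi's AST."""
    var: str


def parse_quasi(role_decls):
    """Phase 1 (parse): register what we can; defer anything with a placeholder.

    role_decls is a list of (role name, type parameter) pairs.
    """
    sandbox, pending = {}, []
    for name, param in role_decls:
        if isinstance(param, Placeholder):
            pending.append((name, param))  # not the "real deal" yet
        else:
            sandbox[name] = f"{name}[{param}]"
    return sandbox, pending


def execute_macro(sandbox, pending, bindings):
    """Phase 2 (execute): placeholders are known, so run the deferred registrations."""
    collapsed = dict(sandbox)
    for name, param in pending:
        collapsed[name] = f"{name}[{bindings[param.var]}]"
    return collapsed


def apply_macro(collapsed, global_world):
    """Phase 3 (apply): the same unification with the global World as before."""
    clashes = sorted(set(collapsed) & set(global_world))
    if clashes:
        raise ValueError("redeclaration of " + ", ".join(clashes))
    global_world.update(collapsed)


sandbox, pending = parse_quasi([("R", Placeholder("$type")), ("S", "Int")])
collapsed = execute_macro(sandbox, pending, {"$type": "Str"})
world = {}
apply_macro(collapsed, world)
print(world)  # {'S': 'S[Int]', 'R': 'R[Str]'}
```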

If you've been following along thus far, you might object that the delayed type registration is too high a price to pay, and that the types should instead be registered — placeholders and all — at parse time, and then patched up afterwards with the correct information at macro execution time. But that won't work, because of things like this:

quasi {
    role R[ {{{$foo}}} ] {};
    class C does R[ {{{$foo}}} ] {};
}

We don't have enough information to build C until we know exactly what R is, and we don't know that until macro execution time, when the placeholders have been filled.
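A rough model of why C has to wait: composing a class from a parametric role copies the role's attribute into the class, so the role's parameter must be concrete at that point. The sketch uses Python as a modeling language. `ParametricRole`, `compose_class` and the assumed role body `has T $.value` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placeholder:
    var: str


@dataclass(frozen=True)
class ParametricRole:
    """Hypothetical `role R[::T] { has T $.value }`, instantiated with `param`."""
    name: str
    param: object  # a concrete type name, or a Placeholder before execution

    def attribute_type(self):
        if isinstance(self.param, Placeholder):
            raise TypeError(
                f"cannot compose {self.name}: placeholder {self.param.var} not filled yet"
            )
        return self.param


def compose_class(name, role):
    """Build `class name does role`: the role's attribute is copied into the class."""
    return {"name": name, "attrs": {"value": role.attribute_type()}}


# Parse time: R[ {{{$foo}}} ] is still a hole, so C can't be built.
try:
    compose_class("C", ParametricRole("R", Placeholder("$foo")))
except TypeError as err:
    print(err)  # cannot compose R: placeholder $foo not filled yet

# Macro execution time: $foo is known, so R and then C can be built.
print(compose_class("C", ParametricRole("R", "Int")))
# {'name': 'C', 'attrs': {'value': 'Int'}}
```

Because composition copies R's attribute into C, a scheme that built C early and patched R afterwards would also have to track down and patch every such copy.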

We're not done, it gets worse

Quasi-quotes and closures share a bit of semantics. With closures, the same block can be "cloned" many times and passed around to other lexical contexts while retaining ties to its original context. Quasi-quotes are also "cloned" zero or more times into distinct ASTs — and here the differences begin: an AST doesn't provide any ties back to its original context.

In other words, we want these to work the same:

sub   foo { my $a = 42; return { $a } }; foo()()
macro bar { my $a = 42; quasi  { $a } }; bar()

But in the former case, we have code-block-with-ties, and in the latter case we have a location-agnostic AST. Naively, the code expanded at bar() won't "find" $a, because it's in a lexical scope that isn't visible from where bar() is.
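The same contrast can be modeled in Python. A Python closure behaves like the `sub foo` case, while a bare name node stands in for the AST produced by `quasi { $a }`. `Var`, `bar` and `evaluate` are hypothetical names, and evaluating the node against the expansion site's lexicals is a simplification.

```python
from dataclasses import dataclass


def foo():
    a = 42
    return lambda: a  # closure: keeps a tie to foo's lexical scope


@dataclass(frozen=True)
class Var:
    """A lexical lookup in a quasi's AST: just a name, no environment."""
    name: str


def bar():
    a = 42  # bound here, but nothing below captures it
    return Var("a")  # location-agnostic AST for `quasi { $a }`


def evaluate(node, env):
    """Evaluate AST at the expansion site, using only that site's lexicals."""
    if isinstance(node, Var):
        return env[node.name]  # KeyError: `a` isn't visible where bar() expands
    return node


print(foo()())  # 42
try:
    evaluate(bar(), env={})
except KeyError as err:
    print("not found:", err)  # not found: 'a'
```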

So we do the following:

  1. Executing the macro: "Snapshot" the lexical environment around the quasi block and hang it off the AST for later reference.

  2. Applying the macro: Traverse the AST and "fix up" every lexical lookup to explicitly look within the stored lexical environment.
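Here is a sketch of those two steps, with Python as the modeling language. `QuasiAST`, `EnvLookup`, `execute_macro` and `apply_macro` are hypothetical names, and the AST is a flat list of nodes to keep things small. `dict(lexical_env)` copies the bindings at execution time; a real implementation might instead keep a reference to the live lexical scope, as a closure does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Unresolved lexical lookup, as written inside the quasi."""
    name: str


@dataclass
class EnvLookup:
    """Lookup rewritten to go through the snapshotted environment."""
    env: dict
    name: str


@dataclass
class QuasiAST:
    body: list  # flat list of nodes, to keep the sketch small
    env: dict   # lexical snapshot hung off the AST


def execute_macro(body, lexical_env):
    """Step 1: snapshot the lexicals around the quasi and attach them to the AST."""
    return QuasiAST(body, dict(lexical_env))


def apply_macro(quasi):
    """Step 2: rewrite every lexical lookup to look in the stored environment."""
    return [EnvLookup(quasi.env, n.name) if isinstance(n, Var) else n for n in quasi.body]


def evaluate(node, site_env):
    if isinstance(node, EnvLookup):
        return node.env[node.name]
    if isinstance(node, Var):
        return site_env[node.name]
    return node


# macro bar { my $a = 42; quasi { $a } }; bar()
expanded = apply_macro(execute_macro([Var("a")], {"a": 42}))
print(evaluate(expanded[0], site_env={}))                 # 42
print(evaluate(expanded[0], site_env={"a": "caller's"}))  # 42: no accidental capture
```

The second call hints at why this belongs to hygiene: a same-named variable at the expansion site does not capture the quasi's `$a`.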

So the ASTs generated by quasi blocks aren't closures, but they need to stash a closure away somewhere to make lookups work out right.

This is the essence of D3, hygienic macros. But it's also inextricably intertwined with D4-ish things, because the serialization context needs to preserve the closure until run time.

Summary and conclusion

In general, a number of things that I thought could wait until D4 probably can't. They can probably be done in small increments as the need arises... but large bits of D4 turn out to be a natural part of D1, D2, and D3.

In retrospect, this isn't so surprising. The SymbolTable/World is now a central docking point in the design of Rakudo, and lots of things need to go through it, especially since it's also our SerializationContext. So it can't be delayed until the end; it needs to be integrated continually.

The good news, however, is that this is all very feasible in current Rakudo, since much design thought has already gone into giving the SymbolTable and serialization bits clean, well-defined interfaces. Most of the distributed D4 work will consist of co-opting and extending the bits that are already there.

If anything, since I started discussing these bits with jnthn in August, they've only become more tangible and well-defined. It's a pleasant coincidence that my interest in macros flares up just as nom (with its much clearer interfaces for this kind of work) reaches maturity.

I don't imagine that the above work will be trivial, or that it will be free of surprises. But the fact that it's possible to see this far into the forthcoming work feels very promising.
