
I'm thinking about macros again

Macros and closures are a bit alike. Here I'll try to nail down one of their differences.

Closures

A closure is a function which carries with it a lexical environment. The returned block below, for example, carries around the lexical environment containing the variable $counter:

sub create-counter(Int $start-value) {
    my Int $counter = $start-value;
    return { $counter++ };
}

my $c = create-counter(5);
say $c();   # 5
say $c();   # 6

That's all there really is to closures. They're routines whose lexical lookups land outside of them.

This simple feature means that ordinarily stack-allocated variables like $counter have to be allocated on the heap instead. Such a variable "outlives" the block where it was born, so it can't be reclaimed statically when that block exits; it has to be reclaimed dynamically. So, closures imply a GC. Cf. the funarg problem.
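A quick way to convince yourself that $counter really does outlive the call that created it, and that every call to create-counter gets its own fresh one, is to make two counters and interleave them. This is just the example above again, with a second counter added for illustration:

```raku
sub create-counter(Int $start-value) {
    my Int $counter = $start-value;   # one fresh $counter per call
    return { $counter++ };            # the block keeps its own $counter alive
}

my $c1 = create-counter(5);
my $c2 = create-counter(100);

say $c1();   # 5
say $c2();   # 100
say $c1();   # 6
```

Both calls to create-counter have long since returned by the time the blocks run, yet each block still finds its own $counter.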

Closures are very "natural" in that they don't really depend on an exception to any rule, or invent new rules. Rather, they simply take existing rules to their obvious conclusion: when you make a block a first-class object, and allow it to be passed around dynamically, while still making its lookups in its original static environment (and not artificially disallowing some lookups outside of the function, like Java does), what you get is a closure.

So closures get their strength from being passed around dynamically, while their lookups still statically follow the original OUTER paths. The alternative is imaginable, but horrible:

sub closure-returner {
    my $a = "correct value";
    return { say $a };
}

my $c = closure-returner();
my $a = "apfelstrudel";
$c();

We could imagine a world where the closure in $c got its OUTER set to the mainline code instead of, as per default, the closure-returner sub. That would print the value apfelstrudel, which was probably not at all what was intended. Worse, if we didn't declare an $a in the mainline, we'd get a runtime variable lookup error.

Ever gotten a runtime variable lookup error? No? Right, that's because lexically scoped languages tend to resolve variable lookups statically, at compile time. Lexical scoping is sometimes referred to as "static scoping". Even things like &eval resolve variable lookups during the compile phase, and only then run the code. Variable lookups are done statically, and so closures are natural.
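Perl 6 does let you opt out of static lookup one variable at a time, with dynamic variables (the * twigil). These are looked up through the caller chain rather than through OUTER, so they're not exactly the "rebound OUTER" world imagined above, but as a rough sketch they give the same flavour. Here's the apfelstrudel example again with $a turned into $*a (my adaptation, with the block returning the value instead of printing it):

```raku
sub closure-returner {
    my $*a = "correct value";   # gone from the call chain once we return
    return { $*a };             # looked up via the caller, not via OUTER
}

my $c = closure-returner();
my $*a = "apfelstrudel";
say $c();   # apfelstrudel
```

Take away the mainline declaration of $*a and the lookup only fails once $c() is actually called, which is the kind of run-time surprise that static lookup spares us.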

ASTs

Now think of the AST objects created by quasi-quotes.

macro create-counter(AST $start-value) {
    my Int $counter = $start-value.mumble.handwave.value;
    return quasi { { $counter++ } };
}

my $c = create-counter(5);
say $c();   # 5
say $c();   # 6

Yes, this is the same counter example as with closures, but adapted to work with macros. Instead of returning a closure, the macro returns an AST describing a closure. Not an actual closure, then, but more like the potential for one. ASTs are "undigested" code, and can turn into code anywhere we like in the program. In our example, it's the macro call at this line:

my $c = create-counter(5);

that conceptually gets converted into this:

my $c = { $counter++ };

Because that's what macro calls do. They halt the parser, and conceptually replace the entire macro call with the entire returned AST. (The process is shown at the source code level above for expository purposes; in actual fact it takes place at the AST level.)

But hold on! This closure is a regular closure, but it doesn't have the nice feature that we associate with good, well-behaved closures, namely that its OUTER chain goes through its native environment. Unless we take action, this closure will get the mainline as the environment, and that won't do, because then $counter won't even resolve, and the macro application will fail. (If we're lucky.)

So, somewhat paradoxically, we have to cheat here, and re-bind the OUTER of this new closure to point to the lexical environment of the macro create-counter. We didn't have to do any such magic rebinding with closures, because they're already defined in their proper environment. But macros, essentially a code copying mechanism, create closures all over the place, most of which will have to be post-processed to get the right OUTER.

In practice, this means that every AST object has to hold a reference to the environment where the AST was "taken". This was not obvious to me before I started thinking about ASTs in earnest; an AST feels like it should be "just a syntax tree", unconnected to any specific point of origin in the code. But it must hold an environment reference, or variable lookups won't work correctly.

What this means for building ASTs "from scratch" — i.e. not using the quasi keyword, but piling AST nodes on top of each other somehow — I don't know. Maybe it simply won't be possible. Maybe there isn't much of a use case for it, since we do have quasi (and &eval). It certainly isn't in the spec.

The COMPILING namespace

There's a way to explicitly say "please let the lookup find the mainline OUTER, not the macro OUTER" — we use the COMPILING namespace. Shown here:

macro two-lookups {
    my $a = "correct value";
    return quasi { say $a; say $COMPILING::a };
}

my $a = "apfelstrudel";
two-lookups(); # prints "correct value" and then "apfelstrudel"

There's no corresponding mechanism for closures. Closures are too clean for that kind of thing. Closures only have one OUTER; macros essentially have two, but the less hygienic one is longer to write for the end user.

(moritz++ points out that closures have CALLER, too. Which is true, but CALLER is quite orthogonal to everything discussed here. Macros are weird in that the macro itself has a CALLER (since it's a subroutine), but the contents of the quasi don't necessarily have one (since it's inlined into the mainline).)

What do we have to do to support $COMPILING::a lookups? The answer isn't pleasant, but at least lies within the realm of the possible.

When we apply the AST, that is, stitch it into the code right before continuing with our regular parsing, we traverse the AST and find all occurrences of $COMPILING::<some var>. We then replace each such occurrence with some magical internal code that either makes the variable lookup at runtime from the correct mainline scope, or contains a precomputed reference to the variable in the mainline scope. (It amounts to the same thing. The latter, if it's possible, is a constant-time lookup, rather than having to iterate up an OUTER chain.)

Basically, the bit of AST that gets spliced in has no representation in pure Perl 6 code. It says "look up the variable $a (or whatever) from this given lexical scope, not from the current one".

There's a chance this lookup may fail. But if it fails, we can make it fail during macro apply time, that is during parse time, that is during compilation. So it will not be much different from a compile-time "undeclared variable error". (You're still likely to be more confused than usual, of course, since macros involve more scopes in different parts of the program.) Run-time variable lookup failures still won't occur.
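To make the rewriting step a bit more concrete, here's a toy model of it in plain Perl 6. Everything in it is made up for illustration: ToyScope, HygienicLookup, CompilingLookup, BoundLookup, apply-macro and evaluate are hypothetical names, not anything in the spec or in an implementation, and a "quasi" is reduced to a flat list of variable lookups. It only shows the shape of the idea: at apply time, each COMPILING lookup gets replaced by a node that is pinned to the call-site scope, and an undeclared variable dies right there.

```raku
# Toy model only; none of these classes exist in any real implementation.
class ToyScope {
    has %.vars;
    has ToyScope $.outer;

    # Walk the OUTER chain until the name is found.
    method lookup(Str $name) {
        return %!vars{$name} if %!vars{$name}:exists;
        return $!outer.lookup($name) if $!outer.defined;
        die "Variable $name is not declared";
    }
}

class HygienicLookup  { has Str $.name }                        # like $a
class CompilingLookup { has Str $.name }                        # like $COMPILING::a
class BoundLookup     { has Str $.name; has ToyScope $.scope }  # the rewritten node

# "Apply" the AST: pin every COMPILING lookup to the call-site scope.
sub apply-macro(@ast, ToyScope $call-site) {
    my @result;
    for @ast -> $node {
        if $node ~~ CompilingLookup {
            $call-site.lookup($node.name);   # dies here, at apply time, if undeclared
            @result.push: BoundLookup.new(name => $node.name, scope => $call-site);
        }
        else {
            @result.push: $node;
        }
    }
    return @result;
}

# Hygienic lookups use the macro's scope; bound ones use the scope they carry.
multi evaluate(HygienicLookup $node, ToyScope $macro-scope) {
    $macro-scope.lookup($node.name)
}
multi evaluate(BoundLookup $node, ToyScope $macro-scope) {
    $node.scope.lookup($node.name)
}

my $mainline = ToyScope.new(vars => %( a => "apfelstrudel" ));
my $macro    = ToyScope.new(vars => %( a => "correct value" ), outer => $mainline);

my @quasi   = HygienicLookup.new(name => "a"), CompilingLookup.new(name => "a");
my @applied = apply-macro(@quasi, $mainline);
say evaluate($_, $macro) for @applied;   # correct value, then apfelstrudel
```

In this sketch the bound node simply carries the scope and walks it when evaluated; carrying a precomputed reference to the variable itself would be the constant-time variant mentioned above.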

We could turn the whole thing on its head and rewrite ordinary variable lookups in this way, while doing nothing with COMPILING-namespace variables. This would have one advantage: we wouldn't have to fiddle with the closure's OUTER pointer. (In fact, we probably wouldn't even need a block surrounding the stitched-in AST code, since the main reason for this block is to modify its OUTER pointer.)

On the flip side, each normal, hygienic variable lookup will then have to be rewritten instead. And that feels odd, pessimizing the normal use case and optimizing for the abnormal, COMPILING use case.
