
@masak
Created February 17, 2012 13:55
The lifetime of a macro invocation

Pads

Let's talk about pads.

sub counter(Int $start) {
    my Int $value = $start;
    return { $value++ };
}

my $c1 = counter(5);
say $c1();            # 5
say $c1();            # 6

my $c2 = counter(42);
say $c2();            # 42
say $c1();            # 7
say $c2();            # 43

A pad is not much more than a hash table mapping variable names to values. It's implicitly defined by the variable declarations in a block or module or file. There's a pad inside the counter subroutine, and that's what's holding the $value variable (and its value) for us. The pad gets "attached" to the returned closure (in the sense of the latter referencing the former), and so $value, which is referenced by the pad, won't get GC'd as long as the closure is alive.

But wait! By the example, there seem to be two $value variables in circulation. How did that happen? Aren't pads intimately tied to the block or module or file? Well, yes. Then how can there be two different pads, each containing its own $value, in circulation?

You probably know the answer already. It's not the subroutine as such that has a pad, it's the invocation of the subroutine that does. Put differently, a new pad attaches to the sub at the point of call at runtime, not once-and-for-all at compile time. It's easy to fall into the type/instance homonym trap here. Maybe it's correct to say that "lexical scope" is the type here, and "pad" is the instance.
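To make the type/instance distinction concrete, here is a small Python model of counter. It's a sketch, not how Rakudo represents pads: a pad is assumed to be a plain dict keyed by variable name, and each call to counter incarnates a new one that the returned closure keeps a reference to.

```python
def counter(start):
    # Incarnate a fresh pad (a name -> value mapping) for this invocation.
    pad = {"$value": start}

    def step():
        # Post-increment: return the old value, then bump the one in the pad.
        old = pad["$value"]
        pad["$value"] = old + 1
        return old

    # The closure keeps a reference to this invocation's pad alive.
    return step


c1 = counter(5)
print(c1())  # 5
print(c1())  # 6

c2 = counter(42)
print(c2())  # 42
print(c1())  # 7
print(c2())  # 43
```

Python closures already work this way under the hood; the explicit dict just makes the per-invocation pad visible.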

The idea of incarnating a pad for each routine invocation was once disputed in programming language design. Without it, if we just keep one static pad for each routine, it's much easier to represent the state of the program internally. It takes less memory, less code to make the runtime, etc. Oh, and recursion won't work, because your Fibonacci function won't get a fresh pad with each invocation. Historically, this was an argument against recursion, because it required such a wacky model with incarnating pads and keeping stack frames around. In the end, though, recursion (and other useful things, such as closures) won out and many-pads-per-lexical-scope became the predominant model.
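A quick way to see why one static pad per routine breaks recursion: in the Python sketch below (hypothetical names fib_static and fib_fresh), the first version keeps n in a single dict shared by all calls, the second incarnates a new dict per call. Python's own frames still hold the temporaries a and b, so only n is forced into the shared pad, and that alone is enough to get the wrong answer.

```python
# Hypothetical "one static pad per routine" model: every call shares this dict.
STATIC_PAD = {}


def fib_static(n):
    STATIC_PAD["n"] = n
    if STATIC_PAD["n"] < 2:
        return STATIC_PAD["n"]
    a = fib_static(STATIC_PAD["n"] - 1)
    # The recursive call above has overwritten STATIC_PAD["n"], so this
    # line no longer sees the n that this invocation started with.
    b = fib_static(STATIC_PAD["n"] - 2)
    return a + b


def fib_fresh(n):
    # One freshly incarnated pad per invocation.
    pad = {"n": n}
    if pad["n"] < 2:
        return pad["n"]
    return fib_fresh(pad["n"] - 1) + fib_fresh(pad["n"] - 2)


print(fib_fresh(10))  # 55
print(fib_static(3))  # -3, not 2: the shared pad got clobbered
```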

But wait! So if a pad isn't created until routine/block entry, how do you explain... this?

sub foo {
    my $value = "HAHAHA DISREGARD THAT";

    our sub bar {
        say $value;
    }
}

our &bar;
bar();    # prints "Any()\n"

So, say $value somehow comes up with the standard undefined value Any, rather than the more obvious "HAHAHA DISREGARD THAT". Well, that's understandable by the pad-at-block-entry hypothesis — here, we managed to call bar without calling foo first, so the assignment hasn't actually run. But... we have to look into some pad to get $value, don't we? And a pad hasn't been created for us yet, so where does the Any come from? Shouldn't the variable lookup for $value just fail to find a pad, fall into outer space, and cause a horrible Null PMC access or something?

That's actually what it used to do in Rakudo, before Rakudo got a sensible pad model without egregious holes in it. The concept that turned out to be missing was that of a static pad, a pad that's created at compile time. There's exactly one static pad per lexical scope. When an incarnated runtime pad hasn't been added yet for some lexical scope, lookup simply falls back to the static pad. That way, failure is never as bad as a Null PMC access; the worst thing that can happen is that you end up with an Any.
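The fallback rule can be modelled in a few lines of Python. This is an illustrative sketch, not Rakudo's actual data structures: LexicalScope, static_pad, current_pad and the "Any" sentinel are all made up for the example, and the model only remembers the most recent incarnation of a scope.

```python
ANY = "Any"  # stand-in for Perl 6's undefined Any


class LexicalScope:
    """One per block or routine in the source text (the "type")."""

    def __init__(self, names):
        # Static pad: built once, at compile time, from the declarations.
        self.static_pad = {name: ANY for name in names}
        # Incarnated pad of the most recent entry, if there has been one.
        self.current_pad = None

    def enter(self):
        # Runtime: each entry incarnates a fresh pad (the "instance").
        self.current_pad = dict(self.static_pad)
        return self.current_pad

    def lookup(self, name):
        # Use the incarnated pad if one exists; otherwise fall back to the
        # static pad, so the worst case is Any rather than a null access.
        pad = self.current_pad if self.current_pad is not None else self.static_pad
        return pad[name]


foo_scope = LexicalScope(["$value"])


def foo():
    pad = foo_scope.enter()
    pad["$value"] = "HAHAHA DISREGARD THAT"


def bar():
    # bar's $value is declared in foo's scope, so look it up there.
    return foo_scope.lookup("$value")


print(bar())  # Any: foo has never run, so the static pad answers
```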

Perl 5 has a slight allergy to this situation, which happens every time you have a named subroutine inside another named subroutine:

$ perl -Mstrict -wE 'sub foo { my $value = "HAHAHA DISREGARD THAT"; sub bar { say $value } }; bar'
Variable "$value" will not stay shared at -e line 1.
Use of uninitialized value $value in say at -e line 1.

"Will not stay shared" is warning you that Perl 5 binds bar only once, at compile time, to the first incarnation of $value; every later call to foo gets a fresh $value that bar never sees. So don't depend on bar and foo always having the same view of $value.

So in our original counter example, there were actually three pads just for the counter subroutine. There were the two incarnated pads for the $c1 and $c2 invocations, and there was the static pad, which we never saw because there's no way to interact with the insides of counter without calling it.

Ordinary assignments happen at runtime, but if we can make them happen at compile time, we can actually observe the static pad doing work.

Making the assignment happen at BEGIN (parse) time:

sub foo {
    BEGIN my $value = "HAHAHA DISREGARD THAT";
    our sub bar {
        say $value;
    }
}

our &bar;
bar();    # prints "HAHAHA DISREGARD THAT\n"

Or, equivalently, using a constant declaration to make the assignment also happen at BEGIN time:

sub foo {
    constant $value = "HAHAHA DISREGARD THAT";
    our sub bar {
        say $value;
    }
}

our &bar;
bar();    # also prints "HAHAHA DISREGARD THAT\n"
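Here is the same idea as a Python sketch, under the assumption that the compiler walks a scope's declarations once while parsing: every declaration gets a slot in the static pad, but only BEGIN-time (or constant) initializers actually fill it in. compile_scope and the (name, initializer, at_begin) triples are hypothetical.

```python
ANY = "Any"  # stand-in for Perl 6's undefined Any


def compile_scope(decls):
    """Build a scope's static pad while parsing.

    decls is a list of (name, initializer, at_begin) triples; at_begin is
    True for BEGIN-time assignments and constant declarations.
    """
    static_pad = {}
    for name, initializer, at_begin in decls:
        # Every declaration gets a slot, but only BEGIN-time initializers
        # run now; ordinary assignments wait for an incarnated pad.
        static_pad[name] = initializer if at_begin else ANY
    return static_pad


# foo's scope, with `BEGIN my $value = ...` (or `constant $value = ...`).
foo_static_pad = compile_scope([("$value", "HAHAHA DISREGARD THAT", True)])


def bar():
    # foo has never been called, so lookup falls back to the static pad.
    return foo_static_pad["$value"]


print(bar())  # HAHAHA DISREGARD THAT
# With a plain runtime assignment, the static slot would still be Any.
print(compile_scope([("$value", "HAHAHA DISREGARD THAT", False)])["$value"])  # Any
```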

Macros

Let's talk about macros. (You knew it was only a matter of time.)

macro counter(Perl6::AST $start) {
    my Int $value = eval $start; # hand-wavey, unspec'd
    return quasi { $value++ };
}

my $c1 = counter(5);
say $c1();            # 5
say $c1();            # 6

my $c2 = counter(42);
say $c2();            # 42
say $c1();            # 7
say $c2();            # 43

This is the same code as the one we started with, with four small modifications:

  1. It's a macro and not a sub.
  2. Macros take ASTs as arguments, so $start will now contain a Perl6::AST value.
  3. Because we're interested in the actual value and not the AST, we have to de-AST the value, which we do using (unspec'd semantics of) eval.
  4. Macros return ASTs, so we build one using the quasi keyword and a block.

But the rest is the same. There are still two incarnated pads for counter (and one static pad which we never see). Each pad still gets referenced by the block in the quasi, because it has to have a place to go to store and retrieve $value. In a very real sense, quasi blocks are closures, and we expect them to act accordingly. What's tricky about this is that the quasi blocks spend a disproportionate amount of their time as ASTs.

Since we're about to run across yet another type/instance homonym trap, let's proceed slowly and with caution.

There are three distinct "time points" of interest:

  • A: The macro and the quasi are parsed. No code is run.
  • B: The macro call site is parsed. As soon as the macro call has been parsed, we momentarily enter run mode, and the macro runs and returns a Perl6::AST. Back in parse mode, this AST is stitched into the call site in lieu of the macro call.
  • C: The code so inserted is run.

All these are clearly separated in time. C, depending on actual code paths, needn't even happen. (Of course, if you don't call your macro, B needn't happen either.)
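The ordering of A, B and C can be sketched in Python, modelling a macro as a function that runs during compilation and returns an "AST" (here just a zero-argument callable). All names are illustrative; the point is only when each step happens: A once, B once per call site, C only if control reaches the code.

```python
log = []


def parse_macro_definition():
    # A: the macro and its quasi are parsed. No macro code runs yet.
    log.append("A")

    def macro(arg_ast):
        # B: the macro body runs during compilation, once per call site.
        log.append("B")
        # The returned "AST" is modelled as a zero-argument callable.
        return lambda: log.append("C")

    return macro


def compile_call_sites(macro, n_call_sites):
    # Each parsed call site is replaced by whatever AST the macro returned.
    return [macro(None) for _ in range(n_call_sites)]


def run(program, take_path):
    # C: the stitched-in code runs only if control actually reaches it.
    if take_path:
        for node in program:
            node()


macro = parse_macro_definition()
program = compile_call_sites(macro, 2)
run(program, take_path=False)
print(log)  # ['A', 'B', 'B']: B happened twice, C never did
```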

The time point A is in compile mode, and thus can deal only in static pads. Nevertheless, this is where the quasi's AST gets created, and it needs to reference some pad. The situation is quite similar to a closure in a not-yet-run block, but with an AST instead of the closure.

B is where the action is. We kick into run mode, and enter the macro block. But pay careful attention to the quasi object. The value sitting there was generated during A, and references the static pad. This is bad, because that means it points to the wrong $value: the static one, which will never be assigned to. What we really want is an AST that references the newly incarnated pad of the current macro.

Conclusion: there's a static quasi object that's generated during A, and the actual Perl6::AST objects returned from a macro are "incarnated" from this static object, having it point to the pad of the macro invocation in the process, so that each macro call gets its very own $value, just like each subroutine invocation did.

Some kind of cloning/fixup must happen at macro block entry. It's almost as if at A we only have a "static" version of the AST, pointing to static pads, but at B we want the incarnated version. One per macro call.
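The cloning/fixup can be sketched in Python. This is a model under stated assumptions, not the design Rakudo uses: QuasiAST stands in for the AST built from `quasi { $value++ }`, it reaches $value through a pad reference, and incarnate is a hypothetical name for the per-invocation clone that re-points it at the macro invocation's pad. Passing fixup=False shows the failure mode described at B.

```python
import copy

# Time point A: the macro is compiled once, so only the static pad exists,
# and the quasi's AST is built pointing at it.
STATIC_PAD = {"$value": None}


class QuasiAST:
    """Stand-in for the AST of `quasi { $value++ }`.

    It reaches $value through a pad reference, not a copied value.
    """

    def __init__(self, pad):
        self.pad = pad

    def incarnate(self, pad):
        # Cloning/fixup: same tree, re-pointed at the invocation's pad.
        clone = copy.copy(self)
        clone.pad = pad
        return clone

    def run(self):
        # Time point C and later: post-increment $value in this AST's pad.
        old = self.pad["$value"]
        self.pad["$value"] = old + 1
        return old


STATIC_QUASI = QuasiAST(STATIC_PAD)


def counter_macro(start, fixup=True):
    # Time point B: entering the macro incarnates a fresh pad.
    pad = dict(STATIC_PAD)
    pad["$value"] = start  # stands in for `my Int $value = eval $start`
    # Without the fixup we would return the static quasi, whose $value is
    # the static one that the assignment above never touched.
    return STATIC_QUASI.incarnate(pad) if fixup else STATIC_QUASI


c1 = counter_macro(5)
c2 = counter_macro(42)
print(c1.run(), c1.run(), c2.run(), c1.run(), c2.run())  # 5 6 42 7 43

broken = counter_macro(5, fixup=False)
print(broken.pad["$value"])  # None: still the static pad's $value
```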

As my previous gist outlines, when we do the stitching, we also have to special-case either lexical lookups that expect to end up in the macro body, or lexical lookups that expect to end up in the mainline code. That's fine. The important part, and the point of this gist, is that "the macro body" is a dynamic/runtime thing, not a static thing. And that's why we need cloning/fixups at B.
