Let's talk about pads.
sub counter(Int $start) {
    my Int $value = $start;
    return { $value++ };
}
my $c1 = counter(5);
say $c1(); # 5
say $c1(); # 6
my $c2 = counter(42);
say $c2(); # 42
say $c1(); # 7
say $c2(); # 43
A pad is not much more than a hash table mapping variable names to values. It's implicitly defined by the variable declarations in a block or module or file. There's a pad inside the counter subroutine, and that's what's holding the $value variable (and its value) for us. The pad gets "attached" to the returned closure (in the sense of the latter referencing the former), and so $value, which is referenced by the pad, never gets GC'd.
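To make the "hash table attached to a closure" picture concrete, here is a minimal sketch that writes the pad out by hand as a Hash. The name counter-with-pad and the idea of keying the hash on the string '$value' are assumptions made for illustration; the real pad is managed by the runtime, not by user code.

```raku
# A hand-written model of a pad: a Hash mapping variable names to values.
# counter-with-pad is a hypothetical name, used only for this illustration.
sub counter-with-pad(Int $start) {
    my %pad = '$value' => $start;          # stands in for the pad of this invocation
    return sub { %pad{'$value'}++ };       # the closure references the pad, keeping it alive
}

my $c = counter-with-pad(5);
say $c();   # 5
say $c();   # 6
```

As long as $c is reachable, so is %pad, and so is the value stored in it.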
But wait! By the example, there seem to be two $value variables in circulation. How did that happen? Aren't pads intimately tied to the block or module or file? Well, yes. Then how can there be two different pads, each containing its own $value, in circulation?
You probably know the answer already. It's not the subroutine as such that has a pad, it's the invocation of the subroutine that does. Put differently, a new pad attaches to the sub at the point of call at runtime, not once-and-for-all at compile time. It's easy to fall into the type/instance homonym trap here. Maybe it's correct to say that "lexical scope" is the type here, and "pad" is the instance.
The idea of incarnating a pad for each routine invocation was once disputed in programming language design. Without it, if we just keep one static pad for each routine, it's much easier to represent the state of the program internally. It takes less memory, less code to make the runtime, etc. Oh, and recursion won't work, because your Fibonacci function won't get a fresh pad with each invocation. Historically, this was an argument against recursion, because it required such a wacky model with incarnating pads and keeping stack frames around. In the end, though, recursion (and other useful things, such as closures) won out and many-pads-per-lexical-scope became the predominant model.
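Here is a rough sketch of why one pad per routine breaks recursion. It uses factorial rather than Fibonacci to keep the trace short, and it fakes the one-pad-per-routine model with a single shared Hash. The names %static-pad, fact-static and fact-fresh are made up for the illustration.

```raku
# Hypothetical model: one shared "pad" for every invocation of the routine.
my %static-pad;

sub fact-static(Int $n) {
    %static-pad<n> = $n;                         # overwrites the caller's n
    return 1 if %static-pad<n> <= 1;
    my $rest = fact-static(%static-pad<n> - 1);
    return %static-pad<n> * $rest;               # n was clobbered by the recursive call
}

# The ordinary version: each invocation gets its own pad, and so its own $n.
sub fact-fresh(Int $n) {
    return 1 if $n <= 1;
    return $n * fact-fresh($n - 1);
}

say fact-static(5);   # 1
say fact-fresh(5);    # 120
```

By the time the innermost call returns, the shared n is 1, so every multiplication on the way back out is 1 * 1.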
But wait! So if a pad isn't created until routine/block entry, how do you explain... this?
sub foo {
    my $value = "HAHAHA DISREGARD THAT";
    our sub bar {
        say $value;
    }
}
our &bar;
bar(); # prints "Any()\n"
So, say $value somehow comes up with the standard undefined value Any, rather than the more obvious "HAHAHA DISREGARD THAT". Well, that's understandable by the pad-at-block-entry hypothesis: here, we managed to call bar without calling foo first, so the assignment hasn't actually run. But... we have to look into some pad to get $value, don't we? And a pad hasn't been created for us yet, so where does the Any come from? Shouldn't the variable lookup for $value just fail to find a pad, fall into outer space, and cause a horrible Null PMC access or something?
That's actually what it used to do in Rakudo, before Rakudo got a sensible pad model without egregious holes in it. The concept that turned out to be missing was that of a static pad, a pad that's added already during compilation. There's exactly one static pad per lexical scope. When an incarnated runtime pad hasn't been added yet for some lexical scope, lookup simply falls back to the static pad. That way, failure is never as bad as a Null PMC access; the worst thing that can happen is that you end up with an Any.
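The fallback rule can be sketched as a tiny lookup function. This is only a model of the idea, not how Rakudo is implemented; %static-pad, @incarnated-pads and lookup are hypothetical names.

```raku
# Hypothetical model of lexical lookup with a static-pad fallback.
my %static-pad = '$value' => Any;   # built at compile time: declared, never assigned
my @incarnated-pads;                # one pad pushed per invocation of the enclosing routine

sub lookup(Str $name) {
    # Prefer the most recent incarnated pad; otherwise fall back to the static pad.
    return @incarnated-pads
        ?? @incarnated-pads[*-1]{$name}
        !! %static-pad{$name};
}

say lookup('$value').defined;       # False: nothing incarnated yet, so we get the static Any

# Simulate entering the enclosing routine and running its assignment.
@incarnated-pads.push: %( '$value' => "HAHAHA DISREGARD THAT" );
say lookup('$value');               # HAHAHA DISREGARD THAT
```

The first lookup is the bar-without-foo situation: there is always some pad to look in, so the result is an undefined Any rather than a crash.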
Perl 5 has a slight allergy to this situation, which happens every time you have a named subroutine inside another named subroutine:
$ perl -Mstrict -wE 'sub foo { my $value = "HAHAHA DISREGARD THAT"; sub bar { say $value } }; bar'
Variable "$value" will not stay shared at -e line 1.
Use of uninitialized value $value in say at -e line 1.
"Will not stay shared" is simply warning you that whenever you enter foo
again after the first call, bar still refers to the $value of that first foo invocation, not to the freshly incarnated pad. So don't depend on foo and bar always having the same view of $value.
So in our original counter example, there were actually three pads just for the counter subroutine. There were the two incarnated pads for the $c1 and $c2 invocations, and there was the static pad, which we never saw because there's no way to interact with the insides of counter without calling it.
Ordinary assignments happen at runtime, but if we can make them happen at compile time, we can actually observe the static pad doing work.
Making the assignment happen at BEGIN (parse) time:
sub foo {
    BEGIN my $value = "HAHAHA DISREGARD THAT";
    our sub bar {
        say $value;
    }
}
our &bar;
bar(); # prints "HAHAHA DISREGARD THAT\n"
Or, equivalently, using a constant declaration to make the assignment also happen at BEGIN time:
sub foo {
    constant $value = "HAHAHA DISREGARD THAT";
    our sub bar {
        say $value;
    }
}
our &bar;
bar(); # also prints "HAHAHA DISREGARD THAT\n"
Let's talk about macros. (You knew it was only a matter of time.)
macro counter(Perl6::AST $start) {
    my Int $value = eval $start; # hand-wavey, unspec'd
    return quasi { $value++ };
}
my $c1 = counter(5);
say $c1(); # 5
say $c1(); # 6
my $c2 = counter(42);
say $c2(); # 42
say $c1(); # 7
say $c2(); # 43
This is the same code as the one we started with, with four small modifications:
- It's a macro and not a sub.
- Macros take ASTs as arguments, so $start will now contain a Perl6::AST value.
- Because we're interested in the actual value and not the AST, we have to de-AST the value, which we do using (unspec'd semantics of) eval.
- Macros return ASTs, so we build one using the quasi keyword and a block.
But the rest is the same. There are still two incarnated pads for counter (and one static pad which we never see). Each pad still gets referenced by the block in the quasi, because it has to have a place to go to store and retrieve $value. In a very real sense, quasi blocks are closures, and we expect them to act accordingly. What's tricky about this is that the quasi blocks spend a disproportionate amount of their time as ASTs.
Since we're about to run across yet another type/instance homonym trap, let's proceed slowly and with caution.
There are three distinct "time points" of interest:
- A: The macro and the quasi are parsed. No code is run.
- B: The macro call site is parsed. As soon as the macro call has been parsed, we momentarily enter run mode, and the macro runs and returns a Perl6::AST. Back in parse mode, this AST is stitched into the call site in lieu of the macro call.
- C: The code so inserted is run.
All these are clearly separated in time. C, depending on actual code paths, needn't even happen. (Of course, if you don't call your macro, B needn't happen either.)
The time point A is in compile mode, and thus can deal only in static pads. Nevertheless, this is where the quasi's AST gets created, and it needs to reference some pad. The situation is quite similar to a closure in a yet un-run block, but with an AST instead of the closure.
B is where the action is. We kick into run mode, and enter the macro block. But pay careful attention to the quasi object. The value sitting there was generated during A, and references the static pad. This is bad, because that means it points to the wrong $value: the static one, which will never be assigned to. What we really want is an AST that references the newly incarnated pad of the current macro.
Conclusion: there's a static quasi object that's generated during A, and the actual Perl6::AST objects returned from a macro are "incarnated" from this static object, having it point to the pad of the macro invocation in the process, so that each macro call gets its very own $value, just like each subroutine invocation did.
Some kind of cloning/fixup must happen at macro block entry. It's almost as if at A we only have a "static" version of the AST, pointing to static pads, but at B we want the incarnated version. One per macro call.
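A toy model of that cloning/fixup step might look like the sketch below. Everything in it is an assumption made for illustration: QuasiAST, incarnate, run and call-macro are invented names, the "AST" is reduced to nothing but its pad reference, and none of this reflects how Rakudo actually represents quasis.

```raku
# Hypothetical model of "static quasi" versus "incarnated AST".
class QuasiAST {
    has $.pad;                                    # the pad this AST's lookups resolve against
    method incarnate($new-pad) {                  # clone, re-pointed at a freshly incarnated pad
        return QuasiAST.new(pad => $new-pad);
    }
    method run() { return $!pad{'$value'}++ }     # stands in for running { $value++ }
}

# Time A: one static AST, pointing at the static pad.
my $static-pad   = %( '$value' => Int );
my $static-quasi = QuasiAST.new(pad => $static-pad);

# Time B: each macro call incarnates a pad and fixes up a clone to point at it.
sub call-macro(Int $start) {
    my $pad = %( '$value' => $start );
    return $static-quasi.incarnate($pad);
}
my $c1 = call-macro(5);
my $c2 = call-macro(42);

# Time C: the inserted code runs against its own incarnated pad.
say $c1.run;   # 5
say $c1.run;   # 6
say $c2.run;   # 42
```

The static pad's $value is never assigned to; only the incarnated clones ever see a real value, one per macro call.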
As my previous gist outlines, when we do the stitching, we also have to special-case either lexical lookups that expect to end up in the macro body, or lexical lookups that expect to end up in the mainline code. That's fine. The important part, and the point of this gist, is that "the macro body" is a dynamic/runtime thing, not a static thing. And that's why we need cloning/fixups at B.