The normal type. They look like subs. Just as with subs, some special names declare ops, in which case you can invoke them by using them as ops.
However, two things sharply distinguish them from subs. First, they accept and return Qtrees. Second, they are called at parse time; it is the parser that (by lookup) realizes that what it just parsed was a macro, passes it the arguments as Qtrees, and then expects back a Qtree that it can replace the call with.
    sub compute() { say "COMPUTED!"; return 42 }
    sub ex1($value) {
        say "before";
        say $value;
        say "after";
    }
    ex1 compute();      # COMPUTED!\nbefore\n42\nafter\n
                        # compute() runs; Int result is passed to ex1()
    macro ex2($qtree) {
        return quasi {
            say "before";
            say {{{ $qtree }}};     # X
            say "after";
        };
    }
    ex2 compute();      # before\nCOMPUTED!\n42\nafter\n
                        # (arg is code, passed as Qtree)
                        # compute() isn't actually run until line X runs
Besides this "pure" role of transforming subcall Qtrees (or their
operator brethren) into Qtrees, macros can also carry out side effects
(`prompt`/`say` etc., or changing globals), or carry state from one macro
call to another. In general, a macro has two regions:
- Directly inside the macro. Runs at macro invocation time, which is a part of parsing the program. From here, we can directly influence the compilation process itself.
- Inside one or more `quasi` blocks in the macro. Runs as part of the mainline code, since the quasi block gets physically spliced into the normal code.
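The two regions can be mimicked, loosely, in Python with the standard `ast` module standing in for Qtrees. This is only an analogy, not Perl 6 API: `ex2`, `PLACEHOLDER`, and the `log` list are names invented for the sketch. The body of `ex2` plays the part of the macro region (it runs once, when the tree is expanded), and the parsed template plays the part of the `quasi` block (it runs later, as ordinary code).

```python
import ast

log = []  # records the order in which things happen


def compute():
    log.append("COMPUTED!")
    return 42


def say(value):
    log.append(str(value))


def ex2(arg_tree):
    """Stand-in for a macro: takes a syntax tree, returns a syntax tree."""
    # "Macro region": runs at expansion time, before the program proper.
    log.append("expanding ex2")

    # "Quasi region": code that only runs once it is spliced into the program.
    template = ast.parse('say("before")\nsay(PLACEHOLDER)\nsay("after")')

    class Splice(ast.NodeTransformer):
        def visit_Name(self, node):
            # Replace the placeholder with the unevaluated argument tree.
            return arg_tree if node.id == "PLACEHOLDER" else node

    return ast.fix_missing_locations(Splice().visit(template))


# The argument is handed over as a tree; compute() has not run yet.
argument = ast.parse("compute()", mode="eval").body
expanded = ex2(argument)

# Only now does the spliced code (and with it compute()) actually run.
exec(compile(expanded, "<expanded>", "exec"), {"say": say, "compute": compute})
```

After this runs, `log` holds `"expanding ex2"` first, and `"COMPUTED!"` only after `"before"`, matching the ordering of the `ex2` example above.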
When is a macro called? A bit like `BEGIN` time, it is called as soon as
everything is in place. The macro symbol itself needs to be parsed, and also
all its sub-Qtrees that act as arguments or operands. This means different
things for listops, infixes, prefixes, and postfixes.
    mac $x, $y, $z ;
                  ^--- here is where we can call the macro
    $r ¤ $l ;
           ^
    ¤$x ;
       ^
    $x¤ ;
       ^
    my-if $COND, { ... } ;
                        ^
But a macro may also declare itself with the `is parsed` trait, and effectively
gains control of the parse process earlier than it otherwise would. Instead of
"as soon as all the sub-Qtrees are in place", the macro now gets called as soon
as the macro symbol has been parsed.
    mac $x, $y, $z ;
       ^--- with `is parsed`, this is where the macro gets invoked
    $l ¤ $r ;
        ^
    ¤ $x;
     ^
    $x¤ ;
       ^
    my-if $COND, { ... } ;
         ^
In the case of `is parsed`, the macro is assuming the responsibility to parse
the rest of its arguments/operands, and hand control back to the Perl 6 parser
in a way that's consistent with the macro's grammatical category. It also gets
the responsibility to generate Qtrees for things that would otherwise have come
in as macro parameters.
The `is parsed` trait expects a regex supplying a parsing strategy. Inside of
the regex, we have full access to things defined in `Perl6::Grammar`. As an
example, the `my-if` above almost looks like a normal built-in control
structure, save for that annoying comma. With `is parsed` we can tell the
parser that we don't care for the comma.
    class Q::MyIf {
        has Q::Expr $.expr;
        has Q::Block $.block;
    }
    macro my-if(Q::Expr $expr, Q::Block $block)
        is parsed(/<EXPR> <.ws> <pblock>/) {
        return Q::MyIf.new(:$expr, :$block);
    }
It's not clear that we can make the above work, or whether it would end up
being robust and usable. But it seems to be a possibility. So let's hereby
bring `is parsed` back from the land of the deprecated into the shadow realm of
the conjectural.
We're not bringing back `is reparsed`, though. Besides breaking the rule of
one-pass parsing, it's not clear that this form has a good use case driving it.
These macros can exhibit unhygiene (the absence of hygiene) and declare or modify symbols in the mainline program. Such a macro will screw up simplistic tools such as syntax checkers, highlighters, linters, refactoring tools, IDEs, and Java programmers. Take that, simplistic tools!
Unhygiene may also void the warranty of your computer, your pets, your family, your friends, and nearby celestial bodies. Nevertheless, it can be very helpful sometimes, and I expect people (in proud Perl tradition) to manage to use the extra rope-lengths for shooting themselves in the appropriate amount of feet.
(Friendly reminder: the term "unhygienic macros" still refers to AST-based macros and does not mean "textual macros". As an analogy, whereas unhygiene is like a very lethal, armed nuclear warhead that probably oughtn't be in your possession in the first place, textual macros are like two subcritical piles of hot plutonium that when brought together will definitely go boom, and why are you standing so close to the hot piles anyway and sorry, you're dead now.)
It's possible to define multi macros. The biggest difference to multi subs is that the compiler will always pass Qtree arguments to the macro. Binding happens on compile-time Qtrees, not on runtime values.
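To make "binding happens on compile-time Qtrees" a little more tangible, here is a loose Python analogy that dispatches on the type of a syntax-tree node using `functools.singledispatch`. Nothing here is Perl 6 API; `describe` and `parse_expr` are hypothetical names, and Python's `ast` nodes merely stand in for Qtrees.

```python
import ast
from functools import singledispatch


@singledispatch
def describe(tree):
    # No candidate matched this kind of node.
    raise TypeError(f"no candidate for {type(tree).__name__}")


@describe.register
def _(tree: ast.Constant):
    return f"literal {tree.value!r}"


@describe.register
def _(tree: ast.Name):
    return f"variable {tree.id}"


def parse_expr(source):
    """Parse a single expression and return its tree, without evaluating it."""
    return ast.parse(source, mode="eval").body


# Dispatch looks at the shape of the code, never at a runtime value:
# the variable `x` is not even defined anywhere.
results = [describe(parse_expr("42")), describe(parse_expr("x"))]
```

Here `results` is `["literal 42", "variable x"]`, while a tree such as `parse_expr("x + 1")` matches no candidate and raises, which is the analogue of a dispatch error.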
Absent an `is parsed`, a multi macro will parse according to the normal Perl 6
grammar, and then match the resulting Qtrees against the macro signature. If an
`is parsed` is present, a failed parse is tantamount to backtracking out of an
alternation, and a successful parse means that the multi macro is considered a
candidate. If all multi candidates fail then that's a dispatch error as usual.
If two or more tie, then normal tiebreaking rules apply. If at all possible, we
should let multi `is parsed` macros participate in LTM.
Conceptually, an `is parsed` on the multi macro counts as having an additional
constraint placed on the signature. If two multis have the same signature but
one of them has an `is parsed` trait, then the trait-adorned one will count as
narrower.
We may end up allowing `is parsed` on ordinary subroutines, too. The
`is parsed` axis is orthogonal to the sub/macro axis, so it's a possibility.
But maybe it's less confusing not to allow that.
Syntax macros are defined at statement level and essentially introduce a new
type of statement. Let's say we wanted a new `pretending` keyword, a block form
of `temp`:
    class Q::Statement::Pretending is Q {
        has Q::Expr $.expr;
        has Q::Block $.block;
    }
    macro statement_control:<pretending>(Q::Expr $expr, Q::Block $block)
        is parsed(rule { <sym> <EXPR> <pblock> }) {
        # ...code to check that $expr is of the form `{{{$var}}} = {{{$value}}}` elided...
        return quasi {
            temp {{{$var}}} = {{{$value}}};
            {{{$block}}};   # handling of >0 params elided
        }
    }
Notice how we're using the same mechanism as with op macros, only within the
`statement_control` grammatical category. The desire for syntax macros comes
from Scheme and Racket's `define-syntax` and `syntax-rules` facilities. I set
out to add support for those things in our macro system, and happily found that
something like the above seems to be enough. We need more concrete examples to
verify this, though.
Especially considering the discussion above with multi macros, which would allow us to dispatch the same keyword to various different syntax variants — and also would nicely support third-party syntax extension in the same way ordinary multis do — this seems to be a winner. We may still market them as "syntax macros" if we want people to pay special interest to them.
It is an open question exactly which grammatical categories we will be able to
hook into like this. But `statement_control` seems like a straightforward one.
The elided parts of the syntax macro above interact in various ways with the
incoming `$expr` and `$block` Qtrees. We should anticipate the needs of macro
writers, and provide an API that makes simple things easy and mind-bendingly
weird things achievable.
IntelliJ IDEA has various visitors to do this. They hook behaviors onto PSI node type matching, enabling you to say things like "do this for all method calls" or "do this for all field declarations". The idea is sound, even though it'll likely come out looking a bit different in Perl 6 and with Qtrees. But at the very least, we should have various default traversals that cover many common use cases. This is something we've yet to investigate.
One thing which makes this less straightforward in Perl 6 is that the language is more "freely nesting" than Java.
    say "{ .foo given class :: { method foo { "OH" } } } $(constant $ = "HAI")"
    class C { say "OH { method hai { say "o.O" }.name.uc }" }; C.new.hai
(Java doesn't allow class or constant declarations inside of string literals.)
Again, this means that we need to be more guided by use cases. Which methods do we expect to find when we traverse a class for methods? Probably only the ones registered on the class itself. Which excludes methods in nested classes but includes methods nested in methods, or inside string literals, or other expressions. We need good defaults here, informed by actual use and expectations.
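As a rough illustration of such a default traversal, here is a sketch in Python over its standard `ast` trees (not Qtrees). The helper name `methods_of` is made up for the example. The analogy is imperfect: in Python a `def` nested inside a method is not itself a method, so the sketch only shows the shape of the rule "descend into everything except nested classes".

```python
import ast


def methods_of(class_node):
    """Collect names of functions found under a class, skipping nested classes."""
    found = []

    def walk(node):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                continue  # a nested class keeps its own methods
            if isinstance(child, ast.FunctionDef):
                found.append(child.name)
            walk(child)  # keep descending: functions may nest in functions

    walk(class_node)
    return found


source = """
class C:
    def hai(self):
        def inner():
            pass
    class Nested:
        def hidden(self):
            pass
"""

class_node = ast.parse(source).body[0]
names = methods_of(class_node)
```

With this input, `names` contains `hai` and `inner` but not `hidden`. Whether that is the right default is exactly the kind of question that use cases need to settle.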
Also, the Qtrees themselves are supposed to be helpful in much the same way. Traversal aside, often when you're sitting there with a reference to a variable or a class, you want to ask "where is the declaration for this?". Such questions are likely to form the basis of interesting Qtree analysis and transformations.
Another thing that we will often want to do is evaluate a Qtree representing an
expression. The process is similar to `EVAL`, but starting from an already
parsed/contexted Qtree instead of a program string.
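Python's standard library happens to offer a comparable operation on its own syntax trees, which may serve as a point of reference. The sketch below is an analogy only; the variable names and the sample statement are invented, and nothing here describes how Qtree evaluation would actually be exposed.

```python
import ast

# Start from an already parsed program and pick out one sub-expression.
statement = ast.parse("total = price * qty").body[0]  # an ast.Assign node
rhs = statement.value                                  # the tree for `price * qty`

# Wrap the expression tree so it can be compiled on its own; no source
# string for the expression is ever re-parsed.
expression = ast.fix_missing_locations(ast.Expression(body=rhs))
code = compile(expression, "<tree>", "eval")

# Evaluate the tree in an explicitly supplied environment.
value = eval(code, {"price": 3, "qty": 14})
```

Here `value` ends up as `42`. Note that the environment has to be supplied by the caller, which hints at the "contexted" part of the problem: a tree on its own does not say where its variables live.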
Going in the opposite direction, we sometimes want to construct Qtree literals or identifiers from various run-phase inputs. Sometimes we're less interested in the name we give an identifier, and more interested that it doesn't clash with anything else in the lexical environment (à la gensyms).
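A minimal gensym-style helper might look as follows, again sketched in Python over `ast` trees as a stand-in for Qtrees. The function name `gensym` and the `_g` prefix are arbitrary choices for the example, and the clash check is deliberately naive: it only looks at names appearing in the one tree it is given, not at a full lexical environment.

```python
import ast
import itertools


def gensym(tree, prefix="_g"):
    """Return an identifier that does not occur as a name anywhere in `tree`."""
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    for counter in itertools.count():
        candidate = f"{prefix}{counter}"
        if candidate not in used:
            return candidate


tree = ast.parse("_g0 = _g1 + x")
fresh = gensym(tree)  # skips _g0 and _g1, which are already taken
```

For this tree, `fresh` is `"_g2"`.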
It's possible we shouldn't call these "macros" at all. But I don't have a better name for them yet, so "visitor macros" it is, for now.
By way of example, let's say you want to write a macro that makes code such as the following illegal:
    if $some-expr == True {     # macro stops with "useless use of `== True`"
        ...
    }
In this case, our macro is not so much a particular sub, op, or keyword, but a constellation of Qtree nodes.
The visitor macro might look something like this:
    MATCH (Q::If (
            Q::Infix::NumEq :$expr (
                Q::Enum :$rhs where *.value eq "Bool::True"))) {
        die "useless use of `== True`";
    }
Every single bit of the above is conjectural syntax.
Conceptually, each visitor macro would traverse the Qtree nodes as they are emitted by the parser. A visitor macro has a matcher part, basically a signature, and a callback part, basically a routine body.
Unlike routine macros, a visitor macro is not expected to return a Qtree object in the end. (And if it does, then it will be discarded.) Instead, they are expected to further typecheck or analyze the matched Qtree nodes, and then maybe take some action. The action may be to modify the Qtree nodes somehow, or to parsefail, or update some global state.
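For comparison, the same check can be written today against Python's syntax trees with the standard `ast.NodeVisitor`. This is an analogy in another language, not a proposal for the Perl 6 syntax: the class name `UselessEqTrue` and the `problems` list are invented for the sketch. The `visit_If` method plays the role of the matcher plus callback, and its return value is ignored, much as described above.

```python
import ast


class UselessEqTrue(ast.NodeVisitor):
    """Flags `if <expr> == True:` by inspecting the shape of the tree."""

    def __init__(self):
        self.problems = []

    def visit_If(self, node):
        test = node.test
        if (
            isinstance(test, ast.Compare)
            and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)
            and isinstance(test.comparators[0], ast.Constant)
            and test.comparators[0].value is True
        ):
            self.problems.append(f"line {node.lineno}: useless use of `== True`")
        self.generic_visit(node)  # keep looking inside the if body


source = "if ready == True:\n    pass\nif ready:\n    pass\n"
visitor = UselessEqTrue()
visitor.visit(ast.parse(source))
```

Only the first `if` is reported; the second one simply does not trigger the visitor, and that is not an error.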
Qtree structures can be matched with subsignatures digging into the child
nodes of the Qtree root. In the cases where this is too constraining, `where`
blocks can be put to use: in a `where` block we can match against
non-descendant nodes in the program.
One big difference to ordinary routines is that there are no dispatch failures. Normally a visitor simply doesn't trigger, and that's that.
Another difference is that since visitor macros are not called, care has to be taken to not trigger them prematurely:
- Firstly, it is probably prudent not to have visitor macros trigger inside of their own macro body. We can have them register on the final `}`, which shouldn't be an issue.
- Secondly, let's say someone defines a visitor macro like the example above, and wants to export it. Fine, we put an `is export` on the macro, and we might even go with giving visitor macros an identifier so that people can have a say in whether to import them. (And generally a way to refer to them.) In this case, it's easy to avoid putting `== True` in `if` statements, but in more delicate cases it might not be so easy to avoid triggering the visitor macro in the exporting module. Maybe an `is export-only` trait might be useful here?
Alternatively, maybe we should think about separating the declaration and the
activation of a visitor macro? The above form which does both might still be a
convenient default, but thinking of `export-only` makes it seem like we
sometimes have control over the exact parser/compunit we activate the visitor
macro in.
Also, maybe it might be useful sometimes to be able to programmatically
de-activate a visitor macro. The mechanism could be similar to wrap handles à la
S06, which all have a `.restore` method. Maybe visitor activations similarly
return a handle with a `.disable` method.
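To show what separating declaration from activation could feel like, here is a small Python sketch of a registry whose `activate` returns a handle with a `disable` method. Every name in it (`VisitorRegistry`, `Handle`, `activate`, `run`) is hypothetical and chosen only for illustration; it says nothing about how a Perl 6 parser would actually host visitors.

```python
import ast


class Handle:
    """Returned by activation; lets the caller switch the visitor off again."""

    def __init__(self, registry, handle_id):
        self._registry = registry
        self._id = handle_id

    def disable(self):
        self._registry._active.pop(self._id, None)


class VisitorRegistry:
    def __init__(self):
        self._active = {}
        self._next_id = 0

    def activate(self, node_type, callback):
        """Start triggering `callback` on nodes of `node_type`."""
        handle_id = self._next_id
        self._next_id += 1
        self._active[handle_id] = (node_type, callback)
        return Handle(self, handle_id)

    def run(self, tree):
        """Offer every node of `tree` to the currently active visitors."""
        for node in ast.walk(tree):
            for node_type, callback in list(self._active.values()):
                if isinstance(node, node_type):
                    callback(node)


registry = VisitorRegistry()
seen = []

handle = registry.activate(ast.If, lambda node: seen.append(node.lineno))
registry.run(ast.parse("if a:\n    pass\n"))   # visitor is active: triggers

handle.disable()
registry.run(ast.parse("if b:\n    pass\n"))   # visitor is disabled: silent
```

After both runs, `seen` records only the first `if`. The design choice worth noting is that the visitor itself is just a value; when and where it is active is decided separately by whoever holds the handle.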
Some names that may be more meaningful than 'visitor macro': watcher, monitor, egregore (if you're feeling esoteric about it), auk.
'Auk' is, of course, an acronym for AST Usage Kibitzer; it is not at all conducive to the mental image of a bird diving swiftly into the sea of Qtree nodes to hunt down its chosen prey, much less is it an allusion to any random match-conditions-and-take-action programming language you might have heard of, so strike those thoughts from your head and don't think of a pink elephant either.