@masak
Last active February 22, 2019 13:32
Three types of macros -- an essay into the current state of the Perl 6 art

Routine macros

The normal type. They look like subs. Just as with subs, some special names declare ops, in which case you can invoke them by using them as ops.

However, two things distinguish these a lot from subs. First, they accept and return Qtrees. Second, they are called at parse time; it is the parser that (by lookup) realizes that what it just parsed was a macro, passes it the arguments as Qtrees, and then expects back a Qtree that it can replace the call with.

sub compute() { say "COMPUTED!"; return 42 }

sub ex1($value) {
    say "before";
    say $value;
    say "after";
}

ex1 compute();  # COMPUTED!\nbefore\n42\nafter\n
                # compute() runs; Int result is passed to ex1()

macro ex2($qtree) {
    return quasi {
        say "before";
        say {{{ $qtree }}};     # X
        say "after";
    };
}

ex2 compute();  # before\nCOMPUTED!\n42\nafter\n
                # (arg is code, passed as Qtree)
                # compute() isn't actually run until line X runs

Besides this "pure" role of transforming subcall Qtrees (or their operator brethren) into Qtrees, macros can also carry out side effects (prompt/say etc, or changing globals), or carry state from one macro call to another. In general, a macro has two regions:

  • Directly inside the macro. Runs at macro invocation time, which is a part of parsing the program. From here, we can directly influence the compilation process itself.

  • Inside one or more quasi blocks in the macro. Runs as part of the mainline code, since the quasi block gets physically spliced into the normal code.
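To make the two regions concrete, here is a rough analogy using Python's ast module rather than Perl 6 and Qtrees. Everything in it is an assumption made for illustration: the events log, the HOLE placeholder standing in for {{{ $qtree }}}, and the fact that ex2 is called by hand instead of by a parser.

```python
import ast

events = []  # one shared log, so the ordering of the two regions is visible

def compute():
    events.append("COMPUTED!")
    return 42

def ex2(arg_tree):
    # Region 1: directly inside the macro, runs at expansion ("parse") time.
    events.append("macro expanding")
    # Region 2: the quasi-like template; it only runs once spliced into mainline code.
    quasi = ast.parse(
        "events.append('before')\n"
        "events.append(HOLE)\n"
        "events.append('after')\n"
    )
    # Replace the HOLE placeholder with the unevaluated argument tree.
    quasi.body[1].value.args[0] = arg_tree
    return ast.fix_missing_locations(quasi)

# "Parse time": the argument is handed over as a tree, so compute() has not run yet.
arg = ast.parse("compute()", mode="eval").body
program = ex2(arg)
events.append("expansion done")

# "Run time": the spliced code executes, and only now is compute() called.
exec(compile(program, "<expanded>", "exec"), {"events": events, "compute": compute})
```

The log ends up with the macro's own side effect first, and "COMPUTED!" landing between "before" and "after", which is the same ordering as the ex2 example.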

Activation

When is a macro called? A bit like BEGIN time, it is called as soon as everything is in place. The macro symbol itself needs to be parsed, and also all its sub-Qtrees that act as arguments or operands. This means different things for listops, infixes, prefixes, and postfixes.

mac $x, $y, $z ;
              ^--- here is where we can call the macro

$l ¤ $r ;
       ^

¤$x ;
   ^

$x¤ ;
   ^

my-if $COND, { ... } ;
                    ^

is parsed

But a macro may also declare itself with the is parsed trait, and effectively gain control of the parse process earlier than it otherwise would. Instead of "as soon as all the sub-Qtrees are in place", the macro now gets called as soon as the macro symbol has been parsed.

mac $x, $y, $z ;
   ^--- with `is parsed`, this is where the macro gets invoked

$l ¤ $r ;
    ^

¤ $x;
 ^

$x¤ ;
   ^ 

my-if $COND, { ... } ;
     ^

In the case of is parsed, the macro is assuming the responsibility to parse the rest of its arguments/operands, and hand control back to the Perl 6 parser in a way that's consistent with the macro's grammatical category. It also gets the responsibility to generate Qtrees for things that would otherwise have come in as macro parameters.

The is parsed trait expects a regex supplying a parsing strategy. Inside of the regex, we have full access to things defined in Perl6::Grammar. As an example, the my-if above almost looks like a normal built-in control structure, save for that annoying comma. With is parsed we can tell the parser that we don't care for the comma.

class Q::MyIf {
    has Q::Expr $.expr;
    has Q::Block $.block;
}

macro my-if(Q::Expr $expr, Q::Block $block)
        is parsed(/<EXPR> <.ws> <pblock>/) {

    return Q::MyIf.new(:$expr, :$block);
}

It's not clear that we can make the above work, or if it ends up being robust and usable. But it seems to be a possibility. So let's hereby bring is parsed back from the land of the deprecated into the shadow realm of the conjectural.
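The difference between the two activation points can be sketched with a toy parser. This is Python, not Perl 6, and every name in it (Parser, parse_term, the macros table, my_if_plain, my_if_parsed) is hypothetical; it only models who gets to consume the tokens after the macro symbol, not a real grammar.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos]

    def next(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse_term(self):
        # A real parser would build a tree here; we just wrap the token.
        return ("term", self.next())

    def parse_statement(self, macros):
        name = self.next()
        fn, is_parsed = macros[name]
        if is_parsed:
            # With `is parsed`: invoke right after the macro symbol; the macro
            # itself must consume its operands from the token stream.
            node = fn(self)
        else:
            # Default: parse all comma-separated arguments first, then invoke.
            args = [self.parse_term()]
            while self.peek() == ",":
                self.next()
                args.append(self.parse_term())
            node = fn(*args)
        if self.next() != ";":
            raise SyntaxError("macro did not hand control back at the statement end")
        return node

def my_if_plain(cond, block):
    return ("my-if", cond, block)

def my_if_parsed(parser):
    # Hypothetical parsing strategy: condition, then block, no comma required.
    cond = parser.parse_term()
    block = parser.parse_term()
    return ("my-if", cond, block)

with_comma = Parser(["my-if", "$COND", ",", "{...}", ";"]).parse_statement(
    {"my-if": (my_if_plain, False)})
without_comma = Parser(["my-if", "$COND", "{...}", ";"]).parse_statement(
    {"my-if": (my_if_parsed, True)})
```

Both calls produce the same node; the only thing that changes is whether the parser or the macro was responsible for getting from the symbol to the semicolon.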

is reparsed

We're not bringing back is reparsed, though. Besides breaking the rule of one-pass parsing, it's not clear that this form has a good use case driving it.

Unhygiene

These macros can exhibit unhygiene (the absence of hygiene) and declare symbols in, or modify symbols of, the mainline program. Such a macro will screw up simplistic tools such as syntax checkers, highlighters, linters, refactoring tools, IDEs, and Java programmers. Take that, simplistic tools!

Unhygiene may also void the warranty of your computer, your pets, your family, your friends, and nearby celestial bodies. Nevertheless, it can be very helpful sometimes, and I expect people (in proud Perl tradition) to manage to use the extra rope-lengths for shooting themselves in the appropriate amount of feet.

(Friendly reminder: the term "unhygienic macros" still refers to AST-based macros and does not mean "textual macros". As an analogy, whereas unhygiene is like a very lethal, armed nuclear warhead that probably oughtn't be in your possession in the first place, textual macros are like two subcritical piles of hot plutonium that when brought together will definitely go boom, and why are you standing so close to the hot piles anyway and sorry, you're dead now.)

Multi macros

It's possible to define multi macros. The biggest difference from multi subs is that the compiler will always pass Qtree arguments to the macro. Binding happens on compile-time Qtrees, not on runtime values.

Absent an is parsed, a multi macro will parse according to the normal Perl 6 grammar, and then match the resulting Qtrees against the macro signature. If an is parsed is present, a failed parse is tantamount to backtracking out of an alternation, and a successful parse means that the multi macro is considered a candidate. If all multi candidates fail, then that's a dispatch error as usual. If two or more tie, then normal tiebreaking rules apply. If at all possible, we should let multi is parsed macros participate in longest-token matching (LTM).

Conceptually, an is parsed on the multi macro counts as having an additional constraint placed on the signature. If two multis have the same signature but one of them has an is parsed trait, then the trait-adorned one will count as narrower.
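A minimal sketch of that narrowness rule, in Python and under heavy simplifying assumptions: the Candidate record and dispatch function are hypothetical, the custom parse of an is parsed candidate is assumed to have already succeeded, and any remaining tie is simply reported as ambiguous instead of going through the full tiebreaking rules.

```python
from dataclasses import dataclass

class QExpr: pass
class QBlock: pass

@dataclass
class Candidate:
    name: str
    signature: tuple        # Qtree node types, matched at compile time
    is_parsed: bool = False

def dispatch(candidates, args):
    matching = [c for c in candidates
                if len(c.signature) == len(args)
                and all(isinstance(a, t) for a, t in zip(args, c.signature))]
    if not matching:
        raise LookupError("no multi macro candidate matches these Qtrees")
    # Treat `is parsed` as one extra constraint: it makes a candidate narrower.
    best_rank = max(c.is_parsed for c in matching)
    best = [c for c in matching if c.is_parsed == best_rank]
    if len(best) > 1:
        raise LookupError("ambiguous multi macro dispatch")
    return best[0]

plain = Candidate("plain", (QExpr, QBlock))
parsed = Candidate("parsed", (QExpr, QBlock), is_parsed=True)
winner = dispatch([plain, parsed], [QExpr(), QBlock()])
```

Here winner is the trait-adorned candidate, even though both signatures are identical.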

We may end up allowing is parsed on ordinary subroutines, too. The is parsed axis is orthogonal to the sub/macro axis, so it's a possibility. But maybe it's less confusing not to allow that.

Syntax macros

Syntax macros are defined at statement level and essentially introduce a new type of statement. Let's say we wanted a new pretending keyword, a block form of temp:

class Q::Statement::Pretending is Q {
    has Q::Expr $.expr;
    has Q::Block $.block;
}

macro statement_control:<pretending>(Q::Expr $expr, Q::Block $block)
        is parsed(rule { <sym> <EXPR> <pblock> }) {
    # ...code to check that $expr is of the form `{{{$var}}} = {{{$value}}}` elided...
    return quasi {
        temp {{{$var}}} = {{{$value}}};
        {{{$block}}};   # handling of >0 params elided
    }
}

Notice how we're using the same mechanism as with op macros, only within the statement_control grammatical category. The desire for syntax macros comes from Scheme and Racket's define-syntax and syntax-rules facilities. I set out to add support for those things in our macro system, and happily found that something like the above seems to be enough. We need more concrete examples to verify this, though.

Especially considering the discussion above with multi macros, which would allow us to dispatch the same keyword to various different syntax variants — and also would nicely support third-party syntax extension in the same way ordinary multis do — this seems to be a winner. We may still market them as "syntax macros" if we want people to pay special interest to them.

It is an open question exactly which grammatical categories we will be able to hook into like this. But statement_control seems like a straightforward one.

Analysis and traversal

The elided parts of the syntax macro above interact in various ways with the incoming $expr and $block Qtrees. We should anticipate the needs of macro writers, and provide an API that makes simple things easy and mind-bendingly weird things achievable.

IntelliJ IDEA has various visitors to do this. They hook behaviors onto PSI node type matching, enabling you to say things like "do this for all method calls" or "do this for all field declarations". The idea is sound, even though it'll likely come out looking a bit different in Perl 6 and with Qtrees. But at the very least, we should have various default traversals that cover many common use cases. This is something we've yet to investigate.

One thing which makes this less straightforward in Perl 6 is that the language is more "freely nesting" than Java.

say "{ .foo given class :: { method foo { "OH" } } } $(constant $ = "HAI")"

class C { say "OH { method hai { say "o.O" }.name.uc }" }; C.new.hai

(Java doesn't allow class or constant declarations inside of string literals.)

Again, this means that we need to be more guided by use cases. Which methods do we expect to find when we traverse a class for methods? Probably only the ones registered on the class itself, which excludes methods in nested classes but includes methods nested in methods, inside string literals, or inside other expressions. We need good defaults here, informed by actual use and expectations.
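One possible default traversal, sketched in Python over a made-up tree type. The Node class, its kind strings, and methods_of are all assumptions for illustration and say nothing about what the real Qtree API will look like; the point is only the rule "descend into everything except nested classes".

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                      # e.g. "class", "method", "string"
    name: str = ""
    children: list = field(default_factory=list)

def methods_of(class_node):
    """Collect the methods registered on class_node itself."""
    found = []
    stack = list(reversed(class_node.children))
    while stack:
        node = stack.pop()
        if node.kind == "class":
            continue               # a nested class owns its own methods
        if node.kind == "method":
            found.append(node.name)
        # Keep descending: methods can hide inside methods, strings, expressions.
        stack.extend(reversed(node.children))
    return found

tree = Node("class", "C", [
    Node("string", children=[Node("method", "hai")]),       # method inside a string literal
    Node("method", "outer", [Node("method", "inner")]),     # method nested in a method
    Node("class", "Nested", [Node("method", "not-mine")]),  # belongs to Nested, not C
])
c_methods = methods_of(tree)
```

With this rule, hai, outer and inner are found, and not-mine is left to the nested class.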

Also, the Qtrees themselves are supposed to be helpful in much the same way. Traversal aside, often when you're sitting there with a reference to a variable or a class, you want to ask "where is the declaration for this?". Such questions are likely to form the basis of interesting Qtree analysis and transformations.

Another thing that we will often want to do is evaluate a Qtree representing an expression. The process is similar to EVAL, but starting from an already parsed/contexted Qtree instead of a program string.
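Python happens to offer a close analogue of this, which may help as a mental model: compile() accepts an already parsed ast tree, so evaluation can start from the tree instead of from source text. The variable names and the tiny environment below are just for illustration.

```python
import ast

# Parse once, up front; from here on we only hold the tree, not the source text.
tree = ast.parse("x * 7", mode="eval")

# Later: evaluate the already parsed tree in a given environment,
# without going back through a program string.
code = compile(tree, filename="<qtree>", mode="eval")
result = eval(code, {"x": 6})
```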

Going in the opposite direction, we sometimes want to construct Qtree literals or identifiers from various run-phase inputs. Sometimes we're less interested in the name we give an identifier, and more interested that it doesn't clash with anything else in the lexical environment (à la gensyms).
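Both directions can be sketched in Python, again as an analogy and not as a proposal for the Qtree API. The gensym and literal_binding helpers are hypothetical, and the "lexical environment" is reduced to a plain set of names already in use.

```python
import ast
import itertools

def gensym(taken, prefix="tmp"):
    """Return an identifier not present in `taken`, and reserve it."""
    for n in itertools.count(1):
        candidate = f"{prefix}{n}"
        if candidate not in taken:
            taken.add(candidate)
            return candidate

def literal_binding(value, taken):
    """Build the tree for `<fresh name> = <value>` from a runtime value."""
    name = gensym(taken)
    assign = ast.Assign(
        targets=[ast.Name(id=name, ctx=ast.Store())],
        value=ast.Constant(value=value),
    )
    module = ast.fix_missing_locations(ast.Module(body=[assign], type_ignores=[]))
    return name, module

scope = {"tmp1", "x"}              # names already taken in the environment
name, module = literal_binding(42, scope)
env = {}
exec(compile(module, "<built>", "exec"), env)
```

Since tmp1 is already taken, the fresh name skips past it, and the constructed tree binds that name to the runtime value when executed.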

Visitor macros

It's possible we shouldn't call these "macros" at all. But I don't have a better name for them yet, so "visitor macros" it is, for now.

By way of example, let's say you want to write a macro that makes code such as the following illegal:

if $some-expr == True {  # macro stops with "useless use of `== True`"
    ...
}

In this case, our macro is not so much a particular sub, op, or keyword, but a constellation of Qtree nodes.

The visitor macro might look something like this:

MATCH (Q::If (
           Q::Infix::NumEq :$expr (
               Q::Enum :$rhs where *.value eq "Bool::True"))) {
    die "useless use of `== True`";
}

Every single bit of the above is conjectural syntax.

Conceptually, each visitor macro would traverse the Qtree nodes as they are emitted by the parser. A visitor macro has a matcher part, basically a signature, and a callback part, basically a routine body.

Unlike routine macros, a visitor macro is not expected to return a Qtree object in the end. (And if it does, then it will be discarded.) Instead, it is expected to further typecheck or analyze the matched Qtree nodes, and then maybe take some action. The action may be to modify the Qtree nodes somehow, to parsefail, or to update some global state.

Qtree structures can be matched with subsignatures digging into the child nodes of the Qtree root. In the cases where this is too constraining, a where block can be put to use: inside a where block we can match against non-descendant nodes in the program.

One big difference from ordinary routines is that there are no dispatch failures. Normally a visitor simply doesn't trigger, and that's that.
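The matcher-plus-callback shape can be tried out today with Python's ast.NodeVisitor, as a stand-in for Qtrees. Two simplifications to keep in mind: this sketch walks a finished tree instead of seeing nodes as the parser emits them, and the class and function names are made up for the example.

```python
import ast

class UselessEqTrue(ast.NodeVisitor):
    """Matcher plus callback, in the spirit of the `== True` example."""

    def visit_If(self, node):
        test = node.test
        # Matcher part: an `if` whose condition is `<expr> == True`.
        if (isinstance(test, ast.Compare)
                and len(test.ops) == 1
                and isinstance(test.ops[0], ast.Eq)
                and isinstance(test.comparators[0], ast.Constant)
                and test.comparators[0].value is True):
            # Callback part: here we choose to fail the "parse".
            raise SyntaxError(f"useless use of `== True` on line {node.lineno}")
        # No match is not an error; the visitor just doesn't trigger.
        self.generic_visit(node)

def check(source):
    UselessEqTrue().visit(ast.parse(source))
    return "ok"

status = check("if x:\n    pass\n")
```

A source containing `if x == True:` makes check raise, while anything else passes through untouched, without any notion of a failed dispatch.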

Another difference is that since visitor macros are not called, care has to be taken to not trigger them prematurely:

  • Firstly, it is probably prudent not to have visitor macros trigger inside of their own macro body. We can have them register on the final }, which shouldn't be an issue.

  • Secondly, let's say someone defines a visitor macro like the example above, and wants to export it. Fine, we put an is export on the macro, and we might even go with giving visitor macros an identifier so that people can have a say in whether to import them. (And generally a way to refer to them.) In this case, it's easy to avoid putting == True in if statements, but in more delicate cases it might not be so easy to avoid triggering the visitor macro in the exporting module. Perhaps an is export-only trait would be useful here?

Alternatively, maybe we should think about separating the declaration and the activation of a visitor macro? The above form which does both might still be a convenient default, but thinking of export-only makes it seem like we sometimes want control over the exact parser/compunit we activate the visitor macro in.

It might also be useful sometimes to be able to programmatically de-activate a visitor macro. The mechanism could be similar to wrap handles à la S06, which all have a .restore method. Maybe visitor activations similarly return a handle with a .disable method.
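Separating declaration from activation, with a handle to undo the activation, could look roughly like this. It is a Python sketch of the idea only; VisitorRegistry, Activation, activate, emit and disable are all hypothetical names, and a plain callable stands in for a declared visitor macro.

```python
class Activation:
    """Handle returned when a visitor is activated."""

    def __init__(self, active, visitor):
        self._active, self._visitor = active, visitor

    def disable(self):
        # Safe to call more than once.
        if self._visitor in self._active:
            self._active.remove(self._visitor)

class VisitorRegistry:
    def __init__(self):
        self._active = []

    def activate(self, visitor):
        self._active.append(visitor)
        return Activation(self._active, visitor)

    def emit(self, node):
        # Called for each node "emitted by the parser"; runs every active visitor.
        for visitor in list(self._active):
            visitor(node)

seen = []
registry = VisitorRegistry()
handle = registry.activate(seen.append)   # declaration exists already; this activates it
registry.emit("node-1")
handle.disable()
registry.emit("node-2")                   # no longer observed
```

Only the first node is recorded, since the visitor was switched off before the second one was emitted.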

@eritain

eritain commented Dec 10, 2015

Some names that may be more meaningful than 'visitor macro': watcher, monitor, egregore (if you're feeling esoteric about it), auk.

'Auk' is, of course, an acronym for AST Usage Kibitzer; it is not at all conducive to the mental image of a bird diving swiftly into the sea of Qtree nodes to hunt down its chosen prey, much less is it an allusion to any random match-conditions-and-take-action programming language you might have heard of, so strike those thoughts from your head and don't think of a pink elephant either.

@masak
Author

masak commented Dec 22, 2015

@eritain: Of those, I like "watcher [macro]" best.

The name is not set in stone, and "watcher" is now a strong contender. Perhaps even stronger than "visitor".

@eritain

eritain commented Dec 25, 2015

Derp ... 'monitor' is already taken, isn't it. Well, there's always competition for resources. ("Everyone wants the colon.")
