Skip to content

Instantly share code, notes, and snippets.

@jnthn
Last active April 27, 2022 17:09
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jnthn/420e93cd4ca074da0b620992c10fb361 to your computer and use it in GitHub Desktop.
Save jnthn/420e93cd4ca074da0b620992c10fb361 to your computer and use it in GitHub Desktop.

Compilation phases

Background

Compiling Raku is made especially interesting because the language user may participate in compile time, by writing BEGIN blocks, using constant, writing custom trait handlers, and so forth. The presence of declarations that are made can impact on the parse that follows. There is also a notion of CHECK time at the end of compilation, before optimization takes place. With RakuAST we expect that this list to grow with:

  • EVAL of an AST, where any actions that the parser might usually drive will not have taken place
  • Macros, which are routines invoked with pieces of AST and produce further AST, or may even modify the AST they are given
  • Quasi-quoted code, where we need to effect declarations to the degree they impact the parse, but would need to have them separately created in each "instantiation" of the quasi
  • Custom compiler passes
  • Access to the AST in routine traits

RakuAST nodes represent Raku code constructs quite closely. They form a tree: a given node may have attributes that refer to other RakuAST nodes. There is, however, no link from a child node up to its parent. Each node provides a visit-children routine that runs a callback on each of the child nodes it has, providing a uniform way to walk the tree.

However, compilation is far more involved that simply walking the tree and making code for each node. Of note:

  • Some symbols need registering immediately when parsing, since their declaration impacts the parse that follows
  • Usages need to be linked to declarations (and undeclared things reported)
  • Thunks need to be applied (for example, in expressions like * + 1, or in attribute initializer values, or in the replacement part of a substitution)
  • Meta-objects need to be produced
  • Traits need to be applied
  • Some nodes need to know about surrounding context in order to compile; for example, the compilation of $=finish needs to access the data attached to the enclosing compilation unit
  • Some nodes need to have their relationship with non-direct children managed; for example, a block or routine may have placeholder parameters, and a class declaration will have methods and attributes
  • Some nodes produce synthetic pieces of AST. For example, placeholder parameter declarations result in the production of a RakuAST::Signature

Those working with RakuAST shall also want to be able to analyze and modify the AST, however, which complicates matters further since those changes might come "too late". For example, a Routine trait accessing the AST might try to make changes that would impact upon the meta-object that is produced for the routine, however we need to form the Routine meta-object in order to pass it to the trait! It's OK if the answer is "too late", but we should avoid silent failure in such cases.

Phases

Parse

In the case that the AST node is being produced while parsing, this phase happens immediately after the node's creation. Node creation happens:

  • For elements that are lexical scopes, immediately upon scope entry
  • For all other elements, at the point they are produced by the action method

In a synthetic AST it happens in the initial walk of the tree, top-down for anything that is a RakuAST::LexicalScope and bottom-up for anything else. In this phase:

  • Elements that result in declarations register themselves with the resolver immediately, so that the symbol can be resolved later in the parse (this is primarily true of lexical symbols, but also true of multi-part package delcarations)
  • Packages stub a meta-object and apply traits
  • Variable usages that require parse-time resolution are resolved
  • Types are resolved

Begin

In the parser, begin time happens after the parsing of an element is complete. Thus for a sub it happens after the closing }, and similar for a package declaration. For explicit BEGIN statements it comes after the block or statement is parsed, and for constant and subset declarations after they are parsed too.

In a synthetic AST, this happens after bottom-up after visiting the node, and always after any parse time effects have been applied.

In this phase:

  • Placeholder parameters lead to signature production
  • Meta-objects for routines, packages, subset, and enum types are produced, and any traits are applied

Check

This phase most typically happens after the entire AST for a compilation unit has been produced. However, any code that is to be executed at begin time gets its check time moved earlier. In this phase:

  • Various semantic errors, including calls to undeclared routines, are reported
  • Sink context is calculated

Custom compiler passes

Custom compiler passes provide:

  • Phase predicates for nodes they wish to apply to: parse-predicate, begin-predicate, etc.
  • Zero or more pre-phase or post-phase method implementations

Note that the predicates are only evaluated (and phase methods only called) on nodea that do a matching phase role. For example, the parse-predicate is only evaluated, and pre-parse and post-parse only called, on nodes that do the RakuAST::ParseTime role. The default predicate for all of these evaluates to False; one must always override at least one predicate method.

The phase methods that may be implemented are:

  • pre-parse: Before parse time - that is to say, before the AST node has taken any parse-time action. This happens before symbols are resolved. The no strict pragma would be implemented by first attempting to resolve a variable reference, and if it fails, synthesizing a declaration. Since undeclared variables whine at parse time, this is the only phase that is early enough to implement such a feature. It's also the only one where it is possible to prevent declarations being made or change names that are to be declared. For a RakuAST::LexicalScope this is called around the opening curly, and so children won't be attached yet.
  • post-parse: After parse time. Generally, any declarations will have been made at this point, and it is safe to request the stub meta-object. However, full meta-objects should not be requested, as those are not produced until begin time. For a RakuAST::LexicalScope this is called around the opening curly, and so children won't be attached yet.
  • pre-begin: Before begin time. At this point, one can expect child elements to have been attached, and thus to have a complete AST. There may yet be semantic errors in that AST, however. Meta-objects will not have yet been produced. Most AST modifications are still possible at this point (only those like name changes to already declared elements are invalid). For example, one could replace the signature.
  • post-begin: After begin time. At this point, meta-objects (such as for blocks and packages) will have been produced. Changes to elements that would impact upon the produced meta-objects are now too late. Again, the AST may still have semantic errors.
  • pre-check: Before check time. It's still possible to change the AST, to the same degree it was at BEGIN time.
  • post-check: By this point, any semantic errors that would prevent the successful compilation will have been reported. Any changes to the AST also forbidden, however one can know that the program is well formed and will survive compilation.

AST mutation

Mutation of already-formed AST is achieved through replace methods. These will throw an exception in the event that compile-time side-effects (such as parse-time or begin-time meta-object production have already rendered the change "too late").

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment