@edwin0cheng
Last active June 23, 2021 11:27
RA and rustc macro by examples design

RA

  • RA does not have a TokenStream object.
  • RA does not have a SyntaxExtension object.
    • SyntaxExtension is a rustc enum representing the different kinds of syntax extensions.
  • How the current parser handles MacroCall:
    • When it sees a MACRO_CALL, it just bumps all of the lexer's SyntaxKinds into rowan and builds a rowan::SyntaxNode.
    • Although we have a tt::TokenTree (not to be confused with syntax::TokenTree), we only generate it during macro parsing and expansion.
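
To make the idea of a token tree concrete, here is a minimal sketch of grouping a flat token list into nested trees by delimiter. The names `TokenTree`, `Leaf`, `Subtree`, and `build` are illustrative assumptions and are much simpler than the real definitions in RA's tt crate.

```rust
// Toy model of a token tree: either a leaf token or a delimited subtree.
// All names here are hypothetical, not rust-analyzer's actual API.
#[derive(Debug, PartialEq)]
enum TokenTree {
    Leaf(String),
    Subtree { open: char, children: Vec<TokenTree> },
}

/// Returns the closing delimiter that matches an opening one.
fn closing(open: char) -> char {
    match open {
        '(' => ')',
        '[' => ']',
        _ => '}',
    }
}

/// Groups a flat token list into nested trees.
/// Returns None if the delimiters are unbalanced or mismatched.
fn build(tokens: &[&str]) -> Option<Vec<TokenTree>> {
    // Stack of (opening delimiter, children so far); the bottom entry is the root.
    let mut stack: Vec<(char, Vec<TokenTree>)> = vec![(' ', Vec::new())];
    for tok in tokens {
        match *tok {
            "(" | "[" | "{" => stack.push((tok.chars().next()?, Vec::new())),
            ")" | "]" | "}" => {
                let (open, children) = stack.pop()?;
                // An empty stack means we just popped the root: stray closer.
                if stack.is_empty() || closing(open).to_string() != *tok {
                    return None;
                }
                stack.last_mut()?.1.push(TokenTree::Subtree { open, children });
            }
            other => stack.last_mut()?.1.push(TokenTree::Leaf(other.to_string())),
        }
    }
    let (_, root) = stack.pop()?;
    // Anything left on the stack is an unclosed delimiter.
    if stack.is_empty() { Some(root) } else { None }
}
```

Calling `build(&["foo", "(", "1", "+", "2", ")"])` yields two top-level trees: the leaf `foo` and one parenthesized subtree with three leaves. A macro expander can then match on whole subtrees instead of individual tokens.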

rustc

  • The rustc macro expander just treats a macro_rules definition as $( $lhs:tt => $rhs:tt );+
  • rustc uses two systems of non-terminal tokens:
    • One is defined in ast and the other in token.
    • For example, token::NtExpr is for macros and ast::Expr is for the AST.
  • rustc lexer:
    • The lexer "lexes" a string into ast::Token, which contains only:
      • single char punct
      • ident
      • lit
  • The rustc Parser parses these tokens into AST pointers (e.g., P<ast::Expr>).
  • When the parser sees a macro invocation (in this case an expression macro), it returns a P<Expr{node=Mac}>.
  • After that, when that macro is actually being parsed, it creates a Parser (a black-box parser) to parse (p.parse_expr()) the corresponding item (in this case a P<Expr>) and wraps it in token::NtExpr.
  • The macro expander then "transcribes" it into an AST.
  • rustc's AstFragment type enumerates the things a macro can be transcribed into (items, expressions, patterns, etc.).
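
The points above can be observed from user code. The sketch below is illustrative only: the macro names `double_expr`, `double_tokens`, and `count_rules` are made up for this example. It shows that an `$e:expr` fragment is parsed as one expression before transcription (the behaviour that token::NtExpr provides), while raw `tt` fragments are pasted token by token, and that a macro_rules body really does have the shape $( $lhs:tt => $rhs:tt );+.

```rust
// `$e:expr` is parsed as a single expression before transcription,
// so it stays one unit in the output.
macro_rules! double_expr {
    ($e:expr) => { $e * 2 };
}

// With raw token trees the tokens are pasted as-is, so normal operator
// precedence applies to the transcribed token sequence.
macro_rules! double_tokens {
    ($($t:tt)*) => { $($t)* * 2 };
}

// A macro whose input has the same shape as a macro_rules body:
// `$( $lhs:tt => $rhs:tt );+`. It only counts the rules it was given.
macro_rules! count_rules {
    ($( $lhs:tt => $rhs:tt );+) => {
        [$( stringify!($lhs => $rhs) ),+].len()
    };
}

/// Feeds a two-rule macro body to `count_rules!` and returns the rule count.
fn rule_count() -> usize {
    let n = count_rules!(
        ($x:expr) => { $x + 1 };
        () => { 0 }
    );
    n
}
```

Here `double_expr!(1 + 2)` evaluates to 6 because `1 + 2` is kept as one expression, whereas `double_tokens!(1 + 2)` evaluates to 5 because the output tokens are `1 + 2 * 2`. `rule_count()` returns 2, since each `lhs => rhs` pair is matched as two token trees.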

rustc builtin

Vadim Petrochenkov talked about "where to find the code":

https://rust-lang.zulipchat.com/#narrow/stream/196385-t-compiler.2Fwg-learning/topic/macros.20discussion/near/171633372

Copied from Zulip:

Where to find the code:

  • libsyntax_pos/hygiene.rs - structures related to hygiene and expansion that are kept in global data (can be accessed from any Ident without any context)
  • libsyntax_pos/lib.rs - some secondary methods like macro backtrace using primary methods from hygiene.rs
  • libsyntax_ext - implementations of built-in macros (including macro attributes and derives) and some other early code generation facilities like injection of standard library imports or generation of test harness.
  • libsyntax/config.rs - implementation of cfg/cfg_attr (they are treated specially compared to other macros), should probably be moved into libsyntax/ext.
  • libsyntax/tokenstream.rs + libsyntax/parse/token.rs - structures for compiler-side tokens, token trees, and token streams.
  • libsyntax/ext - various expansion-related stuff
    • libsyntax/ext/base.rs - basic structures used by expansion
    • libsyntax/ext/expand.rs - some expansion structures and the bulk of expansion infrastructure code - collecting macro invocations, calling into resolve for them, calling their expanding functions, and integrating the results back into AST
    • libsyntax/ext/placeholder.rs - the part of expand.rs responsible for "integrating the results back into AST". Basically, a "placeholder" is a temporary AST node that is replaced with the macro expansion result nodes.
    • libsyntax/ext/build.rs - helper functions for building AST for built-in macros in libsyntax_ext (and previously for user-defined syntactic plugins), can probably be moved into libsyntax_ext these days
    • libsyntax/ext/proc_macro.rs + libsyntax/ext/proc_macro_server.rs - interfaces between the compiler and the stable proc_macro library, converting tokens and token streams between the two representations and sending them through C ABI
    • libsyntax/ext/tt - implementation of macro_rules, turns macro_rules DSL into something with signature Fn(TokenStream) -> TokenStream that can eat and produce tokens, @mark-i-m knows more about this
  • librustc_resolve/macros.rs - resolving macro paths, validating those resolutions, reporting various "not found"/"found, but it's unstable"/"expected x, found y" errors
  • librustc/hir/map/def_collector.rs + librustc_resolve/build_reduced_graph.rs - integrate an AST fragment freshly expanded from a macro into various parent/child structures like module hierarchy or "definition paths"

Primary structures:

  • HygieneData - global piece of data containing hygiene and expansion info that can be accessed from any Ident without any context
  • ExpnId - ID of a macro call or desugaring (and also expansion of that call/desugaring, depending on context)
  • ExpnInfo/InternalExpnData - a subset of properties from both macro definition and macro call available through global data
  • SyntaxContext - ID of a chain of nested macro definitions (identified by ExpnIds)
  • SyntaxContextData - data associated with the given SyntaxContext, mostly a cache for results of filtering that chain in different ways
  • Span - a code location + SyntaxContext
  • Ident - interned string (Symbol) + Span, i.e. a string with attached hygiene data
  • TokenStream - a collection of TokenTrees
  • TokenTree - a token (punctuation, identifier, or literal) or a delimited group (anything inside ()/[]/{})
  • SyntaxExtension - a lowered macro representation, contains its expander function transforming a tokenstream or AST into tokenstream or AST + some additional data like stability, or a list of unstable features allowed inside the macro.
  • SyntaxExtensionKind - expander functions may have several different signatures (take one token stream, or two, or a piece of AST, etc), this is an enum that lists them
  • ProcMacro/TTMacroExpander/AttrProcMacro/MultiItemModifier - traits representing the expander signatures (TODO: change and rename the signatures into something more consistent)
  • trait Resolver - a trait used to break crate dependencies (so resolver services can be used in libsyntax, despite librustc_resolve and pretty much everything else depending on libsyntax)
  • ExtCtxt/ExpansionData - various intermediate data kept and used by expansion infra in the process of its work
  • AstFragment - a piece of AST that can be produced by a macro (may include multiple homogeneous AST nodes, e.g. a list of items)
  • Annotatable - a piece of AST that can be an attribute target, almost the same thing as AstFragment except for types and patterns, which can be produced by macros but cannot be annotated with attributes (TODO: Merge into AstFragment)
  • trait MacResult - a "polymorphic" AST fragment, something that can turn into a different AstFragment depending on its context (aka AstFragmentKind - item, or expression, or pattern etc.)
  • Invocation/InvocationKind - a structure describing a macro call, these structures are collected by the expansion infra (InvocationCollector), queued, resolved, expanded when resolved, etc.

Primary algorithms / actions:

TODO
