@Kestrer
Created October 17, 2020 05:35
A guide on how to write hygienic Rust macros

How to Write Hygienic Rust Macros

Macro hygiene is the concept of macros that work in all contexts; they don't affect and aren't affected by anything around them. Ideally all macros would be fully hygienic, but there are lots of pitfalls and traps that make it all too easy to accidentally write unhygienic macros. This guide attempts to provide a comprehensive resource for writing the most hygienic macros.

Understanding the Module System

First, a little aside on the details of Rust's module system, and specifically paths; it is important to know about this for later.

Namespaces

Rust has three independent namespaces: one for types, one for values and one for macros. Traits, braced structs, enums, unions, modules and crates all go in the type namespace. Functions, constants and statics share the value namespace, and macros unsurprisingly use the macro namespace. This is what allows a derive macro to have the same name as the trait it derives: the trait and its derive are completely separate items that happen to share a name because they live in different namespaces.

The only interesting case here is tuple structs and unit structs, which go in both the type namespace and the value namespace: tuple structs are also functions that take in the arguments of the tuple, and unit structs are also constants that are the value of the unit struct. This means that these two programs won't compile:

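// Program 1: a tuple struct also defines a function named `Foo`.
struct Foo();
fn Foo() {}

// Program 2: a unit struct also defines a constant named `Foo`.
struct Foo;
fn Foo() {}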

But this one will:

struct Foo {}
fn Foo() {}
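To see all three namespaces side by side, here is a minimal sketch in which a braced struct, a function and a macro share one name without clashing. The name `Widget` and the function `namespaces_demo` are made up for illustration.

```rust
// Type namespace: a braced struct.
#[allow(dead_code)]
struct Widget {}

// Value namespace: a function with the same name.
#[allow(non_snake_case)]
fn Widget() -> u8 {
    1
}

// Macro namespace: a macro with the same name again.
macro_rules! Widget {
    () => {
        2u8
    };
}

fn namespaces_demo() -> (u8, u8) {
    // `Widget` in type position and in a struct expression is the struct.
    let _value: Widget = Widget {};
    // `Widget()` is the function, `Widget!()` is the macro.
    (Widget(), Widget!())
}
```

Each use site picks the namespace from its syntax, so none of the three definitions shadows another.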

2015 Edition

The 2015 edition was the first edition of Rust. Although most new code targets a later edition, the 2015 edition is still supported, so it's important to know about it if you want to write macros that work in all editions. In its module system, there is one central tree that contains everything, and it starts from the crate root. For example, this simple program:

pub struct Foo;

pub mod bar {
    pub fn bar() {}
}

Produces a module tree like so:

Crate Root
├─ type std
│  └─ (the standard library is here)
├─ type Foo
└─ type bar
   └─ value bar

As you can see it contains the standard library and all the items that you have defined.

You can use paths to get items from different parts of the module tree. All paths are relative to the current module, with one exception - paths in a use declaration always start from the crate root. Path resolution works like so:

  • Paths beginning with crate:: or :: get items from the crate root.
  • Paths beginning with super:: get items from the parent module (attempting to go above the crate root causes an error).
  • Paths beginning with self:: get items from the current module.
  • Paths that do not fall into one of the above categories get items either from the current module or from the prelude, but only if it is enabled.

That last point is what makes the prelude special. So for example:

// This works...
const FOO: Vec<()> = Vec::new();
// but this doesn't.
const FOO: self::Vec<()> = Vec::new();
// However if we import Vec properly Vec becomes an item in the current scope:
use std::vec::Vec;
// So this works...
const FOO: Vec<()> = Vec::new();
// and this does too.
const FOO: self::Vec<()> = Vec::new();

So that's the 2015 edition. It worked fine, but extern crate and re-importing all your dependencies in every single module was a pain, and this is precisely what the 2018 edition sought to fix.

2018 Edition

The 2018 edition, released in Rust 1.31, brought about many changes to Rust, but we will be focusing on the changes to the module system. First, let's look at our example above, but this time with Rust 2018:

pub struct Foo;

pub mod bar {
    pub fn bar() {}
}

The module tree then becomes:

Extern Prelude
├─ core
│  └─ (libcore is here)
└─ std
   └─ (the standard library is here)

Crate Root
├─ type Foo
└─ type bar
   └─ value bar

So what's changed? Well, the first thing to note is that the crate root isn't the only root - there is also the extern prelude, which contains external crates like the standard library. The extern prelude contains any crates passed with --extern to the Rust compiler as well as any crates declared with extern crate in the crate root. Cargo automatically places all of your dependencies in the extern prelude, and core and std are in there by default. The only time you have to put something in the extern prelude manually with extern crate is for crates that ship with the toolchain but aren't included by default, such as alloc.

Use declarations also changed: they are now relative to the current module instead of absolute from the crate root, which makes them consistent with other paths. Additionally, the path resolution algorithm is now different:

  • Paths beginning with :: get items from the extern prelude.
  • Paths beginning with crate:: get items from the crate root (same as before).
  • Paths beginning with super:: get items from the parent module (same as before).
  • Paths beginning with self:: get items from the current module (same as before).
  • Paths that do not fall into one of the above categories get items from, in this order, the current module, the extern prelude, or the prelude, but only if it is enabled.

You may also notice that the crate root no longer contains std. This makes it possible for example to define a module called std at the root of your crate, and not have compilation errors - you'll just have to refer to the actual standard library as ::std.

These changes made dependencies a lot easier to work with. Gone are the days of littering your lib.rs with extern crates and re-importing dependencies inside every module; now you can use dependencies wherever you are.

Writing Hygienic Macros

With all that in mind, let's learn how to write hygienic macros that work in all contexts on any edition.

Use Unambiguous Paths

First of all, every path to the standard library or another crate should start with ::. The bare crate name works fine most of the time in the 2018 edition, and crate:: paths work fine in the 2015 edition, but paths starting with :: work in both.

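// Avoid: relies on the prelude.
let foo: Option<()>;

// Avoid: breaks if `std` is not in scope or is shadowed.
let foo: std::option::Option<()>;

// Prefer: works in both editions.
let foo: ::std::option::Option<()>;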
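As a rough illustration of why the leading :: matters, the sketch below invokes a macro from a module that contains its own module named `std`. The macro name `robust_none` and the module `user_code` are hypothetical.

```rust
// Hypothetical macro: every path starts from the leading `::` root.
macro_rules! robust_none {
    () => {
        ::std::option::Option::<u8>::None
    };
}

mod user_code {
    // A user module that happens to be called `std`. Inside `user_code`,
    // a plain `std::option::...` path would resolve to this module and
    // fail to find `option` in it.
    #[allow(dead_code)]
    mod std {}

    pub fn get() -> ::std::option::Option<u8> {
        // Still reaches the real standard library.
        robust_none!()
    }
}
```

Because neither edition looks in the current module for a path that starts with ::, the local `std` module is never consulted.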

Secondly, if there's something your macro defines in the current module that you need to access, use self:: paths; it avoids ambiguities where there's a crate in the extern prelude with the same name. In general, you should never use any other kind of path as they are simply too unreliable and ambiguous; the prelude can always be overwritten by a user.

// Avoid: ambiguous path.
struct Foo;

let foo: Foo;

// Prefer: `self::` pins the path to the current module.
struct Foo;

let foo: self::Foo;

Use Unambiguous Traits

It is never safe to call trait methods with method syntax inside a macro. Even something seemingly harmless like "Hello World".to_owned() can cause errors in certain contexts - the user could have the prelude disabled so the ToOwned trait isn't in scope, or the user could have another trait in scope that also provides a to_owned method on strings.

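// Avoid: method syntax depends on which traits are in scope.
let s = "Hello World".to_owned();
let new_file = file.clone();

// Prefer: fully qualified trait paths.
let s = ::std::borrow::ToOwned::to_owned("Hello World");
let new_file = ::std::clone::Clone::clone(&file);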

Keep in mind that it is always safe to use inherent methods inside a macro even if a trait that the type implements has a method of the same name - so don't go writing ::std::option::Option::map when you could just use .map.
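The second failure mode can be sketched as follows. The trait `Shadow`, the macro `greeting` and the module `user_code` are invented for this example; the point is only that the fully qualified call keeps working when a competing `to_owned` is in scope.

```rust
// Hypothetical macro that names the trait explicitly.
macro_rules! greeting {
    () => {
        ::std::borrow::ToOwned::to_owned("Hello World")
    };
}

mod user_code {
    // A user trait that also gives `str` a method called `to_owned`.
    #[allow(dead_code)]
    pub trait Shadow {
        fn to_owned(&self) -> u8;
    }

    impl Shadow for str {
        fn to_owned(&self) -> u8 {
            0
        }
    }

    pub fn call() -> String {
        // Writing `"Hello World".to_owned()` here should be rejected,
        // because both `Shadow` and `ToOwned` offer a matching method.
        // The qualified path in the macro leaves no room for doubt.
        greeting!()
    }
}
```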

Declarative macros: Use $crate

Declarative macros provide a special path root, $crate, which unambiguously refers to the crate in which the declarative macro was written. It is very useful: in a declarative macro, every path that does not point into std should go through $crate.

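// Avoid: assumes the crate is reachable under this name.
::my_crate::do_things()

// Prefer: always refers to the defining crate.
$crate::do_things()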
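A small single-file sketch of the same idea, assuming the snippet sits at the crate root. The macro name `run_do_things` and the module `caller` are hypothetical, and in a real library the macro would also carry #[macro_export].

```rust
// The function the macro wants to reach, defined at the crate root.
pub fn do_things() -> u32 {
    7
}

// Hypothetical macro: `$crate` expands to the root of the defining crate.
macro_rules! run_do_things {
    () => {
        $crate::do_things()
    };
}

mod caller {
    // A local function with the same name does not confuse `$crate::` paths.
    #[allow(dead_code)]
    fn do_things() -> u32 {
        0
    }

    pub fn run() -> u32 {
        run_do_things!()
    }
}
```

Even though `caller` has its own `do_things`, the macro expands to a call to the one at the crate root.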

Procedural macros: Support Renaming the Crate

Procedural macros unfortunately don't have an equivalent feature, and assuming the crate is at ::crate_name will break if the user has renamed the crate or the crate is being re-exported by another crate. You should work around this by accepting a parameter that sets the path to the crate and defaults to ::crate_name if it's not present.

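// Avoid: hard-codes the crate path.
quote!(::my_crate::do_things())

// Prefer: take the path from a parameter and fall back to the default.
let my_crate = get_crate_parameter().unwrap_or_else(|| quote!(::my_crate));
quote!(#my_crate::do_things())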

Re-export core and alloc

When writing macros that work in no_std environments, you run into a problem: the 2015 edition doesn't provide a way to access the core crate that works in all contexts, and neither edition provides a way to access the alloc crate that works in all contexts. A good workaround is to re-export both crates from your crate root:

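#[doc(hidden)]
pub use ::core;
// `alloc` is not in the extern prelude by default, so it has to be
// declared with `extern crate`; making that declaration `pub` re-exports it.
#[doc(hidden)]
pub extern crate alloc;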

You can then reach both crates through your own crate's path, for example $crate::core and $crate::alloc in a declarative macro.
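Here is one way this could look in a single file, assuming the snippet is the crate root of the macro library. The macro `make_box` and the module `user` are hypothetical, and the re-export is written as `pub extern crate` so that the declaration and the re-export are a single item.

```rust
// Declare `alloc` and re-export it from the crate root in one step.
#[doc(hidden)]
pub extern crate alloc;

// Hypothetical macro that reaches `alloc` only through `$crate`.
macro_rules! make_box {
    ($value:expr) => {
        $crate::alloc::boxed::Box::new($value)
    };
}

// Stand-in for user code that has no prelude and no `alloc` of its own.
#[no_implicit_prelude]
mod user {
    pub fn make() -> ::std::boxed::Box<u8> {
        make_box!(5u8)
    }
}
```

The caller never has to declare `alloc` itself; the macro finds it through the crate that defines the macro.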

Procedural Macros: Use mixed_site where possible

This one only applies to procedural macros, as macro_rules! macros get the hygienic behaviour by default.

Procedural macros emit tokens, which carry hygiene information in their Spans. There are two stable kinds of hygiene: call site and mixed site. Call site hygiene essentially pastes the macro output in as if it had been written directly at the call site. That is dangerous, because the variables and items your macro introduces can collide with the user's. So where possible, give your Spans mixed_site hygiene, which protects against many common hygiene issues.

let temp_var = Ident::new("temp", Span::call_site());
quote! {
    let #temp_var = 5;
    // User code can access `temp` here.
    #user_code
}

let temp_var = Ident::new("temp", Span::mixed_site());
quote! {
    let #temp_var = 5;
    // User code cannot access `temp` so everything is hygienic.
    #user_code
}
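For comparison, this is the behaviour that macro_rules! macros get by default and that mixed_site aims to match for local variables. The sketch uses a declarative macro so that it fits in one file; `add_five` and `hygiene_demo` are made-up names.

```rust
// Hypothetical macro with an internal temporary called `temp`.
macro_rules! add_five {
    ($user_expr:expr) => {{
        // This `temp` belongs to the macro and is invisible to `$user_expr`.
        let temp = 5;
        temp + $user_expr
    }};
}

fn hygiene_demo() -> i32 {
    // The user's own `temp`.
    let temp = 100;
    // Inside the expansion, the user's expression still sees 100,
    // not the macro's 5, so the result is 5 + 100.
    add_five!(temp)
}
```

With call site hygiene in a procedural macro, the equivalent expansion would let the macro's `temp` shadow the user's.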

Even Primitives Aren't Safe

Primitives are not keywords. This means that they can be overwritten - you can do weird things like write struct i32; if you really want to (which can lead to some pretty interesting code). To be able to function even in these conditions there is the ::core::primitive/::std::primitive module that re-exports all the real primitives for you - simply replace uses of primitives like i32 with absolute paths to that module, like ::std::primitive::i32. To test that your macro uses these paths, you can write:

trait bool {}
trait char {}
trait f32 {}
trait f64 {}
trait i128 {}
trait i16 {}
trait i32 {}
trait i64 {}
trait i8 {}
trait isize {}
trait str {}
trait u128 {}
trait u16 {}
trait u32 {}
trait u64 {}
trait u8 {}
trait usize {}

at the top of a test that invokes your macro. Then any attempt to use the unqualified paths will result in a compilation error.

// Avoid: `i32` can be shadowed by the user.
let number: i32 = 5;

// Prefer: absolute path to the real primitive.
let number: ::std::primitive::i32 = 5;

However, the major downside is that this module is fairly new - it was only introduced in Rust 1.43. If your MSRV is below that, you'll have to stick to the ambiguous standard primitive names.
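Putting the two pieces together, a test along these lines could check a macro against a shadowed primitive. It needs Rust 1.43 or later, and the macro `five` and the module `hostile` are hypothetical.

```rust
// Hypothetical macro that spells out the primitive's absolute path.
macro_rules! five {
    () => {{
        let number: ::std::primitive::i32 = 5;
        number
    }};
}

mod hostile {
    // In this module the bare name `i32` refers to a trait, not the integer type.
    #[allow(non_camel_case_types, dead_code)]
    pub trait i32 {}

    pub fn get() -> ::std::primitive::i32 {
        // The macro still compiles because it never uses the bare name.
        five!()
    }
}
```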

Test your macro with #![no_implicit_prelude]

The #![no_implicit_prelude] crate attribute changes path resolution: previously ambiguous paths, which could refer to items in the current module, the extern prelude or the prelude, can now only refer to items in the current module. In effect, all of those paths become equivalent to self:: paths.

Having a test that uses this attribute makes sure that you never accidentally rely on the prelude. It can be very difficult to be properly hygienic otherwise.

// tests/hygiene.rs
#![no_implicit_prelude]

::my_crate::invoke_all_paths_of_my_macro!();
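The attribute can also be placed on a single module, which makes it possible to sketch such a test in one file. The macro `make_vec` and the module `strict` below are hypothetical stand-ins for your macro and your test.

```rust
// Hypothetical macro under test: absolute paths plus an inherent method.
macro_rules! make_vec {
    ($($item:expr),*) => {{
        let mut items = ::std::vec::Vec::new();
        // `push` is an inherent method, so method syntax is fine here.
        $( items.push($item); )*
        items
    }};
}

// Inside this module nothing from the prelude is in scope, so `Vec`,
// `Option` and friends are unavailable unless fully qualified.
#[no_implicit_prelude]
mod strict {
    pub fn build() -> ::std::vec::Vec<u8> {
        make_vec!(1, 2, 3)
    }
}
```

If the macro body used a bare `Vec::new()`, the `strict` module would fail to compile, which is exactly the signal the test is meant to give.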

Conclusion

This post should contain everything you need to know to write good macros. So please, if you own a macro crate, make sure to make it hygienic.
