Skip to content

Instantly share code, notes, and snippets.

@wonbyte
Last active November 22, 2023 22:51
Show Gist options
  • Save wonbyte/26776262725ee02a32c4f62ed9b3c475 to your computer and use it in GitHub Desktop.
Save wonbyte/26776262725ee02a32c4f62ed9b3c475 to your computer and use it in GitHub Desktop.

πŸ¦€ TL;DR Rust πŸ¦€

Table of content:

✨Data Layout✨

All types have an alignment specified in bytes. The alignment of a type specifies what addresses are valid to store the value at. A value with alignment n must only be stored at an address that is a multiple of n. So alignment 2 means you must be stored at an even address, and 1 means that you can be stored anywhere. Alignment is at least 1, and always a power of 2.

A type's size must always be a multiple of its alignment (Zero being a valid size for any alignment). This ensures that an array of that type may always be indexed by offsetting by a multiple of its size.

The reason why the alignment requirement exists is to ensure efficient access to memory. Modern processors access memory in "chunks" of a certain size (usually the word size of the processor, like 4 bytes for a 32-bit processor or 8 bytes for a 64-bit processor). If a type's size is a multiple of its alignment, then elements of that type can be read from or written to memory in one operation, without any need to fetch extra data or perform any complicated manipulation.

Rust gives you the following ways to lay out composite data:

  • structs (named product types)
  • tuples (anonymous product types)
  • arrays (homogeneous product types)
  • enums (named sum types: "a type that can have different values which may be different types".)
  • unions (untagged unions)

An enum is said to be field-less if none of its variants have associated data.

By default, composite structures have an alignment equal to the maximum of their fields' alignments. Rust will consequently insert padding where necessary to ensure that all fields are properly aligned and that the overall type's size is a multiple of its alignment.

struct A {
    a: u8,
    b: u32,
    c: u16,
}

will be 32-bit aligned on a target that aligns these primitives to their respective sizes. The whole struct will therefore have a size that is a multiple of 32-bits.

There is no indirection for these types; all data is stored within the struct, as you would expect in C. However with the exception of arrays (which are densely packed and in-order), the layout of data is not specified by default.

Given:

struct A {
    a: i32,
    b: u64,
}

struct B {
    a: i32,
    b: u64,
}

Rust does guarantee that two instances of A have their data laid out in exactly the same way. However Rust does not currently guarantee that an instance of A has the same field ordering or padding as an instance of B.

✨Ownership 101✨

There are 3 primary forms that self can take: self, &mut self, and &self. These 3 forms represent the three primary forms of ownership in Rust:

  • self - Value
  • &mut self - mutable reference
  • &self - shared reference

A value represents true ownership. You can do whatever you want with a value: move it, destroy it, mutate it, or loan it out via a reference. When you pass something by value, it's moved to the new location. The new location now owns the value, and the old location can no longer access it.

A mutable reference represents temporary exclusive access to a value that you don't own. You're allowed to do absolutely anything you want to a value you have a mutable reference to as long you leave it in a valid state when you're done (it would be rude to the owner otherwise!). This means you can actually completely overwrite the value.

A shared reference represents temporary shared access to a value that you don't own. Because you have shared access, you're generally not allowed to mutate anything. Think of & as putting the value out on display in a museum.

References

There are two kinds of reference:

  • Shared reference: &
  • Mutable reference: &mut

Which obey the following rules:

  • A reference cannot outlive its referent
  • A mutable reference cannot be aliased

Aliases

 * Rust conceptually handles reborrows by maintaining a "borrow stack"
 * Only the one on the top of the stack is "live" (has exclusive access)
 * When you access a lower one it becomes "live" and the ones above it get popped
 * You're not allowed to use pointers that have been popped from the borrow stack
 * The borrowchecker ensures safe code code obeys this
 * Miri theoretically checks that raw pointers obey this at runtime

That's the (very simplified) crux of pointer aliasing: when can the compiler assume it's safe to "remember" (cache) values instead of loading them over and over? To know that, the compiler needs to know whenever there could be little angry men mutating the memory behind your back.

When we reborrow a mutable pointer, the original pointer can't be used anymore until the borrower is done with it (no more uses).

Once a shared reference is on the borrow-stack, everything that gets pushed on top of it only has read permissions.

This is how we can have reborrows and still have aliasing information: all of our reborrows clearly nest, so we can consider only one of them "live" at any given time.

Whatever's at the top of the borrow stack is "live" and knows it's effectively unaliased. When you reborrow a pointer, the new pointer is pushed onto the stack, becoming the live pointer. When you use an older pointer it's brought back to life by popping everything on the borrow stack above it. At this point the pointer "knows" it was reborrowed and that the memory might have been modified, but that it once more has exclusive access

NOTE: Randomly mixing safe pointers like &, &mut, and Box with unsafe pointers like *mut and *const is a recipe for Undefined Behaviour because the safe pointers introduce extra constraints that we aren't obeying with the raw pointers.

Lifetimes

Lifetimes are named regions of code that a reference must be valid for. A lifetime position is anywhere you can write a lifetime in a type:

&'a T
&'a mut T
T<'a>

Lifetime positions can appear as either "input" or "output":

  • For fn definitions, fn types, and the traits Fn, FnMut, and FnOnce, input refers to the types of the formal arguments, while output refers to result types. So fn foo(s: &str) -> (&str, &str) has elided one lifetime in input position and two lifetimes in output position. Note that the input positions of a fn method definition do not include the lifetimes that occur in the method's impl header (nor lifetimes that occur in the trait header, for a default method).
  • For impl headers, all types are input. So impl Trait<&T> for Struct<&T> has elided two lifetimes in input position, while impl Struct<&T> has elided one.
pub struct Foo<'a> {
    data: &'a str,
}

impl<'a> Foo<'a> {
    pub fn new(data: &'a str) -> Foo<'a> {
        Foo { data }
    }

    pub fn get(&self) -> &str {
        self.data
    }
}

In the impl block for Foo<'a>, the 'a in impl<'a> Foo<'a> is a lifetime declaration for the Foo struct. This lifetime is not included in the input positions of the methods new and get.

The new function has an elided lifetime in the input position for the data argument (&'a str), and in the output position for the Foo<'a> return type. The get method has an elided lifetime in the output position for its return type (&str).

Even though 'a is used in the new function, it is considered as part of the function's signature, not part of the impl header. The lifetime declared in the impl block is essentially indicating that the methods within that block are associated with the lifetime of data in the Foo struct.

Elision Rules

  • Each elided lifetime in input position becomes a distinct lifetime parameter.
  • If there is exactly one input lifetime position (elided or not), that lifetime is assigned to all elided output lifetimes.
  • If there are multiple input lifetime positions, but one of them is &self or &mut self, the lifetime of self is assigned to all elided output lifetimes.
  • Otherwise, it is an error to elide an output lifetime.

Subtyping

Because Cats are just Animals and more, we say Cat is a subtype of Animal (because Cats are a subset of all the Animals). Equivalently, we say that Animal is a supertype of Cat. With subtypes, we can tweak our overly strict static type system with a simple rule: anywhere a value of type T is expected, we will also accept values that are subtypes of T.

Lifetimes are just regions of code, and regions can be partially ordered with the contains (outlives) relationship. Subtyping on lifetimes is in terms of that relationship: if 'big: 'small ("big contains small" or "big outlives small"), then 'big is a subtype of 'small.

It makes sense if you consider our Animal example: Cat is an Animal and more, just as 'big is 'small and more.

Put another way, if someone wants a reference that lives for 'small, usually what they actually mean is that they want a reference that lives for at least 'small. They don't actually care if the lifetimes match exactly. So it should be ok for us to forget that something lives for 'big and only remember that it lives for 'small.

note that 'static, the forever lifetime, is a subtype of every lifetime because by definition it outlives everything.

Variance

Variance is a set of rules governing how subtyping should compose. Most importantly, variance defines situations where subtyping should be disabled.

** Note: For convenience we will often refer to F<T> as a type constructor just so that we can easily talk about T.**

A type constructor F's variance is how the subtyping of its inputs affects the subtyping of its outputs. There are three kinds of variance in Rust. Given two types Sub and Super, where Sub is a subtype of Super:

  • F is covariant if F<Sub> is a subtype of F<Super> (subtyping "passes through")
  • F is contravariant if F<Super> is a subtype of F<Sub> (subtyping is "inverted")
  • F is invariant otherwise (no subtyping relationship exists)

If F has multiple type parameters, we can talk about the individual variances by saying that, for example, F<T, U> is covariant over T and invariant over U.

It is very useful to keep in mind that covariance is, in practical terms, "the" variance. Almost all consideration of variance is in terms of whether something should be covariant or invariant. Actually witnessing contravariance is quite difficult in Rust, though it does in fact exist.

'a T U
&'a T covariant covariant
&'a mut T covariant invariant
Box<T> covariant
UnsafeCell<T> invariant
fn(T) -> U contravariant covariant

All the others can be understood by analogy to the others:

  • Vec<T> and all other owning pointers and collections follow the same logic as Box<T>
  • Cell<T> and all other interior mutability types follow the same logic as UnsafeCell<T>
  • *const T follows the logic of &T
  • *mut T follows the logic of &mut T (or UnsafeCell<T>)

We can easily see why &T being covariant over T is sound: it doesn't let you modify the value, only look at it. Without any way to mutate, there's no way for us to mess with any details. We can also see why UnsafeCell and all the other interior mutability types must be invariant: they make &T work like &mut T.

So even though references are covariant over their lifetimes, they "inherit" invariance whenever they're put into a context that could do something bad with that.

Owned values can be covariant because they make you change everything. There is no connection between old locations and new locations. Applying by-value subtyping is an irreversible act of knowledge destruction, and without any memory of how things used to be, no one can be tricked into acting on that old information!

Function types, unlike anything else in the language, are contravariant over their arguments. If we need a function that can handle anything that lives for at least 'long, it's perfectly fine for it to be able to handle anything that lives for at least 'short.

A struct, informally speaking, inherits the variance of its fields. If a struct MyType has a generic argument A that is used in a field a, then MyType's variance over A is exactly a's variance over A.

However if A is used in multiple fields:

  • If all uses of A are covariant, then MyType is covariant over A
  • If all uses of A are contravariant, then MyType is contravariant over A
  • Otherwise, MyType is invariant over A

&'a mut T is covariant over 'a but invariant over T.

Drop Check

Variables are dropped in the reverse order of their definition, fields of structs and tuples in order of their definition.

Implementing Drop lets a Struct execute some arbitrary code during its death. This means it can potentially observe that types that are supposed to live as long as it does actually were destroyed first.

Sound generic drop: For a generic type to soundly implement drop, its generics arguments must strictly outlive it. As far as the borrow checker knows while it is analyzing a function, the body of a Structs's destructor might access that borrowed data. Therefore, the drop checker forces all borrowed data in a value to strictly outlive that value.

While the drop order of fields inside a struct is defined, relying on it is fragile and subtle. When the order matters, it is better to use the ManuallyDrop wrapper.

Phantom Data

PhantomData is a way for us to give the compiler an extra "example" field that conceptually exists in your type but for various reasons (indirection, type erasure, ...) doesn't.

Unbounded lifetimes and types are forbidden in struct definitions. Therefore we must somehow refer to these types in the body. Correctly doing this is necessary to have correct variance and drop checking.

struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
}

Because 'a is unused within the struct's body, it's unbounded.

We do this using PhantomData, which is a special marker type. PhantomData consumes no space, but simulates a field of the given type for the purpose of static analysis.

use std::marker;

struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    _marker: marker::PhantomData<&'a T>,
}

and that's it. The lifetime will be bounded, and the iterator will be covariant over 'a and T.

Phantom type 'a T
PhantomData - covariant (with drop check)
PhantomData<&'a T> covariant covariant
PhantomData<&'a mut T> covariant invariant
PhantomData<*const T> - covariant
PhantomData<*mut T> - invariant
PhantomData<fn(T)> - contravariant
PhantomData<fn() -> T> - covariant
PhantomData<fn(T) -> T> - invariant
PhantomData<Cell<&'a ()>> invariant -

Any time you do use NonNull (or just raw pointers in general), you should always add PhantomData to be safe and make it clear to the compiler and others what you think you're doing. In this case using NonNull because we're claiming our type behaves "as if" it stored a value T, so we add a PhantomData to make that explicit.

Rust understands that you can safely split a mutable reference into subfields. We can then encode permanently consuming a reference via Options (or in the case of slices, replacing with an empty slice).

✨Type Conversions✨

At the end of the day, everything is just a pile of bits somewhere, and type systems are just there to help us use those bits right. There are two common problems with typing bits: needing to reinterpret those exact bits as a different type, and needing to change the bits to have equivalent meaning for a different type.

Dot operator

Suppose we have a function foo that has a receiver (a self, &self or &mut self parameter). If we call value.foo(), the compiler needs to determine what type Self is before it can call the correct implementation of the function. For this example, we will say that value has type T.

  • First, the compiler checks if it can call T::foo(value) directly. This is called a "by value" method call.
  • If it can't call this function (for example, if the function has the wrong type or a trait isn't implemented for Self), then the compiler tries to add in an automatic reference. This means that the compiler tries <&T>::foo(value) and <&mut T>::foo(value). This is called an "autoref" method call.
  • If none of these candidates worked, it dereferences T and tries again. This uses the Deref trait - if T: Deref<Target = U> then it tries again with type U instead of T. If it can't dereference T, it can also try unsizing T. This just means that if T has a size parameter known at compile time, it "forgets" it for the purpose of resolving methods. For instance, this unsizing step can convert [i32; 2] into [i32] by "forgetting" the size of the array.

Casting

Casts are a superset of coercions: every coercion can be explicitly invoked via a cast. However some conversions require a cast. While coercions are pervasive and largely harmless, these "true casts" are rare and potentially dangerous. As such, casts must be explicitly invoked using the as keyword: expr as Type.

  • Note that lengths are not adjusted when casting raw slices; *const [u16] as *const [u8] creates a slice that only includes half of the original memory.
  • Casting is not transitive, that is, even if e as U1 as U2 is a valid expression, e as U2 is not necessarily so.

✨Uninitialized Memory✨

All runtime-allocated memory in a Rust program begins its life as uninitialized. In this state the value of the memory is an indeterminate pile of bits that may or may not even reflect a valid state for the type that is supposed to inhabit that location of memory. Attempting to interpret this memory as a value of any type will cause Undefined Behavior.

Checked

Like C, all stack variables in Rust are uninitialized until a value is explicitly assigned to them. Unlike C, Rust statically prevents you from ever reading them until you do.

This is based off of a basic branch analysis: every branch must assign a value to x before it is first used. For short, we also say that "x is init(initialized)" or "x is uninit(declared but not yet initialized)".

Rust doesn't require the variable to be mutable to perform a delayed initialization if every branch assigns exactly once. However the analysis does not take advantage of constant analysis or anything like that.

Drop Flags

Rust actually tracks whether a type should be dropped or not at runtime. As a variable becomes initialized and uninitialized, a drop flag for that variable is toggled. When a variable might need to be dropped, this flag is evaluated to determine if it should be dropped.

The drop flags are tracked on the stack.

✨Random Bits✨

Null Pointer Optimization

enum Foo {
    A,
    B(ContainsANonNullPtr),
}

in this scenario null pointer optimization kicks in, which eliminates the space needed for the tag. If the variant is A, the whole enum is set to all 0's. Otherwise, the variant is B. This works because B can never be all 0's, since it contains a non-zero pointer.

Typically, when an Option is used, Rust must reserve extra space in memory to store this additional "tag" information. For example, Option might use an extra byte (or more) beyond the 4 bytes required for a u32, to store whether it's Some or None.

However, with the null pointer optimization, Rust can cleverly use a certain "impossible" or "invalid" value of T to represent None, thereby eliminating the need for additional storage. In the case of pointer types (like &, &mut, Box, Rc, Arc), the "null" pointer value (a pointer that points to no valid memory address, typically represented as all 0's) is invalid and thus can be used to represent None. Similarly, Vec also utilizes this optimization where an empty Vec (a vector with no allocated memory) represents None.

This means that Option<&T>, Option<Box>, Option<Rc>, Option<Arc>, and Option<Vec> all use the same amount of memory as their non-Option counterparts. Hence, there is "no overhead" when these types are wrapped in an Option. This is a powerful optimization, as it allows Rust programs to use Option types without any additional memory cost in many situations.

A null pointer can safely be interpreted as the unit (None) variant. The net result is that, for example,

size_of::<Option<&T>>() == size_of::<&T>().

Dynamically Sized Types (DSTs)

DSTs are types without a statically known size or alignment. These types can only exist behind a pointer. Any pointer to a DST consequently becomes a wide pointer consisting of the pointer and the information that "completes" them.

There are two major DSTs exposed by the language:

  • trait objects: dyn MyTrait
  • slices: [T], str, and others

A trait object represents some type that implements the traits it specifies. The exact original type is erased in favor of runtime reflection with a vtable containing all the information necessary to use the type. The information that completes a trait object pointer is the vtable pointer. The runtime size of the pointee can be dynamically requested from the vtable.

A slice is simply a view into some contiguous storage -- typically an array or Vec. The information that completes a slice pointer is just the number of elements it points to. The runtime size of the pointee is just the statically known size of an element multiplied by the number of elements.

Note: Currently the only properly supported way to create a custom DST is by making your type generic and performing an unsizing coercion

struct MySuperSliceable<T: ?Sized> {
    info: u32,
    data: T,
}

fn main() {
    let sized: MySuperSliceable<[u8; 8]> = MySuperSliceable {
        info: 17,
        data: [0; 8],
    };

    let dynamic: &MySuperSliceable<[u8]> = &sized;

    // prints: "17 [0, 0, 0, 0, 0, 0, 0, 0]"
    println!("{} {:?}", dynamic.info, &dynamic.data);
}

Zero Sized Types (ZSTs)

 struct Nothing; // No fields = no size

// All fields have no size = no size
struct LotsOfNothing {
    foo: Nothing,
    qux: (),      // empty tuple has no size
    baz: [u8; 0], // empty array has no size
}

Rust largely understands that any operation that produces or stores a ZST can be reduced to a no-op. First off, storing it doesn't even make sense -- it doesn't occupy any space. Also there's only one value of that type, so anything that loads it can just produce it from the aether -- which is also a no-op since it doesn't occupy any space.

As an example when you use a Map<Key, ()> as a Set<Key>, you're effectively getting a set data structure that's as efficient as a custom implementation, but without having to write the custom implementation yourself. The Rust type system, and the way it handles the () type, allows this optimization to be done automatically.

Safe Rust never needs to care about this, but Vec is very intensive on raw pointers and raw allocations, which are exactly the two things that care about zero-sized types. We need to be careful of two things:

The raw allocator API has undefined behavior if you pass in 0 for an allocation size. raw pointer offsets are no-ops for zero-sized types, which will break our C-style pointer iterator.

Almost every operation with a ZST is a no-op since ZSTs have exactly one value, and therefore no state needs to be considered to store or load them. This actually extends to ptr::read and ptr::write: they won't actually look at the pointer at all. As such we never need to change the pointer.

Exception Safety

By default, panics are unwinding. Unwinding is just a fancy way to say "make every single function immediately return".

We have to care for two reasons: destructors run when a function returns, and the unwind can be caught. In both cases, code can keep running after a panic, so we need to be very careful and make sure our unsafe collections are always in some kind of coherent state whenever a panic could occur, because each panic is an implicit early return!

"invariants" are a really useful concept for panic-safety! Basically, to an outside observer of our collection, there are certain property we're always upholding. For a LinkedList, one of those is that any node that is reachable in our list is still allocated and initialized.

Inside the implementation we have a bit more flexibility to break invariants temporarily as long as we make sure to repair them before anyone notices. This is actually one of the "killer apps" of Rust's ownership and borrowing system for collections: if an operation requires an &mut Self, then we are guaranteed that we have exclusive access to our collection and that it's fine for us to temporarily break invariants, safe in the knowledge that no one can sneakily mess with it.

Try to make sure anything that can panic is at the start or end of our methods, where our invariants should be known to hold.

Hash

If collections don't hash in lengths, they can accidentally make themselves vulnerable to prefix collisions. For instance, what distinguishes ["he", "llo"] from ["hello"]? If no one is hashing lengths or some other "separator", nothing! Making it too easy for hash collisions to accidentally or maliciously happen can result in serious sadness, so just do it!

Send + Sync

Like Copy, these traits have absolutely no code associated with them, and are just markers that your type has a particular property.

  • Send says that your type is safe to send to another thread.
  • Sync says your type is safe to share between threads(T is Sync if and only if &T is Send).

Send and Sync are also automatically derived traits. This means that, unlike every other trait, if a type is composed entirely of Send or Sync types, then it is Send or Sync. Almost all primitives are Send and Sync, and as a consequence pretty much all types you'll ever interact with are Send and Sync.

Major exceptions include:

  • raw pointers are neither Send nor Sync (because they have no safety guards).
  • UnsafeCell isn't Sync (and therefore Cell and RefCell aren't).
  • Rc isn't Send or Sync (because the refcount is shared and unsynchronized).

Note that in and of itself it is impossible to incorrectly derive Send and Sync. Only types that are ascribed special meaning by other unsafe code can possibly cause trouble by being incorrectly Send or Sync.

*const AND *mut explicitly opt out of Send and Sync to be safe.

Cursors

Cursors are exactly like the little blinking | you get when you're editing some text on a computer. It's a position in a sequence (the text) that you can move around (with the arrow keys), and whenever you type the edits happen at that point.

A Cursor is like an iterator, except that it can freely seek back-and-forth, and can safely mutate the list during iteration. This is because the lifetime of its yielded references are tied to its own lifetime, instead of just the underlying list. This means cursors cannot yield multiple elements at once.

Cursors always rest between two elements in the list, and index in a logically circular way. To accomadate this, there is a "ghost" non-element that yields None between the head and tail of the List.

When created, cursors start between the ghost and the front of the list. That is, next will yield the front of the list, and prev will yield None. Calling prev again will yield the tail.

Unsafe

The unsafe keyword has two uses: to declare the existence of contracts the compiler can't check, and to declare that a programmer has checked that these contracts have been upheld.

  • You can use unsafe to indicate the existence of unchecked contracts on functions and trait declarations. On functions, unsafe means that users of the function must check that function's documentation to ensure they are using it in a way that maintains the contracts the function requires. On trait declarations, unsafe means that implementors of the trait must check the trait documentation to ensure their implementation maintains the contracts the trait requires.
  • You can use unsafe on a block to declare that all unsafe actions performed within are verified to uphold the contracts of those operations.
  • You can use unsafe on a trait implementation to declare that the implementation upholds the trait's contract.

Copy + Clone

Rust does provide two facilities for providing C++'s copy-oriented semantics: Copy and Clone.

  • Clone is our moral equivalent of a copy constructor, but it's never implicitly invoked. You have to explicitly call clone on an element you want to be cloned.
  • Copy is a special case of Clone where the implementation is just "copy the bits". Copy types are implicitly cloned whenever they're moved, but because of the definition of Copy this just means not treating the old copy as uninitialized -- a no-op.

Data Races

Safe Rust guarantees an absence of data races, which are defined as:

  • two or more threads concurrently accessing a location of memory
  • one or more of them is a write
  • one or more of them is unsynchronized

Atomics

Rust pretty blatantly just inherits the memory model for atomics from C++20. The C++ memory model is fundamentally about trying to bridge the gap between the semantics we want, the optimizations compilers want, and the inconsistent chaos our hardware wants.

LLVM's GEP inbounds instruction

It's super important to an optimizing compiler to be able to reason about data dependencies and aliasing. As a simple example, consider the following fragment of code:

*x *= 7;
*y *= 3;

If the compiler can prove that x and y point to different locations in memory, the two operations can in theory be executed in parallel (by e.g. loading them into different registers and working on them independently). However the compiler can't do this in general because if x and y point to the same location in memory, the operations need to be done to the same value, and they can't just be merged afterwards.

When you use GEP inbounds, you are specifically telling LLVM that the offsets you're about to do are within the bounds of a single "allocated" entity. The ultimate payoff being that LLVM can assume that if two pointers are known to point to two disjoint objects, all the offsets of those pointers are also known to not alias (because you won't just end up in some random place in memory). LLVM is heavily optimized to work with GEP offsets, and inbounds offsets are the best of all, so it's important that we use them as much as possible.

LLVM's notion of an allocation is significantly more abstract than how we usually use it. Because LLVM needs to work with different languages' semantics and custom allocators, it can't really intimately understand allocation. Instead, the main idea behind allocation is "doesn't overlap with other stuff". That is, heap allocations, stack allocations, and globals don't randomly overlap. Yep, it's about alias analysis. As such, Rust can technically play a bit fast and loose with the notion of an allocation as long as it's consistent.

In LLVM, the GetElementPointer (GEP) instruction is used to compute the address of a sub-element of an aggregate data structure. The GEP instruction can take an inbounds keyword, which asserts that the resulting pointer will be used to access the same aggregate object that the original pointer was pointing to.

The assertion that the GEP is "inbounds" gives the LLVM optimizer more information about the intent of the programmer. If a GEP is marked as inbounds, and at runtime it computes an out-of-bounds address, then it has undefined behavior. Because undefined behavior means anything can happen (from the compiler's perspective), the optimizer assumes it will never occur. This provides LLVM with more freedom to perform aggressive optimizations.

So when the statement says "inbounds offsets are the best of all," it means that these offsets, which assert they don't stray outside of the original aggregate object, give the optimizer the most leeway to perform its optimizations. In other words, it's a way of telling the optimizer: "I promise this offset will stay within bounds; therefore, you can optimize under that assumption." This is different than just saying "valid offsets" because "inbounds" comes with an explicit promise to the compiler, which in turn gives it more aggressive optimization potential.

Async

What if we could write an abstraction like thread::spawn that let us write our tasks as individual units, and handle the scheduling and event handling for all tasks in a single place?

This idea is generally referred to as asynchronous programming.

Handling a request is a task. Reading or writing from/to the connection is also a task. A task is really just a piece of code that needs to be run, representing a value that will resolve sometime in the future.

trait Future {
    type Output;

    fn poll(&mut self) -> Option<Self::Output>;
}

Alright, imagine a scheduler that runs these new futures. After initiating the future, the scheduler should only try to call poll when the given future is able to make progress, like when epoll returns an event.

What we need is for the scheduler to pass each future an ID, so that the future can register any I/O resources with epoll using the same ID, instead of their file descriptors. That way the scheduler has a way of mapping epoll events to runnable futures.

What if we gave the futures themselves more control? Instead of just an ID, what if we give every future a way to wake itself up, notifying the scheduler that it's ready to make progress?

A simple callback should do the trick.

#[derive(Clone)]
struct Waker(Arc<dyn Fn() + Send + Sync>);

impl Waker {
    fn wake(&self) {
        (self.0)()
    }
}

trait Future {
    type Output;

    fn poll(&mut self, waker: Waker) -> Option<Self::Output>;
}

The scheduler can provide each future a callback, that when called, updates the scheduler's state for that future, marking it as ready. That way our scheduler is completely disconnected from epoll, or any other individual notification system.

Waker is thread-safe, allowing us to use background threads to wake futures.

Reactor

All I/O futures need a way to give their wakers to epoll. In fact, they need more than that, they need some sort of background service that drives epoll, so we can register wakers with it.

This service is commonly known as a reactor.

The reactor is a simple object holding the epoll descriptor and a map of tasks keyed by file descriptor, it holds the wakers.

thread_local! {
    static REACTOR: Reactor = Reactor::new();
}

struct Reactor {
    epoll: RawFd,
    tasks: RefCell<HashMap<RawFd, Waker>>,
}

impl Reactor {
    pub fn new() -> Reactor {
        Reactor {
            epoll: epoll::create(false).unwrap(),
            tasks: RefCell::new(HashMap::new()),
        }
    }
}

The reactor is a thread-local object, mutated through a RefCell. This is important because the reactor will be modified and accessed by different tasks throughout the program.

The reactor needs to support a couple simple operations.

Adding a task:

impl Reactor {
    // Add a file descriptor with read and write interest.
    //
    // `waker` will be called when the descriptor becomes ready.
    pub fn add(&self, fd: RawFd, waker: Waker) {
        let event = epoll::Event::new(Events::EPOLLIN | Events::EPOLLOUT, fd as u64);
        epoll::ctl(self.epoll, EPOLL_CTL_ADD, fd, event).unwrap();
        self.tasks.borrow_mut().insert(fd, waker);
    }
}

Removing a task:

impl Reactor {
    // Remove the given descriptor from epoll.
    //
    // It will no longer receive any notifications.
    pub fn remove(&self, fd: RawFd) {
        self.tasks.borrow_mut().remove(&fd);
    }
}

And driving epoll.

We'll be running the reactor in a loop, just like we were running epoll in a loop before. All the reactor has to do is wake the associated future for every event. Remember, this will trigger the scheduler to run the future later, and continue the cycle.

impl Reactor {
    // Drive tasks forward, blocking forever until an event arrives.
    pub fn wait(&self) {
       let mut events = [Event::new(Events::empty(), 0); 1024];
       let timeout = -1; // forever
       let num_events = epoll::wait(self.epoll, timeout, &mut events).unwrap();

       for event in &events[..num_events] {
           let fd = event.data as i32;

           // wake the task
           if let Some(waker) = self.tasks.borrow().get(&fd) {
               waker.wake();
           }
       }
    }
}

Scheduling Tasks

Now we need a scheduler to run our tasks. One thing to keep in mind is that the scheduler must be global and thread-safe because wakers are Send, meaning wake may be called concurrently from other threads.

We want to be able to spawn tasks onto our scheduler, just like we could spawn threads.

To start, we'll only store runnable tasks in a queue:

use std::collections::VecDeque;

type SharedTask = Arc<Mutex<dyn Future<Output = ()> + Send>>;

#[derive(Default)]
struct Scheduler {
    runnable: Mutex<VecDeque<SharedTask>>,
}

When a task is spawned, it's pushed onto the back of the queue:

impl Scheduler {
    pub fn spawn(&self, task: impl Future<Output = ()> + Send + 'static) {
        self.runnable.lock().unwrap().push_back(Arc::new(Mutex::new(task)));
    }
}

The scheduler pops tasks off the queue one by one and calls poll:

impl Scheduler {
  pub fn run(&self) {
    loop {
        loop {
            // pop a runnable task off the queue
            let Some(task) = self.runnable.lock().unwrap().pop_front() else { break };
            let t2 = task.clone();

            // create a waker that pushes the task back on
            let wake = Arc::new(move || {
                SCHEDULER.runnable.lock().unwrap().push_back(t2.clone());
            });

            // poll the task
            task.lock().unwrap().poll(Waker(wake));
        }

        // if there are no runnable tasks, block on epoll until something becomes ready
        REACTOR.with(|reactor| reactor.wait());
    }
}

when a task is woken, it's simply pushed back onto the queue. Once we've dealt with all runnable tasks, we need to block on the reactor until another task becomes ready. Once a task becomes ready, the reactor will call wake and push the future back onto our queue for us to run it again, continuing the cycle.

Together, the scheduler and reactor form a runtime for our futures. The scheduler keeps tracks of which tasks are runnable and polls them, and the reactor marks tasks as runnable when epoll tells us something they are interested in becomes ready.

trait Future {
    type Output;
    fn poll(&mut self, waker: Waker) -> Option<Self::Output>;
}

static SCHEDULER: Scheduler = Scheduler { /* ... */ };

// The scheduler.
#[derive(Default)]
struct Scheduler {
    runnable: Mutex<VecDeque<SharedTask>>,
}

type SharedTask = Arc<Mutex<dyn Future<Output = ()> + Send>>;

impl Scheduler {
    pub fn spawn(&self, task: impl Future<Output = ()> + Send + 'static);
    pub fn run(&self);
}

thread_local! {
    static REACTOR: Reactor = Reactor::new();
}

// The reactor.
struct Reactor {
    epoll: RawFd,
    tasks: RefCell<HashMap<RawFd, Waker>>,
}

impl Reactor {
    pub fn new() -> Reactor;
    pub fn add(&self, fd: RawFd, waker: Waker);
    pub fn remove(&self, fd: RawFd);
    pub fn wait(&self);
}

Async Server

It's time to actually write the tasks that our scheduler is going to run. We'll use enums as state machines to manage the different states of our program. Each task will manage it's own state independent from other tasks.

To start everything off, we need to write the main task. This task will be pushed on and off the scheduler's run queue for the entirety of our program.

fn main() {
    SCHEDULER.spawn(Main::Start);
    SCHEDULER.run();
}

enum Main {
    Start,
}

impl Future for Main {
    type Output = ();

    fn poll(&mut self, waker: Waker) -> Option<()> {
        // ...
    }
}

Our task starts off by creating the TCP listener and setting it to non-blocking mode.

// impl Future for Main {
fn poll(&mut self, waker: Waker) -> Option<()> {
    if let Main::Start = self {
        let listener = TcpListener::bind("localhost:3000").unwrap();
        listener.set_nonblocking(true).unwrap();
    }

    None
}

Now we need to register the listener with epoll. We can do that using our new Reactor:

// impl Future for Main {
fn poll(&mut self, waker: Waker) -> Option<()> {
    if let Main::Start = self {
        let listener = TcpListener::bind("localhost:3000").unwrap();
        listener.set_nonblocking(true).unwrap();

        REACTOR.with(|reactor| {
            reactor.add(listener.as_raw_fd(), waker);
        });
    }

    None
}

Notice how we give the reactor the waker provided to us by the scheduler. When a connection comes, epoll will return an event and the Reactor will wake the task, causing the scheduler to push our task back onto the queue and poll us again. The waker keeps everything connected.

We now need a second state for the next time we're run, Accept. The main task will stay in the Accept state for the rest of the program, attempting to accept new connections:

enum Main {
    Start,
   Accept { listener: TcpListener }, // πŸ‘ˆ
}

fn poll(&mut self, waker: Waker) -> Option<()> {
    if let Main::Start = self {
        // ...
        *self = Main::Accept { listener };
    }

    if let Main::Accept { listener } = self {
        match listener.accept() {
            Ok((connection, _)) => {
                // ...
            }
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
                return None;
            }
            Err(e) => panic!("{e}"),
        }
    }

    None
}

If the listener is not ready, we can simply return None. Remember, this tells the scheduler the future is not yet ready, and it will be rescheduled once the reactor wakes us.

If we do accept a new connection, we need to again set it to non-blocking mode:

fn poll(&mut self, waker: Waker) -> Option<()> {
    if let Main::Start = self {
        // ...
    }

    if let Main::Accept { listener } = self {
        match listener.accept() {
            Ok((connection, _)) => {
                connection.set_nonblocking(true).unwrap(); // πŸ‘ˆ 
            }
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => return None,
            Err(e) => panic!("{e}"),
        }
    }

    None
}

And now we need to spawn a new task to handle the request:

fn poll(&mut self, waker: Waker) -> Option<()> {
    if let Main::Start = self {
        // ...
    }

    if let Main::Accept { listener } = self {
        match listener.accept() {
            Ok((connection, _)) => {
                connection.set_nonblocking(true).unwrap();

                SCHEDULER.spawn(Handler { // πŸ‘ˆ
                    connection,
                    state: HandlerState::Start,
                });
            }
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => return None,
            Err(e) => panic!("{e}"),
        }
    }
}

The handler task looks similar to before, but now it manages the connection itself along with its current state:

struct Handler {
    connection: TcpStream,
    state: HandlerState,
}

enum HandlerState {
    Start,
    Read {
        request: [u8; 1024],
        read: usize,
    },
    Write {
        response: &'static [u8],
        written: usize,
    },
    Flush,
}

The handler task starts by registering its connection with the reactor to be notified when the connection is ready to read/write to. Again, it passes the waker so that the scheduler knows when to run it again:

impl Future for Handler {
    type Output = ();

    fn poll(&mut self, waker: Waker) -> Option<Self::Output> {
        if let HandlerState::Start = self.state {
            // start by registering our connection for notifications
            REACTOR.with(|reactor| {
                reactor.add(self.connection.as_raw_fd(), waker);
            });

            self.state = HandlerState::Read {
                request: [0u8; 1024],
                read: 0,
            };
        }

        // ...
    }
}

The Read, Write, and Flush states work exactly the same as before, but now when we encounter WouldBlock, we can simply return None, knowing that we'll be run again once our future is woken:

// impl Future for Handler {
fn poll(&mut self, waker: Waker) -> Option<Self::Output> {
    if let HandlerState::Start = self.state {
        // ...
    }

    // read the request
    if let HandlerState::Read { request, read } = &mut self.state {
        loop {
            match self.connection.read(&mut request[*read..]) {
                Ok(0) => {
                    println!("client disconnected unexpectedly");
                    return Some(());
                }
                Ok(n) => *read += n,
                Err(e) if e.kind() == io::ErrorKind::WouldBlock => return None, // πŸ‘ˆ
                Err(e) => panic!("{e}"),
            }

            // did we reach the end of the request?
            let read = *read;
            if read >= 4 && &request[read - 4..read] == b"\r\n\r\n" {
                break;
            }
        }

        // we're done, print the request
        let request = String::from_utf8_lossy(&request[..*read]);
        println!("{}", request);

        // and move into the write state
        let response = /* ... */;

        self.state = HandlerState::Write {
            response: response.as_bytes(),
            written: 0,
        };
    }

    // write the response
    if let HandlerState::Write { response, written } = &mut self.state {
        // self.connection.write...

        // successfully wrote the response, try flushing next
        self.state = HandlerState::Flush;
    }

    // flush the response
    if let HandlerState::Flush = self.state {
        match self.connection.flush() {
            Ok(_) => {}
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => return None, // πŸ‘ˆ
            Err(e) => panic!("{e}"),
        }
    }
}

At the end of the task's lifecycle, it removes its connection from the reactor and returns Some. It will never be run again after that point:

fn poll(&mut self, waker: Waker) -> Option<Self::Output> {
    // ...

    REACTOR.with(|reactor| {
        reactor.remove(self.connection.as_raw_fd());
    });

    Some(())
}

Each future has a number of states. At each state, some code is run. If that code completes successfully, we transition into the next state. If it encounters WouldBlock, we return None, indicating that the future is not yet ready.

Real World

Now that we've thoroughly explored concurrency and async ourselves, let's see how it works in the real world.

The standard library defines a trait like Future, which looks remarkably similar to the trait we designed:

trait Future {
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

However, there are a few noticeable differences.

The first is that instead of a Waker argument, poll takes a &mut Context. It turns out this isn't much of a difference at all, because Context is simply a wrapper around a Waker:

impl Context<'_> {
    pub fn from_waker(waker: &'a Waker) -> Context<'a>  { /* ... */ }
    pub fn waker(&self) -> &'a Waker  { /* ... */ }
}

And Waker, along with a few other utility methods, has the familiar wake method:

impl Waker {
    pub fn wake(self) { /* ... */ }
}

Constructing a Waker is a little more complicated, but it's essentially a manual trait object just like the Arc<dyn Fn()> we used for our version. All of that happens through the RawWaker type.

The second difference is that instead of returning an Option, poll returns a new type called Poll, which is really just a rebranding of Option:

pub enum Poll<T> {
    Ready(T),
    Pending,
}

The final difference is a little more complicated.

Pin

Instead of poll taking a mutable reference to self, it takes a pinned mutable reference to self β€” Pin<&mut Self>:

#[derive(Copy, Clone)]
pub struct Pin<P> {
    pointer: P,
}

It turns out, what makes Pin special is how you create one:

impl<P: Deref> Pin<P> {
    pub fn new(pointer: P) -> Pin<P> where P::Target: Unpin { /* ... */ }
    pub unsafe fn new_unchecked(pointer: P) -> Pin<P>  { /* ... */ }
}

impl<P: Deref> Deref for Pin<P> {
    type Target = P::Target;
}

impl<P: Deref> DerefMut for Pin<P>
where
    P::Target: Unpin
{
    type Target = P::Target;
}

So you can only create a Pin<&mut T> safely if T is Unpin:

pub auto trait Unpin {}

/// A marker type which does not implement `Unpin`.
pub struct PhantomPinned;
impl !Unpin for PhantomPinned {}

Unpin seems to be automatically implemented for all types except PhantomPinned. So creating a Pin is safe, except for types that contain PhantomPinned and Pin just dereferences to T normally.

The problem is that you can't just go handing out a self-referential struct in general because moving a self-referential struct breaks its internal references and is unsound.

This is where Pin comes in. You can only create a Pin<&mut T> if you guarantee that the T will stay in a stable location until it is dropped, meaning that any self-references will remain valid.

For most types, Pin doesn't mean anything, which is why Unpin exists. Unpin essentially tells Pin that a type is not self-referential, so pinning it is completely safe and always valid.

Pin will even hand out mutable references to Unpin types and let you use mem::swap or mem::replace to move them around.

Because you can't safely create a self-referential struct, Unpin is the default and implemented by types automatically.

If you did want to create a self-referential future though, you can use the PhantomPinned marker struct to make it !Unpin. Pinning a !Unpin type requires unsafe, so because poll requires Pin<&mut Self>, it cannot be called safely on a self-referential future.

You can move around the future all you want before pinning it because the self-references are only created after you first call poll. Once you do pin it though, you must uphold the Pin safety contract.

There are a couple safe ways of creating a pin though, even for !Unpin types.

The first way is with Box::pin:

let mut future1: Pin<Box<SelfReferential>> = Box::pin(SelfReferential::new());
future1.as_mut().poll(/* ... */);

let mut moved = future1;
moved.as_mut().poll(/* ... */);

At first glance this may seem unsound, but remember, Box is an allocation. Once the future is allocated it has a stable location on the heap, so you can move around the Box pointer all you want, the internal references will remain stable.

The second way you can safely create a pin is with the pin! macro:

use std::pin::pin;

let mut future1: Pin<&mut SelfReferential> = pin!(SelfReferential::new());
future1.as_mut().poll(/* ... */);

With pin!, you can safely pin a struct without even allocating it! The trick is that pin! takes ownership of the future, making it impossible to access except through the Pin<&mut T>, which remember, will never give you a mutable reference if T isn't Unpin. The T is completely hidden and thus safe from being tampered with.

Async/Await

Instead of using poll_fn to create futures, you can attach the async keyword to functions:

async fn foo() -> usize {
    1
}

An async function is really just a function that returns an async block:

fn foo() -> impl Future<Output = usize> {
    async { 1 }
}

Which is really just a function that returns a poll_fn future:

fn foo() -> impl Future<Output = usize> {
    poll_fn(|| Poll::Ready(1))
}

The magic comes with the await keyword. await waits for the completion of another future, propagating Poll::Pending until the future is resolved.

With async functions, you can hold onto local state across await points:

async fn foo() {
    let x = vec![1, 2, 3];
    bar(&x).await;
    baz(&x).await;
    println!("{x:?}");
}

Under the hood the compiler has to generate a self-referential future to give bar and baz access to the state:

struct FooFuture {
    x: Vec<i32>, // (X)
    state: FooFutureState,
}

enum FooFutureState {
    Bar(BarFuture),
    Baz(BazFuture),
}

struct BarFuture { x: *mut Vec<i32> /* pointer to (X)! */ }
struct BazFuture { x: *mut Vec<i32> /* pointer to (X)! */ }

The compiler takes care of all the unsafe code involved in this, allowing us to work with local state just like we would in a regular function. For this reason, the futures generated by async blocks or functions are !Unpin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment