Skip to content

Instantly share code, notes, and snippets.

@demurgos
Last active February 29, 2024 12:54
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save demurgos/a6587bf316e80daf275e5ac54884d178 to your computer and use it in GitHub Desktop.
Save demurgos/a6587bf316e80daf275e5ac54884d178 to your computer and use it in GitHub Desktop.
self-ref rust

Rust self-references

The goal of this article is to explain the Rust Pin higher order type by exploring how Rust handles self-referential structs. Rust works best when data is modelled without cycles and there is no way to express self-referential structs directly, so creating and manipulating these kinds of struct can be a bit tricky.

What is a self-referential structs?

A self-referential struct is a struct containing fields with a lifetime corresponding to the struct itself; in other words a struct with references pointing to other fields inside the struct. Despite sounding exotic, use-cases for this kind of struct are relatively common and can be found in parsing, dependency injection, async code, generators, etc.

Self-referential parser result

This section describes the use case where a parser returns a struct holding both the input data and its parsed representation.

Simple result

Let's say we want to parse a URL such as https://example.com/foo/bar to extract the scheme, hostname and path as string slices. We can represent the result using the following struct, where 'input represents the lifetime of the whole input URL:

struct ParsedUrl<'input> {
  scheme: &'input str,
  hostname: &'input str,
  path: &'input str,
}

fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> {
  let url = std::str::from_utf8(url).expect("input must be utf8");
  let (scheme, tail) = url.split_once("://").expect("input must have scheme");
  let path_start = tail.find('/').expect("input must have path");
  let hostname = &tail[..path_start];
  let path = &tail[path_start..];
  ParsedUrl { scheme, hostname, path }
}

fn main() {
  let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.)
  let parsed = parse_url(&input);
  println!("the hostname is {}", parsed.hostname); // the hostname is example.com
}

Playground

So far this is regular Rust. Each line of main represents an important step:

  1. Create an owned value
  2. Create a value with references to the owned value
  3. Use the references

Now, what if we wanted to split main into sub-functions? Splitting after the first line is easy and the borrow-checker can perform all it's lifetime analysis:

struct ParsedUrl<'input> { ... }
fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> { ... }

fn main() {
  let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.)
  parse_and_print(&input);
}

fn parse_and_print(input: &[u8]) {
  let parsed = parse_url(&input);
  println!("the hostname is {}", parsed.hostname); // the hostname is example.com
}

Playground

Self-referential result and issues

However, what if we want to split it so the first lines are together? We would need to return the allocated input and its parsed view. The following code does not work, but this is what we would like:

struct ParsedUrl<'input> { ... }
fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> { ... }

struct Url {
  input: Vec<u8>,
  parsed: ParsedUrl<'Self::input>, // self-reference! (this syntax does not exist)
}

fn main() {
  let url = allocate_and_parse();
  println!("the hostname is {}", url.parsed.hostname); // the hostname is example.com
}

fn allocate_and_parse() -> Url {
  let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.)
  let parsed = parse_url(&input);
  Url { input, parsed }
}

The Url struct is self-referential. The field parsed has a lifetime that can't be expressed in current Rust, but that corresponds to the input field. You can't write such a lifetime today. There's also the issue that input is allocated inside allocate_and_parse, and parsed then creates its references pointing into the stack of allocate_and_parse. When input is then moved inside the struct and this function returns, the references of parsed become invalid.

Existing workarounds

How does current Rust code handle those situations?

  1. Use copies instead of references

    // no more lifetime: `&'input str` replaced with `String` (you can also use your favorite
    // short string lib)
    struct ParsedUrl {
      scheme: String,
      hostname: String,
      path: String,
    }
    struct Url {
      input: Vec<u8>,
      parsed: UrlView
    }

    It's a radical solution that requires extra allocations and copies, but it entirely eliminates the problem and allows to stop caring about lifetimes. This is actually the solution used by rowan, the parsing library used by Rust Analyzer.

  2. Use handles/indexes instead of references.

    // store offsets relative to the input start
    struct ParsedUrl {
      scheme: Range<usize>,
      hostname: Range<usize>,
      path: Range<usize>,
    }
    struct Url {
      input: Vec<u8>,
      parsed: ParsedUrl
    }
    impl Url {
      fn hostname(&self) -> &str {
        std::str::from_utf8(&self.input[self.parsed.hostname]).expect("the `hostname` getter always succeeds")
      }
    }

    Delay the creation of references, and use some handle in the parsed struct. The handle is always valid even when the input is moved. In this case we use offsets relative from the input start, but in general you can use any id. This is used in most graph libs, ECS libs, cross-process code, etc. The downside is that you need a getter performing the final resolution.

  3. Inversion of Control

    struct ParsedUrl<'input> { ... }
    fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> { ... }
    
    struct Url<'a> {
      input: &'a [u8],
      parsed: ParsedUrl<'a>,
    }
    
    fn main() {
      allocate_and_parse(|url| {
        println!("the hostname is {}", url.parsed.hostname); // the hostname is example.com
      });
    }
    
    fn allocate_and_parse<F, R>(f: F) -> R
    where
      F: FnOnce(Url) -> R
    {
      let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.)
      let parsed = parse_url(&input);
      f(Url { input: &input, parsed })
    }

    Playground

    Very powerful for scoped tasks, but requires the caller to pass a closure or handler, which is pretty inconvenient (breaks NLL, control flow, etc.)

Existing solutions

  1. Helper lib for self-referential structs (e.g. ouroboros, aliasable)

    use ouroboros::self_referencing;
    
    struct ParsedUrl<'input> { ... }
    fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> { ... }
    
    #[self_referencing]
    struct Url {
      input: Vec<u8>,
      #[borrows(input)]
      #[covariant]
      parsed: ParsedUrl<'this>,
    }
    
    pub fn main() {
      let url = allocate_and_parse();
      println!("the hostname is {}", url.borrow_parsed().hostname);
    }
    
    fn allocate_and_parse() -> Url {
      let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.)
      UrlBuilder {
        input,
        parsed_builder: |input: &Vec<u8>| parse_url(input),
      }.build()
    }

    Uses unsafe internally to move "head" values (owning) to a "static" box. This relies on stable_deref_trait: even if Url is moved, the inner boxed input does not move.

  2. Manual unsafe impl of the previous one

  3. Pinning

struct Url {
  pub input: Vec<u8>,
  parsed: Option<ParsedUrl<'static>>,
}

impl Url {
  pub fn get_parsed(self: Pin<Self>) -> String {

  }
}

Pin

This brings us finally to pinning.

use core::pin::{pin, Pin};
use core::marker::PhantomPinned;

#[allow(unused)]
struct ParsedUrl<'input> { ... }

fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> { ... }

struct Url {
  pub input: Vec<u8>,
  parsed: Option<ParsedUrl<'static>>, // fake `'static`
  _pin: PhantomPinned,
}

impl Url {
  pub fn get_parsed<'a>(self: Pin<&'a mut Self>) -> &'a ParsedUrl<'a> {
    let pinned: &'a mut Self = unsafe { Pin::into_inner_unchecked(self) };
    let pinned: &'static mut Self = unsafe { change_lifetime_mut(pinned) };
    pinned.parsed.get_or_insert_with(|| parse_url(&pinned.input))
  }
}

pub unsafe fn change_lifetime_mut<'old, 'new: 'old, T: 'new>(data: &'old mut T) -> &'new mut T {
  &mut *(data as *mut _)
}

pub fn main() {
  let mut url = allocate_and_parse();
  let url = pin!(url);
  println!("the hostname is {}", url.get_parsed().hostname);
}

fn allocate_and_parse() -> Url {
  let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.)
  Url { input, parsed: None, _pin: PhantomPinned }
}

Callee box pin

use core::pin::{pin, Pin};
use core::marker::PhantomPinned;

#[allow(unused)]
struct ParsedUrl<'input> { ... }

fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> { ... }

struct Url {
  pub input: Vec<u8>,
  parsed: Option<ParsedUrl<'static>>, // fake `'static`
  _pin: PhantomPinned,
}

impl Url {
  pub fn get_parsed<'a>(self: Pin<&'a mut Self>) -> &'a ParsedUrl<'a> {
    let pinned: &'a mut Self = unsafe { Pin::into_inner_unchecked(self) };
    let pinned: &'static mut Self = unsafe { change_lifetime_mut(pinned) };
    pinned.parsed.get_or_insert_with(|| parse_url(&pinned.input))
  }
}

pub unsafe fn change_lifetime_mut<'old, 'new: 'old, T: 'new>(data: &'old mut T) -> &'new mut T {
  &mut *(data as *mut _)
}

pub fn main() {
  let mut url = allocate_and_parse();
  let url = pin!(url);
  println!("the hostname is {}", url.get_parsed().hostname);
}

fn allocate_and_parse() -> Pin<Box<Url>> {
  let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.)
  Url { input, parsed: None, _pin: PhantomPinned }
}
use ouroboros::self_referencing;
#[allow(unused)]
struct ParsedUrl<'input> {
scheme: &'input str,
hostname: &'input str,
path: &'input str,
}
fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> {
let url = std::str::from_utf8(url).expect("input must be utf8");
let (scheme, tail) = url.split_once("://").expect("input must have scheme");
let path_start = tail.find('/').expect("input must have path");
let hostname = &tail[..path_start];
let path = &tail[path_start..];
ParsedUrl { scheme, hostname, path }
}
#[self_referencing]
struct Url {
input: Vec<u8>,
#[borrows(input)]
#[covariant]
parsed: ParsedUrl<'this>,
}
pub fn main() {
let url = allocate_and_parse();
println!("the hostname is {}", url.borrow_parsed().hostname);
}
fn allocate_and_parse() -> Url {
let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.)
UrlBuilder {
input,
parsed_builder: |input: &Vec<u8>| parse_url(input),
}.build()
}
use std::mem::MaybeUninit;
use std::ptr::NonNull;
use ouroboros::self_referencing;
#[allow(unused)]
struct ParsedUrl<'input> {
scheme: &'input str,
hostname: &'input str,
path: &'input str,
}
fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> {
let url = std::str::from_utf8(url).expect("input must be utf8");
let (scheme, tail) = url.split_once("://").expect("input must have scheme");
let path_start = tail.find('/').expect("input must have path");
let hostname = &tail[..path_start];
let path = &tail[path_start..];
ParsedUrl { scheme, hostname, path }
}
struct Url {
input: Vec<u8>,
parsed: MaybeUninit<ParsedUrl<'static>>, // fake static
}
impl Url {
pub fn borrow_int<'this>(&'this self) -> &'this Vec<u8> {
&self.input
}
pub fn borrow_parsed<'this>(&'this self) -> &'this ParsedUrl<'this> {
// safety: `Url` only exposes read-only access to `input`, and moving `Url` does not
// move the vec data referenced by `input`.
unsafe {self.parsed.assume_init_ref()}
}
}
pub fn main() {
let url = allocate_and_parse();
println!("the hostname is {}", url.borrow_parsed().hostname);
}
fn allocate_and_parse() -> Url {
let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.)
let mut result = Url {
input,
parsed: MaybeUninit::uninit(),
};
// safety: access to `input` is only possible through `borrow_parsed` which ensures that `input`
// is valid
result.parsed = MaybeUninit::new(parse_url(unsafe { change_lifetime(&result.input) } ));
result
}
pub unsafe fn change_lifetime<'old, 'new: 'old, T: ?Sized + 'new>(data: &'old T) -> &'new T {
&*(data as *const _)
}
use core::pin::{pin, Pin};
use core::marker::PhantomPinned;
#[allow(unused)]
struct ParsedUrl<'input> {
scheme: &'input str,
hostname: &'input str,
path: &'input str,
}
fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> {
let url = std::str::from_utf8(url).expect("input must be utf8");
let (scheme, tail) = url.split_once("://").expect("input must have scheme");
let path_start = tail.find('/').expect("input must have path");
let hostname = &tail[..path_start];
let path = &tail[path_start..];
ParsedUrl { scheme, hostname, path }
}
struct Url {
pub input: Vec<u8>,
parsed: Option<ParsedUrl<'static>>, // fake `'static`
_pin: PhantomPinned,
}
impl Url {
pub fn get_parsed<'a>(self: Pin<&'a mut Self>) -> &'a ParsedUrl<'a> {
let pinned: &'a mut Self = unsafe { Pin::into_inner_unchecked(self) };
let pinned: &'static mut Self = unsafe { change_lifetime_mut(pinned) };
pinned.parsed.get_or_insert_with(|| parse_url(&pinned.input))
}
}
pub unsafe fn change_lifetime_mut<'old, 'new: 'old, T: 'new>(data: &'old mut T) -> &'new mut T {
&mut *(data as *mut _)
}
pub fn main() {
let mut url = allocate_and_parse();
let url = pin!(url);
println!("the hostname is {}", url.get_parsed().hostname);
}
fn allocate_and_parse() -> Url {
let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.)
Url { input, parsed: None, _pin: PhantomPinned }
}
use ouroboros::self_referencing;
use typed_arena::Arena;
#[allow(unused)]
struct ParsedUrl<'input> {
scheme: &'input str,
hostname: &'input str,
path: &'input str,
}
fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> {
let url = std::str::from_utf8(url).expect("input must be utf8");
let (scheme, tail) = url.split_once("://").expect("input must have scheme");
let path_start = tail.find('/').expect("input must have path");
let hostname = &tail[..path_start];
let path = &tail[path_start..];
ParsedUrl { scheme, hostname, path }
}
struct Url<'arena> {
input: &'arena Vec<u8>,
parsed: ParsedUrl<'arena>,
}
pub fn main() {
let arena: Arena<Vec<u8>> = Arena::new();
let url = allocate_and_parse(&arena);
println!("the hostname is {}", url.parsed.hostname);
}
fn allocate_and_parse(arena: &Arena<Vec<u8>>) -> Url {
let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static
let input = arena.alloc(input);
Url {
input,
parsed: parse_url(input),
}
}
use core::marker::PhantomPinned;
use core::mem::MaybeUninit;
use moveit::{new};
use moveit::new::New;
use moveit::moveit;
#[allow(unused)]
struct ParsedUrl<'input> {
scheme: &'input str,
hostname: &'input str,
path: &'input str,
}
fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> {
let url = std::str::from_utf8(url).expect("input must be utf8");
let (scheme, tail) = url.split_once("://").expect("input must have scheme");
let path_start = tail.find('/').expect("input must have path");
let hostname = &tail[..path_start];
let path = &tail[path_start..];
ParsedUrl { scheme, hostname, path }
}
struct Url {
pub input: Vec<u8>,
parsed: MaybeUninit<ParsedUrl<'static>>, // fake `'static`
_pin: PhantomPinned,
}
impl Url {
pub fn get_parsed<'a>(&'a self) -> &'a ParsedUrl<'a> {
unsafe {change_lifetime(self.parsed.assume_init_ref())}
}
}
pub fn main() {
moveit! {
let url = allocate_and_parse();
}
println!("the hostname is {}", url.get_parsed().hostname);
}
fn allocate_and_parse() -> impl New<Output = Url> {
let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.)
new::of(Url {
input,
parsed: MaybeUninit::uninit(),
_pin: PhantomPinned
}).with(|this| unsafe {
let this = this.get_unchecked_mut();
this.parsed = MaybeUninit::new(parse_url(unsafe { change_lifetime(&this.input) } ));
})
}
pub unsafe fn change_lifetime<'old, 'new: 'old, T: ?Sized + 'new>(data: &'old T) -> &'new T {
&*(data as *const _)
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment