The goal of this article is to explain the Rust Pin
higher order type by exploring how Rust
handles self-referential structs. Rust works best when data is modelled without cycles and there
is no way to express self-referential structs directly, so creating and manipulating these kinds of
struct can be a bit tricky.
A self-referential struct is a struct containing fields with a lifetime corresponding to the struct itself; in other words a struct with references pointing to other fields inside the struct. Despite sounding exotic, use-cases for this kind of struct are relatively common and can be found in parsing, dependency injection, async code, generators, etc.
This section describes the use case where a parser returns a struct holding both the input data and its parsed representation.
Let's say we want to parse a URL such as https://example.com/foo/bar
to extract the scheme,
hostname and path as string slices. We can represent the result using the following struct, where
'input
represents the lifetime of the whole input URL:
struct ParsedUrl<'input> {
scheme: &'input str,
hostname: &'input str,
path: &'input str,
}
fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> {
let url = std::str::from_utf8(url).expect("input must be utf8");
let (scheme, tail) = url.split_once("://").expect("input must have scheme");
let path_start = tail.find('/').expect("input must have path");
let hostname = &tail[..path_start];
let path = &tail[path_start..];
ParsedUrl { scheme, hostname, path }
}
fn main() {
let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.)
let parsed = parse_url(&input);
println!("the hostname is {}", parsed.hostname); // the hostname is example.com
}
So far this is regular Rust. Each line of main
represents an important step:
- Create an owned value
- Create a value with references to the owned value
- Use the references
Now, what if we wanted to split main
into sub-functions? Splitting after the first line is
easy and the borrow-checker can perform all it's lifetime analysis:
struct ParsedUrl<'input> { ... }
fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> { ... }
fn main() {
let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.)
parse_and_print(&input);
}
fn parse_and_print(input: &[u8]) {
let parsed = parse_url(&input);
println!("the hostname is {}", parsed.hostname); // the hostname is example.com
}
However, what if we want to split it so the first lines are together? We would need to return the allocated input and its parsed view. The following code does not work, but this is what we would like:
struct ParsedUrl<'input> { ... }
fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> { ... }
struct Url {
input: Vec<u8>,
parsed: ParsedUrl<'Self::input>, // self-reference! (this syntax does not exist)
}
fn main() {
let url = allocate_and_parse();
println!("the hostname is {}", url.parsed.hostname); // the hostname is example.com
}
fn allocate_and_parse() -> Url {
let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.)
let parsed = parse_url(&input);
Url { input, parsed }
}
The Url
struct is self-referential. The field parsed
has a lifetime that can't be expressed in
current Rust, but that corresponds to the input
field. You can't write such a lifetime today.
There's also the issue that input
is allocated inside allocate_and_parse
, and parsed
then
creates its references pointing into the stack of allocate_and_parse
. When input
is then moved
inside the struct and this function returns, the references of parsed
become invalid.
How does current Rust code handle those situations?
-
Use copies instead of references
// no more lifetime: `&'input str` replaced with `String` (you can also use your favorite // short string lib) struct ParsedUrl { scheme: String, hostname: String, path: String, } struct Url { input: Vec<u8>, parsed: UrlView }
It's a radical solution that requires extra allocations and copies, but it entirely eliminates the problem and allows to stop caring about lifetimes. This is actually the solution used by
rowan
, the parsing library used by Rust Analyzer. -
Use handles/indexes instead of references.
// store offsets relative to the input start struct ParsedUrl { scheme: Range<usize>, hostname: Range<usize>, path: Range<usize>, } struct Url { input: Vec<u8>, parsed: ParsedUrl } impl Url { fn hostname(&self) -> &str { std::str::from_utf8(&self.input[self.parsed.hostname]).expect("the `hostname` getter always succeeds") } }
Delay the creation of references, and use some handle in the parsed struct. The handle is always valid even when the input is moved. In this case we use offsets relative from the input start, but in general you can use any id. This is used in most graph libs, ECS libs, cross-process code, etc. The downside is that you need a getter performing the final resolution.
-
Inversion of Control
struct ParsedUrl<'input> { ... } fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> { ... } struct Url<'a> { input: &'a [u8], parsed: ParsedUrl<'a>, } fn main() { allocate_and_parse(|url| { println!("the hostname is {}", url.parsed.hostname); // the hostname is example.com }); } fn allocate_and_parse<F, R>(f: F) -> R where F: FnOnce(Url) -> R { let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.) let parsed = parse_url(&input); f(Url { input: &input, parsed }) }
Very powerful for scoped tasks, but requires the caller to pass a closure or handler, which is pretty inconvenient (breaks NLL, control flow, etc.)
-
Helper lib for self-referential structs (e.g.
ouroboros
,aliasable
)use ouroboros::self_referencing; struct ParsedUrl<'input> { ... } fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> { ... } #[self_referencing] struct Url { input: Vec<u8>, #[borrows(input)] #[covariant] parsed: ParsedUrl<'this>, } pub fn main() { let url = allocate_and_parse(); println!("the hostname is {}", url.borrow_parsed().hostname); } fn allocate_and_parse() -> Url { let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.) UrlBuilder { input, parsed_builder: |input: &Vec<u8>| parse_url(input), }.build() }
Uses unsafe internally to move "head" values (owning) to a "static" box. This relies on
stable_deref_trait
: even ifUrl
is moved, the inner boxedinput
does not move. -
Manual unsafe impl of the previous one
-
Pinning
struct Url {
pub input: Vec<u8>,
parsed: Option<ParsedUrl<'static>>,
}
impl Url {
pub fn get_parsed(self: Pin<Self>) -> String {
}
}
This brings us finally to pinning.
use core::pin::{pin, Pin};
use core::marker::PhantomPinned;
#[allow(unused)]
struct ParsedUrl<'input> { ... }
fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> { ... }
struct Url {
pub input: Vec<u8>,
parsed: Option<ParsedUrl<'static>>, // fake `'static`
_pin: PhantomPinned,
}
impl Url {
pub fn get_parsed<'a>(self: Pin<&'a mut Self>) -> &'a ParsedUrl<'a> {
let pinned: &'a mut Self = unsafe { Pin::into_inner_unchecked(self) };
let pinned: &'static mut Self = unsafe { change_lifetime_mut(pinned) };
pinned.parsed.get_or_insert_with(|| parse_url(&pinned.input))
}
}
pub unsafe fn change_lifetime_mut<'old, 'new: 'old, T: 'new>(data: &'old mut T) -> &'new mut T {
&mut *(data as *mut _)
}
pub fn main() {
let mut url = allocate_and_parse();
let url = pin!(url);
println!("the hostname is {}", url.get_parsed().hostname);
}
fn allocate_and_parse() -> Url {
let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.)
Url { input, parsed: None, _pin: PhantomPinned }
}
use core::pin::{pin, Pin};
use core::marker::PhantomPinned;
#[allow(unused)]
struct ParsedUrl<'input> { ... }
fn parse_url<'input>(url: &'input [u8]) -> ParsedUrl<'input> { ... }
struct Url {
pub input: Vec<u8>,
parsed: Option<ParsedUrl<'static>>, // fake `'static`
_pin: PhantomPinned,
}
impl Url {
pub fn get_parsed<'a>(self: Pin<&'a mut Self>) -> &'a ParsedUrl<'a> {
let pinned: &'a mut Self = unsafe { Pin::into_inner_unchecked(self) };
let pinned: &'static mut Self = unsafe { change_lifetime_mut(pinned) };
pinned.parsed.get_or_insert_with(|| parse_url(&pinned.input))
}
}
pub unsafe fn change_lifetime_mut<'old, 'new: 'old, T: 'new>(data: &'old mut T) -> &'new mut T {
&mut *(data as *mut _)
}
pub fn main() {
let mut url = allocate_and_parse();
let url = pin!(url);
println!("the hostname is {}", url.get_parsed().hostname);
}
fn allocate_and_parse() -> Pin<Box<Url>> {
let input: Vec<u8> = b"https://example.com/foo/bar".to_vec(); // assume it's not static (user input, read from file, etc.)
Url { input, parsed: None, _pin: PhantomPinned }
}