Skip to content

Instantly share code, notes, and snippets.

@joboet
Last active March 3, 2023 09:49
Show Gist options
  • Save joboet/17b2e9773a8c6ae5d19ad6092fd5d3b5 to your computer and use it in GitHub Desktop.
Save joboet/17b2e9773a8c6ae5d19ad6092fd5d3b5 to your computer and use it in GitHub Desktop.
Empty lifetime and `Destruct` trait

Empty lifetime and Destruct trait

Reference-level explanation

Terms

A lifetime defines a region of time in the execution of a program. It is the continuous range of operations bounded by the creation of a reference and its invalidation.

To enable useful reasoning about lifetimes, lifetimes have a relation between them:

Containment

A lifetime 'a contains another lifetime 'b if all operations within 'b are also in 'a:

'a: 'b

Note that, from this definition, it follows that all lifetimes contain themselves:

for<'a> 'a: 'a

To be able to dereference a reference, the dereferencing operation need to be contained in the lifetime of the reference (otherwise you would access data that has been destroyed or changed).

Empty lifetime

The empty lifetime '! denotes an empty range of operations. Therefore, it is contained in all other lifetimes:

for<'a> 'a: '!

Since the empty lifetime does not include any operations, references with the '! lifetime are not dereferencable. However, they can be casted to pointers1, which allows reading their address.

Normally, functions taking references require the lifetime of the reference to contain the function:

fn takes_reference<'a, T>(r: &'a T);

implies (pseudo-syntax)

fn takes_reference<'a: 'fn, T>(r: &'a T);

This requirement can be relaxed by "reminding" the compiler that any lifetime can be used. This is written by adding an 'a: '! bound:

fn takes_reference<'a: '!>(r: &'a T);

Since all lifetimes contain the empty lifetime, all references can be passed to this function, even outside of their lifetime. As inside the function the lifetime cannot be assumed to be greater than the empty lifetime, the reference cannot be dereferenced, maintaining soundness.

Examples

Returning a struct borrowing local items:

struct Calculation<'a> {
    local: &'a u32,
    output: bool,
}

fn calculate() -> Calculation<'!> {
    let local = 3;
    let calc = Calculation {
        local: &local,
        output: true,
    };
    println!("{calc.local}"); // Inside the function, the reference can be used like normal
    calc // Since &'a u32 is a subtype of &'! u32, `Calculation<'local>` is a subtype of `Calculation<'!>` and can be coerced into it.
}

fn read_output() {
    let calc = calculate();
    println!("{calc.output}");
    // println!("{calc.local}"); // Error: reference cannot be dereferenced after its lifetime has ended.
}

Prints:

3
true

Destruction of values

A type is said to be destructable if it implements the Destruct trait (automatically implemented by the compiler):

trait Destruct {}

The compiler checks that a type implements Destruct to prove that it can be destructed (go out of scope, calling the destructor if present), in a fashion similar to how the Copy trait is used to prove something can be copied.

By default, all types are destructable, meaning Destruct is an implied trait like Sized. If it is known that the type will not be destructed, the implicit bounds can be relaxed by adding a ?Destruct bound to generic bounds. For instance, ManuallyDrop's destructor skips the destructor, so it does not need the type to be destructable:

for<T: ?Destruct> ManuallyDrop<T>: Destruct

Since local variables are dropped on unwind, all locals are required to be destructable. It is however possible to reference undroppable types (for example by dereferencing a ManuallyDrop or an abstraction like a LeakyBox).

The Destruct trait is implemented for all primitive types. Its implementation on references is special in that it is applicable to all lifetimes and does not need the inner type to be destructable:

for<'a: '!, T: ?Destruct> &'a T: Destruct,
for<'a: '!, T: ?Destruct> &'a mut T: Destruct,

Similarily, dropping pointers does not require the type to be destructable:

for<T: ?Destruct> *const T: Destruct,
for<T: ?Destruct> *mut T: Destruct,
fn: Destruct,

The Destruct trait is similar to auto traits like Send or Sync in that it is automatically implemented for composite types if all inner types are destructable. It differs however in that the required bounds on generics can be overwritten by adding an explicit Drop implementation:

struct WithDestructor<'a, T> {
    reference: &'a T,
}

impl<'a, T> Drop for WithDestructor<'a, T> {
    fn drop(&mut self) {}
}

leads to

for<'a, T> WithDestructor<'a, T>: Destruct

but not

for<'a: '!, T: ?Destruct> WithDestructor<'a, T>: Destruct

since drop must be callable for the type to be destructable.

These bounds can be relaxed by weakening the bounds on the Drop implementation:

impl<'a: '!, T: ?Destruct> Drop for WithDestructor<'a, T> { ... }

leads to

for<'a: '!, T: ?Destruct> WithDestructor<'a, T>: Destruct

Since it cannot be assumed that 'a is larger than '!, the reference field in the example is not dereferencable in the destructor.

Examples

Implementing Box:

use std::alloc::Global;
use std::mem::drop_in_place;

struct Box<T> {
    ptr: *mut T,
}

impl<T> Box<T> {
    pub fn new(val: T) -> Box<T> {
        let ptr = Global.allocate(Layout::new::<T>());
        unsafe {
            ptr.write(val);
        }
        Box { ptr }
    }
}

impl<T> Deref for Box<T> {
    fn deref(&self) -> &T {
        unsafe { &*self.ptr }
    }
}

impl<T> Drop for Box<T> {
    fn drop(&mut self) {
        unsafe {
            // `Destruct` is implied.
            drop_in_place(self.ptr);
            Global.deallocate(self.ptr, Layout::new::<T>());
        }
    }
}

fn drop_after_end() {
    let local = 314;
    let boxed = Box::new(&local);
    drop(local);
    // Dropping `Box` only requires `T` to be destructable, which references are,
    // even if their lifetime has ended.
    drop(boxed);
}

This would not compile, since Destruct is not implemented for the empty lifetime:

struct WithDestructor<'a>(&'a i32);

impl<'a> Drop for WithDestructor<'a> {
    fn drop(&mut self) {
        println!("{self.0}")
    }
}

fn drop_after_end() {
    let local = 314;
    let print_on_drop = WithDestructor(&local);
    drop(local);
    drop(print_on_drop); // Error: WithDestructor cannot be dropped after its lifetime has ended.
}

Breakage

Since the proposed semantics allow dropping dangling data without #[may_dangle], some breakage could result if the Drop implementation assumes more than what can be proven with the trait system. In particular, the following code is currently sound, but would now be unsound (although all currently writable code still cannot cause UB):

pub struct Generic<T>(T);

impl<'a> Generic<&'a u32> {
    pub fn construct(r: &u32) -> Generic<&u32> {
        Generic(r)
    }
}

impl<T> Drop for Generic<T> {
    fn drop(&mut self) {
        // Currently sound because only `Generic<&'a u32>` can be constructed.
        // With the proposed `Destruct` semantics, the reference may be dangling.
        let r = unsafe { std::mem::transmute_copy::<T, &'static u32>(&self.0) };
        println!("{r}");
    }
}

I would consider examples like these highly unlikely, but this will have to be evaluated carefully if this is implemented.

Formal explanation

Consider the dereferencing operation on pointers:

$$\textrm{deref}: P \to T$$

This operation is not always sound (for example, the data needs to be valid). The set of operations for which dereferencing is sound is called a lifetime. References in Rust can be considered a tuple of a pointer $P$ and a lifetime $L$:

$$R = P \times L$$

To prove that a reference can be dereferenced, the compiler checks that the dereferencing operation is an element of the lifetime:

$$\forall (O = \textrm{deref}), (R = (P, L)): O \in L \implies \textrm{valid}(O)$$

The lifetime (or a subset thereof) can be constructed at compile time by using the borrowing ules.

To call a function $f$ that takes a reference, the compiler proves that

$$\textrm{call}(f) = \{\textrm{O} | O \in f \} \subset L$$

where $L$ is the lifetime of the reference. Since

$$O = \textrm{deref} \in f \implies O \in L$$

the reference can be dereferenced within the function.

However, there is another operation on references: casting to a pointer

$$\textrm{ptr-from-fn}: P \times L \to P$$

Currently, this operation requires the casting operation to be within the lifetime of the reference. This is unnessesarily restrictive, since the pointer can be safely taken even if the reference is not alive anymore (only dereferencing would be unsound). To allow this, this proposal introduces a notation for the empty lifetime (an empty set of operations). Since, by definition, any dereferencing operations is only sound if it is an element of the lifetime, the compiler can statically prove that a reference with empty lifetime cannot be dereferenced.

Similarily, since $f \not\subset \emptyset$, functions requiring $f \subset L$ cannot be called with empty lifetimes (this is trivial to prove in the compiler, as it is just another formulation of liveliness checks). To enable casting references to pointers outside of their lifetime, functions can relax their bounds so that $L$ can be any set of operations, even if it does not include the function. Within these functions, the reference cannot be dereferenced, as the compiler cannot prove that $O = \textrm{deref} \in L$, and thus, soundness is maintained.

These bounds can already be proven by the compiler, just like it can reject calling fn f<a: 'static>(r: &'a u8) with a reference that has less than static lifetime.

Since $L_1 \subset L_2$ implies $R_1 = (P, L_1) \textrm{subtype of} R_2 = (P, L_2)$, all references are a subtype of the same reference, but with empty lifetime.

Acknowledgements

Credit goes to @SoniEx2 for the idea of reworking dropck around borrowck.

Footnotes

  1. The pointers cannot be used to access the referent in any way, in accordance with stacked borrows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment