joboet/better_dropck.md

## better_dropck.md

      
    Raw
  

              better_dropck.md
            
          
    Empty lifetime and Destruct trait

Reference-level explanation

Terms

A lifetime defines a region of time in the execution of a program. It is the continuous
range of operations bounded by the creation of a reference and its invalidation.
To enable useful reasoning about lifetimes, lifetimes have a relation between them:
Containment

A lifetime 'a contains another lifetime 'b if all operations within 'b are also in 'a:
'a: 'b
Note that, from this definition, it follows that all lifetimes contain themselves:
for<'a> 'a: 'a
To be able to dereference a reference, the dereferencing operation need to be contained in
the lifetime of the reference (otherwise you would access data that has been destroyed or
changed).
Empty lifetime

The empty lifetime '! denotes an empty range of operations. Therefore, it is contained
in all other lifetimes:
for<'a> 'a: '!
Since the empty lifetime does not include any operations, references with the '! lifetime
are not dereferencable. However, they can be casted to pointers¹, which allows reading
their address.
Normally, functions taking references require the lifetime of the reference to
contain the function:
fn takes_reference<'a, T>(r: &'a T);
implies (pseudo-syntax)
fn takes_reference<'a: 'fn, T>(r: &'a T);
This requirement can be relaxed by "reminding" the compiler that any lifetime can be
used. This is written by adding an 'a: '! bound:
fn takes_reference<'a: '!>(r: &'a T);
Since all lifetimes contain the empty lifetime, all references can be passed to this
function, even outside of their lifetime. As inside the function the lifetime
cannot be assumed to be greater than the empty lifetime, the reference cannot be
dereferenced, maintaining soundness.
Examples

Returning a struct borrowing local items:
struct Calculation<'a> {
    local: &'a u32,
    output: bool,
}

fn calculate() -> Calculation<'!> {
    let local = 3;
    let calc = Calculation {
        local: &local,
        output: true,
    };
    println!("{calc.local}"); // Inside the function, the reference can be used like normal
    calc // Since &'a u32 is a subtype of &'! u32, `Calculation<'local>` is a subtype of `Calculation<'!>` and can be coerced into it.
}

fn read_output() {
    let calc = calculate();
    println!("{calc.output}");
    // println!("{calc.local}"); // Error: reference cannot be dereferenced after its lifetime has ended.
}
Prints:
3
true

Destruction of values

A type is said to be destructable if it implements the Destruct trait
(automatically implemented by the compiler):
trait Destruct {}
The compiler checks that a type implements Destruct to prove that it
can be destructed (go out of scope, calling the destructor if present),
in a fashion similar to how the Copy trait is used to prove something
can be copied.
By default, all types are destructable, meaning Destruct is an
implied trait like Sized. If it is known that the type will not be
destructed, the implicit bounds can be relaxed by adding a ?Destruct
bound to generic bounds. For instance, ManuallyDrop's destructor skips
the destructor, so it does not need the type to be destructable:
for<T: ?Destruct> ManuallyDrop<T>: Destruct
Since local variables are dropped on unwind, all locals are required to
be destructable. It is however possible to reference undroppable types
(for example by dereferencing a ManuallyDrop or an abstraction like a
LeakyBox).
The Destruct trait is implemented for all primitive types. Its
implementation on references is special in that it is applicable to all
lifetimes and does not need the inner type to be destructable:
for<'a: '!, T: ?Destruct> &'a T: Destruct,
for<'a: '!, T: ?Destruct> &'a mut T: Destruct,
Similarily, dropping pointers does not require the type to be destructable:
for<T: ?Destruct> *const T: Destruct,
for<T: ?Destruct> *mut T: Destruct,
fn: Destruct,
The Destruct trait is similar to auto traits like Send or Sync in
that it is automatically implemented for composite types if all inner types
are destructable. It differs however in that the required bounds on generics
can be overwritten by adding an explicit Drop implementation:
struct WithDestructor<'a, T> {
    reference: &'a T,
}

impl<'a, T> Drop for WithDestructor<'a, T> {
    fn drop(&mut self) {}
}
leads to
for<'a, T> WithDestructor<'a, T>: Destruct
but not
for<'a: '!, T: ?Destruct> WithDestructor<'a, T>: Destruct
since drop must be callable for the type to be destructable.
These bounds can be relaxed by weakening the bounds on the Drop implementation:
impl<'a: '!, T: ?Destruct> Drop for WithDestructor<'a, T> { ... }
leads to
for<'a: '!, T: ?Destruct> WithDestructor<'a, T>: Destruct
Since it cannot be assumed that 'a is larger than '!, the reference field
in the example is not dereferencable in the destructor.
Examples

Implementing Box:
use std::alloc::Global;
use std::mem::drop_in_place;

struct Box<T> {
    ptr: *mut T,
}

impl<T> Box<T> {
    pub fn new(val: T) -> Box<T> {
        let ptr = Global.allocate(Layout::new::<T>());
        unsafe {
            ptr.write(val);
        }
        Box { ptr }
    }
}

impl<T> Deref for Box<T> {
    fn deref(&self) -> &T {
        unsafe { &*self.ptr }
    }
}

impl<T> Drop for Box<T> {
    fn drop(&mut self) {
        unsafe {
            // `Destruct` is implied.
            drop_in_place(self.ptr);
            Global.deallocate(self.ptr, Layout::new::<T>());
        }
    }
}

fn drop_after_end() {
    let local = 314;
    let boxed = Box::new(&local);
    drop(local);
    // Dropping `Box` only requires `T` to be destructable, which references are,
    // even if their lifetime has ended.
    drop(boxed);
}
This would not compile, since Destruct is not implemented for the empty lifetime:
struct WithDestructor<'a>(&'a i32);

impl<'a> Drop for WithDestructor<'a> {
    fn drop(&mut self) {
        println!("{self.0}")
    }
}

fn drop_after_end() {
    let local = 314;
    let print_on_drop = WithDestructor(&local);
    drop(local);
    drop(print_on_drop); // Error: WithDestructor cannot be dropped after its lifetime has ended.
}
Breakage

Since the proposed semantics allow dropping dangling data without #[may_dangle], some breakage could
result if the Drop implementation assumes more than what can be proven with the trait system. In particular,
the following code is currently sound, but would now be unsound (although all currently writable code
still cannot cause UB):
pub struct Generic<T>(T);

impl<'a> Generic<&'a u32> {
    pub fn construct(r: &u32) -> Generic<&u32> {
        Generic(r)
    }
}

impl<T> Drop for Generic<T> {
    fn drop(&mut self) {
        // Currently sound because only `Generic<&'a u32>` can be constructed.
        // With the proposed `Destruct` semantics, the reference may be dangling.
        let r = unsafe { std::mem::transmute_copy::<T, &'static u32>(&self.0) };
        println!("{r}");
    }
}
I would consider examples like these highly unlikely, but this will have to be evaluated
carefully if this is implemented.
Formal explanation

Consider the dereferencing operation on pointers:
$$\textrm{deref}: P \to T$$
This operation is not always sound (for example, the data needs to be valid). The set of
operations for which dereferencing is sound is called a lifetime. References in Rust can
be considered a tuple of a pointer $P$ and a lifetime $L$:
$$R = P \times L$$
To prove that a reference can be dereferenced, the compiler checks that the dereferencing
operation is an element of the lifetime:
$$\forall (O = \textrm{deref}), (R = (P, L)): O \in L \implies \textrm{valid}(O)$$
The lifetime (or a subset thereof) can be constructed at compile time by using the borrowing
ules.
To call a function $f$ that takes a reference, the compiler proves that
$$\textrm{call}(f) = \{\textrm{O} | O \in f \} \subset L$$
where $L$ is the lifetime of the reference. Since
$$O = \textrm{deref} \in f \implies O \in L$$
the reference can be dereferenced within the function.
However, there is another operation on references: casting to a pointer
$$\textrm{ptr-from-fn}: P \times L \to P$$
Currently, this operation requires the casting operation to be within the
lifetime of the reference. This is unnessesarily restrictive, since the
pointer can be safely taken even if the reference is not alive anymore
(only dereferencing would be unsound). To allow this, this proposal introduces
a notation for the empty lifetime (an empty set of operations). Since,
by definition, any dereferencing operations is only sound if it is an element
of the lifetime, the compiler can statically prove that a reference with empty
lifetime cannot be dereferenced.
Similarily, since $f \not\subset \emptyset$, functions
requiring $f \subset L$ cannot be called with empty lifetimes (this is trivial
to prove in the compiler, as it is just another formulation of liveliness checks).
To enable casting references to pointers outside of their lifetime, functions
can relax their bounds so that $L$ can be any set of operations, even if it does
not include the function. Within these functions, the reference cannot be dereferenced,
as the compiler cannot prove that $O = \textrm{deref} \in L$, and thus, soundness is
maintained.
These bounds can already be proven by the compiler, just like it can reject calling
fn f<a: 'static>(r: &'a u8)
with a reference that has less than static lifetime.
Since $L_1 \subset L_2$ implies $R_1 = (P, L_1) \textrm{subtype of} R_2 = (P, L_2)$,
all references are a subtype of the same reference, but with empty lifetime.
Acknowledgements

Credit goes to @SoniEx2 for the idea of reworking dropck around borrowck.
Footnotes


The pointers cannot be used to access the referent in any way, in accordance
with stacked borrows. ↩