asajeffrey/rust-safe-reachability.md

## rust-safe-reachability.md

      
    Raw
  

              rust-safe-reachability.md
            
          
    The problem at hand is trying to come up with a definition of legal unsafe code for Rust. The context is the
Rust memory model. A (probably incomplete) list of goals is:

Be simple enough, and close enough to common practice that most idiomatic C code which programmers would expect to
maintain the Rust memory invariants is legal.
Support common program optimizations of safe code, by compilers, hardware, or software developers.
Not allow for undefined behaviour.

The Tootsie Pop model
is a simple model of unsafety, based on call boundaries into and out of unsafe code. The model has a temporal flavour,
since the notion of whether unsafe code is currently executing changes over time.
An alternative is to look at a spatial model, where the invariant is defined based on the shape of the memory,
rather than the sequence of fucntional calls and returns. The observation is that safe code is limited as to how
it reads and writes the heap:

Safe code cannot access fields unless they are public.
Safe code cannot dereference pointers unless they are of type &T or &mut T.
Safe code cannot produce references out of thin air (e.g. cannot cast a usize to a &T).

A first shot at trying to formalize this is as follows (there are lots of gaps, but the idea is to see whether any
of the gaps are serious enough to kill an approach something like this one):
Let A be the set of memory addresses (i.e. usize values).
A heap is a partial mapping H: A → A.
A typed address is of the form a:&T or a:&mut T, where a is an address, and T is a Rust type.
A typed rooted heap is a heap together with a set of typed addresses (the roots).
Given a typed rooted heap, a typed address is safely reachable whenever:

If r is a root, then r is safely reachable.
If a:&T is safely reachable, and T contains a public field f:U with offset i, then a+i:&U is safely reachable.
If a:&&T is safely reachable, and H(a) = b then b:&T is safely reachable.
Ditto, &mut.
Er, other accesses? For example enums? Interior mutability?

A typed rooted heap is type-safe whenever:

If a:T is safely reachable, then H(a) is well-defined.
If a:&&mut T and b:&U are safely reachable, and H(a) = H(b) then a = b and &mut T = U.
Er, other conditions? Lifetimes? Use-after-free? Ownership?

Consider an unsafe program execution to be a series of typed rooted heaps, where the typed roots are:

Initially, any globals defined by the program.
On any incoming function call f(v), remove any root a:&mut T such that there is an b:&mut U which is safely reachable from both a:T and v.
On any matching outgoing return with result w, restore those roots, and add new root w.
On any outgoing callback f(v), add new root v.
On any matching incoming return with result w, remove any root a:&mut T such that there is an b:&mut U which is safely reachable from both a:T and w.

(This isn't quite right, I suspect, but it's got the right flavour of shrinking and growing the roots as owership is transferred.)
An unsafe program maintains type safety if, during any program execution which transforms the heap from
H to H′, if H is type-safe, then H′ is type-safe.
An unsafe program maintains immutability if, during any program execution which transforms the heap from
H to H′, if a is safely reachable in both H and H′ then H(a) = H′(a). (This isn't quite right,
as we want that a was safely rechable at all times during the execution, e.g. the context didn't transfer ownership
of a to the unsafe code and then get it back again. Also, we need to think about interior mutability, which doesn't
preserve this property.)
A candidate requirement for legality of unsafe code is that it mainains type safety and immutability.