@rylev

rylev/rfc.md Secret

Created December 18, 2019 20:25
Safe Transmute RFC Draft

Summary

This proposal introduces several traits that allow transmuting between two types without the use of unsafe: ByteCompatible<T>, which indicates that one type's in-memory representation is the same as another type's; FromBytes and AsBytes, which indicate types that can be viewed from bytes and as bytes; and methods for casting between types in a way that may fail.

Motivation

Transmuting one type to another type and vice versa in Rust is extremely dangerous---so much so that the docs for std::mem::transmute are essentially a long list of how to avoid doing so. However, transmuting is sometimes necessary. For instance, in extremely performance-sensitive use cases, it may be necessary to transmute from bytes instead of explicitly deserializing and copying bytes from a buffer into a struct.

Some concrete uses of this feature might be:

  • Viewing a chunk of bytes in a structured way without copying. For example, doing one large memcopy of a buffer from the network and then viewing that buffer in a structured way without needing to do further copying of bytes.
  • Packing and unpacking data, such as converting __m128 to [f32; 4]
  • FFI calls where a *const void and a length parameter could be interpreted as &[T]

Causes of Unsafety and Undefined Behavior (UB)

At the core of understanding the safety properties of transmutation is understanding Rust's layout properties (i.e., how Rust represents types in memory). The best resource I've found for understanding this is Alexis Beingessner's blog post on the matter.

The following are the reasons that transmutation from a buffer of bytes is generally unsafe:

  • Illegal Representations: Safe transmutation of a slice of bytes to a type T is only possible if every possible value of those bytes corresponds to a valid value of type T. For example, this property doesn't hold for bool or for most enum types. While size_of::<bool>() == 1, a bool can only legally be either 0b1 or 0b0 - transmuting 0b10 to bool is UB.
  • Wrong Size: A buffer of bytes might not contain the correct number of bytes to encode a given type. Referring to uninitialized fields of a struct is UB. Of course, this assumes that the size of a given type is known ahead of time which is not always the case.
  • Alignment: Types must be "well-aligned" meaning that where they are in memory falls on a certain memory address interval (usually some power of 2). For example the alignment of u32 is 4 meaning that a valid u32 must always start at a memory address evenly divisible by 4. Transmuting a slice of bytes to a type T that does not have proper alignment for type T is UB.
  • Non-Deterministic Layout: Certain types might not have a deterministic layout in memory. The Rust compiler is allowed to rearrange the layout of any type that does not have a well defined layout associated with it. Explicitly setting the layout of a type is done through #[repr(..)]. To be deterministic, both the order of fields of a complex type as well as the exact value of their offsets from the beginning of the type must be well known. This is generally only possible by marking a complex type #[repr(C)] and recursively ensuring that all fields of the struct are composed of types with deterministic layout.
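
The first three conditions above can be checked at runtime in today's Rust. The following is only a rough sketch of such checks; the helper names (`is_valid_bool_byte`, `has_size_of`, `is_aligned_for`) are invented for illustration and are not part of this proposal:

```rust
use std::mem::{align_of, size_of};

/// Illegal representations: a byte is a legal `bool` only if it is 0 or 1.
fn is_valid_bool_byte(byte: u8) -> bool {
    byte <= 1
}

/// Wrong size: the buffer must hold exactly `size_of::<T>()` bytes.
fn has_size_of<T>(bytes: &[u8]) -> bool {
    bytes.len() == size_of::<T>()
}

/// Alignment: the buffer must start at an address that is a
/// multiple of `align_of::<T>()`.
fn is_aligned_for<T>(bytes: &[u8]) -> bool {
    (bytes.as_ptr() as usize) % align_of::<T>() == 0
}
```

The fourth condition, deterministic layout, cannot be checked this way; it is a property of the type definition rather than of any particular buffer.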

Transmuting from a type T to a slice of bytes can also be unsafe or cause UB:

  • Padding: Since padding bytes (i.e., bytes internally inserted to ensure all elements of a complex type have proper alignment) are not initialized, viewing them is UB. For instance, (u8, u32) has 3 bytes of padding to align the u32. Note that a type may have padding at the end, not just in the middle, to ensure that its size is a multiple of its alignment: (u32, u8) has 3 bytes of padding at the end to make its size 8, a multiple of the 4-byte alignment required for u32.
  • Non-Deterministic Layout: The same issue described for transmuting from bytes to type T applies when going in the other direction.
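
To make the padding example concrete, here is a small sketch that uses #[repr(C)] structs rather than tuples, because tuple layout is not guaranteed. The struct and function names are made up for illustration, and the numbers in the comments assume a target where u32 has 4-byte alignment:

```rust
use std::mem::size_of;

#[repr(C)]
struct PadInMiddle {
    a: u8,
    b: u32, // on typical targets, 3 padding bytes precede this field
}

#[allow(dead_code)]
#[repr(C)]
struct PadAtEnd {
    a: u32,
    b: u8, // on typical targets, 3 padding bytes follow this field
}

/// Byte offset of `PadInMiddle::b` from the start of the struct.
fn offset_of_b() -> usize {
    let value = PadInMiddle { a: 0, b: 0 };
    let base = &value as *const PadInMiddle as usize;
    let field = &value.b as *const u32 as usize;
    field - base
}

/// Total size of `PadAtEnd`, including any trailing padding.
fn size_with_trailing_padding() -> usize {
    size_of::<PadAtEnd>()
}
```

Both structs carry only 5 bytes of data, yet each occupies 8 bytes on such a target.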

Design Goals and Constraints

  • performance: users that do not care about performance can use existing solutions to copy from one type to another.

  • sound: There must be no way to use this feature in such a way that introduces soundness issues or requires the use of unsafe.

Guide-level explanation

Sometimes, it may be desirable to directly treat some type in memory as another type without performing any runtime operations. This can be useful in extremely performance-sensitive cases such as those listed above.

Because we want to cast between arbitrary types, we use their in-memory representation as an intermediary. When casting between types A and B, we can first view type A as its in-memory representation and then view those bytes as type B.

Viewing a Type as Bytes with AsBytes

In order to safely view a type's in-memory representation, the type must meet the following criteria:

  • Have a well defined layout: how the type is laid out in memory is not an implementation detail of the compiler. This is usually achieved with the #[repr(C)], #[repr(transparent)], or #[repr(packed)] annotations.
  • Be byte complete: any combination of bytes large enough to represent the type is a valid value of the type. bool can only legally be either 0b0 or 0b1 and thus does not fulfill this criterion.
  • Have no padding: all bytes of the in-memory representation of the type are data rather than padding between fields. A struct with two fields, one u16 and one u8, has padding and thus does not fulfill this criterion.

To indicate that it's safe to view the type's in-memory byte representation, the type must derive the AsBytes trait.

The compiler will verify that the above criteria are met and emit an error if they are not:

error[E0XYZ]: cannot safely implement `AsBytes`
 --> src/building.rs:1:1
  |
1 | #[derive(AsBytes)]
2 | struct Building {
  |        ^^^^^^^^
3 |   height_in_meters: u16,
4 |   number_of_elevators: u8
5 | }
  |
  = note: `Building` contains padding
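
One way such a padding check could work is to compare the size of the type with the sum of its field sizes. The sketch below is an assumption about how a derive might do this, not the mechanism this proposal specifies. `has_no_padding` and `Compact` are hypothetical names, and the check is only meaningful if every field type is itself free of padding:

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(C)]
struct Building {
    height_in_meters: u16,
    number_of_elevators: u8,
}

#[allow(dead_code)]
#[repr(C)]
struct Compact {
    height_in_meters: u16,
    number_of_floors: u16,
}

/// True when the fields account for every byte of the type.
const fn has_no_padding(total_size: usize, field_sizes: &[usize]) -> bool {
    let mut sum = 0;
    let mut i = 0;
    while i < field_sizes.len() {
        sum += field_sizes[i];
        i += 1;
    }
    sum == total_size
}

// Evaluated at compile time, as a derive-generated check could be.
const BUILDING_HAS_NO_PADDING: bool =
    has_no_padding(size_of::<Building>(), &[size_of::<u16>(), size_of::<u8>()]);
const COMPACT_HAS_NO_PADDING: bool =
    has_no_padding(size_of::<Compact>(), &[size_of::<u16>(), size_of::<u16>()]);
```

Here `Building` fails the check because its 3 bytes of fields are rounded up to 4 bytes, while `Compact` passes.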

AsBytes provides, among other things, a view of a type's in-memory representation through the as_bytes method. Let's take a look at its signature:

trait AsBytes {
  fn as_bytes(&self) -> Aligned<[u8; size_of::<Self>()], align_of::<Self>()> { /* */ }
  
  // ...
}

Notice that as_bytes returns a structure called Aligned<T, U>. This structure indicates that a pointer to type T is aligned on a multiple of U. For example, Aligned<MyStruct, 4> is a pointer to MyStruct whose address is a multiple of 4. This means the pointer returned from as_bytes is properly aligned for whatever type implements AsBytes.
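
The proposal does not define Aligned in detail. As a rough approximation on stable Rust, one could imagine a wrapper around a reference whose constructor checks the address at runtime. Everything in the sketch below (the field, `new`, and `get`) is an assumption made for illustration only:

```rust
/// Hypothetical approximation of `Aligned<T, U>`: a reference to `T`
/// whose address is known to be a multiple of `ALIGN`.
struct Aligned<'a, T, const ALIGN: usize> {
    inner: &'a T,
}

impl<'a, T, const ALIGN: usize> Aligned<'a, T, ALIGN> {
    /// Returns `None` if the reference is not aligned to `ALIGN` bytes.
    fn new(inner: &'a T) -> Option<Self> {
        let addr = inner as *const T as usize;
        if ALIGN != 0 && addr % ALIGN == 0 {
            Some(Self { inner })
        } else {
            None
        }
    }

    /// Gives back the wrapped reference.
    fn get(&self) -> &'a T {
        self.inner
    }
}
```

In the proposal the alignment would presumably be established statically, so no runtime check like the one in `new` would be needed when the value comes from as_bytes.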

Notice also that the data the Aligned pointer points to is of size size_of::<Self>(). This means that as_bytes returns a pointer to an array of bytes exactly as long as the type implementing AsBytes.
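
For comparison, here is roughly what a hand-written equivalent looks like in today's Rust. This sketch returns a plain `&[u8]` instead of the Aligned array, because array lengths such as size_of::<Self>() cannot yet be written in a trait signature on stable Rust. `AsBytesSketch` and `Pixel` are hypothetical names, and the implementor (not the compiler) is responsible for upholding the criteria:

```rust
use std::mem::size_of;
use std::slice;

/// Hypothetical stand-in for the proposed `AsBytes` trait.
///
/// Safety: implementors promise a well defined layout with no padding.
unsafe trait AsBytesSketch: Sized {
    fn as_bytes(&self) -> &[u8] {
        // SAFETY: the implementor guarantees that all `size_of::<Self>()`
        // bytes of `self` are initialized data.
        unsafe { slice::from_raw_parts(self as *const Self as *const u8, size_of::<Self>()) }
    }
}

#[repr(C)]
struct Pixel {
    r: u8,
    g: u8,
    b: u8,
    a: u8,
}

// Four `u8` fields under `#[repr(C)]` leave no room for padding.
unsafe impl AsBytesSketch for Pixel {}
```

The derive described above would remove the need for the `unsafe impl`, since the compiler would verify the criteria itself.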

Viewing Bytes as a Type with FromAnyBytes and FromBytes

We now have a way to view a type as the bytes that are used to represent it in memory. We now need a way to take some bytes and view them as a type.

In order to be able to safely view bytes as a type, the type must also have a well defined layout and be byte complete. It is not necessary for the type to not contain padding.

There are additional characteristics that the bytes themselves must have. The bytes must have an alignment that is compatible with the type and must be the same length as the number of bytes required to represent the type. This will become important below.

To indicate that bytes in memory can safely be viewed as a type, the type must derive the FromAnyBytes trait. Again, the compiler will verify that the above criteria are met and emit an error if they are not:

error[E0XYZ]: cannot safely implement `FromAnyBytes`
 --> src/building.rs:1:1
  |
1 | #[derive(FromAnyBytes)]
2 | struct Building {
3 |   height_in_meters: u16,
4 |   has_elevator: bool
  |                 ^^^^
5 | }
  |
  = note: `bool` cannot be converted from arbitrary bytes

FromAnyBytes is simply a marker trait and does not itself contain any functionality. All it guarantees is that bytes can safely be viewed as the marked type.

We now have a way to indicate that well-formed bytes can be viewed as a type, but there are many types that can only sometimes be safely viewed from bytes. These types require additional validation to ensure that the bytes meet certain criteria. We'll call types that require additional validation before they can be viewed from bytes "byte incomplete".

A good example of a "byte incomplete" type is bool which can only be safely viewed from a byte when that byte is either 0b0 or 0b1. In order to encapsulate this we introduce another trait called FromBytes which is just like FromAnyBytes except that types that are "byte incomplete" can implement it.

The most fundamental thing that FromBytes provides is a from_bytes method. Let's take a look at it:

trait FromBytes {
  type FromBytesError;

  fn from_bytes(bytes: Aligned<[u8; size_of::<Self>()], align_of::<Self>()>) -> Result<&Self, Self::FromBytesError> { /* */ }
}

First, notice that from_bytes takes an Aligned pointer to an array of bytes that is exactly the size needed to view the byte array as the type that implements FromBytes. This guarantees that the bytes we're viewing as some type are properly aligned and of the right size.

Next, notice that from_bytes returns a Result of either a reference to Self or a FromBytesError. FromBytesError represents when the bytes passed to from_bytes could not successfully be viewed as &Self.

FromBytes cannot be implemented by the user. There are blanket implementations for certain types, including all types that implement FromAnyBytes. In that case FromBytesError is !, as converting bytes to a type that implements FromAnyBytes cannot fail.

There is one additional blanket implementation which we'll look at next.

ValidateBytes

As we've seen, some types can be constructed from bytes only when those bytes pass certain validation. Those types can implement ValidateBytes:

unsafe trait ValidateBytes {
   fn validate_bytes(bytes: Aligned<[u8; size_of::<Self>()], align_of::<Self>()>) -> bool;
}

Implementors of this trait must return whether the bytes passed in are legal representations of the implementing type. Because improper implementations of this trait can lead to invalid bytes being viewed as a type, the trait is unsafe. Never fear, the trait can be derived for many types.

There is a blanket implementation of FromBytes for ValidateBytes where FromBytesError is ByteValidationError. ByteValidationError is an error that indicates that the bytes passed into from_bytes did not meet some validation criteria. For instance, passing a [0b11] to FromBytes::from_bytes for bool would return a ByteValidationError because 0b11 is not a valid representation of bool.
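
The bool case can be sketched on stable Rust as follows. This is an approximation under stated assumptions: it takes a plain `&[u8]` and checks size and alignment at runtime, whereas the proposal encodes both in the Aligned argument. The sketch also reports size and alignment failures as `ByteValidationError`, which the proposal would not need to do. `ValidateBytesSketch` is a hypothetical name:

```rust
use std::mem::{align_of, size_of};

#[derive(Debug, PartialEq)]
struct ByteValidationError;

/// Hypothetical stand-in for the proposed `ValidateBytes` trait.
///
/// Safety: `validate_bytes` must return `true` only for bytes that are
/// a legal representation of `Self`.
unsafe trait ValidateBytesSketch: Sized {
    fn validate_bytes(bytes: &[u8]) -> bool;
}

unsafe impl ValidateBytesSketch for bool {
    fn validate_bytes(bytes: &[u8]) -> bool {
        bytes.len() == 1 && bytes[0] <= 1
    }
}

/// Views `bytes` as `&T` after checking size, alignment, and validity.
fn from_bytes<T: ValidateBytesSketch>(bytes: &[u8]) -> Result<&T, ByteValidationError> {
    let right_size = bytes.len() == size_of::<T>();
    let aligned = (bytes.as_ptr() as usize) % align_of::<T>() == 0;
    if right_size && aligned && T::validate_bytes(bytes) {
        // SAFETY: size and alignment were checked above, and the
        // implementor vouches that these bytes are a valid `T`.
        Ok(unsafe { &*(bytes.as_ptr() as *const T) })
    } else {
        Err(ByteValidationError)
    }
}
```

With this sketch, `from_bytes::<bool>(&[1])` yields a reference to `true`, while `from_bytes::<bool>(&[0b11])` is rejected.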

Casting with AsBytes

We now have a way to view types as bytes and bytes as types. We can now look at the rest of the AsBytes methods, which allow for casting between types:

trait AsBytes {
  // We saw the `as_bytes` method above
  
  /// Safely cast this type in-place to another type, returning a 
  /// reference to the same memory.
  fn cast<T: FromBytes>(&self) -> Result<&T, <T as FromBytes>::FromBytesError> { /*...*/ }

  /// Safely cast this type in-place to another type, returning a 
  /// mutable reference to the same memory. This requires `Self` to 
  /// satisfy `FromAnyBytes`, because writes through the returned 
  /// mutable reference will mutate  `Self` without validation.
  fn cast_mut<T: FromBytes>(&mut self) -> Result<&mut T, <T as FromBytes>::FromBytesError>
    where Self: FromAnyBytes { /*...*/ }
      
  /// Safely cast this type in-place to another type, returning 
  /// the owned value
  fn cast_into<T: FromBytes>(self) -> Result<T, <T as FromBytes>::FromBytesError> { /*...*/ }
}

We now have the ability to cast between arbitrary types!
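
To illustrate what an in-place cast has to establish, here is a minimal sketch of a free-standing `cast` for types that need no validation. The marker traits `AsBytesSketch` and `FromAnyBytesSketch` are hypothetical stand-ins, implemented by hand here for a few primitive types. The sketch returns an Option and compares sizes and alignments itself, whereas the proposal's cast returns a Result carrying FromBytesError:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical marker: every byte of the type is initialized data.
unsafe trait AsBytesSketch {}
/// Hypothetical marker: any bit pattern is a valid value of the type.
unsafe trait FromAnyBytesSketch {}

unsafe impl AsBytesSketch for u32 {}
unsafe impl AsBytesSketch for [u8; 4] {}
unsafe impl FromAnyBytesSketch for u32 {}
unsafe impl FromAnyBytesSketch for u16 {}
unsafe impl FromAnyBytesSketch for [u8; 4] {}

/// Reinterprets `&A` as `&B` when the sizes match and `A` is at least
/// as strictly aligned as `B`.
fn cast<A: AsBytesSketch, B: FromAnyBytesSketch>(a: &A) -> Option<&B> {
    let same_size = size_of::<A>() == size_of::<B>();
    // Alignments are powers of two, so an address aligned for `A`
    // is also aligned for any `B` with a smaller or equal alignment.
    let aligned = align_of::<A>() >= align_of::<B>();
    if same_size && aligned {
        // SAFETY: `A` has no uninitialized bytes, `B` accepts any bit
        // pattern, and size and alignment were checked above.
        Some(unsafe { &*(a as *const A as *const B) })
    } else {
        None
    }
}
```

Casting `&u32` to `&[u8; 4]` succeeds here, while casting `&[u8; 4]` to `&u32` is refused because a byte array carries no alignment guarantee beyond 1.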

Reference-level explanation

ByteCompatible

It is important to be able to indicate that two types have the same in-memory representation as each other. Establishing such a relationship allows for safe transmuting between types.

To indicate that two types have such a relationship with each other, the ByteCompatible<T> trait is introduced. If a type A: ByteCompatible<B> then it is completely safe to treat the in-memory representation of A as B.

pub unsafe trait ByteCompatible<T> {}

For example, given a user defined type:

#[repr(C)]
struct Dog {
  age: u32,
}

we can implement:

unsafe impl ByteCompatible<[u8; 4]> for Dog {}

Note: A trait bound of the form T: ByteCompatible<U> is "satisfied" iff there is a type sequence [U_0, U_1, ..., U_N] with T: ByteCompatible<U_0> such that for every i in the range [0, N) the query U_{i}: ByteCompatible<U_{i+1}> is satisfied and there is an impl of ByteCompatible<U> for U_N. Notice that multiple such sequences could exist, but it suffices that one exists for the query to be satisfied.

It is also possible to encode compatibility between types that requires runtime validation through a related trait, TryByteCompatible<T>.

pub unsafe trait TryByteCompatible<T> { 
    type Error;
    fn try_compatible(&self) -> Result<(), Self::Error>; 
}

// Blanket impl of TryByteCompatible for ByteCompatible:
unsafe impl<U, T: ByteCompatible<U>> TryByteCompatible<U> for T {
    type Error = !;
    fn try_compatible(&self) -> Result<(), !> {
        Ok(())
    }
}

For example, given this definition of Dog:

#[repr(C)]
struct Dog {
  friendly: bool,
}

we can implement:

unsafe impl TryByteCompatible<Dog> for u8 {
    type Error = ();
    // Takes `&self` to match the trait definition above.
    fn try_compatible(&self) -> Result<(), ()> {
        if *self == 0 || *self == 1 {
            Ok(())
        } else {
            Err(())
        }
    }
}

Along with these trait definitions are the following standalone functions:

fn safe_transmute<T: ByteCompatible<U>, U>(a: T) -> U;
fn try_safe_transmute<T: TryByteCompatible<U>, U>(a: T) -> Result<U, <T as TryByteCompatible<U>>::Error>;
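
The bodies of these functions are not given. One plausible way to write them on stable Rust is sketched below, with local copies of the two traits so the example is self-contained. The use of `transmute_copy`, the runtime size assertion, and the `Pet` type are assumptions made for this illustration; a real implementation would presumably have the compiler guarantee equal sizes:

```rust
use std::mem::{self, size_of};

pub unsafe trait ByteCompatible<T> {}

pub unsafe trait TryByteCompatible<T> {
    type Error;
    fn try_compatible(&self) -> Result<(), Self::Error>;
}

fn safe_transmute<T: ByteCompatible<U>, U>(a: T) -> U {
    assert_eq!(size_of::<T>(), size_of::<U>());
    // SAFETY: the implementor of `ByteCompatible<U>` promises that the
    // bytes of any `T` form a valid `U`; the sizes were checked above.
    let out = unsafe { mem::transmute_copy::<T, U>(&a) };
    mem::forget(a); // ownership of the bytes now lives in `out`
    out
}

fn try_safe_transmute<T: TryByteCompatible<U>, U>(
    a: T,
) -> Result<U, <T as TryByteCompatible<U>>::Error> {
    a.try_compatible()?; // runtime validation comes first
    assert_eq!(size_of::<T>(), size_of::<U>());
    // SAFETY: validation succeeded, so these bytes form a valid `U`.
    let out = unsafe { mem::transmute_copy::<T, U>(&a) };
    mem::forget(a);
    Ok(out)
}

#[allow(dead_code)]
#[repr(C)]
struct Dog {
    age: u32,
}
unsafe impl ByteCompatible<[u8; 4]> for Dog {}

#[repr(C)]
struct Pet {
    friendly: bool,
}
unsafe impl TryByteCompatible<Pet> for u8 {
    type Error = ();
    fn try_compatible(&self) -> Result<(), ()> {
        if *self <= 1 { Ok(()) } else { Err(()) }
    }
}
```

With these definitions, a `Dog` can be turned into its four bytes unconditionally, while turning a `u8` into a `Pet` succeeds only for 0 and 1.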

AsBytes and FromBytes

AsBytes and FromBytes are simply implemented in terms of ByteCompatible.

impl <T: ByteCompatible<[u8; size_of::<T>()]>> AsBytes for T {}
impl <T> FromBytes for T where [u8; size_of::<T>()]: ByteCompatible<T> {}

Drawbacks

Rationale and alternatives

Prior art

This design is based on many existing solutions that are available as crates plus previous or concurrent RFCs that try to solve similar issues.

Because it requires additional compiler support, no other official proposal has attempted or suggested something similar to ByteCompatible<T>. As far as we're aware, this was first suggested in the compatible trait proposal, which was never officially posted and from which this proposal takes great inspiration. Most proposals either suggest an "MxN" strategy where every type is required to implement a trait for every other type it is compatible with, or they suggest an API that strictly goes through a bytes intermediary. The "MxN" strategy causes a ballooning of trait implementations known to the compiler, which makes that strategy untenable in our eyes. The strict bytes intermediary proposal does not allow for casting, completely free of runtime checks, between all types that can be statically proven to be equivalent in structure.

Also, because const generics are not yet stable, very few proposals have yet suggested using them to enforce size and alignment constraints.

Here is a list of crates and RFC proposals that have influenced this work:

  • zerocopy has a fairly similar public API for the equivalent types to FromAnyBytes and AsBytes. The crate does not have equivalents to FromBytes, ValidateBytes, and ByteCompatible<T>. This is because that crate focuses specifically on parsing from byte buffers, so arbitrary casting between types is not needed. Additionally, the crate does not support the idea of types that need some sort of validation before casting.
  • bytemuck also has a fairly similar API to this proposal and does allow for casting between types, but it aims to keep the API simple by reducing some of its flexibility. This means there is no distinction between the equivalents of AsBytes and From{Any,}Bytes; the crate instead elects to have only one trait for both. This makes some otherwise safe casts impossible to represent. For example, it is not possible to cast to types that have padding even though it's safe to do so. This is because those types are not safe to cast from (because reading padding is UB), and since there is no distinction between types that can be cast from and types that can be cast to, this operation is not allowed. The crate also does not attempt to handle cases where validation is required.
  • FromBits/IntoBits RFC allowed for casting between types but required an NxM relationship between types. For every type, there would be an unbounded number of trait implementations that specified which types were "byte compatible" with that type. In addition, that proposal does not handle types that need to be validated before being cast to.
  • FromBits RFC TODO

Unresolved questions

The following questions are unresolved: *

Future possibilities

Here is a list of future possibilities based on this RFC:

  • Since it is possible to know if bytes are always safe to read based on