- Feature Name: safe-transmute
- Start Date: YYYY-MM-DD
- RFC PR: rust-lang/rfcs#0000
- Rust Issue: rust-lang/rust#0000
This proposal introduces several traits that allow transmuting between two types without the use of unsafe: ByteCompatible<T>, which indicates that one type's in-memory representation is the same as another type's; FromBytes and AsBytes, which indicate types that can be viewed from bytes and as bytes; and methods for casting between types in a way that may fail.
Transmuting one type to another in Rust is extremely dangerous, so much so that the docs for std::mem::transmute are essentially a long list of ways to avoid doing so. However, transmuting is sometimes necessary. For instance, in extremely performance-sensitive use cases, it may be necessary to transmute from bytes instead of explicitly deserializing and copying bytes from a buffer into a struct.
Some concrete uses of this feature might be:
- Viewing a chunk of bytes in a structured way without copying. For example, doing one large memcopy of a buffer from the network and then viewing that buffer in a structured way without needing to do further copying of bytes.
- Packing and unpacking data, such as __m128 to [f32; 4].
- FFI calls where a *const void and a length parameter could be interpreted as &[T].
At the core of understanding the safety properties of transmutation is understanding Rust's layout properties (i.e., how Rust represents types in memory). The best resource I've found for understanding this is Alexis Beingessner's blog post on the matter.
The following are the reasons that transmutation from a buffer of bytes is generally unsafe:
- Illegal Representations: Safe transmutation of a slice of bytes to a type T is only possible if every possible value of those bytes corresponds to a valid value of type T. For example, this property doesn't hold for bool or for most enum types. While size_of::<bool>() == 1, a bool can only legally be either 0b1 or 0b0; transmuting 0b10 to bool is UB.
- Wrong Size: A buffer of bytes might not contain the correct number of bytes to encode a given type. Referring to uninitialized fields of a struct is UB. Of course, this assumes that the size of a given type is known ahead of time, which is not always the case.
- Alignment: Types must be "well-aligned", meaning that their address in memory must be a multiple of the type's alignment (a power of 2). For example, the alignment of u32 is typically 4, meaning that a valid u32 must always start at a memory address evenly divisible by 4. Transmuting a slice of bytes that is not properly aligned for type T to a T is UB.
- Non-Deterministic Layout: Certain types might not have a deterministic layout in memory. The Rust compiler is allowed to rearrange the layout of any type that does not have a well defined layout associated with it. Explicitly setting the layout of a type is done through #[repr(..)]. To be deterministic, both the order of the fields of a complex type and the exact values of their offsets from the beginning of the type must be well known. This is generally only possible by marking a complex type #[repr(C)] and recursively ensuring that all fields of the struct are composed of types with deterministic layout.
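The first three hazards can each be turned into a runtime check. The following is a minimal sketch using only std; the helper names (checked_bool, has_size_of, is_aligned_for) are hypothetical and are not part of this proposal, which aims to move such checks into the type system where possible.

```rust
use std::mem::{align_of, size_of};

/// Illegal representations: only the bytes 0 and 1 are valid `bool`s,
/// so a checked conversion has to reject every other value.
fn checked_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}

/// Wrong size: the buffer must hold exactly `size_of::<T>()` bytes.
fn has_size_of<T>(bytes: &[u8]) -> bool {
    bytes.len() == size_of::<T>()
}

/// Alignment: the buffer's start address must be a multiple of
/// `align_of::<T>()`.
fn is_aligned_for<T>(bytes: &[u8]) -> bool {
    (bytes.as_ptr() as usize) % align_of::<T>() == 0
}
```

Non-deterministic layout has no comparable runtime check, which is one reason the proposal relies on the compiler to verify #[repr(..)] annotations.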
Transmuting from a type T to a slice of bytes can also be unsafe or cause UB:
- Padding: Since padding bytes (i.e., bytes internally inserted to ensure all elements of a complex type have proper alignment) are not initialized, viewing them is UB. For instance, (u8, u32) has 3 bytes of padding to align the u32. Note that a type may have padding at the end, not just in the middle, to ensure that its size is a multiple of its alignment: (u32, u8) has 3 bytes of padding at the end to make its size 8, a multiple of the 4-byte alignment required for u32.
- Non-Deterministic Layout: The same issue described for transmuting from bytes to type T applies when going the other direction.
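The padding figures above can be checked with size_of. This is a small sketch under two assumptions: it uses #[repr(C)] structs instead of tuples (tuple layout is not specified), and it assumes a target where u32 has 4-byte alignment. The struct names and the padding_bytes helper are hypothetical.

```rust
use std::mem::size_of;

// `#[repr(C)]` fixes the field order so the padding is predictable.
#[allow(dead_code)]
#[repr(C)]
struct ByteThenWord {
    a: u8,  // 1 byte, followed by 3 bytes of padding to align `b`
    b: u32, // 4 bytes
}

#[allow(dead_code)]
#[repr(C)]
struct WordThenByte {
    a: u32, // 4 bytes
    b: u8,  // 1 byte, followed by 3 bytes of trailing padding
}

/// Padding bytes = total size minus the sum of the field sizes.
fn padding_bytes<T>(field_sizes: &[usize]) -> usize {
    size_of::<T>() - field_sizes.iter().sum::<usize>()
}
```

Both structs occupy 8 bytes even though their fields only account for 5, so a byte view of either one would expose 3 uninitialized bytes.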
- performance: users that do not care about performance can use existing solutions to copy from one type to another.
- sound: There must be no way to use this feature in such a way that introduces soundness issues or requires the use of unsafe.
Sometimes, it may be desirable to directly treat some type in memory as another type without performing any runtime operations. This can be useful in the extremely performance-sensitive cases listed above.
Because we want to cast between arbitrary types, we use their in-memory representation as an intermediary. When casting between types A and B, we can first view type A as its in-memory representation and then view those bytes as type B.
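The standard library already offers a copying version of this idea for primitive numbers. The sketch below goes from f32 to u32 through a byte array using the existing to_ne_bytes and from_ne_bytes methods; the function name is hypothetical. Unlike the casts proposed here, it copies the value instead of viewing it in place.

```rust
/// Reinterpret an `f32` as a `u32` by going through its native-endian bytes.
fn f32_bits_via_bytes(x: f32) -> u32 {
    // Step 1: view type A (`f32`) as its in-memory representation.
    let bytes: [u8; 4] = x.to_ne_bytes();
    // Step 2: view those bytes as type B (`u32`).
    u32::from_ne_bytes(bytes)
}
```

For example, f32_bits_via_bytes(1.0) yields 0x3f80_0000, the same value as 1.0f32.to_bits().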
In order to be able to safely view a type's in-memory representation, the type must meet the following criteria:
- Have a well defined layout: how the type is laid out in memory is not an implementation detail of the compiler. This is usually achieved with the #[repr(C)], #[repr(transparent)], or #[repr(packed)] annotations.
- Be byte complete: the type can be validly represented in memory with any combination of bytes large enough to represent the type. bool can only legally be either 0b0 or 0b1 and thus does not fulfill this criterion.
- Have no padding: all bytes of the in-memory representation of the type are data rather than padding between fields. A struct with two fields, one u16 and one u8, has padding and thus does not fulfill this criterion.
To indicate that it's safe to view the type's in-memory byte representation, the type must derive the AsBytes trait.
The compiler will verify that the above criteria are met and return an error if they are not:
error[E0XYZ]: cannot safely implement `AsBytes`
 --> src/building.rs:1:1
  |
1 | #[derive(AsBytes)]
2 | struct Building {
  |        ^^^^^^^^
3 |     height_in_meters: u16,
4 |     number_of_elevators: u8
5 | }
  |
  = note: `Building` contains padding
AsBytes provides, among other things, a view of the type's in-memory representation through the as_bytes method. Let's take a look at its signature:
trait AsBytes {
    fn as_bytes(&self) -> Aligned<[u8; size_of::<Self>()], align_of::<Self>()> { /* */ }
    // ...
}
Notice that as_bytes returns a structure called Aligned<T, U>. This structure indicates that a pointer pointing to type T is aligned on a multiple of U. For example, Aligned<MyStruct, 4> is a pointer to MyStruct that has an alignment that is a multiple of 4. This means the pointer returned from as_bytes is properly aligned for whatever type implements AsBytes.
Notice also that the data that the Aligned pointer is pointing to is of size size_of::<Self>(). This means that as_bytes returns a pointer to an array of bytes exactly as long as the type implementing AsBytes.
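The signature above depends on const generic expressions that are not available on stable Rust today. As a rough approximation only, the sketch below uses a hypothetical AsBytesSketch trait that returns a plain &[u8], so the length is known at runtime and the alignment information is dropped. The trait is unsafe to implement because, without the proposed derive, nothing verifies the layout criteria.

```rust
use std::mem::size_of;
use std::slice;

/// Hypothetical stand-in for the proposed `AsBytes`. Implementors promise a
/// well defined layout with no padding, which is why the trait is `unsafe`.
unsafe trait AsBytesSketch: Sized {
    fn as_bytes(&self) -> &[u8] {
        // SAFETY: the implementor guarantees that every byte of `Self` is
        // initialized; `u8` has alignment 1, and the returned slice borrows
        // `self`, so it cannot outlive the value.
        unsafe { slice::from_raw_parts(self as *const Self as *const u8, size_of::<Self>()) }
    }
}

unsafe impl AsBytesSketch for u32 {}

/// Two `u16` fields under `#[repr(C)]` leave no room for padding.
#[repr(C)]
struct Pair {
    x: u16,
    y: u16,
}

unsafe impl AsBytesSketch for Pair {}
```

Calling as_bytes on a u32 gives the same four bytes as to_ne_bytes, but as a borrowed view of the original value instead of a copy.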
We now have a way to view a type as the bytes that are used to represent it in memory. We now need a way to take some bytes and view them as a type.
In order to be able to safely view bytes as a type, the type must also have a well defined layout and be byte complete. Unlike with AsBytes, the type is allowed to contain padding.
There are additional characteristics that the bytes themselves must have. The bytes must have an alignment that is compatible with the type and must be the same length as the number of bytes required to represent the type. This will become important below.
To indicate that bytes in memory can safely be viewed as a type, the type must derive the FromAnyBytes trait. Again, the compiler will verify that the above criteria are met and return an error if they are not:
error[E0XYZ]: cannot safely implement `FromAnyBytes`
 --> src/building.rs:1:1
  |
1 | #[derive(FromAnyBytes)]
2 | struct Building {
3 |     height_in_meters: u16,
4 |     has_elevator: bool
  |                   ^^^^
5 | }
  |
  = note: `bool` cannot be converted from arbitrary bytes
FromAnyBytes is simply a marker trait and does not itself contain any functionality. All it guarantees is that bytes can safely be viewed as the marked type.
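To make the two requirements on the bytes concrete, here is a sketch that checks length and alignment at runtime before viewing a slice as a type. FromAnyBytesSketch and view_as are hypothetical names; the proposal itself encodes both requirements in the Aligned type instead of checking them dynamically.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical stand-in for `FromAnyBytes`: implementors promise that every
/// bit pattern of the right size is a valid value.
unsafe trait FromAnyBytesSketch: Sized {}

unsafe impl FromAnyBytesSketch for u32 {}
unsafe impl FromAnyBytesSketch for [u8; 4] {}

/// View `bytes` as a `T` if the slice has the right length and alignment.
fn view_as<T: FromAnyBytesSketch>(bytes: &[u8]) -> Option<&T> {
    let right_size = bytes.len() == size_of::<T>();
    let aligned = (bytes.as_ptr() as usize) % align_of::<T>() == 0;
    if right_size && aligned {
        // SAFETY: size and alignment were just checked, and `T` accepts any
        // bit pattern by the contract of `FromAnyBytesSketch`.
        Some(unsafe { &*(bytes.as_ptr() as *const T) })
    } else {
        None
    }
}
```

A [u8; 4] has alignment 1, so only the length check can fail for it; for u32 a slice starting at an odd address is rejected on targets where u32 is 4-byte aligned.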
We now have a way to indicate that well formed bytes can be viewed as a type, but there are many types that can only sometimes be safely viewed from bytes. These types require additional validation to ensure that the bytes meet certain criteria. We'll call types that require additional validation before they can be viewed from bytes "byte incomplete".
A good example of a "byte incomplete" type is bool, which can only be safely viewed from a byte when that byte is either 0b0 or 0b1. In order to encapsulate this we introduce another trait called FromBytes, which is just like FromAnyBytes except that types that are "byte incomplete" can implement it.
The most fundamental thing that FromBytes provides is a from_bytes method. Let's take a look at it:
trait FromBytes {
    type FromBytesError;
    fn from_bytes(bytes: Aligned<[u8; size_of::<Self>()], align_of::<Self>()>) -> Result<&Self, Self::FromBytesError> { /* */ }
}
First, notice that from_bytes takes an Aligned pointer to an array of bytes that is the exact size needed to convert the byte array to the type that implements FromBytes. This guarantees that the bytes we're viewing as some type are properly aligned and of the right size.
Next, notice that from_bytes returns a Result of either a reference to Self or a FromBytesError. FromBytesError represents the case where the bytes passed to from_bytes could not successfully be viewed as &Self.
FromBytes cannot be implemented by the user. There are blanket implementations for certain types, including all types that implement FromAnyBytes. In that case FromBytesError is ! (the never type), as converting bytes to a type that implements FromAnyBytes cannot fail.
There is one additional blanket implementation which we'll look at next.
As we've seen, there are types that can only be constructed from bytes when those bytes pass some validation. Those types can implement ValidateBytes:
unsafe trait ValidateBytes {
    fn validate_bytes(bytes: Aligned<[u8; size_of::<Self>()], align_of::<Self>()>) -> bool;
}
Implementors of this trait must return whether the bytes passed in are legal representations of the implementing type. Because improper implementations of this trait can lead to invalid bytes being viewed as a type, the trait is unsafe. Never fear, the trait can be derived for many types.
There is a blanket implementation of FromBytes for ValidateBytes where FromBytesError is ByteValidationError. ByteValidationError is an error that indicates that the bytes passed into from_bytes did not meet some validation criteria. For instance, passing a [0b11] to FromBytes::from_bytes for bool would return a ByteValidationError because 0b11 is not a valid representation of bool.
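The bool case can be sketched on stable Rust as follows. ValidateBytesSketch and the free function from_bytes are hypothetical stand-ins that take a plain &[u8] and check size and alignment at runtime; ByteValidationError is modeled here as a unit struct, which is an assumption, since the proposal does not spell out its definition.

```rust
use std::mem::{align_of, size_of};

/// Assumed shape of the error; the proposal does not define it in detail.
#[derive(Debug, PartialEq)]
struct ByteValidationError;

/// Hypothetical stand-in for `ValidateBytes`. It is `unsafe` because a wrong
/// implementation would let invalid bytes be viewed as the type.
unsafe trait ValidateBytesSketch: Sized {
    fn validate_bytes(bytes: &[u8]) -> bool;
}

unsafe impl ValidateBytesSketch for bool {
    fn validate_bytes(bytes: &[u8]) -> bool {
        // Exactly one byte, and that byte must be 0 or 1.
        matches!(bytes, [0] | [1])
    }
}

/// Sketch of what the blanket `FromBytes` impl could do for validated types.
fn from_bytes<T: ValidateBytesSketch>(bytes: &[u8]) -> Result<&T, ByteValidationError> {
    let right_size = bytes.len() == size_of::<T>();
    let aligned = (bytes.as_ptr() as usize) % align_of::<T>() == 0;
    if right_size && aligned && T::validate_bytes(bytes) {
        // SAFETY: size and alignment were checked, and the implementor of
        // `ValidateBytesSketch` confirmed that the bytes are a valid `T`.
        Ok(unsafe { &*(bytes.as_ptr() as *const T) })
    } else {
        Err(ByteValidationError)
    }
}
```

With this sketch, from_bytes::<bool>(&[1]) succeeds with a reference to true, while from_bytes::<bool>(&[0b11]) returns the error.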
We now have a way to view types as bytes and bytes as types. We can now look at the rest of the AsBytes methods, which allow for casting between types:
trait AsBytes {
    // We saw the `as_bytes` method above

    /// Safely cast this type in-place to another type, returning a
    /// reference to the same memory.
    fn cast<T: FromBytes>(&self) -> Result<&T, <T as FromBytes>::FromBytesError> { /*...*/ }

    /// Safely cast this type in-place to another type, returning a
    /// mutable reference to the same memory. This requires `Self` to
    /// satisfy `FromAnyBytes`, because writes through the returned
    /// mutable reference will mutate `Self` without validation.
    fn cast_mut<T: FromBytes>(&mut self) -> Result<&mut T, <T as FromBytes>::FromBytesError>
    where Self: FromAnyBytes { /*...*/ }

    /// Safely cast this type in-place to another type, returning
    /// the owned value
    fn cast_into<T: FromBytes>(self) -> Result<T, <T as FromBytes>::FromBytesError> { /*...*/ }
}
We now have the ability to cast between arbitrary types!
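As an illustration of the in-place reference cast, here is a simplified sketch for types that need no validation. It collapses the AsBytes and FromAnyBytes requirements into a single hypothetical PlainBytes marker and checks size and alignment at runtime, returning Option where the proposed cast would return a Result with the target's FromBytesError.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical marker: well defined layout, no padding, and every bit
/// pattern valid. Stands in for `AsBytes + FromAnyBytes` in this sketch.
unsafe trait PlainBytes: Sized {}

unsafe impl PlainBytes for f32 {}
unsafe impl PlainBytes for u32 {}
unsafe impl PlainBytes for u64 {}
unsafe impl PlainBytes for [u8; 4] {}

/// View `a` in place as a `B`, returning a reference to the same memory.
fn cast<A: PlainBytes, B: PlainBytes>(a: &A) -> Option<&B> {
    let ptr = a as *const A;
    let same_size = size_of::<A>() == size_of::<B>();
    let aligned = (ptr as usize) % align_of::<B>() == 0;
    if same_size && aligned {
        // SAFETY: `B` is the same size as `A`, the address satisfies `B`'s
        // alignment, and both types accept any bit pattern and contain no
        // padding by the contract of `PlainBytes`.
        Some(unsafe { &*(ptr as *const B) })
    } else {
        None
    }
}
```

No bytes are copied: the returned reference points at the original value, so cast::<f32, u32>(&1.0) observes the bit pattern 0x3f80_0000 directly.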
It is important to be able to indicate that two types have the same in-memory representation as each other. Establishing such a relationship allows for safe transmuting between types.
To indicate that two types have such a relationship with each other, the ByteCompatible<T> trait is introduced. If a type A: ByteCompatible<B> then it is completely safe to treat the in-memory representation of A as B.
pub unsafe trait ByteCompatible<T> {}
For example, given a user defined type:
#[repr(C)]
struct Dog {
    age: u32,
}
we can implement:
unsafe impl ByteCompatible<[u8; 4]> for Dog {}
Note: A trait bound of the form T: ByteCompatible<U> is "satisfied" iff, given any T: ByteCompatible<U_0>, there is a type sequence [U_0, U_1, ..., U_N] such that for i in range [1, N) the query U_{i}: ByteCompatible<U_{i+1}> is satisfied and there is an impl of ByteCompatible<U> for U_N. Notice that multiple such sequences could exist, but it suffices that one exists for the query to be satisfied.
It is also possible to encode compatibility between types that requires runtime validation through a related trait, TryByteCompatible<T>.
pub unsafe trait TryByteCompatible<T> {
    type Error;
    fn try_compatible(&self) -> Result<(), Self::Error>;
}

// Blanket impl of TryByteCompatible for ByteCompatible:
unsafe impl<U, T: ByteCompatible<U>> TryByteCompatible<U> for T {
    type Error = !;
    fn try_compatible(&self) -> Result<(), !> {
        Ok(())
    }
}
For example, given this definition of Dog:
#[repr(C)]
struct Dog {
    friendly: bool,
}
we can implement:
unsafe impl TryByteCompatible<Dog> for u8 {
    type Error = ();
    // Takes `&self` to match the trait definition.
    fn try_compatible(&self) -> Result<(), ()> {
        if *self == 0 || *self == 1 {
            Ok(())
        } else {
            Err(())
        }
    }
}
Along with these trait definitions are the following standalone functions:
fn safe_transmute<T: ByteCompatible<U>, U>(a: T) -> U;
fn try_safe_transmute<T: TryByteCompatible<U>, U>(a: T) -> Result<U, <T as TryByteCompatible<U>>::Error>;
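One possible way to implement these two functions on stable Rust is sketched below, under stated assumptions: the traits are redeclared locally with a Sized supertrait, the blanket impl is omitted (it needs the unstable ! type), size equality is checked with a runtime assertion where the real feature would check it at compile time, and the bool-carrying struct is renamed to a hypothetical FriendlyDog so both Dog examples can coexist.

```rust
use std::mem::{size_of, transmute_copy, ManuallyDrop};

pub unsafe trait ByteCompatible<U>: Sized {}

pub unsafe trait TryByteCompatible<U>: Sized {
    type Error;
    fn try_compatible(&self) -> Result<(), Self::Error>;
}

pub fn safe_transmute<T: ByteCompatible<U>, U>(a: T) -> U {
    assert_eq!(size_of::<T>(), size_of::<U>());
    // Ownership moves into the returned `U`, so `a` must not be dropped.
    let a = ManuallyDrop::new(a);
    // SAFETY: the `ByteCompatible` impl promises that the bytes of `T` are a
    // valid `U`; sizes match, and `transmute_copy` tolerates any alignment.
    unsafe { transmute_copy::<T, U>(&*a) }
}

pub fn try_safe_transmute<T: TryByteCompatible<U>, U>(
    a: T,
) -> Result<U, <T as TryByteCompatible<U>>::Error> {
    // Run the runtime validation first; bail out with the impl's error.
    a.try_compatible()?;
    assert_eq!(size_of::<T>(), size_of::<U>());
    let a = ManuallyDrop::new(a);
    // SAFETY: as above, and `try_compatible` accepted this particular value.
    Ok(unsafe { transmute_copy::<T, U>(&*a) })
}

#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct Dog {
    pub age: u32,
}
unsafe impl ByteCompatible<[u8; 4]> for Dog {}

/// Hypothetical name for the second `Dog` example, which holds a `bool`.
#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct FriendlyDog {
    pub friendly: bool,
}
unsafe impl TryByteCompatible<FriendlyDog> for u8 {
    type Error = ();
    fn try_compatible(&self) -> Result<(), ()> {
        if *self <= 1 { Ok(()) } else { Err(()) }
    }
}
```

With these definitions, transmuting a Dog yields the native-endian bytes of its age, and try_safe_transmute::<u8, FriendlyDog> accepts 0 and 1 but rejects every other byte.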
AsBytes and FromBytes are simply implemented in terms of ByteCompatible.
impl<T: ByteCompatible<[u8; size_of::<T>()]>> AsBytes for T {}
impl<T> FromBytes for T where [u8; size_of::<T>()]: ByteCompatible<T> {}
This design is based on many existing solutions that are available as crates plus previous or concurrent RFCs that try to solve similar issues.
Because it requires additional compiler support, no other official proposal has attempted or suggested something similar to ByteCompatible<T>. As far as we're aware, this was first suggested in the compatible trait proposal, which was never officially posted and from which this proposal takes great inspiration. Most proposals either suggest an "MxN" strategy, where every type is required to implement a trait for every other type it is compatible with, or they suggest an API that strictly goes through a bytes intermediary. The "MxN" strategy causes a ballooning of trait implementations known to the compiler, which makes that strategy untenable in our eyes. The strict bytes intermediary proposal does not allow for casting that is completely free of runtime checks between all types that can be statically proven to be equivalent in structure.
Also, because const generics are not yet stable, very few proposals have yet suggested using them to enforce size and alignment constraints.
Here is a list of crates and RFC proposals that have influenced this work:
- zerocopy has a fairly similar public API for the equivalent types to FromAnyBytes and AsBytes. The crate does not have equivalents to FromBytes, ValidateBytes, and ByteCompatible<T>. This is because that crate specifically focuses on parsing from byte buffers, where arbitrary casting between types is not needed. Additionally, the crate does not support the idea of types that need some sort of validation before casting.
- bytemuck also has a fairly similar API to this proposal and does allow for casting between types, but aims to keep the API simple by reducing some of its flexibility. This means there is no distinction between the equivalents of AsBytes and From{Any,}Bytes; instead it elects to have only one trait for both. This makes some otherwise safe casts impossible to represent. For example, it is not possible to cast to types that have padding even though it's safe to do so. This is because those types are not safe to cast from (because reading padding is UB), and since there is no distinction between types that can be cast from and types that can be cast to, this operation is not allowed. The crate also does not attempt to handle cases where validation is required.
- The FromBits/IntoBits RFC allowed for casting between bytes but required an "MxN" relationship between types. For every type, there would be an unbounded number of trait implementations that specified which types were "byte compatible" with that type. In addition, that proposal does not handle types that need to be validated before being cast to.
- FromBits RFC TODO
The following questions are unresolved: *
Here is a list of future possibilities based on this RFC:
- Since it is possible to know if bytes are always safe to read based on