Skip to content

Instantly share code, notes, and snippets.

@nuew
Created August 8, 2019 05:14
Show Gist options
  • Save nuew/f0a16d56ed8481f51c34285b2c0420db to your computer and use it in GitHub Desktop.
Save nuew/f0a16d56ed8481f51c34285b2c0420db to your computer and use it in GitHub Desktop.
`bytes` partial audit

Audit of bytes (b5d4f87)

I audited the master branch of bytes, currently at commit b5d4f87.

There were a total of sixty-eight unsafe blocks, functions, and impls in this version of bytes.

There are several instances of undefined behaviour in this crate; several have already been publicly discussed, here and here. Several functions are "newly discovered" to be questionably legal or likely illegal, including Bytes::to_mut, Inner::inline_ptr, and [Inner::inline_len]1.

Under a strict definition of undefined behaviour requiring uninitialized memory to be accessible only through untagged unions or raw pointers, undefined behaivour is pervasive throughout src/bytes.rs. Unfortunately, it is currently unclear if this is actually undefined behaviour, so I'm not comfortable making any recommendations. As the bytes authors are already breaking backwards compatibility from the most recently published version in the audited version, it may wish to modify its API to ensure that future clarifications of the scope of UB leave bytes entirely defined.

The files src/debug.rs, src/lib.rs, src/serde.rs, src/buf/from_buf.rs, src/buf/into_buf.rs, src/buf/iter.rs, src/buf/mod.rs, src/buf/reader.rs, src/buf/take.rs, src/buf/vec_deque.rs, and src/buf/writer.rs had no unsafe blocks, functions, or impls and were out of this audit's scope and therefore were not evaluated. Tests (tests/) and benchmarks (benches/) were also out of this audit's scope and therefore were not evaluated. Examples using unsafe were checked when the function that they documented was unsafe or contained unsafe blocks.

Disclaimer

I am not a professional security expert, and I do not have a particular knowledge of the nuances of proposed Rust memory models or esoterically undefined behaviours. This is the first formal audit of a library I have conducted.

Use these findings at your own risk.

Detailed Analysis

src/bytes.rs

src/bytes.rs contains thirty unsafe blocks, twelve unsafe functions, and two unsafe impls.

Bytes::slice

Bytes::slice is a function on Bytes which returns a slice from a provided range. It contains one unsafe block calling two unsafe functions.

The unsafe block makes two function calls, the first to [Bytes::set_end], and the second to [Bytes::set_start]. In this context, these functions requires that all "revealed" space in the buffer is initialized. As slice can only return an identical or smaller view of the buffer, this is trivially true, assuming that the existing buffer is initialized.

Bytes::to_mut

Bytes::to_mut is a function on Bytes which returns a mutable reference to the Bytes. It contains one unsafe block dereferencing one raw pointer.

After checking if the buffer is exclusively owned, and cloning it if not, an unsafe block takes the self reference, casts it to a raw mutable pointer to a Bytes, and casts that to a raw mutable pointer to a [BytesMut]. While both structs have only one member, which is repr(C), the structs themselves are not. While I can't imagine a scenario in which this breaks in practice, this is still likely illegal2, and both Bytes and [BytesMut] ought to be repr(C).

<Bytes as Buf>::advance

<Bytes as Buf>::advance is an unsafe function implementing Buf::advance. It contains one unsafe block calling one unsafe function.

The function checks that there is sufficient initialized space remaining in the buffer to advance by the requested number of bytes, and then uses [Bytes::set_start] to do so.

<Bytes as Clone>::clone

<Bytes as Clone>::clone is a function implementing Clone::clone. It contains one unsafe block calling one unsafe function.

The entire function simply constructs a new Bytes using an Inner::shallow_clone of the passed Bytes. As the Bytes is immutable, this is safe.

BytesMut::set_len

BytesMut::set_len is an unsafe function on [BytesMut] that sets the length of the initialized portion of the [BytesMut]s buffer. It is documented as requiring the caller to ensure that any accessible data is initialized. Ignoring that the backing store is a slice, the storing of uninitialized data within which is questionably legal, it is sound if used properly.

The implementation simply redirects to Inner::set_len.

<BytesMut as Buf>::advance

<BytesMut as Buf>::advance is a function implementing Buf::advance. It contains one unsafe block calling one unsafe function.

This function is exactly identical to <Bytes as Buf>::advance, and shares its soundness.

<BytesMut as BufMut>::advance_mut

<BytesMut as BufMut>::advance_mut is an unsafe function implementing BufMut::advance_mut. It calls one unsafe function.

The function calls Inner::set_len, and is sound if used properly, given standard concerns about accessible uninitialized memory.

<BytesMut as BufMut>::bytes_mut

<BytesMut as BufMut>::bytes_mut is an unsafe function implementing BufMut::bytes_mut. It calls one unsafe function.

The function calls Inner::as_raw, slices the returned slice to exclusively after the cursor, and returns it. It is sound if used properly, given standard concerns about accessible uninitialized memory.

<BytesMut as BufMut>::put_slice

<BytesMut as BufMut>::bytes_mut is a function implementing BufMut::put_slice. It contains one unsafe block calling two unsafe functions.

It calls two unsafe functions: <BytesMut as BufMut>::bytes_mut and <BytesMut as BufMut>::advance_mut. These are used safely, given standard concerns about accessible uninitialized memory.

<BytesMut as From<&'_ u8>>::from

[<BytesMut as From<&'_ u8>>::from] is a function implementing From<_>::from. It contains one unsafe block calling two unsafe functions.

UNAUDITED

<BytesMut as Extend<u8>>::extend

[<BytesMut as Extend<u8>>::from] is a function implementing Extend<_>::extend. It contains one unsafe block calling two unsafe functions.

UNAUDITED

Inner::with_capacity

Inner::with_capacity is a function on [Inner] which constructs an [Inner] with a specified capacity. It contains one unsafe block calling one unsafe function.

UNAUDITED

Inner::as_ref

Inner::as_ref is a function on [Inner] which returns a slice presenting a view of the shared buffer. It contains one unsafe block calling two unsafe functions.

UNAUDITED

Inner::as_mut

Inner::as_mut is a function on [Inner] which returns a mutable slice presenting a view of the shared buffer. It contains one unsafe block calling two unsafe functions.

The function itself should probably be unsafe, as it assumes that the buffer is not static. While this is guarded by a debug assertion, it would be possible to cause UB in safe release code. However, as this is not a public function, this is only a stylistic concern, and is not properly unsound.

Inner::as_raw

Inner::as_raw is an unsafe function on [Inner] which returns a mutable slice presenting a view of the shared buffer, including potentially uninitialized data. It contains one unsafe block calling two unsafe functions.

UNAUDITED

Inner::put_u8

Inner::put_u8 is a function on [Inner] which sets the next byte in the buffer to the provided byte, and advances the cursor by one. It contains two unsafe blocks, each calling one unsafe function and dereferencing one raw pointer.

UNAUDITED

Inner::inline_ptr

Inner::inline_ptr is an unsafe function on [Inner] which produces a mutable raw pointer to the Inner's inline storage. It calls one call to an unsafe function. It also contains a cast from a shared reference to a mutable raw pointer.

UNAUDITED

note possible ub cast shared reference to mutable pointer

Inner::inline_len

Inner::inline_len is a function on [Inner] which returns the initialized length of the inline buffer. It contains one unsafe block calling one unsafe function.

This exhibits the same synchronization issues as Inner::kind.

Inner::set_len

Inner::set_len is an unsafe function on [Inner] which sets the initialized length of the internal buffer. It contains no unsafe operations, but may be used to expose uninitialized data to safe code.

UNAUDITED

Inner::split_off

Inner::split_off is a function on [Inner] which splits the internal buffer at the provided position. It contains two unsafe blocks, the first calling one unsafe function and the second calling two.

UNAUDITED

Inner::split_to

Inner::split_off is a function on [Inner] which splits the internal buffer at the provided position. It contains two unsafe blocks, the first calling one unsafe function and the second calling two.

UNAUDITED

Inner::truncate

Inner::truncate is a function on [Inner] which may shorten the length of the initialized buffer. It contains one unsafe block, calling one unsafe function.

UNAUDITED

Inner::try_unsplit

Inner::try_unsplit is a function on [Inner] which coalesces two previously split adjacent buffers with a reference counted backing. It contains one unsafe block, calling one unsafe function.

UNAUDITED

Inner::resize

Inner::resize is a function on [Inner] which resizes the initialized area of the buffer. It contains one unsafe block, which calls two unsafe functions.

UNAUDITED

Inner::set_start

Inner::set_start is an unsafe function on [Inner] which shifts data off the front of the buffer. It calls six unsafe functions.

UNAUDITED

Inner::set_end

Inner::set_end is an unsafe function on [Inner] which shifts data off the back of the buffer. It calls one unsafe function.

UNAUDITED

Inner::is_mut_safe

Inner::set_end is a function on [Inner] which checks if it is safe to mutate the backing buffer. It contains one unsafe block, which dereferences one raw pointer.

UNAUDITED

Inner::shallow_clone

Inner::shallow_clone is an unsafe function on [Inner] which either increases the reference count on the shared buffer, if an allocated buffer is in use, or copies the internal buffer. It calls three unsafe functions.

UNAUDITED

Inner::shallow_clone_sync

Inner::shallow_clone_sync is an unsafe function on [Inner] which increases the reference count on the shared buffer, converting from an exclusively allocated buffer if necessary. It calls two unsafe functions.

UNAUDITED

Inner::shallow_clone_arc

Inner::shallow_clone_arc is an unsafe function on [Inner] which increases the reference count on the provided shared buffer. It dereferences one raw pointer.

UNAUDITED

Inner::shallow_clone_vec

Inner::shallow_clone_vec is an unsafe function on [Inner] which converts a exclusively owned buffer to a shared buffer with two references. It calls one unsafe function.

UNAUDITED

Inner::reserve

Inner::reserve is a function on [Inner] which reserves the requested amount of uninitialized space at the end of the buffer. It contains two unsafe blocks, the first calling three unsafe functions, and the second calling one and dereferencing four raw pointers.

UNAUDITED

Inner::kind

Inner::kind is a function on [Inner] which returns the backing storage type. It contains two (conditionally compiled) unsafe blocks, each of which calls one unsafe function.

This function exhibits undefined behaviour under all executions. As it explains in a comment at the top of the function:

This function is going to probably raise some eyebrows. The function
returns true if the buffer is stored inline. This is done by checking
the least significant bit in the `arc` field.

Now, you may notice that `arc` is an `AtomicPtr` and this is
accessing it as a normal field without performing an atomic load...

Again, the function only cares about the least significant bit, and
this bit is set when `Inner` is created and never changed after that.
All platforms have atomic "word" operations and won't randomly flip
bits, so even without any explicit atomic operations, reading the
flag will be correct.

This is undefined behavior due to a data race, but experimental
evidence shows that it works in practice (discussion:
<https://internals.rust-lang.org/t/bit-wise-reasoning-for-atomic-accesses/8853>).

This function is very critical performance wise as it is called for
every operation. Performing an atomic load would mess with the
compiler's ability to optimize. Simple benchmarks show up to a 10%
slowdown using a `Relaxed` atomic load on x86.

Inner::uncoordinated_get_vec_pos

Inner::uncoordinated_get_vec_pos is a function on [Inner]. It contains one unsafe block calling one unsafe function.

This exhibits the same synchronization issues as Inner::kind.

Inner::uncoordinated_set_vec_pos

Inner::uncoordinated_set_vec_pos is a function on [Inner]. It contains one unsafe block calling one unsafe function.

This exhibits the same synchronization issues as Inner::kind.

rebuild_vec

rebuild_vec is a free function. It contains one unsafe block that calls two unsafe functions.

UNAUDITED

release_shared

release_shared is a free function. It contains one unsafe block that dereferences one raw pointer and calls one unsafe function.

UNAUDITED

unsafe impl Send for Inner

unsafe impl Send for Inner is an unsafe trait impl. It asserts that Inner.ptr can safely be sent across thread boundaries.

UNAUDITED

unsafe impl Sync for Inner

unsafe impl Sync for Inner is an unsafe trait impl. It asserts that all accesses to Inner.ptr are synchronized.

UNAUDITED

src/either.rs

src/buf/chain.rs contains three unsafe functions. All are implementations of member functions of the BufMut trait.

<Either<_, _> as BufMut>::bytes_mut

<Either<_, _> as BufMut>::bytes_mut is an unsafe function that returns a mutable slice to the remainder of the object's buffer, though it may return short. The implementation on Either<_, _> simply calls the corresponding function on which contained type is active. It appears free of undefined behaviour provided its invariants are upheld and the called functions are sound.

<Either<_, _> as BufMut>::bytes_vectored_mut

<Either<_, _> as BufMut>::bytes_vectored_mut is an unsafe function that fills a mutable destination slice with potentially multiple std::io::IoSliceMuts. The implementation on Either<_, _> simply calls the corresponding function on which contained type is active. It appears free of undefined behaviour provided its invariants are upheld and the called functions are sound.

<Either<_, _> as BufMut>::advance_mut

<Either<_, _> as BufMut>::advance_mut is an unsafe function that advances the internal cursor of the object implementing BufMut. The implementation on Either<_, _> simply calls the corresponding function on which contained type is active. It appears free of undefined behaviour provided its invariants are upheld and the called functions are sound.

src/buf/buf_mut.rs

src/buf/buf_mut.rs contains four unsafe blocks and thirteen unsafe functions.

BufMut::advance_mut

BufMut::advance_mut is an unsafe required method for the BufMut trait. It advances the implementing object's internal cursor by a requested number of bytes. The documentation is somewhat unclear on what its invariants are, and there are two reasonable interpretations: the first is that it requires that the number of bytes being advanced past within the buffer immediately after the cursor be defined; this is correct in all cases. The second interpretation is that while those bytes may be undefined, you must not read them. This interpretation is problematic as it may be undefined behaviour to have uninitialized memory accessible other than through an untagged union or a raw pointer.

The example seems to indicate that the second interpretation is correct, and under that strict definition of undefined behaviour, the example does exhibit UB.

BufMut::bytes_mut

BufMut::bytes_mut is an unsafe required method which returns the implementing object's byte buffer. Under a loose definition of undefined behaviour (see the section above on BufMut::advance_mut), this is a perfectly acceptable API. Under a stricter definition of UB, this function may only be called if the caller is sure that the entire buffer is initialized. The documentation does not reflect this potential restriction.

Again, under the strict definition of UB, the example is incorrect.

BufMut::bytes_vectored_mut

BufMut::bytes_vectored_mut is an unsafe provided method which is readv to BufMut::bytes_mut's read. It is potentially unsound for the same reasons.

BufMut::put

BufMut::put is a function containing two unsafe blocks that copies bytes from a provided Buf into the implementing object.

The first unsafe block involves two unsafe function calls.

The first call uses BufMut::bytes_mut to get the slice to be copied into by the second. Under the strict interpretation of UB, this is unsound as the returned slice may contain uninitialized bytes, which cannot be accessible except through an untagged union or raw pointer. Under a looser interpretation of UB, this call is fine, as the slice is never read from.

The second call uses std::ptr::copy_nonoverlapping to copy bytes from an input buffer to the implementing object's buffer, via an immutable and mutable reference, respectively. Rust's ownership rules ensure this block is safe, as the destination cannot be aliased, and the source cannot be mutably aliased. A buffer overrun is prevented by bounding count to the minimum of the length of the source and destination slices. As additional safety, an assertion requires that the amount of spare buffer space in the implementing object is greater than or equal to the remaining amount of bytes in the provided Buf.

The second unsafe block uses BufMut::advance_mut to advance the implementing object's cursor by the number of bytes copied into the destination slice. This is safe, as those bytes have been initialized by the earlier unsafe block.

Stylistic concerns suggest that either the first unsafe block should be broken up into two blocks, or that both unsafe blocks should be combined.

BufMut::put_slice

BufMut::put_slice is a function containing two unsafe blocks that copies bytes from a provided slice into the implementing object. As byte slices implement Buf, this function simply is a specialized implementation of a subset of BufMut::puts functionality.

The first unsafe block involves two unsafe function calls.

The first call uses BufMut::bytes_mut to get the slice to be copied into. See the discussion of the similar call within BufMut::put, above.

The second call uses std::ptr::copy_nonoverlapping to copy the provided slice into the implementing object's buffer. See the discussion of the similar call within BufMut::put, above.

The second unsafe block uses BufMut::advance_mut to advance the implementing object's cursor by the number of bytes copied into the destination slice. See the discussion of the similar call within BufMut::put, above.

Stylistic concerns suggest that either the first unsafe block should be broken up into two blocks, or that both unsafe blocks should be combined.

<&'_ mut _ as BufMut>::bytes_mut

<&'_ mut _ as BufMut>::bytes_mut is an unsafe function implementing BufMut::bytes_mut. As it simply defers to the concrete implementation through the reference, it has the same soundness properties as the referenced object.

<&'_ mut _ as BufMut>::bytes_vectored_mut

<&'_ mut _ as BufMut>::bytes_vectored_mut is an unsafe function implementing BufMut::bytes_vectored_mut. As it simply defers to the concrete implementation through the reference, it has the same soundness properties as the referenced object.

<&'_ mut _ as BufMut>::advance_mut

<&'_ mut _ as BufMut>::advance_mut is an unsafe function implementing BufMut::advance_mut. As it simply defers to the concrete implementation through the reference, it has the same soundness properties as the referenced object.

<Box<_> as BufMut>::bytes_mut

<Box<_> as BufMut>::bytes_mut is an unsafe function implementing BufMut::bytes_mut. As it simply defers to the concrete implementation through the reference, it has the same soundness properties as the referenced object.

<Box<_> as BufMut>::bytes_vectored_mut

<Box<_> as BufMut>::bytes_vectored_mut is an unsafe function implementing BufMut::bytes_vectored_mut. As it simply defers to the concrete implementation through the reference, it has the same soundness properties as the referenced object.

<Box<_> as BufMut>::advance_mut

<Box<_> as BufMut>::advance_mut is an unsafe function implementing BufMut::advance_mut. As it simply defers to the concrete implementation through the reference, it has the same soundness properties as the referenced object.

<&'_ mut &#x91;u8&#x93; as BufMut>::bytes_mut

[<&'_ mut &$x91;u8&#x93; as BufMut>::bytes_mut] is an unsafe function implementing BufMut::bytes_mut. It simply returns a mutable reference to the slice. Assuming that the slice is completely initialized when the function is called, it is clearly sound. If the slice is not fully initialized, as the documentation implies is a valid use, it may be unsound.

<&'_ mut &#x91;u8&#x93; as BufMut>::advance_mut

<&'_ mut &#x91;u8&#x93; as BufMut>::advance_mut is an unsafe function implementing BufMut::advance_mut. It calls no unsafe functions, and, assuming the slice is completely initialized when the function is called, it is clearly sound. If the slice is not fully initialized, as the documentation implies is a valid use, it may be unsound.

<Vec<u8> as BufMut>::advance_mut

<Vec<u8> as BufMut>::advance_mut is an unsafe function implementing BufMut::advance_mut. If necessary, it reserves sufficient additional capacity to set the length of the vector using Vec::set_len to the vector's current length plus the provided usize. This may be unsound if access to uninitialized memory except through a raw pointer or untagged union is undefined behaviour.

<Vec<u8> as BufMut>::bytes_mut

<Vec<u8> as BufMut>::bytes_mut is an unsafe function implementing BufMut::bytes_mut. It uses [std::slice::from_raw_parts_mut] to convert the vector into a mutable slice and then slices the slice to everything between the vector's length and the vector's maximum capacity. If there is no capacity beyond the length, 64 bytes of capacity are first reserved. This may be unsound if access to uninitialized memory except through a raw pointer or untagged union is undefined behaviour.

src/buf/buf.rs

src/buf/buf.rs contains one unsafe block.

Buf::copy_to_slice

Buf::copy_to_slice is a default method for the Buf trait. It uses std::ptr::copy_nonoverlapping to copy the elements of a &[u8] returned by another Buf method on self to a &mut [u8] passed as an argument. Rust's ownership rules ensure that this function is safe, as the destination cannot be aliased, and the source cannot be mutably aliased. A buffer overrun is prevented by bounding count to the minimum of the length of the source and destination slices.

src/buf/chain.rs

src/buf/chain.rs contains three unsafe functions. All are implementations of member functions of the BufMut trait.

<Chain<_, _> as BufMut>::bytes_mut

<Chain<_, _> as BufMut>::bytes_mut is an unsafe function that returns a mutable slice to the remainder of the object's buffer, though it may return short. The implementation on Chain<_, _> simply calls the corresponding function on its first object if it is not yet exhausted; otherwise, it calls it on its second. It appears free of undefined behaviour provided its invariants are upheld and the called functions are sound.

<Chain<_, _> as BufMut>::bytes_vectored_mut

<Chain<_, _> as BufMut>::bytes_vectored_mut is an unsafe function that fills a mutable destination slice with potentially multiple std::io::IoSliceMuts. The implementation on Chain<_, _> first calls the corresponding function on its first object, and then calls the function on its second object, but offsets the slice by the number of std::io::IoSliceMuts filled in the destination slice by the first function. It appears free of undefined behaviour provided its invariants are upheld and the called functions are sound.

<Chain<_, _> as BufMut>::advance_mut

<Chain<_, _> as BufMut>::advance_mut is an unsafe function that advances the internal cursor of the object implementing BufMut. The implementation on Chain<_, _> first advances its first object by the requested number of bytes, or to saturation. If it is saturated, it advances its second object by the requested number of bytes minus the number of bytes required to saturate the first object. Both advancements are done using the corresponding function for each object. It appears free of undefined behaviour provided its invariants are upheld and the called functions are sound.

Footnotes

  1. It's possible that this was known to be UB, but was just not documented as such.

  2. I checked the Nomicon, and it didn't provide clear guidance on this. I'd imagine that it's illegal though, as it's not specially brought up as sound.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment