@naftulikay
Last active July 27, 2020 19:23

Write::write_vectored Collection

Currently used for writing PhatNoise database files and indices to disk.

Links

users.rust-lang.org questions:

man pages:

Rust docs:

Rust crates:

GitHub issues and pull requests:

Code

Single Allocation, Linear Memory

use std::io;
use std::mem::size_of;

type Offset = u32;

struct Offsets(Vec<Offset>);

impl Offsets {
    fn serialize(&self, output: &mut impl io::Write) -> io::Result<()> {
        let mut buffer = Vec::with_capacity(size_of::<u32>() + (size_of::<Offset>() * self.0.len()));

        // insert the count
        buffer.extend_from_slice(&(self.0.len() as u32).to_le_bytes());

        // insert the offsets
        for offset in &self.0 {
            buffer.extend_from_slice(&offset.to_le_bytes());
        }

        // dump
        output.write_all(buffer.as_slice())       
    }
}
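For a concrete picture of the byte layout this produces, the following sketch reproduces the type above (with the missing imports and the unbalanced parenthesis fixed) and serializes a few hypothetical offsets into a `Vec<u8>`:

```rust
use std::io;
use std::mem::size_of;

type Offset = u32;

struct Offsets(Vec<Offset>);

impl Offsets {
    fn serialize(&self, output: &mut impl io::Write) -> io::Result<()> {
        // one allocation sized for the count prefix plus every offset
        let mut buffer = Vec::with_capacity(size_of::<u32>() + (size_of::<Offset>() * self.0.len()));

        // insert the count
        buffer.extend_from_slice(&(self.0.len() as u32).to_le_bytes());

        // insert the offsets
        for offset in &self.0 {
            buffer.extend_from_slice(&offset.to_le_bytes());
        }

        // dump
        output.write_all(buffer.as_slice())
    }
}

fn main() -> io::Result<()> {
    let offsets = Offsets(vec![0x10, 0x20, 0x30]);
    let mut out: Vec<u8> = Vec::new();
    offsets.serialize(&mut out)?;

    // 4-byte little-endian count, then each offset in little-endian order
    assert_eq!(out.len(), 4 + 3 * 4);
    assert_eq!(&out[..4], &3u32.to_le_bytes());
    assert_eq!(&out[4..8], &0x10u32.to_le_bytes());
    Ok(())
}
```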

Constant Memory via Fixed-Size Buffer

unimplemented!()

The general idea is to allocate a fixed-size buffer (e.g. 4096 bytes) and, whenever it fills, flush it to the output. This is essentially what BufWriter already does, so it's not worth implementing by hand.
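The BufWriter equivalence can be sketched as follows; this is a minimal illustration (not part of the gist), assuming the same 4096-byte buffer size:

```rust
use std::io::{self, BufWriter, Write};

type Offset = u32;

// Sketch: BufWriter provides the fixed-size-buffer behavior for free,
// flushing to the inner writer whenever its internal buffer fills.
fn serialize_buffered(offsets: &[Offset], output: impl io::Write) -> io::Result<()> {
    // 4096-byte internal buffer; the small writes below accumulate here
    let mut writer = BufWriter::with_capacity(4096, output);

    // count prefix, then each offset, all little-endian
    writer.write_all(&(offsets.len() as u32).to_le_bytes())?;
    for offset in offsets {
        writer.write_all(&offset.to_le_bytes())?;
    }

    // flush any remaining buffered bytes before the writer is dropped
    writer.flush()
}
```

Memory use is constant in the number of offsets, at the cost of copying every byte through the intermediate buffer once.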

Constant Memory via IoSlice

use std::io;
use std::mem::size_of;

type Offset = u32;

struct Offsets(Vec<Offset>);

impl Offsets {
    #[cfg(target_endian = "little")]
    fn serialize(&self, mut output: impl io::Write) -> io::Result<()> {
        // the on-disk format is little-endian, so on little-endian hardware
        // the offsets can be reinterpreted as bytes directly with no copy
        let len = (self.0.len() as u32).to_le_bytes();

        let offsets: &[u8] = zerocopy::AsBytes::as_bytes(self.0.as_slice());

        let to_write = len.len() + offsets.len();
        let mut written = 0;

        while written < to_write {
            if written < 4 {
                // the count prefix isn't fully written yet: offer the rest of
                // it plus all of the offsets in a single vectored write
                let io_slices = [
                    io::IoSlice::new(&len[written..]),
                    io::IoSlice::new(offsets),
                ];

                written += output.write_vectored(&io_slices)?;
            } else {
                // only offset bytes remain
                written += output.write(&offsets[(written - 4)..])?;
            }
        }

        Ok(())
    }

    #[cfg(target_endian = "big")]
    fn serialize(&self, mut output: impl io::Write) -> io::Result<()> {
        let mut buffer = Vec::with_capacity(size_of::<u32>() + (size_of::<u32>() * self.0.len()));

        // insert the count (the file format is little-endian regardless of host endianness)
        buffer.extend_from_slice(&(self.0.len() as u32).to_le_bytes());

        // insert the offsets
        for offset in &self.0 {
            buffer.extend_from_slice(&offset.to_le_bytes());
        }

        // dump
        output.write_all(buffer.as_slice())
    }
}

In the best case, no allocations are performed and the code is effectively zero-copy: the only allocations that may occur are inside the Write implementation, which we can't control. The per-call overhead is just the two stack-allocated IoSlice headers, regardless of how large the underlying vector is. This is only possible on little-endian hardware; on big-endian hardware the offsets must be byte-swapped into a buffer, so we fall back to the first implementation.
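Note that write_vectored, like write, may consume fewer bytes than offered, which is why the loop above tracks written. A sketch with a hypothetical writer (Trickle, invented here for illustration) that accepts at most three bytes per call shows the same loop handling short writes correctly:

```rust
use std::io::{self, IoSlice, Write};

// Hypothetical writer that accepts at most 3 bytes per call, to exercise
// the short-write path the serialization loop must handle.
struct Trickle(Vec<u8>);

impl Write for Trickle {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(3);
        self.0.extend_from_slice(&buf[..n]);
        Ok(n)
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

// the same loop as the little-endian serialize above, extracted for testing
fn write_all_vectored(output: &mut impl Write, len: &[u8; 4], offsets: &[u8]) -> io::Result<()> {
    let to_write = len.len() + offsets.len();
    let mut written = 0;

    while written < to_write {
        if written < 4 {
            // count prefix not fully written yet: offer both slices
            let io_slices = [IoSlice::new(&len[written..]), IoSlice::new(offsets)];
            written += output.write_vectored(&io_slices)?;
        } else {
            // only offset bytes remain
            written += output.write(&offsets[(written - 4)..])?;
        }
    }

    Ok(())
}
```

Even though Trickle never writes more than three bytes at a time, the loop converges and the output is byte-identical to a single write_all of the prefix plus offsets.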
