@KodrAus
Last active November 4, 2021 05:28
Serializing fixed-size arrays in serde

Summary

Add the following method to Serializer:

fn serialize_byte_array<const N: usize>(self, bytes: &[u8; N]) -> Result<Self::Ok, Self::Error> {
    self.serialize_bytes(bytes)
}

to support cases where a binary format can take advantage of the fact that a byte slice has a fixed size.

Background

Motivations

In uuid we've been looking at the trade-offs of various representations of a 128-bit value in binary formats. The current options are:

  1. Serializer::serialize_bytes using &[u8]. This is a natural fit, but requires a redundant length field, even though the value is guaranteed to always be 16 bytes. That redundancy may result in anywhere from 1 to 8 additional bytes of overhead.
  2. Serializer::serialize_tuple using [T; N]. This can avoid the need for a redundant field, but the lazy serialization may impact performance. It can also introduce more overhead in other formats that don't encode tuples as sequences.
  3. Serializer::serialize_u128 using u128. This can avoid the drawbacks of the above approaches, but support is still spotty. Some formats that need to interoperate with other languages simply won't or can't support 128-bit numbers.

Each of these approaches comes with drawbacks that are problematic for different groups of end users. Each drawback can be mitigated by introducing a new serialize_byte_array method:

  1. Formats can optimize away the redundant length, since the datatype has explicitly declared the byte array as having a fixed size.
  2. Formats can serialize the array in-place without needing to run through the lazy machinery.
  3. Support for byte arrays is universal.
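
To make the trade-offs concrete, here is a rough sketch of the three existing options in a toy binary encoding. The function names and the 8-byte little-endian length prefix are assumptions chosen for illustration; they are not taken from serde, uuid, or any real format.

```rust
/// Option 1: a byte slice behind a length prefix.
/// The 8-byte prefix width is an assumption of this toy format.
fn encode_as_bytes(out: &mut Vec<u8>, value: &[u8]) {
    out.extend_from_slice(&(value.len() as u64).to_le_bytes());
    out.extend_from_slice(value);
}

/// Option 2: a tuple, written one element at a time with no prefix.
fn encode_as_tuple(out: &mut Vec<u8>, value: &[u8; 16]) {
    for byte in value {
        out.push(*byte);
    }
}

/// Option 3: a single 128-bit integer, written big-endian here.
fn encode_as_u128(out: &mut Vec<u8>, value: u128) {
    out.extend_from_slice(&value.to_be_bytes());
}
```

In this toy format the slice path spends 24 bytes on a 16-byte value, while the tuple and u128 paths spend 16. A fixed-size byte array path could match the smaller size while still writing the buffer in one call.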

Proposal

Add a new method, complementary to Serializer::serialize_bytes, that allows datatypes to communicate to a format that the byte buffer has a fixed length. The format may choose to optimize that case by treating the byte buffer as a tuple instead of as a slice. serde considers [T; N] to be equivalent to (T, ..N). This proposal doesn't attempt to change that.
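
A minimal sketch of how this could look from both sides, using a cut-down stand-in for the Serializer trait rather than serde itself. The Prefixed and Compact formats and the Id datatype are hypothetical names invented for this example, and the 8-byte length prefix is an assumption.

```rust
/// Cut-down stand-in for serde's `Serializer` (the real trait has many more methods).
trait Serializer: Sized {
    type Ok;
    type Error;

    fn serialize_bytes(self, bytes: &[u8]) -> Result<Self::Ok, Self::Error>;

    /// The proposed method. By default it falls back to the slice path.
    fn serialize_byte_array<const N: usize>(
        self,
        bytes: &[u8; N],
    ) -> Result<Self::Ok, Self::Error> {
        self.serialize_bytes(bytes)
    }
}

/// Hypothetical format that inherits the default and always writes a length prefix.
struct Prefixed<'a>(&'a mut Vec<u8>);

impl<'a> Serializer for Prefixed<'a> {
    type Ok = ();
    type Error = ();

    fn serialize_bytes(self, bytes: &[u8]) -> Result<(), ()> {
        self.0.extend_from_slice(&(bytes.len() as u64).to_le_bytes());
        self.0.extend_from_slice(bytes);
        Ok(())
    }
}

/// Hypothetical format that opts in and drops the prefix for fixed-size arrays.
struct Compact<'a>(&'a mut Vec<u8>);

impl<'a> Serializer for Compact<'a> {
    type Ok = ();
    type Error = ();

    fn serialize_bytes(self, bytes: &[u8]) -> Result<(), ()> {
        self.0.extend_from_slice(&(bytes.len() as u64).to_le_bytes());
        self.0.extend_from_slice(bytes);
        Ok(())
    }

    fn serialize_byte_array<const N: usize>(self, bytes: &[u8; N]) -> Result<(), ()> {
        // The length is part of the type, so nothing extra is written.
        self.0.extend_from_slice(bytes);
        Ok(())
    }
}

/// Hypothetical 128-bit identifier that declares its fixed size to the format.
struct Id([u8; 16]);

impl Id {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        serializer.serialize_byte_array(&self.0)
    }
}
```

The datatype makes the same call in both cases. Only a format that overrides the method changes its output, which is why the default keeps existing formats working unchanged.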

A datatype that serializes using serialize_byte_array may need to support deserializing through any of:

  • Deserializer::deserialize_bytes
  • Deserializer::deserialize_seq
  • Deserializer::deserialize_tuple

depending on how formats consider fixed-size byte arrays.
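
As a rough illustration of what supporting several deserialization paths means for a datatype, the sketch below accepts a 16-byte value either as one contiguous buffer or as a stream of elements. The helper names and the WrongLength error are hypothetical; a real implementation would live inside a serde Visitor, which is not shown here.

```rust
/// Error for a hypothetical 16-byte identifier that received the wrong number of bytes.
#[derive(Debug, PartialEq)]
struct WrongLength(usize);

/// Path for formats that hand back one contiguous buffer
/// (roughly what a `deserialize_bytes` visitor would receive).
fn from_bytes(bytes: &[u8]) -> Result<[u8; 16], WrongLength> {
    if bytes.len() != 16 {
        return Err(WrongLength(bytes.len()));
    }
    let mut out = [0u8; 16];
    out.copy_from_slice(bytes);
    Ok(out)
}

/// Path for formats that yield elements one at a time
/// (roughly what a `deserialize_seq` or `deserialize_tuple` visitor would receive).
fn from_elements<I: IntoIterator<Item = u8>>(elements: I) -> Result<[u8; 16], WrongLength> {
    let mut out = [0u8; 16];
    let mut count = 0;
    for byte in elements {
        // Keep counting past 16 so the error reports the real length.
        if count < 16 {
            out[count] = byte;
        }
        count += 1;
    }
    if count == 16 {
        Ok(out)
    } else {
        Err(WrongLength(count))
    }
}
```

Both paths produce the same value and reject any input that is not exactly 16 bytes, so the datatype behaves the same regardless of how a format chose to represent the array.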

Formats like bincode will need to be updated to make use of this new method in coordination with the serde release that introduces it, so they have a chance to decide what semantics they want before inheriting the default.

Drawbacks

This is an arguably niche case that increases the burden on formats. It requires coordination and consideration to support.

serde's MSRV may also predate const generics (stabilized in Rust 1.51), in which case it would not be able to parse the const N: usize syntax.

Alternatives

Avoid const generics in favor of something like:

fn serialize_byte_array(self, bytes: &[u8]) -> Result<Self::Ok, Self::Error> { .. }

where the length is implicitly fixed by the length of the passed-in slice.
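
A small sketch of what this alternative implies, assuming a toy format that writes no length prefix. write_fixed and read_fixed are hypothetical helpers, not serde APIs. Because the length is no longer part of the type, the reading side has to supply it at runtime.

```rust
/// Writes the bytes as-is; the slice length is trusted to be the fixed size.
fn write_fixed(out: &mut Vec<u8>, bytes: &[u8]) {
    out.extend_from_slice(bytes);
}

/// Splits `expected_len` bytes off the front of the input.
/// Returns the value and the remaining input, or `None` if the input is too short.
fn read_fixed(input: &[u8], expected_len: usize) -> Option<(&[u8], &[u8])> {
    if input.len() < expected_len {
        return None;
    }
    Some(input.split_at(expected_len))
}
```

The compiler can no longer check that a datatype always passes the same length, so a mismatch between the writer and the reader only surfaces at runtime.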

Should there be an equivalent Deserializer::deserialize_byte_array method?
