@toksdotdev
Last active May 9, 2022 16:21
An explanation of why the size of Option<T> is often double the size of T

Someone asked a question some weeks back about why size_of::<Option<T>>() is often double size_of::<T>(). The answer is alignment.

Explanation

How are Rust enums represented in C?

C can't directly represent a Rust enum whose variants carry data, hence the need for a workaround. To understand it, let's take a look at how Option<i32> is represented in C.

E.g. Given:

#[repr(C)]
enum Option {
    Some(i32),
    None
}

It is laid out like the following Rust code, which is compatible with C:

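/// This is our focus
#[repr(C)]
struct OptionRepr {
    tag: OptionDiscriminant,
    payload: OptionUnion,
}

/// The tag that records which variant is active. A one-byte tag is assumed
/// here to match the size calculation further down.
#[repr(u8)]
enum OptionDiscriminant {
    Some,
    None,
}

/// Before now, I didn't know there was a `union` type in Rust. Apparently
/// it requires unsafe to read its fields. You can read more here:
/// https://doc.rust-lang.org/reference/items/unions.html
#[repr(C)]
union OptionUnion {
    Some: OptionSomeVariant,
    None: OptionNoneVariant,
}

/// Union fields must be `Copy` (or wrapped in `ManuallyDrop`).
#[repr(C)]
#[derive(Clone, Copy)]
struct OptionSomeVariant(i32);

#[repr(C)]
#[derive(Clone, Copy)]
struct OptionNoneVariant;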

With the above, you'll notice that to properly represent an enum, you need:

  1. OptionUnion which holds the variants of the enum (OptionSomeVariant and OptionNoneVariant), and
  2. OptionDiscriminant which holds the keys of the variant (just like a dictionary key).

The two are then combined into OptionRepr. That means if we ever want to access the field in an Option, we conceptually do something like option_repr.payload[option_repr.tag] (just like a dictionary lookup).
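The dictionary-style lookup can be sketched in real Rust as a match on the tag followed by an unsafe read of the matching union field. This is only an illustration: the lower-case field names, the one-byte tag and the get helper are assumptions made for the sketch, not something the compiler generates.

```rust
// A one-byte tag is an assumption of this sketch.
#[repr(u8)]
#[derive(Clone, Copy)]
enum OptionDiscriminant {
    Some,
    None,
}

#[repr(C)]
#[derive(Clone, Copy)]
struct OptionSomeVariant(i32);

#[repr(C)]
#[derive(Clone, Copy)]
struct OptionNoneVariant;

#[repr(C)]
union OptionUnion {
    some: OptionSomeVariant,
    none: OptionNoneVariant,
}

#[repr(C)]
struct OptionRepr {
    tag: OptionDiscriminant,
    payload: OptionUnion,
}

/// Hypothetical helper: reads the payload only when the tag says
/// the `some` field is the one that was written.
fn get(repr: &OptionRepr) -> Option<i32> {
    match repr.tag {
        OptionDiscriminant::Some => {
            // SAFETY: the tag is `Some`, so `payload.some` is the active field.
            let variant = unsafe { repr.payload.some };
            Some(variant.0)
        }
        OptionDiscriminant::None => None,
    }
}

fn main() {
    let present = OptionRepr {
        tag: OptionDiscriminant::Some,
        payload: OptionUnion { some: OptionSomeVariant(7) },
    };
    let absent = OptionRepr {
        tag: OptionDiscriminant::None,
        payload: OptionUnion { none: OptionNoneVariant },
    };
    println!("{:?} {:?}", get(&present), get(&absent)); // Some(7) None
}
```

The tag is what makes the unsafe read sound: the union on its own has no idea which field is valid.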

For reference, there's a part of the Rust book that explains how enums are expanded to unions and structs.

Why is the size of Option<T> often double size_of::<T>()?

For context:

println!("{}", std::mem::size_of::<i32>()); // 4
println!("{}", std::mem::size_of::<Option<i32>>()); // 8

Still using our Rust-to-C representation of Option<i32>:

  • size_of::<OptionRepr>() == size_of::<OptionDiscriminant>() + size_of::<OptionUnion>().

Let's do some maths:

  • size_of::<i32>() == 4
  • size_of::<OptionDiscriminant>() == 1 (assuming a one-byte tag, which is all two variants need)
  • size_of::<OptionSomeVariant>() == size_of::<i32>() == 4
  • size_of::<OptionNoneVariant>() == 0 (a unit struct is zero-sized in Rust)
  • size_of::<OptionUnion>() == max(size_of::<OptionSomeVariant>(), size_of::<OptionNoneVariant>()) == max(4, 0) == 4
  • size_of::<OptionRepr>() == size_of::<OptionDiscriminant>() + size_of::<OptionUnion>() == 1 + 4 == 5

By our calculation above, the size of our enum should be 5, which is very different from the 8 that size_of reports.
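The arithmetic can be checked with size_of itself. The sketch below uses shortened, hypothetical type names and assumes a one-byte tag; it compares the naive sum of the parts against what the compiler reports for the whole struct. Note that the unit struct is zero-sized in Rust.

```rust
use std::mem::size_of;

// Hypothetical stand-ins for the pieces of the representation.
#[repr(u8)]
#[allow(dead_code)]
enum Tag {
    Some,
    None,
}

#[repr(C)]
#[derive(Clone, Copy)]
struct SomeVariant(i32);

#[repr(C)]
#[derive(Clone, Copy)]
struct NoneVariant;

#[repr(C)]
#[allow(dead_code)]
union Payload {
    some: SomeVariant,
    none: NoneVariant,
}

#[repr(C)]
#[allow(dead_code)]
struct Repr {
    tag: Tag,
    payload: Payload,
}

/// Sum of the parts, ignoring any padding between or after them.
fn naive_size() -> usize {
    size_of::<Tag>() + size_of::<Payload>()
}

fn main() {
    println!("tag:     {}", size_of::<Tag>()); // 1
    println!("payload: {}", size_of::<Payload>()); // 4
    println!("naive:   {}", naive_size()); // 5
    println!("actual:  {}", size_of::<Repr>()); // 8
}
```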

Why is there a disparity between 5 and 8?

That's because of alignment. A type's alignment tells us which addresses a value of that type can be stored at: the address must be a multiple of the alignment, which is always a power of 2. Because the address is a multiple of a known value, and the size of the value is known too, we can:

  • Easily allocate memory beforehand,
  • Know the upper and lower bounds of the value in memory,
  • Determine its location in memory more predictably.

To learn more about alignment, you can read this article.

Let's find out what the alignment is for our example:

println!("{}", std::mem::align_of::<i32>()); // 4
println!("{}", std::mem::align_of::<Option<i32>>()); // 4

What the above tells us is that the alignment of Option<i32> is the same as the alignment of i32.
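The same comparison can be made for other payload types. The sketch below uses a hypothetical same_alignment helper; on current rustc every line prints true, though the default Rust layout is not formally guaranteed, so treat it as an observation rather than a rule.

```rust
use std::mem::align_of;

/// Hypothetical helper: true when wrapping `T` in `Option`
/// leaves the alignment unchanged.
fn same_alignment<T>() -> bool {
    align_of::<Option<T>>() == align_of::<T>()
}

fn main() {
    println!("u8:  {}", same_alignment::<u8>());
    println!("u16: {}", same_alignment::<u16>());
    println!("i32: {}", same_alignment::<i32>());
    println!("u64: {}", same_alignment::<u64>());
    println!("f64: {}", same_alignment::<f64>());
}
```

The one-byte tag never raises the alignment because its own alignment is 1; the payload decides.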

Relationship between Size & Alignment?

Now that we know the alignment, to get the size of any value, we basically round up the size of the value to the next multiple of the alignment.

To explain the above, let's look at this function:

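fn size_by_alignment(raw_size_of_t: usize, alignment_of_t: usize) -> usize {
    // The outer `%` keeps the padding at 0 when the size is already a
    // multiple of the alignment (e.g. a size of 4 with an alignment of 4).
    let padding = (alignment_of_t - (raw_size_of_t % alignment_of_t)) % alignment_of_t;

    raw_size_of_t + padding
}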

You'll notice that the function rounds the original size of T up to the next multiple of its alignment.

That means, for our example OptionRepr, the function is called as size_by_alignment(5, 4), which returns 8 because that's the nearest multiple of the alignment at or above our original size.

That is why size_of::<Option<i32>>() is 8: the raw size of 5 is rounded up to the next multiple of the alignment, which is 8.
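As a cross-check, the standard library performs the same rounding through std::alloc::Layout::pad_to_align. The sketch below compares it with a bit-mask version of the round-up; round_up and padded_size are hypothetical names, and the bit trick relies on the alignment being a power of two.

```rust
use std::alloc::Layout;

/// Rounds `size` up to the next multiple of `align`.
/// Assumes `align` is a nonzero power of two.
fn round_up(size: usize, align: usize) -> usize {
    (size + align - 1) & !(align - 1)
}

/// The same rounding, delegated to the standard library.
fn padded_size(size: usize, align: usize) -> usize {
    Layout::from_size_align(size, align)
        .expect("align must be a nonzero power of two")
        .pad_to_align()
        .size()
}

fn main() {
    println!("{}", round_up(5, 4)); // 8
    println!("{}", padded_size(5, 4)); // 8
    println!("{}", round_up(8, 4)); // 8, already aligned
    println!("{}", round_up(1, 1)); // 1
}
```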

Exception for size doubling

There's an exception, and that's for simple enum types (enums whose variants carry no data). In such cases, the size remains 1 even when wrapped in an Option.

Let's look at an example:

enum Animal {
  Chicken,
  Goat,
}

The above will be represented in C as:

#[repr(C)]
enum Animal {
  Chicken = 0,
  Goat = 1
}

Let's get the size of the Animal enum:

println!("{}", std::mem::size_of::<Animal>()); // 1
println!("{}", std::mem::size_of::<Option<Animal>>()); // 1

Explanation

Since the above maps directly onto a C enum, there's no need to represent it as we did for OptionRepr; that would be overkill. The size of Animal is therefore size_of::<u8>(), which is 1, because the discriminant of a simple enum with this few variants is stored as a u8. Option<Animal> stays at 1 because None is stored in one of the u8 values that no Animal variant uses, so no separate tag is needed.

Remember, if we then call size_by_alignment(1, 1), we still get 1, which is consistent with the explanation given earlier.
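Because Animal uses only 2 of the 256 values a byte can hold, there is room for more than one layer of Option. A small sketch, with sizes as observed on current rustc (the default layout is not a formal guarantee):

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Animal {
    Chicken,
    Goat,
}

fn main() {
    println!("{}", size_of::<Animal>()); // 1
    println!("{}", size_of::<Option<Animal>>()); // 1
    // Each extra Option layer takes another unused byte value.
    println!("{}", size_of::<Option<Option<Animal>>>()); // 1
}
```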

Conclusion

Explaining the reasoning behind alignment is outside the scope of this article, but this article does a very good job explaining it.

@gilescope

gilescope commented May 5, 2022

Nice. Also worth remembering the superpowers of Option<NonZeroI32>
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=738bb6ba20d11063f9eb45ec959ada5b

(If you need zero there's also the nonmax crate https://crates.io/crates/nonmax )
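A minimal sketch of that superpower: since zero is never a valid NonZeroI32, Option<NonZeroI32> can use the all-zero bit pattern for None and stay at 4 bytes. The to_optional helper is a hypothetical name used only for illustration.

```rust
use std::mem::size_of;
use std::num::NonZeroI32;

/// Hypothetical helper: treats 0 as "absent".
fn to_optional(raw: i32) -> Option<NonZeroI32> {
    NonZeroI32::new(raw)
}

fn main() {
    println!("{}", size_of::<Option<i32>>()); // 8
    println!("{}", size_of::<Option<NonZeroI32>>()); // 4
    println!("{:?}", to_optional(0)); // None
    println!("{:?}", to_optional(5).map(NonZeroI32::get)); // Some(5)
}
```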

@klara-meyer

Besides enums and the NonZero types, there are even more types T where the size of Option<T> does not double. This is called niche filling, and happens whenever the compiler can determine that a certain bit pattern (usually the zero value) is never a valid value of the type.

  • Option<&T> for any type T. This is because there are no null references in Rust (just constructing one using unsafe methods is undefined behaviour). The same holds for other references and smart pointers such as &dyn T, Box<T>, Rc<T> or Arc<T>.
  • Option<bool> and Option<char>. Although one could argue that both bool and char are essentially enums.
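These cases can be checked with a small generic helper. The option_is_free name is hypothetical; the pointer cases are documented guarantees for Option<&T> and Option<Box<T>>, while the others reflect what current rustc does.

```rust
use std::fmt::Debug;
use std::mem::size_of;
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical helper: true when `Option<T>` needs no extra space.
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

fn main() {
    println!("&u64:       {}", option_is_free::<&u64>());
    println!("&dyn Debug: {}", option_is_free::<&dyn Debug>());
    println!("Box<u64>:   {}", option_is_free::<Box<u64>>());
    println!("Rc<u64>:    {}", option_is_free::<Rc<u64>>());
    println!("Arc<u64>:   {}", option_is_free::<Arc<u64>>());
    println!("bool:       {}", option_is_free::<bool>());
    println!("char:       {}", option_is_free::<char>());
    // For contrast: every bit pattern of u64 is a valid value, so no niche.
    println!("u64:        {}", option_is_free::<u64>());
}
```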

And just some minor comments regarding enums:

  • The size of a simple enum is not always 1 - if there are more than 256 variants, Rust will choose a larger type than u8.
  • For a simple enum T with exactly 256 values the size of Option<T> will also double.
  • For the never type ! or the empty enum T {} we have size_of::<T>() == size_of::<Option<T>>() == 0.
  • The niche optimization also works with non-simple enums such as enum T { A(u32), B(char), C }.
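A short sketch of these corner cases. Writing out a 256-variant enum would be long, so u8 stands in for it here as an assumption: like that enum, it uses every value of its byte and leaves no room for None. Void and Mixed are hypothetical names, and the sizes are what current rustc produces rather than a formal guarantee.

```rust
use std::mem::size_of;

/// An empty enum: it has no values at all.
enum Void {}

/// A non-simple enum; its tag still has unused values for `None`.
#[allow(dead_code)]
enum Mixed {
    A(u32),
    B(char),
    C,
}

fn main() {
    // Empty enum: both sizes are 0.
    println!("{} {}", size_of::<Void>(), size_of::<Option<Void>>());
    // All 256 byte values are taken, so Option needs a second byte.
    println!("{} {}", size_of::<u8>(), size_of::<Option<u8>>());
    // Niche filling also applies to enums that carry data.
    println!("{}", size_of::<Option<Mixed>>() == size_of::<Mixed>());
}
```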
