@mahkoh
Last active February 15, 2021 20:25

Summary

This RFC

  • clarifies the behavior of repr(C),
  • defines the term equivalent C type,
  • soft-deprecates repr(packed) on repr(C) types,
  • adds repr(pragma_pack) on repr(C) types, and
  • adds repr(simple) whose behavior is the behavior of repr(C) today.

This is a breaking change because

  • it changes the layout of certain repr(C) types on certain Windows targets and
  • it removes support for repr(C, align) enums.

Motivation

repr(C) was originally conceived to give structs the same layout as the equivalent struct in C. In 2017, language was added to the reference that specifies the behavior of repr(C) in terms of algorithms. This change was made without an RFC and the behavior described by these algorithms is not the behavior of MSVC. This violates the original intent of repr(C).

Since then, these algorithms have also found their way into the standard library in the form of the Layout::extend function, which makes guarantees about the layout of repr(C) structs.

The changes in this RFC restore the original design of repr(C). They increase the portability of repr(C) and allow us to fix its behavior on MSVC targets. For users who rely on the precise layout currently described in the reference, the RFC provides an upgrade path in the form of repr(simple).

Guide-level explanation

This section describes the changes in terms of changes to the reference and standard library.

Changes to Layout::extend

All mentions of repr(C) in the doc comment are replaced by repr(simple).
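
As a rough sketch of the guarantee in question, the following computes a struct layout field by field with Layout::extend and pad_to_align and compares it with what the compiler reports. The type Example and the helpers layout_of_fields and example_layout are illustrative names, not part of the RFC. Since repr(simple) does not exist in any released compiler, repr(C) stands in for it here.

```rust
use std::alloc::{Layout, LayoutError};

/// Hypothetical example type; `repr(C)` stands in for the proposed `repr(simple)`.
#[allow(dead_code)]
#[repr(C)]
pub struct Example {
    a: u8,
    b: u32,
    c: u16,
}

/// Lays out `fields` in declaration order and returns the overall layout
/// together with the offset of each field.
pub fn layout_of_fields(fields: &[Layout]) -> Result<(Layout, Vec<usize>), LayoutError> {
    let mut layout = Layout::from_size_align(0, 1)?;
    let mut offsets = Vec::with_capacity(fields.len());
    for &field in fields {
        // `extend` inserts whatever padding is needed to align `field`.
        let (extended, offset) = layout.extend(field)?;
        layout = extended;
        offsets.push(offset);
    }
    // `extend` never adds trailing padding; `pad_to_align` rounds the size up.
    Ok((layout.pad_to_align(), offsets))
}

/// Layout of `Example`, computed from the layouts of its fields alone.
pub fn example_layout() -> (Layout, Vec<usize>) {
    layout_of_fields(&[
        Layout::new::<u8>(),
        Layout::new::<u32>(),
        Layout::new::<u16>(),
    ])
    .expect("layout computation overflowed")
}
```

On targets where the current repr(C) algorithm applies, the layout returned by example_layout should equal Layout::new::<Example>().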

Changes to the current The C Representation section

The section The C Representation in the reference is renamed to The simple Representation. The following changes are made to its body:

The C representation is designed for dual purposes. One purpose is for creating types that are interoperable with the C Language. The second purpose is to create types that you can soundly perform operations on that rely on data layout such as reinterpreting values as a different type.

Because of this dual purpose, it is possible to create types that are not useful for interfacing with the C programming language.

The layout of a container type annotated with repr(simple) depends only on the layout of its fields but not on the target platform or the Rust version.

This representation can be applied to structs, unions, and enums. The exception is zero-variant enums for which the C representation is an error.

#[repr(simple)] Structs

...

Note: This algorithm can produce zero-sized structs. In C, an empty struct declaration like struct Foo { } is illegal. However, both gcc and clang support options to enable such structs, and assign them size zero. C++, in contrast, gives empty structs a size of 1, unless they are inherited from or they are fields that have the [[no_unique_address]] attribute, in which case they do not increase the overall size of the struct.
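
A small illustration of the zero-sized case, assuming current rustc behavior and using repr(C) as a stand-in for the proposed repr(simple). The type names Empty and OnlyZeroLengthArray and the helpers are made up for this sketch.

```rust
use std::mem::{align_of, size_of};

/// A struct without fields; `repr(C)` stands in for the proposed `repr(simple)`.
#[repr(C)]
pub struct Empty {}

/// A struct whose only field is a zero-length array.
#[allow(dead_code)]
#[repr(C)]
pub struct OnlyZeroLengthArray {
    items: [u32; 0],
}

/// Sizes of the two structs above, in bytes.
pub fn sizes() -> (usize, usize) {
    (size_of::<Empty>(), size_of::<OnlyZeroLengthArray>())
}

/// Alignments of the two structs above, in bytes.
pub fn alignments() -> (usize, usize) {
    (align_of::<Empty>(), align_of::<OnlyZeroLengthArray>())
}
```

Both structs have size zero. The second one still inherits the alignment of u32 from its field.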

#[repr(simple)] Unions

A union declared with #[repr(simple)] will have a size of the maximum size of all of its fields rounded to its alignment, and an alignment of the maximum alignment of all of its fields. These maximums may come from different fields.
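
The rule can be sketched as follows. The union BytesOrWord and the helper union_layout are hypothetical names, and repr(C) again stands in for the proposed repr(simple). In this example the size comes from one field and the alignment from the other.

```rust
use std::alloc::Layout;

/// Hypothetical union; `repr(C)` stands in for the proposed `repr(simple)`.
#[allow(dead_code)]
#[repr(C)]
pub union BytesOrWord {
    bytes: [u8; 5], // contributes the maximum size (5)
    word: u32,      // contributes the maximum alignment
}

/// Size is the maximum field size rounded up to the alignment;
/// alignment is the maximum field alignment.
pub fn union_layout(fields: &[Layout]) -> Layout {
    let size = fields.iter().map(|f| f.size()).max().unwrap_or(0);
    let align = fields.iter().map(|f| f.align()).max().unwrap_or(1);
    Layout::from_size_align(size, align)
        .expect("invalid size/alignment combination")
        .pad_to_align()
}
```

On a target where u32 is 4-byte aligned, union_layout applied to the layouts of [u8; 5] and u32 yields size 8 and alignment 4, which matches Layout::new::<BytesOrWord>().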

...

#[repr(simple)] Field-less Enums

For [field-less enums], the C representation has the size and alignment of the default enum size and alignment for the target platform's C ABI.

For [field-less enums], the layout of the simple representation depends on the range of the discriminants. The layout is chosen from the layouts of the following types, choosing the first type that can represent all discriminants:

  1. i32, u32
  2. i64, u64
  3. i128, u128
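
A minimal sketch of this selection, under two assumptions that are not stated in the text: the list is read left to right (signed before unsigned within each row), and discriminants are passed as i128. The second assumption means that values above i128::MAX, the only ones that would select u128, cannot be expressed in this sketch. The function name simple_enum_base is hypothetical.

```rust
/// Names the first candidate type that can represent every discriminant.
///
/// Sketch limitation: discriminants above `i128::MAX` are not expressible
/// here, so `u128` is never returned.
pub fn simple_enum_base(discriminants: &[i128]) -> &'static str {
    // True if every discriminant lies in the inclusive range `min..=max`.
    let fits = |min: i128, max: i128| discriminants.iter().all(|&d| min <= d && d <= max);

    if fits(i128::from(i32::MIN), i128::from(i32::MAX)) {
        "i32"
    } else if fits(0, i128::from(u32::MAX)) {
        "u32"
    } else if fits(i128::from(i64::MIN), i128::from(i64::MAX)) {
        "i64"
    } else if fits(0, i128::from(u64::MAX)) {
        "u64"
    } else {
        "i128"
    }
}
```

For instance, the discriminants 0 and 4_000_000_000 select u32, while -1 and 4_000_000_000 select i64.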

NOTE: While repr(C) on field-less enums was designed to use the representation used by C, the current implementation always uses i32 as the base type.

Note: The enum representation in C is implementation defined, so this is really a "best guess". In particular, this may be incorrect when the C code of interest is compiled with certain flags.

Warning: There are crucial differences between an enum in the C language and Rust's [field-less enums] with this representation. An enum in C is mostly a typedef plus some named constants; in other words, an object of an enum type can hold any integer value. For example, this is often used for bitflags in C. In contrast, Rust’s [field-less enums] can only legally hold the discriminant values, everything else is [undefined behavior]. Therefore, using a field-less enum in FFI to model a C enum is often wrong.

#[repr(simple)] Enums With Fields

The representation of a repr(simple) enum with fields is a repr(simple) struct with two fields, also called a "tagged union" in C:

  • a repr(simple) version of the enum with all fields removed ("the tag")
  • a repr(simple) union of repr(simple) structs for the fields of each variant that had them ("the payload")

Note: Due to the representation of repr(simple) structs and unions, if a variant has a single field there is no difference between putting that field directly in the union or wrapping it in a struct; any system which wishes to manipulate such an enum's representation may therefore use whichever form is more convenient or consistent for them.
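
The decomposition into tag and payload can be spelled out by hand. In the sketch below, all type names are made up for illustration and repr(C) stands in for the proposed repr(simple); the two helper functions report size and alignment so that the enum and its manual equivalent can be compared.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical enum with fields; `repr(C)` stands in for the proposed `repr(simple)`.
#[allow(dead_code)]
#[repr(C)]
pub enum Shape {
    Circle(f32),
    Rect { w: f32, h: f32 },
    Empty,
}

/// "The tag": the enum with all fields removed.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
pub enum ShapeTag {
    Circle,
    Rect,
    Empty,
}

/// Fields of the `Circle` variant.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
pub struct CircleFields(f32);

/// Fields of the `Rect` variant.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
pub struct RectFields {
    w: f32,
    h: f32,
}

/// "The payload": one struct per variant that has fields.
#[allow(dead_code)]
#[repr(C)]
pub union ShapePayload {
    circle: CircleFields,
    rect: RectFields,
}

/// The manual tagged union: tag first, payload second.
#[allow(dead_code)]
#[repr(C)]
pub struct ShapeRepr {
    tag: ShapeTag,
    payload: ShapePayload,
}

/// (size, alignment) of the enum as laid out by the compiler.
pub fn enum_layout() -> (usize, usize) {
    (size_of::<Shape>(), align_of::<Shape>())
}

/// (size, alignment) of the hand-written tagged union.
pub fn manual_layout() -> (usize, usize) {
    (size_of::<ShapeRepr>(), align_of::<ShapeRepr>())
}
```

With the current repr(C) behavior, enum_layout and manual_layout are expected to return the same pair.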

...

NOTE: In the examples at the end of the body, all mentions of repr(C) are replaced by repr(simple) and references to C and C++ are removed.

The new The C Representation section

A new repr(C) section is added in place of the old one. The body text is as follows:

The C representation is designed for interoperability with the C Language, including various vendor extensions. The layout of a type annotated with repr(C) is defined in terms of its equivalent C type and the target the program is compiled for. For this purpose, for each target, we define a normative compiler whose layout algorithm will be used to determine the layout of the equivalent C type. Here, layout means the size and alignment of the type and, for record types, the offset and size of its fields.

NOTE: GCC, Clang, and MSVC allow the user to specify flags on the command line that change the default layout of types. Such custom default representations are currently not supported. (For example: -fshort-enums and /Zp1.)

Whether the equivalent C type exists depends on the normative compiler. The algorithm used to compute the equivalent C type is described in Appendix B. If a Rust type annotated with repr(C) does not have an equivalent C type, then the layout of the Rust type is unspecified.

This representation can be applied to structs, unions, and enums.

Warning: There are crucial differences between an enum in the C language and Rust's [field-less enums] with this representation. An enum in C is mostly a typedef plus some named constants; in other words, an object of an enum type can hold any integer value. For example, this is often used for bitflags in C. In contrast, Rust’s [field-less enums] can only legally hold the discriminant values, everything else is [undefined behavior]. Therefore, using a field-less enum in FFI to model a C enum is often wrong.
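
One common way to sidestep this problem is to model the C enum as an integer newtype and convert to a Rust enum only after validation. The sketch below assumes a hypothetical C declaration, enum open_flags { FLAG_READ = 1, FLAG_WRITE = 2 }, whose underlying type is int (which is implementation defined). The names OpenFlags and Access are invented for this example.

```rust
use std::os::raw::c_int;

/// Models the hypothetical C enum `open_flags` as a plain integer, so any
/// value received over FFI (including OR-ed flags) is a valid `OpenFlags`.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OpenFlags(pub c_int);

impl OpenFlags {
    pub const READ: OpenFlags = OpenFlags(1);
    pub const WRITE: OpenFlags = OpenFlags(2);

    /// Combines two flag sets, as C code would with `|`.
    pub fn union(self, other: OpenFlags) -> OpenFlags {
        OpenFlags(self.0 | other.0)
    }

    /// Returns true if every bit of `other` is set in `self`.
    pub fn contains(self, other: OpenFlags) -> bool {
        self.0 & other.0 == other.0
    }
}

/// A closed Rust enum that is only ever created from validated values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
}

impl Access {
    /// Converts a raw value, rejecting anything that is not exactly one
    /// of the named constants.
    pub fn from_raw(raw: OpenFlags) -> Option<Access> {
        match raw.0 {
            1 => Some(Access::Read),
            2 => Some(Access::Write),
            _ => None,
        }
    }
}
```

A combined value such as OpenFlags::READ.union(OpenFlags::WRITE) is representable as OpenFlags, and Access::from_raw returns None for it instead of producing an invalid enum value.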

Alignment Modifiers

On structs and unions, the C representation can be combined with the following alignment modifiers:

  • repr(align(N)): This is equivalent to __declspec(align(N)) and __attribute__((aligned(N))) in C.
  • repr(pragma_pack(N)): This is equivalent to #pragma pack(N) in C.
  • repr(packed): This is an alias for repr(pragma_pack(1)).
  • repr(packed(N)): This is an alias for repr(pragma_pack(N)).
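
The effect of these modifiers can be observed with attributes that exist today. repr(pragma_pack(N)) is only proposed by this RFC, so the sketch below uses repr(packed(N)), which the list above describes as an alias for it. The struct names are illustrative and the comments name the C construct each attribute is meant to correspond to.

```rust
use std::mem::{align_of, size_of};

/// C counterpart: the same struct annotated with __attribute__((aligned(16)))
/// (GCC, Clang) or __declspec(align(16)) (MSVC).
#[allow(dead_code)]
#[repr(C, align(16))]
pub struct Aligned {
    a: u8,
}

/// C counterpart: the same struct declared under #pragma pack(1).
#[allow(dead_code)]
#[repr(C, packed)]
pub struct Packed1 {
    a: u8,
    b: u32,
}

/// C counterpart: the same struct declared under #pragma pack(2).
#[allow(dead_code)]
#[repr(C, packed(2))]
pub struct Packed2 {
    a: u8,
    b: u32,
}

/// (size, alignment) of `Aligned`, `Packed1`, and `Packed2`, in that order.
pub fn modifier_layouts() -> [(usize, usize); 3] {
    [
        (size_of::<Aligned>(), align_of::<Aligned>()),
        (size_of::<Packed1>(), align_of::<Packed1>()),
        (size_of::<Packed2>(), align_of::<Packed2>()),
    ]
}
```

With current rustc this yields (16, 16) for Aligned, (5, 1) for Packed1, and (6, 2) for Packed2.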

NOTE: repr(packed) and repr(packed(N)) on repr(C) types are deprecated. A future version of rustc will emit a warning when these annotations are used together.

Changes to the Primitive representations section

All mentions of repr(C) in this section are replaced by repr(simple). References to C and C++ are removed.

Changes to the The alignment modifiers section

The following paragraph is added:

This section describes the behavior for representations other than the C representation. For the behavior of these modifiers for repr(C) types, see the section describing the C representation.

Changes to the The transparent Representation section

All mentions of repr(C) in this section are replaced by repr(simple).

Reference-level explanation

TODO: Almost everything is already described in the section above. Here we add some details and talk about lints etc.

Drawbacks

  • This changes the layout of repr(C) types on some Windows targets.
  • This adds an additional type representation.
  • This adds a new alignment modifier on repr(C) types that behaves exactly like one of the existing modifiers.
  • This defines the layout of repr(C) types in terms of specific C compiler implementations.

Rationale and alternatives

TODO

  • In Clang and GCC there are two kinds of packing attributes:

    • __attribute__((packed))
    • #pragma pack

    These behave significantly differently. In anticipation of us adding support for __attribute__((packed)) in the future, we deprecate the ambiguously named repr(packed).

Prior art

TODO

Unresolved questions

TODO

Future possibilities

TODO

Appendix A

This appendix contains a map from Rust targets to their normative C compilers.

GCC Targets

  • aarch64-unknown-linux-gnu
  • aarch64-unknown-linux-musl
  • aarch64-wrs-vxworks
  • arm-unknown-linux-gnueabi
  • arm-unknown-linux-gnueabihf
  • arm-unknown-linux-musleabi
  • arm-unknown-linux-musleabihf
  • armv4t-unknown-linux-gnueabi
  • armv5te-unknown-linux-gnueabi
  • armv5te-unknown-linux-musleabi
  • armv5te-unknown-linux-uclibceabi
  • armv7-unknown-linux-gnueabi
  • armv7-unknown-linux-gnueabihf
  • armv7-unknown-linux-musleabi
  • armv7-unknown-linux-musleabihf
  • armv7-wrs-vxworks-eabihf
  • avr-unknown-gnu-atmega328
  • i586-unknown-linux-gnu
  • i586-unknown-linux-musl
  • i686-pc-windows-gnu
  • i686-unknown-linux-gnu
  • i686-unknown-linux-musl
  • i686-uwp-windows-gnu
  • i686-wrs-vxworks
  • mips64el-unknown-linux-gnuabi64
  • mips64el-unknown-linux-muslabi64
  • mips64-unknown-linux-gnuabi64
  • mips64-unknown-linux-muslabi64
  • mipsel-unknown-linux-gnu
  • mipsel-unknown-linux-musl
  • mipsel-unknown-linux-uclibc
  • mipsisa32r6el-unknown-linux-gnu
  • mipsisa32r6-unknown-linux-gnu
  • mipsisa64r6el-unknown-linux-gnuabi64
  • mipsisa64r6-unknown-linux-gnuabi64
  • mips-unknown-linux-gnu
  • mips-unknown-linux-musl
  • mips-unknown-linux-uclibc
  • powerpc64le-unknown-linux-gnu
  • powerpc64le-unknown-linux-musl
  • powerpc64-unknown-linux-gnu
  • powerpc64-unknown-linux-musl
  • powerpc64-wrs-vxworks
  • powerpc-unknown-linux-gnu
  • powerpc-unknown-linux-musl
  • powerpc-wrs-vxworks
  • riscv32gc-unknown-linux-gnu
  • riscv64gc-unknown-linux-gnu
  • s390x-unknown-linux-gnu
  • sparc64-unknown-linux-gnu
  • sparc-unknown-linux-gnu
  • thumbv7neon-unknown-linux-gnueabihf
  • thumbv7neon-unknown-linux-musleabihf
  • x86_64-linux-kernel
  • x86_64-pc-windows-gnu
  • x86_64-unknown-linux-gnu
  • x86_64-unknown-linux-gnux32
  • x86_64-unknown-linux-musl
  • x86_64-uwp-windows-gnu
  • x86_64-wrs-vxworks

Clang Targets

  • aarch64-apple-darwin
  • aarch64-apple-ios
  • aarch64-apple-ios-macabi
  • aarch64-apple-tvos
  • aarch64-fuchsia
  • aarch64-linux-android
  • aarch64-unknown-freebsd
  • aarch64-unknown-hermit
  • aarch64-unknown-netbsd
  • aarch64-unknown-none
  • aarch64-unknown-none-softfloat
  • aarch64-unknown-openbsd
  • aarch64-unknown-redox
  • armebv7r-none-eabi
  • armebv7r-none-eabihf
  • arm-linux-androideabi
  • armv6-unknown-freebsd
  • armv6-unknown-netbsd-eabihf
  • armv7a-none-eabi
  • armv7a-none-eabihf
  • armv7-apple-ios
  • armv7-linux-androideabi
  • armv7r-none-eabi
  • armv7r-none-eabihf
  • armv7s-apple-ios
  • armv7-unknown-freebsd
  • armv7-unknown-netbsd-eabihf
  • asmjs-unknown-emscripten
  • hexagon-unknown-linux-musl
  • i386-apple-ios
  • i686-apple-darwin
  • i686-linux-android
  • i686-unknown-freebsd
  • i686-unknown-haiku
  • i686-unknown-netbsd
  • i686-unknown-openbsd
  • mipsel-sony-psp
  • mipsel-unknown-none
  • msp430-none-elf
  • powerpc64-unknown-freebsd
  • powerpc-unknown-linux-gnuspe
  • powerpc-unknown-netbsd
  • powerpc-wrs-vxworks-spe
  • riscv32imac-unknown-none-elf
  • riscv32imc-unknown-none-elf
  • riscv32i-unknown-none-elf
  • riscv64gc-unknown-none-elf
  • riscv64imac-unknown-none-elf
  • sparc64-unknown-netbsd
  • sparc64-unknown-openbsd
  • sparcv9-sun-solaris
  • thumbv4t-none-eabi
  • thumbv6m-none-eabi
  • thumbv7em-none-eabi
  • thumbv7em-none-eabihf
  • thumbv7m-none-eabi
  • thumbv7neon-linux-androideabi
  • thumbv8m.base-none-eabi
  • thumbv8m.main-none-eabi
  • thumbv8m.main-none-eabihf
  • wasm32-unknown-emscripten
  • wasm32-unknown-unknown
  • wasm32-wasi
  • x86_64-apple-darwin
  • x86_64-apple-ios
  • x86_64-apple-ios-macabi
  • x86_64-apple-tvos
  • x86_64-fortanix-unknown-sgx
  • x86_64-fuchsia
  • x86_64-linux-android
  • x86_64-pc-solaris
  • x86_64-rumprun-netbsd
  • x86_64-sun-solaris
  • x86_64-unknown-dragonfly
  • x86_64-unknown-freebsd
  • x86_64-unknown-haiku
  • x86_64-unknown-hermit
  • x86_64-unknown-hermit-kernel
  • x86_64-unknown-illumos
  • x86_64-unknown-l4re-uclibc
  • x86_64-unknown-netbsd
  • x86_64-unknown-openbsd
  • x86_64-unknown-redox

MSVC Targets

  • aarch64-pc-windows-msvc
  • aarch64-uwp-windows-msvc
  • i586-pc-windows-msvc
  • i686-pc-windows-msvc
  • i686-unknown-uefi
  • i686-uwp-windows-msvc
  • thumbv7a-pc-windows-msvc
  • thumbv7a-uwp-windows-msvc
  • x86_64-pc-windows-msvc
  • x86_64-unknown-uefi
  • x86_64-uwp-windows-msvc

Appendix B

This appendix describes the algorithm used to compute the equivalent C type of a Rust type. The equivalent C type is only used for layout computations.

Let R be a Rust type.

  1. If R is one of u8, i8, u16, i16, u32, i32, u64, i64, isize, or usize, then the equivalent C type is the first of the following C types that has the same size as R:

    1. char,
    2. short,
    3. int,
    4. long,
    5. long long.

    NOTE: On all targets supported by Rust, if two of these C types have the same size, then they also have the same alignment.

  2. If R is one of f32 or f64, then the equivalent C type is the first of the following C types that has the same size as R:

    1. float,
    2. double,
    3. long double.

    NOTE: On avr-unknown-gnu-atmega328, double is a 32-bit type.

  3. If R is bool, then the equivalent C type is _Bool.

  4. If R is u128 or i128 and the __int128 type is supported by the normative compiler, then the equivalent C type is __int128.

  5. If R is a pointer to a sized type, or a reference to a sized type, or a function pointer, or an Option of a reference to a sized type, or an Option of a function pointer, then the equivalent C type is void *.

  6. If R is an array whose element type has an equivalent C type, then the equivalent C type of R is the array with the same number of elements whose element type is the equivalent C type of R's element type.

    NOTE: In positions where the normative compilers accept both zero-sized array members and flexible array members, the containing records have the same layout.

  7. If R is a fieldless enum with a repr(C) annotation and without a repr(align) annotation:

    1. Let E be the C enum with the same enumeration constants and constant expressions. (This is a purely syntactic construction as the resulting syntax need not be accepted by the normative compiler.)
    2. If E is accepted by the normative compiler and the enumeration constants are the same as the discriminant values of R, then E is the equivalent C type.
    3. Otherwise R has no equivalent C type.

    NOTE: MSVC truncates enumeration constants to int.

  8. If R is an enum with fields with a repr(C) annotation and without a repr(align) annotation:

    1. Let F be the equivalent fieldless Rust enum.
    2. Let U be the union containing one field for each variant of the enum that has at least one field. The type of the field is the tuple or braced struct having the same fields as the enum variant body.
    3. Let E be the struct struct { d: F, u: U }.
    4. The equivalent C type of the enum is the equivalent C type of E, if any.

    NOTE: There are no fields in U for variants without fields because the normative compiler might not accept structs without fields. Since R is an enum with fields, U itself has at least one field.

    NOTE: It is significant that the discriminant is stored outside the union. This can affect the layout of the overall type.

  9. If R is a tuple struct with a repr(C) annotation:

    1. Let S be the braced struct with the same number of fields whose fields are called field1, field2, etc. and whose field types are the types of the fields in the tuple struct.
    2. The equivalent C type is the equivalent C type of S, if any.
  10. If R is a braced struct or a unit-like struct or a union with a repr(C) annotation:

    1. If one of the types of the fields does not have an equivalent C type, then R has no equivalent C type.
    2. Let S be the C struct or union (depending on the type of R) with the same field names and equivalent C field types. (This is a purely syntactic construction as the resulting syntax need not be accepted by the normative compiler.)
    3. If R is annotated with repr(align(N)):
      1. If the normative compiler is MSVC, annotate S with __declspec(align(N)).
      2. If the normative compiler is GCC or Clang, annotate S with __attribute__((aligned(N))).
    4. If R is annotated with repr(pragma_pack(N)), annotate S with #pragma pack(N).
    5. If S is accepted by the normative compiler, then S is the equivalent C type of R.
  11. Otherwise R has no equivalent C type.
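
Steps 1 and 2 of the algorithm can be sketched as a lookup over the C type sizes of a target. Everything named below (CTarget, LP64, LLP64, and the two functions) is hypothetical; the size tables are filled in by hand for two common data models and are not queried from any compiler. Signedness is ignored, in line with the statement that the equivalent C type is only used for layout computations.

```rust
/// Sizes in bytes of the C types consulted by steps 1 and 2, listed in the
/// order in which the algorithm tries them.
pub struct CTarget {
    pub integers: [(&'static str, usize); 5],
    pub floats: [(&'static str, usize); 3],
}

/// Hand-written table for a typical LP64 target such as x86_64 Linux.
pub const LP64: CTarget = CTarget {
    integers: [("char", 1), ("short", 2), ("int", 4), ("long", 8), ("long long", 8)],
    floats: [("float", 4), ("double", 8), ("long double", 16)],
};

/// Hand-written table for a typical LLP64 target such as x86_64 Windows (MSVC).
pub const LLP64: CTarget = CTarget {
    integers: [("char", 1), ("short", 2), ("int", 4), ("long", 4), ("long long", 8)],
    floats: [("float", 4), ("double", 8), ("long double", 8)],
};

/// Returns the first candidate whose size equals `size`, if any.
fn first_with_size(candidates: &[(&'static str, usize)], size: usize) -> Option<&'static str> {
    candidates
        .iter()
        .find(|&&(_, candidate_size)| candidate_size == size)
        .map(|&(name, _)| name)
}

/// Step 1: equivalent C type of a Rust integer type of `size` bytes.
pub fn equivalent_c_integer(size: usize, target: &CTarget) -> Option<&'static str> {
    first_with_size(&target.integers, size)
}

/// Step 2: equivalent C type of a Rust floating-point type of `size` bytes.
pub fn equivalent_c_float(size: usize, target: &CTarget) -> Option<&'static str> {
    first_with_size(&target.floats, size)
}
```

Under these tables, an 8-byte integer such as u64 maps to long on LP64 and to long long on LLP64, while a 16-byte integer has no match in step 1 and falls through to the __int128 rule in step 4.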
