- Feature Name: repr_simple
- Start Date: 2021-02-15
- RFC PR: rust-lang/rfcs#0000
- Rust Issue: rust-lang/rust#0000
This RFC

- clarifies the behavior of repr(C),
- defines the term equivalent C type,
- soft-deprecates repr(packed) on repr(C) types,
- adds repr(pragma_pack) on repr(C) types, and
- adds repr(simple), whose behavior is the behavior of repr(C) today.
This is a breaking change because

- it changes the layout of certain repr(C) types on certain Windows targets, and
- it removes support for repr(C, align) enums.
repr(C) was originally conceived to give structs the same layout as the equivalent struct in C. In 2017, language was added to the reference that specifies the behavior of repr(C) in terms of algorithms. This change was made without an RFC, and the behavior described by these algorithms is not the behavior of MSVC. This violates the original intent of repr(C).
Since then, these algorithms have also found their way into the standard library in the form of the Layout::extend function, which makes guarantees about the layout of repr(C) structs.
The changes in this RFC restore the original design of repr(C), increasing the portability of repr(C) and allowing us to fix the behavior of repr(C) on MSVC targets. For users that rely on the precise layout currently described in the reference, the RFC provides an upgrade path in the form of repr(simple).
This section describes the changes in terms of changes to the reference and standard library.
Changes to Layout::extend
All mentions of repr(C) in the doc comment are replaced by repr(simple).
Changes to the current The C Representation section
The section The C Representation in the reference is renamed to The simple Representation. The following changes are made to its body:
The C representation is designed for dual purposes. One purpose is for creating types that are interoperable with the C Language. The second purpose is to create types that you can soundly perform operations on that rely on data layout such as reinterpreting values as a different type.
Because of this dual purpose, it is possible to create types that are not useful for interfacing with the C programming language.

The layout of a container type annotated with repr(simple) depends only on the layout of its fields, not on the target platform or the Rust version.

This representation can be applied to structs, unions, and enums.
The exception is zero-variant enums for which the C representation is an error.
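Because the simple layout depends only on the field layouts, it can be computed with Layout::extend, whose documentation is what this RFC rewords. The sketch below is an illustration, not the reference algorithm itself: struct_layout and Example are hypothetical names, and Example uses today's repr(C) because rustc does not accept the proposed repr(simple).

```rust
use std::alloc::Layout;

/// Computes the layout of a struct from the layouts of its fields, in
/// declaration order, using `Layout::extend`. Returns the overall layout
/// and the offset of each field.
fn struct_layout(fields: &[Layout]) -> (Layout, Vec<usize>) {
    // Start from an empty, 1-aligned layout.
    let mut layout = Layout::from_size_align(0, 1).unwrap();
    let mut offsets = Vec::with_capacity(fields.len());
    for &field in fields {
        // `extend` pads `layout` so that `field` is suitably aligned and
        // returns the combined layout plus the offset of `field`.
        let (combined, offset) = layout.extend(field).unwrap();
        layout = combined;
        offsets.push(offset);
    }
    // Trailing padding: round the size up to a multiple of the alignment.
    (layout.pad_to_align(), offsets)
}

/// Hypothetical example type. Under this RFC the same layout would be
/// requested with repr(simple), which does not exist in rustc today.
#[repr(C)]
#[allow(dead_code)]
struct Example {
    a: u8,
    b: u32,
    c: u16,
}
```

On common targets where u32 is 4-byte aligned, the fields land at offsets 0, 4, and 8, and the size is padded from 10 to 12, matching size_of::<Example>().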
...
Note: This algorithm can produce zero-sized structs. In C, an empty struct declaration like struct Foo { } is illegal. However, both gcc and clang support options to enable such structs, and assign them size zero. C++, in contrast, gives empty structs a size of 1, unless they are inherited from or they are fields that have the [[no_unique_address]] attribute, in which case they do not increase the overall size of the struct.
A union declared with #[repr(C)] will have the same size and alignment as an equivalent C union declaration in the C language for the target platform. The union

A union declared with #[repr(simple)] will have a size of the maximum size of all of its fields rounded to its alignment, and an alignment of the maximum alignment of all of its fields. These maximums may come from different fields.
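The union rule can be checked against today's repr(C) unions, which the RFC renames to repr(simple). In this sketch (Mixed and union_layout_rule are hypothetical names), the maximum size comes from one field and the maximum alignment from another, so rounding is visible.

```rust
use std::mem::{align_of, size_of};

/// The largest field (5 bytes) is not a multiple of the largest
/// alignment (that of u32), so the size must be rounded up.
#[repr(C)]
#[allow(dead_code)]
union Mixed {
    byte: u8,
    bytes: [u8; 5],
    word: u32,
}

/// The rule stated above, given (size, align) pairs for each field:
/// size = max field size rounded up to the max field alignment,
/// align = max field alignment.
fn union_layout_rule(fields: &[(usize, usize)]) -> (usize, usize) {
    let max_size = fields.iter().map(|&(size, _)| size).max().unwrap_or(0);
    let max_align = fields.iter().map(|&(_, align)| align).max().unwrap_or(1);
    // Round up to a multiple of max_align.
    let size = (max_size + max_align - 1) / max_align * max_align;
    (size, max_align)
}

/// What rustc actually computes for `Mixed` on the current target.
fn mixed_actual() -> (usize, usize) {
    (size_of::<Mixed>(), align_of::<Mixed>())
}
```

On targets where u32 is 4-byte aligned, both the rule and rustc give a size of 8 and an alignment of 4.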
...
For [field-less enums], the C representation has the size and alignment of the default enum size and alignment for the target platform's C ABI.

For [field-less enums], the layout of the simple representation depends on the range of the discriminants. The layout is chosen from the layouts of the following types, choosing the first type that can represent all discriminants:
- i32, u32
- i64, u64
- i128, u128
NOTE: While repr(C) on field-less enums was designed to use the representation used by C, the current implementation always uses i32 as the base type.
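The discriminant-range rule can be sketched as a small function. This is an illustration under stated assumptions: simple_enum_size and Small are hypothetical names, discriminants are passed as i128 (so u128 values above i128::MAX are not modelled), and each bullet is read as one layout class, since i32 and u32 (and so on) share a size and alignment.

```rust
use std::mem::size_of;

/// Field-less enum under today's repr(C); per the NOTE above, rustc
/// currently gives it a 32-bit base type on common targets.
#[repr(C)]
#[allow(dead_code)]
enum Small {
    A,
    B,
    C,
}

fn small_size() -> usize {
    size_of::<Small>()
}

/// Byte size of the layout class a field-less repr(simple) enum would get:
/// the first of {i32, u32}, {i64, u64}, {i128, u128} that can represent
/// every discriminant.
fn simple_enum_size(discriminants: &[i128]) -> usize {
    let (min, max) = match (discriminants.iter().min(), discriminants.iter().max()) {
        (Some(&min), Some(&max)) => (min, max),
        // Zero-variant enums are an error for this representation.
        _ => panic!("zero-variant enums are not supported"),
    };
    // (signed min, signed max, unsigned max, size in bytes) per class.
    let classes: [(i128, i128, i128, usize); 3] = [
        (i32::MIN as i128, i32::MAX as i128, u32::MAX as i128, 4),
        (i64::MIN as i128, i64::MAX as i128, u64::MAX as i128, 8),
        (i128::MIN, i128::MAX, i128::MAX, 16),
    ];
    for (smin, smax, umax, size) in classes {
        let fits_signed = smin <= min && max <= smax;
        let fits_unsigned = 0 <= min && max <= umax;
        if fits_signed || fits_unsigned {
            return size;
        }
    }
    unreachable!("every i128 value fits in the i128 class")
}
```

For example, discriminants 0 and u32::MAX still fit the 32-bit class via u32, while -1 together with u32::MAX needs the 64-bit class.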
Note: The enum representation in C is implementation defined, so this is really a "best guess". In particular, this may be incorrect when the C code of interest is compiled with certain flags.
Warning: There are crucial differences between an enum in the C language and Rust's [field-less enums] with this representation. An enum in C is mostly a typedef plus some named constants; in other words, an object of an enum type can hold any integer value. For example, this is often used for bitflags in C. In contrast, Rust’s [field-less enums] can only legally hold the discriminant values; everything else is [undefined behavior]. Therefore, using a field-less enum in FFI to model a C enum is often wrong.

The representation of a repr(simple) (previously repr(C)) enum with fields is a repr(simple) (previously repr(C)) struct with two fields, also called a "tagged union" in C:
- a repr(simple) version of the enum with all fields removed ("the tag")
- a repr(simple) union of repr(simple) structs for the fields of each variant that had them ("the payload")

Note: Due to the representation of repr(simple) (previously repr(C)) structs and unions, if a variant has a single field there is no difference between putting that field directly in the union or wrapping it in a struct; any system which wishes to manipulate such an enum's representation may therefore use whichever form is more convenient or consistent for them.
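The tag-plus-payload rule can be spelled out by hand. The sketch below uses today's repr(C), which the RFC renames repr(simple); all type and function names are hypothetical. ShapeRepr is built exactly as described: a field-less tag enum followed by a union holding one struct per variant that has fields (Empty gets none).

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
#[allow(dead_code)]
enum Shape {
    Circle(f32),
    Rect { w: f32, h: f32 },
    Empty,
}

// "The tag": the enum with all fields removed.
#[repr(C)]
#[allow(dead_code)]
enum ShapeTag {
    Circle,
    Rect,
    Empty,
}

#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct CircleFields(f32);

#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct RectFields {
    w: f32,
    h: f32,
}

// "The payload": one struct per variant that has fields.
#[repr(C)]
union ShapePayload {
    circle: CircleFields,
    rect: RectFields,
}

#[repr(C)]
struct ShapeRepr {
    tag: ShapeTag,
    payload: ShapePayload,
}

fn same_layout() -> bool {
    size_of::<Shape>() == size_of::<ShapeRepr>()
        && align_of::<Shape>() == align_of::<ShapeRepr>()
}

/// Reads the height of a `Rect` through the explicit tagged-union form.
fn rect_height_via_repr(shape: &Shape) -> Option<f32> {
    // SAFETY: a repr(C) enum with fields has the layout of `ShapeRepr`.
    let repr = unsafe { &*(shape as *const Shape as *const ShapeRepr) };
    match repr.tag {
        // SAFETY: the tag says the `rect` payload field is active.
        ShapeTag::Rect => Some(unsafe { repr.payload.rect.h }),
        _ => None,
    }
}
```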
...
NOTE: At the end of the body are examples. In these, all mentions of repr(C) are replaced by repr(simple) and references to C and C++ are removed.
A new repr(C) section is added in place of the old one. The body text is as follows:
The C representation is designed for interoperability with the C language, including various vendor extensions. The layout of a type annotated with repr(C) is defined in terms of its equivalent C type and the target the program is compiled for. For this purpose, for each target, we define a normative compiler whose layout algorithm will be used to determine the layout of the equivalent C type. Here, layout means the size and alignment of the type and, for record types, the offset and size of its fields.

NOTE: GCC, Clang, and MSVC allow the user to specify flags on the command line that change the default layout of types. Such custom default representations are currently not supported. (For example: -fshort-enums and /Zp1.)

Whether the equivalent C type exists depends on the normative compiler. The algorithm used to compute the equivalent C type is described in Appendix B. If a Rust type annotated with repr(C) does not have an equivalent C type, then the layout of the Rust type is unspecified.

This representation can be applied to structs, unions, and enums.
Warning: There are crucial differences between an enum in the C language and Rust's [field-less enums] with this representation. An enum in C is mostly a typedef plus some named constants; in other words, an object of an enum type can hold any integer value. For example, this is often used for bitflags in C. In contrast, Rust’s [field-less enums] can only legally hold the discriminant values; everything else is [undefined behavior]. Therefore, using a field-less enum in FFI to model a C enum is often wrong.
On structs and unions, the C representation can be combined with the following alignment modifiers:

- repr(align(N)): This is equivalent to __declspec(align(N)) and __attribute__((aligned(N))) in C.
- repr(pragma_pack(N)): This is equivalent to #pragma pack(N) in C.
- repr(packed): This is an alias for repr(pragma_pack(1)).
- repr(packed(N)): This is an alias for repr(pragma_pack(N)).

NOTE: repr(packed) and repr(packed(N)) on repr(C) types are deprecated. A future version of rustc will emit a warning when these annotations are used together.
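Since repr(pragma_pack(N)) is only proposed, the sketch below uses today's repr(C, packed(2)), which the list above defines as an alias for it, plus repr(C, align(8)). The type names are hypothetical; the expected numbers assume a common target where u32 is 4-byte aligned.

```rust
use std::mem::{align_of, size_of};

/// repr(C, packed(2)): intended to match `#pragma pack(2)` in C, i.e.
/// every field's alignment is capped at 2 bytes.
#[repr(C, packed(2))]
#[allow(dead_code)]
struct Packed2 {
    a: u8,  // offset 0
    b: u32, // natural alignment 4, capped to 2, so offset 2
}

/// repr(C, align(8)): intended to match `__declspec(align(8))` /
/// `__attribute__((aligned(8)))`, raising the struct's alignment to 8.
#[repr(C, align(8))]
#[allow(dead_code)]
struct Aligned8 {
    a: u8,
}

/// (size, align) of each example type on the current target.
fn layouts() -> [(usize, usize); 2] {
    [
        (size_of::<Packed2>(), align_of::<Packed2>()),
        (size_of::<Aligned8>(), align_of::<Aligned8>()),
    ]
}
```

On such targets this yields a size of 6 and alignment of 2 for Packed2, and a size and alignment of 8 for Aligned8.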
Changes to the Primitive representations section
All mentions of repr(C) in this section are replaced by repr(simple). References to C and C++ are removed.
Changes to the The alignment modifiers section
The following paragraph is added:
This section describes the behavior for representations other than the C representation. For the behavior of these modifiers for repr(C) types, see the section describing the C representation.
Changes to the The transparent Representation section
All mentions of repr(C) in this section are replaced by repr(simple).
TODO: Almost everything is already described in the section above. Here we add some details and talk about lints etc.
- This changes the layout of repr(C) types on some Windows targets.
- This adds an additional type representation.
- This adds a new alignment modifier on repr(C) types that behaves exactly like one of the existing modifiers.
- This defines the layout of repr(C) types in terms of specific C compiler implementations.
TODO

- In Clang and GCC there are two kinds of packing attributes:
  - __attribute__((packed))
  - #pragma pack

  These behave significantly differently. In anticipation of adding support for __attribute__((packed)) in the future, we deprecate the ambiguously named repr(packed).
TODO
TODO
TODO
This appendix contains a map from Rust targets to their normative C compilers.

Targets whose normative compiler is GCC:
aarch64-unknown-linux-gnu
aarch64-unknown-linux-musl
aarch64-wrs-vxworks
arm-unknown-linux-gnueabi
arm-unknown-linux-gnueabihf
arm-unknown-linux-musleabi
arm-unknown-linux-musleabihf
armv4t-unknown-linux-gnueabi
armv5te-unknown-linux-gnueabi
armv5te-unknown-linux-musleabi
armv5te-unknown-linux-uclibceabi
armv7-unknown-linux-gnueabi
armv7-unknown-linux-gnueabihf
armv7-unknown-linux-musleabi
armv7-unknown-linux-musleabihf
armv7-wrs-vxworks-eabihf
avr-unknown-gnu-atmega328
i586-unknown-linux-gnu
i586-unknown-linux-musl
i686-pc-windows-gnu
i686-unknown-linux-gnu
i686-unknown-linux-musl
i686-uwp-windows-gnu
i686-wrs-vxworks
mips64el-unknown-linux-gnuabi64
mips64el-unknown-linux-muslabi64
mips64-unknown-linux-gnuabi64
mips64-unknown-linux-muslabi64
mipsel-unknown-linux-gnu
mipsel-unknown-linux-musl
mipsel-unknown-linux-uclibc
mipsisa32r6el-unknown-linux-gnu
mipsisa32r6-unknown-linux-gnu
mipsisa64r6el-unknown-linux-gnuabi64
mipsisa64r6-unknown-linux-gnuabi64
mips-unknown-linux-gnu
mips-unknown-linux-musl
mips-unknown-linux-uclibc
powerpc64le-unknown-linux-gnu
powerpc64le-unknown-linux-musl
powerpc64-unknown-linux-gnu
powerpc64-unknown-linux-musl
powerpc64-wrs-vxworks
powerpc-unknown-linux-gnu
powerpc-unknown-linux-musl
powerpc-wrs-vxworks
riscv32gc-unknown-linux-gnu
riscv64gc-unknown-linux-gnu
s390x-unknown-linux-gnu
sparc64-unknown-linux-gnu
sparc-unknown-linux-gnu
thumbv7neon-unknown-linux-gnueabihf
thumbv7neon-unknown-linux-musleabihf
x86_64-linux-kernel
x86_64-pc-windows-gnu
x86_64-unknown-linux-gnu
x86_64-unknown-linux-gnux32
x86_64-unknown-linux-musl
x86_64-uwp-windows-gnu
x86_64-wrs-vxworks
Targets whose normative compiler is Clang:
aarch64-apple-darwin
aarch64-apple-ios
aarch64-apple-ios-macabi
aarch64-apple-tvos
aarch64-fuchsia
aarch64-linux-android
aarch64-unknown-freebsd
aarch64-unknown-hermit
aarch64-unknown-netbsd
aarch64-unknown-none
aarch64-unknown-none-softfloat
aarch64-unknown-openbsd
aarch64-unknown-redox
armebv7r-none-eabi
armebv7r-none-eabihf
arm-linux-androideabi
armv6-unknown-freebsd
armv6-unknown-netbsd-eabihf
armv7a-none-eabi
armv7a-none-eabihf
armv7-apple-ios
armv7-linux-androideabi
armv7r-none-eabi
armv7r-none-eabihf
armv7s-apple-ios
armv7-unknown-freebsd
armv7-unknown-netbsd-eabihf
asmjs-unknown-emscripten
hexagon-unknown-linux-musl
i386-apple-ios
i686-apple-darwin
i686-linux-android
i686-unknown-freebsd
i686-unknown-haiku
i686-unknown-netbsd
i686-unknown-openbsd
mipsel-sony-psp
mipsel-unknown-none
msp430-none-elf
powerpc64-unknown-freebsd
powerpc-unknown-linux-gnuspe
powerpc-unknown-netbsd
powerpc-wrs-vxworks-spe
riscv32imac-unknown-none-elf
riscv32imc-unknown-none-elf
riscv32i-unknown-none-elf
riscv64gc-unknown-none-elf
riscv64imac-unknown-none-elf
sparc64-unknown-netbsd
sparc64-unknown-openbsd
sparcv9-sun-solaris
thumbv4t-none-eabi
thumbv6m-none-eabi
thumbv7em-none-eabi
thumbv7em-none-eabihf
thumbv7m-none-eabi
thumbv7neon-linux-androideabi
thumbv8m.base-none-eabi
thumbv8m.main-none-eabi
thumbv8m.main-none-eabihf
wasm32-unknown-emscripten
wasm32-unknown-unknown
wasm32-wasi
x86_64-apple-darwin
x86_64-apple-ios
x86_64-apple-ios-macabi
x86_64-apple-tvos
x86_64-fortanix-unknown-sgx
x86_64-fuchsia
x86_64-linux-android
x86_64-pc-solaris
x86_64-rumprun-netbsd
x86_64-sun-solaris
x86_64-unknown-dragonfly
x86_64-unknown-freebsd
x86_64-unknown-haiku
x86_64-unknown-hermit
x86_64-unknown-hermit-kernel
x86_64-unknown-illumos
x86_64-unknown-l4re-uclibc
x86_64-unknown-netbsd
x86_64-unknown-openbsd
x86_64-unknown-redox
Targets whose normative compiler is MSVC:
aarch64-pc-windows-msvc
aarch64-uwp-windows-msvc
i586-pc-windows-msvc
i686-pc-windows-msvc
i686-unknown-uefi
i686-uwp-windows-msvc
thumbv7a-pc-windows-msvc
thumbv7a-uwp-windows-msvc
x86_64-pc-windows-msvc
x86_64-unknown-uefi
x86_64-uwp-windows-msvc
This appendix describes the algorithm used to compute the equivalent C type of a Rust type. The equivalent C type is only used for layout computations.
Let R be a Rust type.

- If R is one of u8, i8, u16, i16, u32, i32, u64, i64, isize, or usize, then the equivalent C type is the first of the following C types that has the same size as R: char, short, int, long, long long.

  NOTE: On all targets supported by Rust, if two of these C types have the same size, then they also have the same alignment.
- If R is f32 or f64, then the equivalent C type is the first of the following C types that has the same size as R: float, double, long double.

  NOTE: On avr-unknown-unknown, double is a 32-bit type.
- If R is bool, then the equivalent C type is _Bool.
- If R is u128 or i128 and the __int128 type is supported by the normative compiler, then the equivalent C type is __int128.
- If R is a pointer to a sized type, a reference to a sized type, a function pointer, an Option of a reference to a sized type, or an Option of a function pointer, then the equivalent C type is void *.
- If R is an array whose element type has an equivalent C type, then the equivalent C type of R is the array with the same number of elements whose element type is the equivalent C type of R's element type.

  NOTE: In positions where the normative compilers accept both zero-sized array members and flexible array members, the containing records have the same layout.
- If R is a fieldless enum with a repr(C) annotation and without a repr(align) annotation:
  - Let E be the C enum with the same enumeration constants and constant expressions. (This is a purely syntactic construction, as the resulting syntax need not be accepted by the normative compiler.)
  - If E is accepted by the normative compiler and the enumeration constants are the same as the discriminant values of R, then E is the equivalent C type.
  - Otherwise R has no equivalent C type.

  NOTE: MSVC truncates enumeration constants to int.
- If R is an enum with fields with a repr(C) annotation and without a repr(align) annotation:
  - Let F be the equivalent fieldless Rust enum.
  - Let U be the union containing one field for each variant of the enum that has at least one field. The type of the field is the tuple or braced struct having the same fields as the enum variant body.
  - Let E be the struct struct { d: F, u: U }.
  - The equivalent C type of the enum is the equivalent C type of E, if any.

  NOTE: There are no fields in U for variants without fields because the normative compiler might not accept structs without fields. Since R is an enum with fields, U itself has at least one field.

  NOTE: It is significant that the discriminant is stored outside the union. This can affect the layout of the overall type.
- If R is a tuple struct with a repr(C) annotation:
  - Let S be the braced struct with the same number of fields whose fields are called field1, field2, etc. and whose field types are the types of the fields in the tuple struct.
  - The equivalent C type is the equivalent C type of S, if any.
- If R is a braced struct, a unit-like struct, or a union with a repr(C) annotation:
  - If one of the types of the fields does not have an equivalent C type, then R has no equivalent C type.
  - Let S be the C struct or union (depending on the kind of R) with the same field names and equivalent C field types. (This is a purely syntactic construction, as the resulting syntax need not be accepted by the normative compiler.)
  - If R is annotated with repr(align(N)):
    - If the normative compiler is MSVC, annotate S with __declspec(align(N)).
    - If the normative compiler is GCC or Clang, annotate S with __attribute__((aligned(N))).
  - If R is annotated with repr(pragma_pack(N)), annotate S with #pragma pack(N).
  - If S is accepted by the normative compiler, then S is the equivalent C type of R.
- Otherwise R has no equivalent C type.
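The scalar rules above pick the first C type of matching size, so the result depends on the target's C data model. The sketch below illustrates only the integer and floating-point rules; the CTypeSizes struct, the helper functions, and the two data models are hypothetical, with sizes assumed to be typical for x86_64 Linux (LP64) and x86_64 Windows (LLP64).

```rust
/// Sizes in bytes of the C types consulted by the scalar rules.
/// Values are supplied by the caller; the RFC does not define this struct.
struct CTypeSizes {
    c_char: usize,
    c_short: usize,
    c_int: usize,
    c_long: usize,
    c_long_long: usize,
    c_float: usize,
    c_double: usize,
    c_long_double: usize,
}

/// Assumed LP64 sizes (e.g. x86_64 Linux).
const LP64: CTypeSizes = CTypeSizes {
    c_char: 1, c_short: 2, c_int: 4, c_long: 8, c_long_long: 8,
    c_float: 4, c_double: 8, c_long_double: 16,
};

/// Assumed LLP64 sizes (e.g. x86_64 Windows).
const LLP64: CTypeSizes = CTypeSizes {
    c_char: 1, c_short: 2, c_int: 4, c_long: 4, c_long_long: 8,
    c_float: 4, c_double: 8, c_long_double: 8,
};

/// Returns the first candidate whose size equals `size`, in list order.
fn first_with_size(size: usize, candidates: &[(&'static str, usize)]) -> Option<&'static str> {
    candidates.iter().find(|&&(_, s)| s == size).map(|&(name, _)| name)
}

/// Rule for u8..u64, i8..i64, isize, and usize (given the Rust type's size).
fn equivalent_int(size: usize, c: &CTypeSizes) -> Option<&'static str> {
    first_with_size(size, &[
        ("char", c.c_char), ("short", c.c_short), ("int", c.c_int),
        ("long", c.c_long), ("long long", c.c_long_long),
    ])
}

/// Rule for f32 and f64 (given the Rust type's size).
fn equivalent_float(size: usize, c: &CTypeSizes) -> Option<&'static str> {
    first_with_size(size, &[
        ("float", c.c_float), ("double", c.c_double), ("long double", c.c_long_double),
    ])
}
```

Under these assumed models, i64 maps to long on LP64 but to long long on LLP64; by the NOTE on the integer rule, both choices have the same layout.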