@shafik
Last active June 29, 2021 14:45
How to use bit_cast to type pun an unsigned char array

In C++20 we will hopefully get bit_cast; see the proposal and the reference implementation. This utility should give us a simple and safe way to type pun.

The one issue I ran into with this utility is that it requires the To and From types to be the same size, and it also checks that both types are trivially copyable. The static_assert version of the check is as follows:

# define BIT_CAST_STATIC_ASSERTS(TO, FROM) do {                         \
    static_assert(sizeof(TO) == sizeof(FROM));                          \
    static_assert(std::is_trivially_copyable<TO>::value);               \
    static_assert(std::is_trivially_copyable<FROM>::value);             \
} while (false)
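To make the checks concrete, here is a minimal sketch of a memcpy-based function template that applies them. The `sketch` namespace and the extra default-constructibility check are assumptions of this example, not part of the proposal, and the actual reference implementation may differ in its details.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for the proposed bit_cast: performs the same three
// checks as the macro, then copies the object representation with memcpy.
template <typename To, typename From>
To bit_cast(const From& from) noexcept {
    static_assert(sizeof(To) == sizeof(From),
                  "To and From must have the same size");
    static_assert(std::is_trivially_copyable<To>::value,
                  "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value,
                  "From must be trivially copyable");
    // Extra requirement of this sketch only: we need a To object to copy into.
    static_assert(std::is_default_constructible<To>::value,
                  "this sketch needs a default-constructible To");

    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

}  // namespace sketch
```

With this in place, `sketch::bit_cast<std::uint32_t>(1.5f)` compiles when `float` is 32 bits wide, while `sketch::bit_cast<std::uint64_t>(1.5f)` is rejected at compile time by the size check.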

This is not an unreasonable constraint but there may be cases where we would like to, let's say, type pun an array of char into a primitive type like unsigned int.

I discussed this with JF Bastien, who wrote both the proposal and the reference implementation. One way around this restriction is to copy the chunk we want to pun into a struct with the same size as the primitive we are punning to. Let's see how this would work.

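#include <cstring>  // std::memcpy

// Wrapper with the same size as unsigned int (assumed to be 4 bytes here).
struct four_chars {
    unsigned char arr[4] = {} ;
} ;

unsigned int foo( unsigned char *p ) {
    four_chars f ;
    std::memcpy( f.arr, p, 4) ;  // copy the chunk we want to pun
    unsigned int result = bit_cast<unsigned int>(f) ;  // sizes match, so the static_asserts pass

    return result ;
}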
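The same trick can be generalized so the wrapper size follows the target type instead of being hard-coded to 4. The sketch below is only an illustration: `byte_block`, `load_as`, and the memcpy-based `sketch::bit_cast` stand-in are hypothetical names introduced here, not part of the proposal.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

namespace sketch {

// Generalization of four_chars: a wrapper holding exactly N bytes.
template <std::size_t N>
struct byte_block {
    unsigned char arr[N];
};

// Compact memcpy-based stand-in for the proposed bit_cast.
template <typename To, typename From>
To bit_cast(const From& from) noexcept {
    static_assert(sizeof(To) == sizeof(From), "size mismatch");
    static_assert(std::is_trivially_copyable<To>::value, "To not trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value, "From not trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Reads sizeof(T) bytes starting at p and puns them to T.
// The caller must guarantee that at least sizeof(T) bytes are readable.
template <typename T>
T load_as(const unsigned char* p) noexcept {
    byte_block<sizeof(T)> block;
    std::memcpy(block.arr, p, sizeof(T));
    return bit_cast<T>(block);
}

}  // namespace sketch
```

A call such as `sketch::load_as<unsigned int>(p)` then plays the role of `foo(p)`. The resulting value depends on the byte order of the target, as with any type pun of raw bytes.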

What is great about this case is that the optimizer is smart enough to recognize that the memcpy and bit_cast can be reduced to a single mov directly into a register; see godbolt:

foo(unsigned char*): # @foo(unsigned char*)
mov eax, dword ptr [rdi]
ret

constexpr bit_cast

It is worth pointing out an interesting aspect of the constexpr case. First, this requires compiler support, likely via a builtin, since memcpy() is not marked constexpr and reinterpret_cast is not allowed in a constant expression.

This mainly works since the underlying assumption is that the type puns allowed by bit_cast can be implemented as a mov to a register. This feature is interesting because it now allows type punning at compile time. This also means no undefined behavior since undefined behavior is not allowed in a constant expression and we expect any attempt to invoke UB will be caught at compile time.
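As a rough illustration of compile-time punning, the sketch below assumes a compiler that exposes the `__builtin_bit_cast` builtin (recent GCC, Clang and MSVC releases do). The `sketch` namespace, `four_bytes`, and `all_ab` are hypothetical names for this example. The bytes are all identical so that the expected value does not depend on byte order.

```cpp
#include <cstdint>

namespace sketch {

// Assumption: the compiler provides __builtin_bit_cast. The builtin itself
// rejects mismatched sizes and types that are not trivially copyable.
template <typename To, typename From>
constexpr To bit_cast(const From& from) noexcept {
    return __builtin_bit_cast(To, from);
}

// Same idea as four_chars, without the default member initializer.
struct four_bytes {
    unsigned char arr[4];
};

// Four identical bytes, so the punned value is the same on any byte order.
constexpr four_bytes all_ab{{0xAB, 0xAB, 0xAB, 0xAB}};

// The pun is evaluated entirely at compile time.
static_assert(bit_cast<std::uint32_t>(all_ab) == 0xABABABABu,
              "four 0xAB bytes pun to 0xABABABAB");

}  // namespace sketch
```

If the cast had undefined behavior, the `static_assert` would not be a valid constant expression and the program would fail to compile instead of misbehaving at run time.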

Undefined behavior could pop up in a bit_cast from a value whose underlying representation has no corresponding value in the To type. For example, the standard does not specify the underlying representation of bool, therefore a bit_cast to bool could invoke undefined behavior, e.g.:

bool b = bit_cast<bool>('a') ; // UBsan catches this case: https://wandbox.org/permlink/P7hlo7AZDx2t0PoY
                               // runtime error: load of value 97, which is not a valid value for type 'bool'
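One way to sidestep this case is to check the byte against the representations of `true` and `false` before producing a bool, so an invalid bool is never created. This is only a sketch under stated assumptions: `representation_of` and `try_pun_to_bool` are hypothetical helpers, bool is assumed to occupy one byte, and only the representations the implementation itself produces for `true` and `false` are accepted.

```cpp
#include <cstring>

namespace sketch {

static_assert(sizeof(bool) == sizeof(unsigned char),
              "this sketch assumes a one-byte bool");

// Reads the object representation of a valid bool through memcpy,
// which is well defined for any trivially copyable type.
inline unsigned char representation_of(bool value) noexcept {
    unsigned char byte;
    std::memcpy(&byte, &value, sizeof byte);
    return byte;
}

// Sets `out` and returns true only if `byte` matches the representation of
// true or false; otherwise leaves `out` untouched and returns false.
inline bool try_pun_to_bool(unsigned char byte, bool& out) noexcept {
    if (byte == representation_of(true)) {
        out = true;
        return true;
    }
    if (byte == representation_of(false)) {
        out = false;
        return true;
    }
    return false;
}

}  // namespace sketch
```

On mainstream implementations, calling `sketch::try_pun_to_bool('a', b)` returns false rather than loading the value 97 into a bool.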

We also have unspecified behavior for cases where the To type could have multiple possible values for a given From value, e.g.:

bit_cast<char>(true) // We are not guaranteed any specific value here.
                     // Although we may have certain expectations.
bit_cast<uintptr_t>(nullptr)  // We would expect zero but it is not specified what the underlying value is

I posted this write-up and a poll on type punning of char arrays on Twitter and some of the responses are interesting:
