Working around offsetof limitations in C++

Sometimes one needs the relative offset of a structure field. Common examples include serialization frameworks, which need to know where each field of an object lives, and vertex attribute descriptions for rendering APIs (D3D, GL, etc.).

The most common technique for getting this information is through the offsetof macro defined in stddef.h. Unfortunately using the macro in C++ comes with a new set of restrictions that prevent some (subjectively valid) uses of it.

Intuition would have you believe that simple classes/structures built up through inheritance are standard-layout, and that offsetof therefore works on them. However, when a base class has data members and the derived class adds data members of its own, the derived class is not considered standard-layout. Presumably the reason for this is that C++ wants to be able to permit very unusual layouts where a given inherited class might not have the same offset in all instances?

As a result, a stricter compiler will emit a diagnostic for code like this:

struct vec2 { float x, y; };
struct vec3 : vec2 { float z; };
struct vec4 : vec3 { float w; };

size_t y_offset = offsetof(vec4, y);

Even though, on every common ABI, vec4 has the following memory layout:

  0   4   8   C
| x | y | z | w |
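
The standard-layout claim can be checked directly with the std::is_standard_layout trait from <type_traits>. The following is a minimal sketch that reuses the vec2/vec3/vec4 definitions from above; the static_assert messages are illustrative only.

```cpp
#include <type_traits>

struct vec2 { float x, y; };
struct vec3 : vec2 { float z; };
struct vec4 : vec3 { float w; };

// vec2 declares all of its data members in a single class, so it is
// standard-layout and offsetof is guaranteed to work on it.
static_assert(std::is_standard_layout<vec2>::value,
              "vec2 is standard-layout");

// vec3 and vec4 spread their data members over several classes of the
// hierarchy, so the trait reports false for both of them.
static_assert(!std::is_standard_layout<vec3>::value,
              "vec3 is not standard-layout");
static_assert(!std::is_standard_layout<vec4>::value,
              "vec4 is not standard-layout");
```

A static_assert like this is also a cheap way to guard existing offsetof calls against a type silently losing its standard-layout status.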

This becomes a problem, and it often pushes people toward illegal code such as the following technique, which is present in many game engines:

vec4 *v = 0;
size_t y_offset = size_t(&v->y); // member access through a null pointer

Some people would have you believe this is not undefined behavior, since it has long been the common technique employed to implement the offsetof macro. Modern compilers provide an intrinsic instead: GCC and Clang now implement offsetof with __builtin_offsetof, because they have begun to optimize on the assumption that code does not depend on undefined behavior. For more information on why you should no longer write code like that, check out these in-depth analyses of the technique:

https://software.intel.com/en-us/blogs/2015/04/20/null-pointer-dereferencing-causes-undefined-behavior

http://www.viva64.com/en/b/0306/

This leaves us in a tough position, however. How can we work around this limitation without invoking undefined behavior?

One technique would be to create an object on the stack and calculate the offset like so:

vec4 v;
size_t y_offset = size_t(&v.y) - size_t(&v);
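
The same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: runtime_offset and vec4_y_offset are hypothetical names that do not appear in the original, and the byte distance is taken through char pointers rather than integer casts.

```cpp
#include <cstddef>

struct vec2 { float x, y; };
struct vec3 : vec2 { float z; };
struct vec4 : vec3 { float w; };

// Byte distance from the start of `object` to `member`. The caller must pass
// a (possibly inherited) data member of that very same object.
template <typename Object, typename Member>
std::size_t runtime_offset(const Object& object, const Member& member) {
    const char* base = reinterpret_cast<const char*>(&object);
    const char* field = reinterpret_cast<const char*>(&member);
    return static_cast<std::size_t>(field - base);
}

// Hypothetical convenience wrapper for the vec4 example above.
inline std::size_t vec4_y_offset() {
    vec4 v = vec4();  // value-initialized; only addresses are used
    return runtime_offset(v, v.y);
}
```

Like the two-line version, this yields the value at run time only, so it cannot be used where a constant expression is required.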

However, this is not a valid analog for offsetof, which yields a constant value and is therefore usable in constant expressions. This means that code like the following:

enum { memory_needed = offsetof(vec4, y) };
unsigned char memory_for_x_and_y[memory_needed];

cannot be written with this technique.
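
For comparison, here is a sketch of the constant-expression use on a type where offsetof is guaranteed to work. The flattened flat_vec4 struct and the names bytes_before_z and storage_for_x_and_y are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>

// Hypothetical flattened type: every member is declared in one class, so the
// type is standard-layout and offsetof is fully supported.
struct flat_vec4 { float x, y, z, w; };

// offsetof yields an integral constant expression, so it can initialize an
// enumerator and size an array.
enum { bytes_before_z = offsetof(flat_vec4, z) };
unsigned char storage_for_x_and_y[bytes_before_z];

// Everything before z is exactly the two floats x and y.
static_assert(sizeof(storage_for_x_and_y) == 2 * sizeof(float),
              "x and y occupy the bytes before z");
```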

One option would be to maintain a set of structures with the same layout but without any inheritance, but that leads to maintenance hell. So instead I decided to explore C++11's constexpr to see if there was a way to accomplish this.

Unfortunately, C++11's constexpr rules are too restrictive to allow a local object in the function, but C++14 relaxed those rules, allowing us to achieve what we want:

template <typename T1, typename T2>
inline constexpr size_t offset_of(T1 T2::*member) {
    constexpr T2 object {};
    return size_t(&(object.*member)) - size_t(&object);
}

This works as long as T2 has a constexpr default constructor. The implicitly declared default constructors of our vec classes are constexpr, so the following works:

y_offset = offset_of(&vec4::y);

However, C++14 is fairly new and I needed a C++11-compatible option which yields constant values. After a little bit of work, I came up with the following solution:

template <typename T1, typename T2>
struct offset_of_impl {
    static T2 object;
    static constexpr size_t offset(T1 T2::*member) {
        return size_t(&(offset_of_impl<T1, T2>::object.*member)) -
               size_t(&offset_of_impl<T1, T2>::object);
    }
};
template <typename T1, typename T2>
T2 offset_of_impl<T1, T2>::object;

template <typename T1, typename T2>
inline constexpr size_t offset_of(T1 T2::*member) {
    return offset_of_impl<T1, T2>::offset(member);
}

Not as pretty, but it gets the job done. The compiler diagnostics for misusing this tend to be legible. If you're feeling a bit zealous, you could use enable_if to check that the type has a default constructor before stamping out the function, but I found that the diagnostics for that were less legible than not having any check at all.
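
The enable_if check mentioned above might look roughly like this. It is a runtime-only sketch, not the constexpr version from the gist: checked_offset_of, with_default and without_default are hypothetical names, and std::addressof plus char pointers are used for the address arithmetic.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// Participates in overload resolution only when T2 is default constructible,
// so misuse is rejected at the call site instead of inside the body.
template <typename T1, typename T2>
typename std::enable_if<std::is_default_constructible<T2>::value,
                        std::size_t>::type
checked_offset_of(T1 T2::*member) {
    static T2 object;  // constructed once, on first call
    const char* base =
        reinterpret_cast<const char*>(std::addressof(object));
    const char* field =
        reinterpret_cast<const char*>(std::addressof(object.*member));
    return static_cast<std::size_t>(field - base);
}

struct with_default { int a; int b; };  // hypothetical test type

struct without_default {                // hypothetical test type
    explicit without_default(int v) : a(v), b(v) {}
    int a;
    int b;
};

// checked_offset_of(&without_default::b) is rejected by the compiler,
// because enable_if removes the only candidate overload.
```

The resulting "no matching function" error is the kind of diagnostic the paragraph above describes as less legible than the plain failure inside the function body.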

Philippe23 commented Dec 23, 2015

Psst.

enum { memory_needed = offsetof(vec4, y) };
unsigned char memory_for_x_and_y[memory_needed];

This only yields enough memory for x; there'll be no room for y. Only 4 bytes will be available, as the offset of y is 4.

pal666 commented Dec 24, 2015

offsetof's behavior is undefined here because the standard does not guarantee that z will lie after y, and no amount of constexpr trickery will change that. If your code depends on members being laid out sequentially, they all have to be defined in one class.

IslamAbdelRahman commented Feb 7, 2017

Thanks! I ended up using something like this:

template <typename T1, typename T2>
inline size_t offset_of(T1 T2::*member) {
  static T2 obj;
  return size_t(&(obj.*member)) - size_t(&obj);
}

EvanBalster commented Sep 21, 2017

Wondering if constructor calls associated with a temporary or static instance could be avoided by declaring a byte or word array and casting that to the necessary object type. In this case the array might get optimized away. Dereferencing a null pointer is undefined behavior, but I'm not sure if the same applies to a non-constructed object (and indeed, this isn't uncommon in C++ programs).

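template <typename T1, typename T2>
inline ptrdiff_t offset_of(T1 T2::*member) {
  // Not constexpr: the static local and the reinterpret_cast rule that out.
  static const char obj_dummy[sizeof(T2)] = {};  // a const array needs an initializer
  const T2 *obj = reinterpret_cast<const T2*>(obj_dummy);  // keep the const qualifier
  return ptrdiff_t(intptr_t(&(obj->*member)) - intptr_t(obj));
}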

It could be sensible to use an array of void* to get the alignment right, but I'm not sure if this matters.

neoq commented Sep 23, 2017

I suggest taking a class sample as argument with the default being a default constructed object:

template <typename member_t, typename T>
constexpr std::size_t offset(member_t T::* p, T sample = T()) {
    return std::size_t(&(sample.*p)) - std::size_t(&sample);
}

That way, if a class does not have a constexpr default constructor (for example a trivial class), you can pass the function an object that you constructed as constexpr by other means.
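
A usage sketch for a class without a default constructor follows. It is deliberately runtime-only (no constexpr), and the names offset_with_sample, needs_argument and second_offset are hypothetical; char pointers replace the size_t casts.

```cpp
#include <cstddef>
#include <memory>

// Same shape as the function above, minus constexpr. The default argument is
// only instantiated when a caller actually omits the sample.
template <typename member_t, typename T>
std::size_t offset_with_sample(member_t T::*p, T sample = T()) {
    const char* base =
        reinterpret_cast<const char*>(std::addressof(sample));
    const char* field =
        reinterpret_cast<const char*>(std::addressof(sample.*p));
    return static_cast<std::size_t>(field - base);
}

// Hypothetical class that cannot be default constructed.
struct needs_argument {
    explicit needs_argument(int seed) : first(seed), second(seed) {}
    int first;
    int second;
};

// The caller supplies a sample built through the only available constructor.
inline std::size_t second_offset() {
    return offset_with_sample(&needs_argument::second, needs_argument(0));
}
```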

EvanBalster commented Oct 11, 2017

The disadvantage of using a "class sample" (including the static instances suggested earlier) is that it makes this technique unusable or problematic when constructing or destructing a class has significant overhead or side effects. Still curious if the char-buffer technique is safe from undefined behavior... Might need an alignas or something?

RossBencina commented Apr 11, 2018

Greetings friends :),

First some bad news:

reinterpret_cast is not legal in constant expressions. To be clear, a cast such as size_t(...) applied to a pointer is a reinterpret_cast. @graphitemaster's code above compiles due to a known bug in gcc, see: https://stackoverflow.com/a/24400015/2013747 It does not compile in clang 5.0.0.

And a minor point: operator& could be overloaded, so std::addressof should be used.
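
To illustrate the operator& point, here is a small runtime sketch. The handle and holder types, and the functions offset_via_addressof and ampersand_is_hijacked, are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical member type whose operator& deliberately returns null.
struct handle {
    int id;
    handle* operator&() { return nullptr; }
    const handle* operator&() const { return nullptr; }
};

struct holder {
    int tag;
    handle h;
};

// std::addressof bypasses any overloaded operator&, so the arithmetic is
// done on the real addresses.
template <typename T1, typename T2>
std::size_t offset_via_addressof(T1 T2::*member) {
    static T2 object;
    const char* base =
        reinterpret_cast<const char*>(std::addressof(object));
    const char* field =
        reinterpret_cast<const char*>(std::addressof(object.*member));
    return static_cast<std::size_t>(field - base);
}

// True when the built-in-looking & is redirected to the overload while
// std::addressof still finds the object.
inline bool ampersand_is_hijacked() {
    holder obj = holder();
    return &obj.h == nullptr && std::addressof(obj.h) != nullptr;
}
```

With a plain & in place of std::addressof, the subtraction for &holder::h would start from a null pointer and produce a meaningless result.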

Below is my (rather hairy) attempt at a solution. To avoid the "class sample," we can use a union. In C++11 unions can contain members with non-trivial ctors, and you can specify that the default constructs an alternative, trivial, member. (Then the question remains: is it legal to do address computations on the non-active members of a union?). This version also avoids use of reinterpret_cast. [EDIT: but unfortunately, it is not actually constexpr either, at least not in C++11. gcc does compile it in C++17 mode. clang does not]

#include <iostream>
#include <cstdint>
#include <memory>

// version of the gist at: https://gist.github.com/graphitemaster/494f21190bb2c63c5516
// original version by graphitemaster

template <typename T1, typename T2>
struct offset_of_impl {
    union U {
        char c;
        T1 m; // instance of type of member
        T2 object;
        constexpr U() : c(0) {} // make c the active member
    };
    static constexpr U u = {};
    
    static constexpr std::ptrdiff_t offset(T1 T2::*member) {
        // The following avoids use of reinterpret_cast, so is constexpr.
        // The subtraction gives the correct offset because the union layout rules guarantee that all
        // union members have the same starting address. 
        // On the other hand, it will break if object.*member is not aligned.
        // Possible problem: it uses std::addressof on non-active union members.
        // Please let us know at the gist if this is defined or undefined behavior.
        // [EDIT: it is undefined, but for the following reason: expr.add-5.sentence-2
        // "If the expressions P and Q point to, respectively, elements x[i] and x[j] of 
        // the same array object x, the expression P - Q has the value i - j; otherwise, the behavior is undefined."]
        return (std::addressof(offset_of_impl<T1, T2>::u.object.*member) - 
               std::addressof(offset_of_impl<T1, T2>::u.m)) * sizeof(T1);
    }
};

template <typename T1, typename T2>
constexpr typename offset_of_impl<T1, T2>::U offset_of_impl<T1, T2>::u;

template <typename T1, typename T2>
inline constexpr std::ptrdiff_t offset_of(T1 T2::*member) {
    return offset_of_impl<T1, T2>::offset(member);
}

struct S {
    S(int a_, int b_, int c_) : a(a_), b(b_), c(c_) {}
    S() = delete;
    int a;
    int b;
    int c;
};

int main()
{
    constexpr auto x = offset_of(&S::b);   
    std::cout << x;   
}

Here's a code sandbox to play with: https://wandbox.org/permlink/SVcjVI7k6ezgcUgs

One problem is that std::addressof may not have defined behavior when its parameter is not the active union member, or perhaps those expressions that form the parameters of std::addressof may themselves be undefined. We need a language lawyer to resolve this. (See discussion here: https://stackoverflow.com/questions/49775980/is-it-well-defined-to-use-stdaddressof-on-non-active-union-members )

A final note: in C++17, using offsetof on a type that is not standard-layout is conditionally-supported, which means that an implementation may accept it for such types and must issue a diagnostic if it does not support it. That appears to be the best option if you can live with C++17 and don't need constexpr support.

Somnium7 commented May 3, 2019

In Visual Studio with the latest C++17 standard enabled, the following code works:

struct vec2 { float x, y; };
struct vec3 : vec2 { float z; };
struct vec4 : vec3 { float w; };
int main () {
    constexpr int i = offsetof(vec4, w);
    char c[i];
    return 0;
}

offsetof in Visual Studio is implemented as:
#define offsetof(s,m) ((::size_t)&reinterpret_cast<char const volatile&>((((s*)0)->m)))

dutow commented Jan 5, 2020

(Edited, and significantly changed)

@RossBencina's solution can be improved by using an actual array and a while loop with a less-than comparison instead of direct arithmetic. That way it compiles with current gcc & clang, and while it still doesn't work with unaligned members, it is at least able to report an error in this case (e.g. throw 1, which is a compilation error in a constexpr context and a runtime error otherwise). And maybe it doesn't depend on any undefined behavior anymore?

However, it (like all previous pointer-to-data-member (PTDM) based solutions) shares a dangerous issue: it gives incorrect results with classes using multiple inheritance, and doesn't report an error. I don't see any way this can be fixed without injecting the explicit base class parameter into the offset_of template, changing the syntax to something like offset_of<T>(&T::dm). T could be deduced from a defaulted second parameter, and that way it would only have to be specified when multiple inheritance is used, but I'm not sure if that could be detected. And without detection (and an error report) it would be dangerous.
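
The root of the multiple inheritance problem can be shown in a few lines. This is a sketch using the a, b and ab types from the code further down; k_offset_in_b and k_offset_in_ab are hypothetical helpers, and the comparison between the two results assumes a common ABI that lays out base a before base b.

```cpp
#include <cstddef>
#include <type_traits>

struct a { int i; int j; };
struct b { int i; int k; };
struct ab : public a, public b {};

// &ab::k names a member declared in b, so its type is `int b::*`. A template
// that deduces the class from the pointer therefore sees b, never ab.
static_assert(std::is_same<decltype(&ab::k), int b::*>::value,
              "the derived class is lost in the pointer type");

// Offset of k inside a standalone b object.
inline std::size_t k_offset_in_b() {
    b obj = b();
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&obj.k) -
        reinterpret_cast<const char*>(&obj));
}

// Offset of k inside a complete ab object, which also contains the a base.
inline std::size_t k_offset_in_ab() {
    ab obj = ab();
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&obj.k) -
        reinterpret_cast<const char*>(&obj));
}
```

Since the deduced class is b, a PTDM-only offset_of(&ab::k) can at best report the first value, while code working with ab objects needs the second.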

While I don't need support for packed types, I also added support for them: instead of a single array, the code now uses alignof(T1) arrays with increasing offsets, and calculates the misalignment based on which array is used.

Also, the C++17 improvement is nice, but...

  • While offsetof works for non-POD types now, all compilers report warnings. So the choice is between disabling that warning and looking for another solution
  • Clang doesn't support offsetof for duplicate member names (when using multiple inheritance)

Updated code:

#include <cstdint>
#include <iostream>
#include <memory>

// version of the gist at:
// https://gist.github.com/graphitemaster/494f21190bb2c63c5516 original version
// by graphitemaster
//

template <size_t maxalign>
struct unioner {
  template <typename T0, typename T1, size_t O>
  union U_inner {
    struct {
      char pad[O];  // offset

      T1 m[sizeof(T0) / (sizeof(T1) + 1)];  // instance of type of member
    } data;
    U_inner<T0, T1, O + 1> other;
  };

  template <typename T0, typename T1>
  union U_inner<T0, T1, 0> {
    struct {
      T1 m[sizeof(T0) / (sizeof(T1) + 1)];  // instance of type of member
    } data;
    U_inner<T0, T1, 1> other;
  };

  template <typename T0, typename T1>
  union U_inner<T0, T1, maxalign> {};
};

template <typename T0, typename T1, typename T2>
struct offset_of_impl {
  using inner_t = typename unioner<alignof(T1)>::template U_inner<T0, T1, 0>;
  union U {
    char c;
    inner_t m;
    T0 object;
    constexpr U() : c(0) {}  // make c the active member
  };
  static constexpr U u = {};

  static constexpr const T1* addr_helper(const T1* base, const T1* target) {
    auto addr = base;
    while (addr < target) {
      addr++;
    }
    return addr;
  }
  static constexpr ptrdiff_t addr_diff(const T1* base, const T1* target) {
    return (target - base) * sizeof(T1);
  }

  template <size_t off, typename TT>
  static constexpr std::ptrdiff_t offset2(T1 T2::*member, TT& union_part) {
    const auto addr_target =
        std::addressof(offset_of_impl<T0, T1, T2>::u.object.*member);
    const auto addr_base = std::addressof(union_part.data.m[0]);
    const auto addr = addr_helper(addr_base, addr_target);

    // != will never return true... but < seems to work?
    if (addr < addr_target) {
      if constexpr (off + 1 < alignof(T1)) {
        return offset2<off + 1>(member, union_part.other);
      } else {
        throw 1;  // shouldn't happen
      }
    }
    return (addr - addr_base) * sizeof(T1) + off;
  }

  static constexpr std::ptrdiff_t offset(T1 T2::*member) {
    // Delegate to offset2, starting with the array at padding offset 0.
    return offset2<0>(member, offset_of_impl<T0, T1, T2>::u.m);
  }
};

template <typename T0, typename T1, typename T2>
constexpr typename offset_of_impl<T0, T1, T2>::U offset_of_impl<T0, T1, T2>::u;

template <typename T0, typename T1, typename T2>
inline constexpr std::ptrdiff_t offset_of(T1 T2::*member, T0* = nullptr) {
  return offset_of_impl<T0, T1, T2>::offset(member);
}

struct s {
  float a;
  char b;
  int c;
};

#pragma pack(push, 1)
struct s2 {
  float a;
  char b;
  int c;
  double d;
  char e;
};
#pragma pack(pop)

struct a {
  int i;
  int j;
};
struct b {
  int i;
  int k;
};
struct ab : public a, public b {};

int main() {
  constexpr size_t s_b = offset_of<s>(&s::b);
  constexpr size_t s_c = offset_of<s>(&s::c);
  // compilation error with both gcc & clang
  constexpr size_t s2_c = offset_of<s2>(&s2::c);
  std::cout << s_b << std::endl;
  std::cout << s_c << std::endl;
  std::cout << s2_c << std::endl;
  std::cout << offset_of<s2>(&s2::e) << std::endl;
  std::cout << alignof(&s2::e) << std::endl;

  // these only work with gcc, not clang
  // also generates a warning
  // std::cout << offsetof(ab, a::i) << std::endl;
  // std::cout << offsetof(ab, b::i) << std::endl;
  auto ai = &ab::a::i;
  auto bi = &ab::b::i;

  ab v;
  v.*ai = 11;
  v.*bi = 22;

#define DBG_PRINT(s) std::cout << #s << " = " << (s) << std::endl;

  std::cout << ((a&)v).i << " " << ((b&)v).i << std::endl;
  DBG_PRINT(offset_of<ab>(&ab::b::i));
  DBG_PRINT(offset_of<ab>(&ab::a::i));

  DBG_PRINT(offset_of<ab>(&ab::k));
  // incorrect result
  DBG_PRINT(offset_of<ab>(&ab::b::k));
  // doesn't work with clang, correct result with gcc
  // DBG_PRINT((offsetof(ab, b::k)));
  DBG_PRINT(offset_of<ab>(&ab::k));
  DBG_PRINT((offsetof(ab, k)));

  return 0;
}