@nico
Last active July 25, 2023 18:43
C vs C++ inline function semantics, and ODR vs /arch:

The semantics of inline are one of the areas where C and C++ are pretty different. This post is about the C++ semantics, but the history is interesting, so here's a short summary of it.

The meaning of "inline" is intuitively easy to understand: It gives the compiler a hint that it'd be nice if a function could be inlined. It gets a bit complicated because of two issues:

  1. If the function ends up not being inlined, where should the definition of the function be emitted?
  2. If the inline function contains a static local variable, should that be inlined? Should there be several copies of the static local, or just one? Most inline functions don't have static locals, but they still need some defined behavior.

The history is that C++ invented "inline", with semantics that require special linker support. C++ requires that all definitions of an externally visible inline function with the same name are identical across translation units (TUs), else you have an ODR violation. It also guarantees that a local static in an externally visible inline function has the same address everywhere. The compilation model is that the compiler may or may not inline a function, and if it's not inlined in a TU, the compiler emits a definition of the inline function to that TU. This means the inline function could potentially be emitted to every TU, and the linker at link time picks one of them and throws away all the others. Since they're all required to have the same implementation, this should be ok. This requires a linker visibility model in which many TUs can define the same externally visible symbol without that being a "duplicate symbol" error.
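
Here's a minimal sketch of what those guarantees look like in source. The header name counter.h and the functions CallCount(), NextId() and Example() are made up for illustration. Imagine the first two functions sitting in a header that several .cc files include:

```cpp
// counter.h (hypothetical header, included from several .cc files)
#include <cassert>

// Externally visible inline function. Every TU that includes this header and
// doesn't inline the call may emit its own copy; the linker keeps one of them.
inline int& CallCount() {
  static int count = 0;  // one object for the whole program, not one per TU
  return count;
}

// Also inline, so it can live in the header next to CallCount().
inline int NextId() {
  return ++CallCount();
}

// Hypothetical usage: every call sees the same static local.
inline void Example() {
  int* address = &CallCount();
  int first = NextId();
  int second = NextId();
  assert(second == first + 1);
  assert(address == &CallCount());
}
```

A single file can't show the cross-TU part, but the point is that the address comparison in Example() would also hold if the two sides of it were evaluated in different .cc files.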

(Aside: Remember that all class member functions defined in the class body are implicitly inline. But "static" means something different at class scope, so to have a static (in the visibility sense) inline member function, you have to define it out of line like "static inline void Cls::f() {}"; note that Cls::f() is still a member function, not a static function on Cls here. edit: this doesn't work. There's apparently no way to have inline member functions with static linkage.)

If you make your inline function "static inline", it's no longer externally-visible. In this case, the compilation model is that the compiler can decide to inline for every TU, and if it doesn't it emits a local definition of the function to that TU (which means that static locals in the inline function are now per-TU instead of global). At link time, the linker can't merge all these copies since they're local to each TU, but it can merge them via identical code folding later on if that's enabled.
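
In source, the internal-linkage variant differs by one keyword. This is a sketch under the same kind of assumptions: local_counter.h, LocalCallCount(), NextLocalId() and LocalExample() are hypothetical names, and the header is imagined to be included from several .cc files:

```cpp
// local_counter.h (hypothetical header, included from several .cc files)
#include <cassert>

// Internal linkage: each TU that includes this header gets a private copy of
// the function, and with it a private copy of the static local.
static inline int& LocalCallCount() {
  static int count = 0;  // one object per TU
  return count;
}

static inline int NextLocalId() {
  return ++LocalCallCount();
}

// Hypothetical usage: within one TU the counter behaves as usual.
static inline void LocalExample() {
  int first = NextLocalId();
  int second = NextLocalId();
  assert(second == first + 1);
}
```

Within a single TU this behaves like any other function. The difference only shows up across TUs, where a.cc and b.cc would each count from zero on their own.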

gcc then came up with (not standardized) semantics for inline for C89 that are similar but don't require this special linker support -- in return, they're a bit more difficult to use and there are several options for how to use inline: "inline", "static inline", and "extern inline". In C99, the C committee standardized "inline" for C in a way that's similar to what gcc did for C, except they swapped the meaning of "inline" and "extern inline" 9_9. http://stackoverflow.com/questions/216510/extern-inline/216546#216546 has a good concise overview of the differences. "static inline" is roughly the same as C++'s "static inline". With "extern inline", you have to explicitly pick one TU where the inline function is defined in case it's not inlined somewhere.

(Aside: If you have an inline function in a .h file that you want to be usable from both .c and .cc files, you need to make it "static inline". This may be a good idea in other cases too, see below, but the drawback is that it requires identical code folding in the linker to get rid of the size overhead, and it requires more work by both compiler and linker to emit all these copies and then fold them again.)

Ok, that's inline functions in C++. If you ignore C interop and static locals and member functions, it's pretty harmless and makes sense.

Here's the unrelated feature it interacts with in surprising ways: Newer CPUs support instructions older CPUs don't support. New Intel chips support AVX instructions, for example. In CPU-intensive code, you might want to check if the CPU supports AVX, and if so use those instructions since they're more efficient for your use case, and else fall back to a non-AVX implementation.

gcc 6 has a way to do this more or less automatically, but other compilers don't, so the somewhat manual approach is to put your AVX code in file_avx.cc, your fallback in file_slow.cc, build the former with /arch:AVX and the latter without, and then have

void ProcessImage() {
  if (have_avx)
    return ProcessImage_AVX();
  ProcessImage_Slow();
}
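
A slightly fuller sketch of that dispatcher follows, with a few things assumed that the snippet above leaves open. The helper HaveAvx(), the std::vector<float> parameter and DispatchExample() are made up for illustration, and both implementations are plain scalar stand-ins so the whole thing fits in one file. In the real layout they would live in file_avx.cc and file_slow.cc. The detection uses __builtin_cpu_supports, which is a gcc/clang builtin on x86. On other toolchains the sketch just takes the fallback.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the function in file_avx.cc (the one built with /arch:AVX).
// Scalar code here so that the sketch compiles anywhere.
void ProcessImage_AVX(std::vector<float>& pixels) {
  for (float& p : pixels) p *= 2.0f;
}

// Stand-in for the function in file_slow.cc (built without /arch:AVX).
void ProcessImage_Slow(std::vector<float>& pixels) {
  for (float& p : pixels) p *= 2.0f;
}

// Hypothetical helper: true if the running CPU supports AVX.
bool HaveAvx() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports("avx") != 0;
#else
  return false;  // assumption: unknown toolchain or CPU, take the fallback
#endif
}

void ProcessImage(std::vector<float>& pixels) {
  static const bool have_avx = HaveAvx();  // query once, reuse afterwards
  if (have_avx) {
    ProcessImage_AVX(pixels);
    return;
  }
  ProcessImage_Slow(pixels);
}

// Hypothetical usage: callers see the same result on either path.
void DispatchExample() {
  std::vector<float> pixels = {1.0f, 2.5f};
  ProcessImage(pixels);
  assert(pixels[0] == 2.0f);
  assert(pixels[1] == 5.0f);
}
```

The dispatcher itself belongs in a file built without /arch:AVX, since it has to run on CPUs that don't have AVX.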

However, if file_avx.cc includes any standard library header that contains an inline function, say ceilf() from math.h, the definition of that function emitted for file_avx.cc can use AVX instructions. If the linker happens to keep the AVX ceilf() when it picks one copy and throws away all the others, calls to ceilf() from non-AVX TUs now end up in a function that uses AVX instructions, which faults on a CPU without AVX. So /arch used with non-static inline functions gives you ODR violations, leading to CLs like https://chromium.googlesource.com/skia/+/e9f78b41c65a78400c22a1b79007b1b9e187a745 (#define __inline static __inline)

If you make your function static inline, this particular problem doesn't happen. So you could say "well MS should mark their ceilf static inline; also for C interop". But all template functions behave like inline by default, and e.g. requiring all templates in the standard library to be static functions seems unreasonable (and the standard mentions, say, template<class T> const T& min(const T& a, const T& b);, not template<class T> static const T& min(const T& a, const T& b);).
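
For template helpers you write yourself (as opposed to the ones in the standard library), an unnamed namespace gets you the "static inline" behavior, since everything declared inside it has internal linkage. A minimal sketch, imagined as part of file_avx.cc; LocalMin(), ClampRow() and ClampExample() are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>

namespace {

// Internal linkage: the instantiations emitted for this TU are private to it,
// so the linker never picks them for callers in TUs built with another /arch.
template <class T>
const T& LocalMin(const T& a, const T& b) {
  return b < a ? b : a;
}

}  // namespace

// Externally visible entry point of the (imagined) AVX TU.
void ClampRow(float* row, std::size_t n, float limit) {
  for (std::size_t i = 0; i < n; ++i) row[i] = LocalMin(row[i], limit);
}

// Hypothetical usage.
void ClampExample() {
  float row[3] = {1.0f, 5.0f, 3.0f};
  ClampRow(row, 3, 3.0f);
  assert(row[0] == 1.0f && row[1] == 3.0f && row[2] == 3.0f);
}
```

This does nothing for std::min and friends, which is the actual problem described above. It only keeps your own helpers out of the cross-TU merge.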

In practice, this means you can't include standard library headers in files you build with a different /arch. Longer term, gcc's approach is probably how things should work generally.

@lc0305

lc0305 commented Apr 24, 2023

Great post, thank you for sharing this!
Really makes me wonder in how many projects the mentioned "arch issue" is overlooked...
