Problems with GCC's per-function target specifiers (at least in 4.6.3)
So in RADFFT we have some SSE3 kernel functions that have been the source of a lot
of compilation "fun". With VC++ it's no problem, with GCC it evidently is. We don't
want to specify "-msse3" since that allows the compiler to use SSE3 outside these
functions - all of which are called within a CPU feature dispatch block - and that
not only can but has caused breakage for customers in the past.
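To make the dispatch part concrete, here's a rough sketch of what such a CPU feature dispatch block can look like. This is not RADFFT's actual code: the names cpu_has_sse3, kernel_generic, kernel_sse3 and pick_kernel are made up, the kernel bodies are plain-C stand-ins, and it assumes GCC (or a compatible compiler) on x86 with <cpuid.h> available.

```c
#include <cpuid.h>   /* GCC's wrapper around the CPUID instruction */

typedef void (*kernel_fn)(float *data, int n);

/* Hypothetical helper: CPUID leaf 1 reports SSE3 in bit 0 of ECX. */
static int cpu_has_sse3(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                 /* leaf 1 not available */
    return (ecx & 1u) != 0;
}

/* Stand-in for the baseline kernel. */
static void kernel_generic(float *data, int n)
{
    int i;
    for (i = 0; i < n; ++i)
        data[i] *= 2.0f;
}

/* Stand-in for the SSE3 kernel. The real one would use SSE3 intrinsics
   and must only ever be reached through pick_kernel(). */
static void kernel_sse3(float *data, int n)
{
    int i;
    for (i = 0; i < n; ++i)
        data[i] *= 2.0f;
}

/* The dispatch block: choose a kernel based on what the CPU reports. */
static kernel_fn pick_kernel(void)
{
    return cpu_has_sse3() ? kernel_sse3 : kernel_generic;
}
```

If the compiler is allowed to emit SSE3 anywhere in the file, the check in pick_kernel() stops protecting anything, which is exactly the problem with a file-wide "-msse3".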
GCC says "put the SSE3 code in a separate translation unit and compile just that with
-msse3". There have been at least three problems with this that I'm aware of:
a) It forces us to make these small internal DSP kernels public (currently they're
   static functions), which is extra namespace pollution, which is not a huge deal
   but still blows.
b) We've had fun issues with function attributes "bleeding out" or getting ignored
   in the past, mostly involving inlining. This is less of an issue when we're
   talking about a separate translation unit - unless there's LTO involved,
   which it often is. Yay.
c) Having to add extra command line options, only on some files, only when using
   GCC, and only on some platforms (namely x86 targets) is, in that exact
   combination, a pretty giant pain in the ass in the build process and super
   error-prone. Just not instantiating say SSE3 functions when the compiler doesn't
   support the intrinsics is easy. Messing with the build flags on specific files
   and under specific circumstances is something of a PITA in all build systems
   I've dealt with, but CDep is worse than most.
Okay. So I was pretty excited to see that GCC has this:
https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#index-g_t_0040code_007bsyscall_005flinkage_007d-function-attribute_002c-IA-64-3240
(the "target" bit, not the Itanium thing that link actually points to, but the
anchor for the "target" thing is on the line just below the "target" heading,
so you have no idea what it's talking about)
And you can also set that via a #pragma:
https://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html
which is all supported on GCC 4.6 and up (probably earlier, didn't check).
So I thought, great, I can just use that, right? Gather all the SSE3 functions
together in one contiguous block, do this before the block:

    #pragma GCC push_options
    #pragma GCC target("sse3")
    #include <pmmintrin.h> // so I actually get the SSE3 intrinsics

and this right after:

    #pragma GCC pop_options

Cool. That's reasonable. I can certainly live with that!
And of course it was too good to be true, because if you try that (at least with
the GCC 4.6.3 on our Ubuntu 12.04 LTS Linux builders), what you'll find is that
including "pmmintrin.h" does not in fact work. The target("sse3") thing enables
the builtins, but you still can't include that header. (It fails testing for a
__SSE3__ #define. You can try #defining that manually, but will quickly discover
that this merely causes more interesting breakage further down the line.)
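To illustrate the "enables the builtins" part: with target("sse3") active, the compiler builtin can be called directly, skipping pmmintrin.h entirely. This is a sketch, not a recommendation. It assumes GCC on x86-64 (so <xmmintrin.h> is usable by default); __builtin_ia32_movsldup is the GCC-specific builtin that GCC's own pmmintrin.h wraps for _mm_moveldup_ps, and the function name dup_even_lanes_builtin is made up.

```c
#include <xmmintrin.h>   /* SSE1 only: __m128, __v4sf, unaligned load/store */

#pragma GCC push_options
#pragma GCC target("sse3")

/* Hypothetical kernel: writes {x[0], x[0], x[2], x[2]} to out by calling
   the MOVSLDUP builtin directly instead of going through pmmintrin.h. */
static void dup_even_lanes_builtin(const float *x, float *out)
{
    __m128 v = _mm_loadu_ps(x);
    __m128 r = (__m128)__builtin_ia32_movsldup((__v4sf)v);
    _mm_storeu_ps(out, r);
}

#pragma GCC pop_options
```

This ties the code to GCC internals that other compilers don't provide, so it's not really a substitute for being able to include the header.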
It turns out that the smallest example I can get to produce the behavior I actually
want is this:

    #pragma GCC target("no-sse3") // at the top of the file
    #include <emmintrin.h> // for SSE2 intrinsics; required to include this here!

    #pragma GCC push_options
    #pragma GCC target("sse3")
    #include <pmmintrin.h> // need to include this *in* the sse3 block (fair enough)

    __m128 silly_sse3(float *x)
    {
        return _mm_moveldup_ps(_mm_load_ps(x));
    }
    #pragma GCC pop_options

    __m128 silly_sse2(float *x)
    {
        __m128 v = _mm_load_ps(x);
        return _mm_shuffle_ps(v, v, 0xa0);
    }

And then compile that with "-msse3". Anything less and it either doesn't
compile or doesn't link. Having to do this pretty thoroughly dismantles the
benefit of being able to specify this per-function.
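For comparison, the GCC 4.9 release notes say that x86 intrinsics can be called from functions tagged with the matching target attribute without compiling the whole file with the corresponding -m option. A minimal sketch of what that should allow, assuming GCC 4.9 or later on x86 (the function name dup_even_lanes is made up for illustration; this does not apply to the 4.6.3 behavior described above):

```c
#include <pmmintrin.h>   /* assumed to be includable without -msse3 (GCC >= 4.9) */

/* Hypothetical kernel: writes {x[0], x[0], x[2], x[2]} to out.
   Only this function is compiled with SSE3 enabled. */
__attribute__((target("sse3")))
static void dup_even_lanes(const float *x, float *out)
{
    _mm_storeu_ps(out, _mm_moveldup_ps(_mm_loadu_ps(x)));
}
```

Built without "-msse3", only this function should contain SSE3 instructions; the caller still has to check for SSE3 support at runtime before calling it.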
Compiles fine with
However,
Adding Does that help?
+1
Doing this in MSVC is just so much saner, although less "optimized" because the compiler cannot use SSEn instructions for regular code.
The last LLVM dev meeting had a talk about this: http://llvm.org/devmtg/2014-10/Slides/Christopher-Function%20Multiversioning%20Talk.pdf (video also available).
They are working to make this easier, but AFAIK the current focus is on having the optimizer generate arch-specific functions and automatically choose between them at runtime. They are also thinking about how to make life easier when including intrinsic headers.