Problems with GCC's per-function target specifiers (at least in 4.6.3)
So in RADFFT we have some SSE3 kernel functions that have been the source of a lot
of compilation "fun". With VC++ it's no problem, with GCC it evidently is. We don't
want to specify "-msse3" since that allows the compiler to use SSE3 outside these
functions - all of which are called within a CPU feature dispatch block - and that
not only can but has caused breakage for customers in the past.
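For anyone who hasn't seen one, a "CPU feature dispatch block" looks roughly like the sketch below. This is not the RADFFT code: every demo_* name is made up for illustration, and the two kernels are plain C stand-ins so the snippet builds without any -m flags. The only real API assumed is GCC's <cpuid.h> (__get_cpuid and bit_SSE3).

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>

/* CPUID leaf 1 reports SSE3 support in ECX bit 0 (bit_SSE3 in <cpuid.h>). */
static int demo_cpu_has_sse3(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx & bit_SSE3) != 0;
}
#else
/* Non-x86 target or unknown compiler: never take the SSE3 path. */
static int demo_cpu_has_sse3(void)
{
    return 0;
}
#endif

/* Hypothetical baseline kernel (stand-in for an SSE2 implementation). */
static float demo_sum_sse2(const float *x, size_t n)
{
    float s = 0.0f;
    size_t i;
    for (i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Hypothetical SSE3 kernel; plain C here so the sketch builds anywhere. */
static float demo_sum_sse3(const float *x, size_t n)
{
    return demo_sum_sse2(x, n);
}

/* The dispatch block: the SSE3 kernel is only reachable after the check. */
static float demo_sum(const float *x, size_t n)
{
    if (demo_cpu_has_sse3())
        return demo_sum_sse3(x, n);
    return demo_sum_sse2(x, n);
}
```

The point is that SSE3 instructions are only safe inside demo_sum_sse3. If the compiler is allowed to emit them anywhere else in the file, the runtime check no longer protects anything.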
GCC says "put the SSE3 code in a separate translation unit and compile just that with
-msse3". There have been at least three problems with this that I'm aware of:
a) It forces us to make these small internal DSP kernels public (currently they're
   static functions), which is extra namespace pollution, which is not a huge deal
   but still blows.
b) We've had fun issues with function attributes "bleeding out" or getting ignored
   in the past, mostly involving inlining. This is less of an issue when we're
   talking about a separate translation unit - unless there's LTO involved,
   which it often is. Yay.
c) Having to add extra command line options, only on some files, only when using
   GCC, and only on some platforms (namely x86 targets) is, in that exact
   combination, a pretty giant pain in the ass in the build process and super
   error-prone. Just not instantiating say SSE3 functions when the compiler doesn't
   support the intrinsics is easy. Messing with the build flags on specific files
   and under specific circumstances is something of a PITA in all build systems
   I've dealt with, but CDep is worse than most.
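To make the "just don't instantiate them" part of (c) concrete, here is a minimal sketch of what that looks like in source, with no build system involvement at all. DEMO_BUILD_SSE3 and all demo_* names are hypothetical, and the "SSE3" kernel body is a plain C stand-in.

```c
#include <stddef.h>

/* Hypothetical feature macro: 1 when this compiler/target can build
   the SSE3 kernels, 0 otherwise. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define DEMO_BUILD_SSE3 1
#else
#define DEMO_BUILD_SSE3 0
#endif

typedef void (*demo_kernel_fn)(float *data, size_t n);

/* Generic kernel that is always compiled. */
static void demo_scale_generic(float *data, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        data[i] *= 2.0f;
}

#if DEMO_BUILD_SSE3
/* Stand-in body; a real build would use SSE3 intrinsics here. */
static void demo_scale_sse3(float *data, size_t n)
{
    demo_scale_generic(data, n);
}
#endif

/* Returns the SSE3 kernel if it was compiled in, otherwise NULL. */
static demo_kernel_fn demo_get_sse3_kernel(void)
{
#if DEMO_BUILD_SSE3
    return demo_scale_sse3;
#else
    return NULL;
#endif
}
```

Everything is decided by the preprocessor inside one file, which is exactly why this is easy and per-file compiler flags are not.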
Okay. So I was pretty excited to see that GCC has this:
https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#index-g_t_0040code_007bsyscall_005flinkage_007d-function-attribute_002c-IA-64-3240
(the "target" bit, not the Itanium thing that link is actually to, but the
anchor for the "target" thing is on the line just below the "target" heading
so you have no idea what it's talking about)
And you can also set that via a #pragma
https://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html
which is all supported on GCC 4.6 and up (probably earlier, didn't check).
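For reference, the attribute form (as opposed to the #pragma) looks roughly like the sketch below. Big assumption: this is written for GCC 4.9 or newer, where the intrinsic headers can be included without the matching -m flag. It is not expected to build as-is on 4.6. demo_dup_even_lanes is a made-up name, and the #else branch is just a scalar fallback so the snippet compiles on non-x86 targets.

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <pmmintrin.h>  /* SSE3 intrinsics; assumes GCC 4.9+ (see above) */

/* Only this function is compiled with SSE3 enabled. Callers must make
   sure the CPU supports SSE3 before calling it. */
static __attribute__((target("sse3")))
void demo_dup_even_lanes(const float *in, float *out)
{
    __m128 v = _mm_loadu_ps(in);             /* (x0, x1, x2, x3) */
    _mm_storeu_ps(out, _mm_moveldup_ps(v));  /* (x0, x0, x2, x2) */
}
#else
/* Scalar fallback with the same result, for targets without SSE3. */
static void demo_dup_even_lanes(const float *in, float *out)
{
    out[0] = in[0];
    out[1] = in[0];
    out[2] = in[2];
    out[3] = in[2];
}
#endif
```

The function takes pointers rather than passing __m128 by value on purpose: that keeps the interface between the SSE3 function and its non-SSE3 callers free of vector types.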
So I thought, great, I can just use that, right? Gather all the SSE3 functions
together in one contiguous block, do this before the block:
  #pragma GCC push_options
  #pragma GCC target("sse3")
  #include <pmmintrin.h> // so I actually get the SSE3 intrinsics
and this right after:
  #pragma GCC pop_options
Cool. That's reasonable. I can certainly live with that!
And of course it was too good to be true, because if you try that (at least with
the GCC 4.6.3 on our Ubuntu 12.04 LTS Linux builders), what you'll find is that
including "pmmintrin.h" does not in fact work. The target("sse3") thing enables
the builtins, but you still can't include that header. (It fails testing for a
__SSE3__ #define. You can try #defining that manually, but will quickly discover
that this merely causes more interesting breakage further down the line.)
It turns out that the smallest example I can get to produce the behavior I actually
want is this:
  #pragma GCC target("no-sse3") // at the top of the file
  #include <emmintrin.h> // for SSE2 intrinsics; required to include this here!
  #pragma GCC push_options
  #pragma GCC target("sse3")
  #include <pmmintrin.h> // need to include this *in* the sse3 block (fair enough)
  __m128 silly_sse3(float *x)
  {
    return _mm_moveldup_ps(_mm_load_ps(x));
  }
  #pragma GCC pop_options
  __m128 silly_sse2(float *x)
  {
    __m128 v = _mm_load_ps(x);
    return _mm_shuffle_ps(v, v, 0xa0);
  }
And then compile that with "-msse3". Anything less and it either doesn't
compile or doesn't link. Having to do this pretty thoroughly dismantles the
benefit of being able to specify this per-function.
Compiles fine with gcc (GCC) 4.9.2 20150304 (prerelease).
However, adding -m32 gives the same error. Does that help?