Created April 23, 2015 20:44
Gist: rygorous/159aa1c4573077126169
How I would like compilers to implement SSE intrinsics
So I would like to be able to write functions that use SSEx intrinsics (that are called
via some CPU dispatch mechanism) without allowing the compiler to use SSEx instructions
everywhere (because that's not under control of the runtime CPU dispatch we have).

On VC++, this is easy. I get to use whatever intrinsics I want, and the compiler will
emit the corresponding instructions. It will not use these same instructions in code
that wasn't written with intrinsics unless I specifically allow it to with a
command-line option. In GCC and Clang, this turns out to be pretty hard (or at least
there's no good way I know of). And yes, I fully realize that the under-the-hood semantics
of this are tricky, since in a modern compiler these vector intrinsics turn into an IL that
undergoes several transforms, and it may not be obvious to the back-end where they came
from and whether, in a particular context, it's allowed to match a codegen pattern that
uses, say, an SSE3 instruction.

So you need to define precisely what the desired behavior actually is to decide what
should happen in such a case. A reasonable formalization is this: unless I have specified
some option that allows the compiler to use some instruction set extension everywhere
(for example, "/arch:SSE2" for VC++ lets the compiler use SSE2 instructions wherever it
wants), the compiler may only emit SSE2 instructions within functions that use SSE2
intrinsics (and hence implicitly require SSE2 anyway). Thus, even if I didn't specify
"/arch:SSE2", I would be okay with the compiler using SSE2 instructions for general-purpose
code in such functions. On x64, this particular example is somewhat moot (since x64 includes
SSE2); but the actual *behavior* of VC++ in such cases is very convenient and
programmer-friendly, and I would like to see more compilers adopt it.

I *would* like to be able to use SSSE3 intrinsics in arbitrary (x86) code, without at the
same time allowing the compiler to use SSSE3 code everywhere else in that translation unit
as a consequence of automatic transformations (say, replacing a sequence of permuted integer
loads and stores with a MOVDQU, PSHUFB, MOVDQU). I get that behavior in VC++ but not in other
compilers. The problem is that while this behavior is easy to describe at the source level,
it's not necessarily obvious at the IL level.

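To make the setup concrete, here is a minimal sketch of the kind of code in question: a
portable byte-reversal routine, an SSSE3 version written with intrinsics (MOVDQU, PSHUFB,
MOVDQU), and a runtime dispatcher that picks between them. All names (reverse16_scalar,
reverse16_ssse3, pick_reverse16, cpu_has_ssse3, SSSE3_FN, HAVE_X86) are hypothetical and
chosen for illustration. On GCC and Clang the sketch assumes the target("ssse3") function
attribute and __builtin_cpu_supports are available; that is the function-level annotation
route, so treat it as an approximation of the VC++ behavior, not an equivalent.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
  #define HAVE_X86 1
  #include <tmmintrin.h>      // SSSE3 intrinsics (_mm_shuffle_epi8)
  #if defined(_MSC_VER) && !defined(__clang__)
    #include <intrin.h>       // __cpuid
    #define SSSE3_FN          // VC++: intrinsics are usable without any switch
  #else
    // Assumption: a GCC/Clang version that accepts this attribute and allows
    // including <tmmintrin.h> without -mssse3 on the command line.
    #define SSSE3_FN __attribute__((target("ssse3")))
  #endif
#else
  #define HAVE_X86 0
#endif

// Portable version. This is the kind of permuted load/store sequence a
// compiler could turn into PSHUFB if SSSE3 were enabled globally.
void reverse16_scalar(std::uint8_t* dst, const std::uint8_t* src) {
    std::uint8_t tmp[16];
    for (int i = 0; i < 16; ++i) tmp[i] = src[15 - i];
    std::memcpy(dst, tmp, 16);
}

#if HAVE_X86
// SSSE3 version: only this function is meant to contain SSSE3 instructions.
SSSE3_FN void reverse16_ssse3(std::uint8_t* dst, const std::uint8_t* src) {
    const __m128i idx = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                      7, 6, 5, 4, 3, 2, 1, 0);
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src)); // MOVDQU
    v = _mm_shuffle_epi8(v, idx);                                       // PSHUFB
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), v);               // MOVDQU
}

bool cpu_has_ssse3() {
#if defined(_MSC_VER) && !defined(__clang__)
    int info[4];
    __cpuid(info, 1);
    return (info[2] & (1 << 9)) != 0;   // CPUID leaf 1, ECX bit 9 = SSSE3
#else
    return __builtin_cpu_supports("ssse3") != 0;
#endif
}
#endif

using Reverse16Fn = void (*)(std::uint8_t*, const std::uint8_t*);

// Runtime CPU dispatch: choose the implementation based on what the CPU reports.
Reverse16Fn pick_reverse16() {
#if HAVE_X86
    if (cpu_has_ssse3()) return reverse16_ssse3;
#endif
    return reverse16_scalar;
}

void reverse16(std::uint8_t* dst, const std::uint8_t* src) {
    assert(dst != nullptr && src != nullptr);
    static const Reverse16Fn fn = pick_reverse16();
    fn(dst, src);
}
```

The point of the sketch is the division of labor: reverse16_scalar has to stay free of
SSSE3 instructions because it is the path taken on older machines, while reverse16_ssse3
is only ever reached after the CPU check.
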
Thus, here is my formal, precise, source language-agnostic definition of the behavior I
would like to see: I am okay with the compiler emitting (say) SSSE3 instructions in any
block that is dominated by a block containing SSSE3 intrinsics (that is, source language
statements that require SSSE3). (I am of course also okay with the compiler using SSSE3
instructions when their usage was globally enabled using a command-line switch, but I
would prefer something more selective for code that needs to run on older machines and
can't just be compiled with "ZOMG use SSE4.2 *everywhere*").

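The dominance rule above can be sketched in a few lines. This is an illustration only,
assuming a hypothetical CFG representation where block 0 is the entry and every block is
reachable; Cfg, compute_dominators and ssse3_allowed are made-up names, not part of any
compiler. It uses the simple iterative set-intersection formulation of dominators, which
is easy to read but not what a production compiler would likely use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG: block 0 is the entry; succ[b] lists the successors of b.
struct Cfg {
    std::vector<std::vector<int>> succ;
};

// Returns dom, where dom[b][d] is true iff block d dominates block b.
// Assumes every block is reachable from the entry block.
std::vector<std::vector<bool>> compute_dominators(const Cfg& g) {
    const std::size_t n = g.succ.size();
    if (n == 0) return {};

    std::vector<std::vector<int>> pred(n);
    for (std::size_t b = 0; b < n; ++b)
        for (int s : g.succ[b]) pred[s].push_back(static_cast<int>(b));

    // Initial guess: the entry is dominated only by itself, every other
    // block by all blocks. The loop below shrinks the sets to a fixed point.
    std::vector<std::vector<bool>> dom(n, std::vector<bool>(n, true));
    dom[0].assign(n, false);
    dom[0][0] = true;

    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t b = 1; b < n; ++b) {
            // Dom(b) = {b} plus the intersection of Dom(p) over all predecessors p.
            std::vector<bool> next(n, true);
            for (int p : pred[b])
                for (std::size_t d = 0; d < n; ++d)
                    next[d] = next[d] && dom[p][d];
            next[b] = true;
            if (next != dom[b]) {
                dom[b] = next;
                changed = true;
            }
        }
    }
    return dom;
}

// allowed[b] is true when b is dominated by some block that contains SSSE3
// intrinsics (a block dominates itself, so such blocks are always allowed).
std::vector<bool> ssse3_allowed(const Cfg& g, const std::vector<bool>& has_intrinsic) {
    assert(has_intrinsic.size() == g.succ.size());
    const std::vector<std::vector<bool>> dom = compute_dominators(g);
    std::vector<bool> allowed(g.succ.size(), false);
    for (std::size_t b = 0; b < g.succ.size(); ++b)
        for (std::size_t d = 0; d < g.succ.size(); ++d)
            if (dom[b][d] && has_intrinsic[d]) allowed[b] = true;
    return allowed;
}
```

As an example, take the edges 0->1, 0->2, 1->3, 3->4, 2->4 with intrinsics only in block 1.
Blocks 1 and 3 come out as allowed, while blocks 0, 2 and 4 do not: block 4 can be reached
through block 2 without ever executing the intrinsics, so nothing guarantees the CPU check
has passed by then.
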
This *sounds* more complicated than say "don't automatically introduce SSSE3 instructions
at all" or "just give me a function-level annotation", but both of these approaches have
problems: the former is actually tricky when intrinsics are rewritten to a generic form in
the IL, and needlessly restrictive besides; the latter is fine in principle, but in practice
tends to break as soon as inlining or link-time optimization is involved.
So my hope is that expressing the property I want purely in terms of things that are
available in a low-level, basic-blocks-plus-CFG form is helpful.