@rygorous
Created April 23, 2015 20:44
How I would like compilers to implement SSE intrinsics
So I would like to be able to write functions that use SSEx intrinsics (that are called
via some CPU dispatch mechanism) without allowing the compiler to use SSEx instructions
everywhere (because that's not under control of the runtime CPU dispatch we have).
On VC++, this is easy. I get to use whatever intrinsics I want, and the compiler will
emit the corresponding instructions. It will not use these same instructions in code
that wasn't written with intrinsics unless I specifically allow it to with a
command-line option. In GCC and Clang, this turns out to be pretty hard (or at least
there's no good way I know of). And yes, I fully realize that the under-the-hood semantics
of this are tricky, since in a modern compiler these vector intrinsics turn into an IL that
undergoes several transforms, and it may not be obvious to the back-end where they came
from, or whether it's allowed to, say, match a codegen pattern that uses an SSE3
instruction in a particular context.
So you need to define precisely what the desired behavior actually is to decide what
should happen in such a case. A reasonable formalization is this: unless I have specified
some option that allows the compiler to use some instruction set extension everywhere
(for example, "/arch:SSE2" for VC++ lets the compiler use SSE2 instructions wherever it
wants), the compiler may only emit SSE2 instructions within functions that use SSE2
intrinsics (and hence implicitly require SSE2 anyway). Thus, even if I didn't specify
"/arch:SSE2", I would be okay with the compiler using SSE2 instructions for general-purpose
code in such functions. On x64, this particular example is somewhat moot (since x64 includes
SSE2); but the actual *behavior* of VC++ in such cases is very convenient and
programmer-friendly, and I would like to see more compilers adopt it.
I *would* like to be able to use SSSE3 intrinsics in arbitrary (x86) code, without at the
same time allowing the compiler to use SSSE3 code everywhere else in that translation unit
as a consequence of automatic transformations (say, replacing a sequence of permuted integer
loads and stores with a MOVDQU, PSHUFB, MOVDQU). I get that behavior in VC++ but not in other
compilers. The problem is that while this behavior is easy to describe at the source level,
it's not necessarily obvious at the IL level.
Thus, here is my formal, precise, source language-agnostic definition of the behavior I
would like to see: I am okay with the compiler emitting (say) SSSE3 instructions in any
block that is dominated by a block containing SSSE3 intrinsics (that is, source language
statements that require SSSE3). (I am of course also okay with the compiler using SSSE3
instructions when their usage was globally enabled using a command-line switch, but I
would prefer something more selective for code that needs to run on older machines and
can't just be compiled with "ZOMG use SSE4.2 *everywhere*").
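To make the dominance rule concrete, here is a minimal sketch in C of how the set of "SSSE3 is okay here" blocks could be computed from a basic-blocks-plus-CFG form. The intuition, as I read the rule, is that every path into a dominated block has already executed an SSSE3 instruction, so the block already requires SSSE3. Everything named here (`Cfg`, `compute_dominators`, `ssse3_allowed_blocks`, the 32-block limit, the bitmask encoding) is a hypothetical illustration and not taken from any real compiler; it also assumes every block is reachable from the entry block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BLOCKS 32  /* illustrative limit so block sets fit in a uint32_t */

/* Hypothetical toy CFG: block 0 is the entry block. */
typedef struct {
    int num_blocks;
    uint32_t preds[MAX_BLOCKS];   /* bit p set => there is an edge p -> this block */
    bool uses_ssse3[MAX_BLOCKS];  /* block contains a statement that requires SSSE3 */
} Cfg;

/* Classic iterative dataflow: dom[b] is the bitmask of blocks that dominate b
 * (every block dominates itself). */
static void compute_dominators(const Cfg *cfg, uint32_t dom[MAX_BLOCKS]) {
    const int n = cfg->num_blocks;
    assert(n >= 1 && n <= MAX_BLOCKS);
    const uint32_t all = (n == 32) ? 0xffffffffu : ((1u << n) - 1u);

    dom[0] = 1u;                          /* entry is dominated only by itself */
    for (int b = 1; b < n; ++b) dom[b] = all;

    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = 1; b < n; ++b) {
            uint32_t meet = all;          /* intersect dominators of all predecessors */
            for (int p = 0; p < n; ++p)
                if (cfg->preds[b] & (1u << p)) meet &= dom[p];
            const uint32_t next = meet | (1u << b);
            if (next != dom[b]) { dom[b] = next; changed = true; }
        }
    }
}

/* Bitmask of blocks in which SSSE3 instructions would be acceptable under the
 * rule above: blocks dominated by at least one block that uses SSSE3. */
static uint32_t ssse3_allowed_blocks(const Cfg *cfg) {
    uint32_t dom[MAX_BLOCKS];
    compute_dominators(cfg, dom);

    uint32_t users = 0;
    for (int b = 0; b < cfg->num_blocks; ++b)
        if (cfg->uses_ssse3[b]) users |= 1u << b;

    uint32_t allowed = 0;
    for (int b = 0; b < cfg->num_blocks; ++b)
        if (dom[b] & users) allowed |= 1u << b;
    return allowed;
}
```

For example, with edges 0->1, 0->2, 1->3, 2->3 and 1->4, and SSSE3 intrinsics only in block 1, the allowed set is blocks 1 and 4. Block 3 is excluded because it can also be reached through block 2, which never touched SSSE3.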
This *sounds* more complicated than, say, "don't automatically introduce SSSE3 instructions
at all" or "just give me a function-level annotation", but both of these approaches have
problems: the former is actually tricky when intrinsics are rewritten to a generic form in
the IL, and needlessly restrictive besides; the latter is fine in principle, but in practice
tends to break as soon as inlining or link-time optimization is involved.
So my hope is that expressing the property I want purely in terms of things that are
available in a low-level, basic-blocks-plus-CFG form is helpful.
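For reference, the "function-level annotation" route mentioned above looks roughly like this on reasonably recent GCC and Clang, using the `target` function attribute and `__builtin_cpu_supports` for the runtime check. This is a sketch of that workaround, not the behavior asked for in this note, and the function names (`reverse16`, `reverse16_ssse3`, `reverse16_scalar`) and the `HAVE_SSSE3_PATH` macro are made up for illustration. The SSSE3 path is the MOVDQU, PSHUFB, MOVDQU pattern from the earlier example; on other compilers or non-x86 targets only the scalar path is built.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <tmmintrin.h>  /* SSSE3 intrinsics, e.g. _mm_shuffle_epi8 */
#define HAVE_SSSE3_PATH 1
#else
#define HAVE_SSSE3_PATH 0
#endif

/* Portable fallback: write the 16 bytes at src to dst in reverse order. */
static void reverse16_scalar(uint8_t *dst, const uint8_t *src) {
    uint8_t tmp[16];
    for (int i = 0; i < 16; ++i) tmp[i] = src[15 - i];
    memcpy(dst, tmp, 16);
}

#if HAVE_SSSE3_PATH
/* Only this function is compiled with SSSE3 enabled; the rest of the
 * translation unit keeps the baseline instruction set. */
__attribute__((target("ssse3")))
static void reverse16_ssse3(uint8_t *dst, const uint8_t *src) {
    /* _mm_set_epi8 takes the highest byte first, so output byte i reads
     * input byte 15 - i. */
    const __m128i rev = _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                                     8, 9, 10, 11, 12, 13, 14, 15);
    __m128i v = _mm_loadu_si128((const __m128i *)src);  /* MOVDQU */
    v = _mm_shuffle_epi8(v, rev);                       /* PSHUFB */
    _mm_storeu_si128((__m128i *)dst, v);                /* MOVDQU */
}
#endif

/* Runtime CPU dispatch: take the SSSE3 path only if the CPU reports it. */
void reverse16(uint8_t *dst, const uint8_t *src) {
#if HAVE_SSSE3_PATH
    if (__builtin_cpu_supports("ssse3")) {
        reverse16_ssse3(dst, src);
        return;
    }
#endif
    reverse16_scalar(dst, src);
}
```

The fragility described above shows up as soon as `reverse16_ssse3` is inlined into `reverse16` or merged during LTO: the annotation is attached to the function, not to the code that actually needs SSSE3, so the compiler has to either refuse to inline or decide which target settings win.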