@rygorous
Created December 6, 2013 23:27
GLSL lawyering.
// LEGAL OR NOT?
// in_buf and out_buf point to the same resource.
uniform isamplerBuffer in_buf;
layout(r32i) uniform writeonly iimageBuffer out_buf;
layout(local_size_x=64) in;
void main()
{
    // i is different for every single invocation in the entire dispatch!
    int i = int(gl_GlobalInvocationID.x);
    // shader does this
    int old = texelFetch(in_buf, i).x;
    int new = complicated_function(old); // <500 lines elided>
    imageStore(out_buf, i, ivec4(new));
}
// -----
//
// Okay, why not just use imageLoads, which makes this well-defined?
// The original version did just this, and it works fine on GCN.
// (Actually, the original original version used a shader storage buffer.)
// But on VLIW5/4 hardware, the texelFetch version above runs *much* faster
// (it was more than a 3x speedup on a shader that really is 1000 lines long
// and that runs for multiple milliseconds).
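//
// For reference, a minimal sketch of what that imageLoad-based variant might
// look like, written as its own shader; the single-binding setup and names
// here are my assumption, not the original code:
layout(r32i) uniform iimageBuffer buf; // one image binding, both read and written
layout(local_size_x=64) in;
void main()
{
    int i = int(gl_GlobalInvocationID.x);
    int old = imageLoad(buf, i).x;       // image load instead of texelFetch
    int new = complicated_function(old); // same <500 lines elided> as above
    imageStore(buf, i, ivec4(new));
}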
//
// What is the argument for this being incorrect?
// Well, I am reading from and writing to the same resource through
// different mechanisms, with no particular synchronization.
//
// Why would it even work?
// Each invocation only ever reads from one location and writes to
// that same location. Within an invocation, there is a dependency
// chain that requires the read to have finished before the write
// starts. Hence this code is race-free.
//
// Now suppose that the reads are being cached, and the shader is
// simultaneously running on multiple CUs / shader cores / whatever.
// Further suppose that shader invocations and writes are issued and
// complete in no particular order and may become visible with delay.
//
// Suppose that at the time I'm doing the read, the corresponding line
// is in the cache. The previous statements then boil down to "the contents
// of said cache line are an arbitrary mix of old and new values for other
// invocations". However, for the *active* invocation, I *know* that the value
// I'm reading is the old one (what was originally in the buffer). Why?
// Because that's the value that was there when the dispatch started, and nobody
// can have modified it yet, because *I am the only invocation that writes to
// that location*.
//
// Okay, you say - so the contents of the cache line may be partially "stale"
// at read time, but the value I need to read is always there in its original,
// unmodified form, because *it hasn't been modified yet*. Fine. But what about
// the writes?
// Suppose this is a write-back cache! Couldn't it be writing back some of its old
// stale values along with my shiny new value, potentially undoing some work that
// another CU just did on another invocation?
//
// Well... that sounds like it might indeed break this code, but consider this
// example:
void main()
{
    int i = int(gl_GlobalInvocationID.x);
    int j = integer_hash(i); // bijective!
    imageStore(out_buf, j, ivec4(i));
}
// This code just stores the global invocation index to a pseudo-random location.
// This is clearly race-free again. Every location is written at most once,
// because integer_hash is bijective. Just a scatter, nothing else! And by
// construction, we expect each item in every cache line to be written by a
// different invocation.
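//
// (integer_hash itself isn't spelled out here; purely as an illustration, one
// way to get such a bijection - my sketch, not the original function - is a
// chain of xor-shift and odd-constant multiply steps, each of which is
// invertible mod 2^32, so the whole composition is too. The real function
// presumably also keeps results inside the buffer's index range.)
int integer_hash(int x)
{
    uint h = uint(x);
    h ^= h >> 16u;
    h *= 0x85ebca6bu; // odd constant => invertible multiply mod 2^32
    h ^= h >> 13u;
    h *= 0xc2b2ae35u; // odd constant => invertible multiply mod 2^32
    h ^= h >> 16u;
    return int(h);
}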
//
// See the problem? If the GPU were using a write-back cache that didn't have
// sufficient internal synchronization to order *writes* (i.e. if it had the
// capacity to accidentally undo writes), this trivial shader wouldn't work
// either! So for even basic stores to work, a GPU must implement write caching
// in a way that makes sure at least disjoint stores of the basic GL data types
// can't inadvertently undo each other.
//
// And at this point we know that, under fairly weak assumptions:
// 1. I am in fact reading the value I intend to read, and
// 2. the updated value I store will in fact reach memory safely (eventually)
// and not be undone by another store.
//
// Which means that, except for the usual stuff (i.e. I need to put a
// glMemoryBarrier between this dispatch and other dispatches that look at the
// data this dispatch writes to the buffer), I really should be fine.
//
// But I'm really not sure what the "official" position would be, hence my
// question.