GLSL lawyering.
// LEGAL OR NOT?
// in_buf and out_buf point to the same resource.
uniform isamplerBuffer in_buf;
layout(r32i) uniform writeonly iimageBuffer out_buf;
layout(local_size_x=64) in;
void main()
{
    // i is different for every single invocation in the entire dispatch!
    int i = int(gl_GlobalInvocationID.x);
    // shader does this
    int old = texelFetch(in_buf, i).x;
    int new = complicated_function(old); // <500 lines elided>
    imageStore(out_buf, i, ivec4(new));
}
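// (For context, one way the aliasing above might be set up on the host side,
// in C / GL, with hypothetical names -- the same buffer object backs both the
// texture buffer that feeds in_buf and the image bound for out_buf:
//
//     glBindTexture(GL_TEXTURE_BUFFER, tex);
//     glTexBuffer(GL_TEXTURE_BUFFER, GL_R32I, buf);                        // sampled as in_buf
//     glBindImageTexture(0, tex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_R32I);  // stored to as out_buf
// )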
// -----
//
// Okay, why not just use imageLoad, which makes this well-defined?
// The original version did just that (roughly as sketched below), and it works
// fine on GCN. (Actually, the original original version used a shader storage
// buffer.) But on VLIW5/4 hardware, the texelFetch version above runs *much*
// faster (more than a 3x speedup on a shader that really is 1000 lines long
// and runs for multiple milliseconds).
//
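// For comparison, roughly what I mean by the imageLoad variant; the declarations
// are my guess at the obvious translation of the shader above:
layout(r32i) uniform readonly iimageBuffer in_buf_img; // same underlying buffer as out_buf
void main() // illustrative rewrite of the shader above
{
    int i = int(gl_GlobalInvocationID.x);
    int old = imageLoad(in_buf_img, i).x;  // image load instead of texelFetch
    imageStore(out_buf, i, ivec4(complicated_function(old)));
}
//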
// What is the argument for this being incorrect?
// Well, I am reading from and writing to the same resource through two
// different mechanisms, with no particular synchronization.
//
// Why would it even work?
// Each invocation only ever reads from one location and writes to
// that same location. Within an invocation, there is a dependency
// chain that requires the read to have finished before the write
// starts. Hence this code is race-free.
//
// Now suppose that the reads are being cached, and the shader is
// simultaneously running on multiple CUs / shader cores / whatever.
// Further suppose that shader invocations and writes are issued and
// complete in no particular order and may become visible with delay.
//
// Suppose that at the time I'm doing the read, the corresponding line
// is in the cache. The previous statements then boil down to "the contents
// of said cache line are an arbitrary mix of old and new values for other
// invocations". However, for the *active* invocation, I *know* that the value
// I'm reading is the old one (what was originally in the buffer). Why?
// Because that's the value that was there when the dispatch started, and nobody
// can have modified it yet, because *I am the only invocation that writes to
// that location*.
//
// Okay, you say: so the contents of the cache line may be partially "stale" at
// read time, but the value I need to read is always there in its original,
// unmodified form, because *it hasn't been modified yet*. Fine. But what about
// the writes? Suppose this is a write-back cache! Couldn't it be writing back
// some of its old stale values along with my shiny new value, potentially
// undoing work that another CU just did for another invocation?
//
// Well... that sounds like it might indeed break this code, but consider this
// example:
void main()
{
    int i = int(gl_GlobalInvocationID.x);
    int j = integer_hash(i); // bijective!
    imageStore(out_buf, j, ivec4(i));
}
// This code just stores the global invocation index to a pseudo-random location.
// This is clearly race-free again. Every location is written at most once,
// because integer_hash is bijective. Just a scatter, nothing else! And by
// construction, we expect each item in every cache line to be written by a
// different invocation.
//
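// For concreteness, here is one possible bijective integer_hash (not necessarily
// the one actually used): xorshifts and multiplications by odd constants are
// each invertible on 32-bit values, so their composition is a bijection.
int integer_hash(int x)
{
    uint v = uint(x);
    v ^= v >> 16u;
    v *= 0x7feb352du;
    v ^= v >> 15u;
    v *= 0x846ca68bu;
    v ^= v >> 16u;
    return int(v);
}
//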
// See the problem? If the GPU were using a write-back cache that didn't have
// sufficient internal synchronization to order *writes* (i.e. if it had the
// capacity to accidentally undo writes), this trivial shader wouldn't work
// either! So for even basic stores to work, a GPU must implement write caching
// in a way that makes sure at least disjoint stores of the basic GL data types
// can't inadvertently undo each other.
//
// And at this point we know that, under fairly weak assumptions:
// 1. I am in fact reading the value I intend to read, and
// 2. the updated value I store will in fact reach memory safely (eventually)
//    and not be undone by another store.
//
// Which means that, except for the usual stuff (i.e. I need to put a
// glMemoryBarrier between this dispatch and other dispatches that look at the
// data this dispatch writes to the buffer), I really should be fine.
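//
// (Concretely, on the host side I mean something like, in C:
//
//     glDispatchCompute(...);
//     glMemoryBarrier(GL_TEXTURE_FETCH_BARRIER_BIT | GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
//
// where the exact barrier bits depend on how the next pass reads the data --
// texture fetches, image loads, or something else entirely.)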
//
// But I'm really not sure what the "official" position would be, hence my
// question.