For those short on time:
#extension GL_NV_gpu_shader5 : enable
#extension GL_EXT_nonuniform_qualifier : enable
#ifdef GL_EXT_nonuniform_qualifier
#define NonUniformIndex(x) nonuniformEXT(x)
#else
#define NonUniformIndex(x) (x)
#endif
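As a rough sketch of how the macro might be used in a complete fragment shader (the names textures, vTexIndex, vUv and oColor, the array size, and the binding are all hypothetical and only there for illustration):

```glsl
#version 460 core
#extension GL_NV_gpu_shader5 : enable
#extension GL_EXT_nonuniform_qualifier : enable

#ifdef GL_EXT_nonuniform_qualifier
#define NonUniformIndex(x) nonuniformEXT(x)
#else
#define NonUniformIndex(x) (x)
#endif

// Hypothetical resources and interface, for illustration only.
layout(binding = 0) uniform sampler2D textures[16];

layout(location = 0) in vec2 vUv;
layout(location = 1) flat in uint vTexIndex; // may differ per instance

layout(location = 0) out vec4 oColor;

void main() {
    // Mark the index as potentially non-uniform where the qualifier exists;
    // otherwise the macro passes the index through unchanged.
    oColor = texture(textures[NonUniformIndex(vTexIndex)], vUv);
}
```

On drivers that expose neither extension, the macro silently degrades to a plain index, so this sketch alone does not make the access well defined there.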
Resources can only be legally accessed in a dynamically uniform fashion in GLSL. In other words, when a sampling instruction (or image/buffer access) is executed, all invocations in the invocation group (dispatch, draw, or sub-draw in a multidraw) must access the same resource, or the results are undefined. In the GLSL specification, the relevant sections are 3.8.2. Dynamically Uniform Expressions and Uniform Control Flow and 4.1.7. Opaque Types.
The motivation for this restriction is that some GPUs simply cannot access a different resource per invocation in a single instruction (e.g. descriptors may be placed in scalar registers, which are not unique to one lane). If we investigate the fields used for the IMAGE_SAMPLE instruction on RDNA 3, we see this:
SSAMP: SGPR to supply S# (sampler constant) in 4 consecutive SGPRs. ...
SRSRC: SGPR to supply T# (resource constant) in 8 consecutive SGPRs. ...
Definitions:
SGPR: Scalar General Purpose Registers. 32-bit registers that are shared by work-items in each wave.
Wave: A collection of 32 or 64 work-items that execute in parallel on a single RDNA3 processor.
Work-item: A single element of work: one element from the dispatch grid, or in graphics a pixel, vertex or primitive.
Why do we care? Often, to minimize the number of draw calls issued, we reach for ARB_bindless_texture. This extension allows us to access samplers and images from a shader without explicitly binding them first, so we no longer have to split work into multiple draw calls just to change bindings. We may then wish to associate texture indices with individual mesh instances and draw them all with one call. Of course, those indices will not necessarily be dynamically uniform values. Still, we want to leverage the awesome ergonomics that bindless textures give us!
Note that this issue applies to regular resource arrays too (e.g. uniform sampler2D textures[N];), not just bindless textures and images. It's just that use cases requiring bindless textures often require non-uniform resource indexing as well.
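To make the problem concrete, here is a minimal sketch of the problematic pattern with a regular sampler array. The names (textures, vTexIndex, vUv, oColor) and the array size are assumptions made up for this example:

```glsl
#version 460 core

// Hypothetical resources and interface, for illustration only.
layout(binding = 0) uniform sampler2D textures[16];

layout(location = 0) in vec2 vUv;
layout(location = 1) flat in uint vTexIndex; // e.g. a per-instance texture index

layout(location = 0) out vec4 oColor;

void main() {
    // vTexIndex can differ between invocations of the same draw, so it is
    // not dynamically uniform and the result of this lookup is undefined.
    oColor = texture(textures[vTexIndex], vUv);
}
```

This may appear to work on some hardware and produce blocky or flickering artifacts on other hardware, which is what makes the bug easy to miss.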
We need to inform the compiler of our intentions to allow it to generate the correct code.
As of writing, EXT_nonuniform_qualifier is supported on modern AMD drivers: https://opengl.gpuinfo.org/listreports.php?extension=GL_EXT_nonuniform_qualifier. Using this extension is simple. Enable it in your shader:
#extension GL_EXT_nonuniform_qualifier : enable
Then apply nonuniformEXT to the expression that indexes an array of resources, or to the indexed resource itself:
vec4 col = texture(mySampler2Ds[nonuniformEXT(i)], uv);
Note that in Vulkan GLSL, you can construct a samplerND from separate samplers and textures. In this case, the nonuniform qualifier must be put on the samplerND and not the individual array indices:
vec4 col = texture(nonuniformEXT(sampler2D(myTexture2Ds[texIdx], mySamplers[samplerIdx])), uv);
Nvidia does not support EXT_nonuniform_qualifier in OpenGL (despite having it in Vulkan). However, they do have NV_gpu_shader5. Simply enable that and you're good to go!
#extension GL_NV_gpu_shader5 : enable
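For platforms that support neither extension, a handwritten waterfall loop is needed. Each iteration picks the index held by the first active invocation, lets every invocation holding that same index perform the access, and repeats until all invocations are done. This is essentially what the driver will do for us if we use nonuniformEXT. This only works if your driver supports ARB_shader_ballot (which must be enabled first):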
// Requires "#extension GL_ARB_shader_ballot : require" at the top of the shader.
vec4 col;
for (;;) {
    // Broadcast the index held by the first active invocation.
    uint currentIdx = readFirstInvocationARB(i);
    // Ensure that the index is dynamically uniform for any instance of texture()
    if (currentIdx == i) {
        // Note that because the control flow path to this invocation is not uniform, the implicit derivatives will be undefined.
        // In practice, it will work. If you're still worried, use textureGrad instead.
        col = texture(mySampler2Ds[currentIdx], uv);
        break;
    }
}
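If the undefined implicit derivatives are a concern, one option is to compute the derivatives before entering the loop and pass them to textureGrad. The following is a sketch under assumptions, not a definitive implementation: sampleNonUniform, textures, vTexIndex, vUv and oColor are hypothetical names, and it assumes main() reaches the derivative calls in uniform control flow.

```glsl
#version 460 core
#extension GL_ARB_shader_ballot : require

// Hypothetical resources and interface, for illustration only.
layout(binding = 0) uniform sampler2D textures[16];

layout(location = 0) in vec2 vUv;
layout(location = 1) flat in uint vTexIndex;

layout(location = 0) out vec4 oColor;

// Hypothetical helper: waterfall loop that samples with explicit gradients.
vec4 sampleNonUniform(uint idx, vec2 uv, vec2 dUvDx, vec2 dUvDy) {
    vec4 col = vec4(0.0);
    for (;;) {
        // Broadcast the index held by the first active invocation.
        uint currentIdx = readFirstInvocationARB(idx);
        if (currentIdx == idx) {
            // Explicit gradients, so no implicit derivatives are needed here.
            col = textureGrad(textures[currentIdx], uv, dUvDx, dUvDy);
            break;
        }
    }
    return col;
}

void main() {
    // Take the derivatives here, before any divergent control flow.
    vec2 dUvDx = dFdx(vUv);
    vec2 dUvDy = dFdy(vUv);
    oColor = sampleNonUniform(vTexIndex, vUv, dUvDx, dUvDy);
}
```

The trade-off is a couple of extra registers for the gradients in exchange for not relying on implementation behavior inside the divergent branch.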
Note: I don't know what's actually needed for Intel hardware since I have never worked with it. If you know, please comment.
No.
No. Yes. Maybe. Technically the derivatives will be undefined, but an implementation probably won't trash those registers for the invocations that didn't follow.
https://www.khronos.org/opengl/wiki/Sampler_(GLSL)#Texture_lookup_in_shader_stages