These generalisations about never using conditionals in GPU code are misleading, anyway. Just be careful to use conditionals
so that your thread groups are mostly either all true or all false. Checking a UV coord in a 2D frag/pixel shader,
e.g. `if (uv.x < 0.53)`, and branching in a clean cut through the texture or screenspace coordinates is no problem on modern GPUs,
because only a few thread groups in the dispatch "grid" might get held up: the ones that have a mix of true/false threads around
the 0.53 coord.
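To put a rough number on "only a few thread groups", here is a small CPU-side sketch in Python. It is a toy model, not real GPU behaviour: the 1280x720 dispatch size, the 8x8 group shape and the helper names `count_mixed_groups` and `left_of_cut` are all assumptions made for illustration, and real hardware maps pixels to thread groups in implementation-specific ways.

```python
# Hypothetical model: a 1280x720 dispatch split into 8x8-pixel thread groups.
WIDTH, HEIGHT, TILE = 1280, 720, 8

def count_mixed_groups(predicate):
    """Return (mixed, total), where mixed counts groups whose threads disagree."""
    mixed = total = 0
    for ty in range(0, HEIGHT, TILE):
        for tx in range(0, WIDTH, TILE):
            # Collect the distinct predicate results seen inside this group.
            results = {
                predicate(x, y)
                for y in range(ty, min(ty + TILE, HEIGHT))
                for x in range(tx, min(tx + TILE, WIDTH))
            }
            total += 1
            mixed += len(results) > 1
    return mixed, total

def left_of_cut(x, y):
    # Same test as `if (uv.x < 0.53)`, with uv.x taken at the pixel center.
    return (x + 0.5) / WIDTH < 0.53

mixed, total = count_mixed_groups(left_of_cut)
print(f"{mixed} of {total} groups diverge")  # prints "90 of 14400 groups diverge"
```

In this model only the single column of groups straddling the cut is mixed, well under 1% of the dispatch.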
The situation to watch out for is something like a conditional that checks for even or odd texel coords, or a conditional
with some pseudo-random determination, which means every thread group will have a heterogeneous mix of true/false
and every thread group in the dispatch will be held up taking the time to execute both branches.
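The same kind of toy model shows the bad case. This is a Python sketch under assumed numbers (a 256x256 dispatch, 8x8 groups); `mixed_fraction`, `odd_column` and `pseudo_random` are made-up helpers, and the integer hash is only a stand-in for whatever pseudo-random test a shader might use.

```python
# Hypothetical model: a 256x256 dispatch split into 8x8-pixel thread groups.
WIDTH, HEIGHT, TILE = 256, 256, 8

def mixed_fraction(predicate):
    """Fraction of thread groups containing both true and false threads."""
    mixed = total = 0
    for ty in range(0, HEIGHT, TILE):
        for tx in range(0, WIDTH, TILE):
            results = {
                predicate(x, y)
                for y in range(ty, ty + TILE)
                for x in range(tx, tx + TILE)
            }
            total += 1
            mixed += len(results) > 1
    return mixed / total

def odd_column(x, y):
    # Even/odd texel test, like `if (fragcoord.x % 2)`.
    return x % 2 == 1

def pseudo_random(x, y):
    # Cheap integer hash, for illustration only; one bit decides the branch.
    h = (x * 73856093) ^ (y * 19349663)
    return (h >> 4) & 1 == 1

print("even/odd:", mixed_fraction(odd_column))
print("hashed:  ", mixed_fraction(pseudo_random))
```

Both predicates leave every group mixed in this model, so every group would pay for both sides of the branch.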
Even in that worst-case scenario, if the operations are simple enough and the dispatch isn't gigantic, it doesn't always matter
that much on modern GPUs. Everyone should be cautious and use lookups, ifdefs, switch cases etc. when it makes sense,
but it's not always worth the few microseconds (except when it is, say with a giant shader dispatch
or a slim mobile GPU performance budget, and the branch meets the criteria above for thread heterogeneity).
If you did something like `if (fragcoord.x % 2)` then execute these instructions, else these other instructions, then
sure, it might become an issue at scale or with sufficiently costly branching code, say a bunch of texture samples in each branch.
Really simple value assignments and basic one-line vector math operations aren't going to be that noticeable when branching though,
like the step function mentioned above.
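A back-of-the-envelope way to see why branch cost matters more than the branch itself, sketched in Python. The cost units and counts are invented for illustration (1 unit for a trivial assignment, 40 for a branch full of texture samples, 14400 groups), and `dispatch_cost` is a hypothetical helper that simply charges mixed groups for both sides.

```python
def dispatch_cost(n_true, n_false, n_mixed, cost_true, cost_false):
    """Total cost in arbitrary units; a mixed group runs both branches."""
    uniform = n_true * cost_true + n_false * cost_false
    divergent = n_mixed * (cost_true + cost_false)
    return uniform + divergent

GROUPS = 14400  # hypothetical: a 1280x720 dispatch in 8x8 groups

# Parity-style branch: every group is mixed.
cheap_mixed = dispatch_cost(0, 0, GROUPS, 1, 1)
heavy_mixed = dispatch_cost(0, 0, GROUPS, 40, 40)

# Same branches, but split cleanly so half the groups take each side.
cheap_clean = dispatch_cost(GROUPS // 2, GROUPS // 2, 0, 1, 1)
heavy_clean = dispatch_cost(GROUPS // 2, GROUPS // 2, 0, 40, 40)

print("cheap branches, extra units from divergence:", cheap_mixed - cheap_clean)
print("heavy branches, extra units from divergence:", heavy_mixed - heavy_clean)
```

Divergence doubles the work in both cases, but with these made-up numbers the absolute penalty is 14400 units for the cheap branches against 576000 for the heavy ones.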
And then if you use % or mod by themselves it shouldn't even branch at all, because it's still the same instruction
executed for every thread. Don't quote me on this though; I haven't examined the IR or disassembly to be 100% sure.
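As a sketch of what "using mod by itself" could look like, here is the even/odd choice written both ways in Python. `pick_branchy` and `pick_branchless` are made-up names; the second form mirrors the arithmetic of something like GLSL's `mix(b, a, float(x % 2))`. Whether a given shader compiler actually emits a branch for either form is not something this demonstrates.

```python
def pick_branchy(x, a, b):
    # Control flow depends on the parity of x.
    if x % 2:
        return a
    return b

def pick_branchless(x, a, b):
    # Parity is used as a 0/1 weight, so every thread runs the same arithmetic.
    p = x % 2
    return a * p + b * (1 - p)

# Both forms agree for every column index tried here.
for x in range(16):
    assert pick_branchy(x, 10.0, 20.0) == pick_branchless(x, 10.0, 20.0)
```

The branchless form always evaluates both `a` and `b`, so it only helps when both values are cheap to compute.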
Small, fast, simple branches at smaller scales don't matter that much, especially if they mostly branch into
large homogeneous groups, equal to or greater than the thread group size.
Big, long, complex or expensive branches in large shader dispatches, and/or with too much heterogeneity occurring at a scale smaller
than the thread group size, are what to watch out for.
Of course, there are many other factors like comparing floats vs ints and uints, bools and relational operators,
and other considerations I'm sure I have not included here.