melonDS compute renderer testing notes
Cheep Cheep Beach in MKDS will display garbage even at 5x if you go into the
water. (This happened once; I can't reproduce it.)
I tried using a debug context, enabling debug output from the compute
renderer (even though that shouldn't be needed, since debug contexts enable it
by default) and giving it a callback to print debug messages. It didn't print
any more info than when I tested with a regular context.
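For reference, the setup was roughly the following (a minimal sketch assuming
a GL 4.3+ debug context and a function loader; the callback is mine, not
melonDS's actual code):

    // Sketch of the debug-output setup; assumes <cstdio> and a GL 4.3+ loader.
    // APIENTRY matters for the calling convention on Windows; no-op elsewhere.
    static void APIENTRY DebugCallback(GLenum source, GLenum type, GLuint id,
                                       GLenum severity, GLsizei length,
                                       const GLchar* message, const void* user)
    {
        fprintf(stderr, "GL debug [type 0x%X]: %s\n", type, message);
    }

    void EnableDebugOutput()
    {
        glEnable(GL_DEBUG_OUTPUT);
        glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS); // report on the offending call
        glDebugMessageCallback(DebugCallback, nullptr);
        // Opt into everything, including low-severity notifications.
        glDebugMessageControl(GL_DONT_CARE, GL_DONT_CARE, GL_DONT_CARE,
                              0, nullptr, GL_TRUE);
    }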
I also tested with Mesa. It prints even less than Nvidia's driver (which only
printed messages about successful buffer allocations).
Newer versions of Mesa have the red and blue channels swapped. A fresh
install of Debian 12.2 works fine; it breaks after dist-upgrading to testing
(Trixie), which at the time of writing ships Mesa 24.0.8. I suppose it's a
small behavior change in newer Mesa releases. It still doesn't log any
errors.
I checked all compute shaders with the glslang validator and got no complaints
from it.
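(That check is just the standalone validator run over each stage; the
filename below is a placeholder, and the stage is inferred from the .comp
extension.)

    glslangValidator SomeComputeShader.comp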
Tried increasing maxYSpanIndices (there's a comment saying the current values
are a bad guess). It didn't change anything.
There are a couple of glMemoryBarrier calls using GL_SHADER_STORAGE_BUFFER,
which looks wrong to me. The documentation doesn't list
GL_SHADER_STORAGE_BUFFER as a glMemoryBarrier flag, but it does list
GL_SHADER_STORAGE_BARRIER_BIT, which later glMemoryBarrier calls use.
Replacing GL_SHADER_STORAGE_BUFFER with GL_SHADER_STORAGE_BARRIER_BIT did not
get rid of the garbage. Neither did adding barriers after the first two
glDispatchCompute calls, nor after the glDispatchCompute call in the loop.
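The fix I tried looks roughly like this (a sketch with placeholder group
counts, not the actual melonDS code). GL_SHADER_STORAGE_BUFFER is a
buffer-binding target, not a barrier bit, but it compiles anyway because
glMemoryBarrier takes a plain GLbitfield, so the wrong constant just silently
requests a different set of barriers:

    // Before: not a valid barrier bit, silently requests the wrong barriers.
    // glMemoryBarrier(GL_SHADER_STORAGE_BUFFER);
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

    // Extra barrier after a dispatch, so its SSBO writes are visible to the
    // next dispatch's reads (the additional barriers I tried adding).
    glDispatchCompute(groupsX, groupsY, groupsZ);
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);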
I checked that the vast majority of the pipeline wasn't exceeding
GL_MAX_COMPUTE_WORK_GROUP_COUNT. For the stages using glDispatchCompute,
OpenGL would log errors if the counts were too high, and it doesn't, so they
must be fine. glDispatchComputeIndirect, however, will not log OpenGL errors
even if the group counts are too high, and checking those would be annoying
since I'd have to read the parameter buffer back from VRAM. RenderDoc did
capture the values used for these, though. At 6x none of the indirect calls
ever goes above the spec-guaranteed minimum of 65535. At 7x one call does
exceed it (an indirect dispatch with group counts (1, 1, 79414)). That
exceeds the maximum supported by my 1080 Ti with the 552 driver (it sticks to
the minimum of 65535 on the Z axis). I tried replacing the
glDispatchComputeIndirect call in the loop with glDispatchCompute(1, 1,
65535). This DID have an effect on the garbage, but it did not fix it. It's
likely this is the problem on Nvidia. The local sizes all seem fine.
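Checking the indirect dispatches on the CPU would go roughly like this (a
sketch; indirectBuffer and offset are placeholders for wherever melonDS
stores its dispatch parameters):

    // Query the per-axis group-count limits (indexed query, one per axis).
    GLint maxCount[3];
    for (int i = 0; i < 3; i++)
        glGetIntegeri_v(GL_MAX_COMPUTE_WORK_GROUP_COUNT, i, &maxCount[i]);

    // The indirect buffer holds num_groups_x/y/z as three consecutive
    // GLuints; glDispatchComputeIndirect never validates them, which is why
    // exceeding the limit is silent undefined behavior rather than an error.
    GLuint groups[3];
    glBindBuffer(GL_DISPATCH_INDIRECT_BUFFER, indirectBuffer);
    glGetBufferSubData(GL_DISPATCH_INDIRECT_BUFFER, offset,
                       sizeof(groups), groups);
    for (int i = 0; i < 3; i++)
        if (groups[i] > (GLuint)maxCount[i])
            fprintf(stderr, "axis %d: %u > max %d\n",
                    i, groups[i], maxCount[i]);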
Tried grabbing a couple of captures with RenderDoc while running Phantom
Hourglass. The low-res framebuffer seems to be affected in a similar way to
the high-res one. The high-res framebuffer also looks like it has the red and
blue channels swapped before presentation (much like what I was seeing on
Mesa).