@rygorous
Last active December 15, 2015 05:39
Weird rendering problem:
We need to render a 3D object such that the z values getting passed on to depth test/write for all pixels
are all exactly the same value (constant per batch), and we need to be able to choose that value freely.
This is what we'd like to do, but it doesn't work:
// at the end of the VS
out.pos.z = ourZValue * out.pos.w;
Because of round-off error, this is only *approximately* the same value at all vertices, not exactly the
same like we need.
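To see the round-off concretely, here is a small sketch (Python simulating float32 math; the specific z and w values are constructed for illustration) where two vertices of the same batch end up with different post-divide depths:

```python
import struct

def f32(x):
    """Round a Python float to IEEE-754 single precision, like GPU math."""
    return struct.unpack('f', struct.pack('f', x))[0]

ourZValue = 1.0 - 2**-24          # exactly representable in float32
vertex_w  = [1.0, 1.0 + 2**-23]   # two vertices of the same batch

# VS writes out.pos.z = ourZValue * out.pos.w; hardware divides by w:
post_divide_z = [f32(f32(ourZValue * w) / w) for w in vertex_w]

# the two vertices disagree in the last bit, so z is not exactly constant:
# post_divide_z[0] == 1 - 2**-24, but post_divide_z[1] == 1 - 2**-23
```

The multiply rounds z*w to a power of two for the second vertex, and the divide cannot undo that rounding, so the two depths land on adjacent float32 values.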
Here are the ways we've come up with to solve the problem:
1. Do the perspective divide in the vertex shader
// at the end of the VS
float oneOverW = 1.0f / out.pos.w;
out.pos.xy *= oneOverW;
out.pos.z = ourZValue;
out.pos.w = 1.0f;
With this, we can exactly control the depth value that gets written, but we lose perspective
correction for interpolated quantities. We could multiply all attributes by oneOverW, pass
oneOverW as an extra attribute, and then do the perspective interpolation ourselves in the pixel
shader - but then every pixel shader needs to be specialized for this, and we do manual
perspective correction. Ugh.
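For reference, the manual perspective correction amounts to the following (a Python sketch of the interpolation math with made-up vertex values, not shader code):

```python
def perspective_interp(t, a0, w0, a1, w1):
    # screen-space (linear) interpolation of the premultiplied attribute
    # and of 1/w - the two things the VS would have to output
    a_over_w = (1 - t) * (a0 / w0) + t * (a1 / w1)
    one_over_w = (1 - t) / w0 + t / w1
    # the "pixel shader" divides to recover the perspective-correct value
    return a_over_w / one_over_w

# edge from a near vertex (w=1, attribute 0) to a far vertex (w=10, attribute 10)
mid = perspective_interp(0.5, 0.0, 1.0, 10.0, 10.0)
# a plain screen-space lerp would give 5.0; the corrected value is pulled
# toward the near vertex
```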
2. Pass ourZValue to the pixel shader (as constant / attribute), write it to oDepth.
This is reasonably straightforward, but it involves writes to oDepth, and again having variants of the
pixel shaders that do this. This is less "ugh" in terms of amount of code but still requires having
basically 2x the pixel shaders and lots of ugly code paths.
3. Massive depth bias abuse.
We set ourZValue = 0 - this always ends up exact. Then, we set the actual Z value we want as a depth
bias. This is nice in that it involves absolutely no modifications to any of the shaders, it's just a
weird projection matrix we send to the VS with a z=0 row. It should also work fine with most rendering
APIs we support.
The problem is that on D3D10+, the depth bias is part of the rasterizer state, and in our case it
changes per batch. So we'd probably end up creating (and destroying) a bunch of rasterizer state objects
per frame. This is fairly iffy.
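The zeroed z row makes the exactness argument easy to see (a Python sketch with made-up matrix and vertex values):

```python
# hypothetical sketch: a projection matrix whose z row is all zeros, so the
# VS outputs clip-space z = 0 exactly for every vertex; the per-batch depth
# bias then supplies the value we actually want written
def project_z(p, z_row):
    # clip-space z = dot(z_row, position)
    return sum(r * c for r, c in zip(z_row, p))

zero_row = (0.0, 0.0, 0.0, 0.0)
clip_z = [project_z(p, zero_row) for p in ((1.0, 2.0, 3.0, 1.0),
                                           (0.5, -4.0, 7.5, 2.0))]
# clip_z == [0.0, 0.0]: exact for every vertex, no round-off involved
written_depth = [z + 0.3 for z in clip_z]   # 0.3 stands in for the bias
```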
4. Massive depth range / viewport abuse
Set a depth range that has both the min and max end at ourZValue. Now, no matter what the VS outputs, we
get ourZValue back, or at least should in theory!
But now we're calling glDepthRange (GL) or *SetViewport (D3D) for all affected batches. There's no reason
this cannot be fast - but it's extremely weird so I also wouldn't be surprised if it's a slow path
regardless.
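In other words (a Python sketch of the viewport depth remap, with a made-up ourZValue):

```python
def viewport_z(z_ndc, min_z, max_z):
    # window-space depth as in the viewport/depth-range transform, z_ndc in [0, 1]
    return min_z + (max_z - min_z) * z_ndc

ourZValue = 0.3
# collapse the range: whatever the VS produced, the written depth is constant
depths = {viewport_z(z, ourZValue, ourZValue) for z in (0.0, 0.25, 0.999)}
# depths == {0.3}
```

With min == max, the scale term is exactly zero, so no round-off from the VS output can leak into the written depth.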
5. ???
If you have other ideas, please ping me: @rygorous on Twitter!
@darksylinc

Option 4 might not work for the same reasons option 1 isn't working.

Option 5:
Two passes, first one writes oDepth, the 2nd one just renders with depth read & writes off. Plain and simple.

Option 6:
Use the stencil buffer w/ increment on depth pass. Then render a fullscreen quad (one for each batch) and output the right depth (using any of the other tricks you mention, i.e. oDepth) when stencil matches the quad's value. If you have 200 batches, you'll need 200 quad passes. Doesn't work w/ >256 batches.
If unpacking the depth buffer as a texture is possible, you can use a CS or PS to do this job and output the depth values uncompressed, all in one go.

Option 7:
Mix of option 0 & 6: Instead of stencil buffer, output that "more or less" depth value you want to write using option 0 (z = z * w).
Then, assuming there's enough distance between each depth value you want to set, draw a quad with:
7a. Depth value as lower bound and use greater equal. Use oDepth to write the exact depth.
7b. Use exact depth, and use depth bias abuse to compare against lower bound w/ greater equal.

By "lower bound" I mean that if you wanted the value "0.3" and the GPU is writing in range [0.2999; 0.3001], the lower bound is 0.2999. Beware that it assumes you're rendering all your batches in order (0.3 is drawn after 0.2, which are both drawn after 0.1 and so forth)
If you have CS access, it's great because you can perform bi-directional depth tests ( "if( 0.2999 <= depth <= 0.3001 )" ) in one run for all batches, instead of doubling draw calls.
If you don't have CS access, you can complement with the stencil buffer to emulate bi-directional tests in cases where your batches aren't in order or share textures, constants, etc.; but it will still double your draw call count as far as I can tell. It's getting late here.

These options almost always double the draw call count (unless you have CS) and could have terrible bandwidth usage - for starters, lots of overdraw, unless you can predict the area the batch will cover. If the quad covers the whole screen, remember to use Persson's tip of using a tri instead of a quad.

The advantage is that it requires minimal shader changes, other than multiplying z by w. And if you're on DX11 HW, it can be very fast (it becomes just option 0 with a rounding/quantization step as a post-process).

My brain is melting by now (not because of this exercise, it's just late) and I can't think further without knowing context (which you probably can't share); for example, I don't know if you just need the end-result depth buffer to be filled with particular values, or if you need to interact with those particular values as you keep rendering more stuff.

Good luck

@darksylinc

When I said "Option 4 might not work for the same reasons option 1 isn't working." I meant the same reasons option 0 isn't working.

@darksylinc

Random thought: I'm not sure what you want to accomplish, but it looks like you want proper rendering AND control over the depth value, which is not always possible. If the batch has two polygons covering the same pixel, having the same depth means you will need to guarantee they're submitted in the right order.

The only solution to that is to first draw on an RTT with proper depth, and then impostor-render to the scene. That, or use something else, like the stencil buffer to flag this batch as "overwrite my depth" for another pass.
If the batch doesn't have more than one triangle covering the same pixel (or their colour output is the same) then disregard this comment.

@rygorous
Author

5: Doesn't work because we need the depth test results. The whole reason we're doing this to begin with is to avoid double-shading pixels in areas where our geometry has self-overlaps.

6: This works with one batch, but due to round-off error on the Zs in the Z-buffer we now don't know what the next safe "ourZValue" is to use for the pass after that. We can try to be conservative, but it wastes "Z space". Also, one fullscreen quad for each batch = way too expensive to be practical here.

7: If you're willing to double the batch count, there's way easier ways to accomplish what we need, but we'd rather not.

While from the spec it might seem that Option 4 doesn't necessarily work, we have several targets where we know the exact implementation of depth range and know that it would work with those implementations.

We can't use compute; in fact, we're limited to stuff that works on PS2.0 level hardware.

@rygorous
Author

"If the batch has two polygons covering the same pixel, having the same depth means you will need to guarantee they're submitted in the right order."
So? I can, and I do. :)

@darksylinc

Oh well, the incomplete-information syndrome. If this is a 1-million-vertex batch, then submitting in the right order wouldn't be reasonable. I think I'm beginning to understand the whole situation: "avoid double-shading pixels in areas where our geometry has self-overlaps" + "I can, and I do [submit in right order]".

Sounds like you're rendering some sort of special geometry you're in control of that has deep depth fighting; probably procedural, or low poly. What I don't understand then: if you're in control of the order, why not just write depth and set the depth test to always pass?
But this batch may be behind one that came earlier, so the depth test is still needed and could conflict with it; or maybe you're sorting because the PS is too expensive and you want to take advantage of Early-Z.

But again, the only other solution that can cover such wide range of platforms is render to RTT then render as impostor. Since this is dead obvious, I can safely assume the amount of batches, overdraw, and need for alpha test make it prohibitively slow.
Most PS 2.0 hw wasn't even IEEE compliant, not to mention behaving differently on division by zero, overflow and stuff like that.

Something you may want to try is to use a very, very biiiiig Z (wishful thinking: so that Z * W / W = same value), and compensate with a very big negative depth bias to bring it back to valid range. In other words: in Vertex Shader "z = (realZ + bigOffset) * w;" then negative depth bias "-bigOffset2" to bring the value back. I would be surprised if it works at all, and you lose some control over which depth values you can set, but at least it's homogeneous and predictable. Also I wonder if the bias is applied after or before the depth clipping...
And then even if it works, pray older hardware also supports it (i.e. some gpu randomly overflowing)

@darksylinc

After realizing my above post is embarrassingly illegible, I'll try to explain the idea in code (assuming IEEE 32-bit):

float z = 0.1f; //Desired value
float w = 11.0f;
float outZ = (z + 83886.0f) * w;
float zBack = (outZ / w) - 83886.0f;

'zBack' prints 0.101562500 - nothing new. My point here is that the chance of zBack printing values other than 0.101562500 should be lower, because outZ was a big number with few fractional bits left, if my floating point knowledge doesn't fail me (there's a big possibility it does :P).
Even if we change z to 0.105, zBack still reads the same value. Even if outZ is incremented by 0.025, zBack still reads the same value (in binary).

It is wishful thinking because it could embarrassingly fail due to how the gpu interpolates the depth during rasterization (you know that much more than I do). And even if all that is true, I'm not sure whether the depth bias is applied after or before checking that depth is within clipping range.

Should this work, the advantage would be that you can use the same large depth bias for all batches, and pass the depth as a constant to the vertex shader. The disadvantage is that you lose control over zBack: you can only steer it towards some value, not set an exact one; it's not 100% reliable, which may not be what you're looking for.
Not to mention you have to account for HW with 24-bit floating point precision.

@johnbartholomew

Regarding your option 4, maybe you don't have to change the depth range for every batch. Perhaps you can adjust your coordinates and DepthRange so that the round-off error produced by the simple out.pos.z = ourZValue * out.pos.w gets quantised out?

Something like: you divide your depth buffer into ranges (0-255, 256-511, etc), set up the DepthRange to use the first 256 values, render 256 batches, then change DepthRange, render another 256 batches, and so on. You want your increment of ourZValue each time to be (conservatively) larger than round-off error, but then you use DepthRange to pack the final values into the depth buffer without gaps and cut off the noisy low bits.
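A rough Python sketch of the bucketing arithmetic, with made-up numbers (a 16-bit depth buffer, 256 batches per block, and a noise term standing in for the round-off error):

```python
# hypothetical sketch: give each batch a generously spaced z, then use the
# DepthRange remap so the fixed-point depth-buffer conversion snaps each
# batch onto its own integer code, stripping the noisy low bits
DEPTH_BITS = 16

def stored_depth(z_ndc, range_near, range_far):
    # viewport remap followed by fixed-point quantization
    z_win = range_near + (range_far - range_near) * z_ndc
    return round(z_win * (2**DEPTH_BITS - 1))

step = 1.0 / 256.0   # per-batch spacing, far above the round-off error
noise = 3e-8         # roughly a float32 ulp near 1.0, standing in for round-off
# this range maps the block of 256 batches onto depth codes 0..255
block = [stored_depth(i * step + noise, 0.0, 256 / 65535.0) for i in range(256)]
# each batch lands on its own integer code: block == [0, 1, ..., 255]
```

The quantization tolerates any noise well below half a code, so the exact round-off magnitude doesn't matter as long as the per-batch step is conservative.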

No doubt the devil is in the details that I haven't worked out.
