kvark/clustered.md

## clustered.md

      
Clustered shading

http://www.cse.chalmers.se/~olaolss/get_file.php?filename=papers/clustered_shading_preprint.pdf

Why?


* Allows arbitrarily complex material BRDFs
* Very friendly to MSAA and high resolutions
* Doesn't need a depth pre-pass
* Allows forward rendering of opaque and transparent objects in the same way
* Compatible with DX9/GL2-class hardware

General steps for DX9/GL2 HW


1. CPU writes light information into the light buffer.
2. CPU software-rasterizes lights into the 3D cluster texture. Each cell contains (offset, count) encoded into a uint32. Options:
   * Maintain a list for each cluster, then scan through the nursery 3D texture and fill up the light index buffer and the cluster 3D texture.
   * Write (cluster address, light index) pairs into the nursery light index array (NLIA), then sort it by cluster address and linearly fill both the 3D texture and the light index buffer.
3. CPU sends the cluster texture, the light index buffer, and the light buffer to the GPU. A latency of one frame should be acceptable.
4. GPU draws everything in a forward manner:
   1. The cluster texture coordinate is interpolated from the vertex stage.
   2. The fragment shader fetches the (offset, count) from the cluster texture.
   3. It iterates over the corresponding slice of the light index buffer.
   4. For each index, it loads the light parameters from the light buffer.
   5. Finally, it evaluates the lighting.
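
The sort-based CPU path and the fragment-side lookup above can be sketched in plain Rust. This is only an illustration under stated assumptions: the 24/8 bit split of the uint32 cell, the flat `Vec<u32>` standing in for the 3D texture, and all function names (`pack_cell`, `unpack_cell`, `build_clusters`, `lights_in_cluster`) are hypothetical and not taken from the paper or from gfx-rs.

```rust
/// Assumed layout of a cluster cell: high 24 bits = offset, low 8 bits = count.
const COUNT_BITS: u32 = 8;
const COUNT_MASK: u32 = (1 << COUNT_BITS) - 1;

/// Encode (offset, count) into a single u32 cell.
fn pack_cell(offset: u32, count: u32) -> u32 {
    debug_assert!(offset < (1 << (32 - COUNT_BITS)) && count <= COUNT_MASK);
    (offset << COUNT_BITS) | count
}

/// Decode a cell back into (offset, count).
fn unpack_cell(cell: u32) -> (u32, u32) {
    (cell >> COUNT_BITS, cell & COUNT_MASK)
}

/// Build the cluster cells and the light index buffer from the NLIA,
/// given as (cluster address, light index) pairs in arbitrary order.
/// `cells` is a flat stand-in for the 3D cluster texture.
fn build_clusters(mut nlia: Vec<(u32, u32)>, num_clusters: usize) -> (Vec<u32>, Vec<u32>) {
    // Sort by cluster address so each cluster's lights become contiguous.
    // The sort is stable, so lights keep their input order within a cluster.
    nlia.sort_by_key(|&(cluster, _)| cluster);

    let mut cells = vec![pack_cell(0, 0); num_clusters];
    let mut light_indices = Vec::with_capacity(nlia.len());

    // Linear fill: one pass writes both the cells and the light index buffer.
    for (i, &(cluster, light)) in nlia.iter().enumerate() {
        let (offset, count) = unpack_cell(cells[cluster as usize]);
        // The first entry seen for a cluster fixes its offset.
        let offset = if count == 0 { i as u32 } else { offset };
        cells[cluster as usize] = pack_cell(offset, count + 1);
        light_indices.push(light);
    }
    (cells, light_indices)
}

/// CPU stand-in for fragment shader steps 2 and 3: fetch (offset, count)
/// and return the slice of light indices to iterate over.
fn lights_in_cluster<'a>(cells: &[u32], light_indices: &'a [u32], cluster: usize) -> &'a [u32] {
    let (offset, count) = unpack_cell(cells[cluster]);
    &light_indices[offset as usize..(offset + count) as usize]
}
```

For instance, calling `build_clusters(vec![(2, 0), (0, 1), (2, 1), (0, 0)], 4)` yields the light index buffer `[1, 0, 0, 1]`, and `lights_in_cluster` for cluster 0 then returns `[1, 0]`. With the assumed 8-bit count, a cluster can hold at most 255 lights; a different split trades that limit against the maximum offset.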

Adapted to the DX11/GL4 pipeline


1. Get the light buffer provided by the CPU.
2. Rasterize into the 3D cluster texture. Options:
   * Use instanced rendering (since the light info is given)
   * Use a geometry shader to draw into all the depth layers at once (with culling)
3. For each output fragment, append the (cluster address, light index) into the NLIA.
4. Parallel-sort the NLIA by cluster address (and possibly light intensity).
5. Initialize the cluster texture with INT_MAX.
6. For each NLIA entry (in parallel):
   1. Copy the light index into the light index buffer.
   2. Atomic-min the start index of the target cluster with the current index.
   3. Atomic-increment the count of lights for the target cluster.
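
The final parallel pass can be mimicked on the CPU with Rust atomics, which may help in checking that the result does not depend on the order in which entries are processed. This is a rough sketch, not a GPU implementation: scoped threads stand in for GPU invocations, and the `Cell` struct with two separate atomics, the `SENTINEL` constant, the chunk size, and the function name `fill_clusters` are all assumptions made for illustration. The count is assumed to start at zero, while the start index starts at INT_MAX as in the steps above.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// INT_MAX, used as the initial start index so atomic min can only lower it.
const SENTINEL: u32 = i32::MAX as u32;

/// One cluster cell, kept as two separate atomics (an assumption of this sketch).
struct Cell {
    start: AtomicU32,
    count: AtomicU32,
}

/// Fill (start, count) per cluster plus the light index buffer.
/// `sorted_nlia` holds (cluster address, light index) pairs and must
/// already be sorted by cluster address.
fn fill_clusters(sorted_nlia: &[(u32, u32)], num_clusters: usize) -> (Vec<(u32, u32)>, Vec<u32>) {
    let cells: Vec<Cell> = (0..num_clusters)
        .map(|_| Cell { start: AtomicU32::new(SENTINEL), count: AtomicU32::new(0) })
        .collect();
    let light_indices: Vec<AtomicU32> =
        sorted_nlia.iter().map(|_| AtomicU32::new(0)).collect();

    // Each scoped thread handles a small chunk of entries, in no defined order.
    let chunk = 2;
    thread::scope(|s| {
        for (chunk_id, entries) in sorted_nlia.chunks(chunk).enumerate() {
            let (cells, light_indices) = (&cells, &light_indices);
            s.spawn(move || {
                for (j, &(cluster, light)) in entries.iter().enumerate() {
                    let i = chunk_id * chunk + j;
                    // 1. Copy the light index into the light index buffer.
                    light_indices[i].store(light, Ordering::Relaxed);
                    let cell = &cells[cluster as usize];
                    // 2. Atomic min of the start index with the current index.
                    cell.start.fetch_min(i as u32, Ordering::Relaxed);
                    // 3. Atomic increment of the light count.
                    cell.count.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });

    // All threads have joined here, so plain reads of the results are safe.
    let cells = cells
        .iter()
        .map(|c| (c.start.load(Ordering::Relaxed), c.count.load(Ordering::Relaxed)))
        .collect();
    let light_indices = light_indices.into_iter().map(AtomicU32::into_inner).collect();
    (cells, light_indices)
}
```

Because the NLIA is sorted, every cluster's entries are contiguous, so the minimum index seen for a cluster is exactly where its slice begins. Clusters with no lights keep the sentinel start and a count of zero.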

Conclusion

It is very possible!
gfx-rs is not quite ready for such magic though. Here is what we are missing:

* Unordered access view resources