
@kvark
Last active November 23, 2019 02:17
Clustered pipeline description

Clustered shading

http://www.cse.chalmers.se/~olaolss/get_file.php?filename=papers/clustered_shading_preprint.pdf

Why?

  • Allows arbitrarily complex material BRDFs
  • Very friendly to MSAA and high resolutions
  • Doesn't need a depth pre-pass
  • Allows forward rendering of opaque and transparent objects in the same way
  • Compatible with DX9/GL2 class hardware

General steps for DX9/GL2 HW

  1. CPU writes light information into the light buffer
  2. CPU software-rasterizes lights into the 3D cluster texture. Each cell holds an (offset, count) pair packed into a uint32. There are two ways to build it:
      • Maintain a light list for each cluster, then scan through the nursery 3D texture and fill the light index buffer and the cluster 3D texture.
      • Write (cluster address, light index) pairs into the nursery light index array (NLIA), then sort it by cluster address and fill both the 3D texture and the light index buffer in one linear pass.
  3. CPU uploads the cluster texture, the light index buffer, and the light buffer to the GPU. A latency of one frame should be acceptable.
  4. GPU draws everything in a forward manner:
      1. The cluster texture coordinate is interpolated from the vertex stage.
      2. The fragment shader fetches (offset, count) from the cluster texture.
      3. It iterates over that slice of the light index buffer.
      4. For each index, it loads the light parameters from the light buffer.
      5. Finally, it evaluates the lighting.
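The NLIA variant of step 2 and the fragment-side lookup of step 4 can be sketched in Python as below. This is a minimal CPU-side sketch under stated assumptions, not the author's code: the 24/8-bit split of the uint32 cell, the `Grid` helper with x-fastest addressing, and the precomputed per-light cluster ranges (standing in for real software rasterization of light volumes) are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical bit split: the notes only say (offset, count) is packed into a uint32.
COUNT_BITS = 8
COUNT_MASK = (1 << COUNT_BITS) - 1
MAX_OFFSET = (1 << (32 - COUNT_BITS)) - 1


def pack_cell(offset: int, count: int) -> int:
    """Pack an (offset, count) pair into one uint32 cluster-texture cell."""
    assert 0 <= offset <= MAX_OFFSET and 0 <= count <= COUNT_MASK
    return (offset << COUNT_BITS) | count


def unpack_cell(cell: int):
    """Return (offset, count) from a packed cell."""
    return cell >> COUNT_BITS, cell & COUNT_MASK


@dataclass
class Grid:
    """Hypothetical cluster grid dimensions."""
    nx: int
    ny: int
    nz: int

    def address(self, x: int, y: int, z: int) -> int:
        # Linear address of a cluster in the 3D texture (x varies fastest).
        return x + self.nx * (y + self.ny * z)

    @property
    def size(self) -> int:
        return self.nx * self.ny * self.nz


def build_clusters(grid, light_ranges):
    """NLIA variant of step 2.

    light_ranges[i] is a hypothetical inclusive cluster box
    ((x0, y0, z0), (x1, y1, z1)) covered by light i.
    """
    # "Rasterize": append (cluster address, light index) pairs into the NLIA.
    nlia = []
    for light, ((x0, y0, z0), (x1, y1, z1)) in enumerate(light_ranges):
        for z in range(z0, z1 + 1):
            for y in range(y0, y1 + 1):
                for x in range(x0, x1 + 1):
                    nlia.append((grid.address(x, y, z), light))
    # Sort by cluster address so each cluster's lights become contiguous.
    nlia.sort()
    # One linear pass fills both the cluster texture and the light index buffer.
    cluster_tex = [pack_cell(0, 0)] * grid.size
    light_indices = []
    i = 0
    while i < len(nlia):
        addr = nlia[i][0]
        start = len(light_indices)
        while i < len(nlia) and nlia[i][0] == addr:
            light_indices.append(nlia[i][1])
            i += 1
        cluster_tex[addr] = pack_cell(start, len(light_indices) - start)
    return cluster_tex, light_indices


def lights_for_fragment(cluster_tex, light_indices, addr):
    """Emulates step 4: fetch (offset, count), then walk that slice."""
    offset, count = unpack_cell(cluster_tex[addr])
    return light_indices[offset:offset + count]
```

For example, with `grid = Grid(4, 4, 4)` and two lights covering `((0, 0, 0), (1, 1, 0))` and `((1, 1, 0), (2, 2, 1))`, the cluster at `(1, 1, 0)` resolves to lights `[0, 1]`. Sorting is what lets a single pass emit each packed cell together with its matching slice; note that this particular 8-bit count field caps a cluster at 255 lights.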

Adapted to the DX11/GL4 pipeline

  1. Get the light buffer provided by the CPU
  2. Rasterize the lights into the 3D cluster texture. Options:
      • Use instanced rendering (since the light info is already given)
      • Use a geometry shader to draw into all the depth layers at once (with culling)
  3. For each output fragment, append the (cluster address, light index) into the NLIA
  4. Parallel-sort NLIA by cluster address (and possibly light intensity)
  5. Initialize the start index of every cluster in the cluster texture to INT_MAX (and its light count to 0)
  6. For each NLIA entry (in parallel):
      1. Copy the light index into the light index buffer
      2. Atomic min the start index of the target cluster with the current entry's index
      3. Atomically increment the light count of the target cluster
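Steps 4 to 6 work because the NLIA is sorted: each cluster's entries then occupy one contiguous run, so an atomic min over entry positions yields the run's start and an atomic increment yields its length, whatever order the threads execute in. The Python sketch below emulates this on the CPU, visiting entries in a shuffled order to stand in for unordered GPU threads. Keeping start and count in two separate arrays (rather than one packed cell) is an assumption made here so each field can be updated atomically on its own; on the GPU the loop body would map to `InterlockedMin`/`InterlockedAdd` in HLSL or `imageAtomicMin`/`imageAtomicAdd` in GLSL.

```python
import random

INT_MAX = 2**31 - 1


def resolve_clusters(nlia, num_clusters, seed=0):
    """Emulates steps 4-6 of the DX11/GL4 variant on the CPU.

    nlia: (cluster_address, light_index) pairs appended in step 3.
    Returns (start, count, light_indices); start/count being separate
    arrays is an assumption of this sketch.
    """
    # Step 4: sort by cluster address (a GPU would use a parallel sort).
    nlia = sorted(nlia)
    # Step 5: INT_MAX so atomic min works, 0 so atomic increment works.
    start = [INT_MAX] * num_clusters
    count = [0] * num_clusters
    light_indices = [0] * len(nlia)
    # Shuffle to mimic arbitrary GPU thread scheduling.
    order = list(range(len(nlia)))
    random.Random(seed).shuffle(order)
    for i in order:  # Step 6: one GPU thread per NLIA entry
        addr, light = nlia[i]
        light_indices[i] = light           # 6.1 copy the light index
        start[addr] = min(start[addr], i)  # 6.2 atomic min of the start index
        count[addr] += 1                   # 6.3 atomic increment of the count
    return start, count, light_indices
```

Clusters that received no lights keep a start of INT_MAX and a count of 0, so a consumer should check the count before indexing into the light index buffer.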

Conclusion

It is entirely possible!

gfx-rs is not quite ready for such magic though. Here is what we are missing:

  • Unordered access view resources