paniq/csg_kernel.md

## csg_kernel.md

      
    Raw
  

              csg_kernel.md
            
          
    After solving the compaction puzzle for parallel processing of values in a partitioned stream, the path to a screenspace CSG quadtree kernel is now open.
Instead of separate tiles, products and factors, we now keep a single array of factors, of which each factor also has a product index and a tile coordinate. We operate on a 256x256 tile so our tile coordinates fit into 16 bit, and allow only a maximum of 65536 products for this tile, with 2^31 addressable brushes. A factor then requires only 8 bytes: 2 bytes for its product index, 2 bytes for its tile coordinate and 4 bytes for its signed brush id.
We seed the array with all factors that matter for this 256x256 tile, sorted by tile coordinate and product index so that factors which belong to the same product are packed together, and all products which belong to the same tile are packed together as well.
We also init a brush id image with tile size 1x1.
In the beginning, there is typically only a single tile, and all subsequent passes ensure that this ordering is maintained.
A culling pass then consists of these steps

For each factor, compute near and far depth bounds within the frustum of its tile. This step is modified for the very last pass, where we raytrace the bounds so that they have a width of zero, and so that only a single factor per tile survives, and is forced to be written to the brush id image.
For each factor, compare its depth bound against all other factors in the same product using a modified Goldfeather algorithm for products and mark the factor if it has not been erased. Use the product/tile tags of each factor and linear search to find left and right boundaries of the local region (Local regions are typically small, so this should be okay.)
Build a histopyramid, compacting all factors using the predicate map built in (2) to cull all useless factors and create a new, compacter factor buffer, as well as a compacter depth bound buffer from (1).
Find the nearest among the near interior depth bounds within each tile using partitioned stream processing.
Using the buffer built in (4), mark each factor whose near depth bound lies behind the threshold, i.e. is fully occluded.
Repeat (3), once more compacting the factor buffer by culling all occluded factors.
Using the buffer built in (6), mark each factor whose immediate neighbors do share its tile (as this means that a tile has more than one factor left), otherwise write the factor's brush id to the brush id image at its tile coordinate.

These steps need to be executed for all but the last iteration:

Repeat (3), compacting the factor buffer a final time by culling all singular factors. The compaction can be combined with tile subdivision if the marking in (7) writes 4 instead of 1. For each factor, compute the position of its equivalent four copies in the new buffer (using the factor buffer's size as stride), updating for the new tile coordinates.
Double the size of the brush id image.

After all passes have run, the brush id image is now complete, and the brush id specified at each pixel can be raytraced to retrieve the final depth value and normal.
In the last few steps, many subdivisions only solve intersections of two or three brushes; It's worth investigating if step (7) could discard tiles of up to three factors, and allow for three uint slots in the brush id image; Then the final raytrace pass finalizes among these three tiles.