After solving the compaction puzzle for parallel processing of values in a partitioned stream, the path to a screenspace CSG quadtree kernel is now open.
Instead of separate tiles, products and factors, we now keep a single array of factors, of which each factor also has a product index and a tile coordinate. We operate on a 256x256 tile so our tile coordinates fit into 16 bit, and allow only a maximum of 65536 products for this tile, with 2^31 addressable brushes. A factor then requires only 8 bytes: 2 bytes for its product index, 2 bytes for its tile coordinate and 4 bytes for its signed brush id.
We seed the array with all factors that matter for this 256x256 tile, sorted by tile coordinate and product index so that factors which belong to the same product are packed together, and all products which belong to the same tile are packed together as well.
We also init a brush id image with tile size 1x1.
In the beginning, there is typically