Sharlock93/image_layout.md

## image_layout.md

      
Huge thanks to Fabian Giesen (Mastodon: @rygorous@mastodon.gamedev.place) for taking the time to answer my questions.

Q: I have been doing a lot of Vulkan lately and I was wondering if you knew where "Image Layouts" came from?
Do they mean anything to the GPU, or is this just something the driver takes as a hint?
A: They do mean something. It's mostly (but not exclusively) about additional metadata associated with attachments.
If you have access to AMD HW console GPU docs, read up on Cmask, Fmask, Hmask, DCC. That kind of stuff. :)
Often related to memory access/bandwidth optimizations that use secondary metadata elsewhere.
Color compression, hierarchical Z/stencil, depth compression, multisampling optimizations, fast clears, the works.
All of these have extra metadata stored (and cached) somewhere to reduce memory accesses for the primary copy.
For example, for fast clears, you might have a metadata bit for "this block of 8x8 pixels is supposed to be cleared"
(with the clear color stored in some register). Reads from that block then return the clear color, and the next render target
write to any pixel in that block actually writes out those pixel values (and clears the "this block is supposed to get cleared"
metadata bit).
The point being that you deferred that clear until you were actually going to write to those cache lines anyway
(and also saved a read of all-clear-color pixels for that block in the process).
Then to clear, say, a 3840x2160 RGBA16F render target (8 bytes per pixel), instead of memset-ing roughly 64MB of data,
you program in the clear color and memset (3840/8) * (2160/8) bits = 129600 bits = about 16k bytes, saving those 64MB of write bandwidth,
and also the corresponding amount of read bandwidth for the first time each of these pixels gets read.
(This particular example is a bit old-school; fast clears are a thing but so is color compression, and these days,
dedicated fast clear logic is often subsumed by the general color compression path.)
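
The numbers in that example are easy to check. Here is a minimal sketch of the arithmetic, assuming exactly what the answer states (one metadata bit per 8x8 block, RGBA16F at 8 bytes per pixel); the function names are made up for illustration.

```python
# Back-of-the-envelope check of the fast-clear numbers quoted above.
WIDTH, HEIGHT = 3840, 2160
BYTES_PER_PIXEL = 4 * 2   # RGBA16F: four channels, 16 bits (2 bytes) each
BLOCK = 8                 # one metadata bit per 8x8 pixel block, as in the example


def full_clear_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Bytes written when every pixel is memset to the clear color."""
    return width * height * bytes_per_pixel


def fast_clear_bits(width, height, block=BLOCK):
    """Metadata bits touched by a fast clear (one bit per block, rounding up)."""
    blocks_x = -(-width // block)   # ceiling division
    blocks_y = -(-height // block)
    return blocks_x * blocks_y


full = full_clear_bytes(WIDTH, HEIGHT)
bits = fast_clear_bits(WIDTH, HEIGHT)
meta_bytes = bits // 8

print(f"full clear : {full} bytes ({full / 2**20:.1f} MiB)")
print(f"fast clear : {bits} bits = {meta_bytes} bytes ({meta_bytes / 1024:.1f} KiB)")
print(f"ratio      : {full // meta_bytes}x less data written")
```

This works out to 66,355,200 bytes (about 63.3 MiB) for the full clear versus 16,200 bytes of metadata, a factor of 4096, which is just 64 pixels per block times 64 bits per pixel.
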
Anyway, I'm using this specific example because it shows why image layout transitions are a thing:
the logic described usually lives in the ROP/CB/blend units, and only there.
So:

- Their metadata caches are usually not coherent, so after that 16k memset for the clear, they need to get invalidated.
- Other units such as texture samplers or display/scan-out HW often do not know about the fast clear trick, or the associated metadata.
  Before either consumes these render targets, you usually resolve the fast clears (by actually writing the not-yet-cleared pixels).
  Note you still saved one read and one write per pixel that wrote over the clear color.
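
To make the mechanism concrete, here is a toy software model of the fast-clear trick described above. It is only a sketch: `ToyRenderTarget` and all of its method names are hypothetical, the "metadata" is a plain Python list, and real hardware is considerably more involved. It shows why a metadata-unaware reader sees stale data until the clear has been resolved.

```python
BLOCK = 8  # pixels per block side, matching the 8x8 example above


class ToyRenderTarget:
    """Toy model: one 'pending clear' bit per 8x8 block (hypothetical)."""

    def __init__(self, width, height):
        assert width % BLOCK == 0 and height % BLOCK == 0
        self.pixels = [[0] * width for _ in range(height)]   # primary copy
        self.blocks_x = width // BLOCK
        self.pending = [False] * (self.blocks_x * (height // BLOCK))  # metadata
        self.clear_color = 0   # stands in for the clear-color register

    def _block(self, x, y):
        return (y // BLOCK) * self.blocks_x + (x // BLOCK)

    def fast_clear(self, color):
        """Set the clear color and flag every block; no pixel is touched."""
        self.clear_color = color
        self.pending = [True] * len(self.pending)

    def _materialize(self, block_index):
        """Write the clear color into one flagged block and drop its flag."""
        bx, by = block_index % self.blocks_x, block_index // self.blocks_x
        for y in range(by * BLOCK, (by + 1) * BLOCK):
            for x in range(bx * BLOCK, (bx + 1) * BLOCK):
                self.pixels[y][x] = self.clear_color
        self.pending[block_index] = False

    def rop_write(self, x, y, value):
        """Render-target write: the deferred clear lands with the first write."""
        b = self._block(x, y)
        if self.pending[b]:
            self._materialize(b)
        self.pixels[y][x] = value

    def rop_read(self, x, y):
        """Metadata-aware read: flagged blocks return the clear color."""
        if self.pending[self._block(x, y)]:
            return self.clear_color
        return self.pixels[y][x]

    def sampler_read(self, x, y):
        """Metadata-unaware read (e.g. a texture sampler): primary copy only."""
        return self.pixels[y][x]

    def resolve(self):
        """Eliminate the fast clear so metadata-unaware units see correct data."""
        for b, flagged in enumerate(self.pending):
            if flagged:
                self._materialize(b)


rt = ToyRenderTarget(16, 16)
rt.fast_clear(7)
rt.rop_write(0, 0, 42)           # materializes only the top-left block
print(rt.rop_read(12, 12))       # clear color, answered from metadata
print(rt.sampler_read(12, 12))   # stale primary copy, metadata ignored
rt.resolve()
print(rt.sampler_read(12, 12))   # consistent after the resolve
```

In this model the resolve step is the "extra pass" a layout transition may have to perform before a sampler or scan-out unit reads the image.
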

This kind of thing is everywhere.
For stuff like depth compression/hierarchical Z, you have both a compressed representation and
extra metadata (including for clears!) per block, and again,
your texture samplers (for, say, shadow map reads) often can't read that directly.
Layout transitions handle both the "caches need invalidation" part and the
"we might need to do an extra pass to make the primary representation consistent" part.

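The two jobs of a layout transition named above (cache invalidation, and an extra pass to make the primary representation consistent) can be sketched as a small decision function. This is a hypothetical toy, not how any real driver is written: the `Layout` enum only loosely mirrors Vulkan's image layouts, and the assumption that only the color-attachment path understands the metadata is made purely for illustration.

```python
from enum import Enum, auto


class Layout(Enum):
    # Hypothetical stand-ins, loosely named after Vulkan image layouts.
    COLOR_ATTACHMENT = auto()
    SHADER_READ_ONLY = auto()
    PRESENT = auto()


# Assumption for this toy model: only the attachment (ROP/CB) path
# understands fast-clear / compression metadata.
METADATA_AWARE = {Layout.COLOR_ATTACHMENT}


def transition_work(old, new):
    """Return the extra steps this toy driver would schedule for old -> new."""
    if old == new:
        return []
    if old in METADATA_AWARE and new not in METADATA_AWARE:
        # The next consumer cannot read the metadata, so the primary copy
        # has to be made consistent first.
        return ["flush metadata caches", "resolve fast clears / decompress"]
    if old not in METADATA_AWARE and new in METADATA_AWARE:
        # The attachment path may hold stale metadata in its caches.
        return ["invalidate metadata caches"]
    return []


print(transition_work(Layout.COLOR_ATTACHMENT, Layout.SHADER_READ_ONLY))
print(transition_work(Layout.SHADER_READ_ONLY, Layout.COLOR_ATTACHMENT))
print(transition_work(Layout.SHADER_READ_ONLY, Layout.PRESENT))
```

Note that nothing here touches the pixel format or tiling; the layout only selects which read/write path is about to use the image, and the transition does whatever bookkeeping that change requires.
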
Q: How come consoles themselves (at least the PS5) don't have these layout transitions?
Is it because you have better access, or because you need to manually do/enable those features and invalidate the caches yourself?
A: On PS4/PS5 you need to manually do the syncs, eliminate fast clears, etc., yes.
Q: So is my understanding correct that they [Image Layouts] do mean something, but they don't map to some feature on the hardware?
As compared to, say, a texture's format, which changes how the hardware deals with the memory.
A: The texture format maps to the pixel format, VkImageTiling maps to the texture layout (e.g. linear vs. various tiled modes),
the image layout does not physically change anything about the image per se but is essentially a statement of intent about how an
image is currently used (and through which read/write paths) and changing layouts may trigger invalidations or extra blits.
(Somewhat akin to memory barriers for resources, which also imply necessary sync+cache invalidations.)