teoxoy/Shader_Buffer_Memory_Layout_Info.md

## Shader_Buffer_Memory_Layout_Info.md

      
Shader Buffer Memory Layout Info


1. General

1.1. Background
1.2. Notation
1.3. Scalar, std430, std140 layouts
1.4. Vector-relaxed std140 / std430 layouts
1.5. Detailed array layout info
1.6. Detailed struct layout info


2. WGSL

2.1. Storage Address Space
2.2. Uniform Address Space
2.3. Notes
2.4. References


3. GLSL

3.1. Shader Storage Buffer Object
3.2. Uniform Buffer Object
3.3. Notes
3.4. References


4. SPIR-V for Vulkan

4.1. StorageBuffer Storage Class / PushConstant Storage Class / Uniform Storage Class with BufferBlock Decoration
4.2. Uniform Storage Class with Block Decoration
4.3. Notes
4.4. References


5. HLSL

5.1. Structured Buffer
5.2. Constant Buffer
5.3. Notes
5.4. References


6. MSL

6.1. Device / Constant Address Space
6.2. Notes
6.3. References


1. General

1.1. Background

The different layout rules all solve the same problem: mapping data structures that are complex relative to scalars (e.g. u32, f32) to memory (a byte array). Each set of rules makes its own space/time tradeoffs.
Accessing data in memory requires knowing its byte offset (relative to the start of the memory).
The most important properties of a data structure are its alignment and its size.
The alignment must evenly divide every byte offset at which the data structure can reside (i.e. offset % alignment = 0).
Alignment is a power of 2 and, for performance reasons, is often greater than 1 (an alignment of 1 is usually also referred to as unaligned access) because of how CPUs and GPUs perform data accesses at the hardware level.

1.2. Notation

The SS constant denotes the inherent size, in bytes, of the (inner) scalar S.
The roundUp function (returns n rounded up to a multiple of k) is defined for positive integers k and n as:

roundUp(k, n) = ⌈n ÷ k⌉ × k

The po2 function (returns n rounded up to a power of 2) is defined for positive integer n as:

po2(n) = 2^{⌈log₂(n)⌉}
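
The two helper functions above can be sketched in Python using integer arithmetic only. The names `round_up` and `po2` mirror the notation; everything else is illustrative.

```python
def round_up(k: int, n: int) -> int:
    """Return n rounded up to a multiple of k (k and n are positive integers)."""
    # -(-n // k) is ceil(n / k) without going through floating point
    return -(-n // k) * k


def po2(n: int) -> int:
    """Return n rounded up to a power of 2 (n is a positive integer)."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1
    return 1 << (n - 1).bit_length()


print(round_up(16, 12))  # 16
print(round_up(4, 12))   # 12
print(po2(12))           # 16
print(po2(8))            # 8
```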

1.3. Scalar, std430, std140 layouts


| type | scalar align | scalar size | std430 align | std430 size | std140 align | std140 size |
| --- | --- | --- | --- | --- | --- | --- |
| scalar S | SS | SS | SS | SS | SS | SS |
| vecN<S> | SS | SS * N | po2(SS * N) | SS * N | po2(SS * N) | SS * N |
| matCxR<S> | SS | SS * C * R | po2(SS * R) | alignOf(self) * C | roundUp(16, SS * R) | alignOf(self) * C |
| array<E, N> | alignOf(E) | sizeOf(E) * N | alignOf(E) | roundUp(alignOf(E), sizeOf(E)) * N | roundUp(16, alignOf(E)) | roundUp(alignOf(self), sizeOf(E)) * N |
| struct with members M₁...M_N | max(alignOf(M₁)...alignOf(M_N)) | roundUp(alignOf(self), offsetOf(M_N) + sizeOf(M_N)) | max(alignOf(M₁)...alignOf(M_N)) | roundUp(alignOf(self), offsetOf(M_N) + sizeOf(M_N)) | max(16, alignOf(M₁)...alignOf(M_N)) | roundUp(alignOf(self), offsetOf(M_N) + sizeOf(M_N)) |

1.4. Vector-relaxed std140 / std430 layouts

This section is only relevant for laying out vectors inside structs.
The std140/std430 layout rules above apply unchanged, except that vectors now have scalar alignment (i.e. alignOf(vecN<S>) = SS) as long as the rules below are met.
Pseudocode
// start offset (a multiple of the scalar size, for some non-negative integer k)
F = SS * k

if sizeOf(vecN) < 16 {
    // first and last byte need to lie in the same 16 byte block
    L = F + sizeOf(vecN) - 1
    assert(floor(F / 16) == floor(L / 16))
} else {
    // start offset needs to be aligned to 16 bytes
    assert(F % 16 == 0)
}
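
The check above might look like this in Python. This is a sketch under one stated assumption: the end offset is taken to be the offset of the vector's last byte (start + size - 1), so that a vector ending exactly on a 16-byte boundary is accepted. The function name and parameters are hypothetical.

```python
def fits_relaxed(offset: int, ss: int, n: int) -> bool:
    """Check whether a vecN with scalar size ss may start at byte `offset`."""
    size = ss * n
    if offset % ss != 0:
        # the start offset must still be a multiple of the scalar size
        return False
    if size < 16:
        # assumption: compare the blocks of the first and the last byte
        last = offset + size - 1
        return offset // 16 == last // 16
    # vectors of 16 bytes or more must start on a 16-byte boundary
    return offset % 16 == 0


print(fits_relaxed(4, 4, 3))   # True:  vec3<f32> occupies bytes 4..15
print(fits_relaxed(8, 4, 3))   # False: vec3<f32> would occupy bytes 8..19
print(fits_relaxed(8, 4, 2))   # True:  vec2<f32> occupies bytes 8..15
print(fits_relaxed(4, 4, 4))   # False: vec4<f32> needs a 16-byte aligned start
```
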
1.5. Detailed array layout info

Elements of arrays are laid out according to the following algorithm
Pseudocode
// Note: Array alignment differs between layouts but is always a multiple of the element alignment

// Stride is the aligned size of an element
stride = roundUp(alignOf(array), sizeOf(E))

for i in 0..array.length() {
    // Offset at which the element resides
    array[i].offset = stride * i
}

// This is the return value of sizeOf(array)
array.size = stride * array.length()
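
A runnable Python sketch of the same algorithm, with the array alignment taken from the table in 1.3. The function name, the tuple it returns and the layout strings are illustrative choices, not part of any API.

```python
def round_up(k, n):
    return -(-n // k) * k


def array_layout(elem_align, elem_size, length, layout):
    """Return (align, stride, size, element offsets) for array<E, N>."""
    # std140 rounds the array alignment up to a multiple of 16
    align = round_up(16, elem_align) if layout == "std140" else elem_align
    # stride is the aligned size of an element
    stride = round_up(align, elem_size)
    offsets = [stride * i for i in range(length)]
    return align, stride, stride * length, offsets


# array<f32, 4>: element align 4, element size 4
print(array_layout(4, 4, 4, "std430"))  # (4, 4, 16, [0, 4, 8, 12])
print(array_layout(4, 4, 4, "std140"))  # (16, 16, 64, [0, 16, 32, 48])
# array<vec3<f32>, 2> in std430: element align 16, element size 12
print(array_layout(16, 12, 2, "std430"))  # (16, 16, 32, [0, 16])
```
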
1.6. Detailed struct layout info

Members of structs are laid out according to the following algorithm
Pseudocode
// This is the return value of alignOf(struct)
// (for std140 the result is additionally raised to at least 16, see the table in 1.3)
struct.alignment = max(struct.members.map(alignOf))

// Byte offset from the start of the struct
current_offset = 0

for member in struct.members {
    // Align offset for member
    current_offset = roundUp(alignOf(member), current_offset)

    // Offset at which the member resides
    // This is the return value of offsetOf(member)
    struct[member].offset = current_offset

    current_offset += sizeOf(member)
}

// This is the return value of sizeOf(struct)
struct.size = roundUp(alignOf(struct), current_offset)
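
The struct algorithm translates to Python almost line by line. In this sketch each member is given as a hypothetical (name, align, size) tuple, and `min_align` is an illustrative parameter that models the std140 rule of raising the struct alignment to at least 16.

```python
def round_up(k, n):
    return -(-n // k) * k


def struct_layout(members, min_align=1):
    """members: list of (name, align, size). Returns (align, size, offsets)."""
    align = max([min_align] + [a for _, a, _ in members])
    offsets = {}
    current_offset = 0
    for name, member_align, member_size in members:
        # align the running offset for this member, then record it
        current_offset = round_up(member_align, current_offset)
        offsets[name] = current_offset
        current_offset += member_size
    # the struct size is padded to a multiple of the struct alignment
    return align, round_up(align, current_offset), offsets


# struct { a: f32, b: vec3<f32>, c: f32 } with std430 member layouts
members = [("a", 4, 4), ("b", 16, 12), ("c", 4, 4)]
print(struct_layout(members))
# (16, 32, {'a': 0, 'b': 16, 'c': 28})

# struct { x: f32, y: f32 } in std430 and std140 (min_align=16)
print(struct_layout([("x", 4, 4), ("y", 4, 4)]))                # (4, 8, {'x': 0, 'y': 4})
print(struct_layout([("x", 4, 4), ("y", 4, 4)], min_align=16))  # (16, 16, {'x': 0, 'y': 4})
```

Note how `c` fits into the 4 bytes left over after the 12-byte `b`, so the struct needs no trailing padding beyond offset 32.
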
2. WGSL

The default layout is std430.
The extra requirements for the uniform address space have to be explicitly met.
2.1. Storage Address Space


std430

2.2. Uniform Address Space


std140; with the caveat that matrices of the form matCx2 have an alignment of 8 instead of 16 and therefore a size of C * 8 instead of C * 16
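
To make the caveat concrete, the following sketch compares plain std140 with the WGSL uniform rule for a matCx2 with f32 components (the f32 component type and the function name are assumptions made for the example).

```python
def round_up(k, n):
    return -(-n // k) * k


def matcx2_f32(c, wgsl_uniform):
    """(align, size) of a matCx2 with f32 components."""
    column_size = 2 * 4  # two 4-byte components per column
    # plain std140 rounds the column alignment up to 16, WGSL keeps it at 8
    align = column_size if wgsl_uniform else round_up(16, column_size)
    return align, align * c


print(matcx2_f32(3, wgsl_uniform=False))  # (16, 48)
print(matcx2_f32(3, wgsl_uniform=True))   # (8, 24)
```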

2.3. Notes


matrices are column-major
align and size attributes can be used to change the alignment and size of struct members

2.4. References

WGSL Specification
3. GLSL

3.1. Shader Storage Buffer Object


std430
std140

SSBOs require OpenGL 4.3 / OpenGL 4.0 + ARB_shader_storage_buffer_object
3.2. Uniform Buffer Object


std140

3.3. Notes


matrices are column-major (can be overridden to be row-major in buffers via the row_major layout qualifier; added in GLSL 1.40)
offset and align layout qualifiers can be used to change the offset and alignment of struct members (added in GLSL 4.40 / GLSL 1.40 + ARB_enhanced_layouts)

3.4. References

OpenGL Specification
GLSL Specification
4. SPIR-V for Vulkan

4.1. StorageBuffer Storage Class / PushConstant Storage Class / Uniform Storage Class with BufferBlock Decoration


std140
std430; default
scalar; via scalarBlockLayout in Vulkan v1.2 or VK_EXT_scalar_block_layout
vector-relaxed std140 / std430; since Vulkan v1.1 or via VK_KHR_relaxed_block_layout

4.2. Uniform Storage Class with Block Decoration


std140; default
std430; via uniformBufferStandardLayout in Vulkan v1.2 or VK_KHR_uniform_buffer_standard_layout
scalar; via scalarBlockLayout in Vulkan v1.2 or VK_EXT_scalar_block_layout
vector-relaxed std140 / std430; since Vulkan v1.1 or via VK_KHR_relaxed_block_layout

4.3. Notes


Offset decoration is required on struct members
ArrayStride decoration is required on array types
MatrixStride and either ColMajor or RowMajor decorations are required for matrices


Even if scalar alignment is supported, it is generally more performant to use the base alignment.


4.4. References

Vulkan Specification
Vulkan Shader Memory Layout Guide
SPIR-V Specification (Decorations)
SPIR-V Specification (Shader Validation)
5. HLSL

5.1. Structured Buffer


scalar

5.2. Constant Buffer


vector-relaxed std140; with the caveat that struct members of type matrix, array or struct don't round up their size to a multiple of their alignment
scalar; via -no-legacy-cbuf-layout DXC flag

5.3. Notes


matrices are column-major in buffers by default (can be overridden via the row_major modifier); in shaders, however, they are row-major: notation (i.e. float4x3 is a matrix with 4 rows and 3 columns), construction and access are all row-major

5.4. References

DXC Buffer Packing Wiki
HLSL Constant Buffer Packing Rules
DXC HLSL to SPIR-V Feature Mapping
6. MSL

6.1. Device / Constant Address Space


std430; with the caveat that the size of a 3-component vector is 16 instead of 12 (however, a packed 3-component vector with the alignas specifier set to 16 can be used instead)
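
The effect of the larger 3-component vector shows up as soon as a scalar follows it in a struct. The sketch below lays out a struct made of a 3-component f32 vector followed by an f32 under three sets of (align, size) pairs; the helper name is hypothetical and the pairs are taken from the statement above.

```python
def round_up(k, n):
    return -(-n // k) * k


def offsets(members):
    """members: list of (align, size). Returns (member offsets, struct size)."""
    align = max(a for a, _ in members)
    current, result = 0, []
    for member_align, member_size in members:
        current = round_up(member_align, current)
        result.append(current)
        current += member_size
    return result, round_up(align, current)


print(offsets([(16, 12), (4, 4)]))  # std430 vec3: ([0, 12], 16)
print(offsets([(16, 16), (4, 4)]))  # MSL 3-component vector: ([0, 16], 32)
print(offsets([(16, 12), (4, 4)]))  # MSL packed vector with alignas(16): ([0, 12], 16)
```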

6.2. Notes


provides extra packed vectors (scalar layout)
matrices are column-major
alignas specifier can be used to change the alignment (can be applied to structs or struct members)

6.3. References

MSL Specification