sebbbi/BetterBuffers.txt

## BetterBuffers.txt
All current buffer types in shading languages are slightly different ways to present homogeneous arrays (single struct or type repeating N times in memory).

DirectX has raw buffers (RWByteAddressBuffer), but they are limited to 32 bit integer types and they don't require natural alignment for wide loads, which results in suboptimal codegen on Nvidia GPUs.

Complex use cases, such as tree traversal in spatial data structures (physics, ray-tracing, etc), require a non-homogeneous data structure. You want different node payloads and a tight memory layout.

The ability to mix 8/16/32 bit data types and 1d/2d/4d vectors in the same data structure, to facilitate GPU wide loads (max bandwidth), is crucial for complex use cases like this.

On the other hand, we want code syntax that is more readable and maintainable than DirectX raw buffers, without manual bit packing/extracting and reinterpret casting. The goal should be to allow modern GPUs to use sub-register addressing (SDWA on AMD hardware), saving both ALU and registers.
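For contrast, here is roughly the kind of manual extraction that raw 32 bit loads force on you today, written as plain CPU-side C++ rather than shader code. This is only a sketch: the names U8x4, unpackU8x4 and unpackI16Lo are made up for illustration, and it assumes byte 0 sits in the lowest 8 bits of the loaded word.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 4 x 8 bit vector (not a real API type).
struct U8x4
{
    std::uint8_t x, y, z, w;
};

// Split one 32 bit word into four bytes by shifting and masking.
// Assumption: x is stored in the lowest 8 bits of the word.
constexpr U8x4 unpackU8x4(std::uint32_t word)
{
    return U8x4{
        static_cast<std::uint8_t>(word & 0xFFu),
        static_cast<std::uint8_t>((word >> 8) & 0xFFu),
        static_cast<std::uint8_t>((word >> 16) & 0xFFu),
        static_cast<std::uint8_t>((word >> 24) & 0xFFu)};
}

// Extract a signed 16 bit integer from the low half of a 32 bit word.
constexpr std::int16_t unpackI16Lo(std::uint32_t word)
{
    return static_cast<std::int16_t>(static_cast<std::uint16_t>(word & 0xFFFFu));
}
```

With native u8x4 and i16 struct fields none of this shifting and masking would appear in shader source, and the compiler would be free to use sub-register addressing where the hardware has it.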

// Declare buffers
Buffer myBuffer;
RWBuffer myBufferRW;

// Load data from buffer
MyStruct data = myBuffer.load<MyStruct>(alignedOffset);

// Store data to buffer:
MyStruct data;
myBufferRW.store<MyStruct>(alignedOffset, data);

MyStruct can of course be replaced with native types such as float, uint4, etc. alignedOffset must abide by the natural alignment of the largest field in the struct, similar to C/C++ struct alignment rules.

The natural alignment of a float4 (4d vector of 32 bit floats) is 16 bytes, and 8 bytes for a float2 (2d vector). This allows all GPU vendors (including Nvidia) to emit optimal wide load instructions, solving the performance issue with DirectX raw buffers.
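The load/store rule above can be emulated on the CPU to make the alignment requirement concrete. This is a minimal sketch in C++, not the proposed shader feature itself: ByteBuffer, isValidOffset and F32x4 are hypothetical names, and alignof(T) is used as a stand-in for "natural alignment of the largest field".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// CPU-side stand-in for the proposed RWBuffer (hypothetical, illustration only).
class ByteBuffer
{
public:
    explicit ByteBuffer(std::size_t sizeInBytes) : bytes_(sizeInBytes) {}

    // An offset is valid if it is a multiple of alignof(T) and T fits in the buffer.
    template <typename T>
    bool isValidOffset(std::size_t alignedOffset) const
    {
        return alignedOffset % alignof(T) == 0 &&
               alignedOffset <= bytes_.size() &&
               sizeof(T) <= bytes_.size() - alignedOffset;
    }

    template <typename T>
    T load(std::size_t alignedOffset) const
    {
        static_assert(std::is_trivially_copyable<T>::value, "T must be plain data");
        assert(isValidOffset<T>(alignedOffset));
        T value;
        std::memcpy(&value, bytes_.data() + alignedOffset, sizeof(T));
        return value;
    }

    template <typename T>
    void store(std::size_t alignedOffset, const T& value)
    {
        static_assert(std::is_trivially_copyable<T>::value, "T must be plain data");
        assert(isValidOffset<T>(alignedOffset));
        std::memcpy(bytes_.data() + alignedOffset, &value, sizeof(T));
    }

private:
    std::vector<std::uint8_t> bytes_;
};

// Stand-in for a 4d float vector with its natural 16 byte alignment.
struct alignas(16) F32x4
{
    float x, y, z, w;
};
```

In this sketch an F32x4 can be stored at offsets 0, 16, 32 and so on, while offset 8 is rejected. That restriction is the guarantee that lets a compiler emit a single 16 byte wide load instead of four narrow ones.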

The shader language needs to be extended with native 8 and 16 bit types (min16float isn't good enough). Structs can include 8 bit int/uint and 16/32 bit int/uint/float types, as scalars and as 2d/4d vectors.

struct MyStruct
{
  f32 oneFloat;
  f32x4 fourFloats;
  f16x2 twoHalfFloats;
  u8x4 fourBytes;
  i16 one16bInt;
};
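
To see what C/C++ style alignment rules would do to this struct, the layout can be checked with a C++ compiler. This is a sketch under assumptions: the f32x4, f16x2 and u8x4 types below are stand-ins defined here (half floats are kept as raw 16 bit values), each vector is given an alignment equal to its full size, and float is assumed to be 4 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins for the proposed native vector types (assumed natural alignment).
struct alignas(16) f32x4 { float v[4]; };
struct alignas(4)  f16x2 { std::uint16_t bits[2]; };  // raw half-float bits
struct alignas(4)  u8x4  { std::uint8_t v[4]; };

struct MyStruct
{
    float        oneFloat;       // offset 0
    f32x4        fourFloats;     // offset 16, after 12 bytes of padding
    f16x2        twoHalfFloats;  // offset 32
    u8x4         fourBytes;      // offset 36
    std::int16_t one16bInt;      // offset 40
};                               // size rounds up from 42 to 48 bytes

// Same fields with the widest member first: no interior padding.
struct MyStructReordered
{
    f32x4        fourFloats;     // offset 0
    float        oneFloat;       // offset 16
    f16x2        twoHalfFloats;  // offset 20
    u8x4         fourBytes;      // offset 24
    std::int16_t one16bInt;      // offset 28
};                               // size rounds up from 30 to 32 bytes

static_assert(offsetof(MyStruct, fourFloats) == 16, "padding before the 4d vector");
static_assert(sizeof(MyStruct) == 48, "declaration order costs 16 extra bytes");
static_assert(sizeof(MyStructReordered) == 32, "widest member first is tighter");
```

Under these assumed rules the struct as declared occupies 48 bytes and alignedOffset has to be a multiple of 16. Ordering the fields from widest to narrowest brings it down to 32 bytes, which matters when the goal is a tight memory layout.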