from Landom Thomas https://landonthomas.net/docs/gpu_compute_model_terms_quick_ref.pdf
Compute Abstraction Hierarchy | Microsoft HLSL | Khronos GLSL | Khronos OpenCL C | Nvidia,CUDA C | AMD,HIP | Apple,MSL |
---|---|---|---|---|---|---|
Entire Kernel/Shader Compute Space | Dispatch | Dispatch, Compute Space | NDRange, Index Space | Grid | Grid | Grid |
Major Compute Group | Group, Thread Group | Workgroup, Local Workgroup | Work-group | Block, Thread Block | Block | Threadgroup |
Minor Compute Group (& Device Minor SIMD Unit) | Wave | Subgroup | Sub-group | Warp | Wavefront, Wave, Warp | SIMD-group |
Quad Compute Object | Quad Wave | Subgroup Quad | ? | Quad | Quad | Quad-group |
Single Compute Object | Thread | Invocation | Work-item | Thread | Thread | Thread |
Hardware Abstraction Hierarchy | Microsoft HLSL | Khronos GLSL | Khronos OpenCL C | Nvidia,CUDA C | AMD,HIP | Apple,MSL |
---|---|---|---|---|---|---|
GPU / Compute Device | Device | Physical Device | Compute Device | Device | Device | Device |
Major SIMD / Multi Processor Unit | SIMD Processor | Compute Unit (CU) | Compute Unit (CU) | Streaming Multiprocessor (SM) | Compute Unit (CU) | Compute Unit (CU) |
Minor SIMD Unit(& Compute Minor Group) | Wave | Subgroup | Sub-group | Warp | Wavefront, Wave, Warp | SIMD-group |
Single SIMD Processor | Lane | ? | Processing Element (PE) | Streaming Processor (SP), Lane | Processing Element (PE), Lane | Lane |
Memory Abstraction Hierarchy | Microsoft HLSL | Khronos GLSL | Khronos OpenCL C | Nvidia,CUDA C | AMD,HIP | Apple,MSL |
---|---|---|---|---|---|---|
Contiguous Device Memory ~L2+ cache | Device Memory | Buffer, Image Memory | Global Memory | Global Memory | Global Data Share (GDS), Global Segment | Device Memory / Address Space |
Faster, Partitioned Memory ~L1 cache | Thread Group Shared Memory (TGSM) | Shared Memory | Local Memory | Shared Memory | Local Data Share (LDS), Local Segment | Threadgroup Memory / Address Space |
Fastest, Smallest Memory ~L0 cache | Temporary Registers | Subgroup Memory | Private Memory | Local Memory | L0 vector cache, Private Segment | Thread Memory / Address Space |