@airMeng
Last active June 29, 2024 02:25
XeTLA

HW Target

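| Device   | PVC    | MTL    | DG2    | LNL/BMG (TODO) | ARL (TODO) |
|----------|--------|--------|--------|----------------|------------|
| ISA      | Xe     | Xe-lpg | Xe-hpg | Xe2            | Xe-lpg+    |
| DPAS     | 8,8,16 | NA     | 8,8,8  | 8,4,16         | 8,8,8      |
| 2D Block | 32, 64 | NA     | NA     | 32, 64         | NA         |
| 1D Block | 64     | 32     | 32     | 64             | 32         |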
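
The table can also be read as a compile-time lookup. The sketch below is only an illustration: `device`, `hw_caps` and `caps` are hypothetical names that do not come from XeTLA, and the three DPAS numbers are stored in table order without assigning them a meaning, since the table does not name them. The LNL/BMG and ARL columns are marked TODO above, so their values may still change.

```cpp
#include <array>
#include <cstdint>

// Hypothetical mirror of the HW Target table; not XeTLA's own definitions.
enum class device { PVC, MTL, DG2, LNL_BMG, ARL };

struct hw_caps {
  const char* isa;
  bool has_dpas;                     // false where the table says NA
  std::array<uint32_t, 3> dpas;      // the three DPAS values, in table order
  bool has_2d_block;                 // false where the table says NA
  std::array<uint32_t, 2> block_2d;  // the two 2D block values, in table order
  uint32_t block_1d;
};

constexpr hw_caps caps(device d) {
  switch (d) {
    case device::PVC:     return {"Xe",      true,  {8, 8, 16}, true,  {32, 64}, 64};
    case device::MTL:     return {"Xe-lpg",  false, {0, 0, 0},  false, {0, 0},   32};
    case device::DG2:     return {"Xe-hpg",  true,  {8, 8, 8},  false, {0, 0},   32};
    case device::LNL_BMG: return {"Xe2",     true,  {8, 4, 16}, true,  {32, 64}, 64};  // TODO column
    case device::ARL:     return {"Xe-lpg+", true,  {8, 8, 8},  false, {0, 0},   32};  // TODO column
  }
  return {"", false, {0, 0, 0}, false, {0, 0}, 0};
}

// Usage: feature checks become compile-time conditions.
constexpr hw_caps pvc = caps(device::PVC);
static_assert(pvc.has_2d_block && pvc.block_1d == 64, "PVC column of the table");
```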

How to Add a new HW

+template <>
+struct arch_attr_t<gpu_arch::Xe2> {
+  template <msg_type message_type = msg_type::block_2d>
+  using load_store_attr = load_store_attr_t<message_type, gpu_arch::Xe2>;
+
+  template <grf_mode grf_num_mode = grf_mode::double_grf>
+  using register_attr = register_attr_t<grf_num_mode, gpu_arch::Xe2>;
+
+  using dpas_attr = dpas_attr_t<gpu_arch::Xe2>;
+
+  static constexpr uint32_t max_wg_num = 16;
+  static constexpr uint32_t local_mem_size = 128 * 1024;
+};
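
To show how a specialization like the one above is consumed, here is a self-contained sketch. The enums and the empty `load_store_attr_t`, `register_attr_t` and `dpas_attr_t` templates are simplified stand-ins so the snippet compiles on its own; the real XeTLA types presumably carry more members. `fits_in_local_mem` is a hypothetical helper added for illustration.

```cpp
#include <cstdint>

// Simplified stand-ins that mimic the names used in the diff above.
enum class gpu_arch { XeLpg, XeHpg, XeHpc, Xe2 };
enum class msg_type { block_1d, block_2d };
enum class grf_mode { normal_grf, double_grf };

template <msg_type M, gpu_arch A> struct load_store_attr_t {};
template <grf_mode G, gpu_arch A> struct register_attr_t {};
template <gpu_arch A> struct dpas_attr_t {};

// Primary template is declared but not defined, so using an architecture
// that has no specialization is a compile-time error.
template <gpu_arch A> struct arch_attr_t;

// The new hardware is added as one explicit specialization.
template <>
struct arch_attr_t<gpu_arch::Xe2> {
  template <msg_type message_type = msg_type::block_2d>
  using load_store_attr = load_store_attr_t<message_type, gpu_arch::Xe2>;

  template <grf_mode grf_num_mode = grf_mode::double_grf>
  using register_attr = register_attr_t<grf_num_mode, gpu_arch::Xe2>;

  using dpas_attr = dpas_attr_t<gpu_arch::Xe2>;

  static constexpr uint32_t max_wg_num = 16;
  static constexpr uint32_t local_mem_size = 128 * 1024;
};

// Hypothetical helper: generic code reads attributes through the arch tag only.
template <gpu_arch A>
constexpr bool fits_in_local_mem(uint32_t bytes) {
  return bytes <= arch_attr_t<A>::local_mem_size;
}

static_assert(fits_in_local_mem<gpu_arch::Xe2>(64 * 1024), "64 KiB fits");
static_assert(!fits_in_local_mem<gpu_arch::Xe2>(256 * 1024), "256 KiB does not fit");
```

Leaving the primary template undefined is a design choice of this sketch: a kernel instantiated for an architecture without attributes then fails at compile time rather than silently picking defaults.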

Workflow of INT4 GEMM

graph TD
    H[INT4 type] --> B
    I[Block size] -->B
    J[Layout] --> B
    A[Perf tuning] --> D[compute policy]
    B[Quantization info] --> D
    C[MMA engine] --> D
    Q[Arch] --> D
    D --> G[micro GEMM kernel]
    G --> E[GEMM kernel]
    O[Epilogue] --> E
    P[Group Dispatch] --> E
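
The graph above reads as a chain of type compositions. The following sketch mirrors it node by node under stated assumptions: every name (`quant_info_t`, `perf_tuning_t`, `compute_policy_t`, `micro_gemm_t`, `gemm_kernel_t`, the tag types and the tuning fields) is hypothetical and chosen only to match a node of the graph, not taken from the XeTLA headers.

```cpp
#include <cstdint>

// Hypothetical stand-ins, one per graph node.
enum class gpu_arch { XeLpg, XeHpg, XeHpc, Xe2 };
enum class mma_engine { xmx, fpu };
enum class weight_layout { row_major, col_major };

struct int4x2 {};  // tag: two 4-bit values packed per byte

// Quantization info = INT4 type + block size + layout.
template <typename Int4Type, uint32_t BlockSize, weight_layout Layout>
struct quant_info_t {
  using int4_type = Int4Type;
  static constexpr uint32_t block_size = BlockSize;
  static constexpr weight_layout layout = Layout;
};

// Perf tuning knobs (illustrative fields only).
template <uint32_t Stages, uint32_t KStride>
struct perf_tuning_t {
  static constexpr uint32_t stages = Stages;
  static constexpr uint32_t k_stride = KStride;
};

// Compute policy = perf tuning + quantization info + MMA engine + arch.
template <typename Tuning, typename Quant, mma_engine Engine, gpu_arch Arch>
struct compute_policy_t {
  using tuning = Tuning;
  using quant = Quant;
  static constexpr mma_engine engine = Engine;
  static constexpr gpu_arch arch = Arch;
};

// Micro GEMM kernel is parameterized by the compute policy.
template <typename Policy>
struct micro_gemm_t { using policy = Policy; };

struct bias_epilogue {};     // placeholder epilogue
struct default_dispatch {};  // placeholder group dispatch

// GEMM kernel = micro GEMM kernel + epilogue + group dispatch.
template <typename MicroGemm, typename Epilogue, typename Dispatch>
struct gemm_kernel_t {
  using micro_gemm = MicroGemm;
  using epilogue = Epilogue;
  using dispatch = Dispatch;
};

// Usage: assemble one kernel type from the leaves of the graph.
using quant = quant_info_t<int4x2, 128, weight_layout::col_major>;
using policy = compute_policy_t<perf_tuning_t<3, 32>, quant,
                                mma_engine::xmx, gpu_arch::Xe2>;
using kernel = gemm_kernel_t<micro_gemm_t<policy>, bias_epilogue, default_dispatch>;

static_assert(kernel::micro_gemm::policy::quant::block_size == 128,
              "block size is visible from the top-level kernel type");
```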
graph LR
   A[Activation] -- PrologueA --- C
   C[GetActivation in SLM] --> D[GemmCore]
   E[Weight in HBM] --PrologueB --- G
   G[GetWeight in SLM] --> D
   D --> H[Accumulator]
   H --> I[Epilogue]
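
The data flow above can be checked against a scalar reference. The sketch below is not the XeTLA kernel: it assumes C++17, symmetric quantization (a scale per block, no zero point), two signed 4-bit values per byte with the low nibble first, dequantization on the weight path, and a bias add as the epilogue. `unpack_int4` and `gemm_int4_ref` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Read the idx-th signed 4-bit value from bytes that hold two values each.
// Low nibble first is an assumption of this sketch.
template <std::size_t Bytes>
constexpr int unpack_int4(const std::array<uint8_t, Bytes>& packed, std::size_t idx) {
  const int byte = packed[idx / 2];
  const int nibble = (idx % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
  return nibble >= 8 ? nibble - 16 : nibble;  // sign-extend from 4 bits
}

// Reference for out = act * dequant(weight) + bias.
// act is row-major M x K, weight is row-major K x N, and scale holds one
// value per (group of BlockSize rows along K, column).
template <std::size_t M, std::size_t K, std::size_t N, std::size_t BlockSize>
constexpr std::array<float, M * N> gemm_int4_ref(
    const std::array<float, M * K>& act,
    const std::array<uint8_t, K * N / 2>& weight,
    const std::array<float, (K / BlockSize) * N>& scale,
    const std::array<float, N>& bias) {
  static_assert(K % BlockSize == 0, "K must be a multiple of the block size");
  static_assert((K * N) % 2 == 0, "two int4 values are packed per byte");
  std::array<float, M * N> out{};
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      float acc = 0.0f;  // accumulator
      for (std::size_t k = 0; k < K; ++k) {
        // Weight path: unpack, then dequantize with the block's scale.
        const float w = unpack_int4(weight, k * N + n) *
                        scale[(k / BlockSize) * N + n];
        acc += act[m * K + k] * w;  // GEMM core
      }
      out[m * N + n] = acc + bias[n];  // epilogue
    }
  }
  return out;
}

// Usage: 1x2 activation times 2x2 weight {{1, -1}, {2, 3}}.
constexpr std::array<float, 2> ex_act{1.0f, 2.0f};
constexpr std::array<uint8_t, 2> ex_weight{0xF1, 0x32};  // (1, -1), (2, 3)
constexpr std::array<float, 2> ex_scale{0.5f, 2.0f};
constexpr std::array<float, 2> ex_bias{1.0f, 0.0f};
constexpr auto ex_out = gemm_int4_ref<1, 2, 2, 2>(ex_act, ex_weight, ex_scale, ex_bias);
static_assert(ex_out[0] == 3.5f && ex_out[1] == 10.0f, "matches the hand calculation");
```

A reference like this is mainly useful for validating an optimized kernel on small shapes, where every intermediate value can be worked out by hand.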

Int4 Recommended pattern

[Figure: Int4 recommended pattern]