@ykchen-intel
ykchen-intel / TusharTracelist-GEMM-shape-cluster.md
Created April 30, 2026 23:03
GEMM Shape Clustering — Skill Reference (Tushar Tracelist)

GEMM Shape Clustering — Skill Reference

  • Machine: scaia254.sc.intel.com
  • Absolute path: /mnt/20TB/home/ychen4/20260422-GEMMShapeCluster/
  • Relative path: ~/20260422-GEMMShapeCluster/

Goal

Given a collection of GEMM workload traces from multiple LLM models, reduce 1300+ unique (M, N, K) shapes down to ~17 representative medoids that can be

ykchen-intel / Analytical_GEMM_performance_study.md
Created April 30, 2026 22:55
Analytical GEMM Performance Study - Xe4/Xe5 projections with memory latency sweep (study document)

Analytical GEMM Performance Study

1. Objective

Project GEMM performance for specific workload shapes on Intel Xe4 (48 XeCores) and Xe5 (176 XeCores) GPU configurations, sweeping HBM memory latency from 650 ns to 1000 ns at a 1.5 GHz core clock. Identify optimal tile configurations and characterize bottlenecks per shape.
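To make the latency sweep concrete, the sketch below converts each latency point into core clock cycles at 1.5 GHz and uses Little's law to estimate how many bytes must be in flight to keep a memory system busy. It is only an illustration, not the study's model: the 50 ns sweep step and the 1000 GB/s bandwidth are hypothetical placeholders, and the function names are invented for this example.

```python
CLOCK_GHZ = 1.5  # core clock stated in the objective


def latency_cycles(latency_ns, clock_ghz=CLOCK_GHZ):
    """Convert a memory latency in nanoseconds to core clock cycles."""
    return latency_ns * clock_ghz


def bytes_in_flight(latency_ns, bandwidth_gb_per_s):
    """Little's law: bytes that must be outstanding to sustain a bandwidth.

    GB/s * ns = 1e9 B/s * 1e-9 s = bytes, so the units cancel directly.
    """
    return bandwidth_gb_per_s * latency_ns


# Hypothetical values, not taken from the study.
SWEEP_STEP_NS = 50
ASSUMED_BW_GB_PER_S = 1000.0

for ns in range(650, 1001, SWEEP_STEP_NS):
    cycles = latency_cycles(ns)
    inflight = bytes_in_flight(ns, ASSUMED_BW_GB_PER_S)
    print(f"{ns} ns -> {cycles:.0f} cycles, {inflight:.0f} bytes in flight")
```

The endpoints of the sweep correspond to 975 and 1500 core cycles, which is the window a tile configuration has to cover with prefetching or other outstanding work.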


2. Setup Steps

ykchen-intel / SKILL.md
Created April 30, 2026 22:54
Analytical GEMM Performance Study - Xe4 48-core & Xe5 176-core projections with memory latency sweep

SKILL.md - GPU Analytical Projections

Overview

This repository provides analytical performance projection models and simulators for Intel Xe4 (JGS) and Xe5 (TGS) GPU architectures. It models GEMM, Flash Attention, Flash Decoding, and MoE (Mixture of Experts) workloads to predict throughput, latency, bandwidth utilization, and bottleneck classification at the XeCore and SoC level.
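As a rough illustration of what bottleneck classification can mean for a GEMM, the sketch below applies a textbook roofline test: it compares the arithmetic intensity of an (M, N, K) shape against the machine balance. This is an assumed simplification, not the repository's model. The peak throughput and bandwidth figures are hypothetical placeholders, every function name is invented for this example, and the traffic estimate assumes each matrix moves through memory exactly once.

```python
def gemm_flops(m, n, k):
    """Floating-point operations for C[M,N] = A[M,K] @ B[K,N] (multiply + add)."""
    return 2 * m * n * k


def gemm_min_bytes(m, n, k, bytes_per_elem=2):
    """Minimum memory traffic: read A and B once, write C once."""
    return (m * k + k * n + m * n) * bytes_per_elem


def classify(m, n, k, peak_tflops, peak_bw_gb_per_s, bytes_per_elem=2):
    """Return 'compute-bound' or 'memory-bound' under a simple roofline."""
    intensity = gemm_flops(m, n, k) / gemm_min_bytes(m, n, k, bytes_per_elem)
    # Machine balance in FLOP per byte: (TFLOP/s * 1e12) / (GB/s * 1e9).
    balance = (peak_tflops * 1e12) / (peak_bw_gb_per_s * 1e9)
    return "compute-bound" if intensity >= balance else "memory-bound"


# Hypothetical hardware numbers, chosen only to exercise the function.
PEAK_TFLOPS = 100.0
PEAK_BW_GB_PER_S = 1000.0

print(classify(4096, 4096, 4096, PEAK_TFLOPS, PEAK_BW_GB_PER_S))
print(classify(1, 4096, 4096, PEAK_TFLOPS, PEAK_BW_GB_PER_S))
```

With these placeholder numbers the square 4096 shape lands above the machine balance and the M = 1 decode-style shape lands far below it, which is the kind of per-shape distinction a bottleneck report would draw.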


Prerequisites