@njriasan
njriasan / gist:569d01e408032b80f94ae6de84ea4cb1
Created May 8, 2026 20:02
INNER_TREE Eager Raw Benchmarking
strategy,dtype,M,N,rows_per_block,num_warps,ms,gbps
EAGER_LOOPED_ACC_INNER_TREE,fp8,1,1,1,4,0.0085,0.00
EAGER_LOOPED_ACC_INNER_TREE,fp8,1,1,4,4,0.0085,0.00
EAGER_LOOPED_ACC_INNER_TREE,fp8,1,1,1,8,0.0085,0.00
EAGER_LOOPED_ACC_INNER_TREE,fp8,1,1,4,8,0.0085,0.00
EAGER_LOOPED_ACC_INNER_TREE,fp8,1,2,1,4,0.0079,0.00
EAGER_LOOPED_ACC_INNER_TREE,fp8,1,2,4,4,0.0079,0.00
EAGER_LOOPED_ACC_INNER_TREE,fp8,1,2,1,8,0.0079,0.00
EAGER_LOOPED_ACC_INNER_TREE,fp8,1,2,4,8,0.0079,0.00
EAGER_LOOPED_ACC_INNER_TREE,fp8,1,3,1,4,0.0079,0.00
INNER_TREE is a clear net win for fp16, fp32, and fp8, but a net regression for fp64. The wins are concentrated in wide shapes (large M, small N), while
regressions are concentrated in tall-skinny shapes (small M, large N) — and for fp64, the regressions dominate.
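A per-dtype win rate like the ones summarized here could be derived from the raw CSV roughly as follows. This is a minimal sketch, not the script used for the analysis. It assumes that a second strategy was benchmarked over the same configurations, and that a "config" means one (dtype, M, N, rows_per_block, num_warps) combination. The strategy name `BASELINE` and the sample timings are made up for illustration; only the column names and `EAGER_LOOPED_ACC_INNER_TREE` come from the CSV header and rows above.

```python
import csv
import io
from collections import defaultdict


def win_rate_by_dtype(rows, candidate, baseline):
    """Fraction of matched configs, per dtype, where `candidate` has lower ms."""
    # Index every timing by (strategy, config) so the two strategies can be paired.
    times = {}
    for r in rows:
        config = (r["dtype"], int(r["M"]), int(r["N"]),
                  int(r["rows_per_block"]), int(r["num_warps"]))
        times[(r["strategy"], config)] = float(r["ms"])

    wins = defaultdict(int)
    total = defaultdict(int)
    for (strategy, config), cand_ms in times.items():
        if strategy != candidate:
            continue
        base_ms = times.get((baseline, config))
        if base_ms is None:
            continue  # no matching baseline row for this config; skip it
        dtype = config[0]
        total[dtype] += 1
        if cand_ms < base_ms:
            wins[dtype] += 1
    return {dtype: wins[dtype] / total[dtype] for dtype in total}


# Synthetic rows for illustration only (NOT taken from the benchmark).
# "BASELINE" is a hypothetical name for the strategy being compared against.
SAMPLE = """strategy,dtype,M,N,rows_per_block,num_warps,ms,gbps
EAGER_LOOPED_ACC_INNER_TREE,fp16,1,1,1,4,0.0080,0.00
BASELINE,fp16,1,1,1,4,0.0090,0.00
EAGER_LOOPED_ACC_INNER_TREE,fp16,1,2,1,4,0.0100,0.00
BASELINE,fp16,1,2,1,4,0.0095,0.00
EAGER_LOOPED_ACC_INNER_TREE,fp16,1,1,4,4,0.0070,0.00
"""

rates = win_rate_by_dtype(
    csv.DictReader(io.StringIO(SAMPLE)),
    candidate="EAGER_LOOPED_ACC_INNER_TREE",
    baseline="BASELINE",
)
print(rates)
```

Configs without a baseline counterpart are skipped rather than counted as wins or losses. If the summary instead compares the best time per (dtype, M, N) shape across tuning knobs, the pairing key would need to drop `rows_per_block` and `num_warps` and keep the minimum `ms`.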
---
fp16 — Strong win
┌───────────────────────────┬────────────────────────────────────────────────────────────────┐
│ Metric │ Value │
├───────────────────────────┼────────────────────────────────────────────────────────────────┤
│ INNER_TREE faster │ 66% of configs │