@ch-wan
ch-wan / transform_attn_capture_replay.py
Created May 23, 2026 02:51
Mechanical refactor transform: unify cuda-graph capture/replay across 3 attention backends (FlashMLA, TRTLLM-MHA, FlashAttention)
#!/usr/bin/env python3
"""Reproducible transform for: unify cuda-graph capture/replay across 3 attention backends
Run from the repo root: python3 /tmp/transform_attn_capture_replay.py
"""
import sys
from pathlib import Path
sys.path.append(".claude/skills/mechanical-refactor-verify")
from mechanical_refactor_verify_utils import (
ch-wan / transform_unify_cuda_graph_capture_replay.py
Created May 23, 2026 01:46
Mechanical refactor transform: unify cuda-graph capture/replay across attention backends (PR #26134)
#!/usr/bin/env python3
"""Reproducible transform for: unify cuda-graph capture/replay across attention backends
Covers:
- cutlass_mla_backend: delegate capture to replay; remove stale assert + blank line in replay
- flashinfer_mla_backend: merge identical is_target_verify / is_draft_extend branches
- wave_backend: extract _build_cuda_graph_forward_metadata; delegate capture to replay;
  remove three stale comments in replay
- flashinfer_backend: extract _create_decode_wrappers / _create_prefill_wrappers /
  _prepare_cuda_graph_metadata; collapse capture to prepare + replay; merge