PyTorch 2.9 advertises a maximum compute capability of sm_120. Blackwell GB10
parts (e.g. DGX Spark) report sm_121, one minor revision higher. Triton, and
anything that lowers through it (torch.compile, Triton-based FlashAttention
kernels, hand-written Triton kernels), then fails with
"Triton Error [CUDA]: no kernel image is available for execution on the device".
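You can confirm the mismatch on a given machine by comparing the device's compute
capability with the arch list that the installed PyTorch build reports. This is a minimal
sketch that assumes a CUDA build of PyTorch. `torch.cuda.get_device_capability()` and
`torch.cuda.get_arch_list()` are real PyTorch calls. The helper names (`parse_arch`,
`max_advertised`, `is_newer_than_build`, `report`) are hypothetical. The check only
reproduces the version comparison described above. It does not prove that a particular
kernel will fail.

```python
"""Compare the GPU's compute capability with the archs the PyTorch build lists.

Hypothetical diagnostic helpers: they reproduce the sm_120 vs sm_121 comparison
only and do not predict whether a specific Triton kernel will run.
"""
import importlib.util
import re
from typing import Iterable, Optional, Tuple

# Matches 'sm_120', 'sm_90a', 'compute_121' (PTX entries); captures major/minor.
_ARCH_RE = re.compile(r"(?:sm|compute)_(\d+)(\d)[a-z]?")


def parse_arch(arch: str) -> Optional[Tuple[int, int]]:
    """'sm_120' -> (12, 0), 'sm_90a' -> (9, 0); None for unrecognized strings."""
    m = _ARCH_RE.fullmatch(arch)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def max_advertised(arch_list: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Highest (major, minor) among the arch strings the build reports."""
    parsed = [cc for cc in map(parse_arch, arch_list) if cc is not None]
    return max(parsed) if parsed else None


def is_newer_than_build(device_cc: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """True when the device reports a capability above everything the build lists."""
    top = max_advertised(arch_list)
    return top is not None and device_cc > top


def report() -> None:
    """Live check against the installed PyTorch (requires torch)."""
    import torch

    if not torch.cuda.is_available():
        print("no CUDA device visible")
        return
    cc = torch.cuda.get_device_capability()  # (major, minor), e.g. (12, 1) on GB10
    archs = torch.cuda.get_arch_list()       # e.g. [..., 'sm_120'] on PyTorch 2.9
    print(f"device: sm_{cc[0]}{cc[1]}, build max: {max_advertised(archs)}")
    if is_newer_than_build(cc, archs):
        print("device arch is newer than anything this build advertises")


if __name__ == "__main__":
    if importlib.util.find_spec("torch") is None:
        print("PyTorch not installed; skipping live check")
    else:
        report()
```

Tuples compare lexicographically, so `(12, 1) > (12, 0)` is the entire test. Arch-specific
suffixes such as the `a` in `sm_90a` are ignored when parsing.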
The fix is to set two environment variables and to unset one that often appears
in suggested "workarounds":