
@liangel-02
liangel-02 / cudnn_mismatch.md
Last active October 28, 2025 20:30
Varlen API cudnn backward numerical mismatch

Relevant PRs

Summary

We are implementing variable-length attention with the cuDNN backend, and the outputs of our API and of SDPA with packing do not match after the backward pass.

In the provided repro, we include the definition of _varlen_attn(), our private custom op that calls into _cudnn_attention_forward(). We also define _backward(), the backward pass registered with autograd, which calls _cudnn_attention_backward().

@liangel-02
liangel-02 / cuDDN_mismatch.md
Last active October 7, 2025 20:16
Varlen API cuDNN numerical mismatch

Summary

We are implementing variable-length attention with the cuDNN backend, and the outputs of our API and of SDPA with packing do not match after the forward pass.

In the provided repro, we include the definition of _varlen_attn(), our private custom op that calls into _cudnn_attention_forward().

Then, in our test:

  • We first define an AttentionBlock with two forward methods: one calls our implementation, the other calls scaled_dot_product_attention().
  • We call create_variable_length_batch() with batch_size = 2, max_seq_len = 128, embed_dim = 32, and num_heads = 4. This creates x_padded for SDPA and x_packed for varlen.
  • Then we call the respective forward methods and compare the outputs per batch, expecting them to match within the tolerance we set.
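The padded/packed setup and the per-batch comparison above can be sketched as follows. This is an illustrative reconstruction, not the actual test code: `create_variable_length_batch()`, `x_padded`, and `x_packed` are named after the description, while the sequence lengths, the `cu_seqlens` offsets tensor, and `compare_per_batch()` are assumptions for the sketch.

```python
import torch
from torch.nn import functional as F

def create_variable_length_batch(seq_lens, embed_dim, max_seq_len):
    """Build a padded batch for SDPA and a packed batch for varlen."""
    xs = [torch.randn(s, embed_dim) for s in seq_lens]
    # Padded layout: (batch, max_seq_len, embed_dim), zero-padded on the right.
    x_padded = torch.stack(
        [F.pad(x, (0, 0, 0, max_seq_len - x.shape[0])) for x in xs]
    )
    # Packed layout: all valid tokens concatenated, (total_tokens, embed_dim).
    x_packed = torch.cat(xs)
    # Cumulative sequence-length offsets delimiting each batch in x_packed.
    cu_seqlens = torch.cat(
        [torch.zeros(1, dtype=torch.long),
         torch.cumsum(torch.tensor(seq_lens), dim=0)]
    )
    return x_padded, x_packed, cu_seqlens

def compare_per_batch(out_padded, out_packed, cu_seqlens, seq_lens, atol=1e-5):
    """Slice both layouts per batch and check closeness within tolerance."""
    for i, s in enumerate(seq_lens):
        a = out_padded[i, :s]                       # valid rows of batch i
        b = out_packed[cu_seqlens[i]:cu_seqlens[i + 1]]
        assert torch.allclose(a, b, atol=atol), f"mismatch in batch {i}"
```

In the real test the two forward methods run between batch creation and comparison; the slicing above is what makes the padded and packed outputs directly comparable per batch.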