cedrickchee/flash_attention_cuda_programming.md

## flash_attention_cuda_programming.md

      
GPGPU Programming Flash Attention 2 with CUDA

GPUs Go Brrr by Hazy Research, Stanford, May 2024.

We’re going to talk about what we’ve learned about making GPUs go brrr, and release an embedded DSL, ThunderKittens, that we’ve built to help us write some particularly speedy kernels (which we are also releasing).


A small library (DSL?) that we called ThunderKittens, which we hope lets us write clean, simple-to-understand code that indeed makes GPUs go brrr.