GPGPU Programming Flash Attention 2 with CUDA

GPUs Go Brrr by Hazy Research, Stanford, May 2024.

We’re going to talk about what we’ve learned about making GPUs go brr, and release an embedded DSL, ThunderKittens, that we’ve built to help us write some particularly speedy kernels (which we are also releasing).

ThunderKittens is a small library (DSL?) that we hope lets us write clean, simple-to-understand code that indeed makes GPUs go brrr.
