#include <iostream>
#include <cuda_runtime.h>
int main() {
    // Ref: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html
    // Compilation command: nvcc device_query.cpp -arch=sm_100 -o device_query && ./device_query
    int device = 0;
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);
@kimbochen
kimbochen / enable_gqa_repro.py
Last active November 10, 2024 22:38
SDPA `enable_gqa` Speedup Repro
import torch
import torch.nn.functional as F
@torch.compile()
def baseline(q_BHTD, k_BJTD, v_BJTD, gq_ratio):
    # Repeat each key/value head gq_ratio times along dim 1 so that
    # k and v have the same number of heads as q before calling SDPA.
    k_BHTD = k_BJTD.repeat_interleave(gq_ratio, 1)
    v_BHTD = v_BJTD.repeat_interleave(gq_ratio, 1)
    o_BHTD = F.scaled_dot_product_attention(q_BHTD, k_BHTD, v_BHTD, is_causal=True)
    return o_BHTD
import functools
from dataclasses import asdict, dataclass
from typing import Optional
import torch
import torch.nn as nn
import torch.nn.functional as F
@dataclass
@kimbochen
kimbochen / prefix-flash-attn.ipynb
Created July 17, 2024 04:36
prefix-flash-attn
@kimbochen
kimbochen / speculative-sampling.ipynb
Last active July 17, 2024 04:27
speculative-sampling.ipynb
@kimbochen
kimbochen / notes.md
Created July 7, 2024 05:05
ML Efficiency Notes

ML Efficiency Notes

GPU Specs

| Name | FP16 Compute | Memory Bandwidth | Memory Size | TDP |
| --- | --- | --- | --- | --- |
| A100 | 312 TFLOP/s | 2 TB/s | 40 GB | 250 W |
| H100 | 750 TFLOP/s | 2 TB/s | 80 GB | 350 W |
| A10 | 125 TFLOP/s | 0.6 TB/s | 24 GB | 150 W |
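
One way to read the specs above is as a compute-to-bandwidth ratio: the number of FP16 FLOPs a kernel must perform per byte moved before compute, rather than memory traffic, becomes the limit. The sketch below is an illustration and not part of the original notes. The names `GPU_SPECS`, `ridge_point`, and `is_memory_bound` are hypothetical, and the numbers are the rounded peak figures from the table, so the results are rough estimates only.

```python
# Rounded peak specs copied from the table: (FP16 TFLOP/s, memory bandwidth in TB/s).
GPU_SPECS = {
    "A100": (312, 2.0),
    "H100": (750, 2.0),
    "A10": (125, 0.6),
}


def ridge_point(tflops, tb_per_s):
    """FLOPs per byte at which peak compute time equals peak memory time.

    Both inputs carry a factor of 1e12, so the factors cancel in the ratio.
    """
    return tflops / tb_per_s


def is_memory_bound(flops, bytes_moved, gpu):
    """True if the kernel's arithmetic intensity is below the GPU's ridge point."""
    tflops, tb_per_s = GPU_SPECS[gpu]
    return flops / bytes_moved < ridge_point(tflops, tb_per_s)


if __name__ == "__main__":
    for name, (tflops, tb_per_s) in GPU_SPECS.items():
        print(f"{name}: {ridge_point(tflops, tb_per_s):.0f} FLOP/byte")
```

For example, an FP16 matrix-vector product with an N x N matrix does about 2N² FLOPs and reads about 2N² bytes of weights, roughly 1 FLOP per byte, so `is_memory_bound(2 * N * N, 2 * N * N, "A100")` returns `True` under these assumptions.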
@kimbochen
kimbochen / flash-attn-triton.ipynb
Last active July 17, 2024 04:27
flash-attn-triton.ipynb
@kimbochen
kimbochen / triton-puzzles.ipynb
Last active May 3, 2024 22:31
triton-puzzles.ipynb
@kimbochen
kimbochen / pytorch-practice.ipynb
Last active February 26, 2024 01:59
PyTorch Practice
@kimbochen
kimbochen / hand-code-mlp-backprop.ipynb
Last active February 26, 2024 00:23
Hand-code MLP Backprop