@seungrokjung
seungrokjung / quant_cuda_kernel.cu
Created May 5, 2023 12:22
GPTQ quantization kernel: dequantization fused with an fp16 GEMM, written to be compatible with hipify. The original CUDA code is from "https://github.com/oobabooga/GPTQ-for-LLaMa/blob/cuda/quant_cuda_kernel.cu"
#include <torch/all.h>
#include <torch/python.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cuda_fp16.h>
// atomicAdd for double-precision floating-point numbers on hardware with
// compute capability < 6.0 from:
// https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomic-functions
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600