@Birch-san / llama-convert.md (created June 1, 2023)

Converting LLaMA model weights to huggingface format + safetensors

# Loading LLaMA via Huggingface + Safetensors, with 4-bit quantization

Let's say we're trying to load a LLaMA model via `AutoModelForCausalLM.from_pretrained` with 4-bit quantization in order to run inference with it:

```bash
python -m generate
```

```python
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, LlamaTokenizerFast, LlamaForCausalLM
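
# Where this is headed: a minimal sketch of the 4-bit quantized load,
# assuming the converted weights live at './llama-7b-hf' (a hypothetical
# path) and typical NF4 settings; the exact config values are assumptions,
# not necessarily the ones used below.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer: LlamaTokenizerFast = AutoTokenizer.from_pretrained('./llama-7b-hf')
model: LlamaForCausalLM = AutoModelForCausalLM.from_pretrained(
    './llama-7b-hf',
    quantization_config=quantization_config,
    device_map='auto',  # dispatch onto available devices; requires accelerate
)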