@Birch-san
Birch-san / fine-tuning.md
Last active December 27, 2023 17:24
Fine-tuning LLaMA-7B on ~12GB VRAM with QLoRA, 4-bit quantization


nvidia-smi reported 11181MiB of VRAM in use, at least when training on the prompt lengths that occur early in the Alpaca dataset (prompts of ~337 tokens).
You can get this down to about 10.9GB by modifying qlora.py to run torch.cuda.empty_cache() after PEFT has been applied to your loaded model and before training begins.
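A minimal sketch of that cache-clearing step, assuming PyTorch is installed. The helper name `free_cached_vram` is hypothetical and not part of qlora.py; where exactly you call it (after the PEFT-wrapped model is built, before training starts) depends on the version of qlora.py you have checked out.

```python
import gc


def free_cached_vram():
    """Release unused cached CUDA memory.

    Returns (reserved_before, reserved_after) in bytes, or None when
    PyTorch or a CUDA device is not available.
    Hypothetical helper: not part of qlora.py.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    before = torch.cuda.memory_reserved()
    # Drop unreachable Python objects first so their tensors can be freed.
    gc.collect()
    # Return cached, currently unused blocks to the CUDA driver.
    torch.cuda.empty_cache()
    after = torch.cuda.memory_reserved()
    return before, after


if __name__ == "__main__":
    result = free_cached_vram()
    if result is None:
        print("CUDA not available; nothing to free")
    else:
        before, after = result
        print(f"reserved: {before / 2**20:.0f} MiB -> {after / 2**20:.0f} MiB")
```

Note that `torch.cuda.empty_cache()` only releases memory the caching allocator is holding but not using. It does not shrink the model itself, so the saving is limited to leftovers from loading and wrapping the model.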

Setup

All instructions are written assuming your command-line shell is bash.

Clone repository: