Fine-tuning LLaMA-7B on ~12GB VRAM with QLoRA, 4-bit quantization
nvidia-smi reported that this required 11181MiB of VRAM, at least for the prompt sequence lengths that occur early in the Alpaca dataset (~337-token prompts). You can get this down to about 10.9GB if (by modifying qlora.py) you run torch.cuda.empty_cache() after PEFT has been applied to your loaded model and before you begin training, as sketched below.
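A minimal sketch of that modification, assuming qlora.py wraps the quantized base model via peft's get_peft_model (the surrounding variable names here are illustrative and may differ from your copy of qlora.py):

```python
import torch
from peft import get_peft_model

# ... qlora.py has already loaded the 4-bit base model and built a LoraConfig ...
model = get_peft_model(model, lora_config)

# Free temporary allocations left over from quantization + PEFT wrapping,
# so the training loop starts from a smaller reserved-memory baseline.
torch.cuda.empty_cache()

# ... training (e.g. trainer.train()) begins after this point ...
```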
Setup
All instructions are written assuming your command-line shell is bash.
Installing CUDA 12.1.1 + PyTorch nightly + Python 3.10 on Ubuntu 22.10
Should you keep your NVIDIA driver?
The CUDA 12.1.1 toolkit will offer to install NVIDIA driver 530 for us. That driver is from the New Feature branch, so it's likely to be newer than the default NVIDIA driver you'd have installed via apt-get (apt prefers to give you 525, i.e. the Production branch).
If you're confident that you already have a new enough NVIDIA driver for CUDA 12.1.1 and you'd like to keep it, feel free to skip this "uninstall driver" step.
But if you're not sure, or you know your driver is too old, let's uninstall it. CUDA will install a new driver for us later.
Compute size of buffer required to fit q_proj @ k_proj.T attention scores
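A minimal sketch of that computation, assuming multi-head attention where the scores tensor q @ k.T has shape [batch, heads, q_len, k_len] (the function name and example shapes here are illustrative, not taken from a specific implementation):

```python
import torch

def attn_scores_bytes(batch: int, heads: int, q_len: int, k_len: int,
                      dtype: torch.dtype = torch.float16) -> int:
    """Bytes needed to materialize the q @ k.T attention-score tensor.

    The scores tensor has shape [batch, heads, q_len, k_len]; its footprint is
    just the element count multiplied by the element size of the dtype.
    """
    elements = batch * heads * q_len * k_len
    return elements * torch.finfo(dtype).bits // 8

# e.g. LLaMA-7B-ish shapes: batch=1, 32 heads, 2048 query and 2048 key positions, fp16
print(attn_scores_bytes(1, 32, 2048, 2048) / 2**20, "MiB")  # -> 256.0 MiB
```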
Reducing the softmax denominator to sum only as many attention scores as the in-distribution checkpoint would've, so that its outputs have in-distribution magnitudes
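A minimal sketch of that idea, assuming the adjustment is implemented by rescaling the softmax denominator by trained_len / k_len whenever the key length exceeds the length the checkpoint was trained on (the function and argument names here are illustrative):

```python
import torch

def length_adjusted_softmax(scores: torch.Tensor, trained_len: int) -> torch.Tensor:
    """Softmax over attention scores, with the denominator rescaled so it behaves
    as if only `trained_len` scores had been summed.

    scores: [..., q_len, k_len] raw q @ k.T / sqrt(head_dim) scores.
    When k_len > trained_len, a plain softmax sums more exp terms than the
    checkpoint ever saw, diluting every weight; shrinking the denominator by
    trained_len / k_len keeps individual weights at in-distribution magnitudes.
    """
    k_len = scores.size(-1)
    scale = min(trained_len / k_len, 1.0)  # only shrink the denominator, never grow it
    maxes = scores.max(dim=-1, keepdim=True).values
    exp = (scores - maxes).exp()
    denom = exp.sum(dim=-1, keepdim=True) * scale
    return exp / denom
```

Note that with this adjustment the attention weights no longer sum to 1 when k_len exceeds trained_len; that is the intent here: each individual weight keeps the magnitude the checkpoint expects, rather than being diluted across the longer context.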