nvidia-smi
said this required 11181MiB, at least to train on the sequence lengths of prompt that occurred initially in the alpaca dataset (~337 token long prompts).
You can get this down to about 10.9GB if (by modifying qlora.py) you run torch.cuda.empty_cache()
after PEFT has been applied to your loaded model and before you begin training.
All instructions are written assuming your command-line shell is bash.
Clone repository: