Ahmad Mustafa Anis ahmadmustafaanis

## fine-tuning.md

      
        
          
            
              
Birch-san / fine-tuning.md
Last active December 27, 2023 17:24

Fine-tuning LLaMA-7B on ~12GB VRAM with QLoRA, 4-bit quantization

`nvidia-smi` reported 11181MiB of VRAM in use, at least when training on the prompt lengths that occur early in the Alpaca dataset (prompts of ~337 tokens).

You can get this down to about 10.9GB by modifying `qlora.py` to call `torch.cuda.empty_cache()` after PEFT has been applied to the loaded model and before training begins.
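
As a rough illustration of where that call would sit, here is a minimal sketch. The helper name `release_cached_vram` and the placement comments are assumptions for illustration, not code taken from `qlora.py`; only `torch.cuda.is_available()` and `torch.cuda.empty_cache()` are real PyTorch calls. The guard around the import is there only so the sketch also runs on a machine without PyTorch or a GPU.

```python
import importlib.util


def release_cached_vram() -> bool:
    """Return cached-but-unused CUDA memory to the driver, if possible.

    Returns True when the cache was cleared, and False when PyTorch or a
    CUDA device is unavailable (hypothetical helper, not part of qlora.py).
    """
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()
    return True


# Hypothetical placement inside a training script (names are illustrative):
#   model = ...              # base model loaded in 4-bit, PEFT adapters applied
#   release_cached_vram()    # drop allocator cache left over from loading
#   trainer.train()
cleared = release_cached_vram()
print("CUDA cache cleared:", cleared)
```

`torch.cuda.empty_cache()` only releases blocks that PyTorch's caching allocator is holding but not using, so it lowers the figure `nvidia-smi` shows without freeing memory that live tensors still occupy.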

Setup

All instructions are written assuming your command-line shell is bash.
Clone repository: