Birch-san / llama-convert.md
Created June 1, 2023 18:24
Converting LLaMA model weights to huggingface format + safetensors

Loading LLaMA via Huggingface + Safetensors, with 4-bit quantization

Let's say we're trying to load a LLaMA model via AutoModelForCausalLM.from_pretrained with 4-bit quantization in order to run inference with it:

python generate.py

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, LlamaTokenizerFast, LlamaForCausalLM
import transformers
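The preview cuts off after the imports. For completeness, here's a minimal sketch of such a 4-bit load — the model path is a placeholder, and the exact BitsAndBytesConfig settings shown are an assumption, not necessarily the gist's own:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_dir = "path/to/llama-hf"  # placeholder: a checkpoint already converted to huggingface format

# NF4 quantization with bf16 compute; a common configuration, assumed here
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place the quantized layers
)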
Birch-san / fine-tuning.md
Last active December 27, 2023 17:24
Fine-tuning LLaMA-7B on ~12GB VRAM with QLoRA, 4-bit quantization

nvidia-smi reported that this required 11181MiB, at least to train on the prompt lengths that occur early in the alpaca dataset (~337-token prompts).
You can get this down to about 10.9GB by modifying qlora.py to run torch.cuda.empty_cache() after PEFT has been applied to your loaded model and before you begin training.
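A sketch of where that call would sit in qlora.py — the surrounding names are assumptions based on the upstream script, not a quote of it:

import torch

# inside qlora.py's training entrypoint; get_accelerate_model is assumed from the upstream script
model = get_accelerate_model(args, checkpoint_dir)  # loads the 4-bit base model and applies PEFT adapters
# release the transient allocations left behind by quantization + adapter injection,
# before the trainer starts allocating activation and optimizer memory
torch.cuda.empty_cache()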

Setup

All instructions are written assuming your command-line shell is bash.

Clone repository:
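The gist's snippet for this step did not survive the page capture. Assuming the target is the upstream QLoRA repository (an assumption — the gist may point at a fork):

git clone https://github.com/artidoro/qlora.git
cd qlora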

flovv / optimization-lpsolve.R
Last active February 16, 2017 15:56
lpsolve - optimization of a fantasy football team
library(ggplot2)
library(ggthemes)
library(plyr)
library(stringr)
library(lpSolve)

# pipe-separated fantasy-football player data
out <- read.csv("https://raw.githubusercontent.com/flovv/flovv.github.io/master/_Rmd/data/com.csv", sep = "|")
hrbrmstr / orig.png
Last active July 16, 2023 06:43
Supreme Annotations - moar splainin here: http://rud.is/b/2016/03/16/supreme-annotations/ - NOTE: this requires the github version of ggplot2