Birch-san

@Birch-san
Birch-san / llama-convert.md
Created June 1, 2023 18:24
Converting LLaMA model weights to huggingface format + safetensors

Loading LLaMA via Huggingface + Safetensors, with 4-bit quantization

Let's say we're trying to load a LLaMA model via AutoModelForCausalLM.from_pretrained with 4-bit quantization in order to run inference with it:

python generate.py

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, LlamaTokenizerFast, LlamaForCausalLM
import transformers
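
The preview cuts off at the imports; a minimal sketch of such a 4-bit load follows. The repo id, dtype, and quantization settings here are illustrative assumptions, not values taken from the gist:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = 'huggyllama/llama-7b'  # hypothetical example repo; point this at your converted checkpoint
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map='auto',
)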
@Birch-san
Birch-san / fine-tuning.md
Last active December 27, 2023 17:24
Fine-tuning LLaMA-7B on ~12GB VRAM with QLoRA, 4-bit quantization

nvidia-smi reported that this required 11181MiB, at least to train on the prompt sequence lengths that occur early in the alpaca dataset (~337-token prompts).
You can get this down to about 10.9GB by modifying qlora.py to run torch.cuda.empty_cache() after PEFT has been applied to your loaded model and before you begin training.
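
As a rough illustration of where that call sits (using a small stand-in model here rather than the 4-bit LLaMA that qlora.py actually loads, and a made-up LoRA config):

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained('gpt2')  # stand-in; qlora.py loads the quantized LLaMA instead
peft_config = LoraConfig(r=8, lora_alpha=16, target_modules=['c_attn'], lora_dropout=0.05)
model = get_peft_model(model, peft_config)

# emptying PyTorch's allocator cache after PEFT wrapping, before training begins,
# is what reclaims the extra ~0.3GB mentioned above
torch.cuda.empty_cache()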

Setup

All instructions are written assuming your command-line shell is bash.

Clone repository:

@Birch-san
Birch-san / gist:daf94f0dd0fc4b87ad530db6f77b6a55
Created June 1, 2023 16:10
Falcon-40B-Instruct parameter names
# for i, x in model.named_parameters():
#     print(i)
transformer.word_embeddings.weight
transformer.h.0.ln_attn.weight
transformer.h.0.ln_attn.bias
transformer.h.0.ln_mlp.weight
transformer.h.0.ln_mlp.bias
transformer.h.0.self_attention.query_key_value.weight
transformer.h.0.self_attention.dense.weight
transformer.h.0.mlp.dense_h_to_4h.weight
@Birch-san
Birch-san / opencv-cuda.md
Last active May 4, 2024 23:58
Building OpenCV with CUDA acceleration

For CUDA 12, see Installing CUDA 12.1.1 + PyTorch nightly + Python 3.10 on Ubuntu 22.10 for how to install Nvidia driver 530, gcc 12 and CUDA 12.1.1 libraries.
If you want CUDA 11.8, then you can use the latest Nvidia driver from the Production branch (525) with gcc 11.

Activate your conda environment, if you haven't done so already.

CUDA 11:
Make sure gcc 11 is the default gcc for your OS, or select gcc 11 explicitly.
CUDA 12:
Make sure gcc 12 is the default gcc for your OS, or select gcc 12 explicitly.
Check that CUDA_DIR below points to the CUDA installation you wish to use.

@Birch-san
Birch-san / magma-readme.md
Created April 27, 2023 21:58
Build magma from source
@Birch-san
Birch-san / CUDA-12-1-1-pytorch.md
Last active April 28, 2024 10:22
Installing CUDA 12.1.1 + PyTorch nightly + Python 3.10 on Ubuntu 22.10

Should you keep your NVIDIA driver?

The CUDA 12.1.1 toolkit is gonna offer to install Nvidia driver 530 for us. It's from the New Feature branch, so it's likely to be newer than the default Nvidia driver you would've installed via apt-get (apt would prefer to give you 525, i.e. the Production branch).

If you're confident that you already have a new enough Nvidia driver for CUDA 12.1.1, and you'd like to keep your driver: feel free to skip this "uninstall driver" step.

But if you're not sure, or you know your driver is too old: let's uninstall it. CUDA will install a new driver for us later.

@Birch-san
Birch-san / attn_scores_buffer.py
Created April 9, 2023 10:50
Compute size of buffer required to fit q_proj @ k_proj.T attention scores
float_width=2 # float16
cond_count=2 # uncond and cond for 1 sample
attn_heads=8 # SD1.5 isn't optimized for flash attn, so all layers have 8 heads, lol
vae_scale_factor=8
px_height=px_width=768
latent_height=px_height/vae_scale_factor
latent_width=px_width/vae_scale_factor
q_proj_tokens=k_proj_tokens=latent_height*latent_width
qk_bytes = cond_count*attn_heads*float_width*q_proj_tokens*k_proj_tokens
qk_mb = qk_bytes/1024**2
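# e.g. at 768x768: latent 96x96 = 9216 q/k tokens,
# so qk_bytes = 2*8*2*9216*9216 = 2,717,908,992 bytes = 2592 MiB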
from torch import FloatTensor, load, baddbmm, zeros
from dataclasses import dataclass
import torch
from os.path import join
@dataclass
class Fixtures:
    q_proj: FloatTensor
    k_proj: FloatTensor
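
The preview stops at the fixture definitions; a guess at how they might be used to materialize the scores buffer (the paths, shapes, and scaling below are assumptions on my part, not the gist's code):

fixtures_dir = '/tmp/fixtures'  # hypothetical location of saved projections
fixtures = Fixtures(
    q_proj=load(join(fixtures_dir, 'q_proj.pt')),
    k_proj=load(join(fixtures_dir, 'k_proj.pt')),
)
q, k = fixtures.q_proj, fixtures.k_proj  # assumed shape: (batch*heads, tokens, head_dim)
# q_proj @ k_proj.T via baddbmm into a zero buffer, with the usual 1/sqrt(head_dim) scaling
scores = baddbmm(
    zeros(q.size(0), q.size(1), k.size(1), dtype=q.dtype, device=q.device),
    q,
    k.transpose(-1, -2),
    beta=0,
    alpha=q.size(-1) ** -0.5,
)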
@Birch-san
Birch-san / topk_softmax_denominator.py
Created April 3, 2023 22:45
Reducing the softmax denominator to sum only as many attention scores as the in-distribution checkpoint would've, so that its outputs have in-distribution magnitudes
from torch import FloatTensor
vae_scale_factor = 8
typical_self_attn_key_length = (512/vae_scale_factor) * (512/vae_scale_factor)
desired_self_attn_key_length = (768/vae_scale_factor) * (768/vae_scale_factor)
key_length_factor=desired_self_attn_key_length/typical_self_attn_key_length if is_self_attn else 1.
def softmax(x: FloatTensor, dim=-1) -> FloatTensor:
    maxes = x.max(dim, keepdim=True).values
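
The function is cut off there. A plausible sketch of what the title describes, summing only the top typical_self_attn_key_length scores in the denominator, is below; this is my guess at the continuation, not the gist's actual code:

from torch import FloatTensor

def topk_softmax(x: FloatTensor, k: int, dim: int = -1) -> FloatTensor:
    # subtract the per-row max for numerical stability, as in a standard softmax
    maxes = x.max(dim, keepdim=True).values
    exp = (x - maxes).exp()
    # sum only the k largest scores, i.e. as many as the in-distribution
    # (512x512-trained) checkpoint would have summed
    denom = exp.topk(k, dim=dim).values.sum(dim, keepdim=True)
    return exp / denom

# e.g. k = int(typical_self_attn_key_length)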
@Birch-san
Birch-san / scaled_softmax.py
Created April 3, 2023 00:16
Questionable softmax
from torch import FloatTensor
vae_scale_factor = 8
typical_self_attn_key_length = (512/vae_scale_factor) * (512/vae_scale_factor)
desired_self_attn_key_length = (200/vae_scale_factor) * (200/vae_scale_factor)
key_length_factor=desired_self_attn_key_length/typical_self_attn_key_length if is_self_attn else 1.
def softmax(x: FloatTensor, dim=-1) -> FloatTensor:
    key_tokens = x.size(-1)
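
This preview is truncated too. One guess at how the "questionable" scaling might continue is to rescale the denominator by the ratio of typical to actual key tokens, so output magnitudes stay in-distribution; again, this is an assumption rather than the gist's code:

def scaled_softmax(x: FloatTensor, dim: int = -1) -> FloatTensor:
    key_tokens = x.size(-1)
    maxes = x.max(dim, keepdim=True).values
    exp = (x - maxes).exp()
    # pretend only the typical number of key tokens contributed to the sum
    denom = exp.sum(dim, keepdim=True) * (typical_self_attn_key_length / key_tokens)
    return exp / denom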