Birch-san / llama-convert.md
Created June 1, 2023 18:24
Converting LLaMA model weights to huggingface format + safetensors

Loading LLaMA via Huggingface + Safetensors, with 4-bit quantization

Let's say we're trying to load a LLaMA model via AutoModelForCausalLM.from_pretrained with 4-bit quantization in order to run inference with it:

python generate.py

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, LlamaTokenizerFast, LlamaForCausalLM
import transformers
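The preview cuts off after the imports. For completeness, here's a minimal sketch of such a 4-bit load — the model path is a placeholder, and the exact BitsAndBytesConfig settings shown are an assumption, not necessarily the gist's own:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_dir = "path/to/llama-hf"  # placeholder: a checkpoint already converted to huggingface format

# NF4 quantization with bf16 compute; a common configuration, assumed here
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place the quantized layers
)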
Birch-san / fine-tuning.md
Last active December 27, 2023 17:24
Fine-tuning LLaMA-7B on ~12GB VRAM with QLoRA, 4-bit quantization

nvidia-smi reported that this required 11181MiB, at least to train on the prompt lengths that occur early in the alpaca dataset (~337-token prompts).
You can get this down to about 10.9GB by modifying qlora.py to run torch.cuda.empty_cache() after PEFT has been applied to your loaded model and before you begin training.
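A sketch of where that call would sit in qlora.py — the surrounding names are assumptions based on the upstream script, not a quote of it:

import torch

# inside qlora.py's training entrypoint; get_accelerate_model is assumed from the upstream script
model = get_accelerate_model(args, checkpoint_dir)  # loads the 4-bit base model and applies PEFT adapters
# release the transient allocations left behind by quantization + adapter injection,
# before the trainer starts allocating activation and optimizer memory
torch.cuda.empty_cache()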

Setup

All instructions are written assuming your command-line shell is bash.

Clone repository:
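The gist's snippet for this step did not survive the page capture. Assuming the target is the upstream QLoRA repository (an assumption — the gist may point at a fork):

git clone https://github.com/artidoro/qlora.git
cd qlora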

flovv / optimization-lpsolve.R
Last active February 16, 2017 15:56
lpsolve - optimization of a fantasy football team
library(ggplot2)
library(ggthemes)
library(plyr)
library(stringr)
library(lpSolve)

# pipe-separated fantasy-football player data
out <- read.csv("https://raw.githubusercontent.com/flovv/flovv.github.io/master/_Rmd/data/com.csv", sep = "|")
hrbrmstr / orig.png
Last active July 16, 2023 06:43
Supreme Annotations - moar splainin here: http://rud.is/b/2016/03/16/supreme-annotations/ - NOTE: this requires the github version of ggplot2