Method  Bits  7B     13B    30B    65B     8x7B
Full    16    160GB  320GB  600GB  1200GB  1000GB
Freeze  16    20GB   40GB   120GB  240GB   200GB
LoRA    16    16GB   32GB   80GB   160GB   120GB
QLoRA   8     10GB   16GB   40GB   80GB    80GB
QLoRA   4     6GB    12GB   24GB   48GB    32GB
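The table above can be turned into a small lookup for checking which methods fit a given GPU budget. This is only a sketch: the numbers are copied from the table, while `MEMORY_GB` and `methods_that_fit` are hypothetical names introduced here for illustration. The figures are rough estimates, so a card with exactly the listed capacity may still run out of memory in practice.

```python
# Approximate GPU memory (GB) per fine-tuning method and precision,
# copied from the table above. Keys are (method, bits).
MEMORY_GB = {
    ("Full", 16):   {"7B": 160, "13B": 320, "30B": 600, "65B": 1200, "8x7B": 1000},
    ("Freeze", 16): {"7B": 20,  "13B": 40,  "30B": 120, "65B": 240,  "8x7B": 200},
    ("LoRA", 16):   {"7B": 16,  "13B": 32,  "30B": 80,  "65B": 160,  "8x7B": 120},
    ("QLoRA", 8):   {"7B": 10,  "13B": 16,  "30B": 40,  "65B": 80,   "8x7B": 80},
    ("QLoRA", 4):   {"7B": 6,   "13B": 12,  "30B": 24,  "65B": 48,   "8x7B": 32},
}


def methods_that_fit(model_size, budget_gb):
    """Return (method, bits, required_gb) rows whose estimate fits the budget."""
    return [
        (method, bits, sizes[model_size])
        for (method, bits), sizes in MEMORY_GB.items()
        if sizes[model_size] <= budget_gb
    ]


if __name__ == "__main__":
    # Which rows of the table fit a 7B model into 16 GB?
    for row in methods_that_fit("7B", 16):
        print(row)
```

For a 16 GB budget and a 7B model, only the LoRA and QLoRA rows qualify, which matches the table.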
'''
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install "xformers<0.0.26"
pip install trl peft accelerate bitsandbytes
Thu May 9 04:49:57 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
'''
#Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat are the three primary colors?\n\n### Input:\n\n\n### Response:\nThe three primary colors are red, blue, and yellow. These colors are called primary because they cannot be created by mixing other colors and all other colors can be made by combining them in various proportions. In the additive color system, used for light, the primary colors are red, green, and blue (RGB).
'''
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
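The `dtype` comment above describes a simple rule: float16 on Tesla T4 and V100, bfloat16 on Ampere and newer. A minimal sketch of that rule follows, assuming the decision is keyed on the CUDA compute capability major version (V100 is 7.0, T4 is 7.5, Ampere starts at 8.0). `pick_dtype` is a hypothetical helper written for illustration, not Unsloth's actual detection code.

```python
def pick_dtype(compute_capability_major: int) -> str:
    """Return the dtype name the comment above recommends for a GPU generation.

    Assumption: Ampere and newer GPUs report a compute capability major
    version of 8 or higher; older cards such as V100 (7.0) and T4 (7.5)
    fall back to float16.
    """
    return "bfloat16" if compute_capability_major >= 8 else "float16"


if __name__ == "__main__":
    for name, major in [("Tesla V100", 7), ("Tesla T4", 7), ("A100", 8)]:
        print(name, "->", pick_dtype(major))
```

With PyTorch installed, the major version could be read from `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple.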
https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing#scrollTo=yqxqAZ7KJ4oL
modified 1 epoch unsloth/llama-3-70b-bnb-4bit bs=64 gradAcc=5
Wed May 8 10:51:22 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
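The run note above records `bs=64 gradAcc=5`. Assuming `bs` is the per-device batch size, `gradAcc` is the number of gradient accumulation steps, and the run used a single GPU (none of which the note states explicitly), the number of samples per optimizer step can be worked out as below. `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Samples contributing to one optimizer step.

    Gradients are accumulated over `grad_accum_steps` micro-batches on each
    device before the weights are updated, so the three factors multiply.
    """
    return per_device_batch * grad_accum_steps * num_devices


if __name__ == "__main__":
    # bs=64, gradAcc=5, single GPU assumed
    print(effective_batch_size(64, 5))
```

Under those assumptions each optimizer step sees 64 x 5 = 320 samples.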
https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/fsdp-qlora-distributed-llama3.ipynb
Expected memory usage:
Full fine-tuning with FSDP needs ~16x80GB GPUs
FSDP + LoRA needs ~8x80GB GPUs
FSDP + Q-LoRA needs ~2x40GB GPUs
FSDP + Q-LoRA + CPU offloading needs 4x24GB GPUs, with 22 GB/GPU and 127 GB CPU RAM, at a sequence length of 3072 and a batch size of 1.
Tue May 7 20:43:36 2024
+---------------------------------------------------------------------------------------+
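To compare the configurations in the expected-memory list above, it can help to multiply out the aggregate GPU memory each one implies. The sketch below only does that arithmetic on the GPU counts and sizes quoted in the list; `FSDP_CONFIGS` and `total_gpu_memory_gb` are hypothetical names, and the "~" in the source means these are approximate requirements, not measurements.

```python
# (number of GPUs, memory per GPU in GB), taken from the list above.
FSDP_CONFIGS = {
    "Full fine-tuning with FSDP": (16, 80),
    "FSDP + LoRA": (8, 80),
    "FSDP + Q-LoRA": (2, 40),
    "FSDP + Q-LoRA + CPU offloading": (4, 24),
}


def total_gpu_memory_gb(num_gpus, gb_per_gpu):
    """Aggregate GPU memory across all devices in a configuration."""
    return num_gpus * gb_per_gpu


if __name__ == "__main__":
    for name, (num_gpus, gb_per_gpu) in FSDP_CONFIGS.items():
        total = total_gpu_memory_gb(num_gpus, gb_per_gpu)
        print(f"{name}: {num_gpus} x {gb_per_gpu} GB = {total} GB")
```

The CPU-offloading row needs slightly more total GPU capacity than plain Q-LoRA (96 GB versus 80 GB) but fits on 24 GB cards, at the cost of the 127 GB of CPU RAM noted in the list.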
Sun May 5 18:06:21 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-SXM2-16GB On | 00000000:00:04.0 Off | 0 |
| N/A 51C P0 66W / 300W | 16071MiB / 16384MiB | 0% Default |
Thu May 2 20:35:44 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4070 ... Off | 00000000:01:00.0 Off | N/A |
| 0% 52C P2 77W / 285W | 15490MiB / 16376MiB | 99% Default |
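The `Memory-Usage` column in these `nvidia-smi` dumps shows how close each run was to the card's limit. As a rough sketch, the used and total values can be pulled out of a status row with a regular expression. `parse_memory_usage` is a hypothetical helper, and it assumes the `<used>MiB / <total>MiB` layout seen in the rows above.

```python
import re

# Matches the "<used>MiB / <total>MiB" cell of an nvidia-smi status row.
MEMORY_RE = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def parse_memory_usage(row):
    """Return (used_mib, total_mib) from an nvidia-smi row, or None if absent."""
    match = MEMORY_RE.search(row)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    rows = [
        "| N/A 51C P0 66W / 300W | 16071MiB / 16384MiB | 0% Default |",
        "| 0% 52C P2 77W / 285W | 15490MiB / 16376MiB | 99% Default |",
    ]
    for row in rows:
        used, total = parse_memory_usage(row)
        print(f"{used} / {total} MiB = {100 * used / total:.1f}% used")
```

For scripted monitoring, `nvidia-smi --query-gpu=memory.used,memory.total --format=csv` avoids parsing the table at all.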
./build/bin/main -m ./models/llama3_alpaca_dpo_GGUF-unsloth.F16.gguf -p '''Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhy is AI like the Industrial Revolution?\n\n### Input:\n\n\n### Response:\n''' -ngl 35 -n 400 -e
<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Why is AI like the Industrial Revolution?
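The prompt passed to `-p` above follows the Alpaca-style layout of instruction, input, and response sections. A minimal sketch of building that string programmatically is shown below; the template text is copied from the command above, while `ALPACA_TEMPLATE` and `build_prompt` are hypothetical names used for illustration.

```python
# Alpaca-style template, reconstructed from the prompt in the command above.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{response}"
)


def build_prompt(instruction, input_text="", response=""):
    """Fill the template; leave `response` empty when prompting for generation."""
    return ALPACA_TEMPLATE.format(
        instruction=instruction, input=input_text, response=response
    )


if __name__ == "__main__":
    print(build_prompt("Why is AI like the Industrial Revolution?"))
```

Leaving `response` empty reproduces the inference prompt; filling it in yields a training example in the same layout.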
Every 1.0s: nvidia-smi 129-146-124-202: Tue Apr 30 18:21:29 2024
Tue Apr 30 18:21:29 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|