'''
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install "xformers<0.0.26"
pip install trl peft accelerate bitsandbytes
Thu May 9 04:49:57 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
'''
#Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat are the three primary colors?\n\n### Input:\n\n\n### Response:\nThe three primary colors are red, blue, and yellow. These colors are called primary because they cannot be created by mixing other colors and all other colors can be made by combining them in various proportions. In the additive color system, used for light, the primary colors are red, green, and blue (RGB).
'''
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
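The sample at the top of this snippet follows the Alpaca prompt layout (instruction, optional input, response). A minimal sketch of building such a prompt is shown below; the names `ALPACA_TEMPLATE` and `format_alpaca` are hypothetical helpers for illustration, not part of unsloth.

```python
# Alpaca-style template, with wording copied from the sample prompt above.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{response}"
)


def format_alpaca(instruction: str, input_text: str = "", response: str = "") -> str:
    """Fill the template; leave `response` empty when prompting for generation."""
    return ALPACA_TEMPLATE.format(
        instruction=instruction, input=input_text, response=response
    )


prompt = format_alpaca("What are the three primary colors?")
print(prompt)
```

With an empty input the result contains `### Input:\n\n\n### Response:\n`, which matches the escaped sample string above.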
https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing#scrollTo=yqxqAZ7KJ4oL
modified 1 epoch unsloth/llama-3-70b-bnb-4bit bs=64 gradAcc=5
Wed May 8 10:51:22 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/fsdp-qlora-distributed-llama3.ipynb
Expected memory usage:
Full fine-tuning with FSDP needs ~16x 80GB GPUs
FSDP + LoRA needs ~8x 80GB GPUs
FSDP + QLoRA needs ~2x 40GB GPUs
FSDP + QLoRA + CPU offloading needs 4x 24GB GPUs, with 22 GB/GPU and 127 GB CPU RAM, at a sequence length of 3072 and a batch size of 1.
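As a rough sanity check on the figures above, the memory taken by the model weights alone can be estimated from the parameter count and the bits stored per parameter. This is a back-of-the-envelope sketch only: the 70B parameter count is an assumption (the notes do not state the model size), `weight_memory_gb` is a hypothetical helper, and the estimate ignores gradients, optimizer state, activations, and quantization overhead.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 70e9  # assumption: a 70B-parameter model

bf16_gb = weight_memory_gb(N_PARAMS, 16)  # 16-bit weights
int4_gb = weight_memory_gb(N_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: {bf16_gb:.0f} GB")  # 140 GB
print(f"4-bit weights:  {int4_gb:.0f} GB")  # 35 GB
```

Under that assumption, 4-bit weights would fit within the combined memory of two 40GB GPUs, which is at least consistent with the QLoRA line above.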
Tue May 7 20:43:36 2024
+---------------------------------------------------------------------------------------+
Sun May 5 18:06:21 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-SXM2-16GB On | 00000000:00:04.0 Off | 0 |
| N/A 51C P0 66W / 300W | 16071MiB / 16384MiB | 0% Default |
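To see how close this GPU is to its memory limit, the `used / total` pair can be pulled out of the status row with a regular expression. This is a small sketch that assumes the `<n>MiB / <n>MiB` layout shown above; `memory_usage` is a hypothetical helper. For scripting, `nvidia-smi --query-gpu=memory.used,memory.total --format=csv` is likely a sturdier source than parsing the table.

```python
import re

# Matches the "16071MiB / 16384MiB" memory column of an nvidia-smi status row.
MEM_RE = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def memory_usage(line: str):
    """Return (used_mib, total_mib, fraction_used), or None if no match."""
    match = MEM_RE.search(line)
    if match is None:
        return None
    used, total = int(match.group(1)), int(match.group(2))
    return used, total, used / total


row = "| N/A 51C P0 66W / 300W | 16071MiB / 16384MiB | 0% Default |"
used, total, fraction = memory_usage(row)
print(f"{used} MiB of {total} MiB used ({fraction:.1%})")  # 98.1%
```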
Thu May 2 20:35:44 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4070 ... Off | 00000000:01:00.0 Off | N/A |
| 0% 52C P2 77W / 285W | 15490MiB / 16376MiB | 99% Default |
./build/bin/main -m ./models/llama3_alpaca_dpo_GGUF-unsloth.F16.gguf -p '''Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhy is AI like the Industrial Revolution?\n\n### Input:\n\n\n### Response:\n''' -ngl 35 -n 400 -e
<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Why is AI like the Industrial Revolution?
Every 1.0s: nvidia-smi 129-146-124-202: Tue Apr 30 18:21:29 2024
Tue Apr 30 18:21:29 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
(harness) vincent@virus:~/Downloads$ cat bleu_text.py
from nltk.translate.bleu_score import sentence_bleu
reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split()
]
candidate = 'it is dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))
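By default `sentence_bleu` combines clipped n-gram precisions for orders 1 through 4. The sketch below computes those per-order precisions by hand for the same references and candidate; it is an illustration only, not NLTK's implementation, and `ngrams` and `modified_precision` are hypothetical helpers. It omits the brevity penalty and the geometric mean.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(references, candidate, n):
    """Clipped n-gram precision; None if the candidate has no n-grams."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return None
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


references = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split(),
]
candidate = 'it is dog'.split()

for n in range(1, 5):
    print(n, modified_precision(references, candidate, n))
```

The three-token candidate has no 4-grams at all, so the order-4 precision is undefined here. That is one reason a smoothing function (NLTK provides `SmoothingFunction`) is commonly passed for very short sentences.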
# Requires an A100 40GB (uses ~30 GB of VRAM)
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda"
model = AutoModelForCausalLM.from_pretrained("kaist-ai/mistral-orpo-capybara-7k").to(device)
tokenizer = AutoTokenizer.from_pretrained("kaist-ai/mistral-orpo-capybara-7k")
query = [{'role': 'user', 'content': 'Tell me how AI is like the Industrial Revolution'}]
prompt = tokenizer.apply_chat_template(query, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors='pt').to(device)