@bigsnarfdude
Last active May 7, 2024 21:28
fsdp-qlora-llama3
https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/fsdp-qlora-distributed-llama3.ipynb
Expected memory usage:
Full fine-tuning with FSDP needs ~16x 80 GB GPUs
FSDP + LoRA needs ~8x 80 GB GPUs
FSDP + QLoRA needs ~2x 40 GB GPUs
FSDP + QLoRA + CPU offloading needs ~4x 24 GB GPUs (~22 GB per GPU plus 127 GB of CPU RAM, at a sequence length of 3072 and a batch size of 1)
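The GPU counts above follow from a simple back-of-envelope calculation on the frozen base weights. A rough sketch, assuming the 70B Llama 3 variant (an assumption; the gist title only says "llama3"), 4-bit quantized weights, and FSDP fully sharding them across ranks. It deliberately ignores LoRA adapters, optimizer state, activations, and framework overhead, all of which add to the ~20 GB/GPU visible in the nvidia-smi dump below:

```python
def sharded_weight_gib(n_params: float, bits: int, n_gpus: int) -> float:
    """GiB of quantized base-model weights held per GPU,
    assuming FSDP fully shards them across all ranks."""
    total_bytes = n_params * bits / 8
    return total_bytes / n_gpus / 2**30

# Llama 3 70B at 4 bits (assumed model size):
print(round(sharded_weight_gib(70e9, 4, 2), 1))  # ~16.3 GiB/GPU on 2 GPUs
print(round(sharded_weight_gib(70e9, 4, 8), 1))  # ~4.1 GiB/GPU on 8 GPUs
```

The 2x 40 GB figure is consistent with ~16 GiB of sharded weights per GPU plus adapters, optimizer state, and activations on top.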
Tue May 7 20:43:36 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-40GB On | 00000000:07:00.0 Off | 0 |
| N/A 45C P0 248W / 400W | 20611MiB / 40960MiB | 82% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-40GB On | 00000000:08:00.0 Off | 0 |
| N/A 41C P0 242W / 400W | 20755MiB / 40960MiB | 82% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM4-40GB On | 00000000:09:00.0 Off | 0 |
| N/A 44C P0 235W / 400W | 20253MiB / 40960MiB | 84% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM4-40GB On | 00000000:0A:00.0 Off | 0 |
| N/A 45C P0 252W / 400W | 20755MiB / 40960MiB | 83% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA A100-SXM4-40GB On | 00000000:0B:00.0 Off | 0 |
| N/A 44C P0 205W / 400W | 20253MiB / 40960MiB | 85% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA A100-SXM4-40GB On | 00000000:0C:00.0 Off | 0 |
| N/A 40C P0 249W / 400W | 20253MiB / 40960MiB | 86% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA A100-SXM4-40GB On | 00000000:0D:00.0 Off | 0 |
| N/A 42C P0 253W / 400W | 20253MiB / 40960MiB | 85% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA A100-SXM4-40GB On | 00000000:0E:00.0 Off | 0 |
| N/A 46C P0 273W / 400W | 20611MiB / 40960MiB | 86% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 10795 C /usr/bin/python3 20576MiB |
| 1 N/A N/A 10796 C /usr/bin/python3 20720MiB |
| 2 N/A N/A 10797 C /usr/bin/python3 20218MiB |
| 3 N/A N/A 10798 C /usr/bin/python3 20720MiB |
| 4 N/A N/A 10799 C /usr/bin/python3 20218MiB |
| 5 N/A N/A 10800 C /usr/bin/python3 20218MiB |
| 6 N/A N/A 10801 C /usr/bin/python3 20218MiB |
| 7 N/A N/A 10802 C /usr/bin/python3 20576MiB |
+---------------------------------------------------------------------------------------+
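As a sanity check, the per-process numbers in the table above sum to 163,464 MiB (~160 GiB) across the eight ranks, about 20 GiB each. A small sketch that tallies the process table; the regex is tailored to rows shaped like the ones above, not a general nvidia-smi parser:

```python
import re

def total_gpu_mib(process_table: str) -> int:
    """Sum the GPU Memory Usage column of an nvidia-smi process table.

    Rows are expected to look like:
    | 0 N/A N/A 10795 C /usr/bin/python3 20576MiB |
    """
    return sum(int(m) for m in re.findall(r"(\d+)MiB \|", process_table))

sample = """\
| 0 N/A N/A 10795 C /usr/bin/python3 20576MiB |
| 1 N/A N/A 10796 C /usr/bin/python3 20720MiB |
"""
print(total_gpu_mib(sample))  # 41296
```

Note the pattern would also match the `20611MiB / 40960MiB` pairs in the per-GPU section, so feed it only the process table.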