The following was tested on Google Cloud (GCP) using an a2-highgpu-1g
instance and the Rocky Linux 9 image.
It has 80GB of RAM, 12 CPU cores, and a single NVIDIA A100 40GB GPU attached.
I also recommend choosing a 500GB SSD boot disk; it's somewhat more than strictly required, but you may need the headroom.
NOTICE: Make sure you have a positive bank balance before trying this.
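If you prefer the CLI over the cloud console, an instance like the one above can be sketched with gcloud. The instance name and zone here are my own assumptions, so adjust them for your project:

```shell
# Hypothetical instance name and zone; adjust for your project.
# The a2-highgpu-1g machine type bundles one NVIDIA A100 40GB GPU.
gcloud compute instances create vicuna-demo \
  --zone=us-central1-a \
  --machine-type=a2-highgpu-1g \
  --image-family=rocky-linux-9 \
  --image-project=rocky-linux-cloud \
  --boot-disk-size=500GB \
  --boot-disk-type=pd-ssd \
  --maintenance-policy=TERMINATE
```

GPU instances require --maintenance-policy=TERMINATE because they cannot be live-migrated.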
Update the system:
sudo dnf update -y
Install my favorite editor:
sudo dnf install -y nano
Install some basic development tools:
sudo dnf groupinstall "Development Tools"
sudo dnf install python3-pip
Next you need to install drivers for your GPU. I am of course using an NVIDIA A100-SXM4-40GB,
but this should work for almost any recent NVIDIA datacenter GPU.
Enable the CRB repository and add the EL9-compatible EPEL and NVIDIA CUDA repositories:
sudo dnf config-manager --set-enabled crb
sudo dnf install \
https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm \
https://dl.fedoraproject.org/pub/epel/epel-next-release-latest-9.noarch.rpm
sudo dnf config-manager --add-repo \
http://developer.download.nvidia.com/compute/cuda/repos/rhel9/$(uname -i)/cuda-rhel9.repo
Install driver dependencies:
sudo dnf install kernel-headers-$(uname -r) kernel-devel-$(uname -r) tar bzip2 make automake gcc gcc-c++ \
pciutils elfutils-libelf-devel libglvnd-opengl libglvnd-glx libglvnd-devel acpid pkgconfig dkms
Install NVIDIA GPU driver:
sudo dnf module install nvidia-driver:latest-dkms
Now it's a good time to reboot the system:
sudo reboot
Check that the driver installation worked:
nvidia-smi
After a second or two, you should see something like this:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-40GB Off| 00000000:00:04.0 Off | 0 |
| N/A 35C P0 51W / 400W| 0MiB / 40960MiB | 27% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
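For a quick scripted sanity check (useful later if you automate any of this), nvidia-smi can also print selected fields as CSV instead of the full table:

```shell
# Print GPU name, driver version, and total memory as CSV.
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
```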
Install Git LFS (large file support for git), which the Hugging Face model repositories require:
sudo dnf install git-lfs
git lfs install
Clone LLaMA-13b model weights:
git clone https://huggingface.co/huggyllama/llama-13b
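Cloning pulls tens of gigabytes through Git LFS, so before moving on it is worth confirming that the weight files actually downloaded rather than remaining as small LFS pointer files:

```shell
# List the files tracked by LFS and the total size on disk;
# real weight shards should be multiple gigabytes each.
cd llama-13b
git lfs ls-files
du -sh .
cd ..
```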
Create Vicuna-13b weights output directory:
mkdir vicuna-13b
Clone FastChat repository:
git clone https://github.com/lm-sys/FastChat.git && cd FastChat
Upgrade pip (to enable PEP 660 support):
pip3 install --upgrade pip
Install package dependencies:
pip3 install -e .
Apply the delta weights (this will download the delta repository from Hugging Face):
python3 -m fastchat.model.apply_delta \
--base-model-path ../llama-13b \
--target-model-path ../vicuna-13b \
--delta-path lmsys/vicuna-13b-delta-v1.1
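The merge loads both the base and the delta weights into memory. If the process gets killed for lack of RAM, the FastChat README describes a --low-cpu-mem flag that splits large weight files and uses the disk as temporary storage; confirm the flag against your FastChat version before relying on it:

```shell
# Same merge, but with reduced peak RAM usage (per the FastChat docs).
python3 -m fastchat.model.apply_delta \
  --base-model-path ../llama-13b \
  --target-model-path ../vicuna-13b \
  --delta-path lmsys/vicuna-13b-delta-v1.1 \
  --low-cpu-mem
```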
Confirm the weights output:
ls -alh ../vicuna-13b/
Test the merged model in the terminal:
python3 -m fastchat.serve.cli --model-path ../vicuna-13b
Install tmux for easily running multiple processes:
sudo dnf install -y tmux
To run tmux, just type tmux in the shell. The first window is created automatically.
To create another window: ctrl + b, then c.
To switch windows: ctrl + b, then w, and choose a window with the arrow keys.
To detach: ctrl + b, then d.
To reattach to the latest session, type tmux attach (or the shorthand tmux at) in the shell.
Run each of the servers in a different tmux window so you can switch between
them, and so they keep running after you log out or disconnect.
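The window juggling above can also be scripted up front; the session and window names here are my own invention, adjust as you like:

```shell
# Create a detached session with one window per server (names are arbitrary).
tmux new-session -d -s fastchat -n controller
tmux new-window -t fastchat -n worker
tmux new-window -t fastchat -n webui
# Attach, then switch between windows with ctrl + b, then w.
tmux attach -t fastchat
```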
Start the controller server:
python3 -m fastchat.serve.controller
Start the worker server (can run multiple workers, different models):
python3 -m fastchat.serve.model_worker --model-path ../vicuna-13b/
Add the default web interface HTTP port (7860) to the firewall:
sudo firewall-cmd --add-port=7860/tcp
sudo firewall-cmd --add-port=7860/tcp --permanent
If you're using Google GCP, you will probably also need to allow ingress traffic on port 7860 in your VPC firewall!
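A minimal sketch of such a rule with gcloud; the rule name and the open-to-the-world source range are my own assumptions, so tighten them for anything beyond a quick demo:

```shell
# Hypothetical rule name; 0.0.0.0/0 opens the port to everyone.
gcloud compute firewall-rules create allow-gradio-7860 \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:7860 \
  --source-ranges=0.0.0.0/0
```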
Start the GUI web interface:
python3 -m fastchat.serve.gradio_web_server