Rocky Linux 9 | Chatbot Edition

Server install instructions (Rocky Linux 9) for enthusiastic people who wish to run or train LLaMA models.

The following was tested on Google GCP using an a2-highgpu-1g instance and a Rocky Linux 9 image.
It has 80GB of RAM, 12 CPU cores, and a single NVIDIA A100 40GB GPU attached.
I also recommend provisioning a 500GB SSD boot disk; it's somewhat more than required, but you might need the headroom.
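
For reference, here is a minimal sketch of creating such an instance with the gcloud CLI (the instance name and zone are my own choices; adjust to taste). The a2-highgpu-1g machine type already includes the A100, so no separate accelerator flag is needed:

gcloud compute instances create chatbot-a100 \
  --zone=us-central1-a \
  --machine-type=a2-highgpu-1g \
  --image-family=rocky-linux-9 \
  --image-project=rocky-linux-cloud \
  --boot-disk-size=500GB \
  --maintenance-policy=TERMINATE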

NOTICE: Make sure you have a positive bank balance before trying.

Update the system:

sudo dnf update -y

Install my favorite editor:

sudo dnf install -y nano

Install some basic development tools:

sudo dnf groupinstall "Development Tools"
sudo dnf install python3-pip

NVIDIA Drivers

Next you need to install drivers for your GPU. I am of course using an NVIDIA A100-SXM4-40GB, but this should work for almost any decent NVIDIA datacenter GPU.

Add the EL9-compatible EPEL repositories (Fedora) and the NVIDIA CUDA repository:

sudo dnf config-manager --set-enabled crb
sudo dnf install \
  https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm \
  https://dl.fedoraproject.org/pub/epel/epel-next-release-latest-9.noarch.rpm
sudo dnf config-manager --add-repo \
  http://developer.download.nvidia.com/compute/cuda/repos/rhel9/$(uname -i)/cuda-rhel9.repo

Install driver dependencies:

sudo dnf install kernel-headers-$(uname -r) kernel-devel-$(uname -r) tar bzip2 make automake gcc gcc-c++ \
  pciutils elfutils-libelf-devel libglvnd-opengl libglvnd-glx libglvnd-devel acpid pkgconfig dkms

Install NVIDIA GPU driver:

sudo dnf module install nvidia-driver:latest-dkms
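
If you want to confirm the kernel module was built via DKMS before rebooting, you can check (the output format varies between dkms versions):

dkms status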

Now is a good time to reboot the system:

sudo reboot

Check that the driver installation worked:

nvidia-smi

You should see something like this a second later:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB           Off| 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0               51W / 400W|      0MiB / 40960MiB |     27%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

LLaMA Model Weights

Install git support for large files (Git LFS):

sudo dnf install git-lfs
git lfs install

Clone the LLaMA-13b model weights (the download is on the order of 25GB, so it may take a while):

git clone https://huggingface.co/huggyllama/llama-13b

Create the Vicuna-13b weights output directory:

mkdir vicuna-13b

FastChat

Clone FastChat repository:

git clone https://github.com/lm-sys/FastChat.git && cd FastChat

Upgrade pip (to enable PEP 660 support):

pip3 install --upgrade pip

Install package dependencies:

pip3 install -e .

Apply the delta weights (this will download the delta repository from Hugging Face). Vicuna is published as a delta on top of the original LLaMA weights due to the LLaMA license; applying the delta produces the full Vicuna weights:

python3 -m fastchat.model.apply_delta \
  --base-model-path ../llama-13b \
  --target-model-path ../vicuna-13b \
  --delta-path lmsys/vicuna-13b-delta-v1.1

Confirm the weights were written:

ls -alh ../vicuna-13b/

Run CLI prompt (single GPU):

python3 -m fastchat.serve.cli --model-path ../vicuna-13b
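
If you have more than one GPU, the CLI can shard the model across them. A sketch assuming two GPUs (check python3 -m fastchat.serve.cli --help for the exact flags in your FastChat version):

python3 -m fastchat.serve.cli --model-path ../vicuna-13b --num-gpus 2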

Run the web interface

Install tmux to make it easy to run multiple processes:

sudo dnf install -y tmux

Quick tmux Tutorial

To run tmux, just type tmux in the shell.
The first window is created automatically.
To create another window, press ctrl + b then c.
To switch windows, press ctrl + b then w and choose the window with the arrow keys. To detach, press ctrl + b then d.
To reattach the latest session, type tmux attach (or tmux at for short) in the shell.

Starting the controller, worker(s), and web interface

Run each of the servers in a different tmux window so you can switch between them, and so they keep running after you log out or disconnect. (A scripted alternative is sketched at the end of this section.)

Start the controller server:

python3 -m fastchat.serve.controller

Start the worker server (you can run multiple workers serving different models):

python3 -m fastchat.serve.model_worker --model-path ../vicuna-13b/
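
As a sketch, a second worker serving another model would need its own port; the model path here is hypothetical, and the flag names are from FastChat's model_worker (check --help for your version):

python3 -m fastchat.serve.model_worker --model-path ../another-model/ \
  --port 31001 --worker-address http://localhost:31001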

Add the default web interface HTTP port (7860) to the firewall:

sudo firewall-cmd --add-port=7860/tcp
sudo firewall-cmd --add-port=7860/tcp --permanent

If you're using Google GCP, you probably need to open ingress for port 7860!
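
A minimal sketch using the gcloud CLI (the rule name is my own choice; narrow --source-ranges if you can):

gcloud compute firewall-rules create allow-gradio \
  --direction=INGRESS --action=ALLOW --rules=tcp:7860 \
  --source-ranges=0.0.0.0/0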

Start the GUI web interface:

python3 -m fastchat.serve.gradio_web_server
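
If you prefer to script the startup instead of opening windows by hand, here is a sketch that launches all three servers in named tmux windows (the session and window names are my own choices):

tmux new-session -d -s fastchat -n controller 'python3 -m fastchat.serve.controller'
sleep 5  # assumed delay so the controller is up before the worker registers
tmux new-window -t fastchat -n worker 'python3 -m fastchat.serve.model_worker --model-path ../vicuna-13b/'
tmux new-window -t fastchat -n web 'python3 -m fastchat.serve.gradio_web_server'
tmux attach -t fastchat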