This tutorial is based on this AWS tutorial. We will install the NVIDIA driver on an AWS EC2 instance and then compile and run llama.cpp on it. Here we use a g5.4xlarge instance with an Ubuntu 22.04 AMI; the g5 instance family uses the NVIDIA A10G GPU.
First, update the package list, upgrade to the latest linux-aws kernel, and reboot:
sudo apt-get update -y
sudo apt-get upgrade -y linux-aws
sudo reboot
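After the instance comes back up, it helps to confirm which kernel is actually running, since the headers installed in the next step must match it exactly. This is a quick sanity check, not part of the AWS guide:

```shell
# The exact release string of the running kernel; the NVIDIA kernel
# module is built against headers for this version
uname -r

# Headers for the running kernel live here once linux-headers-$(uname -r)
# is installed; if the directory is missing, the driver build will fail
ls -d /usr/src/linux-headers-"$(uname -r)" 2>/dev/null || echo "headers not installed yet"
```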
Next, install gcc, make, and the kernel headers, blacklist the nouveau driver, and rebuild the GRUB configuration:
sudo apt-get install -y gcc make linux-headers-$(uname -r)
cat << EOF | sudo tee --append /etc/modprobe.d/blacklist.conf
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
EOF
sudo sed -i 's/GRUB_CMDLINE_LINUX=""/GRUB_CMDLINE_LINUX="rdblacklist=nouveau"/' /etc/default/grub
sudo update-grub
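Before running the sed command against the real /etc/default/grub, you can preview what the substitution does on a scratch copy. This is a hypothetical safety check, not part of the AWS guide:

```shell
# Exercise the substitution on a scratch file so the real grub config
# is untouched
scratch=$(mktemp)
echo 'GRUB_CMDLINE_LINUX=""' > "$scratch"
sed -i 's/GRUB_CMDLINE_LINUX=""/GRUB_CMDLINE_LINUX="rdblacklist=nouveau"/' "$scratch"

# The kernel command line should now carry the nouveau blacklist flag
grep rdblacklist "$scratch"
# prints: GRUB_CMDLINE_LINUX="rdblacklist=nouveau"
```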
Download the NVIDIA driver from AWS S3 and run the installer (this requires AWS credentials with S3 read access):
sudo apt install awscli
aws configure
aws s3 cp --recursive s3://ec2-linux-nvidia-drivers/latest/ .
sudo sh NVIDIA-Linux-x86_64-535.104.05-grid-aws.run
Confirm that the driver installed successfully:
nvidia-smi -q | head
Disable the GPU System Processor (GSP) and reboot. (More information here.)
sudo touch /etc/modprobe.d/nvidia.conf
echo "options nvidia NVreg_EnableGpuFirmware=0" | sudo tee --append /etc/modprobe.d/nvidia.conf
sudo reboot
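After the reboot, you can confirm the setting took effect. The GSP line in `nvidia-smi -q` output is as reported by recent driver versions and may vary:

```shell
# The modprobe option written above should be present in the config file
cat /etc/modprobe.d/nvidia.conf

# Recent drivers report the GSP firmware state directly; with GSP disabled
# the firmware version field is expected to be empty or N/A
nvidia-smi -q | grep -i gsp
```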
Finally, download and install the CUDA 12.2 toolkit:
wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_535.104.05_linux.run
sudo sh cuda_12.2.2_535.104.05_linux.run --silent --override --toolkit --samples --toolkitpath=/usr/local/cuda-12 --samplespath=/usr/local/cuda --no-opengl-libs
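With the toolkit in /usr/local/cuda-12 (the path chosen by --toolkitpath above), add it to the shell environment and build llama.cpp. The commands below are a sketch: the GGML_CUDA CMake flag and the llama-cli binary name follow current llama.cpp sources and may differ in older checkouts (which used LLAMA_CUBLAS and a main binary instead), and the model path is a placeholder:

```shell
# Put the CUDA 12 toolkit on the path (add these lines to ~/.bashrc
# to make them persistent across logins)
export PATH=/usr/local/cuda-12/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12/lib64:${LD_LIBRARY_PATH:-}

# Verify the CUDA compiler is reachable before building anything
command -v nvcc && nvcc --version

# Build llama.cpp with CUDA offload enabled (flag name assumed from
# current llama.cpp CMake; older trees used -DLLAMA_CUBLAS=ON)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j"$(nproc)"

# Smoke test, offloading all layers to the A10G; the model path is a
# placeholder -- download any GGUF model first
./build/bin/llama-cli -m ./models/model.gguf -p "Hello" -ngl 99
```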