We have an RTX A5000 board.
I made the following configuration on the host to get the nvidia-smi vgpu command working:
!!! Disclaimer !!!
The information below is not my own; I found all of these commands ready-made on the internet. Read all of the steps carefully before executing anything, research what each command does and whether it is supported on your hardware and current configuration, and make a backup first.
This tutorial is incomplete: I only did the GPU configuration on the host to get it working, but the mdevctl types command that should list the profiles returns nothing. According to the other tutorials I found, it should. That's all I have so far.
Make sure to add the community pve repo and remove the enterprise repo (you can skip this step if you have a valid enterprise subscription):
echo "deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription" >> /etc/apt/sources.list
rm /etc/apt/sources.list.d/pve-enterprise.list
Update and upgrade (dist-upgrade already covers everything a plain upgrade would do):
apt update
apt dist-upgrade -y
apt install -y build-essential pve-headers-`uname -r` dkms jq cargo mdevctl unzip uuid
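Before proceeding, it's worth confirming that the installed headers actually match the running kernel; a mismatch is a common reason the DKMS build of the NVIDIA module fails later. A quick sanity check (a sketch, not required by the guide):

```shell
# The pve-headers package installed above must match the running kernel,
# otherwise DKMS cannot build the NVIDIA module against it later.
uname -r                                                   # running kernel version
dpkg -l 2>/dev/null | grep pve-headers || echo "no pve-headers package found"
```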
Background reading: https://pve.proxmox.com/wiki/Pci_passthrough
If your host boots with GRUB, edit the kernel command line:
nano /etc/default/grub
For Intel CPUs, edit this line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
For AMD CPUs, edit this line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
Save the file and update GRUB:
update-grub
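After the reboot later in this guide, you can confirm that the parameter actually reached the kernel; /proc/cmdline shows the command line the kernel booted with. A sketch (substitute amd_iommu=on on AMD systems):

```shell
# Check whether the IOMMU flag is active on the currently running kernel
grep -o 'intel_iommu=on' /proc/cmdline || echo "intel_iommu=on not active yet"
```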
If your host boots with systemd-boot instead (e.g. a ZFS root installed in UEFI mode), the kernel parameters have to be appended to the command line in the file /etc/kernel/cmdline, so open that in your favorite editor:
nano /etc/kernel/cmdline
On a clean installation the file might look similar to this:
root=ZFS=rpool/ROOT/pve-1 boot=zfs
On Intel systems, append this at the end
intel_iommu=on iommu=pt
For AMD, use this
amd_iommu=on iommu=pt
After editing the file, it should look similar to this
root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on iommu=pt
Now, save and exit from the editor using Ctrl+O and then Ctrl+X and then apply your changes:
proxmox-boot-tool refresh
Next, edit /etc/modules-load.d/modules.conf so the VFIO modules are loaded at boot:
nano /etc/modules-load.d/modules.conf
Insert these lines:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
Create a couple of files in modprobe.d
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
Update initramfs
update-initramfs -u -k all
Reboot Proxmox
reboot
And verify that IOMMU is enabled
dmesg | grep -e DMAR -e IOMMU
Example output
[ 1.121863] pci 0000:c0:00.2: AMD-Vi: IOMMU performance counters supported
[ 1.121888] pci 0000:80:00.2: AMD-Vi: IOMMU performance counters supported
[ 1.121906] pci 0000:40:00.2: AMD-Vi: IOMMU performance counters supported
[ 1.121927] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 1.148549] pci 0000:c0:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 1.148566] pci 0000:80:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 1.148575] pci 0000:40:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 1.148582] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 1.150154] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[ 1.150162] perf/amd_iommu: Detected AMD IOMMU #1 (2 banks, 4 counters/bank).
[ 1.150170] perf/amd_iommu: Detected AMD IOMMU #2 (2 banks, 4 counters/bank).
[ 1.150180] perf/amd_iommu: Detected AMD IOMMU #3 (2 banks, 4 counters/bank).
As of the time of this writing (August 2022), the latest available GRID driver is 14.2, with vGPU driver version 510.85.03; you can check for the latest version on the NVIDIA licensing portal. I cannot guarantee that newer versions will work; this tutorial only covers 14.2 (510.85.03).
The file you are looking for is called NVIDIA-GRID-Linux-KVM-510.85.03-510.85.02-513.46.zip; you can get it from the download portal by downloading version 14.2 for Linux KVM.
After downloading, extract the archive and copy the file NVIDIA-Linux-x86_64-510.85.03-vgpu-kvm.run to the /root/ folder on your Proxmox host, then make it executable and run the installer:
chmod +x ./NVIDIA-Linux-x86_64-510.85.03-vgpu-kvm.run
./NVIDIA-Linux-x86_64-510.85.03-vgpu-kvm.run --dkms
The installer will ask you: "Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later." Answer Yes.
When you see "Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 510.85.03) is now complete.", click OK to exit the installer.
To finish the installation, reboot.
reboot
Wait for your server to reboot, then run this in the shell to check whether the driver installation worked:
nvidia-smi
You should get an output similar to this one
Sat Aug 13 12:40:00 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.03 Driver Version: 510.85.03 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX A5000 On | 00000000:41:00.0 Off | Off |
| 30% 33C P8 24W / 230W | 0MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
To check that the driver sees the card in vGPU mode:
nvidia-smi vgpu
Example output
Sat Aug 13 12:38:01 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.03 Driver Version: 510.85.03 |
|---------------------------------+------------------------------+------------+
| GPU Name | Bus-Id | GPU-Util |
| vGPU ID Name | VM ID VM Name | vGPU-Util |
|=================================+==============================+============|
| 0 NVIDIA RTX A5000 | 00000000:41:00.0 | 0% |
+---------------------------------+------------------------------+------------+
List the supported vGPU profiles:
nvidia-smi vgpu -s
GPU 00000000:41:00.0
NVIDIA RTXA5000-1B
NVIDIA RTXA5000-2B
NVIDIA RTXA5000-1Q
NVIDIA RTXA5000-2Q
NVIDIA RTXA5000-3Q
NVIDIA RTXA5000-4Q
NVIDIA RTXA5000-6Q
NVIDIA RTXA5000-8Q
NVIDIA RTXA5000-12Q
NVIDIA RTXA5000-24Q
NVIDIA RTXA5000-1A
NVIDIA RTXA5000-2A
NVIDIA RTXA5000-3A
NVIDIA RTXA5000-4A
NVIDIA RTXA5000-6A
NVIDIA RTXA5000-8A
NVIDIA RTXA5000-12A
NVIDIA RTXA5000-24A
NVIDIA RTXA5000-4C
NVIDIA RTXA5000-6C
NVIDIA RTXA5000-8C
NVIDIA RTXA5000-12C
NVIDIA RTXA5000-24C
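As noted in the disclaimer, `mdevctl types` returned nothing on this host. As a sketch, the mediated device types can also be checked directly in sysfs; the bus address 0000:41:00.0 below is this host's A5000 (from the nvidia-smi output above), so adjust it to your own card:

```shell
# Each supported vGPU profile appears as a directory under mdev_supported_types
dev=/sys/bus/pci/devices/0000:41:00.0
if [ -d "$dev/mdev_supported_types" ]; then
    ls "$dev/mdev_supported_types"
else
    echo "no mdev types exposed under $dev"
fi
```

If this directory is missing, the driver is not exposing vGPU types, which matches what `mdevctl types` reports.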
Here you see the many different profile types your GPU offers; they are split into 4 distinct series:
Type  Intended purpose
A     Virtual Applications (vApps)
B     Virtual Desktops (vPC)
C     AI/Machine Learning/Training (vCS or vWS)
Q     Virtual Workstations (vWS)
The Q profiles are most likely what you want, since they let you fully utilize the GPU over a remote desktop (e.g. Parsec). The next step is selecting the right Q profile for your GPU, which depends mainly on the amount of VRAM it offers. My RTX A5000 has 24GB of VRAM and I want to create 6 vGPUs, so I choose a profile with 4GB of VRAM (24GB / 6 vGPUs = 4GB): RTXA5000-4Q.
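The sizing rule above is just integer division; a minimal sketch (the variable names are mine, not part of any tool):

```shell
# Hypothetical sizing helper: VRAM per vGPU = total VRAM / number of vGPUs
total_vram_gb=24   # RTX A5000
vgpu_count=6       # desired number of vGPUs
per_vgpu_gb=$((total_vram_gb / vgpu_count))
echo "RTXA5000-${per_vgpu_gb}Q"   # prints RTXA5000-4Q
```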
The nvidia-smi command works normally now.
...
There is only one thing you have to do from the command line: open the VM config file and give the VM a UUID. For that you need your VM ID; in this example I'm using 100.
nano /etc/pve/qemu-server/<VM-ID>.conf
So with the VM ID 100, I have to do this:
nano /etc/pve/qemu-server/100.conf
In that file, add a new line at the end:
args: -uuid 00000000-0000-0000-0000-000000000XXX
Replace XXX with your VM ID, padded with zeros so that the final group stays 12 digits. With my VM ID 100, I use this line:
args: -uuid 00000000-0000-0000-0000-000000000100
Save and exit from the editor. That's all you have to do from the terminal.
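If you prefer not to count zeros by hand, the args line can be generated with printf, which zero-pads the VM ID into the 12-digit final group of the UUID (a sketch; set vmid to your own VM ID):

```shell
# Build the "args: -uuid ..." line for a given VM ID
vmid=100
printf 'args: -uuid 00000000-0000-0000-0000-%012d\n' "$vmid"
# prints: args: -uuid 00000000-0000-0000-0000-000000000100
```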