@kralicky
Last active March 19, 2024 16:37
Harvester GPU Provisioning
  1. Install Harvester, then SSH into the server.

  2. Edit /boot/grub/grub.cfg as follows:

 set default=0
 set timeout=10
 
 set gfxmode=auto
 set gfxpayload=keep
 insmod all_video
 insmod gfxterm
 
 menuentry "Start Harvester" {
   search.fs_label HARVESTER_STATE root
   set sqfile=/k3os/system/kernel/current/kernel.squashfs
   loopback loop0 /$sqfile
   set root=($root)
-  linux (loop0)/vmlinuz printk.devkmsg=on console=tty1
+  linux (loop0)/vmlinuz printk.devkmsg=on intel_iommu=on modprobe.blacklist=nouveau pci=noaer
   initrd /k3os/system/kernel/current/initrd
 }
  • intel_iommu=on: enables Intel IOMMU support. For AMD, use amd_iommu=on instead.
  • modprobe.blacklist=nouveau: disables the nouveau driver; we will configure the vfio-pci driver for the GPU later.
  • pci=noaer: prevents some issues related to USB device passthrough.

Reboot.
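
After rebooting, you can confirm that the new parameters took effect and that the IOMMU is active. A minimal sketch, run from a root shell on the host (these are standard Linux sysfs paths, nothing Harvester-specific):

$ cat /proc/cmdline               # should now include intel_iommu=on (or amd_iommu=on)
$ ls /sys/kernel/iommu_groups/    # non-empty when the IOMMU is enabled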

  3. Find the PCI device IDs for your GPU and any other devices that may be in the same IOMMU group.
$ kubectl run -it --privileged --image ubuntu <pod name>
=> $ apt update && apt install pciutils
=> $ lspci -nnk -d 10de: # colon at the end is required

Note that "10de" is NVIDIA's vendor ID. One or more devices may be shown, for example:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU117GLM [Quadro T1000 Mobile] [10de:1fb9] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10fa] (rev a1)

If multiple devices are shown, they are likely separate functions of the same card (for example the GPU and its audio controller) and will all need to be configured for PCIe passthrough.
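
To see exactly which devices share an IOMMU group with the GPU, you can list the group members through sysfs. A sketch, assuming the GPU's PCI address is 0000:01:00.0 as in the example above (substitute your own address):

$ ls /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices/
# e.g. 0000:01:00.0  0000:01:00.1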

  4. Get the vendor:device IDs (in this case, 10de:1fb9,10de:10fa) and edit /boot/grub/grub.cfg as follows:
 set default=0
 set timeout=10
 
 set gfxmode=auto
 set gfxpayload=keep
 insmod all_video
 insmod gfxterm
 
 menuentry "Start Harvester" {
   search.fs_label HARVESTER_STATE root
   set sqfile=/k3os/system/kernel/current/kernel.squashfs
   loopback loop0 /$sqfile
   set root=($root)
-  linux (loop0)/vmlinuz printk.devkmsg=on intel_iommu=on modprobe.blacklist=nouveau pci=noaer
+  linux (loop0)/vmlinuz printk.devkmsg=on intel_iommu=on modprobe.blacklist=nouveau vfio-pci.ids=10de:1fb9,10de:10fa pci=noaer
   initrd /k3os/system/kernel/current/initrd
 }

This configuration tells the kernel to use the vfio-pci driver for these devices.

Reboot.

  5. Verify the devices are using the correct driver:
$ kubectl run -it --privileged --image ubuntu <pod name>
=> $ apt update && apt install pciutils
=> $ lspci -nnk -d 10de:

If configured correctly, you should see Kernel driver in use: vfio-pci

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU117GLM [Quadro T1000 Mobile] [10de:1fb9] (rev a1)
	Kernel driver in use: vfio-pci
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10fa] (rev a1)
	Kernel driver in use: vfio-pci
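
The same check can be done directly from a shell on the host, without starting a pod, by reading the driver symlink from sysfs. A sketch using the example address 0000:01:00.0:

$ readlink /sys/bus/pci/devices/0000:01:00.0/driver
# prints a path ending in .../drivers/vfio-pci when the device is bound correctly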
  6. Install the NVIDIA KubeVirt GPU device plugin:

$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/kubevirt-gpu-device-plugin/master/manifests/nvidia-kubevirt-gpu-device-plugin.yaml
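
Before reading the logs in the next step, you can check that the plugin daemonset rolled out; the daemonset name below is inferred from the pod name used in the next step, so adjust it if your manifest differs:

$ kubectl -n kube-system rollout status daemonset/nvidia-kubevirt-gpu-dp-daemonset
$ kubectl -n kube-system get pods | grep nvidia-kubevirt-gpu-dp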

  7. Check the log output: $ kubectl -n kube-system logs nvidia-kubevirt-gpu-dp-daemonset-xxxxx

You should see the following:

2021/07/19 15:52:28 Not a device, continuing
2021/07/19 15:52:28 Nvidia device  0000:01:00.0
2021/07/19 15:52:28 Iommu Group 1
2021/07/19 15:52:28 Device Id 1fb9
2021/07/19 15:52:28 Nvidia device  0000:01:00.1
2021/07/19 15:52:28 Iommu Group 1
2021/07/19 15:52:28 Error accessing file path "/sys/bus/mdev/devices": lstat /sys/bus/mdev/devices: no such file or directory
2021/07/19 15:52:28 Iommu Map map[1:[{0000:01:00.0} {0000:01:00.1}]]
2021/07/19 15:52:28 Device Map map[1fb9:[1]]
2021/07/19 15:52:28 vGPU Map  map[]
2021/07/19 15:52:28 GPU vGPU Map  map[]
2021/07/19 15:52:28 DP Name TU117GLM_Quadro_T1000_Mobile
2021/07/19 15:52:28 Devicename TU117GLM_Quadro_T1000_Mobile
2021/07/19 15:52:28 TU117GLM_Quadro_T1000_Mobile Device plugin server ready

Copy the device plugin name; in this example it is TU117GLM_Quadro_T1000_Mobile.
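
Once the plugin is ready, the node should advertise the GPU as an allocatable extended resource under the nvidia.com/ prefix. One way to confirm this (the resource name should match the device plugin name reported above):

$ kubectl describe nodes | grep nvidia.com
# e.g. nvidia.com/TU117GLM_Quadro_T1000_Mobile:  1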

  8. At the time of writing, Harvester does not have a UI for provisioning GPUs, so we will need to edit the virtual machine's YAML directly. Create a VM instance and stop it, then edit its YAML as follows (one way to do this is sketched after the note below):
...
    spec:
      domain:
        cpu:
          cores: 4
          sockets: 1
          threads: 1
        devices:
          disks:
          - disk:
              bus: virtio
            name: disk-0
          - bootOrder: 1
            disk:
              bus: virtio
            name: disk-1
+         gpus:
+         - deviceName: nvidia.com/TU117GLM_Quadro_T1000_Mobile
+           name: gpu1
...

(Replace the part after nvidia.com/ with your device name)
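
One way to make this edit, assuming you have kubectl access to the cluster and the VM lives in the default namespace (Harvester VMs are KubeVirt VirtualMachine objects, so the gpus: block goes under spec.template.spec.domain.devices in the full object):

$ kubectl edit virtualmachine <vm name> -n default
# add the gpus: section shown above, save, and quit; then start the VM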

  9. Start the VM. If configured correctly, you should see the following output from the KubeVirt GPU plugin pod:
2021/07/19 15:53:08 In allocate
2021/07/19 15:53:08 Allocated devices [0000:01:00.0 0000:01:00.1]

If the VM fails to start, check the KubeVirt logs. If you see errors such as "Please ensure all devices within the iommu_group are bound to their vfio bus driver.", there are other devices in the same IOMMU group as your GPU that also need to be bound to the vfio-pci driver. Edit the kernel cmdline to include their IDs and reboot; the sketch below shows one way to identify them.
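
One way to find those devices, assuming the GPU sits at 0000:01:00.0 (substitute your address); any group member not showing vfio-pci as its driver still needs its vendor:device ID added to vfio-pci.ids:

$ for dev in /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices/*; do lspci -nnk -s "${dev##*/}"; done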

@czadikem

What has changed now that they are using grub2?

@kralicky (Author)

@czadikem The kernel parameters should not be any different, but if /etc/default/grub is no longer read-only, you should edit them there and run grub2-mkconfig instead.
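
A rough sketch of that flow on a typical grub2 system (file locations may differ on newer Harvester releases, so treat this as a starting point rather than the exact procedure):

$ vi /etc/default/grub                      # append the parameters to GRUB_CMDLINE_LINUX
$ grub2-mkconfig -o /boot/grub2/grub.cfg
$ reboot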

@czadikem

Any idea why Harvester, when I SSH in as root, deletes any of the grub files I edit? I am unable to run grub2-mkconfig, as I get "/usr/sbin/grub2-probe: error: failed to get canonical path of `overlay'."

@mirceanton

I followed the steps presented in the documentation here:

  1. Mounted the state dir in rw mode:
mount -o remount,rw /dev/sda3 /run/initramfs/cos-state
  2. Edited the /run/initramfs/cos-state/grub2/grub.cfg file to contain:
# ...
set gfxmode=auto
set gfxpayload=keep
insmod all_video
insmod gfxterm
insmod loopback
insmod squash4

menuentry "${display_name}" --id cos {
  search --no-floppy --label --set=root COS_STATE
  set img=/cOS/active.img
  set label=COS_ACTIVE
  loopback loop0 /$img
  set root=($root)
  source (loop0)/etc/cos/bootargs.cfg
  linux (loop0)$kernel $kernelcmd ${extra_cmdline} ${extra_active_cmdline} intel_iommu=on modprobe.blacklist=nouveau vfio-pci.ids=10de:0fb9,10de:1c81,10de:1ad9,10de:1ad8,10de:10f8,10de:1e84 pci=noaer
  initrd (loop0)$initramfs
}
# ...
  3. Rebooted.

  4. SSHed in and ran sudo su to get root access:

# lspci -k -d 10de:
65:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2070 SUPER] (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd Device 3ffc
65:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd Device 3ffc
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel
65:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd Device 3ffc
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci
65:00.3 Serial bus controller: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd Device 3ffc
b4:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050] (rev a1)
        Subsystem: ASUSTeK Computer Inc. Device 85d7
b4:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
        Subsystem: ASUSTeK Computer Inc. Device 85d7
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel

Which is the same output I had even before configuring anything.

I did try to install the nvidia gpu plugin, just in case, and the logs say:

2022/06/24 11:45:02 Not a device, continuing
2022/06/24 11:45:02 Nvidia device  0000:65:00.0
ERROR: logging before flag.Parse: E0624 11:45:02.561376       1 device_plugin.go:257] Could not read link driver for device 0000:65:00.0: readlink /sys/bus/pci/devices/0000:65:00.0/driver: no such file or directory
2022/06/24 11:45:02 Could not get driver for device  0000:65:00.0
2022/06/24 11:45:02 Nvidia device  0000:65:00.1
2022/06/24 11:45:02 Nvidia device  0000:65:00.2
2022/06/24 11:45:02 Nvidia device  0000:65:00.3
ERROR: logging before flag.Parse: E0624 11:45:02.561505       1 device_plugin.go:257] Could not read link driver for device 0000:65:00.3: readlink /sys/bus/pci/devices/0000:65:00.3/driver: no such file or directory
2022/06/24 11:45:02 Could not get driver for device  0000:65:00.3
2022/06/24 11:45:02 Nvidia device  0000:b4:00.0
ERROR: logging before flag.Parse: E0624 11:45:02.561879       1 device_plugin.go:257] Could not read link driver for device 0000:b4:00.0: readlink /sys/bus/pci/devices/0000:b4:00.0/driver: no such file or directory
2022/06/24 11:45:02 Could not get driver for device  0000:b4:00.0
2022/06/24 11:45:02 Nvidia device  0000:b4:00.1
2022/06/24 11:45:02 Error accessing file path "/sys/bus/mdev/devices": lstat /sys/bus/mdev/devices: no such file or directory
2022/06/24 11:45:02 Iommu Map map[]
2022/06/24 11:45:02 Device Map map[]
2022/06/24 11:45:02 vGPU Map  map[]
2022/06/24 11:45:02 GPU vGPU Map  map[]

Which is honestly what I would expect... driver issues, since I can't seem to get the vfio-pci driver to load.

Any ideas what I am doing wrong, or if anything has dramatically changed since writing this guide?

@kralicky (Author)

@mirceanton This guide is pretty old (written before Harvester 1.0), and I'm not a maintainer of Harvester, so I can't guarantee that this still works. Some things I would try (see the sketch after this list):

  • Check /proc/cmdline to see if your kernel parameters were actually applied
  • Check to see if vfio-pci.ko exists in /usr/lib/modules
  • Try running modprobe vfio-pci while the GPU is not bound to the nvidia driver, and see if it binds to vfio-pci.
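
Roughly, as a combined shell sketch of those checks (module file names and paths vary by kernel build):

$ cat /proc/cmdline
$ find /usr/lib/modules -name 'vfio-pci.ko*'   # may be compressed or built into the kernel
$ modprobe vfio-pci
$ lspci -nnk -d 10de:                          # see which driver each NVIDIA function now uses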

@mirceanton

Managed to find a solution. The process has substantially changed up until step 5. Will test and post an update.

@larkinwc commented Jun 30, 2022

Managed to find a solution. The process has substantially changed up until step 5. Will test and post an update.

Would love to hear what steps are needed to make this work.

@mirceanton

Okay so, the solution I managed to find only works partially. The GPU would throw a Code 43 error every time I rebooted the VM, and I would have to reinstall the drivers. Since this was a time-constrained project, I moved back to Proxmox (for now) and decided to give this another shot in the next release in September, since their GitHub states they will add better support for hardware passthrough then.

That being said, what I had to do was follow this guide while also keeping a close eye on this Reddit post. The trick is that the file system seems to be ephemeral, in the sense that it is re-generated on every boot, like a container. As such, you have to configure all your files and settings in the /oem/99-.... file. The format seems similar to cloud-init.

Truth be told, I haven't saved my configuration from last time since I didn't get it working, but what I remember is that I had to manually add more entries to the write_files section to blacklist the driver and configure everything in those guides. Additionally, I had to add a softdep for the drivers of some GPU devices, such as the audio controller, so that they default to vfio-pci instead of the stock audio driver, as well as manually override drivers with commands such as:

#!/bin/bash

# unbind from drivers
echo 0000:0a:00.0 > /sys/bus/pci/devices/0000\:0a\:00.0/driver/unbind
echo 0000:0a:00.1 > /sys/bus/pci/devices/0000\:0a\:00.1/driver/unbind

# bind to vfio
echo 0000:0a:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
echo 0000:0a:00.1 > /sys/bus/pci/drivers/vfio-pci/bind
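
For what it's worth, binding by address like this only works if vfio-pci is already willing to claim those device IDs (e.g. via vfio-pci.ids= on the kernel cmdline). An alternative sketch using the kernel's driver_override mechanism plus a modprobe softdep; addresses and IDs here are placeholders from the examples above, and on Harvester this would still need to be made persistent through the /oem config:

#!/bin/bash
# tell the kernel that these functions should be claimed by vfio-pci on the next probe
echo vfio-pci > /sys/bus/pci/devices/0000:0a:00.0/driver_override
echo vfio-pci > /sys/bus/pci/devices/0000:0a:00.1/driver_override
echo 0000:0a:00.0 > /sys/bus/pci/drivers_probe
echo 0000:0a:00.1 > /sys/bus/pci/drivers_probe

# and a modprobe.d snippet so the audio driver defers to vfio-pci at boot
cat <<'EOF' > /etc/modprobe.d/vfio.conf
softdep snd_hda_intel pre: vfio-pci
options vfio-pci ids=10de:1fb9,10de:10fa
EOF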

Sorry I can't be of more service right now, but I hope at least I am pointing you in the right direction.

Do let me know if you manage to get it working though!
