GPU Passthrough on saitama

Preamble

saitama has an AMD socket AM4 B350 motherboard with no integrated graphics device. It has two NVIDIA cards plugged into PCIe slots: the weaker card is in the first slot, making it the primary card, and the better card is in the fourth slot. The better card therefore sits on a slower slot, but this arrangement proved necessary: when the primary card was passed through, the host could be configured not to claim it, but the VM's BIOS and guest OS would not detect it. Ubuntu worked as a guest OS; Windows did not.

Prepare the host

  1. If necessary, enable IOMMU in the BIOS.

  2. Edit /etc/default/grub. Append "iommu=pt iommu=1" to GRUB_CMDLINE_LINUX_DEFAULT [1]:

    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=pt iommu=1"
    
  3. Update grub:

    $ sudo update-grub
    
  4. Reboot
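
     A quick check after rebooting (not one of the original steps) that the kernel parameters took effect and that the IOMMU came up. On this AMD board the relevant dmesg lines mention AMD-Vi:

    $ cat /proc/cmdline
    $ sudo dmesg | grep -i -e iommu -e amd-vi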

  5. Identify the PCIe bus that the GPU we're passing through is on:

    $ lspci -nnk
    ...
    1b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107 [GeForce GTX 750] [10de:1381] (rev a2)
     Subsystem: Gigabyte Technology Co., Ltd GM107 [GeForce GTX 750] [1458:362e]
     Kernel driver in use: nvidia
     Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
    1b:00.1 Audio device [0403]: NVIDIA Corporation GM107 High Definition Audio Controller [GeForce 940MX] [10de:0fbc] (rev a1)
     Subsystem: Gigabyte Technology Co., Ltd GM107 High Definition Audio Controller [GeForce 940MX] [1458:362e]
     Kernel driver in use: snd_hda_intel
     Kernel modules: snd_hda_intel
    ...
    
  6. Check which IOMMU group the graphics card for Windows is in:

    $ find /sys/kernel/iommu_groups/ -type l
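
     To see each IOMMU group together with the devices in it, a small helper loop (not part of the original write-up) can make the output of the command above easier to read:

    for g in /sys/kernel/iommu_groups/*; do
        echo "IOMMU group ${g##*/}:"
        for d in "$g"/devices/*; do
            lspci -nns "${d##*/}"
        done
    done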
    
  7. Check that the IOMMU group contains only devices that you want to pass through to Windows. In the case of saitama, the group includes not just the graphics card but also a USB controller, a PCIe bridge, a SATA controller and an Ethernet controller. At first this looks like a poor choice, but the alternatives turned out to be worse. The primary graphics card can be passed through by disabling the EFI framebuffer, but neither the BIOS of the guest VM nor the guest OS could detect the card: Ubuntu worked as a guest OS, Windows did not, and the OVMF/TianoCore BIOS could not show a graphical splash screen. Installing a prepared kernel patched to allow the ACS override also failed, and maintaining a self-compiled kernel seemed unattractive. The benefit of passing through the USB, SATA and Ethernet controllers is that Windows gets a complete set of near-native-speed devices.

    $ ls -lhA /sys/bus/pci/devices/0000\:1b\:00.0/iommu_group/devices/
    total 0
    ... 0000:03:00.0 -> ../../../../devices/pci0000:00/0000:00:01.3/0000:03:00.0
    ... 0000:03:00.1 -> ../../../../devices/pci0000:00/0000:00:01.3/0000:03:00.1
    ... 0000:03:00.2 -> ../../../../devices/pci0000:00/0000:00:01.3/0000:03:00.2
    ... 0000:16:00.0 -> ../../../../devices/pci0000:00/0000:00:01.3/0000:03:00.2/0000:16:00.0
    ... 0000:16:01.0 -> ../../../../devices/pci0000:00/0000:00:01.3/0000:03:00.2/0000:16:01.0
    ... 0000:16:04.0 -> ../../../../devices/pci0000:00/0000:00:01.3/0000:03:00.2/0000:16:04.0
    ... 0000:18:00.0 -> ../../../../devices/pci0000:00/0000:00:01.3/0000:03:00.2/0000:16:01.0/0000:18:00.0
    ... 0000:1b:00.0 -> ../../../../devices/pci0000:00/0000:00:01.3/0000:03:00.2/0000:16:04.0/0000:1b:00.0
    ... 0000:1b:00.1 -> ../../../../devices/pci0000:00/0000:00:01.3/0000:03:00.2/0000:16:04.0/0000:1b:00.1
    
  8. Blacklist the GPU we're passing through to the VM so that the graphics driver can't grab it. We use the pci-stub module to claim the card before nvidia or nouveau can. Add "pci-stub" to /etc/initramfs-tools/modules:

    $ echo "pci-stub" | sudo tee -a /etc/initramfs-tools/modules
    
  9. Pass pci-stub the IDs of the graphics controller and the audio device (as found with lspci -nnk above). Set pci-stub as a dependency for drm, otherwise the graphics driver will be loaded before the pci-stub driver. (Check dmesg to see when this happens.)

    $ sudo vim /lib/modprobe.d/pci-stub.conf
    options pci-stub ids=10de:1381,10de:0fbc
    softdep drm pre: pci-stub
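
     An aside, not from the original notes: if pci-stub turns out to be built into the kernel (CONFIG_PCI_STUB=y) rather than built as a module, the modprobe options file has no effect; in that case the same IDs can be passed on the kernel command line instead, e.g. by appending pci-stub.ids=10de:1381,10de:0fbc to GRUB_CMDLINE_LINUX_DEFAULT and re-running update-grub. To check which applies:

    $ grep CONFIG_PCI_STUB /boot/config-$(uname -r)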
    
  10. Update the existing initramfs image:

    $ sudo update-initramfs -u
    
  11. Reboot

  12. Confirm that pci-stub claimed the devices:

    $ lspci -nnk
    ...
    1b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107 [GeForce GTX 750] [10de:1381] (rev a2)
            Subsystem: Gigabyte Technology Co., Ltd GM107 [GeForce GTX 750] [1458:362e]
            Kernel driver in use: pci-stub
            Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
    1b:00.1 Audio device [0403]: NVIDIA Corporation GM107 High Definition Audio Controller [GeForce 940MX] [10de:0fbc] (rev a1)
            Subsystem: Gigabyte Technology Co., Ltd GM107 High Definition Audio Controller [GeForce 940MX] [1458:362e]
            Kernel driver in use: pci-stub
            Kernel modules: snd_hda_intel
    
[1] https://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM

Create scripts to bind devices

  1. Create a script to bind passthrough devices to vfio-pci:

    $ vim bind-vfio-pci
    #!/bin/sh
    
    modprobe vfio-pci
    
    # Bind vfio-pci to the USB controller
    echo '0000:03:00.0' | tee /sys/bus/pci/drivers/xhci_hcd/unbind
    echo '0000:03:00.0' | tee /sys/bus/pci/drivers/vfio-pci/bind
    
    # Bind vfio-pci to the SATA controller
    echo '0000:03:00.1' | tee /sys/bus/pci/drivers/ahci/unbind
    echo '0000:03:00.1' | tee /sys/bus/pci/drivers/vfio-pci/bind
    
    # vfio-pci does not support bridges. Just unbind it from the host.
    echo '0000:03:00.2' | tee /sys/bus/pci/drivers/pcieport/unbind
    echo '0000:16:00.0' | tee /sys/bus/pci/drivers/pcieport/unbind
    echo '0000:16:01.0' | tee /sys/bus/pci/drivers/pcieport/unbind
    echo '0000:16:04.0' | tee /sys/bus/pci/drivers/pcieport/unbind
    
    $ chmod +x bind-vfio-pci
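
     An alternative sketch (not what the original setup uses): since kernel 3.16 each PCI device has a driver_override attribute in sysfs, which saves having to name the target driver when rebinding. With vfio-pci already loaded, the USB controller, for example, could be moved over like this:

    echo vfio-pci | tee /sys/bus/pci/devices/0000:03:00.0/driver_override
    echo '0000:03:00.0' | tee /sys/bus/pci/drivers/xhci_hcd/unbind
    echo '0000:03:00.0' | tee /sys/bus/pci/drivers_probe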
    
  2. Create another script to return devices to their normal drivers:

    $ vim unbind-vfio-pci
    #!/bin/sh
    
    echo '0000:03:00.0' > /sys/bus/pci/drivers/vfio-pci/unbind
    echo '0000:03:00.0' > /sys/bus/pci/drivers/xhci_hcd/bind
    
    echo '0000:03:00.1' > /sys/bus/pci/drivers/vfio-pci/unbind
    echo '0000:03:00.1' > /sys/bus/pci/drivers/ahci/bind
    
    echo '0000:03:00.2' > /sys/bus/pci/drivers/pcieport/bind
    echo '0000:16:00.0' > /sys/bus/pci/drivers/pcieport/bind
    echo '0000:16:01.0' > /sys/bus/pci/drivers/pcieport/bind
    echo '0000:16:04.0' > /sys/bus/pci/drivers/pcieport/bind
    
    $ chmod +x unbind-vfio-pci
    

Create script for Windows VM

  1. Install QEMU, KVM, and the OVMF UEFI BIOS:

    $ sudo apt-get install qemu-kvm ovmf
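
     Before going further it is worth confirming that hardware virtualisation is usable on the host. kvm-ok comes from the cpu-checker package (an extra check, not part of the original steps):

    $ sudo apt-get install cpu-checker
    $ sudo kvm-ok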
    
  2. Copy the OVMF variables image to support UEFI variables:

    $ cp /usr/share/OVMF/OVMF_VARS.fd ovmf_vars.fd
    
  3. Create a script for your VM.

    • Here is the script. We will unpack it next.

         $ vim windows
         #!/bin/sh
      
         USB_DEVICE=03:00.0
         SATA_DEVICE=03:00.1
         ETH_DEVICE=18:00.0
         GPU_VIDEO=1b:00.0
         GPU_AUDIO=1b:00.1
      
         ./bind-vfio-pci
      
         qemu-system-x86_64 \
             -enable-kvm \
             -monitor stdio \
             -name win10 \
             \
             -machine type=q35,accel=kvm,kernel_irqchip=on \
             -cpu EPYC,kvm=off,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_vendor_id=NoFortyThree \
             -m 4G \
             -net none \
             -usb \
             -vga none \
             \
             -drive if=pflash,format=raw,readonly,file=/usr/share/OVMF/OVMF_CODE.fd \
             -drive if=pflash,format=raw,file=ovmf_vars.fd \
             \
             -device vfio-pci,host=$USB_DEVICE \
             -device vfio-pci,host=$SATA_DEVICE,rombar=0 \
             -device vfio-pci,host=$ETH_DEVICE \
             -device vfio-pci,host=$GPU_VIDEO,multifunction=on,x-vga=on \
             -device vfio-pci,host=$GPU_AUDIO
      
         ./unbind-vfio-pci
      
      $ chmod +x windows
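
       Both bind-vfio-pci and unbind-vfio-pci write to sysfs, so the whole script needs to be run as root, for example (assuming it sits in the current directory):

       $ sudo ./windows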
      
    • For convenience we assign the PCI addresses of the devices to be passed through to the variables USB_DEVICE, SATA_DEVICE, ETH_DEVICE, GPU_VIDEO and GPU_AUDIO.

    • We chose -machine type=q35 to use a PCIe bus.

    • We use -cpu EPYC because -cpu host causes Windows to keep rebooting, and -cpu EPYC best models the features of the host's Ryzen 5 CPU.

    • We told the VM to hide the fact that we are using KVM with kvm=off, and to report a fake vendor ID with hv_vendor_id=NoFortyThree. This works around NVIDIA's Windows driver refusing to start when it detects a hypervisor (the well-known Code 43 error).

    • We enabled Hyper-V enlightenments with hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time.

    • We are using drive interface if=pflash for the BIOS to support UEFI variables.

Set up the Windows VM

See How to configure GPU passthrough on trillian for instructions to download an image, write it to a file or device, and change the image to boot using UEFI.

Alternative: Copy the Windows VM from another host

I set up the Windows VM once, on trillian, and copy it to saitama using netcat:

  1. Follow the steps for setting up the Windows VM on trillian until the NVIDIA driver and Steam are installed.

  2. On saitama, wipe the MBR, GPT and filesystem signatures off the drive that is passed through to the VM. (On saitama /dev/sdx is /dev/sda.)

    $ sudo wipefs -a --backup /dev/sdx
    
  3. Use netcat to listen on a port (say, 4444) and, as root, write the incoming data directly to /dev/sdx:

    $ sudo su -
    # nc -l 4444 > /dev/sdx
    
  4. On trillian, cat the VM drive to 4444 on saitama:

    (trillian) $ sudo cat /dev/sdxX | nc -N 172.16.2.XXX 4444
    

    ... or ...

    (trillian) $ sudo losetup --show -o XXXXXX -f /dev/sdxX
    /dev/loopN
    (trillian) $ sudo cat /dev/loopN | nc -N 172.16.2.XXX 4444
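
     The transfer gives no feedback while it runs. If the pv package is installed on trillian, inserting pv into the pipeline shows bytes transferred and throughput (an optional extra, not part of the original procedure):

    (trillian) $ sudo cat /dev/sdxX | pv | nc -N 172.16.2.XXX 4444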
    

Reinstalling Windows

  1. Mount Windows drive C.
  2. Back up the IEUser directory. Use tar + gzip because it will preserve the right metadata for NTFS.
  3. Follow the steps for trillian to set up the Windows VM, until the NVIDIA driver and Steam are installed. The documentation refers to the target device as /dev/sdxX. On saitama the target device is /dev/sda.
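
A possible way to do the backup in step 2 above, assuming Windows drive C is mounted at /mnt/win_c (the mount point and archive path here are assumptions, not from the original notes):

    $ sudo tar -czpf ~/ieuser-backup.tar.gz -C /mnt/win_c/Users IEUser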

Optimising Windows

  1. Use GParted to add a data drive:

    $ sudo gparted /dev/sdx
    
  2. Assign more processors to Windows. nproc will tell you how many processors are available, e.g.:

    $ nproc
    8
    $ vim windows
    ...
        -smp 4 \
    ...
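
     On an SMT-capable Ryzen it can also help to spell out the topology, so that the guest sees cores and threads rather than one socket per vCPU; a possible variant (adjust sockets/cores/threads to your CPU):

        -smp 4,sockets=1,cores=2,threads=2 \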
    
  3. Disable Hibernate: Run "cmd" as administrator, and

    > powercfg -h off
    
  4. Disable Suspend: Start > Settings > System > Power & sleep > Sleep: "Never"

  5. Disable Cortana:

    1. Run "gpedit.msc"
    2. Go to Computer Configuration > Administrative Templates > Windows Components > Search
    3. Find "Allow Cortana", and set it to "Disabled". Click "OK".
    4. Reboot, or log out and log in.
  6. If you are going to assign more than 4 GB of RAM to Windows, hugepage support will improve performance. Hugepage support is enabled in Ubuntu by default. The following is based on the Ubuntu Community Help Wiki and the ArchWiki.

    1. Confirm that hugepage size is 2048 KB:

      $ cat /proc/meminfo | grep Hugepagesize
      Hugepagesize:       2048 kB
      
    2. If we want to assign 6 GB to Windows, that will be 6 × 1024 × 1024 ÷ 2048 = 6 × 1024 ÷ 2 = 3072 hugepages. Round up to 3100. If we want to assign 12 GB to Windows, that will be 12 × 1024 ÷ 2 = 6144 hugepages. Round up to 6150. Add the following to the script to reserve 3100 (for example) hugepages:

      $ vim windows
      ...
      sysctl vm.nr_hugepages=3100
      ...
          -m 6G \
          -mem-path /dev/hugepages \
      ...
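
       Setting vm.nr_hugepages in the VM script means the reservation only lasts until the next reboot. To reserve the pages permanently instead, a sysctl drop-in would do it (an alternative, not part of the original setup; the file name is arbitrary):

      $ echo 'vm.nr_hugepages=3100' | sudo tee /etc/sysctl.d/80-hugepages.conf
      $ sudo sysctl -p /etc/sysctl.d/80-hugepages.conf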
      
    3. You can check, while the VM is running, how many pages are used:

      $ cat /proc/meminfo | grep HugePages
      

How to take back the NVIDIA card

If you want to use the NVIDIA card in Linux again, follow these steps to take the graphics card back:

  1. Comment out "pci-stub" in /etc/initramfs-tools/modules

  2. Move /lib/modprobe.d/pci-stub.conf to ~/doc/

    $ sudo mv /lib/modprobe.d/pci-stub.conf ~/doc/
    
  3. Update the initramfs image:

    $ sudo update-initramfs -u
    
  4. Reboot.
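
     After rebooting, lspci should once again show the host graphics driver (nvidia or nouveau) in use for the card:

    $ lspci -nnk -s 1b:00.0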

  5. If necessary, check the PRIME graphics device and monitor configuration:

    $ xsudo nvidia-settings
    $ rm ~/.config/monitors.xml
    

    Reboot, log in, and configure monitors.
