Consumer-grade GPU passthrough in an OpenStack system (NVIDIA GPUs)
Assumptions

This guide assumes you have GTX 980 cards in your system (PCI IDs 10de:13c0 and 10de:0fbb per card); just add more IDs for other cards to make it more generic. It also assumes nova uses qemu-kvm (qemu-system-x86_64) as the virtualization hypervisor, which appears to be the default on OpenStack Newton when installed using openstack-ansible.

We assume OpenStack Newton is pre-installed and that we are working on a Nova compute node. This has been tested on an Ubuntu 16.04 system where I installed OpenStack AIO version 14.0.0 (different from the git tag used in the instructions!): http://docs.openstack.org/developer/openstack-ansible/developer-docs/quickstart-aio.html

Prepare the system for GPU passthrough (set up IOMMU/vfio/...)

Note: This is heavily based on information from https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Enabling_IOMMU adapted for Ubuntu 16.04

  1. Ensure SR-IOV and VT-d are enabled in your system BIOS.

  2. Add intel_iommu=on to the kernel command line (in /etc/default/grub).

  3. run

    $ update-grub
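As a sketch, the relevant line in /etc/default/grub would then look something like this (illustrative only; merge intel_iommu=on into whatever options your system already sets):

```shell
# /etc/default/grub (illustrative; keep your distribution's existing options)
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on"
```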
    
  4. Blacklist snd_hda_intel (which might grab the audio portion of the GPU on the host), and blacklist all other potential GPU host drivers while we are at it; nouveau in particular must not bind to the cards. Edit /etc/modprobe.d/blacklist.conf:

    blacklist snd_hda_intel
    blacklist amd76x_edac
    blacklist vga16fb
    blacklist nouveau
    blacklist rivafb
    blacklist nvidiafb
    blacklist rivatv
    
  5. Make the vfio-pci module hold on to the devices we might want to pass through (plus any devices in the same IOMMU group). This mostly means each GPU and its audio device (even though we don't pass through the audio device). In this case PCI ID 10de:13c0 is the main GPU and 10de:0fbb is its HDMI audio interface. Create /etc/modprobe.d/vfio.conf:

    # (GTX980 and its audio controller)
    options vfio-pci ids=10de:13c0,10de:0fbb
    

    Note: you can find all NVIDIA cards with their vendor:device IDs in your system using something like this:

    $ lspci -nn | grep NVIDIA
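As an aside, the vendor:device IDs can also be pulled out of saved lspci -nn output programmatically. The following is purely an illustrative sketch (the SAMPLE text and the nvidia_pci_ids helper are made up for the example, not part of the guide's commands):

```python
# Illustrative sketch: extract [vendor:device] IDs of NVIDIA devices
# from saved `lspci -nn` output. SAMPLE and nvidia_pci_ids are made
# up for this example.
import re

SAMPLE = """\
05:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
05:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
"""

def nvidia_pci_ids(lspci_output):
    """Return the unique vendor:device IDs of NVIDIA devices, in order."""
    ids = []
    for line in lspci_output.splitlines():
        if "NVIDIA" not in line:
            continue
        # the class code (e.g. [0300]) has no colon, so this only matches IDs
        m = re.search(r"\[([0-9a-f]{4}:[0-9a-f]{4})\]", line)
        if m and m.group(1) not in ids:
            ids.append(m.group(1))
    return ids

# joined with commas, the result is suitable for the vfio-pci ids= option
print(",".join(nvidia_pci_ids(SAMPLE)))  # -> 10de:13c0,10de:0fbb
```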
    
  6. Make sure vfio-pci gets loaded as early as possible by editing /etc/modules-load.d/modules.conf and adding vfio-pci to the list.

  7. Update the initrd to apply these changes at boot by running

    $ update-initramfs -u
    
  8. Reboot the system in order to activate the intel_iommu=on kernel option.

  9. Now make sure the GPUs and their audio interfaces are "in use" by vfio-pci and not by any other module. The output should look something like this:

    root@stack:~# lspci -nnk -d 10de:13c0
    05:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
      Subsystem: eVga.com. Corp. GM204 [GeForce GTX 980] [3842:2980]
      Kernel driver in use: vfio-pci
      Kernel modules: nvidiafb, nouveau
    84:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
      Subsystem: eVga.com. Corp. GM204 [GeForce GTX 980] [3842:2980]
      Kernel driver in use: vfio-pci
      Kernel modules: nvidiafb, nouveau
    root@stack:~# lspci -nnk -d 10de:0fbb
    05:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
      Subsystem: eVga.com. Corp. GM204 High Definition Audio Controller [3842:2980]
      Kernel driver in use: vfio-pci
      Kernel modules: snd_hda_intel
    84:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
      Subsystem: eVga.com. Corp. GM204 High Definition Audio Controller [3842:2980]
      Kernel driver in use: vfio-pci
      Kernel modules: snd_hda_intel
    
  10. Your system should now be ready for PCI passthrough of its GPUs.

Configure Nova on the compute node and controller

Note: This is based on information from http://docs.openstack.org/admin-guide/compute-pci-passthrough.html

  1. Add the following to /etc/nova/nova.conf on the controller, api and compute hosts (create more aliases for other GPU models as needed):

    [DEFAULT]
    ...
    pci_alias = { "vendor_id":"10de", "product_id":"13c0", "device_type":"type-PCI", "name":"gtx980" }
    pci_passthrough_whitelist = { "vendor_id": "10de", "product_id": "13c0" }
    ...
    

    In the same file, append ,PciPassthroughFilter to the scheduler_default_filters option:

    # add this to scheduler_default_filters in /etc/nova/nova.conf
    scheduler_default_filters = ..... ,PciPassthroughFilter
    
  2. Restart nova-compute, nova-api and nova-scheduler, depending on the node:

    $ systemctl restart nova-api
    $ systemctl restart nova-scheduler
    $ systemctl restart nova-compute
    
  3. Then configure a flavor as usual and finally add the GPU requirement to it (1x gtx980 in this example):

    $ openstack flavor set m1.large.1gtx980 --property "pci_passthrough:alias"="gtx980:1"
    

    In this example gtx980 is the alias name chosen above and :1 means the flavor requests one of this resource. So to make it a 2-GPU flavor it would be gtx980:2.
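The name:count syntax of the property value can be illustrated with a tiny parser (a sketch only; parse_alias_spec is made up for this example and is not nova code):

```python
# Illustrative sketch of the "name:count" format used in the
# pci_passthrough:alias property (e.g. "gtx980:2" = two GTX980s).
# parse_alias_spec is made up for this example; it is NOT nova's parser.

def parse_alias_spec(spec):
    """Split an alias spec into (alias_name, requested_count)."""
    name, _, count = spec.partition(":")
    return name, int(count) if count else 1

print(parse_alias_spec("gtx980:1"))  # -> ('gtx980', 1)
print(parse_alias_spec("gtx980:2"))  # -> ('gtx980', 2)
```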

  4. Now GPU passthrough should work. There is one last step to make NVIDIA consumer-grade GPUs usable in VMs. Apparently the NVIDIA driver checks whether it runs inside a VM and refuses to start if it does. This seems to be a "bug" that NVIDIA probably does not intend to fix. In any case, KVM (here through qemu-kvm) can be configured to hide the fact that the VM is running in KVM. I do not think this can be changed directly in OpenStack/libvirtd, but one way of injecting the correct options is to install a wrapper script around qemu:

    1. Rename /usr/bin/qemu-system-x86_64 to /usr/bin/qemu-system-x86_64.orig and deploy this wrapper as /usr/bin/qemu-system-x86_64 on the nova compute host.

      #!/usr/bin/python
      
      import os
      import sys
      
      new_args = []
      
      # only change the "-cpu" options (inject kvm=off and hv_vendor_id=MyFake_KVM)
      for i in range(len(sys.argv)):
          if i<=1: 
              new_args.append(sys.argv[i])
              continue
          if sys.argv[i-1] != "-cpu":
              new_args.append(sys.argv[i])
              continue
      
          subargs = sys.argv[i].split(",")
      
          subargs.insert(1,"kvm=off")
          subargs.insert(2,"hv_vendor_id=MyFake_KVM")
      
          new_arg = ",".join(subargs)
      
          new_args.append(new_arg)
      
      os.execv('/usr/bin/qemu-system-x86_64.orig', new_args)
    2. Add /usr/bin/qemu-system-x86_64.orig to /etc/apparmor.d/abstractions/libvirt-qemu as

      /usr/bin/qemu-system-x86_64.orig rmix,
      

      and reload apparmor

      $ systemctl reload apparmor
      

This should be it. You should now be able to create GPU instances in your OpenStack cluster.
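As a closing aside, the -cpu rewrite performed by the wrapper script above can be sanity-checked in isolation. This sketch re-implements the same loop as a function (the function name and the sample argv are made up for the check; the real wrapper applies the same transformation to sys.argv and then execs the original binary):

```python
# Standalone sanity check of the wrapper's -cpu rewrite logic.
# rewrite_cpu_args and the sample argv are made up for this check.

def rewrite_cpu_args(argv):
    """Inject kvm=off and a fake hv_vendor_id into the value that
    follows each "-cpu" flag; leave every other argument untouched."""
    new_args = []
    for i, arg in enumerate(argv):
        if i >= 2 and argv[i - 1] == "-cpu":
            subargs = arg.split(",")
            subargs.insert(1, "kvm=off")
            subargs.insert(2, "hv_vendor_id=MyFake_KVM")
            arg = ",".join(subargs)
        new_args.append(arg)
    return new_args

print(rewrite_cpu_args(["qemu-system-x86_64", "-cpu", "host,topoext=on", "-m", "8192"]))
# -> ['qemu-system-x86_64', '-cpu', 'host,kvm=off,hv_vendor_id=MyFake_KVM,topoext=on', '-m', '8192']
```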

vaskokj commented Jan 17, 2017

My sys.argv does not have the -cpu flag. If I manually add it, I get "qemu-system-x86_64: Unable to find CPU definition: kvm=off".

If I print out sys.argv, I see these parameters being passed to my python script:

"-S -no-user-config -nodefaults -nographic -M none -qmp unix:/var/lib/libvirt/qemu/capabilities.monitor.sock,server,nowait -pidfile /var/lib/libvirt/qemu/capabilties.pidfile -daemonize"

I tried new_args.append('-cpu kvm=off') with no luck.

ryanmickler commented Jul 28, 2017

OpenStack no longer passes the -cpu arg directly to qemu-system-x86_64, so you'll need to manually edit the libvirt XML with virsh and then reboot the instance. This is the only way I could get it to work.

ryanmickler commented Jul 28, 2017

also, i think you mean
/usr/bin/qemu-system-x86_64.orig rmix
as the line you need to add

olivier-dj commented Feb 2, 2018

Since the Pike release, for the OpenStack config part, you can simply set the property img_hide_hypervisor_id=true on your images created with glance (or as a common image property). The nova.conf has to be filled in a bit differently, and for the scheduler it seems the doc isn't up to date regarding some warnings from the nova-conductor service.
I made CUDA work for my GTX 1080 Ti cards on guests this way.
Also, some troubleshooting for the passthrough: disabling the framebuffer may help (I encountered this problem with AMD cards and disabled it before trying NVIDIA). Check whether it is enabled with ls -l /dev/fb*, and check the grub options to disable it (nofb, for example).

frippe75 commented Mar 2, 2018

This document was so spot on, thanks! Used it for OpenStack/Newton on CentOS 7.4 (GTX 1050 Ti, el cheapo).
There were only a few expected differences in file locations, but now... nvidia-smi finally returns info. Thanks!

hebimg commented May 7, 2018

Hi, regarding the last step needed to make NVIDIA consumer-grade GPUs usable in VMs: I edited /usr/bin/qemu-system-x86_64 on the nova compute host as follows.

    #!/usr/bin/python

    import os
    import sys

    new_args = []

    # only change the "-cpu" options (inject kvm=off and hv_vendor_id=MyFake_KVM)
    for i in range(len(sys.argv)):
        if i <= 1:
            new_args.append(sys.argv[i])
            continue
        if sys.argv[i-1] != "-cpu":
            new_args.append(sys.argv[i])
            continue

        subargs = sys.argv[i].split(",")

        subargs.insert(1, "kvm=off")
        subargs.insert(2, "hv_vendor_id=MyFake_KVM")

        new_arg = ",".join(subargs)

        new_args.append(new_arg)

    os.execv('/usr/bin/qemu-system-x86_64.orig', new_args)

But I found that /usr/bin/qemu-system-x86_64 cannot be run by OpenStack. How can I fix this? Or is there a mistake in my configuration?

mgariepy commented May 9, 2018

Just to let you know: if you are using OpenStack Pike or later, you can use the GPU directly after installation.

You only need to set this metadata on your image:
img_hide_hypervisor_id='true'

schmilmo commented May 28, 2018

Could someone tell me what changes are needed for RHEL/CentOS?
