SR-IOV with Docker CNM plugin

Using a Docker CNM plugin to play with SR-IOV

This gist describes the setup necessary for testing SR-IOV based connectivity between two physical boxes, each set up as described here and directly connected via their respective SR-IOV enabled NICs.

Set up host system's packages

For this scenario, I'm setting up two Ubuntu 16.04 systems, each of which has an SR-IOV enabled interface as well as a second port for accessing the SUT. To set up:

  1. Enable ssh access for each machine:
 sudo apt-get update && sudo apt-get install -y openssh-server
  2. Synchronize consoles to each SUT:

I'm lazy. While I'm sure there are more effective ways to do this, I set up a tmux session on my development system and connect to each SUT in two panes (sketched below). Next, press ctrl-b and then type :setw synchronize-panes
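A minimal sketch of that tmux setup (the ssh targets are just this gist's hostnames):

tmux new-session -s sut
# split into two panes: ctrl-b %
# pane 1: ssh sriov@sriov-1
# pane 2: ssh sriov@sriov-2
# then press ctrl-b and type :setw synchronize-panes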

  3. Grab the SR-IOV plugin:
go get github.com/clearcontainers/sriov
  4. Install docker:
sudo groupadd docker
sudo -E gpasswd -a $USER docker
sudo apt-get install -y apt-transport-https ca-certificates
curl -fsSL https://yum.dockerproject.org/gpg | sudo apt-key add -
sudo add-apt-repository "deb https://apt.dockerproject.org/repo/ ubuntu-xenial main"
sudo apt-get update
sudo apt-get install -y docker-engine
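To sanity check the docker install, running any trivial container will do; for example:

sudo docker run --rm hello-world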

Update host machine's kernel to support SR-IOV

You may need to rebuild your system's kernel in order to disable VFIO_NOIOMMU in the config, and potentially add a PCI quirk for your NIC. If not, you are lucky and can move on to the next section. First, I'll describe how to assess whether changes are needed, and then describe how to make them.

A side note on IOMMU groups and PCIe Access Control Services

Taking a look at how the IOMMU groups are set up on your host system can help indicate whether or not your NIC is configured appropriately with respect to PCIe Access Control Services (ACS). More specifically, if the PCI bridge is within the same IOMMU group as your NIC, that's an indication that either your device doesn't support ACS or that it doesn't advertise this capability appropriately by default. As a very first step, if it isn't already enabled, add intel_iommu=on to the kernel cmdline:

edit /etc/default/grub and add intel_iommu=on to cmdline:
- GRUB_CMDLINE_LINUX=""
+ GRUB_CMDLINE_LINUX="intel_iommu=on"
sudo update-grub
sudo reboot now

After IOMMU is enabled, run the following; if all is set up properly, you should see the PCI device for each ACS enabled NIC port in its own iommu_group.

find /sys/kernel/iommu_groups/ -type l

If you still do not see any output when running the above command after editing grub and rebooting, you may need to check that your kernel's config has VFIO_NOIOMMU disabled. If you see that the devices are in the same group, then you'll need to follow the directions below on updating the kernel.
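To see exactly which devices share a group, a small loop like the following (adapted from the vfio blog linked below) prints each group's members:

for dev in /sys/kernel/iommu_groups/*/devices/*; do
    group=${dev%/devices/*}; group=${group##*/}
    printf 'IOMMU group %s: %s\n' "$group" "$(lspci -nns "${dev##*/}")"
done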

For more details, check out http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html

Updating the host kernel

  1. Grab kernel sources
sudo apt-get install linux-source-4.10.0
sudo apt-get install linux-headers-4.10.0
cd /usr/src/linux-source-4.10.0/
sudo tar -xvf linux-source-4.10.0.tar.bz2
cd linux-source-4.10.0
sudo apt-get install libssl-dev
  2. Check the config and update if necessary
sudo cp /boot/config-4.8.0-36-generic .config
sudo make olddefconfig #and verify resulting .config does not have NOIOMMU set; ie: # CONFIG_VFIO_NOIOMMU is not set
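A quick way to check, from the kernel source directory:

grep VFIO_NOIOMMU .config
# expected output: # CONFIG_VFIO_NOIOMMU is not set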
  3. If necessary, add a PCI quirk for the SR-IOV NIC

Now, depending on how your NIC describes its ACS capabilities, you may need to add a quirk in order to indicate that the given NIC does properly support ACS. An example is given below, but your mileage may vary (at the very least, check your PCI-ID, as shown after the diff).

modify drivers/pci/quirks.c (around line 4118):
static const u16 pci_quirk_intel_pch_acs_ids[] = {
+        0x0c01,
        /* Ibexpeak PCH */
        0x3b42, 0x3b43, 0x3b44, 0x3b45, 0x3b46, 0x3b47, 0x3b48, 0x3b49,
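If you're unsure of your NIC's PCI-ID, lspci -nn prints it in brackets as [vendor:device]; for the X540-AT2 here I'd expect something like [8086:1528]:

lspci -nn -s 01:00.0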
  4. Build and install the kernel:
sudo make -j #and go get coffee
sudo make modules -j 3
sudo make modules_install
sudo make install
  5. Reboot and verify

The host system should be ready now -- reboot and verify that the expected cmdline and kernel version are booted (look at /proc/cmdline and /proc/version):

sudo reboot
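Once the system is back up:

cat /proc/cmdline   # should include intel_iommu=on
cat /proc/version   # should show the newly built kernel version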

Setting up SR-IOV Devices

All of the prior sections are needed just once to prepare the SR-IOV host systems. The following steps are needed after each boot in order to set up a physical device's virtual functions.

For SR-IOV, a physical device can create up to sriov_totalvfs virtual functions (VFs). Once created, you cannot grow or shrink the number of VFs without first setting the count back to zero. Because of this, you should plan to set the number of VFs for a physical device just once per boot, as sketched below.
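For example, to go from 1 VF to 2 on a port, you'd first write zero (the device paths here are the X540 ports used later in this gist):

echo 0 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs
echo 2 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs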

  1. Add vfio-pci device driver

vfio-pci is a driver which is used to reserve a VF PCI device. Add it:

sudo modprobe vfio-pci
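To confirm the module loaded:

lsmod | grep vfio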
  2. Find our NICs of interest

Find PCI details for the NICs in question:

sriov@sriov-1:/sys/bus/pci$ lspci | grep Ethernet
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 04)
01:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
01:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)

In our case, 01:00.0 and 01:00.1 are the two ports on our X540-AT2 card that we'll use. You can use lshw to get further details on the controller and verify it indeed supports SR-IOV.
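Alternatively, lspci can show the SR-IOV capability directly (the capability offset varies by device):

sudo lspci -vvv -s 01:00.0 | grep -i 'SR-IOV'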

  3. Check how many VFs we can create:
sriov@sriov-1:~$ cat /sys/bus/pci/devices/0000\:01\:00.0/sriov_totalvfs
63
sriov@sriov-1:~$ cat /sys/bus/pci/devices/0000\:01\:00.1/sriov_totalvfs
63
  4. Create VFs:

Create virtual functions by writing to sriov_numvfs. In our example, let's just create one per physical device. Note, this eliminates much of the usefulness of SR-IOV; it's just done for simplicity in this example so I needn't look at 128 virtual devices each time I run lspci or ip a.

root@sriov-1:/home/sriov# echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs
root@sriov-1:/home/sriov# echo 1 > /sys/bus/pci/devices/0000\:01\:00.1/sriov_numvfs
  5. Verify that these indeed were added to the host:
root@sriov-1:/home/sriov# lspci | grep Ethernet | grep Virtual
02:10.0 Ethernet controller: Intel Corporation X540 Ethernet Controller Virtual Function (rev 01)
02:10.1 Ethernet controller: Intel Corporation X540 Ethernet Controller Virtual Function (rev 01)
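You can also check which kernel driver is bound to a VF; on this setup I'd expect ixgbevf until something rebinds it to vfio-pci:

lspci -nnk -s 02:10.0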

Run the SR-IOV CNM plugin and make use of SR-IOV

With the VFs created, let's go ahead and set up the CNM plugin and talk across the two machines.

  1. Build and start the SR-IOV plugin

This assumes you already have GOPATH set in your environment.

sriov@sriov-2:~$ sudo mkdir /etc/docker/plugins
sriov@sriov-2:~$ sudo cp $GOPATH/src/github.com/clearcontainers/sriov/sriov.json /etc/docker/plugins/
cd $GOPATH/src/github.com/clearcontainers/sriov
go build
sudo ./sriov &
  2. Create the docker network
sudo docker network create -d sriov --internal --opt pf_iface=enp1s0f0 --opt vlanid=100 --subnet=192.168.0.0/24 vfnet

Expect to see output from the plugin process along these lines:

E0505 09:35:40.550129 2541 plugin.go:297] Numvfs and Totalvfs are not same on the PF - Initialize numvfs to totalvfs ee2e5a594f9e4d3796eda972f3b46e52342aea04cbae8e5eac9b2dd6ff37b067
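To double-check the network was created, standard docker commands work here:

sudo docker network inspect vfnet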

  3. Start containers and test connectivity

Assuming you did all of the above on both machines, let's go ahead and start a container on each machine, with their SR-IOV enabled NICs directly connected, as follows:

Machine #1:

sriov@sriov-2:~$ sudo docker run --runtime=runc --net=vfnet --ip=192.168.0.10 -it mcastelino/iperf iperf3 -s

Machine #2:

sriov@sriov-1:~$ sudo docker run --runtime=runc --net=vfnet -it mcastelino/iperf iperf3 -c 192.168.0.10
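If all is wired up correctly, the client side should report iperf3 throughput to 192.168.0.10. For a quicker sanity check first, a plain ping between the containers works too (busybox is just an arbitrary small image):

sriov@sriov-1:~$ sudo docker run --runtime=runc --net=vfnet -it busybox ping -c 3 192.168.0.10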