COR edition: SR-IOV with Docker CNM plugin

Using a Docker CNM plugin to play with SR-IOV

This gist describes the setup necessary for testing SR-IOV based connectivity between two physical boxes, each set up as described here and directly connected via their respective SR-IOV enabled NICs.

Setup host system's packages

For this scenario, I'm setting up two Ubuntu 16.04 systems, each with an SR-IOV enabled interface as well as a second port for accessing the SUT. To set up:

  1. Enable ssh access for each machine
 sudo apt-get update && sudo apt-get install -y openssh-server
  1. Synchronize consoles to each SUT:

I'm lazy. While I'm sure there are more effective ways to do this, I set up a tmux session on my development system and connect to each SUT in two panes. Next, press ctrl-b and then type :setw synchronize-panes
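A minimal sketch of that tmux setup, assuming the SUTs are reachable as sriov-1 and sriov-2 (adjust the names and user to your environment):

tmux new-session -d -s sriov 'ssh sriov@sriov-1'   # first pane: SUT 1
tmux split-window -h -t sriov 'ssh sriov@sriov-2'  # second pane: SUT 2
tmux attach -t sriov
# then press ctrl-b and type :setw synchronize-panes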

  1. Install necessary packages for Clear Containers building:
sudo apt-get install -y \
autoconf \
autoconf-archive \
libtool \
libglib2.0 \
json-glib-1.0 \
libjson-glib-dev \
uuid-dev \
libmnl-dev \
check \
bats \
golang-go
  1. Grab packages we'll use when playing with SR-IOV and Clear Containers:

go get github.com/clearcontainers/sriov
go get github.com/01org/cc-oci-runtime

  1. Install docker and cc-oci-runtime:

Follow the instructions to get docker and cc-oci-runtime installed at https://github.com/01org/cc-oci-runtime/blob/master/documentation/Installing-Clear-Containers-on-Ubuntu.md

Update the host machine's kernel to support SR-IOV

You may need to rebuild your system's kernel in order to disable VFIO_NOIOMMU in the config and potentially add a PCI quirk for your NIC. If not, you are lucky and can move on to the next section. First, I'll describe how to assess whether changes are needed, and then how to make them.

A side note on IOMMU groups and PCIe Access Control Services

Looking at how the IOMMU groups are set up on your host system can help you determine whether your NIC is set up appropriately with respect to PCIe Access Control Services (ACS). More specifically, if the PCI bridge is in the same IOMMU group as your NIC, this is an indication that either your device doesn't support ACS or that it doesn't advertise this capability appropriately by default.

For example, if all is set up properly, running the following should show the PCI device for each ACS enabled NIC port in its own iommu_group. If you don't see any output at all, then you likely need to update your kernel config to disable VFIO_NOIOMMU.

find /sys/kernel/iommu_groups/ -type l
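On a system where things are set up properly, the output looks something like the following (the group numbers and addresses here are illustrative, not from a real run):

/sys/kernel/iommu_groups/18/devices/0000:01:00.0
/sys/kernel/iommu_groups/19/devices/0000:01:00.1
# a PCI bridge appearing in the same group as the NIC, e.g.:
#   /sys/kernel/iommu_groups/18/devices/0000:00:01.0
#   /sys/kernel/iommu_groups/18/devices/0000:01:00.0
# would indicate the ACS issue described above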

For more details, check out http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html

Updating the host kernel

  1. Grab kernel sources
sudo apt-get install linux-source-4.10.0
sudo apt-get install linux-headers-4.10.0
cd /usr/src/linux-source-4.10.0/
sudo tar -xvf linux-source-4.10.0.tar.bz2
cd linux-source-4.10.0
sudo apt-get install libssl-dev
  1. Check the config and update if necessary
sudo cp /boot/config-4.8.0-36-generic .config
sudo make olddefconfig #and verify resulting .config does not have NOIOMMU set; ie: # CONFIG_VFIO_NOIOMMU is not set
  1. If necessary, add a PCI quirk for the SR-IOV NIC. Depending on how your NIC describes its ACS capabilities, you may need to add a quirk in order to indicate that the given NIC does properly support ACS. An example is given below, but your mileage will vary (at the very least, check your PCI ID).
Modify drivers/pci/quirks.c around line 4118:
static const u16 pci_quirk_intel_pch_acs_ids[] = {
+        0x0c01,
        /* Ibexpeak PCH */
        0x3b42, 0x3b43, 0x3b44, 0x3b45, 0x3b46, 0x3b47, 0x3b48, 0x3b49,
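The values in that table are 16-bit PCI device IDs; lspci -nn prints them in [vendor:device] form, so you can look up the ID for the root port or NIC in question like this (the output shown is illustrative):

lspci -nn | grep -i -e ethernet -e 'pci bridge'
# 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [8086:0c01]
# 01:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 [8086:1528]
# the second value inside [8086:xxxx] is the device ID used in quirks.c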
  1. Build and install the kernel:
sudo make -j #and go get coffee
sudo make modules -j 3
sudo make modules_install
sudo make install
  1. Edit grub to enable intel_iommu:
edit /etc/default/grub and add intel_iommu=on to cmdline:
- GRUB_CMDLINE_LINUX=""
+ GRUB_CMDLINE_LINUX="intel_iommu=on"
sudo update-grub
  1. Reboot and verify. The host system should be ready now -- reboot and verify that the expected cmdline and kernel version are booted (look at /proc/cmdline and /proc/version).
sudo reboot
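A quick post-reboot sanity check (adjust the kernel version to whatever you built):

cat /proc/cmdline          # should now include intel_iommu=on
cat /proc/version          # should report the 4.10 kernel you just installed
find /sys/kernel/iommu_groups/ -type l | head   # should no longer be empty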

Setting up SRIOV Devices

All of the prior sections only need to be done once to prepare the SRIOV host systems. The following steps are needed on each boot in order to set up a physical device's virtual functions.

With SRIOV, a physical device can create up to sriov_totalvfs virtual functions (VFs). Once created, you cannot grow or shrink the number of VFs without first setting it back to zero. Based on this, you are expected to set the number of VFs for a physical device just once.
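In other words, if you ever do need to change the VF count, write 0 first and then the new value; a sketch (run as root, using the PCI address from the steps below):

echo 0 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs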

  1. Add vfio-pci device driver

vfio-pci is a driver which is used to reserve a VF PCI device. Add it:

sudo modprobe vfio-pci
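You can confirm the module loaded with:

lsmod | grep vfio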
  1. Find our NICs of interest

Find PCI details for the NICs in question:

sriov@sriov-1:/sys/bus/pci$ lspci | grep Ethernet
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 04)
01:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
01:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)

In our case, both 01:00.0 and 01:00.1 are the two ports on our X540-AT2 card that we'll use. You can use lshw to get further details on the controller and verify it indeed supports SR-IOV.

  1. Check how many VFs we can create:
sriov@sriov-1:~$ cat /sys/bus/pci/devices/0000\:01\:00.0/sriov_totalvfs
63
sriov@sriov-1:~$ cat /sys/bus/pci/devices/0000\:01\:00.1/sriov_totalvfs
63
  1. Create VFs:

Create virtual functions by writing to sriov_numvfs. In our example, let's just create one per physical device. Note that this eliminates the usefulness of SRIOV; it is only done for simplicity in this example so I needn't look at 126 virtual devices each time I run lspci or ip a.

root@sriov-1:/home/sriov# echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs
root@sriov-1:/home/sriov# echo 1 > /sys/bus/pci/devices/0000\:01\:00.1/sriov_numvfs
  1. Verify that these indeed were added to the host:
root@sriov-1:/home/sriov# lspci | grep Ethernet | grep Virtual
02:10.0 Ethernet controller: Intel Corporation X540 Ethernet Controller Virtual Function (rev 01)
02:10.1 Ethernet controller: Intel Corporation X540 Ethernet Controller Virtual Function (rev 01)
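If you're curious which driver each new VF is bound to at this point, something along these lines works (device paths taken from the lspci output above):

for vf in /sys/bus/pci/devices/0000:02:10.*; do
    echo "$vf -> $(readlink "$vf/driver" 2>/dev/null)"
done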
  1. Assign MAC address to each VF

Depending on the NIC being used, you may need to explicitly set the MAC address for the Virtual Function device in order to guarantee that the address is consistent on the host and when passed through to the guest. For example,

ip link set <pf> vf <vfidx> mac <my made up mac address>
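For instance, with the physical function named enp1s0f0 (as used in the docker network create step below) and a made-up locally administered MAC:

sudo ip link set enp1s0f0 vf 0 mac 02:ca:fe:00:00:01
ip link show enp1s0f0    # the 'vf 0' line should now show the assigned MAC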

Setup Clear Containers

Depending on the NIC being used, you may need to update the kernel config for the Clear Containers guest kernel. This was required for i40e, and a suitable kernel is available at: https://github.com/egernst/linux/tree/4.9.27/workload-testing

Run a SRIOV CNM plugin and make use of SRIOV

With the VFs created, let's go ahead and set up the CNM plugin and talk across the two machines.

  1. Build and start SRIOV plugin

This assumes you already have GOPATH set in your environment.

sriov@sriov-2:~$ sudo mkdir /etc/docker/plugins
sriov@sriov-2:~$ sudo cp go/src/github.com/clearcontainers/sriov/sriov.json /etc/docker/plugins/
cd go/src/github.com/clearcontainers/sriov
go build
sudo ./sriov &
  1. Create docker network
sudo docker network create -d sriov --internal --opt pf_iface=enp1s0f0 --opt vlanid=100 --subnet=192.168.0.0/24 vfnet


E0505 09:35:40.550129    2541 plugin.go:297] Numvfs and Totalvfs are not same on the PF - Initialize numvfs to totalvfs
ee2e5a594f9e4d3796eda972f3b46e52342aea04cbae8e5eac9b2dd6ff37b067
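To confirm the network exists and see what the plugin set up:

sudo docker network ls
sudo docker network inspect vfnet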

  3. Start containers and test connectivity

Assuming you did all of the above on both machines, let's go ahead and start up a container on each of the two machines, which have their SRIOV enabled NICs directly connected, as follows:

Machine #1:

sriov@sriov-2:~$ sudo docker run --runtime=cor --net=vfnet --ip=192.168.0.10 -it mcastelino/iperf iperf3 -s

Machine #2:

sriov@sriov-1:~$ sudo docker run --runtime=cor --net=vfnet -it mcastelino/iperf iperf3 -c 192.168.0.10
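If iperf3 gives you trouble, a simpler sanity check over the same network is a ping from machine #2 to the server's address (busybox is used here purely as an example image):

sriov@sriov-1:~$ sudo docker run --runtime=cor --net=vfnet -it busybox ping -c 3 192.168.0.10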
