Created April 21, 2020 19:14
VFIO passthrough with k8s and kata containers

Install a device plugin for Kubernetes

git clone
pushd sriov-network-device-plugin
# [optional] Running on a VM? - add a virtio net to the ConfigMap
# NOTE: The QEMU VM must have an extra virtio NIC device and support iommu:
# -machine q35,accel=kvm,kernel_irqchip=split -device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on -netdev user,id=mynet1 -device virtio-net-pci,netdev=mynet1,disable-legacy=on,disable-modern=off,iommu_platform=on,ats=on
sed -i 's|resourceList.*|resourceList": [{"resourceName":"virtio_net","selectors":{"vendors":["1af4"],"devices":["1041"],"drivers":["vfio-pci"],"pfNames":["eth1"]}},{|g' deployments/configMap.yaml
make image
kubectl create -f deployments/configMap.yaml
# Create a local registry to pull the image
docker run -d -p 5000:5000 --restart=always --name registry registry:2
# tag and push the new image to the local registry
docker tag nfvpe/sriov-device-plugin localhost:5000/sriov-device-plugin
docker push localhost:5000/sriov-device-plugin
# Add the local registry to /etc/crio/crio.conf, restart crio and pull the image
# registries = [ "", "localhost:5000" ]
sudo systemctl restart crio
sudo crictl pull sriov-device-plugin
# Deploy the plugin
kubectl create -f deployments/k8s-v1.16/sriovdp-daemonset.yaml

[Optional] create a vfio device for the virtio NIC

List the PCI devices to see which one you will pass through to the kata container

$ lspci
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
00:01.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:02.0 Unclassified device [00ff]: Red Hat, Inc. Virtio RNG
00:03.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
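On a host with many devices, the listing can be narrowed down to the candidate NICs. This is a minimal sketch, assuming the device description contains the string "Virtio network device" as in the output above; `virtio_nic_addrs` is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: read lspci output on stdin and print the PCI address
# (first column) of every line that describes a virtio network device.
virtio_nic_addrs() {
    awk '/Virtio network device/ { print $1 }'
}

# Usage: lspci | virtio_nic_addrs
```

With the listing above as input, this prints the addresses of the two virtio NICs and skips the host bridge and the RNG device.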

In the following example I'm going to use the NIC device at the address 00:03.0. Change ADDR according to your needs.

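# PCI address (without the 0000: domain) of the device to pass through
ADDR=00:03.0
sudo modprobe vfio
sudo modprobe vfio-pci
# Detach the device from its current driver
echo 0000:${ADDR} | sudo tee /sys/bus/pci/devices/0000:${ADDR}/driver/unbind
# Let vfio-pci claim devices with this vendor/device ID (1af4:1041)
echo '1af4 1041' | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id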
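The `1af4 1041` pair is the vendor and device ID of the virtio NIC used here. For a different device, the pair can be read from sysfs instead of being typed by hand. This is a sketch, assuming the standard sysfs `vendor` and `device` attribute files, which hold values such as `0x1af4`. The function name `pci_new_id` and its optional sysfs-root argument are hypothetical; the second argument exists only so the function can be tried against a fake directory tree.

```shell
# Hypothetical helper: print "<vendor> <device>" for a PCI address, in the
# same form as the string written to vfio-pci/new_id above.
# $1: PCI address without the domain, e.g. 00:03.0
# $2: sysfs root (assumption, for testing only; defaults to /sys)
pci_new_id() {
    local addr="$1" root="${2:-/sys}"
    local dev="${root}/bus/pci/devices/0000:${addr}"
    local vendor device
    vendor=$(cat "${dev}/vendor") || return 1
    device=$(cat "${dev}/device") || return 1
    # sysfs stores the IDs with a 0x prefix; strip it to match the form above.
    echo "${vendor#0x} ${device#0x}"
}

# Usage: pci_new_id "${ADDR}" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id
```

Run it before unbinding or after, since the `vendor` and `device` files do not depend on which driver is bound.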

Run a kata container

Before running kata containers, check that the plugin found your vfio device. Set NODE to the name of your node.

$ NODE=$(hostname)
$ kubectl get node ${NODE} -o json | jq '.status.allocatable'
  "cpu": "4",
  "": "1",
  "memory": "4000188Ki",
  "pods": "110"

Awesome! The plugin's resource is now listed as allocatable. It's time to use this new resource and pass it through to a kata container. I'm going to use the following YAML to run a kata container.


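apiVersion: v1
kind: Pod
metadata:
  name: kata
spec:
  runtimeClassName: kata
  containers:
  - name: c1
    image: ubuntu
    command:
      - bash
    tty: true
    stdin: true
    resources:
      requests:
        cpu: "2"
        <resource-name>: "1"  # placeholder: name shown in .status.allocatable
      limits:
        cpu: "2"
        <resource-name>: "1"  # placeholder: name shown in .status.allocatable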

Run a kata container and check the extra virtio NIC in the container

$ kubectl apply -f vfio.yaml
$ kubectl exec -ti pod/kata -- bash -c 'apt-get update -y; apt-get install -y iproute2; ip a'
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq state UP group default qlen 1000
    link/ether X brd ff:ff:ff:ff:ff:ff
    inet brd scope global eth0
       valid_lft forever preferred_lft forever
    inet6 X/64 scope link nodad 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether X brd ff:ff:ff:ff:ff:ff

Great! eth1 is the extra NIC device passed through via VFIO.


With the following changes, it works.

First, change the kata config to have:
enable_iommu = true

Second, do not use the latest sriovdp image; instead:

The latest one introduces an extra PCI device info ENV, which kata doesn't parse.
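The `enable_iommu = true` change to the kata config can be scripted. This is a minimal sketch, assuming the configuration file already contains an `enable_iommu` line (possibly commented out) in its hypervisor section. The function name and the path in the usage line are assumptions; check where your installation keeps its configuration.toml.

```shell
# Hypothetical helper: set enable_iommu = true in a kata configuration file.
# $1: path to the configuration file
# It only rewrites an existing (possibly commented-out) enable_iommu line, so
# the setting stays in the section where the file already has it. Similarly
# named keys such as enable_iommu_platform are left untouched.
kata_enable_iommu() {
    local conf="$1"
    local pattern='^[[:space:]]*#?[[:space:]]*enable_iommu[[:space:]]*=.*'
    if ! grep -Eq "${pattern}" "${conf}"; then
        echo "no enable_iommu line found in ${conf}" >&2
        return 1
    fi
    sed -E -i "s|${pattern}|enable_iommu = true|" "${conf}"
}

# Usage, as root (the path is an assumption):
# kata_enable_iommu /etc/kata-containers/configuration.toml
```

The function fails instead of appending the key when no such line exists, because a line appended at the end of the file would land in whatever section comes last.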
