
@devimc
Created April 21, 2020 19:14
VFIO passthrough with k8s and Kata Containers

Install a device plugin for Kubernetes

git clone https://github.com/intel/sriov-network-device-plugin
pushd sriov-network-device-plugin
# [Optional] Running on a VM? Add the virtio NIC to the ConfigMap
# NOTE: The QEMU VM must have an extra virtio NIC and an emulated IOMMU, e.g.:
# -machine q35,accel=kvm,kernel_irqchip=split -device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on -netdev user,id=mynet1 -device virtio-net-pci,netdev=mynet1,disable-legacy=on,disable-modern=off,iommu_platform=on,ats=on
sed -i 's|resourceList.*|resourceList": [{"resourceName":"virtio_net","selectors":{"vendors":["1af4"],"devices":["1041"],"drivers":["vfio-pci"],"pfNames":["eth1"]}},{|g' deployments/configMap.yaml
#
make image
kubectl create -f deployments/configMap.yaml
# Create a local registry to serve the image
docker run -d -p 5000:5000 --restart=always --name registry registry:2
# Tag and push the new image to the local registry
docker tag nfvpe/sriov-device-plugin localhost:5000/sriov-device-plugin
docker push localhost:5000/sriov-device-plugin
# Add the local registry to /etc/crio/crio.conf, then restart CRI-O and pull the image:
# registries = [ "docker.io", "localhost:5000" ]
sudo systemctl restart crio
sudo crictl pull sriov-device-plugin
# Deploy the plugin
kubectl create -f deployments/k8s-v1.16/sriovdp-daemonset.yaml
popd
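The `sed` command above replaces `resourceList` with a single hard-coded selector for the virtio NIC. If you pass through a different device, a small helper can build that entry for you. This is a minimal sketch: `make_resource_entry` is a hypothetical name, and the selector fields are copied from the `sed` line above, not taken from the plugin's full configuration schema.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one resourceList entry for the SR-IOV device
# plugin ConfigMap, using the same selector fields as the sed command above.
make_resource_entry() {
  local name=$1 vendor=$2 device=$3 driver=$4 pf=$5
  printf '{"resourceName":"%s","selectors":{"vendors":["%s"],"devices":["%s"],"drivers":["%s"],"pfNames":["%s"]}}\n' \
    "$name" "$vendor" "$device" "$driver" "$pf"
}

# Example: reproduces the virtio_net entry used in this gist.
# make_resource_entry virtio_net 1af4 1041 vfio-pci eth1
```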

[Optional] Create a VFIO device for the virtio NIC

List the PCI devices to choose the one you will pass through to the Kata container:

$ lspci
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
00:01.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:02.0 Unclassified device [00ff]: Red Hat, Inc. Virtio RNG
00:03.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)

In the following example I use the NIC at address 00:03.0. Change ADDR to match your device. If it is not the same kind of virtio NIC, also change the vendor and device IDs written to new_id.
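The `new_id` write in the next step hard-codes the IDs `1af4 1041`. Instead of looking them up by hand, you can read them for ADDR from sysfs. This is a minimal sketch that assumes the standard `/sys/bus/pci/devices/<domain:bus:slot.func>/{vendor,device}` files. `pci_ids` is a hypothetical helper name, and its optional second argument exists only so it can be tested against a fake sysfs tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<vendor> <device>" for a PCI address such as
# 00:03.0, in the format written to /sys/bus/pci/drivers/vfio-pci/new_id.
pci_ids() {
  local addr=$1 root=${2:-/sys/bus/pci/devices}
  local dev="$root/0000:$addr"
  local vendor device
  # sysfs stores the IDs as hex strings with a 0x prefix, e.g. "0x1af4".
  vendor=$(cat "$dev/vendor") || return 1
  device=$(cat "$dev/device") || return 1
  echo "${vendor#0x} ${device#0x}"
}

# Usage: pci_ids "$ADDR" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id
```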

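ADDR="00:03.0"
# Load the VFIO modules
sudo modprobe vfio
sudo modprobe vfio-pci
# Unbind the device from its current driver (virtio-pci)
echo "0000:${ADDR}" | sudo tee "/sys/bus/pci/devices/0000:${ADDR}/driver/unbind"
# Let vfio-pci claim devices with vendor 1af4 (Red Hat) and device 1041 (modern virtio-net)
echo '1af4 1041' | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id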
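After writing to `new_id`, it is worth confirming that the kernel actually bound the device to `vfio-pci`. It also helps to note the device's IOMMU group, because VFIO exposes the group as `/dev/vfio/<group>`. This is a minimal sketch: `check_vfio_binding` is a hypothetical helper, and the optional sysfs root argument is there only for testing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a PCI device is bound to vfio-pci and
# print its IOMMU group. Both checks follow sysfs symlinks.
check_vfio_binding() {
  local addr=$1 root=${2:-/sys/bus/pci/devices}
  local dev="$root/0000:$addr"
  local link driver group
  # .../driver is a symlink to the bound driver, e.g. .../drivers/vfio-pci
  link=$(readlink "$dev/driver") || link=""
  driver=${link##*/}
  if [ "$driver" != "vfio-pci" ]; then
    echo "0000:$addr is bound to '${driver:-none}', not vfio-pci" >&2
    return 1
  fi
  # .../iommu_group is a symlink to /sys/kernel/iommu_groups/<N>
  link=$(readlink "$dev/iommu_group") || { echo "no IOMMU group for 0000:$addr" >&2; return 1; }
  group=${link##*/}
  echo "0000:$addr: vfio-pci, IOMMU group $group (/dev/vfio/$group)"
}

# Usage: check_vfio_binding "$ADDR"
```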

Run a Kata container

Before running Kata containers, check that the plugin found your VFIO device. Set NODE to your node name:

$ NODE=$(hostname)
$ kubectl get node ${NODE} -o json | jq '.status.allocatable'
{
  "cpu": "4",
  "intel.com/virtio_net": "1",
  "memory": "4000188Ki",
  "pods": "110"
}
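The device plugin may need a few seconds to register the resource, so a script can poll for it instead of checking once. This is a sketch that relies on `kubectl`'s JSONPath output, where dots inside a key such as `intel.com/virtio_net` must be escaped with a backslash. `wait_for_resource` and the `KUBECTL` override, which exists only to stub `kubectl` in tests, are hypothetical.

```shell
#!/usr/bin/env bash
# Override only for testing; defaults to the real kubectl.
KUBECTL=${KUBECTL:-kubectl}

# Hypothetical helper: poll a node's allocatable resources until the given
# extended resource shows up with a non-zero count, then print that count.
wait_for_resource() {
  local node=$1 resource=$2 tries=${3:-30} delay=${4:-2}
  local escaped count i
  # JSONPath treats "." as a separator, so escape the dots in the key.
  escaped=$(printf '%s' "$resource" | sed 's/\./\\./g')
  for ((i = 0; i < tries; i++)); do
    count=$($KUBECTL get node "$node" -o jsonpath="{.status.allocatable.${escaped}}" 2>/dev/null)
    if [ -n "$count" ] && [ "$count" != "0" ]; then
      echo "$count"
      return 0
    fi
    sleep "$delay"
  done
  echo "resource $resource not allocatable on $node" >&2
  return 1
}

# Usage: wait_for_resource "$NODE" intel.com/virtio_net
```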

Awesome! There is a new resource called intel.com/virtio_net. It's time to use it and pass the device through to a Kata container. I'm going to use the following YAML to run the Kata container.

vfio.yaml

apiVersion: v1
kind: Pod
metadata:
  name: kata
spec:
  runtimeClassName: kata
  containers:
  - name: c1
    image: ubuntu
    command:
      - bash
    tty: true
    stdin: true
    resources:
      limits:
        cpu: "2"
        intel.com/virtio_net: "1"
      requests:
        cpu: "2"
        intel.com/virtio_net: "1"
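Kubernetes does not overcommit extended resources such as `intel.com/virtio_net`. If a container sets both `requests` and `limits` for one, the two values must be equal. To keep them in sync when you change the count or the resource name, you could generate the manifest from a shell function. This is a sketch: `render_kata_pod` and its parameters are hypothetical, and the output mirrors `vfio.yaml` above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Pod manifest like vfio.yaml, with the same
# extended-resource count in both limits and requests.
render_kata_pod() {
  local name=${1:-kata} resource=${2:-intel.com/virtio_net} count=${3:-1}
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  runtimeClassName: kata
  containers:
  - name: c1
    image: ubuntu
    command:
      - bash
    tty: true
    stdin: true
    resources:
      limits:
        cpu: "2"
        ${resource}: "${count}"
      requests:
        cpu: "2"
        ${resource}: "${count}"
EOF
}

# Usage: render_kata_pod kata intel.com/virtio_net 1 | kubectl apply -f -
```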

Run the Kata container and check for the extra virtio NIC inside it:

$ kubectl apply -f vfio.yaml
$ kubectl exec -ti pod/kata -- bash -c 'apt-get update -y; apt-get install -y iproute2; ip a'
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq state UP group default qlen 1000
    link/ether X brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.4/24 brd 10.244.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 X/64 scope link nodad 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether X brd ff:ff:ff:ff:ff:ff

Great! eth1 is the extra NIC passed through with VFIO.
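In the output above, the passed-through NIC `eth1` has no address and is in state `DOWN`. Inside the container, you could pick out such interfaces from `ip -o link` (one line per interface) and then bring them up. This is a sketch that assumes iproute2's one-line output format and treats "state DOWN" as a heuristic for the extra NIC. `down_links` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ip -o link` output on stdin and print the names
# of interfaces reported as "state DOWN" (e.g. eth1 in the output above).
down_links() {
  # Fields are separated by ": "; the second one is the interface name,
  # possibly with an "@peer" suffix, which is stripped.
  awk -F': ' '/ state DOWN / { sub(/@.*/, "", $2); print $2 }'
}

# Usage inside the container (setting a link up typically needs CAP_NET_ADMIN,
# which the Pod above does not request):
#   for dev in $(ip -o link | down_links); do ip link set "$dev" up; done
```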

@renzhengeek

renzhengeek commented Aug 31, 2023

Hi, did you encounter an error like this one, reported by Kata?

containerd[327859]: time="2023-08-31T14:53:19.342257153+08:00" level=error msg="createContainer failed" error="rpc error: code = Internal desc = Can't parse PCIDEVICE_INTEL_COM_VIRTIO_NET_INFO environment variable\n\nCaused by:\n PCI address {"0000:b7:00.0":{"generic":{"deviceID":"0000:b7:00.0"} should have the format DDDD:BB:SS.F" name=containerd-shim-v2 pid=364621 sandbox=65f15bbb6bfcd01f32d3fc08a16ec3a3bfc9c692cf4c2a5b9421c2cebce3f055 source=virtcontainers subsystem=kata_agent
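The variable name in the error suggests that the plugin passes allocated devices through environment variables named after the resource. For `intel.com/virtio_net` the base variable is `PCIDEVICE_INTEL_COM_VIRTIO_NET`, which holds comma-separated PCI addresses. Kata also tried to parse `PCIDEVICE_INTEL_COM_VIRTIO_NET_INFO`, whose JSON value is not an address. The sketch below reproduces that naming and the `DDDD:BB:SS.F` check quoted in the error, so you can see which variables would be rejected. It illustrates the failure and is not Kata's actual parser. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the env var name for a resource, e.g.
# intel.com/virtio_net -> PCIDEVICE_INTEL_COM_VIRTIO_NET
pcidevice_env_name() {
  printf 'PCIDEVICE_%s\n' "$(printf '%s' "$1" | tr 'a-z./' 'A-Z__')"
}

# Hypothetical helper: check every PCIDEVICE_* variable in the environment
# and report comma-separated values that are not DDDD:BB:SS.F addresses.
check_pcidevice_env() {
  local name value addr status=0
  local -a addrs
  while IFS='=' read -r name value; do
    case $name in
      PCIDEVICE_*) ;;
      *) continue ;;
    esac
    IFS=',' read -ra addrs <<<"$value"
    for addr in "${addrs[@]}"; do
      if ! [[ $addr =~ ^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$ ]]; then
        echo "$name: '$addr' is not DDDD:BB:SS.F" >&2
        status=1
      fi
    done
  done < <(env)
  return "$status"
}

# Usage: run check_pcidevice_env in a shell inside a pod that requests the
# resource but uses the default runtime, to see what the plugin injects.
```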

@renzhengeek

With the following changes, it works.

First, change the Kata configuration to have:
enable_iommu = true
vfio_mode="vfio"
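Both settings live in Kata's TOML configuration file. In the stock QEMU configuration, `enable_iommu` usually sits under `[hypervisor.qemu]` and `vfio_mode` under `[runtime]`. The sketch below sets both with `sed` and assumes the keys already appear in the file, either active or commented out. Which file to edit depends on how Kata was installed: `/etc/kata-containers/configuration.toml` overrides the packaged default under `/usr/share/defaults/kata-containers/`. `set_kata_vfio_options` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable the guest vIOMMU and switch VFIO to "vfio" mode
# in a Kata configuration.toml. Only rewrites keys that already exist, either
# active or commented out with a leading "#"; a .bak copy is kept.
set_kata_vfio_options() {
  local cfg=$1
  sed -i.bak \
    -e 's/^#*[[:space:]]*enable_iommu[[:space:]]*=.*/enable_iommu = true/' \
    -e 's/^#*[[:space:]]*vfio_mode[[:space:]]*=.*/vfio_mode = "vfio"/' \
    "$cfg"
}

# Usage (as root): set_kata_vfio_options /etc/kata-containers/configuration.toml
```

The `enable_iommu` pattern requires `=` right after the key and optional spaces, so a neighbouring key such as `enable_iommu_platform` is left alone.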

Second, do not use the latest sriovdp image; use this one instead:
ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin:v3.5.1

The latest image introduces an extra PCI device info environment variable, which Kata doesn't parse. See:
k8snetworkplumbingwg/sriov-network-device-plugin@5aa7053
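To pin that version, you can point the plugin's DaemonSet manifest at the tagged image before creating it. This is a sketch that assumes the manifest has a single `image:` line referencing an image whose name contains `sriov`. The manifest path differs between plugin releases, so the helper takes it as an argument. `pin_sriovdp_image` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the device plugin image in a DaemonSet manifest,
# keeping the original indentation of the "image:" line; a .bak copy is kept.
pin_sriovdp_image() {
  local manifest=$1
  local image=${2:-ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin:v3.5.1}
  sed -i.bak -E "s|^([[:space:]]*image:[[:space:]]*)[^[:space:]]*sriov[^[:space:]]*|\1${image}|" "$manifest"
}

# Usage: pin_sriovdp_image deployments/k8s-v1.16/sriovdp-daemonset.yaml
#        kubectl apply -f deployments/k8s-v1.16/sriovdp-daemonset.yaml
```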
