Last active April 17, 2024 05:05
SGX EPC cgroups for Kubernetes
1. Prepare the kernel
git clone --depth 1 -b sgx_cg_upstream_v12 linux-epc-cgroups
Added config:
2. Boot the VM and check SGX cgroups
host:$ qemu-system-x86_64 \
-object memory-backend-epc,id=mem1,size=64M,prealloc=on \
-M sgx-epc.0.memdev=mem1 \
-drive file=jammy.raw,if=virtio,aio=threads,format=raw,index=0,media=disk \
-kernel ./arch/x86_64/boot/bzImage \
guest:$ grep sgx_epc /sys/fs/cgroup/misc.capacity
sgx_epc 67108864
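As a quick sanity check, the capacity reported by the guest can be converted back to MiB and compared with the `size=64M` passed to QEMU. The following is only a minimal sketch: the function name `epc_capacity_mib` is made up for illustration, and it assumes `misc.capacity` holds one `<resource> <bytes>` pair per line, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel tree or any tool named here).
# Print the sgx_epc capacity from a misc.capacity-style file in MiB.
# Usage: epc_capacity_mib [FILE]   (FILE defaults to the cgroup v2 root file)
epc_capacity_mib() {
    file="${1:-/sys/fs/cgroup/misc.capacity}"
    # Each line is "<resource> <bytes>"; pick sgx_epc and convert bytes to MiB.
    # Exit non-zero when no sgx_epc line is present.
    awk '$1 == "sgx_epc" { printf "%d\n", $2 / 1048576; found = 1 }
         END { exit found ? 0 : 1 }' "$file"
}
```

Running `epc_capacity_mib` with no argument inside the guest reads `/sys/fs/cgroup/misc.capacity`; a non-zero exit status suggests that the kernel does not expose `sgx_epc` through the misc controller.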
3. Set up a single-node K8S cluster with containerd 1.7 and the SGX EPC NRI plugin on Ubuntu 22.04
$ dpkg -l |grep containerd
ii containerd 1.7.2-0ubuntu1~22.04.1 amd64 daemon to control runC
# NB: in config.toml, enable NRI (disable = false) and set SystemdCgroup = true
$ grep -A7 nri\.v1 /etc/containerd/config.toml
disable = false
disable_connections = false
plugin_config_path = "/etc/nri/conf.d"
plugin_path = "/opt/nri/plugins"
plugin_registration_timeout = "5s"
plugin_request_timeout = "2s"
socket_path = "/var/run/nri/nri.sock"
$ sudo ls /var/run/nri/
$ git clone -b PR-2023-050
$ cd intel-device-plugins-for-kubernetes
$ make intel-deviceplugin-operator
$ docker save intel/intel-deviceplugin-operator:devel > op.tar
$ sudo ctr -n i import op.tar
$ kubectl apply -k deployments/operator/default/
$ kubectl apply -f deployments/operator/samples/deviceplugin_v1_sgxdeviceplugin.yaml
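Because the NRI plugin only works when NRI is enabled in containerd, it may help to script the `config.toml` check from this step instead of reading the `grep` output by eye. The sketch below is an assumption-laden illustration: `nri_enabled` is a hypothetical name, and the table is located by matching `nri.v1` in a TOML table header, mirroring the `grep` pattern used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of containerd or NRI).
# Succeed if the NRI table of a containerd config contains "disable = false".
# Usage: nri_enabled [FILE]   (FILE defaults to /etc/containerd/config.toml)
nri_enabled() {
    file="${1:-/etc/containerd/config.toml}"
    awk '
        # A line starting with "[" opens a new TOML table; remember whether
        # it is the NRI one (assumed to contain "nri.v1" in its name).
        /^[ \t]*\[/ { in_nri = ($0 ~ /nri\.v1/) }
        # Match "disable = false" only, not "disable_connections = false".
        in_nri && /^[ \t]*disable[ \t]*=[ \t]*false/ { ok = 1 }
        END { exit ok ? 0 : 1 }
    ' "$file"
}
```

For example, `nri_enabled || echo "NRI is disabled in containerd"` could be run before deploying the operator.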
4. Run
Use and run
watch -n 1 "./kube-cgroups -n 'sgxplugin-*' -f '(misc|memory).(max|current)' -p 'sgx-epc-*'"
(-n selects the targeted namespace and -p filters by pod name)
Run a pod requesting "65536"
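If the watch tool is not at hand, the effect of the request can also be inspected directly in the cgroup filesystem: a container with an EPC limit should show a numeric `sgx_epc` entry in its `misc.max`. The following is a rough stand-in under stated assumptions, not a replacement for the tool above: `epc_limits` is a hypothetical name, and it assumes that `misc.max` lines have the form `sgx_epc <bytes>` or `sgx_epc max`.

```shell
#!/bin/sh
# Hypothetical helper (not the kube-cgroups tool).
# List cgroup directories under ROOT whose misc.max sets a finite sgx_epc limit.
# Usage: epc_limits [ROOT]   (ROOT defaults to /sys/fs/cgroup)
epc_limits() {
    root="${1:-/sys/fs/cgroup}"
    find "$root" -name misc.max -type f 2>/dev/null | while IFS= read -r f; do
        # Skip "sgx_epc max", which means no limit is set for that cgroup.
        limit=$(awk '$1 == "sgx_epc" && $2 != "max" { print $2 }' "$f" 2>/dev/null)
        if [ -n "$limit" ]; then
            printf '%s %s\n' "$(dirname "$f")" "$limit"
        fi
    done
}
```

Each output line is a cgroup directory followed by its limit, so a pod that requested "65536" would be expected to appear with `65536` next to its container scope.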
5. e2e test framework
$ git clone -b PR-2023-050
$ cd intel-device-plugins-for-kubernetes
$ make stress-ng-gramine intel-sgx-admissionwebhook
$ docker save intel/intel-sgx-admissionwebhook:devel > wh.tar
$ sudo ctr -n i import wh.tar
$ docker save intel/stress-ng-gramine:devel > gr.tar
$ sudo ctr -n i import gr.tar
$ go test -v ./test/e2e/... -ginkgo.v -ginkgo.focus "Device:sgx.*App:sgx-epc-cgroup"
NB: The e2e test framework expects cert-manager to be deployed in the cluster.
NB: The e2e test framework deletes every namespace except kube-system and cert-manager before running the tests, so do not run it in a cluster that hosts anything important!
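Given the warning about namespace deletion, a pre-flight listing of what would be affected can be useful before starting the tests. This is a minimal sketch: `namespaces_at_risk` is a hypothetical name, and the filter simply encodes the two exceptions named in the note rather than anything read from the test framework itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of the e2e framework).
# Read "namespace/<name>" lines on stdin and print those that are neither
# kube-system nor cert-manager, i.e. the ones the note says would be deleted.
namespaces_at_risk() {
    # -x matches whole lines only; "|| true" keeps the status zero when
    # nothing is left to print.
    grep -v -x -e 'namespace/kube-system' -e 'namespace/cert-manager' || true
}
# Intended use (not run here): kubectl get namespaces -o name | namespaces_at_risk
```

An empty result means that only the two preserved namespaces exist in the cluster.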

CyanDevs commented Nov 9, 2023

Hi @mythi I've successfully validated this on my end using an Azure VM. The problem was minikube, which inherently had issues running the NRI plugin (as well as the missing kernel.config for NFD). Once I switched to kubeadm, these issues disappeared and everything ran as expected.

$ cd ./kubepods-besteffort-pode845916d_a5eb_4abf_8c5c_e6d3a2d4f5b6.slice/cri-containerd-ac823861137eed2323214cefdc27b7295bbbaf4d55e4ee919e772fef133d02c3.scope
$ cat misc.max
sgx_epc 65536

I'm grateful for your guidance and prompt responses here. Thank you!


mythi commented Nov 9, 2023

@CyanDevs Great to hear! Any suggestions on where I could improve the documentation here, other than clearly mentioning that minikube is known not to work? I'm also about to add the steps to get cAdvisor set up for the telemetry piece.

Go ahead with more (stress) testing and let me and Haitao know if there are issues.


@mythi I sent you the notes that I wrote as I went through the steps. This guide is great. One improvement I can think of is including notes on installing cert-manager and NFD. I did not know about these, as I had never used intel-device-plugin before this. Thanks!
