1. Prepare the kernel
git clone --depth 1 -b sgx_cg_upstream_v12 https://github.com/haitaohuang/linux.git linux-epc-cgroups
Added config:
CONFIG_CGROUP_SGX_EPC=y
2. Boot the VM and check SGX cgroups
host:$ qemu-system-x86_64 \
  ...
  -object memory-backend-epc,id=mem1,size=64M,prealloc=on \
  -M sgx-epc.0.memdev=mem1 \
  -drive file=jammy.raw,if=virtio,aio=threads,format=raw,index=0,media=disk \
  -kernel ./arch/x86_64/boot/bzImage \
  ...
guest:$ grep sgx_epc /sys/fs/cgroup/misc.capacity
sgx_epc 67108864
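misc.capacity reports the EPC size in bytes, so the 64M configured on the QEMU command line should show up as 67108864. A quick sanity check of that conversion (a pasted sample line stands in for reading the guest's /sys/fs/cgroup/misc.capacity):

```shell
#!/bin/sh
# Sanity check: the 64M EPC backend on the QEMU command line should be
# reported by the misc cgroup controller as 67108864 bytes.
# Sample line pasted in; on the guest, read /sys/fs/cgroup/misc.capacity.
capacity_line="sgx_epc 67108864"
epc_bytes=${capacity_line#sgx_epc }
epc_mib=$((epc_bytes / 1024 / 1024))
echo "EPC capacity: ${epc_bytes} bytes (${epc_mib} MiB)"
# prints: EPC capacity: 67108864 bytes (64 MiB)
```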
3. Set up a single-node K8s cluster with containerd 1.7 and the SGX EPC NRI plugin on Ubuntu 22.04
$ dpkg -l | grep containerd
ii containerd 1.7.2-0ubuntu1~22.04.1 amd64 daemon to control runC
# NB: config.toml: enable NRI (disable = false), SystemdCgroup = true
$ grep -A7 nri\.v1 /etc/containerd/config.toml
[plugins."io.containerd.nri.v1.nri"]
  disable = false
  disable_connections = false
  plugin_config_path = "/etc/nri/conf.d"
  plugin_path = "/opt/nri/plugins"
  plugin_registration_timeout = "5s"
  plugin_request_timeout = "2s"
  socket_path = "/var/run/nri/nri.sock"
$ sudo ls /var/run/nri/
nri.sock
$ git clone -b PR-2023-050 https://github.com/mythi/intel-device-plugins-for-kubernetes.git
$ cd intel-device-plugins-for-kubernetes
$ make intel-deviceplugin-operator
$ docker save intel/intel-deviceplugin-operator:devel > op.tar
$ sudo ctr -n k8s.io i import op.tar
$ kubectl apply -k deployments/operator/default/
$ kubectl apply -f deployments/operator/samples/deviceplugin_v1_sgxdeviceplugin.yaml
4. Run
Use https://raw.githubusercontent.com/containers/nri-plugins/main/scripts/testing/kube-cgroups and run
watch -n 1 "./kube-cgroups -n 'sgxplugin-*' -f '(misc|memory).(max|current)' -p 'sgx-epc-*'"
(with the targeted namespace (-n) and pod name filter (-p))
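What kube-cgroups polls can also be checked by hand: walk the pod cgroup tree and print any sgx_epc limits. A minimal sketch (it builds a throwaway demo tree so it runs anywhere; on a real node, point root at /sys/fs/cgroup/kubepods.slice instead):

```shell
#!/bin/sh
# Sketch of the check kube-cgroups automates for the misc controller.
# The demo tree below stands in for /sys/fs/cgroup/kubepods.slice.
root=$(mktemp -d)
mkdir -p "$root/pod-a.slice/cri-containerd-demo.scope"
echo "sgx_epc 65536" > "$root/pod-a.slice/cri-containerd-demo.scope/misc.max"

# Print every sgx_epc limit in the tree together with its cgroup path.
limits=$(find "$root" -name misc.max -exec grep -H sgx_epc {} +)
echo "$limits"
rm -rf "$root"
```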
Run a pod requesting sgx.intel.com/epc: "65536"
5. e2e test framework
$ git clone -b PR-2023-050 https://github.com/mythi/intel-device-plugins-for-kubernetes.git
$ cd intel-device-plugins-for-kubernetes
$ make stress-ng-gramine intel-sgx-admissionwebhook
$ docker save intel/intel-sgx-admissionwebhook:devel > wh.tar
$ sudo ctr -n k8s.io i import wh.tar
$ docker save intel/stress-ng-gramine:devel > gr.tar
$ sudo ctr -n k8s.io i import gr.tar
$ go test -v ./test/e2e/... -ginkgo.v -ginkgo.focus "Device:sgx.*App:sgx-epc-cgroup"
NB: The e2e test framework expects cert-manager to be deployed in the cluster.
NB: The e2e test framework deletes all namespaces except kube-system and cert-manager before running the tests, so do not run it in a cluster with anything important deployed!
Awesome, removing the kernel.config labelling rule worked like magic and the sgx.intel.com/epc resource is now registered with the node. Not sure why the kernel config is missing. I'll give kubeadm a try later - thanks for the suggestion. Also - yes the minikube nodes are using cgroupv2.
My sgx_epc is not reporting the expected 65536 allocation though. It's probably because my NRI plugin isn't running. I'll investigate why tomorrow.
Thanks for all your help @mythi, I appreciate it.
docker@minikube:/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podcfe87044_f090_498e_81e7_f8f032d9459c.slice/cri-containerd-3b11506602acad6132b5da715a016d0a166c4aae541af7be5ccfe74b0e777581.scope$ cat misc.max
sgx_epc max
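Here "sgx_epc max" means no EPC limit was applied to the container's cgroup, consistent with the NRI plugin not running. A small sketch of reading that line (the pasted value stands in for cat misc.max output):

```shell
#!/bin/sh
# "max" in misc.max means the cgroup has no sgx_epc limit, i.e. the NRI
# plugin never set one for this container. Classify a pasted misc.max line:
line="sgx_epc max"
limit=${line#sgx_epc }
if [ "$limit" = "max" ]; then
  status="no EPC limit applied"
else
  status="EPC limited to $limit bytes"
fi
echo "$status"
# prints: no EPC limit applied
```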
The job spec:

apiVersion: batch/v1
kind: Job
metadata:
  name: oe-helloworld
  namespace: default
spec:
  template:
    metadata:
      labels:
        app: oe-helloworld
    spec:
      containers:
      - name: oe-helloworld
        image: mcr.microsoft.com/acc/samples/oe-helloworld:1.1
        command: [ "sleep", "infinity" ]
        resources:
          limits:
            sgx.intel.com/epc: "65536"
          requests:
            sgx.intel.com/epc: "65536"
        volumeMounts:
        - name: var-run-aesmd
          mountPath: /var/run/aesmd
      restartPolicy: "Never"
      volumes:
      - name: var-run-aesmd
        hostPath:
          path: /var/run/aesmd
  backoffLimit: 0
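The 65536-byte EPC request surfaces on the pod as the sgx.intel.com/epc: 64Ki annotation (presumably added by the SGX admission webhook), since the quantity is rendered in binary units:

```shell
#!/bin/sh
# 65536 bytes rendered in binary (Ki) units, matching the pod annotation
# "sgx.intel.com/epc: 64Ki" seen in kubectl describe.
request_bytes=65536
annotation="$((request_bytes / 1024))Ki"
echo "$annotation"
# prints: 64Ki
```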
pod
$ kubectl describe pod oe-helloworld-xpcvg
Name: oe-helloworld-xpcvg
Namespace: default
Priority: 0
Service Account: default
Node: minikube/192.168.49.2
Start Time: Wed, 08 Nov 2023 06:48:31 +0000
Labels: app=oe-helloworld
batch.kubernetes.io/controller-uid=00df9e9d-ba40-482f-92c8-68dc1808745f
batch.kubernetes.io/job-name=oe-helloworld
controller-uid=00df9e9d-ba40-482f-92c8-68dc1808745f
job-name=oe-helloworld
Annotations: sgx.intel.com/epc: 64Ki
Status: Running
IP: 10.244.0.11
IPs:
IP: 10.244.0.11
Controlled By: Job/oe-helloworld
Containers:
oe-helloworld:
Container ID: containerd://3b11506602acad6132b5da715a016d0a166c4aae541af7be5ccfe74b0e777581
Image: mcr.microsoft.com/acc/samples/oe-helloworld:1.1
Image ID: mcr.microsoft.com/acc/samples/oe-helloworld@sha256:64033ee002d17d69790398e4c272a9c467334a931ca0fb087b98b96b9f3be3db
Port: <none>
Host Port: <none>
Command:
sleep
infinity
State: Running
Started: Wed, 08 Nov 2023 06:48:31 +0000
Ready: True
Restart Count: 0
Limits:
sgx.intel.com/enclave: 1
sgx.intel.com/epc: 65536
Requests:
sgx.intel.com/enclave: 1
sgx.intel.com/epc: 65536
Environment: <none>
Mounts:
/var/run/aesmd from var-run-aesmd (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xjdmk (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
var-run-aesmd:
Type: HostPath (bare host directory volume)
Path: /var/run/aesmd
HostPathType:
kube-api-access-xjdmk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m55s default-scheduler Successfully assigned default/oe-helloworld-xpcvg to minikube
Normal Pulled 4m55s kubelet Container image "mcr.microsoft.com/acc/samples/oe-helloworld:1.1" already present on machine
Normal Created 4m55s kubelet Created container oe-helloworld
Normal Started 4m55s kubelet Started container oe-helloworld
intel-sgx-plugin doesn't have a section for nri-sgx-epc
$ kubectl describe pod intel-sgx-plugin-42txt -n inteldeviceplugins-system
Name: intel-sgx-plugin-42txt
Namespace: inteldeviceplugins-system
Priority: 0
Service Account: default
Node: minikube/192.168.49.2
Start Time: Tue, 07 Nov 2023 23:32:26 +0000
Labels: app=intel-sgx-plugin
controller-revision-hash=868bb58f4b
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 10.244.0.2
IPs:
IP: 10.244.0.2
Controlled By: DaemonSet/intel-sgx-plugin
Containers:
intel-sgx-plugin:
Container ID: containerd://dacad378e7115d25edc2fe9a67e799ee3a542e5d42caf920fe6087e119eac345
Image: intel/intel-sgx-plugin:0.28.0
Image ID: docker.io/intel/intel-sgx-plugin@sha256:51b768fb07611454d62b1833ecdbd09d41eeb7f257893193dab1f7e061f9c54c
Port: <none>
Host Port: <none>
Args:
-v
4
-enclave-limit
110
-provision-limit
110
State: Running
Started: Wed, 08 Nov 2023 00:36:11 +0000
Last State: Terminated
Reason: Unknown
Exit Code: 255
Started: Tue, 07 Nov 2023 23:32:28 +0000
Finished: Wed, 08 Nov 2023 00:35:44 +0000
Ready: True
Restart Count: 1
Environment: <none>
Mounts:
/dev/sgx_enclave from sgx-enclave (ro)
/dev/sgx_provision from sgx-provision (ro)
/var/lib/kubelet/device-plugins from kubeletsockets (rw)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kubeletsockets:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/device-plugins
HostPathType:
sgx-enclave:
Type: HostPath (bare host directory volume)
Path: /dev/sgx_enclave
HostPathType: CharDevice
sgx-provision:
Type: HostPath (bare host directory volume)
Path: /dev/sgx_provision
HostPathType: CharDevice
QoS Class: BestEffort
Node-Selectors: intel.feature.node.kubernetes.io/sgx=true
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events: <none>
@CyanDevs You are most likely missing
$ make intel-deviceplugin-operator
$ docker save intel/intel-deviceplugin-operator:devel > op.tar
$ sudo ctr -n k8s.io i import op.tar
that is: make sure the operator deployment does not pull the image from Docker Hub but uses the custom image built from my devel branch.
Hi @mythi I've successfully validated this on my end using an Azure VM. The issue was with minikube, which inherently had problems running the NRI plugin (as well as the missing kernel.config for NFD). Once I switched to kubeadm, these issues disappeared and everything ran as expected.
$ cd ./kubepods-besteffort-pode845916d_a5eb_4abf_8c5c_e6d3a2d4f5b6.slice/cri-containerd-ac823861137eed2323214cefdc27b7295bbbaf4d55e4ee919e772fef133d02c3.scope
$ cat misc.max
sgx_epc 65536
I'm grateful for your guidance and prompt responses here. Thank you!
@CyanDevs Great to hear! Any suggestions on where I could improve the documentation here, other than clearly mentioning that minikube is known not to work? I'm also about to add the steps to get cAdvisor set up for the telemetry piece.
Go ahead with more (stress) testing and let me and Haitao know if there are issues.
@mythi I sent you my notes that I wrote as I went through the steps. This guide is great. Some improvements I can think of are including notes for installing cert-manager and NFD -- I did not know about these as I had never used intel-device-plugins before this. Thanks!
The SGX labeling rule is such that all conditions must match. You already saw the reason why the label is not created. The kernel config also needs to be present to get the label.
This is a minikube thing and I've heard of people hitting the issue. One easy fix would be to drop that highlighted match rule. Is that minikube setup cgroup v2 enabled? My suggestion is not to use minikube but something almost as simple, like kubeadm. It gives you a cluster without any Docker layers.
It's deployed together with the SGX device plugin: