@mythi
Last active April 17, 2024 05:05
SGX EPC cgroups for Kubernetes
1. Prepare the kernel
git clone --depth 1 -b sgx_cg_upstream_v12 https://github.com/haitaohuang/linux.git linux-epc-cgroups
Add the following kernel config option:
CONFIG_CGROUP_SGX_EPC=y
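A minimal build sketch for the cloned tree, assuming a plain bzImage build (scripts/config is the kernel's own config helper; the new option also needs its SGX and misc-cgroup dependencies enabled, hence the extra -e flags):
$ cd linux-epc-cgroups
$ make defconfig
$ ./scripts/config -e CONFIG_X86_SGX -e CONFIG_CGROUP_MISC -e CONFIG_CGROUP_SGX_EPC
$ make olddefconfig
$ make -j"$(nproc)" bzImage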
2. Boot the VM and check SGX cgroups
host:$ qemu-system-x86_64 \
...
-object memory-backend-epc,id=mem1,size=64M,prealloc=on \
-M sgx-epc.0.memdev=mem1 \
-drive file=jammy.raw,if=virtio,aio=threads,format=raw,index=0,media=disk \
-kernel ./arch/x86_64/boot/bzImage \
...
guest:$ grep sgx_epc /sys/fs/cgroup/misc.capacity
sgx_epc 67108864
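The reported capacity matches the EPC section given to the VM (64 MiB = 67108864 bytes). Before involving Kubernetes, a limit can also be set by hand through the misc controller's standard interface; a sketch (the epc-test cgroup name is arbitrary, and on a systemd-managed guest the misc controller may already be enabled in the subtree):
guest:$ echo +misc | sudo tee /sys/fs/cgroup/cgroup.subtree_control
guest:$ sudo mkdir /sys/fs/cgroup/epc-test
guest:$ echo "sgx_epc 1048576" | sudo tee /sys/fs/cgroup/epc-test/misc.max
guest:$ cat /sys/fs/cgroup/epc-test/misc.max
sgx_epc 1048576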
3. Set up a single-node Kubernetes cluster with containerd 1.7 and the SGX EPC NRI plugin on Ubuntu 22.04
$ dpkg -l |grep containerd
ii containerd 1.7.2-0ubuntu1~22.04.1 amd64 daemon to control runC
# NB: in config.toml, enable NRI (disable = false) and set SystemdCgroup = true
$ grep -A7 nri\.v1 /etc/containerd/config.toml
[plugins."io.containerd.nri.v1.nri"]
disable = false
disable_connections = false
plugin_config_path = "/etc/nri/conf.d"
plugin_path = "/opt/nri/plugins"
plugin_registration_timeout = "5s"
plugin_request_timeout = "2s"
socket_path = "/var/run/nri/nri.sock"
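After editing config.toml, restart containerd so the NRI socket gets created (assuming a systemd host):
$ sudo systemctl restart containerd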
$ sudo ls /var/run/nri/
nri.sock
$ git clone -b PR-2023-050 https://github.com/mythi/intel-device-plugins-for-kubernetes.git
$ cd intel-device-plugins-for-kubernetes
$ make intel-deviceplugin-operator
$ docker save intel/intel-deviceplugin-operator:devel > op.tar
$ sudo ctr -n k8s.io i import op.tar
$ kubectl apply -k deployments/operator/default/
$ kubectl apply -f deployments/operator/samples/deviceplugin_v1_sgxdeviceplugin.yaml
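Once the operator and the SgxDevicePlugin sample are deployed, the device plugin should register the EPC resource with the node; a quick sanity check (sketch):
$ kubectl get pods -n inteldeviceplugins-system
$ kubectl describe node | grep sgx.intel.com/epc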
4. Run
Download https://raw.githubusercontent.com/containers/nri-plugins/main/scripts/testing/kube-cgroups and run
watch -n 1 "./kube-cgroups -n 'sgxplugin-*' -p 'sgx-epc-*' -f '(misc|memory).(max|current)'"
(adjust the target namespace (-n) and pod name filter (-p) as needed)
Run a pod requesting sgx.intel.com/epc: "65536"
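A minimal sketch of such a pod (the name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: sgx-epc-test
spec:
  containers:
  - name: sgx-epc-test
    image: ubuntu:22.04
    command: ["sleep", "infinity"]
    resources:
      limits:
        sgx.intel.com/epc: "65536"

With the NRI plugin running, the EPC request should show up as "sgx_epc 65536" in the pod cgroup's misc.max, which is what kube-cgroups above prints.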
5. e2e test framework
$ git clone -b PR-2023-050 https://github.com/mythi/intel-device-plugins-for-kubernetes.git
$ cd intel-device-plugins-for-kubernetes
$ make stress-ng-gramine intel-sgx-admissionwebhook
$ docker save intel/intel-sgx-admissionwebhook:devel > wh.tar
$ sudo ctr -n k8s.io i import wh.tar
$ docker save intel/stress-ng-gramine:devel > gr.tar
$ sudo ctr -n k8s.io i import gr.tar
$ go test -v ./test/e2e/... -ginkgo.v -ginkgo.focus "Device:sgx.*App:sgx-epc-cgroup"
NB: The e2e test framework expects cert-manager to be deployed in the cluster.
NB: The e2e test framework deletes all namespaces except kube-system and cert-manager before running the tests, so do not run it in a cluster with anything important deployed!
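cert-manager can be installed from its standard release manifest; a sketch (the version is only an example):
$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.2/cert-manager.yaml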
@CyanDevs commented Nov 8, 2023

Awesome, removing the kernel.config labelling rule worked like magic, and the sgx.intel.com/epc resource is now registered with the node. I'm not sure why the kernel config is missing. I'll give kubeadm a try later; thanks for the suggestion. Also, yes, the minikube nodes are using cgroup v2.

My sgx_epc is not reporting the expected 65536 allocation, though. It's probably because my NRI plugin isn't running; I'll investigate why tomorrow.

Thanks for all your help @mythi, I appreciate it.

docker@minikube:/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podcfe87044_f090_498e_81e7_f8f032d9459c.slice/cri-containerd-3b11506602acad6132b5da715a016d0a166c4aae541af7be5ccfe74b0e777581.scope$ cat misc.max
sgx_epc max

The job spec:

apiVersion: batch/v1
kind: Job
metadata:
  name: oe-helloworld
  namespace: default
spec:
  template:
    metadata:
      labels:
        app: oe-helloworld
    spec:
      containers:
      - name: oe-helloworld
        image: mcr.microsoft.com/acc/samples/oe-helloworld:1.1
        command: [ "sleep", "infinity" ]
        resources:
          limits:
            sgx.intel.com/epc: "65536"
          requests:
            sgx.intel.com/epc: "65536"
        volumeMounts:
        - name: var-run-aesmd
          mountPath: /var/run/aesmd
      restartPolicy: "Never"
      volumes:
      - name: var-run-aesmd
        hostPath:
          path: /var/run/aesmd
  backoffLimit: 0

The pod:

$ kubectl describe pod oe-helloworld-xpcvg
Name:             oe-helloworld-xpcvg
Namespace:        default
Priority:         0
Service Account:  default
Node:             minikube/192.168.49.2
Start Time:       Wed, 08 Nov 2023 06:48:31 +0000
Labels:           app=oe-helloworld
                  batch.kubernetes.io/controller-uid=00df9e9d-ba40-482f-92c8-68dc1808745f
                  batch.kubernetes.io/job-name=oe-helloworld
                  controller-uid=00df9e9d-ba40-482f-92c8-68dc1808745f
                  job-name=oe-helloworld
Annotations:      sgx.intel.com/epc: 64Ki
Status:           Running
IP:               10.244.0.11
IPs:
  IP:           10.244.0.11
Controlled By:  Job/oe-helloworld
Containers:
  oe-helloworld:
    Container ID:  containerd://3b11506602acad6132b5da715a016d0a166c4aae541af7be5ccfe74b0e777581
    Image:         mcr.microsoft.com/acc/samples/oe-helloworld:1.1
    Image ID:      mcr.microsoft.com/acc/samples/oe-helloworld@sha256:64033ee002d17d69790398e4c272a9c467334a931ca0fb087b98b96b9f3be3db
    Port:          <none>
    Host Port:     <none>
    Command:
      sleep
      infinity
    State:          Running
      Started:      Wed, 08 Nov 2023 06:48:31 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      sgx.intel.com/enclave:  1
      sgx.intel.com/epc:      65536
    Requests:
      sgx.intel.com/enclave:  1
      sgx.intel.com/epc:      65536
    Environment:              <none>
    Mounts:
      /var/run/aesmd from var-run-aesmd (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xjdmk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  var-run-aesmd:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/aesmd
    HostPathType:
  kube-api-access-xjdmk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  4m55s  default-scheduler  Successfully assigned default/oe-helloworld-xpcvg to minikube
  Normal  Pulled     4m55s  kubelet            Container image "mcr.microsoft.com/acc/samples/oe-helloworld:1.1" already present on machine
  Normal  Created    4m55s  kubelet            Created container oe-helloworld
  Normal  Started    4m55s  kubelet            Started container oe-helloworld

The intel-sgx-plugin pod doesn't have a section for nri-sgx-epc:

$ kubectl describe pod intel-sgx-plugin-42txt -n inteldeviceplugins-system
Name:             intel-sgx-plugin-42txt
Namespace:        inteldeviceplugins-system
Priority:         0
Service Account:  default
Node:             minikube/192.168.49.2
Start Time:       Tue, 07 Nov 2023 23:32:26 +0000
Labels:           app=intel-sgx-plugin
                  controller-revision-hash=868bb58f4b
                  pod-template-generation=1
Annotations:      <none>
Status:           Running
IP:               10.244.0.2
IPs:
  IP:           10.244.0.2
Controlled By:  DaemonSet/intel-sgx-plugin
Containers:
  intel-sgx-plugin:
    Container ID:  containerd://dacad378e7115d25edc2fe9a67e799ee3a542e5d42caf920fe6087e119eac345
    Image:         intel/intel-sgx-plugin:0.28.0
    Image ID:      docker.io/intel/intel-sgx-plugin@sha256:51b768fb07611454d62b1833ecdbd09d41eeb7f257893193dab1f7e061f9c54c
    Port:          <none>
    Host Port:     <none>
    Args:
      -v
      4
      -enclave-limit
      110
      -provision-limit
      110
    State:          Running
      Started:      Wed, 08 Nov 2023 00:36:11 +0000
    Last State:     Terminated
      Reason:       Unknown
      Exit Code:    255
      Started:      Tue, 07 Nov 2023 23:32:28 +0000
      Finished:     Wed, 08 Nov 2023 00:35:44 +0000
    Ready:          True
    Restart Count:  1
    Environment:    <none>
    Mounts:
      /dev/sgx_enclave from sgx-enclave (ro)
      /dev/sgx_provision from sgx-provision (ro)
      /var/lib/kubelet/device-plugins from kubeletsockets (rw)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kubeletsockets:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/device-plugins
    HostPathType:
  sgx-enclave:
    Type:          HostPath (bare host directory volume)
    Path:          /dev/sgx_enclave
    HostPathType:  CharDevice
  sgx-provision:
    Type:          HostPath (bare host directory volume)
    Path:          /dev/sgx_provision
    HostPathType:  CharDevice
QoS Class:         BestEffort
Node-Selectors:    intel.feature.node.kubernetes.io/sgx=true
Tolerations:       node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                   node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                   node.kubernetes.io/not-ready:NoExecute op=Exists
                   node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                   node.kubernetes.io/unreachable:NoExecute op=Exists
                   node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:            <none>

@mythi (Author) commented Nov 8, 2023

@CyanDevs You are most likely missing:

$ make intel-deviceplugin-operator
$ docker save intel/intel-deviceplugin-operator:devel > op.tar
$ sudo ctr -n k8s.io i import op.tar

that is, make sure the operator deployment does not pull the image from Docker Hub but uses the custom image built from my devel branch.

@CyanDevs commented Nov 9, 2023

Hi @mythi, I've successfully validated this on my end using an Azure VM. The issue was with minikube, which inherently had problems running the NRI plugin (as well as the missing kernel.config for NFD). Once I switched to kubeadm, these issues were gone and everything ran as expected.

$ cd ./kubepods-besteffort-pode845916d_a5eb_4abf_8c5c_e6d3a2d4f5b6.slice/cri-containerd-ac823861137eed2323214cefdc27b7295bbbaf4d55e4ee919e772fef133d02c3.scope
$ cat misc.max
sgx_epc 65536

I'm grateful for your guidance and prompt responses here. Thank you!

@mythi (Author) commented Nov 9, 2023

@CyanDevs Great to hear! Any suggestions for where I could improve the documentation here, other than clearly mentioning that minikube is known not to work? I'm also about to add the steps to get cAdvisor set up for the telemetry piece.

Go ahead with more (stress) testing and let me and Haitao know if there are issues.

@CyanDevs commented

@mythi I sent you the notes I wrote as I went through the steps. This guide is great. One improvement I can think of is including notes on installing cert-manager and NFD; I did not know about these, as I had never used intel-device-plugins before this. Thanks!
