@chrw
Forked from abhioncbr/eviction1.md
Created June 24, 2020 13:08

It was the second day of the long weekend. I was watching Money Heist on Netflix (a good one to watch, a free recommendation from a human) when a Slack notification popped up on one of our channels: "Is something wrong with our application?" By the time I had started my MacBook to check, another Slack message had arrived: "application is not running, pod status says Evicted." Luckily, the application was in a pre-prod environment, but we still wanted the pod up and running. It looked like an effortless task: restart the pod, right? We did, and the pod was evicted again after some time. The application code was simple and shouldn't have been consuming many resources of the K8s cluster. Frankly, we hadn't encountered such an issue before and weren't sure about the solution, and so began the saga of resolving pod eviction.

Eviction is a complex topic with many factors, objects, and policies involved, so let's start with the basics and slowly build up the related concepts. In this first post, I aim to cover the pod phase and status, the container status and state, some commands, sample K8s objects to imitate eviction, and the QoS class of pods, and to correlate all of them.

What do you mean by Evicted?

Is Evicted a pod phase or a status? How is pod status related to container status, and what is the container state? Hmm, so many different terms; I am perplexed right now. Don't worry; we will go through all of them. Let's start with the easiest and most generic one first, i.e. the pod phase. It is a simple, high-level summary of where the pod is in its lifecycle. Below are the five possible pod phase values:

  • Pending: the pod has been accepted by Kubernetes, but it is either not yet scheduled by kube-scheduler or still downloading its container images.

  • Running: the pod is bound to a node, all of its containers have been created, and at least one container is running or is in the process of starting or restarting.

  • Failed: all containers in the pod have terminated, and at least one container terminated in failure. Failure means the container exited with a non-zero status or was killed by the Kubernetes system.

  • Succeeded: all containers terminated with a zero exit status, and the pod will not be restarted.

  • Unknown: the state of the pod could not be determined.

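From a programmatic perspective, phase is just a string field in the PodStatus object. The object types relate to each other as follows: a PodStatus carries the phase, reason, and qosClass fields along with arrays of ContainerStatus objects, and each ContainerStatus has a state field of type ContainerState.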

Let's explore the next one, i.e. pod status (PodStatus). PodStatus is an aggregated view of all the containers within a pod. The PodStatus object consists of several array fields based on ContainerStatus. It is also worth noting some other important fields, such as reason and qosClass. Finally, I want to highlight two important points:

  • The information reported as Pod status depends on the current ContainerState.

  • The value of the reason field is a brief CamelCase message indicating why the pod is in this state, e.g. Evicted.
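To make these two points concrete, here is a minimal Python sketch of how the status of an evicted pod might look when read as JSON (for example from `kubectl get pod -o json`). The sample values, the message text, and the helper name `is_evicted` are illustrative assumptions, not output captured from our cluster.

```python
# Hypothetical, trimmed-down PodStatus of an evicted pod, as a Python dict.
sample_status = {
    "phase": "Failed",        # the high-level lifecycle summary
    "reason": "Evicted",      # brief CamelCase reason for the current state
    "message": "The node was low on resource: memory.",  # illustrative text
}


def is_evicted(pod_status: dict) -> bool:
    """Return True when the pod-level reason says the pod was evicted."""
    return pod_status.get("reason") == "Evicted"


print(sample_status["phase"], is_evicted(sample_status))
```

Note that the phase alone does not tell you a pod was evicted; it is the reason field that carries that detail.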

Hold on, did you read the last line thoroughly? Read it once more and keep a note of it. Now, let's move on to explore container status and state. The ContainerStatus object represents complete information about a container in a pod. Among its several fields, state is worth noticing; its data type is ContainerState. There are only three permissible values of the container state:

  • Waiting: the default state of the container. Pulling an image, applying a secret, and several other operations happen in this state.

  • Running: indicates the container is running without any issue.

  • Terminated: the container completed its execution and stopped running. Either the execution was successful, i.e. a zero exit code, or it failed because of some issue. The ContainerStateTerminated object represents the terminated state. This object has exitCode and reason fields, which show what happened to the container. Evicted is one of the values of the reason variable. Here is the source code for the Evicted container state.
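The container state can be read in a similar way. In the JSON form of a ContainerStatus, state is an object in which only one of the keys waiting, running, or terminated is set. The sketch below returns the state name together with its reason and exitCode when they are present. The helper name `container_state` and the sample data are hypothetical.

```python
from typing import Optional, Tuple

STATE_NAMES = ("waiting", "running", "terminated")


def container_state(container_status: dict) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (state name, reason, exit code) for one ContainerStatus dict."""
    state = container_status.get("state", {})
    for name in STATE_NAMES:
        details = state.get(name)
        if details is not None:
            return name, details.get("reason"), details.get("exitCode")
    # With no state set, the container is treated as waiting (the default state).
    return "waiting", None, None


# Made-up example of a container that was killed for using too much memory.
sample_container = {
    "name": "app",
    "state": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}},
}
print(container_state(sample_container))
```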

If you are still with me, I am sure you now understand what exactly Evicted is. Time for a ten-second break.

Command to get evicted pods

kubectl get po --all-namespaces -o json | jq '.items[] | select(.status.reason!=null) | select(.status.reason | contains("Evicted"))'
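If jq is not available, the same filter can be written in a few lines of Python. This is a sketch that assumes the JSON has the shape produced by `kubectl get pods --all-namespaces -o json`; the function name `evicted_pods` and the sample document are hypothetical.

```python
import json


def evicted_pods(pod_list_json: str) -> list:
    """Return (namespace, name) for every pod whose status.reason contains 'Evicted'."""
    pod_list = json.loads(pod_list_json)
    result = []
    for pod in pod_list.get("items", []):
        reason = pod.get("status", {}).get("reason")
        if reason is not None and "Evicted" in reason:
            meta = pod.get("metadata", {})
            result.append((meta.get("namespace"), meta.get("name")))
    return result


# Made-up pod list with one evicted pod and one healthy pod.
sample_json = json.dumps({
    "items": [
        {"metadata": {"namespace": "evict-example", "name": "pod-a"},
         "status": {"phase": "Failed", "reason": "Evicted"}},
        {"metadata": {"namespace": "evict-example", "name": "pod-b"},
         "status": {"phase": "Running"}},
    ]
})
print(evicted_pods(sample_json))
```

To use it against a real cluster, the kubectl output could be read from standard input instead of the sample string.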

Simulate pod eviction

The code file (pod-evict-resources.yaml) creates the following Kubernetes objects:

  • evict-example namespace.
  • high-priority PriorityClass.
  • pod-with-defined-resources-example Deployment with 3 replicas, each with a resource request and limit of 570Mi.
  • pod-oom-killed-example pod.
  • pod-evict-burstable-example pod.
  • pod-evict-best-effort-example pod.

Commands for creating, listing, and deleting the objects:

kubectl apply -f pod-evict-resources.yaml # for creation of objects.
kubectl get pods -n evict-example         # for listing all pods.
kubectl delete all --all -n evict-example # for deletion of objects.

All these resources are created to simulate eviction on a node with approximately 2GB of memory. Please adjust the memory values as per your need. Also, the image used for the containers is a simple Python-based one, abhioncbr/pod-evict. Note: the Python script contains an infinite loop that continuously consumes more memory.
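The actual script inside the image is not reproduced here. As a rough idea of what such a memory-hungry loop could look like, here is a hypothetical sketch. The function name, chunk size, and delay are assumptions, and `max_chunks` exists only so the loop can be stopped when trying it outside a container.

```python
import time
from typing import Optional


def consume_memory(chunk_mib: int = 10, delay_seconds: float = 1.0,
                   max_chunks: Optional[int] = None) -> list:
    """Allocate chunk_mib MiB per iteration and keep every chunk alive."""
    chunks = []
    # With max_chunks=None this loops forever, which is the behaviour
    # you would want inside the pod to provoke an OOM kill or an eviction.
    while max_chunks is None or len(chunks) < max_chunks:
        # Holding a reference in the list prevents the chunk from being freed.
        chunks.append(bytearray(chunk_mib * 1024 * 1024))
        time.sleep(delay_seconds)
    return chunks


# Bounded demo run: three chunks of 1 MiB each, with no delay.
held = consume_memory(chunk_mib=1, delay_seconds=0, max_chunks=3)
print(len(held), "chunks held")
```

Inside a container you would call `consume_memory()` without `max_chunks` and let Kubernetes decide when to stop it.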

How does Kubelet decide to evict the pod?

Kubelet evicts pods based on their QoS. What the hell is QoS? Wait, I noticed a variable qosClass in the PodStatus object. Is it the same? Excellent observation; yes, it is. QoS stands for Quality of Service. Okay, let's explore what it means and what part it plays in pod eviction, but before that, we have to visit resource requests and limits for pods.

  • Requests: the minimum resources required by the container. If no node has enough available resources to satisfy the requested values, the pod will stay Pending. Requests are set at the container level; if two or more containers are present, the total resources required by the pod is the sum of all the containers' requests.

  • Limits: the maximum resources of a node that a container can utilize. If a container tries to consume more memory than its limit, it gets terminated with the reason OOMKilled; if it tries to use more CPU than its limit, it is throttled rather than killed.
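As a small worked example of the "sum of requests" rule, the sketch below adds up the memory requests of all containers in a pod spec. It only understands the binary suffixes Ki, Mi, and Gi (real Kubernetes quantities allow more forms), and the helper names and the two-container pod are hypothetical.

```python
UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '570Mi' to bytes (Ki/Mi/Gi or plain bytes only)."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def total_memory_request(pod_spec: dict) -> int:
    """Sum the memory requests (in bytes) of every container in the pod spec."""
    total = 0
    for container in pod_spec.get("containers", []):
        request = container.get("resources", {}).get("requests", {}).get("memory")
        if request is not None:
            total += parse_memory(request)
    return total


# Made-up pod with two containers.
pod_spec = {"containers": [
    {"name": "app", "resources": {"requests": {"memory": "570Mi"}}},
    {"name": "sidecar", "resources": {"requests": {"memory": "256Mi"}}},
]}
print(total_memory_request(pod_spec) // UNITS["Mi"], "Mi")  # prints "826 Mi"
```

The scheduler needs a node with at least that much unreserved memory; otherwise the pod stays Pending.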

Based on the presence or absence of resource requests and limits and on their values, Kubernetes assigns one of three QoS classes to the pod. The following are the three classes:

  • Guaranteed: all the containers of a pod have both CPU and memory requests and limits, and for each resource the request and limit values are equal.

  • Burstable: at least one container in the pod has a request or a limit, and the pod does not satisfy the Guaranteed class condition.

  • BestEffort: none of the containers in the pod has any request or limit configured.
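The three rules above can be sketched as a small Python function. This is a simplified illustration, not the Kubernetes implementation: it compares quantities as plain strings (so "1" and "1000m" would wrongly count as different), and the function name `qos_class` and the sample specs are hypothetical.

```python
def qos_class(pod_spec: dict) -> str:
    """Classify a pod spec as Guaranteed, Burstable, or BestEffort."""
    any_set = False      # does any container set a CPU/memory request or limit?
    guaranteed = True    # stays True only if every container qualifies
    for container in pod_spec.get("containers", []):
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # When only a limit is given, Kubernetes uses it as the request too.
            request = requests.get(name, limit)
            if limit is not None or request is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


equal = {"cpu": "500m", "memory": "570Mi"}
print(qos_class({"containers": [{"resources": {"requests": equal, "limits": equal}}]}))
print(qos_class({"containers": [{"resources": {"requests": {"memory": "100Mi"}}}]}))
print(qos_class({"containers": [{"name": "no-resources"}]}))
```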

If you check the pods created in the simulation step above, all three types of pods are present. You can check by using the following commands; the QoS Class field in the describe output shows the assigned class.

kubectl get pods -n evict-example
kubectl describe pod pod-with-defined-resources-XXXXXXX -n evict-example
kubectl describe pod pod-evict-burstable-example -n evict-example
kubectl describe pod pod-evict-best-effort-example -n evict-example

Cool, we now understand requests and limits and how Kubernetes determines the QoS class of a pod. Only one last thing is left: how is it related to eviction?

Kubelet evicts pods in order: first the BestEffort ones, then the Burstable ones, and last the Guaranteed ones. In our case, our application pod was of the Burstable type. We converted it to Guaranteed and saved our evening of the long weekend.
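The order described above can be expressed as a simple sort. This sketch mirrors only that simplified order; the kubelet's real ranking also takes into account things like actual usage relative to requests and pod priority. The rank table, the function name, and the QoS class assigned to each sample pod are assumptions for illustration.

```python
# Lower rank means the pod is considered for eviction earlier.
QOS_EVICTION_RANK = {"BestEffort": 0, "Burstable": 1, "Guaranteed": 2}


def eviction_order(pods: list) -> list:
    """Sort (name, qos_class) pairs so the first eviction candidates come first."""
    return sorted(pods, key=lambda pod: QOS_EVICTION_RANK[pod[1]])


# Illustrative pods, named after the simulation objects.
pods = [
    ("pod-with-defined-resources-example", "Guaranteed"),
    ("pod-evict-burstable-example", "Burstable"),
    ("pod-evict-best-effort-example", "BestEffort"),
]
for name, qos in eviction_order(pods):
    print(qos, name)
```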

Our quest to find out why Kubelet started evicting pods continued for several weeks. I know this post has turned out to be a long one, so I am concluding it here. In my next post, I will cover more about soft and hard eviction threshold signals, pod disruption budgets, priority classes, resource quotas, limit ranges, and more. Thanks for reading; I look forward to your feedback.
