
@svantelidman
Created December 2, 2018 18:13
What do the resource requests and limits mean in OpenShift and how might they affect your Java application?

Java resource management

To the extent that it is implementation specific, the discussion below is based on the OpenJDK HotSpot runtime.

Threads

The JVM maps Java threads to OS threads 1:1, i.e., there are no "green threads" in Java like there are in some other virtual machines, for example BEAM, the Erlang virtual machine.

A thread in the JVM can be in one of the following states:

  • thread_new: a new thread in the process of being initialized
  • thread_in_Java: a thread that is executing Java code
  • thread_in_vm: a thread that is executing inside the VM
  • thread_blocked: the thread is blocked for some reason (acquiring a lock, waiting for a condition, sleeping, performing a blocking I/O operation and so forth)

In addition to the application threads the JVM spawns a number of internal threads where the most important ones are:

  • The VM thread
  • GC threads
  • JIT threads

At a high level you can say that the VM thread coordinates the other threads, for example ensuring that all application threads are blocked during a stop-the-world GC, which can happen with all of the commonly used garbage collectors.

Garbage collection and threads

Both the CMS and the G1 garbage collectors have concurrent and non-concurrent phases, which is why they are referred to as mostly concurrent garbage collectors. Concurrent here means concurrent with the running application threads. A full GC with compaction is non-concurrent for both CMS and G1.

Typically the JVM assigns the number of threads for the concurrent phases automatically. For systems with fewer than eight physical cores it will assign one thread per physical core by default. This can be overridden by passing command-line options to the JVM.
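As a rough illustration, the default sizing HotSpot uses can be sketched like this. This is a minimal approximation of the documented heuristics for `-XX:ParallelGCThreads` (one thread per core up to eight cores, then about 5/8 of each additional core) and the G1 default for `-XX:ConcGCThreads`; the function names are ours, and the exact values can differ between JVM versions.

```python
def default_parallel_gc_threads(cpus: int) -> int:
    """Approximate HotSpot's default for -XX:ParallelGCThreads:
    one thread per core up to 8 cores, then 5/8 of each core beyond that."""
    if cpus <= 8:
        return cpus
    return 8 + (cpus - 8) * 5 // 8

def default_conc_gc_threads(parallel_threads: int) -> int:
    """Approximate the G1 default for -XX:ConcGCThreads:
    roughly a quarter of the parallel GC threads, and at least one."""
    return max(1, (parallel_threads + 2) // 4)

# A 16-core host gets 13 parallel GC threads by default, not 16 --
# which is still far more than a container limited to one core needs.
print(default_parallel_gc_threads(16))
print(default_conc_gc_threads(default_parallel_gc_threads(16)))
```

This is why a small container on a large host can end up with many more GC threads than its CPU quota can reasonably serve unless the flags are set explicitly.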

On a CPU-constrained system there is a risk that the lower-priority concurrent GC processing lags behind. In that case there will be an increased risk and frequency of stop-the-world GC events, where all application threads are paused. There seems to be no situation, however, in which lack of GC thread priority can hang processing indeterminately; it just means that you are likely to see more stop-the-world GC phases. We may want to tune how GC threads are allocated as a consequence of this. Here is an excerpt from one of the references below:

We prevent our container from being throttled prematurely and permits more opportunities for our application threads to execute by limiting the number of JVM threads to at most the number of cores available. Our base Docker image automatically detects the resources available to the container and tunes the JVM accordingly at start time. Setting the flags -XX:ParallelGCThreads, -XX:ConcGCThreads, and -Djava.util.concurrent.ForkJoinPool.common.parallelism prevents many unnecessary pauses. However, many JVM components rely on Runtime.getRuntime.availableProcessors() which still returns the number of physical cores available. To overcome this, we compile and load a C library that overrides the JVM native function JVM_ActiveProcessorCount and returns our calculated value instead. This gives us complete control to limit all dynamically scalable aspects of the JVM without performance penalties.
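The "calculated value" the excerpt overrides `JVM_ActiveProcessorCount` with can be derived from the container's CFS bandwidth settings, which is essentially what later JVMs do themselves with container support enabled. A minimal sketch, assuming the quota and period have already been read from the cgroup files `cpu.cfs_quota_us` and `cpu.cfs_period_us` (the function name is ours):

```python
import math

def cpus_from_cfs(quota_us: int, period_us: int, host_cpus: int) -> int:
    """Derive an effective CPU count from CFS bandwidth settings.
    A quota of -1 means 'no limit', so we fall back to the host count;
    otherwise round the quota/period ratio up and never report zero."""
    if quota_us <= 0:
        return host_cpus
    return max(1, math.ceil(quota_us / period_us))

# A container with quota 150000 us per 100000 us period "has" 2 CPUs,
# even on a 32-core host.
print(cpus_from_cfs(150_000, 100_000, host_cpus=32))
```

Feeding this value to the JVM keeps GC threads, JIT threads, and `ForkJoinPool` sizing in line with what the scheduler will actually grant the container.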

The Linux Scheduler - Completely Fair Scheduler (CFS)

CFS handles the scheduling of tasks, and both processes and threads are tasks. CFS ensures that all tasks eventually get scheduled. All tasks get equally sized time slices, but on a CPU-constrained system lower-priority tasks will receive their time slices less often than higher-priority tasks. This is controlled by the relative weight of the task, which by default is 1024 (roughly 1024 / 1.25^nice) and which determines how often the task will get scheduled relative to other tasks that share the same CPU (or CPU quota in the cgroup case, see below).
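The weight formula above can be made concrete with a small sketch. This is an approximation of the kernel's precomputed weight table (the real table is rounded slightly differently at some nice levels), showing how a task's share of CPU falls as its nice value rises:

```python
def cfs_weight(nice: int) -> int:
    """Approximate CFS load weight for a task at a given nice level.
    nice 0 maps to the default weight of 1024; each nice step changes
    the weight by roughly a factor of 1.25."""
    return round(1024 / 1.25 ** nice)

def cpu_share(weight: int, *other_weights: int) -> float:
    """A task's expected fraction of CPU on a saturated core: its
    weight divided by the total weight of all runnable tasks."""
    return weight / (weight + sum(other_weights))

# Two default tasks split the CPU evenly; a nice 5 task sharing a core
# with a nice 0 task gets roughly a quarter of it.
print(cpu_share(cfs_weight(0), cfs_weight(0)))
print(round(cpu_share(cfs_weight(5), cfs_weight(0)), 2))
```

The same weight mechanism underlies the cgroup `cpu.shares` value discussed below.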

Cgroups, Docker, and OpenShift

You may know that Docker utilizes Linux cgroups and namespaces to do its magic and that OpenShift/Kubernetes orchestrates the execution of Docker containers across a cluster of machines called worker nodes. But how does it all fit together, and what do the different settings in OpenShift/Kubernetes really mean?

Cgroups

Docker provides a way to run what is known as a Docker image in an isolated environment called a container. Typically you will run a single process inside your container, and the image specifies the full environment that the process will run in, including the operating system, libraries and so on. The Linux kernel on the host system is, however, common to all containers running on the system. On Linux, Docker uses cgroups and namespaces to achieve this. Here we will focus on the cgroup aspect of things.

Cgroups, short for control groups (originally called process containers), have been part of the Linux kernel since 2008 and provide a mechanism to isolate, track and limit the resource usage (CPU, memory, I/O, etc.) of a group of processes.

Docker uses cgroups on Linux to make it possible to set resource constraints for a particular container. Cgroups are arranged in a tree-like structure. Typically systemd is the root cgroup, one of whose children is the system cgroup where the Docker daemon, dockerd, runs; all containers are then arranged as cgroups below the dockerd cgroup. But what are the different constraints that can be set for a cgroup, and what do they mean?

  • For memory you can set a soft and a hard limit, but it is the hard limit that really matters. If your container exceeds the hard limit it is subject to being killed without any compromise. The soft limit is basically a warning and has no effect apart from raising the warning itself.
  • For CPU you can set a value called shares per cgroup, which is a weight relative to all other cgroups that have the same parent cgroup. The default value for this weight is 1024.

Docker on Linux

Setting CPU Limits

Setting Docker CPU limits on Linux maps directly to Cgroups as described above. The Docker command lines options that we need to consider are:

| Option | Description |
| --- | --- |
| `--cpus=<value>` | Specify how much of the available CPU resources a container can use. For instance, if the host machine has two CPUs and you set `--cpus="1.5"`, the container is guaranteed at most one and a half of the CPUs. This is the equivalent of setting `--cpu-period="100000"` and `--cpu-quota="150000"`. Available in Docker 1.13 and higher. |
| `--cpu-period=<value>` | Specify the CPU CFS scheduler period, which is used alongside `--cpu-quota`. Defaults to 100000 microseconds (100 milliseconds). Most users do not change this from the default. If you use Docker 1.13 or higher, use `--cpus` instead. |
| `--cpu-quota=<value>` | Impose a CPU CFS quota on the container: the number of microseconds per `--cpu-period` that the container is limited to before being throttled, acting as the effective ceiling. If you use Docker 1.13 or higher, use `--cpus` instead. |
| `--cpuset-cpus` | Limit the specific CPUs or cores a container can use. A comma-separated list or hyphen-separated range of CPUs a container can use, if you have more than one CPU. The first CPU is numbered 0. A valid value might be `0-3` (to use the first, second, third, and fourth CPU) or `1,3` (to use the second and fourth CPU). |
| `--cpu-shares` | Set this flag to a value greater or less than the default of 1024 to increase or reduce the container's weight, and give it access to a greater or lesser proportion of the host machine's CPU cycles. This is only enforced when CPU cycles are constrained. When plenty of CPU cycles are available, all containers use as much CPU as they need; in that way, this is a soft limit. `--cpu-shares` does not prevent containers from being scheduled in swarm mode. It prioritizes container CPU resources for the available CPU cycles. It does not guarantee or reserve any specific CPU access. |
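The equivalence between `--cpus` and the period/quota pair is simple arithmetic, sketched below (the function name is ours):

```python
def cpus_to_quota(cpus: float, period_us: int = 100_000) -> int:
    """Translate Docker's --cpus value into the equivalent CFS quota
    in microseconds per period: --cpus=1.5 with the default 100 ms
    period corresponds to --cpu-quota=150000."""
    return int(cpus * period_us)

# Half a CPU is 50 ms of run time per 100 ms period.
print(cpus_to_quota(0.5))
```

Once the container has consumed its quota within a period it is throttled until the next period starts, regardless of whether the host has idle cores.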

OpenShift

Limitation on containers

For each container in a pod you can set how much memory and CPU the container requests and also how much memory and CPU the container is limited to.

CPU values are specified in millicores, and memory is typically specified in megabytes or gigabytes.

CPU

Requested CPU is not a hard limit; it is an indication to the Kubernetes scheduler as to which worker node in the cluster might be a good place for the pod. It does not place any upper bound on the CPU the container may consume if the node is not CPU-constrained, but it does provide a guarantee that the container will get at least this amount of CPU. This setting maps to the CPU shares of cgroups, which are then used by the CFS scheduler to determine how often the cgroup will get a slice of the available CPU resources.

CPU limit, on the other hand, is a hard limit. If a container attempts to exceed it, the container will be throttled even if there are spare CPU cycles available on the worker node. Setting a limit makes the performance of the service/pod more predictable and independent of what else is scheduled on the worker node.
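The mapping from millicores to the cgroup values described earlier can be sketched as follows. This reflects the commonly documented Kubernetes translation (a request of 1000 millicores corresponds to the default weight of 1024 shares, and a limit becomes a CFS quota per period); the function names are ours, and exact rounding may vary by version:

```python
def request_to_shares(millicores: int) -> int:
    """Map a Kubernetes CPU request to cgroup cpu.shares:
    1000m (one core) corresponds to the default weight of 1024.
    Kubernetes floors this at 2 shares."""
    return max(2, millicores * 1024 // 1000)

def limit_to_quota(millicores: int, period_us: int = 100_000) -> int:
    """Map a Kubernetes CPU limit to a CFS quota in microseconds
    per scheduling period."""
    return millicores * period_us // 1000

# A request of 500m weighs half as much as a full core; a limit of
# 250m allows 25 ms of CPU time per 100 ms period.
print(request_to_shares(500))
print(limit_to_quota(250))
```

So the request influences how contended CPU is divided (a relative weight), while the limit is an absolute ceiling enforced every period.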

Memory

As for CPU, requested memory is a hint to the Kubernetes scheduler when deciding on which worker node the pod should be placed. If the pod exceeds its memory limit (if set), the container will likely be killed and restarted, depending on the container restart policy.

Quotas for projects

You can set quotas per OpenShift project for a number of different things, like the number of config maps, persistent volume claims, and so on. Here, however, we only discuss quotas for memory and CPU.

Quotas are expressed as limits on the sum of requested CPU and memory across all pods in the project and on the sum of the CPU and memory limits across all pods in the project. Note that these quotas are not applied at runtime; they are enforced by refusing the creation of new resources that would violate the quota constraint.
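The enforcement model can be sketched as a simple admission check, here shown only for CPU in millicores (the function name and tuple representation are ours, not an OpenShift API):

```python
def violates_quota(pods, new_pod, quota_requests_m, quota_limits_m):
    """Admission-style check: creating new_pod is refused if the summed
    CPU requests or limits (millicores) would exceed the project quota.
    Each pod is a (request_m, limit_m) tuple. Running pods are never
    evicted by the quota; only new resources are refused."""
    total_req = sum(req for req, _ in pods) + new_pod[0]
    total_lim = sum(lim for _, lim in pods) + new_pod[1]
    return total_req > quota_requests_m or total_lim > quota_limits_m

# With a 1000m request quota, an existing pod requesting 500m blocks
# a new pod requesting 600m, but a 400m pod is admitted.
print(violates_quota([(500, 1000)], (600, 600), 1000, 2000))
print(violates_quota([(500, 1000)], (400, 800), 1000, 2000))
```

Note that this means a quota change never disrupts workloads that are already running; it only affects future scheduling.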

References

Threads in the OpenJDK HotSpot JVM

G1 Garbage Collector

CMS Garbage Collector

Completely Fair Scheduler - 15 min Video Overview

Cgroup basics

Here is a great article that brings Java and containers together. Beware that the issue it describes may have been fixed in RHEL, as described in the next link.

Nobody puts Java in a container

This article contains a similar discussion focusing around tuning GC threads on startup when running Java 8 apps in Kubernetes.

Understanding Linux Container Scheduling

OpenShift Container Compute Resources

OpenShift Quotas and Limit Ranges

Linux nice command


larkly commented Dec 3, 2018

An important aspect of OpenShift limits and requests is how the QoS tiers are calculated from the defined limits and requests. When no requests or limits are defined, the QoS tier is BestEffort. When requests are defined with a lower value than the limits, the pod gets a higher-priority QoS tier called Burstable. If the request and limit have the same value, the QoS tier is Guaranteed.

QoS Burstable will not allow a pod to use more resources than its limit, but if limits are undefined it will be allowed to burst up to the resources available on the node, sharing those resources with other burstable deployments based on available resources and their QoS priority.

https://docs.openshift.com/container-platform/3.7/dev_guide/compute_resources.html#quality-of-service-tiers


atgper commented Dec 6, 2018

Apparently Java 10 behaves better in containers so perhaps we should just aim for that?


atgper commented Dec 6, 2018

Or let's say Java 11 as it is the LTS
