What happens when resource limit and request are set...

This article traces, at the source-code level, what happens inside Kubernetes when you apply a Pod that has resource Limits and Requests set.

Basically, when you run kubectl apply -f pod.yaml, the kubectl CLI transforms the manifest into JSON and sends it to kube-apiserver — unless you use server-side apply, but that's a different story for now.

When kube-apiserver receives the request, it stores the Pod spec in etcd as the desired state. This is where our story begins.

The parameters flow in the following order:

Pod spec -> kubelet -> CRI Runtime -> OCI runtime -> cgroups


Kubelet has a SyncPod process which literally syncs each Pod spec to its existing containers. The kubelet process runs on each node and fetches all Pods whose nodeName matches its own.

func (m *kubeGenericRuntimeManager) SyncPod()

Pod creation is defined in Step 7 of this function, so let's jump there.

	// Step 7: start containers in podContainerChanges.ContainersToStart.
	for _, idx := range podContainerChanges.ContainersToStart {
		start("container", containerStartSpec(&pod.Spec.Containers[idx]))
	}

The start() being called here is defined right above as a helper:

start := func(typeName string, spec *startSpec) error {
	// NOTE (aramase) podIPs are populated for single stack and dual stack clusters. Send only podIPs.
	if msg, err := m.startContainer(podSandboxID, podSandboxConfig, spec, pod, podStatus, pullSecrets, podIP, podIPs); err != nil {
		startContainerResult.Fail(err, msg)

This helper calls startContainer, which lives in kuberuntime_container.go:

func (m *kubeGenericRuntimeManager) startContainer(podSandboxID string, podSandboxConfig *runtimeapi.PodSandboxConfig, spec *startSpec, pod *v1.Pod, podStatus *kubecontainer.PodStatus, pullSecrets []v1.Secret, podIP string, podIPs []string) (string, error) {
	// Step 2: create the container.
	// For a new container, the RestartCount should be 0
	restartCount := 0
	containerStatus := podStatus.FindContainerStatusByName(container.Name)
	if containerStatus != nil {
		restartCount = containerStatus.RestartCount + 1
	}

	containerConfig, cleanupAction, err := m.generateContainerConfig(container, pod, restartCount, podIP, imageRef, podIPs, target)
	if cleanupAction != nil {
		defer cleanupAction()
	}

OK, generateContainerConfig seems to build the config that will be passed to the CRI. I found cpuRequest and cpuLimit here:

var cpuShares int64
cpuRequest := container.Resources.Requests.Cpu()
cpuLimit := container.Resources.Limits.Cpu()
memoryLimit := container.Resources.Limits.Memory().Value()
oomScoreAdj := int64(qos.GetContainerOOMScoreAdjust(pod, container,
  int64(m.machineInfo.MemoryCapacity)))
if m.cpuCFSQuota {
  // if cpuLimit.Amount is nil, then the appropriate default value is returned
  // to allow full usage of cpu resource.
  cpuPeriod := int64(quotaPeriod)
  if utilfeature.DefaultFeatureGate.Enabled(kubefeatures.CPUCFSQuotaPeriod) {
    cpuPeriod = int64(m.cpuCFSQuotaPeriod.Duration / time.Microsecond)
  }
  cpuQuota := milliCPUToQuota(cpuLimit.MilliValue(), cpuPeriod)
  lc.Resources.CpuQuota = cpuQuota
  lc.Resources.CpuPeriod = cpuPeriod
}

OK, so these CpuQuota and CpuPeriod values get set in the config. milliCPUToQuota simply converts the milli-CPU value, together with the CPU period, into the quota units used by cgroups.
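The snippet above declares cpuShares but the derivation was cut off; on the request side, kubelet derives cpu.shares from the CPU request via a milliCPUToShares helper. Here is a minimal runnable sketch of that conversion — the constants mirror the kubelet source, but treat this as an illustration rather than the exact implementation:

```go
package main

import "fmt"

const (
	milliCPUToCPU = 1000 // milli-CPU units per full CPU
	sharesPerCPU  = 1024 // cgroup cpu.shares granted per full CPU
	minShares     = 2    // minimum meaningful cpu.shares value
)

// milliCPUToShares maps a CPU request in milli-CPUs to relative
// cgroup cpu.shares, mirroring the kubelet helper of the same name.
func milliCPUToShares(milliCPU int64) int64 {
	if milliCPU == 0 {
		// Zero means "unset"; return the minimum so the container
		// still gets a nonzero relative weight.
		return minShares
	}
	shares := (milliCPU * sharesPerCPU) / milliCPUToCPU
	if shares < minShares {
		return minShares
	}
	return shares
}

func main() {
	fmt.Println(milliCPUToShares(500))  // requests.cpu: 500m -> 512 shares
	fmt.Println(milliCPUToShares(2000)) // requests.cpu: 2    -> 2048 shares
}
```

So the request controls the container's relative CPU weight, while the limit (via quota/period below) controls its hard ceiling.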


func milliCPUToQuota(milliCPU int64, period int64) (quota int64) {
	// CFS quota is measured in two values:
	//  - cfs_period_us=100ms (the amount of time to measure usage across)
	//  - cfs_quota=20ms (the amount of cpu time allowed to be used across a period)
	// so in the above example, you are limited to 20% of a single CPU
	// for multi-cpu environments, you just scale equivalent amounts
	// see for details
	if milliCPU == 0 {
		return
	}

	// we then convert your milliCPU to a value normalized over a period
	quota = (milliCPU * period) / milliCPUToCPU

	// quota needs to be a minimum of 1ms.
	if quota < minQuotaPeriod {
		quota = minQuotaPeriod
	}

	return
}

FYI: CFS (the Completely Fair Scheduler) is the default process scheduler in the Linux kernel; the CFS quota is its bandwidth-control mechanism.

It basically converts the human-readable Request and Limit into a cgroups-friendly quota value in microseconds.
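To make the units concrete, here is the function from above wrapped in a self-contained, runnable example; the constant values are assumed from the kubelet source (100ms default period, 1ms minimum quota, both in microseconds):

```go
package main

import "fmt"

const (
	milliCPUToCPU  = 1000   // milli-CPUs per full CPU
	quotaPeriod    = 100000 // default CFS period: 100ms, in microseconds
	minQuotaPeriod = 1000   // minimum quota: 1ms, in microseconds
)

// milliCPUToQuota is the kubelet helper reproduced verbatim (minus the
// long comment) so the conversion can be run directly.
func milliCPUToQuota(milliCPU int64, period int64) (quota int64) {
	if milliCPU == 0 {
		return
	}
	quota = (milliCPU * period) / milliCPUToCPU
	if quota < minQuotaPeriod {
		quota = minQuotaPeriod
	}
	return
}

func main() {
	// limits.cpu: 500m over the default 100ms period:
	fmt.Println(milliCPUToQuota(500, quotaPeriod)) // 50000µs of CPU time per 100000µs period
	// A tiny 1m limit gets clamped up to the 1ms minimum:
	fmt.Println(milliCPUToQuota(1, quotaPeriod)) // 1000
}
```

So a 500m limit means "this cgroup may run for at most 50ms of CPU time in every 100ms window", i.e. half a CPU.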

Appendix: this implementation is now part of libcontainer, which contains the core logic of runc.

Note that these CpuQuota and CpuPeriod values are used by runc as-is too; they are not converted again further down the stack.

In short, kubelet converts Request and Limit into cgroups values, which are then passed unchanged to the CRI/OCI runtime.
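Putting the two conversions together, here is a hedged end-to-end sketch of what a container with requests.cpu: 250m, limits.cpu: 500m, and limits.memory: 128Mi turns into. The resources struct below is a simplified stand-in for the CRI's LinuxContainerResources message (not the real generated type), and toResources is a hypothetical helper combining the kubelet logic shown above:

```go
package main

import "fmt"

const (
	milliCPUToCPU  = 1000   // milli-CPUs per full CPU
	sharesPerCPU   = 1024   // cpu.shares per full CPU
	minShares      = 2      // minimum cpu.shares
	quotaPeriod    = 100000 // default CFS period: 100ms, in microseconds
	minQuotaPeriod = 1000   // minimum quota: 1ms, in microseconds
)

// resources is a simplified stand-in for the CRI LinuxContainerResources
// fields this article follows.
type resources struct {
	CpuShares          int64
	CpuQuota           int64
	CpuPeriod          int64
	MemoryLimitInBytes int64
}

// toResources is a hypothetical helper that applies the share and quota
// conversions from the kubelet walkthrough above in one place.
func toResources(cpuRequestMilli, cpuLimitMilli, memLimitBytes int64) resources {
	shares := (cpuRequestMilli * sharesPerCPU) / milliCPUToCPU
	if shares < minShares {
		shares = minShares
	}
	quota := (cpuLimitMilli * quotaPeriod) / milliCPUToCPU
	if quota < minQuotaPeriod {
		quota = minQuotaPeriod
	}
	return resources{
		CpuShares:          shares,      // 250m request  -> 256 shares
		CpuQuota:           quota,       // 500m limit    -> 50000µs
		CpuPeriod:          quotaPeriod, // default 100ms period
		MemoryLimitInBytes: memLimitBytes,
	}
}

func main() {
	// requests.cpu: 250m, limits.cpu: 500m, limits.memory: 128Mi
	fmt.Printf("%+v\n", toResources(250, 500, 128<<20))
}
```

These four numbers are essentially what travels through the rest of the pipeline to the CRI, runc, and finally the cgroup filesystem.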

How kubelet calls CRI

Anyways, kubelet finally calls CreateContainer() on the CRI runtime over gRPC.
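The real call goes through the CRI's RuntimeService gRPC interface (generated protobuf types in k8s.io/cri-api). As a toy illustration of the hand-off — every type and name below is a hypothetical stand-in, not the real cri-api — note that the quota/period computed by kubelet simply ride along in the container config:

```go
package main

import "fmt"

// ContainerConfig is a hypothetical stand-in for the CRI container config;
// the real message carries these values inside its Linux resources section.
type ContainerConfig struct {
	CpuQuota  int64
	CpuPeriod int64
}

// RuntimeService models the one CRI method we care about here.
type RuntimeService interface {
	CreateContainer(podSandboxID string, config *ContainerConfig) (string, error)
}

// fakeRuntime stands in for containerd on the other side of the gRPC
// connection; it just records what it receives.
type fakeRuntime struct{ lastConfig *ContainerConfig }

func (f *fakeRuntime) CreateContainer(podSandboxID string, config *ContainerConfig) (string, error) {
	f.lastConfig = config // the quota/period arrive untouched
	return "container-id-123", nil
}

func main() {
	var rt RuntimeService = &fakeRuntime{}
	id, err := rt.CreateContainer("sandbox-1", &ContainerConfig{CpuQuota: 50000, CpuPeriod: 100000})
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

The point is only the shape: kubelet is a gRPC client, the CRI runtime is the server, and the cgroups values pass through this boundary unchanged.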


Let's see what containerd does as a CRI example.

Here we start.

OK, so is this where the container is created? :thinking_face:

Actually, I got stuck here, so I asked a question in the containerd channel on CNCF Slack.

So we need to find NewTask...

It looks like c.client.TaskService().Create(ctx, request) in the following code is where containerd calls the OCI runtime.

OK, so we need to find Create() in runc now.


Here we come.

Huh, so it just fills in a struct pointer and that's it? I don't fully understand what this does here, but it doesn't really matter for tracking Request and Limit for now.

c := &linuxContainer{
  id:            id,
  root:          containerRoot,
  config:        config,
  initPath:      l.InitPath,
  initArgs:      l.InitArgs,
  criuPath:      l.CriuPath,
  newuidmapPath: l.NewuidmapPath,
  newgidmapPath: l.NewgidmapPath,
  cgroupManager: l.NewCgroupsManager(config.Cgroups, nil),
}
if intelrdt.IsCATEnabled() || intelrdt.IsMBAEnabled() {
  c.intelRdtManager = l.NewIntelRdtManager(config, id, "")
}
c.state = &stoppedState{c: c}
return c, nil