@buzztaiki
Created January 18, 2023 02:36
Notes on exec probes and timeouts


Given the following Pod:

---
apiVersion: v1
kind: Pod
metadata:
  name: myapp  # assumed; the original manifest has no name
  labels:
    app: myapp
spec:
  containers:
    - name: myapp
      image: busybox
      command: [tail, -f, /dev/null]
      livenessProbe:
        exec:
          # args must be strings; a bare 5 would be parsed as an integer
          command: [sleep, "5"]
        timeoutSeconds: 2
        periodSeconds: 10
        failureThreshold: 3
  terminationGracePeriodSeconds: 0

the events look like this, and the container is never restarted:

Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Pulling    4m47s                 kubelet            Pulling image "busybox"
  Normal   Pulled     4m45s                 kubelet            Successfully pulled image "busybox" in 1.958676639s
  Normal   Created    4m45s                 kubelet            Created container myapp
  Normal   Started    4m45s                 kubelet            Started container myapp
  Warning  Unhealthy  72s (x21 over 4m32s)  kubelet            Liveness probe errored: Rpc error: code = Unknown desc = deadline exceeded ("DeadlineExceeded"): context deadline exceeded

Running ps inside the container shows up to about three sleep processes alive at once. Why?

The normal case

Changing the probe command to ["false"] gives the following events, and the container restarts as expected:

Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Pulled     2m7s                kubelet            Successfully pulled image "busybox" in 1.866816854s
  Normal   Pulled     95s                 kubelet            Successfully pulled image "busybox" in 1.943729073s
  Normal   Created    65s (x3 over 2m7s)  kubelet            Created container myapp
  Normal   Started    65s (x3 over 2m7s)  kubelet            Started container myapp
  Normal   Pulled     65s                 kubelet            Successfully pulled image "busybox" in 1.865027888s
  Warning  Unhealthy  39s (x9 over 119s)  kubelet            Liveness probe failed:
  Normal   Killing    39s (x3 over 99s)   kubelet            Container myapp failed liveness probe, will be restarted
  Normal   Pulling    37s (x4 over 2m9s)  kubelet            Pulling image "busybox"
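
As a rough sanity check on the two event logs, the sketch below counts how many kills one would expect if every failureThreshold consecutive Unhealthy events led to a restart. The helper name expectedKills is made up for illustration, and the model assumes the failure counter simply resets after each kill; it is not how kubelet is implemented.

```go
package main

import "fmt"

// expectedKills is a hypothetical helper: it assumes every run of
// failureThreshold Unhealthy events triggers exactly one kill and
// that the counter resets afterwards.
func expectedKills(unhealthyEvents, failureThreshold int) int {
	return unhealthyEvents / failureThreshold
}

func main() {
	const failureThreshold = 3 // from the manifest

	// Normal case: "Unhealthy ... (x9 over 119s)".
	fmt.Println(expectedKills(9, failureThreshold)) // 3

	// Timeout case: "Unhealthy ... (x21 over 4m32s)".
	fmt.Println(expectedKills(21, failureThreshold)) // 7
}
```

The normal case matches the log (Killing x3). In the timeout case the same model predicts 7 kills, but the log shows none, so the timed-out probes do not appear to be counted toward failureThreshold the same way.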

Reading the source

If the ExecProbeTimeout feature gate is enabled, a timeout should be treated as a failure and lead to a restart, or so it seems.

https://github.com/kubernetes/kubernetes/blob/7f8be71148f5461df9ae61b011c732d0ba2f551c/pkg/probe/exec/exec.go#L74-L77

			if utilfeature.DefaultFeatureGate.Enabled(features.ExecProbeTimeout) {
				// When exec probe timeout, data is empty, so we should return timeoutErr.Error() as the stdout.
				return probe.Failure, timeoutErr.Error(), nil
			}

https://github.com/kubernetes/kubernetes/blob/a12b886b1da059e0190c54d09c5eab5219dd7acf/pkg/features/kube_features.go#L939

	ExecProbeTimeout:                               {Default: true, PreRelease: featuregate.GA}, // lock to default and remove after v1.22 based on KEP #1972 update

Whether the event says errored or failed, a failure is returned either way: https://github.com/kubernetes/kubernetes/blob/2f2240400391add53983c9c04cb91ec8a8df5c67/pkg/kubelet/prober/prober.go#L105-L111

		if err != nil {
			klog.V(1).ErrorS(err, "Probe errored", "probeType", probeType, "pod", klog.KObj(pod), "podUID", pod.UID, "containerName", container.Name)
			pb.recordContainerEvent(pod, &container, v1.EventTypeWarning, events.ContainerUnhealthy, "%s probe errored: %v", probeType, err)
		} else { // result != probe.Success
			klog.V(1).InfoS("Probe failed", "probeType", probeType, "pod", klog.KObj(pod), "podUID", pod.UID, "containerName", container.Name, "probeResult", result, "output", output)
			pb.recordContainerEvent(pod, &container, v1.EventTypeWarning, events.ContainerUnhealthy, "%s probe failed: %s", probeType, output)
		}

My hunch is that somewhere there is logic that skips the restart while a probe process is still running.

KEP and implementation PR

The KEP says the following. Is this why the processes linger?

Non-Goals

  • ensuring exec processes that timed out have been killed by kubelet.
  • introducing CRI errors for handling scenarios such as time ou

PR
