The kstatus document describes a few concepts related to status conditions and a few standard conditions that controllers can implement. This document provides more detail about status conditions, using examples to describe the mechanics of how the conditions work, what they mean, and what we can expect from them.
Starting with an example of status conditions:
status:
  conditions:
  - type: DiskPressure
    status: "False"
    reason: KubeletHasNoDiskPressure
    message: kubelet has no disk pressure
    lastHeartbeatTime: "2019-11-17T14:18:26Z"
    lastTransitionTime: "2019-10-22T16:27:53Z"
  - type: Ready
    status: "True"
    reason: KubeletReady
    message: kubelet is posting ready status. AppArmor enabled
    lastHeartbeatTime: "2019-11-17T14:18:26Z"
    lastTransitionTime: "2019-10-22T16:27:53Z"
The above example status contains two conditions, of type DiskPressure and Ready. The status field of each condition indicates the state of that condition, either True or False (the Kubernetes API also allows Unknown). reason is a short, one-word description of the reason behind the condition. message is a human-readable phrase or sentence, which may contain details about the condition.
The status conditions are a way to communicate to others about the status of an object. An observer should be able to read the status of an object it's interested in and know the state of the object. The kstatus status and polling packages provide higher level APIs for a client to read object status. The status conditions are machine readable.
The conditions defined in the kstatus document and packages are designed to adhere to the "abnormal-true" polarity pattern.
In the above example conditions, Ready is an example of a positive-polarity/normal-true condition. DiskPressure is an example of a negative-polarity/abnormal-true condition: when its status is True, it indicates that something isn't right and the system is not in a normal state. Negative-polarity/abnormal-true conditions are present, with a value of True, whenever something unusual happens. Absence of such conditions means that everything is normal.
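The polarity rule can be sketched in a few lines of Python. This is only an illustration and not part of kstatus: the NORMAL_TRUE set and the is_abnormal helper are hypothetical names, and conditions are modelled as plain dictionaries.

```python
# Hypothetical set of positive-polarity (normal-true) condition types.
# Every other type is treated as abnormal-true in this sketch.
NORMAL_TRUE = {"Ready"}


def is_abnormal(condition):
    """Return True if the condition signals an abnormal state."""
    is_true = condition["status"] == "True"
    if condition["type"] in NORMAL_TRUE:
        # Normal-true: anything other than "True" is abnormal.
        return not is_true
    # Abnormal-true: only "True" is abnormal.
    return is_true


conditions = [
    {"type": "DiskPressure", "status": "False"},
    {"type": "Ready", "status": "True"},
]
print([c["type"] for c in conditions if is_abnormal(c)])  # prints []
```

With the node example above, neither condition is abnormal, so the list is empty.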
NOTE: The above example is not based on kstatus, hence the negative-polarity condition DiskPressure with value False is present in the status. As described above, it should not be present when it's false. Kubernetes native objects don't implement kstatus conventions at the moment.
The kstatus document describes using generation and observedGeneration to indicate any discrepancies in the reconciliation of the objects. The generation specifies the generation of the configuration or the desired state of a resource and is part of the object metadata. The Kubernetes API conventions describe it as:
generation: a sequence number representing a specific generation of the desired state. Set by the system and monotonically increasing, per-resource. May be compared, such as for RAW and WAW consistency.
The observedGeneration specifies the generation of the configuration or the desired state that has been fully reconciled with the actual state. It is not a built-in property of object status. It needs to be added in the status API of objects.
Example:
kind: HelmRepository
apiVersion: source.toolkit.fluxcd.io/v1beta1
metadata:
  ...
  generation: 1
  name: podinfo
  namespace: default
spec:
  ...
status:
  ...
  observedGeneration: 1
In the above example, metadata.generation is the HelmRepository object's generation and status.observedGeneration is the observedGeneration. Since they are equal here, it means that the desired state matches the actual state of the Helm repository.
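The comparison can be sketched as below. This is a minimal example, assuming the object is available as a nested dictionary; generation_observed is a hypothetical helper name, not a kstatus or Flux function. Note that on its own the check only says the controller has processed the latest generation, so the conditions still need to be read to learn the outcome.

```python
def generation_observed(obj):
    """Return True if status.observedGeneration equals metadata.generation."""
    generation = obj["metadata"]["generation"]
    # observedGeneration is optional; a missing value never matches.
    observed = obj.get("status", {}).get("observedGeneration")
    return observed == generation


helm_repo = {
    "kind": "HelmRepository",
    "metadata": {"generation": 1, "name": "podinfo", "namespace": "default"},
    "status": {"observedGeneration": 1},
}
print(generation_observed(helm_repo))  # prints True
```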
The Ready condition shows if everything is normal or if there's some abnormality due to which an object isn't in the normal state.
As per the kstatus document:
Reconciling: The controller is currently working on reconciling the latest changes.
and:
Stalled: The controller has encountered an error during the reconcile process or it has made insufficient progress (timeout).
Ready is a positive-polarity condition. Reconciling and Stalled are negative-polarity conditions.
Based on the above definitions, we can define some rules about how to use these three conditions.
- Reconciling and Stalled conditions are mutually exclusive. They can't both be present at the same time.
- In the presence of either Reconciling or Stalled conditions, the Ready condition will have a False value, since reconciling and stalled indicate abnormality.
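These two rules can be checked mechanically. The following is a rough sketch, assuming that a condition counts as present when its status is "True"; check_conditions is a hypothetical helper written for this document.

```python
def check_conditions(conditions):
    """Return a list of rule violations for Ready, Reconciling and Stalled."""
    true_types = {c["type"] for c in conditions if c["status"] == "True"}
    problems = []
    # Rule 1: Reconciling and Stalled are mutually exclusive.
    if {"Reconciling", "Stalled"} <= true_types:
        problems.append("Reconciling and Stalled are both present")
    # Rule 2: Ready can't be True while either abnormal condition is present.
    if true_types & {"Reconciling", "Stalled"} and "Ready" in true_types:
        problems.append("Ready is True while Reconciling or Stalled is present")
    return problems


print(check_conditions([
    {"type": "Reconciling", "status": "True"},
    {"type": "Ready", "status": "False"},
]))  # prints []
```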
Some general rules for the Reconciling condition:
- Reconciling condition can be set when a new generation of configuration is available. This can be detected by comparing the object generation and the status observedGeneration.
- Reconciling condition should be removed on a successful reconciliation, when the desired state matches the actual state.
- Reconciling condition can be set when a domain specific drift is detected. This is when an object depends on another object or system and they change, resulting in the object having to be reconciled to match the desired state.
- Reconciling condition should be set and persist across reconciliations during long running operations which may consist of multiple retries.
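As a sketch of the first and third rules, a controller could decide whether to set Reconciling like this. The reason NewGeneration and its message format are taken from the scenarios later in this document; the DriftDetected reason, the drift_detected flag and the function name are assumptions made for illustration only.

```python
def reconciling_condition(obj, drift_detected=False):
    """Return a Reconciling condition to set, or None if none is needed."""
    generation = obj["metadata"]["generation"]
    observed = obj.get("status", {}).get("observedGeneration")
    if observed != generation:
        # A new generation of configuration has not been reconciled yet.
        return {
            "type": "Reconciling",
            "status": "True",
            "reason": "NewGeneration",
            "message": f"Reconciling new generation {generation}",
            "observedGeneration": generation,
        }
    if drift_detected:
        # Hypothetical reason name for a domain specific drift.
        return {
            "type": "Reconciling",
            "status": "True",
            "reason": "DriftDetected",
            "message": "Reconciling detected drift",
            "observedGeneration": generation,
        }
    # Nothing to do: the condition should be absent.
    return None


obj = {"metadata": {"generation": 2}, "status": {"observedGeneration": 1}}
print(reconciling_condition(obj)["message"])  # prints Reconciling new generation 2
```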
Some general rules for the Stalled condition:
- Stalled condition can be set when the controller is sure about some misconfiguration that can't be resolved upon retry. Such situations require the desired configuration to be updated with a new generation.
- When Stalled condition is determined, it means that the provided object generation is reconciled successfully and it resulted in a stalled state. The observedGeneration can be updated to be equal to the current generation.
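A minimal sketch of these rules follows, assuming the object is a nested dictionary. mark_stalled is a hypothetical helper; a real controller would also manage fields such as lastTransitionTime, which are omitted here.

```python
def mark_stalled(obj, reason, message):
    """Record a Stalled state for the current generation of obj."""
    generation = obj["metadata"]["generation"]
    status = obj.setdefault("status", {})

    def condition(type_, value):
        return {
            "type": type_,
            "status": value,
            "reason": reason,
            "message": message,
            "observedGeneration": generation,
        }

    # Drop Reconciling (mutually exclusive with Stalled) and any stale
    # Stalled/Ready entries before adding the fresh ones.
    kept = [
        c for c in status.get("conditions", [])
        if c["type"] not in ("Reconciling", "Stalled", "Ready")
    ]
    status["conditions"] = kept + [
        condition("Stalled", "True"),
        condition("Ready", "False"),
    ]
    # The generation was fully processed, even though the outcome is a failure.
    status["observedGeneration"] = generation
    return obj


repo = {"metadata": {"generation": 1}}
mark_stalled(repo, "URLInvalid", "first path segment in URL cannot contain colon")
print(repo["status"]["observedGeneration"])  # prints 1
```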
Along with Ready, Reconciling and Stalled conditions, controllers may add extra conditions to provide extra information about the status of the objects.
Following are examples of a few scenarios with various status conditions. They are meant to show and describe the mechanics of status conditions and observedGeneration and how they behave with change in status. The examples are based on the flux helm repository reconciler.
NOTE: The scenarios are not strictly sequential. There may be cases where a scenario may appear to be based on the previous scenario.
Failure on first reconciliation:
status:
  conditions:
  - lastTransitionTime: "2021-12-17T11:53:35Z"
    message: Reconciling new generation 1
    observedGeneration: 1
    reason: NewGeneration
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2021-12-17T11:53:36Z"
    message: 'Failed to get secret ''default/my-secret'': Secret "my-secret" not found'
    observedGeneration: 1
    reason: AuthenticationFailed
    status: "False"
    type: Ready
  - lastTransitionTime: "2021-12-17T11:53:36Z"
    message: 'Failed to get secret ''default/my-secret'': Secret "my-secret" not found'
    observedGeneration: 1
    reason: AuthenticationFailed
    status: "True"
    type: FetchFailed
The above shows status conditions resulting from a failure in the first reconciliation of the provided configuration. Since it's the first generation of configuration and it failed to reconcile successfully, there's no status.observedGeneration set. The individual status conditions have observedGeneration with the value of the generation that caused those conditions to be set, in this case 1.
The Reconciling condition specifies that the reason for reconciliation is a new generation of object configuration.
The FetchFailed condition specifies the reason for failure.
The Ready condition specifies the actual cause of the overall failure, for which it shows the same reason and message as in the FetchFailed condition.
In this scenario, the failure seems to be due to a missing secret reference. The secret may have been deleted accidentally, or it may not have been created yet. So, the controller can retry looking for the secret until it becomes available. The object continues to be Reconciling.
If the object configuration is updated to refer to another secret that actually exists, it would result in a successful reconciliation. But since the configuration is updated, it'd result in a new generation. Following is the status after a successful reconciliation with new configuration:
status:
  artifact:
    checksum: 83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111
    lastUpdateTime: "2021-12-17T11:55:22Z"
    path: helmrepository/default/podinfo/index-83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111.yaml
    revision: 83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111
    url: http://localhost:9090/helmrepository/default/podinfo/index-83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111.yaml
  conditions:
  - lastTransitionTime: "2021-12-17T11:55:22Z"
    message: Stored artifact for revision '83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111'
    observedGeneration: 2
    reason: Succeeded
    status: "True"
    type: Ready
  observedGeneration: 2
  url: http://localhost:9090/helmrepository/default/podinfo/index.yaml
This status shows a normal state, without any abnormal conditions. All the abnormal conditions that existed before have been removed because they are no longer true; abnormal-true conditions are absent when things are normal. The Ready condition has an observedGeneration value of 2, which is also the value of status.observedGeneration. This means that the desired state matches the actual state.
Failure introduced by an update in the configuration.
status:
  artifact:
    checksum: 83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111
    lastUpdateTime: "2021-12-17T11:55:22Z"
    path: helmrepository/default/podinfo/index-83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111.yaml
    revision: 83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111
    url: http://localhost:9090/helmrepository/default/podinfo/index-83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111.yaml
  conditions:
  - lastTransitionTime: "2021-12-17T11:56:14Z"
    message: Reconciling new generation 3
    observedGeneration: 3
    reason: NewGeneration
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2021-12-17T11:56:14Z"
    message: 'Failed to get secret ''default/my-secret'': Secret "my-secret" not found'
    observedGeneration: 3
    reason: AuthenticationFailed
    status: "False"
    type: Ready
  - lastTransitionTime: "2021-12-17T11:56:14Z"
    message: 'Failed to get secret ''default/my-secret'': Secret "my-secret" not found'
    observedGeneration: 3
    reason: AuthenticationFailed
    status: "True"
    type: FetchFailed
  observedGeneration: 2
  url: http://localhost:9090/helmrepository/default/podinfo/index.yaml
The above shows status conditions similar to Scenario 1, but the status.observedGeneration is set to 2 and the observedGeneration in the individual conditions is 3. This means that the object was in a normal state in generation 2 of the configuration, and that reconciling generation 3 of the configuration resulted in an abnormal state. The reason for failure is similar to the failure in Scenario 1: a reference to a secret that does not exist. The new generation may have updated the secret reference.
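The gap between the two observedGeneration values can be detected programmatically. The sketch below assumes the status is available as a dictionary and that generations start at 1; pending_generation is a hypothetical helper name.

```python
def pending_generation(status):
    """Return the generation seen in conditions that is ahead of
    status.observedGeneration, or None if there is no such generation."""
    observed = status.get("observedGeneration", 0)
    newest = max(
        (c.get("observedGeneration", 0) for c in status.get("conditions", [])),
        default=0,
    )
    return newest if newest > observed else None


status = {
    "observedGeneration": 2,
    "conditions": [
        {"type": "Reconciling", "status": "True", "observedGeneration": 3},
        {"type": "Ready", "status": "False", "observedGeneration": 3},
        {"type": "FetchFailed", "status": "True", "observedGeneration": 3},
    ],
}
print(pending_generation(status))  # prints 3
```

Applied to the status from Scenario 1, which has no status.observedGeneration, the same function would return 1.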
Initially, when everything is normal, status contains the happy conditions:
status:
  artifact:
    checksum: 83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111
    lastUpdateTime: "2021-12-17T11:57:10Z"
    path: helmrepository/default/podinfo/index-83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111.yaml
    revision: 83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111
    url: http://localhost:9090/helmrepository/default/podinfo/index-83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111.yaml
  conditions:
  - lastTransitionTime: "2021-12-17T11:57:10Z"
    message: Stored artifact for revision '83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111'
    observedGeneration: 1
    reason: Succeeded
    status: "True"
    type: Ready
  observedGeneration: 1
  url: http://localhost:9090/helmrepository/default/podinfo/index.yaml
In this example, the object depends on another object, a referenced secret, to perform the reconciliation. If the secret gets deleted without any change in this object's configuration, the next reconciliation of this object will fail. The following shows an example of the status due to the failure:
status:
  artifact:
    checksum: 83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111
    lastUpdateTime: "2021-12-17T11:57:10Z"
    path: helmrepository/default/podinfo/index-83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111.yaml
    revision: 83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111
    url: http://localhost:9090/helmrepository/default/podinfo/index-83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111.yaml
  conditions:
  - lastTransitionTime: "2021-12-17T11:58:10Z"
    message: 'Failed to get secret ''default/my-secret'': Secret "my-secret" not found'
    observedGeneration: 1
    reason: AuthenticationFailed
    status: "False"
    type: Ready
  - lastTransitionTime: "2021-12-17T11:58:10Z"
    message: 'Failed to get secret ''default/my-secret'': Secret "my-secret" not found'
    observedGeneration: 1
    reason: AuthenticationFailed
    status: "True"
    type: FetchFailed
  observedGeneration: 1
  url: http://localhost:9090/helmrepository/default/podinfo/index.yaml
The failure seems to be due to the same reason as in the previous scenarios. But in this case, the object wasn't updated. Due to this, the status.observedGeneration value and the observedGeneration value in the conditions are equal. The current generation of the configuration was reconciled successfully previously, but a periodic reconciliation of the same configuration has since failed. This specific failure is not due to a domain specific drift (for example, an update in the remote source that this controller observes, which would require rebuilding the artifact), so there's no Reconciling status condition. There are only the Ready and FetchFailed conditions.
Instead of the above failure, if a domain specific drift was detected, that would have resulted in a Reconciling condition being added.
When the value in the configuration is invalid, the object enters into a stalled state. Usually, invalid values can be caught by object schema validation and validating webhooks, but just for the sake of this example, the provided configuration has some bad values, resulting in the following status:
status:
  conditions:
  - lastTransitionTime: "2021-12-17T12:00:22Z"
    message: 'parse "https+$://stefanprodan.github.io/podinfo": first path segment
      in URL cannot contain colon'
    observedGeneration: 1
    reason: URLInvalid
    status: "True"
    type: Stalled
  - lastTransitionTime: "2021-12-17T12:00:22Z"
    message: 'Invalid Helm repository URL: parse "https+$://stefanprodan.github.io/podinfo":
      first path segment in URL cannot contain colon'
    observedGeneration: 1
    reason: URLInvalid
    status: "False"
    type: Ready
  - lastTransitionTime: "2021-12-17T12:00:22Z"
    message: 'Invalid Helm repository URL: parse "https+$://stefanprodan.github.io/podinfo":
      first path segment in URL cannot contain colon'
    observedGeneration: 1
    reason: URLInvalid
    status: "True"
    type: FetchFailed
  observedGeneration: 1
This shows that the object is in the Stalled state, with a Stalled condition.
FetchFailed shows the reason for the failure.
Stalled condition shows the actual reason for the failure, similar to the FetchFailed condition.
Ready condition shows the overall reason for failure based on the FetchFailed condition.
If we compare this with Scenario 1 where the first generation of configuration failed, this status has status.observedGeneration set to 1, which is also the observedGeneration of each of the conditions. This means that the controller has successfully processed the given configuration generation and it resulted in a stalled state. Retrying the same configuration will not resolve this failure. A new generation of configuration is needed to resolve this failure.
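To tie the scenarios together, an observer could classify a status roughly as follows. This is only a sketch over plain dictionaries; the summarize function and its return labels are hypothetical and are not the values returned by the kstatus packages.

```python
def summarize(status):
    """Classify a status by its Ready, Reconciling and Stalled conditions."""
    state = {c["type"]: c["status"] for c in status.get("conditions", [])}
    if state.get("Stalled") == "True":
        # Retrying won't help; a new generation of configuration is needed.
        return "stalled"
    if state.get("Reconciling") == "True":
        # The controller is still working on, or retrying, a change.
        return "reconciling"
    if state.get("Ready") == "True":
        return "ready"
    # Failed without a pending change, e.g. a dependency went missing.
    return "not ready"


stalled = {
    "observedGeneration": 1,
    "conditions": [
        {"type": "Stalled", "status": "True", "reason": "URLInvalid"},
        {"type": "Ready", "status": "False", "reason": "URLInvalid"},
        {"type": "FetchFailed", "status": "True", "reason": "URLInvalid"},
    ],
}
print(summarize(stalled))  # prints stalled
```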