This alert fires when no test-operator pod in the Running state has been detected for 10 minutes.
The test-operator is the first Operator to start in a cluster. Its primary responsibilities include the following:
The test-operator deployment has 2 replicas by default.
This alert indicates a cluster-level failure. Critical cluster-wide management functionality, such as certificate rotation, upgrades, and reconciliation of controllers, might not be available.
The test-operator is directly responsible for managing the resources in the cluster. Therefore, its temporary unavailability significantly affects the workloads.
- Set the NAMESPACE environment variable:
  $ export NAMESPACE="$(kubectl get tests -A \
      -o custom-columns="":.metadata.namespace)"
- Check the status of the test-operator deployment:
  $ kubectl -n $NAMESPACE get deploy test-operator -o yaml
- Obtain the details of the test-operator deployment:
  $ kubectl -n $NAMESPACE describe deploy test-operator
- Check the status of the test-operator pods:
  $ kubectl get pods -n $NAMESPACE -l=app=test-operator
- Check for node issues, such as a NotReady state:
  $ kubectl get nodes
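The pod and node checks above can be condensed into counts that are easier to scan on a large cluster. The following is a minimal sketch, not part of the official procedure: the helper names count_running_pods and count_not_ready_nodes are hypothetical, and both assume the default column layout that kubectl prints with --no-headers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for summarizing the diagnosis output.

# Count pods whose STATUS column is "Running".
# Reads `kubectl get pods --no-headers` output on stdin
# (assumed columns: NAME READY STATUS RESTARTS AGE).
count_running_pods() {
  awk 'NF && $3 == "Running" { n++ } END { print n + 0 }'
}

# Count nodes whose STATUS column does not start with "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin
# (assumed columns: NAME STATUS ROLES AGE VERSION).
# "Ready,SchedulingDisabled" is treated as ready; "NotReady" is not.
count_not_ready_nodes() {
  awk 'NF && $2 !~ /^Ready/ { n++ } END { print n + 0 }'
}

# Example usage against a live cluster (requires NAMESPACE to be set):
#   kubectl get pods -n "$NAMESPACE" -l app=test-operator --no-headers | count_running_pods
#   kubectl get nodes --no-headers | count_not_ready_nodes
```

A result of 0 from count_running_pods matches the condition that this alert describes. Note that a pod in the Running state is not necessarily ready, so the READY column is still worth checking by eye.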
Based on the information obtained during the diagnosis procedure, try to find the root cause and resolve the issue.
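When the deployment and pod status alone do not reveal the root cause, recent events and pod logs are a common next place to look. The following sketch is an assumption about what is useful here, not a documented mitigation step: collect_triage_info is a hypothetical helper name, and it reuses the app=test-operator label from the diagnosis steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print recent events and operator pod logs
# for the namespace passed as the first argument.
collect_triage_info() {
  local ns="$1"
  if [ -z "$ns" ]; then
    echo "usage: collect_triage_info <namespace>" >&2
    return 1
  fi
  # Events sorted by timestamp often show scheduling or image pull failures.
  kubectl -n "$ns" get events --sort-by=.lastTimestamp
  # Last 50 log lines from each pod that matches the operator label,
  # with each line prefixed by its source pod.
  kubectl -n "$ns" logs -l app=test-operator --tail=50 --prefix
}

# Example usage (requires NAMESPACE to be set):
#   collect_triage_info "$NAMESPACE"
```

If no test-operator pods exist at all, the logs command returns nothing, and the events output is the more likely source of clues.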