Level | Definition | Issues | Key Metrics |
---|---|---|---|
Application | Application monitoring is the process of tracking the health and performance of the applications running inside the pods in a Kubernetes cluster | Application bugs, crashes, or exceptions that affect the application functionality or reliability - Application performance issues that affect the application responsiveness or scalability - Application quality issues that affect the application usability or user satisfaction - Application business metrics that measure the application value or impact | Application availability - Application performance - Application quality - Application business |
Cluster | Cluster monitoring is the process of tracking the health and performance of an entire Kubernetes cluster | Nodes that are unreachable, unresponsive, or under high load - Pods that are failing, crashing, or restarting frequently - Insufficient or overprovisioned resources for the cluster or its namespaces - Bottlenecks or anomalies in the cluster network traffic - Errors or failures in the cluster control plane components, such as the API server, scheduler, controller manager, and etcd | Cluster availability - Cluster capacity - Cluster utilization - Cluster saturation - Cluster errors |
Control Plane | Control plane monitoring is the process of tracking the health and performance of the control plane components in a Kubernetes cluster | Errors or failures in the control plane components, such as the API server, scheduler, controller manager, and etcd | Control plane availability - Control plane performance - Control plane errors |
Etcd | Etcd monitoring is the process of tracking the health and performance of the etcd store in a Kubernetes cluster | Etcd version mismatches, unexpected leader elections, slow queries, disk usage, network issues | Etcd availability - Etcd performance - Etcd consistency - Etcd errors |
Node | Node monitoring is the process of tracking the health and performance of individual nodes in a Kubernetes cluster | Nodes that are unreachable, unresponsive, or under high load - Nodes that have insufficient or overprovisioned resources - Nodes that are experiencing hardware or software issues | Node availability - Node capacity - Node utilization - Node errors |
Pod | Pod monitoring is the process of tracking the health and performance of individual pods in a Kubernetes cluster | Pods that are unable to start, terminate, or scale due to configuration errors, resource constraints, or scheduling conflicts - Pods that are experiencing high latency, low throughput, or poor quality of service due to network issues or application errors - Pods that are consuming more resources than expected or allocated due to inefficient code or resource leaks - Pods that are vulnerable to security threats or compliance violations due to misconfigured policies or permissions | Pod availability - Pod utilization - Pod saturation - Pod errors |
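
To make the Pod row more concrete, here is a minimal Python sketch of how two of its issues ("restarting frequently" and "consuming more resources than expected or allocated") might be flagged from already-collected samples. The `PodSample` type, the `flag_pods` helper, the sample data, and both thresholds are hypothetical and chosen only for illustration; the sketch does not query a real cluster or use any Kubernetes client library.

```python
from dataclasses import dataclass


@dataclass
class PodSample:
    """Hypothetical snapshot of one pod over an observation window."""
    name: str
    restarts: int      # container restarts seen in the window
    cpu_used: float    # CPU cores currently in use
    cpu_limit: float   # CPU cores allocated to the pod


def utilization(used: float, limit: float) -> float:
    """Fraction of the allocation in use; 0.0 when no limit is set."""
    return used / limit if limit > 0 else 0.0


def flag_pods(samples, max_restarts=3, max_utilization=0.9):
    """Map pod name -> list of issues. Thresholds are illustrative assumptions."""
    findings = {}
    for s in samples:
        issues = []
        if s.restarts > max_restarts:
            issues.append("restarting frequently")
        if utilization(s.cpu_used, s.cpu_limit) > max_utilization:
            issues.append("high CPU utilization")
        if issues:
            findings[s.name] = issues
    return findings


# Made-up sample data, not taken from any real cluster.
samples = [
    PodSample("web-1", restarts=0, cpu_used=0.2, cpu_limit=1.0),
    PodSample("web-2", restarts=7, cpu_used=0.95, cpu_limit=1.0),
    PodSample("batch-1", restarts=1, cpu_used=1.9, cpu_limit=2.0),
]

findings = flag_pods(samples)
for pod, issues in findings.items():
    print(f"{pod}: {', '.join(issues)}")
```

The same pattern (compare a measured value against capacity or a threshold) would apply to the node and cluster utilization metrics in the table; in practice the inputs would come from whatever metrics pipeline the cluster already uses.
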
Created
September 5, 2023 02:20
Overview Table: Monitoring in Kubernetes