@EliFuzz
Created September 5, 2023 02:20
Overview Table: Monitoring in Kubernetes
| Level | Definition | Issues | Key Metrics |
|---|---|---|---|
| Application | Application monitoring is the process of tracking the health and performance of the applications running inside the pods in a Kubernetes cluster. | - Application bugs, crashes, or exceptions that affect functionality or reliability<br>- Performance issues that affect responsiveness or scalability<br>- Quality issues that affect usability or user satisfaction<br>- Business metrics that measure the application's value or impact | - Application availability<br>- Application performance<br>- Application quality<br>- Application business metrics |
| Cluster | Cluster monitoring is the process of tracking the health and performance of an entire Kubernetes cluster. | - Nodes that are unreachable, unresponsive, or under high load<br>- Pods that are failing, crashing, or restarting frequently<br>- Insufficient or overprovisioned resources for the cluster or its namespaces<br>- Bottlenecks or anomalies in cluster network traffic<br>- Errors or failures in control plane components, such as the API server, scheduler, controller manager, and etcd | - Cluster availability<br>- Cluster capacity<br>- Cluster utilization<br>- Cluster saturation<br>- Cluster errors |
| Control Plane | Control plane monitoring is the process of tracking the health and performance of the control plane components in a Kubernetes cluster. | - Errors or failures in control plane components, such as the API server, scheduler, controller manager, and etcd | - Control plane availability<br>- Control plane performance<br>- Control plane errors |
| etcd | etcd monitoring is the process of tracking the health and performance of the etcd key-value store that backs a Kubernetes cluster. | - etcd version mismatches<br>- Unexpected leader elections<br>- Slow queries<br>- High disk usage<br>- Network issues | - etcd availability<br>- etcd performance<br>- etcd consistency<br>- etcd errors |
| Node | Node monitoring is the process of tracking the health and performance of individual nodes in a Kubernetes cluster. | - Nodes that are unreachable, unresponsive, or under high load<br>- Nodes with insufficient or overprovisioned resources<br>- Nodes experiencing hardware or software issues | - Node availability<br>- Node capacity<br>- Node utilization<br>- Node errors |
| Pod | Pod monitoring is the process of tracking the health and performance of individual pods in a Kubernetes cluster. | - Pods that cannot start, terminate, or scale due to configuration errors, resource constraints, or scheduling conflicts<br>- Pods with high latency, low throughput, or poor quality of service due to network issues or application errors<br>- Pods that consume more resources than expected or allocated due to inefficient code or resource leaks<br>- Pods that are vulnerable to security threats or compliance violations due to misconfigured policies or permissions | - Pod availability<br>- Pod utilization<br>- Pod saturation<br>- Pod errors |
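As a rough illustration of the *utilization* and *capacity* metrics in the Node and Cluster rows, the sketch below computes CPU utilization per node and across the cluster from point-in-time samples. It then flags nodes that look either under high load or overprovisioned. `NodeSample`, `classify_nodes`, and the 85% / 20% thresholds are hypothetical. In practice the numbers would come from a metrics pipeline such as metrics-server or Prometheus, which is assumed here rather than shown.

```python
from dataclasses import dataclass


@dataclass
class NodeSample:
    """Hypothetical point-in-time CPU snapshot of one node, in millicores."""
    name: str
    allocatable_mcpu: int  # CPU the node makes available to pods
    used_mcpu: int         # CPU actually consumed by pods on the node


def utilization(sample: NodeSample) -> float:
    """Node utilization: fraction of allocatable CPU currently in use."""
    return sample.used_mcpu / sample.allocatable_mcpu


def cluster_utilization(samples: list) -> float:
    """Cluster utilization: total usage divided by total allocatable capacity."""
    total_allocatable = sum(s.allocatable_mcpu for s in samples)
    total_used = sum(s.used_mcpu for s in samples)
    return total_used / total_allocatable


def classify_nodes(samples: list, high: float = 0.85, low: float = 0.20) -> dict:
    """Label each node as high-load, overprovisioned, or ok.

    The 0.85 / 0.20 thresholds are illustrative assumptions, not Kubernetes defaults.
    """
    labels = {}
    for s in samples:
        u = utilization(s)
        if u >= high:
            labels[s.name] = "high-load"
        elif u <= low:
            labels[s.name] = "overprovisioned"
        else:
            labels[s.name] = "ok"
    return labels


nodes = [
    NodeSample("node-a", allocatable_mcpu=4000, used_mcpu=3600),
    NodeSample("node-b", allocatable_mcpu=4000, used_mcpu=400),
    NodeSample("node-c", allocatable_mcpu=8000, used_mcpu=4000),
]
print(classify_nodes(nodes))
# {'node-a': 'high-load', 'node-b': 'overprovisioned', 'node-c': 'ok'}
print(f"cluster utilization: {cluster_utilization(nodes):.2f}")
# cluster utilization: 0.50
```

Dividing totals, rather than averaging the per-node ratios, weights each node by its capacity. As a result, a large idle node pulls cluster utilization down more than a small one would.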
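The pod-level issues in the Cluster and Pod rows ("failing, crashing, or restarting frequently", "unable to start") can be made concrete in a similar way. This minimal sketch compares two snapshots of pod status. It flags pods that are crash-looping, that restarted at least a threshold number of times within the window, or that are stuck in `Pending`. The dictionaries mimic a heavily trimmed subset of the Pod objects returned by `kubectl get pods -o json` (`status.phase`, `status.containerStatuses[].restartCount`, `state.waiting.reason`). `make_pod`, `problem_pods`, and the threshold of 3 restarts are hypothetical.

```python
from typing import Optional


def make_pod(name: str, phase: str, restarts: Optional[int] = None,
             waiting_reason: Optional[str] = None) -> dict:
    """Build a dict shaped like a heavily trimmed Pod object (hypothetical helper)."""
    status = {"phase": phase}
    if restarts is not None:
        # A container is either waiting (with a reason) or, here, simply running.
        state = {"waiting": {"reason": waiting_reason}} if waiting_reason else {"running": {}}
        status["containerStatuses"] = [{"restartCount": restarts, "state": state}]
    return {"metadata": {"name": name}, "status": status}


def restart_totals(pod_list: dict) -> dict:
    """Sum restartCount across all containers of each pod, keyed by pod name."""
    totals = {}
    for pod in pod_list["items"]:
        statuses = pod["status"].get("containerStatuses", [])
        totals[pod["metadata"]["name"]] = sum(c.get("restartCount", 0) for c in statuses)
    return totals


def problem_pods(before: dict, after: dict, max_restarts: int = 3) -> dict:
    """Flag pods that are crash-looping, restarted too often between snapshots, or Pending."""
    earlier = restart_totals(before)
    current = restart_totals(after)
    problems = {}
    for pod in after["items"]:
        name = pod["metadata"]["name"]
        statuses = pod["status"].get("containerStatuses", [])
        reasons = [c["state"]["waiting"].get("reason")
                   for c in statuses if "waiting" in c.get("state", {})]
        delta = current[name] - earlier.get(name, 0)
        if "CrashLoopBackOff" in reasons:
            problems[name] = "crash-looping"
        elif delta >= max_restarts:
            problems[name] = f"restarted {delta} times in window"
        elif pod["status"].get("phase") == "Pending":
            problems[name] = "pending (not started)"
    return problems


before = {"items": [
    make_pod("api-1", "Running", restarts=1),
    make_pod("worker-1", "Running", restarts=2),
    make_pod("cache-1", "Running", restarts=0),
]}
after = {"items": [
    make_pod("api-1", "Running", restarts=1),
    make_pod("worker-1", "Running", restarts=6),
    make_pod("cache-1", "Running", restarts=4, waiting_reason="CrashLoopBackOff"),
    make_pod("batch-1", "Pending"),
]}
print(problem_pods(before, after))
# {'worker-1': 'restarted 4 times in window', 'cache-1': 'crash-looping', 'batch-1': 'pending (not started)'}
```

A real check would also account for how long a pod has been `Pending`, because pods briefly sit in that phase while they are scheduled and their images are pulled.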