@iamabhishek-dubey
Last active January 19, 2018 06:09
Prometheus Rule File for POD Failure

Here are four alternative Prometheus rule files for detecting Pod failure on OpenShift or Kubernetes.

  1. First:
groups:
- name: pod.rules
  rules:
  - alert: Pod is Failed
    # Fires when the per-pod failure rate over the last minute is above zero.
    expr: sum by (pod_name) (rate(container_total_failed{image!="",name=~"^k8s_.*"}[1m])) > 0
    for: 1m
    labels:
      severity: warning
    annotations:
      description: A pod has failed
      summary: Pod Status is Failed
  2. Second:
groups:
- name: pod.rules
  rules:
  - alert: Pod is Failed
    expr: sum(container_tasks_state{container_name!="POD",pod_name!="",state!="failed",status!="true"}) > 0
    for: 1m
    labels:
      severity: warning
    annotations:
      description: A pod has failed
      summary: Pod Status is Failed
  3. Third:
groups:
- name: pod.rules
  rules:
  - alert: Pod is Failed
    expr: absent(((time() - container_last_seen{name=""}) < 5))
    for: 1m
    labels:
      severity: warning
    annotations:
      description: A pod has failed
      summary: Pod Status is Failed
  4. Fourth:
groups:
- name: pod.rules
  rules:
  - alert: Pod is Failed
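    # Group by namespace and pod so the labels used in the annotations survive the aggregation.
    expr: sum by (namespace, pod) (kube_pod_status_phase{phase="Failed"}) > 0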
    for: 1m
    labels:
      severity: warning
    annotations:
      description: "{{$labels.pod}} in {{$labels.namespace}} is Failed!"
      summary: Pod Status is Failed
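
Prometheus only evaluates a rule file once it is listed under `rule_files` in the main configuration. The fragment below is a minimal sketch of that wiring, not the author's setup: the filename `pod.rules.yml` and the 1m evaluation interval are assumptions chosen for illustration.

```yaml
# prometheus.yml (fragment)
global:
  evaluation_interval: 1m   # assumed value; how often rules are evaluated

rule_files:
  - "pod.rules.yml"         # hypothetical filename holding ONE of the rule files above
```

Since all four rule files share the group name `pod.rules` and nearly identical alert names, it is probably simplest to load one of them at a time, or to rename the groups and alerts before combining them. Running `promtool check rules pod.rules.yml` before reloading Prometheus should catch syntax errors in the chosen file.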