@iamsudip
Created November 26, 2020 20:19
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: kube-prometheus-stack
    release: prometheus
  name: job-failure-test
spec:
  groups:
    - name: CronJobAlert
      rules:
        # For each CronJob, keep only the Job(s) with the latest start time.
        # Relies on Jobs carrying a `cronjob` label (label_cronjob in kube_job_labels).
        - record: job_cronjob:kube_job_status_start_time:max
          expr: |-
            label_replace(
              label_replace(
                max(
                  kube_job_status_start_time
                  * ON(exported_job) GROUP_RIGHT()
                  kube_job_labels{label_cronjob!=""}
                ) BY (exported_job, label_cronjob)
                == ON(label_cronjob) GROUP_LEFT()
                max(
                  kube_job_status_start_time
                  * ON(exported_job) GROUP_RIGHT()
                  kube_job_labels{label_cronjob!=""}
                ) BY (label_cronjob),
                "job", "$1", "exported_job", "(.+)"),
              "cronjob", "$1", "label_cronjob", "(.+)")
        # Failed count of the most recent Job per CronJob. clamp_max(..., 1)
        # turns the start timestamp into 1, so the product is kube_job_status_failed.
        - record: job_cronjob:kube_job_status_failed:sum
          expr: |-
            clamp_max(
              job_cronjob:kube_job_status_start_time:max,
              1)
            * ON(job) GROUP_LEFT()
            label_replace(
              label_replace(
                (kube_job_status_failed != 0),
                "job", "$1", "exported_job", "(.+)"),
              "cronjob", "$1", "label_cronjob", "(.+)")
        # Fire when the latest Job of a CronJob has reported failures for 1m.
        # The join carries over the labels from kube_cronjob_labels.
        - alert: CronJobFailed
          labels:
            severity: error
          for: 1m
          expr: |-
            job_cronjob:kube_job_status_failed:sum
            * ON(cronjob) GROUP_RIGHT()
            kube_cronjob_labels
            > 0