@stefanprodan
Last active January 8, 2020 05:17
prom-alerts
apiVersion: alerting.cloud.weave.works/v1
kind: PrometheusRule
metadata:
  name: node-rules
spec:
  groups:
    - name: node-alerts
      rules:
        - alert: NodeDown
          expr: up{job="kubernetes-nodes"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Kubernetes node is not available"
            impact: "The cluster will be operating at a reduced capacity for scheduling workloads."
            detail: "Node: {{$labels.instance}}"
    - name: clock-alerts
      rules:
        - alert: ClockSkewiff
          expr: abs(node_ntp_drift_seconds) > 15
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Clock is out of sync with NTP."
            impact: "Random things are about to break for our users"
            detail: "Node: {{$labels.node}} is {{$value}} seconds off"
        - alert: ClockSyncBroken
          expr: node_timex_sync_status != 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "The clock is not being synced."
            impact: "Random things are about to break for our users"
            detail: "Node: {{$labels.node}}"
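
# One possible way to check the NodeDown rule before deploying it is a promtool
# unit test. The sketch below is not part of the original gist. It assumes the
# contents of spec.groups have been copied into a plain Prometheus rules file,
# because promtool reads rule files with a top-level `groups:` key rather than
# this custom resource. The file name node-rules.yml and the instance value
# node-1 are hypothetical.
#
# rule_files:
#   - node-rules.yml          # hypothetical file holding the groups above
# evaluation_interval: 1m
# tests:
#   - interval: 1m
#     input_series:
#       # The target reports up == 0 from 0m through 10m.
#       - series: 'up{job="kubernetes-nodes", instance="node-1"}'
#         values: '0x10'
#     alert_rule_test:
#       # The rule has `for: 5m`, so the alert should be firing by 6m.
#       - eval_time: 6m
#         alertname: NodeDown
#         exp_alerts:
#           - exp_labels:
#               severity: critical
#               job: kubernetes-nodes
#               instance: node-1
#             exp_annotations:
#               summary: "Kubernetes node is not available"
#               impact: "The cluster will be operating at a reduced capacity for scheduling workloads."
#               detail: "Node: node-1"
#
# Under those assumptions, the test file could be run with:
#   promtool test rules <test-file>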