@rewanthtammana
Created December 20, 2022 08:58
groups:
  - name: loki
    rules:
      # Rules inspired by loki-mixin: https://github.com/grafana/loki/blob/main/production/loki-mixin-compiled/alerts.yaml
      - alert: LokiRequestErrors
        annotations:
          description: This alert fires when more than 10% of Loki requests on a route return a 5xx status code.
        expr: |
          100 * sum(rate(loki_request_duration_seconds_count{status_code=~"5.."}[1m])) by (namespace, job, route)
            /
          sum(rate(loki_request_duration_seconds_count[1m])) by (namespace, job, route)
            > 10
        for: 120m
        labels:
          area: services
          cancel_if_apiserver_down: "true"
          cancel_if_cluster_status_creating: "true"
          cancel_if_cluster_status_deleting: "true"
          cancel_if_cluster_status_updating: "true"
          cancel_if_scrape_timeout: "true"
          cancel_if_outside_working_hours: "false"
          severity: page
          topic: observability
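      # How the LokiRequestErrors expression above evaluates, with illustrative
      # numbers (not measurements): if one route of a Loki job serves 50 req/s
      # and 6 req/s of those return a 5xx status code, the expression yields
      # 100 * 6 / 50 = 12. Since 12 > 10 the alert goes pending, and it fires
      # only if the ratio stays above 10 at every evaluation for the whole 120m
      # "for" period. Because both sums keep (namespace, job, route), each route
      # is judged on its own error ratio.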
      - alert: LokiRequestPanics
        annotations:
          description: This alert checks that we have no panic errors on Loki.
        expr: |
          sum(increase(loki_panic_total[10m])) by (namespace, job) > 0
        labels:
          area: services
          cancel_if_apiserver_down: "true"
          cancel_if_cluster_status_creating: "true"
          cancel_if_cluster_status_deleting: "true"
          cancel_if_cluster_status_updating: "true"
          cancel_if_scrape_timeout: "true"
          cancel_if_outside_working_hours: "false"
          severity: page
          topic: observability
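      # How the LokiRequestPanics expression above evaluates (illustrative):
      # loki_panic_total is a counter, so increase(loki_panic_total[10m])
      # estimates how much it grew over the last 10 minutes, and
      # sum(...) by (namespace, job) adds that up per job. Any increment of the
      # counter inside the window makes the sum > 0. With no "for" clause the
      # alert fires at the first evaluation that sees it, and it stops firing
      # roughly 10 minutes later, once no increment remains inside the window
      # and increase() returns 0 again.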
      - alert: LokiRingUnhealthy
        annotations:
          description: 'Loki pod {{ $labels.pod }} (namespace {{ $labels.namespace }}) sees {{ $value }} unhealthy ring members'
        expr: |
          sum(min_over_time(cortex_ring_members{state="Unhealthy"}[10m])) by (app, container, name, namespace, pod) > 0
        labels:
          area: services
          cancel_if_apiserver_down: "true"
          cancel_if_cluster_status_creating: "true"
          cancel_if_cluster_status_deleting: "true"
          cancel_if_cluster_status_updating: "true"
          cancel_if_scrape_timeout: "true"
          cancel_if_outside_working_hours: "true"
          severity: page
          topic: observability
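
One way to check the LokiRingUnhealthy rule before deploying it is a promtool unit test. The file below is a hypothetical sketch: it assumes the rules above are saved as a valid rule file named `loki-alerts.yaml`, and the input series and its label values (`app`, `container`, `name`, `pod`) are made up for illustration. It shows the effect of `min_over_time(...[10m])`: the pod's series must report at least one Unhealthy member in every sample of the last 10 minutes, so the rule stays quiet while the window still contains a healthy sample and fires once it no longer does.

```yaml
# loki-alerts-test.yaml (hypothetical filename)
rule_files:
  - loki-alerts.yaml  # assumed name of the rule file above

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Hypothetical ingester pod: 0 unhealthy members at t=0, then 1 from t=1m to t=15m.
      - series: 'cortex_ring_members{state="Unhealthy", app="loki", container="ingester", name="ingester", namespace="loki", pod="loki-ingester-0"}'
        values: '0 1x14'
    alert_rule_test:
      # At 5m the 10m window still contains the 0 sample, so min_over_time is 0: no alert.
      - eval_time: 5m
        alertname: LokiRingUnhealthy
        exp_alerts: []
      # At 12m every sample in the window is 1, so the alert fires.
      - eval_time: 12m
        alertname: LokiRingUnhealthy
        exp_alerts:
          - exp_labels:
              app: loki
              container: ingester
              name: ingester
              namespace: loki
              pod: loki-ingester-0
              area: services
              cancel_if_apiserver_down: "true"
              cancel_if_cluster_status_creating: "true"
              cancel_if_cluster_status_deleting: "true"
              cancel_if_cluster_status_updating: "true"
              cancel_if_scrape_timeout: "true"
              cancel_if_outside_working_hours: "true"
              severity: page
              topic: observability
            exp_annotations:
              description: 'Loki pod loki-ingester-0 (namespace loki) sees 1 unhealthy ring members'
```

Run it with `promtool test rules loki-alerts-test.yaml`. promtool compares the complete label and annotation sets of each firing alert (it adds `alertname` itself), which is why every static label from the rule is repeated under `exp_labels`.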