@vyta
Created June 27, 2018 21:00
Alerts with Prometheus

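Installing Prometheus with the Helm chart makes it particularly simple to get going: https://github.com/kubernetes/charts/tree/master/stable/prometheus.

After that, only two things are left to configure:

  1. Creating the alert (what is sent and where, configured in Alertmanager)
  2. Creating the alert rules (when alerts fire, configured in the Prometheus server)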

Creating the alert

What kind of alert do you want, and where should it go? This is defined in the Alertmanager ConfigMap, which the chart builds from the alertmanagerFiles.alertmanager.yml section of its values.yaml file. The chart's default looks like this:

alertmanagerFiles:
  alertmanager.yml:
    global: {}
      # slack_api_url: ''

    receivers:
      - name: default-receiver
        # slack_configs:
        #  - channel: '@you'
        #    send_resolved: true

    route:
      group_wait: 10s
      group_interval: 5m
      receiver: default-receiver
      repeat_interval: 3h
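The three timers in route interact in a way that is easy to misread. The sketch below is a simplified model, not Alertmanager's scheduler. It assumes a single alert group whose set of firing alerts never changes. Under that assumption, the first notification goes out after group_wait, the group is then re-checked every group_interval, and an unchanged group is re-sent only once more than repeat_interval has passed. notification_times is a hypothetical helper written for this illustration.

```python
from datetime import timedelta


def notification_times(group_wait, group_interval, repeat_interval, horizon):
    """Simplified model: when is ONE unchanging alert group notified?

    Assumptions of this sketch (not Alertmanager internals):
    - the first notification is sent after group_wait;
    - the group is re-checked every group_interval after that;
    - an unchanged group is re-sent only when MORE than repeat_interval
      has elapsed since the last notification.
    """
    sent = [group_wait]
    tick = group_wait + group_interval
    while tick <= horizon:
        if tick - sent[-1] > repeat_interval:
            sent.append(tick)
        tick += group_interval
    return sent


# Timer values taken from the default route above.
times = notification_times(
    group_wait=timedelta(seconds=10),
    group_interval=timedelta(minutes=5),
    repeat_interval=timedelta(hours=3),
    horizon=timedelta(hours=7),
)
print([str(t) for t in times])  # ['0:00:10', '3:05:10', '6:10:10']
```

In this model, resends can only happen on a group_interval tick. A 3h repeat_interval therefore produces a notification roughly every 3h05m rather than exactly every 3 hours.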

An email alert, for example, might look like this:

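alertmanagerFiles:
  alertmanager.yml:
    global:
      # host:port of the SMTP relay; no URL scheme such as http://
      smtp_smarthost: 'example-mail-service:25'
      smtp_from: 'example@example.co'
      # smtp_require_tls defaults to true; uncomment if the relay does not support STARTTLS
      # smtp_require_tls: false

    receivers:
      - name: default-receiver
        email_configs:
          - to: 'example-receiver@example.co'

    route:
      group_wait: 10s
      group_interval: 5m
      receiver: default-receiver
      repeat_interval: 3h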
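Two mistakes are easy to make in this kind of config. The first is pointing a route's receiver at a name that is not defined under receivers, which Alertmanager rejects. The second is writing smtp_smarthost as a URL instead of host:port. The sketch below checks for both in the contents of alertmanager.yml (what sits under alertmanagerFiles). It assumes the YAML has already been parsed into a Python dict, for example with a YAML library, and leaves out the loading step. check_alertmanager_config is a hypothetical helper. For a real validation, Alertmanager ships `amtool check-config`.

```python
def check_alertmanager_config(cfg):
    """Return a list of problems found in a parsed alertmanager.yml.

    Hypothetical helper for illustration only; it covers just two checks.
    """
    problems = []

    # 1. Every receiver named anywhere in the route tree must be defined.
    defined = {r["name"] for r in (cfg.get("receivers") or [])}

    def walk(route):
        name = route.get("receiver")
        if name is not None and name not in defined:
            problems.append(f"undefined receiver {name!r} used in route")
        for child in route.get("routes") or []:
            walk(child)

    walk(cfg.get("route") or {})

    # 2. smtp_smarthost is host:port, so a URL scheme or missing port is a mistake.
    smarthost = (cfg.get("global") or {}).get("smtp_smarthost")
    if smarthost is not None:
        host, sep, port = smarthost.rpartition(":")
        if "://" in smarthost or not sep or not host or not port.isdigit():
            problems.append(f"smtp_smarthost {smarthost!r} is not host:port")

    return problems


# An email config with a URL-style smarthost (a deliberately broken value).
config = {
    "global": {
        "smtp_smarthost": "http://example-mail-service:25",
        "smtp_from": "example@example.co",
    },
    "receivers": [
        {"name": "default-receiver",
         "email_configs": [{"to": "example-receiver@example.co"}]},
    ],
    "route": {"receiver": "default-receiver", "group_wait": "10s",
              "group_interval": "5m", "repeat_interval": "3h"},
}
print(check_alertmanager_config(config))
# ["smtp_smarthost 'http://example-mail-service:25' is not host:port"]
```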

Creating the alert rules

When should alerts fire? This is defined in the Prometheus server ConfigMap, which the chart builds from the serverFiles.alerts section of values.yaml. It is empty by default:

serverFiles:
  alerts: {}

An example that alerts when a HorizontalPodAutoscaler (HPA) has scaled out to its maximum number of replicas (the kube_hpa_* metrics are exported by kube-state-metrics):

serverFiles:
  alerts:
    groups:
    - name: group-name
      rules:
      - alert: MaxReplicaReached
        expr: kube_hpa_status_current_replicas{hpa="name-of-hpa"} == kube_hpa_spec_max_replicas{hpa="name-of-hpa"}
        for: 3m
        labels:
          severity: page
        annotations:
          summary: Triggers an alert when max replicas have been reached for at least 3 minutes
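This rule's behaviour can be sketched in two parts. First, `==` between two instant vectors uses one-to-one matching by default. A left-hand sample survives only if a right-hand sample with the same labels (ignoring the metric name) has an equal value. Second, `for: 3m` keeps the alert pending until that result has been present at every evaluation for at least 3 minutes. If the condition clears in between, the timer resets. The sketch below is a simplified model of both steps, not Prometheus code. The one-minute evaluation interval, the replica counts and the label set are made-up values, and vector_eq and evaluate are hypothetical helpers.

```python
from datetime import timedelta

HOLD = timedelta(minutes=3)  # mirrors `for: 3m` in the rule above


def vector_eq(left, right):
    """Model of PromQL `left == right` with default one-to-one matching.

    Each vector maps a frozenset of labels (metric name excluded) to a value.
    A left sample survives only if a right sample with identical labels has
    an equal value; the left-hand value is kept.
    """
    return {labels: v for labels, v in left.items()
            if labels in right and right[labels] == v}


def evaluate(steps):
    """Track one alert's state across rule evaluations.

    steps: list of (time, current_replicas, max_replicas) tuples.
    Returns 'inactive', 'pending' or 'firing' for each step.
    """
    labels = frozenset({("hpa", "name-of-hpa")})  # hypothetical label set
    active_at = None
    states = []
    for t, current, maximum in steps:
        result = vector_eq({labels: current}, {labels: maximum})
        if labels not in result:
            active_at = None          # condition cleared: the hold timer resets
            states.append("inactive")
            continue
        if active_at is None:
            active_at = t             # condition just became true
        states.append("firing" if t - active_at >= HOLD else "pending")
    return states


# One evaluation per minute; the HPA's max is 10 replicas.
steps = [(timedelta(minutes=m), cur, 10)
         for m, cur in enumerate([8, 10, 10, 10, 10, 9, 10])]
print(evaluate(steps))
# ['inactive', 'pending', 'pending', 'pending', 'firing', 'inactive', 'pending']
```

Only alerts in the firing state are sent on to Alertmanager. A brief dip below the maximum, as at minute 5 here, restarts the 3-minute wait.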