@BrianKopp
Last active October 14, 2019 15:35
Prometheus-Operator setup

Example Prometheus Operator Values.yaml

# replacing https://github.com/helm/charts/blob/master/stable/prometheus-operator/templates/prometheus/rules-1.14/general.rules.yaml
# notice the TargetDown expression excluding kube-proxy
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: prometheus-operator
    release: prom-op
  name: prom-op-prometheus-operato-general-no-kube-proxy.rules
  namespace: monitoring
spec:
  groups:
  - name: general.rules
    rules:
    - alert: TargetDown
      annotations:
        message: '{{ $value }}% of the {{ $labels.job }} targets are down.'
      expr: 100 * (count(up{job!="kube-proxy"} == 0) BY (job) / count(up{job!="kube-proxy"}) BY (job)) > 10
      for: 10m
      labels:
        severity: warning
    - alert: Watchdog
      annotations:
        message: |
          This is an alert meant to ensure that the entire alerting pipeline is functional.
          This alert is always firing, therefore it should always be firing in Alertmanager
          and always fire against a receiver. There are integrations with various notification
          mechanisms that send a notification when this alert is not firing. For example the
          "DeadMansSnitch" integration in PagerDuty.
      expr: vector(1)
      labels:
        severity: none

# 1 - Disable rules for EKS
kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false
defaultRules:
  rules:
    kubeScheduler: false
    general: false
prometheus:
  # 2 - Ingress
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx-internal
      certmanager.k8s.io/cluster-issuer: letsencrypt-production
    hosts:
    - prometheus.<YOURDOMAIN>.com
    paths:
    - "/"
    tls:
    - hosts:
      - prometheus.<YOURDOMAIN>.com
      secretName: prom-op-prometheus-prod-cert
  prometheusSpec:
    # 3 - Thanos - IAM Configuration
    podMetadata:
      annotations:
        iam.amazonaws.com/role: <ARN OF THANOS ROLE>
    # 4 - Node affinity
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: duty
              operator: In
              values:
              - stable
    # 3 - Thanos
    thanos:
      baseImage: quay.io/thanos/thanos
      version: v0.2.1
      objectStorageConfig:
        key: thanos.yaml
        name: thanos-objstore-config
prometheusOperator:
  # 4 - Node affinity
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: duty
            operator: In
            values:
            - stable
alertmanager:
  # 2 - Ingress
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx-internal
      certmanager.k8s.io/cluster-issuer: letsencrypt-production
    hosts:
    - alertmanager.<YOURDOMAIN>.com
    paths:
    - "/"
    tls:
    - hosts:
      - alertmanager.<YOURDOMAIN>.com
      secretName: prom-op-alertmanager-prod-cert
  # 5 - AlertManager configuration
  config:
    global:
      slack_api_url: '<SLACK_WEBHOOK_URL>'
    route:
      repeat_interval: 4h
      routes:
      - match:
          alertname: Watchdog
        receiver: 'null'
      - match:
        receiver: 'slack'
        continue: true
    receivers:
    - name: 'null'
    - name: 'slack'
      slack_configs:
      - channel: '#<CHANNEL_NAME>'
        send_resolved: true
        title: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] Monitoring Event Notification'
        text: |-
          {{ range .Alerts }}
          *Alert:* {{ .Labels.alertname }} - `{{ .Labels.severity }}`
          *Description:* {{ .Annotations.message }}
          *Graph:* <{{ .GeneratorURL }}|:chart_with_upwards_trend:> *Runbook:* <{{ .Annotations.runbook_url }}|:spiral_note_pad:>
          *Details:*
          {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
          {{ end }}
          {{ end }}
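  # 4 - Node affinity
  # (the stable/prometheus-operator chart reads Alertmanager affinity from alertmanagerSpec)
  alertmanagerSpec:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: duty
              operator: In
              values:
              - stable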
grafana:
  # 2 - Ingress
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx-internal
      certmanager.k8s.io/cluster-issuer: letsencrypt-production
    hosts:
    - grafana.<YOURDOMAIN>.com
    paths:
    - "/"
    tls:
    - hosts:
      - grafana.<YOURDOMAIN>.com
      secretName: prom-op-grafana-prod-cert
  # 4 - Node affinity
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: duty
            operator: In
            values:
            - stable
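
# Sketch only, not part of the original gist: a separate manifest (not part of
# the values file) for the Secret that prometheus.prometheusSpec.thanos.objectStorageConfig
# points at (name: thanos-objstore-config, key: thanos.yaml).
# Assumptions: Prometheus runs in the "monitoring" namespace (taken from the
# PrometheusRule above), the object store is S3, and <THANOS_BUCKET_NAME> and
# <AWS_REGION> are hypothetical placeholders to fill in. Credentials are left
# out on the assumption that the IAM role annotation in section 3 supplies them.
apiVersion: v1
kind: Secret
metadata:
  name: thanos-objstore-config
  namespace: monitoring
stringData:
  # The key name must match objectStorageConfig.key in the values file.
  thanos.yaml: |
    type: S3
    config:
      bucket: <THANOS_BUCKET_NAME>
      endpoint: s3.<AWS_REGION>.amazonaws.com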