@alicek106
Last active April 15, 2020 11:45
example-prometheusrule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    owner: alicek106
    role: alert-rules
  name: example-app-latency
  namespace: default
spec:
  groups:
  - name: example-app-latency.rules
    rules:
    - alert: app-latency
      expr: histogram_quantile(0.99, sum(irate(istio_request_duration_seconds_bucket{reporter="source",destination_service=~"ingress-annotation-test-svc.example-app.svc.cluster.local"}[1m])) by (le, destination_workload)) > 0.2
      for: 5s # How long expr must stay true before the alert fires.
      labels:
        application: example-app
        severity: critical
        class: istio
      annotations:
        triggered: "The p99 latency of the {{ $labels.destination_workload }} service is too high."
        resolved: "The p99 latency of the {{ $labels.destination_workload }} service has stabilized."
        value: "{{ printf \"%.5fs\" $value }}" # $value has too many decimal places, so format it to five.
        identifier: "ingress-annotation-test-svc"
        summary: "An alert fired for the {{ $labels.destination_workload }} service."
        query: '`histogram_quantile(0.99, sum(irate(istio_request_duration_seconds_bucket{reporter="source",destination_service=~"ingress-annotation-test-svc.example-app.svc.cluster.local"}[1m])) by (le, destination_workload)) > 0.2`'
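
The Prometheus Operator only loads a PrometheusRule that a Prometheus resource selects by label. The fragment below is a minimal sketch of such a selector, assuming it keys on the `role: alert-rules` label from the metadata above. The resource name `example-prometheus` and the `monitoring` namespace are hypothetical and not part of the original gist, and a real Prometheus resource would need further fields (service account, replicas, and so on).

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example-prometheus   # hypothetical name
  namespace: monitoring      # hypothetical namespace
spec:
  # Load PrometheusRule objects that carry this label (assumed selector).
  ruleSelector:
    matchLabels:
      role: alert-rules
  # An empty selector matches every namespace. If this field is omitted,
  # only the namespace of the Prometheus object itself is searched, and
  # the rule in the default namespace would not be picked up.
  ruleNamespaceSelector: {}
```

If the rule is not picked up, comparing its labels against the selector is usually the first thing to check.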