@aiwantaozi
Last active May 8, 2020 01:37
alert-troubleshooting.md

Alerting debug steps

Troubleshooting steps when alerts are not received

Metric expression alerts are triggered by Prometheus: when the expression condition matches, Prometheus sends the alert to Alertmanager. All other alerts are triggered by Rancher, which calls the Alertmanager API to fire them.
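To make the second path more concrete, here is a minimal Python sketch of what "calling the Alertmanager API" looks like. It assumes the upstream Alertmanager v2 endpoint `POST /api/v2/alerts` (available in Alertmanager 0.16 and later). The gist does not say which API version or labels Rancher actually sends, so `build_alert_request`, the `TestAlert` name and the address below are hypothetical. Posting a test alert this way can also help tell whether a problem sits in Alertmanager and the notifier or in Rancher.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_alert_request(base_url, alertname, labels=None, annotations=None, minutes=5):
    """Build (but do not send) a POST request pushing one alert to Alertmanager's v2 API.

    Hypothetical helper for illustration; not Rancher's own client code.
    """
    now = datetime.now(timezone.utc)
    alert = {
        # Alertmanager groups and routes alerts by their labels; alertname is conventional.
        "labels": {"alertname": alertname, **(labels or {})},
        "annotations": annotations or {},
        "startsAt": now.isoformat(),
        # The alert resolves on its own if nobody re-sends it before endsAt.
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }
    body = json.dumps([alert]).encode("utf-8")  # the endpoint expects a JSON array
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (this sends a real request; the address is a hypothetical placeholder):
#   req = build_alert_request("http://alertmanager.example:9093", "TestAlert",
#                             labels={"severity": "warning"})
#   urllib.request.urlopen(req, timeout=10)
```

If the test alert shows up on the Alertmanager page and the notifier delivers it, Alertmanager's side is working.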

Check app status

  • Go to System project -> Apps and check the status of the monitoring-operator, cluster-alerting and cluster-monitoring apps. Make sure all of them deployed successfully.

Check notifier

  • Go to Tools -> Notifier, select the notifier you configured and click the Test button. Make sure you receive the test message.
  • Make sure you have configured a receiver for your alert group. If no receiver is configured, the alert rules in that alert group won't be triggered.

Check that the Rancher server can access the alert API
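A minimal reachability check might look like the Python sketch below. It assumes Alertmanager's standard `/-/healthy` endpoint and a hypothetical base URL; to be meaningful it has to run from a host or pod on the same network path as the Rancher server. A connection error or a non-200 answer suggests Rancher cannot reach Alertmanager either.

```python
import urllib.request


def alertmanager_healthy(base_url, timeout=5.0):
    """Return True if Alertmanager's /-/healthy endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/-/healthy"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx status) and socket timeouts are all
        # subclasses of OSError, so refused connections, DNS failures and
        # error statuses all end up here.
        return False


# Usage (hypothetical in-cluster address):
#   alertmanager_healthy("http://alertmanager.example:9093")
```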

For metric expression alerts

Make sure the expression is correct

  • Go to System project -> Apps -> cluster-monitoring and click index.html to open the Prometheus page, then paste your expression there to test it.
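Pasting the expression into the Prometheus page is the simplest test. The same check can be scripted against Prometheus' HTTP API (`GET /api/v1/query`). In this sketch the helper names and the base URL are hypothetical, and the expression is assumed to return an instant vector, as alert expressions do. A non-empty result means the condition currently matches for at least one series, so the alert would start firing (after the rule's `for` duration, if any).

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def query_url(base_url, expr):
    """Build an instant-query URL for Prometheus' HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def matching_series(response_body):
    """Return the series an instant query matched.

    An empty list means the alert condition is false for every series right now.
    """
    doc = json.loads(response_body)
    if doc.get("status") != "success":
        # Bad expressions come back with status "error" plus an error message.
        raise ValueError(doc.get("error", "query failed"))
    return doc["data"]["result"]


def run_query(base_url, expr, timeout=10.0):
    """Run the query; Prometheus answers bad queries with HTTP 400 and a JSON body."""
    try:
        with urllib.request.urlopen(query_url(base_url, expr), timeout=timeout) as resp:
            return matching_series(resp.read())
    except urllib.error.HTTPError as err:
        return matching_series(err.read())
```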

Make sure the expression is synced to Prometheus

  • On the Prometheus page, click the Alerts tab and make sure your expression is listed there. If it isn't, Rancher may have hit an error while generating the alert rule; check the Rancher server logs for errors.
  • The Prometheus Alerts page also shows the alert status. If the status is Firing, Prometheus calls the Alertmanager API and sends the alert event to Alertmanager, and you should see the event on the Alertmanager page (System project -> Apps -> cluster-alerting -> click index.html). If the alert is firing but the event does not appear there, Prometheus failed to call the Alertmanager API; check the Prometheus logs.
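Both checks in the list above can also be scripted. This sketch parses the JSON returned by Prometheus' `GET /api/v1/rules` and Alertmanager's `GET /api/v2/alerts` (the v2 API is an assumption); fetching the bodies is left out, the function names are hypothetical, and the name you pass must match the rule name shown on the Prometheus Alerts page. `None` means the rule never reached Prometheus; a `firing` rule with no match in Alertmanager points at the Prometheus to Alertmanager call.

```python
import json


def prometheus_rule_state(rules_body, alertname):
    """Look up an alerting rule in a Prometheus /api/v1/rules response.

    Returns None if no alerting rule has that name (not synced), otherwise the
    most advanced state among its active alerts: "firing", "pending" or "inactive".
    """
    doc = json.loads(rules_body)
    for group in doc["data"]["groups"]:
        for rule in group["rules"]:
            if rule.get("type") == "alerting" and rule.get("name") == alertname:
                states = {a.get("state") for a in rule.get("alerts", [])}
                for state in ("firing", "pending"):
                    if state in states:
                        return state
                return "inactive"
    return None


def alertmanager_has_alert(alerts_body, alertname):
    """True if an Alertmanager /api/v2/alerts response (a JSON array) contains the alert."""
    return any(a.get("labels", {}).get("alertname") == alertname
               for a in json.loads(alerts_body))
```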

For built-in alerts (all alerts other than expression alerts)

  • These alerts are triggered by Rancher. Enable debug logging on the Rancher server and check its logs, since Rancher may have failed to call Alertmanager.