@ichandan16 · Created December 21, 2016
Watcher configuration
# This example uses scripting in the watch (https://www.elastic.co/guide/en/watcher/2.3/scripts-templates.html#scripts).
# It requires inline scripting to be enabled (https://www.elastic.co/guide/en/elasticsearch/reference/2.3/modules-scripting.html#enable-dynamic-scripting),
# e.g. script.inline: true in elasticsearch.yml.
# It also uses the chained input feature (https://www.elastic.co/guide/en/watcher/2.3/input.html#input-chain).
# In this test case, all 3 nodes run on the same machine, which is why the http requests go to
# localhost:9200, localhost:9201, and localhost:9202. Customize the watch below by changing
# localhost:9200 to the host/port of your first node, localhost:9201 to the host/port of your second node, and so on.
# The watch also shows how to pass basic auth in case your cluster uses Shield; if it does not, remove the auth: block from each of the 3 http inputs.
# Likewise, update the 3 references to localhost:9200, localhost:9201, and localhost:9202 in the transform script to match your node names.
# Ideally, Watcher should run on a separate, remote monitoring cluster (https://www.elastic.co/guide/en/watcher/current/installing-watcher.html#deploying-separate-cluster).
# Otherwise, in a 3-node cluster where all nodes are master-eligible, losing 2 nodes blocks the cluster for lack of a master quorum and the watch simply fails (expected).
# The watch uses the logging action, which writes its text to the Elasticsearch log file of the elected master node on which Watcher runs.
# Once you have confirmed the output is correct, you can switch to the email action, etc. (see the sketch at the end of this page).
PUT _watcher/watch/cluster_health_watch
{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "chain": {
      "inputs": [
        {
          "first": {
            "http": {
              "request": {
                "host": "localhost",
                "port": 9200,
                "path": "/_cluster/health",
                "auth": {
                  "basic": {
                    "username": "elastic",
                    "password": "changeme"
                  }
                }
              }
            }
          }
        },
        {
          "second": {
            "http": {
              "request": {
                "host": "localhost",
                "port": 9201,
                "path": "/_cluster/health",
                "auth": {
                  "basic": {
                    "username": "elastic",
                    "password": "changeme"
                  }
                }
              }
            }
          }
        },
        {
          "third": {
            "http": {
              "request": {
                "host": "localhost",
                "port": 9202,
                "path": "/_cluster/health",
                "auth": {
                  "basic": {
                    "username": "elastic",
                    "password": "changeme"
                  }
                }
              }
            }
          }
        }
      ]
    }
  },
"condition": {
"script": {
"lang": "groovy",
"inline": "def first=null; def second=null; def third=null; if (ctx.payload.first!=null) first = ctx.payload.first.status; if (ctx.payload.first!=null) first = ctx.payload.first.status; if (ctx.payload.second!=null) second = ctx.payload.second.status; if (ctx.payload.third!=null) third = ctx.payload.third.status; if ((first == 'green') && (second == 'green') && (third == 'green')) return false; else return true;"
}
},
"actions": {
"log": {
"transform": {
"script": {
"inline":"temp='\\n\\nCluster health API status as reported by each node:\\n'; down = '/_cluster/health http request failed, this node is likely down or unresponsive'; first=down; second=down; third=down; if (ctx.payload.first.status!=null) first = ctx.payload.first.status; if (ctx.payload.second.status!=null) second = ctx.payload.second.status; if (ctx.payload.third.status!=null) third = ctx.payload.third.status; temp = temp + 'localhost:9200 - ' + first + '\\n' + 'localhost:9201 - ' + second + '\\n' + 'localhost:9202 - ' + third; return [ temp : temp ];",
"lang": "groovy"
}
},
"logging": {
"text": "{{ctx.payload.temp}}"
}
}
}
}
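To test the watch without waiting for the 1-minute schedule, you can force a run with Watcher's execute watch API; the response includes the watch record, so you can inspect the condition result and action output inline. Deleting the watch when you are done is a one-liner. Both calls below assume the watch id used above:

POST _watcher/watch/cluster_health_watch/_execute

DELETE _watcher/watch/cluster_health_watch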
When the cluster health is not green (in this test, localhost:9202 was shut down), the logging action writes the following:
Cluster health API status as reported by each node:
localhost:9200 - yellow
localhost:9201 - yellow
localhost:9202 - /_cluster/health http request failed, this node is likely down or unresponsive
Note: the watch does not fire when the cluster health request to each node succeeds and all three report green.
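For reference, here is the condition script from the watch above, expanded into multiple lines and commented; it is the same logic, which the watch JSON requires as a single-line string:

def first = null; def second = null; def third = null;
if (ctx.payload.first != null) first = ctx.payload.first.status;
if (ctx.payload.second != null) second = ctx.payload.second.status;
if (ctx.payload.third != null) third = ctx.payload.third.status;
// fire the actions unless all three nodes report green cluster health
if ((first == 'green') && (second == 'green') && (third == 'green')) return false; else return true;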
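Once the logging output looks right, the logging part of the action can be swapped for an email part. The snippet below is a minimal sketch, assuming Watcher already has an email account configured in elasticsearch.yml and using a hypothetical recipient address; the transform stays inside the action so {{ctx.payload.temp}} is still populated:

# Replace the "logging" part of the action with an "email" part; keep the transform unchanged.
"actions": {
  "email_admin": {
    "transform": { ... same transform script as above ... },
    "email": {
      "to": "ops@example.com",
      "subject": "Cluster health is not green",
      "body": "{{ctx.payload.temp}}"
    }
  }
}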