@aaronfeng
Last active August 13, 2020 19:51

Application specific host grouping in Riemann-dash

It is generally desirable to group all the hosts for a specific service into a single dashboard view. For example, all the web servers appear in one view while all the database servers appear in another.

This is usually not an issue when you send custom metrics with a Riemann client. However, sometimes you rely on something whose metric output you do not control, e.g., Riemann-tools.

Since Riemann-tools scripts are application agnostic, we must inject some application-specific information into the tags field for the dashboard view to group hosts. Tags are a collection of arbitrary strings, and Riemann-tools scripts accept them on the command line:

riemann-health --host 127.0.0.1 --tag "prod" --tag "webserver"

In this case, the health check above will include two extra tags, "prod" and "webserver". On the dashboard side, you can create a grid view that contains the following query:

'(tagged = "prod" and tagged = "webserver")'

The tagged keyword in the query matches against all of an event's tags. The query above returns only events from hosts tagged with both "prod" and "webserver".

Count total number of hosts

Sometimes it is useful to know how many hosts are sending data to Riemann. This is especially useful in cloud environments where nodes are constantly being added and removed.

Since the Riemann server sees every sending host, we will create a new stream that keeps track of unique hosts and indexes the count as a new service, so the Riemann dashboard can query it. In riemann.config you should have something like the following:

(let [index (default :ttl 300 (update-index (index)))]
    (streams
      prn
      index))

Since a stream is just a function that takes an event as an argument, we can add an anonymous function to the existing streams that does all the work.

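(let [hosts (atom #{})]            ; set of every host seen so far
  (fn [event]
    (swap! hosts conj (:host event))
    (prn :hosts @hosts)            ; debug output; remove once it works
    ;; index a synthetic event whose metric is the number of unique hosts;
    ;; hosts are never removed from the set, so the count only grows
    (index {:service "unique hosts"
            :time (unix-time)
            :metric (count @hosts)})))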
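
To make the placement concrete, here is a minimal sketch of how the two fragments above might be combined in riemann.config. It assumes the same default config shown earlier (including `update-index`, which may not exist in every Riemann release) and simply hoists the `hosts` atom into the outer `let`; the debug `prn` inside the function is omitted.

```clojure
; Sketch only: the host-counting function is added as one more stream
; next to the existing prn and index streams.
(let [index (default :ttl 300 (update-index (index)))
      hosts (atom #{})]              ; every host seen since startup
  (streams
    prn
    index
    (fn [event]
      (swap! hosts conj (:host event))
      ;; the synthetic event goes straight to the index, not back
      ;; through the other streams
      (index {:service "unique hosts"
              :time (unix-time)
              :metric (count @hosts)}))))
```

Because the synthetic event passes through the same `default :ttl 300` wrapper, it expires from the index like any other event if no new events arrive.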

On the dashboard you can create a gauge view with the following query:

'(service = "unique hosts")'

If you created the query correctly, the gauge displays the current number of unique hosts.

Count number of hosts by application grouping

Assuming your events already carry the tags you would like to group your application hosts by, you can use the following code in riemann.config to create an index entry that counts the number of unique hosts for a given group of tags.

(let [hosts (atom {})]   ; map of tag group -> set of hosts
  (fn [event]
    (let [tag-str (keyword (clojure.string/join "-" (:tags event)))
          ;; add this host to the set for its tag group; fnil supplies
          ;; an empty set the first time a group is seen
          updated (swap! hosts update-in [tag-str] (fnil conj #{}) (:host event))]
      (index {:service (str (name tag-str) "-count")
              :time (unix-time)
              :metric (count (get updated tag-str))}))))

The code above creates a service named after all of an event's tags, joined with "-" in the order they appear on the event, with "-count" appended. For example, if an event has the tags "webserver" and "prod", the new service that counts unique hosts will be named "webserver-prod-count". In your dashboard you can query it like this:

'(service = "webserver-prod-count")'

If you create a new gauge view with that query, you will get the current count of all your production web servers.
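
The counting function above only ever adds hosts, so the count cannot drop when a node goes away. One possible way around that, shown here as a rough sketch rather than a tested recipe, is to remember when each host was last seen and discard hosts that have been silent for longer than some threshold. The names `ttl` and `last-seen` and the 300-second value are assumptions made for this illustration, and the tag group is kept as a plain string instead of a keyword.

```clojure
; Sketch only: count hosts per tag group, forgetting hosts that have
; not sent an event within ttl seconds.
(let [ttl       300          ; seconds; assumed value, tune as needed
      last-seen (atom {})]   ; {tag-key {host last-seen-unix-time}}
  (fn [event]
    (let [now     (unix-time)
          tag-key (clojure.string/join "-" (:tags event))
          fresh?  (fn [[_ t]] (< (- now t) ttl))
          ;; record this host, then keep only recently seen hosts
          seen    (swap! last-seen update-in [tag-key]
                         (fn [m]
                           (into {} (filter fresh? (assoc m (:host event) now)))))]
      (index {:service (str tag-key "-count")
              :time now
              :metric (count (get seen tag-key))}))))
```

Stale hosts are pruned only when another event for the same tag group arrives, so if every host in a group stops reporting, the last indexed count stays until it expires from the index.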

@alfredocambera

This gist has been very useful. Didn't know how to create basic filters for riemann-dash.

@svetlyak40wt

svetlyak40wt commented Jan 18, 2017

Interesting: when I enter a simple query like '(tagged = "osx")' in the grid cell, the server starts writing tracebacks like this to its log and nothing happens on the dashboard:

WARN [2017-01-18 14:31:03,024] worker-2 - riemann.transport.websockets - ws-handler caught; closing websocket connection.
clj_antlr.ParseError: token recognition error at: '''
extraneous input '=' expecting STRING
token recognition error at: '''
        at clj_antlr.common$parse_error.invokeStatic(common.clj:146)
        at clj_antlr.common$parse_error.invoke(common.clj:141)
        at clj_antlr.interpreted.SinglethreadedParser.parse(interpreted.clj:100)
        at clj_antlr.interpreted.ThreadLocalParser.parse(interpreted.clj:122)
        at clj_antlr.core.ParserWrapper.parse(core.clj:13)
        at clj_antlr.core$parse_STAR_.invokeStatic(core.clj:28)
        at clj_antlr.core$parse_STAR_.invoke(core.clj:19)
        at clj_antlr.core$parse.invokeStatic(core.clj:35)
        at clj_antlr.core$parse.invoke(core.clj:30)
        at clj_antlr.core.ParserWrapper.invoke(core.clj:17)
        at riemann.query$ast.invokeStatic(query.clj:118)
        at riemann.query$ast.invoke(query.clj:115)
        at riemann.transport.websockets$ws_index_handler.invokeStatic(websockets.clj:72)
        at riemann.transport.websockets$ws_index_handler.invoke(websockets.clj:63)
        at riemann.transport.websockets$ws_handler$handle__11432$fn__11433.invoke(websockets.clj:155)
        at riemann.transport.websockets$ws_handler$handle__11432.invoke(websockets.clj:137)
        at org.httpkit.server.HttpHandler.run(RingHandler.java:91)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Do you know why?

I use riemann 0.2.12

@kenrestivo-stem

I too am seeing token errors on basic queries copied from documentation. Don't know what I could be doing wrong here.

@daviddyball

The syntax appears to have changed since this was written. To filter by tag you would use (tagged "webserver") as the query, with no = (equals sign) and no ' (single quotes).

@jerryz1982

jerryz1982 commented Mar 9, 2018

Hi,
I have an issue with updating the host count: it only increases and won't decrease after a host is taken down.
It may be because I removed (swap! hosts (atom {})) at the end, since Riemann was complaining:
clojure.lang.Compiler$CompilerException: java.lang.ClassCastException: clojure.lang.Atom cannot be cast to clojure.lang.IFn

How can I get this right?
Thanks!