
@jrydberg
Last active March 18, 2017 14:51
Scalable Monitoring using Riemann

Riemann is a simple but powerful monitoring system. At its core it is an event processing framework that lets a user handle an incoming event stream. Events are small objects that originate from a host and carry a service and a metric value. Events may also have tags and a free-text description. This simple data structure enables an application to transmit both exception tracebacks and application metric data using the same mechanism.
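To make the event structure concrete, here is a minimal sketch in Python. It only models the fields named above (host, service, metric, tags, description). The `Event` class and the example values are illustrative assumptions, not the API of any Riemann client library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical model of an event, limited to the fields described above."""
    host: str
    service: str
    metric: Optional[float] = None
    tags: List[str] = field(default_factory=list)
    description: str = ""


# An application metric: the interesting part is the numeric value.
metric_event = Event(host="web-1", service="api latency", metric=0.123)

# An exception report: same structure, but the payload is the description.
error_event = Event(
    host="web-1",
    service="api exception",
    tags=["exception"],
    description="Traceback (most recent call last): ...",
)
```

Both kinds of data travel as the same type of object, which is what lets a single stream carry metrics and tracebacks alike.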

Riemann has a few problems though. First, it is a single-machine application, so it only scales vertically. Second, it has no built-in redundancy or availability solution. In this post we'll try to address the first issue.

Making Riemann Horizontally Scalable

The obvious solution is to have multiple Riemann instances running and then partition the event stream based on the service field of the event. Given a set of Riemann instances, S, a partitioning router R can take events and forward them to an instance in S according to consistent hashing on the key (service field).
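The routing decision can be sketched as a small consistent hash ring. This is a sketch under assumptions, not how Riemann itself does it: the `HashRing` class, the use of MD5 for ring positions, the 100 virtual nodes per instance, and the instance names in the usage note are all illustrative choices.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent hash ring over a set S of instance names."""

    def __init__(self, instances, replicas=100):
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> instance name
        for instance in instances:
            # Several virtual nodes per instance even out the key distribution.
            for i in range(replicas):
                point = _hash(f"{instance}#{i}")
                self._owner[point] = instance
                bisect.insort(self._points, point)

    def instance_for(self, service: str) -> str:
        """Return the instance that owns the given service key."""
        if not self._points:
            raise ValueError("ring has no instances")
        # The first ring position at or after the key's hash owns the key.
        idx = bisect.bisect_left(self._points, _hash(service))
        if idx == len(self._points):
            idx = 0  # wrap around to the start of the ring
        return self._owner[self._points[idx]]
```

A router R would call something like `HashRing(["riemann-a", "riemann-b", "riemann-c"]).instance_for(event.service)` and forward the event to the returned instance. Because the key is the service field, all events for a given service land on the same instance, and adding or removing an instance only remaps the keys owned by that instance.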

Since configuration is code in Riemann, this "partitioning router" can itself be a running instance of Riemann that has a set of tcp-clients that it spreads events over.
