@cjchand
Last active January 25, 2018 17:14
Sensu Dashboard Blog Post.md

(Oprah "You get a dashboard..." pic)

Introduction

Like many other Monitoring Nerds™, I started off using Nagios, and it served me well. But as we grew, matured, and started Deving some Ops, I found myself looking for alternatives. My search ultimately ended with Sensu. While getting into all of the reasons is more of a book than a blog post, one of the key factors was Sensu's API-first design - and all of the greatness that this design enabled.

A prime example of that is the way many users interact with Sensu: dashboards.

Sensu: An Overview

Before I get into some examples of dashboarding for Sensu, it is worthwhile to take a brief detour to talk a bit about Sensu itself. One of the selling points that made me a fan of Sensu is that it was designed with the 12 Factor App principles in mind. It ticks all of the buzzword boxes - but not just for the sake of Marketecture.

While getting into all of the guts of Sensu is best saved for another post, here are a few key callouts:

  • Sensu is a monitoring framework, not a monolithic “product”
    • Ultimately, it’s an event router and handler (though that truly sells short what it's capable of)
    • Config can be defined server-side, client-side, or a mix
    • There are a ton of community-provided checks, but you can very easily create your own (and even reuse Nagios checks)
  • Keeping with the 12 Factor goodness, state (clients, check results, etc.) is stored in Redis
  • Also 12 Factor-y, there are separate services for processing and handling checks (sensu-server) and serving up an API to perform CRUD operations on the state data in Redis (sensu-api)

These APIs were not a bolt-on; Sensu was built from the beginning with the expectation that viewing and managing state (e.g.: "What's the state of the checks on my DB server?") would only be done via these APIs. Perhaps most importantly, the APIs are all public and fully documented, not locked away only for internal use.

Some Sensu Dashboarding Options

Because of the aforementioned APIs, we have flexibility not only in our choice of dashboard, but also in how Sensu deployments can be grouped in those dashboards. This will become plain as we talk about some dashboarding options available to us as Sensu users.

Uchiwa

Far and away, the most commonly used Sensu dashboard is Uchiwa. It is Community-provided, yet maintained by Sensu, Inc. as part of the overall Sensu project.

Uchiwa provides the things you would expect from a dashboard, including, but not limited to:

  • List view of all current events
  • List view of all clients (monitored entities, like servers, services, etc)
  • The ability to drill-down into these items to get more info
  • Acknowledge/silence/resolve events

All of these happen through Sensu's APIs. For example, this screen in Uchiwa...

... is simply calling the /clients API behind the scenes, similar to this:

curl -s 'https://oss-sensu-dit.example.com/clients' | jq
[
  {
    "name": "ditweb132",
    "address": "10.10.10.36",
    "subscriptions": [
      "client:ditweb132"
    ],
    "version": "1.0.2",
    "timestamp": 1516820560
  },
  {
    "name": "ditweb133",
    "address": "10.10.13.33",
    "subscriptions": [
      "client:ditweb133"
    ],
    "version": "1.0.2",
    "timestamp": 1516820542
  },
  {
    "name": "ditweb134",
    "address": "10.10.141.181",
    "subscriptions": [
      "client:ditweb134"
    ],
    "version": "1.0.2",
    "timestamp": 1516820546
  },
  {
    "name": "ditweb135",
    "address": "10.10.141.194",
    "subscriptions": [
      "client:ditweb135"
    ],
    "version": "1.0.2",
    "timestamp": 1516820542
  }
]
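
The same call is easy to make from a script. Here is a minimal Python sketch, assuming an unauthenticated Sensu API like the one in the curl example above. The helper names (fetch_clients, stale_clients) and the 120-second threshold are made up for illustration, and I am assuming that "timestamp" is the time of the client's last keepalive.

```python
import json
import urllib.request


def fetch_clients(base_url):
    """GET <base_url>/clients and return the decoded JSON list."""
    url = base_url.rstrip("/") + "/clients"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def stale_clients(clients, now, max_age=120):
    """Names of clients whose "timestamp" is more than max_age seconds old.

    Assumption: "timestamp" is the last keepalive time, in epoch seconds.
    """
    return sorted(
        c["name"] for c in clients if now - c.get("timestamp", 0) > max_age
    )


# Usage (hypothetical host, same as the curl example):
# clients = fetch_clients("https://oss-sensu-dit.example.com")
# print(stale_clients(clients, now=int(time.time())))  # needs "import time"
```

Keeping the HTTP call separate from the logic means the interesting part can be exercised against a saved copy of the JSON, without a live Sensu API.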

Uchiwa's Datacenter Paradigm

Each Sensu deployment is comprised of 1 (or more) sensu-server process(es), 1 (or more) sensu-api process(es), and their dependencies (namely: RabbitMQ and Redis, which may or may not be shared across Sensu deployments).

For many customers, it makes sense to have more than one Sensu deployment. For example, teams might have separate Sensu deployments for Dev vs Stage vs Production. Others might deploy a dedicated Sensu setup per Development team so each Dev team can control their own monitoring destiny.

While you can deploy a separate Uchiwa server (or servers) per Sensu deployment, often it is preferred to have a single view into all of these Sensu deployments - all in the same Uchiwa. To manage this, Uchiwa implements a concept of a "Datacenter."

In Uchiwa parlance, a "datacenter" is simply a group of Sensu API endpoints. If it helps, when you see "Datacenter" in Uchiwa, you can think, "Sensu cluster." The mapping of Sensu API endpoint(s) to Datacenters lives in the Uchiwa configuration.

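The Uchiwa documentation provides a simple example. Here, we have two Sensu API endpoints that share the name "us-east-1", so Uchiwa presents them as a single Datacenter called "us-east-1" (the top-level "sensu" key is just the list of API endpoints):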

{
  "sensu": [
    {
      "name": "us-east-1",
      "host": "10.0.0.1",
      "port": 4567
    },
    {
      "name": "us-east-1",
      "host": "10.0.0.2",
      "port": 4567
    }
  ]  
}
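
To make the grouping concrete, here is a small Python sketch that reads a config shaped like the one above and lists the API endpoints behind each "name" value, which is how I understand Uchiwa to identify a Datacenter. The function name is my own, and the fallback to port 4567 is an assumption for entries that omit "port".

```python
from collections import defaultdict


def group_by_datacenter(config):
    """Map each "name" in the "sensu" list to its "host:port" endpoints."""
    datacenters = defaultdict(list)
    for api in config.get("sensu", []):
        # Assumption: fall back to 4567 when an entry has no "port".
        endpoint = f'{api["host"]}:{api.get("port", 4567)}'
        datacenters[api["name"]].append(endpoint)
    return dict(datacenters)


example = {
    "sensu": [
        {"name": "us-east-1", "host": "10.0.0.1", "port": 4567},
        {"name": "us-east-1", "host": "10.0.0.2", "port": 4567},
    ]
}
print(group_by_datacenter(example))
# {'us-east-1': ['10.0.0.1:4567', '10.0.0.2:4567']}
```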

Later on, we will show a real-life, multi-datacenter example.

Sensu Enterprise

Sensu follows an "Open Core" model: anyone is free to deploy the Open Source version of Sensu and Uchiwa, while others prefer to buy Enterprise licenses for enhanced support and expanded, pre-built features that provide a more "batteries included" approach. One of the benefits of purchasing an Enterprise license is the Sensu Enterprise dashboard.

At its core, the Sensu Enterprise dashboard is Uchiwa with some additional features. While getting into Sensu Enterprise's features is outside the scope of this post, the key takeaway is that it uses the exact same APIs as Uchiwa.

Sensu Grid

A prime example of how Sensu's APIs can be used to build a dashboard to suit your particular needs is Sensu Grid. While Uchiwa provides a great list view of clients and events, there are some scenarios where you might want a higher-level, summarized view of what is happening. That is what Sensu Grid aims to provide, and it does it all using - you guessed it - the same APIs as Uchiwa and Sensu Enterprise.

More details will be provided in the next section, but here is a screenshot to whet your appetite:

Deployment Example: Multiple Environments, Multiple View Options

Now that we have a baseline understanding of Uchiwa, Sensu's APIs, and how those things relate to each other, let's get into a real-world example of how we use two of the dashboards mentioned above: Uchiwa and Sensu Grid.

Multi-Datacenter Uchiwa: One Dashboard to Rule Them All

For reasons I will spare you the details of, we have many pre-production environments. These environments need to be viewed holistically as a unit. Because of this, we have a Sensu deployment for each environment (as opposed to by service, by Development team, etc).

While we have an Uchiwa per environment so deployments can be self-contained, we also deploy an "Uber" Uchiwa that allows us to see all environments at once. Not only does this make things simpler (one URL to remember versus one per environment), but we can also quickly drill-down to a given environment with a quick click in the Uchiwa UI.

To show this in action, clicking the last icon in Uchiwa's left-side menu will show you the list of configured Datacenters, including the version of sensu-api that is running, whether it is connected to Redis and RabbitMQ, the number of events, clients, and other information specific to that Sensu deployment.

This is what that list looks like in our deployment:

... and here is the entire Uchiwa config file (with some redaction, of course) that makes this possible:

{
  "sensu": [
    {
      "name": "qlab01",
      "host": "oss-sensu-qlab01.example.com",
      "port": 4567,
      "timeout": 5
    },
    {
      "name": "qlab02",
      "host": "oss-sensu-qlab02.example.com",
      "port": 4567,
      "timeout": 5
    },
    {
      "name": "qlab03",
      "host": "oss-sensu-qlab03.example.com",
      "port": 4567,
      "timeout": 5
    },
    {
      "name": "qlab06",
      "host": "oss-sensu-qlab06.example.com",
      "port": 4567,
      "timeout": 5
    },
    {
      "name": "qlab07",
      "host": "oss-sensu-qlab07.example.com",
      "port": 4567,
      "timeout": 5
    },
    {
      "name": "zlab01",
      "host": "oss-sensu-zlab01.example.com",
      "port": 4567,
      "timeout": 5
    },
    {
      "name": "npe-shared",
      "host": "oss-sensu-npe-shared.example.com",
      "port": 4567,
      "timeout": 5
    },
    {
      "name": "ilab01",
      "host": "oss-sensu-ilab01.example.com",
      "port": 4567,
      "timeout": 5
    },
    {
      "name": "ilab02",
      "host": "oss-sensu-ilab02.example.com",
      "port": 4567,
      "timeout": 5
    },
    {
      "name": "ilab03",
      "host": "oss-sensu-ilab03.example.com",
      "port": 4567,
      "timeout": 5
    },
    {
      "name": "mlab01",
      "host": "oss-sensu-mlab01.example.com",
      "port": 4567,
      "timeout": 5
    },
    {
      "name": "mlab02",
      "host": "oss-sensu-mlab02.example.com",
      "port": 4567,
      "timeout": 5
    }
  ],
  "uchiwa": {
    "host": "0.0.0.0",
    "port": 3000,
    "interval": 5
  }
}
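
Since every entry follows the same pattern, a file like this does not have to be written by hand. Here is a rough Python sketch that generates it from a list of environment names. The function name and its parameters are hypothetical; the host naming pattern, ports, timeout, and "uchiwa" block are copied from the config above.

```python
import json


def build_uchiwa_config(environments, domain="example.com", port=4567, timeout=5):
    """Build an Uchiwa config dict with one Sensu API entry per environment."""
    sensu = [
        {
            "name": env,
            "host": f"oss-sensu-{env}.{domain}",  # naming pattern from the config above
            "port": port,
            "timeout": timeout,
        }
        for env in environments
    ]
    uchiwa = {"host": "0.0.0.0", "port": 3000, "interval": 5}
    return {"sensu": sensu, "uchiwa": uchiwa}


if __name__ == "__main__":
    # Only a few of the environments, for brevity.
    print(json.dumps(build_uchiwa_config(["qlab01", "qlab02", "ilab03"]), indent=2))
```

Adding a new environment then becomes a one-word change to the list rather than another copy-and-paste block.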

Having all of these Sensu deployments in Uchiwa's config allows us to see a unified view of all clients and all checks across all of these environments... all in one page.

<Uchiwa Screenshot: The Hot Mess™>

Those scary numbers you see on the top-left are the number of checks in a non-OK status (132) and the total number of clients (655). I did mention this is non-production, right? :)

Hovering over these numbers, we can get a pop-out with the breakdown of check and host states.

And where does all of this data come from? Say it with me: "Sensu's APIs!"
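
As a rough illustration of how such a breakdown could be derived, here is a Python sketch that tallies events by severity. It assumes events shaped like the output of Sensu's /events API, where each event carries a numeric check status (1 for warning, 2 for critical, anything else treated as unknown). The function name and the sample data are made up, and this is not Uchiwa's actual code.

```python
from collections import Counter

# Conventional check exit statuses; anything else is lumped into "unknown".
SEVERITY = {1: "warning", 2: "critical"}


def summarize_events(events):
    """Count events per severity label."""
    return Counter(SEVERITY.get(e["check"]["status"], "unknown") for e in events)


# Hypothetical sample, trimmed to the fields the function reads.
sample_events = [
    {"client": {"name": "web01"}, "check": {"name": "check_memory", "status": 2}},
    {"client": {"name": "web02"}, "check": {"name": "check_memory", "status": 1}},
    {"client": {"name": "db01"}, "check": {"name": "check_disk", "status": 2}},
    {"client": {"name": "db01"}, "check": {"name": "check_ntp", "status": 3}},
]

counts = summarize_events(sample_events)
print(sum(counts.values()), dict(counts))
```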

Let's say that I wanted to take a peek at just a given environment, rather than the deluge of stuff across all environments. That is as simple as clicking the "Datacenter" drop-down in the upper left, then choosing the environment.

Better yet, I can combine Uchiwa's ability to group events by check name with the Datacenter drop-down. Here is an example:

If I suspected that there might be issues with free memory on servers in a given environment, I can click the "All Checks" drop-down to see a list of checks that Uchiwa has discovered from you know where.... Sensu's APIs. By choosing the "Check Memory" check, my world view goes from seeing all events:

... to just the events triggered by failing Check Memory checks:

I can further refine this by clicking the "Datacenter" drop down and choosing a specific Datacenter (AKA: Sensu deployment):

And if I want to view these events in the context of all events for this Datacenter, I can go back to the "All Checks" drop-down and choose "All Checks" to see all events for this Datacenter:

<Uchiwa screenshot: All Events - ILAB03>
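
The two drop-downs amount to a pair of filters over the event list. Here is a minimal Python sketch of that idea, assuming each event has been tagged with a "dc" field naming the Datacenter it came from (my assumption about how a multi-datacenter dashboard might label events). The function name, check names, and sample data are all hypothetical.

```python
def filter_events(events, check=None, datacenter=None):
    """Keep events matching the given check name and/or datacenter.

    Passing None for a filter means "all", like the "All Checks" option.
    """
    return [
        e
        for e in events
        if (check is None or e["check"]["name"] == check)
        and (datacenter is None or e.get("dc") == datacenter)
    ]


# Hypothetical events, trimmed to the fields the filter reads.
events = [
    {"dc": "ilab03", "client": {"name": "web01"}, "check": {"name": "check_memory"}},
    {"dc": "ilab03", "client": {"name": "web01"}, "check": {"name": "check_disk"}},
    {"dc": "qlab01", "client": {"name": "web07"}, "check": {"name": "check_memory"}},
]

memory_everywhere = filter_events(events, check="check_memory")
memory_in_ilab03 = filter_events(events, check="check_memory", datacenter="ilab03")
all_in_ilab03 = filter_events(events, datacenter="ilab03")
```

The three calls mirror the three views walked through above: one check across all Datacenters, that check in one Datacenter, then all checks in that Datacenter.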

Sensu Grid: A Monitor/Executive-Friendly View

While Uchiwa is great for folks responding to and investigating events, there are times where you just need what I call a "chicklet"-based view of the world; a high-level summary that helps me quickly assess how things are going. This might be for display on wall-mounted monitors in a support center or a more Executive-friendly dashboard where deep detail of what is happening would be inappropriate.

For these reasons, and I am sure many others, Alex Leonhardt created Sensu Grid. This is a completely home-grown project and is a perfect example of how anyone can build a custom dashboard for Sensu if the existing ones do not suit their needs.

Sensu Grid shows much of the same data that Uchiwa does, but displays it in a more summarized fashion. Like Uchiwa, it gets this data from the same suite of Sensu APIs and supports a multi-datacenter paradigm.

You can choose to drill-down to see all events for a given Datacenter. Here, we see events for ILAB03, which is the same environment we looked at in our "Check Memory" example above.

<Sensu Grid Screenshot: Events - ILAB03>

At the risk of sounding like a broken record, this is the same data we saw in Uchiwa, obtained via the same APIs.

There is also a per-client view that shows a summary with the number of events triggered on that client, as well as coloring the box to indicate the highest-severity event happening on that client.

<Sensu Grid Screenshot: Clients - ILAB01>

As an added bonus, the "Details" drill-downs in Sensu Grid send you to the appropriate page in Uchiwa where you can see a more detailed view of the event (e.g.: check output details, check history). This makes it very easy to go from a macro-level view of one or more Datacenters into a micro-level view of a specific client or check.
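
The per-client view described above boils down to a small aggregation: count the events on each client and remember the worst one. Here is a rough Python sketch of that logic. To be clear, this is not Sensu Grid's actual code; the severity ordering (unknown sits between warning and critical), the color names, and the sample data are all my own assumptions.

```python
# Rank statuses so the "worst" event can be compared; ordering is an assumption.
RANK = {0: 0, 1: 1, 2: 3}  # ok < warning < unknown (rank 2) < critical
COLOR = {0: "green", 1: "yellow", 2: "gray", 3: "red"}


def client_tiles(events):
    """Return {client_name: {"events": count, "color": worst_color}}."""
    counts = {}
    worst = {}
    for event in events:
        name = event["client"]["name"]
        rank = RANK.get(event["check"]["status"], 2)  # unlisted status -> unknown
        counts[name] = counts.get(name, 0) + 1
        worst[name] = max(worst.get(name, 0), rank)
    return {
        name: {"events": counts[name], "color": COLOR[worst[name]]}
        for name in counts
    }


# Hypothetical events, trimmed to the fields the function reads.
grid_events = [
    {"client": {"name": "ilab-web01"}, "check": {"name": "check_memory", "status": 1}},
    {"client": {"name": "ilab-web01"}, "check": {"name": "check_disk", "status": 2}},
    {"client": {"name": "ilab-db01"}, "check": {"name": "check_ntp", "status": 3}},
]
tiles = client_tiles(grid_events)
```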

Conclusion: Aren't Open APIs Awesome?

I am sure you are sick of hearing it by now, but it is hard to deny that without Sensu's open, robust APIs, none of this dashboard-y goodness would be available to use, extend, and even create anew. Like everything else with Sensu, there is a rich foundation of existing solutions to common problems, yet it is built with an openness and composability that allows people to extend and improve upon those foundations to suit their individual needs.

It is this spirit of extensibility, openness, and community that first endeared me to Sensu - and what keeps me loyal to it today.
