@sciurus
Last active April 9, 2021 13:17
Why Datadog

Datadog is very nice. Here's something I wrote when asked what value we were getting from it.

Why Datadog?

I would break it down into four pieces. Datadog is

  1. providing functionality

  2. we need

  3. in an easy-to-use manner

  4. that would be difficult to build and maintain ourselves

1) Functionality

The agent

It gathers system metrics, integrates with key software we use, and provides a standard interface to which our applications can send custom metrics.
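As a rough illustration of that interface: the agent accepts StatsD-style datagrams over UDP, by default on port 8125. The sketch below builds and sends one by hand using only the standard library. The metric name and tag are made up for the example, and it assumes an agent is listening on localhost; in practice a client library would normally do this for you.

```python
import socket


def format_metric(name, value, metric_type="g", tags=None):
    """Build a StatsD-style datagram: name:value|type, plus an optional |#tag list."""
    payload = f"{name}:{value}|{metric_type}"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget over UDP; assumes an agent is listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # "checkout.queue_depth" and "env:staging" are hypothetical names.
    send_metric(format_metric("checkout.queue_depth", 12, "g", ["env:staging"]))
```

Because the transport is UDP, the application does not block or fail if the agent is down, which is a large part of why this style of interface is cheap to adopt.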

Integrations

Datadog has prebuilt integrations to pull data from almost every important service we use.

Events

Through the integrations, Datadog generates a consolidated event stream that we can filter and search as needed.

Dashboards

Datadog lets us build dashboards that combine metrics from many different sources. We can combine and transform metrics to make them more useful. It also provides a powerful interface for interactive exploration of metrics.

Alerting

Datadog has nice stream processing capabilities for generating alerts, and it can surface them in services we use like PagerDuty and Slack.

2) Need

The agent

We don't get nearly enough insight from CloudWatch alone. We need an on-instance tool to gather system and app metrics.

Integrations

There are lots of services with operational significance, but many of them don't provide a good way to access their data.

Events

We would spend dramatically longer investigating problems if we had to look at each source of events in isolation. Many of our event sources don't even provide a way for us to view past events or to query them.

Dashboards

Per-service and per-instance dashboards are important for investigating problems quickly. The consolidation of data from multiple sources is again a key feature.

Alerting

We need to analyze trends in our metrics and alert on them.
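To make "alert on a trend" concrete, here is a minimal sketch of one common approach: fire only when the average over the last few samples crosses a threshold, so a single spike does not page anyone. The class name, window size, and threshold are all hypothetical, and this is not a description of how Datadog evaluates its alerts.

```python
from collections import deque


class RollingAverageAlert:
    """Signal when the mean of the last `window` samples exceeds `threshold`."""

    def __init__(self, window, threshold):
        self.samples = deque(maxlen=window)  # old samples fall off automatically
        self.threshold = threshold

    def observe(self, value):
        """Record a sample; return True only once the window is full and its mean is too high."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return False
        return sum(self.samples) / len(self.samples) > self.threshold


if __name__ == "__main__":
    cpu_alert = RollingAverageAlert(window=3, threshold=80)
    for reading in [50, 90, 95, 100, 20]:
        print(reading, cpu_alert.observe(reading))
```

Even this toy version needs state per metric, a place to run continuously, and a route to a pager, which is the part that gets expensive to own.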

3) Ease of use

The agent

The agent is deployable via a Chef cookbook Datadog wrote for us. It requires minimal configuration. It knows which system and application metrics are worth gathering.

Integrations

Integrating with each data source takes only a few clicks.

Events

The interface makes searching and filtering events straightforward.

Dashboards

There are prebuilt dashboards for lots of things we care about. Snazzy features like autocomplete and templating make building our own dashboards easy.

Alerting

The guided steps and previewed outputs make creating alerts simple.

4) Hard to replicate

Here I described a system of collectd, custom code to pull metrics from CloudWatch, custom code to pull or receive events from various sources (Airbrake, CloudTrail, Chef, PagerDuty, Jenkins, etc.), InfluxDB, and Grafana.
