Logs and Metrics

Logs vs Metrics and implementations

(Also, more thoughts are here.)

In working out my thoughts here, I'm borrowing from several sources.

Monitoring means knowing what's going on inside your system: how much traffic it's getting, how it's performing, how many errors there are. That is not the end goal, though, merely a means. The goal is to be able to detect, debug and resolve any problems that occur, and monitoring is an integral part of that process.

There is a division in approaches to collecting monitoring data: logging, exemplified by Elasticsearch as part of the ELK stack (Elasticsearch, Logstash and Kibana), and metrics, exemplified by the TICK stack (Telegraf, InfluxDB, Chronograf / Grafana, Kapacitor).

Log messages are notifications about events as they pertain to a specific transaction. Metrics are notifications that an event occurred, without any tie to a transaction.

OK, so what's the difference? Putting my operations hat back on: metrics are considerably smaller because they convey far less information, and they are much easier to evaluate. Both of these points affect how we store, process and retain them.

A log message, however, gives you the details of a transaction, which may let you tell a more complete story for a given event. The transactional nature of log messages, taken in aggregate, gives you much more flexibility in surfacing information (not just data) about the business.
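To make the contrast concrete, here is a minimal sketch in Go (standard library log/slog only; the request_id, path, and metric names are made up for illustration) showing the same HTTP request recorded as a structured log message tied to a transaction, versus the bare counters a statsd-style client would emit for it.

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	// Log message: tied to one specific transaction and carrying its
	// details, so in aggregate it can tell a fuller story about requests.
	logger.Info("request completed",
		"request_id", "abc-123", // hypothetical transaction ID
		"path", "/checkout",
		"status", 200,
		"duration_ms", 87,
	)

	// Metric: only a notification that an event of this kind occurred,
	// with no tie back to the individual transaction. A statsd-style
	// client would reduce the same request to something like:
	//   myapp.requests:1|c
	//   myapp.request.duration:87|ms
}
```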

Logging has other business purposes beyond monitoring, which are not relevant to my analysis here.

Both logs and metrics need to be collected, and there are a variety of ways to collect them. The table below maps each collection role to the corresponding tool in several stacks:

|                   | Ithaka      | Fluent     | Logstash | Metrics  |
|-------------------|-------------|------------|----------|----------|
| Leaf collector    | Bittybuffer | fluent-bit | Beats    | Statsd   |
| Routing/Predigest | Logbuffer   | fluentd    | Logstash | Graphite |

ELK Stack (or ELKK, EFKK)

Summary: ELK is a popular open-source application stack for visualizing and analyzing logs (a minimal indexing sketch follows the component list).

  • Elasticsearch: Distributed, real-time search and analytics engine
  • Logstash: Collects and parses all data sources into an easy-to-read JSON format (Fluentd is a modern replacement)
  • Kibana: Elasticsearch data visualization engine
  • Kafka: Data transport, queue, buffer, and short term storage
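As a rough sketch of where log events land in this stack, the snippet below indexes a single structured log document into Elasticsearch over its HTTP index API. The index name app-logs and the address localhost:9200 are assumptions; in practice Logstash or Fluentd would sit between the application and the cluster rather than the app posting directly.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// A structured log event; field names are illustrative.
	event := map[string]any{
		"@timestamp": time.Now().UTC().Format(time.RFC3339),
		"level":      "info",
		"message":    "request completed",
		"status":     200,
	}
	body, _ := json.Marshal(event)

	// Index the document via POST /<index>/_doc.
	resp, err := http.Post("http://localhost:9200/app-logs/_doc",
		"application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("index request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("Elasticsearch responded:", resp.Status)
}
```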

TICK Stack (or TIGK)

Summary: A solution for collecting, storing, visualizing and alerting on time-series data at scale; all components of the platform are designed to work together seamlessly (a minimal write example follows the component list).

  • Telegraf: Collects time-series data from a variety of sources
  • InfluxDB: Eventually consistent time-series database
  • Chronograf: Visualization and graphing (sometimes replaced with Grafana)
  • Kapacitor: Alerting, ETL, and anomaly detection for time-series data
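For a feel of how data enters this stack, here is a minimal sketch that writes one point to InfluxDB using the 1.x HTTP write API and line protocol. The database name telemetry, the measurement and tag names, and localhost:8086 are assumptions; normally Telegraf collects and forwards points rather than the application writing them directly.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

func main() {
	// One point in line protocol: measurement,tag=value field=value timestamp(ns)
	line := fmt.Sprintf("http_requests,service=checkout count=1i %d",
		time.Now().UnixNano())

	// InfluxDB 1.x HTTP write endpoint.
	resp, err := http.Post("http://localhost:8086/write?db=telemetry",
		"text/plain", strings.NewReader(line))
	if err != nil {
		fmt.Println("write failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("InfluxDB responded:", resp.Status)
}
```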

Old school metrics stack

Summary: A well understood, established ecosystem (a sketch of emitting a counter to it follows the component list).

  • Metrics gatherer (statsd, collectd, Dropwizard Metrics)
  • Listener (Carbon)
  • Storage Database (Whisper or InfluxDB)
  • Visualizer (Grafana, Graphite-Web)
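Part of why this stack has lasted is that its wire protocols are trivially simple. The sketch below (metric names and localhost addresses are assumptions) sends a counter to statsd over UDP and a sample directly to Carbon's plaintext listener over TCP.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// statsd counter over UDP (statsd listens on 8125 by default).
	if conn, err := net.Dial("udp", "localhost:8125"); err == nil {
		fmt.Fprint(conn, "myapp.requests:1|c")
		conn.Close()
	}

	// Or bypass statsd and speak Carbon's plaintext protocol directly
	// (TCP 2003 by default): "<metric path> <value> <unix timestamp>\n".
	if conn, err := net.Dial("tcp", "localhost:2003"); err == nil {
		fmt.Fprintf(conn, "myapp.requests.count 1 %d\n", time.Now().Unix())
		conn.Close()
	}
}
```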

Prometheus

Summary: A pull-based metrics model; the Prometheus server scrapes HTTP endpoints exposed by applications and exporters (a minimal instrumentation sketch follows the component list).

  • Prometheus server: Scrapes targets, stores the time-series data, and evaluates recording and alerting rules
  • Client libraries and exporters: Expose metrics over HTTP for the server to scrape
  • PushGateway: For ephemeral or batch jobs that can't be scraped directly
  • Alertmanager: Deduplicates, groups, and routes alerts
  • Grafana (or the built-in expression browser): Visualization
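Here is a minimal sketch of the pull model using the official Go client (github.com/prometheus/client_golang): the application only exposes a /metrics endpoint, and the Prometheus server scrapes it on its own schedule. The metric name and the port are assumptions.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// A counter registered with the default registry; the name is illustrative.
var requestsTotal = promauto.NewCounter(prometheus.CounterOpts{
	Name: "myapp_requests_total",
	Help: "Total number of handled requests.",
})

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		requestsTotal.Inc()
		w.Write([]byte("ok"))
	})

	// Prometheus scrapes this endpoint; the application never pushes.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```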