@StevenACoffman
Last active April 2, 2024 22:34
Fluentd Fluent-bit FileBeat memory and cpu resources

Fluent-bit rocks

A short survey of log collection options and why you picked the wrong one. 😜

Who am I? Where am I from?

I'm Steve Coffman and I work at Ithaka. We do JSTOR (academic journals) and other stuff. How big is it?

| Number | What it means |
| --- | --- |
| 101,332,633 | unique visitors in 2017 |
| 30,419,294 | messages on busiest Kafka topic (each a Fastly request info) in November |
| 1000-ish | AWS EC2 instances |
| 100-ish | Engineers |

Our cloud spend is large-ish. Each of our AWS EC2 instances has 2GB of overhead, mostly for observability (log + metric collection, microservice tracing), even if that instance is ideally utilized (most of our apps are memory bound, not CPU or I/O bound). Our move to Kubernetes promises to let us achieve near-optimal utilization of CPU, memory, and I/O.

We are trying to further reduce overhead.

Fluent-bit delivers.

Log Collection

Principle 11 of the 12 Factor App is to "Treat logs as event streams".

While most traditional applications store log information in a file, the Twelve-Factor app directs it, instead, to stdout as a stream of events; it’s the execution environment that’s responsible for collecting those events. That might be as simple as redirecting stdout to a file, but in most cases it involves using a log router such as Fluentd, Filebeat, or Fluent-bit and saving the logs to Hadoop or a service such as Splunk. (From "How do you build 12-factor apps using Kubernetes?")

In docker, the default log driver is json-file, but it also supports others, such as fluentd. Collection and shipping is otherwise bring your own.

In Kubernetes, you have at least two battle tested choices for automatic logging capture: Stackdriver Logging if you’re using Google Cloud, and Fluentd to Elasticsearch if you’re not. Both of those are actually Fluentd, since Stackdriver Logging uses a Google-customized and packaged Fluentd agent. You can find more information on setting Fluentd Kubernetes logging destinations here.

Filebeat is more common outside Kubernetes, but can be used inside Kubernetes to produce to ElasticSearch.

Fluent-bit is a newer contender, and uses fewer resources than the others.

Why Fluent-bit rocks:

  • Uses 1/10th the resources (memory + cpu)
  • Extraordinary throughput and resiliency/reliability
  • Supports multi-line (e.g. stacktrace) as single message
  • Enriches log messages with Kubernetes metadata (if you want that)
  • Kubernetes apps can be annotated to suggest the appropriate parser
  • Instrumented with prometheus metrics
  • Outputs to elasticsearch, kafka, fluentd, etc.
  • You can also use it to ship metrics (cpu, memory, disk usage) to InfluxDB
  • TL;DR use 0.13-dev branch or newer

Resource Comparison

Without monitoring to tailor to our workloads, just going from the recommended resource requests and limits, we have a stark contrast between the different log collectors.

Beats are lightweight data shippers that you install as agents on your servers to send specific types of operational data to Elasticsearch. Beats have a small footprint and use fewer system resources than Logstash.

Logstash has a larger footprint, but provides a broad array of input, filter, and output plugins for collecting, enriching, and transforming data from a variety of sources.

The Fluentd and Fluent Bit projects are both created and sponsored by Treasure Data, and they aim to solve the collection, processing, and delivery of logs.

The two projects share a lot of similarities: Fluent Bit is fully based on the design and experience of Fluentd's architecture. Choosing which one to use depends on your final needs. From an architecture perspective we can consider:

Fluentd is a log collector, processor, and aggregator. Fluent Bit is a log collector and processor (it doesn't have strong aggregation features as Fluentd does).

Combinations

Fluent-bit or Beats can be a complete, although bare-bones, logging solution, depending on use cases. Fluentd or Logstash are heavier weight but more full-featured.

You can combine Fluent-bit (one per node) and Fluentd (one per cluster) just as you can combine Filebeat (one per node) and Logstash (one per cluster).

Comparisons

Fluent-bit from this file:

        resources:
          requests:
            cpu: 5m
            memory: 10Mi
          limits:
            cpu: 50m
            memory: 60Mi

Fluentd from this file:

        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 200Mi

FileBeat from this file:

        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
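
To put rough numbers on that contrast, here is a minimal Python sketch that compares the three sets of recommended values above. The dictionary layout and the helper name `ratio_to_fluent_bit` are made up for illustration; only the numbers come from the manifests.

```python
# Recommended values copied from the three manifests above.
# CPU is in millicores (m), memory in mebibytes (Mi).
RECOMMENDED = {
    "fluent-bit": {"cpu_request": 5, "mem_request": 10, "mem_limit": 60},
    "fluentd": {"cpu_request": 100, "mem_request": 200, "mem_limit": 500},
    "filebeat": {"cpu_request": 100, "mem_request": 100, "mem_limit": 200},
}


def ratio_to_fluent_bit(collector: str, field: str) -> float:
    """How many times larger a collector's value is than Fluent-bit's."""
    return RECOMMENDED[collector][field] / RECOMMENDED["fluent-bit"][field]


if __name__ == "__main__":
    for collector in ("fluentd", "filebeat"):
        for field in ("cpu_request", "mem_request", "mem_limit"):
            ratio = ratio_to_fluent_bit(collector, field)
            print(f"{collector} {field}: {ratio:.1f}x fluent-bit")
```

Going by requests alone, Fluentd asks for 20 times the CPU and memory of Fluent-bit, and Filebeat asks for 20 times the CPU and 10 times the memory. These are recommended defaults, not measurements.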

Recently AWS ran some benchmarks:

To get a better feeling for the performance, we performed a benchmarking test to compare the above Fluent Bit plugin with the Fluentd CloudWatch and Kinesis Firehose plugins. All our tests were performed on a c5.9xlarge EC2 instance. Here are the results:

CloudWatch Plugins: Fluentd vs Fluent Bit

| Log lines per second | Data out | Fluentd CPU | Fluent Bit CPU | Fluentd memory | Fluent Bit memory |
| --- | --- | --- | --- | --- | --- |
| 100 | 25 KB/s | 0.013 vCPU | 0.003 vCPU | 146 MB | 27 MB |
| 1000 | 250 KB/s | 0.103 vCPU | 0.03 vCPU | 303 MB | 44 MB |
| 10000 | 2.5 MB/s | 1.03 vCPU | 0.19 vCPU | 376 MB | 65 MB |

Our tests show that the Fluent Bit plugin is more resource-efficient than Fluentd. On average, Fluentd uses over four times the CPU and six times the memory of the Fluent Bit plugin.

Kinesis Firehose Plugins: Fluentd vs Fluent Bit

| Log lines per second | Data out | Fluentd CPU | Fluent Bit CPU | Fluentd memory | Fluent Bit memory |
| --- | --- | --- | --- | --- | --- |
| 100 | 25 KB/s | 0.006 vCPU | 0.003 vCPU | 84 MB | 27 MB |
| 1000 | 250 KB/s | 0.073 vCPU | 0.033 vCPU | 102 MB | 37 MB |
| 10000 | 2.5 MB/s | 0.86 vCPU | 0.13 vCPU | 438 MB | 55 MB |

In this benchmark, on average Fluentd uses over three times the CPU and four times the memory of the Fluent Bit plugin. Keep in mind that this data does not represent a guarantee; your footprint may differ. However, the above data points suggest that the Fluent Bit plugin is significantly more efficient than Fluentd.
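
The "on average" claims can be checked against the two tables. The sketch below assumes the average is the plain mean of the per-row Fluentd / Fluent Bit ratios; AWS does not say how they averaged, so treat that as my guess. The function and variable names are just for illustration.

```python
from statistics import mean

# One tuple per table row:
# (fluentd_cpu_vcpu, fluent_bit_cpu_vcpu, fluentd_mem_mb, fluent_bit_mem_mb)
CLOUDWATCH = [(0.013, 0.003, 146, 27), (0.103, 0.03, 303, 44), (1.03, 0.19, 376, 65)]
FIREHOSE = [(0.006, 0.003, 84, 27), (0.073, 0.033, 102, 37), (0.86, 0.13, 438, 55)]


def average_ratios(rows):
    """Mean Fluentd/Fluent Bit ratio for CPU and for memory (assumed method)."""
    cpu = mean(fd_cpu / fb_cpu for fd_cpu, fb_cpu, _, _ in rows)
    mem = mean(fd_mem / fb_mem for _, _, fd_mem, fb_mem in rows)
    return cpu, mem


if __name__ == "__main__":
    for name, rows in (("CloudWatch", CLOUDWATCH), ("Kinesis Firehose", FIREHOSE)):
        cpu, mem = average_ratios(rows)
        print(f"{name}: Fluentd uses {cpu:.1f}x the CPU and {mem:.1f}x the memory")
```

Under that assumption the numbers come out at roughly 4.4x CPU and 6.0x memory for CloudWatch, and 3.6x CPU and 4.6x memory for Kinesis Firehose, which lines up with the quoted summaries.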

Keeping Stacktraces together

Most programs contain bugs, and those lead to valuable multi-line stacktraces which are unpleasant to reassemble after being shipped to an eventually consistent distributed data sink (ElasticSearch, Kafka, AWS S3, DynamoDB, what-have-you). It is more convenient if the collector can understand those and keep them as single messages.

In fluentd, this is accomplished through fluent-plugin-detect-exceptions which has artisanally hand-crafted regexes for most languages.

In fluent-bit, you configure a multi-line parser for each language you wish to support, and have your application add an annotation that hints what parser to use. Feel free to steal regexes from the fluentd plugin above.
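
The idea behind a multi-line parser is simple: a regex recognizes the first line of a record, and everything that doesn't match gets glued onto the previous record. Here is a minimal Python sketch of that idea. It is not Fluent-bit's implementation, and the `FIRST_LINE` pattern (a record starts with a date) is a hypothetical example, not one of the regexes from the fluentd plugin.

```python
import re

# Hypothetical first-line pattern: a new record starts with an ISO-like date.
FIRST_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def group_multiline(lines):
    """Join continuation lines onto the most recent first line."""
    records = []
    for line in lines:
        if FIRST_LINE.match(line) or not records:
            # Start a new record (also covers a stream that begins mid-trace).
            records.append(line)
        else:
            # Continuation line, e.g. part of a stack trace.
            records[-1] += "\n" + line
    return records


if __name__ == "__main__":
    sample = [
        "2018-01-01 12:00:00 ERROR boom",
        "Traceback (most recent call last):",
        '  File "app.py", line 1, in <module>',
        "ValueError: boom",
        "2018-01-01 12:00:01 INFO ok",
    ]
    for record in group_multiline(sample):
        print(repr(record))
```

The five input lines come out as two messages: the error with its full traceback attached, and the following INFO line.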

Resilience and Reliability

In Kubernetes, the default Docker json-file log driver already provides a measure of on-disk buffering for ephemeral containers. When Fluent-bit tails those files, the recommended option is to give it a SQLite database file so the plugin keeps a history of tracked files and their offsets. This is very useful for resuming where it left off if the service is restarted. You may also specify a retry limit for shipping logs to each output (including False, which retries forever).
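
To show why an offsets database makes restarts painless, here is a small Python sketch of the bookkeeping. The table name, column names, and helper functions are all hypothetical; this is the general idea, not Fluent-bit's actual schema or code.

```python
import sqlite3


def open_offset_db(path=":memory:"):
    """Open (or create) a database with one row per tracked file."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tracked_files ("
        "path TEXT PRIMARY KEY, byte_offset INTEGER NOT NULL)"
    )
    return db


def save_offset(db, path, byte_offset):
    """Record how far into the file we have read."""
    db.execute(
        "INSERT OR REPLACE INTO tracked_files (path, byte_offset) VALUES (?, ?)",
        (path, byte_offset),
    )
    db.commit()


def load_offset(db, path):
    """Return the saved offset, or 0 for a file we have never seen."""
    row = db.execute(
        "SELECT byte_offset FROM tracked_files WHERE path = ?", (path,)
    ).fetchone()
    return row[0] if row else 0


if __name__ == "__main__":
    db = open_offset_db()
    save_offset(db, "/var/log/containers/app.log", 4096)
    print(load_offset(db, "/var/log/containers/app.log"))
```

With a file path instead of `:memory:`, a restarted collector would call `load_offset` and seek to that position rather than re-shipping the whole file or skipping to the end.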

To avoid backpressure, Fluent Bit implements a mechanism in the engine that restricts the amount of data an input plugin can ingest. This is configured through the Mem_Buf_Limit parameter.
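
Conceptually, a memory buffer limit works like the toy model below: the input accepts data until the buffered amount reaches the limit, pauses, and resumes once enough has been flushed downstream. The class and method names are invented for this sketch, and the real engine is more involved.

```python
class BoundedInput:
    """Toy model of an input plugin with a memory buffer limit."""

    def __init__(self, mem_buf_limit):
        self.mem_buf_limit = mem_buf_limit  # bytes allowed in memory
        self.buffered = 0                   # bytes currently waiting to be flushed
        self.paused = False

    def ingest(self, nbytes):
        """Return True if the data was accepted, False if the input is paused."""
        if self.paused:
            return False
        self.buffered += nbytes
        if self.buffered >= self.mem_buf_limit:
            self.paused = True  # stop reading until the outputs catch up
        return True

    def flush(self, nbytes):
        """Outputs delivered nbytes; resume the input if we are under the limit."""
        self.buffered = max(0, self.buffered - nbytes)
        if self.buffered < self.mem_buf_limit:
            self.paused = False


if __name__ == "__main__":
    tail = BoundedInput(mem_buf_limit=100)
    print(tail.ingest(60), tail.ingest(60), tail.ingest(10))
    tail.flush(60)
    print(tail.ingest(10))
```

When tailing files, a paused input is not a disaster: the data is still on disk, and the saved offset means reading picks up where it stopped.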

Monitoring

Prometheus Metrics out of the box in the 0.13.x series! Woohoo!

Log Pipeline

Ok great, we're collecting and shipping... and then what? If you want to do more than just searching ElasticSearch, you might consider a solution like minipipe to enable sophisticated analytics.

Here's a presentation about what our Log Pipeline and Analytics stack at Ithaka look(ed) like.

What about metrics?

Fluent-bit does that too. Fluent-bit can capture CPU, memory, and disk usage as inputs and output to InfluxDB.

Metrics: What are they again?

Logs vs Metrics and implementations

In working out my thoughts, I am borrowing from several sources, notably:

Monitoring means knowing what’s going on inside your system, how much traffic it’s getting, how it’s performing, how many errors there are. This is not the end goal though, merely a means. Our goal is to be able to detect, debug and resolve any problems that occur, and monitoring is an integral part of that process.

There is a division in approaches to collecting the monitoring data. These are logging as exemplified by Elasticsearch as part of the ELK stack (Elasticsearch, Logstash and Kibana), and metrics as exemplified by the TICK Stack (Telegraf, InfluxDB, Chronograf / Grafana, Kapacitor).

Log messages are notifications about events as they pertain to a specific transaction. Metrics are notifications that an event occurred, without any ties to a transaction.

Ok so what’s the difference? Well again putting on my Operations hat, metrics can be far smaller because they convey considerably less information. They’re also much easier to evaluate. Both of these points affect how we store, process and retain metrics.

A log file, however, gives you details on a transaction which may allow you to tell a more complete story for a given event. The transactional nature of the log message in aggregate gives you much more flexibility in terms of surfacing information (not just data) about the business.
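
A tiny Python sketch makes the trade-off concrete. The events, field names, and metric names here are made up for illustration.

```python
from collections import Counter

# Hypothetical log events: each one is tied to a specific transaction (request_id).
LOG_EVENTS = [
    {"request_id": "a1", "path": "/search", "status": 200, "ms": 41},
    {"request_id": "b2", "path": "/search", "status": 500, "ms": 903},
    {"request_id": "c3", "path": "/article", "status": 200, "ms": 17},
]


def to_metrics(events):
    """Collapse log events into counters per status; transaction detail is dropped."""
    return Counter(f"http_status_{event['status']}" for event in events)


def explain(events, status):
    """Only the logs can say which transactions produced a given status."""
    return [event["request_id"] for event in events if event["status"] == status]


if __name__ == "__main__":
    print(dict(to_metrics(LOG_EVENTS)))
    print(explain(LOG_EVENTS, 500))
```

The counters are tiny and trivial to alert on, but once the events are collapsed you can no longer ask which request failed. That question needs the logs.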

Logging has other business purposes beyond monitoring, which are not relevant to my analysis here.

Both logs and metrics need to be collected, and there's a variety of ways to collect them.

| | Ithaka | Fluent | Logstash | Metrics |
| --- | --- | --- | --- | --- |
| Leaf collector | Bittybuffer | fluent-bit | Beats | Statsd |
| Routing/Predigest | Logbuffer | fluentd | Logstash | Graphite |

ELK Stack (or ELKK, EFKK)

Summary: ELK is a popular open source application stack for visualizing and analyzing logs.

  • Elasticsearch: Distributed Real-time search and analytics engine.
  • Logstash: Collect and parse all data sources into an easy-to-read JSON format (Fluent is a modern replacement)
  • Kibana: Elasticsearch data visualization engine
  • Kafka: Data transport, queue, buffer, and short term storage

TICK Stack (or TIGK)

Summary: Solution for collecting, storing, visualizing and alerting on time-series data at scale. All components of the platform are designed to work together seamlessly.

  • Telegraf: Collects time-series data from a variety of sources
  • InfluxDB: Eventually consistent Time-series database
  • Chronograf: Visualizes and graphs, replaced with Grafana sometimes
  • Kapacitor: Alerting, ETL and detects anomalies in time-series data

Metrics old school stack

Summary: Well understood, established ecosystem.

  • Metrics Gatherer - (statsd, collectd, dropwizard metrics)
  • Listener (Carbon)
  • Storage Database (Whisper or InfluxDB)
  • Visualizer (Grafana, Graphite-Web)

Prometheus

Summary: Metrics pull based model

  • PushGateway: for ephemeral or batch jobs
  • uh...? I'm not well versed
@sagarjauhari

You can find more information on setting Fluentd Kubernetes logging destinations here.

the link is stale

@valihanov-rz

The file is called "fluent-filebeat-comparison.md". Where is comparison with filebeat? =)

@StevenACoffman
Author

@sagarjauhari Thanks! Updated the link.

@valihanov-rz Look for "beat" and that is the extent of the Filebeat coverage at present. My difficulty with exhaustive performance breakdowns was deciding whether you also needed to run LogStash and include it. Filebeat can ship directly to ElasticSearch. You would then only need LogStash if you:

  • Perform event transforms that Filebeat and ES aren't capable of.
  • Need additional inputs and outputs.

My purpose was to ship to Kafka (not ElasticSearch) and lightly alter / aggregate the log messages in a way that Filebeat wasn't capable of doing (at least at the time, but I think also currently). Comparing the CPU and memory usage of Logstash + Filebeat to Fluent-bit alone seemed ridiculous. The mentions of the Beats ecosystem seemed sufficient for context, but I left an exhaustive comparison to someone whose needs line up more closely (shipping directly to ES without event transforms) and who can speak to real-world monitoring results.

@jonathanviber

I read a while back that fluent bit didn't handle volume as well as FluentD. I think the number of lines p/second was slower in Fluent-bit when it hit super-high volumes. Has anyone experienced this?

@StevenACoffman
Author

In my experience, at super high volumes, fluent-bit outperformed fluentd with higher throughput, lower latency, lower CPU, and lower memory usage. There are features that fluentd has which fluent-bit does not, like detecting multiple line stack traces in unstructured log messages. To my mind, that is the only reason to use fluentd.

@jonathanviber

Thanks for the reply. Perhaps this comment came from an older version of fluent bit.
We send from FluentD to Kinesis using a Kinesis aggregator. The aggregator was only recently added to Fluent-bit so that makes it a potential solution for us.

@jeff303

jeff303 commented Apr 2, 2024

It seems as though multi-line parsing was added, at least in 3.0 (see here)
