
Logging in a Hybrid Cloud with Fluentd and ObjectRocket

By: Hart Hoover & Ryan Walker

Recently, the Rackspace DevOps Automation team announced a service that sends alerts from New Relic to Rackspace support. These alerts generate tickets for our DevOps Engineers to respond to, so our customers can sleep soundly when an alert fires at 3 a.m. By combining the alerts with other data points collected about our customers' environments, our Engineers can identify where issues lie and then execute the proper course of action.

While designing the infrastructure for this service, we encountered a common but interesting problem: we needed to limit access to internal Rackspace systems for security while still maintaining a public endpoint that New Relic could talk to. Our solution was to design a service whose public API endpoints and private workers are completely segregated from each other. The public API endpoints receive alerts from New Relic and pass them to an ObjectRocket Redis instance acting as a queue. Worker services run internally behind a RackConnect firewall, pulling messages from the queue and creating alerts.
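To make the flow concrete, here is a minimal sketch of a private-side worker loop, assuming a Redis queue key named "alerts" and a hypothetical create_ticket() helper (both illustrative; this is not our actual implementation):

import json
import redis

def create_ticket(alert):
    # Hypothetical stand-in for the internal call that opens a support ticket.
    print("would create a ticket for:", alert.get("policy_name"))

# Placeholder connection details for the ObjectRocket Redis instance.
r = redis.StrictRedis(host="redis.example.com", port=6379, password="secret")

while True:
    # BLPOP blocks until a message is available on the "alerts" queue.
    _, raw = r.blpop("alerts")
    create_ticket(json.loads(raw))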

This partitions the environments very well, but it did create a problem for us with regard to log aggregation. We run an Elasticsearch/Kibana stack inside our private environment. Behind the firewall, we use Fluentd to push logs directly to Elasticsearch. Outside the firewall, the EK stack can't be reached. To solve this, we started using Fluentd to push logs from our public API services to an ObjectRocket MongoDB instance. Internally, we use Fluentd again to pull the logs from ObjectRocket into Elasticsearch. This gives us a single destination for all of our environment's activity.

What is Fluentd?

Fluentd is an open source data collector that structures data as JSON wherever possible. This means you don't have to write and maintain a bunch of one-off scripts to get logging data into a common format. It's all JSON.
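For example, a web server access log line becomes a tagged, timestamped JSON record. As printed by Fluentd's stdout output, an event looks roughly like this (the field names depend on your parser; this one is illustrative):

2016-02-28 17:35:00 +0000 apache.access: {"host":"192.168.0.1","method":"GET","path":"/index.html","code":200}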

The power of Fluentd is in its support for many sources and destinations. For example, you can collect data from a Twitter stream and have Fluentd notify you about it in IRC. There are tons of community plugins available.
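As a rough sketch of that pipeline using the community twitter and irc plugins (the parameter names below are illustrative, so treat each plugin's documentation as authoritative):

<source>
  type twitter
  consumer_key        YOUR_CONSUMER_KEY
  consumer_secret     YOUR_CONSUMER_SECRET
  oauth_token         YOUR_OAUTH_TOKEN
  oauth_token_secret  YOUR_OAUTH_TOKEN_SECRET
  keyword             fluentd
  tag                 input.twitter
</source>

<match input.twitter>
  type irc
  host irc.freenode.net
  port 6667
  channel fluentd
  nick fluentd-bot
  user fluentd-bot
  real fluentd-bot
  message New tweet: %s
  out_keys text
</match>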

Using Fluentd with Docker

Using the MongoDB Fluentd plugin, one can easily push logs into ObjectRocket. First, sources must be defined. Since all of our services run in Docker, we have to get our container logs into Fluentd. Jason Wilder has written a great post that complements this one on accomplishing log aggregation with docker-gen and Fluentd. Once the fluentd container is running (and docker-gen has generated the Fluentd configuration), you should have a section like this for each running container:

<source>
  type tail
  format json
  time_key time
  path /var/lib/docker/containers/c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898/c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898-json.log
  pos_file /var/lib/docker/containers/c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898/c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898-json.log.pos
  tag docker.container.c835298de6dd
  rotate_wait 5
</source>

This tails the container log and keeps track of its place in the log with a position file. It is important to note that the tag in this configuration section is a Fluentd tag, which tells Fluentd how to route the data it collects.

Using Fluentd with MongoDB

On the public side, we tell Fluentd what to do with the data using a "match" section. In this case, replace the variables with actual values from your ObjectRocket database in the same configuration file:

<match docker.**>
  type mongo
  database $DBNAME
  collection prod
  host $HOSTNAME
  port $PORT
  ssl
  capped
  capped_size 100m
  user $MONGOUSER
  password $MONGOPASS
  include_tag_key true
</match>

The include_tag_key setting tells Fluentd to include the Fluentd tag in each record it writes to MongoDB. This way we know exactly which log entry belongs to which container. Fluentd will start populating MongoDB with data, which we can then pull down on the private side of our application.
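A document in the prod collection then looks something like this (an illustrative record; the log, stream, and time fields come from Docker's JSON log format):

{
  "_id": ObjectId("56d32a5e8ab5..."),
  "log": "GET /health HTTP/1.1 200",
  "stream": "stdout",
  "time": ISODate("2016-02-28T17:35:00Z"),
  "tag": "docker.container.c835298de6dd"
}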

On the private side, we still use the Fluentd MongoDB plugin, but this time as a source:

<source>
  type mongo_tail
  database $DBNAME
  collection prod
  host $HOSTNAME
  port $PORT
  user $MONGOUSER
  password $MONGOPASS
  ssl
  time_key time
  wait_time 5
  tag prod
  id_store_file /app/prod_last_id
</source>

The id_store_file works like the position file above: it records the ID of the last document read so Fluentd can resume where it left off. Then, we provide a "match" for our logs to push them into Elasticsearch:

<match **>
  type forest
  subtype elasticsearch
  <template>
    host elasticsearch.domain.com
    port 9200
    index_name fluentd
    logstash_format true
    buffer_type memory
    type_name ${tag}
    flush_interval 3
    retry_limit 17
    retry_wait 1.0
    num_threads 1
  </template>
</match>

We're also using the forest Fluentd plugin, which generates an output definition per tag from a single template, expanding the ${tag} placeholder at runtime. This simplifies our tagging configuration across multiple environments.
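For records tagged prod, for example, the template above effectively expands to an output like the following (sketched here for illustration; forest performs this instantiation for you):

<match prod>
  type elasticsearch
  host elasticsearch.domain.com
  port 9200
  index_name fluentd
  logstash_format true
  buffer_type memory
  type_name prod
  flush_interval 3
  retry_limit 17
  retry_wait 1.0
  num_threads 1
</match>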

Fluentd is a great way to aggregate your Docker logs across multiple hosts and push them to a MongoDB database. In our case, ObjectRocket is a way station between our public and private environments for log aggregation. Other use cases could include real-time analytics on the data you're collecting. The best part for our team is that we don't have to manage MongoDB ourselves, thanks to ObjectRocket's reliability and expertise.

About the Authors

Hart Hoover started his career at Rackspace in 2007 as a Linux Systems Administrator, providing technical support for managed dedicated server environments. He moved to the cloud in 2009 to help design and implement the Managed Cloud Servers support model, leading Rackspace to be the #1 Managed Cloud company. Hart then created and delivered cloud application and architecture training for all of Rackspace Sales. He now serves customers as an Engineer with the DevOps Automation team at Rackspace while leading San Antonio DevOps, a local meetup group. You can follow him on Twitter at @hhoover.

Ryan Walker has been a Racker since 2009 and has worn many hats along the way. Starting as a Linux Systems Administrator, Ryan has since helped design and implement products such as Rackspace Managed Cloud Servers and Rackspace Deployments. Currently, he works as an Engineer with the DevOps Automation team at Rackspace, providing solutions for customers and Rackers with a focus on the bleeding edge. You can follow him on Twitter at @theryanwalker.
