
Logging in a Hybrid Cloud with Fluentd and ObjectRocket

By: Hart Hoover & Ryan Walker

Recently, the Rackspace DevOps Automation team announced a service that sends alerts from New Relic to Rackspace support. These alerts generate tickets for our DevOps Engineers to respond to, so our customers can sleep soundly when an alert fires at 3 a.m. By combining the alerts with other data points collected about our customers' environments, our Engineers can identify where issues lie and then execute the proper course of action.

While designing the infrastructure for this service, we encountered a common but interesting problem: we needed to limit access to internal Rackspace systems for security while still maintaining a public endpoint that New Relic could talk to. Our solution was to design a service whose public API endpoints and private workers are completely segregated from each other. The public API endpoints receive alerts from New Relic and pass them to an ObjectRocket Redis instance acting as a queue. Worker services run internally behind a RackConnect firewall, pulling messages from the queue and creating alerts.
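To make the flow concrete, here is a minimal sketch of a private-side worker loop, assuming a Redis queue key named "alerts" and a hypothetical create_ticket() helper (both illustrative; this is not our actual implementation):

import json
import redis

def create_ticket(alert):
    # Hypothetical stand-in for the internal call that opens a support ticket.
    print("would create a ticket for:", alert.get("policy_name"))

# Placeholder connection details for the ObjectRocket Redis instance.
r = redis.StrictRedis(host="redis.example.com", port=6379, password="secret")

while True:
    # BLPOP blocks until a message is available on the "alerts" queue.
    _, raw = r.blpop("alerts")
    create_ticket(json.loads(raw))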

This partitions the environments very well, but it did create a problem for us with regard to log aggregation. We run an Elasticsearch/Kibana stack inside our private environment. Behind the firewall, we use Fluentd to push logs directly to Elasticsearch. Outside the firewall, the EK stack can't be reached. To solve this, we started using Fluentd to push logs from our public API services to an ObjectRocket MongoDB instance. Internally, we use Fluentd again to pull the logs from ObjectRocket into Elasticsearch. This gives us a single destination for all of our environment's activity.

What is Fluentd?

Fluentd is an open source data collector that structures data as JSON wherever possible. This means you don't have to write and maintain a bunch of one-off scripts to get logging data into a common format. It's all JSON.
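For example, a web server access log line becomes a tagged, timestamped JSON record. As printed by Fluentd's stdout output, an event looks roughly like this (the field names depend on your parser; this one is illustrative):

2016-02-28 17:35:00 +0000 apache.access: {"host":"192.168.0.1","method":"GET","path":"/index.html","code":200}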

The power of Fluentd is in its support for many sources and destinations. For example, you can collect data from a Twitter stream and have Fluentd notify you about it in IRC. There are tons of community plugins available.
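As a rough sketch of that pipeline using the community twitter and irc plugins (the parameter names below are illustrative, so treat each plugin's documentation as authoritative):

<source>
  type twitter
  consumer_key        YOUR_CONSUMER_KEY
  consumer_secret     YOUR_CONSUMER_SECRET
  oauth_token         YOUR_OAUTH_TOKEN
  oauth_token_secret  YOUR_OAUTH_TOKEN_SECRET
  keyword             fluentd
  tag                 input.twitter
</source>

<match input.twitter>
  type irc
  host irc.freenode.net
  port 6667
  channel fluentd
  nick fluentd-bot
  user fluentd-bot
  real fluentd-bot
  message New tweet: %s
  out_keys text
</match>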

Using Fluentd with Docker

Using the MongoDB Fluentd plugin, one can easily push logs into ObjectRocket. First, sources must be defined. Since all of our services run in Docker, we have to get our container logs into Fluentd. Jason Wilder has written a great post that complements this one on accomplishing log aggregation with docker-gen and Fluentd. Once the fluentd container is running (and docker-gen has generated the Fluentd configuration), you should have a section like this for each running container:

<source>
  type tail
  format json
  time_key time
  path /var/lib/docker/containers/c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898/c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898-json.log
  pos_file /var/lib/docker/containers/c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898/c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898-json.log.pos
  tag docker.container.c835298de6dd
  rotate_wait 5
</source>

This tails the container log and keeps track of its place in the log with a position file. It is important to note that the tag in this configuration section is a Fluentd tag, which tells Fluentd how to route the data it collects.

Using Fluentd with MongoDB

On the public side, we tell Fluentd what to do with the data using a "match" section. In this case, replace the variables with actual values from your ObjectRocket database in the same configuration file:

<match docker.**>
  type mongo
  database $DBNAME
  collection prod
  host $HOSTNAME
  port $PORT
  ssl
  capped
  capped_size 100m
  user $MONGOUSER
  password $MONGOPASS
  include_tag_key true
</match>

The include_tag_key setting tells Fluentd to include the Fluentd tag in each record it writes to MongoDB. This way we know exactly which log entry belongs to which container. Fluentd will start populating MongoDB with data, which we can then pull down on the private side of our application.
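A document in the prod collection then looks something like this (an illustrative record; the log, stream, and time fields come from Docker's JSON log format):

{
  "_id": ObjectId("56d32a5e8ab5..."),
  "log": "GET /health HTTP/1.1 200",
  "stream": "stdout",
  "time": ISODate("2016-02-28T17:35:00Z"),
  "tag": "docker.container.c835298de6dd"
}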

On the private side, we still use the Fluentd MongoDB plugin, but this time as a source:

<source>
  type mongo_tail
  database $DBNAME
  collection prod
  host $HOSTNAME
  port $PORT
  user $MONGOUSER
  password $MONGOPASS
  ssl
  time_key time
  wait_time 5
  tag prod
  id_store_file /app/prod_last_id
</source>

The id_store_file works like the position file above: it records the ID of the last document read so Fluentd can resume where it left off. Then, we provide a "match" for our logs to push them into Elasticsearch:

<match **>
  type forest
  subtype elasticsearch
  <template>
    host elasticsearch.domain.com
    port 9200
    index_name fluentd
    logstash_format true
    buffer_type memory
    type_name ${tag}
    flush_interval 3
    retry_limit 17
    retry_wait 1.0
    num_threads 1
  </template>
</match>

We're also using the forest Fluentd plugin, which generates an output definition per tag from a single template, expanding the ${tag} placeholder at runtime. This simplifies our tagging configuration across multiple environments.
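For records tagged prod, for example, the template above effectively expands to an output like the following (sketched here for illustration; forest performs this instantiation for you):

<match prod>
  type elasticsearch
  host elasticsearch.domain.com
  port 9200
  index_name fluentd
  logstash_format true
  buffer_type memory
  type_name prod
  flush_interval 3
  retry_limit 17
  retry_wait 1.0
  num_threads 1
</match>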

Fluentd is a great way to aggregate your Docker logs across multiple hosts and push them to a MongoDB database. In our case, ObjectRocket is a way station between our public and private environments for log aggregation. Other use cases could include real-time analytics on the data you're collecting. The best part for our team is that we don't have to manage MongoDB ourselves, thanks to ObjectRocket's reliability and expertise.

About the Authors

Hart Hoover started his career at Rackspace in 2007 as a Linux Systems Administrator, providing technical support for managed dedicated server environments. He moved to the cloud in 2009 to help design and implement the Managed Cloud Servers support model, leading Rackspace to be the #1 Managed Cloud company. Hart then created and delivered cloud application and architecture training for all of Rackspace Sales. He now serves customers as an Engineer with the DevOps Automation team at Rackspace while leading San Antonio DevOps, a local meetup group. You can follow him on Twitter at @hhoover.

Ryan Walker has been a Racker since 2009 and has worn many hats along the way. Starting as a Linux Systems Administrator, Ryan has since helped design and implement products such as Rackspace Managed Cloud Servers and Rackspace Deployments. Currently, he works as an Engineer with the DevOps Automation team at Rackspace, providing solutions for customers and Rackers with a focus on the bleeding edge. You can follow him on Twitter at @theryanwalker.
