stackedsax/blog_07_2014.md Secret

## blog_07_2014.md

      
Cloud Metrics Snapshot, August 2014

An important part of Rackspace's Monitoring pipeline is the metrics we gather along the way.  A small team called Cloud Metrics is dedicated to those metrics.  We are also known as the Blueflood team, since we authored the blueflood.io project, the technology at the heart of our Cloud Metrics service.  We've been hard at work improving this part of our business, and some changes are underway that I think are worth sharing.

What We're Up To

In short, these are the two primary things we're working on:

We're upgrading our hardware to have even more capacity
We're making this product an HTTP-based service and opening it to the public

What This Means

Capacity

The upgrade to newer machines has a couple of implications, but the obvious one is increased capacity and performance.  We ingest close to 2M metrics/minute right now, but we want more.  To scale for various Rackspace-wide projects, we expect our ingestion rate to grow to 40-50M metrics/minute over the next year, a 20-25x increase, so we are preparing ourselves for the onslaught.
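
To put those numbers in perspective, here is a quick back-of-the-envelope calculation in Python.  It uses only the figures quoted above; the function and constant names are made up for illustration and are not part of Blueflood.

```python
# Back-of-the-envelope conversion of the ingestion figures quoted in this post.
# Names here are illustrative only; nothing below is Blueflood code.

CURRENT_PER_MINUTE = 2_000_000                  # "close to 2M metrics/minute"
TARGET_PER_MINUTE = (40_000_000, 50_000_000)    # "40-50M metrics/minute"


def per_second(per_minute: int) -> float:
    """Convert a metrics-per-minute rate into metrics per second."""
    return per_minute / 60


def growth_factor(target: int, current: int = CURRENT_PER_MINUTE) -> float:
    """How many times larger the target rate is than the current rate."""
    return target / current


if __name__ == "__main__":
    print(f"current: ~{per_second(CURRENT_PER_MINUTE):,.0f} metrics/second")
    for target in TARGET_PER_MINUTE:
        print(
            f"target:  ~{per_second(target):,.0f} metrics/second "
            f"({growth_factor(target):.0f}x today's rate)"
        )
```

In other words, we are going from roughly 33,000 metrics per second today to somewhere between about 667,000 and 833,000 per second.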

Public

The more visible change will be that this service is publicly accessible.  Cloud Metrics used to be merely a stepchild of the Cloud Monitoring product.  As such, it had a Thrift API that the Cloud Monitoring team had developed for its own internal purposes.

We have removed the Thrift API and made Cloud Metrics available as an HTTP API.  We have set up all the necessary wiring to make this just another standard-issue Rackspace service.  While the only 'customer' right now is Cloud Monitoring, these changes pave the way for any customer, big or small, to send metrics our way and retrieve them through that same HTTP API.
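
To give a feel for what "sending metrics our way" over HTTP could look like, here is a rough Python sketch.  Treat everything specific in it as an assumption: the base URL, tenant ID, and token are placeholders, and the JSON field names are modeled on the ingestion format of the open-source Blueflood project rather than on a finalized public Cloud Metrics API.  The sketch only builds the request; it does not send anything.

```python
import json
import time
import urllib.request

# Placeholder values for illustration only; not a real endpoint or tenant.
BASE_URL = "https://metrics.example.com/v2.0"
TENANT_ID = "123456"


def build_payload(name, value, ttl_seconds=172800, now_ms=None):
    """Build a one-metric ingestion payload.

    Field names are an assumption modeled on open-source Blueflood's
    JSON ingestion format; the public service may differ.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # collection time in milliseconds
    return [
        {
            "collectionTime": now_ms,
            "ttlInSeconds": ttl_seconds,
            "metricValue": value,
            "metricName": name,
        }
    ]


def build_ingest_request(payload, token):
    """Wrap the payload in an HTTP POST request object (not sent here)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{TENANT_ID}/ingest",
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # hypothetical auth header
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_ingest_request(
        build_payload("example.cpu.load", 0.42), token="hypothetical-token"
    )
    print(request.get_method(), request.full_url)
    # To actually send it against a real endpoint:
    # urllib.request.urlopen(request)
```

The point is less the exact fields than the shape of the interaction: a plain JSON document posted over HTTP, which any language or tool can produce without a Thrift client.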

Where We're At

We're about halfway through the changeover.  Here's a breakdown of the things we've done and what we're working on right now:

Progress So Far


Our new production hardware is set up and ingesting production data as we speak
We've migrated all the old rollups to the new production hardware
We have all the wiring for the public HTTP API set up

Still In Progress


Point all queries to the new production cluster
Work out a new method of metric indexing
Deprecate the old production hardware

So, some big, big milestones accomplished; some big milestones yet to reach.

What Happens Next

Finishing all the work in progress would be a huge relief for the team and allow us to work on a world of problems and questions we have been eager to address.  Things like:

Using Blueflood as a backend to Graphite and Grafana
Annotation support in Blueflood
Integrations with other teams in Rackspace
A better data persistence layer with Kafka
Using Cloud Metrics for more than monitoring data
Courting the open-source community with the Blueflood project

We have a lot that we want to accomplish, and the work we're doing right now will set us up to achieve all of it.  I'll post another update in a couple of months to let you know how far along we are.  In addition, the team is planning to write a few articles that go into more technical depth on how we built all of this.