An important part of Rackspace's Monitoring pipeline is the metrics that we gather in the process. We have a small team called Cloud Metrics that is dedicated to these metrics. We are otherwise known as the Blueflood team, since we authored the open-source blueflood.io project, which is the technology at the heart of our Cloud Metrics service. We've been hard at work improving this part of our business, and some changes are underway that I think are worth sharing.
In short, these are the two primary things we're working on:
- We're upgrading our hardware to have even more capacity
- We're making this product an HTTP-based service and making it public
The upgrade to newer machines has a couple of implications, but the obvious one is our desire for increased capacity and performance. We ingest close to 2M metrics/minute right now, but we want more. To scale for various Rackspace-wide projects, we are expecting to increase our ingestion rate to 40-50M metrics/minute over the next year, so we are preparing ourselves for the onslaught.
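To put those numbers in perspective, here is a rough back-of-the-envelope sketch. It uses only the rates quoted above; the function and constant names are made up for illustration, and it says nothing about how the cluster is actually sized.

```python
# Back-of-the-envelope scaling figures, using only the ingestion
# rates quoted in this post. Names are illustrative, not from Blueflood.
CURRENT_PER_MIN = 2_000_000       # "close to 2M metrics/minute"
TARGET_LOW_PER_MIN = 40_000_000   # low end of the expected range
TARGET_HIGH_PER_MIN = 50_000_000  # high end of the expected range


def per_second(per_minute: int) -> float:
    """Convert a metrics/minute rate to metrics/second."""
    return per_minute / 60


def growth_factor(target_per_min: int, current_per_min: int) -> float:
    """How many times larger the target rate is than the current rate."""
    return target_per_min / current_per_min


if __name__ == "__main__":
    print(f"current:     ~{per_second(CURRENT_PER_MIN):,.0f} metrics/second")
    print(f"target low:  ~{per_second(TARGET_LOW_PER_MIN):,.0f} metrics/second "
          f"({growth_factor(TARGET_LOW_PER_MIN, CURRENT_PER_MIN):.0f}x)")
    print(f"target high: ~{per_second(TARGET_HIGH_PER_MIN):,.0f} metrics/second "
          f"({growth_factor(TARGET_HIGH_PER_MIN, CURRENT_PER_MIN):.0f}x)")
```

In other words, the plan is for roughly a 20x to 25x increase in ingestion over the next year.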
The more visible change will be that this service is publicly accessible. Cloud Metrics used to be merely a stepchild of the Cloud Monitoring product. As such, it had a Thrift API that the Cloud Monitoring team had developed for its own internal purposes.
We have removed the Thrift API and made Cloud Metrics available as an HTTP API. We have set up all the necessary wiring to make this just another standard-issue Rackspace service. While the only 'customer' right now is Cloud Monitoring, these changes pave the way for any customer, big or small, to send metrics our way and retrieve them through a standard-issue HTTP API.
We're about halfway through the changeover. Here's a breakdown of the things we've done and what we're working on right now:
Done:
- Our new production hardware is set up and ingesting production data as we speak
- We've migrated all the old rollups to the new production hardware
- We have all the wiring for the public HTTP API set up
In progress:
- Point all queries to the new production cluster
- Work out a new method of metric indexing
- Deprecate the old production hardware
So, some big, big milestones accomplished; some big milestones yet to reach.
Finishing all the work in progress would be a huge relief for the team and allow us to work on a world of problems and questions we have been eager to address. Things like:
- Using Blueflood as a backend to Graphite and Grafana
- Annotation support in Blueflood
- Integrations with other teams in Rackspace
- A better data persistence layer with Kafka
- Using Cloud Metrics for more than monitoring data
- Courting the open-source community with the Blueflood project
We have a lot that we want to accomplish, and the work that we're doing right now will set us up to achieve all of it. I'll post another update in a couple of months to let you know how far along we are. In addition, the team is planning to write a few articles that go into more technical depth on how we have done what we've done.