Steven Acreman (@sacreman), ThunderOps, United Kingdom
{
  "dashboard": {
    "title": "Hosts",
    "description": "Basic host stats: CPU, Memory Usage, Disk Utilisation, Filesystem usage and Predicted time to filesystems filling",
    "id": null,
    "rows": [{
      "collapse": false,
      "editable": true,
      "height": "250px",
      "panels": [{

My Top 10 Open Source Time Series Databases blog has been incredibly popular, with over 10,000 views and growing. It sat on the front page of Reddit's /r/programming for a day or two, and we got a bunch of traffic from Hacker News and DBWeekly. The raw data in the spreadsheet that accompanies the blog is constantly updated by a team of volunteers, which now includes some of the database authors.

It has quickly become the single point of reference for anyone looking for a new time series database.

[insert tweet referencing the blog back to us]

Someone on Twitter called my blog biased, which I thought was funny. It's true that I'm biased towards mostly solving my own problems, and like anyone I can only draw on my own experience. However, I'm fairly impartial when it comes to these topics in general.

sacreman / graphite.py (created July 8, 2015): Statsite Graphite + Dataloop Sink
"""
Supports flushing metrics to graphite
"""
import sys
import socket
import logging
class GraphiteStore(object):
def __init__(self, host="localhost", port=2003, prefix="statsite.", attempts=3):
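
For context, a sink like this speaks Graphite's plaintext protocol: one "metric value timestamp" line per metric over TCP to port 2003. A minimal sketch of a single flush, with a made-up metric name and value:

import socket
import time

# Assumption: a Graphite/carbon listener on localhost:2003 (the default).
sock = socket.create_connection(("localhost", 2003))

# Graphite's plaintext protocol: "<metric path> <value> <unix timestamp>\n"
line = "statsite.example.requests 42 %d\n" % int(time.time())
sock.sendall(line.encode("ascii"))
sock.close()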

Monitoring Kubernetes with Prometheus + Long Term Storage

Running an online service isn't easy. Every day you make complex decisions about how to solve problems, and often there is no right or wrong answer, just different approaches with different results. On the infrastructure side you have to weigh up where everything will be hosted: on a cloud service like AWS, in your own data centres, or any number of other options, perhaps even a mix.

Monitoring choices are equally hard. There are the tools that are familiar and a known quantity, some new ones that look interesting from reading blogs, and then the option to buy one of any number of SaaS products.

Let's imagine, for the sake of brevity, that you are looking to move into AWS from your traditional data centre and want to upgrade from your Nagios, Graphite and StatsD stack to something a bit newer. This is an incredibly common scenario that we see every day.

The first decision to make up front is whether to build or buy. To make that decision properly you'll need to …


Introduction

Welcome to the Dataloop API documentation!

To use the API you'll need an API key, which can be created in Dataloop under your user account settings. When integrating services you may want to consider creating an application-specific user in Dataloop with access to accounts at the correct role level.

You will also need to know the organisation name and account name that you want to work with. These match the organisation and account names in Dataloop; use them wherever you see <org name> and <account name> in the examples.
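
As a minimal sketch of an authenticated call (the endpoint path and the bearer-token header are assumptions for illustration; check the API reference for the exact form):

#!/usr/bin/env python
import requests

# Assumptions: the /agents path and Bearer auth header are illustrative only;
# <org name>, <account name> and the API key come from your Dataloop settings.
API_URL = "https://app.dataloop.io/api/v1"
ORG = "<org name>"
ACCOUNT = "<account name>"
API_KEY = "<api key>"

resp = requests.get(
    "%s/orgs/%s/accounts/%s/agents" % (API_URL, ORG, ACCOUNT),
    headers={"Authorization": "Bearer %s" % API_KEY},
)
resp.raise_for_status()
print(resp.json())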

#!/usr/bin/env python
'''
Returns a sum of the number of agents that have returned the base.count metric
in the last minute. You will need to update the TAG, org, account and key
variables in the settings below.
'''
import sys
from dlcli import api

#!/usr/bin/env python
import sys
from dlcli import api

# need to set the org, account and key
settings = {
    'url': 'https://app.dataloop.io/api/v1',
    'org': '',
    'account': '',
    'key': ''
}
Problem

tl;dr: We don't want to be a Datadog-level solution forever that keeps the price low(ish) by only supporting lightweight events. We also don't want to provide standard SaaS log pricing with one of those slider bars that increases the $$$ as the GB amount goes up, until you give up and set up open source instead.

We want a middle ground that's cheap enough that you don't constantly worry about log volume, fast enough to be pleasant to use in the app, and that covers most DevOps troubleshooting use cases: people get alerted by time series data, then drill down into the context of what happened with log data.

Long Version

Now is the opportune moment to look at event data, while the time series analytics front-end is being finished. Event data is similar to time series data, but differs just enough that it requires a different data store and a bunch of additional services.