Problem

tl;dr: We don't want to be a Datadog-level solution forever that keeps the price low(ish) by only supporting lightweight events. We also don't want to offer standard SaaS log pricing with one of those slider bars that increases the $$$ as the GB count goes up until you give up and set up open source.

We want a middle ground: cheap enough that you don't constantly worry about log volume, fast enough to be pleasant to use in the app, and covering most DevOps troubleshooting use cases. People should be able to get alerted on time series data and then drill down into the context of what happened with log data.

Long Version

Now is the opportune moment to look at event data while the time series analytics front-end is being finished. Event data is similar to time series data but differs just enough that it requires a different data store and a bunch of additional services.

My Top 10 Open Source Time Series Databases blog post has been incredibly popular, with over 10,000 views and growing. It sat on the front page of Reddit's /r/programming for a day or two, and we got a bunch of traffic from Hacker News and DBWeekly. The raw data in the spreadsheet that accompanies the post is constantly updated by a team of volunteers, which now includes some of the database authors.

It has quickly become the single point of reference for anyone looking for a new time series database.

[insert tweet referencing the blog back to us]

Someone on Twitter called my blog biased, which I thought was funny. It's true that I'm biased towards mostly solving my own problems, and like anyone I can only draw upon my own experiences. However, I'm fairly impartial when it comes to these topics in general.

Write Performance Benchmark

This document lets anyone verify the benchmark result of writing 2-3 million metrics per second into DalmatinerDB. It is a single-node benchmark, to keep things simple and easily comparable with time series databases that don't support clustering.

We will set up two Haggar servers to generate metrics and fire them at a single-node DalmatinerDB server, as per the diagram below.

[Diagram: DalmatinerDB benchmark setup]
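Haggar generates plain Graphite line-protocol traffic, so if you want to understand what the load looks like, a small script can stand in for it. Below is a minimal sketch, assuming the DalmatinerDB node is fronted by a Graphite-compatible listener; the hostname, port 2003 and metric names are assumptions to adjust for your setup.

#!/usr/bin/env python
# Haggar-style load generator sketch: sends Graphite line-protocol metrics
# over TCP. Host and port are assumptions; point them at your ingest endpoint.
import socket
import time
import random

HOST = 'dalmatinerdb.example.com'  # assumed hostname
PORT = 2003                        # conventional Graphite plaintext port
METRICS = ['haggar.agent.%d.cpu' % i for i in range(100)]

def send_batch(sock):
    # One Graphite line per metric: "path value timestamp\n"
    now = int(time.time())
    lines = ['%s %f %d\n' % (m, random.random() * 100, now) for m in METRICS]
    sock.sendall(''.join(lines).encode())

if __name__ == '__main__':
    s = socket.create_connection((HOST, PORT))
    try:
        while True:
            send_batch(s)
            time.sleep(1)  # ~100 metrics/sec per process; the benchmark rates need many more series and senders
    finally:
        s.close()

Haggar itself ramps up many simulated agents concurrently, which is why two dedicated generator servers are needed to push into the millions of metrics per second.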

You can expect near-linear results as a DalmatinerDB cluster is scaled horizontally.

Time Series DB Smack Down

Databases are a crazy topic and it seems everyone has an opinion. The trouble is that opinions are like belly buttons: just because everyone has one doesn't mean they're useful for anything.

Time series databases (TSDBs) in particular always provoke the usual "have you tried X?", where X is some obscure project with 50 commits back in 2009. Invariably, if X is something a bit more mainstream, then yes, it probably has been played with. Like all software, it's probably good at certain things and bad at others.

With all of the above in mind, I decided to pen a magnum opus of my own opinions: something I can point the HaveYouTriedX'ers at next time they make an appearance. So here it is...

My Top 10 TSDBs:

  1. DalmatinerDB (no surprise here)
  2. InfluxDB

DalmatinerDB Installation Guide (Linux)

These instructions outline how to install DalmatinerDB on a single Linux x86_64 physical server or virtual machine. Scaling out will be covered in a future document. This guide also covers configuring cAdvisor and Telegraf to send in monitoring data, and Grafana to build dashboards.

Here's how everything connects together:

[Diagram: DalmatinerDB architecture]

Create a VM

#!/usr/bin/env python
import sys
from dlcli import api

# Need to set the org, account and key before making any API calls.
settings = {
    'url': 'https://app.dataloop.io/api/v1',
    'org': '',
    'account': '',
    'key': ''
}

#!/usr/bin/env python
from gevent import monkey
monkey.patch_all()  # patch the standard library for cooperative concurrency
from gevent.pool import Pool
import requests
import uuid

settings = {
    'url': 'https://app.dataloop.io/api/v1',
    'org': 'yolo',
}

#!/usr/bin/env python
from gevent import monkey, socket
monkey.patch_all()  # patch the standard library for cooperative concurrency
from gevent.pool import Pool
import requests
import uuid

settings = {
    'url': 'https://app.dataloop.io/api/v1',
    'org': '',
}
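
The snippets above are truncated previews, but they share the same shape: gevent's monkey patching plus a Pool to fan out concurrent HTTP calls against the Dataloop API. Here is a minimal self-contained sketch of that pattern; the endpoint path, payload and auth header are hypothetical stand-ins, not the real Dataloop API.

#!/usr/bin/env python
# Sketch of the gevent fan-out pattern used in the snippets above.
# The endpoint, payload and auth header are hypothetical examples.
from gevent import monkey
monkey.patch_all()  # must run before requests/sockets are imported elsewhere
from gevent.pool import Pool
import requests
import uuid

settings = {
    'url': 'https://app.dataloop.io/api/v1',
    'org': '',      # fill in your org
    'account': '',  # fill in your account
    'key': ''       # fill in your API key
}

def create_agent(_):
    # Hypothetical call: register an agent with a random name.
    name = uuid.uuid4().hex
    resp = requests.post(
        '%s/orgs/%s/agents' % (settings['url'], settings['org']),
        headers={'Authorization': 'Bearer %s' % settings['key']},
        json={'name': name, 'account': settings['account']},
    )
    return resp.status_code

if __name__ == '__main__':
    pool = Pool(20)  # cap concurrency at 20 in-flight requests
    for status in pool.imap_unordered(create_agent, range(100)):
        print(status)

The monkey.patch_all() call is what makes the otherwise-blocking requests library yield to the gevent event loop, so the Pool can keep many HTTP calls in flight from a single process.
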
Steven-MacPro:dalmatiner-docker sacreman$ docker logs b5cd325303fd
*** Running /etc/my_init.d/00_regen_ssh_host_keys.sh...
*** Running /etc/rc.local...
*** Booting runit daemon...
*** Runit started as PID 7
2016/03/29 12:06:33 [I] Starting Grafana
2016/03/29 12:06:33 [I] Version: 2.6.0, Commit: v2.6.0, Build date: 2015-12-14 14:18:01 +0000 UTC
2016/03/29 12:06:33 [I] Configuration Info
Config files:
[0]: /usr/share/grafana/conf/defaults.ini
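
The log output above shows Grafana 2.6.0 booting inside the container. Once it's up, a quick sanity check can confirm that metrics are being accepted and Grafana is reachable before you start building dashboards. Below is a minimal sketch; the Graphite-compatible ingest port (2003) and Grafana's default port (3000) are assumptions to adjust for your install.

#!/usr/bin/env python
# Post-install sanity check: push one test metric and confirm Grafana is up.
# Ports are assumptions; change them to match your configuration.
import socket
import time
import requests

# Send a single Graphite line-protocol metric to the assumed ingest port.
sock = socket.create_connection(('localhost', 2003))
sock.sendall(('test.install.check 1.0 %d\n' % int(time.time())).encode())
sock.close()

# Grafana listens on port 3000 by default; a 200 on the login page means it's up.
resp = requests.get('http://localhost:3000/login')
print('Grafana reachable:', resp.status_code == 200)
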
Dataloop Agent plugin_requirements.txt
##
# Plugin dependencies. These are required to run the out of the box plugins
##
backports.ssl-match-hostname==3.4.0.2
BeautifulSoup==3.2.1
boto==2.23.0
cffi==0.8.6
cryptography==0.6.1
docker-py==1.4.0
futures==2.2.0