sacreman/benchmark_blog.md

## benchmark_blog.md

      
My Top 10 Open Source Time Series Databases blog has been incredibly popular, with over 10,000 views and growing. It sat on the front page of Reddit /r/programming for a day or two, and we got a bunch of traffic from Hacker News and DBWeekly. The raw data in the spreadsheet that accompanies the blog has been constantly updated by a team of volunteers, which now includes some of the database authors. It has quickly become the single point of reference for anyone looking for a new time series database.
[Insert tweet referencing the blog back to us]
Someone called my blog biased on Twitter, which I thought was funny. It's true that I am biased towards solving my own problems, and like anyone I can only draw upon my own experiences. However, I'm fairly impartial when it comes to these topics in general. I help promote DalmatinerDB and we use it at Dataloop, but if something better came along I'd give it a better review. It certainly doesn't score highest in every row, although it does whoop ass in terms of reproducible benchmarks, so that probably annoys people.
[Insert tweet picture from French dude]
I make zero money from promoting any of these databases, and it actually benefits Dataloop more if my blogs are technically credible and brutally honest. So if your database is slow, boring and bad to drive, you're going to get the kind of review Jeremy Clarkson would have given it on the old Top Gear show. It's quite refreshing to be able to write on a subject where I don't need to be diplomatic at all.
The part that caused the most uproar was the benchmark results. Despite the spreadsheet being at least 95% unarguable fact, we did seem to attract the religious elements of various databases, who felt compelled to defend the performance of their chosen solution. Admittedly, the spreadsheet did start off a bit loosely worded and has hardened up over time. We're now at a stage where those scores are pretty defensible, and they are colour coded with reference links.
With that in mind, here's the list again, ordered by performance.
Top Write Performance - Single Node

1. DalmatinerDB (3 million metrics / sec)
2. Akumuli (2 million metrics / sec)
3. Prometheus (800k metrics / sec)
4. InfluxDB (470k metrics / sec)
5. Graphite - custom setup (220k metrics / sec)
6. KairosDB, Blueflood, Graphite, Hawkular, Heroic, MetricTank (60k metrics / sec)
7. Riak TS, OpenTSDB (32k metrics / sec)
8. ElasticSearch (30k metrics / sec)
9. Druid (25k metrics / sec)

Pro Tip: If you disagree with any of these numbers, open the spreadsheet, find the write performance row, check the colour to see how accurate it is, and click the link to find out why it got that number. If you still disagree, provide a link to some data that can be verified.
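
To put the write figures in proportion, here is a minimal Python sketch that expresses each entry as a multiple of the slowest one. The numbers are copied straight from the list above. The dictionary and function names are my own for illustration and are not part of the spreadsheet or any database tooling.

```python
# Single-node write throughput in metrics per second, copied from the list above.
# Names (WRITE_THROUGHPUT, relative_to_slowest) are illustrative assumptions.
WRITE_THROUGHPUT = {
    "DalmatinerDB": 3_000_000,
    "Akumuli": 2_000_000,
    "Prometheus": 800_000,
    "InfluxDB": 470_000,
    "Graphite (custom setup)": 220_000,
    "KairosDB, Blueflood, Graphite, Hawkular, Heroic, MetricTank": 60_000,
    "Riak TS, OpenTSDB": 32_000,
    "ElasticSearch": 30_000,
    "Druid": 25_000,
}


def relative_to_slowest(throughput):
    """Return each entry's throughput as a multiple of the slowest entry."""
    slowest = min(throughput.values())
    return {name: rate / slowest for name, rate in throughput.items()}


if __name__ == "__main__":
    # Print one line per entry, in the same order as the list above.
    for name, ratio in relative_to_slowest(WRITE_THROUGHPUT).items():
        print(f"{name}: {ratio:.1f}x")
```

On these figures the top entry works out to 120 times the bottom one. A ratio like this says nothing about how trustworthy each underlying number is, which is what the colour coding in the spreadsheet is for.
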
Top Query Performance

1. DalmatinerDB, InfluxDB (Fast)
2. ElasticSearch (Moderate)
3. KairosDB, Blueflood, Hawkular (Slow)

We took InfluxDB's query benchmark work and extended it to cover DalmatinerDB. They had already benchmarked InfluxDB, Cassandra and ElasticSearch, so it gave us a head start. Any Cassandra-based database without an external index got an inferred Slow score.
What about the ones not listed? I think we can assume that if nobody is willing to provide a benchmark, then they probably aren't fast. There isn't much incentive to publish slow or mediocre scores. A discussion is currently under way about starting to run benchmarks for some of these databases, a bit like Aphyr did with Jepsen for testing data safety claims. That said, we're a busy bunch of people and would prefer it if database authors or users submitted some.