sacreman/ddb_benchmark.md

Write Performance Benchmark

This document allows anyone to verify the benchmark result of writing 2 - 3 million metrics per second into DalmatinerDB. It is a single-node benchmark, which keeps things simple and makes the result easy to compare with time series databases that don't support clustering.
We will set up 2 Haggar servers to generate metrics and fire them at a single-node DalmatinerDB server as per this diagram.

You can expect performance to scale near-linearly as a DalmatinerDB cluster is scaled horizontally.
Query performance and storage compression will be handled in separate benchmarks.
Benchmark Hardware

We picked a moderately sized server for testing that is relatively cheap to spin up for a few hours on GCE or AWS. At the time of writing, an n1-standard-16 with a local SSD disk is $0.673 per hour.
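
As a rough sanity check on cost, the hourly price above can be multiplied out for the length of a run. The sketch below assumes only the quoted $0.673 per hour for the DalmatinerDB instance. The function name `instance_cost` is illustrative, and the two Haggar servers are left out because their price is not stated here.

```shell
# instance_cost HOURS [RATE_USD_PER_HOUR]
# Prints the cost of one instance for HOURS hours, to two decimal places.
# The default rate is the n1-standard-16 price quoted above (an assumption
# that will go stale as cloud pricing changes).
instance_cost() {
  awk -v hours="$1" -v rate="${2:-0.673}" 'BEGIN { printf "%.2f\n", hours * rate }'
}

instance_cost 12   # a 12 hour run of the DalmatinerDB server
```

For a 12 hour run this comes to about $8.08 for the database server alone.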

1 x DalmatinerDB server

GCE n1-standard-16 (16 cpu, 60GB memory, 1 x 375GB local SSD disk)


2 x Haggar load generating servers

GCE n1-highcpu-8 (8 cpu, 8GB memory, 100GB disk)


The equivalent-size DalmatinerDB hardware choice on AWS would be hi1.4xlarge, which has 16 cpu, 60GB memory and 2 x 1TB local SSD disks.
The Haggar servers use less than 2GB memory and around 20% cpu with negligible disk usage. Each Haggar server will generate approximately 20Mb/s of network traffic to the DalmatinerDB server.
We benchmarked the locally attached SSD disk on the GCE server with fio.
fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test \
--filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

The GCE server local storage benchmarked at approximately 20,000 IOPS write and 70,000 IOPS read.
Single Node Optimisation

Like all benchmarks, we've done a bit of tweaking for the size of the hardware. These are two settings that you would change in a real-world single-node scenario. The default settings are more applicable for scaling out to 5+ nodes in a cluster over time, which is what we believe most people will want to do.
If you don't change these defaults you will still see performance of 2.5 million - 3 million metrics per second. However, we expect everyone else to optimise, so to set a level playing field we have too.
Ring Size

The default ring size is 64, which is great for a single node that will be scaled out to more nodes in future. We changed the ring size to 16 for this benchmark as that is more appropriate for a single-node server. Changing the ring size means wiping the data.
To change the ring size edit /etc/ddb/dalmatinerdb.conf:
ring_size=16

Cache Points

The default of 120 points in cache is good for a 5 node cluster (or beyond) but isn't optimised for a single-node server, so we bumped up this setting. You can tweak this setting between restarts to fit the size of your RAM.
In /etc/ddb/dalmatinerdb.conf:
cache_points = 600
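
If you prefer to script the two changes rather than edit the file by hand, something along these lines may work. This is a minimal sketch, not part of DalmatinerDB: `set_conf` is a hypothetical helper, and the demonstration runs against a scratch file standing in for /etc/ddb/dalmatinerdb.conf, whose real contents are not shown in this document.

```shell
# set_conf KEY VALUE FILE
# Replace an existing "KEY = ..." line in FILE, or append one if KEY is absent.
# Writes through a temporary file so FILE keeps its owner and permissions.
set_conf() {
  key=$1 value=$2 file=$3
  tmp=$(mktemp)
  if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
    sed -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$tmp"
  else
    { cat "$file"; printf '%s = %s\n' "$key" "$value"; } > "$tmp"
  fi
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demonstration on a scratch file (assumed contents, not the real config).
conf=$(mktemp)
printf 'ring_size = 64\n' > "$conf"
set_conf ring_size 16 "$conf"
set_conf cache_points 600 "$conf"
cat "$conf"
rm -f "$conf"
```

Pointing the helper at the real config file would need root, followed by a restart of DalmatinerDB. As noted above, changing the ring size also means wiping the data.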

DalmatinerDB Setup

Set up the DalmatinerDB (DDB) server as per this setup doc:
https://gist.github.com/sacreman/9015bf466b4fa2a654486cd79b777e64
You will need to modify the disk configuration slightly for a single locally attached SSD. The setup document assumes no additional SSD disk so that it can be played with in a VM easily.
mkdir /data
zpool create -f -o ashift=12 data /dev/sdb
zfs create -o compression=lz4 -o atime=off -o logbias=throughput data/ddb
chown dalmatiner. /data/ddb

Benchmark Software

We are using a modified version of Haggar that includes the DalmatinerDB binary protocol output. To set this up:
go get github.com/dalmatinerdb/haggar
haggar01

nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar1" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar2" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar3" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar4" &

haggar02

nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar13" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar14" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar15" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar16" &
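
The eight command lines differ only in their prefix, so they can be generated with a small loop. The sketch below is a dry run that only prints the commands. The helper name `haggar_cmds` is made up for illustration, while the flags and the ddb01:5555 target are copied from the commands above.

```shell
# haggar_cmds FIRST LAST
# Print one haggar command line per prefix haggar<FIRST> .. haggar<LAST>.
# Nothing is executed; the output is meant to be reviewed first.
haggar_cmds() {
  first=$1 last=$2
  i=$first
  while [ "$i" -le "$last" ]; do
    printf 'nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar%s" &\n' "$i"
    i=$((i + 1))
  done
}

haggar_cmds 1 4     # commands for the first load server
haggar_cmds 13 16   # commands for the second load server
```

Once the printed commands look right, they can be pasted into a shell on the matching load server.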

This is 8 processes in total, run from 2 servers over the network, each simulating 50 agents that send 6000 metrics apiece at 1 second resolution with no batching.
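
The nominal load implied by these flags can be worked out directly. The sketch below assumes every agent sends all of its metrics once per flush interval, which is a simplification that ignores jitter and ramp-up. `nominal_rate` is an illustrative name.

```shell
# nominal_rate PROCESSES AGENTS METRICS FLUSH_SECONDS
# Metrics per second if each agent sends all of its metrics
# once per flush interval (simplifying assumption).
nominal_rate() {
  echo $(( $1 * $2 * $3 / $4 ))
}

nominal_rate 8 50 6000 1   # 2 servers x 4 processes, flags as above
```

This prints 2400000, that is, 2.4 million metrics per second, which sits inside the 2 - 3 million range quoted for this benchmark.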
Results

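You should expect to see a consistent 2 - 3 million metrics per second. You can view your results in the Dalmatiner front end at the following address, which runs the query SELECT 'dalmatinerdb@127.0.0.1'.'mps' BUCKET 'dalmatinerdb' LAST 60s in URL-encoded form: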
http://server_ip:8080/?query=SELECT%20%27dalmatinerdb%40127.0.0.1%27.%27mps%27%20BUCKET%20%27dalmatinerdb%27%20LAST%2060s

The Haggar load testing tool takes about 10 minutes to build up to full speed. This benchmark has been left running for a few days and performance has stayed level.
We ran the benchmark over 12 hours and calculated the 6-hour average, min and max throughput. The differences are caused by the 1 second jitter in the Haggar benchmark tool, which is designed to emulate the slight fluctuation of a real-world scenario more closely.
[Graphs: Max, Avg and Min throughput]

During peak load DalmatinerDB uses approximately 50% CPU on all 16 cores and approximately 50GB memory, and the disks spike to 30M/s read and 50M/s write.
DalmatinerDB is bottlenecked by memory on this benchmark. On tests performed with 100GB+ memory, DalmatinerDB starts to bottleneck on CPU and disk at approximately 4 million metrics per second. If you have the money and the time, feel free to run this benchmark on a mega box and let us know what numbers you get.
Storage

Although the purpose of this benchmark was not to test storage efficiency, we did end up with a 12-hour data set. DalmatinerDB advertises 1 byte per data point after compression. In this particular test the storage is 3.5 bits (not bytes!) per data point.
root@ddb-bench:~# zfs get all data/ddb | grep compressratio
data/ddb  compressratio         18.55x                 -
data/ddb  refcompressratio      18.55x
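
The two figures above can be combined as a rough cross-check. Assuming the 3.5 bits is the on-disk (compressed) size per data point and that `compressratio` is logical size divided by physical size, the sketch below estimates the size of a data point before compression. `raw_bits_per_point` is an illustrative name.

```shell
# raw_bits_per_point COMPRESSED_BITS COMPRESS_RATIO
# Estimated size of one data point before compression, in bits.
raw_bits_per_point() {
  awk -v bits="$1" -v ratio="$2" 'BEGIN { printf "%.1f\n", bits * ratio }'
}

raw_bits_per_point 3.5 18.55
```

This prints 64.9, which would put the uncompressed size at roughly 8 bytes per data point under those assumptions.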