Write Performance Benchmark
This document will allow anyone to verify the benchmark result of writing 2 - 3 million metrics per second into DalmatinerDB. This is a single node benchmark to keep things simple and easily comparable between time series databases that don't support clustering.
We will set up 2 Haggar servers to generate metrics and fire them at a single node DalmatinerDB server as per this diagram.
You can expect near linear performance results as a DalmatinerDB cluster is horizontally scaled.
Query performance and storage compression will be handled in separate benchmarks.
We picked a moderate size server for testing that is relatively cheap to spin up for a few hours on GCE or AWS. At the time of writing an n1-standard-16 with a local SSD disk is $0.673 per hour.
- 1 x DalmatinerDB server
  - GCE n1-standard-16 (16 cpu, 60GB memory, 1 x 375GB local SSD disk)
- 2 x Haggar load generating servers
  - GCE n1-highcpu-8 (8 cpu, 8GB memory, 100GB disk)
The equivalent size DalmatinerDB hardware choice on AWS would be hi1.4xlarge which is 16 cpu, 60GB memory and 2 x 1TB local SSD disks.
The Haggar servers use less than 2GB memory and around 20% cpu with negligible disk usage. Each Haggar server will generate approximately 20Mb/s of network traffic to the DalmatinerDB server.
We benchmarked the locally attached SSD disk on the GCE server with fio.
fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test \
    --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
The GCE server local storage benchmarked at approximately 20,000 IOPS write and 70,000 IOPS read.
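To put the IOPS figures in context, they can be converted to approximate throughput at the 4k block size used in the fio run. The following is a minimal sketch; the helper name `iops_to_mibs` is hypothetical, and the inputs are the rounded figures quoted above rather than raw fio output.

```shell
#!/bin/sh
# Hypothetical helper: convert an IOPS figure to approximate MiB/s.
# Usage: iops_to_mibs IOPS BLOCK_SIZE_KIB
iops_to_mibs() {
    awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.1f\n", iops * bs / 1024 }'
}

# Rounded figures from the fio run, which used --bs=4k.
echo "read:  $(iops_to_mibs 70000 4) MiB/s"
echo "write: $(iops_to_mibs 20000 4) MiB/s"
```

These are only rough estimates for small random I/O; sequential throughput on the same disk will differ.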
Single Node Optimisation
As with all benchmarks, we've done a bit of tweaking for the size of the hardware. These are two settings that you would change in a real world single node scenario. The default settings are more applicable for scaling out to 5+ nodes in a cluster over time, which is what we believe most people will want to do.
If you don't change these defaults you will still see performance of 2.5 million - 3 million metrics per second. However, we expect everyone else to optimise, so to keep a level playing field we have done so as well.
The default ring size is 64, which is great for a single node that will be scaled out to more nodes in future. We changed the ring size to 16 for this benchmark as that is more appropriate for a single node server. Changing the ring size means wiping the data.
To change the ring size edit /etc/ddb/dalmatinerdb.conf:
The default of 120 points in cache is good for a 5 node cluster (or beyond) but isn't optimised for a single node server, so we increased this setting. You can tweak this setting between restarts to fit the size of your RAM.
cache_points = 600
Set up the DalmatinerDB server as per this setup doc:
You will need to modify the disk configuration slightly for a single locally attached SSD. The setup document assumes no additional SSD disk so that it can be played with in a VM easily.
mkdir /data
zpool create -f -o ashift=12 data /dev/sdb
zfs create data/ddb -o compression=lz4 -o atime=off -o logbias=throughput
chown dalmatiner. /data/ddb
We are using a modified version of Haggar that includes the DalmatinerDB binary protocol output. To set this up:
go get github.com/dalmatinerdb/haggar
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar1" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar2" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar3" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar4" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar13" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar14" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar15" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar16" &
This is 8 processes in total, run from 2 servers over the network. Each process simulates 50 agents, and each agent sends 6000 metrics at 1 second resolution with no batching.
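The nominal load these settings produce can be worked out directly. This is a rough sketch that ignores the effect of jitter; the function name `expected_rate` is hypothetical, and it assumes every agent sends all of its metrics once per flush interval.

```shell
#!/bin/sh
# Hypothetical helper: nominal metrics per second for a Haggar setup.
# Usage: expected_rate PROCESSES AGENTS_PER_PROCESS METRICS_PER_AGENT FLUSH_INTERVAL_SECONDS
expected_rate() {
    echo $(( $1 * $2 * $3 / $4 ))
}

# 8 processes x 50 agents x 6000 metrics, flushed every 1 second.
expected_rate 8 50 6000 1   # prints 2400000
```

That is 2.4 million metrics per second, which sits inside the 2 - 3 million range this benchmark reports.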
You should expect to see a consistent 2 - 3 million metrics per second. You can view your results in the Dalmatiner front end at the following address:
The Haggar load testing tool takes about 10 minutes to build up to full speed. This benchmark has been left running for a few days and performance has stayed level.
We ran the benchmark over 12 hours and calculated the 6 hour average, min and max throughput. The differences are caused by the 1 second jitter in the Haggar benchmark tool which is designed to more closely emulate a real world scenario of a slight fluctuation.
During peak load DalmatinerDB uses approximately 50% cpu on all 16 cores, approximately 50GB memory and disks spike to 30M/s read and 50M/s write.
DalmatinerDB is bottlenecked by memory on this benchmark. On tests performed with 100GB+ memory DalmatinerDB starts to bottleneck on CPU and disk at approximately 4 million metrics per second. If you have the money and the time feel free to run this benchmark on a mega box and let us know what numbers you get.
Although the purpose of this benchmark was not to test storage efficiency, we did end up with a 12 hour data set. DalmatinerDB advertises 1 byte per data point after compression. In this particular test the storage is 3.5 bits (not bytes!) per data point.
root@ddb-bench:~# zfs get all data/ddb | grep compressratio
data/ddb  compressratio     18.55x  -
data/ddb  refcompressratio  18.55x
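The bits per data point figure can be reproduced from the compression ratio. The sketch below assumes each uncompressed data point occupies 8 bytes on disk; that size is an assumption made for this example and is not stated in this document. The helper name `bits_per_point` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: stored bits per data point after compression.
# Usage: bits_per_point RAW_BYTES_PER_POINT COMPRESSRATIO
bits_per_point() {
    awk -v raw="$1" -v ratio="$2" 'BEGIN { printf "%.2f\n", raw * 8 / ratio }'
}

# Assumed 8 raw bytes per point, with the 18.55x ratio reported by zfs.
bits_per_point 8 18.55   # prints 3.45
```

Under that assumption the result is about 3.45 bits per data point, in line with the roughly 3.5 bits quoted above.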