
Write Performance Benchmark

This document allows anyone to reproduce the benchmark result of writing 2 to 3 million metrics per second into DalmatinerDB. It is a single node benchmark, to keep things simple and easily comparable between time series databases that don't support clustering.

We will set up 2 Haggar servers to generate metrics and fire them at a single node DalmatinerDB server, as per this diagram.

[Figure: DalmatinerDB benchmark architecture diagram]

You can expect near linear performance results as a DalmatinerDB cluster is horizontally scaled.

Query performance and storage compression will be handled in separate benchmarks.

Benchmark Hardware

We picked a moderate size server for testing that is relatively cheap to spin up for a few hours on GCE or AWS. At the time of writing an n1-standard-16 with a local SSD disk is $0.673 per hour.

  • 1 x DalmatinerDB server
    • GCE n1-standard-16 (16 cpu, 60GB memory, 1 x 375G local SSD disk)
  • 2 x Haggar load generating servers
    • GCE n1-highcpu-8 (8 cpu, 8GB memory, 100GB disk)

The equivalent size DalmatinerDB hardware choice on AWS would be hi1.4xlarge which is 16 cpu, 60GB memory and 2 x 1TB local SSD disks.
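To reproduce the environment, something like the following should work (a minimal sketch, assuming the gcloud CLI is installed and configured; the instance names are our own and zone/image flags are omitted):

# Hypothetical instance names; add --zone and image flags for your project
gcloud compute instances create ddb01 \
    --machine-type=n1-standard-16 \
    --local-ssd=interface=scsi        # one 375GB local SSD, appears as /dev/sdb

gcloud compute instances create haggar01 haggar02 \
    --machine-type=n1-highcpu-8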

The Haggar servers use less than 2GB memory and around 20% cpu with negligible disk usage. Each Haggar server will generate approximately 20Mb/s of network traffic to the DalmatinerDB server.

We benchmarked the locally attached SSD disk on the GCE server with fio.

fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test \
--filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

The GCE server local storage benchmarked at approximately 20,000 IOPS write and 70,000 IOPS read.

Single Node Optimisation

Like all benchmarks, we've done a bit of tweaking for the size of the hardware. There are two settings you would change in a real-world single node scenario. The default settings are more applicable for scaling out to 5+ nodes in a cluster over time, which is what we believe most people will want to do.

If you don't change these defaults you will still see performance of 2.5 to 3 million metrics per second. However, we expect everyone else to optimise, so to set a level playing field we have done the same.

Ring Size

The default ring size is 64, which is great for a single node that will be scaled out to more nodes in future. We changed the ring size to 16 for this benchmark as that is more appropriate for a single node server. Changing the ring size means wiping the data.

To change the ring size edit /etc/ddb/dalmatinerdb.conf:

ring_size=16
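A minimal sketch of applying the change (the service name and data path here are assumptions based on this setup; adjust for your install). Remember that changing the ring size requires wiping the existing data:

# Assumed systemd unit name and data path
sudo systemctl stop dalmatinerdb
sudo rm -rf /data/ddb/*
sudo sed -i 's/^ring_size.*/ring_size=16/' /etc/ddb/dalmatinerdb.conf
sudo systemctl start dalmatinerdb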

Cache Points

The default of 120 points in cache is good for a 5 node cluster (or beyond) but isn't optimised for a single node server, hence we bumped up this setting. You can tweak this setting between restarts to fit the size of your RAM.

In /etc/ddb/dalmatinerdb.conf:

cache_points = 600
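As a rough sanity check on memory use (assuming roughly 8 bytes per in-memory point, which is our assumption rather than a documented figure), the benchmark workload works out to around 11GB of point cache:

# 8 processes x 50 agents x 6000 metrics = 2,400,000 unique series
# 2,400,000 series x 600 points x 8 bytes ~= 11.5GB of point cache
echo $((8 * 50 * 6000 * 600 * 8 / 1024 / 1024 / 1024))   # prints 10 (GiB, truncated)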

DalmatinerDB Setup

Set up the DalmatinerDB server as per this setup doc:

https://gist.github.com/sacreman/9015bf466b4fa2a654486cd79b777e64

You will need to modify the disk configuration slightly for a single locally attached SSD. The setup document assumes no additional SSD disk so that it can be played with in a VM easily.

mkdir /data
zpool create -f -o ashift=12 data /dev/sdb
zfs create data/ddb -o compression=lz4 -o atime=off -o logbias=throughput
chown dalmatiner. /data/ddb
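You can verify the dataset options took effect before starting the benchmark:

zfs get compression,atime,logbias data/ddb
zpool status data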

Benchmark Software

We are using a modified version of Haggar that includes the DalmatinerDB binary protocol output. To set this up:

go get github.com/dalmatinerdb/haggar
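With a classic GOPATH setup, go get drops the haggar binary into $GOPATH/bin (an assumption about your Go environment; a build from a source checkout will land elsewhere):

cd $GOPATH/bin
./haggar -h    # list the available flags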

haggar01

nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar1" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar2" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar3" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar4" &

haggar02

nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar13" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar14" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar15" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar16" &

This is 8 processes, each simulating 50 agents sending 6000 metrics at 1 second resolution with no batching, run from 2 servers over the network.
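That multiplies out to a theoretical peak of 2.4 million metrics per second, which lines up with the observed throughput:

# 8 processes x 50 agents x 6000 metrics, flushed every second
echo $((8 * 50 * 6000))    # = 2400000 metrics per second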

Results

You should expect to see a consistent 2 to 3 million metrics per second. You can view your results in the Dalmatiner front end at the following address:

http://server_ip:8080/?query=SELECT%20%27dalmatinerdb%40127.0.0.1%27.%27mps%27%20BUCKET%20%27dalmatinerdb%27%20LAST%2060s
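For readability, the URL-encoded query decodes to the following DQL:

SELECT 'dalmatinerdb@127.0.0.1'.'mps' BUCKET 'dalmatinerdb' LAST 60s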

[Figure: Dalmatiner metrics-per-second dashboard]

The Haggar load testing tool takes about 10 minutes to build up to full speed. This benchmark has been left running for a few days and performance has stayed level.

We ran the benchmark over 12 hours and calculated the 6 hour average, min and max throughput. The differences are caused by the 1 second jitter in the Haggar benchmark tool, which is designed to more closely emulate the slight fluctuations of a real-world scenario.

Max

[Figure: maximum throughput graph]

Avg

[Figure: average throughput graph]

Min

[Figure: minimum throughput graph]

During peak load DalmatinerDB uses approximately 50% cpu on all 16 cores, approximately 50GB memory, and disk throughput spikes to 30MB/s read and 50MB/s write.

DalmatinerDB is bottlenecked by memory on this benchmark. On tests performed with 100GB+ memory, DalmatinerDB starts to bottleneck on CPU and disk at approximately 4 million metrics per second. If you have the money and the time, feel free to run this benchmark on a mega box and let us know what numbers you get.

Storage

Although the purpose of this benchmark was not to test storage efficiency, we did end up with a 12 hour data set. DalmatinerDB advertises 1 byte per data point after compression. In this particular test the storage is 3.5 bits (not bytes!) per data point.

root@ddb-bench:~# zfs get all data/ddb | grep compressratio
data/ddb  compressratio         18.55x                 -
data/ddb  refcompressratio      18.55x                 -
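The ~3.5 bit figure is consistent with the compression ratio, assuming an 8 byte (64 bit) uncompressed point, which is our assumption about the raw format:

# 64 bits per raw point / 18.55x compression ~= 3.45 bits per stored point
echo "scale=2; 64 / 18.55" | bc    # prints 3.45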