For a long time I've been really impressed by the ease of use Cassandra and CockroachDB bring to operating a data store at scale. While these systems have very different tradeoffs, what they have in common is how easy it is to deploy and operate a cluster. I have experience with them at cluster sizes in the dozens, hundreds, and even thousands of nodes, and compared to some other clustered technologies they get you far pretty fast. They have sane defaults that provide scale and high availability to people who wouldn't always know how to achieve those with more complex systems. People can get pretty far before they have to become experts; when you start needing more extreme usage you will need to become an expert in the system, just like with any other piece of infrastructure. But what I really love about these systems is that they make geo-aware data placement, GDPR concerns, and data replication and movement a breeze most of the time.

Several years ago the great [Andy Gross](ht

kellabyte / hash_benchmarks_test.txt
Last active March 29, 2018 07:02
Go hashing algorithm benchmarks
BenchmarkCRC32/1-8 100000000 15.80 ns/op 63.46 MB/s 0 B/op 0 allocs/op
BenchmarkCRC32/2-8 100000000 15.80 ns/op 126.58 MB/s 0 B/op 0 allocs/op
BenchmarkCRC32/4-8 100000000 15.90 ns/op 250.87 MB/s 0 B/op 0 allocs/op
BenchmarkCRC32/8-8 100000000 16.00 ns/op 498.77 MB/s 0 B/op 0 allocs/op
BenchmarkCRC32/32-8 100000000 18.40 ns/op 1738.69 MB/s 0 B/op 0 allocs/op
BenchmarkCRC32/64-8 100000000 21.00 ns/op 3053.22 MB/s 0 B/op 0 allocs/op
BenchmarkCRC32/128-8 50000000 26.50 ns/op 4823.16 MB/s 0 B/op 0 allocs/op
BenchmarkCRC32/256-8 50000000 38.80 ns/op 6596.60 MB/s 0 B/op 0 allocs/op
BenchmarkCRC32/512-8 30000000 51.00 ns/op 10037.68 MB/s 0 B/op 0 allocs/op
BenchmarkCRC32/1024-8 20000000 84.90 ns/op 12055.07 MB/s

If I run 20 haywire processes with tcmalloc preloaded, haywire reaches 6.3 million requests/second.

killall hello_world; for i in `seq 20`; do LD_PRELOAD="./lib/gperftools/.libs/libtcmalloc.so" ./build/hello_world --balancer reuseport & done

perf top
   7.42%  hello_world              [.] http_request_buffer_pin
   6.94%  hello_world              [.] http_request_buffer_reassign_pin
   6.63%  hello_world              [.] http_parser_execute
   6.23%  libtcmalloc.so.4.3.0     [.] tc_deletearray_nothrow
   4.94%  hello_world              [.] http_request_buffer_locate

With a single process running 20 threads on glibc malloc instead, haywire reaches 3 million requests/second.

./build/hello_world --threads 20 --balancer reuseport
perf top

   9.76%  hello_world              [.] http_parser_execute
   7.85%  libc-2.21.so             [.] malloc
   4.50%  libc-2.21.so             [.] free
   3.43%  libc-2.21.so             [.] __libc_calloc

fsqual run on the Windows Subsystem for Linux

fsqual - file system qualification tool for asynchronous I/O
https://github.com/avikivity/fsqual

./fsqual
context switch per appending io (iodepth 1): 0 (GOOD)
context switch per appending io (iodepth 3): 0 (GOOD)

Max throughput benchmark

Max throughput in master

./bin/wrk/wrk --script ./pipelined_get.lua --latency -d 5m -t 40 -c 760 http://server:8000 -- 32
Running 5m test @ http://server:8000
  40 threads and 760 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.16ms    6.70ms 223.68ms   93.20%
    Req/Sec   114.20k    28.04k  410.56k    72.96%

Why does this work on Rubular.com but not in Ruby code? What am I doing wrong?

Source string

Running 1s test @ http://192.168.0.2:8000\n  8 threads and 256 connections\n  Thread Stats   Avg      Stdev     Max   +/- Stdev\n    Latency     2.08ms    3.91ms  61.86ms   92.89%\n    Req/Sec    23.80k    10.45k   60.63k    74.70%\n  Latency Distribution\n     50%    1.10ms\n     75%    1.59ms\n     90%    4.03ms\n     99%   22.04ms\n  197510 requests in 1.10s, 29.76MB read\nRequests/sec: 179561.40\nTransfer/sec:     27.06MB

Ruby regex

output.match("Requests\/sec: (.*)\\n")

In a double-quoted Ruby string, "\\n" reaches the regex engine as \n, which matches a real newline. If the captured output contains the literal two-character sequence \n (as the source string above does), the match fails in Ruby, even though pasting the text into Rubular gives it real newlines. A likely fix is to match the literal escape, e.g. output.match(/Requests\/sec: (.*?)\\n/).

Setup

Kestrel currently supports only 1 thread. I've included both single-threaded and multi-threaded Haywire benchmarks for comparison.

Kestrel 1 thread HTTP pipelining enabled

Running 10s test @ http://192.168.0.101:5000
  8 threads and 32 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    71.50ms   41.78ms 212.21ms   39.50%
    Req/Sec     1.70k   534.07     3.80k    77.39%

#include <iostream>
#include <sstream>
#include <fstream>
#include <haywire.h>
#include "lmdb.h"
#include "common.h"
#include "storage/store.h"
#include "storage/lmdb_store.h"
#include "hellcat.h"

-  10.26%  hellcat  [kernel.kallsyms]  [k] _raw_spin_lock
   + start_thread
   + uv__thread_start
   - connection_consumer_start
      - 100.00% uv_run
         - 100.00% uv__io_poll
            - 99.97% uv__stream_io
               - 99.79% uv__read
                  - 99.98% http_stream_on_read
                     - 99.98% http_parser_execute