@leifwalsh
Created October 9, 2014 21:43

Benchmarking database engines is a great way to start discussions, market your product, embarrass yourself, and piss people off, often all at once!

Here is a non-exhaustive checklist of things you must report if you want to be taken seriously:

Hardware

  • CPU
  • Disk
  • RAM
  • Virtualization tech if used (try to avoid, it usually sucks)
  • Report all of this for both the server and the benchmark client (a collection sketch follows this list)
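
If you're on Linux, most of this is a command away. Here's a minimal sketch of collecting it, assuming `lscpu`, `free`, and `lsblk` are installed and that you run it on both the server and the client; the output filename is just an example.

```python
# Minimal sketch: collect the hardware facts to publish with your results.
# Assumes a Linux host with lscpu, free, and lsblk installed; the output
# filename is just an example.
import subprocess

def run(cmd):
    """Run a command, returning its stdout as text (empty string on failure)."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return ""

report = {
    "cpu": run(["lscpu"]),
    "ram": run(["free", "-h"]),
    "disk": run(["lsblk", "-o", "NAME,MODEL,SIZE,ROTA"]),
}

with open("hardware-report.txt", "w") as f:
    for section, text in report.items():
        f.write(f"== {section} ==\n{text}\n")
```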

Software

  • OS
  • CPUs/RAM available to engines
  • Engine versions compared
  • Engine configurations
  • Explain any non-default options you chose (see the sketch after this list)
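
A minimal sketch of snapshotting the software side next to the results; the `mysqld --version` command and the `/etc/mysql/my.cnf` path are placeholders, so substitute whatever your engine actually uses.

```python
# Minimal sketch: snapshot the software environment next to the results.
# The version command and config path are placeholders; substitute whatever
# your engine actually ships with.
import os
import platform
import shutil
import subprocess

os.makedirs("results", exist_ok=True)

os_info = platform.platform()
engine_version = subprocess.run(
    ["mysqld", "--version"],                     # placeholder version command
    capture_output=True, text=True
).stdout.strip()

# Publish the whole config file, then call out the non-default options in prose.
shutil.copy("/etc/mysql/my.cnf", "results/engine-config.cnf")  # placeholder path

with open("results/software-report.txt", "w") as f:
    f.write(f"OS: {os_info}\nEngine version: {engine_version}\n")
```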

Data set

  • Schema (or a sample element/document for NoSQL)
  • Indexes
  • Avg element size (extra credit: std dev)
  • Data set size
  • Distribution of random values (uniform ints? Text? Zipfian? See the sketch after this list)
  • Sharding distribution if used
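
The distribution matters as much as the raw size, because a uniform key space and a Zipfian one exercise an engine very differently. A minimal sketch of generating and reporting both, using numpy; the sizes and the Zipf exponent are arbitrary examples.

```python
# Minimal sketch: be explicit about the value distribution you generate.
# Uses numpy; the sizes and the Zipf exponent are arbitrary examples.
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

uniform_keys = rng.integers(0, 10_000_000, size=n)  # every key equally likely
zipf_keys = rng.zipf(1.2, size=n)                    # heavy skew toward a few hot keys

unique, counts = np.unique(zipf_keys, return_counts=True)
print(f"uniform distinct keys: {len(np.unique(uniform_keys))}")
print(f"zipf distinct keys: {len(unique)}, "
      f"hottest key = {counts.max() / n:.1%} of all accesses")
```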

Workload

  • What were you trying to emulate? If you have a real use case in mind, describe it!
  • Inserts
  • Updates (single-record or multi-record?)
  • Point queries
  • Range queries (describe range size & % skipped vs. returned)
  • Membership queries (i.e. existence checks for unique inserts)
  • Relative proportions of ops (see the sketch after this list)
  • Durability enforced (fsync each op? fsync on timer? fuck it mode?)
  • Replication factor enforced, if available
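
A minimal sketch of defining the op mix up front so the relative proportions can be reported exactly; the proportions are arbitrary examples, and a real client would issue the op where the counter is incremented.

```python
# Minimal sketch: make the op mix explicit and reportable. The proportions
# are arbitrary examples.
import random
from collections import Counter

OP_MIX = {"insert": 0.50, "update": 0.10, "point_query": 0.30, "range_query": 0.10}
ops, weights = zip(*OP_MIX.items())

random.seed(0)
executed = Counter()
for _ in range(100_000):
    op = random.choices(ops, weights=weights)[0]
    executed[op] += 1   # a real benchmark client would issue the op here

for op, count in executed.items():
    print(f"{op}: {count / 100_000:.1%}")
```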

Measurement

  • What did you measure? Why? "Fast" isn't a measurement; "throughput" and "latency" are measurements.
  • Percentiles are awesome. Averages suck. (See the sketch after this list.)
  • What system metrics were tracked? Where are the bottlenecks (disk or CPU, at least)? If there's a bottleneck in your benchmark client, imma laugh at you.
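
A minimal sketch of why percentiles beat averages, using synthetic latencies that stand in for the per-op timings your client should record.

```python
# Minimal sketch: report percentiles alongside the mean. The synthetic
# latencies below stand in for per-op timings recorded by your client:
# mostly ~1 ms ops with occasional 50 ms stalls that the average hides.
import random
import statistics

random.seed(0)
latencies_ms = [random.gauss(1.0, 0.1) for _ in range(9_900)] + [50.0] * 100

q = statistics.quantiles(latencies_ms, n=100)   # 99 cut points: q[49] is p50, q[98] is p99
print(f"mean = {statistics.mean(latencies_ms):.2f} ms")
print(f"p50  = {q[49]:.2f} ms")
print(f"p99  = {q[98]:.2f} ms")
```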