@leifwalsh
Created October 9, 2014 21:43

Benchmarking database engines is a great way to start discussions, market your product, embarrass yourself, and piss people off, often all at once!

Here is a non-exhaustive checklist of things you must report if you want to be taken seriously:

Hardware

  • CPU
  • Disk
  • RAM
  • Virtualization tech if used (try to avoid, it usually sucks)
  • Report all of this for both the server and the benchmark client (a collection sketch follows this list)
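
If you're on Linux, most of this is a command away. Here's a minimal sketch of collecting it, assuming `lscpu`, `free`, and `lsblk` are installed and that you run it on both the server and the client; the output filename is just an example.

```python
# Minimal sketch: collect the hardware facts to publish with your results.
# Assumes a Linux host with lscpu, free, and lsblk installed; the output
# filename is just an example.
import subprocess

def run(cmd):
    """Run a command, returning its stdout as text (empty string on failure)."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return ""

report = {
    "cpu": run(["lscpu"]),
    "ram": run(["free", "-h"]),
    "disk": run(["lsblk", "-o", "NAME,MODEL,SIZE,ROTA"]),
}

with open("hardware-report.txt", "w") as f:
    for section, text in report.items():
        f.write(f"== {section} ==\n{text}\n")
```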

Software

  • OS
  • CPUs/RAM available to engines
  • Engine versions compared
  • Engine configurations
  • Explain any non-default options you chose (see the sketch after this list)
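
A minimal sketch of snapshotting the software side next to the results; the `mysqld --version` command and the `/etc/mysql/my.cnf` path are placeholders, so substitute whatever your engine actually uses.

```python
# Minimal sketch: snapshot the software environment next to the results.
# The version command and config path are placeholders; substitute whatever
# your engine actually ships with.
import os
import platform
import shutil
import subprocess

os.makedirs("results", exist_ok=True)

os_info = platform.platform()
engine_version = subprocess.run(
    ["mysqld", "--version"],                     # placeholder version command
    capture_output=True, text=True
).stdout.strip()

# Publish the whole config file, then call out the non-default options in prose.
shutil.copy("/etc/mysql/my.cnf", "results/engine-config.cnf")  # placeholder path

with open("results/software-report.txt", "w") as f:
    f.write(f"OS: {os_info}\nEngine version: {engine_version}\n")
```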

Data set

  • Schema (or a sample element/document for NoSQL)
  • Indexes
  • Avg element size (extra credit: std dev)
  • Data set size
  • Distribution of random values (uniform ints? Text? Zipfian? See the sketch after this list)
  • Sharding distribution if used
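
The distribution matters as much as the raw size, because a uniform key space and a Zipfian one exercise an engine very differently. A minimal sketch of generating and reporting both, using numpy; the sizes and the Zipf exponent are arbitrary examples.

```python
# Minimal sketch: be explicit about the value distribution you generate.
# Uses numpy; the sizes and the Zipf exponent are arbitrary examples.
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

uniform_keys = rng.integers(0, 10_000_000, size=n)  # every key equally likely
zipf_keys = rng.zipf(1.2, size=n)                    # heavy skew toward a few hot keys

unique, counts = np.unique(zipf_keys, return_counts=True)
print(f"uniform distinct keys: {len(np.unique(uniform_keys))}")
print(f"zipf distinct keys: {len(unique)}, "
      f"hottest key = {counts.max() / n:.1%} of all accesses")
```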

Workload

  • What were you trying to emulate? If you have a real use case in mind, describe it!
  • Inserts
  • Updates (single-record or multi-record?)
  • Point queries
  • Range queries (describe range size & % skipped vs. returned)
  • Membership queries (i.e. existence checks for unique inserts)
  • Relative proportions of ops (see the sketch after this list)
  • Durability enforced (fsync each op? fsync on timer? fuck it mode?)
  • Replication factor enforced, if available
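
A minimal sketch of defining the op mix up front so the relative proportions can be reported exactly; the proportions are arbitrary examples, and a real client would issue the op where the counter is incremented.

```python
# Minimal sketch: make the op mix explicit and reportable. The proportions
# are arbitrary examples.
import random
from collections import Counter

OP_MIX = {"insert": 0.50, "update": 0.10, "point_query": 0.30, "range_query": 0.10}
ops, weights = zip(*OP_MIX.items())

random.seed(0)
executed = Counter()
for _ in range(100_000):
    op = random.choices(ops, weights=weights)[0]
    executed[op] += 1   # a real benchmark client would issue the op here

for op, count in executed.items():
    print(f"{op}: {count / 100_000:.1%}")
```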

Measurement

  • What did you measure? Why? "Fast" isn't a measurement; "throughput" and "latency" are measurements.
  • Percentiles are awesome. Averages suck. (See the sketch after this list.)
  • What system metrics were tracked? Where are the bottlenecks (disk or CPU, at least)? If there's a bottleneck in your benchmark client, imma laugh at you.
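
A minimal sketch of why percentiles beat averages, using synthetic latencies that stand in for the per-op timings your client should record.

```python
# Minimal sketch: report percentiles alongside the mean. The synthetic
# latencies below stand in for per-op timings recorded by your client:
# mostly ~1 ms ops with occasional 50 ms stalls that the average hides.
import random
import statistics

random.seed(0)
latencies_ms = [random.gauss(1.0, 0.1) for _ in range(9_900)] + [50.0] * 100

q = statistics.quantiles(latencies_ms, n=100)   # 99 cut points: q[49] is p50, q[98] is p99
print(f"mean = {statistics.mean(latencies_ms):.2f} ms")
print(f"p50  = {q[49]:.2f} ms")
print(f"p99  = {q[98]:.2f} ms")
```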