
@chadbrewbaker
Last active December 5, 2021 19:49
Show Notes

Guest Andrey Akinshin @andrey_akinshin

  • Benchmarks should be designed with a business-decision question in mind.
  • Corner cases are routine.
  • If a few samples go from 100 ms to 5 seconds, you don't need fancy statistical methods.
  • What are practically significant differences in the context of your problem?
  • Avoid measurements from cold starts - unless that is what you are trying to benchmark.
  • Measure of central tendency - you usually want the median, not the mean.
  • Median trick - take an odd number of samples so one sample sits exactly in the middle (see the median-vs-mean sketch after these notes).
  • Efficiency of the Harrell-Davis quantile estimator (sketch after these notes)
  • New arXiv paper
  • Perfolizer
  • Try median absolute deviation (MAD) instead of standard deviation. Plot the distribution of the medians from each experiment.
  • Effect size - the median difference divided by the median absolute deviation. Understand what your noise levels are (MAD sketch after these notes).
  • Take quartiles (or more quantiles) to get more resolution. Take 4*(2n+1) samples so you have four exact numbers.
  • Use sequential analysis - if the samples have stabilized, stop early; don't burn your AWS bill (stopping-rule sketch after these notes).
  • Big four parameters: false positive rate, false negative rate, effect size, and number of measurements (see the simulation sketch after these notes).
  • Watch for p-hacking and repeated-sampling bias.
  • Plotting distributions is helpful: histograms, density estimation (KDE sketch after these notes).
  • Performance is almost never normally distributed - the distributions are usually multimodal.
  • Quantile-respectful density estimation
  • Sheather-Jones bandwidth estimation
  • Make sure your machines have enough free disk space before running benchmarks. Also be aware of thermal throttling.
  • Ask questions about the slowest and longest runs; /usr/bin/time -v reports peak memory and other resource usage.
  • Close your other programs when benchmarking - especially the browser.
  • Especially watch noisy browser extensions, Spotlight on macOS, and Windows Defender.
  • Use a Faraday cage (a microwave oven works) to isolate phones for benchmarking.
  • Eliminate as much noise as you can - we want repeatability.
  • Story - all unit tests sped up by 10% on Saturday and slowed down again on Monday. The VMs had separate CPU cores but shared disk, and there was no disk contention on weekends.
  • Change point detectors on time series - use a suite, since no single algorithm is best for all distributions (changepoint sketch after these notes).
  • ED-PELT works well for short time series but can produce a lot of false positives.
  • JetBrains has about 10 standard checkpoints to compare performance.
  • Evergreen
  • MP² quantile estimator: estimating the moving median without storing values - constant memory footprint (P² sketch after these notes).
  • Book/podcast recommendations - write some code after reading, or you will forget.
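Below are a few illustrative Python sketches for the techniques referenced above. None of them come from the episode itself; all concrete numbers, thresholds, and helper names are made up for illustration.

First, the median-vs-mean point: with an odd number of samples the median is an actual observed run, and a single cold-start outlier barely moves it.

```python
import numpy as np

# "Median trick": with an odd sample count, the median is a real
# observed value sitting exactly in the middle of the sorted runs.
samples = np.array([101, 99, 102, 100, 98, 103, 5000])  # ms; last run hit a cold start

print(f"mean   = {np.mean(samples):8.1f} ms")   # dragged to ~800 ms by one outlier
print(f"median = {np.median(samples):8.1f} ms") # stays at 101 ms
```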
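The Harrell-Davis estimator weights every order statistic by a Beta distribution instead of picking one, which tends to lower the variance of the quantile estimate. This is a sketch of the standard textbook formulation, not Perfolizer's implementation:

```python
import numpy as np
from scipy import stats

def harrell_davis(x, q):
    """Harrell-Davis estimate of the q-th quantile: average all sorted
    samples, weighted by a Beta((n+1)q, (n+1)(1-q)) distribution."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    a, b = (n + 1) * q, (n + 1) * (1 - q)
    # Weight for x[i] = Beta-distribution mass on the interval (i/n, (i+1)/n].
    cdf = stats.beta.cdf(np.arange(n + 1) / n, a, b)
    weights = np.diff(cdf)
    return np.dot(weights, x)

rng = np.random.default_rng(42)
data = rng.exponential(scale=100.0, size=25)  # skewed, like latencies
print("sample median :", np.median(data))
print("Harrell-Davis :", harrell_davis(data, 0.5))
```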
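A sketch of MAD and a robust effect size. The notes only say "median over MAD", so the pooling of the two groups' MADs below is an illustrative choice, not the episode's exact formula:

```python
import numpy as np

def mad(x, constant=1.4826):
    """Median absolute deviation; 1.4826 makes it a consistent
    estimator of the standard deviation under normality."""
    x = np.asarray(x, dtype=float)
    return constant * np.median(np.abs(x - np.median(x)))

def median_effect_size(a, b):
    """Difference of medians in pooled-MAD units - a robust analogue
    of Cohen's d (illustrative formulation)."""
    pooled = np.sqrt((mad(a) ** 2 + mad(b) ** 2) / 2)
    return (np.median(b) - np.median(a)) / pooled

rng = np.random.default_rng(0)
before = rng.normal(100, 5, size=30)  # baseline latencies, ms
after = rng.normal(103, 5, size=30)   # candidate build, ~3 ms slower
print(f"MAD(before) = {mad(before):.2f} ms")
print(f"effect size = {median_effect_size(before, after):.2f}")
```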
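A toy sequential-analysis loop for the "stop when samples are steady" advice. The stopping rule here (running median stable within 1% over the last five runs) is an assumption for illustration, not a formal sequential test:

```python
import numpy as np

def benchmark_until_stable(measure, min_runs=10, max_runs=100,
                           rel_tolerance=0.01, window=5):
    """Keep measuring until the running median has moved less than
    rel_tolerance over the last `window` runs, or the budget runs out."""
    samples, medians = [], []
    for _ in range(max_runs):
        samples.append(measure())
        medians.append(np.median(samples))
        if len(samples) >= min_runs:
            recent = medians[-window:]
            if (max(recent) - min(recent)) / medians[-1] < rel_tolerance:
                break  # steady enough - stop and save the AWS bill
    return np.array(samples)

rng = np.random.default_rng(1)
runs = benchmark_until_stable(lambda: rng.normal(100, 2))  # fake 100 ms benchmark
print(f"stopped after {len(runs)} runs, median = {np.median(runs):.1f} ms")
```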
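To see how the big four parameters interact, a Monte-Carlo sketch: fix the effect size and sample count, then estimate the false positive and false negative rates of a Mann-Whitney U test by simulation (the normal noise model is an assumption):

```python
import numpy as np
from scipy import stats

def error_rates(n, effect, trials=2000, alpha=0.05, seed=0):
    """Estimate false positive/negative rates for a Mann-Whitney U test
    at sample size n and an `effect`-sized shift (in noise-sd units)."""
    rng = np.random.default_rng(seed)
    fp = fn = 0
    for _ in range(trials):
        a = rng.normal(0, 1, n)
        same = rng.normal(0, 1, n)        # no real difference
        diff = rng.normal(effect, 1, n)   # genuine shift of `effect`
        fp += stats.mannwhitneyu(a, same).pvalue < alpha
        fn += stats.mannwhitneyu(a, diff).pvalue >= alpha
    return fp / trials, fn / trials

for n in (10, 30, 100):
    fpr, fnr = error_rates(n, effect=0.5)
    print(f"n={n:3d}: false positives ~{fpr:.3f}, false negatives ~{fnr:.3f}")
```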
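A density-estimation sketch for the multimodality point. scipy's gaussian_kde uses Scott's rule by default; for the Sheather-Jones bandwidth mentioned above you would reach for a library that implements it (e.g. KDEpy's "ISJ" option), so this is only the basic version:

```python
import numpy as np
from scipy import stats
from scipy.signal import find_peaks

# Two latency modes (e.g. cache hit vs. miss) - a mean of ~124 ms would
# describe neither; the density estimate exposes both peaks.
rng = np.random.default_rng(7)
latencies = np.concatenate([rng.normal(100, 3, 700),   # fast path
                            rng.normal(180, 5, 300)])  # slow path

kde = stats.gaussian_kde(latencies)          # Scott's rule bandwidth
grid = np.linspace(latencies.min(), latencies.max(), 400)
density = kde(grid)
peaks, _ = find_peaks(density, height=density.max() * 0.1)
print("modes near:", np.round(grid[peaks], 1))  # ~100 and ~180 ms
```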
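A change point detection sketch using the third-party ruptures library. It ships classic PELT (ED-PELT itself lives in Akinshin's Perfolizer, in C#), but the workflow is the same: scan a benchmark time series for points where the distribution shifts:

```python
import numpy as np
import ruptures as rpt  # third-party: pip install ruptures

# Synthetic benchmark history: a 10 ms regression lands at index 60.
rng = np.random.default_rng(3)
series = np.concatenate([rng.normal(100, 2, 60),
                         rng.normal(110, 2, 60)])

algo = rpt.Pelt(model="rbf").fit(series)
breakpoints = algo.predict(pen=10)  # penalty trades sensitivity vs. false positives
print("change points at indices:", breakpoints[:-1])  # final entry is just len(series)
```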
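Finally, the constant-memory idea behind MP². Here is the classic P² estimator it builds on (Jain & Chlamtac, 1985): five markers track one quantile of a stream in O(1) memory. The moving-window extension is Akinshin's; this sketch is only the stationary original:

```python
class P2Quantile:
    """Classic P² estimator: streams the p-quantile with five markers."""

    def __init__(self, p=0.5):
        self.p = p
        self.q = []                                  # marker heights
        self.n = [0, 1, 2, 3, 4]                     # actual positions
        self.np = [0, 2 * p, 4 * p, 2 + 2 * p, 4]    # desired positions
        self.dn = [0, p / 2, p, (1 + p) / 2, 1]      # desired increments

    def add(self, x):
        if len(self.q) < 5:                  # warm-up: collect five samples
            self.q.append(x)
            self.q.sort()
            return
        # 1. Find the cell containing x, stretching the extremes if needed.
        if x < self.q[0]:
            self.q[0], k = x, 0
        elif x >= self.q[4]:
            self.q[4], k = x, 3
        else:
            k = next(i for i in range(4) if self.q[i] <= x < self.q[i + 1])
        # 2. Shift marker positions above the cell; advance desired positions.
        for i in range(k + 1, 5):
            self.n[i] += 1
        for i in range(5):
            self.np[i] += self.dn[i]
        # 3. Nudge middle markers toward their desired positions (parabolic
        #    interpolation, falling back to linear when the parabola overshoots).
        for i in (1, 2, 3):
            d = self.np[i] - self.n[i]
            if (d >= 1 and self.n[i + 1] - self.n[i] > 1) or \
               (d <= -1 and self.n[i - 1] - self.n[i] < -1):
                d = 1 if d > 0 else -1
                qp = self._parabolic(i, d)
                if not self.q[i - 1] < qp < self.q[i + 1]:
                    qp = self._linear(i, d)
                self.q[i] = qp
                self.n[i] += d

    def _parabolic(self, i, d):
        n, q = self.n, self.q
        return q[i] + d / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + d) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - d) * (q[i] - q[i - 1]) / (n[i] - n[i - 1]))

    def _linear(self, i, d):
        return self.q[i] + d * (self.q[i + d] - self.q[i]) / (self.n[i + d] - self.n[i])

    @property
    def value(self):
        # The middle marker approximates the p-quantile once warmed up.
        return self.q[2] if len(self.q) == 5 else sorted(self.q)[len(self.q) // 2]


import random
random.seed(0)
est = P2Quantile(p=0.5)
for _ in range(10_000):
    est.add(random.gauss(100, 10))
print(f"streaming median estimate: {est.value:.2f}")  # near 100, stores only 5 values
```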