Guest Andrey Akinshin @andrey_akinshin
- Benchmarks should start from a question about a business decision.
- Corner cases are routine.
- If a few samples go from 100ms to 5 seconds then we don't need fancy methods.
- What are practically significant differences in the context of your problem?
- Avoid measurements from cold starts - unless that is what you are trying to benchmark.
- Measure of central tendency - you usually want the median not the mean.
- Median trick - take an odd number of samples to have one number in the middle.
- Efficiency of the Harrell-Davis quantile estimator
- New arXiv paper
- Perfolizer
- Try median absolute deviation instead of standard deviation. Plot the distribution of the medians from each experiment.
- Effect size - median/median absolute deviation. Understand what noise levels are.
- Take four medians (quantiles) etc to get more resolution. Take 4*(2n+1) samples so you have four exact numbers.
- Use sequential analysis - if you are getting steady samples then stop - don't burn your AWS bill.
- Big four parameters: false positive rate, false negative rate, effect size, number of measurements.
- Watch for p-hacking and repeated-sampling bias.
- Plotting distributions is helpful. Histogram, density estimation.
- Performance distributions are almost never normal - they are usually multimodal.
- Quantile-respectful density estimation
- Sheather-Jones bandwidth estimation
- Make sure your machines have enough free disk space before running benchmarks. Also be aware of thermal throttling.
- Ask questions on slowest and longest runs. /usr/bin/time -v
- Close your other programs when benchmarking - especially the browser.
- Especially watch browser extensions that are noisy, Spotlight on macOS, Windows Defender.
- Use a Faraday cage like a microwave to isolate phones for benchmarking.
- Eliminate as much noise as you can - we want repeatability.
- Story - all unit tests sped up by 10% on Saturday, slowed down on Monday. VMs had separate CPU cores, but shared disk. No disk contention on weekends.
- Change point detectors on time series. Need to use a suite as no one algorithm is best for all distributions.
- ED-PELT works well for short time series, but can produce a lot of false positives.
- JetBrains has about 10 standard checkpoints to compare performance.
- Evergreen
- MP² quantile estimator: estimating the moving median without storing values - constant memory footprint.
- Book/podcast recommendations - write some code after reading or you will forget.
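
The notes above on the median, median absolute deviation, and effect size can be sketched in a few lines of Python. This is a minimal illustration, not Perfolizer's implementation: the function names `median_abs_deviation` and `effect_size` and the sample timings are made up for the example, and it assumes the effect size is read as the shift in medians divided by the baseline's median absolute deviation.

```python
import statistics


def median_abs_deviation(samples):
    """Median of the absolute distances from the median (unscaled)."""
    med = statistics.median(samples)
    return statistics.median(abs(x - med) for x in samples)


def effect_size(baseline, candidate):
    """Shift in medians, expressed in units of the baseline's noise level.

    Assumption: the baseline MAD is used as the noise scale. Other
    definitions (for example a pooled MAD) are equally plausible.
    """
    noise = median_abs_deviation(baseline)
    if noise == 0:
        raise ValueError("baseline has zero MAD; effect size is undefined")
    shift = statistics.median(candidate) - statistics.median(baseline)
    return shift / noise


# Hypothetical timings in milliseconds. Seven samples: an odd count, so the
# median is one of the measured values. The baseline has one 5-second outlier.
baseline = [100, 102, 98, 101, 99, 103, 5000]
candidate = [110, 112, 108, 111, 109, 113, 111]

print("baseline mean:  ", statistics.mean(baseline))    # dragged up by the outlier
print("baseline median:", statistics.median(baseline))  # barely affected by it
print("baseline MAD:   ", median_abs_deviation(baseline))
print("effect size:    ", effect_size(baseline, candidate))
```

The single 5000 ms outlier pulls the mean far away from the typical run, while the median and the median absolute deviation stay close to the bulk of the samples, which is why the notes prefer them for noisy performance data.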