oskopek/statistical_benchmarking_ideas.adoc

Statistical benchmarking


Goal: Rank solvers independently of PCs, JVMs, processes, OSs, runs, … (as far as possible), so that the ranking depends only on the Solver. ✓


Means: Run each benchmark multiple times (N runs, on different JVMs) and evaluate the results statistically. ✓


How big should N be? (calculate statistically) (Bloch: 30+) X
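A minimal sketch of the statistical calculation, assuming a small pilot sample of runs and roughly normal results: the textbook estimate for the number of runs needed to pin the mean down to within ±E is n = ⌈(z·s/E)²⌉, where s is the pilot's sample standard deviation and z is the normal quantile (1.96 for about 95 % confidence). The class `SampleSize`, its methods and the pilot numbers are hypothetical.

```java
/** Hypothetical helper: estimates how many benchmark runs are needed. */
public final class SampleSize {

    private SampleSize() {
    }

    /** Sample standard deviation (divides by n - 1). */
    public static double sampleStdDev(double[] values) {
        double mean = 0.0;
        for (double v : values) mean += v;
        mean /= values.length;
        double sumSq = 0.0;
        for (double v : values) sumSq += (v - mean) * (v - mean);
        return Math.sqrt(sumSq / (values.length - 1));
    }

    /**
     * Runs needed so that the mean is within +/- marginOfError,
     * assuming roughly normal results: n = ceil((z * s / E)^2).
     */
    public static int requiredRuns(double[] pilotRuns, double z, double marginOfError) {
        double s = sampleStdDev(pilotRuns);
        return (int) Math.ceil(Math.pow(z * s / marginOfError, 2));
    }

    public static void main(String[] args) {
        // Hypothetical pilot: scores of 5 runs of one solver on one dataset.
        double[] pilot = {100.0, 102.0, 98.0, 101.0, 99.0};
        // z = 1.96 (two-sided ~95 %), margin of error = 1.0 score point.
        System.out.println(requiredRuns(pilot, 1.96, 1.0)); // prints 10
    }
}
```

For very small pilots, z is often replaced by the Student's t quantile and the estimate iterated; either way it is a starting point, not a guarantee.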


Student’s t-test ✓ (evaluate it on the fly and append another run when the result is inconclusive?) X
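A sketch of the test itself, assuming independent runs. It uses Welch's variant of the two-sample t-test (no equal-variance assumption, which seems safer when two solvers have different spreads) rather than the pooled Student's version. The Java standard library has no t-distribution, so the critical value is passed in from a t-table or a stats library; `WelchTTest` and `needsAnotherRun` are hypothetical names.

```java
/** Hypothetical helper: Welch's two-sample t-test statistic. */
public final class WelchTTest {

    private WelchTTest() {
    }

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double variance(double[] x) {
        double m = mean(x);
        double sumSq = 0.0;
        for (double v : x) sumSq += (v - m) * (v - m);
        return sumSq / (x.length - 1);
    }

    /** t = (meanA - meanB) / sqrt(varA / nA + varB / nB). */
    public static double tStatistic(double[] a, double[] b) {
        double se2 = variance(a) / a.length + variance(b) / b.length;
        return (mean(a) - mean(b)) / Math.sqrt(se2);
    }

    /** Welch-Satterthwaite approximation of the degrees of freedom. */
    public static double degreesOfFreedom(double[] a, double[] b) {
        double va = variance(a) / a.length;
        double vb = variance(b) / b.length;
        return (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
    }

    /**
     * "On the fly" check (hypothetical): true if the difference is not yet
     * significant and the run budget still allows appending another run.
     * criticalValue must come from a t-table or stats library for the current df.
     */
    public static boolean needsAnotherRun(double[] a, double[] b,
                                          double criticalValue, int maxRuns) {
        boolean significant = Math.abs(tStatistic(a, b)) > criticalValue;
        return !significant && Math.min(a.length, b.length) < maxRuns;
    }

    public static void main(String[] args) {
        double[] solverA = {10.0, 11.0, 12.0};
        double[] solverB = {13.0, 14.0, 15.0};
        System.out.println(tStatistic(solverA, solverB));
        System.out.println(degreesOfFreedom(solverA, solverB));
    }
}
```

With the example data, t is about -3.67 with about 4 degrees of freedom; the two-sided 95 % critical value for 4 degrees of freedom is about 2.776, so the difference would count as significant. Caveat for the on-the-fly idea: re-testing after every appended run and stopping at the first significant result inflates the false-positive rate unless a sequential-testing correction is applied.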


Statistics - results


Have to be modular ✓


E.g.: average, min, max, median, geometric mean, standard deviation ✓
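One way to keep the statistics modular, sketched under the assumption that each statistic is just a named function from an array of run results to a number: adding a statistic then means adding one list entry. The class and member names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Hypothetical registry of pluggable descriptive statistics. */
public final class Statistics {

    /** One statistic = a name plus a function that reduces a sample to a number. */
    public static final class Statistic {
        final String name;
        final ToDoubleFunction<double[]> function;

        Statistic(String name, ToDoubleFunction<double[]> function) {
            this.name = name;
            this.function = function;
        }
    }

    public static double average(double[] v) {
        return Arrays.stream(v).average().orElse(Double.NaN);
    }

    public static double median(double[] v) {
        double[] s = v.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    /** Geometric mean via the mean of logarithms; only defined for positive values. */
    public static double geometricMean(double[] v) {
        return Math.exp(Arrays.stream(v).map(Math::log).average().orElse(Double.NaN));
    }

    /** Sample standard deviation (n - 1 in the denominator). */
    public static double standardDeviation(double[] v) {
        double mean = average(v);
        double sumSq = Arrays.stream(v).map(x -> (x - mean) * (x - mean)).sum();
        return Math.sqrt(sumSq / (v.length - 1));
    }

    /** Adding a new statistic only means adding one entry here. */
    public static final List<Statistic> DEFAULT = List.of(
            new Statistic("average", Statistics::average),
            new Statistic("min", v -> Arrays.stream(v).min().orElse(Double.NaN)),
            new Statistic("max", v -> Arrays.stream(v).max().orElse(Double.NaN)),
            new Statistic("median", Statistics::median),
            new Statistic("geometric mean", Statistics::geometricMean),
            new Statistic("standard deviation", Statistics::standardDeviation));

    /** Applies every statistic to the runs, keeping the list order. */
    public static Map<String, Double> describe(double[] values, List<Statistic> statistics) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Statistic s : statistics) {
            result.put(s.name, s.function.applyAsDouble(values));
        }
        return result;
    }

    public static void main(String[] args) {
        double[] runs = {2.0, 8.0, 4.0}; // hypothetical run results
        System.out.println(describe(runs, DEFAULT));
    }
}
```

The geometric mean is only defined for positive values, so negative scores would need shifting or a different statistic.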


Discussion with Jirka: evaluate scores against a weighting function (1 hard point ~ 300 soft points) X
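A minimal sketch of such a weighting function, assuming the note means that one hard-score point is worth about 300 soft-score points, so a hard/soft score can be collapsed into one number before computing statistics. `WeightedScore` and `SOFT_PER_HARD` are hypothetical names.

```java
/** Hypothetical weighting: one hard-score point counts as 300 soft-score points. */
public final class WeightedScore {

    /** Assumed exchange rate, taken from the note "1 hard ~ 300 soft". */
    public static final double SOFT_PER_HARD = 300.0;

    private WeightedScore() {
    }

    /** Collapses a (hard, soft) score into a single value for statistics. */
    public static double toSingleValue(long hardScore, long softScore) {
        return hardScore * SOFT_PER_HARD + softScore;
    }

    public static void main(String[] args) {
        // -1 hard / -50 soft  vs.  0 hard / -400 soft
        System.out.println(toSingleValue(-1, -50)); // prints -350.0
        System.out.println(toSingleValue(0, -400)); // prints -400.0
    }
}
```

The trade-off is visible in the example: the weighted value ranks -1 hard / -50 soft above 0 hard / -400 soft, whereas a strict hard-first comparison would rank the second score higher.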


Library support? Implement our own? X


Report ✓


Examples ✓


Box plot ✓ JFreeChart? ✓


Candlestick diagram ✓ JFreeChart? ✓


How does it differ from a box plot? ✓


Violin plot ✓ JFreeChart? X
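For reference, a sketch of the numbers a box plot draws (minimum, first quartile, median, third quartile, maximum), plus Tukey's 1.5 × IQR fences that are commonly used to flag outliers. Quartile definitions vary between tools; this one interpolates linearly between order statistics, the same convention as Excel's `QUARTILE.INC`. The class name and sample runs are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper: the numbers a box plot draws. */
public final class BoxPlotSummary {

    private BoxPlotSummary() {
    }

    /** Quantile of sorted data, interpolating linearly between order statistics (p in [0, 1]). */
    public static double quantile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lower = (int) Math.floor(pos);
        int upper = (int) Math.ceil(pos);
        return sorted[lower] + (pos - lower) * (sorted[upper] - sorted[lower]);
    }

    /** Returns {min, Q1, median, Q3, max}. */
    public static double[] fiveNumberSummary(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        return new double[] {
                sorted[0],
                quantile(sorted, 0.25),
                quantile(sorted, 0.50),
                quantile(sorted, 0.75),
                sorted[sorted.length - 1]};
    }

    /** Tukey's fences: points outside [Q1 - 1.5 IQR, Q3 + 1.5 IQR] are usually drawn as outliers. */
    public static double[] tukeyFences(double[] summary) {
        double iqr = summary[3] - summary[1];
        return new double[] {summary[1] - 1.5 * iqr, summary[3] + 1.5 * iqr};
    }

    public static void main(String[] args) {
        double[] runs = {7.0, 1.0, 3.0, 5.0, 9.0};
        System.out.println(Arrays.toString(fiveNumberSummary(runs))); // prints [1.0, 3.0, 5.0, 7.0, 9.0]
    }
}
```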


Box plot vs. Violin plot? Choose one. See this article for inspiration.


I prefer the Violin plot, but it doesn’t have an implementation in JFreeChart.
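Since JFreeChart has no violin plot, one option is to compute the outline ourselves: a violin is essentially a kernel density estimate mirrored around the category axis. A sketch, assuming a Gaussian kernel and Silverman's rule-of-thumb bandwidth (1.06 · s · n^(-1/5)); the class name, grid bounds and sample runs are hypothetical, and the rendering itself is left out.

```java
/** Hypothetical helper: the density curve a violin plot mirrors around its axis. */
public final class ViolinDensity {

    private ViolinDensity() {
    }

    /** Silverman's normal-reference bandwidth: 1.06 * sd * n^(-1/5). */
    public static double bandwidth(double[] values) {
        int n = values.length;
        double mean = 0.0;
        for (double v : values) mean += v;
        mean /= n;
        double sumSq = 0.0;
        for (double v : values) sumSq += (v - mean) * (v - mean);
        double sd = Math.sqrt(sumSq / (n - 1));
        return 1.06 * sd * Math.pow(n, -0.2);
    }

    /** Gaussian kernel density estimate at point x with bandwidth h. */
    public static double density(double[] values, double h, double x) {
        double sum = 0.0;
        for (double v : values) {
            double u = (x - v) / h;
            sum += Math.exp(-0.5 * u * u);
        }
        return sum / (values.length * h * Math.sqrt(2.0 * Math.PI));
    }

    /** Samples the density on gridSize evenly spaced points over [lo, hi]. */
    public static double[] densityGrid(double[] values, double lo, double hi, int gridSize) {
        double h = bandwidth(values);
        double[] grid = new double[gridSize];
        for (int i = 0; i < gridSize; i++) {
            double x = lo + (hi - lo) * i / (gridSize - 1);
            grid[i] = density(values, h, x);
        }
        return grid;
    }

    public static void main(String[] args) {
        double[] runs = {98.0, 99.0, 100.0, 100.0, 101.0, 104.0};
        double[] curve = densityGrid(runs, 95.0, 107.0, 13); // grid step 1.0
        for (int i = 0; i < curve.length; i++) {
            System.out.printf("%6.1f  %.4f%n", 95.0 + i, curve[i]);
        }
    }
}
```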


Show as a layer above the current summary and other graphs (tabs within tabs) ?


Do we have support in JFreeChart? Do we need additional libraries? ✓


After resolution ?


Test reliability: single thread vs. 2 threads vs. 4 threads ?
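A sketch of one possible reliability measure, assuming reliability means run-to-run stability: compare the coefficient of variation (sample standard deviation divided by the mean) of repeated runs of the same solver and dataset at each thread count. The class name and the numbers are hypothetical placeholders, not measurements.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical check: does running benchmarks in parallel make results noisier? */
public final class ParallelReliability {

    private ParallelReliability() {
    }

    /** Coefficient of variation = sample standard deviation / |mean|; lower means more stable. */
    public static double coefficientOfVariation(double[] values) {
        double mean = 0.0;
        for (double v : values) mean += v;
        mean /= values.length;
        double sumSq = 0.0;
        for (double v : values) sumSq += (v - mean) * (v - mean);
        return Math.sqrt(sumSq / (values.length - 1)) / Math.abs(mean);
    }

    public static void main(String[] args) {
        // Placeholder values, not measurements: same solver and dataset, keyed by thread count.
        Map<Integer, double[]> runsByThreadCount = new TreeMap<>();
        runsByThreadCount.put(1, new double[] {100.0, 101.0, 99.0, 100.0});
        runsByThreadCount.put(2, new double[] {100.0, 102.0, 98.0, 100.0});
        runsByThreadCount.put(4, new double[] {99.0, 103.0, 97.0, 101.0});
        runsByThreadCount.forEach((threads, runs) ->
                System.out.printf("%d thread(s): CV = %.4f%n", threads, coefficientOfVariation(runs)));
    }
}
```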


Validate the benchmarks from the old performance blog posts ?


Research ✓


Read up on performance and statistics ✓


Implementation ✓


Cross-check the calculations against Excel X


Don’t forget aggregation


Benchmarking order: random Latin squares ?
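A sketch of a randomized Latin-square order, assuming rows are benchmark rounds, columns are positions within a round and the symbols are solvers: every solver then runs once per round and once in every position, which should balance position effects such as machine warm-up or drift. It permutes the rows, columns and symbols of the cyclic square (i + j) mod n; that is randomized, but not a uniform sample over all Latin squares. Names and the seed are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: randomized Latin square for ordering solver runs. */
public final class LatinSquareOrder {

    private LatinSquareOrder() {
    }

    static List<Integer> shuffledIndices(int n, Random random) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices, random);
        return indices;
    }

    /**
     * square[round][slot] = index of the solver that runs in that slot of that round.
     * Starts from the cyclic square (i + j) mod n, then permutes rows, columns and symbols.
     * Randomized, but NOT uniformly sampled from all Latin squares.
     */
    public static int[][] randomLatinSquare(int n, Random random) {
        List<Integer> rows = shuffledIndices(n, random);
        List<Integer> cols = shuffledIndices(n, random);
        List<Integer> symbols = shuffledIndices(n, random);
        int[][] square = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                square[i][j] = symbols.get((rows.get(i) + cols.get(j)) % n);
            }
        }
        return square;
    }

    public static void main(String[] args) {
        String[] solvers = {"solverA", "solverB", "solverC", "solverD"};
        int[][] order = randomLatinSquare(solvers.length, new Random(42));
        for (int round = 0; round < order.length; round++) {
            StringBuilder line = new StringBuilder("round " + round + ":");
            for (int solver : order[round]) line.append(' ').append(solvers[solver]);
            System.out.println(line);
        }
    }
}
```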


Descriptive statistics ✓


Score comparator, user-overridable, plus different branch comparators (median, average, sum, …) X
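A sketch of a pluggable score comparator, assuming each solver has one score per run and a higher score is better. The statistic used for ranking is the user-override point; `SolverRanking`, `SolverResult` and the constants are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical ranking of solvers by a pluggable statistic over their run scores. */
public final class SolverRanking {

    /** Hypothetical result holder: a solver name and the score of each run. */
    public static final class SolverResult {
        final String name;
        final double[] scores;

        public SolverResult(String name, double[] scores) {
            this.name = name;
            this.scores = scores;
        }
    }

    public static final ToDoubleFunction<double[]> AVERAGE =
            s -> Arrays.stream(s).average().orElse(Double.NaN);
    public static final ToDoubleFunction<double[]> SUM = s -> Arrays.stream(s).sum();
    public static final ToDoubleFunction<double[]> MEDIAN = s -> {
        double[] sorted = s.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    };

    /** Best first, assuming a higher score is better; the statistic is the user override. */
    public static Comparator<SolverResult> byStatistic(ToDoubleFunction<double[]> statistic) {
        return Comparator.comparingDouble((SolverResult r) -> statistic.applyAsDouble(r.scores))
                .reversed();
    }

    /** Returns solver names ordered by the given comparator. */
    public static List<String> rank(List<SolverResult> results, Comparator<SolverResult> comparator) {
        List<SolverResult> sorted = new ArrayList<>(results);
        sorted.sort(comparator);
        List<String> names = new ArrayList<>();
        for (SolverResult r : sorted) names.add(r.name);
        return names;
    }

    public static void main(String[] args) {
        List<SolverResult> results = List.of(
                new SolverResult("A", new double[] {-10.0, -10.0, -100.0}),
                new SolverResult("B", new double[] {-20.0, -20.0, -20.0}));
        System.out.println(rank(results, byStatistic(MEDIAN)));  // prints [A, B]
        System.out.println(rank(results, byStatistic(AVERAGE))); // prints [B, A]
    }
}
```

The example shows why the choice matters: one bad run drags solver A's average below B's, while its median stays ahead.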


Links ?


http://www.measuringu.com/blog/small-n.php ?


http://2012books.lardbucket.org/books/beginning-statistics/s13-02-comparison-of-two-population-m.html ?