Benchmarking index

How sandmark works, and the context behind the decisions:

  • codespeed - we looked at how the Python, Rust and LLVM folks were visualizing benchmarks, and there wasn’t a straightforward way to reuse the Rust benchmarks visualizer (it also wasn’t very stable at the time) or the LLVM one. Codespeed was being used by a couple of projects, Python’s PyPy for example, and seemed like a good choice.
    • Within codespeed, there’s also an artifacts directory to which you can push data (for example, the perf data!). There's also sandmark-analyze, which pulls the data from artifacts into a Jupyter notebook for anyone who wants to graph it differently.
    • ocamlspeed is a fork of codespeed, because there were bugs in codespeed and getting the patches upstream takes time.
    • sandmark is not directly relevant to our use case: even though it provides the infrastructure, it varies the compiler rather than the code under test. You specify the opam packages you want to run as compiler benchmarks, and sandmark builds those packages and runs the benchmarks.
  • The Python scripts in ocaml-benchmark-scripts glue things together: they fetch from git, trigger the benchmark suite, and upload the results to codespeed (a rough sketch follows this list).
  • The Python scripts also walk history with git's --first-parent option: merges produce an interleaved timeline, and when a new benchmark is introduced they want to be able to run it against older compiler commits on the mainline.
  • Configuring the machine to give deterministic results; the way to do it is documented here.
  • F * daily Slack -> most folks don't look at the OCaml benchmarks, mostly because there's no push notification.
  • There's a YAML config for running the sandmark benchmarks on a branch or PR (so one doesn't necessarily need to open a PR!).
  • In ocaml-current, one can have only the benchmarking Docker container run on the machine by specifying the resources as 1 here.
  • sandmark branch in ocurrent
    • Sandmark runs different commands on different branches. If you want to run the same command on all branches, then the code in examples/github.ml might be useful.
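
As a concrete reference for the glue scripts mentioned above, here is a minimal sketch of the fetch / run / upload loop. Everything named here is a placeholder rather than the real scripts' configuration: the repo path, the `make bench` driver, the codespeed URL, the starting revision, and the project/executable/environment names. The payload fields follow codespeed's documented result/add endpoint.

```python
import subprocess
import requests

REPO_DIR = "/path/to/benchmarked-repo"       # placeholder
BENCH_CMD = ["make", "bench"]                # placeholder benchmark driver
CODESPEED_URL = "http://localhost:8000"      # placeholder codespeed instance


def first_parent_commits(since_rev, until_rev="HEAD"):
    """Mainline commits, oldest first; --first-parent avoids the
    interleaved timeline that merge commits would otherwise introduce."""
    out = subprocess.check_output(
        ["git", "log", "--first-parent", "--reverse",
         "--pretty=format:%H", f"{since_rev}..{until_rev}"],
        cwd=REPO_DIR, text=True)
    return [line for line in out.splitlines() if line]


def run_benchmarks(commit):
    """Check out a commit and trigger the benchmark suite."""
    subprocess.run(["git", "checkout", commit], cwd=REPO_DIR, check=True)
    subprocess.run(BENCH_CMD, cwd=REPO_DIR, check=True)


def upload_result(commit, benchmark, value):
    """Post one measurement to codespeed's result/add endpoint."""
    payload = {
        "commitid": commit,
        "branch": "master",              # placeholder
        "project": "index",              # placeholder
        "executable": "index-bench",     # placeholder
        "benchmark": benchmark,
        "environment": "bench-machine",  # placeholder
        "result_value": value,
    }
    requests.post(f"{CODESPEED_URL}/result/add/", data=payload).raise_for_status()


if __name__ == "__main__":
    for commit in first_parent_commits("some-old-tag"):  # placeholder revision
        run_benchmarks(commit)
        # Reading the suite's per-benchmark numbers is elided here;
        # the dummy value below only shows the upload shape.
        upload_result(commit, "example-benchmark", 12.3)
```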

A rough Plan of Action:

  • Extract orun from sandmark into its own opam package
  • Split the benchmarks in index
  • Specify a pipeline step in ocurrent to run the benchmarks (we will have to run our own fork of ocurrent, since the interface right now is common for everyone)
  • Push the data to some visualisation tool (see the orun sketch after this list)
  • Have a push notification to a Slack channel, or a daily email, so that the benchmarks actually get looked at :D (see the Slack sketch after this list)
  • Get and configure a dedicated machine to run benchmarks.
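
For the "push the data to some visualisation tool" step, here is a minimal sketch of reading orun output and reshaping it into per-benchmark measurements that an uploader (such as the codespeed sketch above) could consume. The file name and the field names ("name", "time_secs", "maxrss_kB") are assumptions about orun's JSON output, not a confirmed schema, so check orun's actual output format before relying on them.

```python
import json
from pathlib import Path


def parse_orun_results(path):
    """Yield (benchmark, metric, value) triples from a file containing one
    JSON object per benchmark run, as orun-style wrappers produce.
    The field names used here are assumptions, not a confirmed schema."""
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        name = record["name"]
        yield name, "time_secs", record["time_secs"]
        yield name, "maxrss_kB", record["maxrss_kB"]


if __name__ == "__main__":
    for bench, metric, value in parse_orun_results("results.orun.json"):
        # Each triple maps onto one data point in the visualisation tool,
        # e.g. benchmark "bench/metric" with result_value `value` in codespeed.
        print(f"{bench}/{metric}: {value}")
```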
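For the push-notification item, a minimal sketch posting a daily summary to a Slack channel through an incoming webhook; the webhook URL and the message contents are placeholders, and a daily-email variant would look much the same with smtplib.

```python
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder


def notify(summary_lines):
    """Send one message so results arrive as a push notification instead of
    waiting for someone to open the benchmarks page."""
    text = "Daily index benchmark results:\n" + "\n".join(summary_lines)
    requests.post(SLACK_WEBHOOK_URL, json={"text": text}).raise_for_status()


if __name__ == "__main__":
    notify(["example-benchmark: 12.3 s (dummy data for illustration)"])
```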

Questions!

  • What about the I/O benchmarks?
    • Should I/O benchmarks be different from other benchmarks?
    • A separate step in the ocurrent pipeline?
  • Is there a way we can use the ocurrent CI system to trigger the benchmarks, instead of having a couple of Python scripts and a crontab?
  • What benchmarks do our end users care about?
  • Where does the data from ocaml-ci go? Can I specify a step in the pipeline where it pushes the data to some visualisation tool?
  • Are there visualisation tools where one can see both the perf data and the orun data?