The following benchmarks are from bootstrapping the layered store and calling a freeze every 50, 500, or 4000 commits. The x axis represents the completion time of requests successfully validated by the block validator or the prevalidator, as they are read from the logs. The x axis does not represent the commits, so a freeze does not necessarily occur at the 50th mark on the axis. The logs, as well as the scripts for generating the graphs, are here.
Sometimes I'll use a local benchmark, provided in the layered store branch. The default setup for these benchmarks is that every commit adds 1000 nodes to the previous commit, and a freeze is called every 50 commits.
They can be run by executing, for instance:

```
dune exec -- ./test/irmin-pack/layered_bench.exe --nbatches=4 --depth=1000 2> outs
```

and then using the scripts here to generate the graphs.
The bootstrapping with RO behaves very similarly to the bootstrapping in a single process, except for some occasional bumps corresponding to RO reads after a freeze, where it is expected that they take longer. Performance still seems ok after approx. 50 freezes:
(The spike does represent the freeze; we checked this by logging the time of each freeze using a file reporter.)
If we do not introduce a pause at the beginning of a freeze, the freeze blocks the main thread right away. By introducing a pause when the worker starts, the freeze is postponed until the next freeze is called.
A similar situation occurs for the bootstrapping. If we have two calls to `Lwt.pause` towards the start of the worker, we just postpone the blocking chunk a bit:
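Under Lwt's cooperative scheduling, a `Lwt.pause` re-enqueues the rest of the computation, so the paused work only runs after the other threads already in the run queue. A toy round-robin scheduler (illustrative only, not Irmin's actual code) shows how a pause at the start of the worker pushes its blocking chunk behind the main thread's pending commits:

```ocaml
(* Toy cooperative scheduler: "pause" re-enqueues the continuation,
   so paused work runs only after everything already queued. *)
let queue : (unit -> unit) Queue.t = Queue.create ()
let log = Buffer.create 64
let say s = Buffer.add_string log (s ^ " ")

(* Stand-in for Lwt.pause: defer [k] to the back of the run queue. *)
let pause k = Queue.add k queue

let run () =
  while not (Queue.is_empty queue) do (Queue.pop queue) () done

let () =
  Queue.add (fun () -> say "commit-1") queue;
  (* The worker pauses before its blocking chunk... *)
  Queue.add (fun () -> pause (fun () -> say "freeze-work")) queue;
  Queue.add (fun () -> say "commit-2") queue;
  run ();
  (* ...so the freeze work lands after commit-2. *)
  print_endline (Buffer.contents log)
```

The blocking chunk is not removed, only moved: in the model it runs after `commit-2` instead of before it, which is exactly the postponement visible in the graphs.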
One possibility is to force the Lwt engine to pause often (particularly after each I/O) to reschedule threads. By doing this we end up calling `Lwt.pause` a lot, which affects performance.
We can observe the effect on the local benchmarks:
Even if we do not call freeze, the overall performance is affected by the many calls to `Lwt.pause`:
We can observe this also for the bootstrapping, where the orange line is the bootstrapping with a freeze every 50 commits and with `Lwt.pause` and `Thread.yield` for each I/O.
It is blocking because the worker has too much to do compared to the main thread (approx. 10^5 copies vs. 10^2 adds, not counting the finds); since the two threads switch after each I/O, the worker ends up having to do more work before the next freeze.
This is branch irmin/layers and branch tezos/layers.
By using `Lwt.auto_yield` instead of `Lwt.return` after an I/O (writing to disk), we force the Lwt engine to reschedule the threads.
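For reference, `Lwt.auto_yield d` returns a function that actually yields only when at least `d` seconds have elapsed since the last yield, so it reschedules far less often than an unconditional `Lwt.pause` per I/O. Here is a deterministic model of that rate-limiting behaviour, using a fake clock so the example is reproducible (the real function uses wall-clock time and `Lwt.pause`):

```ocaml
(* Deterministic model of Lwt.auto_yield: yield only if at least
   [limit] seconds passed since the last yield. The fake clock [now]
   replaces wall-clock time to keep the example testable. *)
let now = ref 0.0
let yields = ref 0

let auto_yield limit =
  let last = ref !now in
  fun () ->
    if !now -. !last >= limit then begin
      last := !now;
      incr yields          (* stand-in for Lwt.pause () *)
    end

let () =
  let maybe_yield = auto_yield 0.05 in
  for _ = 1 to 10 do
    now := !now +. 0.02;   (* pretend each I/O takes 20ms *)
    maybe_yield ()
  done;
  Printf.printf "10 I/Os, %d yields\n" !yields
```

With a 50ms threshold and 20ms per I/O, only a fraction of the I/Os trigger a reschedule, instead of one reschedule per I/O.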
It is blocking because the worker is forced to do too much work in too little time. Without the pauses, the main thread doesn't give the worker a chance to execute: the blocking spike occurs when the worker is forced to finish a freeze before a new one starts.
But if we give the freeze more time to complete, the Lwt scheduler behaves a bit better. Here is the bootstrapping with a freeze every 500 commits, and with a freeze every 4000 commits:
Let's go back to a freeze every 500 commits and see how it behaves over a long run (approx. 400 freezes). Also, the performance varies from run to run. As an example, here are three runs for the scenario above:
(We rewrote some code to lighten the Lwt engine's work: it consisted of removing calls to `Lwt.join` and replacing `Lwt_list.iter_p` with `Lwt_list.iter_s`.)
This is branch irmin/lwt_pauses and branch tezos/lwt_pauses.
We have two parameters that can help control how often the main and the worker thread pause. In our experiments, we observed that when both the worker and the main thread have too much to do in too little time, the calls to pause are the only moments when a switch between the two threads occurs. So by controlling when the pauses are called, we can split the worker's work into batches over time.
When configuring the store, we can specify `pause_copy` (after how many copied objects the worker thread calls pause) and `pause_add` (after how many added objects the main thread calls pause).
These values are tricky to find and have to take into account how much work the main and the worker threads need to do. For bootstrapping with a freeze every 50 commits, the worker has to copy approx. 10^5 objects while the main thread has approx. 10^2 adds (not counting the finds). With the values above, we obtain these graphs (short period zoomed, and longer period):
Other values we tested for this scenario end up pushing the majority of the worker's work into one blocking chunk.
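The counting behind `pause_copy` can be sketched like this (a hypothetical simplification of the mechanism described above; `pause` stands in for `Lwt.pause`, and `pause_add` on the main-thread side works the same way):

```ocaml
(* Sketch of pause_copy: the worker counts copied objects and yields
   every [pause_copy] copies, splitting the freeze work into batches. *)
let pause_copy = 1_000
let copied = ref 0
let batches = ref 0

let pause () = incr batches   (* stand-in for Lwt.pause () *)

let copy_object _obj =
  incr copied;
  if !copied mod pause_copy = 0 then pause ()

let () =
  (* approx. 10^5 objects to copy per freeze during bootstrapping *)
  for i = 1 to 100_000 do copy_object i done;
  Printf.printf "copied %d objects in %d batches\n" !copied !batches
```

With 10^5 copies per freeze, a `pause_copy` of 1000 yields a hundred scheduling points per freeze; too large a value collapses everything back into one blocking chunk, too small a value pays the per-pause overhead seen earlier.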
The setup for the benchmarks is:
- every commit adds 1000 nodes to the previous commit, so the commits get bigger and bigger;
- a freeze is called every 50 commits: the red line shows the non-blocking (concurrent) freezes and the blue line the blocking freezes. As the main thread doesn't give the worker thread a chance to run, concurrent freezes are actually blocking freezes, just postponed;
- the orange line shows the blocking freezes run with the option `keep_max = false`.
The issue here is that the freezes get longer and longer: this is because each freeze has to copy the last commit into the lower layer (cheap, because the lower already contains some parts of the commit) and also into the new upper (very expensive, as the new upper is cleared). We can see in the orange line that the freezes that do not have to copy the commits into the new upper are faster and of roughly the same duration.
In the next graph, I replaced the times for freezes with 0 so that we can see how the other commits behave.