@rogeralsing
Last active January 26, 2022 21:22

(Running these benchmarks with and without PGO and related compiler settings is relevant too)

Spawn benchmark

https://github.com/asynkron/protoactor-dotnet/tree/dev/benchmarks/SpawnBenchmark

cd into /benchmarks/SpawnBenchmark

dotnet run -c Release

The same benchmark also exists as a BenchmarkDotNet version at https://github.com/asynkron/protoactor-dotnet/tree/dev/benchmarks/ProtoActorBenchmarks for more accurate numbers.

Purpose of the benchmark

Measures the time it takes to spawn new actors. The benchmark is based on https://github.com/atemerev/skynet

One key part here is the ProcessRegistry, which is essentially a dictionary from actor name to actor process. It is currently implemented as a large array of vanilla .NET dictionaries: the actor name is first hashed to select an array entry, and that entry contains the dictionary we use. All access to a given dictionary is done within a thread lock.

This might sound like a strange approach, but it has historically produced better numbers than a ConcurrentDictionary. Since this is a long-lived registry and keys are distributed across all these buckets, we mostly have no contention on the individual dictionaries.
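The bucketing scheme can be sketched roughly like this (names and bucket count are hypothetical; this is a minimal sketch of the idea, not the actual Proto.Actor ProcessRegistry):

```csharp
using System.Collections.Generic;

// Sketch of a sharded registry: hash the key to pick a bucket, then lock
// only that bucket's dictionary. With keys spread across many buckets,
// two threads rarely contend on the same lock.
public sealed class ShardedRegistry<TValue>
{
    private const int BucketCount = 1024; // hypothetical size
    private readonly Dictionary<string, TValue>[] _buckets;

    public ShardedRegistry()
    {
        _buckets = new Dictionary<string, TValue>[BucketCount];
        for (var i = 0; i < BucketCount; i++)
            _buckets[i] = new Dictionary<string, TValue>();
    }

    private Dictionary<string, TValue> BucketFor(string name)
    {
        // non-negative hash mapped onto the bucket array
        var hash = (uint)name.GetHashCode();
        return _buckets[hash % BucketCount];
    }

    public bool TryAdd(string name, TValue value)
    {
        var bucket = BucketFor(name);
        lock (bucket)
        {
            return bucket.TryAdd(name, value);
        }
    }

    public bool TryGet(string name, out TValue value)
    {
        var bucket = BucketFor(name);
        lock (bucket)
        {
            return bucket.TryGetValue(name, out value);
        }
    }
}
```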

The Erlang language is known for its ability to quickly spawn new processes (its actors, not to be confused with OS processes).

Beating Erlang would be nice; maybe we already do, but it is a good target to orient around.

In process benchmark

https://github.com/asynkron/protoactor-dotnet/tree/dev/benchmarks/InprocessBenchmark

cd into /benchmarks/InprocessBenchmark

dotnet run -c Release

The same benchmark also exists as a BenchmarkDotNet version at https://github.com/asynkron/protoactor-dotnet/tree/dev/benchmarks/ProtoActorBenchmarks for more accurate numbers.

Purpose of the benchmark

To measure the actor mailbox pipeline: how fast we can move messages from one actor to another.

This includes:

  • placing the message in the mailbox
  • scheduling the actor if it is not already scheduled
  • processing the message from the mailbox into the actor's receive pipeline
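The steps above can be sketched with an interlocked status flag, a common mailbox pattern (all names here are hypothetical; this is not Proto.Actor's actual mailbox):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Sketch of a mailbox: Post enqueues a message and schedules a consumer
// run only if one is not already scheduled, using an Interlocked flag.
public sealed class Mailbox
{
    private const int Idle = 0, Busy = 1;
    private readonly ConcurrentQueue<object> _queue = new ConcurrentQueue<object>();
    private readonly Action<object> _receive;
    private int _status = Idle;

    public Mailbox(Action<object> receive) => _receive = receive;

    public void Post(object message)
    {
        _queue.Enqueue(message);
        // schedule the actor only if it is not already scheduled
        if (Interlocked.CompareExchange(ref _status, Busy, Idle) == Idle)
            ThreadPool.QueueUserWorkItem(_ => Run());
    }

    private void Run()
    {
        do
        {
            // drain the queue into the receive pipeline
            while (_queue.TryDequeue(out var msg))
                _receive(msg);
            Interlocked.Exchange(ref _status, Idle);
            // keep running if a message raced in after we drained the queue
        } while (!_queue.IsEmpty &&
                 Interlocked.CompareExchange(ref _status, Busy, Idle) == Idle);
    }
}
```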

Some good-to-know details here: a message can be either any plain object or a MessageEnvelope. The message envelope is a special message that carries extra information: message headers (think similar to HTTP headers) and a sender PID, i.e. which actor sent the message.

This split exists to limit allocations during message passing; in many cases the sender or headers are not relevant, so there is no need to allocate an extra object.
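The idea can be sketched like this (hypothetical types, not Proto.Actor's actual definitions): a plain object is the message itself, so the common case allocates no wrapper at all.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of the object-vs-envelope split.
public sealed record Pid(string Id);

public sealed record MessageEnvelope(
    object Message,
    Pid Sender,
    IReadOnlyDictionary<string, string> Headers);

public static class EnvelopeHelper
{
    // Unwrap: an envelope yields its inner message and sender; a plain
    // object is the message itself with no sender or headers.
    public static (object Message, Pid Sender) Unwrap(object raw) =>
        raw is MessageEnvelope env ? (env.Message, env.Sender) : (raw, null);
}
```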

Another aspect here is async state machines. I have, to the best of my ability, tried to optimize those away by returning completed tasks wherever possible. All of the code in the mailbox-to-actor receive pipeline is basically split in two: check if the task is already completed and, if so, short-circuit and return it; if not, fall over into async mode.

Due to the mix of object vs MessageEnvelope, and completed vs non-completed tasks, the resulting code is pretty ugly.
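The completed-task fast path can be sketched as follows (hypothetical names; a minimal illustration of the technique, not the actual pipeline code):

```csharp
using System;
using System.Threading.Tasks;

public static class ReceivePipeline
{
    // If the actor's receive returns an already-completed task, return
    // synchronously and skip building an async state machine; only the
    // slow path falls over into async mode and awaits.
    public static Task Run(Func<object, Task> receive, object message)
    {
        var task = receive(message);
        return task.IsCompletedSuccessfully
            ? Task.CompletedTask   // fast path: no state machine
            : Awaited(task);       // slow path: async state machine

        static async Task Awaited(Task t) => await t;
    }
}
```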

Remote benchmark

https://github.com/asynkron/protoactor-dotnet/tree/dev/benchmarks/RemoteBenchmark

This benchmark uses two consoles:

cd into /benchmarks/RemoteBenchmark/Node2

dotnet run -c Release

cd into /benchmarks/RemoteBenchmark/Node1

dotnet run -c Release

This benchmark does not have a BenchmarkDotNet version. Maybe it should; maybe it doesn't matter, given the IO, network and other inertia in that entire flow.

Purpose of the benchmark

To measure the overhead of moving messages over the network. This uses Google Protobuf for serialization, and gRPC streams with batching envelopes as an optimization: whenever messages arrive at our endpoint writers, they are buffered, and once the buffer is full, or the endpoint decides to flush, the data is written as a message batch envelope to the gRPC stream.
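The batch-and-flush behavior can be sketched like this (hypothetical names; the real endpoint writer also deals with serialization, concurrency and backpressure):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Sketch of a batching writer: buffer messages and hand the whole batch to
// a write callback (e.g. a gRPC stream write) when the buffer fills up or
// an explicit flush is requested. Not thread-safe; a real endpoint writer
// serializes access to the buffer.
public sealed class BatchingWriter<T>
{
    private readonly List<T> _buffer = new List<T>();
    private readonly int _batchSize;
    private readonly Func<IReadOnlyList<T>, Task> _writeBatch;

    public BatchingWriter(int batchSize, Func<IReadOnlyList<T>, Task> writeBatch)
    {
        _batchSize = batchSize;
        _writeBatch = writeBatch;
    }

    public Task WriteAsync(T message)
    {
        _buffer.Add(message);
        // flush automatically once the buffer is full
        return _buffer.Count >= _batchSize ? FlushAsync() : Task.CompletedTask;
    }

    public async Task FlushAsync()
    {
        if (_buffer.Count == 0) return;
        var batch = _buffer.ToArray();
        _buffer.Clear();
        await _writeBatch(batch);
    }
}
```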

I think it would be relevant to run this both over a real network and over loopback, as both give insights into performance characteristics, e.g. how much is serialization CPU, how much is network bandwidth, etc.

Cluster benchmark

https://github.com/asynkron/protoactor-dotnet/tree/dev/benchmarks/AutoClusterBenchmark

cd into /benchmarks/AutoClusterBenchmark

dotnet run -c Release

This one does not have a BenchmarkDotNet version either, for the same reasons as the remote benchmark.

Purpose of the benchmark

The cluster support gives Proto.Actor "virtual actor" functionality, the same as Microsoft Orleans. This means we need to be able to locate actors across many machines in a network and compensate for failures, timeouts, etc.

There is a lot going on here: e.g. a distributed hash table to look up where actors live, a gossip protocol to share state across the cluster, rendezvous algorithm implementations, etc.
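As an illustration of the placement side, here is a minimal sketch of rendezvous (highest-random-weight) hashing, assuming a simple string-keyed member list (hypothetical names and hash; not the actual Proto.Actor implementation):

```csharp
using System.Linq;

public static class RendezvousSketch
{
    // Each member is scored against the actor identity; the member with
    // the highest score owns the actor. Removing any other member does
    // not change the owner, which keeps placement stable under churn.
    public static string OwnerOf(string identity, string[] members) =>
        members.OrderByDescending(m => Score(identity, m)).First();

    // FNV-1a over the combined key; any stable hash works here.
    private static uint Score(string identity, string member)
    {
        const uint prime = 16777619;
        var hash = 2166136261u;
        foreach (var c in identity + "|" + member)
            hash = (hash ^ c) * prime;
        return hash;
    }
}
```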
