@rogeralsing
Last active January 26, 2022 21:22

(Running these benchmarks with and without PGO and related compiler settings is relevant too)

Spawn benchmark

https://github.com/asynkron/protoactor-dotnet/tree/dev/benchmarks/SpawnBenchmark

cd into /benchmarks/SpawnBenchmark

dotnet run -c Release

The same benchmark also exists as a BenchmarkDotNet version at https://github.com/asynkron/protoactor-dotnet/tree/dev/benchmarks/ProtoActorBenchmarks for more accurate numbers.

Purpose of the benchmark

Measures the time it takes to spawn new actors. The benchmark is based on https://github.com/atemerev/skynet

One key part here is the ProcessRegistry, which is essentially a dictionary from actor name to actor process. It is currently implemented as a large array of vanilla .NET dictionaries: the actor name is first hashed to select an array entry, and that entry contains the dictionary we use. All access to a given dictionary is done within a thread lock.

This might sound like a strange approach, but it has historically produced better numbers than a ConcurrentDictionary. Since this is a long-lived registry and keys are distributed across all these buckets, we mostly have no contention on the individual dictionaries.
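The bucketing scheme can be sketched roughly like this (names and bucket count are hypothetical; this is a minimal sketch of the idea, not the actual Proto.Actor ProcessRegistry):

```csharp
using System.Collections.Generic;

// Sketch of a sharded registry: hash the key to pick a bucket, then lock
// only that bucket's dictionary. With keys spread across many buckets,
// two threads rarely contend on the same lock.
public sealed class ShardedRegistry<TValue>
{
    private const int BucketCount = 1024; // hypothetical size
    private readonly Dictionary<string, TValue>[] _buckets;

    public ShardedRegistry()
    {
        _buckets = new Dictionary<string, TValue>[BucketCount];
        for (var i = 0; i < BucketCount; i++)
            _buckets[i] = new Dictionary<string, TValue>();
    }

    private Dictionary<string, TValue> BucketFor(string name)
    {
        // non-negative hash mapped onto the bucket array
        var hash = (uint)name.GetHashCode();
        return _buckets[hash % BucketCount];
    }

    public bool TryAdd(string name, TValue value)
    {
        var bucket = BucketFor(name);
        lock (bucket)
        {
            return bucket.TryAdd(name, value);
        }
    }

    public bool TryGet(string name, out TValue value)
    {
        var bucket = BucketFor(name);
        lock (bucket)
        {
            return bucket.TryGetValue(name, out value);
        }
    }
}
```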

The Erlang language is known for its ability to quickly spawn new processes (its actors, not to be confused with OS processes).

Beating Erlang would be nice; maybe we already do, but it is a good target to orient around.

In process benchmark

https://github.com/asynkron/protoactor-dotnet/tree/dev/benchmarks/InprocessBenchmark

cd into /benchmarks/InprocessBenchmark

dotnet run -c Release

The same benchmark also exists as a BenchmarkDotNet version at https://github.com/asynkron/protoactor-dotnet/tree/dev/benchmarks/ProtoActorBenchmarks for more accurate numbers.

Purpose of the benchmark

To measure the actor mailbox pipeline: how fast we can move messages from one actor to another.

This includes:

  • placing the message in the mailbox
  • scheduling the actor if it is not already scheduled
  • processing the message from the mailbox into the actor's receive pipeline
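The steps above can be sketched with an interlocked status flag, a common mailbox pattern (all names here are hypothetical; this is not Proto.Actor's actual mailbox):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Sketch of a mailbox: Post enqueues a message and schedules a consumer
// run only if one is not already scheduled, using an Interlocked flag.
public sealed class Mailbox
{
    private const int Idle = 0, Busy = 1;
    private readonly ConcurrentQueue<object> _queue = new ConcurrentQueue<object>();
    private readonly Action<object> _receive;
    private int _status = Idle;

    public Mailbox(Action<object> receive) => _receive = receive;

    public void Post(object message)
    {
        _queue.Enqueue(message);
        // schedule the actor only if it is not already scheduled
        if (Interlocked.CompareExchange(ref _status, Busy, Idle) == Idle)
            ThreadPool.QueueUserWorkItem(_ => Run());
    }

    private void Run()
    {
        do
        {
            // drain the queue into the receive pipeline
            while (_queue.TryDequeue(out var msg))
                _receive(msg);
            Interlocked.Exchange(ref _status, Idle);
            // keep running if a message raced in after we drained the queue
        } while (!_queue.IsEmpty &&
                 Interlocked.CompareExchange(ref _status, Busy, Idle) == Idle);
    }
}
```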

Some good-to-know details here: a message can be either any plain object or a MessageEnvelope. The message envelope is a special message that carries extra information: message headers (think similar to HTTP headers) and a sender PID, i.e. which actor sent the message.

This split exists to limit allocations during message passing; in many cases the sender or headers are not relevant, so there is no need to allocate an extra object.
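The idea can be sketched like this (hypothetical types, not Proto.Actor's actual definitions): a plain object is the message itself, so the common case allocates no wrapper at all.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of the object-vs-envelope split.
public sealed record Pid(string Id);

public sealed record MessageEnvelope(
    object Message,
    Pid Sender,
    IReadOnlyDictionary<string, string> Headers);

public static class EnvelopeHelper
{
    // Unwrap: an envelope yields its inner message and sender; a plain
    // object is the message itself with no sender or headers.
    public static (object Message, Pid Sender) Unwrap(object raw) =>
        raw is MessageEnvelope env ? (env.Message, env.Sender) : (raw, null);
}
```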

Another aspect here is async state machines. I have, to the best of my ability, tried to optimize those away by returning completed tasks wherever possible. All of the code in the mailbox-to-actor receive pipeline is basically split in two: check if the task is already completed and, if so, short-circuit and return it; if not, fall over into async mode.

Due to the mix of object vs MessageEnvelope, and completed vs non-completed tasks, the resulting code is pretty ugly.
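The completed-task fast path can be sketched as follows (hypothetical names; a minimal illustration of the technique, not the actual pipeline code):

```csharp
using System;
using System.Threading.Tasks;

public static class ReceivePipeline
{
    // If the actor's receive returns an already-completed task, return
    // synchronously and skip building an async state machine; only the
    // slow path falls over into async mode and awaits.
    public static Task Run(Func<object, Task> receive, object message)
    {
        var task = receive(message);
        return task.IsCompletedSuccessfully
            ? Task.CompletedTask   // fast path: no state machine
            : Awaited(task);       // slow path: async state machine

        static async Task Awaited(Task t) => await t;
    }
}
```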

Remote benchmark

https://github.com/asynkron/protoactor-dotnet/tree/dev/benchmarks/RemoteBenchmark

This benchmark uses two consoles:

cd into /benchmarks/RemoteBenchmark/Node2

dotnet run -c Release

cd into /benchmarks/RemoteBenchmark/Node1

dotnet run -c Release

This benchmark does not have a BenchmarkDotNet version. Maybe it should; maybe it doesn't matter, given the IO, network and other inertia in that entire flow.

Purpose of the benchmark

To measure the overhead of moving messages over the network. This uses Google Protobuf for serialization, and gRPC streams with batching envelopes as an optimization: whenever messages arrive at our endpoint writers, they are buffered, and once the buffer is full, or the endpoint decides to flush, the data is written as a message batch envelope to the gRPC stream.
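The batch-and-flush behavior can be sketched like this (hypothetical names; the real endpoint writer also deals with serialization, concurrency and backpressure):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Sketch of a batching writer: buffer messages and hand the whole batch to
// a write callback (e.g. a gRPC stream write) when the buffer fills up or
// an explicit flush is requested. Not thread-safe; a real endpoint writer
// serializes access to the buffer.
public sealed class BatchingWriter<T>
{
    private readonly List<T> _buffer = new List<T>();
    private readonly int _batchSize;
    private readonly Func<IReadOnlyList<T>, Task> _writeBatch;

    public BatchingWriter(int batchSize, Func<IReadOnlyList<T>, Task> writeBatch)
    {
        _batchSize = batchSize;
        _writeBatch = writeBatch;
    }

    public Task WriteAsync(T message)
    {
        _buffer.Add(message);
        // flush automatically once the buffer is full
        return _buffer.Count >= _batchSize ? FlushAsync() : Task.CompletedTask;
    }

    public async Task FlushAsync()
    {
        if (_buffer.Count == 0) return;
        var batch = _buffer.ToArray();
        _buffer.Clear();
        await _writeBatch(batch);
    }
}
```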

I think it would be relevant to run this both over a real network and over loopback, as both give insights into performance characteristics, e.g. how much is serialization CPU, how much is network bandwidth, etc.

Cluster benchmark

https://github.com/asynkron/protoactor-dotnet/tree/dev/benchmarks/AutoClusterBenchmark

cd into /benchmarks/AutoClusterBenchmark

dotnet run -c Release

This one does not have a BenchmarkDotNet version either, for the same reasons as the remote benchmark.

Purpose of the benchmark

The cluster support gives Proto.Actor "virtual actor" functionality, the same as Microsoft Orleans. This means we need to be able to locate actors across many machines in a network and compensate for failures, timeouts, etc.

There is a lot going on here: e.g. a distributed hash table to look up where actors live, a gossip protocol to share state across the cluster, rendezvous algorithm implementations, etc.
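As an illustration of the placement side, here is a minimal sketch of rendezvous (highest-random-weight) hashing, assuming a simple string-keyed member list (hypothetical names and hash; not the actual Proto.Actor implementation):

```csharp
using System.Linq;

public static class RendezvousSketch
{
    // Each member is scored against the actor identity; the member with
    // the highest score owns the actor. Removing any other member does
    // not change the owner, which keeps placement stable under churn.
    public static string OwnerOf(string identity, string[] members) =>
        members.OrderByDescending(m => Score(identity, m)).First();

    // FNV-1a over the combined key; any stable hash works here.
    private static uint Score(string identity, string member)
    {
        const uint prime = 16777619;
        var hash = 2166136261u;
        foreach (var c in identity + "|" + member)
            hash = (hash ^ c) * prime;
        return hash;
    }
}
```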
