A benchmark to test whether caching delegates in high-frequency code has any benefit.
So:
Print42(ConsolePrint);
void ConsolePrint(string text) => Console.WriteLine(text);
void Print42(Action<string> print) => print("42");
vs
Action<string> consolePrint = text => Console.WriteLine(text);
Print42(consolePrint);
void Print42(Action<string> print) => print("42");
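To make the difference concrete, here is a minimal sketch (the class and field names are mine, not from the benchmark below): on the compiler versions current at the time of writing, passing the method group allocates a fresh Action<string> on every call, while the cached field is allocated once. Newer compiler versions may cache some method-group conversions themselves.

```csharp
using System;

public class Sketch
{
    // Allocated once, reused for every call.
    private static readonly Action<string> CachedConsolePrint = ConsolePrint;

    static void ConsolePrint(string text) => Console.WriteLine(text);
    static void Print42(Action<string> print) => print("42");

    public static void Main()
    {
        Print42(ConsolePrint);       // method-group argument: a new Action<string> per call
        Print42(CachedConsolePrint); // reuses the delegate stored in the field
    }
}
```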
Let's test with a benchmark (using BenchmarkDotNet):
using System;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class DelegateBenchmark
{
    private readonly Action<string> noopPrintDelegate;

    public DelegateBenchmark()
    {
        noopPrintDelegate = NoopPrint;
    }

    [Benchmark(Baseline = true)]
    public void RawMethod() => Print42(NoopPrint);

    [Benchmark]
    public void CachedDelegate() => Print42(noopPrintDelegate);

    [MethodImpl(MethodImplOptions.NoOptimization | MethodImplOptions.NoInlining)]
    private void NoopPrint(string text)
    {
    }

    [MethodImpl(MethodImplOptions.NoOptimization | MethodImplOptions.NoInlining)]
    private static void Print42(Action<string> print) => print("42");
}

public class Program
{
    public static void Main(string[] args) => BenchmarkRunner.Run<DelegateBenchmark>();
}
Results on my laptop (using the 3.0 preview SDK, but that should not matter):
BenchmarkDotNet=v0.11.5, OS=macOS Mojave 10.14.5 (18F203) [Darwin 18.6.0]
Intel Core i9-8950HK CPU 2.90GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.0.100-preview5-011568
[Host] : .NET Core 3.0.0-preview5-27626-15 (CoreCLR 4.6.27622.75, CoreFX 4.700.19.22408), 64bit RyuJIT
DefaultJob : .NET Core 3.0.0-preview5-27626-15 (CoreCLR 4.6.27622.75, CoreFX 4.700.19.22408), 64bit RyuJIT
| Method | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------- |----------:|----------:|----------:|------:|-------:|------:|------:|----------:|
| RawMethod | 14.206 ns | 0.1664 ns | 0.1475 ns | 1.00 | 0.3917 | - | - | 64 B |
| CachedDelegate | 2.401 ns | 0.0718 ns | 0.0769 ns | 0.17 | - | - | - | - |
// * Legends *
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
Ratio : Mean of the ratio distribution ([Current]/[Baseline])
Gen 0 : GC Generation 0 collects per 1000 operations
Gen 1 : GC Generation 1 collects per 1000 operations
Gen 2 : GC Generation 2 collects per 1000 operations
Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
1 ns : 1 Nanosecond (0.000000001 sec)
So quite a big difference (14 ns vs 2.4 ns). So why is the caching faster?
This is the IL that gets generated for both methods (easy to get using sharplab.io):
.method public hidebysig
    instance void RawMethod () cil managed
{
    // Method begins at RVA 0x206c
    // Code size 19 (0x13)
    .maxstack 8

    IL_0000: ldarg.0
    IL_0001: ldftn instance void DelegateBenchmark::NoopPrint(string)
    IL_0007: newobj instance void class [mscorlib]System.Action`1<string>::.ctor(object, native int)
    IL_000c: call void DelegateBenchmark::Print42(class [mscorlib]System.Action`1<string>)
    IL_0011: nop
    IL_0012: ret
} // end of method DelegateBenchmark::RawMethod

.method public hidebysig
    instance void CachedDelegate () cil managed
{
    // Method begins at RVA 0x2080
    // Code size 13 (0xd)
    .maxstack 8

    IL_0000: ldarg.0
    IL_0001: ldfld class [mscorlib]System.Action`1<string> DelegateBenchmark::noopPrintDelegate
    IL_0006: call void DelegateBenchmark::Print42(class [mscorlib]System.Action`1<string>)
    IL_000b: nop
    IL_000c: ret
} // end of method DelegateBenchmark::CachedDelegate
In the RawMethod method it first needs to construct the delegate:
IL_0007: newobj instance void class [mscorlib]System.Action`1<string>::.ctor(object, native int)
While in the CachedDelegate method it just loads the existing delegate from the field and passes it along.
So that explains the extra cost. To be clear, this is a tiny cost, but if your code gets called often enough it can end up mattering.
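The 64 B per call in the table above is exactly that per-call delegate object. A quick, unscientific way to see it without BenchmarkDotNet is GC.GetAllocatedBytesForCurrentThread (available on .NET Core; this is just a sketch, and newer compilers may cache the method-group conversion, in which case the first loop allocates nothing):

```csharp
using System;

public class AllocationCheck
{
    static void Noop(string text) { }
    static void Print42(Action<string> print) => print("42");

    public static void Main()
    {
        Action<string> cached = Noop;

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < 1000; i++) Print42(Noop);   // may allocate a delegate per iteration
        long methodGroupBytes = GC.GetAllocatedBytesForCurrentThread() - before;

        before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < 1000; i++) Print42(cached); // reuses the same delegate
        long cachedBytes = GC.GetAllocatedBytesForCurrentThread() - before;

        Console.WriteLine($"method group: {methodGroupBytes} B, cached: {cachedBytes} B");
    }
}
```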
Just for fun I dug a bit into what happens when you create a delegate:
Action<T> is defined as delegate void Action<T>(T arg) in the BCL.
For every delegate type the compiler generates a class:
.class nested private auto ansi sealed Action`1<T>
    extends [mscorlib]System.MulticastDelegate
{
    // Methods
    .method public hidebysig specialname rtspecialname
        instance void .ctor (
            object 'object',
            native int 'method'
        ) runtime managed
    {
    } // end of method Action`1::.ctor

    .method public hidebysig newslot virtual
        instance void Invoke (
            !T arg
        ) runtime managed
    {
    } // end of method Action`1::Invoke

    .method public hidebysig newslot virtual
        instance class [mscorlib]System.IAsyncResult BeginInvoke (
            !T arg,
            class [mscorlib]System.AsyncCallback callback,
            object 'object'
        ) runtime managed
    {
    } // end of method Action`1::BeginInvoke

    .method public hidebysig newslot virtual
        instance void EndInvoke (
            class [mscorlib]System.IAsyncResult result
        ) runtime managed
    {
    } // end of method Action`1::EndInvoke
} // end of class Action`1
That class inherits from MulticastDelegate. MulticastDelegate itself doesn't do much (it mostly comes into play when you 'add' delegates together).
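That 'adding together' is easy to observe: combining two delegates with + produces a MulticastDelegate whose invocation list holds both, and invoking it calls them in order (a small sketch, names are mine):

```csharp
using System;

public class MulticastSketch
{
    public static void Main()
    {
        Action<string> a = t => Console.WriteLine("a: " + t);
        Action<string> b = t => Console.WriteLine("b: " + t);

        // '+' combines the two into one MulticastDelegate.
        Action<string> combined = a + b;
        Console.WriteLine(combined.GetInvocationList().Length); // 2

        combined("42"); // invokes a, then b, in order
    }
}
```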
One level deeper, it inherits from System.Delegate.
System.Delegate is a wrapper around a target object (the target field, null for a static method) and a method pointer (the IntPtr methodPtr field).
This also explains why invoking a delegate is nearly as fast as calling the method directly: in the end it is just a method pointer.
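That target/method pair is visible from managed code through the public Delegate.Target and Delegate.Method properties (a small sketch; class and method names are mine):

```csharp
using System;

public class DelegateInternals
{
    static void StaticPrint(string text) => Console.WriteLine(text);
    void InstancePrint(string text) => Console.WriteLine(text);

    public static void Main()
    {
        var self = new DelegateInternals();
        Action<string> fromStatic = StaticPrint;
        Action<string> fromInstance = self.InstancePrint;

        // Target is the captured receiver, or null for a static method.
        Console.WriteLine(fromStatic.Target == null);                  // True
        Console.WriteLine(ReferenceEquals(fromInstance.Target, self)); // True
        Console.WriteLine(fromStatic.Method.Name);                     // StaticPrint
    }
}
```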
The method that ends up constructing the delegate is this one (the other internal constructors are for different scenarios):
[MethodImplAttribute(MethodImplOptions.InternalCall)]
private extern void DelegateConstruct(object target, IntPtr slot);
So this is where the managed trail ends...
If you want to continue the trail, you need to go to comdelegate.cpp in the CoreCLR source:
FCIMPL3(void, COMDelegate::DelegateConstruct, Object* refThisUNSAFE, Object* targetUNSAFE, PCODE method)
{
...
Here it goes way beyond my skills, but as far as I know it looks up the memory address of the code the JIT generated for that method.