In the previous post I used ref returns to return some data. I noticed that with slight changes the JIT generates totally different code, which can have a good or a bad effect on performance.
In this post, I will dig deep (with WinDbg) into the JIT-generated code. Up front: I am using a 64-bit machine, .NET Core 2.1 and RyuJIT.
I created a sample benchmark to showcase this. I have a Point struct with 2 integer properties. I benchmark setting the values on the struct in 3 different ways, and I show the related IL and machine code that impacts performance.
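The Point struct itself is not listed in this post. A minimal sketch of what it could look like, assuming two plain auto-properties named X and Y (the names come from the benchmark code; everything else here is an assumption):

```csharp
// Hypothetical sketch of the Point struct used by the benchmark.
// Assumption: two int auto-properties and nothing else.
public struct Point
{
    public int X { get; set; }
    public int Y { get; set; }
}
```

With this layout the two backing fields are 4-byte integers stored next to each other, which fits the offsets 0 and 4 that appear in the disassembly later in the post.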
The benchmark code looks as follows:
using System.Runtime.CompilerServices; // MethodImpl, MethodImplOptions
using BenchmarkDotNet.Attributes;      // CoreJob, Benchmark

[CoreJob]
public class RefReturnBenchmark
{
    private int _sum = 0;
    private Point _p;

    // Case 1: a local struct is passed by ref to a helper that fills it.
    [Benchmark(Baseline = true)]
    public int RefMethodArg()
    {
        var p = new Point();
        RefMethodArg2(ref p);
        _sum += p.X + p.Y;
        return _sum;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private void RefMethodArg2(ref Point p)
    {
        p.X = 10;
        p.Y = 11;
    }

    // Case 2: the helper takes a ref local to the field, writes through it and returns it.
    [Benchmark]
    public int RefReturn()
    {
        ref var p = ref RefReturn2();
        _sum += p.X + p.Y;
        return _sum;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private ref Point RefReturn2()
    {
        ref Point p = ref _p;
        p.X = 10;
        p.Y = 11;
        return ref p;
    }

    // Case 3: the helper writes to the field directly and returns a ref to it.
    [Benchmark]
    public int RefReturnSlow()
    {
        ref var p = ref RefReturnSlow2();
        _sum += p.X + p.Y;
        return _sum;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private ref Point RefReturnSlow2()
    {
        _p.X = 10;
        _p.Y = 11;
        return ref _p;
    }
}
There are 3 use cases:
- RefMethodArg passes a struct as a ref argument to RefMethodArg2
- RefReturnSlow uses ref returns and calls RefReturnSlow2
- RefReturn calls RefReturn2, which differs from RefReturnSlow2 essentially by a single extra line of code (the ref local)
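The two ref-return variants are meant to be behaviorally identical, so any performance difference comes from the generated code and not from the semantics. A small sketch to check this, assuming a Point struct with two int auto-properties; Holder, ViaRefLocal, ViaField and RefAliasDemo are hypothetical names that only mirror the shape of RefReturn2 and RefReturnSlow2:

```csharp
using System;

// Assumed shape of Point: two int auto-properties.
public struct Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

public class Holder
{
    private Point _p;

    // Same shape as RefReturn2: take a ref local to the field first.
    public ref Point ViaRefLocal()
    {
        ref Point p = ref _p;
        p.X = 10;
        p.Y = 11;
        return ref p;
    }

    // Same shape as RefReturnSlow2: write to the field directly.
    public ref Point ViaField()
    {
        _p.X = 10;
        _p.Y = 11;
        return ref _p;
    }
}

public static class RefAliasDemo
{
    public static (int X, int Y) Run()
    {
        var holder = new Holder();
        ref Point a = ref holder.ViaRefLocal();
        ref Point b = ref holder.ViaField();

        // Both refs alias the same field, so a write through one
        // is visible through the other.
        a.X = 42;
        return (b.X, b.Y);
    }

    public static void Main()
    {
        var (x, y) = Run();
        Console.WriteLine($"{x}, {y}"); // prints "42, 11"
    }
}
```

Both methods hand back a reference to the very same field, which is why the callers can be written identically in the benchmark.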
The results of the Benchmark:
BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17134.648 (1803/April2018Update/Redstone4), VM=Hyper-V
Intel Xeon CPU E5-2673 v3 2.40GHz, 1 CPU, 2 logical and 2 physical cores
.NET Core SDK=2.2.101
  [Host] : .NET Core 2.1.9 (CoreCLR 4.6.27414.06, CoreFX 4.6.27415.01), 64bit RyuJIT
  Core   : .NET Core 2.1.9 (CoreCLR 4.6.27414.06, CoreFX 4.6.27415.01), 64bit RyuJIT

Job=Core Runtime=Core

Method | Mean | Error | StdDev | Ratio | RatioSD |
---|---|---|---|---|---|
RefMethodArg | 2.611 ns | 0.0982 ns | 0.1796 ns | 1.00 | 0.00 |
RefReturn | 2.178 ns | 0.0928 ns | 0.1998 ns | 0.83 | 0.09 |
RefReturnSlow | 2.483 ns | 0.0981 ns | 0.2090 ns | 0.95 | 0.12 |
Admittedly, there is some noise: across a couple of runs the size of the differences varies, but the overall order remains the same.
What are the differences? To point them out between the first and second use cases, let's investigate the IL code of each solution, RefMethodArg first and RefReturnSlow second.
.locals init (
[0] valuetype StructDeserializingFix.Point
)
// (no C# code)
IL_0000: ldloca.s 0
// Point p = default(Point);
IL_0002: initobj StructDeserializingFix.Point
// this.RefMethodArg2(ref p);
IL_0008: ldarg.0
IL_0009: ldloca.s 0
IL_000b: call instance void StructDeserializingFix.RefReturnBenchmark::RefMethodArg2(valuetype StructDeserializingFix.Point&)
// (no C# code)
IL_0010: ldarg.0
// this._sum += p.X + p.Y; (same for both methods)
...
.locals init (
[0] valuetype StructDeserializingFix.Point&
)
// ref Point p = this.RefReturnSlow2();
IL_0000: ldarg.0
IL_0001: call instance valuetype StructDeserializingFix.Point& StructDeserializingFix.RefReturnBenchmark::RefReturnSlow2()
IL_0006: stloc.0
// (no C# code)
IL_0007: ldarg.0
// this._sum += p.X + p.Y; (same for both methods)
...
The big difference is that RefMethodArg has a local Point, which needs to be initialized, while RefReturnSlow works with a reference to a Point field of the class. It does not need to pass anything to RefReturnSlow2; it simply receives the reference as the return value.
This explains why RefReturnSlow is faster than RefMethodArg. RefReturn's IL looks exactly like RefReturnSlow's; it differs only in the method called to populate the values on the struct, hence it is omitted here.
In this section let's compare the JIT-ed code of RefReturn and RefReturnSlow.
I load up WinDbg, attach to the process, and load the SOS extension for coreclr:
.loadby sos coreclr
Then examine the methods:
!name2ee StructDeserializingFix!StructDeserializingFix.RefReturnBenchmark.RefReturnSlow
...
!U /d [address]
00007ffc`59c118f0 56 push rsi
00007ffc`59c118f1 4883ec20 sub rsp,20h
00007ffc`59c118f5 488bf1 mov rsi,rcx
00007ffc`59c118f8 488bce mov rcx,rsi
00007ffc`59c118fb e820f8ffff call 00007ffc`59c11120 (StructDeserializingFix.RefReturnBenchmark.RefReturnSlow2(), mdToken: 0000000006000020)
00007ffc`59c11900 8b5608 mov edx,dword ptr [rsi+8]
00007ffc`59c11903 8b08 mov ecx,dword ptr [rax]
00007ffc`59c11905 03d1 add edx,ecx
00007ffc`59c11907 035004 add edx,dword ptr [rax+4]
00007ffc`59c1190a 8bc2 mov eax,edx
00007ffc`59c1190c 894608 mov dword ptr [rsi+8],eax
00007ffc`59c1190f 4883c420 add rsp,20h
00007ffc`59c11913 5e pop rsi
00007ffc`59c11914 c3 ret
The JIT-ed code of the remaining methods (RefReturnSlow2, RefReturn and RefReturn2):
!name2ee StructDeserializingFix!StructDeserializingFix.RefReturnBenchmark.RefReturnSlow2
...
!U /d [address]
00007ffc`59c11930 488d4110 lea rax,[rcx+10h]
00007ffc`59c11934 488bd0 mov rdx,rax
00007ffc`59c11937 c7020a000000 mov dword ptr [rdx],0Ah
00007ffc`59c1193d 488bd0 mov rdx,rax
00007ffc`59c11940 c742040b000000 mov dword ptr [rdx+4],0Bh
00007ffc`59c11947 c3 ret
!name2ee StructDeserializingFix!StructDeserializingFix.RefReturnBenchmark.RefReturn
...
!U /d [address]
00007ffc`59c11960 56 push rsi
00007ffc`59c11961 4883ec20 sub rsp,20h
00007ffc`59c11965 488bf1 mov rsi,rcx
00007ffc`59c11968 488bce mov rcx,rsi
00007ffc`59c1196b e8a0f7ffff call 00007ffc`59c11110 (StructDeserializingFix.RefReturnBenchmark.RefReturn2(), mdToken: 000000000600001e)
00007ffc`59c11970 8b5608 mov edx,dword ptr [rsi+8]
00007ffc`59c11973 8b08 mov ecx,dword ptr [rax]
00007ffc`59c11975 03d1 add edx,ecx
00007ffc`59c11977 035004 add edx,dword ptr [rax+4]
00007ffc`59c1197a 8bc2 mov eax,edx
00007ffc`59c1197c 894608 mov dword ptr [rsi+8],eax
00007ffc`59c1197f 4883c420 add rsp,20h
00007ffc`59c11983 5e pop rsi
00007ffc`59c11984 c3 ret
!name2ee StructDeserializingFix!StructDeserializingFix.RefReturnBenchmark.RefReturn2
...
!U /d [address]
00007ffc`59c119a0 488d4110 lea rax,[rcx+10h]
00007ffc`59c119a4 c7000a000000 mov dword ptr [rax],0Ah
00007ffc`59c119aa c740040b000000 mov dword ptr [rax+4],0Bh
00007ffc`59c119b1 c3 ret
Comparing them, we can see that the two callers are identical apart from the call target. The only difference is between RefReturn2 and RefReturnSlow2: the latter contains two extra mov rdx,rax instructions. This seems to be one of the places where more C# code results in more optimized and faster code.