@kunalspathak
Last active June 21, 2023 14:49
Startup performance comparison of paint.net on x64 vs. arm64

Purpose

This report compares the startup performance of paint.net version 5.0.3, running on dotnet version 8.0.0-preview.4.23259.5, on x64 and arm64. The measurements were taken on an Intel Cascade Lake (x64) machine and an Ampere (arm64) machine. To keep the measurements comparable, the application was affinitized to a fixed set of cores on each machine: the affinity mask was 0x5555 on x64 and 0xFF on arm64. Both runs were started with start /AFFINITY <mask> /WAIT dotnet.exe paintdotnet.dll. The application logs PaintDotnetTrace events. I used PerfView to profile the application and then studied the call stacks between the Started event (first line of the Main() method) and the Ready event (all data structures are initialized).
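To see why the two masks are comparable, here is a minimal sketch that decodes an affinity mask into the logical processors it selects. The helper names are hypothetical. The reading that 0x5555 (every other logical processor) maps to one hardware thread per physical core on the Intel machine is an assumption on my part; the report only gives the mask values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Zero-based indices of the logical processors selected by an affinity mask.
std::vector<int> selected_processors(std::uint64_t mask) {
    std::vector<int> indices;
    for (int i = 0; i < 64; ++i) {
        if ((mask >> i) & 1u) {
            indices.push_back(i);
        }
    }
    return indices;
}

// The two masks from the report both select eight logical processors.
void check_report_masks() {
    assert(selected_processors(0x5555).size() == 8);  // processors 0, 2, 4, ..., 14
    assert(selected_processors(0xFF).size() == 8);    // processors 0, 1, 2, ..., 7
}
```

Both masks hand the process eight logical processors, which is what makes the two runs comparable in core count.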

You will see a lot of call-stack screenshots. Unless a caption says otherwise, the left side shows the x64 profile and the right side shows the arm64 profile.

Issues opened

TLDR

The following things stood out on arm64 and might make it slower during startup:

  • Flushing of instruction cache
  • More time spent in kernel than x64
  • More time spent in memory-related syscalls that involve locks
  • strcmp shows different behavior
  • A few methods take longer to JIT because arm64 has more instructions (genGenerateCode()) and more registers (allocateRegisters()).

Flush Instruction Cache on Arm

The top differentiator on arm64 is the kernel call KfLowerIrql, which corresponds to flushing the instruction cache. Because arm64 does not guarantee that the instruction cache is coherent with the data cache, the runtime must flush the instruction cache every time it writes executable code to memory. This is not needed on x64 because the spec guarantees that the processors have instruction caches that are coherent with the data caches. In other words, on x64, whenever the processor accesses a memory location it receives the most up-to-date version of the data, whereas on arm64 the updates have to be broadcast to all the cores. We have seen this in the past with data access too, where arm64 needs memory barriers that are not present on x64. During startup this turns out to be a dominant factor (which we can clearly see in the profile), because a lot of code is being generated and written to memory. In normal applications like TE, the app execution time dominates the cache-flushing cost, and hence we don't see it much in the profile.
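As a rough illustration of the pattern described above, here is a minimal sketch of "write code bytes, then flush". The helper publish_code is hypothetical and is not how the runtime structures its code. On Windows it calls the Win32 FlushInstructionCache API; elsewhere it assumes GCC or Clang and uses the __builtin___clear_cache builtin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(_WIN32)
#include <windows.h>
#endif

// Hypothetical helper: copy freshly generated machine code into place and
// make sure the instruction cache will observe the new bytes.
void publish_code(void* destination, const void* code, std::size_t size) {
    assert(destination != nullptr && code != nullptr);
    std::memcpy(destination, code, size);
#if defined(_WIN32)
    // Win32 API; this is the step that shows up in the arm64 profile.
    FlushInstructionCache(GetCurrentProcess(), destination, size);
#elif defined(__GNUC__) || defined(__clang__)
    // Compiler builtin: cache maintenance on arm64, typically a no-op on x64.
    char* begin = static_cast<char*>(destination);
    __builtin___clear_cache(begin, begin + size);
#endif
}
```

Every stub, thunk, or jitted method that goes through a path like this pays for the flush on arm64, which is why startup, with its burst of code generation, is hit the hardest.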

image

Searching through the runtime codebase, we can find many references to FlushInstructionCache(), and we can also see them in the profile below.

image

If you compare the overall call stacks below, most of the dominant methods are slower on arm64 because they need to flush the instruction cache.

image

strcmp

I noticed there were more samples of strcmp on arm64 than on x64.

image

image

I wrote a C++ program to find out whether the C++ compiler's heuristics for strcmp differ between x64 and arm64. The program below calls strcmp on arguments that are not known to the compiler at compile time, so the compiler cannot optimize the comparison away.

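#include <cstring>   // strcmp
#include <iostream>  // std::cout

// Expects two command-line arguments. Their contents are unknown at
// compile time, so the compiler cannot fold the comparison away.
int main(int argc, char** argv)
{
    if (strcmp(argv[1], argv[2]) == 0)
    {
        std::cout << "Yes!\n";
    }
    else
    {
        std::cout << "Hello World!\n";
    }
}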

However, looking at the x64 code, the compiler pretty much inlines the strcmp operation into main itself.

main:
  0000000140001970: 48 83 EC 28        sub         rsp,28h
  0000000140001974: 48 8B 42 08        mov         rax,qword ptr [rdx+8]
  0000000140001978: 4C 8B 42 10        mov         r8,qword ptr [rdx+10h]
  000000014000197C: 4C 2B C0           sub         r8,rax
  000000014000197F: 90                 nop
  0000000140001980: 0F B6 10           movzx       edx,byte ptr [rax]
  0000000140001983: 42 0F B6 0C 00     movzx       ecx,byte ptr [rax+r8]
  0000000140001988: 2B D1              sub         edx,ecx
  000000014000198A: 75 07              jne         0000000140001993
  000000014000198C: 48 FF C0           inc         rax
  000000014000198F: 85 C9              test        ecx,ecx
  0000000140001991: 75 ED              jne         0000000140001980
  0000000140001993: 85 D2              test        edx,edx
  0000000140001995: 48 8D 15 6C DA 02  lea         rdx,[??_C@_05IOIEDEHB@Yes?$CB?6@]
                    00
  000000014000199C: 74 07              je          00000001400019A5
  000000014000199E: 48 8D 15 6B DA 02  lea         rdx,[??_C@_0O@NFOCKKMG@Hello?5World?$CB?6@]
                    00
  00000001400019A5: E8 06 01 00 00     call        ??$?6U?$char_traits@D@std@@@std@@YAAEAV?$basic_ostream@DU?$char_traits@D@std@@@0@AEAV10@PEBD@Z
  00000001400019AA: 33 C0              xor         eax,eax
  00000001400019AC: 48 83 C4 28        add         rsp,28h
  00000001400019B0: C3                 ret

But on arm64, as seen below, main calls the strcmp routine, which does the comparison. Side note: look at the pre-indexed addressing mode with writeback, ldrb w9,[x1,#1]!, used in the strcmp routine.

main:
  0000000140002860: A9BF7BFD  stp         fp,lr,[sp,#-0x10]!
  0000000140002864: 910003FD  mov         fp,sp
  0000000140002868: AA0103E8  mov         x8,x1
  000000014000286C: A9408500  ldp         x0,x1,[x8,#8]
  0000000140002870: 940026E2  bl          strcmp
  0000000140002874: 350000E0  cbnz        w0,0000000140002890
  0000000140002878: 90000128  adrp        x8,0000000140026000
  000000014000287C: 9123C101  add         x1,x8,#0x8F0
  0000000140002880: 94000076  bl          ??$?6U?$char_traits@D@std@@@std@@YAAEAV?$basic_ostream@DU?$char_traits@D@std@@@0@AEAV10@PEBD@Z
  0000000140002884: 52800000  mov         w0,#0
  0000000140002888: A8C17BFD  ldp         fp,lr,[sp],#0x10
  000000014000288C: D65F03C0  ret
  0000000140002890: 90000128  adrp        x8,0000000140026000
  0000000140002894: 9123E101  add         x1,x8,#0x8F8
  0000000140002898: 94000070  bl          ??$?6U?$char_traits@D@std@@@std@@YAAEAV?$basic_ostream@DU?$char_traits@D@std@@@0@AEAV10@PEBD@Z
  000000014000289C: 52800000  mov         w0,#0
  00000001400028A0: A8C17BFD  ldp         fp,lr,[sp],#0x10
  00000001400028A4: D65F03C0  ret
  
strcmp:
  000000014000C3F0: 39400029  ldrb        w9,[x1]
  000000014000C3F4: 39400008  ldrb        w8,[x0]
  000000014000C3F8: 4B09010A  sub         w10,w8,w9
  000000014000C3FC: 3500010A  cbnz        w10,000000014000C41C
  000000014000C400: CB01000B  sub         x11,x0,x1
  000000014000C404: 13001D28  sxtb        w8,w9
  000000014000C408: 340000A8  cbz         w8,000000014000C41C
  000000014000C40C: 38401C29  ldrb        w9,[x1,#1]!
  000000014000C410: 38616968  ldrb        w8,[x11,x1]
  000000014000C414: 4B09010A  sub         w10,w8,w9
  000000014000C418: 34FFFF6A  cbz         w10,000000014000C404
  000000014000C41C: 4B0A03E8  neg         w8,w10
  000000014000C420: 531F7D09  lsr         w9,w8,#0x1F
  000000014000C424: 4B4A7D20  sub         w0,w9,w10,lsr #0x1F
  000000014000C428: D65F03C0  ret
  000000014000C42C: 00 00 00 00 
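For readers who prefer source to disassembly, here is a minimal C++ sketch of the byte-by-byte loop that both listings implement, inlined into main on x64 and out of line on arm64. The function name bytewise_strcmp is hypothetical and this is not the CRT source. It returns -1, 0, or 1, which matches the sign computation at the end of the arm64 routine.

```cpp
#include <cassert>

// Compare two NUL-terminated strings one byte at a time.
int bytewise_strcmp(const char* lhs, const char* rhs) {
    assert(lhs != nullptr && rhs != nullptr);
    const unsigned char* a = reinterpret_cast<const unsigned char*>(lhs);
    const unsigned char* b = reinterpret_cast<const unsigned char*>(rhs);
    int diff = *a - *b;
    // Keep going while the bytes match and the terminator has not been reached.
    while (diff == 0 && *b != 0) {
        ++a;
        ++b;  // the arm64 listing folds this increment into the load: ldrb w9,[x1,#1]!
        diff = *a - *b;
    }
    // Collapse the difference to its sign.
    return (diff > 0) - (diff < 0);
}
```

The loop itself is the same on both architectures; the difference seen in the profile is that arm64 pays for a call into a separate routine on every comparison.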

Kernel calls

A significant amount of time is spent in the kernel: arm64 spends 6% more exclusive time there than x64, as seen in the screenshot below. All the other modules, like coreclr and clrjit, have fewer samples on arm64.

image

Certain Windows kernel calls related to page faults have a different implementation on arm64 (KiAbortException exists for arm64), and they show more samples on arm64 than on x64.

image

UnmapViewOfFile2

As seen below, the calls to kernelbase APIs are slower on arm64.

image

coreclr

image

PortableThreadPool::WorkerThreadStart()

The caller view shows that WorkerThreadStart takes more time on arm64 than on x64.

image

Diving into the callee view, ThreadNative::SpinWait() takes more time on arm64.

image

Likewise, all the callees of RunFromThreadPoolDispatchLoop take more time on arm64:

image

ExecutableAllocator

image

Looking at the callees, significant time is spent in kernel calls that involve locking.

image

clrjit

Register allocation

image

Register allocation is at the top of the stack on both x64 and arm64, but the sample counts are higher on arm64, potentially because of the number of registers. x64 has 32 registers (not counting EVEX encoding) and arm64 has 64 registers. Register allocation iterates over all the registers multiple times, so arm64 would clearly take longer. dotnet/runtime#85744 should solve that problem to some extent.
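To make the cost argument concrete, here is a minimal sketch, not the JIT's actual code, that counts loop iterations for two ways of walking a register candidate mask. All names are hypothetical. The first function visits every register in the file, so its cost doubles when going from 32 to 64 registers. The second visits only the set bits, which is one common way to make the cost depend on the candidates instead. Whether dotnet/runtime#85744 takes that approach is not stated here.

```cpp
#include <cassert>
#include <cstdint>

struct ScanResult {
    int iterations;        // loop iterations executed
    int candidates_found;  // registers present in the mask
};

// Walk the whole register file and test each register for membership.
ScanResult scan_all_registers(std::uint64_t candidates, int register_count) {
    assert(register_count > 0 && register_count <= 64);
    ScanResult result = {0, 0};
    for (int reg = 0; reg < register_count; ++reg) {
        ++result.iterations;
        if ((candidates >> reg) & 1u) {
            ++result.candidates_found;
        }
    }
    return result;
}

// Walk only the registers that are actually set in the mask.
ScanResult scan_candidate_registers(std::uint64_t candidates) {
    ScanResult result = {0, 0};
    while (candidates != 0) {
        ++result.iterations;
        ++result.candidates_found;
        candidates &= candidates - 1;  // clear the lowest set bit
    }
    return result;
}
```

With the same three candidates, the first loop runs 32 times for an x64-sized register file and 64 times for an arm64-sized one, while the second runs three times in both cases.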

Various phases

image

fgImport

This method spends more time on the coreclr side, querying various information.

image

Diving into resolveToken, the calls go into allocation on arm64, and perhaps they show up in the profile because allocation is expensive.

image

JIT time

We gathered, for each assembly, the number of methods that were jitted and the time it took to JIT them. As seen in the data below, a similar number of methods was jitted for most assemblies. The "(Unknown Assembly)" row stands out: the table lists a single method on both sides, yet its JIT time is more than six times higher on arm64. We need to dig more into what that is and why that is the case. The overall JIT time is also significantly higher on arm64. Other than that, we should investigate why jitting the methods of the PaintDotNet.Fundamentals, PaintDotNet.Collections, and System.Private.CoreLib assemblies takes longer on arm64 than on x64.

assembly_name jit_time_arm64 jit_time_x64 jit_count_arm64 jit_count_x64 jit_count_arm64_diff jit_time_arm64_diff jit time difference %
PaintDotNet.Fundamentals 79.478 52.404 415 414 1 27.07 51.66%
PaintDotNet.Collections 136.296 113.539 824 824 0 22.76 20.04%
System.Private.CoreLib 255.713 235.012 1444 1464 -20 20.7 8.81%
(Unknown Assembly) 20.229 2.744 1 1 0 17.48 637.21%
System.Collections.Concurrent 87.488 74.21 315 315 0 13.28 17.89%
System.Linq 66.523 57.299 278 281 -3 9.22 16.10%
WindowsBase 42.385 36.304 177 177 0 6.08 16.75%
System.Collections.Immutable 39.601 33.933 217 217 0 5.67 16.70%
PaintDotNet.Windows.Core 34.011 30.229 223 227 -4 3.78 12.51%
PaintDotNet.Base 31.408 27.902 124 124 0 3.51 12.57%
TerraFX.Interop.Windows 13.51 10.773 107 106 1 2.74 25.41%
PaintDotNet.ObjectModel 13.643 11.074 82 82 0 2.57 23.20%
PaintDotNet.Windows 18.304 15.912 66 78 -12 2.39 15.03%
PaintDotNet.Windows.Framework 11.978 9.761 52 52 0 2.22 22.71%
PaintDotNet.Core 13.592 12.174 51 51 0 1.42 11.65%
PaintDotNet.Primitives 6.169 4.752 18 18 0 1.42 29.82%
DdsFileTypePlus 4.368 3.295 9 9 0 1.07 32.56%
AvifFileType 4.238 3.316 8 8 0 0.92 27.80%
PaintDotNet.Framework 2.53 1.696 10 10 0 0.83 49.17%
System.Collections 9.062 8.289 54 54 0 0.77 9.33%
ComputeSharp.D2D1 4.168 3.419 22 22 0 0.75 21.91%
WebPFileType 4.152 3.672 8 8 0 0.48 13.07%
CommunityToolkit.HighPerformance 1.622 1.446 16 16 0 0.18 12.17%
System.Security.Principal.Windows 0.342 0.174 1 1 0 0.17 96.55%
PaintDotNet.UI 0.762 0.639 3 3 0 0.12 19.25%
paintdotnet 6.007 5.947 46 46 0 0.06 1.01%
PresentationFramework 0.462 0.399 2 2 0 0.06 15.79%
PaintDotNet.SystemLayer 0.329 0.273 1 1 0 0.06 20.51%
PointerToolkit 0.174 0.214 2 2 0 -0.04 -18.69%
System.Windows.Forms.Primitives 2.895 2.95 17 17 0 -0.06 -1.86%
System.Windows.Forms 0.649 0.719 2 2 0 -0.07 -9.74%
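The last column of the table above can be reproduced from the two time columns. Here is a minimal sketch, with a hypothetical helper name, assuming the percentage is the arm64 time relative to the x64 time; that assumption matches the rows I checked.

```cpp
#include <cassert>

// Relative difference of `measured` against `baseline`, in percent.
double percent_difference(double measured, double baseline) {
    assert(baseline != 0.0);
    return (measured - baseline) / baseline * 100.0;
}
```

For example, percent_difference(79.478, 52.404) gives roughly 51.66 for PaintDotNet.Fundamentals, and percent_difference(20.229, 2.744) gives roughly 637.21 for "(Unknown Assembly)".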

Here is the list of the slowest methods:

method_name jit_time_arm64 jit_time_x64 jit_time_arm64_diff % diff
[PaintDotNet.Primitives]PaintDotNet.Imaging.PixelFormats+d__105`2[System.__Canon,System.Collections.Generic.KeyValuePair`2[PaintDotNet.Imaging.PixelFormat,System.__Canon]].MoveNext() 3.15 2.29 0.86 37.55%
[AvifFileType]AvifFileTypePlugin..ctor(class PaintDotNet.IFileTypeHost) 2.06 1.55 0.51 32.90%
[System.Private.CoreLib]SpanHelpers.IndexOfValueType(!!0&,!!0,int32) 1.55 1.61 -0.06 -3.73%
[System.Private.CoreLib]SpanHelpers.IndexOfValueType(!!0&,!!0,int32) 1.55 1.64 -0.09 -5.49%
[System.Private.CoreLib]SpanHelpers.IndexOfNullCharacter(wchar&) 1.82 2.14 -0.32 -14.95%
[PaintDotNet.Core]GeometryList.GetInteriorScans(!!0,value class PaintDotNet.Rendering.Matrix3x2Double) 1.67 2.02 -0.35 -17.33%

Managed code

As part of the analysis, Will Smith also gathered information on which managed methods are hot on both x64 and arm64 and go through tiering. Here is the outcome of that analysis.

Tier 0 Methods JITTED Count

ARM64: 20305 X64: 19816

Tier 1 Methods JITTED

ARM64 (Total 11)

- System.Runtime.CompilerServices.CastHelpers:StelemRef,"static  StelemRef"
- System.Runtime.CompilerServices.CastHelpers:LdelemaRef,"static  LdelemaRef"
- PaintDotNet.ByteUtil:InitUnscaleLookup,"static  InitUnscaleLookup"
- Force.Crc32.SafeProxy:Init," Init"
- PaintDotNet.Rendering.PixelKernels:ConvertBgra32ToPbgra32,"static  ConvertBgra32ToPbgra32"
- PaintDotNet.BufferUtil:BitwiseAllEqualToVectorized,"static  BitwiseAllEqualToVectorized"
- System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1:AwaitUnsafeOnCompleted,"static  AwaitUnsafeOnCompleted"
- PhotoSauce.MagicScaler.LookupTables:MakeUQ15Gamma,"static  MakeUQ15Gamma"
- PhotoSauce.MagicScaler.Transforms.Convolver4ChanVector:PhotoSauce.MagicScaler.Transforms.IConvolver.ConvolveSourceLine," PhotoSauce.MagicScaler.Transforms.IConvolver.ConvolveSourceLine"
- PhotoSauce.MagicScaler.Transforms.Convolver4ChanVector:PhotoSauce.MagicScaler.Transforms.IConvolver.WriteDestLine," PhotoSauce.MagicScaler.Transforms.IConvolver.WriteDestLine"
- System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1:AwaitUnsafeOnCompleted,"static  AwaitUnsafeOnCompleted"

X64 (Total 14)

- System.Runtime.CompilerServices.CastHelpers:StelemRef,"static  StelemRef"
- System.Runtime.CompilerServices.CastHelpers:LdelemaRef,"static  LdelemaRef"
- PaintDotNet.ByteUtil:InitUnscaleLookup,"static  InitUnscaleLookup"
- Force.Crc32.SafeProxy:Init," Init"
- PaintDotNet.SystemLayer.OS:IsSessionManagerRebootRequired,"static  IsSessionManagerRebootRequired"
- System.SpanHelpers:Fill,"static  Fill"
- System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1:AwaitUnsafeOnCompleted,"static  AwaitUnsafeOnCompleted"
- Blake2Fast.Implementation.Blake2bHashState:mixAvx2,"static  mixAvx2"
- PhotoSauce.MagicScaler.LookupTables:MakeUQ15Gamma,"static  MakeUQ15Gamma"
- PhotoSauce.MagicScaler.Converters.ConverterToLinear`2+Converter3A:convertFloatAvx2,"static  convertFloatAvx2"
- PhotoSauce.MagicScaler.Transforms.Convolver4ChanIntrinsic:PhotoSauce.MagicScaler.Transforms.IConvolver.ConvolveSourceLine," PhotoSauce.MagicScaler.Transforms.IConvolver.ConvolveSourceLine"
- PhotoSauce.MagicScaler.Transforms.Convolver4ChanIntrinsic:PhotoSauce.MagicScaler.Transforms.IConvolver.WriteDestLine," PhotoSauce.MagicScaler.Transforms.IConvolver.WriteDestLine"
- PhotoSauce.MagicScaler.Converters.ConverterFromLinear`2+Converter3A:convertFloatAvx2,"static  convertFloatAvx2"
- PaintDotNet.Rendering.PixelKernels:ConvertBgra32ToPbgra32,"static  ConvertBgra32ToPbgra32"

Managed Method CPU Stack Comparison

image

Slow Method JIT Times

These are the three methods that take the most time to JIT on both ARM64 and X64; the table shows the comparison. The results were obtained by JITting each method 10,000 times and averaging the times.

Method ARM64 JIT Time (ms) X64 JIT Time (ms) JIT Time Diff (ms) JIT Diff % (of ARM64 time)
[AvifFileType]AvifFileTypePlugin..ctor(class PaintDotNet.IFileTypeHost) 0.900781 0.810642 0.090139 10%
PaintDotNet.Rendering.GeometryList:GetInteriorScansImpl 0.049871 0.032953 0.016918 34%
PaintDotNet.Imaging.PooledBitmapAllocator:Allocate 0.026409 0.018645 0.007764 29%
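The repeat-and-average measurement described above can be sketched as follows. This is a generic harness, not the tooling that was actually used for these numbers. The name average_milliseconds is hypothetical, and work stands in for "JIT one method once".

```cpp
#include <cassert>
#include <chrono>

// Run `work` `iterations` times and return the mean wall-clock time per run,
// in milliseconds.
template <typename Work>
double average_milliseconds(Work work, int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}
```

Averaging over many repetitions smooths out timer resolution and one-off costs, which matters when a single JIT of a method takes well under a millisecond, as in the table above.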

CPU Stack Comparison for JITing "[AvifFileType]AvifFileTypePlugin..ctor(class PaintDotNet.IFileTypeHost)"

Left: ARM64; Right: X64

Note: Ignore "supermi!?" in the stacks image

Based upon this information, we came to the following conclusions:

  • Register Allocation: X64 is faster at JITing because it has fewer GPRs (general-purpose registers) than ARM64.

    • You can see this in the CPU Stack Comparison for JITing [AvifFileType]AvifFileTypePlugin..ctor(class PaintDotNet.IFileTypeHost) by observing RegisterSelection.
  • Intrinsified .NET Calls

    • There are methods that use HW intrinsics only available on X64, or methods that make use of Vector256 for X64 and Vector128 for ARM64 respectively. Because of this, the execution times on those methods will be larger on ARM64 than X64. Below are a few of them:
    • PhotoSauce.MagicScaler.Transforms.Convolver4ChanVector:PhotoSauce.MagicScaler.Transforms.IConvolver.ConvolveSourceLine See code.
      • This method is a Tier 1 method on both X64 and ARM64. It also appears higher in the CPU stack comparison on ARM64 compared to X64.
    • PaintDotNet.Rendering.PixelKernels:ConvertBgra32ToPbgra32 See code.
      • Tier 1 method on X64 and ARM64.
    • System.SpanHelpers.Fill<T>(ref T refData, nuint numElements, T value) See code.
    • PhotoSauce.MagicScaler.Transforms.IConvolver.WriteDestLine(byte* tstart, byte* ostart, int ox, int ow, byte* pmapy, int smapy) See code
      • Tier 1 method on X64 and ARM64.
    • Blake2Fast.Implementation.Blake2bHashState.compress(ref byte input, uint offs, uint cb) See code
      • This method makes a call to 'mixAvx2' for X64 and 'mixScalar' for ARM64. 'mixScalar' is the non-intrinsified version.
      • mixAvx2 See code
      • mixScalar See code
    • PaintDotNet.BufferUtil:BitwiseAllEqualToVectorized See code
  • System Calls

    • When analyzing the CPU stack comparison, there was a method, IsVideoCodecAvailable, taking longer on ARM64 than X64. When looking at the source code, we can see that it is just a system call. This means we cannot do much about it.
    • PaintDotNet.SystemLayer.HeifCodecInfos.IsVideoCodecAvailable(value class CodecCategory, value class System.Guid&) See code.
  • PhotoSauce

    • PaintDotNet uses a library called PhotoSauce that makes use of a lot of HW intrinsics for X64. It appears to take more time in ARM64 than X64. See image below:
    • Left: X64; Right: ARM64 image
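To put a rough number on the Vector256 versus Vector128 point from the intrinsics conclusion above, here is a minimal sketch that counts how many vector-wide steps a loop needs to cover a buffer. The helper name is hypothetical, and the model ignores instruction latency, memory bandwidth, and everything else that affects real throughput.

```cpp
#include <cassert>
#include <cstddef>

// Number of vector-wide steps needed to cover `byte_count` bytes,
// rounding up for a final partial chunk.
std::size_t vector_steps(std::size_t byte_count, std::size_t vector_bytes) {
    assert(vector_bytes > 0);
    return (byte_count + vector_bytes - 1) / vector_bytes;
}
```

A 4096-byte row takes 128 steps with 32-byte vectors (Vector256) and 256 steps with 16-byte vectors (Vector128), so a method that uses Vector128 on ARM64 runs about twice as many loop iterations for the same data.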

Appendix

Comparison of call stack
Name Base Test Delta Responsibility % Overweight % Interest Level
ntoskrnl!KiSystemServiceExit 2.0 1010.3 1008.3 159.04 176673.30 6
win32kfull!xxxActivateWindowWithOptions 7.0 151.0 144.0 22.71 7208.67 5
win32kfull!`anonymous namespace'::xxxLocalActivateWindow 7.0 151.0 144.0 22.71 7208.67 5
win32kfull!xxxSetWindowPos 33.0 205.0 172.0 27.13 1826.44 5
win32kfull!xxxSetWindowPosAndBand 33.0 205.0 172.0 27.13 1826.44 5
win32kfull!NtUserShowWindow 39.0 194.0 155.0 24.45 1392.70 5
win32kfull!xxxShowWindowEx 39.0 194.0 155.0 24.45 1392.70 5
win32kfull!xxxEndDeferWindowPosEx 67.0 203.0 136.0 21.45 711.30 5
win32kfull!xxxSendMessage 308.0 499.0 191.0 30.12 217.31 5
coreclr!DelayLoad_Helper 357.8 517.3 159.6 25.17 156.30 5
coreclr!DynamicHelperFixup 382.8 550.8 168.0 26.50 153.83 5
coreclr!DynamicHelperWorker 384.8 552.8 168.0 26.50 153.03 5
win32kfull!xxxSendMessageToClient 361.0 514.0 153.0 24.13 148.52 5
system.private.corelib!System.Threading.Tasks.Task::ExecuteWithThreadLocal(System.Threading.Tasks.Task&, System.Threading.Thread) 532.0 749.0 217.0 34.23 142.94 5
system.private.corelib!System.Threading.ExecutionContext::RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) 532.0 749.0 217.0 34.23 142.94 5
win32kfull!xxxSendTransformableMessageTimeout 374.0 513.0 139.0 21.92 130.24 5
system.private.corelib!System.Threading.PortableThreadPool+WorkerThread::WorkerThreadStart() 720.0 984.0 264.0 41.64 128.49 5
coreclr!RunMainInternal 998.8 1360.8 362.0 57.10 127.02 5
coreclr!RunMain 998.8 1360.8 362.0 57.10 127.02 5
coreclr!Assembly::ExecuteMainMethod 998.8 1360.8 362.0 57.10 127.02 5
coreclr!CorHost2::ExecuteAssembly 998.8 1360.8 362.0 57.10 127.02 5
coreclr!coreclr_execute_assembly 998.8 1360.8 362.0 57.10 127.02 5
system.private.corelib!System.Boolean System.Threading.ThreadPoolWorkQueue::Dispatch() 712.0 969.0 257.0 40.53 126.49 5
coreclr!MethodDescCallSite::CallTargetWorker 1002.8 1362.8 360.0 56.78 125.81 5
coreclr!CallDescrWorkerInternal 1928.8 2580.8 652.0 102.84 118.46 5
coreclr!UnsafeJitFunction 541.0 715.0 174.0 27.44 112.70 5
coreclr!ClassLoader::LoadTypeHandleForTypeKey 414.0 544.5 130.5 20.58 110.42 5
coreclr!ClrFlushInstructionCache 1.0 102.0 101.0 15.93 35392.54 4
clrjit!Compiler::impResolveToken 1.0 81.0 80.0 12.62 28033.70 4
ntdll!ZwUnmapViewOfSectionEx 1.0 43.3 42.3 6.68 14839.63 4
ntoskrnl!NtUnmapViewOfSectionEx 1.0 42.3 41.3 6.52 14489.20 4
ntoskrnl!MiUnmapViewOfSection 1.0 42.3 41.3 6.52 14489.20 4
ntoskrnl!MiUnmapVad 1.0 41.3 40.3 6.36 14138.78 4
coreclr!??DomainAssembly::DoIncrementalLoad 1.0 38.0 37.0 5.84 12965.58 4
ntdll!RtlAllocateHeap 3.0 97.0 94.0 14.83 10979.87 4
win32kfull!`anonymous namespace'::xxxSendNCActivateMessage 4.0 88.0 84.0 13.25 7358.85 4
win32kfull!xxxSetForegroundWindow2 3.0 57.0 54.0 8.52 6307.58 4
win32kfull!xxxSetForegroundWindowWithOptions 3.0 57.0 54.0 8.52 6307.58 4
win32kfull!xxxSendActivateAppMessage 2.0 37.0 35.0 5.52 6132.37 4
ntoskrnl!MiDeleteVad 4.0 42.3 38.3 6.05 3359.49 4
ntoskrnl!NtMapViewOfSection 15.0 67.0 52.0 8.20 1214.79 4
ntdll!ZwMapViewOfSection 16.0 71.0 55.0 8.67 1204.57 4
ntoskrnl!MiMapViewOfSection 15.0 65.0 50.0 7.89 1168.07 4
ntdll!RtlFreeHeap 13.0 50.0 37.0 5.84 997.35 4
ntdll!RtlpFreeHeapInternal 13.0 49.0 36.0 5.68 970.40 4
coreclr!UnlockedLoaderHeap::GetMoreCommittedPages 17.0 59.0 42.0 6.62 865.75 4
ntoskrnl!MiResolveProtoPteFault 17.0 51.0 34.0 5.36 700.84 4
win32kfull!SfnINLPCREATESTRUCT 27.0 80.0 53.0 8.36 687.86 4
user32!DefWindowProcW 23.0 59.0 36.0 5.68 548.49 4
coreclr!VirtualCallStubManager::ResolveWorker 31.0 69.0 38.0 5.99 429.55 4
system.windows.forms!System.Windows.Forms.Control::OnLayout(System.Windows.Forms.LayoutEventArgs) 48.0 98.0 50.0 7.89 365.02 4
system.windows.forms!System.Windows.Forms.NativeWindow::DefWndProc(System.Windows.Forms.Message&) 87.0 177.0 90.0 14.19 362.50 4
ntoskrnl!MmAccessFault 71.8 143.0 71.2 11.23 347.77 4
system.windows.forms!System.Windows.Forms.ToolStrip::OnLayout(System.Windows.Forms.LayoutEventArgs) 48.0 95.0 47.0 7.41 343.12 4
win32kfull!NtUserSetWindowPos 34.0 66.0 32.0 5.05 329.81 4
coreclr!MethodTableBuilder::SetupMethodTable2 87.0 162.0 75.0 11.83 302.09 4
ntoskrnl!MiDispatchFault 41.0 75.0 34.0 5.36 290.59 4
system.windows.forms.primitives!System.IntPtr Interop+User32::CallWindowProcW(System.IntPtr, System.IntPtr, Interop+User32+WM, System.IntPtr, System.IntPtr) 61.0 107.0 46.0 7.26 264.25 4
system.windows.forms!System.Windows.Forms.Control::PerformLayout(System.Windows.Forms.LayoutEventArgs) 98.0 170.0 72.0 11.36 257.45 4
user32!CallWindowProcW 62.0 107.0 45.0 7.10 254.34 4
win32kfull!xxxCreateWindowEx 89.0 152.0 63.0 9.94 248.05 4
win32kfull!NtUserCreateWindowEx 89.0 152.0 63.0 9.94 248.05 4
coreclr!MulticoreJitProfilePlayer::CompileMethodDesc 68.0 114.0 46.0 7.26 237.05 4
coreclr!MulticoreJitProfilePlayer::CompileMethodInfoRecord 69.0 114.0 45.0 7.10 228.54 4
system.private.corelib!System.Object System.Reflection.RuntimeMethodInfo::Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo) 155.0 253.5 98.5 15.53 222.58 4
system.private.corelib!System.Object System.Reflection.MethodInvoker::Invoke(System.Object, System.IntPtr*, System.Reflection.BindingFlags) 155.0 253.5 98.5 15.53 222.58 4
system.windows.forms!System.Windows.Forms.Form::SetVisibleCore(System.Boolean) 119.0 194.0 75.0 11.83 220.85 4
system.windows.forms!System.Windows.Forms.Control::InvokeMarshaledCallbackDo(System.Windows.Forms.Control+ThreadMethodEntry) 123.0 200.5 77.5 12.22 220.66 4
system.private.corelib!System.Object System.Delegate::DynamicInvokeImpl(System.Object[]) 125.0 202.5 77.5 12.22 217.13 4
coreclr!ZapSig::DecodeMethod 67.0 108.0 41.0 6.47 214.44 4
system.windows.forms!System.Windows.Forms.Control::CreateControl(System.Boolean) 59.0 94.0 35.0 5.52 207.88 4
system.windows.forms!System.Windows.Forms.Control::CreateControl() 59.0 94.0 35.0 5.52 207.88 4
win32kfull!xxxUpdateWindow2 197.0 312.0 115.0 18.14 204.56 4
win32kfull!NtUserUpdateWindow 197.0 312.0 115.0 18.14 204.56 4
system.windows.forms!System.Windows.Forms.Control::CreateHandle() 76.0 120.0 44.0 6.94 202.88 4
system.windows.forms!System.Windows.Forms.Control::SetVisibleCore(System.Boolean) 128.0 202.0 74.0 11.67 202.59 4
coreclr!MethodTableBuilder::LoadExactInterfaceMap 58.0 91.0 33.0 5.20 199.38 4
coreclr!MethodDesc::PrepareILBasedCode 98.0 153.0 55.0 8.67 196.67 4
system.private.corelib!System.Object System.RuntimeType::CreateInstanceOfT() 67.0 104.0 37.0 5.84 193.52 4
system.private.corelib!System.__Canon System.Activator::CreateInstance() 67.0 104.0 37.0 5.84 193.52 4
coreclr!DelayLoad_MethodCall 116.0 179.0 63.0 9.94 190.32 4
coreclr!ClassLoader::LoadApproxTypeThrowing 59.0 91.0 32.0 5.05 190.06 4
system.windows.forms!System.Windows.Forms.Control+ControlCollection::Add(System.Windows.Forms.Control) 63.0 97.0 34.0 5.36 189.12 4
system.windows.forms!System.Windows.Forms.NativeWindow::CreateHandle(System.Windows.Forms.CreateParams) 78.0 119.0 41.0 6.47 184.20 4
windowsbase!System.Windows.DependencyObject::SetValueCommon(System.Windows.DependencyProperty, System.Object, System.Windows.PropertyMetadata, System.Boolean, System.Boolean, System.Windows.OperationType, System.Boolean) 81.0 122.0 41.0 6.47 177.37 4
coreclr!ExternalMethodFixupWorker 113.0 170.0 57.0 8.99 176.76 4
paintdotnet.base!? 177.0 266.0 89.0 14.04 176.20 4
system.windows.forms.primitives!System.IntPtr Interop+User32::CreateWindowExW(Interop+User32+WS_EX, System.String, System.String, Interop+User32+WS, System.Int32, System.Int32, System.Int32, System.Int32, System.IntPtr, System.IntPtr, System.IntPtr, System.Object) 77.0 115.0 38.0 5.99 172.94 4
System.Private.CoreLib.il!dynamicClass.InvokeStub_SendOrPostCallback.Invoke(class System.Object,class System.Object,int*) 120.0 179.0 59.0 9.31 172.29 4
coreclr!Module::FixupDelayListAux<Module ,int (__cdecl Module::)(READYTORUN_IMPORT_SECTION *,unsigned __int64,unsigned __int64 *,int)> 284.0 406.0 122.0 19.24 150.53 4
coreclr!Module::FixupNativeEntry 282.0 402.0 120.0 18.93 149.12 4
system.windows.forms!System.Windows.Forms.ToolStrip::WndProc(System.Windows.Forms.Message&) 80.0 114.0 34.0 5.36 148.93 4
coreclr!LoadDynamicInfoEntry 259.0 367.0 108.0 17.03 146.12 4
system.windows.forms!System.Windows.Forms.Control::Refresh() 102.0 144.0 42.0 6.62 144.29 4
coreclr!ReadyToRunInfo::GetEntryPoint 303.0 426.0 123.0 19.40 142.25 4
coreclr!ClassLoader::LoadTypeDefThrowing 261.0 366.5 105.5 16.63 141.58 4
user32!SendMessageWorker 87.0 120.0 33.0 5.20 132.92 4
system.private.corelib!System.Threading.Tasks.Task`1[System.__Canon]::InnerInvoke() 177.0 242.0 65.0 10.25 128.69 4
clrjit!Compiler::impImportBlock 134.0 183.0 49.0 7.73 128.14 4
coreclr!ClassLoader::CreateTypeHandleForTypeKey 307.0 418.5 111.5 17.58 127.22 4
clrjit!Compiler::impImport 135.0 184.0 49.0 7.73 127.19 4
clrjit!Compiler::fgImport 135.0 184.0 49.0 7.73 127.19 4
coreclr!ClassLoader::LoadTypeDefOrRefThrowing 244.0 332.5 88.5 13.95 127.03 4
system.windows.forms.primitives!Interop+BOOL Interop+User32::UpdateWindow(IHandle) 108.0 147.0 39.0 6.15 126.54 4
coreclr!ClassLoader::CreateTypeHandleForTypeDefThrowing 298.0 405.5 107.5 16.95 126.35 4
paintdotnet.framework!? 276.0 375.0 99.0 15.61 125.69 4
user32!CreateWindowInternal 112.0 152.0 40.0 6.31 125.15 4
coreclr!MulticoreJitProfilePlayer::PlayProfile 139.0 188.0 49.0 7.73 123.53 4
coreclr!MulticoreJitProfilePlayer::StaticJITThreadProc 139.0 188.0 49.0 7.73 123.53 4
coreclr!MulticoreJitProfilePlayer::JITThreadProc 139.0 188.0 49.0 7.73 123.53 4
clrjit!Compiler::impImportBlockCode 131.0 177.0 46.0 7.26 123.05 4
user32!CreateWindowExW 112.0 151.0 39.0 6.15 122.02 4
paintdotnet!? 1891.8 2519.8 628.0 99.05 116.33 4
system.windows.forms!System.Windows.Forms.Control::PaintWithErrorHandling(System.Windows.Forms.PaintEventArgs, System.Int16) 109.0 145.0 36.0 5.68 115.74 4
paintdotnet.windows.core!? 370.0 492.0 122.0 19.24 115.54 4
kernel32!BaseThreadInitThunk 2142.8 2836.8 694.0 109.46 113.50 4
ntdll!RtlUserThreadStart 2143.8 2836.8 693.0 109.31 113.28 4
coreclr!MethodTableBuilder::BuildMethodTableThrowing 275.0 363.5 88.5 13.95 112.71 4
paintdotnet.ui!? 210.0 276.0 66.0 10.41 110.13 4
coreclr!ThreadNative::KickOffThread_Worker 929.0 1220.0 291.0 45.90 109.77 4
coreclr!ManagedThreadBase_DispatchMiddle 930.0 1220.0 290.0 45.74 109.27 4
coreclr!ManagedThreadBase_DispatchOuter 930.0 1220.0 290.0 45.74 109.27 4
coreclr!ThreadNative::KickOffThread 931.0 1220.0 289.0 45.58 108.78 4
system.windows.forms!System.Windows.Forms.Control::InvokeMarshaledCallbacks() 581.0 746.5 165.5 26.10 99.79 4
system.windows.forms!System.Windows.Forms.Application+ThreadContext::RunMessageLoop(Interop+Mso+msoloop, System.Windows.Forms.ApplicationContext) 648.0 831.5 183.5 28.93 99.21 4
system.windows.forms!System.Windows.Forms.Application+ThreadContext::RunMessageLoopInner(Interop+Mso+msoloop, System.Windows.Forms.ApplicationContext) 648.0 831.5 183.5 28.93 99.21 4
system.windows.forms!System.Windows.Forms.Control::InvokeMarshaledCallbackHelper(System.Object) 581.0 745.5 164.5 25.94 99.19 4
paintdotnet.core!? 1029.0 1317.0 288.0 45.42 98.08 4
system.windows.forms!Interop+BOOL System.Windows.Forms.Application+ComponentManager::Interop.Mso.IMsoComponentManager.FPushMessageLoop(System.UIntPtr, Interop+Mso+msoloop, System.Void*) 586.0 749.5 163.5 25.78 97.74 4
user32!DispatchMessageWorker 585.0 747.5 162.5 25.62 97.31 4
coreclr!MethodDesc::JitCompileCodeLocked 575.0 731.0 156.0 24.60 95.07 4
system.windows.forms!System.Windows.Forms.Control::WndProc(System.Windows.Forms.Message&) 667.0 838.5 171.5 27.04 90.08 4
system.windows.forms!System.IntPtr System.Windows.Forms.NativeWindow::Callback(System.IntPtr, Interop+User32+WM, System.IntPtr, System.IntPtr) 682.0 849.5 167.5 26.41 86.04 4
user32!UserCallWinProcCheckWow 714.0 883.5 169.5 26.73 83.16 4
coreclr!ThePreStub 1049.0 1297.0 248.0 39.11 82.85 4
coreclr!MethodDesc::DoPrestub 1042.0 1286.0 244.0 38.48 82.06 4
coreclr!PreStubWorker 1048.0 1291.0 243.0 38.33 81.25 4
coreclr!CodeVersionManager::PublishVersionableCodeIfNecessary 1004.0 1229.0 225.0 35.49 78.53 4
system.private.corelib!System.Threading.ExecutionContext::RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) 834.0 1008.5 174.5 27.51 73.30 4
coreclr!??MethodDesc::JitCompileCodeLockedEventWrapper 756.0 912.0 156.0 24.60 72.31 4
coreclr!MethodDesc::JitCompileCode 787.0 931.0 144.0 22.71 64.12 4
coreclr!DispatchCallSimple 1109.8 1244.0 134.2 21.17 42.38 4
win32kfull!NtUserMessageCall 71.0 27.0 -44.0 -6.94 -217.16 4
BROKEN 48.0 12.0 -36.0 -5.68 -262.82 4
combase!CSyncClientCall::SendReceive2 35.0 2.0 -33.0 -5.20 -330.40 4
combase!CSyncClientCall::SendReceive 35.0 2.0 -33.0 -5.20 -330.40 4
combase!NdrExtpProxySendReceive 35.0 2.0 -33.0 -5.20 -330.40 4
windowsbase!System.Windows.DependencyObject::SetValue(System.Windows.DependencyProperty, System.Object) 75.0 3.0 -72.0 -11.36 -336.40 4
coreclr!JIT_ClassInitDynamicClass 61.0 1.0 -60.0 -9.46 -344.68 4
user32!VerNtUserCreateWindowEx 112.0 1.0 -111.0 -17.51 -347.29 4
ntoskrnl!ExpAllocatePoolWithTagFromNode 1.0 24.0 23.0 3.63 8059.69 3
coreclr!DynamicHelpers::CreateHelper 1.0 19.3 18.3 2.89 6429.52 3
win32kfull!CalcVisRgn 1.0 16.0 15.0 2.37 5256.32 3
ntdll!LdrpValidateUserCallTarget 1.0 16.0 15.0 2.37 5256.32 3
ntoskrnl!RtlpHpAllocateHeap 1.0 15.0 14.0 2.21 4905.90 3
coreclr!VirtualCallStubManager::GenerateDispatchStub 2.0 29.0 27.0 4.26 4730.69 3
coreclr!MethodTable::MethodDataObject::`scalar deleting destructor' 1.0 14.0 13.0 2.05 4555.48 3
coreclr!operator delete[] 2.0 23.0 21.0 3.31 3679.42 3
ntoskrnl!KiSystemServiceGdiTebAccess 2.0 21.0 19.0 3.00 3329.00 3
ntoskrnl!PsInvokeWin32Callout 2.0 21.0 19.0 3.00 3329.00 3
system.drawing.common!System.Drawing.Font System.Drawing.Font::FromLogFont(Interop+User32+LOGFONT&) 2.0 16.0 14.0 2.21 2452.95 3
win32kbase!W32CalloutDispatch 2.0 16.0 14.0 2.21 2452.95 3
ntoskrnl!KiProcessThreadWaitList 3.0 24.0 21.0 3.31 2452.95 3
ntoskrnl!MiDeleteVirtualAddresses 3.0 20.3 17.3 2.74 2026.37 3
ntoskrnl!MiDeletePagablePteRange 3.0 20.3 17.3 2.74 2026.37 3
ntoskrnl!MiDeleteVaDirect 3.0 19.3 16.3 2.58 1909.56 3
win32kfull!CalcVisRgnWorker 3.0 16.0 13.0 2.05 1518.49 3
coreclr!MethodTableBuilder::bmtInterfaceEntry::CreateSlotTable 3.0 16.0 13.0 2.05 1518.49 3
coreclr!ClrEnterCriticalSection 7.0 37.0 30.0 4.73 1501.81 3
win32kbase!ReleaseCacheDC 4.0 19.0 15.0 2.37 1314.08 3
system.drawing.common!System.Drawing.Font System.Drawing.SystemFonts::get_MenuFont() 4.0 17.0 13.0 2.05 1138.87 3
win32kfull!xxxRealDefWindowProc 6.0 25.0 19.0 3.00 1109.67 3
system.private.corelib!System.Lazy`1[System.Boolean]::ViaFactory(System.Threading.LazyThreadSafetyMode) 7.0 28.0 21.0 3.31 1051.26 3
system.private.corelib!System.Lazy`1[System.Boolean]::ExecutionAndPublication(System.LazyHelper, System.Boolean) 7.0 28.0 21.0 3.31 1051.26 3
system.private.corelib!System.Boolean System.Lazy`1[System.Boolean]::CreateValue() 7.0 28.0 21.0 3.31 1051.26 3
icu!icu_internal::CanonicalIterator::setSource 6.0 21.0 15.0 2.37 876.05 3
ntoskrnl!KiExitDispatcher 7.0 24.0 17.0 2.68 851.02 3
kernelbase!CreateFileW 6.0 20.0 14.0 2.21 817.65 3
ntoskrnl!MiResolvePageTablePage 6.0 20.0 14.0 2.21 817.65 3
coreclr!LoaderHeap::RealAllocMemUnsafe 12.0 38.0 26.0 4.10 759.25 3
paintdotnet.collections!? 7.0 21.0 14.0 2.21 700.84 3
icu!icu_internal::CollationBuilder::addOnlyClosure 8.0 24.0 16.0 2.52 700.84 3
user32!RealDefWindowProcW 17.0 48.0 31.0 4.89 639.00 3
system.windows.forms!System.Boolean System.Windows.Forms.Layout.FlowLayout::LayoutCore(System.Windows.Forms.Layout.IArrangedElement, System.Windows.Forms.LayoutEventArgs) 8.0 22.0 14.0 2.21 613.24 3
win32kfull!SfnINLPWINDOWPOS 14.0 38.0 24.0 3.79 600.72 3
ntoskrnl!MiFastLockLeafPageTable 8.0 21.0 13.0 2.05 569.43 3
win32kfull!xxxSetScrollBar 9.0 22.0 13.0 2.05 506.16 3
win32kfull!NtUserSetScrollInfo 9.0 22.0 13.0 2.05 506.16 3
user32!__fnINLPWINDOWPOS 16.0 38.0 22.0 3.47 481.83 3
win32kfull!xxxSendChangedMsgs 17.0 40.0 23.0 3.63 474.10 3
icu!icu_internal::CollationBuilder::addRelation 11.0 25.0 14.0 2.21 445.99 3
icu!icu_internal::CollationRuleParser::parseRelationStrings 11.0 25.0 14.0 2.21 445.99 3
icu!icu_internal::CollationBuilder::parseAndBuild 16.0 36.0 20.0 3.15 438.03 3
icu!icu_internal::CollationRuleParser::parse 12.0 26.0 14.0 2.21 408.82 3
icu!icu_internal::CollationRuleParser::parseRuleChain 12.0 26.0 14.0 2.21 408.82 3
system.windows.forms!System.Windows.Forms.ToolStripItem+ToolStripItemInternalLayout::PerformLayout() 13.0 28.0 15.0 2.37 404.33 3
coreclr!MethodDesc::GetPrecompiledR2RCode 21.0 45.0 24.0 3.79 400.48 3
system.windows.forms!System.Boolean System.Windows.Forms.ToolStripSplitStackLayout::LayoutCore(System.Windows.Forms.Layout.IArrangedElement, System.Windows.Forms.LayoutEventArgs) 27.0 57.0 30.0 4.73 389.36 3
system.windows.forms!System.Boolean System.Windows.Forms.ToolStripSplitStackLayout::LayoutHorizontal() 26.0 54.0 28.0 4.42 377.38 3
win32kfull!xxxWrapRealDefWindowProc 13.0 26.0 13.0 2.05 350.42 3
user32!SetScrollInfo 13.0 26.0 13.0 2.05 350.42 3
coreclr!MethodDesc::GetPrecompiledCode 23.0 46.0 23.0 3.63 350.42 3
system.windows.forms!System.Windows.Forms.ComboBox::CreateHandle() 28.0 56.0 28.0 4.42 350.42 3
system.windows.forms!System.Windows.Forms.ToolStripControlHost::OnParentChanged(System.Windows.Forms.ToolStrip, System.Windows.Forms.ToolStrip) 19.0 37.0 18.0 2.84 331.98 3
ntoskrnl!MiUserFault 28.8 56.0 27.2 4.29 331.63 3
system.windows.forms!System.Windows.Forms.ToolStripDropDownMenu::OnLayout(System.Windows.Forms.LayoutEventArgs) 34.0 65.0 31.0 4.89 319.50 3
clrjit!LinearScan::allocateRegisters 26.0 49.0 23.0 3.63 309.99 3
ntoskrnl!MiAllocateVirtualMemory 18.0 33.0 15.0 2.37 292.02 3
presentationframework!System.Windows.Controls.TextBlock::.cctor() 17.0 31.0 14.0 2.21 288.58 3
system.windows.forms!System.Windows.Forms.Control::ResumeLayout(System.Boolean) 34.0 62.0 28.0 4.42 288.58 3
system.windows.forms!System.Windows.Forms.Application+ThreadContext::EndModalMessageLoop(System.Windows.Forms.ApplicationContext) 22.0 38.0 16.0 2.52 254.85 3
system.windows.forms!System.Windows.Forms.Application+ThreadContext::EnableWindowsForModalLoop(System.Boolean, System.Windows.Forms.ApplicationContext) 22.0 38.0 16.0 2.52 254.85 3
system.windows.forms!System.Int32 System.Windows.Forms.ComboBox::NativeAdd(System.Object) 20.0 34.0 14.0 2.21 245.29 3
coreclr!DelayLoad_Helper_ObjObj 22.0 37.0 15.0 2.37 238.92 3
system.windows.forms!System.Windows.Forms.Application+ThreadWindows::Enable(System.Boolean) 22.0 37.0 15.0 2.37 238.92 3
icu!icu_internal::RuleBasedCollator::internalBuildTailoring 26.0 43.0 17.0 2.68 229.12 3
coreclr!CloneCollatorWithOptions 26.0 43.0 17.0 2.68 229.12 3
icu!ucol_openRules_internal 26.0 43.0 17.0 2.68 229.12 3
system.windows.forms.primitives!System.IntPtr Interop+User32::SendMessageW(IHandle, Interop+User32+WM, System.IntPtr, System.String) 20.0 33.0 13.0 2.05 227.77 3
system.windows.forms!System.Windows.Forms.ToolStripDropDownMenu::CalculateInternalLayoutMetrics() 22.0 36.0 14.0 2.21 223.00 3
presentationframework!System.Windows.Controls.TextBox::.cctor() 33.0 54.0 21.0 3.31 223.00 3
presentationframework!System.Boolean System.Windows.Data.BindingExpressionBase::AttachOverride(System.Windows.DependencyObject, System.Windows.DependencyProperty) 36.0 58.0 22.0 3.47 214.15 3
paintdotnet.effects.core!? 23.0 37.0 14.0 2.21 213.30 3
coreclr!RuntimeMethodHandle::InvokeMethod 34.0 54.5 20.5 3.23 210.79 3
coreclr!GlobalizationNative_GetSortKey 25.0 40.0 15.0 2.37 210.25 3
system.private.corelib!System.Int32 System.CultureAwareComparer::GetHashCode(System.String) 25.0 40.0 15.0 2.37 210.25 3
system.private.corelib!System.Int32 System.Globalization.CompareInfo::GetHashCode(System.ReadOnlySpan`1[System.Char], System.Globalization.CompareOptions) 25.0 40.0 15.0 2.37 210.25 3
system.private.corelib!System.Int32 System.Globalization.CompareInfo::IcuGetHashCodeOfString(System.ReadOnlySpan`1[System.Char], System.Globalization.CompareOptions) 25.0 40.0 15.0 2.37 210.25 3
presentationframework!System.Windows.Data.BindingExpressionBase::Attach(System.Windows.DependencyObject, System.Windows.DependencyProperty) 50.0 80.0 30.0 4.73 210.25 3
presentationframework!System.Boolean System.Windows.Data.BindingExpression::AttachOverride(System.Windows.DependencyObject, System.Windows.DependencyProperty) 50.0 80.0 30.0 4.73 210.25 3
coreclr!MethodDesc::PrepareInitialCode 29.0 46.0 17.0 2.68 205.42 3
system.windows.forms!System.Windows.Forms.Control::SetBoundsCore(System.Int32, System.Int32, System.Int32, System.Int32, System.Windows.Forms.BoundsSpecified) 24.0 38.0 14.0 2.21 204.41 3
system.private.corelib!System.Boolean System.Collections.Generic.Dictionary`2[System.__Canon, System.__Canon]::TryInsert(System.__Canon, System.__Canon, System.Collections.Generic.InsertionBehavior) 26.0 41.0 15.0 2.37 202.17 3
presentationframework!System.Windows.Data.BindingExpressionBase System.Windows.Data.BindingOperations::SetBinding(System.Windows.DependencyObject, System.Windows.DependencyProperty, System.Windows.Data.BindingBase) 53.0 83.0 30.0 4.73 198.35 3
system.windows.forms!System.Windows.Forms.Control::PerformLayout() 42.0 65.0 23.0 3.63 191.90 3
coreclr!CEEInfo::getCallInfo 35.0 54.0 19.0 3.00 190.23 3
ntdll!ZwTraceEvent 28.0 43.0 15.0 2.37 187.73 3
system.windows.forms!System.Windows.Forms.ComboBox::WndProc(System.Windows.Forms.Message&) 51.0 78.0 27.0 4.26 185.52 3
system.windows.forms!System.Windows.Forms.ComboBox::OnHandleCreated(System.EventArgs) 25.0 38.0 13.0 2.05 182.22 3
user32!__fnINLPCREATESTRUCT 53.0 80.0 27.0 4.26 178.52 3
paintdotnet.resources!? 50.0 75.0 25.0 3.94 175.21 3
system.windows.forms!System.Windows.Forms.Control::SetBounds(System.Int32, System.Int32, System.Int32, System.Int32, System.Windows.Forms.BoundsSpecified) 33.0 49.0 16.0 2.52 169.90 3
system.windows.forms!System.Windows.Forms.Control::WmShowWindow(System.Windows.Forms.Message&) 30.0 44.0 14.0 2.21 163.53 3
coreclr!CEEInfo::resolveToken 60.0 86.0 26.0 4.10 151.85 3
paintdotnet.windows.core!PaintDotNet.Direct2D1.DrawingHelpers+<>c__DisplayClass2_0`2[System.__Canon,PaintDotNet.Rendering.SizeInt32].b__0(!0) 35.0 50.0 15.0 2.37 150.18 3
system.windows.forms!System.Windows.Forms.Control::WmCreate(System.Windows.Forms.Message&) 49.0 70.0 21.0 3.31 150.18 3
system.private.corelib!System.Object System.RuntimeType::CreateInstanceDefaultCtor(System.Boolean, System.Boolean) 45.0 64.0 19.0 3.00 147.96 3
coreclr!MulticoreJitProfilePlayer::HandleGenericMethodInfoRecord 57.0 80.0 23.0 3.63 141.40 3
user32!SendMessageW 79.0 110.0 31.0 4.89 137.51 3
coreclr!MethodTable::DoFullyLoad 57.0 79.0 22.0 3.47 135.25 3
coreclr!MulticoreJitProfilePlayer::HandleNonGenericMethodInfoRecord 58.0 80.0 22.0 3.47 132.92 3
system.windows.forms!System.Windows.Forms.ToolStrip::OnPaint(System.Windows.Forms.PaintEventArgs) 50.0 68.0 18.0 2.84 126.15 3
paintdotnet.fundamentals!PaintDotNet.Functional.LazyResult`2[PaintDotNet.Rendering.RectDouble,System.Int32].get_Value() 43.0 58.0 15.0 2.37 122.24 3
paintdotnet.fundamentals!PaintDotNet.Functional.LazyResult`2[PaintDotNet.Rendering.RectDouble,System.Int32].EnsureEvaluated() 43.0 58.0 15.0 2.37 122.24 3
paintdotnet.fundamentals!PaintDotNet.Functional.LazyResult`2[PaintDotNet.Rendering.RectDouble,System.Int32].ThreadSafeEvaluate() 43.0 58.0 15.0 2.37 122.24 3
paintdotnet.fundamentals!PaintDotNet.Functional.LazyResult`2[PaintDotNet.Rendering.RectDouble,System.Int32].ThreadUnsafeEvaluate() 43.0 58.0 15.0 2.37 122.24 3
paintdotnet.fundamentals!PaintDotNet.Functional.LazyResult+<>c__DisplayClass0_0`1[PaintDotNet.Rendering.RectDouble].b__0(int32) 43.0 58.0 15.0 2.37 122.24 3
system.windows.forms!System.Windows.Forms.Control+ControlCollection::AddRange(System.Windows.Forms.Control[]) 44.0 59.0 15.0 2.37 119.46 3
ntdll!RtlpAllocateHeapInternal 70.0 92.0 22.0 3.47 110.13 3
coreclr!MethodTable::CheckRunClassInitThrowing 398.8 523.3 124.6 19.65 109.47 3
coreclr!ClassLoader::LoadTypeHandleForTypeKey_Body 384.0 503.5 119.5 18.84 109.01 3
coreclr!MethodTable::DoRunClassInitThrowing 398.8 521.3 122.6 19.33 107.71 3
coreclr!MethodTable::RunClassInitEx 397.8 519.3 121.6 19.18 107.10 3
coreclr!DispatchCallDebuggerWrapper 398.8 519.3 120.6 19.02 105.96 3
win32kfull!SfnDWORD 316.0 409.0 93.0 14.67 103.13 3
coreclr!SigPointer::GetTypeHandleThrowing 409.0 522.5 113.5 17.89 97.20 3
paintdotnet.systemlayer!? 332.0 423.0 91.0 14.35 96.05 3
user32!DispatchClientMessage 410.0 508.0 98.0 15.46 83.76 3
clrjit!jitNativeCode 526.0 641.0 115.0 18.14 76.61 3
clrjit!CILJit::compileMethod 527.0 642.0 115.0 18.14 76.47 3
paintdotnet.windows.framework!? 184.0 224.0 40.0 6.31 76.18 3
clrjit!Compiler::compCompileHelper 519.0 628.0 109.0 17.19 73.60 3
ntoskrnl!KeUserModeCallback 424.0 512.0 88.0 13.88 72.73 3
clrjit!Compiler::compCompile 522.0 629.0 107.0 16.88 71.83 3
user32!__fnDWORD 340.0 405.0 65.0 10.25 66.99 3
terrafx.interop.windows!? 407.0 481.0 74.0 11.67 63.71 3
windowsbase.il!dynamicClass.IL_STUB_ReversePInvoke(int64,unsigned int32,int64,int64) 684.0 807.5 123.5 19.47 63.25 3
paintdotnet.windows!? 289.0 339.0 50.0 7.89 60.63 3
system.private.corelib!System.Boolean System.Threading.Tasks.Task::InternalWaitCore(System.Int32, System.Threading.CancellationToken) 52.0 33.0 -19.0 -3.00 -128.04 3
ntdll!LdrpMapDllNtFileName 50.0 31.0 -19.0 -3.00 -133.16 3
d2d1!D2DDeviceContextBase<ID2D1RenderTarget,ID2D1DeviceContext7,ID2D1DeviceContext7>::EndDraw 51.0 31.0 -20.0 -3.15 -137.42 3
system.private.corelib!System.Boolean System.Threading.Tasks.Task::WrappedTryRunInline() 48.0 28.0 -20.0 -3.15 -146.01 3
system.private.corelib!System.Boolean System.Threading.Tasks.TaskScheduler::TryRunInline(System.Threading.Tasks.Task, System.Boolean) 48.0 28.0 -20.0 -3.15 -146.01 3
system.private.corelib!System.Boolean System.Threading.Tasks.ThreadPoolTaskScheduler::TryExecuteTaskInline(System.Threading.Tasks.Task, System.Boolean) 48.0 28.0 -20.0 -3.15 -146.01 3
coreclr!MemberLoader::GetMethodDescFromMethodDef 45.0 24.0 -21.0 -3.31 -163.53 3
coreclr!DomainAssembly::DoIncrementalLoad 39.0 19.0 -20.0 -3.15 -179.70 3
combase!ObjectStublessClient 36.0 17.0 -19.0 -3.00 -184.94 3
combase!ObjectStubless 36.0 17.0 -19.0 -3.00 -184.94 3
system.private.corelib!System.Threading.Thread::StartInternal(System.Threading.ThreadHandle, System.Int32, System.Int32, System.Char*) 22.0 9.0 -13.0 -2.05 -207.07 3
system.private.corelib!System.Threading.Thread::StartCore() 22.0 9.0 -13.0 -2.05 -207.07 3
ntdll!LdrpPrepareModuleForExecution 22.0 9.0 -13.0 -2.05 -207.07 3
coreclr!ThreadNative_Start 22.0 9.0 -13.0 -2.05 -207.07 3
ntoskrnl!ExReleaseResourceAndLeaveCriticalRegion 21.0 8.0 -13.0 -2.05 -216.93 3
coreclr!ThreadNative::Start 21.0 8.0 -13.0 -2.05 -216.93 3
coreclr!ClassLoader::LoadConstructedTypeThrowing 29.0 11.0 -18.0 -2.84 -217.50 3
rpcrt4!NdrpClientCall3 37.0 13.0 -24.0 -3.79 -227.30 3
ntdll!ZwDelayExecution 20.0 7.0 -13.0 -2.05 -227.77 3
win32kbase!UserSessionSwitchLeaveCrit 19.0 6.0 -13.0 -2.05 -239.76 3
fltmgr!FltpPassThroughInternal 20.0 6.0 -14.0 -2.21 -245.29 3
fltmgr!FltpPerformPreCallbacksWorker 22.0 6.0 -16.0 -2.52 -254.85 3
combase!CoCreateInstance 33.0 7.0 -26.0 -4.10 -276.09 3
combase!CComActivator::DoCreateInstance 33.0 7.0 -26.0 -4.10 -276.09 3
combase!ICoCreateInstanceEx 33.0 7.0 -26.0 -4.10 -276.09 3
communitytoolkit.highperformance!? 30.0 6.0 -24.0 -3.79 -280.34 3
d3d10warp!Task_Rasterize 16.0 3.0 -13.0 -2.05 -284.72 3
ntoskrnl!EtwpEventWriteFull 28.0 5.0 -23.0 -3.63 -287.85 3
ntoskrnl!EtwWriteEx 28.0 5.0 -23.0 -3.63 -287.85 3
ntdll!EtwEventWriteTransfer 27.0 3.0 -24.0 -3.79 -311.49 3
combase!CoUnmarshalInterface 28.0 3.0 -25.0 -3.94 -312.88 3
combase!CStdMarshal::UnmarshalObjRef 28.0 3.0 -25.0 -3.94 -312.88 3
win32kfull!xxxRealInternalGetMessage 14.0 1.0 -13.0 -2.05 -325.39 3
coreclr!JIT_GetGenericsGCStaticBase 16.0 1.0 -15.0 -2.37 -328.52 3
d3d10warp!RasterizationStage::RasterizeBufferNoPixelShader 16.0 1.0 -15.0 -2.37 -328.52 3
kernelbase!VirtualAlloc 18.0 1.0 -17.0 -2.68 -330.95 3
coreclr!McGenEventWrite_EventWriteTransfer 23.0 1.0 -22.0 -3.47 -335.19 3
combase!ActivationPropertiesOut::GetObjectInterfaces 28.0 1.0 -27.0 -4.26 -337.91 3
system.windows.forms!System.Windows.Forms.Label::AdjustSize() 48.0 63.0 15.0 2.37 109.51 2
paintdotnet.fundamentals!PaintDotNet.Functional.LazyResult`2[System.__Canon,System.Int32].EnsureEvaluated() 64.0 84.0 20.0 3.15 109.51 2
paintdotnet.fundamentals!PaintDotNet.Functional.LazyResult`2[System.__Canon,System.Int32].ThreadSafeEvaluate() 64.0 84.0 20.0 3.15 109.51 2
system.windows.forms!System.Windows.Forms.Label::OnParentChanged(System.EventArgs) 42.0 55.0 13.0 2.05 108.46 2
ntoskrnl!ObOpenObjectByNameEx 43.0 56.0 13.0 2.05 105.94 2
system.windows.forms!System.Windows.Forms.ToolStripItem::HandlePaint(System.Windows.Forms.PaintEventArgs) 47.0 61.0 14.0 2.21 104.38 2
paintdotnet.fundamentals!PaintDotNet.Functional.LazyResult`2[System.__Canon,System.Int32].ThreadUnsafeEvaluate() 64.0 83.0 19.0 3.00 104.03 2
ntdll!RtlpxVirtualUnwind 48.0 62.0 14.0 2.21 102.21 2
paintdotnet.windows.core!PaintDotNet.Direct2D1.DrawingHelpers.DrawWithErrorHandling(class System.Func`1<!!0>,class System.Func`2<!!0,!!1>) 49.0 63.0 14.0 2.21 100.12 2
ROOT 2221.8 2855.8 634.0 100.00 100.00 2
paintdotnet.objectmodel!? 90.0 115.0 25.0 3.94 97.34 2
paintdotnet.fundamentals!PaintDotNet.Functional.LazyResult`2[System.__Canon,System.Int32].get_Value() 52.0 66.0 14.0 2.21 94.34 2
coreclr!ETW::MethodLog::MethodJitting 82.0 103.0 21.0 3.31 89.74 2
?!? 65.0 81.0 16.0 2.52 86.26 2
coreclr!JIT_ClassInitDynamicClass_Helper 61.0 76.0 15.0 2.37 86.17 2
clrjit!CodeGen::genEmitMachineCode 90.0 112.0 22.0 3.47 85.66 2
paintdotnet.componentmodel!? 85.0 105.0 20.0 3.15 82.45 2
ntdll!LdrpProcessWork 61.0 75.0 14.0 2.21 80.42 2
system.windows.forms!System.Windows.Forms.Control::WmPaint(System.Windows.Forms.Message&) 101.0 124.0 23.0 3.63 79.80 2
clrjit!emitter::emitEndCodeGen 90.0 108.0 18.0 2.84 70.08 2
paintdotnet.core!PaintDotNet.Rendering.TileCompressor`1[PaintDotNet.Imaging.ColorPbgra32].CompressTile(class PaintDotNet.Imaging.IBitmap`1<!0>&) 73.0 86.0 13.0 2.05 62.40 2
system.private.corelib!System.Threading.QueueUserWorkItemCallback::Execute() 151.0 177.0 26.0 4.10 60.34 2
clrjit!CodeGenPhase::DoPhase 161.0 185.0 24.0 3.79 52.24 2
clrjit!CodeGen::genGenerateCode 163.0 186.0 23.0 3.63 49.45 2
coreclr!ETW::SamplingLog::GetCurrentThreadsCallStack 174.0 157.0 -17.0 -2.68 -34.24 2
coreclr!ETW::SamplingLog::SaveCurrentStack 172.0 155.0 -17.0 -2.68 -34.63 2
coreclr!MemberLoader::GetDescFromMemberRef 125.0 112.0 -13.0 -2.05 -36.44 2
coreclr!Thread::VirtualUnwindCallFrame 169.0 147.0 -22.0 -3.47 -45.62 2
ntdll!RtlLookupFunctionEntry 92.0 78.0 -14.0 -2.21 -53.32 2
coreclr!ETW::MethodLog::MethodJitted 93.0 73.0 -20.0 -3.15 -75.36 2
coreclr!ETW::MethodLog::SendMethodILToNativeMapEvent 78.0 57.0 -21.0 -3.31 -94.34 2
clrjit!LinearScan::buildIntervals 46.0 32.0 -14.0 -2.21 -106.65 2
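
The numeric columns in the rows above can be reproduced from the first two values and the ROOT row. The following is a minimal sketch, assuming the columns are base samples (presumably x64), test samples (presumably arm64), delta, the share of the total delta, and the delta relative to what the row would have gained if it had grown at the same rate as ROOT. The formulas and the function name `overweight_row` are inferred from the numbers in the table; they are not taken from the PerfView source. Because the ROOT totals are rounded for display, results agree with the table only to within a few hundredths.

```python
def overweight_row(base, test, total_base, total_test):
    """Recompute delta, share-of-total-delta %, and relative-growth % for one row.

    base/test are the sample counts of one frame in the two traces;
    total_base/total_test are the ROOT counts of the same traces.
    """
    delta = test - base
    total_delta = total_test - total_base
    # Share of the overall regression that this frame accounts for.
    responsibility = 100.0 * delta / total_delta
    # Delta this frame would show if it grew at the same rate as ROOT.
    expected_delta = base * total_delta / total_base
    # 100 means "grew exactly as much as the whole trace did".
    overweight = 100.0 * delta / expected_delta
    return delta, responsibility, overweight


# ROOT row from the table above (rounded values as displayed).
ROOT_BASE, ROOT_TEST = 2221.8, 2855.8

# A few rows copied from the table: name -> (base, test).
rows = {
    "coreclr!ThePreStub": (1049.0, 1297.0),
    "win32kfull!NtUserMessageCall": (71.0, 27.0),
    "clrjit!LinearScan::allocateRegisters": (26.0, 49.0),
}

for name, (base, test) in rows.items():
    delta, resp, over = overweight_row(base, test, ROOT_BASE, ROOT_TEST)
    print(f"{name}: delta={delta:.1f} share={resp:.2f}% relative={over:.2f}%")
```

Read this way, a row such as `ntoskrnl!ExpAllocatePoolWithTagFromNode` (1.0 to 24.0) gets a very large last percentage because its base is tiny, even though it accounts for only a small share of the total delta.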
@kunalspathak
Author

> I did a run of aspnet, and connected perf immediately so that it gets all the initialisation.

Just so that I understand: you ran dotnet Benchmarks.dll and then attached perf to it? That would be barely 1 second's worth of profile, and it might be hard to time it to capture everything in startup. For PDN, we recorded the traces by starting PDN itself under the profiler.

@a74nh

a74nh commented Jun 21, 2023

> Just so that I understand: you ran dotnet Benchmarks.dll and then attached perf to it? That would be barely 1 second's worth of profile, and it might be hard to time it to capture everything in startup. For PDN, we recorded the traces by starting PDN itself under the profiler.

Yes, I ran dotnet Benchmarks.dll, immediately attached perf, and then did a run of wrk. So it is slightly different, but agreed, it's not ideal.

(I also need to fix my setup. I'm using https://github.com/microsoft/perfview/blob/main/src/perfcollect/perfcollect to run perf as it seems to be doing some extra things for dotnet. However, it only allows attaching or tracing the whole system. So I need to stop using it or fix the script.)

I had a look around and found https://www.pinta-project.com/, which is a cross-platform alternative to paint.net. I gave this a quick try on Linux.

That gives me 0.4% in strcmp on X64 and 0.2% on Arm64. I don't see any calls to FlushInstructionCache. If you think it's useful, I can have a look deeper into this for Linux.

@kunalspathak
Author

kunalspathak commented Jun 21, 2023

> I had a look around and found https://www.pinta-project.com/, which is a cross-platform alternative to paint.net

Thanks for finding that. I was not aware of this.

> That gives me 0.4% in strcmp on X64 and 0.2% on Arm64. I don't see any calls to FlushInstructionCache. If you think it's useful, I can have a look deeper into this for Linux.

Let me try it first with the setup that I have and see what I notice. If I see FlushInstructionCache, that will support the theory that it is more expensive on Windows than on Linux.
