Egor Bogatov EgorBo

EgorBo / Dynamic PGO in .NET 6.0.md
Last active Nov 29, 2021

Dynamic PGO in .NET 6.0

Dynamic PGO (profile-guided optimization) is a JIT-compiler optimization technique: the JIT collects additional information about how the code actually runs (the profile) in tier0 codegen and relies on it later, when hot methods are promoted from tier0 to tier1, to make them even more efficient.

What exactly can PGO optimize for us?

  1. Profile-driven inlining - the inliner relies on PGO data, so it can be very aggressive on hot paths and care less about cold ones; see dotnet/runtime#52708 and dotnet/runtime#55478. A good example where this has a visible effect is this StringBuilder benchmark:

  2. Guarded devirtualization - most monomorphic virtual/interface calls can be devirtualized using PGO data, e.g.:

void DisposeMe(IDisposable d)
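
The preview above stops at the method signature. As a rough illustration of what guarded devirtualization amounts to, here is a hand-written C# approximation. The `MyDisposable` type and the `GdvSketch` class are hypothetical names introduced for this sketch, and the explicit type check only mimics, at source level, the guard the JIT would emit on its own when the profile shows one dominant type; it is not the JIT's actual output.

```csharp
using System;

// Hypothetical type standing in for the class the profile saw most often.
public sealed class MyDisposable : IDisposable
{
    public int DisposeCount;
    public void Dispose() => DisposeCount++;
}

public static class GdvSketch
{
    // What the source code looks like: a plain interface call.
    public static void DisposeMe(IDisposable d) => d.Dispose();

    // Hand-written approximation of the guarded form (an assumption for
    // illustration, not the code the JIT really generates).
    public static void DisposeMeGuarded(IDisposable d)
    {
        if (d.GetType() == typeof(MyDisposable))
        {
            // Fast path: the exact type is known, so this is a direct call
            // that can be inlined.
            ((MyDisposable)d).Dispose();
        }
        else
        {
            // Fallback: ordinary interface dispatch for any other type.
            d.Dispose();
        }
    }
}
```

Both methods behave the same for every input; the guarded version only trades one cheap type comparison for the chance to call (and inline) the common implementation directly.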
View 61293-IgnoreStructsAndRefs.md

benchmarks.run.windows.arm64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 7603860 (overridden on cmd)
Total bytes of diff: 7582960 (overridden on cmd)
Total bytes of delta: -20900 (-0.27 % of base)
EgorBo / dotnet-runtime-commit.cs
Created Nov 7, 2021
using System.Diagnostics;
using System.Text.RegularExpressions;
using System.Xml.Linq;
class Program
{
    static void Main() => Console.WriteLine($"dotnet/runtime commit: {GetDotnetRuntimeCommit()}");
    public static string GetDotnetRuntimeCommit()
    {
EgorBo / GDV_for_delegates.cs
Last active Oct 29, 2021
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
//
EgorBo / ConstextInsensitivePGO.cs
Created Oct 27, 2021
using System;
using System.Threading;
public class Prog
{
    static void DoWorkCommon(JobFactory factory, int i) => factory.CreateJob(i).DoWork();
    static void DoWork1(JobFactory factory) => DoWorkCommon(factory, 1);
    static void DoWork2(JobFactory factory) => DoWorkCommon(factory, 2);
View vpopcnt.cs
static Vector128<uint> VPopcnt(Vector128<uint> dataArg)
{
    var data = dataArg; // workaround for a codegen issue
    data = Sse2.Subtract(data, Sse2.And(Sse2.ShiftRightLogical(data, 1), Vector128.Create((uint)0x55555555)));
    data = Sse2.Add(Sse2.And(data, Vector128.Create((uint)0x33333333)),
        Sse2.And(Sse2.ShiftRightLogical(data, 2), Vector128.Create((uint)0x33333333)));
    data = Sse2.And(Sse2.Add(data, Sse2.ShiftRightLogical(data, 4)), Vector128.Create((uint)0x0F0F0F0F));
    data = Sse2.Add(data, Sse2.ShiftRightLogical(data, 8));
    data = Sse2.Add(data, Sse2.ShiftRightLogical(data, 16));
    data = Sse2.And(data, Vector128.Create((uint)0x0000003F));
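
The preview is cut off before the method returns. To make the bit trick easier to follow, here is a scalar sketch that applies the same sequence of steps to a single `uint`, which is what each of the four lanes goes through in the vector version. The `PopcntSketch` class and `PopCountScalar` method are hypothetical names for this illustration and are not part of the original gist.

```csharp
using System;

public static class PopcntSketch
{
    // Scalar counterpart of the per-lane steps in VPopcnt (illustrative only).
    public static uint PopCountScalar(uint x)
    {
        // Each 2-bit group now holds the number of set bits it contained.
        x = x - ((x >> 1) & 0x55555555u);
        // Sum neighbouring 2-bit counts into 4-bit groups.
        x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
        // Sum neighbouring 4-bit counts into one count per byte.
        x = (x + (x >> 4)) & 0x0F0F0F0Fu;
        // Fold the four byte counts into the lowest byte.
        x = x + (x >> 8);
        x = x + (x >> 16);
        // The total is at most 32, so six bits are enough.
        return x & 0x0000003Fu;
    }
}
```

A quick sanity check is to compare the result against `System.Numerics.BitOperations.PopCount` for a handful of values.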
EgorBo / Volatile-vs-volatile.cs
Created Oct 10, 2021
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics.Arm;
using System.Threading;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Order;
using BenchmarkDotNet.Running;
View trie.cs
using BenchmarkDotNet.Attributes;
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
public class Program
{
    public static void Main()
    {
        BenchmarkDotNet.Running.BenchmarkRunner.Run<Program>();
View old.asm
; Assembly listing for method Tests:Test(ClassAFactoryFactory):long
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
; ReadyToRun compilation
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 4, 3.50) ref -> rcx class-hnd single-def
View new.asm
; Assembly listing for method Tests:Test(ClassAFactoryFactory):long
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
; ReadyToRun compilation
; optimized code
; optimized using profile data
; rsp based frame
; partially interruptible
; with PGO: edge weights are valid, and fgCalledCount is 2e+08
; 0 inlinees with PGO data; 3 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments