
@fredeil
Last active November 19, 2018 10:05
Benchmarking parsing of Guids in .NET Core using BenchmarkDotNet

A note on garbage collection

In the common language runtime (CLR), the garbage collector keeps track of objects that are no longer being used and returns the memory they occupied to the heap. Because garbage collection can be expensive, the CLR only collects when it needs to.

Generations:

The heap is organized into three generations:

  • Gen 0: This is the youngest generation and contains short-lived objects. An example of a short-lived object is a temporary variable. Garbage collection occurs most frequently in this generation.

  • Gen 1: This generation contains short-lived objects and serves as a buffer between short-lived objects and long-lived objects.

  • Gen 2: This generation contains long-lived objects. An example of a long-lived object is an object in a server application that contains static data that is live for the duration of the process.

Collecting a generation means collecting objects in that generation and all younger generations. A generation 2 collection is also known as a full garbage collection, because it reclaims unreachable objects in every generation of the managed heap.
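The generational behavior described above can be observed directly with GC.GetGeneration. This is a minimal sketch; exact promotion timing can vary with GC mode and runtime version:

```csharp
using System;

class GenerationsDemo
{
    static void Main()
    {
        var obj = new object();
        Console.WriteLine(GC.GetGeneration(obj)); // new objects start in Gen 0

        GC.Collect(); // blocking collection; survivors are promoted one generation
        Console.WriteLine(GC.GetGeneration(obj)); // typically 1

        GC.Collect();
        Console.WriteLine(GC.GetGeneration(obj)); // typically 2, where it now stays

        Console.WriteLine(GC.MaxGeneration); // 2: the heap has three generations
    }
}
```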

Benchmarking results

The code for exercise 1 was benchmarked using BenchmarkDotNet (BDN), which is a really good tool for performance investigations in .NET.

BenchmarkDotNet=v0.11.1, OS=macOS High Sierra 10.13.6 (17G65) [Darwin 17.7.0]
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=2.1.402
  Core   : .NET Core 2.1.4 (CoreCLR 4.6.26814.03, CoreFX 4.6.26814.02), 64bit RyuJIT
|               Method | NumClaims |            Mean | Scaled |    Gen 0 |    Gen 1 |    Gen 2 | Allocated |
|--------------------- |---------- |----------------:|-------:|---------:|---------:|---------:|----------:|
| ExtractClaims_String |        10 |      3,039.8 ns |   16.9 |   0.0687 |        - |        - |     224 B |
|  ExtractClaims_Bytes |        10 |        179.8 ns |   1.00 |    0.071 |        - |        - |     224 B |
| ExtractClaims_String |      1000 |    455,802.3 ns |  26.18 |   4.8828 |        - |        - |   16064 B |
|  ExtractClaims_Bytes |      1000 |     17,466.2 ns |   1.00 |   5.0964 |        - |        - |   16064 B |
| ExtractClaims_String |     10000 |  4,703,344.5 ns |  23.62 |  39.0625 |  39.0625 |  39.0625 |  160064 B |
|  ExtractClaims_Bytes |     10000 |    199,113.5 ns |   1.00 |  51.7578 |  47.3633 |  47.3633 |  160064 B |
| ExtractClaims_String |    100000 | 49,042,150.1 ns |  22.45 |        - |        - |        - | 1600064 B |
|  ExtractClaims_Bytes |    100000 |  2,191,978.9 ns |   1.00 | 136.7188 | 132.8125 | 132.8125 | 1600064 B |

Legend:

  • NumClaims: Number of generated claims
  • Mean: Arithmetic mean of all measurements
  • Scaled: Mean(CurrentBenchmark) / Mean(BaselineBenchmark)
  • Gen 0: GC Generation 0 collects per 1k Operations
  • Gen 1: GC Generation 1 collects per 1k Operations
  • Gen 2: GC Generation 2 collects per 1k Operations
  • Allocated: Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)

Disclaimer: some data columns were removed to keep the table small
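To make the Gen columns concrete (my arithmetic, not part of the BDN output): 39.0625 Gen 2 collections per 1000 operations, as for ExtractClaims_String at NumClaims = 10000, works out to one full collection roughly every 26 operations:

```csharp
using System;

class GcColumnMath
{
    static void Main()
    {
        // Gen 2 column for ExtractClaims_String at NumClaims = 10000
        double collectsPer1kOps = 39.0625;

        double opsPerCollect = 1000 / collectsPer1kOps;
        Console.WriteLine(opsPerCollect); // 25.6 operations per full GC
    }
}
```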

Discussion

Both methods use Span<T>, more specifically ReadOnlySpan<T>, which is designed to work with strings and other immutable types. This is a simple value type that lets us work with any contiguous memory while ensuring memory and type safety, with almost no overhead. One of its core features is slicing (taking part of some memory): slicing does not copy any memory, it simply creates a new Span<T> with a different pointer and length. This means that slicing allocates no managed heap memory.
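The claim that slicing allocates nothing can be checked with GC.GetAllocatedBytesForCurrentThread (a sketch, assuming .NET Core 2.1+ where that API is available):

```csharp
using System;

class SlicingDemo
{
    static void Main()
    {
        string data = new string('a', 1000);
        ReadOnlySpan<char> span = data; // no copy: the span points into the string's memory

        int total = 0;
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < 100; i++)
        {
            // Slice only adjusts a pointer and a length; no heap allocation
            ReadOnlySpan<char> slice = span.Slice(i, 36);
            total += slice.Length; // use the slice so it is not optimized away
        }
        long after = GC.GetAllocatedBytesForCurrentThread();

        Console.WriteLine(after - before); // typically 0 managed bytes allocated
    }
}
```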

The Allocated column indicates that both methods allocate the same amount of memory in every benchmark. The only allocations are the List<Guid> (plus its object header and method table pointer) and the backing array holding the N Guids, which are structs (value types). For NumClaims = 10000 we can clearly see that the garbage collector is doing full garbage collections, and roughly the same amount is being collected in both methods (BDN uses heuristics when running these benchmarks, so the number of invocations can differ between runs). The total amount of allocated memory is also somewhat run-dependent; the CLR does some alignment, e.g., if you allocate a new byte[7] array, it may actually reserve space as if it were a byte[8].
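A rough accounting of the 16,064 B figure for NumClaims = 1000 (my arithmetic, not from the source): 1000 Guids at 16 B each is 16,000 B in the List's backing Guid[] array, plus object headers for the List and the array (roughly 64 B on 64-bit). A sketch:

```csharp
using System;
using System.Collections.Generic;

class AllocationAccounting
{
    static void Main()
    {
        const int numClaims = 1000;

        long before = GC.GetAllocatedBytesForCurrentThread();
        var guids = new List<Guid>(numClaims); // allocates the List object and a Guid[1000]
        long after = GC.GetAllocatedBytesForCurrentThread();

        GC.KeepAlive(guids);
        // roughly 16,064 B: 16,000 B of Guid slots plus List/array headers
        Console.WriteLine(after - before);
    }
}
```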

For NumClaims = 100000 we see that ExtractClaims_String causes no garbage collections at all, an interesting result that I can't fully explain. It may be because all objects over 85,000 bytes are put on the Large Object Heap (LOH) and handled differently.

As we can see, ExtractClaims_String is always slower, which can be seen by looking at the Scaled column of the table. For NumClaims = 1000 it is 26 times slower than ExtractClaims_Bytes, and about 22 times slower on average. This is a significant difference in performance.

If you were to parse 1000 claims 1,000,000,000 times each day over the course of a year, with an average run time of 17,466.2 ns per 1000 claims (about 17 ns per claim), you are looking at:

ExtractClaims_Bytes

1x10^12 claims/day * 17 ns/claim = 17,000 s/day ≈ 4.72 h/day * 0.14$/h ≈ 0.66$/day
0.66$/day * 365 days ≈ 241$/year.

If you were to use ExtractClaims_String, which is 26 times slower for NumClaims = 1000:

241$/year * 26 ≈ 6271$/year.
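The estimate above as checkable arithmetic (the $0.14/h rate and the rounded 17 ns/claim figure come from the text):

```csharp
using System;

class CostEstimate
{
    static void Main()
    {
        double secondsPerClaim = 17e-9;   // ~17,466 ns per 1000 claims, rounded
        double claimsPerDay = 1e9 * 1000; // 10^9 batches of 1000 claims = 10^12 claims
        double dollarsPerHour = 0.14;

        double computeHoursPerDay = secondsPerClaim * claimsPerDay / 3600; // ≈ 4.72 h
        double dollarsPerYear = computeHoursPerDay * dollarsPerHour * 365;

        Console.WriteLine(Math.Round(dollarsPerYear)); // 241
    }
}
```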
using System;
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Util;

namespace Benchmarking
{
    [MemoryDiagnoser, CoreJob]
    public class ExtractClaimsBenchmarks
    {
        private string Data;
        private byte[] byteData;

        [Params(10, 1000, 10_000, 100_000)]
        public int NumClaims { get; set; }

        [GlobalSetup]
        public void Setup()
        {
            Data = DataGeneration.GenerateStringClaims(NumClaims);
            byteData = DataGeneration.GenerateByteArrayClaims(NumClaims);
        }

        // Named to match the results table above
        [Benchmark]
        public List<Guid> ExtractClaims_String() => ClaimParsing.ExtractClaims(Data);

        [Benchmark(Baseline = true)]
        public List<Guid> ExtractClaims_Bytes() => ClaimParsing.ExtractClaims(byteData);
    }

    public class Program
    {
        public static void Main()
        {
            var summary = BenchmarkRunner.Run<ExtractClaimsBenchmarks>();
        }
    }
}
using System;
using System.Collections.Generic;

namespace Util
{
    public static class ClaimParsing
    {
        public static List<Guid> ExtractClaims(string data)
        {
            // A span of UTF-16 chars over the string, no copy
            ReadOnlySpan<char> chars = data;

            // By setting the capacity up front we avoid the overhead
            // of reallocating the backing array as the list grows
            var guids = new List<Guid>(data.Length / 36);
            for (int i = 0; i < data.Length; i += 36)
            {
                // Slicing causes no allocation overhead,
                // and is much faster than Substring/LINQ etc.
                guids.Add(Guid.Parse(chars.Slice(i, 36)));
            }
            return guids;
        }

        public static List<Guid> ExtractClaims(byte[] data)
        {
            ReadOnlySpan<byte> bytes = data;

            // By setting the capacity up front we avoid the overhead
            // of reallocating the backing array as the list grows
            var guids = new List<Guid>(data.Length / 16);
            for (int i = 0; i < data.Length; i += 16)
            {
                guids.Add(new Guid(bytes.Slice(i, 16)));
            }
            return guids;
        }
    }
}
using System;
using System.Text;

namespace Util
{
    public static class DataGeneration
    {
        private static readonly Random random = new Random();
        private static readonly string[] regions = { "EUR", "CAN", "JPN", "US", "CHN", "GER", "APAC", "AUS" };

        // Random.Next's upper bound is exclusive, so use regions.Length here;
        // regions.Length - 1 would never return the last region
        public static string GetRandomRegion() => regions[random.Next(regions.Length)];

        public static string GenerateStringClaims(int numClaims)
        {
            if (numClaims < 1)
                throw new ArgumentOutOfRangeException(nameof(numClaims));

            var sb = new StringBuilder(numClaims * 36);
            for (var i = 0; i < numClaims; i++)
                sb.Append(Guid.NewGuid().ToString());

            return sb.ToString();
        }

        public static byte[] GenerateByteArrayClaims(int numClaims)
        {
            if (numClaims < 1)
                throw new ArgumentOutOfRangeException(nameof(numClaims));

            var buffer = new byte[numClaims * 16];
            for (int i = 0; i < numClaims * 16; i += 16)
                Array.Copy(Guid.NewGuid().ToByteArray(), 0, buffer, i, 16);

            return buffer;
        }
    }
}
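As a quick standalone check (not part of the gist) that the two parsing paths used above round-trip correctly:

```csharp
using System;

class SanityCheck
{
    static void Main()
    {
        var guid = Guid.NewGuid();

        // String path: 36-char "D" format, parsed from a char span
        ReadOnlySpan<char> chars = guid.ToString();
        var fromChars = Guid.Parse(chars.Slice(0, 36));

        // Byte path: 16-byte layout from ToByteArray, parsed from a byte span
        ReadOnlySpan<byte> bytes = guid.ToByteArray();
        var fromBytes = new Guid(bytes.Slice(0, 16));

        Console.WriteLine(fromChars == guid && fromBytes == guid); // True
    }
}
```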