
@fredeil
Last active November 19, 2018 10:05
Benchmarking parsing of Guids in .NET Core using BenchmarkDotNet

A note on garbage collection

In the common language runtime (CLR), the garbage collector keeps track of objects that are no longer being used and returns the memory they occupied to the heap. Because garbage collection can be expensive, the CLR only collects when it needs to.

Generations:

The heap is organized into three generations:

  • Gen 0: This is the youngest generation and contains short-lived objects. An example of a short-lived object is a temporary variable. Garbage collection occurs most frequently in this generation.

  • Gen 1: This generation contains short-lived objects and serves as a buffer between short-lived objects and long-lived objects.

  • Gen 2: This generation contains long-lived objects. An example of a long-lived object is an object in a server application that contains static data that is live for the duration of the process.

Collecting a generation means collecting objects in that generation and all younger generations. A generation 2 collection is also known as a full garbage collection, because it reclaims unreachable objects in every generation of the managed heap.
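The generational behavior described above can be observed directly with GC.GetGeneration. This is a minimal sketch; exact promotion timing can vary with GC mode and runtime version:

```csharp
using System;

class GenerationsDemo
{
    static void Main()
    {
        var obj = new object();
        Console.WriteLine(GC.GetGeneration(obj)); // new objects start in Gen 0

        GC.Collect(); // blocking collection; survivors are promoted one generation
        Console.WriteLine(GC.GetGeneration(obj)); // typically 1

        GC.Collect();
        Console.WriteLine(GC.GetGeneration(obj)); // typically 2, where it now stays

        Console.WriteLine(GC.MaxGeneration); // 2: the heap has three generations
    }
}
```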

Benchmarking results

The code for exercise 1 was benchmarked using BenchmarkDotNet (BDN), which is a really good tool for performance investigations in .NET.

BenchmarkDotNet=v0.11.1, OS=macOS High Sierra 10.13.6 (17G65) [Darwin 17.7.0]
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=2.1.402
  Core   : .NET Core 2.1.4 (CoreCLR 4.6.26814.03, CoreFX 4.6.26814.02), 64bit RyuJIT
|               Method | NumClaims |            Mean | Scaled |    Gen 0 |    Gen 1 |    Gen 2 | Allocated |
|--------------------- |---------- |----------------:|-------:|---------:|---------:|---------:|----------:|
| ExtractClaims_String |        10 |      3,039.8 ns |   16.9 |   0.0687 |        - |        - |     224 B |
|  ExtractClaims_Bytes |        10 |        179.8 ns |   1.00 |    0.071 |        - |        - |     224 B |
| ExtractClaims_String |      1000 |    455,802.3 ns |  26.18 |   4.8828 |        - |        - |   16064 B |
|  ExtractClaims_Bytes |      1000 |     17,466.2 ns |   1.00 |   5.0964 |        - |        - |   16064 B |
| ExtractClaims_String |     10000 |  4,703,344.5 ns |  23.62 |  39.0625 |  39.0625 |  39.0625 |  160064 B |
|  ExtractClaims_Bytes |     10000 |    199,113.5 ns |   1.00 |  51.7578 |  47.3633 |  47.3633 |  160064 B |
| ExtractClaims_String |    100000 | 49,042,150.1 ns |  22.45 |        - |        - |        - | 1600064 B |
|  ExtractClaims_Bytes |    100000 |  2,191,978.9 ns |   1.00 | 136.7188 | 132.8125 | 132.8125 | 1600064 B |

Legend:

  • NumClaims: Number of generated claims
  • Mean: Arithmetic mean of all measurements
  • Scaled: Mean(CurrentBenchmark) / Mean(BaselineBenchmark)
  • Gen 0: GC Generation 0 collects per 1k Operations
  • Gen 1: GC Generation 1 collects per 1k Operations
  • Gen 2: GC Generation 2 collects per 1k Operations
  • Allocated: Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)

Disclaimer: some data columns were removed to keep the table small
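To make the Gen columns concrete (my arithmetic, not part of the BDN output): 39.0625 Gen 2 collections per 1000 operations, as for ExtractClaims_String at NumClaims = 10000, works out to one full collection roughly every 26 operations:

```csharp
using System;

class GcColumnMath
{
    static void Main()
    {
        // Gen 2 column for ExtractClaims_String at NumClaims = 10000
        double collectsPer1kOps = 39.0625;

        double opsPerCollect = 1000 / collectsPer1kOps;
        Console.WriteLine(opsPerCollect); // 25.6 operations per full GC
    }
}
```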

Discussion

Both methods use Span<T>, more specifically ReadOnlySpan<T>, which is designed to work with strings and other immutable types. This is a simple value type that lets us work with any contiguous memory while ensuring memory and type safety, with almost no overhead. One of its core features is slicing (taking part of some memory): slicing does not copy any memory, it simply creates a new Span<T> with a different pointer and length. This means that slicing allocates no managed heap memory.
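The claim that slicing allocates nothing can be checked with GC.GetAllocatedBytesForCurrentThread (a sketch, assuming .NET Core 2.1+ where that API is available):

```csharp
using System;

class SlicingDemo
{
    static void Main()
    {
        string data = new string('a', 1000);
        ReadOnlySpan<char> span = data; // no copy: the span points into the string's memory

        int total = 0;
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < 100; i++)
        {
            // Slice only adjusts a pointer and a length; no heap allocation
            ReadOnlySpan<char> slice = span.Slice(i, 36);
            total += slice.Length; // use the slice so it is not optimized away
        }
        long after = GC.GetAllocatedBytesForCurrentThread();

        Console.WriteLine(after - before); // typically 0 managed bytes allocated
    }
}
```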

The Allocated column indicates that both methods allocate the same amount of memory in every benchmark. The only allocations are the List<Guid> (plus its object header and method table pointer) and the backing array holding the N Guids, which are structs (value types). For NumClaims = 10000 we can clearly see that the garbage collector is doing full garbage collections, and roughly the same amount is being collected in both methods (BDN uses heuristics when running these benchmarks, so the number of invocations can differ between runs). The total amount of allocated memory is also somewhat run-dependent; the CLR does some alignment, e.g., if you allocate a new byte[7] array, it may actually reserve space as if it were a byte[8].
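A rough accounting of the 16,064 B figure for NumClaims = 1000 (my arithmetic, not from the source): 1000 Guids at 16 B each is 16,000 B in the List's backing Guid[] array, plus object headers for the List and the array (roughly 64 B on 64-bit). A sketch:

```csharp
using System;
using System.Collections.Generic;

class AllocationAccounting
{
    static void Main()
    {
        const int numClaims = 1000;

        long before = GC.GetAllocatedBytesForCurrentThread();
        var guids = new List<Guid>(numClaims); // allocates the List object and a Guid[1000]
        long after = GC.GetAllocatedBytesForCurrentThread();

        GC.KeepAlive(guids);
        // roughly 16,064 B: 16,000 B of Guid slots plus List/array headers
        Console.WriteLine(after - before);
    }
}
```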

For NumClaims = 100000 we see that ExtractClaims_String causes no garbage collections at all, an interesting result that I can't fully explain. It may be because all objects over 85,000 bytes are put on the Large Object Heap (LOH) and handled differently.

As we can see, ExtractClaims_String is always slower, which can be seen by looking at the Scaled column of the table. For NumClaims = 1000 it is 26 times slower than ExtractClaims_Bytes, and about 22 times slower on average. This is a significant difference in performance.

If you were to parse 1000 claims 1,000,000,000 times each day over the course of a year, with an average run time of 17,466.2 ns per 1000 claims (about 17 ns per claim), you are looking at:

ExtractClaims_Bytes

1x10^12 claims/day * 17 ns/claim = 17,000 s/day ≈ 4.72 h/day * 0.14$/h ≈ 0.66$/day
0.66$/day * 365 days ≈ 241$/year.

If you were to use ExtractClaims_String, which is 26 times slower for NumClaims = 1000:

241$/year * 26 ≈ 6271$/year.
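The estimate above as checkable arithmetic (the $0.14/h rate and the rounded 17 ns/claim figure come from the text):

```csharp
using System;

class CostEstimate
{
    static void Main()
    {
        double secondsPerClaim = 17e-9;   // ~17,466 ns per 1000 claims, rounded
        double claimsPerDay = 1e9 * 1000; // 10^9 batches of 1000 claims = 10^12 claims
        double dollarsPerHour = 0.14;

        double computeHoursPerDay = secondsPerClaim * claimsPerDay / 3600; // ≈ 4.72 h
        double dollarsPerYear = computeHoursPerDay * dollarsPerHour * 365;

        Console.WriteLine(Math.Round(dollarsPerYear)); // 241
    }
}
```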
using System;
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Util;

namespace Benchmarking
{
    [MemoryDiagnoser, CoreJob]
    public class ExtractClaimsBenchmarks
    {
        private string Data;
        private byte[] byteData;

        [Params(10, 1000, 10_000, 100_000)]
        public int NumClaims { get; set; }

        [GlobalSetup]
        public void Setup()
        {
            Data = DataGeneration.GenerateStringClaims(NumClaims);
            byteData = DataGeneration.GenerateByteArrayClaims(NumClaims);
        }

        // Named to match the results table above
        [Benchmark]
        public List<Guid> ExtractClaims_String() => ClaimParsing.ExtractClaims(Data);

        [Benchmark(Baseline = true)]
        public List<Guid> ExtractClaims_Bytes() => ClaimParsing.ExtractClaims(byteData);
    }

    public class Program
    {
        public static void Main()
        {
            var summary = BenchmarkRunner.Run<ExtractClaimsBenchmarks>();
        }
    }
}
using System;
using System.Collections.Generic;

namespace Util
{
    public static class ClaimParsing
    {
        public static List<Guid> ExtractClaims(string data)
        {
            // A span of UTF-16 chars over the string, no copy
            ReadOnlySpan<char> chars = data;

            // By setting the capacity up front we avoid the overhead
            // of reallocating the backing array as the list grows
            var guids = new List<Guid>(data.Length / 36);
            for (int i = 0; i < data.Length; i += 36)
            {
                // Slicing causes no allocation overhead,
                // and is much faster than Substring/LINQ etc.
                guids.Add(Guid.Parse(chars.Slice(i, 36)));
            }
            return guids;
        }

        public static List<Guid> ExtractClaims(byte[] data)
        {
            ReadOnlySpan<byte> bytes = data;

            // By setting the capacity up front we avoid the overhead
            // of reallocating the backing array as the list grows
            var guids = new List<Guid>(data.Length / 16);
            for (int i = 0; i < data.Length; i += 16)
            {
                guids.Add(new Guid(bytes.Slice(i, 16)));
            }
            return guids;
        }
    }
}
using System;
using System.Text;

namespace Util
{
    public static class DataGeneration
    {
        private static readonly Random random = new Random();
        private static readonly string[] regions = { "EUR", "CAN", "JPN", "US", "CHN", "GER", "APAC", "AUS" };

        // Random.Next's upper bound is exclusive, so use regions.Length here;
        // regions.Length - 1 would never return the last region
        public static string GetRandomRegion() => regions[random.Next(regions.Length)];

        public static string GenerateStringClaims(int numClaims)
        {
            if (numClaims < 1)
                throw new ArgumentOutOfRangeException(nameof(numClaims));

            var sb = new StringBuilder(numClaims * 36);
            for (var i = 0; i < numClaims; i++)
                sb.Append(Guid.NewGuid().ToString());

            return sb.ToString();
        }

        public static byte[] GenerateByteArrayClaims(int numClaims)
        {
            if (numClaims < 1)
                throw new ArgumentOutOfRangeException(nameof(numClaims));

            var buffer = new byte[numClaims * 16];
            for (int i = 0; i < numClaims * 16; i += 16)
                Array.Copy(Guid.NewGuid().ToByteArray(), 0, buffer, i, 16);

            return buffer;
        }
    }
}
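As a quick standalone check (not part of the gist) that the two parsing paths used above round-trip correctly:

```csharp
using System;

class SanityCheck
{
    static void Main()
    {
        var guid = Guid.NewGuid();

        // String path: 36-char "D" format, parsed from a char span
        ReadOnlySpan<char> chars = guid.ToString();
        var fromChars = Guid.Parse(chars.Slice(0, 36));

        // Byte path: 16-byte layout from ToByteArray, parsed from a byte span
        ReadOnlySpan<byte> bytes = guid.ToByteArray();
        var fromBytes = new Guid(bytes.Slice(0, 16));

        Console.WriteLine(fromChars == guid && fromBytes == guid); // True
    }
}
```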