In the common language runtime (CLR), the garbage collector keeps track of objects that are no longer being used. The CLR reclaims the memory those objects occupied and returns it to the heap. Garbage collection can be expensive, so the CLR only collects garbage when it needs to.
Generations:
The heap is organized into three generations:
- Gen 0: This is the youngest generation and contains short-lived objects, such as temporary variables. Garbage collection occurs most frequently in this generation.
- Gen 1: This generation also contains short-lived objects and serves as a buffer between short-lived and long-lived objects.
- Gen 2: This generation contains long-lived objects. An example of a long-lived object is an object in a server application that contains static data that is live for the duration of the process.
Collecting a generation means collecting objects in that generation and all its younger generations. A generation 2 garbage collection is also known as a full garbage collection, because it reclaims every object in the managed heap.
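As a small illustration (a minimal sketch; exact promotion behavior can vary with GC mode and runtime version), `GC.GetGeneration` lets us watch an object move through the generations as it survives induced collections:

```csharp
using System;

class GenerationsDemo
{
    static void Main()
    {
        var obj = new object();
        Console.WriteLine(GC.GetGeneration(obj)); // freshly allocated → Gen 0

        GC.Collect();                             // obj survives → promoted
        Console.WriteLine(GC.GetGeneration(obj)); // typically Gen 1

        GC.Collect();                             // survives again
        Console.WriteLine(GC.GetGeneration(obj)); // typically Gen 2 (long-lived from now on)
    }
}
```

Because `GC.Collect()` with no arguments collects all generations, each call here is a "full" collection in the sense described above.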
The code for exercise 1 was benchmarked using BenchmarkDotNet (BDN), a widely used tool for performance investigations in .NET.
```
BenchmarkDotNet=v0.11.1, OS=macOS High Sierra 10.13.6 (17G65) [Darwin 17.7.0]
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=2.1.402
Core : .NET Core 2.1.4 (CoreCLR 4.6.26814.03, CoreFX 4.6.26814.02), 64bit RyuJIT
```
| Method | NumClaims | Mean | Scaled | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| ExtractClaims_String | 10 | 3,039.8 ns | 16.90 | 0.0687 | - | - | 224 B |
| ExtractClaims_Bytes | 10 | 179.8 ns | 1.00 | 0.0710 | - | - | 224 B |
| ExtractClaims_String | 1000 | 455,802.3 ns | 26.18 | 4.8828 | - | - | 16064 B |
| ExtractClaims_Bytes | 1000 | 17,466.2 ns | 1.00 | 5.0964 | - | - | 16064 B |
| ExtractClaims_String | 10000 | 4,703,344.5 ns | 23.62 | 39.0625 | 39.0625 | 39.0625 | 160064 B |
| ExtractClaims_Bytes | 10000 | 199,113.5 ns | 1.00 | 51.7578 | 47.3633 | 47.3633 | 160064 B |
| ExtractClaims_String | 100000 | 49,042,150.1 ns | 22.45 | - | - | - | 1600064 B |
| ExtractClaims_Bytes | 100000 | 2,191,978.9 ns | 1.00 | 136.7188 | 132.8125 | 132.8125 | 1600064 B |
Legend:
- NumClaims: Number of generated claims
- Mean: Arithmetic mean of all measurements
- Scaled: Mean(CurrentBenchmark) / Mean(BaselineBenchmark)
- Gen 0: GC Generation 0 collects per 1k Operations
- Gen 1: GC Generation 1 collects per 1k Operations
- Gen 2: GC Generation 2 collects per 1k Operations
- Allocated: Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
Disclaimer: some data columns were removed to keep the table small
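For context, a BDN setup producing a table like the one above might look roughly like this (a hypothetical reconstruction; the real `ExtractClaims` implementations are not shown in this section):

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // enables the Gen 0/1/2 and Allocated columns
public class ClaimsBenchmarks
{
    [Params(10, 1000, 10000, 100000)]
    public int NumClaims;

    [Benchmark]
    public void ExtractClaims_String() { /* string-based parsing */ }

    [Benchmark(Baseline = true)] // Scaled = Mean / Mean(baseline)
    public void ExtractClaims_Bytes() { /* byte-based parsing */ }

    public static void Main() => BenchmarkRunner.Run<ClaimsBenchmarks>();
}
```

Marking `ExtractClaims_Bytes` as the baseline is what makes its Scaled value 1 in every row.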
Both methods use Span<T>, more specifically ReadOnlySpan<T>, which is designed to work with strings and other immutable types. Span<T> is a simple value type that lets us work with any contiguous memory while ensuring memory and type safety, and it has almost no overhead. One of its core features is slicing (taking a part of some memory): slicing does not copy any memory, it simply creates a new Span<T> with a different pointer and length. This means that slicing does not allocate any managed heap memory.
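For example (the token value here is illustrative), slicing a string via `AsSpan` creates a view over the same underlying memory without copying:

```csharp
using System;

class SliceDemo
{
    static void Main()
    {
        string token = "header.payload.signature";
        ReadOnlySpan<char> span = token.AsSpan();

        int dot = span.IndexOf('.');
        ReadOnlySpan<char> header = span.Slice(0, dot); // no copy, no heap allocation

        // The slice is just a pointer + length into the original string's memory.
        Console.WriteLine(header.Length); // 6 ("header")
    }
}
```

In contrast, `token.Substring(0, dot)` would allocate a new string on the managed heap for every call.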
The Allocated column indicates that both methods allocate the same amount of memory in every benchmark. The only allocations are the List (plus its object header and method table pointer) and N Guids, which are structs (value types). For NumClaims = 10000 we can clearly see that the garbage collector is performing full garbage collections, and roughly the same amount is being collected in both methods (BDN uses a heuristic when running these benchmarks, so the number of invocations can differ between runs). The total amount of allocated memory is also slightly variable because the CLR aligns allocations: for example, if you allocate a new byte[7] array, it consumes as much memory as a byte[8] array.
For NumClaims = 100000 we see that ExtractClaims_String causes no garbage collections at all, which is an interesting result that I can't fully explain. It may be due to the fact that all objects over 85,000 bytes are put on the Large Object Heap (LOH) and are handled differently.
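One way to see the LOH in action (a small sketch; the 85,000-byte threshold is the documented default) is that large objects report generation 2 immediately, since the LOH is collected only as part of a Gen 2 collection:

```csharp
using System;

class LohDemo
{
    static void Main()
    {
        var small = new byte[1000];
        var large = new byte[100000]; // > 85,000 bytes → Large Object Heap

        Console.WriteLine(GC.GetGeneration(small)); // 0: regular small-object heap
        Console.WriteLine(GC.GetGeneration(large)); // 2: LOH objects count as Gen 2
    }
}
```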
As the Scaled column of the table shows, ExtractClaims_String is always slower. For NumClaims = 1000 it is 26 times slower than ExtractClaims_Bytes, and about 22 times slower on average. This is a significant difference in performance.
If you were to parse 1000 claims 1,000,000,000 times each day over the course of a year, with an average run time of 17,466.2 ns per 1000 claims, you are looking at roughly:

ExtractClaims_Bytes:
10^12 claims/day * 17 ns/claim ≈ 17,000 s/day ≈ 4.7 h/day
4.7 h/day * 0.140$/h ≈ 0.66$/day
0.66$/day * 365 days ≈ 241$/year

If you were to use ExtractClaims_String, which is 26 times slower for NumClaims = 1000:
241$/year * 26 ≈ 6,271$/year
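The back-of-envelope numbers above can be sketched in code (the $0.140/h compute price is the assumption used in the calculation, not a measured figure):

```csharp
using System;

class CostEstimate
{
    static void Main()
    {
        const double SecondsPerClaim = 17e-9;  // ~17 ns per claim (ExtractClaims_Bytes)
        const double ClaimsPerDay = 1e12;      // 10^9 parses/day * 1000 claims each
        const double DollarsPerHour = 0.140;   // assumed compute price

        double hoursPerDay = SecondsPerClaim * ClaimsPerDay / 3600;
        double bytesPerYear = hoursPerDay * DollarsPerHour * 365;

        Console.WriteLine($"Bytes:  ~${bytesPerYear:F0}/year");      // ≈ $241/year
        Console.WriteLine($"String: ~${bytesPerYear * 26:F0}/year"); // ~26x slower
    }
}
```

Rounding at different steps shifts the final figure by a few dollars, but the ratio between the two methods is what matters.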