InKolev/dotNETMemoryManagement.md

## dotNETMemoryManagement.md

      
Memory Management in .NET

"If you don't ultimately have a root reference, then you can't be accessed, so you must die."
Brief overview

SOH:

Allocates objects smaller than 85,000 bytes (roughly 85 K).
Objects are allocated on top of each other; a Next Object Pointer (NOP) marks the location where the next object will be allocated.
Because the NOP position is known, a new object can be allocated immediately, without the overhead of searching a free-space table for room.
Fragmentation could easily occur, because the SOH is a heavily used storage area.
Each object that cannot be traced to a root reference becomes eligible for garbage collection.
To overcome the fragmentation problem, the GC compacts the heap, removing any gaps between objects.
Compaction carries a significant performance cost when the GC has to navigate huge object graphs and copy many live objects over the top of dead ones.
To get maximum performance, that whole process needs to be optimized, so the GC designers decided to classify all objects into one of three groups:
Gen 0 (Short-lived objects) - Inspected and collected most frequently.
Gen 1 (Mid-lived objects, which survived Gen 0 collection)
Gen 2 (Long-lived objects, which survived both Gen 0 and Gen 1 collections)

The GC runs automatically on a separate thread under one of the following conditions:

When the size of objects in any generation reaches a generation-specific threshold:
Gen 0 hits ~256 K
Gen 1 hits ~2 MB (GC collects Gen 1 and Gen 0)
Gen 2 hits ~10 MB (GC collects Gen 2, Gen 1 and Gen 0)
When GC.Collect() is called in code.
When the OS sends a low-memory notification.

GC operations also depend on whether the application uses Server or Workstation GC, and on its latency mode.
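To see which of these modes a process is actually running with, the values can be read from `System.Runtime.GCSettings`. The sketch below is only an illustration; the class and method names (`GcModeInfo`, `Describe`) are made up for this example.

```csharp
using System;
using System.Runtime;

public static class GcModeInfo
{
    // Returns a one-line summary of the GC flavour the current process runs with.
    public static string Describe()
    {
        string flavour = GCSettings.IsServerGC ? "Server" : "Workstation";
        return $"{flavour} GC, latency mode: {GCSettings.LatencyMode}";
    }
}
```

Calling `Console.WriteLine(GcModeInfo.Describe());` at startup is a cheap way to confirm that a configuration change has taken effect.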
Finalization

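Finalization queue (tracks every object that has a finalizer; such objects survive at least one collection, so they reach Gen 1 or later)
f-reachable queue (holds the objects whose Finalize() method is waiting to run, and acts as a root reference until the finalizer thread has processed them)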
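A common way to keep objects out of the finalization path is the dispose pattern: clean up deterministically in `Dispose()` and tell the GC that the finalizer no longer needs to run. The following is a minimal sketch; `NativeHandleOwner` and `ReleaseResources` are hypothetical names, and no real unmanaged resource is involved.

```csharp
using System;

public sealed class NativeHandleOwner : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        ReleaseResources();
        // Cleanup already ran, so the finalizer is not needed any more.
        // The object can then be reclaimed without waiting for the finalizer thread.
        GC.SuppressFinalize(this);
    }

    // Finalizer: a safety net that runs on the finalizer thread
    // only if Dispose() was never called.
    ~NativeHandleOwner()
    {
        ReleaseResources();
    }

    private void ReleaseResources()
    {
        if (IsDisposed) return;
        // A real class would release its unmanaged resource here.
        IsDisposed = true;
    }
}
```

Wrapping the instance in a `using` statement guarantees that `Dispose()` runs, so the object never has to survive an extra collection just to be finalized.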

LOH:

Unlike the SOH, the LOH isn't compacted by default, because of the overhead of copying large chunks of memory.
When a full GC (Gen 2) takes place, the address ranges of any LOH objects not in use are recorded in a "free space" allocation table.
If any adjacent objects are rootless, then they are recorded as one entry, within a combined address range (their address ranges are merged)
A newly allocated object is unlikely to match the size of an address range in the free-space table exactly, so small chunks of memory will almost always be left between objects, resulting in fragmentation.
Chunks smaller than 85 K are left with no possibility of reuse, because objects of that size never make it onto the LOH.
As allocation demand increases, new segments are reserved for the LOH, even though space, albeit fragmented space, is still available.
For performance reasons, the .NET GC preferentially allocates large objects at the end of the heap.
When a large object needs to be allocated, there are two choices: either run a full (Gen 2) GC, or append the object to the end of the heap (which may involve extending the heap with additional segments). The GC tends to go for the second, easier, faster option, and avoids the full GC.
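The free-space bookkeeping described above can be modelled with a small toy class. This is only an illustrative sketch and not the CLR's actual data structure: `FreeSpaceTable`, its methods, and the best-fit policy are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a free-space table; NOT the CLR's real data structure.
public sealed class FreeSpaceTable
{
    // Free ranges as (start, length), kept sorted by start address.
    private readonly List<(int Start, int Length)> _ranges = new List<(int Start, int Length)>();

    public IReadOnlyList<(int Start, int Length)> Ranges => _ranges;

    // Records a freed range and merges it with adjacent free ranges.
    public void Free(int start, int length)
    {
        int i = 0;
        while (i < _ranges.Count && _ranges[i].Start < start) i++;
        _ranges.Insert(i, (start, length));

        // Merge with the following range if the two touch.
        if (i + 1 < _ranges.Count && _ranges[i].Start + _ranges[i].Length == _ranges[i + 1].Start)
        {
            _ranges[i] = (_ranges[i].Start, _ranges[i].Length + _ranges[i + 1].Length);
            _ranges.RemoveAt(i + 1);
        }
        // Merge with the preceding range if the two touch.
        if (i > 0 && _ranges[i - 1].Start + _ranges[i - 1].Length == _ranges[i].Start)
        {
            _ranges[i - 1] = (_ranges[i - 1].Start, _ranges[i - 1].Length + _ranges[i].Length);
            _ranges.RemoveAt(i);
        }
    }

    // Best-fit allocation: returns the start address, or -1 if no range is big enough.
    public int Allocate(int size)
    {
        int best = -1;
        for (int i = 0; i < _ranges.Count; i++)
        {
            if (_ranges[i].Length >= size && (best == -1 || _ranges[i].Length < _ranges[best].Length))
                best = i;
        }
        if (best == -1) return -1;

        var (start, length) = _ranges[best];
        if (length == size) _ranges.RemoveAt(best);
        else _ranges[best] = (start + size, length - size); // leftover fragment
        return start;
    }
}
```

Freeing two adjacent ranges of 100,000 and 50,000 bytes yields one merged 150,000-byte entry. Allocating 90,000 bytes from it leaves a 60,000-byte fragment, which is too small for any object that qualifies for the LOH.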

Four sections of memory (heaps) are created when an application starts:


Code Heap - stores the actual native code instructions after they have been JITed (just-in-time compiled).
Small Object Heap (SOH) - stores allocated objects that are smaller than 85,000 bytes (roughly 85 K).
Large Object Heap (LOH) - stores allocated objects that are 85,000 bytes or larger.
Process Heap
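One way to observe the SOH/LOH split from code is to ask the GC which generation a freshly allocated array belongs to. On Microsoft's runtimes, LOH objects are reported as belonging to the oldest generation, while a small new object normally starts in Gen 0. The helper below is a hedged sketch; `HeapPlacement` and `GenerationOfNewArray` are names invented for this example.

```csharp
using System;

public static class HeapPlacement
{
    // Reports the generation the GC assigns to a freshly allocated byte array.
    public static int GenerationOfNewArray(int sizeInBytes)
    {
        byte[] buffer = new byte[sizeInBytes];
        return GC.GetGeneration(buffer);
    }
}
```

Comparing `HeapPlacement.GenerationOfNewArray(1000)` with `HeapPlacement.GenerationOfNewArray(100000)` typically shows the small array in Gen 0 and the large one in `GC.MaxGeneration`.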

Stack

Used to keep track of the data belonging to each method call (parameters, locally declared variables, etc.). Each method call gets its own stack frame. The stack can hold variables of value types (bool, byte, int, float, double, decimal, char, and structs).
Heap

Used to store instances of everything you have defined, including:

classes
interfaces
delegates
strings
instances of "object"

These are all referred to as "reference types", and are stored on the heap (the SOH or LOH, depending on their size). When an instance of a reference type is created (usually with the new keyword), only an object reference is stored on the stack. The actual instance itself is stored on the heap, and its address is held on the stack. To achieve this, .NET has to create the object on the heap, determine its address there, and place that object reference within the current stack frame.
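The practical consequence of this split is that assigning a reference type copies only the reference, while assigning a value type copies the data itself. A minimal sketch, with `PointClass`, `PointStruct`, and `ReferenceVsValue` as hypothetical names:

```csharp
using System;

public class PointClass { public int X; }    // reference type: the instance lives on the managed heap
public struct PointStruct { public int X; }  // value type: stored inline in the variable that holds it

public static class ReferenceVsValue
{
    public static (int ClassX, int StructX) Demo()
    {
        var c1 = new PointClass { X = 1 };
        var c2 = c1;   // copies the reference; both variables point at one heap object
        c2.X = 42;

        var s1 = new PointStruct { X = 1 };
        var s2 = s1;   // copies the whole value
        s2.X = 42;

        return (c1.X, s1.X); // (42, 1)
    }
}
```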
Passing parameters

When you pass a value type as a parameter, all you actually pass to the called method is a copy of the variable. Any changes made to that copy inside the method stay inside the method. Be careful when passing structs: they are also value types, and copying a very large one on every call is expensive.
One way around this problem is to pass specific value types by reference. This is something you would do anyway if you wanted to allow direct changes to the value of a passed variable inside a method call.
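The difference between the two calling styles can be sketched as follows. `Counter` and `ParameterPassing` are hypothetical names used only for this illustration.

```csharp
using System;

public struct Counter { public int Value; }

public static class ParameterPassing
{
    // Receives a copy; the caller's struct is unchanged.
    public static void IncrementCopy(Counter c) { c.Value++; }

    // Receives a reference to the caller's variable; the change is visible to the caller.
    public static void IncrementInPlace(ref Counter c) { c.Value++; }

    public static (int AfterByValue, int AfterByRef) Demo()
    {
        var counter = new Counter { Value = 10 };
        IncrementCopy(counter);
        int afterByValue = counter.Value;   // still 10
        IncrementInPlace(ref counter);
        int afterByRef = counter.Value;     // now 11
        return (afterByValue, afterByRef);
    }
}
```

For large structs that a method only needs to read, the `in` modifier (C# 7.2 and later) passes by reference as well, while preventing the method from modifying the caller's value.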
Boxing and unboxing

Refers to the extra work required when your code causes a value type (e.g. int, char, etc.) to be allocated on the heap rather than the stack. As we described earlier, allocating onto the heap requires more work, and is therefore less efficient.
A classic code example of boxing and unboxing looks like this:
```csharp
// Integer is created on the stack
int stackVariable = 12;

// Integer is copied into a new object on the heap (boxing)
object boxedObject = stackVariable;

// Integer is copied back onto the stack (unboxing)
int unboxedObject = (int)boxedObject;
```
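Each boxing operation allocates its own heap object and copies the value into it, which is where the extra cost comes from. The sketch below makes that visible; `BoxingDemo` is a hypothetical name.

```csharp
using System;

public static class BoxingDemo
{
    public static (bool SameBox, int BoxedValue) Demo()
    {
        int number = 12;
        object first = number;    // boxing: copies 12 into a new heap object
        object second = number;   // boxing again: a second, separate heap object
        number = 99;              // changing the original does not affect the boxes

        bool sameBox = ReferenceEquals(first, second); // false
        int boxedValue = (int)first;                   // unboxing: 12
        return (sameBox, boxedValue);
    }
}
```

Generic collections such as `List<int>` store the values directly and so avoid this per-element allocation, unlike the non-generic `ArrayList`, which stores everything as `object`.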
Garbage Collection

When the last reference to an object goes out of scope, the object becomes eligible for automatic cleanup. To achieve this automatic cleanup, .NET uses the infamous garbage collector (GC). All the GC does is look for allocated objects on the heap that aren't being referenced by anything. The most obvious source of references, as we saw earlier, is the stack. Other potential sources include:

global/static object references
CPU registers
object finalization references (more later)
Interop references (.NET objects passed to COM/API calls)
stack references

Collectively, these are all called root references or GC roots. As well as root references, an object can also be referenced by other objects. This forms a reference tree. This is important because if an object doesn't ultimately have a root reference, then it can't actually be accessed by code, so it is no longer in use and can be scheduled for removal.
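A weak reference does not count as a root, which makes it a handy probe for reachability. In the hedged sketch below, one object stays rooted by a local variable while the other has no root left after its creating method returns. `ReachabilityDemo` and `CreateUnrooted` are hypothetical names, and whether the unrooted object has actually been collected after a single forced collection depends on the runtime and build settings.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class ReachabilityDemo
{
    // Allocated in a separate, non-inlined method so that no local
    // in the caller keeps the new object rooted.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference<object> CreateUnrooted()
    {
        return new WeakReference<object>(new object());
    }

    public static (bool RootedAlive, bool UnrootedAlive) Demo()
    {
        object rooted = new object();   // the local variable is a stack root
        var weakRooted = new WeakReference<object>(rooted);
        WeakReference<object> weakUnrooted = CreateUnrooted();

        GC.Collect();
        GC.WaitForPendingFinalizers();

        bool rootedAlive = weakRooted.TryGetTarget(out _);
        bool unrootedAlive = weakUnrooted.TryGetTarget(out _);
        GC.KeepAlive(rooted);           // keeps the root alive up to this point
        return (rootedAlive, unrootedAlive);
    }
}
```

The rooted object is guaranteed to survive; the unrooted one is usually gone after the collection, but that part is not guaranteed.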
SOH Cleanup (Heap compaction)

Garbage collection of the Small Object Heap (SOH) involves compaction. This is because the SOH is a contiguous heap where objects are allocated consecutively on top of each other. When compaction occurs, marked objects are copied over the space taken up by unmarked objects, overwriting those objects, removing any gaps, and keeping the heap contiguous; this process is known as Copy Collection. The advantage of this is that heap fragmentation (i.e. unusable memory gaps) is kept to a minimum. The main disadvantage is that compaction involves copying chunks of memory around, which requires CPU cycles and so, depending on frequency, can cause performance problems. What you gain in efficient allocation you could lose in compaction costs.
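The sliding step can be pictured with a toy model that works on block descriptions rather than real memory. This is a simplified sketch under stated assumptions, not the CLR's algorithm: `ToyCompactor` is a hypothetical name, and the real GC also has to update every reference that points at a moved object.

```csharp
using System;
using System.Collections.Generic;

// Toy model of sliding compaction over block descriptions, not real memory.
public static class ToyCompactor
{
    // Each block is (Id, Size, IsLive), listed in address order.
    // Returns the new start offset of every live block and the position
    // of the next-object pointer after compaction.
    public static (Dictionary<string, int> NewOffsets, int NextObjectPointer) Compact(
        IEnumerable<(string Id, int Size, bool IsLive)> blocks)
    {
        var newOffsets = new Dictionary<string, int>();
        int next = 0;
        foreach (var block in blocks)
        {
            if (!block.IsLive) continue;   // dead objects are simply overwritten
            newOffsets[block.Id] = next;   // live object slides down to the first free offset
            next += block.Size;
        }
        return (newOffsets, next);
    }
}
```

With blocks A (16 bytes, live), B (32 bytes, dead) and C (24 bytes, live), A stays at offset 0, C moves down from 48 to 16, and the next allocation starts at offset 40.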
LOH Sweeping (Free space tracking)

The Large Object Heap (LOH) isn't compacted, and this is simply because of the time it would take to copy large objects over the top of unused ones. Instead, the LOH keeps track of free and used space, and attempts to allocate new objects into the most appropriately-sized free slots left behind by collected objects.
As a result of this, the LOH is prone to fragmentation.
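Since .NET Framework 4.5.1, an application that suffers from LOH fragmentation can ask for a one-off compaction through `GCSettings.LargeObjectHeapCompactionMode`. The sketch below shows the call sequence; `LohCompaction` is a hypothetical wrapper name, and forcing a full collection carries the performance cost this document warns about, so it is best treated as a last resort.

```csharp
using System;
using System.Runtime;

public static class LohCompaction
{
    // Asks the runtime to compact the LOH during the next blocking full collection,
    // then triggers that collection. The setting reverts to Default afterwards.
    public static void CompactLargeObjectHeapOnce()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
    }
}
```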
Static Objects

When you mark a method, property, variable, or event as static, the runtime creates a global instance of each one soon after the code referencing them is loaded and used.
Static members are never garbage collected because they essentially are root references in themselves. Statics are a common and enduring source of root references, and can be responsible for keeping objects loaded in memory for far longer than would otherwise be expected.
It's also worth remembering that any object that subscribes to a static event will remain in memory until the event subscription is removed, or the containing app domain is unloaded.
Static collections can also be a problem, as the collection itself will act as a root reference, holding all added objects in memory for the lifetime of the app domain.
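The static event case can be sketched like this. A subscription stores a delegate that points at the subscriber, so the static event keeps the subscriber reachable until it unsubscribes. `AppEvents` and `SettingsView` are hypothetical names used only for illustration.

```csharp
using System;

public static class AppEvents
{
    // Static event: its delegate list is reachable from a static root.
    public static event EventHandler SettingsChanged;

    public static int SubscriberCount =>
        SettingsChanged == null ? 0 : SettingsChanged.GetInvocationList().Length;
}

public sealed class SettingsView : IDisposable
{
    public SettingsView() { AppEvents.SettingsChanged += OnSettingsChanged; }

    private void OnSettingsChanged(object sender, EventArgs e) { /* refresh the view */ }

    // Unsubscribing removes the reference from the static event to this instance.
    public void Dispose() { AppEvents.SettingsChanged -= OnSettingsChanged; }
}
```

If `Dispose()` is never called, every `SettingsView` ever created stays rooted through `AppEvents.SettingsChanged` for the lifetime of the app domain.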
Sometimes you may want to stop multiple threads from sharing a common set of statics. To do this, add the [ThreadStatic] attribute to a static field; the runtime then keeps a separate instance of that field for each thread.
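A minimal sketch of the per-thread behaviour, with `PerThreadCounter` as a hypothetical name: two threads increment the same static field, yet each one only ever sees its own copy.

```csharp
using System;
using System.Threading;

public static class PerThreadCounter
{
    [ThreadStatic]
    private static int _count;   // each thread sees its own copy, initially 0

    public static int IncrementMany(int times)
    {
        for (int i = 0; i < times; i++) _count++;
        return _count;
    }

    // Runs the same increments on two worker threads and returns what each one saw.
    public static (int First, int Second) RunOnTwoThreads(int times)
    {
        int first = 0, second = 0;
        var t1 = new Thread(() => first = IncrementMany(times));
        var t2 = new Thread(() => second = IncrementMany(times));
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        return (first, second);
    }
}
```

Without the attribute, both threads would update one shared field and the unsynchronized increments could race. Note that a field initializer on a `[ThreadStatic]` field runs only once, on the thread that triggers the static constructor, so other threads start from the default value.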
Managed Heaps

To be of any use to the application, an object needs to be accessible. For that to be the case, it either needs to have a reference pointing to it directly from a root reference (stack, statics, CPU registers, finalization queue), or it needs to be referenced from an object that ultimately has a root reference itself.
By tracking back through an object's reference hierarchy, if we ultimately reach a root reference, then all of the objects in that hierarchy are fundamentally accessible (rooted).
This simple principle is the cornerstone of how .NET memory management works. Using the rule that an object which is no longer accessible can be cleaned up, automatic garbage collection becomes possible.
What is automatic Garbage Collection?

Automatic garbage collection is just a bunch of code that runs periodically, looking for allocated objects that are no longer being used by the application. It frees developers from the responsibility of explicitly destroying objects they create, avoiding the problem of objects being left behind and building up as classic memory leaks.
The GC in .NET runs for both the LOH and SOH, but it works differently for each. In terms of similarities, each of the heaps can be expanded by adding segments (chunks of memory requested from the OS) when necessary. However, the GC tries to avoid this by making space where it can, clearing unused objects so that the space can be reused. This is much more efficient, and avoids expensive heap expansion.
When does the GC run?

The GC runs on a separate thread when certain memory conditions are reached or when the application begins to run out of memory. The developer can also explicitly force the GC to run using the following line of code:
GC.Collect();
This is rarely a good idea, because it can cause performance and scalability problems. If things aren't going well with memory, get hold of a good memory profiler and find a solution that way.
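When the goal is only to observe how often collections happen, `GC.CollectionCount` gives that information without any profiler. The sketch below forces a collection purely to show the counter moving; `CollectionCounter` is a hypothetical name, and in real code the counter would be sampled around the workload of interest rather than around `GC.Collect()`.

```csharp
using System;

public static class CollectionCounter
{
    // Returns how many Gen 0 collections were recorded before and after a forced GC.
    public static (int Before, int After) ForceAndCount()
    {
        int before = GC.CollectionCount(0);
        GC.Collect();   // forces a full collection; shown for observation only
        int after = GC.CollectionCount(0);
        return (before, after);
    }
}
```

A full collection also collects Gen 0 and Gen 1, so the Gen 0 counter increases as well.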
Small Object Heap (SOH)

Allocation and automatic garbage collection on the Small Object Heap (SOH) is quite a complex process. Because most of the objects allocated within an application are less than 85 K, the SOH is a pretty heavily used storage area.
In unmanaged C/C++ applications, objects are allocated onto the heap wherever space can be found to accommodate them. When an object is destroyed by the programmer, the space it used on the heap becomes available for other objects. The problem is that, over time, the heap can become fragmented, with little spaces left over that aren't usually large enough to use effectively. As a result, the heap becomes larger than necessary, as more memory segments are added so that the heap can expand to accommodate new objects.
Another problem is that whenever an allocation takes place (which is often), it must take time to find a suitable gap in memory to use.
To minimize allocation time and almost eliminate heap fragmentation, .NET allocates objects consecutively, one on top of another, and keeps track of where to allocate the next object.
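The idea of allocating consecutively and tracking only the next position can be sketched with a toy allocator. This is an illustration of the principle, not the CLR's allocator; `BumpAllocator` and its members are hypothetical names, and addresses are modelled as plain integer offsets.

```csharp
using System;

// Toy model of pointer-bump allocation; the CLR's allocator is far more involved.
public sealed class BumpAllocator
{
    private readonly int _capacity;

    public BumpAllocator(int capacity) { _capacity = capacity; }

    // Offset at which the next object will be placed.
    public int NextObjectPointer { get; private set; }

    // Returns the offset of the new object, or -1 when the segment is full
    // (the point at which a real runtime would collect or add a segment).
    public int Allocate(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        if (NextObjectPointer + size > _capacity) return -1;
        int address = NextObjectPointer;
        NextObjectPointer += size;   // no search for a free gap is needed
        return address;
    }
}
```

Allocation is a bounds check plus an addition, which is why it is so much cheaper than searching for a suitable gap.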
Optimizing Garbage Collection

If you think about it, there's potentially a bit of a problem with creating "still in use" lists and compacting the heap, especially if it's very large. Navigating through huge object graphs and copying lots of live objects over the top of dead ones is going to take a significant amount of processing time. To get maximum performance, that whole process needs to be optimized.
This is done by classifying all objects into one of three groups. At one end, you've got short-lived objects that are allocated, used and discarded quickly. At the other end of the spectrum, you have long-lived objects, which are allocated early on and then remain in use indefinitely. Finally, you have medium-lived objects, which sit somewhere in between.

Generation 0 (Gen 0)
Generation 1 (Gen 1)
Generation 2 (Gen 2)

Generational Garbage Collection

When an object has just been created, it is classified as a Gen 0 object, which just means that it's new and hasn't yet been inspected by the GC. Gen 1 objects have been inspected by the GC once and survived, and Gen 2 objects have survived two or more such inspections.
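Promotion through the generations can be watched with `GC.GetGeneration`. The sketch below records the generation of one surviving object across two forced collections; `GenerationDemo` and `TrackPromotion` are hypothetical names, and the exact numbers depend on the runtime, so treat the typical result as an observation rather than a guarantee.

```csharp
using System;

public static class GenerationDemo
{
    // Records the generation of one object before and after two forced collections.
    public static int[] TrackPromotion()
    {
        object survivor = new object();
        var generations = new int[3];
        generations[0] = GC.GetGeneration(survivor);
        GC.Collect();
        generations[1] = GC.GetGeneration(survivor);
        GC.Collect();
        generations[2] = GC.GetGeneration(survivor);
        GC.KeepAlive(survivor);   // keeps the object rooted until this point
        return generations;
    }
}
```

On Microsoft's runtimes the result is typically 0, 1, 2: the object starts in Gen 0 and is promoted once per collection it survives.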
The GC runs automatically on a separate thread under one of the conditions below:

When the size of objects in any generation reaches a generation-specific threshold. To be precise, when:
Gen 0 hits ~256 K
Gen 1 hits ~2 MB (at which point the GC collects Gen 1 and 0)
Gen 2 hits ~10 MB (at which point the GC collects Gen 2, 1 and 0)
When GC.Collect() is called in code.
When the OS sends a low-memory notification.

It's worth bearing in mind that the above thresholds are merely starting levels, because .NET modifies the levels depending on the application's behavior.