Heap, Stack and Garbage Collector — A practical guide to the .NET memory management system.
There was a time when memory was a slow, scarce, and expensive resource, so code had to be written to be as performant as possible, which made it a lot harder to read and maintain. This scenario becomes very evident when we recall that the Apollo 11 main computer had only 4 KB of RAM.
As hardware advanced, memory became an abundant, cheap, and fast resource. Software followed this evolution, becoming a voracious RAM consumer (Chrome, is that you?).
The years passed and some high-level languages came to life. These languages did not demand deep knowledge of how the hardware behaves, but some things still had to be done with your bare hands, like allocating and deallocating memory (C/C++).
And then some revolutionary languages emerged. These new languages took over a lot of the programmer’s hard work, so the programmer could worry about… programming, and no longer about memory allocation and low-level control.
But if we want to get the maximum performance out of our applications, it is necessary to understand how things work under the hood. Here we are going to talk about the .NET/C# environment, but a lot of these concepts also apply to languages like Java or Go.
The first thing we should understand is that there are two types of memory: the Heap and the Stack. Each one of them has a very important role in how our software works.
Let’s talk about the Stack first.
The Stack is a contiguous space in memory, that is, a sequential area of 1 MB (32-bit systems) or 4 MB (64-bit systems). Even in unmanaged languages this area is controlled by the language itself, demanding no manual intervention for memory allocation or deallocation.
The main role of the stack is to allow one method of your system to call another method, get its return value, and move on through its execution path. But what happens during a method execution?
Whenever we call a method, a new block is created on the stack. This block is called a Stack Frame and holds information about the parameters received, the caller’s return address, and the method’s local variables.
Let’s take a look at an example.

Here we have 3 methods: main, which calls sum passing 2 parameters, which in turn calls sum1, also passing 2 parameters. Let’s look at how the data is allocated on the stack.
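The original code figure is not reproduced here; below is a minimal sketch consistent with the description (the names main, sum, sum1, value1, and value2 come from the text, the rest are assumptions):

```csharp
using System;

// 'main' declares two local variables and calls 'sum'
void main()
{
    int value1 = 2;   // allocated in main's stack frame
    int value2 = 3;   // stacked on top of value1 (LIFO)
    Console.WriteLine(sum(value1, value2));
}

// 'sum' receives 2 parameters and delegates to 'sum1'
int sum(int parameter1, int parameter2)
{
    return sum1(parameter1, parameter2);
}

// 'sum1' also receives 2 parameters and computes the result
int sum1(int parameter1, int parameter2)
{
    int result = parameter1 + parameter2; // local in sum1's frame
    return result;
}

main(); // prints 5
```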
First, we have the 2 variables owned by the main method. Note that value2 is above value1. Why?
This happens because the stack area is… well… a stack. When we stack items, the last item to be stacked is the first one on top. This is known as LIFO (Last-In/First-Out). To make it easier, think about a stack of plates: the last one you put there will be the first one to be taken.
Ok, with the main method’s local variables allocated, now we need to call the sum method.
When we call a method with parameters, the first things to be allocated in the stack frame are these parameters; then the return address of the caller method is stored, and then the local variables.
The return address is necessary so the called method knows where to hand back both its result and the execution flow when its statements are finished.
Note that once again parameters, return address, and local variables are stored in the “reverse” order of their calls; in other words, they are stacked.
The sum method will call sum1, again passing 2 parameters, and at the end of this chain we’ll have all the data inside the stack. Let’s see an animation of the stack being populated.
Until this moment, no value has effectively been returned yet, since the execution chain is still running.
The calls will only return when the last line of sum1 is executed; then the stack will start to be cleaned. First, the local variables are removed, and then execution goes back to the sum method. But there seems to be a problem here: if execution went back to sum, what happens with parameter2, which is still inside the stack?
The method responsible for that cleaning is the one to which the result of the execution is returned, in this case, the sum method. It is sum that will clean the area storing parameter2 of the sum1 method and then run its own statements.
As the memory is freed at the end of the method execution, we can say that we have deterministic deallocation, since we know exactly the moment this memory will be released.
Let’s see an animation of the stack being cleaned.
At the start of this topic I said that this memory area has a fixed size, right? And as we call chained methods, their allocated data is only released when the last call ends, correct?
What would happen if we called so many methods that this area was totally filled?
In this case, an exception is thrown by the .NET runtime and our program execution is terminated. This exception is the famous StackOverflowException.
But don’t worry: these allocations and deallocations happen so quickly that stack overflows are very unlikely to happen (if no mistake is made during the coding process).
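The classic coding mistake is unbounded recursion. A hedged sketch: the method below recurses safely up to a limit, but without that base case the frames would pile up until the fixed-size stack filled and the runtime threw a StackOverflowException (which cannot be caught):

```csharp
using System;

// Each nested call adds a new stack frame. Remove the base case and
// the frames keep accumulating until the stack area is exhausted,
// crashing the process with a StackOverflowException.
int callChain(int depth, int limit)
{
    if (depth == limit) return depth; // base case keeps this sketch safe
    return callChain(depth + 1, limit);
}

Console.WriteLine(callChain(0, 1000)); // 1000 frames deep and back
```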
Following this logic, what if one of my methods declares a variable so big that it overflows this space? Since the Stack is so small, does that mean I can only allocate small values in my program’s memory?
Well, in the Stack area, yes.
For better understanding: there are some specific data types that can be allocated on the Stack, and these are called Value Types. These types have a known length and take a very small amount of memory. Some examples of Value Types are int (32 bits) and long (64 bits). Usually, Value Types are types with pre-defined sizes, although we can have Value Types whose size varies from one definition to another, like structs, which are kind of an exception. Unknown-sized types are usually stored in the Heap.
Ok, so does that mean I cannot allocate unknown-sized variables inside my methods? No. What happens is that the variable’s data is allocated in the Heap and only its memory address (the reference) is stored on the Stack. This is exactly why we call the types stored in the Heap Reference Types.
Before we go deeper into the Heap, there is a popular belief we have to demystify:

“Value Types go to the Stack and Reference Types go to the Heap.”

Although this makes a little sense, this statement is wrong. Imagine you have a class named Book, and this Book class has an integer property called ReleaseYear. Curiously, this integer value, despite being a Value Type, is stored in the Heap, because it belongs to a Reference Type. So whenever you have a class property, for example, its value goes to the Heap no matter what its type is. If Book were not a class but a struct, this value would be stored on the Stack, since local struct variables live on the Stack.
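A sketch of the Book example from the text (the struct variant, BookStruct, is a name I am assuming for illustration):

```csharp
using System;

var book = new Book { ReleaseYear = 1969 };
// 'book' on the Stack holds only a reference; the Book object,
// including its int ReleaseYear, lives in the Heap.
Console.WriteLine(book.ReleaseYear);

var bookStruct = new BookStruct { ReleaseYear = 1969 };
// 'bookStruct' is a value-type local: the whole struct, including
// ReleaseYear, lives directly in the current stack frame.
Console.WriteLine(bookStruct.ReleaseYear);

// Reference type: instances are allocated in the Heap
class Book
{
    public int ReleaseYear { get; set; }
}

// Value type: locals of this type live on the Stack
struct BookStruct
{
    public int ReleaseYear;
}
```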
This said, let’s talk about the Heap.
Unlike the Stack, the Heap is not a contiguous space: it is a collection of memory segments that may or may not be close to each other. It also has no fixed size; it expands and shrinks as the memory demand increases or decreases.
So why don’t we use only the Heap? We could get rid of that size limitation the Stack has.
Precisely because it has an unknown size and is distributed across various blocks, reading from and writing to the Heap is much slower. Still, it is vital for our code to work.
As said above, the Heap is responsible for storing Reference Types: variables that need to live beyond the execution of a method or that don’t have a determined lifespan. Some examples of reference types are classes, arrays, strings, and delegates.
Let’s use Classes as an example.
Every time we create an instance of a class, an amount of space is allocated at the Heap to store this class data. This allocation is made automatically by the language runtime when the class constructor is called.
But how does the runtime know how much space is needed to store this data?
When we compile our C# code, a lot of metadata is generated about it, and one piece of this information is the size in memory needed for a new instance to be created and stored.
The Heap can be represented somewhat like this
The blue area is space still available for allocation, the yellow block represents our allocated object at the Heap.
Note that we have a marker at the last position.
This marker is a pointer that indicates the address where the next object will be allocated. This makes the runtime’s life easier, since it knows exactly where the next item will be allocated and thus doesn’t need to seek available space for it.
Allocating a few more objects, we would end up with the following scenario.
Note that the allocated space of each object is different; this is related to each object’s size in memory. This size varies depending on how many properties our classes have, the sizes of arrays, and so on.
Nice! We talked about how objects are stored in the heap, but we didn’t discuss how they are cleaned up, so before we jump into this topic, let’s see the code below.
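The original snippet is not reproduced here; this is a minimal sketch of what it likely looked like (the names myClass and MyClass come from the text; the Value property is an assumption):

```csharp
using System;

for (int i = 0; i < 10; i++)
{
    // A new instance is allocated in the Heap on every iteration;
    // only the reference 'myClass' belongs to the loop scope.
    var myClass = new MyClass { Value = i };
    Console.WriteLine(myClass.Value);
}
// The references are out of scope here, but the objects themselves
// stay in the Heap until the Garbage Collector eventually runs.

class MyClass
{
    public int Value { get; set; }
}
```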
Analyzing this code, it seems clear that when the for-loop iteration ends, the myClass variable would be deallocated, since the iteration scope has ended, right? Unfortunately, Heap cleaning is not that simple.
Heap deallocations are made in a non-deterministic way. This means that we don’t know the exact moment when this data is removed, or even when it loses its context and is no longer needed.
But how is this cleaning done? Who’s responsible for it?
As we’ll soon see, the one responsible for this cleaning is a tool called the Garbage Collector. It does the hard job for us and allows us to focus more on “what to do” instead of “how to do it” (regarding memory management, of course).
The Garbage Collector
We said above that the deallocation of objects from the heap is done at a non-deterministic time by the Garbage Collector. But how does the Garbage Collector know when to run?
It has two triggers: when the number of objects on the heap exceeds an acceptable limit, or when the operating system raises a warning about a potential out-of-memory scenario.
The second case is simple: the Operating System sends a warning and the Garbage Collector (let’s call it GC for short) is triggered.
When we talk about the first case, we used the term “acceptable limit”, but what is that limit, and who defines it?
Well, we don’t know the limit. This value is recalculated all the time by the language runtime, which has a series of heuristically configured parameters that are updated from time to time.
If we had fixed values for this, we could have big problems, since the same settings of a notebook application would be applied to a system with thousands of simultaneous users. So these values are adjusted by the runtime itself, at runtime.
But we can also force the execution of the GC by hand, by calling GC.Collect(), and that is a very bad idea!
I said earlier that the GC runs based on a series of self-managed parameters. Telling the GC to run manually is telling the runtime that these parameters are wrong, and, due to the way the GC works, this can have serious implications for the performance of your application. In other words, only call GC.Collect() if you are absolutely sure of what you are doing.
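If you really do need to force a collection (for example in a benchmark harness), the usual sequence looks like the sketch below. In application code, let the runtime decide:

```csharp
using System;

// Forces a full, blocking collection of all generations.
GC.Collect();
// Waits for any pending finalizers to finish running...
GC.WaitForPendingFinalizers();
// ...then collects the objects those finalizers released.
GC.Collect();

// CollectionCount reports how many times a generation was collected.
Console.WriteLine(GC.CollectionCount(0));
```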
Let’s do a deep dive to understand why.
When the GC is triggered, 4 things happen:
1-Suspension: the GC suspends the execution of your ENTIRE application. That’s right: your application is frozen completely when the GC execution starts.
This is to prevent concurrent memory access and to ensure that nothing new is placed anywhere on the heap while it works. That’s why the GC has a big impact on your application’s performance, and running it all the time might not be a good idea.
Depending on how your code is structured, the GC can fire hundreds to thousands of times in a few seconds.
GC is an extraordinary and really fast piece of code, but let’s be cautious about using it. =)
2-Sweeping/Marking: the GC scans the heap looking for objects that have lost their references and, therefore, will no longer be needed by your application, and starts removing them.
This results in lots of deallocated spaces in memory, creating high fragmentation, as we can see below:
With the markup and deletion performed we have the following scenario on the heap:
Did you notice the amount of unused space?
What if between Object2 and Object4 we wanted to allocate a new larger object? It would have to go to the end of the heap and we would be left with this unused space.
That’s if we didn’t have the third step.
3-Shrinking: at this step, the GC removes all the gaps by moving the surviving objects in memory (and this is very expensive, but necessary), so that all the free space ends up after the last allocated object and the allocation of new objects can be sequential again.
4-Resume: Finally our application unfreezes and things begin to run again.
As we have seen, the execution of the GC is necessary, although it is a very expensive, high-impact operation for our system. So engineers (not just in C#) had to think of ways to make this process smarter, so that it could run less often.
That’s why the GC works with the concept of areas where objects are allocated. These areas are classified by generations, of which there are 3: Gen0, Gen1, and Gen2. The generations are directly related to the lifetime of objects in memory.
Gen0 - Gen0 is where the youngest objects live. They are placed in this area because the younger an object is, the greater the chance that it will not be needed in the future.
Think about the number of times we create helper variables or perform a simple query to the database to get a value that will be used exclusively to calculate something else. Objects tend to be created often and to survive for a short time.
This is where the GC is most frequently triggered to clean and compact memory areas. If an object survives a Gen0 cleanup, it is promoted and becomes allocated in Gen1.
Gen1 - Cleanup on Gen1 is only triggered when execution on Gen0 was not able to free up enough memory. It works as a buffer area between Gen0 and Gen2.
Objects allocated in Gen0 and Gen1 are classified as ephemeral objects, and it is important to note that whenever a cleanup of Gen1 is triggered, a Gen0 cleanup is also automatically executed.
Below is a table with the sizes of the ephemeral object areas (Gen0 and Gen1), according to the platform and the type of GC (more on this later):
And if the object survives Gen1, what happens to it? Exactly: it goes to Gen2, which is our area for “older” objects.
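Promotion through the generations can be observed with GC.GetGeneration; a small sketch (the exact generation an object lands in can vary with GC configuration, so treat the comments as the typical outcome):

```csharp
using System;

var obj = new object();
Console.WriteLine(GC.GetGeneration(obj)); // freshly allocated: Gen0

GC.Collect();                             // obj survives this collection...
Console.WriteLine(GC.GetGeneration(obj)); // ...typically promoted to Gen1

GC.Collect();                             // survives again...
Console.WriteLine(GC.GetGeneration(obj)); // ...typically Gen2

GC.KeepAlive(obj); // keep the reference alive until this point
```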
Gen2 - If your object survived the Gen0 and Gen1 cleanup, this is where it will be moved.
Oh, and if your object is static (it is born and dies along with the application), this is where it will be allocated as well. But not only promoted and static objects live in Gen2.
Gen2 is also responsible for managing a critical area. Remember I said that right after a clean-up on the heap, a shrinking process is done? Well, not always.
This is only performed in the area we call the Small Object Heap (SOH), which only holds objects smaller than 85,000 bytes. For larger objects, we have the Large Object Heap (LOH).
But first, let’s talk a little more about memory allocation.
If, even after a cleanup, the heap still needs to grow, a new block of memory is allocated to hold these objects. But only ephemeral objects are moved there: the old ephemeral area is absorbed into the Gen2 area, and the ephemeral objects (Gen0 and Gen1) move to this new block.
Now let’s talk about the Large Object Heap.
Large Object Heap
The LOH (let’s call it that) works a little differently from the SOH. It is specific to objects larger than 85,000 bytes. That may seem small, but believe me, an object hardly ever exceeds that size.
The most common cases of objects that go to LOH are strings and arrays.
One of the most critical phases of GC execution is memory shrinking, which involves moving objects around. Larger objects are considerably more costly to relocate, to the point where the performance impact stops being worth it, and this is where they end up.
When a LOH object is cleaned up, its space is left empty; no relocation is done. But that doesn’t mean this space will be empty forever.
There is a table of “empty spaces in the LOH”, and when a new object is created in the LOH, this table is checked to see whether one of these spaces can hold it. This was created to reduce the amount of memory used by the LOH area.
The problem happens when the remaining space between objects is less than 85,000 bytes. There’s nothing to be done there: this area will not be allocated until one of the adjacent objects is cleaned up as well and the available address space “expands sideways”.
However, the runtime almost always prefers to allocate these new objects at the end of the heap, since the work of querying the table may not pay off, and forcing a Gen2 cleanup to make additional space in these areas is very expensive.
So the tip here is: watch out for very large objects!
As with everything else in computing, almost nothing is set in stone: starting with .NET Framework 4.5.1 we have the option to set a flag that forces compaction to be performed on the LOH, although this is not advisable due to the potentially high impact.
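That flag is exposed through GCSettings; a minimal sketch of opting in to a one-off LOH compaction:

```csharp
using System;
using System.Runtime;

// Opt in to compacting the LOH on the next blocking full collection.
// Relocating large objects is exactly the cost the runtime normally
// avoids, so use this sparingly.
GCSettings.LargeObjectHeapCompactionMode =
    GCLargeObjectHeapCompactionMode.CompactOnce;

GC.Collect(); // the LOH is compacted during this collection, and the
              // setting then automatically reverts to Default
```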
- When working with multi-line text files, if possible, prefer reading one line at a time instead of reading everything at once. Each line will (probably) be an entry in the SOH and can be cleaned up quickly; if the file is large and you read it all at once, it will surely be allocated in the LOH, and its deallocation will take longer.
- Objects that have finalizers, or destructors (methods that are executed during the finalization of a class), are never cleaned up in Gen0, as the code inside the finalizer could slow down the execution and impact the cleanup time of Gen0, which should be as fast as possible. If the cleanup was already performed by hand, we can tell the GC not to run the finalizer with the GC.SuppressFinalize() command.
- Beware of calls to unmanaged code. Memory allocated through COM+ objects or operating system resources may not be tracked by the Garbage Collector, and your program may never release it.
- Do not call GC.Collect(); the best way to manage memory is to let the runtime do it for us.
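The finalizer tip above is usually applied through the standard dispose pattern. A simplified sketch (the class name ResourceHolder is an assumption):

```csharp
using System;

using (var holder = new ResourceHolder())
{
    // use the resource; Dispose runs at the end of this block
}

// Simplified standard dispose pattern
class ResourceHolder : IDisposable
{
    private bool _disposed;

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        // ... release unmanaged resources here ...

        // Cleanup already happened, so tell the GC to skip the
        // finalizer; the object can then die in Gen0 like any other.
        GC.SuppressFinalize(this);
    }

    // Finalizer: runs only if Dispose was never called. Its mere
    // presence keeps live instances out of the Gen0 fast path.
    ~ResourceHolder()
    {
        // ... last-chance release of unmanaged resources ...
    }
}
```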
Here’s a simple piece of code that concatenates a string 50,000 times and returns that value.
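The benchmark code itself is not reproduced here; below is a sketch of the two versions being compared (the method names are assumptions):

```csharp
using System;
using System.Text;

// Naive version: every '+=' creates a brand-new string in the Heap,
// so 50,000 iterations produce 50,000 intermediate strings for the
// GC to clean up.
static string ConcatWithString(int iterations)
{
    var text = "";
    for (int i = 0; i < iterations; i++)
        text += "a";
    return text;
}

// StringBuilder version: appends into an internal buffer and only
// materializes the final string once, in ToString().
static string ConcatWithStringBuilder(int iterations)
{
    var sb = new StringBuilder();
    for (int i = 0; i < iterations; i++)
        sb.Append("a");
    return sb.ToString();
}

Console.WriteLine(ConcatWithString(50_000).Length);        // 50000
Console.WriteLine(ConcatWithStringBuilder(50_000).Length); // 50000
```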
I know, it’s a silly example, but think of code that does this much less often but on a much larger scale.
Strings are immutable types; that is, if we want to change a string’s value, we need to create another string with the value we want. The reference in memory is re-pointed and the GC does the cleanup work, since strings are reference types and live in the heap.
In the benchmark results, GC runs are reported as the number of collections per 1,000 executions.
Internally, StringBuilder appends to an internal buffer and only actually generates the string that we want when we call sb.ToString().
This helps a lot to avoid the overhead of consecutive string allocations and GC triggers, which is why this approach is much more performant. The number of GC cleanups and the memory allocation (1 GB vs 457 KB) speak for themselves.