Memory Management
Computers need to store data while they are performing computations. This is (hopefully) not a revolutionary concept to you. However, in my experience very few people really take the time to learn how this works exactly. Most programmers have at least heard of concepts like the stack and the heap. But a few months ago I had a conversation with one of my friends (who I consider to be a very good programmer), and he revealed that he didn't really understand what we needed the heap for.
So, I'm aiming today to explain what the stack and the heap are, why we need them, and perhaps even discuss some of the more advanced ideas in memory management that have come along since the stack and heap first became popular, and which can tackle the same problems more gracefully.
Unstructured Memory
Before we had high level languages like C, we had to write code in bare assembly. This was quite an ordeal, I understand.
I, like many others, got a tiny view of it in my degree, where we were required to write a little assembly to help solidify our understanding of how computers work at the lowest level.
When you are writing assembly, there is no notion of an organised memory structure - all that exists is just empty space, free for you to use.
This is probably worse than it sounds. For example, function calls are just gotos with a hard-coded address;
If you're lucky, you get something like JRA (jump relative address) or assembly-level jump labels,
and you wouldn't need to re-write your whole program if something else was already using your preferred address space.
If you weren't lucky, jumps took bare addresses.
Then, if you wanted to move the function, or say, add a line to the program before that function's definition,
you would need to manually update the address everywhere the function was called.
And in assembly, it's up to you to enforce your calling convention.
There is not (in general) a distinction between program memory and data memory -
so you are also responsible for making sure you never accidentally overwrite some critical
part of your code with the data you're processing.
This requires quite a lot of discipline from the programmer - more, it turns out, than even the most hardcore modern programmers are willing to exert. Raw-dogging unstructured memory is inflexible and error-prone. But it's also painful because simple tasks become very difficult. Consider, for example, implementing a linked list of unknown size; how do you know where the best place to allocate the next node is? How do we track if a space is free or if it's currently in use? Managing this with raw assembly would have been a truly monumental task.
But some very clever people realised that if we could just impose some structure on memory, and some corresponding structure on our programs, then we could get rid of a lot of the tedium and handle memory automatically. Enter the stack.
The stack
Take a look at the following function:
    int main() {
        int a = 1;
        int b = 2;
        return a + b;
    }

And let's imagine that we don't have a nice optimising compiler that will just inline a and b. How much space does it need to run?
There are two ints, and integers are generally 4 bytes long; so we need 8 bytes of space.
Now think about this: when do we need to keep the space for a and b around? The key observation is that neither of these values will ever be needed after the function ends - the references a and b are only valid within the scope of main. So we can allocate the space for them when the function gets called, and then deallocate them all together when the function ends. Great!
This is the notion of a stack. For any given function, we calculate the amount of memory it needs to run statically, ahead of time. Then whenever a function is called, we allocate that space in whatâs called a stack frame. So long as the data is tied to the function, we can safely deallocate it when the function returns.
This is called 'the stack' because it resembles the data structure of the same name. We select some region of memory (usually the top end of the program data region, but it doesn't really matter) to use as the stack. We then keep a pointer to the next free bit of memory in that space. Whenever we need memory for a function call, we just increment the stack pointer as required. When a function returns, we decrement it. Frames come off the stack in reverse order to how we added them - just like the data structure.
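To make that concrete, here is a tiny sketch of a bump-the-pointer stack in C. The names are entirely hypothetical - a real compiler does this with a register and a couple of instructions, not library calls - but the mechanics are the same.

    #include <stddef.h>

    /* A toy model of a stack: a region of memory plus a pointer to the next
       free byte. Hypothetical names - real stacks live in a CPU register. */
    static unsigned char stack_memory[1 << 20];  /* 1 MiB reserved for the stack */
    static size_t stack_pointer = 0;             /* offset of the next free byte */

    /* "Calling" a function whose frame needs frame_size bytes just bumps the pointer. */
    size_t enter_frame(size_t frame_size) {
        size_t saved = stack_pointer;
        stack_pointer += frame_size;   /* allocate the whole frame in one go */
        return saved;                  /* remember where this frame started  */
    }

    /* "Returning" restores the old pointer, freeing the whole frame at once. */
    void leave_frame(size_t saved) {
        stack_pointer = saved;
    }

For the main above, frame_size would be 8: enough for a and b.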
Problems
There is, of course, a problem with this. I laid out two assumptions necessary to get the stack to work - can you work out what they are?
The first is the lifetime issue - what if our data is going to be used after the function ends? One option is to just directly return the data - i.e. allocate space in the calling frame, and put the data there when the function ends. This can work, but if the data is very large, we probably don't want to be passing it around by value like that. We'd like a more efficient representation - we want to be able to pass it by reference. This solution also doesn't handle ideas like function closures all that well, since closures should bind values in the scope they were created in.
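To make the lifetime problem concrete, here is a small C illustration. The exact symptoms vary by compiler, but the first function is always wrong:

    /* Returning a pointer into your own stack frame is broken: the frame is
       deallocated the moment the function returns, so the pointer dangles. */
    int *broken(void) {
        int local = 42;
        return &local;    /* bug: this memory is reclaimed on return */
    }

    /* Returning by value is fine, but it copies the data into the caller's
       frame - painful if "the data" is megabytes rather than one int. */
    int fine_but_copies(void) {
        int local = 42;
        return local;
    }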
A bigger problem for us is that it also does not handle the second issue - sizing. It's not always possible to estimate the size of a data structure ahead of time. For a simple example, consider a program that reads in a user-entered string. How much space should we put aside for it?
You could put an upper limit on the length of the string, but say that's not possible (and there are cases where it's not - determining the size requirements of an arbitrary function in a Turing-complete language is undecidable). The truth is that there isn't a good, clean way to allocate space for this statically.
It's these two issues - sizing uncertainty and lifetime uncertainty - that provide the motivation for the heap.
The heap
Unlike the stack, the name here is coincidental - the runtime heap bears no relation to the data structure of the same name. The heap usually does have some structure, although it's much more complicated - what structure it has precisely depends on the implementation. Generally speaking, the heap is split up into granules. A granule is a small region of memory - the smallest amount we can allocate in one go. Part of why they exist has to do with alignment issues - even if you only need space for a single bit, the computer can only do loads in register-sized blocks. This is why ordering your struct fields in C correctly is so important - it influences the memory layout of the struct, which in turn directly impacts its memory usage.
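As a quick illustration of that last point, here is a sketch of how field order changes a struct's size. The exact numbers assume a typical 64-bit platform (8-byte doubles, 8-byte alignment), so treat them as indicative rather than guaranteed:

    #include <stdio.h>

    struct badly_ordered {
        char   a;   /* 1 byte, then 7 bytes of padding before b */
        double b;   /* 8 bytes                                  */
        char   c;   /* 1 byte, then 7 bytes of trailing padding */
    };              /* typically 24 bytes                       */

    struct well_ordered {
        double b;   /* 8 bytes                                  */
        char   a;   /* 1 byte                                   */
        char   c;   /* 1 byte, then 6 bytes of trailing padding */
    };              /* typically 16 bytes                       */

    int main(void) {
        printf("%zu vs %zu\n", sizeof(struct badly_ordered), sizeof(struct well_ordered));
        return 0;
    }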
But returning to topic; we cut the heap up into granules which can be individually allocated. Access to the heap is provided through two functions: an allocator and a deallocator. An allocator takes in a size - i.e. the amount of memory we want to allocate - and returns a pointer to a newly allocated block of memory of (at least) that size. Intuitively, you can understand this as allocating enough consecutive granules to fit the object. Even if you only want a few bits, you get whole granules - although things are more complicated in practice.
Inside an allocator, we maintain (probably several) lists of unused memory, called free-lists. These are often implemented as very simple linked lists - in any given granule, you write in the address of the next granule in the list, and keep a pointer to the head. These free-lists have different granule sizes - although, usually, the size of any given one will be a multiple of the smallest available size. When we allocate space for an object, we take a granule from the smallest free-list which can fit it. There's also usually a large block of memory that hasn't yet been segmented. This is used to allocate space for extremely large objects, which don't fit in any of the free-lists, and so that we can dynamically increase the size of the other free-lists as required. Why do it this way? The short answer is speed. Finding many consecutive granules might take a long time, so we don't even try. We want our allocators to be fast - so it's important that we don't have to work super hard to allocate an object.
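Here is a minimal sketch of what that might look like - a single free-list with one fixed granule size, hypothetical names throughout. Real allocators keep several lists and are far more careful:

    #include <stddef.h>

    #define GRANULE_SIZE 32

    /* Each free granule stores the address of the next free granule
       in its own first bytes - the free-list costs us no extra memory. */
    typedef struct granule { struct granule *next; } granule;

    static unsigned char heap_area[GRANULE_SIZE * 1024];
    static granule *free_list = NULL;

    /* Carve the raw heap area into granules and thread them into the list. */
    void heap_init(void) {
        for (size_t i = 0; i < sizeof(heap_area); i += GRANULE_SIZE) {
            granule *g = (granule *)&heap_area[i];
            g->next = free_list;
            free_list = g;
        }
    }

    /* The allocator is just "pop the head of the list" (NULL if we're out). */
    void *alloc_granule(void) {
        granule *g = free_list;
        if (g != NULL) free_list = g->next;
        return g;
    }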
Deallocators take in pointers to live (i.e. allocated, in-use) blocks of heap memory and mark them as unused - so that they can be recycled. In practice, this just means adding the granule back to the free-list.
In a simple implementation, these functions are basically list.pop() and list.push() operations (with some fancy dressing). That hopefully explains why things like double free and use after free errors are such a big deal. If you free a memory location twice, it can end up in the free-list twice, and therefore be allocated twice. You can then end up with two different parts of the code accidentally accessing the same section of memory, each assuming that it has sole ownership. This can be disastrous, and cause all kinds of horrific memory corruption issues. At the very best, your code will be wrong - but it will usually be a lot worse.
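Continuing the toy free-list sketch from above, the deallocator is just a push - and that makes the double free problem easy to see:

    /* The deallocator: push the granule back onto the free-list. */
    void free_granule(void *p) {
        granule *g = p;
        g->next = free_list;
        free_list = g;
    }

    /* Double free in action: the same granule ends up in the list twice,
       so two later allocations silently alias the same memory. */
    void double_free_demo(void) {
        void *p = alloc_granule();
        free_granule(p);
        free_granule(p);             /* bug: p is now in the free-list twice */

        void *q = alloc_granule();   /* returns p        */
        void *r = alloc_granule();   /* returns p again! */
        (void)q; (void)r;            /* q and r now alias the same granule   */
    }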
Why expose ourselves to this potential issue, if it's so dangerous?
Well, because C programmers just won't quit.
BUT ALSO, because the heap solves the uncertainty issues we just discussed!
malloc and free make managing the program's memory the problem of the programmer.
Through the magic of 'make the user do it', we solve the uncertainty problem:
so long as we, the programmers, can work out what the lifetime or size of an object should be,
we have no problem allocating and deallocating the memory for it.
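Going back to the user-entered string from earlier, here is a sketch of what that looks like in practice - the size is only known at runtime, and the result happily outlives the frame that allocated it:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Make a heap copy of a string whose length we only learn at runtime. */
    char *copy_string(const char *input) {
        size_t len = strlen(input) + 1;   /* +1 for the terminating NUL */
        char *copy = malloc(len);         /* size decided at runtime    */
        if (copy == NULL) return NULL;    /* allocation can fail        */
        memcpy(copy, input, len);
        return copy;                      /* outlives this stack frame  */
    }

    int main(void) {
        char *s = copy_string("hello, heap");
        if (s != NULL) {
            printf("%s\n", s);
            free(s);                      /* our job to hand it back    */
        }
        return 0;
    }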
There are other ways of tackling the heap problem - lots, in fact. It turns out that managing the memory yourself with malloc and free is a fantastic way to shoot yourself in the foot. So a lot of development in PL theory and runtime systems has been in finding ways to automate the management of heap memory.
Garbage Collection
Another very popular memory management mechanism is garbage collection. In a garbage collected language, free is never called. Instead, whenever the allocator is called, it can choose to run the garbage collector (GC). The garbage collector is responsible for figuring out what allocated memory is now dead, and safe to be reused.
There is a whole plethora of different GC implementations, all of which have different advantages. Indeed, there are really several different axes on which we can place GC systems, depending on how exactly they work.
- Precise or Conservative
- Compacting or Non-Compacting
- Tracing vs Reference Counting vs Generational vs Something else…
The simplest garbage collector we could come up with is probably a reference counting GC. In reference counting, with each block of memory you associate a counter, which tracks how many references to that block exist. When that count hits 0, you can deallocate the block.
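A sketch of the idea in C, with hypothetical names - real implementations fold the counter into an object header and have to worry about thread safety:

    #include <stdlib.h>

    typedef struct {
        size_t refcount;
        /* ...the actual data would follow the counter... */
    } rc_block;

    rc_block *rc_new(void) {
        rc_block *b = malloc(sizeof *b);
        if (b != NULL) b->refcount = 1;   /* the creator holds one reference */
        return b;
    }

    void rc_retain(rc_block *b) {
        b->refcount++;                    /* another reference now exists    */
    }

    void rc_release(rc_block *b) {
        if (--b->refcount == 0)           /* last reference dropped: dead    */
            free(b);
    }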
The biggest problem that RC GC faces is reference cycles. Simply put: what if two dead blocks point at each other? This is the motivation for tracing GC. The trace algorithm first looks through the stack - what is called the 'root set' - and finds pointers to live objects. Everything else is then scanned in turn, looking for yet more pointers. This process continues until there are no new pointers to follow. So long as we ensure we only follow a given pointer once, we are guaranteed to terminate. All the memory the trace visits is marked as live. The stuff that didn't get marked as live can safely be considered dead - and can be allocated anew, or 'swept'. Thus, this kind of tracing GC is sometimes called 'mark-sweep'.
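In sketch form (hypothetical names again, and over a toy object graph rather than real stack scanning), mark and sweep look something like this:

    #include <stdbool.h>
    #include <stddef.h>

    #define MAX_CHILDREN 4
    #define MAX_OBJECTS  1024

    typedef struct object {
        bool marked;
        struct object *children[MAX_CHILDREN];  /* pointers this object holds */
    } object;

    static object *all_objects[MAX_OBJECTS];    /* every object ever allocated */
    static size_t  object_count;

    /* Mark: follow pointers from a root, visiting each object at most once. */
    void mark(object *o) {
        if (o == NULL || o->marked) return;
        o->marked = true;
        for (int i = 0; i < MAX_CHILDREN; i++)
            mark(o->children[i]);
    }

    /* Sweep: anything still unmarked was unreachable, so recycle it. */
    void sweep(void (*recycle)(object *)) {
        size_t live = 0;
        for (size_t i = 0; i < object_count; i++) {
            object *o = all_objects[i];
            if (o->marked) {
                o->marked = false;          /* reset for the next collection */
                all_objects[live++] = o;    /* keep tracking it              */
            } else {
                recycle(o);                 /* dead: back to the allocator   */
            }
        }
        object_count = live;
    }

A collection is then just: call mark on every pointer in the root set, then call sweep.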
I assumed in this explanation that we could always determine what counts as a pointer - this is called 'precise' GC. But in practice, that's not always true.
In C, for instance, a void * and a pointer-sized integer look identical in memory. In conservative GC, we assume that anything that could be a pointer is a pointer.
Then you simply run trace on that.
This sounds a bit weird, but it has a very big advantage - you can implement a GC like this as a drop-in replacement for C's malloc and free.
You can garbage collect a program with no change to the code required - just change how it gets linked.
A good example here is BDWGC.
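Usage, as I understand it, looks roughly like this - include gc.h, allocate with GC_MALLOC, never call free, and link against the library (something like cc demo.c -lgc):

    #include <gc.h>      /* the Boehm-Demers-Weiser collector */
    #include <stdio.h>

    int main(void) {
        GC_INIT();
        for (int i = 0; i < 1000000; i++) {
            int *p = GC_MALLOC(sizeof *p);  /* collected once unreachable */
            *p = i;
        }
        printf("a million allocations, zero frees\n");
        return 0;
    }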
This is just scratching the surface of what GC and managed runtimes do - for instance, compacting GC squishes objects together to reduce the amount of awkward free space there is, and reduce fragmentation. This can be very expensive, and doing it well is a really interesting problem to solve.
There's also been lots of work on how to statically analyse programs to allow the compiler to automatically insert the proper free calls. Rust is probably the most famous example of this - its ownership system guarantees that safe free calls can be inserted automatically. Work of this kind has been around for ages; Tofte and Talpin's paper on region based memory management is perhaps the most famous. The idea of regions is to allow stack frames to grow in size at runtime. Essentially, you associate with each frame a list of allocations, which are freed when the frame is popped.
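As a rough sketch of the region idea (hypothetical names, and nothing like what a real region-inferring compiler generates), a region is just a bag of allocations that all die together:

    #include <stdlib.h>

    /* Every allocation is attached to a region; destroying the region frees
       them all at once, a bit like popping an oversized stack frame. */
    typedef struct region_block {
        struct region_block *next;
        /* the caller's payload follows this header */
    } region_block;

    typedef struct {
        region_block *blocks;   /* everything allocated in this region */
    } region;

    void *region_alloc(region *r, size_t size) {
        region_block *b = malloc(sizeof(region_block) + size);
        if (b == NULL) return NULL;
        b->next = r->blocks;    /* remember it so region_free can find it   */
        r->blocks = b;
        return b + 1;           /* hand back the space just past the header */
    }

    void region_free(region *r) {
        region_block *b = r->blocks;
        while (b != NULL) {
            region_block *next = b->next;
            free(b);
            b = next;
        }
        r->blocks = NULL;
    }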
Conclusions
That about covers everything I wanted to say. Hopefully you can walk away from this feeling confident about what the stack and heap really are, and maybe also with a bit of an intuition for what malloc and free are doing. At its core, memory management is really not that difficult to understand. I also hope that I've convinced you that there are some really interesting problems in memory management. Even today, people are still working on new techniques, and I think it's one of the most interesting areas of research within PL.