Memory Management

Computers need to store data while they are performing computations. This is (hopefully) not a revolutionary concept to you. However, in my experience very few people really take the time to learn how this works exactly. Most programmers have at least heard of concepts like the stack and the heap. But a few months ago I had a conversation with one of my friends (who I consider to be a very good programmer), and he revealed that he didn’t really understand what we needed the heap for.

So, I’m aiming today to explain what the stack and the heap are, why we need them, and perhaps even discuss some of the more advanced ideas in memory management that have come out since, which can tackle the same problems more gracefully.


Unstructured Memory

Before we had high level languages like C, we had to write code in bare assembly. This was quite an ordeal, I understand. I, like many others, got a tiny view of it in my degree, where we were required to write just a little assembly to help solidify our understanding of how computers work at the lowest level. When you are writing assembly, there is no notion of an organised memory structure - all that exists is just empty space, free for you to use.

This is probably worse than it sounds. Function calls are just gotos with a hard-coded address. If you were lucky, you got something like JRA (jump relative address) or assembly-level jump labels, and you wouldn’t need to re-write your program if something else was already using your preferred address space. If you weren’t lucky, jumps took bare addresses, and I understand this was more common back in the day. If you then wanted to move the function - or, say, add a line to the program before that function’s definition - you would need to manually update the address everywhere the function was called. And in assembly, it’s up to you to enforce your calling convention. There is not (in general) a distinction between program memory and data memory - so you are also responsible for making sure you never accidentally overwrite some critical part of your code with the data you’re processing.

This requires quite a lot of discipline from the programmer - more, it turns out, than even the most hardcore modern programmers are willing to exert. Raw-dogging unstructured memory is inflexible and error-prone. But it’s also painful because simple tasks become very difficult. Consider, for example, implementing a linked list of unknown size; how do you know where the best place to allocate the next node is? How do you track whether a space is free or currently in use? Managing this with raw assembly would have been a truly monumental task.

But some very clever people realised that if we could just impose some structure on memory, and some corresponding structure on our programs, then we could get rid of a lot of the tedium and handle memory automatically. Enter the stack and the heap.

The stack

Take a look at the following function:

simple.c
int main() {
    int a = 1;
    int b = 2;
    return a + b;
}

And let’s imagine that we don’t have a nice optimising compiler that will just inline a and b. How much space does it need to run? There are two ints, and an int is typically 4 bytes long; so we need 8 bytes of space.

Now think about this: when do we need to keep the space for a and b around? The key observation is that neither of these values will ever be needed after the function ends. So we can allocate the space for them when the function gets called, and then deallocate them all together when the function ends. Great!

This is the notion of a stack; the idea is that we calculate the amount of memory we need for a function to run statically - ahead of runtime. Then whenever a function is called, we allocate that space - we call it a stack frame. And so long as the data is tied to the function, we can safely deallocate it when the function returns.

It’s called ā€œthe stackā€ because it resembles the data structure of the same name. We select some region of memory (usually the top end of the program data region, but it doesn’t really matter) to use as the stack. We then keep a pointer to the next free bit of memory in that space. Whenever we need memory for a function call, we just increment the stack pointer as required. When a function returns, we decrement it. Frames come off the stack in reverse order to how we added them - just like the data structure.
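As a sketch (with a made-up fixed-size region - in reality the stack pointer is a CPU register and the compiler does all of this for you), frame allocation is just pointer arithmetic:

```c
#include <stddef.h>

/* A toy "stack" for illustration: a fixed region of memory plus a
   pointer to the next free byte. */
static unsigned char stack_region[1024];
static size_t stack_top = 0; /* the "stack pointer" */

/* Entering a function: bump the pointer up by the frame size,
   which was computed statically at compile time. */
void *push_frame(size_t frame_size) {
    void *frame = &stack_region[stack_top];
    stack_top += frame_size;
    return frame;
}

/* Returning from a function: move the pointer back down.
   The frame's memory is instantly "deallocated". */
void pop_frame(size_t frame_size) {
    stack_top -= frame_size;
}
```

Note that deallocation is free: moving one pointer releases the whole frame at once.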

Problems

There is of course, a problem with this. I laid out two assumptions we needed to get the stack to work - can you work out what they are?

The first is the lifetime issue - what if our data is going to be used after the function ends? One option is to just directly return the data - i.e. allocate space in the calling frame, and put it there when the function ends. This can work, but if the data is very large, we probably don’t want to be passing it around by value like that. We’d like a more efficient representation - we want to be able to pass it by reference. This solution also doesn’t handle ideas like function closures very well, which should bind values in the scope they were created in.

A bigger problem for us is that it also does not handle the second issue - sizing. It’s not always possible to estimate the size of a data structure ahead of time. For a simple example, consider a program that reads in a user-entered string. How much space should we put aside for it?

You could put an upper limit on the length of the string, but say that’s not possible (and there are cases where it’s not - determining the size of an arbitrary function return is undecidable). The truth is that there isn’t a good, clean way to allocate space for this statically.
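The usual answer, sketched below, is to put the buffer on the heap and grow it as input arrives (the helper name and starting capacity here are my own inventions):

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a line of unknown length from `in`, growing a heap buffer as
   needed. A sketch: minimal error handling, caller must free(). */
char *read_line(FILE *in) {
    size_t cap = 16, len = 0;       /* start small... */
    char *buf = malloc(cap);
    if (!buf) return NULL;
    int c;
    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {       /* out of space: double the buffer */
            cap *= 2;
            char *bigger = realloc(buf, cap);
            if (!bigger) { free(buf); return NULL; }
            buf = bigger;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';
    return buf;
}
```

The point is that the final size is only discovered at runtime - exactly the thing a statically-sized stack frame cannot express.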

It is these two issues - sizing uncertainty and lifetime uncertainty - that provide the motivation for the heap.

The heap

Unlike the stack, the name here is coincidental - the runtime heap bears no relation to the data structure of the same name. The heap usually does have some structure, although it’s much more complicated - what structure it has precisely depends on the implementation. Generally speaking, the heap is split up into granules. A granule is a small region of memory - the smallest amount we can allocate on the heap in one go. Part of why they exist has to do with alignment issues - even if you allocate space for only a single bit, the computer can only do loads in register-sized blocks. This is why ordering your struct properties in C correctly is so important - it influences the memory layout of the struct, which in turn directly impacts its memory usage.

But returning to topic; we cut the heap up into granules which can be individually allocated. Access to the heap is allowed through two functions: an allocator and a deallocator. An allocator takes in a size - i.e. the amount of memory we want to allocate - and returns a pointer to a newly allocated block of memory of (at least) that size. Intuitively, you can understand this as allocating enough consecutive granules to fit the object. If you want n bytes, then you need ⌈n Ć· granule size⌉ granules.
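That arithmetic is just integer ceiling division; as a tiny illustration (the helper name is made up):

```c
#include <stddef.h>

/* Granules needed for an n-byte allocation: ceiling division,
   done without floating point in the usual C idiom. */
size_t granules_needed(size_t n, size_t granule_size) {
    return (n + granule_size - 1) / granule_size;
}
```

So with 16-byte granules, a 17-byte request costs two granules - the 15 spare bytes are the internal fragmentation you pay for the simplicity.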

Deallocators take in pointers to live (i.e. allocated, in-use) blocks of heap memory and mark them as unused - so that they can be recycled.

In a simple implementation, these functions are basically list.pop() and list.push() operations (with some fancy dressing). That hopefully explains why things like double free and use after free errors are such a big deal. If you free a memory location twice, it can end up in the free list (as it’s called) twice, and therefore be allocated twice. You can then end up with two different parts of the code accidentally accessing the same section of memory, assuming that they have sole ownership. This can be disastrous, and cause all kinds of horrific memory corruption issues. At the very best, your code will be wrong - but it will usually be a lot worse.
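A toy version of that free-list scheme might look like this (sizes and names invented for illustration; real allocators are far more involved). The trick is that the free list needs no extra storage - each unused granule holds the pointer to the next one:

```c
#include <stddef.h>

#define GRANULE_WORDS 4   /* a granule = 4 pointer-sized words */
#define NUM_GRANULES  64

/* Pointer-typed backing store, so every granule is pointer-aligned. */
static void *heap[GRANULE_WORDS * NUM_GRANULES];
static void *free_list = NULL;

void heap_init(void) {
    free_list = NULL;
    for (int i = 0; i < NUM_GRANULES; i++) {
        void **granule = &heap[i * GRANULE_WORDS];
        *granule = free_list;       /* push this granule onto the list */
        free_list = granule;
    }
}

void *toy_alloc(void) {
    if (!free_list) return NULL;    /* out of memory */
    void *block = free_list;
    free_list = *(void **)block;    /* pop */
    return block;
}

void toy_free(void *block) {
    *(void **)block = free_list;    /* push - nothing stops a double free! */
    free_list = block;
}
```

Notice that toy_free happily pushes the same block twice - after which toy_alloc will hand it out to two different callers. That is the double free disaster in miniature.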

Why expose ourselves to this potential issue, if it’s so dangerous? Well, because C programmers just won’t quit. BUT ALSO, because the heap solves the uncertainty issues we just discussed! malloc and free make managing the memory of the program the problem of the programmer. Through the magic of ā€˜make the user do it’ we solve the uncertainty problem: so long as we, the programmers, can work out what the lifetime or size of an object should be, we have no problem allocating and deallocating the memory for it.

In real life, malloc and free are usually a bit more complex. We need to deal with issues like fragmentation. Consider: to allocate an object of some size, we need to find a contiguous block of free memory of at least that size. If we really used the simple version of the allocator, with only one granule size in an unordered list, doing this would be a very expensive operation. So instead, most allocators maintain more than one free list, each holding granules of a different size. When you allocate, you simply pick from the list holding the smallest blocks that will fit your object. When you free, you push back into the corresponding list.

This allows you to avoid the performance issues of fragmentation, at the cost of a slightly more complex allocator. Allocators also tend to keep around a big block of undivided memory as ā€˜spare’. This can be used to allocate space for really big or oversized objects that don’t fit into any of the granule sizes you already offer. It can also be cut up into smaller granules as required if a given free list runs out of entries.
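The size-class lookup can be sketched like so (the class sizes here are invented; real allocators tune them carefully):

```c
#include <stddef.h>

/* Segregated free lists: one list per size class. */
static const size_t classes[] = {16, 32, 64, 128};
#define NUM_CLASSES 4

/* Returns the index of the free list to allocate from, or -1 if the
   request is oversized and must come from the big 'spare' block. */
int size_class(size_t n) {
    for (int i = 0; i < NUM_CLASSES; i++)
        if (n <= classes[i]) return i;
    return -1;
}
```

A 33-byte request lands in the 64-byte class - fast to find, at the cost of some wasted space in each block.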

There are other ways of tackling the heap problem - lots, in fact. It turns out that managing the memory yourself with malloc and free is a fantastic way to shoot yourself in the foot. So a lot of development in PL theory and runtime systems has been in finding ways to automate the management of heap memory.

Garbage Collection

A very popular mechanism is garbage collection. In a garbage collected language, free is never called. Instead, whenever the allocator is called, it can choose to run the garbage collector (GC).

There is a whole plethora of different GC implementations, all of which have different advantages. Maybe the simplest is reference counting GC. You just track how many references to a given block exist. When that count hits 0, you can deallocate the block.
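A minimal sketch of reference counting, assuming a count stored in a header in front of each block (names are my own):

```c
#include <stdlib.h>

/* Each block carries a reference count in a header. */
typedef struct {
    int refcount;
    /* ...payload follows in memory... */
} RcBlock;

RcBlock *rc_alloc(size_t payload) {
    RcBlock *b = malloc(sizeof(RcBlock) + payload);
    if (b) b->refcount = 1;          /* the creator holds one reference */
    return b;
}

void rc_retain(RcBlock *b) { b->refcount++; }

/* Drop a reference; frees the block when the count hits 0.
   Returns 1 if the block was freed, for illustration. */
int rc_release(RcBlock *b) {
    if (--b->refcount == 0) { free(b); return 1; }
    return 0;
}
```

Languages like Swift and Python do essentially this automatically, inserting the retain and release calls for you.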

This is susceptible to a problem: reference cycles - what if two dead blocks point at each other? Neither count ever hits 0. This is the motivation for tracing GC. The trace algorithm first looks through the stack and finds pointers to live objects. These objects are then scanned in turn, looking for yet more pointers. This process continues until it eventually finishes. So long as we ensure we only follow a given pointer once, we are guaranteed to terminate. All the memory the trace visits is marked as live. The stuff that didn’t get marked as live can safely be considered dead - and can be allocated anew.
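The mark phase of such a trace can be sketched as follows (a toy object graph where each object holds at most two pointers):

```c
#include <stddef.h>

#define MAX_CHILDREN 2

typedef struct Obj {
    int marked;
    struct Obj *children[MAX_CHILDREN];
} Obj;

/* Follow pointers from a root, marking each object at most once -
   which is exactly what makes cycles harmless here. */
void mark(Obj *o) {
    if (o == NULL || o->marked) return;  /* already visited: stop */
    o->marked = 1;
    for (int i = 0; i < MAX_CHILDREN; i++)
        mark(o->children[i]);
}
```

After marking from every root, anything still unmarked is garbage - even two objects locked in a cycle, so long as no root reaches them.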

Conservative GC takes this a bit further. In trace, we implicitly assume that we can determine what is and what isn’t a pointer. But maybe that isn’t true. In conservative GC, we assume that anything that could be a pointer is a pointer. Then you simply run trace on that. This sounds a bit weird, but it has a very big advantage - you can implement a GC like this as a drop-in replacement for C’s malloc and free. You can garbage collect a program with no change to the code required - just change how it gets linked. A good example here is BDWGC.

This is just scratching the surface of what GC and managed runtimes do - for instance, compacting GC squishes objects together to reduce the amount of awkward free space there is, and reduce fragmentation. This can be very expensive, and doing it well is a really interesting problem to solve.

There’s also been lots of work on how to statically analyse programs to allow the compiler to automatically insert the proper free calls. Rust is probably the most famous example of this - its ownership system guarantees that safe free calls can be inserted automatically. Work of this kind has been around for ages; Tofte and Talpin’s paper on region based memory management is perhaps the most famous. The idea of regions is to allow stack frames to grow in size at runtime. Essentially, you associate with each frame a list of heap allocations, all freed when the frame is popped.
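A very rough sketch of a region, assuming a fixed capacity and individual malloc calls under the hood (a real implementation would carve allocations out of one contiguous block instead):

```c
#include <stdlib.h>

#define REGION_CAP 64

/* A region remembers every allocation made in it, so it can be
   torn down in one go - like a growable stack frame. */
typedef struct {
    void *ptrs[REGION_CAP];
    int count;
} Region;

void *region_alloc(Region *r, size_t n) {
    if (r->count == REGION_CAP) return NULL;
    void *p = malloc(n);
    if (p) r->ptrs[r->count++] = p;
    return p;
}

/* "Popping the frame": free everything the region ever handed out. */
void region_free(Region *r) {
    for (int i = 0; i < r->count; i++) free(r->ptrs[i]);
    r->count = 0;
}
```

Individual objects in a region can never be freed early, but in exchange the lifetime question is answered statically - everything dies with its frame.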

Conclusions

That about covers everything I wanted to say. Hopefully you can walk away from this feeling confident about what the stack and heap really are, and maybe also with a bit of an intuition for what malloc and free are doing. At its core, memory management is really not that difficult to understand. I also hope that I’ve convinced you that there are some really interesting problems in memory management. Even today, people are still working on new techniques, and I think it’s one of the most interesting areas of research within PL.