Let's play a word association game. When I say "stack overflow" what comes to mind? Is it Spolsky and Atwood's popular question and answer site? Do you imagine a toppling tower of buttermilk pancakes oozing with real Vermont maple syrup? Or do you think about the common catastrophic programming bug? Stack overflows are a real problem in a lot of code. Let's dig in a little bit to understand how and why they happen.
First we need to understand what the stack is and how it is used. A stack is a last-in-first-out data structure. You can 'put' something on top of it and 'pop' off the thing on top of the stack. Want to access something in the middle? Forget about it. You must keep popping off the top until you get there. Think of it like a stack of those delicous pancakes. If you want to add another pancake, the only place you can put it is on the top of the stack. Trying to pull a pancake from the middle of the stack would likely leave you with syrup-soaked pants, so when you take a pancake you must take it from the top.
Most every programming language uses a stack data structure to keep track of program execution. It's the foundation for how function calls work. As the CPU (or virtual CPU) executes your function, it builds up a local context. This context contains the local variables in the function and the pointer to the line of code currently being executed. What happens, then, when the CPU hits a function call? It needs to jump to a new part of the code and start executing in a new context, but the context of the current function still needs to be there when the new function returns. What happens in that other function should not effect the local variable in the current one.
Enter the program stack. Each thread of execution has a stack. Before jumping to the new function, the compiler inserts instructions to push all of the current context onto the this stack. The local variables and everything else that gives that function its context get stacked on top. The last thing pushed is the pointer that tells the CPU where to jump back to after the function call is finished. The parameters that are being passed to the function go on the stack, as well as the return value from the function.
As functions call more functions, the stack piles taller and taller. You've seen this stack before and probably didn't know it. When you see an exception that dumps out a big stack trace, you're just seeing a view of all of the function calls piled up on that stack. If you are in a debugger and you look at the call trace, what you are seeing is the debuggers view of the stack.
So if that's a program stack, how and why does the stack overflow? In most programs the stack grows and contracts as the execution of the program goes deeply into nested function calls, and then unwinds those calls. This is not problematic for most programs.
What happens, though, if recursion is introduced into the program? A function calls itself, which may in turn call itself again. This is not problematic so long as the recursion has some reasonable bound - a condition that causes the recursion to stop, then unwind. Even if the bound is quite high, modern computers have such large memory capacities that enough stack space can be allocated to accommodate the program.
Where programs run into trouble is when the recursion is not bounded. Consider this snippet of Swift code from a game that takes user input.
func move(board: Board) {
printer.printBoard(board)
printer.print("Please choose a space on the board: ")
let spot = reader.getInt()
if spot != nil && spot! >= 0 && spot! < BoardConstants.SIZE && board.containsOpenSpot(spot!) {
board.dropToken(spot!, token: token)
} else {
move(board)
}
}
If the player does not enter a valid 'spot', this function calls itself to get another move. Each improper input puts another layer of context on top of the stack. Pancake after pancake piles high as the player inputs more and more wrong entries. Because this recursion is not bounded, we've given the player the opportunity to overflow our stack. The stack overflows when it takes up more space than has been allocated to it. In the case of the Swift code above, it took about 20,000 wrong inputs, but eventually the program crashed with a stack overflow error. The size of the stack exceeded the allocated stack space and 'overflowed' into adjacent memory.
Here is another example. This is from a clojure web app. This function calculates a sequence of days between two dates.
(defn collect-days [current end days]
(cons current
(if (= current end)
days
(collect-days (time/plus current (time/days 1)) end days))))
This function may look safe and bounded, but consider what happens if end
is actually before current
! The function will keep recursing deeper and deeper. No matter how many days you add to current
it will never be equal to end. This unbounded recursion will eventually overflow the stack.
The moral of the story? Don't allow user input to cause unbounded recursion. Use a loop instead. Loops stay in the context of the function with internal jumping and don't build up on the stack.
Having seen this mistake many times, I have also encountered some exceptions to this rule. The Rust language has a really interesting implementation of the Stack. Rust actually dynamically allocates new chunks of Stack when you fill up what's been allocated. Built-in to the language is the ability to jump to a new section of stack and keep going. A Rust program like this won't crash. What a user can do is cause the program to use insane amounts of memory. Eventually, your system will run out of physical memory and begin swapping to virtual memory on the disk, which slows the system to an eventually unusable state.
The other exception is a language like Clojure, which has tail recursion optimization. Instead of calling the function by name, you can use the 'recur' keyword. The recur call must be in the 'tail' position, meaning it is the last thing executed in the function. This allows the compiler to turn that recursion into something more like a loop under the hood and prevent the stack from growing. Tail-optimized recursion is an acceptable alternative to a loop. If you are using the language this way, you'd better make darn sure that you've put the rercusion in the tail position. Clojure enforces this, but not all languages do.
Now that we've got a handle on how to control stack growth by avoiding unbounded recursion, I'm really hungry for those pancakes. Who's with me?
Photo Credit: Tavallai