Visualizing Recursion

In my experience teaching recursion, I have found one method of visualizing recursion to be particularly effective. This post contains an interactive, animated version of that visualization.

Binary Trees and Depth First Search

The rules we are about to learn about recursive control flow apply to all problems involving recursion, but we will learn about them exclusively in the context of depth-first search and binary trees. Depth-first search is a recursive algorithm that systematically visits every node in a tree. Interview problems involving recursion and binary trees almost always require extending depth-first search in some fashion. The version of depth-first search we will be working with in this visualization is below:

def dfs(node):
  if not node:
    return
    
  dfs(node.left)
  dfs(node.right)
          

This is the simplest possible form of depth-first search. We will be using it to traverse the tree constructed via the Python code on the right and represented graphically on the left:

class Node(object):
  def __init__(self, data, left=None, right=None):
    self.value = data
    self.left = left 
    self.right = right

ten = Node(10)
eight = Node(8)
nine = Node(9)
six = Node(6, left=nine)
twelve = Node(12, left=six, right=eight)
root = Node(5, left=ten, right=twelve)

dfs(root)
G 5 5 10 10 5->10 12 12 5->12 6 6 12->6 8 8 12->8 9 9 6->9

Visualizing Code Execution

Let's quickly familiarize ourselves with the style of visualization we will be working with. Use the slider or the prev/next buttons to execute each line. Note the variables that show up in the panel to the right of the code snippet after each line of code completes, and how they map to the various nodes in the tree diagram. Executing each step in this snippet brings us to line 14, in which we are about to make our first call to dfs(root).

class Node(object):
  def __init__(self, data, left=None, right=None):
    self.value = data
    self.left = left 
    self.right = right

ten = Node(10)
eight = Node(8)
nine = Node(9)
six = Node(6, left=nine)
twelve = Node(12, left=six, right=eight)
root = Node(5, left=ten, right=twelve)

dfs(root)
10 10 9 9 6 6 6->9 8 8 12->8 12->6 12 12 ` ` 5 5 5->10 5->12
Step 0 out of 6

The Rules of Recursive Flow Control

Rule #1: Recursive Calls Create Frames

We start our visualization by creating a frame to represent the first call to dfs(node). A frame maps to the execution of a single function call. It displays the current line of code being executed and the values of any local variables within that call. Execution within this frame reaches line 5, where we encounter our first recursive call to dfs(node.left).

To execute this recursive call, we create a new frame and push it to the top of the call stack. The call stack is a stack data structure containing all frames which have not finished executing. The newly created frame becomes the active frame (green border), indicating this frame is the only frame in our call stack which is actively executing instructions. The previously active frame turns grey to indicate that it is “suspended“ - i.e. it is waiting for the recursive call it made to dfs(node.left) on line 5 to complete before it can continue executing.

Just as before, execution within the new active frame continues until line 5, which is another recursive call to dfs(root.left). To execute this recursive call, we create a new frame, push it to the top of the call stack, and make it active. Execution starts in this newly created frame, and since the value of node is None, we reach our first return statement.

Return Statements: Popping Frames off the Call Stack

The return statement is the last line of code to execute within the frame, so we pop the frame off the call stack. This causes the next frame in the call stack - the frame immediately beneath the popped frame in our visual, or the frame associated with the node 10 - to become active again.

From the perspective of this newly active frame, the recursive call on line 5 now has a concrete value - namely, the value returned by the frame that was just popped. In our simple depth-first search, this value is None, but as the problems become more complex, the return statements in each frame will contain actual values that are critical to our solution. Now, the newly active frame, having received the value it was waiting for, continues executing. The next line of code it executes is a recursive call to dfs(node.right), so we create a new frame, push it to the top of the call stack, and make it active.

The value of root is None within this new frame, so we reach the return statement on line 3. We pop the current frame from the call stack and the next frame in the call stack becomes the new active frame. There are no more lines of code to execute in the new active frame, which means it has completed executing. The frame is popped off the call stack, and the next frame in the call stack becomes the new active frame.

Let's take a second to relate the state of our call stack back to depth-first search. At this point, the only frame in the call stack is the frame associated with the root of our tree. Within that frame, line 5 has just finished executing, which means we have finished searching the root's entire left subtree. With that done, we can move onto line 6, which won't complete until we search the root's entire right subtree using the same rules for recursive control used to search the root's left subtree.


To recap, those rules are:

  1. Whenever the active frame encounters a recursive call, a new frame is created, pushed to the top of the call stack, and becomes the new actively executing frame. Execution within the previous frame is suspended.
  2. Whenever the active frame encounters a return statement (or there are no more lines of code to execute), that frame is popped off the stack. The frame's return value (if any) is sent to the next frame in the call stack. This next frame in the call stack becomes the active frame, and execution resumes within it.

The animation shows the steps for searching the root's entire right subtree. See if you can recognize when to apply the two rules above.

These two rules for recursive control flow are foundational - the techniques we are about to learn for solving recursive binary tree problems are based on the values we pass when pushing new frames onto the call stack, and the values we return when popping frames from the call stack.