From Fibonacci to Bitstrings to Max Independent Set

Let’s face it: DP is just hard, much harder than data structures or divide-n-conquer. But it’s also one of the most important and most clever inventions in the history of humanity.

While most textbooks start the DP chapter with classical examples such as LIS/LCS and scheduling, I find those a bit too complex for beginners. Instead, I use a much simpler example to introduce DP to first-time learners, and hopefully you’ll find this introduction gentler and friendlier.

Naive Fibonacci is Exponential Time

The simplest example I can find for DP is the Fibonacci series:

\[ f(n) = f(n-1) + f(n-2)\]

Wait, why is that relevant to DP? Well, because if you implement it as is (i.e., recursively):

def fib(n):
    return 1 if n <= 2 else fib(n-1) + fib(n-2)

It will be much too slow for even very small numbers like n=50 (which takes >1h on my Mac). In fact, you can see that the time to compute fib(n) grows so fast that it looks like exponential growth:

>>> import time
>>> for i in range(10,41,5):
...     t = time.time()
...     print(i, fib(i), "%.4f" % (time.time() - t))

10 55 0.0000
15 610 0.0001
20 6765 0.0011
25 75025 0.0129
30 832040 0.1414
35 9227465 1.4998
40 102334155 16.7113

You can see that the runtime grows by a factor of \(10\) or more each time \(n\) increases by 5. This is definitely (at least) exponential, i.e., \(O(a^n)\) for some constant \(a\), rather than polynomial, \(O(n^a)\). The distinction between polynomial and exponential is indeed the most important division among complexity classes, and it is at the very heart of the central problem in computer science: P vs. NP.

The reason why the above naive version is so ridiculously slow is repeated calculation. To do fib(5), you first call fib(4), which in turn solves fib(3) in the process. But then you need to call fib(3) again to combine with fib(4)’s result to form fib(5). In fact, everything except the leftmost branch of the recursion tree is repeated.

The recursion tree is not a full perfect tree: the leftmost branch (with stepsize 1) has height \(n\) but the rightmost branch (with stepsize 2) has height \(n/2\). If you work out the math behind Fibonacci series, you will know the size of the tree is \(\sim 1.618^n\) which is related to the golden ratio, i.e., the time complexity is \(O(a^n)\) where \(a=\frac{\sqrt{5}+1}{2}\). See more details here: Fibonacci series approximates the golden ratio.
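To make the repeated calculation concrete, here is a minimal sketch that instruments the naive recursion with a call counter. The names `fib_counted` and `calls` are made up for this illustration; the recursion itself is the same as `fib` above.

```python
from collections import Counter

calls = Counter()  # calls[k] = number of times fib_counted(k) was invoked

def fib_counted(n):
    """Naive recursive Fibonacci that also records every call."""
    calls[n] += 1
    return 1 if n <= 2 else fib_counted(n - 1) + fib_counted(n - 2)

result = fib_counted(20)
total_calls = sum(calls.values())

print(result, total_calls)                # value and size of the recursion tree
print(calls[20], calls[19], calls[18])    # how often the top subproblems recur
```

With these base cases the tree has exactly \(2f(n)-1\) nodes, so the number of calls grows at the same exponential rate as the Fibonacci numbers themselves.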

In fact, I plotted the runtime up to \(n=47\) below, which fits the curve \(y=c\cdot a^n\) very well for \(a=1.618\).
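As a rough sanity check on that fit, the sketch below compares the slowdown predicted by \(a=1.618\) for a step of 5 in \(n\) against the ratios between consecutive rows of the timing table above. The dictionary simply copies the printed timings; nothing here is newly measured.

```python
# Runtimes (seconds) copied from the timing table above, keyed by n.
runtime = {25: 0.0129, 30: 0.1414, 35: 1.4998, 40: 16.7113}

phi = (5 ** 0.5 + 1) / 2    # golden ratio, about 1.618
predicted = phi ** 5        # predicted slowdown when n grows by 5

ns = sorted(runtime)
observed = [runtime[b] / runtime[a] for a, b in zip(ns, ns[1:])]

print("predicted: %.2f" % predicted)
print("observed:", ["%.2f" % r for r in observed])
```

The predicted factor is about 11, and the observed ratios all land close to it, which is consistent with the "\(\times10\) or more" pattern noted earlier.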

Memoized Recursion vs. Bottom-Up

But we all know that Fibonacci can be computed in linear time (modulo high-precision arithmetic), for example by a simple for loop (\(O(1)\) space):

def fib0(n):
  a, b = 1, 1
  for i in range(3, n+1):
    a, b = a+b, a
  return a

Note that the two assignments (=) above are simultaneous assignments (a nice feature of Python not found in C/C++/Java which would need auxiliary variables).

Or using a list to save all \(f(n)\)’s (\(O(n)\) space):

def fib1(n):
  f = [1, 1]
  for i in range(3, n+1):
    f.append(f[-1]+f[-2])
  return f[-1]

Both are bottom-up (from smaller \(n\) to larger \(n\)) and \(O(n)\) time. But the recursive version, though slow, does have its own merit: being top-down, it is identical to the original (recursive) definition, and is thus more “intuitive” from a mathematical point of view. How can we combine the merits of both approaches, i.e., a recursive function that runs in \(O(n)\) time?

The answer is memoization (note: not “memorization”), which means remembering the subproblems you have already solved so that you never solve the same subproblem twice. It needs a table that supports lookup, e.g., a hash table (Python dict), which is why it is sometimes also called tabulation, although that term is more commonly used for the bottom-up style.

fibs={1:1, 2:1} # hash table (dict)
def fib2(n):
  if n not in fibs:
    fibs[n] = fib2(n-1) + fib2(n-2)
  return fibs[n]

These three versions fib0, fib1, and fib2 are all \(O(n)\) time, if we ignore the cost of high-precision arithmetic.
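If you would rather not manage the table by hand, Python's standard library can do the memoization for you. The following is a minimal sketch using `functools.lru_cache`; the name `fib_cached` is made up for this illustration, and the function body is the naive recursive definition, unchanged.

```python
from functools import lru_cache

@lru_cache(maxsize=None)    # cache every result and never evict
def fib_cached(n):
    """Same recursive definition as fib, but each n is computed only once."""
    return 1 if n <= 2 else fib_cached(n - 1) + fib_cached(n - 2)

print(fib_cached(50))
```

Like any top-down version, this is still limited by Python's recursion depth for very large \(n\), which the bottom-up loops avoid.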

Number of Bitstrings

Now let’s look at our first non-Fibonacci example, number of bitstrings, and you will see that, although it does not look like Fibonacci at all on the surface, it can actually be reduced to Fibonacci.

Count \(g(n)\), the number of \(n\)-bit strings that do not contain "00" as a substring.

For example, for \(n=1\), both "0" and "1" are valid, so \(g(1)=2\), and for \(n=2\), among all 4 strings, 3 are valid (only "00" is not), so \(g(2)=3\).

What about \(g(0)\)? 0 or 1? It should be \(g(0)=1\) because the empty string "" is still valid!

How should we solve this problem? Still divide-n-conquer: we do a case analysis on the last bit (the \(n\)th bit), being either 1 or 0.

<==g(n-1)==>1  # last bit is 1
<==g(n-2)=>10  # last bit is 0, 2nd-last bit must be 1

If the last bit is 1, then for the preceding \(n-1\) bits, every valid \((n-1)\)-bit string (there are \(g(n-1)\) of them) plus the last 1 bit makes a valid \(n\)-bit string;
But if the last bit is 0, then we need to be a bit more careful, because the second-last bit must be 1; otherwise we would have 00. Now for the remaining \(n-2\) bits, every valid \((n-2)\)-bit string (there are \(g(n-2)\) of them) plus the last two bits (10) makes a valid \(n\)-bit string.
Combining the two cases, we have \[ g(n) = g(n-1) + g(n-2). \]

Isn’t this exactly the same as Fibonacci?!

Well, don’t forget the base cases:

\[ g(0) = 1, g(1) = 2 \]

So this \(g(n)\) series is Fibonacci shifted by two steps, i.e., \(g(n)=f(n+2)\), given \(f(1)=f(2)=1\) as in the code above: \(g(0)=f(2)=1\) and \(g(1)=f(3)=2\).
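It is easy to double-check the recurrence and its base cases against brute force for small \(n\). The sketch below enumerates all \(2^n\) bitstrings and compares the count with the recurrence; the function names `g_brute` and `g_dp` are made up for this illustration.

```python
from itertools import product

def g_brute(n):
    """Count n-bit strings without "00" by enumerating all 2**n of them."""
    return sum("00" not in "".join(bits) for bits in product("01", repeat=n))

def g_dp(n):
    """Same count via g(n) = g(n-1) + g(n-2), with g(0) = 1 and g(1) = 2."""
    prev, cur = 1, 2              # g(0), g(1)
    for _ in range(n):
        prev, cur = cur, prev + cur
    return prev

for n in range(8):
    print(n, g_brute(n), g_dp(n))
```

The brute-force version is exponential, so it is only useful as a check, but the two columns should agree on every row.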

Maximum Weighted Independent Set on Linear Chain

Now let’s look at a more “real” example of DP, but in the end we’ll still reduce it to Fibonacci.

Given \(n\) numbers \(a_1, \ldots, a_n\), find a subset whose sum is the largest, with the constraint that no two consecutive numbers are chosen (i.e., if \(a_i\) is chosen, then neither \(a_{i-1}\) nor \(a_{i+1}\) can be chosen).

For example, given

\[a=[9, 10, 8, 5, 2, 4]\]

the best solution is \([9, 8, 4] \rightarrow 21\).

You might come up with a very simple greedy solution: always take the largest available number (let’s say \(a_i\)), cross out its two neighbors (\(a_{i-1}\) and \(a_{i+1}\)), and repeat until no numbers are left. This is suboptimal: for the above array, you’ll take \([10, 5, 4] \rightarrow 19\).
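Here is one way the greedy idea could be written down, as a sketch only. The function name `greedy_mis` is made up for this illustration, ties are broken by taking the leftmost largest number, and it assumes non-negative inputs, since it keeps taking numbers until none are left.

```python
def greedy_mis(a):
    """Greedy heuristic: repeatedly take the largest available number
    and cross out its neighbors. Not optimal in general."""
    available = [True] * len(a)
    chosen = []
    while any(available):
        # index of the largest number that is still available
        i = max((j for j in range(len(a)) if available[j]), key=lambda j: a[j])
        chosen.append(i)
        for j in (i - 1, i, i + 1):   # cross out a_i and its two neighbors
            if 0 <= j < len(a):
                available[j] = False
    picked = [a[i] for i in sorted(chosen)]
    return sum(picked), picked

print(greedy_mis([9, 10, 8, 5, 2, 4]))   # (19, [10, 5, 4])
```

Grabbing 10 first blocks both 9 and 8, which together are worth more, and that is exactly the kind of mistake DP avoids.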

So how to solve it by DP?

Hint: Does the constraint “no two consecutive numbers” remind you of something similar in the number of bitstrings? Yes, you’ll see that it is the same as “no 00 as substring” with each 0 meaning “take this number”.

So we first define the subproblem:

Let \(f[i]\) be the best MIS value for the first \(i\) numbers, \(a_1, \ldots, a_i\).

Then like in bitstrings, we do a case analysis on the last number, \(a_i\):

If we don’t take \(a_i\), then we can use the best solution for the first \(i-1\) numbers, i.e., \(f[i-1]\);
If we decide to take \(a_i\), then we have to skip \(a_{i-1}\), and can use the best solution for the first \(i-2\) numbers, i.e., \(f[i-2]\), plus the value of \(a_i\).

So:

\[f[i] = \max\{f[i-1],\quad f[i-2] + a_i\}\]

This is almost identical to the bitstrings problem, except we use \(\max\) instead of \(+\) between the two cases.

What about the base cases?

Well, \(f[0]=0\), but is \(f[1]=a_1\)? No, because each \(a_i\) might be negative. How about \(f[1] = \max\{a_1, 0\}\)? That’s correct, but too complicated. It’s better this way:

\[ f[0] = 0,\quad f[-1] = 0\]

Here is an example:

\(i\)	-1	0	1	2	3	4	5	6
\(a_i\)			9	10	8	5	2	4
\(f[i]\)	0	0	9	10	17	17	19	21
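The recurrence and the two base cases translate almost directly into a bottom-up loop. This is a minimal sketch that returns only the best value; the name `max_wis_value` is made up for this illustration, and since only the last two table entries are ever needed, it keeps two variables instead of the whole \(f\) table.

```python
def max_wis_value(a):
    """Best MIS value for the list a (0-indexed here, 1-indexed in the text)."""
    prev2, prev1 = 0, 0             # f[-1] and f[0]
    for x in a:                     # x plays the role of a_i for i = 1..n
        # new f[i] = max(f[i-1], f[i-2] + a_i)
        prev2, prev1 = prev1, max(prev1, prev2 + x)
    return prev1

print(max_wis_value([9, 10, 8, 5, 2, 4]))   # 21
```

Because both base cases are 0, negative numbers need no special handling: an all-negative input simply yields 0, the value of the empty subset.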

So we’ve got the correct answer of \(21\). However, this is only half of the problem. In optimization problems like this, we also need to return the optimal solution \([9, 8, 4]\) in addition to the best value of \(21\).

Backtracking for the Best Solution

How would you do this? Think again about divide-n-conquer. You now know the best value of the global problem (\(a_1, \ldots, a_n\)), so you should also backtrack from the global problem. But unlike divide-n-conquer, now we need each subproblem (including the global one) to tell us how to best divide that subproblem, because there are multiple ways of division (in our case, either dividing into \(f[i-2]\) and \(a_i\) or dividing into \(f[i-1]\)). This is the crucial difference between DP and divide-n-conquer:

DP is divide-n-conquer with multiple ways of division.

To remember the best division for each subproblem, we need another table \(b[i]\), to store the backpointers, which record for each \(i\), where or how the best value of \(f[i]\) is obtained. In our case, \(f[i]\) involves a choice between two cases, so \(b[i]\) only needs to be a boolean like this:

\[ b[i] = (f[i] \neq f[i-1])\]

Therefore, \(b[i]=T\) means the best solution of \(f[i]\) is to take \(a_i\), i.e., \(f[i]=f[i-2]+a_i\), and \(b[i]=F\) means the best solution of \(f[i]\) is not to take \(a_i\), i.e., \(f[i]=f[i-1]\). With this backpointers table, we can backtrack from the global problem \(f[n]\) backwards to base cases. This process is just like doing top-down recursion again, but this time each subproblem has a deterministic divide, just like normal divide-n-conquer:

start with \(i=n\) (global)
if \(b[i]=T\), it means \(f[i]=f[i-2]+a_i\), so take \(a_i\), and backtrack to \(i-2\).
if \(b[i]=F\), it means \(f[i]=f[i-1]\), so don’t take \(a_i\), and backtrack to \(i-1\).
until reaching a base case (\(i<1\)).

Here is the complete table for the running example. Note that \(f[i]\) and \(b[i]\) are computed left-to-right (forward pass) while the last row is computed right-to-left (backward pass).

\(i\)	-1	0	1	2	3	4	5	6
\(a_i\)			9	10	8	5	2	4
\(f[i]\)	0	0	9	10	17	17	19	21
\(b[i]\)	0	0	T	T	T	F	T	T
backtrack	base		take		take	not		take
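The forward and backward passes can be sketched together as follows. The function name `max_wis` and the index shift are choices made for this illustration: Python lists cannot be indexed at \(-1\) the way the table is, so the code stores the text's \(f[i]\) and \(b[i]\) at position `i + 1`.

```python
def max_wis(a):
    """Return (best value, chosen numbers) for the list a."""
    n = len(a)
    f = [0] * (n + 2)               # f[i + 1] holds the text's f[i]; f[-1] = f[0] = 0
    b = [False] * (n + 2)           # b[i + 1] holds the text's b[i]
    for i in range(1, n + 1):       # forward pass, i = 1..n as in the text
        skip = f[i]                 # text's f[i-1]
        take = f[i - 1] + a[i - 1]  # text's f[i-2] + a_i
        f[i + 1] = max(skip, take)
        b[i + 1] = f[i + 1] != skip
    chosen = []
    i = n                           # backward pass, starting from the global problem
    while i >= 1:
        if b[i + 1]:                # take a_i, then backtrack to i-2
            chosen.append(a[i - 1])
            i -= 2
        else:                       # don't take a_i, backtrack to i-1
            i -= 1
    chosen.reverse()                # collected right-to-left, report left-to-right
    return f[n + 1], chosen

print(max_wis([9, 10, 8, 5, 2, 4]))   # (21, [9, 8, 4])
```

Note that the backward pass only reads \(b\): once the backpointers are stored, recovering the solution takes \(O(n)\) time and involves no further comparisons of \(f\) values.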

It is crucial that you understand the following points:

\(b[i]=T\) does not mean \(a_i\) is included in the global best solution; it only means \(a_i\) is included in the local best solution (for subproblem \(f[i]\)). For example, \(b[2]=T\) but \(a_2=10\) is not included in the global solution because it’s bypassed in the backtracking.
\(b[i]=F\) means \(a_i\) is not included in the local best solution, but it is also guaranteed that \(a_i\) is not included in the global best solution, because even if the global best solution happens to visit \(f[i]\), we still won’t take \(a_i\) (see \(a_4=5\) in this example).
Whether \(a_i\) is included in the global best solution cannot be decided in the forward pass; it can only be decided in the backward pass.

Graph Interpretations of MIS and Fibonacci

The DP graph for MIS, which is different from the input graph.

The DP graph for MIS with backpointers (best incoming edge for each node) marked.