Python Memoization

Improving the efficiency of Python Programs Using Memoization

Some of you may be familiar with the Fibonacci sequence, which is famous in both mathematics and computer programming. The sequence is formed by adding together each of the previous two terms to create the next term in the sequence. It has its origins in a the consideration of breeding rabbits, and turn out to have many truly interesting and surprising properties. this article is not specifically about the Fibonacci sequence, but I will add some links to interesting articles you can check out if you would like to learn more about it.

The Fibonacci Sequence begins as follows:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, …

The next number is found by adding up the two numbers before it in the sequence.

So, for example

the 2 is calculated by adding the values 1 and 1 proceeding it, the 3 is calculated by adding the values 1 and 3 proceeding it, and the 5 is calculated by adding the values 2 and 3 1 and 3 proceeding it. and so on!

Sometimes the sequence begins with different initial values, including commonly 1, 1 instead of 0, 1, but the pattern of how each term is constructed remains consistent.

Fibonacci Numbers with Python

The Fibonacci sequence is often used to illustrate the concept of recursion in programming, which is a very powerful technique with many applications. The basic idea is that we break a large problem down into smaller problems of the same type and solve those smaller problems as a means to solving the original problem.

In terms of computer code, it usually takes the form of a function that specifies base case, contains a call to itself (the function calls itself), and whose return values approach closer to the base case with each call.

So for example, the a function to return the nth Fibonacci Number in Python looks like this:

def fib(n):
    """
    Returns the n-th Fibonacci number.
    """
    if n == 0 or n == 1:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(10))

Improving Time Complexity of Python Programs Using Memoization

Now the above code is all well and good, except…. there is a wee problem.

Try this code, which times the execution of fib(n) for n = 20:

import time

def fib(n):
    """
    Returns the n-th Fibonacci number.
    """
    if n == 0 or n == 1:
        return n
    return fib(n - 1) + fib(n - 2)

n = 20
start = time.perf_counter()
fib(n)
end = time.perf_counter()
print("Seconds taken: {:.7f}".format(end - start))

This code times the execution of fib(n) for n = 20. On my machine I got Seconds taken: 0.0066919

That seems OK, but what about a higher value for n? Say the 40th Fibonacci number?

On my machine I got Seconds taken: 118.2504081

Clearly this is a problem, as we are only considering n = 40 and already the execution time is impractical.

Can you think why the algorithm as it stands takes so long to execute?

The answer lies in the fact that a lot of values are calculated multiple times. For example, using the the notation F1 for the first Fibonacci number, F2 for the second, etc, we can see that

in order to calculate F6, we calculate F4 and F5 and in order to calculate F5, we calculate F3 and F4 and in order to calculate F4, we calculate F2 and F3 etc.

With a relatively small value for n these extra calculations don’t matter much, but we don’t have to go far along the sequence to get to the point where the repeat calculations stack up so much that the execution time becomes impractical.

Can you think of an approach to deal with this problem?

Python Memoization using lru_cache

There is a way to dramatically reduce the execution time of out Fibonacci function but storing previous results. The fancy term for this is memoization. What it means in this case is that once, say, F3 has been calculated, the result is stored in a cache, so that next time the value of F3 is needed, instead of calculating it by adding F1 and F2, the program can access the result directly from the cache. And the same goes for all other previously encountered values of n.

Implementing this idea in Python is very easy if we make use of the tools available to us as you can see in the code below. It makes use of lru_cahce from the functools module. lru in this stands for Least Recently Used A Least Recently Used (LRU) Cache organizes items in order of use, allowing you to quickly identify which item hasn’t been used for the longest amount of time.

Python Memoization Code Listing Fibonacci

from functools import lru_cache
import time

@lru_cache
def fib(n):
    """
    Returns the n-th Fibonacci number.
    """
    if n == 0 or n == 1:
        return n
    return fib(n - 1) + fib(n - 2)

n = 40
start = time.perf_counter()
fib(n)
end = time.perf_counter()
print("Seconds taken: {:.7f}".format(end - start))

On my machine I got Seconds taken: 0.0000443. That is quite some improvement on 118.2504081 seconds for the same value of n!

Manual caching

Although using the “magic” @lru_cache decorator is fine for practical purposes, you may with to gain a deeper understanding of how memoization works by implementing the caching “manually”. For those who wish to do this, I have provided a code listing below showing one way it can be done, by passing acache dictionary along with each function call.

# With manual caching
def fib(n, cache={}):
    if n in cache:
        return cache[n]
    if n == 0 or n == 1:
        return n
    result = fib(n - 1) + fib(n - 2)
    cache[n] = result
    return result

print(fib(40))

There are a great many application for memoization in Python. Once you understand the concept and have some experience of using it, you will have a powerful tool in your programming toolkit to dramatically improve the performance of your code.

Tags: computer science python

Compucademy