Skip to content

Instantly share code, notes, and snippets.

@alexdelorenzo
Last active November 20, 2017 09:44
Show Gist options
  • Save alexdelorenzo/280a0038922fbdd88a6a57e551eb54d1 to your computer and use it in GitHub Desktop.
Save alexdelorenzo/280a0038922fbdd88a6a57e551eb54d1 to your computer and use it in GitHub Desktop.

Memoization

In this article, I'm going to introduce you to an eloquent way to speed up your Python code called memoization.

Memoization finds its root word in memorandum, which means "to be remembered" and is often shortened to memo in English.

When I talk about memoization and Python, I am talking about remembering a function’s output based on its inputs.

In the next few sections, I’ll show you when and how to wield this simple but powerful concept with Python.

Why would we want to use memoization? The answer is expensive code.

When I am analyzing code, I look at it in terms of how long it takes to run and how much memory it uses.

If I’m looking at code that takes a long time to run or uses a lot of memory, I call the code expensive. It’s expensive code because it costs a lot of resources, space and time, to run.

When you run expensive code, it takes resources away from other programs on your machine.

Python is my goto choice to build web applications, scrape the web and interact with databases; all of which involve expensive operations.

When we interact with the network or a database, a long time can pass until the interaction completes. Our application can block, or pause, during that time.

If you want to speed up your Python application, memoization can be a great technique to use.

Let’s take a deeper look at memoization before we get our hands dirty and implement it ourselves!

What is memoization, exactly?

Memoization is a specific type of caching that is used as a software optimization technique.

A cache stores the results of an operation for later use. Your web browser most likely used a cache to load this page faster.

Memoization allows you to optimize a function by caching its output based on the parameters you supply to it. Once you memoize a function, it will only compute its output once for each set of parameters you call it with. Every call after the first will be quickly retrieved from a cache.

Memoization in Python

One of the things I love the most about Python is that the simplicity and beauty of its syntax goes hand in hand with beauty and simplicity of its philosophy.

Python is “batteries included”, which means that Python is bundled with loads of commonly used software which are only an import statement away!

I find functools.lru_cache() to be one of the most useful tools included with Python. This is the language’s easy to use memoization implementation.

Once you recognize when to use lru_cache(), you can quickly speed up your application with just a few lines of code.

Let’s cut our teeth on our own version of a memoization function in Python. I promise that you will have fun and come out with better understanding of memoization!

In [141]: def memoize(func):                                  	         
		 cache = dict()                                               

		 @functools.wraps(func)  # preserve information about ‘func()’
		 def memoized_func(*args):     
		     if args in cache:
			 return cache[args]

		     result = func(*args)
		     cache[args] = result

		     return result                  	

		 return memoized_func  

memoize() is a decorator, which is a function that takes another function as an input and has a function as its output.

Here we use a Python dictionary as a cache. In Python, using a key to look-up a value in a dictionary is very inexpensive.

In the new memoized function, memoized_func(), we check if the parameters are already in the cache. If they are, then the cached result is returned. Instead of computing the result, we quickly return it from the cache. Bam, memoization.

If the result isn’t in the cache, we compute it, store it in the cache and then return the result.

Let’s test this out on a recursive Fibonacci sequence function.

In [172]: def fibonacci(n):
		 if n == 0:
		     return 0

		 elif n == 1:
		     return 1

		 return fibonacci(n-1) + fibonacci(n-2)


In [173]: %timeit fibonacci(35)
6.2 s ± 0 ns per loop

On my machine, it takes six seconds to compute the 35th number in the Fibonacci sequence. That’s a pretty slow and expensive operation.

Let’s memoize it with our memoization function.

   In [174]: memoized_fibonacci = memoize(fibonacci)
   In [175]: %timeit memoized_fibonacci(35)
   6.12 s ± 0 ns per loop

The memoized function still takes six seconds to return on the first run.

Let’s run it a second time.

In [176]: %timeit memoized_fibonacci(35)
2.08 µs ± 0 ns per loop

Only 2 microseconds!

Now that we’ve seen how easy it is to implement a memoization function ourselves, I’ll show you how easy it is to use Python’s functools.lru_cache().

In [183]: from functools import lru_cache
In [184]: @lru_cache(maxsize=None)
	     def fibonacci(n):
		 if n == 0:
		     return 0

		 elif n == 1:
		     return 1

		 return fibonacci(n-1) + fibonacci(n-2)
In [185]: %timeit fibonacci(35)
72.9 µs ± 0 ns per loop

Python’s implementation is much more complex than our ad hoc memoize function, as you can see in the source.

In [199]: fibonacci.cache_info()
Out[199]: CacheInfo(hits=34, misses=36, maxsize=None, currsize=36)

When using Python's decorator syntax, recursive calls to fibonacci() are memoized. When we look at the cache information for the memoized function, we can see why it is faster than our version on the first run: the cache is hit 34 times.

You should never need to roll your own memoizing function, because Python’s lru_cache() is readily-available, more comprehensive and battle-tested.

What can be memoized?

Ideally, you will want to memoize functions that are deterministic.

 In [114]: def deterministic_adder(x: int, y: int) -> int:
		 return x + y

Here deterministic_adder() is a deterministic function because it will always return the same result for the same pair of parameters. For example, if you pass 2 and 3 into the function, it will always return 5.

In [129]:  from datetime import datetime

           def nondeterministic_adder(x: int, y: int) -> int:
		 # Check to see if today is Monday (weekday 0)
		 if datetime.now().weekday() == 0:
		      return x + y + x

		 return x + y

This function is nondeterministic because its output for a given input will vary depending on the day of the week.
If we run this function on Monday, the cache will return stale data any other day of the week.

I find that any function that updates a record or grabs information that changes over time is a poor choice to memoize.

Memoization: Quick Summary

  • Memoization is a software optimization technique that stores and return the result of a function call based on its parameters.
  • If your code meets a certain criteria, memoization can be a great method to speed up your application.
  • You can import a comprehensive memoization function, lru_cache(), from Python’s standard library in functools.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment