How to use caching in Python

Kader Miyanyedi
Feb 28, 2023


Hi guys! We are always looking for ways to improve the performance of our apps. In this article, we will talk about how we can make our applications faster with caching in Python.

Throughout the article, we will talk about the following topics:

  1. What is the cache and when should we use it?
  2. Cache with Dictionary data structure
  3. Cache with built-in @lru_cache decorator
  4. Cache with built-in @cache decorator
  5. Cache with built-in @cached_property decorator

✨ What is the cache and when should we use it?

Let’s imagine a school system. Every time we click on a course, details about the course (lecture contents, exams, etc.) are pulled from the database, and this process takes time. Considering that thousands of students may be looking at the same course at the same time, we can predict that the load on the database will increase.
Since lesson details are not constantly changing data, it is unnecessary to fetch them from the database every time. So, can we reuse the same data after the first viewing of the lesson details?

Basically, caching refers to keeping data in a temporary storage area outside the source and reading it from there. This is faster than fetching the data from the source each time. (Otherwise, it wouldn’t make sense to use a cache.)

So, when should we use the cache? In general, caching is most effective when:

  • The data being cached is frequently accessed and does not change often.
  • Retrieving the data from the cache is much faster than retrieving it from the original source.
  • The cost of storing the data in the cache is lower than the cost of retrieving it from the original source.

❗️❗️It’s important to note that caching is not always the best solution, and in some cases, it may not be appropriate at all. For example, if the data being cached is constantly changing or is not accessed very frequently, it may not be worth the effort to implement caching. In a situation where data is constantly changing, the use of cache memory may cause incorrect data to be displayed to the user.

Now that we have learned the concept of cache, we can look at how to use caching in Python.

✨ Cache with Dictionary data structure

There are multiple methods for caching in Python. You can create your own cache using the dictionary data structure. Reading data from a dictionary is very fast, with O(1) average time complexity.

Let’s write a Fibonacci code and look at the cache system we have written using the dictionary data structure.

def fibonacci(n):
    if n < 2:
        return n
    result = fibonacci(n - 1) + fibonacci(n - 2)
    return result

fib_result = fibonacci(25)
print(f"Result: {fib_result}")

When we ran the code without a cache, it took about 12 milliseconds. Now let’s see the result with the cache system:

cache = {0: 0, 1: 1}

def fibonacci(n):
    if n in cache:
        return cache[n]
    result = fibonacci(n - 1) + fibonacci(n - 2)
    cache[n] = result
    return result

fib_result = fibonacci(25)
print(f"Result: {fib_result}")

The code execution was completed in 0.007 milliseconds. We can definitely say that it is faster.
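If you need this pattern for more than one function, the dictionary approach can be wrapped into a reusable decorator. Here is a minimal sketch (the memoize name is just an illustration; this is essentially what the built-in decorators below do for us):

from functools import wraps

def memoize(func):
    cache = {}

    @wraps(func)
    def wrapper(*args):
        if args not in cache:  # compute only on a cache miss
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memoize
def fibonacci(n):
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)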

✨ Cache with built-in @lru_cache decorator

The lru_cache decorator is a built-in decorator from the functools module, available since Python 3.2, that uses the memoization technique. It avoids re-executing the function for the same inputs.

✏️ ️Memoization is a caching technique: once the function runs, it keeps the result in memory. Instead of running the function again for the same inputs, it fetches the result from memory, which improves performance.

@lru_cache(maxsize=<max_size>, typed=True/False)

The lru_cache decorator has two optional parameters:

  • maxsize: The maximum number of results to keep in the cache. You can also set the parameter to None; in that case, the cache keeps all values and grows indefinitely, which can cause problems if many values are cached.
  • typed: Specifies whether to keep separate cache entries for arguments of different types. If set to True, arguments of different types are cached separately; for example, f(3) and f(3.0) are treated as distinct calls, as the sketch below shows. (Some types, such as str and int, may be cached separately even when typed is False.)
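Here is a quick way to observe the typed behavior (the double function is just an illustration):

from functools import lru_cache

@lru_cache(maxsize=None, typed=True)
def double(x):
    return x * 2

double(3)    # cached under the int key 3
double(3.0)  # cached separately under the float key 3.0
print(double.cache_info().currsize)  # 2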

Let’s rewrite the Fibonacci example with the lru_cache decorator and see the result.

from functools import lru_cache

@lru_cache(maxsize=32)
def fibonacci(n):
    if n < 2:
        return n
    result = fibonacci(n - 1) + fibonacci(n - 2)
    return result

fib_result = fibonacci(25)
print(f"Result: {fib_result}")

The code execution was completed in 0.024 milliseconds. It is faster than the non-cache Fibonacci example.

✏️✏️ Notes on the @lru_cache decorator

  • Should be used when we want to reuse a calculated result for the same inputs.
  • Should not be used in functions that create different objects on each call.
  • Should not be used with impure functions like time() or random(), since they need to return a different result on each call.
  • Only works within a single Python process.
  • Especially useful in recursive functions or CPU-bound operations.
  • Calling the function with different orderings of keyword arguments can result in extra cache entries. For example, f(a=1, b=2) and f(b=2, a=1) can have two separate cache entries (you can verify this with cache_info(), shown below).
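You can inspect and manage the cache through the cache_info() and cache_clear() methods that lru_cache attaches to the decorated function. For example, after the fibonacci(25) call above:

print(fibonacci.cache_info())
# e.g. CacheInfo(hits=23, misses=26, maxsize=32, currsize=26)
fibonacci.cache_clear()  # empties the cache manually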

❗❗ Be careful when using @lru_cache in instance methods

We can also use the lru_cache decorator on an instance method. But the cache then holds a reference to the class instance until the program’s life cycle ends, taking up memory space. These instances, which are never deleted from memory, can cause memory leaks. Let’s examine this with an example.

import functools
import time

class Example:
    def __init__(self, number):
        self.number = number

    @functools.lru_cache(maxsize=None)
    def sum_of_squares(self):
        return sum([num**2 for num in range(self.number + 1)])

start = time.time()
example = Example(1000)
result = example.sum_of_squares()
end = time.time()
print(f"Result: {result} Time: {(end - start) * 1000} milliseconds")

start = time.time()
result = example.sum_of_squares()  # second call: served from the cache
end = time.time()
print(f"Result: {result} Time: {(end - start) * 1000} milliseconds")

When we run the program, we can see that the lru_cache decorator works as expected. However, when we examine the program in a Python interactive shell, we can see that even if the class object is deleted, a reference to it is still kept in the cache.

When we inspect the method with cache_info(), we can see that the cache holds a reference to the instance until the cache is cleared. Since we set maxsize=None, this cache would be kept forever. In that case, memory starts to fill up and a memory leak becomes inevitable.
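A quick way to see the leak in an interactive session (a sketch, assuming the Example class above):

import gc

example = Example(1000)
example.sum_of_squares()
del example  # "delete" the instance

# The instance is still alive: the cache of Example.sum_of_squares holds a reference to it.
leaked = [obj for obj in gc.get_objects() if isinstance(obj, Example)]
print(leaked)  # [<__main__.Example object at 0x...>]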

Well, how can we solve this problem?
To solve it, we need to make the cache local to the instance. This is basically an assignment in __init__: we wrap the uncached method with lru_cache and store the wrapper on the instance itself. Because the cache now belongs to the instance rather than to the class, the reference to the cached instance is cleared together with the instance and does not take up unnecessary memory space.

import functools

class Example:
    def __init__(self, number):
        self.number = number
        # A per-instance cache: it is collected together with the instance.
        self.sum_of_squares = functools.lru_cache(maxsize=None)(self._sum_of_squares_uncached)

    def _sum_of_squares_uncached(self):
        return sum([num**2 for num in range(self.number + 1)])

With this version, we don’t need to clear the cache manually. The instance and its cache reference each other, forming a cyclic reference, so the memory is reclaimed by Python’s cyclic garbage collector; we can trigger it explicitly with gc.collect(). But you don’t need to call gc.collect() manually in real applications, because the garbage collector runs in the background without you having to do anything.
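We can verify this with a weak reference (a minimal sketch, assuming the per-instance-cache version of Example above):

import gc
import weakref

example = Example(1000)
example.sum_of_squares()
ref = weakref.ref(example)

del example
gc.collect()  # breaks the instance <-> cache reference cycle
print(ref())  # None: the instance and its cache have been freed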

Class methods (@classmethod) and static methods (@staticmethod) are not affected by this issue. In these methods, the cache is local to the class, not the instance. Therefore, you can use the lru_cache decorator directly as usual.

import functools

class Example:
    number = 1000  # a class-level attribute, so cls.number is defined

    @classmethod
    @functools.lru_cache(maxsize=None)
    def sum_of_squares(cls):
        return sum([num**2 for num in range(cls.number + 1)])

    @staticmethod
    @functools.lru_cache(maxsize=None)
    def foo(a, b):
        return a + b

✨ Cache with built-in @cache decorator

Another method for using a cache in Python is the @cache decorator, which is available as of Python 3.9.
It uses the memoization technique and returns the same value as @lru_cache(maxsize=None). This decorator is smaller and faster than an @lru_cache decorator with a maxsize limit, because it never needs to evict old values.

# Syntax
@cache
def x():
    pass

from functools import cache

@cache
def fibonacci(n):
    if n < 2:
        return n
    result = fibonacci(n - 1) + fibonacci(n - 2)
    return result

fib_result = fibonacci(25)
print(f"Result: {fib_result}")

When we executed our Fibonacci example, the code execution was completed in 0.029 milliseconds. And we can say that it is faster than the non-cache Fibonacci example.

✨ Cache with built-in @cached_property decorator

@cached_property is a decorator that has been in the Django Framework for many years and was added to Python with version 3.8 in October 2019.

The decorator transforms a class method into a property that is calculated only once and then cached as an instance attribute. It is useful for properties that are expensive to calculate, as it allows you to avoid recomputing the property’s value on each access.

Let’s write an Example class that takes a number when an object is created, with a method that computes the sum of the squares of the numbers up to that number.

class Example:
    def __init__(self, number):
        self.number = number

    def sum_of_squares(self):
        return sum([num**2 for num in range(self.number + 1)])

example = Example(1000)
example.sum_of_squares()
example.sum_of_squares()

print(f"Result: {example.sum_of_squares()}")

In this example, in which we called the class’s method 3 times, the code was executed in 0.17 milliseconds. Now let’s look at the result using the cached_property decorator.

from functools import cached_property

class Example:
    def __init__(self, number):
        self.number = number

    @cached_property
    def sum_of_squares(self):
        return sum([num**2 for num in range(self.number + 1)])

example = Example(1000)
example.sum_of_squares
example.sum_of_squares

print(f"Result: {example.sum_of_squares}")

The code was executed in 0.06 milliseconds, faster than without a cache. The cache will be cleared when the class instance is deleted.
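You can also invalidate the cached value manually by deleting the attribute; the next access recomputes it (a small sketch, assuming the Example class above):

example = Example(1000)
print(example.sum_of_squares)  # computed and cached
del example.sum_of_squares     # drops the cached value
print(example.sum_of_squares)  # recomputed on the next access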

✏️ The cached value is stored as an attribute of the instance, so it will be specific to each instance of the class. This means that if you have multiple instances of the class, each instance will have its own cached value for the property.
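For example, two instances keep independent cached values (a small sketch using the Example class above):

a = Example(10)
b = Example(100)
print(a.sum_of_squares)  # 385, computed and cached on a
print(b.sum_of_squares)  # 338350, computed and cached on b, independently of a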

You see that the sum_of_squares method is accessed like an attribute here, not called like a method. @cached_property is similar to the @property decorator with caching added.

✏️ The @cached_property and @property decorators also behave differently when writing. A @property works as a read-only attribute unless a setter is defined, so writing/modifying the attribute is disabled. @cached_property attributes are not read-only, so writing/modifying the attribute is allowed, as the sketch below shows.
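A small sketch of the difference (assuming the Example class above):

example = Example(1000)
example.sum_of_squares         # computed and cached
example.sum_of_squares = 42    # allowed: overrides the cached value
print(example.sum_of_squares)  # 42
# With @property (and no setter defined), the same assignment would raise AttributeError.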

The mechanics of @cached_property interfere with the key-sharing dictionaries optimization (PEP 412), so instance dictionaries can take up more space than usual. If space-efficient key sharing is desired, or a mutable instance dictionary is not available, an effect similar to cached_property() can be achieved by stacking @property on top of @cache:

from functools import cache

class Example:
    def __init__(self, number):
        self.number = number

    @property
    @cache
    def sum_of_squares(self):
        return sum([num**2 for num in range(self.number + 1)])

example = Example(1000)
example.sum_of_squares
example.sum_of_squares

print(f"Result: {example.sum_of_squares}")

Caching is an important optimization method that increases the speed of your applications when used correctly and in the right place. In this article, we learned how we can use cache with built-in methods in Python. I hope this has been an enjoyable and useful article. See you in the next post ^^

