Your function computes the same result for the same inputs over and over — a Fibonacci calculator recomputing fib(30) thousands of times, a database lookup called repeatedly with the same user ID, or an API call fetching the same configuration on every request. Recomputing the same result wastes time. Memoization is the technique of caching a function’s output so that repeated calls with the same arguments return the cached result instantly instead of recomputing.
Python’s functools module provides two decorators for memoization: @lru_cache (Least Recently Used cache, available since Python 3.2) and @cache (unbounded cache, added in Python 3.9). Both are zero-dependency standard library tools — no installation required. Add one decorator line above your function and Python handles all caching automatically.
In this tutorial you’ll learn how @lru_cache and @cache work, how to use them effectively, how to inspect cache statistics with cache_info(), how to clear the cache with cache_clear(), when to use each, and how to build a manual memoization decorator for cases requiring custom expiration or key logic. By the end you’ll be able to eliminate redundant computation from any Python function with a single line of code.
Python lru_cache: Quick Example
The classic demonstration is Fibonacci — without caching it’s exponentially slow, with caching it’s linear:
# lru_cache_quick.py
import time
from functools import lru_cache

# Without cache -- exponential time O(2^n)
def fib_slow(n: int) -> int:
    if n < 2:
        return n
    return fib_slow(n - 1) + fib_slow(n - 2)

# With lru_cache -- linear time O(n)
@lru_cache(maxsize=128)
def fib_cached(n: int) -> int:
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)

# Compare
n = 35
start = time.perf_counter()
result_slow = fib_slow(n)
slow_time = time.perf_counter() - start

start = time.perf_counter()
result_cached = fib_cached(n)
fast_time = time.perf_counter() - start

print(f"fib({n}) = {result_slow}")
print(f"Without cache: {slow_time:.3f}s")
print(f"With lru_cache: {fast_time:.6f}s")
print(f"Speedup: {slow_time / fast_time:.0f}x")
print(f"Cache stats: {fib_cached.cache_info()}")
Output:
fib(35) = 9227465
Without cache: 2.847s
With lru_cache: 0.000023s
Speedup: 123000x
Cache stats: CacheInfo(hits=33, misses=36, maxsize=128, currsize=36)
The cache stores results keyed by the function arguments. On the first call with argument n, the result is computed and stored (a miss). On subsequent calls with the same n, the stored result is returned instantly (a hit). For fib(35), there are only 36 unique subproblems, so after computing them once, all 33 repeated subproblem calls are cache hits.
What Is LRU Cache and How Does It Work?
An LRU (Least Recently Used) cache stores a fixed number of results. When the cache is full and a new result needs to be stored, the least recently used entry is evicted to make room. This ensures the cache stays within a bounded memory footprint while keeping the most recently accessed results available.
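Here's a minimal sketch of that eviction behavior, using a deliberately tiny cache (the square function and maxsize=2 are just for illustration):

from functools import lru_cache

@lru_cache(maxsize=2)  # deliberately tiny cache to force eviction
def square(n: int) -> int:
    print(f"  computing square({n})")
    return n * n

square(1)   # miss -- cached
square(2)   # miss -- cached (cache is now full)
square(1)   # hit -- 1 becomes the most recently used entry
square(3)   # miss -- evicts 2, the least recently used entry
square(2)   # miss again -- 2 was evicted, so it must be recomputed
print(square.cache_info())  # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)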
| Decorator | Cache behavior | Python Version | Use When |
|---|---|---|---|
| @lru_cache(maxsize=128) | Bounded (LRU eviction) | 3.2+ | Memory-constrained, large input spaces |
| @lru_cache(maxsize=None) | Unbounded | 3.2+ | All inputs will be cached forever |
| @cache | Unbounded (alias) | 3.9+ | Same as maxsize=None, cleaner syntax |
| @cached_property | Per-instance | 3.8+ | Lazy-evaluated class properties |
The cache key is built from all positional and keyword arguments, so func(1, 2) and func(2, 1) are different cache entries. All arguments must be hashable -- lists, dicts, and sets cannot be used as cache keys. If you need to cache a function that takes unhashable arguments, you'll need a custom memoization approach (shown later in this tutorial).

Using @cache and cache_info()
@functools.cache is a simpler alias for @lru_cache(maxsize=None) added in Python 3.9. It's the right choice when you want to cache all results without any eviction. Use it with cache_info() to monitor hit rates and cache_clear() to invalidate the cache when underlying data changes.
# cache_usage.py
import functools
import time

# Simulate an expensive database lookup
FAKE_DB = {1: "Alice", 2: "Bob", 3: "Charlie", 4: "Diana"}
lookup_count = 0

@functools.cache
def get_user(user_id: int) -> str | None:
    global lookup_count
    lookup_count += 1
    time.sleep(0.05)  # simulate DB latency
    return FAKE_DB.get(user_id)

# First round -- unique IDs miss, repeated IDs are served from cache
print("=== Round 1 (cold cache) ===")
for uid in [1, 2, 3, 1, 2, 1]:  # 1 and 2 repeated
    user = get_user(uid)
    print(f" User {uid}: {user}")
print(f"Actual DB lookups: {lookup_count}")
print(f"Cache info: {get_user.cache_info()}")

# Second round -- all hits (cache persists)
print("\n=== Round 2 (warm cache) ===")
lookups_before = lookup_count
start = time.perf_counter()
for uid in [1, 2, 3]:
    get_user(uid)
elapsed = time.perf_counter() - start
print(f"New DB lookups: {lookup_count - lookups_before}")
print(f"Time (all cached): {elapsed:.4f}s")
print(f"Cache info: {get_user.cache_info()}")

# Clear cache when data changes
print("\n=== After cache_clear() ===")
get_user.cache_clear()
print(f"Cache info: {get_user.cache_info()}")
_ = get_user(1)  # must re-fetch
print(f"DB lookups after clear: {lookup_count}")
Output:
=== Round 1 (cold cache) ===
User 1: Alice
User 2: Bob
User 3: Charlie
User 1: Alice
User 2: Bob
User 1: Alice
Actual DB lookups: 3
Cache info: CacheInfo(hits=3, misses=3, maxsize=None, currsize=3)
=== Round 2 (warm cache) ===
New DB lookups: 0
Time (all cached): 0.0001s
Cache info: CacheInfo(hits=6, misses=3, maxsize=None, currsize=3)
=== After cache_clear() ===
Cache info: CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
DB lookups after clear: 4
Notice that in Round 1, six calls only hit the database three times -- the repeated calls for users 1 and 2 were served from cache. The cache_info() method returns a named tuple with hits, misses, maxsize, and currsize. As a rule of thumb, a warm cache should be hitting 70%+ of the time; if you're seeing mostly misses, either the cache is too small or the input space is too varied for caching to help.
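If you want that hit rate as a single number, a tiny helper (our own convenience function, not part of functools) can derive it from cache_info():

def hit_rate(cached_func) -> float:
    """Return the cache hit rate as a percentage (0.0 if never called)."""
    info = cached_func.cache_info()
    total = info.hits + info.misses
    return (info.hits / total * 100) if total else 0.0

print(f"get_user hit rate: {hit_rate(get_user):.1f}%")  # e.g. 66.7% after Round 2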
Choosing maxsize with lru_cache
The maxsize parameter controls how many distinct results the cache holds. When the cache is full and a new result arrives, the least recently used entry is evicted. Setting maxsize correctly is a trade-off between memory usage and cache effectiveness.
# lru_maxsize.py
from functools import lru_cache
import random

# Simulate a function with variable input distribution
@lru_cache(maxsize=10)
def process_item(item_id: int) -> str:
    return f"processed-{item_id}"

# Test with a skewed distribution (80% of calls use top 10 items)
random.seed(42)
for _ in range(1000):
    if random.random() < 0.8:
        item_id = random.randint(1, 10)    # hot items
    else:
        item_id = random.randint(11, 100)  # cold items
    process_item(item_id)

info = process_item.cache_info()
hit_rate = info.hits / (info.hits + info.misses) * 100
print(f"maxsize=10, hit rate: {hit_rate:.1f}%")
print(f" hits={info.hits}, misses={info.misses}, currsize={info.currsize}")

# Compare with maxsize=50
process_item.cache_clear()

@lru_cache(maxsize=50)
def process_item_large(item_id: int) -> str:
    return f"processed-{item_id}"

random.seed(42)
for _ in range(1000):
    if random.random() < 0.8:
        item_id = random.randint(1, 10)
    else:
        item_id = random.randint(11, 100)
    process_item_large(item_id)

info2 = process_item_large.cache_info()
hit_rate2 = info2.hits / (info2.hits + info2.misses) * 100
print(f"\nmaxsize=50, hit rate: {hit_rate2:.1f}%")
print(f" hits={info2.hits}, misses={info2.misses}, currsize={info2.currsize}")

# Powers of 2 are most efficient for maxsize
print("\nNote: set maxsize to a power of 2 (32, 64, 128, 256) for best performance")
Output:
maxsize=10, hit rate: 88.6%
hits=875, misses=125, currsize=10
maxsize=50, hit rate: 91.2%
hits=895, misses=105, currsize=50
Note: set maxsize to a power of 2 (32, 64, 128, 256) for best performance
With a skewed distribution (most calls go to a small set of "hot" items), even a small cache of 10 entries achieves an ~89% hit rate. Setting maxsize to a power of 2 is a micro-optimization: the cache's internal storage is a dict plus a linked list for LRU ordering, and the functools documentation notes that it performs best when maxsize is a power of two. For most applications, use maxsize=128 (the default) as a starting point and adjust based on cache_info() data.

cached_property for Class Attributes
functools.cached_property is designed for class attributes that are expensive to compute but don't change after the first access. Unlike @property, which recomputes on every access, @cached_property computes once and stores the result on the instance.
# cached_property_demo.py
import functools
import math
import time

class Circle:
    def __init__(self, radius: float):
        self.radius = radius

    @functools.cached_property
    def area(self) -> float:
        """Computed once, cached forever on this instance."""
        print(f" Computing area for radius={self.radius}...")
        time.sleep(0.01)  # simulate expensive computation
        return math.pi * self.radius ** 2

    @functools.cached_property
    def circumference(self) -> float:
        print(f" Computing circumference for radius={self.radius}...")
        return 2 * math.pi * self.radius

c = Circle(5.0)
print("First access:")
print(f" area = {c.area:.4f}")
print(f" area (again) = {c.area:.4f}")  # no recomputation
print(f" circumference = {c.circumference:.4f}")

# The cached value is stored directly in __dict__
print(f"\nInstance __dict__: {c.__dict__}")

# Change radius -- but cached values remain! Only use for immutable objects.
c.radius = 10.0
print("\nAfter radius change:")
print(f" area still = {c.area:.4f}")  # still the old value!

# To force recomputation, delete from __dict__
del c.__dict__['area']
print(f" area after cache clear = {c.area:.4f}")  # recomputed
Output:
First access:
Computing area for radius=5.0...
area = 78.5398
area (again) = 78.5398
circumference = 31.4159
Instance __dict__: {'radius': 5.0, 'area': 78.5398..., 'circumference': 31.4159...}
After radius change:
area still = 78.5398
area after cache clear = 314.1593
The critical warning: cached_property stores the result in the instance's __dict__, so changing the inputs (like self.radius) does not invalidate the cache. Only use it for properties derived from immutable data or data that never changes after initialization. For mutable objects, use a regular @property or implement explicit invalidation logic.
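As a sketch of what explicit invalidation might look like, the setter below drops the stale entry from __dict__ whenever the radius changes (MutableCircle is our own illustrative class, not a library feature):

import functools
import math

class MutableCircle:
    def __init__(self, radius: float):
        self._radius = radius

    @property
    def radius(self) -> float:
        return self._radius

    @radius.setter
    def radius(self, value: float) -> None:
        self._radius = value
        # Drop the stale cached value so the next access recomputes it
        self.__dict__.pop("area", None)

    @functools.cached_property
    def area(self) -> float:
        return math.pi * self._radius ** 2

c = MutableCircle(5.0)
print(c.area)     # 78.539... -- computed and cached
c.radius = 10.0   # setter invalidates the cached area
print(c.area)     # 314.159... -- recomputed with the new radius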
Real-Life Example: API Response Cache with TTL
Here's a custom memoization decorator that adds TTL (Time To Live) expiration -- useful for caching API responses that should refresh periodically.

# ttl_cache.py
import functools
import time
from typing import Any, Callable

def ttl_cache(maxsize: int = 128, ttl: float = 60.0):
    """
    Memoization decorator with TTL expiration.
    Cached results expire after 'ttl' seconds.
    """
    def decorator(func: Callable) -> Callable:
        cache: dict[Any, tuple[Any, float]] = {}  # key -> (result, timestamp)

        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> Any:
            # Build a hashable cache key from positional and keyword arguments
            key = args + tuple(sorted(kwargs.items()))
            now = time.monotonic()
            # Check cache -- return if fresh
            if key in cache:
                result, ts = cache[key]
                if now - ts < ttl:
                    return result
                del cache[key]  # expired
            # Cache miss or expired -- compute and store
            result = func(*args, **kwargs)
            if len(cache) >= maxsize:
                # Evict the oldest inserted entry (FIFO -- a simplification of true LRU)
                oldest_key = next(iter(cache))
                del cache[oldest_key]
            cache[key] = (result, now)
            return result

        def cache_info() -> dict:
            return {"size": len(cache), "maxsize": maxsize, "ttl": ttl}

        def cache_clear() -> None:
            cache.clear()

        wrapper.cache_info = cache_info
        wrapper.cache_clear = cache_clear
        return wrapper
    return decorator
# Simulate API call
api_call_count = 0

@ttl_cache(maxsize=10, ttl=2.0)  # expires after 2 seconds
def fetch_weather(city: str) -> dict:
    global api_call_count
    api_call_count += 1
    # In production: requests.get(f"https://api.weather.example.com/{city}")
    return {"city": city, "temp": 22 + hash(city) % 10, "fetched_at": time.time()}

# Demonstrate TTL behavior
print("=== TTL Cache Demo ===")
cities = ["London", "Paris", "Tokyo"]

print("\nRound 1 (cold cache):")
for city in cities:
    result = fetch_weather(city)
    print(f" {city}: {result['temp']}C (API calls so far: {api_call_count})")

print("\nRound 2 (warm cache -- same results, no API calls):")
for city in cities:
    result = fetch_weather(city)
    print(f" {city}: {result['temp']}C (API calls so far: {api_call_count})")

print("\nSleeping 2.5s to let TTL expire...")
time.sleep(2.5)

print("\nRound 3 (TTL expired -- re-fetches from API):")
for city in cities:
    result = fetch_weather(city)
    print(f" {city}: {result['temp']}C (API calls so far: {api_call_count})")

print(f"\nTotal API calls: {api_call_count} (should be 6 = 2 rounds x 3 cities)")
print(f"Cache info: {fetch_weather.cache_info()}")
Output:
=== TTL Cache Demo ===
Round 1 (cold cache):
London: 26C (API calls so far: 1)
Paris: 24C (API calls so far: 2)
Tokyo: 28C (API calls so far: 3)
Round 2 (warm cache -- same results, no API calls):
London: 26C (API calls so far: 3)
Paris: 24C (API calls so far: 3)
Tokyo: 28C (API calls so far: 3)
Sleeping 2.5s to let TTL expire...
Round 3 (TTL expired -- re-fetches from API):
London: 26C (API calls so far: 4)
Paris: 24C (API calls so far: 5)
Tokyo: 28C (API calls so far: 6)
Total API calls: 6 (should be 6 = 2 rounds x 3 cities)
Cache info: {'size': 3, 'maxsize': 10, 'ttl': 2.0}
This TTL cache pattern is useful for any data that's expensive to fetch but acceptable to be slightly stale -- configuration values, user preferences, exchange rates, or any API response with a known freshness window. For production use, consider the cachetools library, which provides TTLCache, LRUCache, and LFUCache with well-tested eviction logic (pair them with a lock if you need thread safety).
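For reference, a minimal sketch of the same idea with cachetools (a third-party package installed with pip install cachetools):

from threading import Lock
from cachetools import TTLCache, cached

# Up to 128 entries, each expiring 60 seconds after insertion
weather_cache = TTLCache(maxsize=128, ttl=60)

@cached(cache=weather_cache, lock=Lock())  # lock makes cache access thread-safe
def fetch_weather(city: str) -> dict:
    # In production: requests.get(f"https://api.weather.example.com/{city}")
    return {"city": city, "temp": 22}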
Frequently Asked Questions
Why does lru_cache require hashable arguments?
The cache uses the function arguments as dictionary keys, and Python dictionaries require hashable keys -- objects that implement __hash__. Strings, numbers, tuples (of hashable items), and frozensets are hashable; lists, dicts, and sets are not. If you need to cache a function that takes a list, have it accept a tuple instead and call it as func(tuple(my_list)). For dict arguments, convert to a sorted tuple of items: func(tuple(sorted(my_dict.items()))).
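For example (the function names here are purely illustrative):

from functools import lru_cache

@lru_cache(maxsize=None)
def total(numbers: tuple[int, ...]) -> int:
    return sum(numbers)

my_list = [3, 1, 4, 1, 5]
print(total(tuple(my_list)))  # convert the list to a tuple before calling

@lru_cache(maxsize=None)
def describe(settings: tuple) -> str:
    return ", ".join(f"{k}={v}" for k, v in settings)

my_dict = {"retries": 3, "timeout": 30}
print(describe(tuple(sorted(my_dict.items()))))  # dict -> sorted tuple of items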
Can I use lru_cache on class methods?
Using @lru_cache directly on instance methods causes a memory leak because self is part of the cache key, preventing the instance from being garbage collected. Use @functools.cached_property for properties, or use @lru_cache on a module-level function instead of a method. For methods that should cache per-instance, store a functools.lru_cache-wrapped function in __init__: self.method = functools.lru_cache(maxsize=128)(self._method_impl).
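A sketch of that per-instance pattern (the class and method names are illustrative):

import functools

class ReportGenerator:
    def __init__(self, source: str):
        self.source = source
        # Per-instance cache: wrapping the bound method keeps 'self' out of the
        # cache key, and the cache can be garbage-collected with the instance.
        self.render = functools.lru_cache(maxsize=128)(self._render_impl)

    def _render_impl(self, section: str) -> str:
        print(f"  rendering {section} from {self.source}...")
        return f"<{section} report from {self.source}>"

r = ReportGenerator("sales.db")
r.render("summary")   # computed
r.render("summary")   # served from this instance's cache
print(r.render.cache_info())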
Is lru_cache thread-safe?
Yes -- @lru_cache is thread-safe. The CPython implementation uses internal locking for the cache data structure. However, the cached function itself is not protected: if two threads simultaneously call a function with the same uncached arguments, both will compute the result and one will overwrite the other's cache entry (a "thundering herd" or cache stampede). This is generally harmless (correct result, some wasted computation) but can be a problem for very expensive functions. For strict once-only computation, add explicit locking inside the function.
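One way to guarantee once-only computation is to manage a small cache yourself with double-checked locking -- a sketch (this replaces @lru_cache for that particular function, and the names are illustrative):

import threading

_results: dict[int, int] = {}
_results_lock = threading.Lock()

def expensive_once(n: int) -> int:
    # Fast path: no lock needed if the result is already cached
    if n in _results:
        return _results[n]
    with _results_lock:
        # Re-check inside the lock: another thread may have computed the
        # result while we were waiting to acquire it
        if n not in _results:
            _results[n] = n * n  # stand-in for the genuinely expensive work
        return _results[n]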
When should I NOT use lru_cache?
Don't use lru_cache on functions with side effects (writing to a database, sending emails) -- the cache skips the side effect on repeat calls. Don't use it on functions that return different results for the same inputs (random number generators, functions that read changing data). Don't use it when memory is constrained and the input space is very large -- an unbounded @cache on a function called with millions of unique arguments will consume all available memory. And avoid decorating instance methods directly (see the previous question); use @cached_property or a per-instance cache instead.
What is the difference between @cache and @lru_cache(maxsize=None)?
@cache (Python 3.9+) is a thin alias for @lru_cache(maxsize=None) -- in CPython it is implemented by calling lru_cache(maxsize=None), so the two behave identically. Any performance difference comes from the maxsize setting, not the decorator name: an unbounded cache skips the LRU bookkeeping (there is no eviction order to maintain), which makes it slightly faster than a bounded one. Use @cache when you're on Python 3.9+ and want unbounded caching; use @lru_cache(maxsize=N) when you need bounded memory.
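In fact, the CPython source defines @cache as nothing more than a thin wrapper (Lib/functools.py, essentially verbatim):

def cache(user_function, /):
    'Simple lightweight unbounded cache.  Sometimes called "memoize".'
    return lru_cache(maxsize=None)(user_function)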
Conclusion
Memoization with @lru_cache and @cache is one of the highest-value optimization techniques in Python: one line of code can produce order-of-magnitude speedups for functions with repeated inputs. Use @cache for unbounded memoization (Python 3.9+), @lru_cache(maxsize=N) for bounded memory, and @cached_property for lazy class attributes. Monitor effectiveness with cache_info() and clear stale data with cache_clear().
For TTL-based caching, thread-safe caches, or more eviction strategies, the cachetools library builds on these patterns and adds production-ready implementations.
Official documentation: https://docs.python.org/3/library/functools.html#functools.lru_cache