Your function computes the same result for the same inputs over and over — a Fibonacci calculator recomputing fib(30) thousands of times, a database lookup called repeatedly with the same user ID, or an API call fetching the same configuration on every request. Recomputing the same result wastes time. Memoization is the technique of caching a function’s output so that repeated calls with the same arguments return the cached result instantly instead of recomputing.
Python’s functools module provides two decorators for memoization: @lru_cache (Least Recently Used cache, available since Python 3.2) and @cache (unbounded cache, added in Python 3.9). Both are zero-dependency standard library tools — no installation required. Add one decorator line above your function and Python handles all caching automatically.
In this tutorial you’ll learn how @lru_cache and @cache work, how to use them effectively, how to inspect cache statistics with cache_info(), how to clear the cache with cache_clear(), when to use each, and how to build a manual memoization decorator for cases requiring custom expiration or key logic. By the end you’ll be able to eliminate redundant computation from any Python function with a single line of code.
Python lru_cache: Quick Example
The classic demonstration is Fibonacci — without caching it’s exponentially slow, with caching it’s linear:
# lru_cache_quick.py
import time
from functools import lru_cache

# Without cache -- exponential time O(2^n)
def fib_slow(n: int) -> int:
    if n < 2:
        return n
    return fib_slow(n - 1) + fib_slow(n - 2)

# With lru_cache -- linear time O(n)
@lru_cache(maxsize=128)
def fib_cached(n: int) -> int:
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)

# Compare
n = 35
start = time.perf_counter()
result_slow = fib_slow(n)
slow_time = time.perf_counter() - start

start = time.perf_counter()
result_cached = fib_cached(n)
fast_time = time.perf_counter() - start

print(f"fib({n}) = {result_slow}")
print(f"Without cache: {slow_time:.3f}s")
print(f"With lru_cache: {fast_time:.6f}s")
print(f"Speedup: {slow_time / fast_time:.0f}x")
print(f"Cache stats: {fib_cached.cache_info()}")
Output:
fib(35) = 9227465
Without cache: 2.847s
With lru_cache: 0.000023s
Speedup: 123000x
Cache stats: CacheInfo(hits=33, misses=36, maxsize=128, currsize=36)
The cache stores results keyed by the function arguments. On the first call with argument n, the result is computed and stored (a miss). On subsequent calls with the same n, the stored result is returned instantly (a hit). For fib(35), there are only 36 unique subproblems, so after computing them once, all 33 repeated subproblem calls are cache hits.
What Is LRU Cache and How Does It Work?
An LRU (Least Recently Used) cache stores a fixed number of results. When the cache is full and a new result needs to be stored, the least recently used entry is evicted to make room. This ensures the cache stays within a bounded memory footprint while keeping the most recently accessed results available.
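Here's a minimal sketch of that eviction behavior, using a deliberately tiny cache (the square function and maxsize=2 are just for illustration):

from functools import lru_cache

@lru_cache(maxsize=2)  # deliberately tiny cache to force eviction
def square(n: int) -> int:
    print(f"  computing square({n})")
    return n * n

square(1)   # miss -- cached
square(2)   # miss -- cached (cache is now full)
square(1)   # hit -- 1 becomes the most recently used entry
square(3)   # miss -- evicts 2, the least recently used entry
square(2)   # miss again -- 2 was evicted, so it must be recomputed
print(square.cache_info())  # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)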
| Decorator | Cache behavior | Python Version | Use When |
|---|---|---|---|
| @lru_cache(maxsize=128) | Bounded (LRU eviction) | 3.2+ | Memory-constrained, large input spaces |
| @lru_cache(maxsize=None) | Unbounded | 3.2+ | All inputs will be cached forever |
| @cache | Unbounded (alias) | 3.9+ | Same as maxsize=None, cleaner syntax |
| @cached_property | Per-instance | 3.8+ | Lazy-evaluated class properties |
The cache key is built from all positional and keyword arguments, so func(1, 2) and func(2, 1) are different cache entries. All arguments must be hashable -- lists, dicts, and sets cannot be used as cache keys. If you need to cache a function that takes unhashable arguments, you'll need a custom memoization approach (shown later in this tutorial).

Using @cache and cache_info()
@functools.cache is a simpler alias for @lru_cache(maxsize=None) added in Python 3.9. It's the right choice when you want to cache all results without any eviction. Use it with cache_info() to monitor hit rates and cache_clear() to invalidate the cache when underlying data changes.
# cache_usage.py
import functools
import time

# Simulate an expensive database lookup
FAKE_DB = {1: "Alice", 2: "Bob", 3: "Charlie", 4: "Diana"}
lookup_count = 0

@functools.cache
def get_user(user_id: int) -> str | None:
    global lookup_count
    lookup_count += 1
    time.sleep(0.05)  # simulate DB latency
    return FAKE_DB.get(user_id)

# First round -- unique IDs miss, repeated IDs are served from cache
print("=== Round 1 (cold cache) ===")
for uid in [1, 2, 3, 1, 2, 1]:  # 1 and 2 repeated
    user = get_user(uid)
    print(f" User {uid}: {user}")
print(f"Actual DB lookups: {lookup_count}")
print(f"Cache info: {get_user.cache_info()}")

# Second round -- all hits (cache persists)
print("\n=== Round 2 (warm cache) ===")
lookups_before = lookup_count
start = time.perf_counter()
for uid in [1, 2, 3]:
    get_user(uid)
elapsed = time.perf_counter() - start
print(f"New DB lookups: {lookup_count - lookups_before}")
print(f"Time (all cached): {elapsed:.4f}s")
print(f"Cache info: {get_user.cache_info()}")

# Clear cache when data changes
print("\n=== After cache_clear() ===")
get_user.cache_clear()
print(f"Cache info: {get_user.cache_info()}")
_ = get_user(1)  # must re-fetch
print(f"DB lookups after clear: {lookup_count}")
Output:
=== Round 1 (cold cache) ===
User 1: Alice
User 2: Bob
User 3: Charlie
User 1: Alice
User 2: Bob
User 1: Alice
Actual DB lookups: 3
Cache info: CacheInfo(hits=3, misses=3, maxsize=None, currsize=3)
=== Round 2 (warm cache) ===
New DB lookups: 0
Time (all cached): 0.0001s
Cache info: CacheInfo(hits=6, misses=3, maxsize=None, currsize=3)
=== After cache_clear() ===
Cache info: CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
DB lookups after clear: 4
Notice that in Round 1, six calls only hit the database three times -- the repeated calls for users 1 and 2 were served from cache. The cache_info() method returns a named tuple with hits, misses, maxsize, and currsize. As a rule of thumb, a warm cache should be hitting 70%+ of the time; if you're seeing mostly misses, either the cache is too small or the input space is too varied for caching to help.
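If you want that hit rate as a single number, a tiny helper (our own convenience function, not part of functools) can derive it from cache_info():

def hit_rate(cached_func) -> float:
    """Return the cache hit rate as a percentage (0.0 if never called)."""
    info = cached_func.cache_info()
    total = info.hits + info.misses
    return (info.hits / total * 100) if total else 0.0

print(f"get_user hit rate: {hit_rate(get_user):.1f}%")  # e.g. 66.7% after Round 2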
Choosing maxsize with lru_cache
The maxsize parameter controls how many distinct results the cache holds. When the cache is full and a new result arrives, the least recently used entry is evicted. Setting maxsize correctly is a trade-off between memory usage and cache effectiveness.
# lru_maxsize.py
from functools import lru_cache
import random

# Simulate a function with variable input distribution
@lru_cache(maxsize=10)
def process_item(item_id: int) -> str:
    return f"processed-{item_id}"

# Test with a skewed distribution (80% of calls use top 10 items)
random.seed(42)
for _ in range(1000):
    if random.random() < 0.8:
        item_id = random.randint(1, 10)    # hot items
    else:
        item_id = random.randint(11, 100)  # cold items
    process_item(item_id)

info = process_item.cache_info()
hit_rate = info.hits / (info.hits + info.misses) * 100
print(f"maxsize=10, hit rate: {hit_rate:.1f}%")
print(f" hits={info.hits}, misses={info.misses}, currsize={info.currsize}")

# Compare with maxsize=50
process_item.cache_clear()

@lru_cache(maxsize=50)
def process_item_large(item_id: int) -> str:
    return f"processed-{item_id}"

random.seed(42)
for _ in range(1000):
    if random.random() < 0.8:
        item_id = random.randint(1, 10)
    else:
        item_id = random.randint(11, 100)
    process_item_large(item_id)

info2 = process_item_large.cache_info()
hit_rate2 = info2.hits / (info2.hits + info2.misses) * 100
print(f"\nmaxsize=50, hit rate: {hit_rate2:.1f}%")
print(f" hits={info2.hits}, misses={info2.misses}, currsize={info2.currsize}")

# Powers of 2 are most efficient for maxsize
print("\nNote: set maxsize to a power of 2 (32, 64, 128, 256) for best performance")
Output:
maxsize=10, hit rate: 88.6%
hits=875, misses=125, currsize=10
maxsize=50, hit rate: 91.2%
hits=895, misses=105, currsize=50
Note: set maxsize to a power of 2 (32, 64, 128, 256) for best performance
With a skewed distribution (most calls go to a small set of "hot" items), even a small cache of 10 entries achieves an ~89% hit rate. Setting maxsize to a power of 2 is a micro-optimization: the cache's internal storage is a dict plus a linked list for LRU ordering, and the functools documentation notes that it performs best when maxsize is a power of two. For most applications, use maxsize=128 (the default) as a starting point and adjust based on cache_info() data.

cached_property for Class Attributes
functools.cached_property is designed for class attributes that are expensive to compute but don't change after the first access. Unlike @property, which recomputes on every access, @cached_property computes once and stores the result on the instance.
# cached_property_demo.py
import functools
import math
import time

class Circle:
    def __init__(self, radius: float):
        self.radius = radius

    @functools.cached_property
    def area(self) -> float:
        """Computed once, cached forever on this instance."""
        print(f" Computing area for radius={self.radius}...")
        time.sleep(0.01)  # simulate expensive computation
        return math.pi * self.radius ** 2

    @functools.cached_property
    def circumference(self) -> float:
        print(f" Computing circumference for radius={self.radius}...")
        return 2 * math.pi * self.radius

c = Circle(5.0)
print("First access:")
print(f" area = {c.area:.4f}")
print(f" area (again) = {c.area:.4f}")  # no recomputation
print(f" circumference = {c.circumference:.4f}")

# The cached value is stored directly in __dict__
print(f"\nInstance __dict__: {c.__dict__}")

# Change radius -- but cached values remain! Only use for immutable objects.
c.radius = 10.0
print("\nAfter radius change:")
print(f" area still = {c.area:.4f}")  # still the old value!

# To force recomputation, delete from __dict__
del c.__dict__['area']
print(f" area after cache clear = {c.area:.4f}")  # recomputed
Output:
First access:
Computing area for radius=5.0...
area = 78.5398
area (again) = 78.5398
circumference = 31.4159
Instance __dict__: {'radius': 5.0, 'area': 78.5398..., 'circumference': 31.4159...}
After radius change:
area still = 78.5398
area after cache clear = 314.1593
The critical warning: cached_property stores the result in the instance's __dict__, so changing the inputs (like self.radius) does not invalidate the cache. Only use it for properties derived from immutable data or data that never changes after initialization. For mutable objects, use a regular @property or implement explicit invalidation logic.
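As a sketch of what explicit invalidation might look like, the setter below drops the stale entry from __dict__ whenever the radius changes (MutableCircle is our own illustrative class, not a library feature):

import functools
import math

class MutableCircle:
    def __init__(self, radius: float):
        self._radius = radius

    @property
    def radius(self) -> float:
        return self._radius

    @radius.setter
    def radius(self, value: float) -> None:
        self._radius = value
        # Drop the stale cached value so the next access recomputes it
        self.__dict__.pop("area", None)

    @functools.cached_property
    def area(self) -> float:
        return math.pi * self._radius ** 2

c = MutableCircle(5.0)
print(c.area)     # 78.539... -- computed and cached
c.radius = 10.0   # setter invalidates the cached area
print(c.area)     # 314.159... -- recomputed with the new radius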
Real-Life Example: API Response Cache with TTL
Here's a custom memoization decorator that adds TTL (Time To Live) expiration -- useful for caching API responses that should refresh periodically.

# ttl_cache.py
import functools
import time
from typing import Any, Callable

def ttl_cache(maxsize: int = 128, ttl: float = 60.0):
    """
    Memoization decorator with TTL expiration.
    Cached results expire after 'ttl' seconds.
    """
    def decorator(func: Callable) -> Callable:
        cache: dict[Any, tuple[Any, float]] = {}  # key -> (result, timestamp)

        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> Any:
            # Build a hashable cache key from positional and keyword arguments
            key = args + tuple(sorted(kwargs.items()))
            now = time.monotonic()
            # Check cache -- return if fresh
            if key in cache:
                result, ts = cache[key]
                if now - ts < ttl:
                    return result
                del cache[key]  # expired
            # Cache miss or expired -- compute and store
            result = func(*args, **kwargs)
            if len(cache) >= maxsize:
                # Evict the oldest inserted entry (FIFO -- a simplification of true LRU)
                oldest_key = next(iter(cache))
                del cache[oldest_key]
            cache[key] = (result, now)
            return result

        def cache_info() -> dict:
            return {"size": len(cache), "maxsize": maxsize, "ttl": ttl}

        def cache_clear() -> None:
            cache.clear()

        wrapper.cache_info = cache_info
        wrapper.cache_clear = cache_clear
        return wrapper
    return decorator
# Simulate API call
api_call_count = 0

@ttl_cache(maxsize=10, ttl=2.0)  # expires after 2 seconds
def fetch_weather(city: str) -> dict:
    global api_call_count
    api_call_count += 1
    # In production: requests.get(f"https://api.weather.example.com/{city}")
    return {"city": city, "temp": 22 + hash(city) % 10, "fetched_at": time.time()}

# Demonstrate TTL behavior
print("=== TTL Cache Demo ===")
cities = ["London", "Paris", "Tokyo"]

print("\nRound 1 (cold cache):")
for city in cities:
    result = fetch_weather(city)
    print(f" {city}: {result['temp']}C (API calls so far: {api_call_count})")

print("\nRound 2 (warm cache -- same results, no API calls):")
for city in cities:
    result = fetch_weather(city)
    print(f" {city}: {result['temp']}C (API calls so far: {api_call_count})")

print("\nSleeping 2.5s to let TTL expire...")
time.sleep(2.5)

print("\nRound 3 (TTL expired -- re-fetches from API):")
for city in cities:
    result = fetch_weather(city)
    print(f" {city}: {result['temp']}C (API calls so far: {api_call_count})")

print(f"\nTotal API calls: {api_call_count} (should be 6 = 2 rounds x 3 cities)")
print(f"Cache info: {fetch_weather.cache_info()}")
Output:
=== TTL Cache Demo ===
Round 1 (cold cache):
London: 26C (API calls so far: 1)
Paris: 24C (API calls so far: 2)
Tokyo: 28C (API calls so far: 3)
Round 2 (warm cache -- same results, no API calls):
London: 26C (API calls so far: 3)
Paris: 24C (API calls so far: 3)
Tokyo: 28C (API calls so far: 3)
Sleeping 2.5s to let TTL expire...
Round 3 (TTL expired -- re-fetches from API):
London: 26C (API calls so far: 4)
Paris: 24C (API calls so far: 5)
Tokyo: 28C (API calls so far: 6)
Total API calls: 6 (should be 6 = 2 rounds x 3 cities)
Cache info: {'size': 3, 'maxsize': 10, 'ttl': 2.0}
This TTL cache pattern is useful for any data that's expensive to fetch but acceptable to be slightly stale -- configuration values, user preferences, exchange rates, or any API response with a known freshness window. For production use, consider the cachetools library, which provides TTLCache, LRUCache, and LFUCache with well-tested eviction logic (pair them with a lock if you need thread safety).
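For reference, a minimal sketch of the same idea with cachetools (a third-party package installed with pip install cachetools):

from threading import Lock
from cachetools import TTLCache, cached

# Up to 128 entries, each expiring 60 seconds after insertion
weather_cache = TTLCache(maxsize=128, ttl=60)

@cached(cache=weather_cache, lock=Lock())  # lock makes cache access thread-safe
def fetch_weather(city: str) -> dict:
    # In production: requests.get(f"https://api.weather.example.com/{city}")
    return {"city": city, "temp": 22}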
Frequently Asked Questions
Why does lru_cache require hashable arguments?
The cache uses the function arguments as dictionary keys, and Python dictionaries require hashable keys -- objects that implement __hash__. Strings, numbers, tuples (of hashable items), and frozensets are hashable; lists, dicts, and sets are not. If you need to cache a function that takes a list, have it accept a tuple instead and call it as func(tuple(my_list)). For dict arguments, convert to a sorted tuple of items: func(tuple(sorted(my_dict.items()))).
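For example (the function names here are purely illustrative):

from functools import lru_cache

@lru_cache(maxsize=None)
def total(numbers: tuple[int, ...]) -> int:
    return sum(numbers)

my_list = [3, 1, 4, 1, 5]
print(total(tuple(my_list)))  # convert the list to a tuple before calling

@lru_cache(maxsize=None)
def describe(settings: tuple) -> str:
    return ", ".join(f"{k}={v}" for k, v in settings)

my_dict = {"retries": 3, "timeout": 30}
print(describe(tuple(sorted(my_dict.items()))))  # dict -> sorted tuple of items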
Can I use lru_cache on class methods?
Using @lru_cache directly on instance methods causes a memory leak because self is part of the cache key, preventing the instance from being garbage collected. Use @functools.cached_property for properties, or use @lru_cache on a module-level function instead of a method. For methods that should cache per-instance, store a functools.lru_cache-wrapped function in __init__: self.method = functools.lru_cache(maxsize=128)(self._method_impl).
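A sketch of that per-instance pattern (the class and method names are illustrative):

import functools

class ReportGenerator:
    def __init__(self, source: str):
        self.source = source
        # Per-instance cache: wrapping the bound method keeps 'self' out of the
        # cache key, and the cache can be garbage-collected with the instance.
        self.render = functools.lru_cache(maxsize=128)(self._render_impl)

    def _render_impl(self, section: str) -> str:
        print(f"  rendering {section} from {self.source}...")
        return f"<{section} report from {self.source}>"

r = ReportGenerator("sales.db")
r.render("summary")   # computed
r.render("summary")   # served from this instance's cache
print(r.render.cache_info())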
Is lru_cache thread-safe?
Yes -- @lru_cache is thread-safe. The CPython implementation uses internal locking for the cache data structure. However, the cached function itself is not protected: if two threads simultaneously call a function with the same uncached arguments, both will compute the result and one will overwrite the other's cache entry (a "thundering herd" or cache stampede). This is generally harmless (correct result, some wasted computation) but can be a problem for very expensive functions. For strict once-only computation, add explicit locking inside the function.
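One way to guarantee once-only computation is to manage a small cache yourself with double-checked locking -- a sketch (this replaces @lru_cache for that particular function, and the names are illustrative):

import threading

_results: dict[int, int] = {}
_results_lock = threading.Lock()

def expensive_once(n: int) -> int:
    # Fast path: no lock needed if the result is already cached
    if n in _results:
        return _results[n]
    with _results_lock:
        # Re-check inside the lock: another thread may have computed the
        # result while we were waiting to acquire it
        if n not in _results:
            _results[n] = n * n  # stand-in for the genuinely expensive work
        return _results[n]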
When should I NOT use lru_cache?
Don't use lru_cache on functions with side effects (writing to a database, sending emails) -- the cache skips the side effect on repeat calls. Don't use it on functions that return different results for the same inputs (random number generators, functions that read changing data). Don't use it when memory is constrained and the input space is very large -- an unbounded @cache on a function called with millions of unique arguments will consume all available memory. And avoid decorating instance methods directly (see the previous question); use @cached_property or a per-instance cache instead.
What is the difference between @cache and @lru_cache(maxsize=None)?
@cache (Python 3.9+) is a thin alias for @lru_cache(maxsize=None) -- in CPython it is implemented by calling lru_cache(maxsize=None), so the two behave identically. Any performance difference comes from the maxsize setting, not the decorator name: an unbounded cache skips the LRU bookkeeping (there is no eviction order to maintain), which makes it slightly faster than a bounded one. Use @cache when you're on Python 3.9+ and want unbounded caching; use @lru_cache(maxsize=N) when you need bounded memory.
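In fact, the CPython source defines @cache as nothing more than a thin wrapper (Lib/functools.py, essentially verbatim):

def cache(user_function, /):
    'Simple lightweight unbounded cache.  Sometimes called "memoize".'
    return lru_cache(maxsize=None)(user_function)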
Conclusion
Memoization with @lru_cache and @cache is one of the highest-value optimization techniques in Python: one line of code can produce order-of-magnitude speedups for functions with repeated inputs. Use @cache for unbounded memoization (Python 3.9+), @lru_cache(maxsize=N) for bounded memory, and @cached_property for lazy class attributes. Monitor effectiveness with cache_info() and clear stale data with cache_clear().
For TTL-based caching, thread-safe caches, or more eviction strategies, the cachetools library builds on these patterns and adds production-ready implementations.
Official documentation: https://docs.python.org/3/library/functools.html#functools.lru_cache