Intermediate

Python’s built-in functools.lru_cache is great for memoizing pure functions, but it has a single, fixed eviction strategy (Least Recently Used) and no expiry. Once an entry is cached, it stays until it is evicted to make room for newer entries or the process restarts. For most scripts that is fine. For a web application that calls a slow external API, caches database query results, or serves pricing data that changes every few minutes, you need caches that expire old data, evict entries by frequency rather than recency, or live in a named object you can inspect, size, and clear explicitly.

cachetools is a small third-party library (install with pip install cachetools) that provides a family of ready-made cache classes — this article covers LRU, LFU, TTL, and RR — each a plain Python MutableMapping you can use like a dict, attach to a function with the cachetools.cached() decorator, or apply through the drop-in @lru_cache-style decorators in cachetools.func. The caches themselves are not thread-safe, but cached() takes an optional lock for multi-threaded use, and the companion asyncache library covers async functions.

This article walks through each cache type with concrete code examples, explains which eviction strategy to choose for which scenario, shows how to use cachetools as a decorator, covers thread-safe caching in multi-threaded servers, and closes with a real-life example caching OpenWeather API responses with TTL expiry. By the end you will have a caching toolkit that covers every common scenario.

cachetools Quick Example

Here is a function that calls a slow external API, cached with a TTL so results expire after 60 seconds:

# quick_cachetools.py
import time
import cachetools.func

@cachetools.func.ttl_cache(maxsize=128, ttl=60)
def get_user_profile(user_id: int) -> dict:
    """Fetch user profile -- cached for 60 seconds per user_id."""
    print(f"  Fetching user {user_id} from API...")
    time.sleep(0.5)   # simulate network delay
    return {"id": user_id, "name": f"User {user_id}", "plan": "pro"}

print("First call (cache miss):")
print(get_user_profile(42))

print("\nSecond call (cache hit -- no fetch):")
print(get_user_profile(42))

print("\nDifferent user (cache miss):")
print(get_user_profile(99))

Output:

First call (cache miss):
  Fetching user 42 from API...
{'id': 42, 'name': 'User 42', 'plan': 'pro'}

Second call (cache hit -- no fetch):
{'id': 42, 'name': 'User 42', 'plan': 'pro'}

Different user (cache miss):
  Fetching user 99 from API...
{'id': 99, 'name': 'User 99', 'plan': 'pro'}

The second call for user 42 returns instantly from cache. After 60 seconds, the next call for user 42 will hit the API again automatically. Each unique user_id is a separate cache key. You get expiry, bounded size, and memoization in one decorator — no manual cache management.
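
The cachetools.func decorators mirror functools.lru_cache’s introspection helpers, so (assuming a reasonably recent cachetools release) you can inspect hit/miss statistics and empty the cache by hand:

print(get_user_profile.cache_info())   # e.g. CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
get_user_profile.cache_clear()         # drop every cached profile immediately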

Cache Types: LRU, LFU, TTL, and RR

Choosing the right eviction strategy matters because it determines which entries survive under memory pressure. Here is the full comparison:

Cache | Evicts                        | Best for                                        | Class
LRU   | Least Recently Used           | General memoization, recent-access patterns     | LRUCache
LFU   | Least Frequently Used         | Popular-item retention (top-N queries)          | LFUCache
TTL   | Entries older than their TTL  | External API data, time-sensitive results       | TTLCache
RR    | A random entry                | Uniform access patterns, simple bounded cache   | RRCache

LRU is the right default if you have no other information. Use LFU when you know some keys are requested far more often than others (e.g., top-10 products on an e-commerce site) and you want those to stay cached even if other keys were accessed more recently. Use TTL whenever the underlying data changes on a known schedule. Use RR only if you need a very simple bounded dict and do not care about eviction quality.
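
RRCache is not shown elsewhere in this article, so here is a tiny sketch of it used as a plain dict; which entry gets evicted is, by definition, random:

# rr_demo.py
from cachetools import RRCache

rr = RRCache(maxsize=3)
for key in "abcd":
    rr[key] = key.upper()    # inserting 'd' evicts one of 'a', 'b', 'c' at random
print(list(rr.keys()))       # e.g. ['a', 'c', 'd'] -- varies from run to run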

LRU and LFU Caches

Both LRUCache and LFUCache are plain Python mappings — you use them like dicts, or wrap them with the @cached() decorator:

# lru_lfu_demo.py
from cachetools import LRUCache, LFUCache, cached

# --- LRU as a decorator ---
lru = LRUCache(maxsize=3)

@cached(cache=lru)
def compute_square(n: int) -> int:
    print(f"  Computing {n}^2...")
    return n * n

for x in [1, 2, 3, 1, 4]:   # 4 causes eviction (cache full after 1,2,3)
    print(f"square({x}) = {compute_square(x)}")

print(f"\nLRU cache now holds: {list(lru.keys())}")

# --- LFU as a plain dict ---
lfu = LFUCache(maxsize=3)
lfu["a"] = 1
lfu["b"] = 2
lfu["c"] = 3
_ = lfu["a"]   # access 'a' twice so it is most frequent
_ = lfu["a"]
_ = lfu["b"]   # access 'b' once
# Adding 'd' evicts 'c' -- it is the least frequently used (never read after insertion)
lfu["d"] = 4
print("LFU keys after inserting 'd':", list(lfu.keys()))

Output:

  Computing 1^2...
square(1) = 1
  Computing 2^2...
square(2) = 4
  Computing 3^2...
square(3) = 9
square(1) = 1          <-- cache hit
  Computing 4^2...     <-- 4 inserted, 2 evicted (LRU)
square(4) = 16

LRU cache now holds: [(1,), (3,), (4,)]   # keys are @cached's hashkey tuples; 2 was evicted
LFU keys after inserting 'd': ['a', 'b', 'd']   # 'c' evicted (never accessed)

The @cached(cache=lru) pattern separates the cache instance from the decorated function, which means you can inspect or clear the cache by name — unlike @lru_cache, where the cache is hidden inside the function’s closure. compute_square.cache_info() is not available by default (recent cachetools releases add it if you pass info=True to @cached), but you can check len(lru), lru.currsize, and lru.maxsize directly.
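
For example, after the run above you can look at or empty the LRU cache directly (a small sketch continuing the lru_lfu_demo.py session):

print(len(lru), lru.currsize, lru.maxsize)   # 3 3 3
lru.clear()                                  # empty the cache by name
print(compute_square(1))                     # recomputes 1^2 -- the cached result was cleared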

TTL Cache for Time-Sensitive Data

TTL (Time To Live) cache entries expire automatically after a set number of seconds. This is the right choice whenever your data has a known freshness window — exchange rates, weather data, feature flags, session tokens:

# ttl_demo.py
import time
from cachetools import TTLCache, cached

ttl = TTLCache(maxsize=256, ttl=5)   # entries expire after 5 seconds

@cached(cache=ttl)
def fetch_exchange_rate(currency: str) -> float:
    print(f"  Hitting API for {currency}...")
    # In production: return requests.get(f"https://api.exchangerate.host/latest?base={currency}").json()["rates"]["AUD"]
    rates = {"USD": 1.58, "EUR": 1.72, "GBP": 2.01}
    return rates.get(currency, 1.0)

print("First call:")
print(fetch_exchange_rate("USD"))   # cache miss

print("\nImmediate second call:")
print(fetch_exchange_rate("USD"))   # cache hit

print("\nWaiting 6 seconds for TTL expiry...")
time.sleep(6)

print("Call after expiry:")
print(fetch_exchange_rate("USD"))   # cache miss again -- TTL expired

Output:

First call:
  Hitting API for USD...
1.58

Immediate second call:
1.58

Waiting 6 seconds for TTL expiry...
Call after expiry:
  Hitting API for USD...
1.58

The TTL clock starts when the entry is inserted, not when it was last accessed. This means even a highly popular key expires on schedule — exactly what you want for data that becomes stale by age, not by usage. If you want expiry-on-last-access semantics, you can subclass TTLCache with a __getitem__ that refreshes the TTL on each read, or consider whether an LRU cache, which favours recently used entries, is a better fit than TTL for your access pattern.
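
A minimal sketch of that idea (RefreshingTTLCache is not part of cachetools, and re-inserting on every read adds a little overhead, but it restarts the TTL clock whenever an entry is read):

# refreshing_ttl_sketch.py
from cachetools import TTLCache

class RefreshingTTLCache(TTLCache):
    """TTLCache variant that restarts an entry's TTL on every read."""

    def __getitem__(self, key):
        value = super().__getitem__(key)
        super().__setitem__(key, value)   # re-insert so the expiry clock starts again
        return value

cache = RefreshingTTLCache(maxsize=128, ttl=300)
cache["token"] = "abc123"
_ = cache["token"]   # reading the entry pushes its expiry another 300 seconds out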

Thread-Safe Caching

None of the cachetools cache classes is thread-safe on its own — if two threads simultaneously insert or evict entries, you can end up with corrupted internal state. For multi-threaded servers (Flask with threads, Django, FastAPI with sync endpoints), pass a lock to the decorator with the @cached(cache, lock=RLock()) pattern:

# thread_safe_demo.py
import threading
from cachetools import TTLCache, cached

# Thread-safe TTL cache using an RLock
cache = TTLCache(maxsize=512, ttl=120)
lock = threading.RLock()

@cached(cache=cache, lock=lock)
def get_user_permissions(user_id: int) -> list[str]:
    print(f"  Loading permissions for user {user_id} from DB...")
    # Stand-in for a slow DB query (kept instant so the demo runs quickly)
    permissions_db = {
        1: ["read", "write", "admin"],
        2: ["read"],
        3: ["read", "write"],
    }
    return permissions_db.get(user_id, [])

# Simulate concurrent requests from multiple threads
def simulate_request(user_id: int):
    perms = get_user_permissions(user_id)
    print(f"Thread {threading.current_thread().name}: user {user_id} -> {perms}")

threads = [threading.Thread(target=simulate_request, args=(1,), name=f"T{i}") for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Output:

  Loading permissions for user 1 from DB...
Thread T0: user 1 -> ['read', 'write', 'admin']
Thread T1: user 1 -> ['read', 'write', 'admin']
Thread T2: user 1 -> ['read', 'write', 'admin']

In this run only one DB query fires: the first thread stores the result before the others look it up, and the RLock serialises cache reads and writes so no two threads can corrupt the cache’s internal state. Note that the lock guards the cache, not the wrapped function: cachetools releases the lock while calling the function on a miss, so several threads missing at the same moment on a genuinely slow query could each hit the database once before the first result is stored. Use threading.RLock() rather than threading.Lock() so the same thread can re-enter the lock if the cached function is called recursively.
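
If a cold-key stampede is a real concern, say the query takes several seconds, one option is to skip the decorator for that function and hold the lock across the whole miss path yourself. A minimal sketch (this serialises every caller of the function, trading concurrency for single-flight behaviour):

# single_flight_sketch.py
import threading
from cachetools import TTLCache

cache = TTLCache(maxsize=512, ttl=120)
lock = threading.RLock()

def load_permissions_from_db(user_id: int) -> list[str]:
    # Placeholder for the genuinely slow query
    return ["read", "write"]

def get_permissions(user_id: int) -> list[str]:
    with lock:                                 # held across the lookup AND the slow call
        try:
            return cache[user_id]
        except KeyError:
            perms = load_permissions_from_db(user_id)
            cache[user_id] = perms
            return perms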

Real-Life Example: Caching OpenWeather API Responses

Here is a complete weather client that caches current conditions for 10 minutes and the 5-day forecast for 30 minutes — TTLs chosen to roughly match how often OpenWeather refreshes the underlying data, so repeat calls inside each window cost nothing and cached responses are never older than their natural refresh interval:

# weather_client.py
import threading
import time
from cachetools import TTLCache, cached
import urllib.parse
import urllib.request
import json

# Separate caches for current vs forecast (different TTLs)
current_cache = TTLCache(maxsize=100, ttl=600)    # 10 min
forecast_cache = TTLCache(maxsize=50, ttl=1800)   # 30 min
current_lock = threading.RLock()
forecast_lock = threading.RLock()

BASE_URL = "https://api.openweathermap.org/data/2.5"

# Replace with your free key from https://openweathermap.org/api
API_KEY = "YOUR_OPENWEATHER_API_KEY"

@cached(cache=current_cache, lock=current_lock)
def get_current_weather(city: str) -> dict:
    url = f"{BASE_URL}/weather?q={city}&appid={API_KEY}&units=metric"
    with urllib.request.urlopen(url) as resp:
        data = json.loads(resp.read())
    return {
        "city": data["name"],
        "temp_c": data["main"]["temp"],
        "feels_like": data["main"]["feels_like"],
        "description": data["weather"][0]["description"],
        "humidity": data["main"]["humidity"],
    }

@cached(cache=forecast_cache, lock=forecast_lock)
def get_forecast(city: str) -> list[dict]:
    url = f"{BASE_URL}/forecast?q={city}&appid={API_KEY}&units=metric&cnt=5"
    with urllib.request.urlopen(url) as resp:
        data = json.loads(resp.read())
    return [
        {
            "time": item["dt_txt"],
            "temp_c": item["main"]["temp"],
            "description": item["weather"][0]["description"],
        }
        for item in data["list"]
    ]

def weather_report(city: str) -> None:
    print(f"\n=== Weather Report: {city} ===")
    current = get_current_weather(city)
    print(f"Now: {current['temp_c']:.1f}C, {current['description']}, "
          f"humidity {current['humidity']}%")
    forecast = get_forecast(city)
    print("Forecast (next 5 periods):")
    for slot in forecast:
        print(f"  {slot['time']}: {slot['temp_c']:.1f}C, {slot['description']}")
    print(f"Cache sizes: current={current_cache.currsize}, "
          f"forecast={forecast_cache.currsize}")

if __name__ == "__main__":
    weather_report("Sydney")
    print("\n-- Second call (cache hit) --")
    start = time.monotonic()
    weather_report("Sydney")
    elapsed = time.monotonic() - start
    print(f"Second call took {elapsed:.3f}s (should be near 0)")

Calling weather_report("Sydney") a second time returns instantly from both caches, with no network requests. After 10 minutes the current-weather cache expires and the next call re-fetches, while the forecast stays cached for another 20 minutes. To extend this, add an LFUCache for city lookups (geocoding), which rarely change — popular cities then tend to stay cached while obscure ones get evicted, as sketched below.
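
A sketch of that extension, intended to be added to weather_client.py (where API_KEY, cached, and the urllib imports already live). The /geo/1.0/direct endpoint and its response fields are assumed from OpenWeather’s geocoding API, so verify them against the current documentation:

# Add to weather_client.py -- LFU-cached geocoding (endpoint and fields assumed, see above)
from cachetools import LFUCache

geo_cache = LFUCache(maxsize=500)   # frequently requested cities tend to survive eviction

@cached(cache=geo_cache)
def geocode(city: str) -> tuple[float, float]:
    url = (f"https://api.openweathermap.org/geo/1.0/direct"
           f"?q={urllib.parse.quote(city)}&limit=1&appid={API_KEY}")
    with urllib.request.urlopen(url) as resp:
        place = json.loads(resp.read())[0]
    return place["lat"], place["lon"]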

Frequently Asked Questions

When should I use cachetools instead of functools.lru_cache?

Use functools.lru_cache when you need simple memoization with no expiry and the cache can live for the entire process lifetime. Switch to cachetools when you need time-based expiry (TTL), frequency-based eviction (LFU), thread-safe access with an explicit lock, per-instance caching on class methods (lru_cache on a method keeps a reference to every self it has seen, so instances are never garbage-collected), or a cache you can introspect and clear by name. cachetools gives you explicit control; lru_cache is more convenient but less flexible.

How do I choose a maxsize?

Start with a size proportional to your memory budget divided by average entry size. A dict with a few string keys and int values is roughly 200-500 bytes. A cache holding API response dicts with 20 keys might be 2-5KB per entry. For a 10MB cache budget and 2KB entries, maxsize=5000 is reasonable. Monitor cache.currsize and eviction rates in production, then tune up or down. When in doubt, set maxsize to the number of unique keys you realistically expect to see per TTL window.
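
If you want a data-driven starting point, a rough sketch with sys.getsizeof gives a ballpark per-entry size. It undercounts nested objects, so treat the result as a lower bound:

# rough_sizing.py
import sys

sample_entry = {"id": 42, "name": "User 42", "plan": "pro"}   # typical cached value
entry_bytes = sys.getsizeof(sample_entry) + sum(
    sys.getsizeof(k) + sys.getsizeof(v) for k, v in sample_entry.items()
)
budget_bytes = 10 * 1024 * 1024   # 10 MB cache budget
print(f"~{entry_bytes} B per entry -> maxsize around {budget_bytes // entry_bytes}")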

How do I cache class instance methods?

Do not use a module-level cache for instance methods — the instance itself becomes part of every cache key (self is the first argument) and is held by the cache, preventing garbage collection. Instead, create the cache (and lock, if needed) as instance attributes in __init__ and decorate the method with cachetools.cachedmethod, which looks the cache up on self at call time. Alternatively, methodtools.lru_cache (a separate small library) handles instance-method caching correctly.
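
A minimal sketch of the per-instance pattern with cachetools.cachedmethod; the class and method names here are illustrative:

# cachedmethod_sketch.py
import operator
from cachetools import LRUCache, cachedmethod

class PricingClient:
    def __init__(self):
        self._cache = LRUCache(maxsize=256)   # one cache per instance

    @cachedmethod(operator.attrgetter("_cache"))
    def get_price(self, sku: str) -> float:
        print(f"  Looking up {sku}...")
        return 9.99   # placeholder for the real lookup

client = PricingClient()
client.get_price("ABC-1")   # miss -- runs the lookup
client.get_price("ABC-1")   # hit -- served from client._cache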

Does cachetools work with async functions?

The standard @cached() decorator does not work with async functions, because it would cache the coroutine object returned by the call rather than the awaited result. For async caching, use asyncache, a companion library whose cached() and cachedmethod() decorators accept the same cachetools cache classes but await the wrapped coroutine; pass an asyncio.Lock rather than a threading lock if you need to guard the cache across tasks.
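
A short sketch with asyncache, assuming its cached() decorator takes the same cache (and optional lock) arguments as cachetools.cached; verify against the asyncache docs for your version:

# async_cache_sketch.py
import asyncio
from asyncache import cached          # pip install asyncache
from cachetools import TTLCache

@cached(TTLCache(maxsize=256, ttl=60))
async def fetch_profile(user_id: int) -> dict:
    await asyncio.sleep(0.5)          # stand-in for an async HTTP call
    return {"id": user_id}

async def main():
    await fetch_profile(42)   # miss -- awaits the coroutine and caches the result
    await fetch_profile(42)   # hit -- returns the cached result immediately

asyncio.run(main())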

How does cachetools generate cache keys from arguments?

By default, @cached() builds the key from a hash of all positional and keyword arguments — the same strategy as lru_cache — so every argument must be hashable (strings, ints, tuples, frozensets; not lists or dicts). If your function accepts unhashable arguments, pass a custom key function, e.g. @cached(cache, key=lambda d: json.dumps(d, sort_keys=True)) for a function that takes a single dict. The key function receives exactly the same args and kwargs as the cached function and must return a hashable value.
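
Here is a sketch of the single-dict-argument case; the function and key names are made up for illustration:

# custom_key_sketch.py
import json
from cachetools import LRUCache, cached

def dict_key(filters: dict) -> str:
    # Serialise the unhashable dict into a stable, hashable string
    return json.dumps(filters, sort_keys=True)

@cached(cache=LRUCache(maxsize=128), key=dict_key)
def search_products(filters: dict) -> list:
    print("  Running search...")
    return [f"match for {filters}"]

search_products({"color": "red", "size": "M"})
search_products({"size": "M", "color": "red"})   # same canonical key -> cache hit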

Conclusion

You now have a complete caching toolkit: LRUCache for general memoization, LFUCache for popular-item retention, TTLCache for time-sensitive external data, thread-safe patterns with RLock, and the @cached() decorator for clean function-level caching. The real-life weather client shows how to combine multiple cache types with different TTLs to match the freshness windows of different data sources — a pattern that directly reduces API costs and latency in any production system.

Extend the weather client by adding an LFUCache for city geocoding (city name to lat/lon) so popular cities like “Sydney” and “London” tend to stay cached while obscure ones are evicted once the cache fills up. See the cachetools documentation at https://cachetools.readthedocs.io/ for the complete list of cache classes, the key helpers in cachetools.keys, and the popitem() method you can override to observe or customise evictions.