Intermediate
Every API call can fail. Networks drop packets, rate limits kick in, services restart, databases time out. The naive fix is a bare time.sleep(5) followed by a retry -- but that retries on every error (including bugs that will never succeed), uses fixed delays (which hammer a struggling server), and puts no limit on attempts. The result is a program that hangs forever or spams a broken endpoint. The tenacity library solves all of this with a decorator-based retry system that is both flexible and production-safe.
Install tenacity with pip install tenacity. Its core API is a single decorator, @retry, that wraps any function and controls retry behavior through composable strategy objects. You specify WHAT to retry (which exceptions or return values), HOW LONG to wait between attempts (fixed, exponential backoff, random jitter), and WHEN to stop (after N attempts, after N seconds, or a combination). Everything is declarative and testable.
In this article, you will learn to use the @retry decorator with stop conditions, wait strategies, retry predicates, and callbacks. You will see the difference between retrying on exceptions vs. return values, how to add jitter to prevent thundering herd problems, how to log retry attempts, and a real-life example that builds a resilient HTTP client for an unreliable API.
Quick Example: Retry on Exception
The minimal pattern — retry up to 3 times on any exception, with exponential backoff:
# quick_retry.py
import random

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10))
def call_flaky_api(url: str) -> dict:
    if random.random() < 0.7:  # Simulate a 70% failure rate
        raise ConnectionError(f"Failed to connect to {url}")
    return {"status": "ok", "data": [1, 2, 3]}

try:
    result = call_flaky_api("https://api.example.com/data")
    print(f"Success: {result}")
except Exception as e:
    print(f"All retries failed: {e}")
Output (example run):
Success: {'status': 'ok', 'data': [1, 2, 3]}
The decorator handles all the retry logic -- waiting, counting attempts, and raising a RetryError that wraps the final exception if all attempts fail. Your function body stays clean and focused on the happy path. The wait_exponential strategy doubles the wait time on each attempt: 1s, 2s, 4s, up to the max cap.
What Is Tenacity and Why Use It?
Tenacity is the maintained successor to the popular retrying library. It provides a decorator and context-manager API for adding retry logic to any function. The key design principle is composability -- you mix and match stop conditions, wait strategies, and predicates to express exactly the retry policy you need.
| Feature | Manual retry loop | tenacity |
|---|---|---|
| Exponential backoff | Manual calculation | wait_exponential() |
| Jitter | Manual random.uniform() | wait_random_exponential() |
| Multiple stop conditions | Nested if statements | stop_after_attempt(n) \| stop_after_delay(s) |
| Retry on specific exception | except SomeError: retry() | retry=retry_if_exception_type() |
| Retry on bad return value | while result != ok: retry() | retry=retry_if_result(predicate) |
| Before/after callbacks | Manual logging | before_sleep=before_sleep_log() |
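For comparison, here is the left column of the table in practice -- a hand-rolled loop implementing just exponential backoff with jitter. This is a sketch; call_with_manual_retry and its parameters are illustrative, not part of any library:

```python
import random
import time

def call_with_manual_retry(fn, max_attempts=5, base=1.0, cap=60.0):
    """Hand-rolled exponential backoff with full jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # Out of attempts -- let the last error propagate
            # Exponential backoff capped at `cap`, with full jitter
            delay = random.uniform(0, min(cap, base * 2 ** (attempt - 1)))
            time.sleep(delay)
```

Tenacity collapses all of this bookkeeping into declarative arguments to @retry, which is the point of the table above.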
Stop Conditions
Stop conditions tell tenacity when to give up. You can stop after a number of attempts, after a total elapsed time, or combine conditions with | (stop if either is true) and & (stop only if both are true).
# stop_conditions.py
from tenacity import (retry, stop_after_attempt, stop_after_delay,
                      RetryError)

# Stop after 5 attempts
@retry(stop=stop_after_attempt(5))
def max_5_tries():
    raise ValueError("always fails")

# Stop after 10 seconds total
@retry(stop=stop_after_delay(10))
def max_10_seconds():
    raise ConnectionError("always fails")

# Stop after 3 attempts OR 5 seconds -- whichever comes first
@retry(stop=(stop_after_attempt(3) | stop_after_delay(5)))
def combined_stop():
    raise IOError("always fails")

# Test: should fail after 3 attempts
try:
    combined_stop()
except RetryError as e:
    last_exc = e.last_attempt.exception()
    print(f"Gave up after retries. Last error: {last_exc}")
Output:
Gave up after retries. Last error: always fails
When all retries are exhausted, tenacity raises a RetryError (not the original exception) unless you pass reraise=True. With reraise=True, the original exception from the last attempt is re-raised, which is usually what you want in production code so your error handling sees the actual error type.
Wait Strategies
The wait strategy controls the delay between retry attempts. Choosing the right strategy is important: too short and you hammer a struggling server, too long and your users wait unnecessarily. Adding jitter (randomness) is critical in distributed systems to prevent the "thundering herd" problem where many clients retry simultaneously.
# wait_strategies.py
from tenacity import (retry, stop_after_attempt, wait_fixed,
                      wait_exponential, wait_random, wait_random_exponential)

# Fixed: always wait 2 seconds
@retry(stop=stop_after_attempt(3), wait=wait_fixed(2))
def fixed_wait():
    raise ConnectionError("fail")

# Exponential: 1s, 2s, 4s, 8s... up to 60s max
@retry(stop=stop_after_attempt(6), wait=wait_exponential(multiplier=1, min=1, max=60))
def exponential_wait():
    raise ConnectionError("fail")

# Random jitter: wait 0-3 seconds randomly
@retry(stop=stop_after_attempt(3), wait=wait_random(min=0, max=3))
def random_wait():
    raise ConnectionError("fail")

# Random exponential (RECOMMENDED for production API calls):
# exponential base with added randomness -- prevents thundering herd
@retry(stop=stop_after_attempt(5),
       wait=wait_random_exponential(multiplier=1, max=60))
def production_ready():
    raise ConnectionError("fail")

# Preview the wait times a strategy would produce (without sleeping)
# by calling the strategy object directly with a RetryCallState
from tenacity import RetryCallState

wait = wait_random_exponential(multiplier=1, max=60)
state = RetryCallState(None, None, (), {})
for n in range(1, 5):
    state.attempt_number = n
    print(f"Attempt {n}: would wait ~{wait(state):.1f}s")
Output (example -- values vary due to randomness):
Attempt 1: would wait ~0.7s
Attempt 2: would wait ~2.1s
Attempt 3: would wait ~5.4s
Attempt 4: would wait ~12.8s
Retry Predicates: What to Retry
By default, tenacity retries on any exception. Often you need more specific behavior: retry only on network errors (not on validation errors that will never succeed), or retry when a function returns a specific "not ready" value.
# retry_predicates.py
from tenacity import (retry, stop_after_attempt, wait_exponential,
                      retry_if_exception_type, retry_if_result, RetryError)

# Only retry on network-related exceptions, not on ValueError
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, max=10),
    retry=retry_if_exception_type((ConnectionError, TimeoutError))
)
def network_only_retry(url: str):
    # This would NOT be retried (ValueError is not in the tuple)
    if "bad" in url:
        raise ValueError("Invalid URL format")
    # This WOULD be retried (ConnectionError IS in the tuple)
    raise ConnectionError("Network down")

# Retry when the function returns None
@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=0.5, max=5),
    retry=retry_if_result(lambda result: result is None)
)
def fetch_job_result(job_id: str):
    """Poll an async job until it returns a result."""
    import random
    # Simulate a job that takes a few polls to complete
    if random.random() < 0.8:
        return None  # Job still running -- will retry
    return {"job_id": job_id, "status": "complete", "output": 42}

# Combine: retry on exception OR on None result
@retry(
    stop=stop_after_attempt(5),
    retry=(retry_if_exception_type(ConnectionError) |
           retry_if_result(lambda r: r is None))
)
def robust_fetch(url: str):
    import random
    r = random.random()
    if r < 0.3:
        raise ConnectionError("timeout")
    if r < 0.6:
        return None  # Not ready yet
    return {"data": "success"}

try:
    result = robust_fetch("https://api.jsonplaceholder.typicode.com/todos/1")
    print(f"Got result: {result}")
except RetryError:
    print("Still not ready after 5 attempts")
Output:
Got result: {'data': 'success'}
The | operator combines predicates with OR logic. When you combine an exception predicate with a result predicate, tenacity retries if either condition is true. This pattern is perfect for polling APIs that return status codes or None until a job completes.
Logging and Callbacks
In production, silent retries are a debugging nightmare. Tenacity provides before_sleep and after callbacks so you can log every retry attempt with the wait time, attempt number, and exception details.
# retry_callbacks.py
import logging

from tenacity import (retry, stop_after_attempt, wait_exponential,
                      RetryCallState)

logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
logger = logging.getLogger(__name__)

def custom_before_sleep(retry_state: RetryCallState):
    logger.warning(
        f"Retry #{retry_state.attempt_number} for {retry_state.fn.__name__}() "
        f"-- sleeping {retry_state.next_action.sleep:.1f}s "
        f"after: {retry_state.outcome.exception()}"
    )

@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=1, min=1, max=8),
    before_sleep=custom_before_sleep,
    reraise=True
)
def flaky_database_write(record: dict) -> bool:
    import random
    if random.random() < 0.75:
        raise TimeoutError("DB write timeout after 30s")
    return True

try:
    flaky_database_write({"id": 42, "name": "test"})
    print("Write succeeded")
except TimeoutError as e:
    print(f"Failed after all retries: {e}")
Output (example):
WARNING: Retry #1 for flaky_database_write() -- sleeping 1.0s after: DB write timeout after 30s
WARNING: Retry #2 for flaky_database_write() -- sleeping 2.0s after: DB write timeout after 30s
Write succeeded
Note the reraise=True -- without it, tenacity wraps the final exception in RetryError. With it, the original TimeoutError propagates to your caller, which is the right behavior for most production code where the caller needs to handle specific exception types.
Real-Life Example: Resilient HTTP API Client
# resilient_client.py
import logging

import requests
from tenacity import (retry, stop_after_attempt, wait_random_exponential,
                      retry_if_exception_type, retry_if_result,
                      before_sleep_log, RetryError)

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO, format='%(levelname)s %(message)s')

RETRYABLE_STATUS = {429, 500, 502, 503, 504}

class APIError(Exception):
    pass

class RateLimitError(APIError):
    pass

def should_retry_response(result):
    """Retry when api_get returned a Response with a retryable status code.

    The isinstance check matters: on success api_get returns a dict,
    which must not be treated as a retryable response.
    """
    return isinstance(result, requests.Response) and result.status_code in RETRYABLE_STATUS

@retry(
    stop=stop_after_attempt(5),
    wait=wait_random_exponential(multiplier=1, max=30),
    retry=(
        retry_if_exception_type((requests.ConnectionError, requests.Timeout)) |
        retry_if_result(should_retry_response)
    ),
    before_sleep=before_sleep_log(logger, logging.WARNING),
    reraise=True
)
def api_get(url: str, params: dict = None) -> dict:
    """Make a GET request with retry logic."""
    resp = requests.get(url, params=params, timeout=10)
    if resp.status_code in RETRYABLE_STATUS:
        return resp  # Return the response object -- retry_if_result triggers a retry
    resp.raise_for_status()  # Raise on remaining 4xx/5xx (not retried)
    return resp.json()

# Usage
BASE = "https://jsonplaceholder.typicode.com"

try:
    user = api_get(f"{BASE}/users/1")
    print(f"User: {user['name']} ({user['email']})")

    todos = api_get(f"{BASE}/todos", params={"userId": 1, "_limit": 3})
    print("First 3 todos for user 1:")
    for t in todos:
        status = "done" if t["completed"] else "pending"
        print(f"  [{status}] {t['title'][:50]}")
except RetryError as e:
    print(f"All retries exhausted: {e}")
except requests.HTTPError as e:
    print(f"HTTP error (not retried): {e}")
Output:
User: Leanne Graham (Sincere@april.biz)
First 3 todos for user 1:
[done] delectus aut autem
[done] quis ut nam facilis et officia qui
[done] fugiat veniam minus
This client retries on network errors and specific HTTP status codes (5xx, 429 rate limit) but NOT on 4xx client errors (bad request, unauthorized, not found) which indicate bugs that retrying won't fix. The wait_random_exponential adds jitter to prevent synchronized retries across multiple client instances.
Frequently Asked Questions
Should I use reraise=True?
In most production code, yes. Without reraise=True, tenacity wraps the final exception in a RetryError, and your callers must catch RetryError instead of the original exception type. With reraise=True, the last exception propagates directly, which integrates cleanly with existing error handling. Use the default (no reraise) only when you want to distinguish "all retries failed" from "raised on first attempt".
Does tenacity support async functions?
Yes. Use @retry directly on async def functions -- tenacity detects async functions automatically and uses asyncio.sleep between retries instead of time.sleep. This means your async retry code does not block the event loop during wait periods, making it safe to use in FastAPI, aiohttp, and other async frameworks.
When should I use the Retrying context manager instead of the decorator?
Use the context manager when you need to retry a block of code rather than a function, or when retry parameters need to be dynamic (computed at runtime). For example: for attempt in Retrying(stop=stop_after_attempt(3)): with attempt: risky_operation(). The Retrying class also gives you access to attempt.retry_state for detailed control.
What does the multiplier parameter do in wait_exponential?
The multiplier scales the exponential formula: wait time = multiplier * 2^(attempt - 1). With multiplier=1: 1s, 2s, 4s, 8s. With multiplier=2: 2s, 4s, 8s, 16s. The min and max parameters clamp the result so the first wait is at least min seconds and never exceeds max seconds.
Is tenacity a circuit breaker?
No -- tenacity is a retry library, not a circuit breaker. It retries independently on each function call. A circuit breaker tracks failure rates across many calls over time and stops sending requests to a failing service entirely (opens the circuit) until it recovers. For circuit breaker functionality in Python, look at the pybreaker library. In practice, combining tenacity (for per-call retries) with pybreaker (for service-level protection) gives you the best of both.
Conclusion
Tenacity makes retry logic declarative and composable. You have seen how to use stop_after_attempt and stop_after_delay to limit retries, wait_exponential and wait_random_exponential for backoff strategies, retry_if_exception_type and retry_if_result to control what gets retried, and the before_sleep callback to log retry events. Combined, these tools let you express sophisticated retry policies without writing manual retry loops.
For a next step, add circuit breaker logic with pybreaker and integrate with structlog for structured retry logging. The official tenacity documentation is at tenacity.readthedocs.io.