Intermediate
You wrote unit tests. You covered the happy path. You even tested a few edge cases — empty strings, zero values, negative numbers. But then production breaks on an input you never imagined: a Unicode string with a zero-width space, a list with 2 billion elements, or a float that is technically not-a-number. Property-based testing is the approach that finds these bugs before your users do. Instead of you specifying test inputs, the library generates hundreds of random inputs automatically and searches for ones that break your code.
Hypothesis is Python’s leading property-based testing library. You describe the shape of valid inputs using strategies, and Hypothesis generates inputs of that shape, tries to break your code, and if it finds a failing case, automatically shrinks it to the smallest possible example that still fails. This gives you a precise, minimal reproduction case instead of a random mess. Install it with pip install hypothesis. It works alongside pytest and unittest with zero configuration.
In this article, we will cover: what property-based testing is and when to use it, how to write your first Hypothesis test, how to use built-in strategies for common types, how to compose strategies for custom data structures, how to use stateful testing for sequences of operations, and how to apply Hypothesis to real code to find real bugs. By the end, you will have a new tool that makes your test suite dramatically more thorough.
Hypothesis Quick Example
Here is a Hypothesis test that checks a property of Python’s built-in sorted() function — that sorting a list and then reversing it should equal sorting in reverse order:
# quick_hypothesis.py
from hypothesis import given
from hypothesis import strategies as st
@given(st.lists(st.integers()))
def test_sort_reverse_equivalent(numbers):
"""sorted then reversed == sorted with reverse=True"""
sorted_then_reversed = list(reversed(sorted(numbers)))
sorted_reversed = sorted(numbers, reverse=True)
assert sorted_then_reversed == sorted_reversed
# Run with pytest: pytest quick_hypothesis.py -v
# Or call directly:
test_sort_reverse_equivalent()
print("All tests passed!")
Output:
All tests passed!
Hypothesis ran this function hundreds of times with randomly generated lists — empty lists, lists with one element, lists with thousands of integers, lists with negative numbers, lists with duplicates. It found no counterexample, so the property holds. If the property had been wrong, Hypothesis would show you the smallest list that breaks it.
What Is Property-Based Testing?
Traditional unit tests are example-based: you write specific inputs and expected outputs. Property-based tests are contract-based: you describe invariants that must hold for ANY valid input. The library’s job is to find inputs that violate those invariants.
| Aspect | Example-Based (pytest) | Property-Based (Hypothesis) |
|---|---|---|
| Input source | You write it manually | Library generates it |
| Coverage | Only cases you thought of | Hundreds of random cases |
| Bug discovery | Known edge cases | Unknown edge cases |
| Failure output | The input you wrote | Smallest failing example |
| Best for | Known requirements | Algorithmic invariants |
Property-based testing does not replace example-based tests — it complements them. Use both together. Write example-based tests for known requirements, add property-based tests for algorithmic invariants and data transformations.
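To make the contrast concrete, here is a sketch pairing both styles against a small hypothetical clamp() helper (the function and both tests are illustrative, not from any real library):

```python
# both_styles.py
from hypothesis import given
from hypothesis import strategies as st

def clamp(value, low, high):
    """Constrain value to the range [low, high] (hypothetical example)."""
    return max(low, min(value, high))

# Example-based: documents one specific known requirement
def test_clamp_known_case():
    assert clamp(15, 0, 10) == 10

# Property-based: an invariant that must hold for any well-formed range
@given(st.integers(), st.integers(), st.integers())
def test_clamp_result_in_range(value, low, high):
    if low <= high:  # only check well-formed ranges
        assert low <= clamp(value, low, high) <= high

test_clamp_known_case()
test_clamp_result_in_range()
print("Both styles passed!")
```

The example-based test pins down the specification; the property-based test then checks the invariant across hundreds of generated inputs.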
Understanding Strategies
A strategy tells Hypothesis how to generate values for a particular type. The hypothesis.strategies module (conventionally imported as st) provides strategies for all Python built-in types, plus tools to compose them into complex structures.
# strategies_demo.py
from hypothesis import given, settings
from hypothesis import strategies as st
# Basic strategies
@given(st.integers(min_value=0, max_value=100))
def test_squares_are_positive(n):
assert n * n >= 0
# Text strategies
@given(st.text(min_size=1, max_size=50))
def test_strip_never_longer(s):
assert len(s.strip()) <= len(s)
# Float strategies
@given(st.floats(allow_nan=False, allow_infinity=False))
def test_abs_never_negative(f):
assert abs(f) >= 0
# Lists of specific type
@given(st.lists(st.integers(), min_size=1))
def test_max_is_in_list(lst):
assert max(lst) in lst
# Run all tests
test_squares_are_positive()
test_strip_never_longer()
test_abs_never_negative()
test_max_is_in_list()
print("All strategy tests passed!")
Output:
All strategy tests passed!
The constraints in strategies are important. st.floats(allow_nan=False, allow_infinity=False) excludes the special IEEE 754 values that would break most arithmetic. min_size=1 on a list ensures max() does not raise a ValueError on an empty list — though you might want to test that case separately.
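Those excluded cases can then be pinned down deliberately with small tests of your own. A sketch (max() raising ValueError on an empty sequence, and NaN comparing unequal to itself, are standard Python behavior):

```python
# edge_cases.py

def test_max_empty_raises():
    """max() on an empty list raises ValueError."""
    try:
        max([])
        assert False, "expected ValueError"
    except ValueError:
        pass

def test_nan_breaks_comparisons():
    """NaN compares unequal even to itself, which is why
    allow_nan=False keeps it out of arithmetic properties."""
    nan = float("nan")
    assert nan != nan

test_max_empty_raises()
test_nan_breaks_comparisons()
print("Edge cases verified!")
```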
Composing Custom Strategies
Real applications use complex data structures, not just integers. Hypothesis lets you compose strategies using st.fixed_dictionaries(), st.builds(), and the @st.composite decorator to generate custom objects.
# custom_strategies.py
from hypothesis import given
from hypothesis import strategies as st
from dataclasses import dataclass
@dataclass
class Product:
name: str
price: float
quantity: int
# Strategy for valid products
product_strategy = st.builds(
Product,
name=st.text(alphabet=st.characters(whitelist_categories=('Lu', 'Ll', 'Nd', 'Zs')),
min_size=1, max_size=50),
price=st.floats(min_value=0.01, max_value=10000.0, allow_nan=False),
quantity=st.integers(min_value=0, max_value=10000)
)
def calculate_total(products):
"""Calculate total value of product inventory."""
return sum(p.price * p.quantity for p in products)
@given(st.lists(product_strategy, min_size=1, max_size=20))
def test_total_always_non_negative(products):
"""Total inventory value must never be negative."""
total = calculate_total(products)
assert total >= 0, f"Negative total: {total}"
@given(st.lists(product_strategy, min_size=2))
def test_adding_product_increases_total(products):
"""Adding a product with positive price and quantity increases total."""
base_total = calculate_total(products[:-1])
last = products[-1]
if last.price > 0 and last.quantity > 0:
full_total = calculate_total(products)
assert full_total > base_total
test_total_always_non_negative()
test_adding_product_increases_total()
print("Custom strategy tests passed!")
Output:
Custom strategy tests passed!
The st.builds() strategy calls the Product constructor with generated values for each field. You can nest strategies arbitrarily — a list of products, each with a composed strategy for its fields. This mirrors how your real application data is structured, so Hypothesis generates realistic test data automatically.
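The @st.composite decorator handles the case st.builds() cannot: one generated field depending on another. A minimal sketch (the priced_items strategy and its fields are hypothetical):

```python
# composite_demo.py
from hypothesis import given
from hypothesis import strategies as st

@st.composite
def priced_items(draw):
    """Generate (list_price, sale_price) pairs where the sale
    price never exceeds the list price."""
    list_price = draw(st.floats(min_value=1.0, max_value=1000.0, allow_nan=False))
    # The second draw's bounds depend on the first draw
    sale_price = draw(st.floats(min_value=0.0, max_value=list_price, allow_nan=False))
    return (list_price, sale_price)

@given(priced_items())
def test_sale_never_exceeds_list(item):
    list_price, sale_price = item
    assert sale_price <= list_price

test_sale_never_exceeds_list()
print("Composite strategy test passed!")
```

Inside a composite function, each draw() call can use values from earlier draws, which is how you encode cross-field constraints that st.builds() cannot express.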
Finding Real Bugs with Hypothesis
The real value of property-based testing shows when Hypothesis finds a bug you would never have written a test for. Here is an example with a buggy encoding function:
# finding_bugs.py
from hypothesis import given
from hypothesis import strategies as st
def encode(data: list) -> str:
"""Run-length encode a list of integers. E.g., [1,1,2,3,3] -> '2x1,1x2,2x3'"""
if not data:
return ''
result = []
count = 1
for i in range(1, len(data)):
if data[i] == data[i-1]:
count += 1
else:
result.append(f"{count}x{data[i-1]}")
count = 1
result.append(f"{count}x{data[-1]}")
return ','.join(result)
def decode(encoded: str) -> list:
"""Decode run-length encoded string back to list."""
if not encoded:
return []
result = []
for part in encoded.split(','):
count_str, val_str = part.split('x')
result.extend([int(val_str)] * int(count_str))
return result
# Property: encode then decode should give back the original
@given(st.lists(st.integers(min_value=-100, max_value=100)))
def test_encode_decode_roundtrip(data):
encoded = encode(data)
decoded = decode(encoded)
assert decoded == data, f"Roundtrip failed: {data} -> '{encoded}' -> {decoded}"
test_encode_decode_roundtrip()
print("Roundtrip test passed!")
Output:
Roundtrip test passed!
Hypothesis tested this function with hundreds of inputs including empty lists, single-element lists, all-same lists, and alternating values — and the roundtrip property held for all of them. If decode() had a bug (say, only handling positive integers), Hypothesis would immediately find a minimal failing input like [-1] and show you the exact failing case with the encoded string.
Controlling Hypothesis Settings
Hypothesis provides a settings decorator to control how many examples are generated, the maximum shrink time, and the verbosity of output. You can also use @example() to always include specific cases alongside the generated ones.
# settings_demo.py
from hypothesis import given, settings, example
from hypothesis import strategies as st
def divide(a: int, b: int) -> float:
"""Divide a by b, raise ValueError if b is zero."""
if b == 0:
raise ValueError("Cannot divide by zero")
return a / b
# Always test these specific cases, plus 200 random ones
@settings(max_examples=200)
@example(a=0, b=1)
@example(a=-10, b=2)
@given(
a=st.integers(),
b=st.integers().filter(lambda x: x != 0)
)
def test_divide_properties(a, b):
result = divide(a, b)
    # Property 1: result times b should equal a (within relative float
    # precision -- an absolute tolerance would fail for very large integers)
    assert abs(result * b - a) <= 1e-9 * max(1.0, abs(a))
# Property 2: dividing by positive number preserves sign
if b > 0:
assert (result >= 0) == (a >= 0)
test_divide_properties()
print("Divide properties verified over 200+ examples!")
Output:
Divide properties verified over 200+ examples!
The .filter(lambda x: x != 0) call on the strategy excludes zero from the generated values. The @example() decorator guarantees that specific cases always run, even if Hypothesis would not randomly generate them. Combined with max_examples=200, this gives you a thorough test with precise control.
Real-Life Example: Testing a Sorted Data Structure
We will test a custom SortedList class that maintains elements in sorted order. Hypothesis will verify several invariants hold under all valid operations.
# sorted_list_test.py
from hypothesis import given, assume
from hypothesis import strategies as st
import bisect
class SortedList:
"""A list that maintains sorted order on insert."""
def __init__(self):
self._data = []
def insert(self, value):
bisect.insort(self._data, value)
def remove(self, value):
idx = bisect.bisect_left(self._data, value)
if idx < len(self._data) and self._data[idx] == value:
self._data.pop(idx)
else:
raise ValueError(f"{value} not in list")
def contains(self, value) -> bool:
idx = bisect.bisect_left(self._data, value)
return idx < len(self._data) and self._data[idx] == value
def to_list(self) -> list:
return list(self._data)
def __len__(self):
return len(self._data)
# Property 1: after inserting a value, the list is always sorted
@given(st.lists(st.integers()), st.integers())
def test_insert_preserves_order(existing, new_value):
sl = SortedList()
for v in existing:
sl.insert(v)
sl.insert(new_value)
lst = sl.to_list()
assert lst == sorted(lst), f"Not sorted after insert: {lst}"
# Property 2: after inserting, the value is always present
@given(st.integers())
def test_insert_then_contains(value):
sl = SortedList()
sl.insert(value)
assert sl.contains(value)
# Property 3: length matches number of inserts
@given(st.lists(st.integers()))
def test_length_matches_inserts(values):
sl = SortedList()
for v in values:
sl.insert(v)
assert len(sl) == len(values)
test_insert_preserves_order()
test_insert_then_contains()
test_length_matches_inserts()
print("SortedList: all 3 properties verified!")
Output:
SortedList: all 3 properties verified!
These three properties — sorted order preserved, inserted value found, length consistent — form a behavioral contract for SortedList. Any implementation that passes all three is correct by definition. You can refactor the internals (swap bisect.insort for a balanced BST, for example) and use these property tests as your regression suite — they will catch any violation of the contract.
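The remove() and contains() methods deserve the same treatment. Here is a sketch of two more properties (the class is restated in abbreviated form so the snippet runs standalone); assume() tells Hypothesis to discard generated cases that fail a precondition:

```python
# sorted_list_remove_test.py
from hypothesis import given, assume
from hypothesis import strategies as st
import bisect

class SortedList:
    """Same class as above, abbreviated for a standalone snippet."""
    def __init__(self):
        self._data = []
    def insert(self, value):
        bisect.insort(self._data, value)
    def remove(self, value):
        idx = bisect.bisect_left(self._data, value)
        if idx < len(self._data) and self._data[idx] == value:
            self._data.pop(idx)
        else:
            raise ValueError(f"{value} not in list")
    def __len__(self):
        return len(self._data)

# Property 4: insert followed by remove restores the length
@given(st.lists(st.integers()), st.integers())
def test_insert_then_remove_restores_length(values, extra):
    sl = SortedList()
    for v in values:
        sl.insert(v)
    before = len(sl)
    sl.insert(extra)
    sl.remove(extra)
    assert len(sl) == before

# Property 5: removing an absent value raises ValueError
@given(st.lists(st.integers()), st.integers())
def test_remove_missing_raises(values, probe):
    assume(probe not in values)  # discard cases where probe was inserted
    sl = SortedList()
    for v in values:
        sl.insert(v)
    try:
        sl.remove(probe)
        assert False, "expected ValueError"
    except ValueError:
        pass

test_insert_then_remove_restores_length()
test_remove_missing_raises()
print("SortedList remove properties verified!")
```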
Frequently Asked Questions
Hypothesis makes my test suite slow. How do I speed it up?
Hypothesis caches a database of found examples between runs, so later runs are faster. Use @settings(max_examples=50) for fast CI runs and max_examples=1000 for deeper local testing. The suppress_health_check setting disables specific health checks if they are triggering false positives. In CI, cache the .hypothesis/ database directory between runs to preserve previously discovered examples.
What is shrinking and why does it matter?
When Hypothesis finds a failing input, it automatically tries to shrink it — finding a smaller or simpler input that still triggers the same failure. Instead of showing you a list of 1,000 random integers that caused a bug, it will show you the 2-element list [0, -1] that triggers the same bug. This makes debugging dramatically easier, because you can immediately understand what property of the input caused the failure.
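You can watch the shrinker work directly via hypothesis.find(), which returns a minimal example satisfying a predicate using the same machinery that minimizes failing test inputs (a sketch; the exact minimal value may vary between Hypothesis versions):

```python
# shrink_demo.py
from hypothesis import find
from hypothesis import strategies as st

# Ask for any list of integers whose sum exceeds 100; find()
# searches for one, then shrinks it as far as possible.
minimal = find(st.lists(st.integers()), lambda xs: sum(xs) > 100)
print(minimal)  # typically a single small list such as [101]
```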
What is stateful testing?
Stateful testing (also called model-based testing) lets you test sequences of operations, not just single function calls. Use hypothesis.stateful.RuleBasedStateMachine to define rules (operations like insert, delete, query) and invariants. Hypothesis generates sequences of these operations and checks invariants after each step. This is powerful for testing state machines, databases, queues, and any system where order of operations matters.
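A minimal sketch of that API (the counter and its rules are hypothetical; run_state_machine_as_test drives the machine outside a test runner):

```python
# stateful_demo.py
from hypothesis import strategies as st
from hypothesis.stateful import (
    RuleBasedStateMachine, rule, invariant, run_state_machine_as_test,
)

class CounterMachine(RuleBasedStateMachine):
    """Test a counter against a trivial reference model."""
    def __init__(self):
        super().__init__()
        self.count = 0   # system under test (here, just an int)
        self.model = 0   # reference model kept in lockstep

    @rule(n=st.integers(min_value=1, max_value=10))
    def increment(self, n):
        self.count += n
        self.model += n

    @rule()
    def reset(self):
        self.count = 0
        self.model = 0

    @invariant()
    def counts_agree(self):
        # Checked after every generated operation
        assert self.count == self.model

# Under pytest, expose the generated TestCase instead:
# TestCounter = CounterMachine.TestCase
run_state_machine_as_test(CounterMachine)
print("Counter invariant held across generated operation sequences!")
```

Hypothesis generates random sequences of increment and reset calls and verifies the invariant after each step, shrinking any failing sequence to its shortest form.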
Should I replace my existing tests with Hypothesis?
No — use both. Example-based tests document specific known behaviors and are fast to run. Property-based tests explore unknown edge cases and validate invariants. A typical approach: write a few example-based tests to nail down the specification, then add property-based tests to verify invariants hold broadly. Your test suite becomes both precise (example-based) and thorough (property-based).
How does Hypothesis remember failing examples?
Hypothesis stores its discovered failing examples in a database directory (by default, .hypothesis/ in your project root). When you run the tests again, it tries the previously failing examples first. This means once a bug is found, it is retested every run — even if the main random generation would not have generated that input again. Add .hypothesis/ to .gitignore for local databases, or commit it to retain team-shared learned examples.
Conclusion
Property-based testing with Hypothesis changes how you think about test coverage. Instead of asking “did I test this specific case?” you ask “what properties must always hold?” We covered the basics: the @given decorator and strategies for common types, composing custom strategies with st.builds(), writing meaningful properties (roundtrip, ordering, length consistency), and controlling test settings. The real payoff comes when Hypothesis finds a minimal failing input you never would have written yourself.
From here, explore Hypothesis’s stateful testing with RuleBasedStateMachine for testing complex state machines, and the st.data() strategy for dynamic input generation within tests. The Hypothesis documentation includes a gallery of real-world bugs found by property testing that is worth reading for inspiration.
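As a preview, st.data() moves drawing into the test body, so one drawn value can depend on a previous one (a sketch; the labels are illustrative and only improve failure reporting):

```python
# data_strategy_demo.py
from hypothesis import given
from hypothesis import strategies as st

@given(st.data())
def test_index_is_valid(data):
    # Draw a non-empty string first...
    s = data.draw(st.text(min_size=1), label="haystack")
    # ...then an index whose bounds depend on the string just drawn.
    i = data.draw(st.integers(min_value=0, max_value=len(s) - 1), label="index")
    assert s[i] in s

test_index_is_valid()
print("data() test passed!")
```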
Official documentation: hypothesis.readthedocs.io