You have a Pydantic model with a dozen fields, nested relationships, and validators. Now you need to write ten different tests: some with valid data, some with edge cases, some with specific fields set to particular values. You write a factory function. Then another. Then a helper. By the time you have covered all your test scenarios, the test setup code is longer than the code being tested. There is a better way. polyfactory generates type-correct test data from your existing models automatically, with one small factory class per model instead of hand-written factory functions, repeated boilerplate, and manually wired constructors.
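polyfactory reads your Pydantic models, Python dataclasses, attrs classes, and typed dicts, then generates instances that satisfy the declared field types and constraints. Install it with pip install polyfactory. Dataclasses and typed dicts come from the standard library, so they need nothing else. Pydantic and attrs have to be installed alongside polyfactory if your models use them; the package also publishes optional extras (for example polyfactory[pydantic] and polyfactory[attrs]) for this. The examples in this article assume Pydantic v2.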
This article covers generating test data from Pydantic models and dataclasses, customizing generated values, building batch fixtures, using polyfactory with pytest, handling nested models and relationships, and knowing when to use polyfactory versus Faker. By the end you will be able to replace your hand-written factory functions with a short polyfactory factory class that stays in sync with your models automatically.
polyfactory Quick Example
The core pattern: subclass ModelFactory, point its __model__ attribute at your model, then call .build() to get a valid instance:
# quick_polyfactory.py
from pydantic import BaseModel
from polyfactory.factories.pydantic_factory import ModelFactory


class User(BaseModel):
    id: int
    username: str
    email: str
    age: int
    is_active: bool


class UserFactory(ModelFactory):
    __model__ = User


# Generate a single User with random valid data
user = UserFactory.build()
print(user)

# Generate with specific field overrides
admin = UserFactory.build(username="admin", is_active=True)
print(admin)
Output (values are random, so yours will differ):
id=7432 username='WbHkQeTzFn' email='LpXsRvAdMe' age=23 is_active=False
id=1891 username='admin' email='GhTyNcUoZb' age=41 is_active=True
polyfactory introspects the User model’s type annotations and generates a value for each field: a random integer for id and age, a random string for username and email (a plain str annotation gives random characters, not a realistic address), and a random boolean for is_active. The .build() call accepts keyword arguments that override specific fields while leaving the rest auto-generated. Generated values match your field types and declared constraints, and the finished instance still goes through Pydantic validation when it is constructed.
The sections below show how to customize generation, handle nested models, generate batches, and integrate with pytest fixtures.
What Is polyfactory and When Should You Use It?
polyfactory is a test data generation library that bridges the gap between your data models and your test fixtures. It reads a model’s type annotations and declared field constraints, then generates random instances that satisfy them. This means your factories rarely go stale: if you add a required field to a model, polyfactory generates data for it without any changes to your test code. The logic inside custom validator functions is not introspected, so a field guarded by one may still need an explicit override on the factory.
| Tool | Best for | Syncs with models? | Customizable? |
|---|---|---|---|
| polyfactory | Type-safe instances from existing models | Yes — reads annotations directly | Yes — field overrides + custom providers |
| Faker | Realistic fake data (names, emails, addresses) | No — manual wiring required | Yes — 150+ providers |
| factory_boy | Django ORM test factories, sequences | Partial (Django models only) | Yes — lazy attributes, sequences |
| Hypothesis | Property-based testing, edge case discovery | No — strategy-based | Yes — strategies + composable |
Use polyfactory when you want instances of your existing Pydantic or dataclass models with minimal boilerplate. Use Faker when you need data that looks realistic — real names, valid-format email addresses, plausible street addresses. Use Hypothesis when you want to find edge cases automatically by generating thousands of inputs and checking invariants.
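To make the idea of annotation-driven generation concrete, here is a deliberately tiny sketch that uses only the standard library. It is not how polyfactory is implemented; the PROVIDERS table and the toy_build helper are hypothetical names invented for this illustration. It only shows why a generator that works from type hints picks up new fields without any factory changes.

```python
import random
import string
from dataclasses import dataclass, fields
from typing import get_type_hints

# Hypothetical provider table: one zero-argument generator per supported type.
PROVIDERS = {
    int: lambda: random.randint(0, 9999),
    float: lambda: round(random.uniform(0, 1000), 2),
    bool: lambda: random.choice([True, False]),
    str: lambda: "".join(random.choices(string.ascii_letters, k=10)),
}


def toy_build(model, **overrides):
    """Build a dataclass instance, generating a value for every field not overridden."""
    hints = get_type_hints(model)
    values = {}
    for f in fields(model):
        if f.name in overrides:
            values[f.name] = overrides[f.name]  # caller-supplied value wins
        else:
            values[f.name] = PROVIDERS[hints[f.name]]()  # pick a generator by type
    return model(**values)


@dataclass
class User:
    id: int
    username: str
    is_active: bool


@dataclass
class UserV2(User):
    age: int  # newly added required field


user = toy_build(User, username="admin")
user_v2 = toy_build(UserV2)  # no change to toy_build was needed for the new field
print(user)
print(user_v2)
```

A real library also has to handle constraints, nested models, optionals, and collections, which is the part polyfactory takes off your hands.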
Customizing Generated Values
polyfactory generates random data by default. You can lock specific fields to fixed values or provide custom callables for any field on the factory class:
# customizing.py
from datetime import date
from pydantic import BaseModel
from polyfactory.factories.pydantic_factory import ModelFactory


class Order(BaseModel):
    order_id: int
    customer_email: str
    product_name: str
    quantity: int
    unit_price: float
    order_date: date
    shipped: bool


class OrderFactory(ModelFactory):
    __model__ = Order

    # Lock a field to a fixed value
    shipped = False

    # Random four-digit ID drawn from the factory's own random generator
    @classmethod
    def order_id(cls) -> int:
        return cls.__random__.randint(1000, 9999)

    # Custom callable for realistic-looking values
    @classmethod
    def product_name(cls) -> str:
        return cls.__faker__.word().capitalize() + " Pro"

    @classmethod
    def unit_price(cls) -> float:
        return round(cls.__faker__.pyfloat(min_value=9.99, max_value=999.99), 2)


# Build several orders
for _ in range(3):
    order = OrderFactory.build()
    print(f"#{order.order_id} {order.product_name:<18} ${order.unit_price:>7.2f} shipped={order.shipped}")
Output:
#8823 Approach Pro $ 342.19 shipped=False
#2147 Network Pro $ 89.45 shipped=False
#5561 Discover Pro $ 721.83 shipped=False
Class-level assignments fix a field to a constant value. Methods decorated with @classmethod are called each time .build() runs, generating fresh data per instance. cls.__faker__ is a Faker instance bundled with polyfactory — you can call any Faker provider from it without a separate Faker import. This approach keeps all your test data logic in one class, making it easy to find and update when your model changes.
Nested Models and Relationships
polyfactory handles nested Pydantic models automatically — if a field’s type is another Pydantic model, polyfactory recursively generates a valid instance for it:
# nested_models.py
from typing import List
from pydantic import BaseModel
from polyfactory.factories.pydantic_factory import ModelFactory


class Address(BaseModel):
    street: str
    city: str
    postcode: str
    country: str


class LineItem(BaseModel):
    product_id: int
    product_name: str
    quantity: int
    price: float


class Invoice(BaseModel):
    invoice_number: str
    billing_address: Address
    line_items: List[LineItem]
    paid: bool


class InvoiceFactory(ModelFactory):
    __model__ = Invoice


invoice = InvoiceFactory.build()
print(f"Invoice: {invoice.invoice_number}")
print(f"Bill to: {invoice.billing_address.city}, {invoice.billing_address.country}")
print(f"Items: {len(invoice.line_items)}")
for item in invoice.line_items:
    print(f" - {item.product_name} x{item.quantity} @ ${item.price:.2f}")
print(f"Paid: {invoice.paid}")
Output (values are random, so yours will differ):
Invoice: VzKmQpRtFn
Bill to: TkPlNxWqRs, BrVzQmHsLd
Items: 1
 - KrDtLnMgRq x3 @ $84.20
Paid: True
polyfactory generates nested Address and LineItem instances without any extra configuration. For List[LineItem], the default in polyfactory v2 is a list containing a single generated item. To get lists of varying length, set __randomize_collection_length__ = True on the factory class and bound the length with __min_collection_length__ and __max_collection_length__. For relationships that need specific values (like a foreign key that must match an existing record), use a factory override or a post-generation hook.
Using polyfactory with Dataclasses and attrs
polyfactory is not limited to Pydantic. The DataclassFactory works identically with Python’s built-in dataclasses:
# dataclass_factory.py
from dataclasses import dataclass
from typing import Optional
from polyfactory.factories import DataclassFactory


@dataclass
class Config:
    host: str
    port: int
    database: str
    max_connections: int
    debug: bool
    timeout: Optional[float]


class ConfigFactory(DataclassFactory):
    __model__ = Config

    # Realistic defaults for testing
    port = 5432
    max_connections = 10


# Build a batch of 4 configs -- useful for parameterized tests
configs = ConfigFactory.batch(4)
for cfg in configs:
    print(f" {cfg.host}:{cfg.port}/{cfg.database} debug={cfg.debug}")
Output:
TkPlNx:5432/BrVzQm debug=False
WsHjYp:5432/KrDtLn debug=True
MgRqFb:5432/PvNwSe debug=False
ZxCvTy:5432/HmQkJr debug=True
The .batch(n) method generates a list of n instances in one call, which is exactly what you need for parameterized tests, seeding test databases, or benchmarking code that processes collections. Import DataclassFactory from polyfactory.factories for dataclasses. AttrsFactory lives in its own module, polyfactory.factories.attrs_factory, and requires the attrs package to be installed.
Using polyfactory with pytest Fixtures
The cleanest pytest pattern is wrapping polyfactory in a fixture. This gives tests access to both default-generated instances and customized overrides:
# test_order_service.py
import pytest
from pydantic import BaseModel
from polyfactory.factories.pydantic_factory import ModelFactory


class Product(BaseModel):
    id: int
    name: str
    price: float
    in_stock: bool


class Cart(BaseModel):
    user_id: int
    items: list[Product]


class ProductFactory(ModelFactory):
    __model__ = Product
    in_stock = True  # always in-stock by default in tests

    @classmethod
    def price(cls) -> float:
        # Unconstrained floats may come out negative; keep prices positive
        return round(cls.__faker__.pyfloat(min_value=1.0, max_value=500.0), 2)


class CartFactory(ModelFactory):
    __model__ = Cart


# --- Fixtures ---
@pytest.fixture
def product():
    return ProductFactory.build()


@pytest.fixture
def cart_with_items():
    items = ProductFactory.batch(3)
    return CartFactory.build(items=items)


# --- Tests ---
def test_product_is_in_stock(product):
    assert product.in_stock is True


def test_cart_total(cart_with_items):
    total = sum(item.price for item in cart_with_items.items)
    assert total > 0
    assert len(cart_with_items.items) == 3


def test_out_of_stock_product():
    # Override in_stock for a specific test
    out_of_stock = ProductFactory.build(in_stock=False)
    assert out_of_stock.in_stock is False
Output (pytest):
collected 3 items
test_order_service.py::test_product_is_in_stock PASSED
test_order_service.py::test_cart_total PASSED
test_order_service.py::test_out_of_stock_product PASSED
3 passed in 0.12s
Each fixture returns a freshly generated default instance to the test that requests it; tests that need specific values call ProductFactory.build(field=value) directly. This pattern keeps fixtures simple and lets individual tests express their own data requirements clearly. The factory class stays as the single source of truth for how test data is generated, and it automatically stays in sync with your Pydantic model.
Real-Life Example: API Test Suite
The following test suite uses polyfactory to test a user registration API endpoint with multiple scenarios — valid registrations, duplicate emails, invalid ages — all without writing a single fixture by hand:
# test_registration_api.py
from typing import Optional
from pydantic import BaseModel, ValidationError, field_validator
from polyfactory.factories.pydantic_factory import ModelFactory


class RegistrationRequest(BaseModel):
    username: str
    email: str
    age: int
    referral_code: Optional[str] = None

    @field_validator("age")
    @classmethod
    def age_must_be_adult(cls, v):
        if v < 18:
            raise ValueError("Must be 18 or older")
        return v


class RegistrationFactory(ModelFactory):
    __model__ = RegistrationRequest
    age = 25  # default: always adult


# Simulate an API handler
registered_emails: set[str] = set()


def register_user(req: RegistrationRequest) -> dict:
    if req.email in registered_emails:
        return {"ok": False, "error": "Email already registered"}
    registered_emails.add(req.email)
    return {"ok": True, "user_id": hash(req.username) % 10000}


# Test 1: normal registration succeeds
req1 = RegistrationFactory.build()
result = register_user(req1)
assert result["ok"] is True, "Expected success"
print(f"[PASS] New registration: {req1.username} -> user_id {result['user_id']}")

# Test 2: duplicate email is rejected
req2 = RegistrationFactory.build(email=req1.email)
result = register_user(req2)
assert result["ok"] is False
print(f"[PASS] Duplicate email rejected: {req2.email}")

# Test 3: under-18 fails validation
try:
    bad_req = RegistrationFactory.build(age=16)
    print("[FAIL] Should have raised ValidationError for age 16")
except ValidationError as e:
    print(f"[PASS] Age validator fired: {e}")

# Test 4: batch test -- 10 unique registrations all succeed
batch = RegistrationFactory.batch(10)
results = [register_user(r) for r in batch]
passed = sum(1 for r in results if r["ok"])
print(f"[PASS] Batch registration: {passed}/10 succeeded")
Output (values are random, so yours will differ):
[PASS] New registration: QzKpMrFt -> user_id 7234
[PASS] Duplicate email rejected: LpXsRvAdMe
[PASS] Age validator fired: 1 validation error for RegistrationRequest
age
  Value error, Must be 18 or older [type=value_error, ...]
[PASS] Batch registration: 10/10 succeeded
The factory’s age = 25 default ensures all test instances are adults unless a test says otherwise. This default matters because polyfactory does not look inside the age_must_be_adult validator; an unconstrained int could come out below 18 and fail validation. Individual tests override age when they need to exercise the validator. The batch test generates 10 independent registrations with a single .batch(10) call. Notice that test 3 passes: the factory hands age=16 to the model, and Pydantic’s validator raises ValidationError before .build() returns, exactly as it would in production.
Frequently Asked Questions
Does polyfactory respect Pydantic field constraints?
Yes, for declarative constraints. polyfactory reads Field(ge=0, le=100), Field(min_length=5), Annotated[int, Field(gt=0)], and similar constraints, then generates values that satisfy them. If a field is annotated as PositiveInt, polyfactory generates a positive integer. Custom validator functions are different: polyfactory cannot see the rule inside a @field_validator, so if one rejects values below a threshold, express that threshold as a Field constraint or pin the field on the factory. With constraints declared this way, generated data is valid data.
When should I use polyfactory instead of Faker?
Use polyfactory when you want valid instances of your existing Python models with minimal setup. Use Faker when you need data that looks realistic — human names, street addresses, phone numbers, company names. They complement each other: polyfactory factories can call cls.__faker__.name() internally to get realistic string values for fields that benefit from it. Use both in the same factory class.
What happens when I add a field to my model?
polyfactory auto-generates data for the new field immediately, with no changes to the factory class. If the field is required with no default, polyfactory generates a random valid value. If it has a default, polyfactory v2 still generates a value unless you set __use_defaults__ = True on the factory, in which case the model’s default is used. This is the key advantage over hand-written factories, which require manual updates every time a model changes.
How does polyfactory handle Optional fields?
By default, polyfactory randomly chooses to set Optional fields to either a generated value or None. Set __random_seed__ on the factory for reproducible results, or override the field with a fixed value (referral_code = None) to always use None in that factory. Set it to a specific callable if you want the field always populated.
Does polyfactory work with async code or async validators?
polyfactory’s .build() and .batch() methods are synchronous, and so are Pydantic validators, so there is no separate async build step; you can call .build() directly inside async pytest tests or async setup functions. The async methods polyfactory does provide, create_async() and create_batch_async(), are for persistence: they build instances and then await a save handler configured through __async_persistence__ on the factory.
Conclusion
polyfactory eliminates factory boilerplate by reading your type annotations directly and generating valid instances automatically. You have seen how to build single instances and batches from Pydantic models and dataclasses, customize specific fields while leaving the rest auto-generated, handle nested model relationships without extra configuration, and wire polyfactory into pytest fixtures for clean, maintainable test suites.
The best next step is to go through your existing test files and identify every hand-written factory function or fixture that manually populates model fields. Replace them with a polyfactory class and verify that all tests still pass. You will typically find that most of the factory code disappears, and what remains, the domain-specific overrides, becomes much easier to read and maintain.
For the full polyfactory API reference including TypedDict support, attrs integration, and custom base factories, see the official polyfactory documentation.