Intermediate

You have a Pydantic model with a dozen fields, nested relationships, and validators. Now you need to write ten different tests: some with valid data, some with edge cases, some with specific fields set to particular values. You write a factory function. Then another. Then a helper. By the time you have covered all your test scenarios, the test setup code is longer than the code being tested. There is a better way. polyfactory generates valid, type-safe test data from your existing models automatically: one short factory class per model, no repeated boilerplate, no manually wired constructors.

polyfactory reads your Pydantic models, Python dataclasses, attrs classes, and typed dicts, then generates instances that satisfy the declared field types and constraints. Install it with pip install polyfactory. Dataclasses and typed dicts work out of the box. Pydantic and attrs models need those libraries installed alongside polyfactory, which is already the case if your project uses them.

This article covers generating test data from Pydantic models and dataclasses, customizing generated values, building batch fixtures, using polyfactory with pytest, handling nested models and relationships, and knowing when to use polyfactory versus Faker. By the end you will be able to replace your hand-written factory functions with a single-line polyfactory declaration that stays in sync with your models automatically.

polyfactory Quick Example

The core pattern: subclass ModelFactory, set __model__ to your model class, then call .build() to get a valid instance:

# quick_polyfactory.py
from pydantic import BaseModel
from polyfactory.factories.pydantic_factory import ModelFactory

class User(BaseModel):
    id: int
    username: str
    email: str
    age: int
    is_active: bool

class UserFactory(ModelFactory):
    __model__ = User

# Generate a single User with random valid data
user = UserFactory.build()
print(user)

# Generate with specific field overrides
admin = UserFactory.build(username="admin", is_active=True)
print(admin)

Output (values are random and will differ on every run):

id=7432 username='WbHkQeTzFn' email='pLrXsGdNqY' age=23 is_active=False
id=1891 username='admin' email='KtJmVbZcHw' age=41 is_active=True

polyfactory introspects the User model’s type annotations and generates a value for each field: a random integer for id and age, a random string for username and email, and a random boolean for is_active. The .build() call accepts keyword arguments that override specific fields while leaving the rest auto-generated. Generated instances pass Pydantic validation here because polyfactory respects the declared field types and constraints.

The sections below show how to customize generation, handle nested models, generate batches, and integrate with pytest fixtures.

What Is polyfactory and When Should You Use It?

polyfactory is a test data generation library that bridges the gap between your data models and your test fixtures. It reads a model’s type annotations and declared field constraints, then generates random but valid instances. This means your factories never go stale: if you add a required field to a model, polyfactory automatically generates data for it without any changes to your test code.

Tool	Best for	Syncs with models?	Customizable?
polyfactory	Type-safe instances from existing models	Yes — reads annotations directly	Yes — field overrides + custom providers
Faker	Realistic fake data (names, emails, addresses)	No — manual wiring required	Yes — 150+ providers
factory_boy	Django ORM test factories, sequences	Partial (Django models only)	Yes — lazy attributes, sequences
Hypothesis	Property-based testing, edge case discovery	No — strategy-based	Yes — strategies + composable

Use polyfactory when you want instances of your existing Pydantic or dataclass models with minimal boilerplate. Use Faker when you need data that looks realistic — real names, valid-format email addresses, plausible street addresses. Use Hypothesis when you want to find edge cases automatically by generating thousands of inputs and checking invariants.

Customizing Generated Values

polyfactory generates random data by default. You can lock specific fields to fixed values, use sequences, or provide custom callables for any field on the factory class:

# customizing.py
import itertools
from datetime import date
from pydantic import BaseModel
from polyfactory.factories.pydantic_factory import ModelFactory

class Order(BaseModel):
    order_id: int
    customer_email: str
    product_name: str
    quantity: int
    unit_price: float
    order_date: date
    shipped: bool

# Module-level counter that backs the order_id sequence
_order_ids = itertools.count(1000)

class OrderFactory(ModelFactory):
    __model__ = Order

    # Lock a field to a fixed value
    shipped = False

    # Use a sequence for unique IDs
    @classmethod
    def order_id(cls) -> int:
        return next(_order_ids)

    # Custom callable for realistic-looking values
    @classmethod
    def product_name(cls) -> str:
        return cls.__faker__.word().capitalize() + " Pro"

    @classmethod
    def unit_price(cls) -> float:
        return round(cls.__faker__.pyfloat(min_value=9.99, max_value=999.99), 2)

# Build several orders
for _ in range(3):
    order = OrderFactory.build()
    print(f"#{order.order_id} {order.product_name:<18} ${order.unit_price:>7.2f}  shipped={order.shipped}")

Output (product names and prices are random and will differ on every run):

#1000 Approach Pro       $ 342.19  shipped=False
#1001 Network Pro        $  89.45  shipped=False
#1002 Discover Pro       $ 721.83  shipped=False

Class-level assignments fix a field to a constant value. Methods decorated with @classmethod are called each time .build() runs, generating fresh data per instance. cls.__faker__ is a Faker instance bundled with polyfactory — you can call any Faker provider from it without a separate Faker import. This approach keeps all your test data logic in one class, making it easy to find and update when your model changes.

Nested Models and Relationships

polyfactory handles nested Pydantic models automatically — if a field’s type is another Pydantic model, polyfactory recursively generates a valid instance for it:

# nested_models.py
from typing import List
from pydantic import BaseModel
from polyfactory.factories.pydantic_factory import ModelFactory

class Address(BaseModel):
    street: str
    city: str
    postcode: str
    country: str

class LineItem(BaseModel):
    product_id: int
    product_name: str
    quantity: int
    price: float

class Invoice(BaseModel):
    invoice_number: str
    billing_address: Address
    line_items: List[LineItem]
    paid: bool

class InvoiceFactory(ModelFactory):
    __model__ = Invoice

invoice = InvoiceFactory.build()

print(f"Invoice: {invoice.invoice_number}")
print(f"Bill to: {invoice.billing_address.city}, {invoice.billing_address.country}")
print(f"Items:   {len(invoice.line_items)}")
for item in invoice.line_items:
    print(f"  - {item.product_name} x{item.quantity} @ ${item.price:.2f}")
print(f"Paid:    {invoice.paid}")

Output:

Invoice: VzKmQpRtFn
Bill to: HdWqLrTnBs, XcPyMgKvJz
Items:   1
  - QnRtVbLsKd x3 @ $84.20
Paid:    True

polyfactory generates nested Address and LineItem instances without any extra configuration. Plain str fields such as city and product_name are filled with random strings, so the values above are illustrative and will differ on every run. For List[LineItem], polyfactory 2.x generates a single-item list by default; set __randomize_collection_length__ = True on the factory class, together with __min_collection_length__ and __max_collection_length__, to get lists of varying length. For relationships that need specific values (like a foreign key that must match an existing record), use a factory override or a post-generation hook.

Using polyfactory with Dataclasses and attrs

polyfactory is not limited to Pydantic. The DataclassFactory works identically with Python’s built-in dataclasses:

# dataclass_factory.py
from dataclasses import dataclass
from typing import Optional
from polyfactory.factories import DataclassFactory

@dataclass
class Config:
    host: str
    port: int
    database: str
    max_connections: int
    debug: bool
    timeout: Optional[float]

class ConfigFactory(DataclassFactory):
    __model__ = Config

    # Realistic defaults for testing
    port = 5432
    max_connections = 10

# Build a batch of 4 configs -- useful for parameterized tests
configs = ConfigFactory.batch(4)
for cfg in configs:
    print(f"  {cfg.host}:{cfg.port}/{cfg.database}  debug={cfg.debug}")

Output:

  TkPlNx:5432/BrVzQm  debug=False
  WsHjYp:5432/KrDtLn  debug=True
  MgRqFb:5432/PvNwSe  debug=False
  ZxCvTy:5432/HmQkJr  debug=True

The .batch(n) method generates a list of n instances in one call, which is exactly what you need for parameterized tests, seeding test databases, or benchmarking code that processes collections. Import DataclassFactory from polyfactory.factories for dataclasses, and AttrsFactory from polyfactory.factories.attrs_factory for attrs classes.

Using polyfactory with pytest Fixtures

The cleanest pytest pattern is wrapping polyfactory in a fixture. This gives tests access to both default-generated instances and customized overrides:

# test_order_service.py
import pytest
from pydantic import BaseModel
from polyfactory.factories.pydantic_factory import ModelFactory

class Product(BaseModel):
    id: int
    name: str
    price: float
    in_stock: bool

class Cart(BaseModel):
    user_id: int
    items: list[Product]

class ProductFactory(ModelFactory):
    __model__ = Product
    in_stock = True   # always in-stock by default in tests

class CartFactory(ModelFactory):
    __model__ = Cart

# --- Fixtures ---

@pytest.fixture
def product():
    return ProductFactory.build()

@pytest.fixture
def cart_with_items():
    items = ProductFactory.batch(3)
    return CartFactory.build(items=items)

# --- Tests ---

def test_product_is_in_stock(product):
    assert product.in_stock is True

def test_cart_total(cart_with_items):
    total = sum(item.price for item in cart_with_items.items)
    assert total > 0
    assert len(cart_with_items.items) == 3

def test_out_of_stock_product():
    # Override in_stock for a specific test
    out_of_stock = ProductFactory.build(in_stock=False)
    assert out_of_stock.in_stock is False

Output (pytest):

collected 3 items

test_order_service.py::test_product_is_in_stock PASSED
test_order_service.py::test_cart_total PASSED
test_order_service.py::test_out_of_stock_product PASSED

3 passed in 0.12s

Each fixture builds a fresh default instance for the test that requests it; tests that need specific values call ProductFactory.build(field=value) directly. This pattern keeps fixtures simple and lets individual tests express their own data requirements clearly. The factory class stays as the single source of truth for how test data is generated, and it automatically stays in sync with your Pydantic model.

Real-Life Example: API Test Suite

The following test suite uses polyfactory to test a user registration API endpoint with multiple scenarios — valid registrations, duplicate emails, invalid ages — all without writing a single fixture by hand:

# test_registration_api.py
from typing import Optional
from pydantic import BaseModel, field_validator
from polyfactory.factories.pydantic_factory import ModelFactory

class RegistrationRequest(BaseModel):
    username: str
    email: str
    age: int
    referral_code: Optional[str] = None

    @field_validator("age")
    @classmethod
    def age_must_be_adult(cls, v):
        if v < 18:
            raise ValueError("Must be 18 or older")
        return v

class RegistrationFactory(ModelFactory):
    __model__ = RegistrationRequest
    age = 25   # default: always adult

# Simulate an API handler
registered_emails: set = set()

def register_user(req: RegistrationRequest) -> dict:
    if req.email in registered_emails:
        return {"ok": False, "error": "Email already registered"}
    registered_emails.add(req.email)
    return {"ok": True, "user_id": hash(req.username) % 10000}

# Test 1: normal registration succeeds
req1 = RegistrationFactory.build()
result = register_user(req1)
assert result["ok"] is True, "Expected success"
print(f"[PASS] New registration: {req1.username} -> user_id {result['user_id']}")

# Test 2: duplicate email is rejected
req2 = RegistrationFactory.build(email=req1.email)
result = register_user(req2)
assert result["ok"] is False
print(f"[PASS] Duplicate email rejected: {req2.email}")

# Test 3: under-18 fails validation
try:
    bad_req = RegistrationFactory.build(age=16)
    print("[FAIL] Should have raised ValueError for age 16")
except Exception as e:
    print(f"[PASS] Age validator fired: {e}")

# Test 4: batch test -- 10 unique registrations all succeed
batch = RegistrationFactory.batch(10)
results = [register_user(r) for r in batch]
passed = sum(1 for r in results if r["ok"])
print(f"[PASS] Batch registration: {passed}/10 succeeded")

Output:

[PASS] New registration: QzKpMrFt -> user_id 7234
[PASS] Duplicate email rejected: HnWqzLdKbT
[PASS] Age validator fired: 1 validation error for RegistrationRequest
age
  Value error, Must be 18 or older [type=value_error, ...]
[PASS] Batch registration: 10/10 succeeded

The factory’s age = 25 default ensures all test instances are adults by default. Individual tests override it when they need to test the validator. The batch test exercises 10 independent registrations with a single .batch(10) call and one list comprehension. Notice that test 3 passes: by default .build() constructs the model through its normal constructor, so Pydantic runs the validator and .build(age=16) raises ValidationError before the factory even returns, exactly as it would in production.

Frequently Asked Questions

Does polyfactory respect Pydantic field constraints?

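Yes, for declared constraints. polyfactory reads Field(ge=0, le=100), Field(min_length=5), Annotated[int, Field(gt=0)], and similar constraints, then generates values that satisfy them. If a field is annotated as PositiveInt, polyfactory generates a positive integer. Custom validator functions are a different matter: polyfactory cannot see the logic inside a @field_validator, so a randomly generated value can still be rejected when the model is constructed. For those fields, either express the rule as a Field constraint or pin the value on the factory, as the RegistrationFactory above does with age = 25.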

When should I use polyfactory instead of Faker?

Use polyfactory when you want valid instances of your existing Python models with minimal setup. Use Faker when you need data that looks realistic — human names, street addresses, phone numbers, company names. They complement each other: polyfactory factories can call cls.__faker__.name() internally to get realistic string values for fields that benefit from it. Use both in the same factory class.

What happens when I add a field to my model?

polyfactory picks up the new field the next time you build an instance, with no changes to the factory class. If the field is required, polyfactory generates a random valid value. If it has a default, polyfactory 2.x still generates a random value unless you set __use_defaults__ = True on the factory. This is the key advantage over hand-written factories, which require manual updates every time a model changes.

How does polyfactory handle Optional fields?

By default, polyfactory randomly chooses to set Optional fields to either a generated value or None. Set __random_seed__ on the factory for reproducible results, or override the field with a fixed value (referral_code = None) to always use None in that factory. To keep the field always populated, assign a callable to it or set __allow_none_optionals__ = False on the factory.

Does polyfactory work with async code or async validators?

polyfactory’s .build() and .batch() methods are synchronous and do no I/O, so you can call them directly inside async pytest tests or async setup functions. Pydantic validators are themselves synchronous, so there is nothing to await during validation. The async entry points are .create_async() and .create_batch_async(), which build instances and then hand them to an async persistence handler configured on the factory through __async_persistence__.

Conclusion

polyfactory eliminates factory boilerplate by reading your type annotations directly and generating valid instances automatically. You have seen how to build single instances and batches from Pydantic models and dataclasses, customize specific fields while leaving the rest auto-generated, handle nested model relationships without extra configuration, and wire polyfactory into pytest fixtures for clean, maintainable test suites.

The best next step is to go through your existing test files and identify every hand-written factory function or fixture that manually populates model fields. Replace them with a polyfactory class and verify that all tests still pass. You will typically find that most of the factory code disappears, and what remains, the domain-specific overrides, becomes much easier to read and maintain.

For the full polyfactory API reference including TypedDict support, attrs integration, and custom base factories, see the official polyfactory documentation.
