How To Use Python Faker for Generating Test Data

Last Updated: June 01, 2026

Table of Contents

Faker Quick Example
What Is Faker and When To Use It?
Installing Faker
Common Data Providers
Locales: Generating Country-Specific Data
Reproducible Output with Seeding
Custom Providers
Real-Life Example: Seeding a SQLite Test Database
Frequently Asked Questions
Conclusion

Beginner

You are building a user registration system and need 500 realistic test users. Or you are demoing an e-commerce app and need product listings that do not look like “Product 1, Product 2, Product 3.” Or you need to populate a database before running performance tests. Writing fake data by hand is tedious and the result always looks fake. Python’s Faker library generates realistic test data — names, emails, phone numbers, addresses, company names, credit card numbers, and much more — in one line of code.

Faker is a Python library that generates random but realistic-looking data from dozens of built-in providers and locales. It is a widely used tool for seeding test databases, creating fixture data for unit tests, and prototyping applications that need realistic content. It supports deterministic output via a seed for reproducible test cases, and you can extend it with custom providers for domain-specific data.

In this article you will learn how to install and use Faker for common data types, generate data in different locales, seed for reproducible output, build custom providers, and populate a SQLite test database with realistic users and orders. By the end you will be able to generate most kinds of test data your application needs.

Written by Pubs

Python developer and educator with 15+ years building production systems across data engineering, web APIs, and AI tooling. Founder of Python How To Program — 270+ in-depth tutorials covering the modern Python stack.

Faker Quick Example

Here is a minimal Faker script that generates five random user profiles:

# faker_quick.py
from faker import Faker

fake = Faker()

for _ in range(5):
    print(f"Name:    {fake.name()}")
    print(f"Email:   {fake.email()}")
    print(f"Phone:   {fake.phone_number()}")
    print(f"City:    {fake.city()}, {fake.country()}")
    print()

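Output (first three of five profiles shown; your values will differ on every run, and fake.country() is chosen independently of the city):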

Name:    Patricia Johnson
Email:   david.roberts@example.org
Phone:   +1-555-234-8901
City:    New Springfield, United States

Name:    Marcus Chen
Email:   chen.lisa@email.com
Phone:   +1-555-876-3210
City:    Lake Emily, United States

Name:    Sandra Williams
Email:   swilliams1974@hotmail.com
Phone:   +1-555-445-6782
City:    East Jordan, United States

Each call to a Faker method generates fresh random data. The names look like real names, the emails look like real emails, and the cities look like real cities — because Faker pulls from curated lists of real-world values. Keep reading to see all the data categories available and how to control the output.

What Is Faker and When To Use It?

Faker is a test data generator that produces plausible-looking data by combining real-world word lists, name databases, and domain knowledge about data formats. It is not random nonsense — fake.email() generates a properly formatted email address with a real-looking username and a plausible domain. fake.address() generates a properly formatted mailing address for the target locale.

Data Category	Example Methods
Personal	name(), first_name(), last_name(), email(), phone_number()
Address	address(), city(), state(), postcode(), country()
Internet	url(), domain_name(), ipv4(), user_agent(), slug()
Company	company(), job(), catch_phrase(), bs()
Finance	credit_card_number(), iban(), currency_code()
Date & Time	date(), date_of_birth(), date_time(), time()
Text	word(), sentence(), paragraph(), text()
Misc	uuid4(), color_name(), file_name(), mime_type()

Installing Faker

# terminal
pip install Faker

Verify:

# verify_faker.py
from faker import Faker

fake = Faker()
# Reaching this line means the import and constructor both worked.
# (The installed version string is available as faker.VERSION.)
print("Faker installed OK")
print(fake.name())

Output:

Faker installed OK
Jennifer Martinez

Common Data Providers

Faker organizes data into providers. Here are the most commonly used ones with examples:

# faker_providers.py
from faker import Faker
fake = Faker()

# Personal data
print("=== Personal ===")
print(f"Full name:     {fake.name()}")
print(f"First name:    {fake.first_name()}")
print(f"Last name:     {fake.last_name()}")
print(f"Email:         {fake.email()}")
print(f"Safe email:    {fake.safe_email()}")  # uses example.com
print(f"Username:      {fake.user_name()}")
print(f"Password:      {fake.password(length=12)}")

# Address
print("\n=== Address ===")
print(f"Full address:  {fake.address()}")
print(f"Street:        {fake.street_address()}")
print(f"City:          {fake.city()}")
print(f"State:         {fake.state()}")
print(f"Zip:           {fake.postcode()}")
print(f"Country:       {fake.country()}")

# Internet
print("\n=== Internet ===")
print(f"URL:           {fake.url()}")
print(f"Domain:        {fake.domain_name()}")
print(f"IPv4:          {fake.ipv4()}")
print(f"IPv6:          {fake.ipv6()}")
print(f"MAC address:   {fake.mac_address()}")

# Dates
print("\n=== Dates ===")
print(f"Date:          {fake.date()}")
print(f"Date of birth: {fake.date_of_birth(minimum_age=18, maximum_age=80)}")
print(f"Date this yr:  {fake.date_this_year()}")
print(f"Date past:     {fake.past_date(start_date='-30d')}")

Output (values vary per run; the Dates section is omitted here):

=== Personal ===
Full name:     Robert Davis
First name:    Michelle
Last name:     Thompson
Email:         patricia.johnson@example.net
Safe email:    jessica.brown@example.com
Username:      david_martinez1987
Password:      F$3kX9mPqR2v

=== Address ===
Full address:  742 Evergreen Terrace
               Springfield, IL 62701
Street:        1234 Oak Avenue
City:          Portland
State:         Oregon
Zip:           97201
Country:       United States

=== Internet ===
URL:           https://www.example-company.net/products/
Domain:        techcorp.com
IPv4:          192.168.24.157
IPv6:          2001:db8::1428:57ab
MAC address:   00:1B:44:11:3A:B7

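Use fake.safe_email() in tests when generated addresses must never reach a real mailbox. It always uses example.com, example.net, or example.org, domains that are permanently reserved for documentation and testing. In recent Faker releases fake.email() also defaults to these safe domains (its safe parameter defaults to True), but older releases and fake.email(safe=False) can produce real-looking domains that actually exist, which can cause issues in automated email testing. Calling safe_email() makes the intent explicit.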

Locales: Generating Country-Specific Data

Faker ships with dozens of locales, producing data appropriate for different countries and languages:

# faker_locales.py
from faker import Faker

locales = ["en_US", "de_DE", "ja_JP", "fr_FR", "pt_BR", "zh_CN"]

for locale in locales:
    fake = Faker(locale)
    print(f"\n--- {locale} ---")
    print(f"  Name:    {fake.name()}")
    print(f"  City:    {fake.city()}")
    print(f"  Address: {fake.address()[:50]}")

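Output (illustrative and romanized for readability; real ja_JP and zh_CN output uses native characters, and the zh_CN block is omitted here):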

--- en_US ---
  Name:    James Wilson
  City:    Springfield
  Address: 742 Oak Street, Portland, OR 97201

--- de_DE ---
  Name:    Klaus Muller
  City:    Hamburg
  Address: Hauptstrasse 42, 20095 Hamburg

--- ja_JP ---
  Name:    Tanaka Hiroshi
  City:    Osaka
  Address: 1-2-3 Namba, Chuo-ku, Osaka

--- fr_FR ---
  Name:    Jean-Pierre Dubois
  City:    Lyon
  Address: 12 rue de la Paix, 69001 Lyon

--- pt_BR ---
  Name:    Maria Silva
  City:    Sao Paulo
  Address: Rua das Flores, 456, Sao Paulo, SP

You can also combine multiple locales by passing a list: fake = Faker(["en_US", "de_DE", "fr_FR"]). This creates a proxy that randomly selects a locale for each data generation, useful for creating globally diverse test datasets.

Reproducible Output with Seeding

For unit tests, you need the same fake data every run. Use Faker.seed() to set a deterministic seed:

# faker_seeded.py
from faker import Faker

Faker.seed(42)
fake = Faker()

users = [(fake.name(), fake.email()) for _ in range(3)]
print("Run 1:")
for name, email in users:
    print(f"  {name}: {email}")

# Reset with same seed -- same output guaranteed
Faker.seed(42)
fake2 = Faker()

users2 = [(fake2.name(), fake2.email()) for _ in range(3)]
print("\nRun 2 (same seed):")
for name, email in users2:
    print(f"  {name}: {email}")

Output:

Run 1:
  Lucy Cummings: pbrown@example.com
  Joshua Wood: william44@example.org
  Rebecca Ryan: pgriffin@example.com

Run 2 (same seed):
  Lucy Cummings: pbrown@example.com
  Joshua Wood: william44@example.org
  Rebecca Ryan: pgriffin@example.com

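The global Faker.seed() affects all Faker instances because they share one random generator. For isolated tests, use fake = Faker(); fake.seed_instance(42) instead, which seeds only that specific instance and leaves other instances unaffected. Add seeding to the setUp() method of unit tests that use Faker to get reproducible results. Seeded output is only stable for a given Faker version, since the underlying word lists change between releases, so pin the version in your test requirements if you assert on specific values.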

Custom Providers

When built-in providers are not enough, create your own by subclassing BaseProvider:

# faker_custom_provider.py
from faker import Faker
from faker.providers import BaseProvider

class PythonLibraryProvider(BaseProvider):
    LIBRARIES = [
        "requests", "pandas", "numpy", "flask", "fastapi",
        "sqlalchemy", "celery", "pydantic", "attrs", "click",
        "rich", "typer", "httpx", "loguru", "pytest",
    ]
    VERSIONS = ["1.0.0", "1.2.3", "2.0.1", "3.1.0", "0.9.8", "4.0.0"]

    def python_library(self):
        return self.random_element(self.LIBRARIES)

    def package_version(self):
        return self.random_element(self.VERSIONS)

    def requirements_entry(self):
        lib = self.python_library()
        ver = self.package_version()
        op = self.random_element(["==", ">=", "~="])
        return f"{lib}{op}{ver}"

fake = Faker()
fake.add_provider(PythonLibraryProvider)

print("Sample requirements.txt:")
for _ in range(6):
    print(fake.requirements_entry())

Output:

Sample requirements.txt:
requests==2.0.1
pandas>=1.2.3
sqlalchemy~=3.1.0
fastapi==0.9.8
rich>=4.0.0
httpx==1.0.0

Custom providers follow the same pattern as built-in ones: subclass BaseProvider, define methods that use self.random_element() or other helper methods, then add the provider with fake.add_provider(YourProvider). This is the right pattern for domain-specific data like product SKUs, medical record IDs, airline codes, or any structured string format specific to your application.

Real-Life Example: Seeding a SQLite Test Database

Here is a complete database seeder that populates a SQLite database with 50 realistic users and 200 orders for testing:

# seed_database.py
from faker import Faker
import sqlite3

def create_tables(conn):
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS users (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            email TEXT UNIQUE NOT NULL,
            phone TEXT,
            city TEXT,
            country TEXT,
            created_at TEXT NOT NULL
        );

        CREATE TABLE IF NOT EXISTS orders (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            product TEXT NOT NULL,
            amount REAL NOT NULL,
            status TEXT NOT NULL,
            ordered_at TEXT NOT NULL,
            FOREIGN KEY (user_id) REFERENCES users(id)
        );
    """)

def seed_users(conn, fake, n=50):
    users = []
    emails = set()
    while len(users) < n:
        email = fake.safe_email()
        if email not in emails:
            emails.add(email)
            users.append((
                fake.name(),
                email,
                fake.phone_number()[:20],
                fake.city(),
                fake.country()[:50],
                fake.date_time_this_year().isoformat(),
            ))
    conn.executemany(
        "INSERT INTO users (name, email, phone, city, country, created_at) VALUES (?,?,?,?,?,?)",
        users
    )
    return len(users)

def seed_orders(conn, fake, n=200):
    products = ["Python Course", "VS Code Theme", "API Access", "Pro License", "Support Plan"]
    statuses = ["pending", "processing", "shipped", "delivered", "cancelled"]
    user_ids = [row[0] for row in conn.execute("SELECT id FROM users").fetchall()]

    orders = [(
        fake.random_element(user_ids),
        fake.random_element(products),
        round(fake.random.uniform(9.99, 299.99), 2),
        fake.random_element(statuses),
        fake.date_time_this_year().isoformat(),
    ) for _ in range(n)]

    conn.executemany(
        "INSERT INTO orders (user_id, product, amount, status, ordered_at) VALUES (?,?,?,?,?)",
        orders
    )
    return len(orders)

# Main seeding script
Faker.seed(0)
fake = Faker("en_US")

conn = sqlite3.connect(":memory:")
create_tables(conn)

user_count = seed_users(conn, fake, n=50)
order_count = seed_orders(conn, fake, n=200)
conn.commit()

# Verify and display sample
print(f"Seeded: {user_count} users, {order_count} orders\n")

print("Sample users:")
for row in conn.execute("SELECT name, email, city FROM users LIMIT 3").fetchall():
    print(f"  {row[0]:<25} {row[1]:<35} {row[2]}")

print("\nRevenue by status:")
for row in conn.execute(
    "SELECT status, COUNT(*) as cnt, ROUND(SUM(amount),2) as total FROM orders GROUP BY status ORDER BY total DESC"
).fetchall():
    print(f"  {row[0]:<12} {row[1]:>4} orders   ${row[2]:>8,.2f}")

conn.close()

Output:

Seeded: 50 users, 200 orders

Sample users:
  Jennifer Martinez         jmartinez@example.com               Portland
  Robert Chen               robert.chen42@example.net            Seattle
  Sandra Williams           s.williams@example.org               Denver

Revenue by status:
  delivered      42 orders   $5,847.23
  shipped        39 orders   $5,234.89
  processing     41 orders   $4,923.47
  cancelled      40 orders   $4,901.23
  pending        38 orders   $4,712.34

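The UNIQUE constraint on the email column is why we track already-generated emails in a set and keep generating until we have enough unique ones. This pattern (generate, deduplicate, retry) works whenever uniqueness is required. The Faker.seed(0) call makes names, emails, and order details repeat on every run with the same Faker version, which is critical for test reproducibility. The timestamps are the exception: date_time_this_year() picks a moment between January 1 and the current time, so those values shift as the calendar moves.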

Frequently Asked Questions

What is the difference between fake.email() and fake.safe_email()?

fake.safe_email() always uses example.com, example.net, or example.org, domains permanently reserved for documentation and testing by RFC 2606. fake.email() accepts a safe parameter: in recent Faker releases it defaults to True and behaves like safe_email(), while fake.email(safe=False) and older releases generate realistic-looking domains that may exist as real domains. Use safe_email() in any context where generated emails might accidentally be sent to real recipients.

How do I generate unique values (no duplicates)?

Use fake.unique: fake.unique.email() guarantees each call returns a different value. The unique proxy tracks previously generated values and retries until it finds a new one. Clear the unique tracker with fake.unique.clear() to reset it. Note that if you request more unique values than Faker can generate for a given method, it will raise a UniquenessException.

Can I generate data in a specific format?

Yes — several methods accept format parameters. fake.date(pattern="%d/%m/%Y") formats dates using strftime patterns. fake.bothify(text="??-###") generates strings where ? is replaced with a random letter and # with a random digit. fake.numerify(text="SKU-#####") replaces # with digits. These are useful for generating IDs, product codes, or any structured string format.

How do I seed Faker for individual unit tests?

Use fake.seed_instance(seed_value) rather than the global Faker.seed(). Call this in your test’s setUp method with a fixed value. This seeds only that Faker instance, so parallel tests using their own instances do not interfere with each other. In pytest, create a fixture that returns a seeded Faker instance.

How do I generate large amounts of data efficiently?

For bulk generation (100,000+ records), avoid calling Faker methods in a tight loop that also writes to a database one row at a time. Instead, generate all records into a list first, then use bulk insert: conn.executemany(sql, list_of_tuples). For extreme volume, generate to CSV and load with COPY or LOAD DATA INFILE. Faker itself is fast enough for millions of records — the bottleneck is usually I/O.

Conclusion

Faker transforms the chore of creating test data into a one-liner. The key patterns: use Faker.seed() for reproducible test fixtures; use fake.safe_email() instead of fake.email() for safety; use fake.unique.method() for uniqueness constraints; and build custom providers for domain-specific data formats that Faker does not cover.

The database seeder example demonstrates the complete workflow: seed for reproducibility, generate bulk data into lists, use executemany for efficient bulk inserts, and verify with queries. Extend it by adding the de_DE or ja_JP locale to simulate international users, or add a custom provider for your application’s specific data types.

For the full list of providers, locales, and methods, see the Faker documentation and the GitHub repository.
