Beginner
You are building a user registration system and need 500 realistic test users. Or you are demoing an e-commerce app and need product listings that do not look like “Product 1, Product 2, Product 3.” Or you need to populate a database before running performance tests. Writing fake data by hand is tedious and the result always looks fake. Python’s Faker library generates realistic test data — names, emails, phone numbers, addresses, company names, credit card numbers, and much more — in one line of code.
Faker is a Python library that generates random but realistic-looking data across a wide range of categories and in dozens of locales. It is a go-to tool for seeding test databases, creating fixture data for unit tests, and prototyping applications that need realistic content. It supports deterministic output via a seed for reproducible test cases, and you can extend it with custom providers for domain-specific data.
In this article you will learn how to install and use Faker for common data types, generate data in different locales, seed for reproducible output, build custom providers, and populate a SQLite test database with 250 realistic records (50 users and 200 orders). By the end you will be able to generate any kind of test data your application needs.
Faker Quick Example
Here is a minimal Faker usage — generate five random user profiles:
```python
# faker_quick.py
from faker import Faker

fake = Faker()

for _ in range(5):
    print(f"Name: {fake.name()}")
    print(f"Email: {fake.email()}")
    print(f"Phone: {fake.phone_number()}")
    print(f"City: {fake.city()}, {fake.country()}")
    print()
```
Output (random, so yours will differ; the first three of the five profiles are shown):
```text
Name: Patricia Johnson
Email: david.roberts@example.org
Phone: +1-555-234-8901
City: New Springfield, United States

Name: Marcus Chen
Email: chen.lisa@email.com
Phone: +1-555-876-3210
City: Lake Emily, United States

Name: Sandra Williams
Email: swilliams1974@hotmail.com
Phone: +1-555-445-6782
City: East Jordan, United States
```
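Each call to a Faker method generates fresh random data. The names look like real names, the emails look like real emails, and the cities look like real cities, because Faker builds them from curated lists of real-world values and format templates. Each call is independent, though: the email is not derived from the name, and in the en_US locale country() picks from countries worldwide rather than matching the city. Keep reading to see all the data categories available and how to control the output.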
What Is Faker and When To Use It?
Faker is a test data generator that produces plausible-looking data by combining real-world word lists, name databases, and domain knowledge about data formats. It is not random nonsense — fake.email() generates a properly formatted email address with a real-looking username and a plausible domain. fake.address() generates a properly formatted mailing address for the target locale.
| Data Category | Example Methods |
|---|---|
| Personal | name(), first_name(), last_name(), email(), phone_number() |
| Address | address(), city(), state(), postcode(), country() |
| Internet | url(), domain_name(), ipv4(), user_agent(), slug() |
| Company | company(), job(), catch_phrase(), bs() |
| Finance | credit_card_number(), iban(), currency_code() |
| Date & Time | date(), date_of_birth(), date_time(), time() |
| Text | word(), sentence(), paragraph(), text() |
| Misc | uuid4(), color_name(), file_name(), mime_type() |
Installing Faker
```shell
# terminal
pip install Faker
```
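Verify the installation:
```python
# verify_faker.py
from importlib.metadata import version  # standard library, Python 3.8+

from faker import Faker

fake = Faker()
print(f"Faker {version('Faker')} installed OK")
print(fake.name())
```
Output (your version number will differ, and the name is random):
```text
Faker X.Y.Z installed OK
Jennifer Martinez
```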
Common Data Providers
Faker organizes data into providers. Here are the most commonly used ones with examples:
```python
# faker_providers.py
from faker import Faker

fake = Faker()

# Personal data
print("=== Personal ===")
print(f"Full name: {fake.name()}")
print(f"First name: {fake.first_name()}")
print(f"Last name: {fake.last_name()}")
print(f"Email: {fake.email()}")
print(f"Safe email: {fake.safe_email()}")  # uses example.com, .net, or .org
print(f"Username: {fake.user_name()}")
print(f"Password: {fake.password(length=12)}")

# Address
print("\n=== Address ===")
print(f"Full address: {fake.address()}")
print(f"Street: {fake.street_address()}")
print(f"City: {fake.city()}")
print(f"State: {fake.state()}")
print(f"Zip: {fake.postcode()}")
print(f"Country: {fake.country()}")

# Internet
print("\n=== Internet ===")
print(f"URL: {fake.url()}")
print(f"Domain: {fake.domain_name()}")
print(f"IPv4: {fake.ipv4()}")
print(f"IPv6: {fake.ipv6()}")
print(f"MAC address: {fake.mac_address()}")

# Dates
print("\n=== Dates ===")
print(f"Date: {fake.date()}")
print(f"Date of birth: {fake.date_of_birth(minimum_age=18, maximum_age=80)}")
print(f"Date this yr: {fake.date_this_year()}")
print(f"Date past: {fake.past_date(start_date='-30d')}")
```
Output (values are random, and the Dates section is omitted here):
```text
=== Personal ===
Full name: Robert Davis
First name: Michelle
Last name: Thompson
Email: patricia.johnson@example.net
Safe email: jessica.brown@example.com
Username: david_martinez1987
Password: F$3kX9mPqR2v

=== Address ===
Full address: 742 Evergreen Terrace
Springfield, IL 62701
Street: 1234 Oak Avenue
City: Portland
State: Oregon
Zip: 97201
Country: United States

=== Internet ===
URL: https://www.example-company.net/products/
Domain: techcorp.com
IPv4: 192.168.24.157
IPv6: 2001:db8::1428:57ab
MAC address: 00:1B:44:11:3A:B7
```
Use fake.safe_email() instead of fake.email() in tests — safe email always uses example.com, example.net, or example.org, domains that are permanently reserved for documentation and testing. Regular fake.email() might generate real-looking domains that actually exist, which can cause issues in automated email testing.
Locales: Generating Country-Specific Data
Faker ships with dozens of locales, producing names, addresses, and other data appropriate for different countries and languages:
```python
# faker_locales.py
from faker import Faker

locales = ["en_US", "de_DE", "ja_JP", "fr_FR", "pt_BR", "zh_CN"]

for locale in locales:
    fake = Faker(locale)
    print(f"\n--- {locale} ---")
    print(f" Name: {fake.name()}")
    print(f" City: {fake.city()}")
    # address() spans several lines; flatten it so each entry prints on one line
    address = fake.address().replace("\n", ", ")
    print(f" Address: {address[:50]}")
```
Output (illustrative and abridged: the zh_CN block is omitted, and the ja_JP and zh_CN locales actually return names and addresses in native script rather than the romanized text shown here):
```text
--- en_US ---
 Name: James Wilson
 City: Springfield
 Address: 742 Oak Street, Portland, OR 97201

--- de_DE ---
 Name: Klaus Muller
 City: Hamburg
 Address: Hauptstrasse 42, 20095 Hamburg

--- ja_JP ---
 Name: Tanaka Hiroshi
 City: Osaka
 Address: 1-2-3 Namba, Chuo-ku, Osaka

--- fr_FR ---
 Name: Jean-Pierre Dubois
 City: Lyon
 Address: 12 rue de la Paix, 69001 Lyon

--- pt_BR ---
 Name: Maria Silva
 City: Sao Paulo
 Address: Rua das Flores, 456, Sao Paulo, SP
```
You can also combine multiple locales by passing a list: fake = Faker(["en_US", "de_DE", "fr_FR"]). This creates a proxy that randomly selects a locale for each data generation, useful for creating globally diverse test datasets.
Reproducible Output with Seeding
For unit tests, you need the same fake data every run. Use Faker.seed() to set a deterministic seed:
```python
# faker_seeded.py
from faker import Faker

Faker.seed(42)
fake = Faker()
users = [(fake.name(), fake.email()) for _ in range(3)]

print("Run 1:")
for name, email in users:
    print(f" {name}: {email}")

# Reset with the same seed: the same output is guaranteed
Faker.seed(42)
fake2 = Faker()
users2 = [(fake2.name(), fake2.email()) for _ in range(3)]

print("\nRun 2 (same seed):")
for name, email in users2:
    print(f" {name}: {email}")
```
Output (the exact names depend on your Faker version, but both runs always match each other):
```text
Run 1:
 Lucy Cummings: pbrown@example.com
 Joshua Wood: william44@example.org
 Rebecca Ryan: pgriffin@example.com

Run 2 (same seed):
 Lucy Cummings: pbrown@example.com
 Joshua Wood: william44@example.org
 Rebecca Ryan: pgriffin@example.com
```
The global Faker.seed() affects all Faker instances. For isolated tests, use fake = Faker(); fake.seed_instance(42) instead — this seeds only that specific instance, leaving other instances unaffected. Always add seeding to your setUp() method in unit tests that use Faker to ensure reproducible results.
Custom Providers
When built-in providers are not enough, create your own by subclassing BaseProvider:
```python
# faker_custom_provider.py
from faker import Faker
from faker.providers import BaseProvider


class PythonLibraryProvider(BaseProvider):
    LIBRARIES = [
        "requests", "pandas", "numpy", "flask", "fastapi",
        "sqlalchemy", "celery", "pydantic", "attrs", "click",
        "rich", "typer", "httpx", "loguru", "pytest",
    ]
    VERSIONS = ["1.0.0", "1.2.3", "2.0.1", "3.1.0", "0.9.8", "4.0.0"]

    def python_library(self):
        return self.random_element(self.LIBRARIES)

    def package_version(self):
        return self.random_element(self.VERSIONS)

    def requirements_entry(self):
        # Combine the two methods above into one "name<op>version" line
        lib = self.python_library()
        ver = self.package_version()
        op = self.random_element(["==", ">=", "~="])
        return f"{lib}{op}{ver}"


fake = Faker()
fake.add_provider(PythonLibraryProvider)

print("Sample requirements.txt:")
for _ in range(6):
    print(fake.requirements_entry())
```
Output:
```text
Sample requirements.txt:
requests==2.0.1
pandas>=1.2.3
sqlalchemy~=3.1.0
fastapi==0.9.8
rich>=4.0.0
httpx==1.0.0
```
Custom providers follow the same pattern as built-in ones: subclass BaseProvider, define methods that use self.random_element() or other helper methods, then add the provider with fake.add_provider(YourProvider). This is the right pattern for domain-specific data like product SKUs, medical record IDs, airline codes, or any structured string format specific to your application.
Real-Life Example: Seeding a SQLite Test Database
Here is a complete database seeder that populates an in-memory SQLite database with 50 realistic users and 200 orders for testing:
```python
# seed_database.py
import sqlite3

from faker import Faker


def create_tables(conn):
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS users (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            email TEXT UNIQUE NOT NULL,
            phone TEXT,
            city TEXT,
            country TEXT,
            created_at TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS orders (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            product TEXT NOT NULL,
            amount REAL NOT NULL,
            status TEXT NOT NULL,
            ordered_at TEXT NOT NULL,
            FOREIGN KEY (user_id) REFERENCES users(id)
        );
    """)


def seed_users(conn, fake, n=50):
    users = []
    emails = set()
    # email is UNIQUE in the schema, so skip any address generated before
    while len(users) < n:
        email = fake.safe_email()
        if email not in emails:
            emails.add(email)
            users.append((
                fake.name(),
                email,
                fake.phone_number()[:20],  # defensive; SQLite TEXT has no length limit
                fake.city(),
                fake.country()[:50],
                fake.date_time_this_year().isoformat(),
            ))
    # One bulk insert instead of n separate INSERT statements
    conn.executemany(
        "INSERT INTO users (name, email, phone, city, country, created_at) VALUES (?,?,?,?,?,?)",
        users,
    )
    return len(users)


def seed_orders(conn, fake, n=200):
    products = ["Python Course", "VS Code Theme", "API Access", "Pro License", "Support Plan"]
    statuses = ["pending", "processing", "shipped", "delivered", "cancelled"]
    # Pick owners only from ids that exist, so every order references a real user
    user_ids = [row[0] for row in conn.execute("SELECT id FROM users").fetchall()]
    orders = [(
        fake.random_element(user_ids),
        fake.random_element(products),
        round(fake.random.uniform(9.99, 299.99), 2),
        fake.random_element(statuses),
        fake.date_time_this_year().isoformat(),
    ) for _ in range(n)]
    conn.executemany(
        "INSERT INTO orders (user_id, product, amount, status, ordered_at) VALUES (?,?,?,?,?)",
        orders,
    )
    return len(orders)


# Main seeding script
Faker.seed(0)  # same seed, same data on every run
fake = Faker("en_US")

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when enabled
create_tables(conn)
user_count = seed_users(conn, fake, n=50)
order_count = seed_orders(conn, fake, n=200)
conn.commit()

# Verify and display a sample
print(f"Seeded: {user_count} users, {order_count} orders\n")
print("Sample users:")
for row in conn.execute("SELECT name, email, city FROM users LIMIT 3").fetchall():
    print(f"  {row[0]:<25} {row[1]:<35} {row[2]}")

print("\nRevenue by status:")
for row in conn.execute(
    "SELECT status, COUNT(*) AS cnt, ROUND(SUM(amount), 2) AS total "
    "FROM orders GROUP BY status ORDER BY total DESC"
).fetchall():
    print(f"  {row[0]:<12} {row[1]:>4} orders ${row[2]:>8,.2f}")

conn.close()
```
Output (illustrative; the exact values depend on your Faker version):
```text
Seeded: 50 users, 200 orders

Sample users:
  Jennifer Martinez         jmartinez@example.com               Portland
  Robert Chen               robert.chen42@example.net           Seattle
  Sandra Williams           s.williams@example.org              Denver

Revenue by status:
  delivered      42 orders $5,847.23
  shipped        39 orders $5,234.89
  processing     41 orders $4,923.47
  cancelled      40 orders $4,901.23
  pending        38 orders $4,712.34
```
The email column is declared UNIQUE, so seed_users() tracks already-generated emails in a set and keeps generating until it has enough distinct ones. This pattern (generate, deduplicate, retry) works whenever uniqueness is required; fake.unique.safe_email() is a shorter alternative that does the same bookkeeping for you. The Faker.seed(0) call makes every run produce the same data for a given Faker version, which is critical for test reproducibility.
Frequently Asked Questions
What is the difference between fake.email() and fake.safe_email()?
fake.email() generates addresses with realistic-looking domains that may or may not exist as real domains. fake.safe_email() always uses example.com, example.net, or example.org — domains permanently reserved for documentation and testing by RFC 2606. Use safe_email() in any context where generated emails might accidentally be sent to real recipients.
How do I generate unique values (no duplicates)?
Use the unique proxy: each call to fake.unique.email() returns a value it has not returned before. The proxy tracks previously generated values and retries until it finds a new one; call fake.unique.clear() to reset the tracker. If you request more unique values than a method can produce, Faker gives up after a bounded number of retries and raises a UniquenessException (from faker.exceptions).
Can I generate data in a specific format?
Yes — several methods accept format parameters. fake.date(pattern="%d/%m/%Y") formats dates using strftime patterns. fake.bothify(text="??-###") generates strings where ? is replaced with a random letter and # with a random digit. fake.numerify(text="SKU-#####") replaces # with digits. These are useful for generating IDs, product codes, or any structured string format.
How do I seed Faker for individual unit tests?
Use fake.seed_instance(seed_value) rather than the global Faker.seed(). Call this in your test’s setUp method with a fixed value. This seeds only that Faker instance, so parallel tests using their own instances do not interfere with each other. In pytest, create a fixture that returns a seeded Faker instance.
How do I generate large amounts of data efficiently?
For bulk generation (100,000+ records), avoid a tight loop that calls Faker and then writes one row at a time to the database. Instead, generate records into a list (in batches if memory is a concern) and insert them with conn.executemany(sql, list_of_tuples). For very large volumes, write the data to a CSV file and load it with the database's bulk loader, such as PostgreSQL's COPY or MySQL's LOAD DATA INFILE. Generating values with Faker is rarely the slowest step; per-row I/O usually is, so measure before optimizing.
Conclusion
Faker transforms the chore of creating test data into a one-liner. The key patterns: use Faker.seed() for reproducible test fixtures; use fake.safe_email() instead of fake.email() for safety; use fake.unique.method() for uniqueness constraints; and build custom providers for domain-specific data formats that Faker does not cover.
The database seeder example demonstrates the complete workflow: seed for reproducibility, generate bulk data into lists, use executemany for efficient bulk inserts, and verify with queries. Extend it by adding the de_DE or ja_JP locale to simulate international users, or add a custom provider for your application’s specific data types.
For the full list of providers, locales, and methods, see the Faker documentation and the GitHub repository.