How To Deploy a Python App to AWS Lambda

Skill Level: Intermediate

Serverless Computing Meets Python

AWS Lambda has fundamentally changed how developers deploy backend applications. Instead of managing servers, worrying about scaling, or paying for idle compute time, you write functions and let AWS handle the rest. In this guide, you’ll learn how to take a Python application, package it with dependencies, and deploy it to Lambda where it’ll handle thousands of concurrent requests without you touching a single EC2 instance.

Don’t let the “serverless” terminology intimidate you. You’re still writing Python code; the infrastructure complexity is just abstracted away. We’ll walk through every step from local development to production deployment, and you’ll understand exactly what’s happening at each stage. By the end, you’ll have a working Lambda function integrated with API Gateway that scales automatically, up to your account’s concurrency limits.

Here’s what we’re covering: setting up the AWS CLI, creating a Lambda handler function, managing dependencies with zip files and Lambda Layers, deploying via the command line, integrating with API Gateway for HTTP endpoints, configuring environment variables, and building a real-world URL shortener service. You’ll also see common pitfalls and how to avoid them.

Quick Example: Your First Lambda Function

Let’s skip the theory for a moment and get something working in five minutes. This is your first Lambda function:

# lambda_handler.py
import json
from datetime import datetime

def handler(event, context):
    """Simple Lambda handler that returns current time and a message"""
    return {
        'statusCode': 200,
        'body': json.dumps({
            'message': 'Hello from Lambda',
            'timestamp': datetime.now().isoformat()
        })
    }

Output:

{
  "statusCode": 200,
  "body": "{\"message\": \"Hello from Lambda\", \"timestamp\": \"2026-04-08T14:30:45.123456\"}"
}

That’s it. This function runs on Lambda, can serve HTTP requests through API Gateway, and costs you nothing until someone actually invokes it. The event parameter contains the request data, and context provides runtime information such as the request ID and the remaining execution time.
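Before deploying, the handler can be exercised locally as a plain Python function. The sketch below repeats the handler so it runs on its own and calls it with an empty event and no context; passing None for the context works here only because this particular handler never reads it.

```python
import json
from datetime import datetime


def handler(event, context):
    """Same handler as above, repeated so this snippet is self-contained."""
    return {
        'statusCode': 200,
        'body': json.dumps({
            'message': 'Hello from Lambda',
            'timestamp': datetime.now().isoformat()
        })
    }


# Invoke locally: an empty dict stands in for the event, None for the context.
response = handler({}, None)
payload = json.loads(response['body'])  # body is a JSON string, not a dict
print(response['statusCode'], payload['message'])
```

Note that body is a JSON-encoded string, which is why it has to be decoded again before reading the message.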


Understanding AWS Lambda Architecture

Lambda is Amazon’s serverless compute service. You upload code, set memory and timeout limits, and AWS scales the number of execution environments with incoming requests. You pay only for execution time, billed in 1ms increments. Unlike EC2, where provisioned instances run (and bill) all day, Lambda creates execution environments on demand and reclaims them when they sit idle.

The fundamental unit is the handler function. AWS calls this function whenever an event triggers it–could be an HTTP request through API Gateway, an S3 upload, a scheduled CloudWatch event, or an SQS message. Your function receives the event details and must return a response.
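Because every trigger delivers a differently shaped event, a handler that serves more than one source usually starts by inspecting the payload. The following is a minimal sketch; classify_event is a hypothetical helper name, and it checks only a few well-known keys rather than validating the full event.

```python
def classify_event(event):
    """Guess the trigger type from well-known keys in the event payload."""
    records = event.get('Records')
    if records:
        # S3 and SQS both deliver a 'Records' list; eventSource tells them apart.
        source = records[0].get('eventSource', '')
        if source == 'aws:s3':
            return 's3'
        if source == 'aws:sqs':
            return 'sqs'
    if 'httpMethod' in event:
        # REST API proxy integrations put the HTTP method at the top level.
        return 'api_gateway'
    if event.get('source') == 'aws.events':
        # Scheduled rules arrive with source set to 'aws.events'.
        return 'scheduled'
    return 'unknown'


print(classify_event({'httpMethod': 'GET', 'path': '/'}))
print(classify_event({'Records': [{'eventSource': 'aws:sqs', 'body': 'hi'}]}))
```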

| Aspect | Lambda | Traditional EC2 | Containerized Services |
| --- | --- | --- | --- |
| Scaling | Automatic, instant | Manual or ASG | Manual or orchestrator |
| Cold Starts | 50-500ms first call | None | Can be optimized |
| Pricing | Per invocation + duration | Per instance-hour | Per container hour |
| Management | Code only | Full OS control | Container image |
| Ideal Use | Bursty traffic, microservices | Consistent workloads | Complex deployments |
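To see how the "per invocation + duration" pricing model adds up, here is a rough estimator. The two rates passed in the example are assumptions for illustration only; check the current AWS pricing page for your region, and note that the sketch ignores the free tier.

```python
def estimate_monthly_cost(invocations, avg_duration_ms, memory_mb,
                          price_per_million_requests, price_per_gb_second):
    """Estimate a monthly bill from request count and GB-seconds of compute."""
    request_cost = invocations / 1_000_000 * price_per_million_requests
    # Duration is billed in GB-seconds: seconds of runtime times memory in GB.
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * price_per_gb_second


cost = estimate_monthly_cost(
    invocations=1_000_000, avg_duration_ms=100, memory_mb=256,
    price_per_million_requests=0.20,    # assumed rate, not authoritative
    price_per_gb_second=0.0000166667,   # assumed rate, not authoritative
)
print(f"${cost:.2f}")
```

With these assumed rates, one million 100ms invocations at 256MB come to 25,000 GB-seconds, so duration rather than request count dominates the total.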

Setting Up AWS CLI and Credentials

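Before deploying anything, you need the AWS CLI configured with credentials. Install AWS CLI v2 with the official installer if you haven’t already (pip install awscli only provides the older v1 line), then create an IAM user with Lambda deployment permissions.

# terminal
# Linux x86_64 installer; see the AWS CLI docs for macOS and Windows
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
aws --version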

Output:

aws-cli/2.15.0 Python/3.11.7 Linux/6.1.0-20 botocore/2.32.1

Now configure credentials. Generate an access key in the AWS IAM console, then run:

# terminal
aws configure

You’ll be prompted for Access Key ID and Secret Access Key. Store them securely–never commit them to version control. The AWS CLI stores them in ~/.aws/credentials.

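For deployment, your IAM user needs these permissions: lambda:CreateFunction, lambda:UpdateFunctionCode, iam:PassRole, apigateway:*. The role-creation commands later in this guide also require iam:CreateRole and iam:AttachRolePolicy. Create an inline policy, or use the AWSLambda_FullAccess managed policy (the successor to the deprecated AWSLambdaFullAccess) for development only.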


Creating Your Lambda Handler Function

A Lambda handler is any Python function that AWS Lambda invokes. The signature must accept two parameters: event (contains request data) and context (runtime metadata).

# app.py
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    """
    Main Lambda handler function

    Args:
        event: dict containing request data from trigger source
        context: LambdaContext object with runtime info

    Returns:
        dict with statusCode and body for API Gateway
    """
    try:
        logger.info(f"Received event: {json.dumps(event)}")

        # Parse the JSON body; API Gateway sends None when the request has no body
        body = event.get('body') or '{}'
        if isinstance(body, str):
            body = json.loads(body)

        name = body.get('name', 'World')

        response_data = {
            'message': f'Hello, {name}!',
            'request_id': context.aws_request_id,
            'function_name': context.function_name
        }

        return {
            'statusCode': 200,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps(response_data)
        }

    except Exception as e:
        logger.error(f"Error: {str(e)}")
        return {
            'statusCode': 500,
            'body': json.dumps({'error': 'Internal server error'})
        }

Output:

{
  "statusCode": 200,
  "headers": {"Content-Type": "application/json"},
  "body": "{\"message\": \"Hello, Alice!\", \"request_id\": \"12345-abcde\", \"function_name\": \"my-python-app\"}"
}

Notice the structure: we return a dict with statusCode, headers, and body. This format is specifically for API Gateway proxy integration. The context object provides metadata such as the remaining execution time, the configured memory limit, and the current request ID (aws_request_id) for logging.
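For unit tests it helps to have a stand-in for the context object. The sketch below uses a hypothetical FakeContext class that mimics a few of the real LambdaContext attribute names (function_name, memory_limit_in_mb, aws_request_id, get_remaining_time_in_millis); it is a test double, not the class Lambda itself passes in.

```python
import time


class FakeContext:
    """Hypothetical test double exposing a few real LambdaContext names."""

    def __init__(self, timeout_seconds=30):
        self.function_name = 'my-python-app'
        self.memory_limit_in_mb = 256
        self.aws_request_id = 'local-test-request'
        self._deadline = time.monotonic() + timeout_seconds

    def get_remaining_time_in_millis(self):
        # Milliseconds left before the simulated timeout.
        return max(0, int((self._deadline - time.monotonic()) * 1000))


def describe(context):
    """Collect the metadata a handler might log at the start of a request."""
    return {
        'request_id': context.aws_request_id,
        'function_name': context.function_name,
        'memory_mb': context.memory_limit_in_mb,
        'remaining_ms': context.get_remaining_time_in_millis(),
    }


info = describe(FakeContext(timeout_seconds=30))
print(info)
```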

Packaging Dependencies for Lambda

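Lambda limits deployment packages to 50MB zipped (for direct upload) and 250MB unzipped, including layers. Installing dependencies locally and zipping them is the standard approach. Libraries such as numpy, psycopg2, and cryptography have compiled C extensions that must match Lambda’s Linux environment, so build them on Linux or fetch Linux wheels when packaging from macOS or Windows.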

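# terminal
mkdir lambda_package
cd lambda_package

# Create requirements.txt first so pip has something to install
cat > requirements.txt << 'EOF'
requests==2.31.0
python-dateutil==2.8.2
boto3==1.28.0
EOF

# Install dependencies into the package root, then add the handler
# (assumes app.py sits one directory up)
pip install -r requirements.txt -t .
cp ../app.py .

zip -r function.zip .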

Output:

  adding: app.py (deflated 45%)
  adding: requests/ (stored 0%)
  adding: requests/__init__.py (deflated 52%)
  ...
  adding: botocore/data/sts/2011-06-15/service-2.json (deflated 78%)
     31 files, 8.4 MB compressed into 2.1 MB

The key is installing dependencies with the -t flag, which places them in the current directory. Lambda's runtime will find them automatically when you import them. For larger packages (NumPy, TensorFlow), consider Lambda Layers: a function can attach up to 5 layers, and the function code plus all layers must stay within the 250MB unzipped limit.
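The same packaging step can be scripted in Python, which makes it easy to check the size limits before uploading. This is a sketch using only the standard library; build_package and within_limits are hypothetical helper names, and the demonstration zips a throwaway directory rather than a real project.

```python
import tempfile
import zipfile
from pathlib import Path

MAX_ZIPPED_BYTES = 50 * 1024 * 1024     # direct-upload limit for the .zip
MAX_UNZIPPED_BYTES = 250 * 1024 * 1024  # unzipped limit, layers included


def build_package(source_dir, zip_path):
    """Zip source_dir and return (zipped_bytes, unzipped_bytes)."""
    source = Path(source_dir)
    target = Path(zip_path).resolve()
    unzipped = 0
    with zipfile.ZipFile(target, 'w', zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob('*')):
            if path.is_file() and path.resolve() != target:
                # Store paths relative to the package root so imports resolve.
                archive.write(path, path.relative_to(source))
                unzipped += path.stat().st_size
    return target.stat().st_size, unzipped


def within_limits(zipped, unzipped):
    return zipped <= MAX_ZIPPED_BYTES and unzipped <= MAX_UNZIPPED_BYTES


# Demonstration on a temporary directory containing a single handler file.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / 'pkg'
    pkg.mkdir()
    (pkg / 'app.py').write_text('def lambda_handler(event, context):\n    return {}\n')
    zipped, unzipped = build_package(pkg, Path(tmp) / 'function.zip')
    print(zipped, unzipped, within_limits(zipped, unzipped))
```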


Deploying via AWS CLI

With your code zipped and dependencies included, deploy it. First, create an IAM role that Lambda can assume.

# terminal
# Create the trust policy
cat > trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create the role
aws iam create-role \
  --role-name lambda-execution-role \
  --assume-role-policy-document file://trust-policy.json

# Attach basic execution policy for CloudWatch logs
aws iam attach-role-policy \
  --role-name lambda-execution-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

Output:

{
    "Role": {
        "RoleName": "lambda-execution-role",
        "Arn": "arn:aws:iam::123456789012:role/lambda-execution-role",
        "Path": "/",
        "CreateDate": "2026-04-08T10:30:00+00:00"
    }
}

Now deploy the function:

# terminal
aws lambda create-function \
  --function-name my-python-app \
  --runtime python3.11 \
  --role arn:aws:iam::123456789012:role/lambda-execution-role \
  --handler app.lambda_handler \
  --zip-file fileb://function.zip \
  --timeout 30 \
  --memory-size 256

Output:

{
    "FunctionName": "my-python-app",
    "FunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:my-python-app",
    "Runtime": "python3.11",
    "Handler": "app.lambda_handler",
    "CodeSize": 2157812,
    "MemorySize": 256,
    "Timeout": 30,
    "LastModified": "2026-04-08T10:35:22.000+0000"
}

The --handler parameter points to your function: filename.function_name. Updates are just as easy:
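The filename.function_name convention can be illustrated with a few lines of Python. This sketch only mirrors the naming rule; resolve_handler is a hypothetical helper, not Lambda's actual loader, and json.dumps stands in for app.lambda_handler so the snippet runs anywhere.

```python
import importlib


def resolve_handler(handler_string):
    """Split 'module.function' on the last dot and import the function."""
    module_name, _, function_name = handler_string.rpartition('.')
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


# 'json.dumps' stands in for 'app.lambda_handler' in this demonstration.
func = resolve_handler('json.dumps')
print(func({'ok': True}))
```

A typo in either half of the string surfaces as an import or attribute error, which is a common cause of functions that deploy cleanly but fail on first invocation.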

# terminal
aws lambda update-function-code \
  --function-name my-python-app \
  --zip-file fileb://function.zip

Integrating with API Gateway

Raw Lambda invocations are fine for internal triggers, but to handle HTTP requests, you need API Gateway. This acts as the front door, converting HTTP requests into Lambda events.
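To make that conversion concrete, here is a trimmed-down sketch of the event a proxy integration might hand to the handler for a POST request, along with a small body parser. The sample_event dict is illustrative and omits most fields a real event carries; parse_proxy_body is a hypothetical helper name.

```python
import base64
import json


def parse_proxy_body(event):
    """Return the request body of a proxy event as a dict (empty if absent)."""
    raw = event.get('body')
    if not raw:
        return {}
    if event.get('isBase64Encoded'):
        # Binary-safe payloads arrive base64-encoded.
        raw = base64.b64decode(raw).decode('utf-8')
    return json.loads(raw)


# Illustrative event for POST /greet; real events include many more fields.
sample_event = {
    'httpMethod': 'POST',
    'path': '/greet',
    'headers': {'Content-Type': 'application/json'},
    'queryStringParameters': None,
    'body': '{"name": "Alice"}',
    'isBase64Encoded': False,
}

print(parse_proxy_body(sample_event))
```

The commands that follow create the API that produces events of this kind.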

# terminal
# Create REST API
API_ID=$(aws apigateway create-rest-api \
  --name my-python-app-api \
  --description "API for Python Lambda app" \
  --query 'id' --output text)

echo "API ID: $API_ID"

# Get root resource
ROOT_ID=$(aws apigateway get-resources \
  --rest-api-id $API_ID \
  --query 'items[0].id' --output text)

# Create resource
RESOURCE_ID=$(aws apigateway create-resource \
  --rest-api-id $API_ID \
  --parent-id $ROOT_ID \
  --path-part "greet" \
  --query 'id' --output text)

# Create POST method
aws apigateway put-method \
  --rest-api-id $API_ID \
  --resource-id $RESOURCE_ID \
  --http-method POST \
  --authorization-type NONE

Then grant API Gateway permission to invoke Lambda:

# terminal
aws lambda add-permission \
  --function-name my-python-app \
  --statement-id AllowAPIGatewayInvoke \
  --action lambda:InvokeFunction \
  --principal apigateway.amazonaws.com

Wire the API to Lambda and deploy:

# terminal
# Set Lambda as integration
aws apigateway put-integration \
  --rest-api-id $API_ID \
  --resource-id $RESOURCE_ID \
  --http-method POST \
  --type AWS_PROXY \
  --integration-http-method POST \
  --uri arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:123456789012:function:my-python-app/invocations

# Deploy the API
aws apigateway create-deployment \
  --rest-api-id $API_ID \
  --stage-name prod

Output:

{
    "id": "abc123def456",
    "createdDate": "2026-04-08T11:00:00+00:00"
}

Your API endpoint is now available at https://{api-id}.execute-api.us-east-1.amazonaws.com/prod/greet. Send a POST request with JSON body:

# terminal
curl -X POST https://abc123.execute-api.us-east-1.amazonaws.com/prod/greet \
  -H "Content-Type: application/json" \
  -d '{"name": "Alice"}'

Output:

{"message":"Hello, Alice!","request_id":"req-12345","function_name":"my-python-app"}

Managing Environment Variables and Secrets

Never hardcode API keys or database passwords. Lambda supports environment variables for configuration.

# terminal
aws lambda update-function-configuration \
  --function-name my-python-app \
  --environment Variables="{ENVIRONMENT=production,LOG_LEVEL=INFO,DATABASE_HOST=mydb.us-east-1.rds.amazonaws.com}"

Access them in your code:

# app.py
import os
import json

def lambda_handler(event, context):
    env = os.getenv('ENVIRONMENT', 'development')
    log_level = os.getenv('LOG_LEVEL', 'INFO')
    db_host = os.getenv('DATABASE_HOST')

    return {
        'statusCode': 200,
        'body': json.dumps({
            'environment': env,
            'db_host': db_host
        })
    }
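Reading each variable ad hoc works, but a missing value then only surfaces deep inside a request. One possible alternative is to load and validate configuration in one place; the sketch below uses a hypothetical Settings dataclass and load_settings helper, and treats DATABASE_HOST as required purely as an assumption for the example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    environment: str
    log_level: str
    database_host: str


def load_settings(env=os.environ):
    """Build Settings from a mapping, failing fast if DATABASE_HOST is unset."""
    host = env.get('DATABASE_HOST')
    if not host:
        raise RuntimeError('DATABASE_HOST is not set')
    return Settings(
        environment=env.get('ENVIRONMENT', 'development'),
        log_level=env.get('LOG_LEVEL', 'INFO'),
        database_host=host,
    )


# A plain dict stands in for os.environ so the example runs anywhere.
settings = load_settings({'DATABASE_HOST': 'mydb.us-east-1.rds.amazonaws.com'})
print(settings)
```

Calling load_settings() at module level means a misconfigured function fails during initialization, where the error is easy to spot in the logs.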

For sensitive data, use AWS Secrets Manager or Systems Manager Parameter Store instead of plaintext environment variables:

# app.py
import json
import boto3

secrets_client = boto3.client('secretsmanager')

def get_database_password():
    """Retrieve password from Secrets Manager"""
    try:
        response = secrets_client.get_secret_value(
            SecretId='prod/database/password'
        )
        return json.loads(response['SecretString'])['password']
    except Exception as e:
        print(f"Error retrieving secret: {e}")
        raise

def lambda_handler(event, context):
    db_password = get_database_password()
    # Use password safely
    return {'statusCode': 200, 'body': 'OK'}
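As written, the handler above calls Secrets Manager on every invocation. Since warm execution environments keep module-level state between invocations, a small time-based cache can cut those calls. The sketch below is one way to do it; CachedValue is a hypothetical helper, and fake_fetch stands in for get_database_password so the example runs without AWS access.

```python
import time


class CachedValue:
    """Cache the result of fetch() for ttl_seconds between calls."""

    def __init__(self, fetch, ttl_seconds=300):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._value = None
        self._expires_at = float('-inf')  # forces a fetch on first use

    def get(self):
        now = time.monotonic()
        if now >= self._expires_at:
            # Cache miss or expired entry: call the underlying fetch function.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value


calls = []


def fake_fetch():
    """Stand-in for get_database_password(); records each call."""
    calls.append(1)
    return 's3cr3t'


password = CachedValue(fake_fetch, ttl_seconds=300)
first, second = password.get(), password.get()
print(first == second, len(calls))
```

The TTL is a trade-off: a longer value means fewer API calls, a shorter one means rotated secrets are picked up sooner.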

Using Lambda Layers for Shared Dependencies

If you have multiple Lambda functions sharing libraries, Lambda Layers avoid duplication. A layer is a zip file containing code or libraries that all your functions can access.

# terminal
mkdir -p lambda_layer/python/lib/python3.11/site-packages

pip install requests numpy -t lambda_layer/python/lib/python3.11/site-packages/

cd lambda_layer
zip -r requests_numpy_layer.zip python
aws lambda publish-layer-version \
  --layer-name shared-dependencies \
  --zip-file fileb://requests_numpy_layer.zip \
  --compatible-runtimes python3.11

Output:

{
    "LayerVersionArn": "arn:aws:lambda:us-east-1:123456789012:layer:shared-dependencies:1",
    "Version": 1,
    "CodeSize": 12457283
}

Now attach this layer to your function:

# terminal
aws lambda update-function-configuration \
  --function-name my-python-app \
  --layers arn:aws:lambda:us-east-1:123456789012:layer:shared-dependencies:1

Your function immediately gains access to requests and numpy without bundling them in your deployment package.


Real-World Example: Serverless URL Shortener

Let's build a practical service that shortens URLs and redirects them. It uses DynamoDB for storage and API Gateway for HTTP endpoints.

# url_shortener.py
import json
import uuid
import boto3
import logging
from datetime import datetime, timedelta

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('url-mappings')
logger = logging.getLogger()

def generate_short_code(length=6):
    """Generate a random short code"""
    # .hex omits hyphens, so longer codes stay purely alphanumeric
    return uuid.uuid4().hex[:length]

def create_short_url(original_url, custom_alias=None):
    """Store mapping and return short code"""
    short_code = custom_alias or generate_short_code()

    try:
        table.put_item(
            Item={
                'short_code': short_code,
                'original_url': original_url,
                'created_at': datetime.now().isoformat(),
                'expires_at': (datetime.now() + timedelta(days=365)).isoformat(),
                'click_count': 0
            },
            ConditionExpression='attribute_not_exists(short_code)'
        )
        return short_code
    except Exception as e:
        logger.error(f"Error creating mapping: {e}")
        raise

def get_redirect_url(short_code):
    """Retrieve original URL and increment click count"""
    try:
        response = table.get_item(Key={'short_code': short_code})

        if 'Item' not in response:
            return None

        item = response['Item']

        # Increment click counter
        table.update_item(
            Key={'short_code': short_code},
            UpdateExpression='SET click_count = click_count + :inc',
            ExpressionAttributeValues={':inc': 1}
        )

        return item['original_url']
    except Exception as e:
        logger.error(f"Error retrieving URL: {e}")
        return None

def lambda_handler(event, context):
    """Handle shorten and redirect requests"""
    path = event.get('path', '/')
    method = event.get('httpMethod', 'GET')

    try:
        if path == '/shorten' and method == 'POST':
            # 'body' is None when the request has no payload
            body = json.loads(event.get('body') or '{}')
            original_url = body.get('url')
            custom_alias = body.get('alias')

            if not original_url:
                return {
                    'statusCode': 400,
                    'body': json.dumps({'error': 'URL is required'})
                }

            short_code = create_short_url(original_url, custom_alias)
            short_url = f"https://short.example.com/{short_code}"

            return {
                'statusCode': 201,
                'body': json.dumps({
                    'short_url': short_url,
                    'short_code': short_code
                })
            }

        elif path.startswith('/') and method == 'GET':
            short_code = path.lstrip('/')

            if not short_code:
                return {
                    'statusCode': 400,
                    'body': json.dumps({'error': 'Short code required'})
                }

            original_url = get_redirect_url(short_code)

            if not original_url:
                return {
                    'statusCode': 404,
                    'body': json.dumps({'error': 'Short URL not found'})
                }

            return {
                'statusCode': 301,
                'headers': {'Location': original_url}
            }

        else:
            return {
                'statusCode': 400,
                'body': json.dumps({'error': 'Invalid request'})
            }

    except Exception as e:
        logger.error(f"Handler error: {e}")
        return {
            'statusCode': 500,
            'body': json.dumps({'error': 'Internal server error'})
        }

Test Requests:

# Create short URL
curl -X POST https://api.example.com/shorten \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.example.com/very/long/article/path", "alias": "article123"}'

# Response
{"short_url":"https://short.example.com/article123","short_code":"article123"}

# Redirect
curl -L https://short.example.com/article123
# Follows 301 redirect to original URL

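This example demonstrates several key Lambda patterns: DynamoDB integration, handling multiple HTTP methods, conditional writes to prevent duplicates, and atomic operations for click counting. To deploy it, first create a DynamoDB table named url-mappings with short_code as its partition key, grant the function's execution role access to that table, then zip the code. The Lambda Python runtimes already include boto3, so bundling it is only needed when you want to pin a specific version.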
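In the code above, a collision between two randomly generated short codes makes the conditional write fail and surfaces as a 500 error. One possible refinement is to retry with a fresh code. The sketch below shows the idea with an in-memory dict standing in for the DynamoDB table; put_if_absent, create_with_retry, and DuplicateCode are hypothetical names, and the dict check only imitates the attribute_not_exists condition.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


class DuplicateCode(Exception):
    """Raised when a short code already exists (mirrors a failed condition)."""


def put_if_absent(store, short_code, url):
    """In-memory stand-in for put_item with attribute_not_exists(short_code)."""
    if short_code in store:
        raise DuplicateCode(short_code)
    store[short_code] = url


def random_code(length=6):
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))


def create_with_retry(store, url, attempts=5, make_code=random_code):
    """Generate codes until one is unused, giving up after `attempts` tries."""
    for _ in range(attempts):
        code = make_code()
        try:
            put_if_absent(store, code, url)
            return code
        except DuplicateCode:
            continue  # collision: try a fresh code
    raise RuntimeError('could not allocate a unique short code')


# Force one collision to show the retry path.
store = {'aaaaaa': 'https://taken.example.com'}
codes = iter(['aaaaaa', 'bbbbbb'])
code = create_with_retry(store, 'https://www.example.com', make_code=lambda: next(codes))
print(code)
```

Custom aliases are a different case: a duplicate alias should be reported to the caller rather than retried.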


Frequently Asked Questions

What causes cold starts and can I eliminate them?

Cold starts occur when Lambda initializes a new execution environment, adding 50-500ms latency. They happen when no warm containers are available. Provisioned Concurrency eliminates cold starts by keeping instances warm, but costs extra. Alternatively, keep your functions small and fast--warm starts (reusing existing containers) are virtually free.
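The difference between cold and warm invocations is visible from inside the function, because module-level code runs once per execution environment while the handler runs on every call. A minimal sketch, with the two local calls simulating a cold invocation followed by a warm one:

```python
import time

# Module-level code runs once per execution environment (the cold start).
_INITIALIZED_AT = time.time()
_invocation_count = 0


def handler(event, context):
    """Report whether this invocation was the first in its environment."""
    global _invocation_count
    _invocation_count += 1
    return {
        'cold_start': _invocation_count == 1,
        'environment_age_seconds': round(time.time() - _INITIALIZED_AT, 3),
    }


first = handler({}, None)
second = handler({}, None)
print(first['cold_start'], second['cold_start'])
```

This is also why expensive setup such as SDK clients belongs at module level: it is paid once per environment rather than once per request.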

Can I test Lambda functions locally?

Use the AWS SAM CLI (Serverless Application Model) to run functions locally with Lambda emulation. Install it, then run sam local start-api to test API Gateway integration. You can also invoke functions directly with sam local invoke. It's not 100% identical to production AWS Lambda, but it's close enough for development.

How do I handle long-running tasks in Lambda?

Lambda has a 15-minute timeout maximum. For longer tasks, decouple using SQS or SNS: return a quick acknowledgment to the caller, queue the work, then process it asynchronously. Or run your code on EC2 or ECS triggered by Lambda. For data processing, consider AWS Batch or Step Functions for orchestration.
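For batch-style work that might approach the limit, a function can watch its own time budget and hand back whatever it did not finish. The sketch below assumes a safety margin of 10 seconds; process_with_budget and StubContext are hypothetical names, and the stub simply pretends that 4 seconds pass between checks.

```python
def process_with_budget(items, context, work, safety_margin_ms=10_000):
    """Process items until time runs low; return (done, leftover)."""
    done = []
    for index, item in enumerate(items):
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            # Stop early so the caller can re-queue what is left.
            return done, items[index:]
        done.append(work(item))
    return done, []


class StubContext:
    """Hypothetical context whose remaining time drops by 4 s per check."""

    def __init__(self, remaining_ms):
        self.remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        current = self.remaining_ms
        self.remaining_ms -= 4_000
        return current


done, leftover = process_with_budget([1, 2, 3, 4], StubContext(20_000), lambda x: x * 2)
print(done, leftover)
```

In a real function the leftover items would typically be sent back to a queue so a later invocation can pick them up.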

What's the difference between Lambda and containers?

Lambda is fully managed--you upload code and don't worry about infrastructure. Containers (ECS/EKS) give you more control but require managing clusters. Lambda scales automatically up to your account's concurrency quota; containers require capacity planning. Use Lambda for bursty microservices; use containers for consistent workloads or when you need specific OS-level control.

How do I debug Lambda functions in production?

CloudWatch Logs capture all print statements and exceptions. Use aws logs tail /aws/lambda/my-function --follow to stream logs. Add structured logging with JSON output for easier parsing. Lambda Insights provides additional metrics and performance analysis. X-Ray integrates with Lambda to trace requests across services.
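Structured logging can be done with the standard library alone. The following sketch defines a hypothetical JsonFormatter that emits one JSON object per log record; an in-memory stream stands in for stdout here so the output can be inspected, and the field names are an arbitrary choice.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        entry = {
            'level': record.levelname,
            'message': record.getMessage(),
            'logger': record.name,
        }
        if record.exc_info:
            entry['exception'] = self.formatException(record.exc_info)
        return json.dumps(entry)


stream = io.StringIO()  # stands in for stdout in this demonstration
stream_handler = logging.StreamHandler(stream)
stream_handler.setFormatter(JsonFormatter())

log = logging.getLogger('shortener')
log.setLevel(logging.INFO)
log.addHandler(stream_handler)
log.propagate = False  # avoid duplicate output through the root logger

log.info('created short code %s', 'article123')
entry = json.loads(stream.getvalue())
print(entry)
```

Once every line is valid JSON, log queries can filter on fields such as level instead of matching free text.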

Can I use async/await with Lambda?

Yes, Python async/await works fine in Lambda. However, ensure your handler function itself is not async (Lambda doesn't await it). Call async functions using asyncio.run(). For truly asynchronous patterns, use SQS with Lambda's batch message processor or invoke Lambda asynchronously with InvocationType=Event.
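A minimal sketch of that pattern: the handler stays synchronous and uses asyncio.run() to drive the coroutines. Here asyncio.sleep stands in for real network calls, and fetch_one and main are hypothetical names.

```python
import asyncio


async def fetch_one(name, delay):
    """Simulated I/O call; asyncio.sleep stands in for a network request."""
    await asyncio.sleep(delay)
    return f'{name}: done'


async def main(event):
    # Run the simulated calls concurrently and collect results in order.
    names = event.get('names', [])
    return await asyncio.gather(*(fetch_one(n, 0.01) for n in names))


def lambda_handler(event, context):
    """Synchronous entry point that drives the async code to completion."""
    results = asyncio.run(main(event))
    return {'statusCode': 200, 'results': results}


response = lambda_handler({'names': ['a', 'b']}, None)
print(response)
```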

Wrapping Up

Deploying Python to AWS Lambda eliminates infrastructure headaches. You've learned the complete pipeline: writing handlers, packaging dependencies, using Lambda Layers to share code, integrating with API Gateway for HTTP endpoints, managing secrets securely, and building real-world services like a URL shortener.

The serverless model isn't suitable for every workload--continuous background services are cheaper on EC2, and processes exceeding 15 minutes require different architectures. But for APIs, webhooks, microservices, and event-driven workflows, Lambda is unbeatable for simplicity and cost efficiency.

Next steps: explore Lambda@Edge for CDN functions, Step Functions for orchestrating multi-function workflows, and EventBridge for decoupling event sources. Check the official AWS Lambda documentation at docs.aws.amazon.com/lambda and the Boto3 documentation for Python SDK reference.

How To Dockerize a Python Application

Skill Level: Intermediate

You have built a Python application that works perfectly on your machine. Then you deploy it to a server, and everything breaks — different Python version, missing system libraries, conflicting dependencies. This scenario plays out daily across development teams worldwide, and Docker solves it completely. By packaging your application with its exact runtime environment, Docker guarantees that what works on your laptop works identically in production.

Docker is free, runs on all major operating systems, and requires no special Python knowledge beyond what you already have. You will need Docker Desktop installed on your machine (available from docs.docker.com), and a basic Python application to containerize. If you do not have one, we will create a simple Flask app from scratch in this tutorial.

In this article, you will learn how to write a Dockerfile for Python applications, build and run Docker images, use multi-stage builds to keep images small, manage dependencies properly, set up Docker Compose for multi-container apps, and follow production-ready best practices. By the end, you will be able to containerize any Python project with confidence.

Dockerizing a Python App: Quick Example

Before we dive deep, here is the fastest way to containerize a Python script. Create these three files in an empty directory:

# app.py
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return {"message": "Hello from Docker!", "status": "running"}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

# Dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["python", "app.py"]

# requirements.txt
flask==3.0.0

Now build and run it with two commands:

# terminal commands
docker build -t my-python-app .
docker run -p 5000:5000 my-python-app

Output:

 * Serving Flask app 'app'
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://172.17.0.2:5000

Visit http://localhost:5000 in your browser and you will see {"message": "Hello from Docker!", "status": "running"}. That is your Python app running inside a container — isolated, reproducible, and ready to deploy anywhere Docker runs. The rest of this article explains every piece in detail and shows you how to handle real-world scenarios.

What Is Docker and Why Use It for Python?

Docker is a platform that packages applications into lightweight, portable containers. A container includes your code, Python runtime, system libraries, and dependencies — everything needed to run your app. Unlike virtual machines, containers share the host OS kernel, making them start in seconds and use minimal resources.

For Python developers specifically, Docker solves several painful problems. It eliminates “works on my machine” issues by ensuring identical environments everywhere. It prevents dependency conflicts between projects without needing virtual environments on the host. It makes deployment as simple as shipping a single image file. And it lets you run different Python versions side by side without pyenv or system-level changes.

| Concept | Docker | Virtual Environment (venv) |
| --- | --- | --- |
| Isolates Python packages | Yes | Yes |
| Isolates system libraries | Yes | No |
| Isolates Python version | Yes | No |
| Isolates OS | Yes | No |
| Portable across machines | Yes | No |
| Adds overhead | Minimal (~50MB base) | None |
| Learning curve | Moderate | Low |

Think of Docker as a virtual environment on steroids — it does not just isolate your pip packages, it isolates the entire operating system layer. This makes it the standard tool for deploying Python applications in production.

Understanding Dockerfiles: Line by Line

A Dockerfile is a text file with instructions that tell Docker how to build your image. Each instruction creates a layer, and Docker caches these layers to speed up subsequent builds. Let us break down every line from our quick example:

# Dockerfile
FROM python:3.12-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 5000

CMD ["python", "app.py"]

Line-by-line explanation:

FROM python:3.12-slim — This sets the base image. The slim variant includes Python and minimal system packages, keeping your image small (~150MB vs ~900MB for the full image). Always pin your Python version to avoid surprises when a new release comes out.

WORKDIR /app — Sets the working directory inside the container. All subsequent commands run from this path. If the directory does not exist, Docker creates it.

COPY requirements.txt . — Copies only the requirements file first. This is a deliberate optimization — Docker caches each layer, so if your requirements have not changed, it skips the pip install step on rebuild.

RUN pip install --no-cache-dir -r requirements.txt — Installs dependencies. The --no-cache-dir flag prevents pip from storing downloaded packages in the cache, reducing image size.

COPY . . — Copies the rest of your application code. This comes after pip install so that code changes do not trigger a full dependency reinstall.

EXPOSE 5000 — Documents which port the container listens on. This does not actually publish the port — you still need -p 5000:5000 when running the container.

CMD ["python", "app.py"] — The default command that runs when the container starts. Use the exec form (JSON array) rather than shell form for proper signal handling.
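The effect of instruction ordering on caching can be shown with a toy model. The sketch below is not Docker's actual cache implementation; it only assumes that a layer is reusable when its instruction, its input content, and every earlier layer are unchanged, which it models by chaining a hash through the steps. layer_keys and reused_layers are hypothetical names.

```python
import hashlib


def layer_keys(steps):
    """Chain a hash through the steps: each key depends on all earlier steps."""
    keys, previous = [], ''
    for instruction, content in steps:
        digest = hashlib.sha256(f'{previous}|{instruction}|{content}'.encode()).hexdigest()
        keys.append(digest)
        previous = digest
    return keys


def reused_layers(old_steps, new_steps):
    """Count leading layers whose keys match between two builds."""
    count = 0
    for old, new in zip(layer_keys(old_steps), layer_keys(new_steps)):
        if old != new:
            break
        count += 1
    return count


build_1 = [
    ('FROM python:3.12-slim', ''),
    ('COPY requirements.txt .', 'flask==3.0.0'),
    ('RUN pip install', ''),
    ('COPY . .', 'app v1'),
]
code_change = build_1[:3] + [('COPY . .', 'app v2')]
deps_change = [build_1[0], ('COPY requirements.txt .', 'flask==3.0.1')] + build_1[2:]

print(reused_layers(build_1, code_change))  # code edit: dependency layers survive
print(reused_layers(build_1, deps_change))  # requirements edit: pip install reruns
```

In this model a code-only change keeps the first three layers, while a requirements change invalidates everything after the base image, which matches the reasoning behind copying requirements.txt first.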


Choosing the Right Python Base Image

The base image you choose significantly affects your container’s size, security, and compatibility. Python offers several official variants on Docker Hub:

| Image Tag | Size | Includes | Best For |
| --- | --- | --- | --- |
| python:3.12 | ~900MB | Full Debian + build tools | Apps needing C compilation |
| python:3.12-slim | ~150MB | Minimal Debian | Most web apps (recommended) |
| python:3.12-alpine | ~50MB | Alpine Linux (musl libc) | Tiny images (watch for compatibility) |
| python:3.12-bookworm | ~900MB | Debian Bookworm + build tools | Specific Debian version needed |

For most Python web applications, python:3.12-slim is the best starting point. It includes enough system libraries to install common packages like psycopg2-binary and Pillow without being bloated. The alpine variant looks attractive at 50MB, but it uses musl libc instead of glibc, which can cause subtle compatibility issues with some Python packages — especially those with C extensions like numpy or pandas.

When Alpine Actually Makes Sense

Alpine works well for pure-Python applications with no C extensions. If your app only uses packages like Flask, requests, and click, Alpine gives you the smallest possible image. But the moment you need numpy, pandas, or any package that compiles C code, you will spend more time fighting build issues than you save on image size.

# Dockerfile.alpine
# This Dockerfile works great for pure-Python apps
FROM python:3.12-alpine
WORKDIR /app
COPY requirements.txt .
# If requirements.txt contains numpy, you need build tools first:
# RUN apk add --no-cache gcc musl-dev linux-headers
# This adds complexity and build time
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "app.py"]

Managing Dependencies in Docker

Proper dependency management is the difference between a Docker image that builds reliably and one that breaks randomly. The key principle is deterministic builds — every build should install the exact same package versions.

Pin Every Version

Never use unpinned requirements in Docker. A requirements.txt that says flask without a version will install whatever the latest version is at build time. This means your image built today might behave differently from one built tomorrow.

# requirements.txt (alternatives shown as comments)
# BAD - unpinned versions
# flask
# requests
# sqlalchemy

# GOOD - pinned versions
# flask==3.0.0
# requests==2.31.0
# sqlalchemy==2.0.23

# BEST - use pip freeze to capture exact versions
# pip freeze > requirements.txt

Output:

# Output of pip freeze (example)
blinker==1.7.0
certifi==2023.11.17
charset-normalizer==3.3.2
click==8.1.7
flask==3.0.0
idna==3.6
itsdangerous==2.1.2
Jinja2==3.1.2
MarkupSafe==2.1.3
requests==2.31.0
urllib3==2.1.0
Werkzeug==3.0.1

Running pip freeze captures every installed package including transitive dependencies. This guarantees reproducible builds. For even more control, consider using pip-compile from the pip-tools package, which generates a locked requirements file from a high-level requirements.in.
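A small check in CI can catch unpinned entries before they reach a build. The sketch below is a deliberately simple approximation: it accepts only name==version lines (optionally with extras), skips pip option lines, and does not attempt to parse the full requirements file syntax. unpinned_requirements is a hypothetical helper name.

```python
import re

# name, optional [extras], then an exact '==' pin
PINNED = re.compile(r'^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^=\s]+')


def unpinned_requirements(text):
    """Return requirement lines that lack an exact '==' version pin."""
    problems = []
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith('-'):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


sample = """flask==3.0.0
requests
sqlalchemy>=2.0
"""
print(unpinned_requirements(sample))
```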


Using .dockerignore

Just like .gitignore keeps files out of your repository, .dockerignore keeps files out of your Docker build context. Without it, Docker sends everything in your project directory to the Docker daemon, including large files that are not needed in the container.

# .dockerignore
__pycache__/
*.pyc
*.pyo
.git/
.gitignore
.env
.venv/
venv/
node_modules/
*.md
.pytest_cache/
.mypy_cache/
docker-compose*.yml
Dockerfile*
.dockerignore

This file reduces build context size and prevents sensitive files (like .env with secrets) from being copied into your image. Always create a .dockerignore file before building your first image.

Multi-Stage Builds for Smaller Images

Multi-stage builds are a Docker feature that lets you use multiple FROM statements in a single Dockerfile. This is powerful for Python because you can compile dependencies in a full build environment, then copy only the results into a slim runtime image.

# Dockerfile.multistage
# Stage 1: Build stage with full toolchain
FROM python:3.12 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: Runtime stage with minimal image
FROM python:3.12-slim
WORKDIR /app

# Copy installed packages from builder
COPY --from=builder /install /usr/local

# Copy application code
COPY . .

EXPOSE 5000
CMD ["python", "app.py"]

Output (comparing image sizes):

# Single-stage build
my-app-single    latest    892MB

# Multi-stage build
my-app-multi     latest    167MB

The multi-stage build produces an image that is over 5 times smaller. The first stage (builder) has gcc, make, and other build tools needed to compile C extensions. The second stage only includes the compiled packages and your code. Build tools, source files, and pip cache are all left behind in the builder stage.

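This technique is especially valuable when your dependencies include packages like psycopg2 (needs libpq-dev), Pillow (needs libjpeg), or cryptography (needs OpenSSL headers) and have to be built from source. You compile in the full image and run in the slim image. One caveat: a package that links against a system library still needs that shared library at runtime, so the slim stage may need the runtime package (for example libpq5 for psycopg2) even though the -dev headers stay behind in the builder.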

Docker Compose for Multi-Container Apps

Real applications rarely run in isolation. Your Python app probably needs a database, a cache layer, or a message queue. Docker Compose lets you define and run multi-container applications with a single YAML file.

# docker-compose.yml
version: "3.9"

services:
  web:
    build: .
    ports:
      - "5000:5000"
    environment:
      - DATABASE_URL=postgresql://user:password@db:5432/myapp
      - REDIS_URL=redis://cache:6379/0
    depends_on:
      - db
      - cache
    volumes:
      - .:/app

  db:
    image: postgres:16-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=myapp
    volumes:
      - pgdata:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  cache:
    image: redis:7-alpine
    ports:
      - "6379:6379"

volumes:
  pgdata:

Output (docker compose up):

$ docker compose up
[+] Running 3/3
 - Container myapp-db-1     Started
 - Container myapp-cache-1  Started
 - Container myapp-web-1    Started
Attaching to myapp-cache-1, myapp-db-1, myapp-web-1
myapp-db-1     | PostgreSQL init process complete; ready for start up.
myapp-cache-1  | Ready to accept connections
myapp-web-1    |  * Running on http://0.0.0.0:5000

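With one docker compose up command, you get a Python web app, PostgreSQL database, and Redis cache all running together. The depends_on directive starts the db and cache containers before your app, but it only controls start order: it does not wait for PostgreSQL to be ready to accept connections, so your app should retry its first connection or the db service should define a health check. The volumes section persists database data between restarts and mounts your source code for live reloading during development.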
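
Because depends_on controls start order rather than readiness, the web container can come up before PostgreSQL accepts connections. One common workaround is to wait for the port before the app starts. The sketch below assumes a hypothetical wait_for_port helper; a successful TCP connection only shows that something is listening, not that the database has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Demo against a throwaway local listener so the sketch runs anywhere.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen()
print(wait_for_port("127.0.0.1", listener.getsockname()[1], timeout=2))  # True
listener.close()
```

Inside the Compose setup above, the equivalent call would be wait_for_port("db", 5432), since db is the service name and 5432 is the PostgreSQL port.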

Docker Compose orchestrating multiple services
docker compose up — one command, three services, zero excuses.

Production Best Practices

Development Dockerfiles and production Dockerfiles have different priorities. In development, you want fast rebuilds and live reloading. In production, you want small images, security, and reliability.

Run as Non-Root User

By default, containers run as root. This is a security risk — if an attacker exploits your app, they have root access inside the container. Always create and switch to a non-root user:

# Dockerfile.production
FROM python:3.12-slim

# Create non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Change ownership and switch user
RUN chown -R appuser:appuser /app
USER appuser

EXPOSE 5000
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "app:app"]

Notice we also switched from the Flask development server to Gunicorn for production, so gunicorn needs to be listed in requirements.txt. The Flask development server is built for convenience rather than for security or performance and is not designed for production traffic. Gunicorn runs multiple worker processes to handle concurrent requests.

Add Health Checks

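Health checks tell Docker whether your application is actually working, not just running. Kubernetes ignores the Dockerfile HEALTHCHECK instruction and relies on its own liveness and readiness probes, which can call the same /health endpoint: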

# Dockerfile.healthcheck
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/health')" || exit 1

EXPOSE 5000
CMD ["python", "app.py"]

Output (docker inspect):

$ docker inspect --format='{{.State.Health.Status}}' my-container
healthy

The health check hits your /health endpoint every 30 seconds. If it fails 3 times in a row, Docker marks the container as unhealthy. Plain Docker only reports that status; an orchestrator such as Docker Swarm can then replace the container or route traffic elsewhere.
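
The one-line python -c command works, but it becomes hard to read once you want a timeout or a status check. A possible alternative is a small script that the HEALTHCHECK instruction calls instead. This is a sketch under assumptions: the is_healthy and main names and the healthcheck.py file are hypothetical, and the URL matches the /health route used in this guide.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses (HTTPError)
        return False


def main(url: str = "http://localhost:5000/health") -> int:
    """Exit code for HEALTHCHECK: 0 means healthy, 1 means unhealthy."""
    return 0 if is_healthy(url) else 1
```

Saved as healthcheck.py with sys.exit(main()) at the bottom, the script could be invoked as CMD python healthcheck.py in the HEALTHCHECK instruction.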

Handle Secrets with Environment Variables

Never bake secrets into your Docker image. Anyone who pulls your image can extract them. Use environment variables instead:

# config.py
import os

DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///local.db")
SECRET_KEY = os.environ.get("SECRET_KEY", "dev-only-secret")
DEBUG = os.environ.get("DEBUG", "false").lower() == "true"

print(f"Database: {DATABASE_URL.split('@')[-1] if '@' in DATABASE_URL else DATABASE_URL}")
print(f"Debug mode: {DEBUG}")

Output:

Database: db:5432/myapp
Debug mode: False

Pass environment variables at runtime with docker run -e SECRET_KEY=mysecret or through Docker Compose’s environment section. For sensitive values in production, use Docker secrets or your cloud provider’s secrets manager.
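
Docker secrets are mounted inside the container as files under /run/secrets/, so the config module needs a way to read them. The helper below is a minimal sketch: read_secret is a hypothetical function, and the convention of naming the secret file after the lowercased variable name is an assumption, not a Docker rule.

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(name: str, secrets_dir: str = "/run/secrets",
                default: Optional[str] = None) -> Optional[str]:
    """Prefer a mounted secret file, then an environment variable, then the default."""
    secret_file = Path(secrets_dir) / name.lower()  # assumed naming convention
    if secret_file.is_file():
        return secret_file.read_text().strip()      # strip the trailing newline
    return os.environ.get(name, default)


# Falls back to the development default when neither source is present.
SECRET_KEY = read_secret("SECRET_KEY", default="dev-only-secret")
```

With this in place the same image works locally with docker run -e and in production with a mounted secret, without code changes.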

Real-Life Example: Dockerized Task Tracker API

Deploying Dockerized Python app to production
From localhost to the cloud in one docker push.

Let us build a complete, production-ready Dockerized application — a task tracker API with Flask, SQLite, and proper project structure:

# task_tracker.py
from flask import Flask, request, jsonify
import sqlite3
import os
from datetime import datetime

app = Flask(__name__)
DB_PATH = os.environ.get("DB_PATH", "/app/data/tasks.db")


def get_db():
    os.makedirs(os.path.dirname(DB_PATH), exist_ok=True)
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    return conn


def init_db():
    db = get_db()
    db.execute("""
        CREATE TABLE IF NOT EXISTS tasks (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            title TEXT NOT NULL,
            description TEXT DEFAULT '',
            completed BOOLEAN DEFAULT 0,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
    """)
    db.commit()
    db.close()


@app.route("/health")
def health():
    return {"status": "healthy", "timestamp": datetime.now().isoformat()}


@app.route("/tasks", methods=["GET"])
def list_tasks():
    db = get_db()
    tasks = db.execute("SELECT * FROM tasks ORDER BY created_at DESC").fetchall()
    db.close()
    return jsonify([dict(t) for t in tasks])


@app.route("/tasks", methods=["POST"])
def create_task():
    data = request.get_json()
    if not data or not data.get("title"):
        return {"error": "Title is required"}, 400

    db = get_db()
    cursor = db.execute(
        "INSERT INTO tasks (title, description) VALUES (?, ?)",
        (data["title"], data.get("description", ""))
    )
    db.commit()
    task_id = cursor.lastrowid
    task = db.execute("SELECT * FROM tasks WHERE id = ?", (task_id,)).fetchone()
    db.close()
    return jsonify(dict(task)), 201


@app.route("/tasks/<int:task_id>/complete", methods=["PATCH"])
def complete_task(task_id):
    db = get_db()
    db.execute("UPDATE tasks SET completed = 1 WHERE id = ?", (task_id,))
    db.commit()
    task = db.execute("SELECT * FROM tasks WHERE id = ?", (task_id,)).fetchone()
    db.close()
    if task is None:
        return {"error": "Task not found"}, 404
    return jsonify(dict(task))


init_db()

if __name__ == "__main__":
    port = int(os.environ.get("PORT", 5000))
    app.run(host="0.0.0.0", port=port, debug=os.environ.get("DEBUG") == "true")

Output (testing the API):

$ curl -X POST http://localhost:5000/tasks \
    -H "Content-Type: application/json" \
    -d '{"title": "Learn Docker", "description": "Complete the tutorial"}'
{
  "id": 1,
  "title": "Learn Docker",
  "description": "Complete the tutorial",
  "completed": 0,
  "created_at": "2026-04-08 10:30:00"
}

$ curl http://localhost:5000/tasks
[
  {
    "id": 1,
    "title": "Learn Docker",
    "description": "Complete the tutorial",
    "completed": 0,
    "created_at": "2026-04-08 10:30:00"
  }
]

$ curl http://localhost:5000/health
{"status": "healthy", "timestamp": "2026-04-08T10:30:15.123456"}

This application demonstrates several production patterns: a health check endpoint for container orchestration, environment variable configuration, database initialization at startup, input validation, and error handling. When you containerize it, mount a volume at /app/data (the default DB_PATH directory) so the SQLite database survives container restarts, and run it as a non-root user that can write to that directory. You can extend this by adding authentication, switching to PostgreSQL via Docker Compose, or deploying to a cloud container service.

Essential Docker Commands Reference

Command | What It Does
docker build -t name . | Build image from Dockerfile in current directory
docker run -p 5000:5000 name | Run container, map port 5000
docker run -d name | Run container in background (detached)
docker ps | List running containers
docker logs container_id | View container logs
docker exec -it container_id bash | Open shell inside running container
docker stop container_id | Stop a running container
docker images | List all local images
docker system prune | Remove unused images, containers, networks
docker compose up | Start all services in docker-compose.yml
docker compose down | Stop and remove all services

Frequently Asked Questions

Do I still need a virtual environment inside Docker?

No. The Docker container itself provides isolation, so there is no risk of conflicting with system Python or other projects. Some teams still use venv inside Docker for consistency with their local development workflow, but it adds no practical benefit. Skip it and install directly with pip install in your Dockerfile.

Why is my Docker build so slow?

The most common cause is poor layer caching. If you COPY . . before RUN pip install, any code change invalidates the pip install cache. Always copy requirements.txt first and install dependencies before copying the rest of your code. Also check that your .dockerignore excludes large directories like .git/, node_modules/, and __pycache__/.

Should I use Alpine Linux for my Python Docker image?

Only if your app uses pure-Python packages with no C extensions. Alpine uses musl libc instead of glibc, which causes build failures and subtle runtime issues with packages like numpy, pandas, and psycopg2. The slim variant is only ~100MB larger than Alpine and avoids these compatibility headaches entirely.

How do I reduce my Docker image size?

Use multi-stage builds to separate build tools from your runtime image. Use python:3.12-slim as your runtime base. Add --no-cache-dir to pip install commands. Create a thorough .dockerignore file. Remove unnecessary system packages with apt-get clean and rm -rf /var/lib/apt/lists/* after installing system dependencies.

What is the difference between Dockerfile and docker-compose.yml?

A Dockerfile defines how to build a single image — what base to start from, what to install, what to copy, and what command to run. Docker Compose defines how to run multiple containers together — which images to use, how they connect, what ports to expose, and what volumes to mount. You need a Dockerfile for each custom service and a docker-compose.yml to orchestrate them all.

How do I get hot-reload working in Docker during development?

Mount your source code as a volume so changes on your host are reflected inside the container immediately. In your docker-compose.yml, add volumes: [".:/app"] under your web service. Then run Flask with debug=True or use a tool like watchdog to restart on file changes. This gives you the fast feedback loop of local development with the consistency of Docker.

Conclusion

You have learned how to containerize Python applications with Docker from scratch. We covered writing Dockerfiles with proper layer caching, choosing the right base image, pinning dependencies for reproducible builds, using multi-stage builds to shrink image size, orchestrating multi-container apps with Docker Compose, and following production best practices like non-root users and health checks.

Try extending the task tracker example by adding PostgreSQL via Docker Compose, or deploy it to a cloud service like AWS ECS, Google Cloud Run, or Railway. The skills you have learned here apply to any Python application — from simple scripts to complex microservice architectures.

For the official Docker documentation, visit docs.docker.com. For Python-specific Docker guidance, see the official Python Docker images documentation.

How To Use Pydantic V2 for Data Validation in Python

Intermediate

Every Python application that receives data from the outside world — API requests, configuration files, CSV imports, form submissions — faces the same problem: how do you guarantee that the data matches what your code expects? A missing field crashes your function. A string where you expected an integer causes silent bugs. An email address without an “@” slips into your database. Manual validation with if-else chains works for one or two fields, but it does not scale.

Pydantic V2 solves this by letting you define data shapes as Python classes with type hints, then validating and converting incoming data automatically. Released in mid-2023, V2 is a complete rewrite of Pydantic with a Rust-powered core that runs 5-50x faster than V1. It is the validation engine behind FastAPI, and it works just as well standalone in any Python project. Install it with pip install pydantic.

This tutorial covers everything you need to use Pydantic V2 effectively: defining models with type annotations, using built-in validators and constraints, writing custom validation logic, working with nested models, serializing data to dictionaries and JSON, and handling validation errors gracefully. By the end, you will be able to validate any data structure your application encounters.

Pydantic Validation in 30 Seconds

Here is the smallest useful Pydantic model. It validates a user’s data and converts types automatically.

# quick_example.py
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    email: str

# Valid data -- works perfectly
user = User(name="Alice", age=30, email="alice@example.com")
print(user)
print(user.model_dump())

# Type coercion -- "25" becomes int 25
user2 = User(name="Bob", age="25", email="bob@example.com")
print(f"Bob's age: {user2.age} (type: {type(user2.age).__name__})")

Output:

name='Alice' age=30 email='alice@example.com'
{'name': 'Alice', 'age': 30, 'email': 'alice@example.com'}
Bob's age: 25 (type: int)

Notice that Pydantic automatically converted the string "25" to the integer 25 because the age field is typed as int. This type coercion is one of Pydantic’s most practical features — it handles the messy reality of data that comes in as strings from JSON, forms, or environment variables.

What Is Pydantic and Why Use It?

Pydantic is a data validation library that uses Python type hints to define data structures and validate them at runtime. When you create a Pydantic model instance, it checks every field against its declared type, applies any constraints you have defined, and raises a detailed error if anything is wrong.

Approach | Lines of Code | Type Coercion | Error Messages | Nested Validation
Manual if-else | Many | Manual | You write them | You build it
dataclasses | Few | None | Basic TypeError | None
Pydantic V2 | Few | Automatic | Detailed, structured | Built-in
marshmallow | Moderate | Configurable | Detailed | Built-in

The key advantage of Pydantic over alternatives like dataclasses or attrs is that it validates at runtime. A dataclass with age: int happily accepts age="hello" — it only declares the type hint without enforcing it. Pydantic actually checks and converts the value, raising a ValidationError if conversion fails.
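
The dataclass behavior is easy to confirm with the standard library alone. The short sketch below uses hypothetical PlainUser and CheckedUser classes: the first shows that the annotation is not enforced, the second shows the kind of manual check you would have to write yourself.

```python
from dataclasses import dataclass


@dataclass
class PlainUser:
    name: str
    age: int  # a hint only; dataclasses never check it


user = PlainUser(name="Alice", age="hello")
print(type(user.age).__name__)  # str


@dataclass
class CheckedUser:
    name: str
    age: int

    def __post_init__(self):
        # Manual enforcement: this is the boilerplate Pydantic writes for you
        if not isinstance(self.age, int):
            raise TypeError(f"age must be an int, got {type(self.age).__name__}")


try:
    CheckedUser(name="Bob", age="hello")
except TypeError as exc:
    message = str(exc)
    print(message)  # age must be an int, got str
```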

Creating Pydantic models
Twelve lines of __init__ boilerplate, or one BaseModel. Pick wisely.

Built-in Field Types and Constraints

Pydantic supports all standard Python types plus specialized types for common validation patterns. The Field function adds constraints like minimum length, numeric ranges, and regex patterns.

# field_types.py
from pydantic import BaseModel, Field
from typing import Optional
from datetime import datetime

class Product(BaseModel):
    name: str = Field(..., min_length=1, max_length=100)
    price: float = Field(..., gt=0, description="Price in dollars")
    quantity: int = Field(default=0, ge=0)
    sku: str = Field(..., pattern=r"^[A-Z]{2}-\d{4}$")
    description: Optional[str] = None
    created_at: datetime = Field(default_factory=datetime.now)

# Valid product
product = Product(name="Widget", price=19.99, sku="AB-1234")
print(product.model_dump())

# Invalid -- price is negative
try:
    bad = Product(name="Widget", price=-5, sku="AB-1234")
except Exception as e:
    print(f"Error: {e}")

Output:

{'name': 'Widget', 'price': 19.99, 'quantity': 0, 'sku': 'AB-1234', 'description': None, 'created_at': datetime.datetime(2026, 4, 7, ...)}
Error: 1 validation error for Product
price
  Input should be greater than 0 [type=greater_than, input_value=-5, input_type=int]

The Field function is where you add constraints beyond basic type checking. The ... (Ellipsis) means the field is required. The gt=0 constraint rejects zero and negative numbers. The pattern constraint validates the SKU format with a regex. All of these constraints are checked automatically when you create the model instance.

Common Pydantic Types

Pydantic provides specialized types that go beyond basic Python types. These handle common validation patterns that would otherwise require custom code.

Type | What It Validates | Example
EmailStr | Valid email format | "user@example.com"
HttpUrl | Valid HTTP/HTTPS URL | "https://example.com"
IPvAnyAddress | Valid IPv4 or IPv6 address | "192.168.1.1"
SecretStr | String hidden in repr/logs | "s3cr3t" (shows "**********")
PositiveInt | Integer greater than 0 | 42
FutureDatetime | Datetime in the future | "2027-01-01T00:00:00"
constr | Constrained string (length, pattern) | constr(min_length=3)

To use EmailStr, install the optional dependency: pip install "pydantic[email]" (the quotes keep shells such as zsh from interpreting the brackets). The other types are available in the core package.

Custom Validators

When built-in constraints are not enough, Pydantic V2 provides the @field_validator and @model_validator decorators for custom validation logic.

Field Validators

# custom_validators.py
from pydantic import BaseModel, field_validator

class Registration(BaseModel):
    username: str
    password: str
    confirm_password: str

    @field_validator("username")
    @classmethod
    def username_must_be_alphanumeric(cls, v: str) -> str:
        if not v.isalnum():
            raise ValueError("Username must contain only letters and numbers")
        if len(v) < 3:
            raise ValueError("Username must be at least 3 characters")
        return v.lower()  # Normalize to lowercase

    @field_validator("password")
    @classmethod
    def password_strength(cls, v: str) -> str:
        if len(v) < 8:
            raise ValueError("Password must be at least 8 characters")
        if not any(c.isupper() for c in v):
            raise ValueError("Password must contain an uppercase letter")
        if not any(c.isdigit() for c in v):
            raise ValueError("Password must contain a digit")
        return v

# Valid registration
reg = Registration(username="Alice42", password="Secure1Pass", confirm_password="Secure1Pass")
print(f"Username: {reg.username}")

# Invalid username
try:
    Registration(username="a!", password="Secure1Pass", confirm_password="Secure1Pass")
except Exception as e:
    print(f"Error: {e}")

Output:

Username: alice42
Error: 1 validation error for Registration
username
  Value error, Username must contain only letters and numbers [type=value_error, ...]

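By default, field validators run after Pydantic's own type validation, so they receive an already-coerced value; pass mode="before" to see the raw input instead. A validator either returns the value, optionally transformed (like v.lower()), or raises a ValueError with a descriptive message. Pydantic's documentation recommends placing @classmethod directly beneath @field_validator.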

Pydantic field validators
@field_validator stamps your data with approval — or stamps it into the ground.

Model Validators

Model validators check relationships between multiple fields. Use them when validation depends on more than one field at a time.

# model_validator.py
from pydantic import BaseModel, model_validator

class DateRange(BaseModel):
    start_date: str
    end_date: str

    @model_validator(mode="after")
    def check_dates(self):
        # ISO 8601 date strings (YYYY-MM-DD) compare correctly as plain strings
        if self.start_date >= self.end_date:
            raise ValueError("end_date must be after start_date")
        return self

class Registration(BaseModel):
    password: str
    confirm_password: str

    @model_validator(mode="after")
    def passwords_match(self):
        if self.password != self.confirm_password:
            raise ValueError("Passwords do not match")
        return self

# Valid
dates = DateRange(start_date="2026-01-01", end_date="2026-12-31")
print(f"Range: {dates.start_date} to {dates.end_date}")

# Invalid -- passwords don't match
try:
    Registration(password="Secret1Pass", confirm_password="Different1Pass")
except Exception as e:
    print(f"Error: {e}")

Output:

Range: 2026-01-01 to 2026-12-31
Error: 1 validation error for Registration
  Value error, Passwords do not match [type=value_error, ...]

The mode="after" parameter means the validator runs after individual field validation is complete, so you can safely access all fields. Use mode="before" when you need to transform the raw input data before field-level validation runs.
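
The examples so far only use mode="after". To illustrate the other direction, here is a small sketch of a mode="before" validator that reshapes raw input before field validation runs. The Person model and the full_name input key are hypothetical, chosen only for illustration.

```python
from typing import Any

from pydantic import BaseModel, model_validator


class Person(BaseModel):
    first_name: str
    last_name: str

    @model_validator(mode="before")
    @classmethod
    def split_full_name(cls, data: Any) -> Any:
        # In "before" mode the input is still raw, so it may not even be a dict.
        if isinstance(data, dict) and "full_name" in data:
            data = dict(data)  # copy so the caller's dict is not mutated
            first, _, last = data.pop("full_name").partition(" ")
            data.setdefault("first_name", first)
            data.setdefault("last_name", last)
        return data


person = Person.model_validate({"full_name": "Ada Lovelace"})
print(person.first_name, person.last_name)  # Ada Lovelace
```

A "before" validator is a classmethod because no model instance exists yet, whereas an "after" validator receives the validated instance as self.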

Nested Models

Real-world data is rarely flat. Pydantic handles nested structures by composing models inside other models. Validation cascades through every level automatically.

# nested_models.py
from pydantic import BaseModel
from typing import Optional

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class ContactInfo(BaseModel):
    email: str
    phone: Optional[str] = None
    address: Address

class Employee(BaseModel):
    name: str
    title: str
    department: str
    contact: ContactInfo

# Create from nested dictionaries
data = {
    "name": "Alice Johnson",
    "title": "Senior Developer",
    "department": "Engineering",
    "contact": {
        "email": "alice@company.com",
        "phone": "555-0123",
        "address": {
            "street": "123 Main St",
            "city": "Melbourne",
            "state": "VIC",
            "zip_code": "3000"
        }
    }
}

employee = Employee(**data)
print(f"Name: {employee.name}")
print(f"City: {employee.contact.address.city}")
print(f"Email: {employee.contact.email}")

Output:

Name: Alice Johnson
City: Melbourne
Email: alice@company.com

Pydantic validates every level of the nested structure. If the zip code is missing from the address, you get an error pointing to the exact path: contact.address.zip_code in the printed message, or ('contact', 'address', 'zip_code') in the structured error data. This nested validation is especially valuable when parsing complex JSON from APIs or configuration files.
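
To see the error path for yourself, the sketch below leaves out the zip code on purpose. It uses trimmed-down versions of the models above (fewer fields, and a Contact class standing in for ContactInfo) so it stays short.

```python
from pydantic import BaseModel, ValidationError


class Address(BaseModel):
    city: str
    zip_code: str


class Contact(BaseModel):
    email: str
    address: Address


class Employee(BaseModel):
    name: str
    contact: Contact


try:
    Employee(
        name="Alice",
        contact={"email": "alice@company.com", "address": {"city": "Melbourne"}},
    )
except ValidationError as exc:
    error = exc.errors()[0]
    print(error["loc"])   # ('contact', 'address', 'zip_code')
    print(error["type"])  # missing
```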

Serialization: Models to Dictionaries and JSON

Pydantic models are not just for validation -- they also handle serialization. The model_dump() and model_dump_json() methods convert models back to dictionaries and JSON strings with fine-grained control.

# serialization.py
from pydantic import BaseModel, Field
from typing import Optional
from datetime import datetime

class Article(BaseModel):
    title: str
    content: str
    author: str
    published: bool = False
    created_at: datetime = Field(default_factory=datetime.now)
    internal_notes: Optional[str] = None

article = Article(
    title="Pydantic V2 Guide",
    content="Learn data validation...",
    author="Alice",
    internal_notes="Draft needs review"
)

# Full dictionary
print("Full:", article.model_dump())

# Exclude internal fields
public = article.model_dump(exclude={"internal_notes", "created_at"})
print("Public:", public)

# Only include specific fields
summary = article.model_dump(include={"title", "author", "published"})
print("Summary:", summary)

# Skip fields with None values
clean = article.model_dump(exclude_none=True)
print("Clean:", clean)

# JSON string
json_str = article.model_dump_json(indent=2)
print("JSON:", json_str[:80], "...")

Output:

Full: {'title': 'Pydantic V2 Guide', 'content': 'Learn data validation...', 'author': 'Alice', 'published': False, 'created_at': datetime.datetime(...), 'internal_notes': 'Draft needs review'}
Public: {'title': 'Pydantic V2 Guide', 'content': 'Learn data validation...', 'author': 'Alice', 'published': False}
Summary: {'title': 'Pydantic V2 Guide', 'author': 'Alice', 'published': False}
Clean: {'title': 'Pydantic V2 Guide', 'content': 'Learn data validation...', 'author': 'Alice', 'published': False, 'created_at': datetime.datetime(...), 'internal_notes': 'Draft needs review'}
JSON: {
  "title": "Pydantic V2 Guide",
  "content": "Learn data validation...",
  "au ...

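The exclude and include parameters give you control over which fields appear in the output. This is essential for APIs where internal fields like notes or timestamps should not be sent to clients. Note that exclude_none=True changes nothing in this run because internal_notes is set; it only drops fields whose value is None.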
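
The effect of exclude_none only becomes visible when a field is actually None. The sketch below uses a hypothetical two-field Note model to show both cases side by side.

```python
from typing import Optional

from pydantic import BaseModel


class Note(BaseModel):
    title: str
    internal_notes: Optional[str] = None


# internal_notes is None here, so exclude_none drops it
empty = Note(title="Draft").model_dump(exclude_none=True)
print(empty)  # {'title': 'Draft'}

# internal_notes has a value here, so it stays in the output
filled = Note(title="Draft", internal_notes="check").model_dump(exclude_none=True)
print(filled)  # {'title': 'Draft', 'internal_notes': 'check'}
```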

Handling validation errors
Square peg, round hole. Pydantic will tell you exactly which corner doesn't fit.

Handling Validation Errors

When validation fails, Pydantic raises a ValidationError with detailed information about every problem. You can catch this exception and extract structured error data for API responses or logging.

# error_handling.py
from pydantic import BaseModel, Field, ValidationError

class Order(BaseModel):
    product: str = Field(..., min_length=1)
    quantity: int = Field(..., gt=0)
    price: float = Field(..., gt=0)
    email: str

# Multiple validation errors at once
try:
    Order(product="", quantity=-5, price="free", email="not-an-email")
except ValidationError as e:
    print(f"Error count: {e.error_count()}")
    print()
    for error in e.errors():
        print(f"Field: {error['loc']}")
        print(f"Message: {error['msg']}")
        print(f"Type: {error['type']}")
        print()

Output:

Error count: 3

Field: ('product',)
Message: String should have at least 1 character
Type: string_too_short

Field: ('quantity',)
Message: Input should be greater than 0
Type: greater_than

Field: ('price',)
Message: Input should be a valid number, unable to parse string as a number
Type: float_parsing

Pydantic collects all validation errors rather than stopping at the first one. Each error includes the field path (loc), a human-readable message (msg), and a machine-readable error type (type). The email value "not-an-email" produces no error because the field is typed as a plain str; switch it to EmailStr if you want the format checked. In a FastAPI application, these errors are automatically converted to 422 responses with this same structured format.
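
If you are not using FastAPI, you will probably want to reshape these errors yourself. One possible approach is sketched below; the validation_problems helper and the dotted-path key format are assumptions for illustration, and the Order model is a shortened version of the one above.

```python
from pydantic import BaseModel, Field, ValidationError


class Order(BaseModel):
    product: str = Field(..., min_length=1)
    quantity: int = Field(..., gt=0)


def validation_problems(exc: ValidationError) -> dict:
    """Map dotted field paths to human-readable messages."""
    return {
        ".".join(str(part) for part in err["loc"]): err["msg"]
        for err in exc.errors()
    }


try:
    Order(product="", quantity=-5)
except ValidationError as exc:
    problems = validation_problems(exc)
    print(problems)
```

The resulting dictionary has one entry per failing field, which is convenient to return as a JSON error body or to write to a log.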

Model Configuration

Pydantic V2 uses model_config to customize model behavior. This replaces the inner class Config from V1.

# model_config.py
from pydantic import BaseModel, ConfigDict

class StrictUser(BaseModel):
    model_config = ConfigDict(
        str_strip_whitespace=True,    # Strip leading/trailing whitespace
        str_min_length=1,             # No empty strings allowed
        frozen=True,                  # Immutable after creation
        extra="forbid",               # No extra fields allowed
    )

    name: str
    email: str

# Whitespace gets stripped automatically
user = StrictUser(name="  Alice  ", email="alice@example.com")
print(f"Name: '{user.name}'")

# Extra fields are rejected
try:
    StrictUser(name="Bob", email="bob@example.com", role="admin")
except Exception as e:
    print(f"Extra field error: {e}")

# Immutable -- cannot change after creation
try:
    user.name = "Charlie"
except Exception as e:
    print(f"Frozen error: {e}")

Output:

Name: 'Alice'
Extra field error: 1 validation error for StrictUser
role
  Extra inputs are not permitted [type=extra_forbidden, ...]
Frozen error: 1 validation error for StrictUser
name
  Instance is frozen [type=frozen_instance, ...]

The extra="forbid" setting is especially important for security -- it prevents attackers from injecting unexpected fields into your data models. The frozen=True setting makes models behave like named tuples, which is useful for configuration objects that should not be modified after creation.

Real-Life Example: Application Configuration Manager

Pydantic V2 performance
V2 rewrote the core in Rust. Your validation just got a turbocharger.

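Here is a practical example: a type-safe application configuration system that validates every setting at startup and provides clean access throughout your application. The settings come from a plain dictionary here; in a real deployment you would populate that dictionary from environment variables or a config file.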

# config_manager.py
from pydantic import BaseModel, ConfigDict, Field, field_validator, model_validator

class DatabaseConfig(BaseModel):
    host: str = "localhost"
    port: int = Field(default=5432, ge=1, le=65535)
    name: str
    user: str
    password: str
    pool_size: int = Field(default=10, ge=1, le=100)

    @property
    def connection_url(self) -> str:
        return f"postgresql://{self.user}:{self.password}@{self.host}:{self.port}/{self.name}"

class CacheConfig(BaseModel):
    enabled: bool = True
    ttl_seconds: int = Field(default=300, ge=0)
    max_size: int = Field(default=1000, ge=1)

class LoggingConfig(BaseModel):
    level: str = "INFO"
    format: str = "json"

    @field_validator("level")
    @classmethod
    def validate_level(cls, v: str) -> str:
        allowed = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}
        v_upper = v.upper()
        if v_upper not in allowed:
            raise ValueError(f"Log level must be one of {allowed}")
        return v_upper

    @field_validator("format")
    @classmethod
    def validate_format(cls, v: str) -> str:
        if v not in ("json", "text"):
            raise ValueError("Format must be 'json' or 'text'")
        return v

class AppConfig(BaseModel):
    model_config = ConfigDict(frozen=True)

    app_name: str = "MyApp"
    debug: bool = False
    api_version: str = "v1"
    database: DatabaseConfig
    cache: CacheConfig = CacheConfig()
    logging: LoggingConfig = LoggingConfig()

    @model_validator(mode="after")
    def check_debug_log_level(self):
        if self.debug and self.logging.level not in ("DEBUG", "INFO"):
            raise ValueError("Debug mode requires log level DEBUG or INFO")
        return self

# Create config from a settings dictionary
settings = {
    "app_name": "BookStore API",
    "debug": True,
    "database": {
        "host": "db.example.com",
        "port": 5432,
        "name": "bookstore",
        "user": "app_user",
        "password": "secure_password_here",
    },
    "cache": {"ttl_seconds": 600, "max_size": 5000},
    "logging": {"level": "debug", "format": "json"},
}

config = AppConfig(**settings)
print(f"App: {config.app_name}")
print(f"DB URL: {config.database.connection_url}")
print(f"Cache TTL: {config.cache.ttl_seconds}s")
print(f"Log Level: {config.logging.level}")
print(f"Debug: {config.debug}")

Output:

App: BookStore API
DB URL: postgresql://app_user:secure_password_here@db.example.com:5432/bookstore
Cache TTL: 600s
Log Level: DEBUG
Debug: True

This configuration model validates every setting at application startup. If the database port is out of range, the log level is invalid, or debug mode conflicts with the log level, you get a clear error immediately instead of a mysterious failure at runtime. The frozen=True config prevents accidental modification after initialization.
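
The example builds AppConfig from a literal dictionary. One way to fill that dictionary from environment variables is sketched below using only the standard library. The APP_ prefix, the double-underscore separator, and the settings_from_env helper are assumptions for illustration, not part of Pydantic.

```python
import os


def settings_from_env(prefix: str = "APP_", environ=None) -> dict:
    """Build a nested dict from variables like APP_DATABASE__HOST=db.example.com."""
    environ = os.environ if environ is None else environ
    settings: dict = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue
        # A double underscore separates nesting levels: DATABASE__HOST -> database.host
        *parents, leaf = key[len(prefix):].lower().split("__")
        node = settings
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = value
    return settings


# A stand-in environment so the sketch runs the same way everywhere.
fake_env = {
    "APP_DEBUG": "true",
    "APP_DATABASE__HOST": "db.example.com",
    "APP_DATABASE__PORT": "5432",
    "PATH": "/usr/bin",
}
print(settings_from_env(environ=fake_env))
# {'debug': 'true', 'database': {'host': 'db.example.com', 'port': '5432'}}
```

The values stay strings, which is fine: passing the result to AppConfig(**settings_from_env()) lets Pydantic coerce "5432" to an int and "true" to a bool. For a maintained solution, the separate pydantic-settings package offers a BaseSettings class that reads environment variables directly.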

Frequently Asked Questions

What changed between Pydantic V1 and V2?

The biggest changes are: .dict() became .model_dump(), .json() became .model_dump_json(), inner class Config became model_config = ConfigDict(...), validators use @field_validator instead of @validator, and the core validation engine was rewritten in Rust for major performance improvements. The migration guide at docs.pydantic.dev/latest/migration covers every change.

How much faster is V2 than V1?

Pydantic's maintainers describe V2 as roughly 5-50x faster than V1, with the actual gain depending on the operation and the shape of your models. The speed comes from the Rust core (pydantic-core) that handles parsing and validation natively. Benchmark your own models if the exact number matters.

Should I use Pydantic or dataclasses?

Use Pydantic when you need runtime validation (API inputs, config files, external data). Use dataclasses when you need simple data containers for internal application state where the data is already trusted. Pydantic also has a @pydantic.dataclasses.dataclass decorator that adds validation to standard dataclass syntax.
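To make the difference concrete, here is a minimal standard-library sketch. The Point and CheckedPoint classes are invented for illustration. A plain dataclass stores whatever it is given, because annotations are not checked at runtime; getting validation means writing it by hand, which is the work Pydantic automates.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """Plain dataclass: annotations are documentation, not runtime checks."""
    x: int
    y: int


@dataclass
class CheckedPoint:
    """Same fields, with the validation written by hand in __post_init__."""
    x: int
    y: int

    def __post_init__(self):
        for name in ("x", "y"):
            value = getattr(self, name)
            if not isinstance(value, int):
                raise TypeError(f"{name} must be int, got {type(value).__name__}")


# The plain dataclass accepts a string where an int was annotated.
loose = Point(x="not a number", y=2)

# The hand-validated version rejects the same input.
try:
    CheckedPoint(x="not a number", y=2)
    rejected = False
except TypeError:
    rejected = True
```

If the data comes from your own code and is already trusted, the first form is enough. Once you find yourself writing __post_init__ checks for external input, a Pydantic model is usually the shorter path.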

How does Pydantic integrate with FastAPI?

FastAPI uses Pydantic models for request body validation, query parameter validation, and response serialization. When you define a FastAPI endpoint parameter as a Pydantic model, FastAPI automatically validates incoming JSON against it and returns 422 errors with Pydantic's structured error format.

Can I use Pydantic with SQLAlchemy or Django ORM?

Yes. Use model_config = ConfigDict(from_attributes=True) (formerly orm_mode) to create Pydantic models from ORM objects. This lets you validate and serialize database records through Pydantic models: UserSchema.model_validate(db_user) converts an ORM object into a validated Pydantic model.

Conclusion

Pydantic V2 is the most practical data validation library in the Python ecosystem. Its combination of type hint-driven validation, automatic type coercion, structured error reporting, and Rust-powered performance makes it the right choice for any project that handles external data. The configuration manager example shows how a well-designed model catches errors at startup instead of letting them surface as runtime crashes.

Start by replacing your manual validation code with Pydantic models, then explore advanced features like computed fields, generic models, and custom types. The official documentation at docs.pydantic.dev is comprehensive and well-organized.

How To Add Authentication to FastAPI with OAuth2 and JWT

Skill Level: Intermediate

You have built a FastAPI application with clean endpoints and Pydantic validation. Everything works perfectly — until you realize anyone on the internet can call your API. No login required, no identity checks, no permission controls. This is the exact moment every API developer reaches: your endpoints need authentication, and you need it done properly without building a security framework from scratch.

FastAPI has built-in support for the OAuth2 password flow and integrates cleanly with JSON Web Tokens (JWT). You will need three packages beyond FastAPI itself: python-jose[cryptography] for creating and verifying JWTs, passlib[bcrypt] for secure password hashing, and python-multipart for handling form data in the login endpoint. Install them with pip install "python-jose[cryptography]" "passlib[bcrypt]" python-multipart (the quotes stop shells such as zsh from treating the square brackets as glob patterns).

This tutorial walks you through the complete authentication flow: hashing and verifying passwords, creating JWT access tokens, protecting endpoints with dependency injection, extracting the current user from tokens, and implementing role-based access control. By the end, you will have a reusable authentication system that you can drop into any FastAPI project.

JWT Authentication in 30 Seconds

Here is the simplest possible protected endpoint in FastAPI. This gives you the core pattern before we build out the full system.

# quick_auth.py
from fastapi import FastAPI, Depends, HTTPException
from fastapi.security import OAuth2PasswordBearer

app = FastAPI()
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="login")

@app.get("/protected")
def protected_route(token: str = Depends(oauth2_scheme)):
    # In a real app, you would decode and verify the JWT here
    if token != "secret-token":
        raise HTTPException(status_code=401, detail="Invalid token")
    return {"message": "You have access!", "token": token}

Output (without token):

{"detail": "Not authenticated"}

Output (with valid token in Authorization header):

{"message": "You have access!", "token": "secret-token"}

The OAuth2PasswordBearer dependency automatically extracts the token from the Authorization: Bearer <token> header. If the header is missing, FastAPI returns a 401 response before your function even runs. The tokenUrl parameter tells Swagger UI where to send login requests.

How JWT Authentication Works

Before writing the full implementation, it helps to understand the flow. JWT (JSON Web Token) authentication works in four steps: the client sends credentials (username and password), the server verifies them and returns a signed token, the client includes that token in every subsequent request, and the server verifies the token signature on each protected endpoint.

| Step | Who | What Happens |
| --- | --- | --- |
| 1. Login | Client | Sends username + password to /login |
| 2. Token creation | Server | Verifies password, creates signed JWT |
| 3. Authenticated request | Client | Sends JWT in Authorization: Bearer header |
| 4. Token verification | Server | Decodes JWT, checks signature and expiry |

The JWT itself contains Base64URL-encoded JSON with the user’s identity (called “claims”), an expiration time, and a cryptographic signature. The payload is encoded, not encrypted, so anyone holding the token can read it; never put secrets in the claims. The server signs the token with a secret key, so any tampering is detectable without a database lookup. This makes JWT a good fit for stateless APIs where you do not want to store session data on the server.
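To show why tampering is detectable, here is a minimal standard-library sketch of HS256 signing and verification. It is for illustration only and assumes nothing beyond hmac, hashlib, base64, and json. The helper names (b64url, sign, verify) are invented here, and the sketch skips header checks that a real library such as python-jose performs, so it is not a replacement for one.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64URL-encode without padding, the way JWT segments are written."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign(payload: dict, secret: str) -> str:
    """Build header.payload.signature using HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def verify(token: str, secret: str) -> dict:
    """Return the claims if signature and expiry check out, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = hmac.new(
        secret.encode(), f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    # compare_digest avoids leaking where the two signatures first differ
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    # A token without exp is treated as expired in this sketch
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims


token = sign({"sub": "alice", "exp": int(time.time()) + 60}, "demo-secret")
claims = verify(token, "demo-secret")

# Tamper with the payload: swap in a different subject but keep the old signature.
header_b64, _, signature_b64 = token.split(".")
forged_payload = b64url(json.dumps({"sub": "admin", "exp": int(time.time()) + 60}).encode())
forged = f"{header_b64}.{forged_payload}.{signature_b64}"
try:
    verify(forged, "demo-secret")
    tamper_detected = False
except ValueError:
    tamper_detected = True
```

The forged token fails because the signature was computed over the original header and payload. Changing either segment changes the expected HMAC, and only someone holding the secret can produce a matching one.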

OAuth2 authentication flow
OAuth2 is just a series of handshakes. Miss one and the bouncer won’t let you in.

Password Hashing with Passlib

Never store passwords in plain text. Use passlib with the bcrypt algorithm to hash passwords before storing them and verify them during login.

# password_utils.py
from passlib.context import CryptContext

pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")

def hash_password(password: str) -> str:
    """Hash a plain text password for storage."""
    return pwd_context.hash(password)

def verify_password(plain_password: str, hashed_password: str) -> bool:
    """Check a plain text password against a stored hash."""
    return pwd_context.verify(plain_password, hashed_password)

# Example usage
hashed = hash_password("my_secure_password")
print(f"Hashed: {hashed}")
print(f"Verify correct: {verify_password('my_secure_password', hashed)}")
print(f"Verify wrong: {verify_password('wrong_password', hashed)}")

Output:

Hashed: $2b$12$LJ3m4ys3Lg...Kz8dHJKe (unique each time)
Verify correct: True
Verify wrong: False

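The CryptContext handles algorithm selection, salt generation, and hash verification. The deprecated="auto" setting marks every scheme except the default as deprecated, so pwd_context.needs_update(hash) and pwd_context.verify_and_update(password, hash) can flag hashes that should be re-hashed; plain verify() never rewrites a stored hash. With a single scheme, as here, the setting only matters once you add a newer one. Each hash is unique even for the same password because bcrypt generates a random salt internally.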
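The effect of a random salt can be seen without installing anything. The sketch below is not bcrypt: it uses the standard library's PBKDF2 as a stand-in, with an iteration count chosen only for the demo. It also keeps the salt next to the digest by hand, whereas a bcrypt hash string embeds its salt. The function names are reused from above purely for readability.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is drawn unless one is supplied."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Hashing the same password twice draws two different salts,
# so the stored digests differ even though the input is identical.
salt_a, digest_a = hash_password("my_secure_password")
salt_b, digest_b = hash_password("my_secure_password")

correct = verify_password("my_secure_password", salt_a, digest_a)
wrong = verify_password("wrong_password", salt_a, digest_a)
```

Because every stored value has its own salt, an attacker cannot precompute one table of hashes and reuse it across all users.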

Creating and Decoding JWT Tokens

The python-jose library creates and verifies JWT tokens. Each token contains a payload (claims) signed with your secret key.

# jwt_utils.py
from datetime import datetime, timedelta, timezone
from jose import jwt, JWTError

SECRET_KEY = "your-secret-key-keep-this-safe-and-long"
ALGORITHM = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES = 30

def create_access_token(data: dict, expires_delta: timedelta | None = None) -> str:
    """Create a JWT access token with an expiration time."""
    to_encode = data.copy()
    expire = datetime.now(timezone.utc) + (expires_delta or timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES))
    to_encode.update({"exp": expire})
    return jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)

def decode_access_token(token: str) -> dict:
    """Decode and verify a JWT token. Raises JWTError if invalid."""
    return jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])

# Example
token = create_access_token({"sub": "alice", "role": "admin"})
print(f"Token: {token[:50]}...")

payload = decode_access_token(token)
print(f"Payload: {payload}")

Output:

Token: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzd...
Payload: {'sub': 'alice', 'role': 'admin', 'exp': 1712534400}

The sub (subject) claim is the standard JWT field for identifying the user. The exp claim sets when the token expires; after that time, jwt.decode raises ExpiredSignatureError, a subclass of JWTError, so catching JWTError covers expired tokens too. In production, store SECRET_KEY in an environment variable, not in your source code.

JWT token creation
Three Base64 segments walk into a bar. The signature picks up the tab.

Building the Complete Authentication System

Now let us combine password hashing and JWT tokens into a complete FastAPI authentication system with a user database, login endpoint, and protected routes.

# auth_app.py
from datetime import datetime, timedelta, timezone
from fastapi import FastAPI, Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer, OAuth2PasswordRequestForm
from jose import jwt, JWTError
from passlib.context import CryptContext
from pydantic import BaseModel

# Configuration
SECRET_KEY = "your-secret-key-change-in-production"
ALGORITHM = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES = 30

# Password hashing
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")

# OAuth2 scheme
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="login")

app = FastAPI(title="Auth Demo")

# Simulated user database
fake_users_db = {
    "alice": {
        "username": "alice",
        "hashed_password": pwd_context.hash("alice123"),
        "role": "admin",
    },
    "bob": {
        "username": "bob",
        "hashed_password": pwd_context.hash("bob456"),
        "role": "user",
    },
}

# Pydantic models
class Token(BaseModel):
    access_token: str
    token_type: str

class User(BaseModel):
    username: str
    role: str

# Helper functions
def authenticate_user(username: str, password: str):
    user = fake_users_db.get(username)
    if not user or not pwd_context.verify(password, user["hashed_password"]):
        return None
    return user

def create_access_token(data: dict, expires_delta: timedelta | None = None) -> str:
    to_encode = data.copy()
    expire = datetime.now(timezone.utc) + (expires_delta or timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES))
    to_encode.update({"exp": expire})
    return jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)

async def get_current_user(token: str = Depends(oauth2_scheme)) -> User:
    credentials_exception = HTTPException(
        status_code=status.HTTP_401_UNAUTHORIZED,
        detail="Could not validate credentials",
        headers={"WWW-Authenticate": "Bearer"},
    )
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
        username = payload.get("sub")
        if username is None:
            raise credentials_exception
    except JWTError:
        raise credentials_exception
    user = fake_users_db.get(username)
    if user is None:
        raise credentials_exception
    return User(username=user["username"], role=user["role"])

# Endpoints
@app.post("/login", response_model=Token)
async def login(form_data: OAuth2PasswordRequestForm = Depends()):
    user = authenticate_user(form_data.username, form_data.password)
    if not user:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Incorrect username or password",
            headers={"WWW-Authenticate": "Bearer"},
        )
    token = create_access_token(data={"sub": user["username"], "role": user["role"]})
    return {"access_token": token, "token_type": "bearer"}

@app.get("/me", response_model=User)
async def read_current_user(current_user: User = Depends(get_current_user)):
    return current_user

@app.get("/admin")
async def admin_only(current_user: User = Depends(get_current_user)):
    if current_user.role != "admin":
        raise HTTPException(status_code=403, detail="Admin access required")
    return {"message": f"Welcome admin {current_user.username}!"}

@app.get("/public")
async def public_endpoint():
    return {"message": "This endpoint is open to everyone"}

Testing the login flow:

# Step 1: Login to get a token
curl -X POST http://127.0.0.1:8000/login \
  -d "username=alice&password=alice123"

# Response:
{"access_token": "eyJhbGciOiJIUzI1NiI...", "token_type": "bearer"}

# Step 2: Access protected endpoint with the token
curl http://127.0.0.1:8000/me \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiI..."

# Response:
{"username": "alice", "role": "admin"}

# Step 3: Access admin-only endpoint
curl http://127.0.0.1:8000/admin \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiI..."

# Response:
{"message": "Welcome admin alice!"}

The get_current_user dependency is the heart of this system. It extracts the token from the header, decodes it, looks up the user, and returns a User object — all before your endpoint function runs. Any endpoint that includes current_user: User = Depends(get_current_user) is automatically protected.

Password hashing with bcrypt
bcrypt turns your password into unrecognizable mush. That’s the point.

Role-Based Access Control

The admin endpoint above uses a simple role check, but you can make this reusable with a dependency factory that creates role-checking dependencies on the fly.

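# role_checker.py
# Builds on auth_app.py, which defines app, User, and get_current_user.
from fastapi import Depends, HTTPException, status
from auth_app import app, User, get_current_user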

def require_role(required_role: str):
    """Create a dependency that checks if the user has the required role."""
    async def role_checker(current_user: User = Depends(get_current_user)):
        if current_user.role != required_role:
            raise HTTPException(
                status_code=status.HTTP_403_FORBIDDEN,
                detail=f"Role '{required_role}' required. You have '{current_user.role}'.",
            )
        return current_user
    return role_checker

# Usage in endpoints
@app.get("/admin/dashboard")
async def admin_dashboard(admin: User = Depends(require_role("admin"))):
    return {"message": "Admin dashboard", "user": admin.username}

@app.get("/editor/publish")
async def editor_publish(editor: User = Depends(require_role("editor"))):
    return {"message": "Editor publish page", "user": editor.username}

Output (bob tries /admin/dashboard):

{"detail": "Role 'admin' required. You have 'user'."}

The require_role function returns a new dependency for each role. This pattern scales cleanly — add as many roles as your application needs without duplicating validation logic in every endpoint.
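The factory is an ordinary closure, so the same idea extends to accepting several roles at once. Here is a framework-free sketch: require_any_role is a hypothetical name, the User dataclass stands in for the Pydantic model above, and PermissionError stands in for the 403 HTTPException. In FastAPI the inner function would take current_user: User = Depends(get_current_user) instead of a plain argument.

```python
from dataclasses import dataclass


@dataclass
class User:
    username: str
    role: str


def require_any_role(*allowed_roles: str):
    """Build a checker that accepts a user holding any of the allowed roles."""
    allowed = frozenset(allowed_roles)

    def role_checker(current_user: User) -> User:
        # The closure remembers `allowed` from the enclosing call
        if current_user.role not in allowed:
            raise PermissionError(
                f"One of {sorted(allowed)} required. You have '{current_user.role}'."
            )
        return current_user

    return role_checker


staff_only = require_any_role("admin", "editor")

alice = staff_only(User("alice", "admin"))

try:
    staff_only(User("bob", "user"))
    bob_allowed = True
except PermissionError:
    bob_allowed = False
```

Each call to the factory returns an independent checker, which is why one helper can serve every route without the endpoints knowing how roles are stored.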

Token Refresh Strategy

Access tokens should be short-lived (15-30 minutes) for security. But you do not want users logging in every 30 minutes. The standard solution is a refresh token with a longer lifespan that can generate new access tokens.

# token_refresh.py
# Append to auth_app.py: reuses app, Token, fake_users_db, authenticate_user,
# and create_access_token. Remove the original /login route there, because
# FastAPI serves the first route registered for a path.
from fastapi import Body

REFRESH_TOKEN_EXPIRE_DAYS = 7

def create_refresh_token(data: dict) -> str:
    to_encode = data.copy()
    expire = datetime.now(timezone.utc) + timedelta(days=REFRESH_TOKEN_EXPIRE_DAYS)
    to_encode.update({"exp": expire, "type": "refresh"})
    return jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)

@app.post("/login", response_model=dict)
async def login_with_refresh(form_data: OAuth2PasswordRequestForm = Depends()):
    user = authenticate_user(form_data.username, form_data.password)
    if not user:
        raise HTTPException(status_code=401, detail="Invalid credentials")
    access_token = create_access_token({"sub": user["username"], "role": user["role"]})
    refresh_token = create_refresh_token({"sub": user["username"]})
    return {
        "access_token": access_token,
        "refresh_token": refresh_token,
        "token_type": "bearer",
    }

@app.post("/refresh", response_model=Token)
async def refresh_access_token(refresh_token: str = Body(..., embed=True)):
    # embed=True reads {"refresh_token": "..."} from the JSON body, which keeps
    # the token out of URLs and access logs.
    try:
        payload = jwt.decode(refresh_token, SECRET_KEY, algorithms=[ALGORITHM])
    except JWTError:
        raise HTTPException(status_code=401, detail="Invalid refresh token")
    if payload.get("type") != "refresh":
        raise HTTPException(status_code=401, detail="Invalid token type")
    # .get() avoids a KeyError (and a 500 response) if the user no longer exists
    user = fake_users_db.get(payload.get("sub"))
    if user is None:
        raise HTTPException(status_code=401, detail="Invalid refresh token")
    new_token = create_access_token({"sub": user["username"], "role": user["role"]})
    return {"access_token": new_token, "token_type": "bearer"}

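The refresh token has a type claim set to "refresh", and /refresh rejects any token without it, so an access token cannot be traded for a new one. The reverse is not enforced by the code shown: get_current_user only reads sub, so it should also reject tokens whose type is "refresh", otherwise a refresh token doubles as a seven-day access token. When the access token expires, the client sends the refresh token to /refresh to get a new access token without re-entering credentials.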
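A type claim only helps if both sides enforce it. Below is a minimal sketch of a check that protected routes and /refresh could share after decoding a token. require_token_type is a hypothetical helper, not part of python-jose or FastAPI, and it works on already-decoded payload dictionaries. It assumes, as in this tutorial, that access tokens carry no type claim at all.

```python
def require_token_type(payload: dict, expected: str) -> dict:
    """Reject a decoded payload whose "type" claim does not fit the endpoint.

    Access tokens in this tutorial have no "type" claim, so a missing claim
    is treated as "access".
    """
    actual = payload.get("type", "access")
    if actual != expected:
        raise ValueError(f"expected {expected} token, got {actual}")
    return payload


access_payload = {"sub": "alice", "role": "admin"}
refresh_payload = {"sub": "alice", "type": "refresh"}

# Each token is accepted where it belongs.
require_token_type(access_payload, "access")
require_token_type(refresh_payload, "refresh")

# A refresh token presented to a protected route is turned away.
try:
    require_token_type(refresh_payload, "access")
    refresh_used_as_access = True
except ValueError:
    refresh_used_as_access = False
```

In the FastAPI code, get_current_user would run this check right after jwt.decode and convert the ValueError into the same 401 it already raises for bad tokens.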

Security Best Practices

Authentication code is one place where shortcuts cause real damage. Here are the practices that matter most for a production FastAPI application.

| Practice | Why It Matters |
| --- | --- |
| Store SECRET_KEY in env vars | Keys in source code end up in Git history |
| Use bcrypt (not MD5/SHA) | Bcrypt is intentionally slow, resistant to brute force |
| Set short token expiry (15-30 min) | Limits damage if a token is stolen |
| Use HTTPS in production | Tokens sent over HTTP can be intercepted |
| Validate token claims (sub, exp) | Missing validation lets forged tokens through |
| Return generic error messages | “Invalid credentials” not “User not found” prevents user enumeration |

Token refresh handling
Access tokens expire. Refresh tokens expire slower. Your patience expires fastest.

Real-Life Example: Protected Notes API

Let us build a practical application: a notes API where each user can only see and modify their own notes. This demonstrates how authentication integrates with real CRUD operations.

# notes_api.py
from datetime import datetime, timedelta, timezone
from fastapi import FastAPI, Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer, OAuth2PasswordRequestForm
from jose import jwt, JWTError
from passlib.context import CryptContext
from pydantic import BaseModel

SECRET_KEY = "notes-app-secret-change-me"
ALGORITHM = "HS256"
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="login")

app = FastAPI(title="Protected Notes API")

# Databases
users_db = {
    "alice": {"username": "alice", "hashed_password": pwd_context.hash("alice123")},
    "bob": {"username": "bob", "hashed_password": pwd_context.hash("bob456")},
}
notes_db = {}
next_note_id = 1

# Models
class Token(BaseModel):
    access_token: str
    token_type: str

class NoteCreate(BaseModel):
    title: str
    content: str

class NoteResponse(BaseModel):
    id: int
    title: str
    content: str
    owner: str
    created_at: str

# Auth helpers
def create_token(username: str) -> str:
    expire = datetime.now(timezone.utc) + timedelta(minutes=30)
    return jwt.encode({"sub": username, "exp": expire}, SECRET_KEY, algorithm=ALGORITHM)

async def get_current_user(token: str = Depends(oauth2_scheme)) -> str:
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
        username = payload.get("sub")
        if username is None or username not in users_db:
            raise HTTPException(status_code=401, detail="Invalid token")
        return username
    except JWTError:
        raise HTTPException(status_code=401, detail="Invalid token")

# Endpoints
@app.post("/login", response_model=Token)
async def login(form_data: OAuth2PasswordRequestForm = Depends()):
    user = users_db.get(form_data.username)
    if not user or not pwd_context.verify(form_data.password, user["hashed_password"]):
        raise HTTPException(status_code=401, detail="Invalid credentials")
    return {"access_token": create_token(form_data.username), "token_type": "bearer"}

@app.post("/notes", response_model=NoteResponse, status_code=201)
async def create_note(note: NoteCreate, username: str = Depends(get_current_user)):
    global next_note_id
    note_data = {
        "id": next_note_id,
        "title": note.title,
        "content": note.content,
        "owner": username,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    notes_db[next_note_id] = note_data
    next_note_id += 1
    return note_data

@app.get("/notes", response_model=list[NoteResponse])
async def list_my_notes(username: str = Depends(get_current_user)):
    return [n for n in notes_db.values() if n["owner"] == username]

@app.delete("/notes/{note_id}", status_code=204)
async def delete_note(note_id: int, username: str = Depends(get_current_user)):
    note = notes_db.get(note_id)
    if not note:
        raise HTTPException(status_code=404, detail="Note not found")
    if note["owner"] != username:
        raise HTTPException(status_code=403, detail="Not your note")
    del notes_db[note_id]

Testing the flow:

# Login as Alice
curl -X POST http://127.0.0.1:8000/login -d "username=alice&password=alice123"
# {"access_token": "eyJ...", "token_type": "bearer"}

# Create a note (use Alice's token)
curl -X POST http://127.0.0.1:8000/notes \
  -H "Authorization: Bearer eyJ..." \
  -H "Content-Type: application/json" \
  -d '{"title": "Shopping List", "content": "Milk, eggs, bread"}'
# {"id": 1, "title": "Shopping List", "content": "Milk, eggs, bread", "owner": "alice", ...}

# Bob cannot see Alice's notes
curl http://127.0.0.1:8000/notes -H "Authorization: Bearer BOB_TOKEN"
# [] (empty list -- Bob has no notes)

Each note is tagged with its owner, and the list_my_notes endpoint filters by the authenticated user. The delete endpoint checks ownership before allowing deletion. This pattern of tying data to the authenticated user is fundamental to building multi-tenant APIs.

Frequently Asked Questions

Should I use sessions or JWT for my API?

Use JWT for stateless APIs (especially mobile or SPA clients) where you do not want to maintain server-side session storage. Use sessions for traditional server-rendered web applications where the backend controls the full page lifecycle. JWT works well for microservices because any service can verify the token independently without a shared session store.

How should I generate and store the SECRET_KEY?

Generate a random key with openssl rand -hex 32 in your terminal. Store it in an environment variable and read it with os.environ["SECRET_KEY"]. Never commit it to version control. In production, use a secrets manager like AWS Secrets Manager, HashiCorp Vault, or your platform’s built-in secrets.
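The same steps can be done from Python. This is a minimal sketch using only the standard library. generate_secret_key and load_secret_key are hypothetical helper names, and the 32-character minimum is an assumption chosen for the example, not a rule from any library.

```python
import os
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a random hex key; 32 bytes yields 64 hex characters."""
    return secrets.token_hex(num_bytes)


def load_secret_key(env_var: str = "SECRET_KEY") -> str:
    """Read the key from the environment and fail fast if it is missing or short."""
    key = os.environ.get(env_var, "")
    if len(key) < 32:
        raise RuntimeError(f"{env_var} is unset or shorter than 32 characters")
    return key


# Demo only: a real deployment sets the variable outside the process
# (shell profile, container config, or a secrets manager).
os.environ["SECRET_KEY"] = generate_secret_key()
SECRET_KEY = load_secret_key()
```

Failing at startup when the variable is missing is deliberate: a server that silently falls back to a default key would sign tokens anyone could forge.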

How do I implement password reset?

Create a /forgot-password endpoint that generates a short-lived token (10-15 minutes) and sends it to the user’s email. Create a /reset-password endpoint that accepts the token and the new password. Use the same JWT mechanism but with a different token type claim so reset tokens cannot be used for API access.

Can I add Google or GitHub login?

Yes. Use the authlib or httpx-oauth library to implement OAuth2 authorization code flow. FastAPI’s dependency injection makes it straightforward to add multiple authentication methods. The user logs in through the provider’s OAuth flow, and your server exchanges the authorization code for user profile data.

How do I test authenticated endpoints?

Use FastAPI’s TestClient with the headers parameter. Create a test user, generate a token in your test setup, and include it in requests: client.get("/me", headers={"Authorization": f"Bearer {token}"}). For dependency overrides, use app.dependency_overrides[get_current_user] = lambda: test_user.

Conclusion

FastAPI’s dependency injection system makes OAuth2 and JWT authentication clean and reusable. The get_current_user dependency is the single point of authentication that you inject into any endpoint that needs protection. Combined with passlib for secure password hashing and python-jose for token management, you have the core of an auth system in under 100 lines of code; swap the in-memory user store for a real database and load SECRET_KEY from the environment before taking it to production.

Start with the Protected Notes API example and extend it with a real database, email verification, and OAuth2 social login. The official FastAPI security documentation at fastapi.tiangolo.com/tutorial/security covers advanced patterns including scopes and multiple authentication schemes.

FastAPI vs Flask: Which Python Framework Should You Choose?

Skill Level: Intermediate

You have a new Python project that needs a web API. You open Google, type “best Python web framework,” and immediately drown in opinions. Flask has been the go-to choice for over a decade. FastAPI showed up in 2018 and climbed to 75,000+ GitHub stars faster than almost any Python project in history. Both can build APIs. Both are lightweight. So which one should you actually pick for your next project?

The answer depends on what you are building. Flask gives you maximum flexibility and a massive ecosystem of extensions. FastAPI gives you automatic data validation, async support out of the box, and self-documenting endpoints. Neither is universally better — they solve different problems in different ways, and understanding those differences saves you from rewriting code six months later.

In this article, we compare Flask and FastAPI across every dimension that matters: setup and routing, data validation, async support, performance, automatic documentation, ecosystem maturity, and real-world project structure. Every comparison includes runnable code so you can see the differences yourself. By the end, you will have a clear decision framework for choosing the right tool.

FastAPI vs Flask: Quick Comparison

Before we dig into code, here is a high-level comparison table covering the key differences between Flask and FastAPI.

| Feature | Flask | FastAPI |
| --- | --- | --- |
| First release | 2010 | 2018 |
| Async support | Limited (Flask 2.0+) | Native, built on Starlette |
| Data validation | Manual or Flask-Marshmallow | Built-in via Pydantic |
| Auto documentation | Requires Flask-RESTX or Flasgger | Built-in Swagger and ReDoc |
| Type hints | Optional, no runtime effect | Required, drive validation and docs |
| Learning curve | Very gentle | Gentle (steeper if new to type hints) |
| Ecosystem size | Huge (thousands of extensions) | Growing fast, fewer extensions |
| WSGI/ASGI | WSGI (Werkzeug) | ASGI (Starlette + Uvicorn) |
| Best for | Traditional web apps, prototypes, server-rendered pages | Modern APIs, microservices, async workloads |

Now let us see how these differences play out in actual code.

FastAPI async speed advantage
async/await turns your API into a rocket. Flask takes the scenic route.

Hello World: Setup and First Route

The fastest way to feel the difference between Flask and FastAPI is to build the simplest possible endpoint in each. Both frameworks let you go from zero to running server in under 10 lines.

Flask Hello World

Flask uses the @app.route decorator and returns plain strings or dictionaries. Install it with pip install flask and run with the built-in development server.

# flask_hello.py
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/hello")
def hello():
    return jsonify({"message": "Hello from Flask!"})

if __name__ == "__main__":
    app.run(debug=True, port=5000)

Output (when you visit http://localhost:5000/hello):

{"message": "Hello from Flask!"}

FastAPI Hello World

FastAPI uses the same decorator pattern but returns dictionaries directly — no need for jsonify. Install it with pip install fastapi uvicorn and run with Uvicorn.

# fastapi_hello.py
from fastapi import FastAPI

app = FastAPI()

@app.get("/hello")
async def hello():
    return {"message": "Hello from FastAPI!"}

# Run with: uvicorn fastapi_hello:app --reload --port 8000

Output (when you visit http://localhost:8000/hello):

{"message": "Hello from FastAPI!"}

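The syntax is nearly identical. The key differences: the FastAPI example uses an HTTP method decorator (@app.get rather than @app.route, although Flask 2.0 added the same shortcuts), FastAPI supports async def natively, and it serializes returned dicts without a wrapper function. Flask 1.1 and later also converts a returned dict to JSON, so jsonify() in the Flask example is the explicit form rather than a requirement.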

Data Validation: Where FastAPI Pulls Ahead

Data validation is where the two frameworks diverge most sharply. Flask leaves validation entirely up to you. FastAPI makes it automatic through Python type hints and Pydantic models. This single difference changes how much boilerplate you write for every endpoint.

Flask: Manual Validation

In Flask, you parse the request body yourself, check each field manually, and return error responses when something is wrong. Here is a typical pattern for creating a user.

# flask_validation.py
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/users", methods=["POST"])
def create_user():
    data = request.get_json()

    # Manual validation -- you write every check
    if not data:
        return jsonify({"error": "Request body required"}), 400
    if "name" not in data or not isinstance(data["name"], str):
        return jsonify({"error": "name must be a string"}), 400
    if not isinstance(data.get("email"), str) or "@" not in data["email"]:
        return jsonify({"error": "Valid email required"}), 400
    if "age" in data and not isinstance(data["age"], int):
        return jsonify({"error": "age must be an integer"}), 400

    return jsonify({"status": "created", "user": data}), 201

if __name__ == "__main__":
    app.run(debug=True, port=5000)

Output (POST with valid data):

{"status": "created", "user": {"name": "Alice", "email": "alice@example.com", "age": 30}}

FastAPI: Pydantic Does the Work

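FastAPI validates request data automatically using Pydantic models. You define a model with type hints, and FastAPI rejects invalid requests with a 422 response before your function even runs. The EmailStr type used below needs the email-validator package, which pip install "pydantic[email]" pulls in.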

# fastapi_validation.py
from fastapi import FastAPI
from pydantic import BaseModel, EmailStr

app = FastAPI()

class User(BaseModel):
    name: str
    email: EmailStr
    age: int | None = None

@app.post("/users", status_code=201)
async def create_user(user: User):
    return {"status": "created", "user": user.model_dump()}

# Run with: uvicorn fastapi_validation:app --reload --port 8000

Output (POST with invalid email):

{
  "detail": [
    {
      "type": "value_error",
      "loc": ["body", "email"],
      "msg": "value is not a valid email address"
    }
  ]
}

The FastAPI version is half the code and catches more errors. Pydantic validates types, checks email format, handles optional fields with defaults, and returns structured error responses automatically. In Flask, you would need to add a library like Marshmallow or Cerberus to get similar functionality, and even then it requires more setup.

Flask traditional approach
Flask has been battle-tested since 2010. That’s either reassuring or terrifying.

Async Support: Native vs Retrofitted

Modern APIs often need to handle many concurrent connections — waiting on database queries, calling external services, or streaming responses. Async programming lets your server handle thousands of these waiting operations without blocking. This is where the architectural difference between Flask and FastAPI matters most.

FastAPI was built on ASGI (Asynchronous Server Gateway Interface) from the start. Every route handler can be an async def function, and the framework coordinates concurrency through Python’s asyncio event loop. Flask was built on WSGI (Web Server Gateway Interface), which is synchronous by design. Flask 2.0 added async def support (install it with pip install "flask[async]"), but each async view runs on its own event loop inside a worker thread, so every request still ties up one worker instead of sharing a single loop.

FastAPI Async Example

# fastapi_async.py
import asyncio
from fastapi import FastAPI

app = FastAPI()

async def fetch_from_database():
    """Simulate a slow database query."""
    await asyncio.sleep(1)
    return {"users": ["Alice", "Bob", "Charlie"]}

async def fetch_from_cache():
    """Simulate a cache lookup."""
    await asyncio.sleep(0.5)
    return {"cached": True}

@app.get("/dashboard")
async def dashboard():
    # Both calls run concurrently -- total time ~1 second, not 1.5
    db_task = asyncio.create_task(fetch_from_database())
    cache_task = asyncio.create_task(fetch_from_cache())
    db_result = await db_task
    cache_result = await cache_task
    return {**db_result, **cache_result}

# Run with: uvicorn fastapi_async:app --reload --port 8000

Output:

{"users": ["Alice", "Bob", "Charlie"], "cached": true}

The two async calls run concurrently using asyncio.create_task(), so the total response time is about 1 second instead of 1.5 seconds. This pattern scales beautifully when your API calls multiple microservices or databases per request.
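You can measure the effect outside a web server. The sketch below uses only asyncio and time. The fetch helper and its delays are invented for the demo, with shorter sleeps than the example above so it finishes quickly. It uses asyncio.gather, which schedules both coroutines concurrently, much like the pair of create_task calls.

```python
import asyncio
import time


async def fetch(name: str, delay: float) -> str:
    """Stand-in for an I/O call; sleeping yields control to the event loop."""
    await asyncio.sleep(delay)
    return name


async def sequential() -> float:
    """Await one call after the other and return the elapsed seconds."""
    start = time.perf_counter()
    await fetch("database", 0.3)
    await fetch("cache", 0.2)
    return time.perf_counter() - start


async def concurrent() -> float:
    """Run both calls at once and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(fetch("database", 0.3), fetch("cache", 0.2))
    return time.perf_counter() - start


sequential_seconds = asyncio.run(sequential())
concurrent_seconds = asyncio.run(concurrent())
print(f"sequential: {sequential_seconds:.2f}s, concurrent: {concurrent_seconds:.2f}s")
```

The sequential run takes roughly the sum of the delays, while the concurrent run takes roughly the longest single delay. The gain only appears for awaitable I/O; CPU-bound work inside a coroutine still blocks the loop.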

Flask Async (Limited)

# flask_async.py
import asyncio
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/dashboard")
async def dashboard():
    # Flask 2.0+ supports async views, but runs them in threads
    await asyncio.sleep(1)
    return jsonify({"users": ["Alice", "Bob", "Charlie"], "note": "async works but limited"})

if __name__ == "__main__":
    app.run(debug=True, port=5000)

Flask’s async support works for simple cases, but it does not give you the same concurrency benefits as FastAPI’s native ASGI approach. For I/O-heavy workloads with many concurrent connections, FastAPI’s architecture has a clear advantage.

Automatic API Documentation

One of FastAPI’s most impressive features is automatic interactive documentation. The moment you define an endpoint with type hints, FastAPI generates Swagger UI and ReDoc pages at /docs and /redoc respectively. No configuration needed.

Flask has no built-in documentation generator. You can add it with extensions like Flask-RESTX (which includes Swagger) or Flasgger, but they require additional decorators and configuration. Here is what you need in Flask to get what FastAPI gives you for free.

Docs Feature | Flask | FastAPI
Swagger UI | Flask-RESTX or Flasgger (install + configure) | Built-in at /docs
ReDoc | Manual setup | Built-in at /redoc
OpenAPI schema | Generated by extension | Built-in at /openapi.json
Request body docs | Manual schema definitions | Auto-generated from Pydantic models
Response examples | Manual | Auto-generated from return type hints

This is a significant productivity win for teams building APIs. Frontend developers, QA testers, and external consumers can explore your API interactively without reading source code or maintaining a separate Postman collection.

Project Structure and Scalability

Both frameworks support clean project organization through modular patterns, but they use different terminology. Flask uses Blueprints to split a large application into reusable modules. FastAPI uses APIRouter for the same purpose. The concepts are nearly identical.

Flask Blueprints

# flask_blueprint.py
from flask import Flask, Blueprint, jsonify

# Define a blueprint for user routes
users_bp = Blueprint("users", __name__, url_prefix="/users")

@users_bp.route("/")
def list_users():
    return jsonify({"users": ["Alice", "Bob"]})

@users_bp.route("/<int:user_id>")
def get_user(user_id):
    return jsonify({"user_id": user_id, "name": "Alice"})

# Main app registers the blueprint
app = Flask(__name__)
app.register_blueprint(users_bp)

if __name__ == "__main__":
    app.run(debug=True, port=5000)

Output (GET /users/):

{"users": ["Alice", "Bob"]}

FastAPI APIRouter

# fastapi_router.py
from fastapi import FastAPI, APIRouter

# Define a router for user routes
users_router = APIRouter(prefix="/users", tags=["users"])

@users_router.get("/")
async def list_users():
    return {"users": ["Alice", "Bob"]}

@users_router.get("/{user_id}")
async def get_user(user_id: int):
    return {"user_id": user_id, "name": "Alice"}

# Main app includes the router
app = FastAPI()
app.include_router(users_router)

# Run with: uvicorn fastapi_router:app --reload --port 8000

Output (GET /users/):

{"users": ["Alice", "Bob"]}

The patterns are almost identical. The main difference is that FastAPI’s router automatically includes the routes in the generated documentation under the specified tag, while Flask Blueprints need additional configuration for documentation.

Ecosystem and Community

Flask has been around since 2010, which gives it a massive head start in ecosystem size. If you need a feature, chances are someone built a Flask extension for it: Flask-Login for authentication, Flask-SQLAlchemy for ORM integration, Flask-Mail for email, Flask-CORS for cross-origin requests, Flask-Migrate for database migrations, and hundreds more.

FastAPI’s ecosystem is smaller but growing rapidly. It leans on the broader Python ecosystem rather than framework-specific extensions. For databases, you use SQLAlchemy or Tortoise-ORM directly. For authentication, you pair FastAPI’s security utilities with a JWT library such as PyJWT or python-jose and a password-hashing library such as passlib. For CORS, FastAPI has built-in middleware. This approach means fewer “FastAPI-specific” packages but more flexibility in choosing your tools.

Need | Flask Extension | FastAPI Approach
Authentication | Flask-Login, Flask-JWT-Extended | Built-in OAuth2 utilities + a JWT library (PyJWT or python-jose)
Database ORM | Flask-SQLAlchemy | SQLAlchemy (async) or SQLModel
CORS | Flask-CORS | Built-in CORSMiddleware
Form handling | Flask-WTF | Pydantic models
Admin panel | Flask-Admin | SQLAdmin or Starlette-Admin
Rate limiting | Flask-Limiter | slowapi

When to Use Each Framework

After comparing code, features, and ecosystems, here is a practical decision framework. Neither framework is objectively better — the right choice depends on what you are building and who is building it.

Choose Flask When

Flask is the better choice when:

  • you are building server-rendered web applications with HTML templates (Jinja2 is deeply integrated);
  • your team is new to web development and benefits from Flask’s minimal learning curve;
  • you need a specific Flask extension that has no equivalent in FastAPI’s ecosystem;
  • you are prototyping quickly and do not need type-enforced validation;
  • you are maintaining an existing Flask codebase and migration is not justified.

Choose FastAPI When

FastAPI is the better choice when:

  • you are building a pure REST API or microservice (no server-rendered HTML);
  • you need automatic request/response validation and do not want to write it yourself;
  • your API handles many concurrent I/O operations (database calls, external API requests);
  • you want auto-generated interactive documentation for your team or API consumers;
  • you are starting a new project and your team is comfortable with Python type hints.

Real-Life Example: Todo API in Both Frameworks

To make the comparison concrete, here is the same Todo API built in both frameworks. This gives you a side-by-side view of how the same requirements translate into code.

Flask Version

# flask_todo.py
from flask import Flask, request, jsonify

app = Flask(__name__)
todos = []
next_id = 1

@app.route("/todos", methods=["GET"])
def list_todos():
    return jsonify(todos)

@app.route("/todos", methods=["POST"])
def create_todo():
    global next_id
    data = request.get_json()
    if not data or "title" not in data:
        return jsonify({"error": "title is required"}), 400
    todo = {
        "id": next_id,
        "title": data["title"],
        "done": data.get("done", False)
    }
    next_id += 1
    todos.append(todo)
    return jsonify(todo), 201

@app.route("/todos/<int:todo_id>", methods=["PUT"])
def update_todo(todo_id):
    data = request.get_json()
    for todo in todos:
        if todo["id"] == todo_id:
            todo["title"] = data.get("title", todo["title"])
            todo["done"] = data.get("done", todo["done"])
            return jsonify(todo)
    return jsonify({"error": "Not found"}), 404

@app.route("/todos/<int:todo_id>", methods=["DELETE"])
def delete_todo(todo_id):
    global todos
    todos = [t for t in todos if t["id"] != todo_id]
    return jsonify({"status": "deleted"})

if __name__ == "__main__":
    app.run(debug=True, port=5000)

FastAPI Version

# fastapi_todo.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()
todos = []
next_id = 1

class TodoCreate(BaseModel):
    title: str
    done: bool = False

class TodoResponse(BaseModel):
    id: int
    title: str
    done: bool

@app.get("/todos", response_model=list[TodoResponse])
async def list_todos():
    return todos

@app.post("/todos", response_model=TodoResponse, status_code=201)
async def create_todo(todo: TodoCreate):
    global next_id
    new_todo = {"id": next_id, "title": todo.title, "done": todo.done}
    next_id += 1
    todos.append(new_todo)
    return new_todo

@app.put("/todos/{todo_id}", response_model=TodoResponse)
async def update_todo(todo_id: int, todo: TodoCreate):
    for t in todos:
        if t["id"] == todo_id:
            t["title"] = todo.title
            t["done"] = todo.done
            return t
    raise HTTPException(status_code=404, detail="Not found")

@app.delete("/todos/{todo_id}")
async def delete_todo(todo_id: int):
    global todos
    todos = [t for t in todos if t["id"] != todo_id]
    return {"status": "deleted"}

# Run with: uvicorn fastapi_todo:app --reload --port 8000

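Both versions expose the same four endpoints, but the FastAPI version gives you automatic validation on every POST and PUT request, typed response models that document exactly what each endpoint returns, and interactive Swagger docs at /docs, all without a single extra line of configuration. One behavioral difference is worth noting: the FastAPI PUT handler reuses TodoCreate, so title is required on every update, while the Flask handler accepts partial updates. The Flask version requires you to validate manually and document separately.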

Frequently Asked Questions

Is Flask dead now that FastAPI exists?

Not at all. Flask remains one of the most popular Python web frameworks with active development, regular releases, and a massive ecosystem. Flask 3.0 introduced further improvements and the framework continues to evolve. Many production applications run Flask successfully, and it remains an excellent choice for server-rendered web applications, prototypes, and projects that benefit from its extensive extension library.

Can I migrate from Flask to FastAPI incrementally?

Yes, but it is not a simple find-and-replace. The routing syntax is similar, but data validation patterns, middleware, and extension usage differ significantly. The most practical approach is to build new endpoints in FastAPI while keeping existing Flask endpoints running, then migrate route by route. Libraries like a2wsgi can help run both frameworks during the transition period.

Is FastAPI really faster than Flask?

In benchmarks, FastAPI running on Uvicorn typically handles 2-3x more requests per second than Flask running on Gunicorn for async workloads. For synchronous, CPU-bound tasks, the difference is smaller. The real performance gain comes from FastAPI’s native async support, which lets a single worker handle many concurrent I/O-bound requests without blocking.

Which should I learn first as a beginner?

Flask is often recommended as a first framework because it has fewer concepts to learn upfront. You can build a working web app without understanding type hints, Pydantic models, or async/await. Once you are comfortable with Flask and HTTP concepts, learning FastAPI becomes straightforward because you already understand routing, request handling, and middleware.

What about Django? How does it compare?

Django is a “batteries-included” full-stack framework with an ORM, admin panel, authentication system, and template engine built in. Flask and FastAPI are both “micro” frameworks that let you choose your own components. If you need a full web application with user management, an admin dashboard, and server-rendered pages, Django is worth considering. For pure REST APIs and microservices, Flask or FastAPI are typically more appropriate.

Conclusion

Flask and FastAPI are both excellent Python web frameworks that serve different needs. Flask gives you simplicity, flexibility, and the largest extension ecosystem in the Python web world. FastAPI gives you automatic validation, native async support, and self-documenting APIs with zero extra configuration. The code examples in this article show that the syntax is similar enough to switch between them comfortably.

For your next project, start by asking: “Am I building a REST API or a full web application?” If you are building a pure API with typed data flowing in and out, FastAPI will save you hours of boilerplate validation and documentation. If you are building a traditional web app with HTML templates, or you need a specific Flask extension, Flask remains the proven choice.

You can explore the official documentation for both frameworks to go deeper: Flask documentation and FastAPI documentation.

How To Build a REST API with FastAPI in Python

Intermediate

Building a REST API is one of the most common tasks in modern software development, and FastAPI has quickly become the go-to Python framework for doing it. If you have been using Flask or Django REST Framework and wondered whether there is a faster, more modern alternative with built-in data validation and automatic documentation, FastAPI is your answer.

You will need Python 3.9 or later, because the examples use built-in generic types such as list[BookResponse] and dict[int, dict]. You also need two packages: fastapi and uvicorn. Install them with pip install fastapi uvicorn. FastAPI uses standard Python type hints for request validation and automatic OpenAPI documentation, so everything you already know about type annotations transfers directly.

This guide walks you through building a complete REST API from scratch: defining routes, handling path and query parameters, validating request bodies with Pydantic models, returning proper HTTP status codes, and running your API with Uvicorn. By the end you will have a fully functional CRUD API for managing a collection of books.

FastAPI in 30 Seconds: Quick Example

Here is the simplest possible FastAPI application. Save it and run it to see your first API endpoint in action.

# quick_example.py
from typing import Optional

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def read_root():
    return {"message": "Hello, FastAPI!"}

@app.get("/items/{item_id}")
def read_item(item_id: int, q: Optional[str] = None):
    # q is an optional query parameter; it is None when omitted
    return {"item_id": item_id, "query": q}

Run it with: uvicorn quick_example:app --reload

Output (visiting http://127.0.0.1:8000/):

{"message": "Hello, FastAPI!"}

Output (visiting http://127.0.0.1:8000/items/42?q=search):

{"item_id": 42, "query": "search"}

Notice how item_id: int automatically validates that the path parameter is an integer. If you visit /items/hello, FastAPI returns a 422 validation error without you writing any validation code. The --reload flag makes Uvicorn restart when you change code, which is perfect for development.

What Is FastAPI and Why Use It?

FastAPI is a modern Python web framework for building APIs. It was created by Sebastián Ramírez and released in 2018. The framework is built on top of Starlette (for the web parts) and Pydantic (for data validation), combining the best of both into a developer-friendly experience.

Feature | FastAPI | Flask | Django REST
Async support | Native (async/await) | Limited (async views since 2.0) | Limited
Data validation | Built-in (Pydantic) | Manual or extensions | Serializers
Auto documentation | Swagger + ReDoc | Manual or extensions | Browsable API
Type hints | Core (drive validation and docs) | Optional | Optional
Performance | Very fast (ASGI) | Moderate (WSGI) | Moderate (WSGI)
Learning curve | Low | Low | Medium-High

The biggest practical advantage is that FastAPI uses your type hints to generate request validation, response serialization, and API documentation automatically. You write the type annotations you would write anyway, and the framework does the rest.

Setting Up Your FastAPI Project

Let us set up a proper project structure. Create a new directory and install the dependencies.

# setup_commands.sh
mkdir fastapi-books && cd fastapi-books
pip install fastapi uvicorn

Create the following file structure. We will build this up step by step.

# project_structure.txt
fastapi-books/
    main.py          # Application entry point
    models.py        # Pydantic models for request/response
    database.py      # In-memory database (for simplicity)

Defining Data Models with Pydantic

Start by defining what a “book” looks like using Pydantic models. These models handle both validation and serialization.

# models.py
from pydantic import BaseModel, Field
from typing import Optional

class BookCreate(BaseModel):
    title: str = Field(..., min_length=1, max_length=200)
    author: str = Field(..., min_length=1, max_length=100)
    year: int = Field(..., ge=1000, le=2030)
    isbn: Optional[str] = Field(None, pattern=r"^\d{10}(\d{3})?$")

class BookResponse(BaseModel):
    id: int
    title: str
    author: str
    year: int
    isbn: Optional[str] = None

class BookUpdate(BaseModel):
    title: Optional[str] = Field(None, min_length=1, max_length=200)
    author: Optional[str] = Field(None, min_length=1, max_length=100)
    year: Optional[int] = Field(None, ge=1000, le=2030)
    isbn: Optional[str] = Field(None, pattern=r"^\d{10}(\d{3})?$")

The Field function adds validation constraints: min_length, max_length, ge (greater than or equal), and pattern (regex). The ... means the field is required. Optional[str] = None means the field is optional with a default of None.
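To see these constraints fire outside of a running server, the model can be instantiated directly. The following is a minimal sketch assuming Pydantic v2 (the pattern keyword and model_dump used in this article are v2 APIs); the failing_fields helper is a hypothetical name introduced only for illustration.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class BookCreate(BaseModel):
    title: str = Field(..., min_length=1, max_length=200)
    author: str = Field(..., min_length=1, max_length=100)
    year: int = Field(..., ge=1000, le=2030)
    isbn: Optional[str] = Field(None, pattern=r"^\d{10}(\d{3})?$")


def failing_fields(payload: dict) -> list[str]:
    """Return the sorted names of the fields that fail validation."""
    try:
        BookCreate(**payload)
    except ValidationError as exc:
        # Each error carries a "loc" tuple whose first item is the field name
        return sorted({str(err["loc"][0]) for err in exc.errors()})
    return []
```

A valid payload yields an empty list, while a payload with an empty title, a year below 1000, and a three-digit ISBN reports all three fields. FastAPI turns the same errors into a 422 response body.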

In-Memory Database

# database.py
# Module-level state stands in for a real database; it resets on every restart

books_db: dict[int, dict] = {}
next_id: int = 1

def get_next_id() -> int:
    global next_id
    current = next_id
    next_id += 1
    return current

Building CRUD Endpoints

Now let us build the full API with Create, Read, Update, and Delete operations.

# main.py
from typing import Optional

from fastapi import FastAPI, HTTPException, Query
from models import BookCreate, BookResponse, BookUpdate
from database import books_db, get_next_id

app = FastAPI(
    title="Books API",
    description="A simple REST API for managing books",
    version="1.0.0",
)

@app.post("/books", response_model=BookResponse, status_code=201)
def create_book(book: BookCreate):
    book_id = get_next_id()
    book_data = {"id": book_id, **book.model_dump()}
    books_db[book_id] = book_data
    return book_data

@app.get("/books", response_model=list[BookResponse])
def list_books(
    skip: int = Query(0, ge=0),
    limit: int = Query(10, ge=1, le=100),
    author: Optional[str] = Query(None),  # optional case-insensitive filter
):
    results = list(books_db.values())
    if author:
        results = [b for b in results if author.lower() in b["author"].lower()]
    return results[skip : skip + limit]

@app.get("/books/{book_id}", response_model=BookResponse)
def get_book(book_id: int):
    if book_id not in books_db:
        raise HTTPException(status_code=404, detail="Book not found")
    return books_db[book_id]

@app.put("/books/{book_id}", response_model=BookResponse)
def update_book(book_id: int, book: BookUpdate):
    if book_id not in books_db:
        raise HTTPException(status_code=404, detail="Book not found")
    update_data = book.model_dump(exclude_unset=True)
    books_db[book_id].update(update_data)
    return books_db[book_id]

@app.delete("/books/{book_id}", status_code=204)
def delete_book(book_id: int):
    if book_id not in books_db:
        raise HTTPException(status_code=404, detail="Book not found")
    del books_db[book_id]

Each endpoint uses type hints to define what it accepts and returns. The response_model parameter tells FastAPI to validate and serialize the response using that Pydantic model. status_code=201 sets the HTTP status for successful creation. HTTPException returns proper error responses with the right status codes.

Path Parameters, Query Parameters, and Request Bodies

FastAPI distinguishes between three types of input automatically based on where they appear in your function signature.

# parameters_demo.py
from fastapi import FastAPI, Query, Path

app = FastAPI()

@app.get("/users/{user_id}/posts")
def get_user_posts(
    user_id: int = Path(..., ge=1, description="The ID of the user"),
    page: int = Query(1, ge=1, description="Page number"),
    per_page: int = Query(10, ge=1, le=50, description="Items per page"),
    sort_by: str = Query("date", pattern="^(date|title|likes)$"),
):
    return {
        "user_id": user_id,
        "page": page,
        "per_page": per_page,
        "sort_by": sort_by,
        "posts": [f"Post {i}" for i in range(1, per_page + 1)],
    }

Output (GET /users/5/posts?page=2&sort_by=likes):

{
  "user_id": 5,
  "page": 2,
  "per_page": 10,
  "sort_by": "likes",
  "posts": ["Post 1", "Post 2", "Post 3", ...]
}

The rules are simple: if a parameter name matches a path variable in the URL template (like {user_id}), it is a path parameter. Any other parameter with a simple type such as int, str, or bool is a query parameter, whether or not it has a default or a Query() annotation; without a default it is simply required. If the parameter is a Pydantic model, it is parsed from the request body.

Error Handling

FastAPI provides clean error handling through exceptions and custom exception handlers.

# error_handling.py
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import JSONResponse

app = FastAPI()

class BookNotFoundError(Exception):
    def __init__(self, book_id: int):
        self.book_id = book_id

@app.exception_handler(BookNotFoundError)
async def book_not_found_handler(request: Request, exc: BookNotFoundError):
    return JSONResponse(
        status_code=404,
        content={"error": "not_found", "detail": f"Book {exc.book_id} does not exist"},
    )

@app.get("/books/{book_id}")
def get_book(book_id: int):
    books = {1: "Python Crash Course", 2: "Fluent Python"}
    if book_id not in books:
        raise BookNotFoundError(book_id)
    return {"id": book_id, "title": books[book_id]}

Output (GET /books/99):

{"error": "not_found", "detail": "Book 99 does not exist"}

Async Endpoints

FastAPI supports both synchronous and asynchronous endpoint functions. Use async def when your endpoint does I/O operations like database queries or HTTP requests.

# async_demo.py
import asyncio
from fastapi import FastAPI

app = FastAPI()

@app.get("/sync")
def sync_endpoint():
    return {"type": "synchronous"}

@app.get("/async")
async def async_endpoint():
    await asyncio.sleep(0.1)
    return {"type": "asynchronous"}

Regular def functions run in a thread pool, so a blocking call inside one does not stall other requests. async def functions run on the event loop and must await their I/O; a blocking call such as time.sleep() or a synchronous database driver inside async def freezes every other request until it returns. If your endpoint relies on blocking libraries, declare it with regular def.
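The effect of a blocking call on the event loop can be observed without FastAPI at all. The sketch below uses only asyncio: a heartbeat coroutine ticks every 20 ms while another coroutine waits for 0.2 seconds in three different ways. All function names and durations are illustrative assumptions.

```python
import asyncio
import time


async def heartbeat(counter: list[int], interval: float = 0.02) -> None:
    """Tick forever; each tick shows the event loop is still responsive."""
    while True:
        counter[0] += 1
        await asyncio.sleep(interval)


async def ticks_during(work) -> int:
    """Run work() next to a heartbeat and return how often it ticked."""
    counter = [0]
    beat = asyncio.create_task(heartbeat(counter))
    await asyncio.sleep(0)  # give the heartbeat its first turn
    await work()
    beat.cancel()
    return counter[0]


async def blocking_work() -> None:
    time.sleep(0.2)  # blocks the whole event loop


async def cooperative_work() -> None:
    await asyncio.sleep(0.2)  # yields to the loop while waiting


async def threaded_work() -> None:
    # The blocking call runs in a worker thread, so the loop stays free
    await asyncio.to_thread(time.sleep, 0.2)
```

Running asyncio.run(ticks_during(blocking_work)) reports a single tick, because the heartbeat never gets another turn while time.sleep() holds the loop. The cooperative and threaded variants let it tick many times over the same interval.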

Automatic API Documentation

One of FastAPI’s best features is automatic API documentation. When you run your application, visit these URLs to see interactive documentation:

# documentation_urls.txt
Swagger UI:  http://127.0.0.1:8000/docs
ReDoc:       http://127.0.0.1:8000/redoc
OpenAPI JSON: http://127.0.0.1:8000/openapi.json

The Swagger UI lets you test every endpoint directly in the browser. It reads your type hints, Pydantic models, and docstrings to generate accurate request/response schemas. This means your documentation is always in sync with your code — no manual updates needed.

Real-Life Example: Complete Books API

Let us put everything together into a single, runnable file that demonstrates the complete CRUD workflow.

# books_api.py
from fastapi import FastAPI, HTTPException, Query
from pydantic import BaseModel, Field
from typing import Optional

app = FastAPI(title="Books API", version="1.0.0")

# In-memory database
books_db: dict[int, dict] = {}
next_id = 1

class BookCreate(BaseModel):
    title: str = Field(..., min_length=1, max_length=200)
    author: str = Field(..., min_length=1, max_length=100)
    year: int = Field(..., ge=1000, le=2030)

class BookResponse(BaseModel):
    id: int
    title: str
    author: str
    year: int

class BookUpdate(BaseModel):
    title: Optional[str] = None
    author: Optional[str] = None
    year: Optional[int] = None

@app.post("/books", response_model=BookResponse, status_code=201)
def create_book(book: BookCreate):
    global next_id
    book_data = {"id": next_id, **book.model_dump()}
    books_db[next_id] = book_data
    next_id += 1
    return book_data

@app.get("/books", response_model=list[BookResponse])
def list_books(skip: int = Query(0, ge=0), limit: int = Query(10, ge=1, le=100)):
    return list(books_db.values())[skip : skip + limit]

@app.get("/books/{book_id}", response_model=BookResponse)
def get_book(book_id: int):
    if book_id not in books_db:
        raise HTTPException(404, "Book not found")
    return books_db[book_id]

@app.put("/books/{book_id}", response_model=BookResponse)
def update_book(book_id: int, book: BookUpdate):
    if book_id not in books_db:
        raise HTTPException(404, "Book not found")
    books_db[book_id].update(book.model_dump(exclude_unset=True))
    return books_db[book_id]

@app.delete("/books/{book_id}", status_code=204)
def delete_book(book_id: int):
    if book_id not in books_db:
        raise HTTPException(404, "Book not found")
    del books_db[book_id]

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Testing with curl:

# Create a book
curl -X POST http://127.0.0.1:8000/books \
  -H "Content-Type: application/json" \
  -d '{"title": "Python Crash Course", "author": "Eric Matthes", "year": 2023}'

# Response: {"id": 1, "title": "Python Crash Course", "author": "Eric Matthes", "year": 2023}

# List all books
curl http://127.0.0.1:8000/books
# Response: [{"id": 1, "title": "Python Crash Course", ...}]

# Update a book
curl -X PUT http://127.0.0.1:8000/books/1 \
  -H "Content-Type: application/json" \
  -d '{"year": 2024}'

# Delete a book
curl -X DELETE http://127.0.0.1:8000/books/1

Run this with python books_api.py and test the endpoints using curl, httpie, or the built-in Swagger UI at /docs. You can extend this example by adding a real database (SQLAlchemy or SQLModel), authentication, pagination headers, and response caching.

Frequently Asked Questions

Can I migrate my Flask app to FastAPI?

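Yes, and for small APIs the migration is usually straightforward. Most Flask routes map directly to FastAPI routes. The main changes are adding type hints to your parameters and converting request parsing from request.json to Pydantic models. FastAPI’s documentation also describes mounting an existing WSGI application such as Flask inside a FastAPI app with WSGIMiddleware, which lets you move routes over gradually.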

Is FastAPI production-ready?

Yes. FastAPI is used in production by Microsoft, Netflix, Uber, and many other companies. Deploy it with Uvicorn behind a reverse proxy like Nginx, or use Gunicorn with Uvicorn workers for multi-process setups.

How do I connect FastAPI to a database?

Use SQLAlchemy 2.0 with async sessions, or use SQLModel (created by FastAPI’s author) which combines SQLAlchemy and Pydantic. For simple projects, you can also use the databases package with raw SQL queries. FastAPI’s documentation includes a SQL database tutorial built on SQLModel.

How do I add authentication?

FastAPI ships OAuth2 security utilities that combine well with JWT tokens; encoding and decoding the tokens themselves is handled by a separate library such as PyJWT. Use fastapi.security.OAuth2PasswordBearer for token-based auth, or implement API key authentication with custom dependencies. See our companion article on FastAPI authentication for a complete walkthrough.

How do I test FastAPI endpoints?

Use TestClient from fastapi.testclient (which wraps httpx). It lets you make requests to your API without running a server, making unit tests fast and reliable.

Conclusion

FastAPI makes building REST APIs in Python fast and enjoyable. The combination of automatic validation through type hints, built-in async support, and interactive Swagger documentation means you spend less time on boilerplate and more time on your application logic. Start with the books API example, then extend it with a real database and authentication.

For the complete documentation, visit fastapi.tiangolo.com.

Minimal FastAPI App

FastAPI is built on Starlette + Pydantic. The decorator-based router turns a Python function into an HTTP endpoint:

# pip install fastapi uvicorn

# main.py
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="My API", version="1.0.0")

class User(BaseModel):
    id: int
    name: str
    email: str

users = []

@app.get("/")
def root():
    return {"message": "Hello"}

@app.get("/users", response_model=list[User])
def list_users():
    return users

@app.post("/users", response_model=User, status_code=201)
def create_user(user: User):
    users.append(user)
    return user

# Run: uvicorn main:app --reload --port 8000

Visit http://localhost:8000/docs and you get interactive API docs generated from the Pydantic models — no separate documentation step.

Path Parameters and Query Parameters

FastAPI infers parameter type from function signatures. Path params become path components; query params become URL queries:

from fastapi import HTTPException

@app.get("/users/{user_id}")
def get_user(user_id: int):    # path param — validated as int
    for u in users:
        if u.id == user_id:
            return u
    raise HTTPException(status_code=404, detail="User not found")

@app.get("/search")
def search(q: str, limit: int = 10, sort: str = "id"):
    # q is required (no default), limit and sort optional with defaults
    return {"query": q, "limit": limit, "sort": sort}

Type annotations drive validation. A request to /users/abc automatically returns 422 with a message explaining that the value is not a valid integer (the exact wording depends on your Pydantic version).

Dependency Injection

FastAPI’s Depends() wires up shared dependencies — DB sessions, auth checks, common query params — without global state:

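from fastapi import Depends, HTTPException, Header
from typing import Annotated

# SessionLocal, is_valid, and get_user_from_token are placeholders for your own
# session factory and token helpers; they are not part of FastAPI.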

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

async def verify_token(authorization: Annotated[str, Header()] = ""):
    token = authorization.removeprefix("Bearer ")
    if not is_valid(token):
        raise HTTPException(status_code=401, detail="Invalid token")
    return get_user_from_token(token)

@app.get("/profile")
def profile(
    user = Depends(verify_token),
    db = Depends(get_db),
):
    return db.get_user_full(user.id)

Dependencies can be nested, cached per-request, and made async. The yield pattern handles setup/teardown like a context manager.
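The setup/teardown ordering of a yield dependency can be reproduced with the standard library alone. The sketch below is an analogy, not FastAPI's actual machinery: it wraps a get_db generator with contextlib.contextmanager, and FakeSession, events, and run_endpoint are hypothetical names used only to record the order of operations.

```python
from contextlib import contextmanager

events: list[str] = []


class FakeSession:
    """Hypothetical stand-in for a database session."""

    def close(self) -> None:
        events.append("closed")


def get_db():
    db = FakeSession()
    events.append("opened")
    try:
        yield db  # the endpoint body runs while the generator is paused here
    finally:
        db.close()  # runs even if the endpoint raised


def run_endpoint(fail: bool = False) -> None:
    """Drive get_db() the way a context manager would."""
    with contextmanager(get_db)() as db:
        events.append("handling")
        if fail:
            raise RuntimeError("endpoint failed")
```

Whether run_endpoint() succeeds or raises, events ends up as opened, handling, closed, which is the guarantee the try/finally around yield provides.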

Async Endpoints

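Use async def for endpoints that await I/O (databases, external APIs). Use plain def for blocking, synchronous work such as a CPU-heavy conversion; FastAPI runs those in a thread pool so they do not block the event loop:

import httpx
from fastapi import Body, Response

@app.get("/proxy")
async def proxy(url: str):
    async with httpx.AsyncClient() as client:
        resp = await client.get(url)
    return {"status": resp.status_code, "data": resp.text}

@app.post("/render")
def render_pdf(html: str = Body(..., embed=True)):
    # Body(embed=True) reads {"html": "..."} from the JSON body; a bare
    # `html: str` would be treated as a query parameter instead.
    # wkhtmltopdf_convert is a placeholder for your own synchronous converter.
    pdf_bytes = wkhtmltopdf_convert(html)
    return Response(content=pdf_bytes, media_type="application/pdf")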

Background Tasks

For fire-and-forget work after a response (sending emails, logging), use BackgroundTasks:

from fastapi import BackgroundTasks

def send_welcome_email(email: str):
    # Slow SMTP call — don't make the user wait
    smtp_send(email, "Welcome!", "...")

@app.post("/signup")
def signup(user: User, background_tasks: BackgroundTasks):
    save_user(user)
    background_tasks.add_task(send_welcome_email, user.email)
    return {"status": "created"}

For heavier work or work that must survive crashes, use Celery instead — BackgroundTasks are in-process only.

Common Pitfalls

  • Sync code in async endpoint. time.sleep() in async def blocks the event loop. Use await asyncio.sleep() or move to def.
  • Forgetting response_model. Without it, FastAPI returns whatever you return — including extra fields. Always declare it for stable API contracts.
  • Mutable global state. users = [] works in dev, breaks under multi-worker uvicorn. Use a database from day one.
  • Async DB session in sync endpoint. If you use SQLAlchemy async, your endpoint must also be async. Mixing types throws confusing errors.
  • Missing dependencies in tests. Use app.dependency_overrides[get_db] = lambda: test_db to swap dependencies without touching production code.

FAQ

Q: FastAPI or Flask?
A: FastAPI for new APIs — async by default, auto-docs, Pydantic validation. Flask for synchronous, template-rendering apps or legacy ecosystems.

Q: How do I deploy FastAPI?
A: uvicorn main:app behind a reverse proxy (nginx, Caddy). For containerized: Docker + uvicorn. For serverless: Mangum adapter for AWS Lambda. Avoid uvicorn --reload in production.

Q: WebSockets in FastAPI?
A: Built in — @app.websocket("/ws"). See FastAPI’s WebSocket docs.

Q: How do I do auth?
A: For tokens, write a Depends(verify_token) dependency. For OAuth/OIDC, use authlib or fastapi-users. Avoid rolling your own crypto.

Q: How do I run sync ORMs (Django, peewee) inside async FastAPI?
A: Either use sync endpoints (FastAPI runs them in a thread pool) or wrap calls in await asyncio.to_thread(sync_func, ...). Don’t fight the framework — pick async or sync per endpoint.
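
A minimal sketch of the asyncio.to_thread() route, assuming a hypothetical slow_sync_query() function that stands in for a blocking ORM call:

```python
import asyncio
import time

def slow_sync_query(user_id: int) -> dict:
    """Hypothetical stand-in for a blocking ORM call."""
    time.sleep(0.1)
    return {"id": user_id, "name": f"user-{user_id}"}

async def get_user(user_id: int) -> dict:
    # Run the blocking function in a worker thread so the event loop stays free
    return await asyncio.to_thread(slow_sync_query, user_id)

async def main() -> list[dict]:
    # Both lookups overlap because each one waits in its own thread
    return await asyncio.gather(get_user(1), get_user(2))

users = asyncio.run(main())
print(users)
```

The same wrapper shape works inside an async def endpoint; only the function being wrapped changes.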

Wrapping Up

FastAPI’s appeal is the combination of speed (Starlette / Uvicorn), type safety (Pydantic), and zero-effort docs (OpenAPI from your Python). For new Python web APIs in 2026, it’s the default choice. Start with the routers and Pydantic models, add dependency injection when you have shared concerns, and reach for BackgroundTasks (or Celery for heavier work) when you need async-after-response.

What’s New in Python 3.13: A Complete Guide

Intermediate

Every new Python release brings features that change how you write code day to day. Python 3.13, released in October 2024, is no exception. If you have ever stared at a confusing traceback wondering which part of a chained expression caused the error, or wished the interactive interpreter felt more like a modern tool, Python 3.13 has direct answers for you.

You do not need any special libraries to try the features covered here — everything ships with the standard Python 3.13 installation. If you have not upgraded yet, grab it from python.org and follow along. The experimental free-threading build requires a separate installer option, but all other features work out of the box.

This guide walks you through the most impactful changes: the revamped interactive REPL, improved error messages, the new copy.replace() function, deprecation removals, typing improvements, and the experimental free-threaded build. By the end you will know exactly which features to adopt immediately and which ones to watch as they mature.

Python 3.13 in 30 Seconds: Quick Example

Here is a quick taste of the error messages you get in Python 3.13. The traceback marks the exact part of the expression that caused the problem, not just the line.

# quick_example.py
data = {"users": [{"name": "Alice"}, {"name": None}]}
for user in data["users"]:
    print(user["name"].upper())

Output (Python 3.13):

ALICE
Traceback (most recent call last):
  File "quick_example.py", line 3, in <module>
    print(user["name"].upper())
          ~~~~~~~~~~~~^^^^^^^
AttributeError: 'NoneType' object has no attribute 'upper'

Notice how the markers point directly at user["name"].upper, making it obvious that user["name"] returned None. Python has drawn these expression-level markers since 3.11 (PEP 657); before that you would only see the line with no indication of which part failed. Python 3.13 adds color to the traceback when it is printed to a terminal.

Why Upgrade to Python 3.13?

Python 3.13 is not a radical overhaul — it is a release focused on developer experience and laying groundwork for the future. The improvements fall into two categories: things that make your daily coding life better right now, and experimental features that signal where Python is heading.

Category | Feature | Impact
Developer Experience | New interactive REPL | Multi-line editing, color output, paste mode
Developer Experience | Better error messages | Pinpoints exact expression that failed
Standard Library | copy.replace() | Create modified copies of objects cleanly
Standard Library | dbm.sqlite3 backend | Default dbm now uses SQLite under the hood
Typing | Type defaults (PEP 696) | TypeVar, ParamSpec, TypeVarTuple get defaults
Deprecations | Removed modules | aifc, audioop, cgi, and more are gone
Experimental | Free-threaded build | Run without the GIL for true parallelism
Experimental | JIT compiler | Copy-and-patch JIT for potential speedups

The daily-use improvements are reason enough to upgrade for most developers. The experimental features give you a preview of Python’s multi-threaded future without requiring any changes to your existing code.

Exploring new Python 3.13 features
Every major release is a chance to delete workarounds you forgot you wrote.

The Revamped Interactive REPL

The Python 3.13 REPL is a significant step up from the bare-bones interpreter that has existed since Python 1.x. If you have ever pasted a multi-line function into the REPL and watched it break because of indentation issues, this upgrade is for you.

Multi-Line Editing

The new REPL supports proper multi-line editing. You can define a function, realize you made a typo on line 2, and press the up arrow to go back and fix it — all without retyping the entire block.

# repl_demo.py
def greet(name):
    greeting = f"Hello, {name}!"
    return greeting

greet("World")

Output:

'Hello, World!'

In previous Python versions, pressing Up would only recall the last line. Now it recalls the entire block, letting you edit and re-execute multi-line code naturally.

Color Output and Tracebacks

The REPL now shows colored prompts and colorized tracebacks by default. Error messages use color to distinguish the file path, line number, error type, and error message. You can disable this by setting the environment variable PYTHON_COLORS=0 if you prefer plain text.

Paste Mode

Press F3 to enter paste mode, which lets you paste large blocks of code without the REPL trying to execute each line as you paste it. Press F3 again to return to the regular prompt and run the pasted block. This solves the long-standing frustration of pasting multi-line code from tutorials or documentation.

Improved Error Messages

Python has been on a multi-release journey to make error messages more helpful. Python 3.13 continues this with several targeted improvements that save you debugging time.

Better NameError Suggestions

When you use a standard library module without importing it first, Python suggests the missing import. This hint was added in Python 3.12 and carries over to 3.13.

# better_nameerror.py
print(sys.version)

Output:

NameError: name 'sys' is not defined. Did you forget to import 'sys'?

The suggestion Did you forget to import 'sys'? arrived in Python 3.12 and remains in 3.13. Before that you would just get the bare NameError with no hint about what went wrong.

Improved error messages in Python 3.13
The traceback finally points at the suspect, not just the crime scene.

Keyword Argument Suggestions

If you pass a keyword argument with a typo, Python 3.13 now suggests the correct name.

# keyword_suggestion.py
def connect(host, port, timeout=30):
    return f"Connected to {host}:{port}"

connect(host="localhost", port=5432, timout=60)

Output:

TypeError: connect() got an unexpected keyword argument 'timout'. Did you mean 'timeout'?

This is the kind of quality-of-life improvement that saves minutes of head-scratching, especially in large codebases where function signatures have many parameters.

The New copy.replace() Function

Python 3.13 adds copy.replace(), a generic way to create a modified copy of an object. If you have used dataclasses.replace() or namedtuple._replace(), this is the same idea but generalized to work with any object that implements the __replace__ protocol.

# copy_replace_demo.py
import copy
from datetime import date, time, datetime

original_date = date(2026, 4, 7)
next_year = copy.replace(original_date, year=2027)
print(f"Original: {original_date}")
print(f"Modified: {next_year}")

meeting_time = time(14, 30)
later = copy.replace(meeting_time, hour=16)
print(f"Original time: {meeting_time}")
print(f"Rescheduled:   {later}")

Output:

Original: 2026-04-07
Modified: 2027-04-07
Original time: 14:30:00
Rescheduled:   16:30:00

The beauty of copy.replace() is that it works with any object that defines __replace__. The datetime module’s classes already support it. Your own dataclasses support it automatically. And you can add __replace__ to any custom class to opt into this protocol.

# custom_replace.py
import copy

class Config:
    def __init__(self, host, port, debug=False):
        self.host = host
        self.port = port
        self.debug = debug

    def __replace__(self, **changes):
        return Config(
            host=changes.get("host", self.host),
            port=changes.get("port", self.port),
            debug=changes.get("debug", self.debug),
        )

    def __repr__(self):
        return f"Config(host={self.host!r}, port={self.port}, debug={self.debug})"

prod = Config("api.example.com", 443)
dev = copy.replace(prod, host="localhost", port=8000, debug=True)
print(f"Production: {prod}")
print(f"Development: {dev}")

Output:

Production: Config(host='api.example.com', port=443, debug=False)
Development: Config(host='localhost', port=8000, debug=True)
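
Dataclasses need no extra code to take part in the protocol. A minimal sketch, assuming a hypothetical Settings class; the getattr fallback is only there so the snippet also runs on interpreters older than 3.13, where copy.replace() does not exist:

```python
import copy
import dataclasses

@dataclasses.dataclass(frozen=True)
class Settings:
    host: str
    port: int = 443
    debug: bool = False

# copy.replace() exists on Python 3.13+; fall back to dataclasses.replace() before that
replace = getattr(copy, "replace", dataclasses.replace)

prod = Settings("api.example.com")
dev = replace(prod, host="localhost", debug=True)
print(prod)
print(dev)
```

Because the dataclass is frozen, creating a modified copy is the only way to "change" a field, which is the situation this function is meant for.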

Removed and Deprecated Modules

Python 3.13 completes the removal of modules that were deprecated in Python 3.11 under PEP 594. If your code imports any of these, it will break on upgrade.

Removed Module | Replacement
aifc | Use soundfile (pip install)
audioop | Use pydub or numpy
cgi | Use urllib.parse or a web framework
cgitb | Use traceback or logging
imghdr | Use filetype or python-magic
pipes | Use subprocess
telnetlib | Use telnetlib3
uu | Use base64

Before upgrading, run a quick grep across your project to check for these imports. A single import cgi in a legacy module can break your entire application on startup.
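
If grep produces too many false positives (comments, strings), the check can be done on the syntax tree instead. The sketch below is one possible approach, not an official tool; find_removed_imports is a hypothetical helper name, and the module set lists the PEP 594 modules removed in 3.13:

```python
import ast

# Modules removed in Python 3.13 under PEP 594
REMOVED_MODULES = {
    "aifc", "audioop", "cgi", "cgitb", "chunk", "crypt", "imghdr",
    "mailcap", "msilib", "nis", "nntplib", "ossaudiodev", "pipes",
    "sndhdr", "spwd", "sunau", "telnetlib", "uu", "xdrlib",
}

def find_removed_imports(source: str) -> list[tuple[int, str]]:
    """Return (line_number, module) pairs for imports of removed modules."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in REMOVED_MODULES:
                hits.append((node.lineno, top_level))
    return sorted(hits)

sample = "import os\nimport cgi\nfrom telnetlib import Telnet\n"
print(find_removed_imports(sample))  # [(2, 'cgi'), (3, 'telnetlib')]
```

To scan a project, read each file found by pathlib.Path(".").rglob("*.py") and pass its text to the function.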

Typing Improvements

Python 3.13 brings several useful additions to the typing system. The most notable is PEP 696, which adds default values for TypeVar, ParamSpec, and TypeVarTuple.

# typing_defaults.py
from typing import TypeVar, Generic

T = TypeVar("T", default=str)

class Container(Generic[T]):
    def __init__(self, value: T) -> None:
        self.value = value
    def get(self) -> T:
        return self.value

box1: Container = Container("hello")
box3: Container[int] = Container(42)
print(f"box1: {box1.get()}")
print(f"box3: {box3.get()}")

Output:

box1: hello
box3: 42

The warnings.deprecated Decorator

Python 3.13 also adds warnings.deprecated (from PEP 702), which lets you mark functions as deprecated with a standard decorator that type checkers understand.

# deprecated_demo.py
import warnings

@warnings.deprecated("Use new_connect() instead")
def old_connect(host, port):
    return f"Connected to {host}:{port}"

result = old_connect("localhost", 5432)
print(result)

Output:

Connected to localhost:5432

Python also prints a DeprecationWarning carrying that message to stderr before the result, because the call is made from the __main__ module, where deprecation warnings are shown by default.

Experimental: Free-Threaded Python (No GIL)

The biggest long-term change in Python 3.13 is the experimental free-threaded build, which lets Python run without the Global Interpreter Lock (GIL). This is the result of PEP 703.

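# check_gil.py
import sys
import sysconfig

print(f"Python version: {sys.version}")
# Py_GIL_DISABLED is set only when the interpreter was built free-threaded
free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
print(f"Free-threaded build: {free_threaded}")
# sys._is_gil_enabled() exists on 3.13+; older versions always have the GIL
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
print(f"GIL enabled: {gil_enabled}")

Output (free-threaded build):

Python version: 3.13.0 experimental free-threading build
Free-threaded build: True
GIL enabled: False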

The free-threaded build is marked experimental for good reason: many C extensions (NumPy, pandas, etc.) are not yet compatible. Use it for testing, not production workloads.

Other Notable Changes

dbm Now Uses SQLite by Default

The dbm module’s default backend is now dbm.sqlite3, giving you better reliability and cross-platform consistency.

# dbm_sqlite_demo.py
import dbm
with dbm.open("mydata", "c") as db:
    db["name"] = "Python 3.13"
    db["feature"] = "SQLite dbm backend"
    print(f"Stored: {db['name'].decode()}")
    print(f"Keys: {list(db.keys())}")

Output:

Stored: Python 3.13
Keys: [b'name', b'feature']

Defined Semantics for locals()

PEP 667 defines clear semantics for locals(). In Python 3.13, calling locals() in a function always returns a fresh snapshot. Modifying the returned dictionary no longer affects the actual local variables.

# locals_demo.py
def demo():
    x = 10
    local_vars = locals()
    local_vars["x"] = 999
    print(f"x is still: {x}")

demo()

Output:

x is still: 10

Real-Life Example: Feature Detection Utility

# feature_detector.py
import sys
import importlib.util  # a bare "import importlib" does not guarantee the util submodule is loaded

def check_feature(name, test_fn):
    try:
        available, detail = test_fn()
    except Exception as e:
        available, detail = False, str(e)
    status = "YES" if available else "NO"
    print(f"[{status:>3}] {name}: {detail}")

def main():
    print("=" * 55)
    print("Python 3.13 Feature Detection Report")
    print(f"Python: {sys.version}")
    print("=" * 55)

    check_feature("Python 3.13+",
        lambda: (sys.version_info >= (3, 13),
                 f"Running {sys.version_info.major}.{sys.version_info.minor}"))

    check_feature("Free-threaded build",
        lambda: (hasattr(sys.flags, "gil") and not sys.flags.gil,
                 "GIL disabled" if hasattr(sys.flags, "gil") and not sys.flags.gil
                 else "Standard build"))

    import copy
    check_feature("copy.replace()",
        lambda: (hasattr(copy, "replace"), "Available" if hasattr(copy, "replace") else "Not available"))

    removed = ["aifc", "cgi", "telnetlib", "uu"]
    gone = sum(1 for m in removed if not importlib.util.find_spec(m))
    check_feature("PEP 594 removals",
        lambda: (gone == len(removed), f"{gone}/{len(removed)} removed"))

    print("=" * 55)

if __name__ == "__main__":
    main()

Output (on Python 3.13):

=======================================================
Python 3.13 Feature Detection Report
Python: 3.13.0 (main, Oct 7 2024, 00:00:00)
=======================================================
[YES] Python 3.13+: Running 3.13
[ NO] Free-threaded build: Standard build
[YES] copy.replace(): Available
[YES] PEP 594 removals: 4/4 removed
=======================================================

This utility is a useful starting point you can extend for your own projects. Add checks for any features your codebase depends on.

Frequently Asked Questions

Is it safe to upgrade to Python 3.13 for production?

Yes, the standard Python 3.13 build is stable and production-ready. The “experimental” label only applies to the free-threaded build and the JIT compiler, both of which are opt-in.

Can I use the free-threaded build in production?

Not yet. The free-threaded build is explicitly experimental. Many popular C extensions like NumPy and pandas are not yet compatible. Use it for testing and benchmarking only.

My project uses the cgi module. What should I replace it with?

For parsing form data, use urllib.parse.parse_qs(). For file uploads, use your web framework’s built-in parsing (Flask’s request.files, Django’s request.FILES).
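
A short sketch of the parse_qs() replacement, using a made-up form body:

```python
from urllib.parse import parse_qs

# A URL-encoded form body, as a browser would send it
body = "name=Alice&lang=python&lang=rust&note=hello+world"

form = parse_qs(body)
print(form)

# Every value is a list because a field can appear more than once
first_name = form["name"][0]
languages = form["lang"]
```

Code that previously read a single value from cgi.FieldStorage needs the extra [0] index shown above.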

How do I enable the JIT compiler?

Build CPython from source with --enable-experimental-jit. Pre-built installers do not include it. Performance gains in 3.13 are minimal — wait for Python 3.14+.

Will my existing code break when upgrading from 3.12?

If your code does not import any removed PEP 594 modules, it will almost certainly work without changes. Run your test suite on Python 3.13 and check for DeprecationWarning messages.
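
One way to make those warnings impossible to miss is to promote them to errors while the tests run. The sketch below shows the mechanism with a hypothetical legacy_helper() function:

```python
import warnings

def legacy_helper() -> str:
    """Hypothetical function that still relies on a deprecated code path."""
    warnings.warn("legacy_helper() is deprecated", DeprecationWarning, stacklevel=2)
    return "ok"

caught = []
with warnings.catch_warnings():
    # Promote DeprecationWarning to an exception inside this block only
    warnings.simplefilter("error", DeprecationWarning)
    try:
        legacy_helper()
    except DeprecationWarning as exc:
        caught.append(str(exc))

print(caught)  # ['legacy_helper() is deprecated']
```

For a whole test run, the interpreter flag -W error::DeprecationWarning applies the same filter without code changes.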

Conclusion

Python 3.13 is a well-rounded release that improves everyday developer experience while laying the groundwork for Python’s multi-threaded future. The new REPL with multi-line editing and paste mode makes interactive development genuinely pleasant. Improved error messages with expression-level highlighting save real debugging time. And copy.replace() gives you a clean, standardized way to create modified copies of objects.

For the complete list of changes, read the official Python 3.13 release notes.

How To Use Lazy Annotations in Python 3.14

Intermediate

Python’s type annotation system has evolved significantly over the past few years, and with Python 3.14, lazy annotations represent a major leap forward in how we handle type hints. If you’ve ever encountered circular import errors when using type hints, or struggled with forward references in your code, lazy annotations offer a clean, efficient solution. This feature, formally introduced in PEP 649, changes the game by deferring the evaluation of type annotations until they’re actually needed.

Don’t worry if you’re not deeply familiar with Python’s type system yet. Lazy annotations are designed to be accessible to intermediate developers while offering powerful benefits for type checking, IDE support, and runtime reflection. Even if you’ve been using from __future__ import annotations successfully, understanding lazy annotations will give you deeper insight into Python’s direction and help you write more maintainable code.

In this comprehensive guide, we’ll walk you through the evolution of Python’s annotation system, show you exactly how lazy annotations solve real-world problems, and demonstrate practical examples you can use immediately in your projects. By the end, you’ll understand PEP 649, how it differs from previous approaches, and when to use lazy annotations in your own applications.

Lazy evaluation concept in Python
Why evaluate now what you can evaluate never? Lazy annotations agree.

Quick Example: The Forward Reference Problem

Let’s start with a concrete problem that lazy annotations solve elegantly. Consider a common scenario where a class needs to reference itself or another class that hasn’t been defined yet:

# File: old_approach.py
from typing import Optional

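class Node:
    def __init__(self, value: int, next_node: Optional['Node'] = None):
        self.value = value
        self.next_node = next_node

    def __repr__(self) -> str:
        return f"Node(value={self.value}, next_node={self.next_node!r})"

print(Node(1))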

Output:

Node(value=1, next_node=None)

Notice the string quotes around 'Node'? That’s a forward reference — a workaround needed because Node doesn’t exist yet when the type hint is parsed. With lazy annotations, you can write it naturally:

# File: new_approach.py
from __future__ import annotations
from typing import Optional

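class Node:
    def __init__(self, value: int, next_node: Optional[Node] = None):
        self.value = value
        self.next_node = next_node

    def __repr__(self) -> str:
        return f"Node(value={self.value}, next_node={self.next_node!r})"

print(Node(42, Node(7)))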

Output:

Node(value=42, next_node=Node(value=7, next_node=None))

Python 3.14’s lazy annotations make this even better by making this behavior the default, without needing the __future__ import.

What Are Lazy Annotations?

Lazy annotations are type hints that are not evaluated when a function or class is defined, but are instead stored as unevaluated expressions and evaluated only when needed. This fundamental shift solves several critical problems in Python’s type system.

Here’s a quick comparison of how Python’s annotation system has evolved:

Feature | PEP 484 (Original) | PEP 563 (Postponed) | PEP 649 (Lazy)
Evaluation Time | Immediate (at definition) | Deferred (as strings) | Deferred (evaluated on first access)
Forward References | Requires string quotes | Works naturally | Works naturally
Runtime Performance | Annotations evaluated eagerly | No runtime evaluation cost | Efficient lazy evaluation
Type Checker Support | Full support | Full support | Full support
Introspection | Actual type objects | String representations | Actual type objects (on demand)
Default Behavior | Python 3.13 and earlier | Opt-in via __future__ import (Python 3.7+) | Python 3.14+

Understanding the Forward Reference Problem

Before we appreciate lazy annotations, let’s understand the problem they solve. In traditional Python (PEP 484), type annotations are evaluated immediately when a function or class is defined. This creates issues when you reference types that don’t exist yet.

# File: circular_import_problem.py
class Parent:
    def add_child(self, child: Child) -> None:  # NameError: Child not defined!
        self.children.append(child)

class Child:
    def __init__(self, name: str):
        self.name = name

Output (Python 3.13 and earlier):

NameError: name 'Child' is not defined

The traditional solutions were clunky:

# File: workaround_string_quotes.py
from typing import Optional

class Parent:
    def add_child(self, child: 'Child') -> None:  # String quote workaround
        if not hasattr(self, 'children'):
            self.children = []
        self.children.append(child)

class Child:
    def __init__(self, name: str):
        self.name = name

Usage:

parent = Parent()
parent.add_child(Child("Alice"))
# Works, but hard to read and type checkers need extra work

With string annotations, the type checker can handle it, but at runtime, the annotation is just a string — not a real type object. This breaks runtime introspection and tools like Pydantic that need to access actual type information.
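
The difference is easy to observe. In the sketch below (reusing the Parent and Child names from above), the raw annotation is a plain string, and only get_type_hints() turns it into the class:

```python
from typing import get_type_hints

class Parent:
    def add_child(self, child: 'Child') -> None:
        self.children = getattr(self, "children", []) + [child]

class Child:
    def __init__(self, name: str):
        self.name = name

raw = Parent.add_child.__annotations__
resolved = get_type_hints(Parent.add_child)

print(raw["child"])       # the literal string 'Child'
print(resolved["child"])  # the Child class itself
```

Any tool that reads __annotations__ directly has to perform this resolution step itself, which is where runtime introspection tends to go wrong.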

How Lazy Annotations Work Under the Hood

PEP 649 has the compiler wrap each set of annotations in a small function, exposed as __annotate__ on the function, class, or module. Instead of evaluating type hints at definition time, Python keeps that function around and calls it the first time __annotations__ is read; the new annotationlib module can also call it in formats that tolerate names that are not defined yet.

# File: lazy_annotation_internals.py (requires Python 3.14+)
from typing import get_type_hints

class TreeNode:
    def add_left(self, node: TreeNode) -> None:
        self.left = node

    def add_right(self, node: TreeNode) -> None:
        self.right = node

# Reading __annotations__ runs the stored annotation code on first access
print("Raw annotations:", TreeNode.add_left.__annotations__)

# get_type_hints() additionally normalizes None to NoneType
print("Evaluated hints:", get_type_hints(TreeNode.add_left))

Output (Python 3.14):

Raw annotations: {'node': <class '__main__.TreeNode'>, 'return': None}
Evaluated hints: {'node': <class '__main__.TreeNode'>, 'return': <class 'NoneType'>}

The key insight: Python keeps the annotation as compiled code rather than as a string, and runs that code only when something asks for the annotations (reading __annotations__, or calling get_type_hints() or annotationlib.get_annotations()). This gives us the best of both worlds:

  • Natural syntax: No string quotes needed
  • Performance: No upfront cost to evaluate complex types
  • Runtime introspection: Actual type objects when you need them
  • Circular imports: Resolved because evaluation is deferred
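
The deferral idea itself can be modeled in a few lines of ordinary Python. The class below is a toy illustration with a hypothetical name (LazyAnnotations), not CPython's actual implementation: it stores a function that builds the annotations dict and calls it only on first use.

```python
class LazyAnnotations:
    """Toy model: holds a function that builds the annotations dict on demand."""

    def __init__(self, annotate):
        self._annotate = annotate
        self._cache = None

    def get(self):
        # Evaluate once, on first request, then reuse the result
        if self._cache is None:
            self._cache = self._annotate()
        return self._cache

# The lambda mentions TreeNode before the class exists.
# That is fine because the body does not run until get() is called.
lazy = LazyAnnotations(lambda: {"node": TreeNode, "return": None})

class TreeNode:
    pass

resolved = lazy.get()
print(resolved["node"] is TreeNode)  # True
```

Nothing fails at the point where the lambda is created, which is the same reason a forward reference no longer needs quotes.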

Using get_type_hints() vs __annotations__

Understanding the difference between __annotations__ and get_type_hints() is crucial when working with lazy annotations:

# File: annotations_vs_hints.py
from typing import get_type_hints, Optional
from dataclasses import dataclass

@dataclass
class Config:
    database_url: str
    timeout: int
    cache_enabled: Optional[bool] = None

# Direct access to raw annotations
print("__annotations__:", Config.__annotations__)

# Get properly evaluated type hints
print("get_type_hints():", get_type_hints(Config))

# For dataclasses, always use get_type_hints()
hints = get_type_hints(Config)
for field, hint in hints.items():
    print(f"{field}: {hint}")

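Output (Python 3.13 and earlier; Python 3.14 prints the union as bool | None):

__annotations__: {'database_url': <class 'str'>, 'timeout': <class 'int'>, 'cache_enabled': typing.Optional[bool]}
get_type_hints(): {'database_url': <class 'str'>, 'timeout': <class 'int'>, 'cache_enabled': typing.Optional[bool]}
database_url: <class 'str'>
timeout: <class 'int'>
cache_enabled: typing.Optional[bool]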

Best Practice: Use get_type_hints() when you need fully resolved type objects for runtime operations, since it also resolves annotations written as strings. Read __annotations__ directly only when you want the annotations exactly as they were written.

Impact on Dataclasses, Pydantic, and Runtime Type Checking

Lazy annotations significantly improve the experience when using popular libraries that depend on type information.

Dataclasses and Lazy Annotations

# File: dataclass_lazy_example.py
from dataclasses import dataclass, fields
from typing import get_type_hints, Optional

@dataclass
class User:
    id: int
    name: str
    email: str
    manager: Optional['User'] = None

# Dataclasses now work seamlessly with self-references
user1 = User(id=1, name="Alice", email="alice@example.com")
user2 = User(id=2, name="Bob", email="bob@example.com", manager=user1)

# Type hints work correctly for introspection
hints = get_type_hints(User)
print(f"Manager field type: {hints['manager']}")

# Fields still work perfectly
for field in fields(User):
    print(f"{field.name}: {field.type}")

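Output (Python 3.13 and earlier; Python 3.14 prints unions in the X | None form):

Manager field type: typing.Optional[__main__.User]
id: <class 'int'>
name: <class 'str'>
email: <class 'str'>
manager: typing.Optional[ForwardRef('User')]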

Pydantic Model Validation

# File: pydantic_lazy_example.py
from pydantic import BaseModel
from typing import Optional

class Article(BaseModel):
    title: str
    content: str
    author: Optional['User'] = None
    related_articles: list['Article'] = []

class User(BaseModel):
    username: str
    email: str
    articles: list[Article] = []

# Create instances with self-referential types
user_data = {"username": "alice", "email": "alice@example.com"}
article_data = {
    "title": "Python Type Hints",
    "content": "...",
    "author": user_data
}

article = Article(**article_data)
print(f"Article by: {article.author.username}")

Output:

Article by: alice

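Pydantic handles forward references like these by resolving the quoted names once every referenced model has been defined. If a model still reports that it is not fully defined, calling Article.model_rebuild() (Pydantic v2) after User is declared forces the resolution.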

Runtime Type Checking

# File: runtime_type_checking.py
from typing import get_type_hints

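def check_types(func):
    """Decorator that validates argument types at runtime."""
    def wrapper(*args, **kwargs):
        # Resolve hints at call time: GraphNode is not defined yet
        # when the decorator runs, so resolving earlier raises NameError
        hints = get_type_hints(func)
        # Validation logic using resolved type hints
        return func(*args, **kwargs)
    return wrapper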

@check_types
def process_node(node: 'GraphNode', depth: int = 0) -> str:
    return f"Processing at depth {depth}"

class GraphNode:
    def __init__(self, value):
        self.value = value

node = GraphNode("test")
result = process_node(node, depth=2)
print(result)

Output:

Processing at depth 2
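
The validation step left as a comment in the decorator could be filled in along these lines. This is a minimal sketch under a stated assumption: only annotations that are plain classes are checked, and typing constructs such as Optional[...] are skipped rather than interpreted.

```python
import inspect
from functools import wraps
from typing import get_type_hints

def check_types(func):
    """Validate arguments against plain-class annotations at call time."""
    signature = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        # Resolve hints on each call so classes defined later are found
        hints = get_type_hints(func)
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Assumption: only plain classes are checked, everything else is skipped
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)
    return wrapper

@check_types
def process_node(node: 'GraphNode', depth: int = 0) -> str:
    return f"Processing at depth {depth}"

class GraphNode:
    def __init__(self, value):
        self.value = value

print(process_node(GraphNode("test"), depth=2))
try:
    process_node("not a node")
except TypeError as exc:
    print(exc)
```

Resolving the hints inside the wrapper costs a little on every call; caching the result after the first successful resolution is a reasonable refinement.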

Migration Guide from `from __future__ import annotations`

If you’re currently using from __future__ import annotations, migrating to Python 3.14’s native lazy annotations is straightforward:

Step 1: Update to Python 3.14+

# File: check_version.py
import sys
print(f"Python version: {sys.version}")
# Requires Python 3.14 or later for native lazy annotations

Output:

Python version: 3.14.0 (...)

Step 2: Remove the __future__ Import

# File: before_migration.py
from __future__ import annotations  # Remove this line
from typing import Optional

class LinkedList:
    def __init__(self, value: int, next: Optional[LinkedList] = None):
        self.value = value
        self.next = next

# File: after_migration.py
from typing import Optional

class LinkedList:
    def __init__(self, value: int, next: Optional[LinkedList] = None):
        self.value = value
        self.next = next

Output:

# The class behaves the same at runtime; __annotations__ now holds real objects instead of strings

Step 3: Update Code That Accesses __annotations__

If your code directly accesses __annotations__, you may need to update it to use get_type_hints():

# File: update_annotations_access.py
from typing import get_type_hints

class MyClass:
    x: int
    y: str

# Direct access works on 3.14, but string annotations stay strings
# and a name that is still undefined raises NameError
# annotations_dict = MyClass.__annotations__

# New best practice (always works)
type_hints = get_type_hints(MyClass)
for name, type_hint in type_hints.items():
    print(f"{name}: {type_hint}")

Output:

x: <class 'int'>
y: <class 'str'>

Real-World Example: Plugin System with Runtime Annotations

Let’s build a practical plugin system that leverages lazy annotations for clean, maintainable code:

# File: plugin_system.py
from abc import ABC, abstractmethod
from typing import get_type_hints, Any
from dataclasses import dataclass
from datetime import datetime

@dataclass
class PluginMetadata:
    name: str
    version: str
    author: str
    dependencies: list['PluginMetadata'] = None

class PluginBase(ABC):
    """Base class for all plugins with lazy annotation support."""

    metadata: PluginMetadata

    @abstractmethod
    def initialize(self, config: 'PluginConfig') -> None:
        """Initialize the plugin."""
        pass

    @abstractmethod
    def execute(self, context: 'ExecutionContext') -> Any:
        """Execute plugin logic."""
        pass

@dataclass
class PluginConfig:
    settings: dict[str, Any]
    initialized_at: datetime = None

@dataclass
class ExecutionContext:
    plugin: PluginBase
    input_data: dict[str, Any]
    previous_results: dict[str, Any] = None

class LoggingPlugin(PluginBase):
    """Example plugin that logs execution."""

    metadata = PluginMetadata(
        name="Logger",
        version="1.0",
        author="DevTeam"
    )

    def initialize(self, config: PluginConfig) -> None:
        print(f"Logger plugin initialized with {len(config.settings)} settings")

    def execute(self, context: ExecutionContext) -> Any:
        log_entry = {
            "timestamp": datetime.now(),
            "plugin": self.metadata.name,
            "data": context.input_data
        }
        return log_entry

class PluginRegistry:
    """Manages registered plugins and their type information."""

    def __init__(self):
        self.plugins: dict[str, PluginBase] = {}

    def register(self, plugin: PluginBase) -> None:
        """Register a plugin and validate its type hints."""
        # This works seamlessly with lazy annotations
        hints = get_type_hints(plugin.execute)
        print(f"Registering {plugin.metadata.name} with hints: {hints}")
        self.plugins[plugin.metadata.name] = plugin

    def execute_plugin(self, name: str, context: ExecutionContext) -> Any:
        """Execute a registered plugin."""
        if name not in self.plugins:
            raise ValueError(f"Unknown plugin: {name}")
        return self.plugins[name].execute(context)

# Usage
if __name__ == "__main__":
    registry = PluginRegistry()
    logger = LoggingPlugin()

    registry.register(logger)

    context = ExecutionContext(
        plugin=logger,
        input_data={"message": "Test execution"}
    )

    result = registry.execute_plugin("Logger", context)
    print(f"Execution result: {result}")

Output:

Registering Logger with hints: {'context': <class '__main__.ExecutionContext'>, 'return': typing.Any}
Execution result: {'timestamp': datetime.datetime(...), 'plugin': 'Logger', 'data': {'message': 'Test execution'}}

Frequently Asked Questions

Will lazy annotations break my existing code?

No. Lazy annotations are backward compatible. Code using from __future__ import annotations will continue to work identically. The main benefit is that you no longer need the import statement to get the same behavior.

Do type checkers support lazy annotations?

Yes, mypy, pyright, and other major type checkers fully support PEP 649 lazy annotations. Since they were already handling postponed evaluation with PEP 563, the transition is seamless.

What’s the performance impact of lazy annotations?

Lazy annotations actually improve performance by eliminating the upfront cost of evaluating type hints at import time. The only cost comes when you call get_type_hints(), which happens on demand.

When should I access raw unevaluated annotations?

Rarely. Most use cases should call get_type_hints(). You only need raw annotations if you’re building advanced tooling like IDE extensions or custom type introspection systems.

Will third-party libraries like Pydantic work correctly?

Yes. Libraries that use get_type_hints() (which all major ones do) will work correctly and benefit from lazy annotations. If a library only accesses __annotations__ directly, it may need updates for full lazy annotation support.

Conclusion

Lazy annotations represent a significant evolution in Python’s type system. By deferring evaluation until type hints are actually needed, PEP 649 eliminates the need for string quotes, resolves circular import issues, and improves performance all at once. Whether you’re building plugin systems, data validation frameworks, or complex libraries with interdependent types, lazy annotations make your code cleaner and more maintainable.

For more details, check out the official PEP 649 specification and the Python typing documentation.

Understanding Python 3.13 Free-Threaded Mode (No GIL)

Intermediate

For decades, Python developers have worked around a fundamental limitation: the Global Interpreter Lock (GIL) prevents true parallel execution of threads within a single process. This has forced developers to use multiprocessing, async/await patterns, or external libraries when they needed genuine parallelism. Python 3.13 changes that with an experimental free-threaded build: an optional variant of CPython that runs with the GIL disabled, so threads can execute Python code on multiple cores at once.

If you’ve ever felt frustrated by Python’s threading limitations, struggled with multiprocessing overhead, or wondered why your CPU-bound threads barely improved with more cores, this article is for you. The free-threaded mode isn’t just a nice-to-have feature — it represents a fundamental transformation in how Python handles concurrent code. By the end of this tutorial, you’ll understand exactly what changed, how to use it, and when it makes sense for your projects.

This guide covers everything you need to know: the history and motivation behind GIL removal (PEP 703), how to install and use free-threaded Python, practical benchmarks demonstrating real performance gains, and crucial thread-safety considerations in this new world. Whether you’re building data processing pipelines, API servers, or scientific applications, free-threaded mode opens doors that were previously locked.

Parallel execution in Python free-threaded mode
Multiple threads, zero waiting. The GIL-free future is here.

Quick Example: Free-Threaded Python in Action

Before diving into the details, let’s see free-threaded mode in action. Here’s a simple example that demonstrates true parallel execution:

# filename: parallel_threads.py
import time
from concurrent.futures import ThreadPoolExecutor

def cpu_bound_task(n):
    """Perform CPU-intensive calculation"""
    total = 0
    for i in range(n):
        total += i ** 2
    return total

# Baseline: run the four tasks one after another on a single thread
print("Sequential (one thread):")
start = time.perf_counter()
results = [cpu_bound_task(50_000_000) for _ in range(4)]
print(f"Time: {time.perf_counter() - start:.2f}s")

# Same four tasks on four threads; only a free-threaded build runs them in parallel
print("\nThreadPoolExecutor (four threads):")
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(cpu_bound_task, [50_000_000] * 4))
print(f"Time: {time.perf_counter() - start:.2f}s")

Output (free-threaded Python 3.13):

Sequential (one thread):
Time: 8.47s

ThreadPoolExecutor (four threads):
Time: 2.13s

Notice the difference? On a free-threaded build, the four threads run on separate cores and finish in roughly a quarter of the sequential time. Run the same script on standard Python 3.13 and the threaded version takes about as long as the sequential one, because the GIL lets only one thread execute Python bytecode at a time. This is the core promise of free-threaded Python.

What is the GIL and Why Remove It?

The Global Interpreter Lock has been a cornerstone of CPython since its inception. It’s a mutex (mutual exclusion lock) that prevents multiple threads from executing Python bytecode simultaneously within a single process. The GIL was originally implemented to simplify memory management in CPython — keeping a reference count for every object and protecting it with a single global lock is far simpler than implementing fine-grained locking for millions of objects.

The problem? When you spawn threads to handle concurrent work, the GIL ensures only one thread can execute Python code at a time. This means threads are useful for I/O-bound tasks (waiting for network requests or file operations), but CPU-bound work gets no parallelism benefit. A four-core CPU running four threads on a CPU-bound task will see minimal speedup compared to a single thread.
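
The I/O-bound half of that claim is easy to check on any build. The sketch below uses a hypothetical `fake_io` helper in which time.sleep() stands in for a network or disk wait. Because sleeping releases the GIL, the threaded run should finish in roughly the time of one wait rather than four.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay):
    """Stand-in for a network or disk wait; time.sleep releases the GIL."""
    time.sleep(delay)
    return delay

def timed(label, fn):
    """Run fn once and report how long it took."""
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return elapsed

delays = [0.2] * 4

# Four waits back to back
sequential = timed("sequential", lambda: [fake_io(d) for d in delays])

# Four waits overlapping on four threads
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = timed("threaded", lambda: list(pool.map(fake_io, delays)))
```

Swap `fake_io` for a pure-Python computation and the two timings converge on a standard build, which is the CPU-bound case described above.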

Feature | Standard Python (with GIL) | Free-Threaded Python (no GIL)
True parallel thread execution | No: the GIL serializes bytecode | Yes: threads run simultaneously
CPU-bound performance with threads | No improvement with more cores | Can scale with core count
Memory overhead per object | Lower: one shared GIL | Higher: extra per-object bookkeeping
Backward compatibility | Full: decades of code work | Good for pure Python: opt-in build
Thread safety model | GIL provides implicit safety | Per-object locks and biased reference counting
C extension compatibility | All existing extensions work | Extensions need updates to declare support

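PEP 703, authored by Sam Gross and first shipped as an experimental build option in Python 3.13, outlines the strategy for making the GIL optional. Rather than a single change, it combines several techniques: biased reference counting, immortal objects, and small per-object locks on mutable containers such as dicts and lists. With biased reference counting, each object is "biased" toward the thread that owns it. The owning thread updates the reference count with cheap non-atomic operations, and only other threads pay for atomic ones, which keeps single-threaded code reasonably close to GIL speed.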

The GIL as gatekeeper in Python
One thread at a time. The GIL’s iron rule since 1991.

How to Install and Use Free-Threaded Python 3.13

Free-threaded Python 3.13 is available through several channels. Let’s walk through installation on common platforms:

Installation on Linux and macOS

# filename: install_freethreaded.sh

# Using pyenv (recommended for version management)
git clone https://github.com/pyenv/pyenv.git ~/.pyenv
export PATH="$HOME/.pyenv/bin:$PATH"

# Install free-threaded Python 3.13 (--disable-gil is the CPython configure flag)
PYTHON_CONFIGURE_OPTS="--disable-gil" pyenv install 3.13.0

# Verify installation (-VV prints the full build description)
~/.pyenv/versions/3.13.0/bin/python3 -VV

# Create virtual environment
~/.pyenv/versions/3.13.0/bin/python3 -m venv venv_freethreaded
source venv_freethreaded/bin/activate

Output:

Python 3.13.0 experimental free-threading build (...)

Installation on Windows

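Windows users can download the official installer from python.org and enable the optional free-threaded binaries during setup:

# filename: install_windows.ps1

# Download the Python 3.13 installer from https://www.python.org/downloads/
# and enable the free-threaded binaries option in the installer UI.

# Or run the installer unattended with the free-threaded option switched on
# (adjust the file name to match the installer you downloaded)
.\python-3.13.0-amd64.exe /quiet Include_freethreaded=1

# Verify: the free-threaded interpreter is installed as python3.13t.exe
python3.13t -VV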

Checking Your Build

Not sure if you’re running free-threaded? Check with this simple script:

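# filename: check_freethreaded.py
import sys

# sys._is_gil_enabled() was added in Python 3.13; older versions always use the GIL
gil_enabled = sys._is_gil_enabled() if hasattr(sys, "_is_gil_enabled") else True

if not gil_enabled:
    print("Running free-threaded Python (no GIL)")
else:
    print("Running standard Python with GIL")

print(f"Python version: {sys.version}")
print(f"Implementation: {sys.implementation.name}")

Output:

Running free-threaded Python (no GIL)
Python version: 3.13.0 experimental free-threading build (...)
Implementation: cpython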
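
There are really two questions here: was the interpreter built with free-threading support, and is the GIL disabled right now? They can differ, because a free-threaded build can still run with the GIL switched back on (for example via the PYTHON_GIL=1 environment variable, or after importing an extension module that has not declared free-threading support). The helper below is a small sketch that reports both. The `gil_status` name and the dictionary keys are illustrative choices, not a standard API.

```python
import sys
import sysconfig

def gil_status():
    """Report build-time and run-time GIL state as a small dict."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise
    build_free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on 3.13+; older versions always have a GIL
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return {
        "build_free_threaded": build_free_threaded,
        "gil_enabled_now": gil_enabled,
    }

status = gil_status()
print(status)
if status["build_free_threaded"] and status["gil_enabled_now"]:
    print("Free-threaded build, but the GIL is currently enabled")
```

Calling this once at startup and again after your imports is a cheap way to notice a dependency that turned the GIL back on.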
Thread safety without GIL
No GIL means no safety net. threading.Lock() is your new best friend.

Demonstrating Actual Parallel Execution with Threads

The real test of free-threaded Python is seeing threads actually run in parallel. Let’s create a benchmark that shows this clearly:

# filename: parallel_benchmark.py
import sys
import time
from concurrent.futures import ThreadPoolExecutor

# sys._is_gil_enabled() exists on Python 3.13+; older versions always have a GIL
FREE_THREADED = hasattr(sys, "_is_gil_enabled") and not sys._is_gil_enabled()

def compute_fibonacci(n):
    """CPU-bound task: compute nth Fibonacci number"""
    # Deliberately naive recursion so each call does enough work to measure
    if n <= 1:
        return n
    return compute_fibonacci(n - 1) + compute_fibonacci(n - 2)

def single_threaded_compute():
    """Run all computations sequentially"""
    start = time.perf_counter()
    for _ in range(4):
        compute_fibonacci(35)
    return time.perf_counter() - start

def multi_threaded_compute(num_threads=4):
    """Run computations in parallel using threads"""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [executor.submit(compute_fibonacci, 35) for _ in range(4)]
        for future in futures:
            future.result()  # wait for each task and surface any error
    return time.perf_counter() - start

print(f"Python version: {sys.version}")
print(f"Free-threaded: {FREE_THREADED}")
print()

single_time = single_threaded_compute()
print(f"Single-threaded time: {single_time:.2f}s")

multi_time = multi_threaded_compute(4)
print(f"Multi-threaded time:  {multi_time:.2f}s")

speedup = single_time / multi_time
print(f"Speedup: {speedup:.2f}x")

if FREE_THREADED:
    print("\nWith free-threaded Python, speedup scales with core count!")
else:
    print("\nWith standard Python, speedup is limited by the GIL.")

Output (Free-Threaded Python):

Python version: 3.13.0 experimental free-threading build (...)
Free-threaded: True

Single-threaded time: 8.34s
Multi-threaded time:  2.18s
Speedup: 3.82x

With free-threaded Python, speedup scales with core count!

Output (Standard Python):

Python version: 3.13.0
Free-threaded: False

Single-threaded time: 8.41s
Multi-threaded time:  8.51s
Speedup: 0.99x

With standard Python, speedup is limited by the GIL.

Performance Benchmarks: GIL vs Free-Threaded

Real-world performance matters. Let's benchmark a more realistic workload -- data processing with mixed I/O and computation:

# filename: realistic_benchmark.py
import threading
import time
from concurrent.futures import ThreadPoolExecutor
import random
import sys

def process_batch(data):
    """Simulate real work: compute + I/O"""
    # Computation phase
    result = sum(x ** 2 for x in data)

    # Simulated I/O (time.sleep mimics network/disk operations)
    # In real scenarios, this would be actual I/O that releases the GIL
    time.sleep(0.1)

    return result

def benchmark_threads(num_threads=4):
    """Benchmark multi-threaded processing"""
    data_batches = [
        [random.randint(1, 100) for _ in range(10000)]
        for _ in range(8)
    ]

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        results = list(executor.map(process_batch, data_batches))
    elapsed = time.perf_counter() - start

    return elapsed

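# sys._is_gil_enabled() exists on Python 3.13+; older versions always have a GIL
free_threaded = hasattr(sys, "_is_gil_enabled") and not sys._is_gil_enabled()
print(f"Free-threaded: {free_threaded}")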
print()

for num_threads in [1, 2, 4, 8]:
    elapsed = benchmark_threads(num_threads)
    print(f"Threads: {num_threads}, Time: {elapsed:.2f}s")

Output (Free-Threaded):

Free-threaded: True

Threads: 1, Time: 0.81s
Threads: 2, Time: 0.43s
Threads: 4, Time: 0.23s
Threads: 8, Time: 0.18s

Output (Standard Python):

Free-threaded: False

Threads: 1, Time: 0.81s
Threads: 2, Time: 0.82s
Threads: 4, Time: 0.81s
Threads: 8, Time: 0.82s

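Notice the difference in scaling on the free-threaded build, where adding threads cuts the total time substantially. Treat the standard-build figures with caution, though: time.sleep() releases the GIL, so the 0.1s waits can overlap across threads on either build, and only the computation phase is serialized by the GIL. To see a clear gap between the two builds, make the computation phase dominate, for example by increasing the batch size.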

Performance comparison
CPU-bound tasks finally scale with cores. The benchmarks don't lie.

Thread Safety Considerations in Free-Threaded Mode

Removing the GIL doesn't mean thread safety magically happens. You still need to be careful about concurrent access to shared data. However, the approach changes subtly.

Understanding Biased Reference Counting and Per-Object Locks

Free-threaded Python replaces the single global lock with the finer-grained mechanisms described in PEP 703. Each object records the thread that owns it, and reference counting is "biased" toward that thread. This means:

  • If the owning thread updates an object's reference count, the update is cheap (no atomic operations needed)
  • When a different thread touches the object, it falls back to slower atomic operations on a shared count
  • Mutable built-ins such as dict and list carry a small per-object lock, so contention between threads on the same object is where the real cost appears

Race Conditions Still Exist

You must still protect shared mutable state with locks. Here's an example of a common mistake:

# filename: race_condition_example.py
import time
from concurrent.futures import ThreadPoolExecutor

class BankAccount:
    def __init__(self, balance):
        self.balance = balance

    def unsafe_transfer(self, amount):
        """UNSAFE: read-modify-write without a lock is a race condition"""
        temp = self.balance
        # Context switch can happen here!
        time.sleep(0.0001)  # Simulate delay
        self.balance = temp - amount

def bad_concurrent_access():
    """Demonstrates race condition"""
    account = BankAccount(1000)

    def withdraw():
        for _ in range(100):
            account.unsafe_transfer(1)

    with ThreadPoolExecutor(max_workers=4) as executor:
        # submit() is used because withdraw() takes no arguments
        futures = [executor.submit(withdraw) for _ in range(4)]
        for future in futures:
            future.result()  # wait for completion and surface any worker error

    # Expected: 600 (1000 - 400)
    # Actual: unpredictable! (varies from run to run)
    print(f"Final balance: {account.balance} (expected: 600)")

bad_concurrent_access()

Output:

Final balance: 823 (expected: 600)

The solution is the same as ever: use locks for shared mutable state. Here's the corrected version:

# filename: thread_safe_example.py
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class BankAccount:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

    def safe_transfer(self, amount):
        """Thread-safe transfer using lock"""
        with self.lock:
            temp = self.balance
            time.sleep(0.0001)
            self.balance = temp - amount

def good_concurrent_access():
    """Demonstrates correct thread safety"""
    account = BankAccount(1000)

    def withdraw():
        for _ in range(100):
            account.safe_transfer(1)

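    with ThreadPoolExecutor(max_workers=4) as executor:
        # submit() is used because withdraw() takes no arguments
        futures = [executor.submit(withdraw) for _ in range(4)]
        for future in futures:
            future.result()  # wait for completion and surface any worker error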

    # Now always correct!
    print(f"Final balance: {account.balance} (expected: 600)")

good_concurrent_access()

Output:

Final balance: 600 (expected: 600)

The key insight: free-threaded mode gives you the opportunity for true parallelism, but you must be disciplined about synchronization. The GIL was never a substitute for proper locking -- it just made some mistakes harder to trigger.
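
Locks are not the only option. When the work can be split up front, a common alternative is to avoid shared mutable state altogether: each worker computes a private partial result and only the main thread combines them. The sketch below reworks the withdrawal scenario that way. The `count_withdrawals` helper and the workload shape are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def count_withdrawals(amounts):
    """Sum one worker's withdrawals locally; no shared state is touched."""
    return sum(amounts)

# Hypothetical workload: four workers, each withdrawing 1 unit 100 times
work = [[1] * 100 for _ in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_totals = list(pool.map(count_withdrawals, work))

# Only the main thread combines the results, so no lock is needed
final_balance = 1000 - sum(partial_totals)
print(f"Final balance: {final_balance}")
```

This pattern also tends to scale better on a free-threaded build, since the workers never contend for the same object.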

What Atomicity Guarantees Remain

Some individual operations remain safe because built-in containers use internal per-object locks:

  • Single operations on built-in containers (e.g., list.append(x) or d[key] = value) will not corrupt the container when called from several threads
  • Simple attribute assignment (e.g., obj.value = 42) replaces the reference in one step
  • Compound operations (e.g., counter += 1, or check-then-insert on a dict) are never atomic -- you still need locks for consistency

Always assume you need explicit locks unless you're 100% certain an operation is atomic.
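
A check-then-act sequence on a dict is a typical case where each step is safe on its own but the combination is not. The following sketch shows a hypothetical `OnceCache` class (not part of any library) that wraps the membership test and the insert in one lock, so a factory runs at most once per key even when many threads ask at the same time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class OnceCache:
    """Hypothetical get-or-create cache that runs each factory at most once."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()
        self.factory_calls = 0

    def get_or_create(self, key, factory):
        # The membership test and the insert must happen as one unit;
        # without the lock two threads could both see the key as missing.
        with self._lock:
            if key not in self._values:
                self.factory_calls += 1
                self._values[key] = factory()
            return self._values[key]

cache = OnceCache()

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(
        lambda _: cache.get_or_create("config", lambda: {"debug": False}),
        range(100),
    ))

print(cache.factory_calls)
```

Holding the lock while the factory runs keeps the example short. For slow factories, a per-key lock would reduce contention, at the cost of more code.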

Migration to free-threaded Python
The bridge between old and new. One flag at a time.

When to Use Free-Threaded Mode vs Regular Python

Free-threaded Python is powerful, but it's not always the right choice. Here's how to decide:

Use Free-Threaded Python When:

  • CPU-bound workloads with threads -- Processing data, ML inference, scientific computing
  • You want simpler concurrency than multiprocessing -- Avoid inter-process communication overhead
  • You need shared state between concurrent tasks -- Threads with shared memory are easier than processes
  • Your C extensions support it -- Third-party libraries updated for free-threaded mode
  • Memory is constrained -- Threads use less memory than multiple processes

Use Regular Python (with GIL) When:

  • Primarily I/O-bound workloads -- Threads work fine with the GIL for I/O; async/await is even better
  • You need maximum compatibility -- Some C extensions don't support free-threaded mode yet
  • Memory overhead matters and you're not CPU-bound -- Extra per-object locks add overhead
  • You're dealing with legacy code -- Gradual migration is safer than wholesale changes

Consider Async/Await Instead When:

  • High-concurrency I/O (10000+ concurrent connections) -- Async scales better than threads
  • You want cooperative multitasking -- Explicit control over context switches
  • Your ecosystem is async-first -- FastAPI, aiohttp, asyncpg, etc.
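
The decision guide above can be partly encoded in code. The sketch below picks between threads and processes based on the workload type and the current GIL state. The `choose_executor` and `gil_is_enabled` helpers are illustrative names, and the rule is a simplification that ignores factors such as C extension support.

```python
import sys
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def gil_is_enabled():
    """True on standard builds; sys._is_gil_enabled() only exists on 3.13+."""
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()

def choose_executor(cpu_bound):
    """Pick an executor class from the workload type and GIL state."""
    if cpu_bound and gil_is_enabled():
        # Threads would serialize on the GIL, so use processes instead
        return ProcessPoolExecutor
    # I/O-bound work, or CPU-bound work without a GIL: threads are enough
    return ThreadPoolExecutor

for cpu_bound in (True, False):
    print(f"cpu_bound={cpu_bound}: {choose_executor(cpu_bound).__name__}")
```

Both executor classes share the same interface, so the calling code can stay identical whichever one is returned.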

Real-World Example: Parallel Image Processing

Let's build a practical project that benefits from free-threaded mode -- a parallel image processing pipeline:

# filename: parallel_image_processor.py
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import time
import sys
from PIL import Image
import numpy as np

class ImageProcessor:
    """Process images in parallel using free-threaded Python"""

    def __init__(self, num_threads=4):
        self.num_threads = num_threads
        self.processed_count = 0
        self.lock = threading.Lock()

    def apply_sepia(self, image_path):
        """Apply sepia tone effect (CPU-intensive)"""
        img = Image.open(image_path)
        img_array = np.array(img)

        # Sepia transformation matrix (rows produce the output R, G, B channels)
        sepia_filter = np.array([
            [0.393, 0.769, 0.189],
            [0.349, 0.686, 0.168],
            [0.272, 0.534, 0.131]
        ])

        # Apply effect
        if len(img_array.shape) == 3:
            sepia_img = np.dot(img_array[...,:3], sepia_filter.T)
            result = np.clip(sepia_img, 0, 255).astype(np.uint8)
        else:
            result = img_array

        return Image.fromarray(result)

    def process_batch(self, image_paths, output_dir):
        """Process multiple images in parallel"""
        output_dir = Path(output_dir)
        output_dir.mkdir(exist_ok=True)

        def process_single(image_path):
            try:
                processed = self.apply_sepia(image_path)
                output_path = output_dir / f"sepia_{Path(image_path).name}"
                processed.save(output_path)

                with self.lock:
                    self.processed_count += 1

                return str(output_path)
            except Exception as e:
                print(f"Error processing {image_path}: {e}")
                return None

        start = time.perf_counter()

        with ThreadPoolExecutor(max_workers=self.num_threads) as executor:
            results = list(executor.map(process_single, image_paths))

        elapsed = time.perf_counter() - start

        return {
            "processed": self.processed_count,
            "elapsed": elapsed,
            "results": [r for r in results if r is not None]
        }

# Usage example
if __name__ == "__main__":
    processor = ImageProcessor(num_threads=4)

    # Create sample images
    sample_dir = Path("sample_images")
    sample_dir.mkdir(exist_ok=True)

    for i in range(8):
        img = Image.new('RGB', (800, 600), color=(73 + i*10, 109 + i*10, 137 + i*10))
        img.save(sample_dir / f"image_{i}.jpg")

    # Process images
    image_paths = list(sample_dir.glob("*.jpg"))
    results = processor.process_batch(image_paths, "output_images")

    print(f"Processed {results['processed']} images in {results['elapsed']:.2f}s")
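    # sys._is_gil_enabled() exists on Python 3.13+; older versions always have a GIL
    free_threaded = hasattr(sys, "_is_gil_enabled") and not sys._is_gil_enabled()
    print(f"Free-threaded mode: {free_threaded}")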

Output:

Processed 8 images in 2.34s
Free-threaded mode: True

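With four worker threads, the best case is roughly a 4x speedup over processing the images one at a time. Note that NumPy and Pillow already release the GIL inside many of their C routines, so standard Python can overlap part of this work too. The free-threaded build helps most with the Python-level code that runs in between, where the work can now be spread across cores.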

Frequently Asked Questions

Will free-threaded Python become the default?

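Possibly, but not on a fixed schedule. PEP 703 outlines a multi-phase transition plan. Python 3.13 makes free-threading available as an experimental, opt-in build. Python 3.14 promotes that build to officially supported (PEP 779) while keeping it optional. Making it the default is a later phase that depends on the ecosystem updating and performance stabilizing. On 3.13, treat it as experimental and test thoroughly before relying on it in production.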

Does free-threaded mode have performance overhead?

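Yes, single-threaded code runs slower on the free-threaded build because reference counting and container access do extra work to stay thread-safe. In 3.13 the gap is widened because the specializing adaptive interpreter (PEP 659) is disabled in the free-threaded build; the CPython developers have stated a target of roughly 10% or less for later releases. For multi-threaded CPU-bound workloads, the gains can far outweigh this cost. If you're not using threads, stick with standard Python for now.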

What about C extensions that use the GIL?

Most pure-Python dependencies work unchanged. However, C extensions that directly use the GIL API need updates. Popular libraries like NumPy, psycopg2, and others are already being updated. Check the project's GitHub issues or ask about free-threaded support before upgrading production systems.

Will my favorite packages work with free-threaded Python?

Most packages that don't use the GIL API directly will work fine. Data science packages (NumPy, pandas) are high priority for updates. Web frameworks (FastAPI, Django) are mostly pure Python and generally run, though any compiled dependencies they pull in need free-threaded builds of their own. Check the package's release notes or issue tracker for the most current status.

How much extra memory does free-threaded mode use?

The object header grows to hold the owning thread's id, the split reference counts, and a small per-object lock, so every object costs somewhat more memory than on a standard build. For applications with millions of objects, this can add up to 100+ MB. For most typical Python programs, the impact is negligible. Threads themselves use the same amount of memory as before.

Is debugging threading bugs easier or harder in free-threaded mode?

Neither -- the challenges are the same. Proper synchronization discipline still matters. The advantage is that truly parallel code is now possible without workarounds, making some debugging scenarios simpler (you're actually running in parallel, which matches your intentions). Tools like ThreadSanitizer continue to work for detecting race conditions.

What's the timeline for PEP 703 implementation?

Python 3.13 (2024): Experimental free-threaded builds available. Python 3.14 (2025): The free-threaded build becomes officially supported but remains optional (PEP 779). Later releases: It may become the default once libraries have updated and performance has stabilized; no specific version has been committed to.

Conclusion: The Future of Python Concurrency

Python 3.13's free-threaded mode represents a watershed moment for the language. For the first time in its 30+ year history, Python offers true native parallelism for multi-threaded applications. This isn't just an academic improvement -- it solves real problems that developers have worked around for years.

The implementation via PEP 703 is elegant: biased reference counting keeps the common single-owner case cheap, while per-object locks on containers enable genuine parallelism when threads do share data. As the ecosystem updates and libraries add free-threaded support, we'll see Python become a more natural choice for CPU-bound concurrent workloads that previously required complex multiprocessing setups.

Start experimenting with free-threaded Python now on side projects. Learn where threads can help, practice proper synchronization, and be ready for the transition. As library support matures, free-threaded mode is likely to become a mainstream option.

For the full technical details, see PEP 703: Making the Global Interpreter Lock Optional in CPython and the Python 3.13 What's New documentation.

How To Use Python 3.14 T-Strings for Safe String Interpolation

Intermediate

If you have ever built a web application that takes user input and drops it straight into an SQL query or an HTML template, you already know the sinking feeling that comes with discovering an injection vulnerability in production. Python’s f-strings are convenient, but they give you zero control over what happens to interpolated values before they land in the final string. You format it, it is done — no sanitization, no escaping, no second chances.

Python 3.14 introduces t-strings (template strings), defined in PEP 750, to solve exactly this problem. T-strings look almost identical to f-strings, but instead of producing a finished str, they produce a Template object that you can inspect, transform, and render on your own terms. The standard library includes them out of the box — no third-party packages required. You just need Python 3.14 or later.

In this article, we will start with a quick example showing the basic syntax, then explain what t-strings are and how they differ from f-strings. After that, we will walk through practical use cases including HTML escaping, SQL parameterization, and building your own custom template processors. We will finish with a real-life project that ties everything together, followed by a FAQ section covering the most common questions developers have about this feature.

T-Strings in Python: Quick Example

Here is the simplest possible t-string. Notice the t prefix instead of f:

# quick_example.py
from string.templatelib import Template, Interpolation

name = "World"
greeting = t"Hello, {name}!"

# A t-string produces a Template object, not a str
print(type(greeting))
print(greeting.strings)
print(greeting.interpolations)

# Render it manually
parts = []
for item in greeting:
    if isinstance(item, str):
        parts.append(item)
    elif isinstance(item, Interpolation):
        parts.append(str(item.value))
print("".join(parts))

Output:

<class 'string.templatelib.Template'>
('Hello, ', '!')
(Interpolation('World', 'name', None, ''),)
Hello, World!

The key difference from f-strings is right there: instead of getting a flat string back, you get a structured Template object with separate access to the static string parts and the interpolated values. This separation is what makes safe processing possible — you can escape, validate, or transform each interpolated value before combining them into the final output.

In the sections below, we will explore how to use this structure for HTML escaping, SQL safety, logging, and more.

Comparing template string types in Python
t-strings hand you the pieces. f-strings hand you the glued result.

What Are T-Strings and Why Use Them?

T-strings are a new string prefix introduced in Python 3.14 through PEP 750. The idea is deceptively simple: instead of concatenating the interpolated values into a finished string (as f-strings do), a t-string produces a Template object that keeps the static text and the dynamic values separate. The expressions inside the braces are still evaluated immediately, just as in an f-string; only the final assembly is left to you.

Think of it like the difference between handing someone a pre-mixed smoothie versus handing them the individual ingredients. With f-strings, you get the smoothie — it is already blended and you cannot un-blend it. With t-strings, you get the fruit, the yogurt, and the honey separately, so you can check each ingredient, swap one out, or add something extra before blending.

This matters because many string operations require processing interpolated values differently from the surrounding text. HTML templating needs to escape angle brackets in user input but not in the template markup. SQL queries need to parameterize user values but not the query structure. Logging frameworks might want to keep the template pattern separate from the values for structured log aggregation.

Feature | f-strings | t-strings
Prefix | f"..." | t"..."
Return type | str | Template
Produces final string immediately | Yes | No, produces a structured Template object (expressions are still evaluated right away)
Access to raw values | No | Yes, via .interpolations
Custom processing | Not possible | Yes, write your own renderer
Injection safe | No | Yes, when used with a safe renderer
Expression support | Any Python expression | Any Python expression
Format specs | {value:.2f} | {value:.2f} (preserved in Interpolation)

The bottom line: use f-strings when you just need a quick formatted string for display or debugging. Use t-strings when the interpolated values need to be processed, escaped, validated, or handled differently from the template text — especially in security-sensitive contexts like web templates, database queries, and shell commands.

Anatomy of a Template Object

Before writing custom processors, you need to understand what is inside a Template object. Let us inspect one in detail:

# template_anatomy.py
from string.templatelib import Template, Interpolation

user = "Alice"
score = 95.7
result = t"Player {user} scored {score:.1f} points"

# The Template has two key attributes
print("strings:", result.strings)
print("interpolations:", result.interpolations)
print()

# Each Interpolation carries metadata
for interp in result.interpolations:
    print(f"  value: {interp.value!r}")
    print(f"  expression: {interp.expression!r}")
    print(f"  conversion: {interp.conversion!r}")
    print(f"  format_spec: {interp.format_spec!r}")
    print()

Output:

strings: ('Player ', ' scored ', ' points')
interpolations: (Interpolation('Alice', 'user', None, ''), Interpolation(95.7, 'score', None, '.1f'))

  value: 'Alice'
  expression: 'user'
  conversion: None
  format_spec: ''

  value: 95.7
  expression: 'score'
  conversion: None
  format_spec: '.1f'

The strings tuple always has exactly one more element than interpolations. They interleave: strings[0], then interpolations[0], then strings[1], then interpolations[1], and so on, ending with strings[-1]. This structure makes it straightforward to iterate and build your output.

Each Interpolation object gives you the actual runtime value, the source expression as written in the code, any conversion flag (!r, !s, !a), and the format_spec string. This metadata is what makes t-strings so powerful for custom processing — you know not just the value, but how the developer intended it to be formatted.
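
To make the conversion and format_spec fields concrete, here is a minimal sketch of how a renderer could apply them in the same order an f-string does: conversion first, then format spec. The helper name apply_interpolation is hypothetical, and it takes plain arguments so it runs without a Template; in a real processor you would pass interp.value, interp.conversion, and interp.format_spec.

```python
# apply_interpolation.py
# Hypothetical helper (not part of the standard library): mimics how an
# f-string treats one replacement field, using the metadata a t-string keeps.

def apply_interpolation(value, conversion=None, format_spec=""):
    """Apply the conversion flag first, then the format spec."""
    if conversion == "r":
        value = repr(value)
    elif conversion == "s":
        value = str(value)
    elif conversion == "a":
        value = ascii(value)
    return format(value, format_spec)

# Values copied from the anatomy example above
print(apply_interpolation("Alice", None, ""))
print(apply_interpolation(95.7, None, ".1f"))
# The same value with a !r conversion, as in {user!r}
print(apply_interpolation("Alice", "r", ""))
```

The renderers in the following sections skip this step where the examples do not need it, but a general-purpose processor would probably want it.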

Safe HTML Escaping with T-Strings

The most common use case for t-strings is preventing cross-site scripting (XSS) attacks by automatically escaping user input in HTML templates. Here is a reusable HTML renderer:

# html_escape.py
from string.templatelib import Template, Interpolation
import html

def render_html(template: Template) -> str:
    """Render a t-string with HTML-escaped interpolations."""
    parts = []
    for item in template:
        if isinstance(item, str):
            # Static template text -- trusted, no escaping needed
            parts.append(item)
        elif isinstance(item, Interpolation):
            # Dynamic value -- escape to prevent XSS
            parts.append(html.escape(str(item.value)))
    return "".join(parts)

# Safe usage
username = '<script>alert("hacked")</script>'
safe_html = render_html(t"<div class='greeting'>Welcome, {username}!</div>")
print(safe_html)

Output:

<div class='greeting'>Welcome, &lt;script&gt;alert(&quot;hacked&quot;)&lt;/script&gt;!</div>

The template markup (<div class='greeting'>) passes through untouched because it is part of the static strings tuple — it is trusted code you wrote. The user-provided username gets HTML-escaped because it arrives as an Interpolation value. If this were an f-string, the script tag would have gone straight into the output, creating an XSS vulnerability.

HTML escaping with Python t-strings
html.escape() on every interpolation. No exceptions, no excuses.

SQL Parameterization with T-Strings

Another critical use case is building SQL queries safely. Instead of string-concatenating user input into queries (the classic SQL injection vector), t-strings let you extract the values as parameters:

# sql_params.py
from string.templatelib import Template, Interpolation

def prepare_sql(template: Template) -> tuple[str, list]:
    """Convert a t-string into a parameterized SQL query."""
    query_parts = []
    params = []
    for item in template:
        if isinstance(item, str):
            query_parts.append(item)
        elif isinstance(item, Interpolation):
            query_parts.append("?")  # Parameter placeholder
            params.append(item.value)
    return "".join(query_parts), params

# Usage
user_id = 42
status = "active'; DROP TABLE users; --"

query, params = prepare_sql(t"SELECT * FROM users WHERE id = {user_id} AND status = {status}")
print("Query: ", query)
print("Params:", params)

Output:

Query:  SELECT * FROM users WHERE id = ? AND status = ?
Params: [42, "active'; DROP TABLE users; --"]

The malicious SQL injection attempt in the status variable gets safely separated as a parameter value instead of being interpolated into the query string. You would then pass query and params to your database driver’s execute() method, which handles the escaping at the database protocol level. This is exactly how parameterized queries are meant to work, but now the t-string syntax makes it feel natural instead of requiring manual placeholder management.
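
As a rough illustration of that last step, the sketch below feeds the query and params from the output above to sqlite3, whose driver uses the ? placeholder style. The in-memory table and its single row are assumptions made for the demo. Other drivers may expect a different placeholder such as %s, in which case prepare_sql would need to emit that instead.

```python
# sql_execute_demo.py
import sqlite3

# The values prepare_sql() returned in the example above
query = "SELECT * FROM users WHERE id = ? AND status = ?"
params = [42, "active'; DROP TABLE users; --"]

# Throwaway in-memory database (assumed schema, for illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, status TEXT)")
conn.execute("INSERT INTO users VALUES (42, 'active')")

# The driver binds each value as data, so the injection text is only
# compared against the status column and never executed as SQL
rows = conn.execute(query, params).fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

print("Matching rows:", rows)
print("Rows still in users:", remaining)
conn.close()
```

No row matches the bogus status, and the users table still holds its one row afterwards.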

Building Custom Template Processors

The real power of t-strings emerges when you write processors tailored to your application. Here are two practical examples.

Structured Logging Processor

Logging frameworks benefit from keeping the message template separate from the values. This enables log aggregation tools to group messages by pattern even when the values differ:

# structured_log.py
from string.templatelib import Template, Interpolation
import json
from datetime import datetime

def log_structured(level: str, template: Template) -> dict:
    """Create a structured log entry from a t-string."""
    # Build the rendered message
    message_parts = []
    fields = {}
    for item in template:
        if isinstance(item, str):
            message_parts.append(item)
        elif isinstance(item, Interpolation):
            formatted = format(item.value, item.format_spec) if item.format_spec else str(item.value)
            message_parts.append(formatted)
            fields[item.expression] = item.value

    return {
        "timestamp": datetime.now().isoformat(),
        "level": level,
        "message": "".join(message_parts),
        "fields": fields,
        "template": "".join(
            s if isinstance(s, str) else "{" + s.expression + "}"
            for s in template
        )
    }

# Usage
user = "alice"
action = "login"
duration_ms = 142.5

entry = log_structured("INFO", t"User {user} performed {action} in {duration_ms:.0f}ms")
print(json.dumps(entry, indent=2))

Output:

{
  "timestamp": "2026-04-06T10:30:00.000000",
  "level": "INFO",
  "message": "User alice performed login in 142ms",
  "fields": {
    "user": "alice",
    "action": "login",
    "duration_ms": 142.5
  },
  "template": "User {user} performed {action} in {duration_ms}ms"
}

Notice how the log entry contains both the rendered message for human readability and the raw template pattern plus field values for machine processing. A log aggregation system like Elasticsearch or Datadog can group all entries with the same template pattern regardless of the specific values, making it much easier to spot trends and anomalies.

Structured logging with Python t-strings
Same template, different values. Aggregation tools thank you.

Shell Command Builder with Escaping

Building shell commands from user input is another injection-prone operation. T-strings make it safe:

# shell_safe.py
from string.templatelib import Template, Interpolation
import shlex

def safe_command(template: Template) -> str:
    """Build a shell command with properly escaped arguments."""
    parts = []
    for item in template:
        if isinstance(item, str):
            parts.append(item)
        elif isinstance(item, Interpolation):
            # Shell-escape any interpolated value
            parts.append(shlex.quote(str(item.value)))
    return "".join(parts)

# Dangerous user input
filename = 'my file.txt; rm -rf /'

cmd = safe_command(t"cat {filename} | grep 'pattern'")
print(cmd)

Output:

cat 'my file.txt; rm -rf /' | grep 'pattern'

The shlex.quote() call wraps the malicious filename in single quotes, neutralizing the injection attempt. The semicolon and the rm command become part of a harmless string literal instead of a separate shell command.
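
One way to check that claim is to split the command back into tokens with shlex.split(), which follows POSIX shell quoting rules. The sketch below starts from the command string printed above. Keep in mind that shlex.quote() targets POSIX shells, so this reasoning does not carry over to Windows cmd.exe.

```python
# shell_roundtrip.py
import shlex

# The command string safe_command() produced above
cmd = "cat 'my file.txt; rm -rf /' | grep 'pattern'"

# Split it the way a POSIX shell would tokenize it
tokens = shlex.split(cmd)
print(tokens)

# The whole filename, semicolon included, is a single argument to cat
print(tokens[1] == "my file.txt; rm -rf /")
```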

Nested Templates and Composition

T-strings can be nested — you can interpolate one Template inside another. This is useful for composing complex outputs from smaller reusable pieces:

# nested_templates.py
from string.templatelib import Template, Interpolation

def render_html(template: Template) -> str:
    """Render with HTML escaping, supporting nested templates."""
    import html as html_mod
    parts = []
    for item in template:
        if isinstance(item, str):
            parts.append(item)
        elif isinstance(item, Interpolation):
            if isinstance(item.value, Template):
                # Recursively render nested templates
                parts.append(render_html(item.value))
            else:
                parts.append(html_mod.escape(str(item.value)))
    return "".join(parts)

# Build a page from composable pieces
title = "My Page"
username = '<b>Alice</b>'

header = t"<header><h1>{title}</h1></header>"
body = t"<main>Welcome, {username}</main>"
page = t"<html>{header}{body}</html>"

print(render_html(page))

Output:

<html><header><h1>My Page</h1></header><main>Welcome, &lt;b&gt;Alice&lt;/b&gt;</main></html>

The nested templates get recursively processed, so the HTML structure from the inner templates passes through as trusted content while user-provided values like username still get escaped. This composability pattern is what makes t-strings viable for real template engines, not just one-off string operations.

Building a real project with Python t-strings
One template processor to rule them all. Three interpolation types to find them.

Real-Life Example: Safe HTML Email Builder

Let us tie everything together with a practical project — a safe HTML email builder that uses t-strings to prevent injection while keeping the template code clean and readable:

# email_builder.py
from string.templatelib import Template, Interpolation
import html as html_mod

def render_email(template: Template) -> str:
    """Render an HTML email template with auto-escaping."""
    parts = []
    for item in template:
        if isinstance(item, str):
            parts.append(item)
        elif isinstance(item, Interpolation):
            value = item.value
            if isinstance(value, Template):
                parts.append(render_email(value))
            elif isinstance(value, list) and all(isinstance(v, Template) for v in value):
                # A list of nested templates: render each one, one per line
                parts.append("\n".join(render_email(v) for v in value))
            else:
                # Apply the format spec (such as .2f) first, then escape
                parts.append(html_mod.escape(format(value, item.format_spec)))
    return "".join(parts)

def build_order_email(customer_name: str, items: list[dict], total: float) -> str:
    """Build an order confirmation email safely."""
    # Keep each row as a Template so its markup stays trusted
    # while the product names inside it are still escaped
    item_rows = []
    for item in items:
        row = t"<tr><td>{item['name']}</td><td>{item['qty']}</td><td>${item['price']:.2f}</td></tr>"
        item_rows.append(row)

    email = t"""<html>
<body style='font-family: Arial, sans-serif;'>
  <h2>Order Confirmation</h2>
  <p>Hi {customer_name},</p>
  <p>Thank you for your order! Here is your summary:</p>
  <table border='1' cellpadding='8' cellspacing='0'>
    <tr style='background: #333; color: white;'>
      <th>Product</th><th>Qty</th><th>Price</th>
    </tr>
    {item_rows}
  </table>
  <p><strong>Total: ${total:.2f}</strong></p>
</body>
</html>"""

    return render_email(email)

# Test with potentially malicious input
order_items = [
    {"name": 'Python Book <script>alert("xss")</script>', "qty": 1, "price": 39.99},
    {"name": "USB-C Cable", "qty": 2, "price": 12.50},
    {"name": "Mechanical Keyboard", "qty": 1, "price": 89.00},
]

result = build_order_email(
    customer_name="Bob <img src=x onerror=alert(1)>",
    items=order_items,
    total=153.99
)
print(result)

Output:

<html>
<body style='font-family: Arial, sans-serif;'>
  <h2>Order Confirmation</h2>
  <p>Hi Bob &lt;img src=x onerror=alert(1)&gt;,</p>
  <p>Thank you for your order! Here is your summary:</p>
  <table border='1' cellpadding='8' cellspacing='0'>
    <tr style='background: #333; color: white;'>
      <th>Product</th><th>Qty</th><th>Price</th>
    </tr>
    <tr><td>Python Book &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</td><td>1</td><td>$39.99</td></tr>
<tr><td>USB-C Cable</td><td>2</td><td>$12.50</td></tr>
<tr><td>Mechanical Keyboard</td><td>1</td><td>$89.00</td></tr>
  </table>
  <p><strong>Total: $153.99</strong></p>
</body>
</html>

Both the customer name and the malicious product name get HTML-escaped automatically, while the table structure and email markup remain intact. This is exactly the kind of “secure by default” behavior that t-strings were designed to provide. You could extend this pattern with a Safe wrapper class for pre-sanitized values that should not be double-escaped.
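
A minimal sketch of that Safe wrapper idea follows. The Safe class and the escape_value helper are hypothetical names, not part of the standard library; a renderer such as render_email could call escape_value wherever it currently escapes a plain value. Only wrap content you have sanitized yourself, because Safe switches escaping off.

```python
# safe_wrapper.py
import html

class Safe(str):
    """Marker type (hypothetical) for markup that is already sanitized."""

def escape_value(value) -> str:
    """Escape a value for HTML unless it is explicitly marked Safe."""
    if isinstance(value, Safe):
        return str(value)              # trusted: pass through unchanged
    return html.escape(str(value))     # untrusted: escape as usual

print(escape_value("<b>Alice</b>"))
print(escape_value(Safe("<b>Alice</b>")))
```

Subclassing str keeps the wrapper usable anywhere a string is expected, while isinstance still lets the renderer tell trusted and untrusted values apart.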

Python t-strings FAQ
The aha moment when deferred rendering finally clicks.

Frequently Asked Questions

What version of Python do I need to use t-strings?

T-strings require Python 3.14 or later. They were introduced through PEP 750 and are part of the standard library via the string.templatelib module. If you are on an earlier version, you will need to upgrade. You can check your version by running python --version in your terminal. Python 3.14 was released in October 2025.

How are t-strings different from str.format() and Template strings?

The str.format() method and string.Template both produce finished strings, so they do not give you access to the interpolated values before rendering. T-strings produce a Template object that keeps static text and dynamic values separate, letting you process each value individually. This makes t-strings the only built-in option that lets you write safe, context-aware renderers without third-party packages.

Are t-strings slower than f-strings?

T-strings have slightly more overhead than f-strings because they create a Template object instead of immediately concatenating a string. However, the difference is negligible for most applications. The extra cost is in the range of microseconds per operation. If you are in a tight loop formatting millions of strings per second and do not need custom processing, stick with f-strings. For everything else, the safety and flexibility of t-strings more than justify the small performance cost.

Can I use t-strings as a drop-in replacement for f-strings?

Not directly, because t-strings return a Template object instead of a str. You need a rendering function to convert the template to a string. However, writing a simple render() function that concatenates all parts without modification gives you f-string-equivalent behavior. The migration path is: change the prefix from f to t, add a render call, then gradually add escaping or processing logic where needed.

Will web frameworks like Django and Flask adopt t-strings?

Several framework maintainers have expressed interest in t-string integration. The Django template engine and Jinja2 (used by Flask) could potentially use t-strings as a lower-level primitive for their template rendering. However, adoption takes time — expect third-party libraries to provide t-string-based template engines before the major frameworks integrate them into their core APIs. In the meantime, you can use t-strings in your own application code alongside existing template engines.

Conclusion

T-strings bring a powerful new capability to Python’s string formatting toolkit. We covered the basic syntax and the Template object anatomy, then built practical processors for HTML escaping, SQL parameterization, structured logging, and shell command safety. The real-life email builder project showed how these patterns combine to create secure-by-default templating in real applications.

The key takeaway is that t-strings do not replace f-strings — they complement them. Use f-strings for quick formatting where safety is not a concern, and use t-strings when interpolated values need processing before they reach the output. The ability to inspect and transform each value individually is what makes the difference between a convenient string and a secure one.

For the complete specification, read PEP 750 and the string.templatelib documentation.

How To Read and Write YAML Files in Python

Skill Level: Beginner

YAML has become the go-to format for configuration files, infrastructure as code, and data serialization across countless Python projects. Whether you’re working with Docker Compose files, Kubernetes manifests, Ansible playbooks, or custom application configuration, understanding how to parse and create YAML files is an essential skill for any Python developer. In this comprehensive guide, we’ll explore the PyYAML library and walk through practical examples that demonstrate how to read configuration files, generate YAML output, handle complex data structures, and follow security best practices when working with untrusted YAML sources.

YAML, which stands for “YAML Ain’t Markup Language,” was designed with human readability as a primary goal. Unlike JSON’s curly braces and strict syntax, or XML’s verbose tag structure, YAML uses indentation and simple key-value pairs that mirror natural Python data structures. This makes it intuitive for both writing configuration files by hand and parsing them programmatically. Throughout this article, you’ll discover how Python’s PyYAML library bridges the gap between YAML’s readable format and Python’s powerful data manipulation capabilities.

By the end of this tutorial, you’ll be able to confidently read existing YAML files into Python dictionaries and lists, write Python data structures back to YAML format, handle edge cases like multi-document YAML files, leverage advanced features such as anchors and aliases, and most importantly, understand the security implications of YAML parsing. Let’s dive in and master the art of working with YAML in Python.

Quick Example

Before we explore the details, here’s a snapshot of what’s possible with just a few lines of Python:

# basic_example.py
import yaml

# Reading a YAML file
with open('config.yaml', 'r') as file:
    config = yaml.safe_load(file)
    print(config['database']['host'])

# Creating and writing YAML
data = {
    'app_name': 'MyApp',
    'version': '1.0',
    'features': ['auth', 'logging']
}
with open('output.yaml', 'w') as file:
    yaml.dump(data, file, default_flow_style=False)

This example demonstrates the two fundamental operations: reading configuration into Python data structures and serializing Python objects back to YAML format. In the sections below, we’ll expand on these concepts and explore advanced scenarios.

What is YAML?

YAML is a human-friendly data serialization language that excels at representing configuration files and structured data. Its design philosophy emphasizes readability, allowing developers to write and maintain configuration files without learning complex syntax rules. The language uses indentation to denote nesting, colons to separate keys from values, and hyphens to represent list items, all of which feel natural to anyone familiar with Python’s syntax.

To understand YAML’s place in the ecosystem, let’s compare it with other popular data formats:

Feature | YAML | JSON | TOML | INI
Human Readable | Excellent | Good | Good | Fair
Nested Structures | Native | Native | Native | Limited
Comments | Yes | No | Yes | Yes
Type Safety | Implicit | Explicit | Mixed | String-based
Use Cases | Config, IaC | APIs, Data | Settings | Legacy Apps
Parsing Speed | Slower | Fast | Medium | Fast

YAML’s strength lies in its readability and native support for comments, making it ideal for configuration files that humans regularly edit. JSON, by contrast, excels at machine-to-machine communication due to its strict structure and rapid parsing. TOML offers a middle ground with table-based organization, while INI files, though simple, lack native support for complex nested structures. For Python developers working with configuration files and infrastructure as code, YAML remains the most popular choice.

Understanding YAML hierarchy and structure
Indentation matters. Two spaces or four — pick one and stick with it forever.

Installation and Setup

Before you can parse and create YAML files in Python, you need to install the PyYAML library. PyYAML is not part of Python’s standard library, but it’s lightweight and easy to set up. Open your terminal and run the following command:

# setup.sh
pip install pyyaml

Once installed, verify that PyYAML is working correctly by checking its version:

# verify_install.py
import yaml
print(f"PyYAML version: {yaml.__version__}")

Output:

PyYAML version: 6.0

Congratulations! You’re now ready to work with YAML files in Python. The PyYAML library provides a straightforward API that we’ll explore throughout this guide.

Reading YAML Files with safe_load

The most common operation when working with YAML is reading configuration files into Python data structures. The PyYAML library provides several methods for this, but yaml.safe_load() is the recommended approach for security reasons. Unlike yaml.load() with an unsafe loader such as yaml.UnsafeLoader, which can run arbitrary Python code embedded in YAML, safe_load() only constructs simple Python objects like dictionaries, lists, and strings, preventing code injection attacks.

Let’s start with a basic example. First, create a YAML file containing application configuration:

# config.yaml
app:
  name: DataProcessor
  version: 2.1.0
  debug: true
database:
  host: localhost
  port: 5432
  name: appdb
  credentials:
    user: admin
    password: secret123
features:
  - authentication
  - logging
  - reporting

Now, parse this YAML file in Python:

# read_yaml_basic.py
import yaml

with open('config.yaml', 'r') as file:
    config = yaml.safe_load(file)

print("Application Name:", config['app']['name'])
print("Database Host:", config['database']['host'])
print("Features:", config['features'])

Output:

Application Name: DataProcessor
Database Host: localhost
Features: ['authentication', 'logging', 'reporting']

Notice how the YAML structure maps directly to Python dictionaries and lists. Nested keys become nested dictionaries, arrays become Python lists, and boolean values are properly recognized. This seamless conversion is one of YAML’s greatest strengths.

Threading through nested YAML data structures
safe_load() returns a dict. yaml.load() returns regret.

Writing YAML Files with dump

Beyond reading YAML, you often need to generate YAML files from Python data. The yaml.dump() function converts Python objects into YAML format. Let’s create a practical example where we construct a configuration dictionary and write it to a file:

# write_yaml_basic.py
import yaml

config = {
    'app': {
        'name': 'MyService',
        'version': '1.0.0',
        'debug': False
    },
    'database': {
        'host': 'db.example.com',
        'port': 5432
    },
    'cache': {
        'enabled': True,
        'ttl': 3600
    }
}

with open('generated_config.yaml', 'w') as file:
    yaml.dump(config, file, default_flow_style=False)

print("Configuration written to generated_config.yaml")

Output:

Configuration written to generated_config.yaml

Check the contents of the generated file:

# view_generated.py
with open('generated_config.yaml', 'r') as file:
    print(file.read())

Output:

app:
  debug: false
  name: MyService
  version: 1.0.0
cache:
  enabled: true
  ttl: 3600
database:
  host: db.example.com
  port: 5432

The default_flow_style=False parameter ensures that nested structures are formatted with indentation rather than JSON-like curly braces. This produces more readable configuration files that follow YAML conventions. Notice that the keys in the output are in alphabetical order: yaml.dump() sorts keys by default. Pass sort_keys=False to keep the dictionary’s insertion order, or allow_unicode=True to write non-ASCII characters as-is instead of as escape sequences.
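
As a quick illustration of the sort_keys parameter, the sketch below dumps a small dictionary to a string both ways. It assumes PyYAML 5.1 or later, where sort_keys is available, and the dictionary is made up for the demo.

```python
# dump_key_order.py
import yaml

config = {"server": {"port": 8080, "host": "localhost"}, "app": "demo"}

# Default behavior: keys are sorted alphabetically
sorted_text = yaml.dump(config, default_flow_style=False)

# sort_keys=False keeps the dictionary's insertion order
ordered_text = yaml.dump(config, default_flow_style=False, sort_keys=False)

print(sorted_text)
print(ordered_text)
```

Both strings load back into the same dictionary; only the order of the keys in the text differs.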

Working with Complex Data Types

YAML supports a rich variety of data types beyond simple strings and numbers. Python’s PyYAML library automatically handles conversion between YAML’s type system and Python’s native types. Understanding these conversions helps you work with complex configurations effectively.

Here’s a comprehensive example demonstrating various data types:

# complex_data_types.py
import yaml
from datetime import datetime

data = {
    'strings': {
        'simple': 'hello',
        'multiline': 'first line\nsecond line',
        'quoted': 'special chars: @#$%'
    },
    'numbers': {
        'integer': 42,
        'float': 3.14,
        'scientific': 1.23e-4,
        'hex': 0xFF,
        'octal': 0o755
    },
    'booleans': {
        'true_value': True,
        'false_value': False,
        'yes': True,
        'no': False
    },
    'null_value': None,
    'lists': {
        'simple': [1, 2, 3],
        'mixed': ['string', 42, True, None]
    },
    'dates': {
        'timestamp': datetime(2026, 4, 5, 14, 30, 0)
    }
}

with open('complex.yaml', 'w') as file:
    # sort_keys=False keeps the insertion order, so the round trip below
    # prints the keys in the same order they were defined
    yaml.dump(data, file, default_flow_style=False, sort_keys=False)

with open('complex.yaml', 'r') as file:
    loaded = yaml.safe_load(file)
    print(loaded)

Output:

{'strings': {'simple': 'hello', 'multiline': 'first line\nsecond line', 'quoted': 'special chars: @#$%'}, 'numbers': {'integer': 42, 'float': 3.14, 'scientific': 0.000123, 'hex': 255, 'octal': 493}, 'booleans': {'true_value': True, 'false_value': False, 'yes': True, 'no': False}, 'null_value': None, 'lists': {'simple': [1, 2, 3], 'mixed': ['string', 42, True, None]}, 'dates': {'timestamp': datetime.datetime(2026, 4, 5, 14, 30)}}

YAML’s type inference system automatically detects whether a value is a string, number, boolean, or null. This intelligent parsing eliminates the need for explicit type declarations. However, if you need to force a specific type—for instance, treating the string “yes” as text rather than a boolean—you can quote it in the YAML file.
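
A short sketch of that quoting rule, assuming PyYAML's default YAML 1.1 behavior in which an unquoted yes or no resolves to a boolean. The keys are invented for the demo.

```python
# yes_no_quoting.py
import yaml

text = """
unquoted: yes
quoted: "yes"
country: "NO"
"""

data = yaml.safe_load(text)

# Show each value together with the Python type it was parsed into
for key, value in data.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

Only the unquoted value becomes a bool; the two quoted values stay strings.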

Lost in complex YAML data types
YAML thinks “yes” is a boolean. Your postal code disagrees.

Multi-Document YAML Files

YAML supports storing multiple documents in a single file, separated by three hyphens (---). This is particularly useful when you need to manage multiple configurations or data structures in one file. PyYAML provides yaml.safe_load_all() to iterate through all documents:

# multi_document.yaml
---
name: Configuration A
version: 1.0
settings:
  debug: true
---
name: Configuration B
version: 2.0
settings:
  debug: false
---
name: Configuration C
version: 1.5
settings:
  debug: true

Now load all documents:

# read_multi_yaml.py
import yaml

with open('multi_document.yaml', 'r') as file:
    documents = yaml.safe_load_all(file)
    for i, doc in enumerate(documents, 1):
        print(f"Document {i}:")
        print(f"  Name: {doc['name']}")
        print(f"  Version: {doc['version']}")
        print()

Output:

Document 1:
  Name: Configuration A
  Version: 1.0

Document 2:
  Name: Configuration B
  Version: 2.0

Document 3:
  Name: Configuration C
  Version: 1.5

Multi-document YAML is invaluable for scenarios like managing Kubernetes manifests, where multiple resource definitions appear in a single file. The safe_load_all() function returns a generator, allowing you to process documents one at a time without loading the entire file into memory.
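
Because the generator is lazy, it has to be consumed while the stream is still open. Here is a minimal sketch, using an in-memory string and made-up documents instead of a file, that collects every document into a list:

```python
# load_all_to_list.py
import yaml

text = """\
---
name: Configuration A
---
name: Configuration B
"""

# safe_load_all() yields one document at a time; list() consumes it fully.
# With a real file, do this inside the `with open(...)` block.
documents = list(yaml.safe_load_all(text))
names = [doc["name"] for doc in documents]
print(names)
```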

Anchors and Aliases for Code Reuse

YAML provides a powerful feature called anchors and aliases that allows you to define a value once and reference it multiple times. This reduces duplication and makes configurations easier to maintain. An anchor is created with an ampersand (&), and aliases reference the anchor with an asterisk (*).

# anchors_aliases.yaml
defaults: &default_settings
  timeout: 30
  retries: 3
  cache: true

services:
  api:
    <<: *default_settings
    port: 8000
    name: API Service

  worker:
    <<: *default_settings
    port: 9000
    name: Worker Service

  database:
    <<: *default_settings
    port: 5432
    name: Database

Parse this configuration:

# read_anchors.py
import yaml

with open('anchors_aliases.yaml', 'r') as file:
    config = yaml.safe_load(file)

for service_name, settings in config['services'].items():
    print(f"{service_name}:")
    print(f"  Timeout: {settings['timeout']}")
    print(f"  Retries: {settings['retries']}")
    print()

Output:

api:
  Timeout: 30
  Retries: 3

worker:
  Timeout: 30
  Retries: 3

database:
  Timeout: 30
  Retries: 3

The merge key (<<) combines the referenced anchor with the current dictionary, allowing service definitions to inherit default settings while still overriding specific values. This pattern significantly reduces repetition in large configuration files.
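
The services above only inherit values, so here is a small sketch of the overriding case. The slow_job service and the inline YAML string are hypothetical, chosen only to show that a key written next to the merge key wins over the merged default:

```python
# override_defaults.py
import yaml

# Inline YAML for illustration; the 'slow_job' service is hypothetical
yaml_text = """
defaults: &default_settings
  timeout: 30
  retries: 3

services:
  api:
    <<: *default_settings
    port: 8000
  slow_job:
    <<: *default_settings
    timeout: 120   # explicit key overrides the merged default
"""

config = yaml.safe_load(yaml_text)
for name, settings in config["services"].items():
    print(name, settings["timeout"], settings["retries"])
```

Here api keeps the default timeout of 30, while slow_job reports 120 and still inherits retries from the anchor.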

Juggling YAML anchors and aliases
Define once with &, reuse everywhere with *. DRY config files are beautiful.

Safe Loading Practices and Security

When working with YAML files from untrusted sources, security is paramount. The yaml.load() function can be dangerous: with an unsafe loader such as yaml.Loader or yaml.UnsafeLoader, it can execute arbitrary Python code embedded in YAML. Consider this malicious YAML:

# dangerous.yaml
!!python/object/apply:os.system
args: ['rm -rf /']

Loading this with yaml.load() and an unsafe loader would execute the command. Always use yaml.safe_load() instead:

# safe_loading_demo.py
import yaml

# WRONG: Never do this with untrusted YAML
# data = yaml.load(untrusted_yaml, Loader=yaml.UnsafeLoader)

# CORRECT: Use safe_load for security
try:
    with open('config.yaml', 'r') as file:
        data = yaml.safe_load(file)
    print("Safely loaded configuration")
except yaml.YAMLError as e:
    print(f"Error parsing YAML: {e}")

Output:

Safely loaded configuration

Beyond using safe_load(), implement additional security measures: validate configuration schemas to ensure expected structure, restrict file permissions so only authorized users can modify configuration files, and sanitize any user input that gets incorporated into YAML files. For high-security environments, consider using specialized YAML validation libraries or writing custom validation functions.
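
As one possible shape for such a custom validation function, the sketch below checks a loaded configuration against a tiny hand-written schema. The SCHEMA mapping, the validate_config name, and the sample dictionaries are all assumptions made for illustration; the dictionaries stand in for whatever yaml.safe_load() returned:

```python
# validate_config.py
from typing import Any, Dict, List

# Hypothetical schema: required key -> expected Python type
SCHEMA = {"host": str, "port": int, "debug": bool}

def validate_config(data: Any, schema: Dict[str, type]) -> List[str]:
    """Return a list of problems; an empty list means the data passed."""
    if not isinstance(data, dict):
        return ["top-level value must be a mapping"]
    errors = []
    for key, expected in schema.items():
        if key not in data:
            errors.append(f"missing key: {key}")
        elif not isinstance(data[key], expected) or (
            expected is int and isinstance(data[key], bool)
        ):
            # bool is a subclass of int, so reject True/False where an int is expected
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(data[key]).__name__}"
            )
    # Reject keys the schema does not know about
    errors.extend(f"unexpected key: {key}" for key in data if key not in schema)
    return errors

# The dicts below stand in for the result of yaml.safe_load()
good = validate_config({"host": "localhost", "port": 8000, "debug": False}, SCHEMA)
bad = validate_config({"host": "localhost", "port": "8000", "extra": 1}, SCHEMA)
print(good)
print(bad)
```

Returning a list of messages rather than raising on the first problem lets you report every issue in a configuration file at once.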

Custom YAML Tags and Constructors

YAML's tag system allows you to extend its functionality with custom types. While safe_load() prevents arbitrary code execution, you can still register custom constructors for specific tags to handle domain-specific data types. This is useful for configurations that require special processing:

# custom_tags.py
import yaml
import os
from pathlib import Path

def env_constructor(loader, node):
    """Custom constructor for !env tag to read environment variables"""
    value = loader.construct_scalar(node)
    return os.getenv(value, f'${{{value}}}')

def path_constructor(loader, node):
    """Custom constructor for !path tag to return a resolved absolute path string"""
    value = loader.construct_scalar(node)
    return str(Path(value).resolve())

# Register constructors
yaml.SafeLoader.add_constructor('!env', env_constructor)
yaml.SafeLoader.add_constructor('!path', path_constructor)

yaml_content = """
database_url: !env DATABASE_URL
log_dir: !path /var/logs
app_name: MyApp
"""

data = yaml.safe_load(yaml_content)
print(data)

Output:

{'database_url': '${DATABASE_URL}', 'log_dir': '/var/logs', 'app_name': 'MyApp'}

Custom tags enable you to handle environment variables, file paths, date strings, and other special formats seamlessly during YAML parsing. This approach keeps your configuration files readable while maintaining type safety and extensibility.
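
Registering a constructor on yaml.SafeLoader changes the behavior of every later yaml.safe_load() call in the process. If that is too broad, one option is to register the tag on a subclass instead. The sketch below assumes a hypothetical EnvLoader class and a demo environment variable; unlike the example above, it returns None when the variable is unset:

```python
# scoped_env_loader.py
import os
import yaml

class EnvLoader(yaml.SafeLoader):
    """SafeLoader subclass (hypothetical name) that also understands !env."""

def env_constructor(loader, node):
    # Read the variable named by the scalar; returns None if it is unset
    return os.getenv(loader.construct_scalar(node))

# Registered on the subclass only, so yaml.safe_load() is left unchanged
EnvLoader.add_constructor("!env", env_constructor)

os.environ["DEMO_DATABASE_URL"] = "postgres://localhost/demo"  # demo value
text = "database_url: !env DEMO_DATABASE_URL"

data = yaml.load(text, Loader=EnvLoader)
print(data)

try:
    yaml.safe_load(text)
except yaml.YAMLError as exc:
    print("safe_load rejects the unknown tag:", type(exc).__name__)
```

Only code that explicitly passes Loader=EnvLoader gets the extra tag, which keeps the behavior of safe_load() predictable elsewhere in the application.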

Assembling configuration puzzle with YAML
Dot notation config access in 30 lines. Django called — it wants its settings back.

Real-Life Example: Configuration File Manager

Let's bring everything together with a practical application—a configuration file manager that reads YAML, validates settings, and provides utilities for working with configuration data:

# config_manager.py
import yaml
from pathlib import Path
from typing import Any, Dict

class ConfigManager:
    """Manages application configuration from YAML files."""

    def __init__(self, config_path: str):
        self.config_path = Path(config_path)
        self.config: Dict[str, Any] = {}
        self.load()

    def load(self) -> None:
        """Load configuration from YAML file."""
        if not self.config_path.exists():
            raise FileNotFoundError(f"Config file not found: {self.config_path}")

        with open(self.config_path, 'r') as file:
            try:
                self.config = yaml.safe_load(file) or {}
            except yaml.YAMLError as e:
                raise ValueError(f"Invalid YAML: {e}") from e

    def get(self, key: str, default: Any = None) -> Any:
        """Get configuration value using dot notation."""
        keys = key.split('.')
        value = self.config

        for k in keys:
            if isinstance(value, dict):
                value = value.get(k)
                if value is None:
                    return default
            else:
                return default

        return value

    def set(self, key: str, value: Any) -> None:
        """Set configuration value using dot notation."""
        keys = key.split('.')
        config = self.config

        for k in keys[:-1]:
            if k not in config:
                config[k] = {}
            config = config[k]

        config[keys[-1]] = value

    def save(self) -> None:
        """Save configuration back to YAML file."""
        with open(self.config_path, 'w') as file:
            yaml.dump(self.config, file, default_flow_style=False)

    def validate_required(self, required_keys: list) -> bool:
        """Check that all required configuration keys exist."""
        for key in required_keys:
            if self.get(key) is None:
                print(f"Missing required configuration: {key}")
                return False
        return True

# Usage example
if __name__ == '__main__':
    # Create sample configuration
    sample_config = {
        'app': {
            'name': 'MyApplication',
            'version': '1.0.0'
        },
        'database': {
            'host': 'localhost',
            'port': 5432,
            'name': 'mydb'
        },
        'server': {
            'host': '0.0.0.0',
            'port': 8000
        }
    }

    # Write sample config
    with open('app_config.yaml', 'w') as f:
        yaml.dump(sample_config, f, default_flow_style=False)

    # Load and use configuration
    config = ConfigManager('app_config.yaml')

    print(f"App: {config.get('app.name')}")
    print(f"Database: {config.get('database.host')}:{config.get('database.port')}")
    print(f"Server: {config.get('server.host')}:{config.get('server.port')}")

    # Modify configuration
    config.set('database.pool_size', 10)
    config.save()

    print("\nConfiguration updated and saved.")

Output:

App: MyApplication
Database: localhost:5432
Server: 0.0.0.0:8000

Configuration updated and saved.

This ConfigManager class demonstrates a practical starting point for handling YAML configuration files. It supports dot notation for accessing nested values, provides methods for modifying configurations, checks for required settings, and converts parse errors into clear exceptions. Note that get() treats a key whose value is None the same as a missing key. You can extend this class with additional features like configuration merging, environment variable substitution, or schema validation depending on your application's needs.
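
Configuration merging is one of the simpler extensions to add. The sketch below is a standalone helper rather than part of ConfigManager; the deep_merge name and the sample dictionaries are assumptions, and the dictionaries stand in for two files loaded with yaml.safe_load():

```python
# merge_configs.py
from copy import deepcopy
from typing import Any, Dict

def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values in override win over values in base."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and mismatched types are replaced outright
            merged[key] = deepcopy(value)
    return merged

defaults = {"database": {"host": "localhost", "port": 5432}, "debug": False}
production = {"database": {"host": "db.internal"}, "debug": False}
result = deep_merge(defaults, production)
print(result)
```

The merged result keeps the default port while taking the host from the override, and neither input dictionary is modified.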

Frequently Asked Questions

What's the difference between yaml.load() and yaml.safe_load()?

yaml.load() builds objects according to the Loader you pass (recent PyYAML versions require the Loader argument). With an unsafe loader such as yaml.UnsafeLoader, it can deserialize arbitrary Python objects, including those that execute code during construction, which makes it dangerous with untrusted input. yaml.safe_load() only constructs standard YAML types (dicts, lists, strings, numbers, booleans, None, and dates) and is the right choice for any YAML you do not fully control. Always prefer safe_load() unless you have a specific reason to use another loader and have full control over the input.

Can I preserve comments when reading and writing YAML?

Standard PyYAML doesn't preserve comments during round-trip operations. If you need to maintain comments, consider using the ruamel.yaml library instead, which is designed specifically for preserving comments, formatting, and other YAML features. However, for most applications, PyYAML's simpler approach is sufficient.

How do I handle very large YAML files efficiently?

For multi-document files, use yaml.safe_load_all() to process documents one at a time rather than building them all in memory. A single large document is still parsed into one Python object, so consider splitting it into smaller files. If your PyYAML installation was built with LibYAML, passing Loader=yaml.CSafeLoader to yaml.load() is typically much faster than the pure-Python loader. For massive datasets, you might explore alternative formats like JSON or CSV.

Why do some values change type when loading YAML?

YAML's automatic type detection usually works well, but certain unquoted values are ambiguous. For example, PyYAML reads a ZIP code like 02134 as an octal integer and returns 1116. To force a string type, quote the value in your YAML file: '02134'. Similarly, yes/no values become booleans unless quoted.
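
A quick way to see this is to load quoted and unquoted versions of the same value side by side. The key names in this sketch are made up for illustration:

```python
# type_resolution.py
import yaml

text = """
zip_unquoted: 02134
zip_quoted: '02134'
answer_unquoted: yes
answer_quoted: 'yes'
"""

values = yaml.safe_load(text)
for key, value in values.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

The unquoted ZIP code comes back as the integer 1116 and the unquoted yes as True, while both quoted values stay strings.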

How can I validate YAML against a schema?

PyYAML doesn't include built-in schema validation. For validation, use libraries like jsonschema (which works with YAML since both parse to dictionaries) or pydantic for more sophisticated type checking. After loading YAML with safe_load(), you can validate the resulting Python object against your schema.

Conclusion

Mastering YAML parsing and creation in Python opens doors to working with modern configuration systems, infrastructure as code, and data serialization across countless projects. From reading simple configuration files with yaml.safe_load() to writing complex data structures with yaml.dump(), the PyYAML library provides everything you need for practical YAML handling. Remember to always prioritize security by using safe_load(), validate your configurations, and keep in mind that PyYAML drops comments when it rewrites a file, so reach for ruamel.yaml when comments need to survive.

As you build more sophisticated applications, you'll find that understanding YAML's features—from anchors and aliases to custom tags and multi-document files—will help you write cleaner, more maintainable configurations. For more advanced techniques and comprehensive documentation, visit the PyYAML Documentation.

How To Parse and Create Excel Files with openpyxl in Python

Skill Level: Beginner

Excel files are everywhere in business environments, from financial reports and inventory lists to customer databases and sales analytics. While Excel is a powerful tool for data visualization and quick calculations, Python offers automation capabilities that can save hours of manual work. The openpyxl library is the most popular Python package for reading, writing, and modifying Excel files programmatically. This tutorial will guide you through everything you need to know about working with Excel files in Python, from basic operations to advanced formatting and formulas.

Whether you’re dealing with simple CSV-like data or complex workbooks with multiple sheets and intricate formatting, openpyxl provides an intuitive interface that mirrors Excel’s own structure. You’ll learn how to create workbooks from scratch, read existing files, apply professional formatting, insert formulas, and even generate charts—all without opening Excel. By the end of this guide, you’ll be able to automate your Excel workflows and handle data manipulation tasks that would take minutes manually in just seconds with Python.

The beauty of using openpyxl is that it maintains compatibility with Excel’s native features while being lightweight and easy to learn. Unlike some alternatives that require Excel to be installed on your system, openpyxl works independently, making it perfect for server-side automation, data processing pipelines, and batch file generation. You’ll also discover how to handle real-world scenarios like generating sales reports, updating employee databases, and creating formatted spreadsheets for stakeholders—all through simple Python code.

Quick Example

Let’s start with a quick glimpse of what’s possible with openpyxl. In just a few lines of code, you can create an Excel file, add data, format cells, and save it:

# quick_example.py
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill

# Create a new workbook
wb = Workbook()
ws = wb.active
ws.title = "Sales"

# Add headers
headers = ["Product", "Quantity", "Price", "Total"]
ws.append(headers)

# Style the header row
for cell in ws[1]:
    cell.font = Font(bold=True, color="FFFFFF")
    cell.fill = PatternFill(start_color="366092", end_color="366092", fill_type="solid")

# Add data
ws.append(["Laptop", 5, 1200, 6000])
ws.append(["Mouse", 15, 25, 375])
ws.append(["Keyboard", 10, 75, 750])

# Adjust column widths
ws.column_dimensions['A'].width = 15
ws.column_dimensions['B'].width = 12
ws.column_dimensions['C'].width = 12
ws.column_dimensions['D'].width = 12

# Save the file
wb.save("sales_data.xlsx")
print("File created successfully!")

Output:

File created successfully!

This simple script creates a professional-looking spreadsheet with formatted headers and data. When you open the resulting sales_data.xlsx file in Excel, you’ll see a properly formatted table with colors and sizing already applied. That’s the power of openpyxl—automation with style.

What is openpyxl?

openpyxl is a Python library designed specifically for reading and writing Excel 2010+ files (the modern .xlsx format). Excel files are actually compressed XML documents, and openpyxl handles all the complexity of parsing and writing this format so you don’t have to. The library provides a clean, Pythonic API that allows you to work with Excel files just as you would in the Excel application itself—through workbooks, sheets, rows, columns, and cells.

The main advantages of openpyxl over alternatives include its comprehensive feature support, active maintenance, and the fact that it doesn’t require Excel to be installed on your system. Whether you’re running Python on Windows, macOS, or Linux, openpyxl works seamlessly. It’s particularly valuable for server applications, data processing pipelines, and automated reporting systems where Excel isn’t available.

Here’s how openpyxl compares to other popular options for working with Excel files in Python:

Library | Format Support | Writing Support | Formatting | Requires Excel | Best For
openpyxl | .xlsx, .xlsm | Yes, full support | Extensive (fonts, colors, borders, etc.) | No | Creating and modifying formatted Excel files
xlrd | .xls (version 2.0+ no longer reads .xlsx) | No, read-only | Limited | No | Reading older Excel files
pandas | .xlsx, .xls, .csv | Yes, limited | Minimal | No | Data analysis and transformation
pywin32 | .xlsx, .xls | Yes, full support | Extensive | Yes (Windows only) | Enterprise automation with Excel integration

For this tutorial, we’ll focus on openpyxl because it offers the best balance of features, ease of use, and cross-platform compatibility. Let’s get started by installing it and creating your first workbook.

Installation and Setup

Before you can use openpyxl, you need to install it on your system. This is straightforward using pip, Python’s package manager. Open your terminal or command prompt and run the following command:

# install_openpyxl.sh
pip install openpyxl

Output:

Successfully installed openpyxl-3.1.2

Once installed, you can import openpyxl in your Python scripts. The installation includes all necessary dependencies, so you won’t need to install anything else. If you’re using a virtual environment (which is recommended for Python projects), make sure you activate it before installing openpyxl.

Setting up openpyxl for Excel automation
pip install openpyxl — three words between you and never opening Excel again.

Creating Workbooks from Scratch

Creating a new Excel workbook with openpyxl is simple and intuitive. A workbook is the Excel file itself, and it can contain one or more sheets. Let’s explore how to create workbooks and add data to them:

# create_workbook.py
from openpyxl import Workbook

# Create a new workbook
wb = Workbook()

# Access the active sheet (first sheet)
ws = wb.active
print(f"Active sheet name: {ws.title}")

# You can also change the sheet name
ws.title = "Employee Data"

# Add data to cells
ws['A1'] = "Name"
ws['B1'] = "Department"
ws['C1'] = "Salary"

ws['A2'] = "Alice Johnson"
ws['B2'] = "Engineering"
ws['C2'] = 95000

ws['A3'] = "Bob Smith"
ws['B3'] = "Marketing"
ws['C3'] = 75000

# Save the workbook
wb.save("employees.xlsx")
print("Workbook created and saved!")

Output:

Active sheet name: Sheet
Workbook created and saved!

In this example, we created a new workbook, accessed its active sheet, renamed it to “Employee Data”, and added information in a table format. Notice how we accessed cells using Excel-style notation like A1, B2, etc. This makes the code very readable if you’re familiar with Excel.

You can also create multiple sheets in a single workbook, which is useful for organizing related data:

# multiple_sheets.py
from openpyxl import Workbook

wb = Workbook()
ws1 = wb.active
ws1.title = "Q1 Sales"

# Create additional sheets
ws2 = wb.create_sheet("Q2 Sales")
ws3 = wb.create_sheet("Q3 Sales")

# Add data to each sheet
for ws, quarter in [(ws1, "Q1"), (ws2, "Q2"), (ws3, "Q3")]:
    ws['A1'] = f"{quarter} Revenue"
    ws['A2'] = 150000
    ws['B1'] = f"{quarter} Expenses"
    ws['B2'] = 75000

wb.save("quarterly_report.xlsx")
print("Multi-sheet workbook created!")

Output:

Multi-sheet workbook created!

Reading Existing Excel Files

Working with existing Excel files is just as straightforward as creating new ones. openpyxl allows you to load a workbook and access its data in various ways:

# read_existing_file.py
from openpyxl import load_workbook

# Load an existing workbook
wb = load_workbook("employees.xlsx")

# Get a sheet by name
ws = wb["Employee Data"]

# Or get the active sheet
# ws = wb.active

# Iterate through all rows
print("Employee List:")
for row in ws.iter_rows(values_only=True):
    print(row)

# Access specific cells
print(f"\nFirst employee: {ws['A2'].value}")
print(f"Department: {ws['B2'].value}")

Output:

Employee List:
('Name', 'Department', 'Salary')
('Alice Johnson', 'Engineering', 95000)
('Bob Smith', 'Marketing', 75000)

First employee: Alice Johnson
Department: Engineering

The iter_rows() method is particularly useful for processing large amounts of data. The values_only=True parameter returns just the cell values without the cell objects, making it easier to work with the data.
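
A common next step is turning those tuples into dictionaries keyed by the header row. This sketch builds a small in-memory sheet so it runs without employees.xlsx; the records name is just illustrative:

```python
# rows_to_dicts.py
from openpyxl import Workbook

# Build a small in-memory sheet so the example runs without an existing file
wb = Workbook()
ws = wb.active
ws.append(["Name", "Department", "Salary"])
ws.append(["Alice Johnson", "Engineering", 95000])
ws.append(["Bob Smith", "Marketing", 75000])

rows = ws.iter_rows(values_only=True)
header = next(rows)  # first row holds the column names
records = [dict(zip(header, row)) for row in rows]

for record in records:
    print(record)
```

Each record can then be accessed by column name, for example record["Salary"], instead of by position.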

Reading and inspecting Excel spreadsheet data
iter_rows(values_only=True) — because cell objects have feelings you don’t need.

Cell Formatting and Styling

Excel’s power lies not just in data storage but in presentation. openpyxl provides extensive formatting capabilities to make your spreadsheets professional and readable. Let’s explore fonts, colors, borders, and alignment:

# cell_formatting.py
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Border, Side, Alignment

wb = Workbook()
ws = wb.active

# Font styling
ws['A1'] = "Bold and Italic"
ws['A1'].font = Font(name='Arial', size=14, bold=True, italic=True, color="FFFFFF")

# Background color (fill)
ws['A1'].fill = PatternFill(start_color="0066CC", end_color="0066CC", fill_type="solid")

# Borders
thin_border = Border(
    left=Side(style='thin'),
    right=Side(style='thin'),
    top=Side(style='thin'),
    bottom=Side(style='thin')
)
ws['A1'].border = thin_border

# Alignment
ws['A1'].alignment = Alignment(horizontal='center', vertical='center', wrap_text=True)

# Apply to a range of cells
for row in ws.iter_rows(min_row=2, max_row=5, min_col=1, max_col=3):
    for cell in row:
        cell.fill = PatternFill(start_color="E8F0FF", end_color="E8F0FF", fill_type="solid")
        cell.border = thin_border
        cell.font = Font(size=11)

wb.save("formatted.xlsx")
print("Formatted workbook saved!")

Output:

Formatted workbook saved!

Colors in openpyxl are specified using hex codes (like “0066CC”). You can find color codes online or use a color picker to match your brand colors. The formatting capabilities extend to number formats, alignment options, and even special effects like gradients.

Working with Formulas

One of the most powerful features of Excel is its ability to store formulas that automatically calculate values. openpyxl allows you to insert formulas that will be evaluated when the file is opened in Excel:

# formulas.py
from openpyxl import Workbook
from openpyxl.styles import Font

wb = Workbook()
ws = wb.active

# Create a simple invoice
ws['A1'] = "Item"
ws['B1'] = "Price"
ws['C1'] = "Quantity"
ws['D1'] = "Total"

items = [
    ("Laptop", 1200, 2),
    ("Mouse", 25, 5),
    ("Keyboard", 75, 3)
]

row = 2
for item, price, qty in items:
    ws[f'A{row}'] = item
    ws[f'B{row}'] = price
    ws[f'C{row}'] = qty
    # Insert a formula for total (Price * Quantity)
    ws[f'D{row}'] = f'=B{row}*C{row}'
    row += 1

# Add a grand total formula
total_row = row
ws[f'A{total_row}'] = "Grand Total"
ws[f'D{total_row}'] = f'=SUM(D2:D{row-1})'

# Make it bold
ws[f'A{total_row}'].font = Font(bold=True)
ws[f'D{total_row}'].font = Font(bold=True)

wb.save("invoice.xlsx")
print("Invoice with formulas created!")

Output:

Invoice with formulas created!

When you open the resulting Excel file, you’ll see that the formulas are active and update automatically if you change the prices or quantities. The formulas use standard Excel syntax, so you can use any Excel function including SUM, AVERAGE, IF, VLOOKUP, and many more.
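
It helps to remember that openpyxl stores the formula text and does not calculate anything itself. The sketch below saves to an in-memory buffer (an assumption made so the example leaves no file behind) and reads the same cell back in both loading modes:

```python
# formula_round_trip.py
from io import BytesIO
from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.active
ws["B2"] = 1200
ws["C2"] = 2
ws["D2"] = "=B2*C2"

# Save to an in-memory buffer instead of a file on disk
buffer = BytesIO()
wb.save(buffer)

buffer.seek(0)
formula_value = load_workbook(buffer).active["D2"].value  # formula text
print(formula_value)

buffer.seek(0)
cached_value = load_workbook(buffer, data_only=True).active["D2"].value  # cached result
print(cached_value)
```

The first read returns the formula string, while the data_only read returns None because no spreadsheet application has calculated and saved a result yet.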

Creating charts with openpyxl in Python
=SUM(D2:D99) hits different when Python wrote every formula.

Creating Charts

Charts make data visualization intuitive and professional. openpyxl supports creating various chart types programmatically:

# creating_charts.py
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

wb = Workbook()
ws = wb.active
ws.title = "Sales Data"

# Add headers
ws['A1'] = "Month"
ws['B1'] = "Revenue"

# Add sales data
months = ["January", "February", "March", "April", "May"]
revenue = [45000, 52000, 48000, 61000, 58000]

for idx, (month, rev) in enumerate(zip(months, revenue), start=2):
    ws[f'A{idx}'] = month
    ws[f'B{idx}'] = rev

# Create a bar chart
chart = BarChart()
chart.title = "Monthly Revenue"
chart.x_axis.title = "Month"
chart.y_axis.title = "Revenue ($)"

# Add data to the chart
data = Reference(ws, min_col=2, min_row=1, max_row=6)
cats = Reference(ws, min_col=1, min_row=2, max_row=6)
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)

# Position the chart
ws.add_chart(chart, "D2")

wb.save("sales_chart.xlsx")
print("Workbook with chart created!")

Output:

Workbook with chart created!

openpyxl supports multiple chart types including bar charts, line charts, pie charts, scatter plots, and more. Charts automatically update when data changes, just like in Excel, providing dynamic data visualization.

Merging Cells

Sometimes you want to merge cells to create headers or improve layout. openpyxl makes this straightforward:

# merging_cells.py
from openpyxl import Workbook
from openpyxl.styles import Font, Alignment, PatternFill

wb = Workbook()
ws = wb.active

# Merge cells for a title
ws.merge_cells('A1:D1')
ws['A1'] = "Quarterly Sales Report"
ws['A1'].font = Font(size=16, bold=True, color="FFFFFF")
ws['A1'].alignment = Alignment(horizontal='center', vertical='center')
ws['A1'].fill = PatternFill(start_color="366092", end_color="366092", fill_type="solid")

# Set row height for the title
ws.row_dimensions[1].height = 30

# Add column headers
headers = ["Q1", "Q2", "Q3", "Q4"]
for col, header in enumerate(headers, start=1):
    ws.cell(row=2, column=col, value=header)
    ws.cell(row=2, column=col).font = Font(bold=True)

wb.save("merged_cells.xlsx")
print("Workbook with merged cells created!")

Output:

Workbook with merged cells created!

When merging cells, the content is placed in the top-left cell of the merged range, and the other cells in the range read back as empty (None). Be careful when merging, as it affects how data is read back: always reference the top-left cell when accessing the value of a merged range.
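
The following sketch illustrates how a merged range behaves when read back. It uses a fresh in-memory workbook, and the variable names are illustrative:

```python
# read_merged_cells.py
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.merge_cells("A1:D1")
ws["A1"] = "Quarterly Sales Report"

print(ws["A1"].value)  # the top-left cell holds the content
print(ws["B1"].value)  # other cells in the merged range read as None

# List the merged ranges on the sheet
merged = [rng.coord for rng in ws.merged_cells.ranges]
print(merged)

# unmerge_cells() splits the range back into ordinary cells
ws.unmerge_cells("A1:D1")
remaining = len(ws.merged_cells.ranges)
print(remaining)
```

Checking ws.merged_cells.ranges before reading a sheet is a simple way to find out which cells will come back empty because they belong to a merge.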

Conditional Formatting

Conditional formatting automatically applies styles based on cell values, making it easy to highlight important data. Here’s how to implement it with openpyxl:

# conditional_formatting.py
from openpyxl import Workbook
from openpyxl.formatting.rule import CellIsRule
from openpyxl.styles import PatternFill, Font

wb = Workbook()
ws = wb.active
ws.title = "Sales Performance"

# Add headers
ws['A1'] = "Salesperson"
ws['B1'] = "Sales Amount"

# Add data
salespeople = [
    ("Alice", 85000),
    ("Bob", 42000),
    ("Charlie", 95000),
    ("Diana", 55000),
    ("Edward", 78000)
]

for idx, (name, sales) in enumerate(salespeople, start=2):
    ws[f'A{idx}'] = name
    ws[f'B{idx}'] = sales

# Create a rule to highlight high performers (>75000)
high_fill = PatternFill(start_color="00B050", end_color="00B050", fill_type="solid")
high_font = Font(bold=True, color="FFFFFF")
high_rule = CellIsRule(operator='greaterThan', formula=['75000'], fill=high_fill, font=high_font)

# Apply the rule
ws.conditional_formatting.add(f'B2:B{len(salespeople)+1}', high_rule)

wb.save("conditional_format.xlsx")
print("Workbook with conditional formatting created!")

Output:

Workbook with conditional formatting created!

Conditional formatting is powerful for highlighting trends, outliers, and important values at a glance. You can create complex rules with multiple conditions, color scales, and data bars.

Formatting Excel cells with openpyxl styles
PatternFill, Font, Border, Alignment — CSS for spreadsheets, basically.

Real-World Example: Sales Report Generator

Let’s build a practical application that demonstrates all the concepts we’ve learned. This script generates a professional sales report from raw data:

# sales_report_generator.py
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Border, Side, Alignment
from openpyxl.formatting.rule import CellIsRule
from openpyxl.chart import BarChart, Reference
from datetime import datetime

def generate_sales_report(data, filename="sales_report.xlsx"):
    """
    Generate a professional sales report.

    Args:
        data: List of tuples (product, quantity, unit_price, region)
        filename: Output Excel filename
    """

    wb = Workbook()
    ws = wb.active
    ws.title = "Sales Report"

    # Create title
    ws.merge_cells('A1:E1')
    title = ws['A1']
    title.value = f"Sales Report - {datetime.now().strftime('%B %Y')}"
    title.font = Font(size=16, bold=True, color="FFFFFF")
    title.fill = PatternFill(start_color="1F4E78", end_color="1F4E78", fill_type="solid")
    title.alignment = Alignment(horizontal='center', vertical='center')
    ws.row_dimensions[1].height = 25

    # Create headers
    headers = ["Product", "Quantity", "Unit Price", "Total Sales", "Region"]
    for col, header in enumerate(headers, start=1):
        cell = ws.cell(row=3, column=col, value=header)
        cell.font = Font(bold=True, color="FFFFFF")
        cell.fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")
        cell.alignment = Alignment(horizontal='center')

    # Add data
    border = Border(left=Side(style='thin'), right=Side(style='thin'),
                   top=Side(style='thin'), bottom=Side(style='thin'))

    for idx, (product, qty, price, region) in enumerate(data, start=4):
        ws[f'A{idx}'] = product
        ws[f'B{idx}'] = qty
        ws[f'C{idx}'] = price
        ws[f'D{idx}'] = f'=B{idx}*C{idx}'
        ws[f'E{idx}'] = region

        for col in range(1, 6):
            ws.cell(row=idx, column=col).border = border
            if col in [2, 3, 4]:
                ws.cell(row=idx, column=col).alignment = Alignment(horizontal='right')

    # Grand total
    last_row = len(data) + 4
    ws[f'A{last_row}'] = "TOTAL SALES"
    ws[f'D{last_row}'] = f'=SUM(D4:D{last_row-1})'
    ws[f'A{last_row}'].font = Font(bold=True, size=12)
    ws[f'D{last_row}'].font = Font(bold=True, size=12)

    # Format currency columns
    for row in range(4, last_row + 1):
        ws[f'C{row}'].number_format = '$#,##0.00'
        ws[f'D{row}'].number_format = '$#,##0.00'

    # Conditional formatting for high sales
    high_fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
    rule = CellIsRule(operator='greaterThan', formula=['50000'], fill=high_fill)
    ws.conditional_formatting.add(f'D4:D{last_row-1}', rule)

    # Adjust column widths
    ws.column_dimensions['A'].width = 15
    ws.column_dimensions['B'].width = 12
    ws.column_dimensions['C'].width = 12
    ws.column_dimensions['D'].width = 15
    ws.column_dimensions['E'].width = 12

    # Save the workbook
    wb.save(filename)
    print(f"Report generated: {filename}")

# Sample data
sales_data = [
    ("Laptop Pro", 15, 1500, "North America"),
    ("USB Mouse", 45, 25, "Europe"),
    ("Mechanical Keyboard", 32, 120, "Asia Pacific"),
    ("Monitor 4K", 12, 400, "North America"),
    ("Webcam HD", 58, 80, "Europe"),
    ("External SSD", 28, 150, "Asia Pacific"),
    ("Laptop Stand", 40, 45, "North America"),
    ("Wireless Charger", 66, 35, "Europe"),
]

# Generate the report
generate_sales_report(sales_data)

Output:

Report generated: sales_report.xlsx

This comprehensive example creates a professional sales report with a merged title, formatted headers, formulas for calculations, currency formatting, conditional highlighting, and consistent styling. It demonstrates how most of the features we’ve learned work together to create a polished, business-ready spreadsheet. With the sample data, no row total exceeds 50,000, so the conditional highlight only appears once larger sales are added.

Automating Excel report generation
Report that took 20 minutes by hand now runs in 0.4 seconds. You’re welcome, accounting.

Frequently Asked Questions

How do I handle large Excel files efficiently?

For very large files, openpyxl’s default mode can consume significant memory. You can use read-only or write-only modes to process large files more efficiently. For read operations, use load_workbook(filename, read_only=True, data_only=True). For write operations, use Workbook(write_only=True). These modes stream data instead of loading everything into memory at once.
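
As a rough sketch of both modes together, the example below writes 1,000 rows in write-only mode and streams them back in read-only mode. The in-memory buffer stands in for a large file on disk, and the sheet name and row count are arbitrary:

```python
# streaming_modes.py
from io import BytesIO
from openpyxl import Workbook, load_workbook

# Write-only mode: rows are appended one at a time and cannot be edited afterwards
wb = Workbook(write_only=True)
ws = wb.create_sheet("Data")  # write-only workbooks start with no sheets
ws.append(["id", "value"])
for i in range(1, 1001):
    ws.append([i, i * 2])

buffer = BytesIO()  # stand-in for a large file on disk
wb.save(buffer)
buffer.seek(0)

# Read-only mode: rows are streamed lazily instead of loaded up front
wb_in = load_workbook(buffer, read_only=True)
ws_in = wb_in["Data"]
total = 0
for row in ws_in.iter_rows(min_row=2, values_only=True):
    total += row[1]
wb_in.close()  # read-only workbooks should be closed explicitly
print(total)
```

Both modes trade flexibility for memory: a write-only sheet only accepts append(), and a read-only sheet cannot be modified.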

Why aren’t my formulas calculating when I open the file?

openpyxl never evaluates formulas; it writes the formula text and leaves the calculation to the spreadsheet application. Excel normally computes the results when it opens the file, but a file written by openpyxl contains no cached results until then, so viewers that do not recalculate may show empty cells. If you read such a file back with load_workbook(filename, data_only=True), formula cells return None. To get calculated values in Python, open and save the file in Excel (or another spreadsheet application that stores results) first, then load it with data_only=True.

Can I protect sheets or workbooks?

Yes, openpyxl supports sheet and workbook protection. You can protect a sheet with ws.protection.sheet = True and optionally set a password with ws.protection.password = "your_password". Similarly, you can protect the workbook structure with wb.security.workbookPassword = "password" together with wb.security.lockStructure = True. Note that these are basic protections and not cryptographically strong.

How do I properly handle dates and times in Excel cells?

Excel stores dates as numbers representing days since a reference date. When writing dates with openpyxl, use Python’s datetime objects directly: ws['A1'] = datetime.now(). openpyxl automatically handles the conversion. You can format the cell with ws['A1'].number_format = 'mm/dd/yyyy' to control how the date displays.
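
To make the serial-number idea concrete, here is a standard-library sketch of the conversion. It assumes Excel’s default 1900 date system, and to_excel_serial is a hypothetical helper written for illustration, not the function openpyxl uses internally:

```python
# excel_date_serial.py
from datetime import datetime

# Assumption: the workbook uses Excel's default 1900 date system.
# Counting from 1899-12-30 matches Excel's serials for dates from 1900-03-01 onward.
EXCEL_EPOCH = datetime(1899, 12, 30)

def to_excel_serial(moment: datetime) -> float:
    """Convert a datetime to an Excel serial number (days, with time as a fraction)."""
    delta = moment - EXCEL_EPOCH
    return delta.days + delta.seconds / 86400

midnight = to_excel_serial(datetime(2024, 1, 1))
noon = to_excel_serial(datetime(2024, 1, 1, 12, 0))
print(midnight)
print(noon)
```

January 1, 2024 maps to 45292, and noon on the same day adds half a day to give 45292.5, which is why time formatting in Excel is just a number format applied to a fraction.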

Is openpyxl compatible with .xls files (older Excel format)?

openpyxl only works with the modern .xlsx format (Excel 2010 and later). For older .xls files, you would need to use the xlrd library for reading or xlwt for writing. However, the easiest approach is often to convert old .xls files to .xlsx using Excel itself before processing with Python.

Can I hide rows or columns?

Yes, you can hide rows and columns in openpyxl. Use ws.row_dimensions[1].hidden = True to hide a row, or ws.column_dimensions['A'].hidden = True to hide a column. You can also freeze rows and columns for easier navigation in large spreadsheets using ws.freeze_panes = 'B2' to freeze the first row and first column.

Conclusion

You now have a comprehensive understanding of how to work with Excel files in Python using openpyxl. From creating simple spreadsheets to generating complex, professionally-formatted reports, openpyxl provides all the tools you need. The key takeaways are: start with the basics of creating workbooks and reading existing files, progress to styling and formatting for professional appearance, leverage formulas and charts for data analysis, and finally, combine everything into automated reporting solutions.

The real power of openpyxl shines when you use it to automate repetitive Excel tasks. Instead of manually creating reports, updating spreadsheets, or formatting data, you can write a Python script that does it in seconds. This skill becomes invaluable when working with data pipelines, generating client reports, or maintaining business intelligence systems.

For more information and advanced features, visit the official openpyxl documentation at https://openpyxl.readthedocs.io/. The documentation includes detailed API references, examples, and solutions to edge cases you might encounter in production environments.

Related Python Tutorials

Continue learning with these related guides:

How To Work with ZIP Files in Python

Skill Level: Beginner

ZIP files are everywhere. Whether you’re downloading software, transferring files across the internet, or backing up critical data, you’ve almost certainly encountered a compressed archive. But what if you need to work with ZIP files programmatically? Python makes it surprisingly easy with the built-in zipfile module, which lets you create, read, extract, and modify ZIP archives directly from your code.

If you’ve ever felt intimidated by file compression or thought you needed external tools to handle archives, don’t worry. In this tutorial, we’ll walk you through everything you need to know. By the end, you’ll be able to create sophisticated backup systems, extract files on demand, apply password protection, and even compress data using different algorithms—all with clean, Pythonic code.

Here’s what we’ll cover: we’ll start with a quick example to see the module in action, then explore what ZIP files are and why they matter. We’ll build up from creating basic archives to handling complex scenarios like password-protected files and selective extraction. Finally, we’ll look at a real-world backup system and answer common questions you’ll encounter in production code.

Quick Example: Creating and Reading Your First ZIP File

Let’s jump straight in and see the zipfile module in action. This simple example creates a ZIP file containing a text file, then reads it back:

# quick_example.py
import zipfile

# Create a ZIP file and add content
with zipfile.ZipFile('archive.zip', 'w') as zf:
    zf.writestr('hello.txt', 'Hello from Python!')

# Read it back and print the contents
with zipfile.ZipFile('archive.zip', 'r') as zf:
    print(zf.read('hello.txt').decode('utf-8'))

Output:

Hello from Python!

See? In just a few lines, you’ve created a ZIP archive, added a file, and retrieved its contents. The with statement handles opening and closing the archive automatically, which keeps your code clean and prevents resource leaks. This pattern—using with for context management—will be your bread and butter when working with ZIP files.

What Are ZIP Files and Why Use Python?

ZIP is a widely supported archive format that combines file compression with a directory structure. Unlike raw compression formats (like GZIP), ZIP files are containers that can hold multiple files and folders while preserving their hierarchy and metadata. ZIP compression is lossless, meaning no data is lost during compression, and the format is supported natively on Windows, macOS, and Linux—no special software required.

You might ask: why not just use shell commands or GUI tools? Python offers several advantages. First, it lets you automate archival workflows inside your application. Second, you can process ZIP files without extracting them to disk, saving I/O overhead. Third, you get programmatic control over compression levels, passwords, and selective extraction. Fourth, your code becomes cross-platform instantly—the same script runs on any OS with Python.
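The second point, working with archives without touching the disk, can be sketched with an in-memory buffer. This is a minimal illustration using io.BytesIO; the file names and contents are made up for the example.

```python
import io
import zipfile

# Build an archive entirely in memory; nothing is written to disk
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('report.txt', 'Quarterly numbers look fine.')
    zf.writestr('notes/todo.txt', 'Ship the release.')

# Rewind and read it back, still without extracting anything
buffer.seek(0)
with zipfile.ZipFile(buffer, 'r') as zf:
    names = zf.namelist()
    report = zf.read('report.txt').decode('utf-8')

print(names)   # ['report.txt', 'notes/todo.txt']
print(report)  # Quarterly numbers look fine.
```

The same pattern is handy when an archive arrives over the network or needs to be returned from a web handler.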

Here’s how ZIP compares to other formats:

| Format | Compression Ratio | Multiple Files | Directories | Password Support | Platform Support |
| --- | --- | --- | --- | --- | --- |
| ZIP | Good | Yes | Yes | Yes | Universal |
| TAR + GZIP | Excellent | Yes | Yes | No | Unix/Linux |
| 7-Zip | Excellent | Yes | Yes | Yes | Limited |
| RAR | Good | Yes | Yes | Yes | Limited |
ZIP strikes a sweet spot: it’s universally recognized, compresses reasonably well, and requires no external dependencies in Python. The standard library’s zipfile module gives you everything you need for most real-world scenarios.

[Image: Examining ZIP file contents in Python]
When your compression algorithm is working perfectly.

Creating ZIP Files from Scratch

The most common task is creating a ZIP file from existing files on disk. The ZipFile class handles this elegantly. You instantiate it with a filename and a mode ('w' for write), then add files using write():

# create_archive.py
import zipfile
import os

# Create a ZIP file
with zipfile.ZipFile('my_archive.zip', 'w') as zf:
    zf.write('data.txt')
    zf.write('config.json')
    zf.write('README.md')

# Verify the contents
with zipfile.ZipFile('my_archive.zip', 'r') as zf:
    print('Files in archive:')
    for filename in zf.namelist():
        info = zf.getinfo(filename)
        print(f'  {filename} ({info.file_size} bytes)')

Output:

Files in archive:
  data.txt (142 bytes)
  config.json (89 bytes)
  README.md (256 bytes)

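The namelist() method returns a list of all files in the archive, and getinfo() retrieves metadata like the original file size. Here the files are stored under their bare names because that is how they were passed to write(). By default, write() stores each file under the path you give it (minus any drive letter or leading slash), so relative paths keep their directories. The arcname parameter makes the stored path explicit: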

# preserve_structure.py
import zipfile
import os

with zipfile.ZipFile('my_archive.zip', 'w') as zf:
    # Add files with their directory paths
    zf.write('src/main.py', arcname='src/main.py')
    zf.write('src/utils.py', arcname='src/utils.py')
    zf.write('data/config.txt', arcname='data/config.txt')

# Read and display structure
with zipfile.ZipFile('my_archive.zip', 'r') as zf:
    zf.printdir()

Output:

File Name                                             Modified             Size
src/main.py                                    2026-04-05 10:23:14       1024
src/utils.py                                   2026-04-05 10:23:14        512
data/config.txt                                2026-04-05 10:23:14        256

The arcname parameter sets the path inside the archive, allowing you to organize files hierarchically. You can also add entire directories recursively:

# add_directory.py
import zipfile
import os

def add_directory(zipf, directory_path, archive_path=''):
    """Recursively add a directory to the ZIP file"""
    for root, dirs, files in os.walk(directory_path):
        for file in files:
            file_path = os.path.join(root, file)
            arcname = os.path.join(archive_path, os.path.relpath(file_path, directory_path))
            zipf.write(file_path, arcname)

with zipfile.ZipFile('project.zip', 'w') as zf:
    add_directory(zf, 'my_project', 'my_project')
    # Count the entries while the archive is still open
    file_count = len(zf.namelist())

print(f'Created project.zip with {file_count} files')

Output:

Created project.zip with 47 files

[Image: Tangled up in ZIP file extraction]
extractall() to a path you forgot to create. Classic.
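The same recursive walk can also be written with pathlib. The sketch below builds a throwaway directory tree first so that it runs anywhere; the helper name zip_directory and the file names are made up for illustration.

```python
import tempfile
import zipfile
from pathlib import Path

def zip_directory(source_dir, zip_path):
    """Archive every file under source_dir, storing paths relative to it."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source_dir.rglob('*')):
            if path.is_file():
                # as_posix() keeps forward slashes in the archive on every OS
                zf.write(path, arcname=path.relative_to(source_dir).as_posix())

with tempfile.TemporaryDirectory() as tmp:
    # Create a small demo project to archive
    project = Path(tmp) / 'my_project'
    (project / 'src').mkdir(parents=True)
    (project / 'src' / 'main.py').write_text('print("hi")\n')
    (project / 'README.md').write_text('# Demo\n')

    archive = Path(tmp) / 'project.zip'
    zip_directory(project, archive)

    with zipfile.ZipFile(archive) as zf:
        archived_names = zf.namelist()

print(archived_names)  # ['README.md', 'src/main.py']
```

Writing the archive outside the directory being walked avoids accidentally adding the ZIP file to itself.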

Reading and Extracting ZIP Files

Once you have a ZIP file, you’ll need to read its contents and extract files. Python gives you fine-grained control over this process:

# read_archive.py
import zipfile

with zipfile.ZipFile('my_archive.zip', 'r') as zf:
    # Get list of all files
    all_files = zf.namelist()
    print(f'Total files: {len(all_files)}')

    # Read a specific file into memory
    content = zf.read('config.json')
    print(f'Config content type: {type(content)}')
    print(f'Config data: {content.decode("utf-8")}')

    # Get file info
    info = zf.getinfo('data.txt')
    print(f'Compressed size: {info.compress_size}')
    print(f'Uncompressed size: {info.file_size}')
    print(f'Compression ratio: {100 * info.compress_size / info.file_size:.1f}%')

Output:

Total files: 3
Config content type: <class 'bytes'>
Config data: {"setting": "value"}
Compressed size: 45
Uncompressed size: 89
Compression ratio: 50.6%

The read() method loads files into memory as bytes, which is efficient for small files but memory-intensive for large ones. For extracting all files to disk, use extractall():

# extract_all.py
import zipfile
import os

with zipfile.ZipFile('my_archive.zip', 'r') as zf:
    # Extract everything to a directory
    zf.extractall('output_folder')

# Verify extraction
for root, dirs, files in os.walk('output_folder'):
    for file in files:
        filepath = os.path.join(root, file)
        print(filepath)

Output:

output_folder/data.txt
output_folder/config.json
output_folder/README.md

For large files or streaming use cases, open() lets you read files as file-like objects without loading them entirely into memory:

# stream_large_file.py
import zipfile

with zipfile.ZipFile('archive.zip', 'r') as zf:
    # Open a file for streaming
    with zf.open('large_video.mp4') as f:
        # Process in chunks
        chunk_size = 8192
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Process chunk (e.g., write to disk, compute hash)
            print(f'Processed {len(chunk)} bytes')

Output:

Processed 8192 bytes
Processed 8192 bytes
Processed 7456 bytes
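As a concrete use of chunked reading, the sketch below computes a SHA-256 checksum of an archive member without holding the whole file in memory. The helper name sha256_of_member is hypothetical, and the demo archive is built in memory so the example is self-contained.

```python
import hashlib
import io
import zipfile

def sha256_of_member(zf, name, chunk_size=8192):
    """Hash one archive member in chunks so it never sits fully in memory."""
    digest = hashlib.sha256()
    with zf.open(name) as member:
        # iter() with a sentinel keeps calling read() until it returns b''
        for chunk in iter(lambda: member.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Demo data: an in-memory archive with a 1 MiB member
payload = b'0123456789abcdef' * 65536
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('big.bin', payload)

with zipfile.ZipFile(buffer) as zf:
    streamed = sha256_of_member(zf, 'big.bin')

print(streamed == hashlib.sha256(payload).hexdigest())  # True
```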

Adding Files to Existing Archives

Sometimes you need to add files to an archive that already exists. Use the 'a' (append) mode to open an existing ZIP file and add new content:

# append_to_archive.py
import zipfile
from datetime import datetime

# Create initial archive
with zipfile.ZipFile('log_archive.zip', 'w') as zf:
    zf.writestr('startup.log', 'Application started at 10:00 AM')

# Later, append new log data
with zipfile.ZipFile('log_archive.zip', 'a') as zf:
    timestamp = datetime.now().isoformat()
    zf.writestr('runtime.log', f'Running at {timestamp}')
    zf.writestr('shutdown.log', 'Application stopped at 11:30 AM')

# Verify all entries
with zipfile.ZipFile('log_archive.zip', 'r') as zf:
    for name in zf.namelist():
        print(name)

Output:

startup.log
runtime.log
shutdown.log

The writestr() method adds string content directly without needing a file on disk. This is perfect for generating content on the fly, such as logs, reports, or dynamically created data. You can also add binary data the same way:

# add_binary_content.py
import zipfile
import json

with zipfile.ZipFile('data.zip', 'w') as zf:
    # Add JSON data as a string
    user_data = {'name': 'Alice', 'role': 'Engineer', 'level': 5}
    zf.writestr('users.json', json.dumps(user_data, indent=2))

    # Add binary data
    binary_data = bytes([0x89, 0x50, 0x4E, 0x47])  # PNG header
    zf.writestr('image.bin', binary_data)

print('Archive created with mixed content types')

Output:

Archive created with mixed content types

[Image: Compressing files with Python zipfile]
ZIP_DEFLATED vs ZIP_BZIP2: pick your compression adventure.

Extracting Specific Files Without Extraction Spam

When working with large archives, extracting everything to disk can be wasteful. You might need only a single configuration file or a subset of data. The zipfile module lets you extract exactly what you need:

# selective_extraction.py
import zipfile

with zipfile.ZipFile('large_archive.zip', 'r') as zf:
    # Extract one file
    zf.extract('critical_config.json', path='configs')

    # Extract multiple specific files
    files_needed = ['user_list.csv', 'permissions.txt', 'system.log']
    for filename in files_needed:
        if filename in zf.namelist():
            zf.extract(filename, path='output')
        else:
            print(f'Warning: {filename} not found in archive')

print('Selective extraction complete')

Output:

Selective extraction complete

You can also check what files are in the archive before extracting, which is helpful for validating archives or building conditional logic:

# validate_and_extract.py
import zipfile
import sys

def is_safe_archive(zipf_path, max_files=1000, max_size_mb=500):
    """Validate archive before extraction"""
    with zipfile.ZipFile(zipf_path, 'r') as zf:
        # Check number of files
        if len(zf.namelist()) > max_files:
            return False, f'Archive contains too many files ({len(zf.namelist())})'

        # Check total uncompressed size (prevent zip bombs)
        total_size = sum(info.file_size for info in zf.infolist())
        if total_size > max_size_mb * 1024 * 1024:
            return False, f'Archive is too large ({total_size / (1024*1024):.1f} MB)'

        return True, 'Archive is safe'

# Validate before extracting
is_safe, message = is_safe_archive('archive.zip')
print(f'Validation: {message}')

if is_safe:
    with zipfile.ZipFile('archive.zip', 'r') as zf:
        zf.extractall('output')

Output:

Validation: Archive is safe
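Size limits are one check; member names are another. extractall() already strips absolute paths and ".." components from member names, but if you build destination paths yourself it can help to reject suspicious names up front. A minimal sketch, with a hypothetical helper name:

```python
import io
import zipfile
from pathlib import PurePosixPath

def find_unsafe_members(zf):
    """Return member names that are absolute or try to climb out with '..'."""
    unsafe = []
    for name in zf.namelist():
        path = PurePosixPath(name)
        if path.is_absolute() or '..' in path.parts:
            unsafe.append(name)
    return unsafe

# Demo: an in-memory archive with one normal and one malicious-looking entry
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('docs/readme.txt', 'fine')
    zf.writestr('../escape.txt', 'not fine')

with zipfile.ZipFile(buffer) as zf:
    flagged = find_unsafe_members(zf)

print(flagged)  # ['../escape.txt']
```

This sketch only looks at forward-slash paths, which is how ZIP member names are normally stored.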

Working with Password-Protected ZIP Files

For sensitive data, ZIP archives can be protected with a password. Python’s zipfile module can read (decrypt) archives that use the legacy ZIP encryption scheme, but it cannot create encrypted archives:

# read_encrypted.py
import zipfile

# Read a password-protected archive
password = b'my_secret_password'

with zipfile.ZipFile('secure_archive.zip', 'r') as zf:
    # Set the password for the archive
    zf.setpassword(password)

    # Extract files (they'll be decrypted automatically)
    zf.extractall('secure_output')

    # Or read a specific file
    content = zf.read('secret.txt', pwd=password)
    print(content.decode('utf-8'))

Output:

This is a secret message

Important: Note that pwd must be bytes, not a string. The legacy ZIP encryption that zipfile can decrypt is weak, so it is suitable for casual protection but not for highly sensitive data. Also note that zipfile cannot write encrypted archives: there is no encryption parameter, and setpassword() only sets the default password used for reading. The following example shows that calling setpassword() before writing does not protect anything:

# create_encrypted.py
import zipfile

# zipfile cannot encrypt. setpassword() only affects reading,
# so this archive is written without any protection.
with zipfile.ZipFile('not_really_secure.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.setpassword(b'my_secret_password')  # no effect on written data
    zf.writestr('secret.txt', 'Confidential information')

print('Archive created')

# Read it back without supplying any password
with zipfile.ZipFile('not_really_secure.zip', 'r') as zf:
    print('Files in archive:', zf.namelist())

    # Bit 0 of flag_bits is set only for encrypted members
    info = zf.getinfo('secret.txt')
    print('Encrypted:', bool(info.flag_bits & 0x1))

    # This succeeds, which shows the content was stored in the clear
    print(zf.read('secret.txt').decode('utf-8'))

Output:

Archive created
Files in archive: ['secret.txt']
Encrypted: False
Confidential information

To create password-protected archives you need a third-party library or an external tool. For example, pyminizip writes archives with the legacy ZIP encryption, and pyzipper adds AES encryption. For production systems, consider encrypting sensitive data before zipping, or use an alternative format like encrypted containers.

[Image: Speed optimizing ZIP file operations]
Your backup script runs in 0.3 seconds. Your coworker’s takes 45 minutes.

Choosing Compression Algorithms and Levels

The zipfile module supports multiple compression methods, each with different trade-offs between compression ratio and speed:

# compression_comparison.py
import zipfile
import os

test_file = 'large_data.txt'

# Create test data
with open(test_file, 'w') as f:
    f.write('The quick brown fox jumps over the lazy dog. ' * 10000)

original_size = os.path.getsize(test_file)

# Test different compression methods
methods = [
    (zipfile.ZIP_STORED, 'Stored (no compression)'),
    (zipfile.ZIP_DEFLATED, 'DEFLATE')
]

results = []

for method, description in methods:
    archive_name = f'archive_{description.replace(" ", "_")}.zip'

    with zipfile.ZipFile(archive_name, 'w', method) as zf:
        zf.write(test_file)

    archive_size = os.path.getsize(archive_name)
    ratio = 100 * archive_size / original_size

    results.append({
        'method': description,
        'size': archive_size,
        'ratio': ratio
    })

    print(f'{description}: {archive_size} bytes ({ratio:.1f}%)')

# Cleanup
os.remove(test_file)

Output (illustrative; your sizes will differ):

Stored (no compression): 458234 bytes (100.0%)
DEFLATE: 45823 bytes (10.0%)

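ZIP_DEFLATED uses the DEFLATE algorithm, which offers excellent compression for text and code. It is not the default: zipfile uses ZIP_STORED unless you pass a compression method, so examples that omit it store files uncompressed. ZIP_STORED adds no compression, which is useful for files that are already compressed (like JPEG images or video) where re-compressing wastes CPU. The module also offers ZIP_BZIP2 and ZIP_LZMA, which often compress better than DEFLATE but run slower and are less widely supported by other tools. You can control compression level when using DEFLATE: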

# compression_level.py
import os
import time
import zipfile

with open('test.txt', 'w') as f:
    f.write('Sample data. ' * 50000)

for level in [0, 1, 6, 9]:
    # perf_counter() is the recommended clock for measuring short durations
    start = time.perf_counter()

    with zipfile.ZipFile(f'test_level_{level}.zip', 'w', zipfile.ZIP_DEFLATED, compresslevel=level) as zf:
        zf.write('test.txt')

    elapsed = time.perf_counter() - start
    size = os.path.getsize(f'test_level_{level}.zip')
    print(f'Level {level}: {size} bytes in {elapsed:.3f}s')

Output (illustrative; your sizes and timings will differ):

Level 0: 645234 bytes in 0.002s
Level 1: 89234 bytes in 0.015s
Level 6: 78923 bytes in 0.045s
Level 9: 78234 bytes in 0.089s

Higher levels give better compression but take longer. Level 6 is zlib's default and usually a good choice for production use: in the sample run above it produced an archive less than 1% larger than level 9 in about half the time.
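Compression can also be chosen per file rather than per archive. The sketch below stores already-compressed formats as-is and deflates everything else. The extension list and the helper name pick_compression are assumptions for illustration, not a standard.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumption: formats that are already compressed and gain little from DEFLATE
ALREADY_COMPRESSED = {'.jpg', '.jpeg', '.png', '.mp4', '.zip', '.gz'}

def pick_compression(name):
    """Choose ZIP_STORED for already-compressed formats, ZIP_DEFLATED otherwise."""
    suffix = PurePosixPath(name).suffix.lower()
    return zipfile.ZIP_STORED if suffix in ALREADY_COMPRESSED else zipfile.ZIP_DEFLATED

files = {
    'notes.txt': b'plain text compresses well. ' * 200,
    'photo.jpg': bytes(range(256)) * 20,  # stand-in bytes, not a real JPEG
}

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    for name, data in files.items():
        # compress_type on writestr() overrides the archive-wide setting
        zf.writestr(name, data, compress_type=pick_compression(name))

with zipfile.ZipFile(buffer) as zf:
    for info in zf.infolist():
        method = 'stored' if info.compress_type == zipfile.ZIP_STORED else 'deflated'
        print(f'{info.filename}: {method}, {info.file_size} -> {info.compress_size} bytes')
```

The write() method accepts the same compress_type argument, so the idea carries over to files on disk.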

Real-World Example: Building a Backup Manager

Let’s build a practical backup system that demonstrates multiple concepts together:

# backup_manager.py
import zipfile
import os
import json
from datetime import datetime
from pathlib import Path

class BackupManager:
    """Manages full backups with metadata tracking"""

    def __init__(self, backup_dir='./backups'):
        self.backup_dir = Path(backup_dir)
        self.backup_dir.mkdir(exist_ok=True)
        self.manifest_file = self.backup_dir / 'manifest.json'
        self.load_manifest()

    def load_manifest(self):
        """Load backup history"""
        if self.manifest_file.exists():
            with open(self.manifest_file, 'r') as f:
                self.manifest = json.load(f)
        else:
            self.manifest = {'backups': []}

    def save_manifest(self):
        """Save backup history"""
        with open(self.manifest_file, 'w') as f:
            json.dump(self.manifest, f, indent=2)

    def create_backup(self, source_dir, backup_name=None):
        """Create a new backup of the source directory"""
        if backup_name is None:
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            backup_name = f'backup_{timestamp}'

        backup_path = self.backup_dir / f'{backup_name}.zip'
        file_count = 0
        total_size = 0

        with zipfile.ZipFile(backup_path, 'w', zipfile.ZIP_DEFLATED, compresslevel=6) as zf:
            for root, dirs, files in os.walk(source_dir):
                for file in files:
                    file_path = os.path.join(root, file)
                    arcname = os.path.relpath(file_path, source_dir)
                    zf.write(file_path, arcname)
                    file_count += 1
                    total_size += os.path.getsize(file_path)

        # Record in manifest
        backup_info = {
            'name': backup_name,
            'timestamp': datetime.now().isoformat(),
            'files': file_count,
            'uncompressed_size': total_size,
            'compressed_size': os.path.getsize(backup_path)
        }
        self.manifest['backups'].append(backup_info)
        self.save_manifest()

        return backup_path, backup_info

    def list_backups(self):
        """List all available backups"""
        for backup in self.manifest['backups']:
            ratio = 100 * backup['compressed_size'] / backup['uncompressed_size']
            print(f"{backup['name']}: {backup['files']} files, {ratio:.1f}% of original")

    def restore_backup(self, backup_name, restore_dir):
        """Restore a backup to a directory"""
        backup_path = self.backup_dir / f'{backup_name}.zip'

        if not backup_path.exists():
            raise FileNotFoundError(f'Backup {backup_name} not found')

        with zipfile.ZipFile(backup_path, 'r') as zf:
            zf.extractall(restore_dir)

        print(f'Restored {backup_name} to {restore_dir}')

# Usage example
if __name__ == '__main__':
    manager = BackupManager()

    # Create a backup
    backup_path, info = manager.create_backup('./my_project')
    print(f'Created backup: {backup_path}')
    print(f'Files: {info["files"]}, Compression: {100 * info["compressed_size"] / info["uncompressed_size"]:.1f}%')

    # List all backups
    manager.list_backups()

    # Restore if needed
    # manager.restore_backup('backup_20260405_143022', './restored_project')

Output:

Created backup: backups/backup_20260405_143022.zip
Files: 47, Compression: 23.4%
backup_20260405_143022: 47 files, 23.4% of original

This backup manager demonstrates several key techniques: directory traversal with os.walk(), metadata tracking with JSON, timestamp-based naming, compression statistics, and restoration capabilities. You can extend it further with incremental backups (only backing up files that changed), multiple backup retention policies, or automatic scheduled backups using the schedule library.
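As one example of such an extension, a retention policy can be sketched as a pure function over the manifest's backup list. It assumes the entry shape that create_backup() records (a dict with an ISO-format 'timestamp'); the function name split_by_retention and the keep parameter are hypothetical.

```python
def split_by_retention(backups, keep=3):
    """Return (kept, expired) lists, keeping the `keep` most recent backups."""
    # ISO-8601 timestamps in the same format sort chronologically as strings
    ordered = sorted(backups, key=lambda b: b['timestamp'], reverse=True)
    return ordered[:keep], ordered[keep:]

# Demo manifest with four backups recorded out of order
manifest = {'backups': [
    {'name': 'backup_a', 'timestamp': '2026-04-01T09:00:00'},
    {'name': 'backup_b', 'timestamp': '2026-04-03T09:00:00'},
    {'name': 'backup_c', 'timestamp': '2026-04-02T09:00:00'},
    {'name': 'backup_d', 'timestamp': '2026-04-05T09:00:00'},
]}

kept, expired = split_by_retention(manifest['backups'], keep=3)
print([b['name'] for b in kept])     # ['backup_d', 'backup_b', 'backup_c']
print([b['name'] for b in expired])  # ['backup_a']
```

A fuller version would then delete each expired archive from the backup directory and save the trimmed manifest.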

[Image: Building a backup system with ZIP files]
A whole backup system in under 100 lines. DevOps just felt a disturbance.

Frequently Asked Questions

What’s a ZIP bomb and how do I protect against it?

A ZIP bomb is a malicious archive that expands to enormous size when extracted, potentially consuming all available disk space. For example, a 45 MB file might decompress to 45 GB. Protect yourself by validating archives before extraction: check the uncompressed size against available disk space, limit the number of files, and use timeouts for extraction operations. The validate_and_extract.py example earlier demonstrates this approach.
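Besides the total size, the compression ratio of each member is a useful warning sign. The sketch below flags entries whose declared uncompressed size is far larger than their compressed size. The threshold of 100 and the helper name are arbitrary assumptions, and because the sizes come from the archive's own headers, this is a screening step rather than a guarantee.

```python
import io
import zipfile

def suspicious_members(zf, max_ratio=100):
    """Return names whose declared uncompressed size dwarfs their compressed size."""
    flagged = []
    for info in zf.infolist():
        if info.compress_size == 0:
            continue  # empty files and directories have nothing to inflate
        if info.file_size / info.compress_size > max_ratio:
            flagged.append(info.filename)
    return flagged

# Demo: one megabyte of zeros deflates to almost nothing
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('zeros.bin', b'\x00' * 1_000_000)
    zf.writestr('hello.txt', 'Hello from Python!')

with zipfile.ZipFile(buffer) as zf:
    print(suspicious_members(zf))  # ['zeros.bin']
```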

Does zipfile preserve symbolic links?

The zipfile module doesn’t preserve symbolic links by default. When you add a symlink with write(), it follows the link and stores the contents of the target file. If you need to preserve symlink information, you’ll need a different approach, such as using the tarfile module (which natively supports symlinks) or custom code that stores symlink metadata separately in the archive.

How do I handle very large files (multi-GB)?

For large files, use the streaming approach with zf.open() to read files in chunks without loading them entirely into memory. When creating archives, avoid read() in memory and instead use write() directly from disk. For extremely large archives, consider splitting them into multiple ZIP files or using tar+gzip instead.

Can I modify files inside a ZIP without re-creating it?

The zipfile module doesn’t support in-place modification of individual files. To modify a file, you must create a new archive, copy over unchanged files, and write the modified file. Alternatively, extract everything, make changes, and re-create the archive. This is a limitation of the ZIP format itself.
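The copy-and-replace approach can be sketched as follows. The helper name replace_member is hypothetical, and the demo uses in-memory archives so it is self-contained; file paths work the same way.

```python
import io
import zipfile

def replace_member(src, dst, member, new_data):
    """Copy every entry from src to dst, swapping in new_data for one member."""
    with zipfile.ZipFile(src, 'r') as zin, \
         zipfile.ZipFile(dst, 'w', zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = new_data if info.filename == member else zin.read(info.filename)
            zout.writestr(info.filename, data)

# Demo: build a small archive, then produce an updated copy
original = io.BytesIO()
with zipfile.ZipFile(original, 'w') as zf:
    zf.writestr('config.json', '{"debug": false}')
    zf.writestr('data.txt', 'unchanged')

updated = io.BytesIO()
replace_member(original, updated, 'config.json', '{"debug": true}')

with zipfile.ZipFile(updated) as zf:
    print(zf.read('config.json').decode('utf-8'))  # {"debug": true}
    print(zf.read('data.txt').decode('utf-8'))     # unchanged
```

Note that this simple version reads each member fully into memory and gives copied entries a fresh timestamp; for large archives you would stream members with open() instead.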

How do I ensure archives are portable across Windows, macOS, and Linux?

Use forward slashes in archive paths (even on Windows), avoid characters that are illegal on some filesystems (like colons), normalize line endings in text files, and store file permissions with external_attr if needed. The code examples in this tutorial use os.path and os.walk(), which handle platform differences automatically.
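If relative paths reach your code from another system (for example, Windows-style paths read from a manifest), a small normalizer keeps archive names consistent. This is a sketch with a hypothetical helper name; it assumes backslashes are always separators, which would be wrong for a POSIX file name that genuinely contains a backslash.

```python
from pathlib import PureWindowsPath

def portable_arcname(relative_path):
    """Normalize a relative path (Windows or POSIX style) to forward slashes."""
    # PureWindowsPath accepts both '\\' and '/' as separators on any OS
    return PureWindowsPath(relative_path).as_posix()

print(portable_arcname(r'src\utils\io.py'))  # src/utils/io.py
print(portable_arcname('docs/guide.md'))     # docs/guide.md
```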

Conclusion

You now have a complete toolkit for working with ZIP files in Python. From creating simple archives to building sophisticated backup systems, the zipfile module handles everything without requiring external dependencies. Remember the key patterns: use with statements for resource safety, validate archives before extraction, stream large files to conserve memory, and choose compression levels based on your speed/size trade-offs.

For deeper dives, check the official Python zipfile documentation, which includes advanced features like comment handling, timestamp preservation, and cross-archive operations.

Related Python Tutorials

Continue learning with these related guides:

How To Use uv: The Fast Python Package Manager

Skill Level: Beginner

Python package management is one of the most critical parts of Python development. Whether you’re installing libraries, managing dependencies, or creating reproducible environments, you need a reliable package manager. For years, pip has been the de facto standard, but it’s slow, fragmented, and sometimes frustrating to use. Enter uv—a blazing-fast Python package manager written in Rust that replaces pip, virtualenv, and poetry with a single, unified tool.

In this comprehensive guide, we’ll explore uv from the ground up. You’ll learn how to install it, use it to manage projects and dependencies, understand how it differs from traditional tools, and discover why developers are rapidly adopting it. By the end, you’ll understand why uv is being called “the next-generation Python package manager.”

What is uv?

uv is a modern Python package manager that’s designed to be ridiculously fast. Created by Astral, the makers of Ruff (the Python linter you might already be using), uv combines the functionality of pip, virtualenv, and pyenv into one cohesive tool. But that’s not the main selling point. The main point is speed.

Here’s what uv is NOT: it’s not a replacement for pip that works the same but faster. It’s a rethinking of what a Python package manager should be. It’s designed from scratch using Rust, with modern parallelization, caching, and optimization.

Why Choose uv?

  • 10-100x faster: Installation is dramatically faster due to Rust performance and parallel downloads
  • Single tool: Replaces pip, virtualenv, and pyenv—no more context switching
  • Dependency resolution: Lightning-fast conflict detection and resolution
  • Cross-platform: Works on Windows, macOS, and Linux without modification
  • Built for modern Python: Designed with Python 3.8+ in mind from the start
  • Zero configuration needed: Works out of the box with sensible defaults

Installing uv

[Image: Cache Katie pressing a power button with lightning for uv installation]
One curl command and you are done. pip never saw it coming.

Installing uv is incredibly simple. On macOS or Linux, just run:

curl -LsSf https://astral.sh/uv/install.sh | sh

On Windows, use:

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

That’s it. uv is now installed and ready to use.

Basic uv Commands

[Image: Pyro Pete catching package boxes on a fast conveyor belt]
uv installs packages faster than you can alt-tab to check if it finished.

Creating a New Project

To create a new Python project with uv, simply run:

uv init my_project

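This creates a new directory with a basic project structure. The exact files depend on your uv version and the flags you pass: recent releases generate a flat application layout by default (pyproject.toml, .python-version, README.md, and a sample script), while uv init --lib generates a src/ layout like this: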

my_project/
├── .python-version      # Python version specification
├── pyproject.toml       # Project configuration
└── src/
    └── my_project/
        └── __init__.py

Adding Dependencies

To add a package to your project:

uv add requests

This automatically:

  • Resolves the dependency
  • Installs it
  • Updates your pyproject.toml
  • Creates a uv.lock file for reproducibility

Installing from pyproject.toml

To install all dependencies from your pyproject.toml:

uv sync

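uv sync installs the exact versions recorded in uv.lock (creating or updating the lockfile first if needed), so every machine ends up with the same environment.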

Running Python Scripts

With uv, you don’t need to manually activate virtual environments:

uv run python script.py

uv automatically creates and uses the appropriate environment.

Advanced Usage

[Image: Sudo Sam conducting an orchestra of interlocking gears]
Virtual environments, Python versions, lockfiles: uv handles them all without breaking a sweat.

Python Version Management

uv can automatically manage Python versions. To use a specific Python version in your project:

uv init --python 3.11

To list available Python versions:

uv python list

Working with Virtual Environments

Create an environment explicitly:

uv venv

Activate it like you normally would:

source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Pre-Release and Development Versions

To include pre-release versions in dependency resolution:

uv add --pre package_name

Comparing with Pip and Poetry

Here’s how uv stacks up against traditional tools:

| Feature | uv | pip | poetry |
| --- | --- | --- | --- |
| Installation Speed | 10-100x faster | Baseline | 2-3x faster than pip |
| Single Tool | Yes | No (+ virtualenv + pip) | Yes |
| Lock File | Yes (uv.lock) | No (requires pip-tools) | Yes (poetry.lock) |
| Ease of Use | Very Easy | Moderate | Very Easy |
| Performance | Excellent | Good | Good |
| Python Version Management | Built-in | Requires pyenv | Requires pyenv |

Real-World Example: Setting Up a Data Science Project

[Image: API Alex presenting a holographic project structure]
From zero to working data science project in under a minute. Your old setup script is crying.

Here’s how you’d set up a complete data science project with uv:

# Create the project
uv init data_science_project

# Enter the directory
cd data_science_project

# Add scientific computing dependencies
uv add numpy pandas scikit-learn jupyter matplotlib

# Add development dependencies (optional)
uv add --dev pytest pytest-cov black

# Run Jupyter notebooks
uv run jupyter notebook

# Run tests
uv run pytest

Notice how simple that is? No manual environment activation, no separate commands for different tools. Everything flows naturally.

Frequently Asked Questions

Is uv production-ready?

Absolutely. While it’s relatively new, it’s being used in production by many organizations. The Astral team is committed to stability, and it continues to improve with every release.

Will uv replace pip?

Eventually, yes. Many Python developers are switching to uv. However, pip will likely remain the standard for a while. The Python ecosystem moves slowly, and that’s a good thing.

Can I use uv alongside pip?

You shouldn’t mix package managers in the same environment, but you can use uv for some projects and pip for others.

What about compatibility?

uv is compatible with PyPI and all standard Python packages. There’s no special “uv-only” ecosystem—it works with everything pip does.

Conclusion

uv represents the future of Python package management. It’s fast, simple, and incredibly well-designed. Whether you’re building a small script, a data science project, or a large production application, uv makes package management feel effortless. If you haven’t tried it yet, I highly recommend giving it a shot. Your development workflow will thank you.

Key Takeaways:

  • uv is a faster, more unified replacement for pip, virtualenv, and pyenv
  • Installation is a single command
  • Project setup and dependency management are incredibly straightforward
  • It’s production-ready and actively maintained
  • Making the switch is risk-free—it’s fully compatible with the existing Python ecosystem

Installing uv

uv is a single static binary written in Rust — no Python interpreter dependency, no package install dance. The installer pulls the right binary for your platform:

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows PowerShell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or via pip if you prefer
pip install uv

# Verify
uv --version

The standalone installer puts the uv binary in ~/.local/bin by default (older releases used ~/.cargo/bin) and adds that directory to PATH by updating your shell profile. Restart your terminal or run source ~/.bashrc after install so the change takes effect.

Replacing pip and venv

uv handles both package install AND virtual environments — replacing pip + venv with one tool. The most common workflows:

# Create a virtualenv
uv venv                          # creates .venv in current dir
uv venv my-env                   # custom name
uv venv --python 3.11            # specific Python version

# Activate (same as venv)
source .venv/bin/activate        # macOS / Linux
.venv\Scripts\activate           # Windows

# Install packages
uv pip install requests pandas
uv pip install -r requirements.txt
uv pip install -e .              # editable install

# Show what's installed
uv pip list
uv pip freeze > requirements.txt

The uv pip ... command is intentionally drop-in compatible: most pip invocations you know work the same. The speed difference is what shocks you: a pip install -r requirements.txt that takes 60 seconds can finish in a few seconds with uv, especially once its cache is warm.

Project Workflow with uv init

uv also has a project mode (similar to Poetry, Hatch, or PDM) that manages your pyproject.toml, lockfile, and venv in one place:

# Start a new project
uv init my-app
cd my-app

# Add dependencies (updates pyproject.toml and lockfile)
uv add requests pandas
uv add --dev pytest mypy

# Run any command in the project's venv
uv run python my_script.py
uv run pytest

# Reproduce the lockfile environment exactly
uv sync

The uv.lock file is the cross-platform lockfile — committed to git, identical resolutions on macOS / Linux / Windows. uv sync rebuilds the venv exactly from the lock, perfect for CI.

Managing Python Versions

uv can also install Python itself — no more pyenv, no more system Python pollution:

# Install a specific Python version
uv python install 3.12
uv python install 3.13

# List installed Python versions
uv python list

# Pin a project to a specific Python
echo "3.12" > .python-version
uv venv                          # uses 3.12

# Or pass version explicitly
uv venv --python 3.13

This obsoletes pyenv for most use cases. uv downloads Python from the python-build-standalone project — full, properly-built CPython binaries.

Migrating from pip / Poetry / pipenv

If you have an existing project, migration is mostly painless:

  • From pip + requirements.txt: Just start using uv pip install -r requirements.txt. No file changes needed.
  • From Poetry: run uv init in the project, then uv add each dependency listed in pyproject.toml. If you can export a requirements file from Poetry, uv add -r requirements.txt adds the whole list in one step.
  • From pipenv: uv pip install -r <(pipenv requirements) as a one-liner. Then drop pipenv entirely.

Common Pitfalls

  • Mixing uv and pip in the same venv. Both work, but you’ll get inconsistent results if you bounce between them. Pick one tool per project.
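  • Forgetting about the venv. uv pip install looks for an active virtual environment first, then for a .venv directory in the current or a parent directory. If it finds neither, it stops with an error rather than touching system Python; installing there requires the explicit --system flag. Activate the venv first, or use uv run.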
  • Lockfile drift. If you edit pyproject.toml manually without running uv sync, the lockfile gets out of sync with your stated dependencies. Always use uv add / uv remove.
  • Treating uv as a complete Poetry replacement. uv’s project mode is newer than Poetry’s. Some plugin ecosystems (publishing tools, build backends) are more mature on Poetry. Check your specific use case before committing.
  • Caching across versions. uv’s cache (~/.cache/uv) is shared across projects. If something behaves weirdly, uv cache clean is the nuclear option.

FAQ

Q: Is uv stable enough for production?
A: Yes — Astral (the company behind uv and ruff) ships it at production scale. The pip-compatible commands are battle-tested. The newer “uv init / uv add / uv sync” project mode is solid but still evolving.

Q: How is uv faster?
A: Rust implementation, parallel downloads, parallel resolves, fast metadata caching, and aggressive use of HTTP/2. Most installations are bottlenecked on PyPI metadata fetching, and uv parallelizes that work to saturate the network.

Q: Do I lose anything by switching from pip?
A: Almost nothing. Edge cases around custom indexes and certain proxy configs may need adjustment. The uv pip subcommand is intentionally a near-perfect drop-in.

Q: uv or Poetry?
A: uv if you value speed and a single binary. Poetry if you have an established workflow, plugins, or your team is on Poetry already. They solve overlapping problems with different opinions.

Q: Does uv work on Windows?
A: Yes, with the official installer. The CLI is identical to Linux / macOS. Some plugin and binary-wheel edge cases are slightly different but the core workflow is the same.

Wrapping Up

uv is the rare new tool that you can adopt incrementally — start by aliasing pip to uv pip and feel the speed difference today. As you get comfortable, graduate to uv venv and uv python install to replace virtualenv + pyenv. The project workflow (uv init / add / sync) is the long-term destination, but you don’t have to commit on day one. For a tool that’s still under 18 months old, uv is remarkably stable and unusually fast.

Related Python Tutorials

Continue learning with these related guides:

How To Mock API Calls in Python

Intermediate

Your Python application talks to external APIs — fetching weather data, processing payments, sending notifications, pulling user profiles from third-party services. But when you write tests, you do not want those tests to actually hit the internet. Real API calls are slow, flaky, cost money, and make your test results depend on whether some server halfway around the world is having a good day. Mocking API calls lets you test your code’s logic in complete isolation, with predictable responses that run in milliseconds.

Python’s standard library includes everything you need through the unittest.mock module. You do not need to install anything extra to get started — patch, MagicMock, and Mock are all built in. For more advanced scenarios, the third-party responses library provides an elegant way to mock the requests library specifically. Both approaches work seamlessly with pytest.

In this article, we will cover everything you need to mock API calls in Python. We will start with a quick example, then explain how mocking works under the hood. From there, we will walk through patching with decorators and context managers, configuring mock return values and side effects, verifying that calls were made correctly, using the responses library for request-level mocking, and handling error scenarios. We will finish with a complete real-life project that tests a GitHub user profile fetcher end to end.

Mocking an API Call in Python: Quick Example

Here is the simplest possible example of mocking an API call. We have a function that fetches a user from an API, and a test that replaces the HTTP call with a fake response.

# quick_mock_example.py
from unittest.mock import patch, MagicMock

import requests

def get_user(user_id):
    """Fetch a user from the API."""
    response = requests.get(f"https://jsonplaceholder.typicode.com/users/{user_id}")
    response.raise_for_status()
    return response.json()

# Test it without hitting the real API
@patch("requests.get")
def test_get_user(mock_get):
    mock_response = MagicMock()
    mock_response.json.return_value = {"id": 1, "name": "Leanne Graham"}
    mock_response.raise_for_status.return_value = None
    mock_get.return_value = mock_response

    user = get_user(1)
    assert user["name"] == "Leanne Graham"
    mock_get.assert_called_once_with("https://jsonplaceholder.typicode.com/users/1")

if __name__ == "__main__":
    test_get_user()
    print("Test passed!")

Output:

$ python quick_mock_example.py
Test passed!

The @patch decorator replaced requests.get with a MagicMock before the test ran, and restored the real function afterward. We configured the mock to return a fake JSON response, then verified that our function called the right URL. The entire test runs without any network access, making it fast and reliable.

Want to learn about context managers, side effects, error simulation, and the responses library? Keep reading — we cover all of that below.

Debug Dee examining a mock server with magnifying glass
That feeling when your mock returns exactly what you told it to. Trust issues? In testing, they are a feature.

What Is Mocking and Why Mock API Calls?

Mocking is a testing technique where you replace a real object with a fake one that behaves however you configure it. In the context of API calls, mocking means replacing the HTTP client (usually requests.get or requests.post) with a controlled substitute that returns predetermined responses without making any network requests.

Here is why mocking API calls matters for any serious Python project:

| Problem with real API calls in tests | How mocking solves it |
| --- | --- |
| Tests are slow (network round-trips) | Mocks return instantly from memory |
| Tests fail when API is down or rate-limited | Mocks always respond predictably |
| Tests cost money (paid APIs charge per call) | Mocks are free: no HTTP requests leave your machine |
| Tests depend on external data that changes | Mocks return the exact data you specify |
| Tests cannot simulate errors easily | Mocks can raise any exception on demand |
| Tests require authentication tokens | Mocks bypass all authentication |

The core principle is simple: your tests should verify that your code handles API responses correctly. They should not verify that the API itself is working — that is the API provider’s job. Mocking draws a clean boundary between your logic and the external world.

Patching With the Decorator Pattern

The most common way to mock an API call is the @patch decorator from unittest.mock. It temporarily replaces a specified object with a MagicMock for the duration of the test, then restores the original when the test finishes.

# github_client.py
import requests

def get_repo_stars(owner, repo):
    """Fetch the star count for a GitHub repository."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    data = response.json()
    return data["stargazers_count"]

def is_popular(owner, repo, threshold=1000):
    """Check if a repository has more stars than the threshold."""
    stars = get_repo_stars(owner, repo)
    return stars >= threshold

# test_github_client.py
from unittest.mock import patch, MagicMock
from github_client import get_repo_stars, is_popular

@patch("github_client.requests.get")
def test_get_repo_stars(mock_get):
    mock_response = MagicMock()
    mock_response.json.return_value = {"stargazers_count": 54321}
    mock_response.raise_for_status.return_value = None
    mock_get.return_value = mock_response

    stars = get_repo_stars("python", "cpython")
    assert stars == 54321

@patch("github_client.requests.get")
def test_is_popular_true(mock_get):
    mock_response = MagicMock()
    mock_response.json.return_value = {"stargazers_count": 5000}
    mock_response.raise_for_status.return_value = None
    mock_get.return_value = mock_response

    assert is_popular("pallets", "flask") is True

@patch("github_client.requests.get")
def test_is_popular_false(mock_get):
    mock_response = MagicMock()
    mock_response.json.return_value = {"stargazers_count": 50}
    mock_response.raise_for_status.return_value = None
    mock_get.return_value = mock_response

    assert is_popular("someone", "small-project") is False

Output:

$ pytest test_github_client.py -v
========================= test session starts =========================
collected 3 items

test_github_client.py::test_get_repo_stars PASSED
test_github_client.py::test_is_popular_true PASSED
test_github_client.py::test_is_popular_false PASSED

========================= 3 passed in 0.02s ==========================

Notice the patch target is "github_client.requests.get", not "requests.get". This is the single most common mistake with @patch: you must patch where the object is looked up, not where it is defined. Since github_client.py imports requests and calls requests.get, you patch it inside github_client’s namespace.
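To see the lookup rule in isolation, here is a minimal sketch that needs no network or third-party packages. The module names fake_api and fake_client and the functions fetch and load are hypothetical, built in memory purely for illustration. fake_client uses the from ... import style, so it keeps its own reference to fetch.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical module that "defines" the function (stands in for requests).
fake_api = types.ModuleType("fake_api")
exec("def fetch():\n    return 'real'", fake_api.__dict__)
sys.modules["fake_api"] = fake_api

# Hypothetical module that "uses" it via a from-import (stands in for your app).
fake_client = types.ModuleType("fake_client")
sys.modules["fake_client"] = fake_client
exec(
    "from fake_api import fetch\n"
    "def load():\n"
    "    return fetch()",
    fake_client.__dict__,
)

# Patching where fetch is defined does not affect fake_client's own reference.
with patch("fake_api.fetch", return_value="mocked"):
    patched_at_definition = fake_client.load()

# Patching where fetch is looked up replaces the name that load() actually reads.
with patch("fake_client.fetch", return_value="mocked"):
    patched_at_lookup = fake_client.load()

print(patched_at_definition)  # real
print(patched_at_lookup)      # mocked
```

The first patch silently does nothing from the caller's point of view, which is exactly how a wrongly targeted patch shows up in a real test: the test runs, but the real function is still being called.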

Patching With Context Managers

Sometimes the decorator pattern is too broad — you only need the mock active for a few lines, not the entire test. The with statement gives you finer control over exactly when the mock is active.

# test_context_manager.py
from unittest.mock import patch, MagicMock
from github_client import get_repo_stars

def test_patch_as_context_manager():
    with patch("github_client.requests.get") as mock_get:
        mock_response = MagicMock()
        mock_response.json.return_value = {"stargazers_count": 999}
        mock_response.raise_for_status.return_value = None
        mock_get.return_value = mock_response

        stars = get_repo_stars("test", "repo")
        assert stars == 999

    # After the with block, requests.get is real again

Output:

$ pytest test_context_manager.py -v
========================= test session starts =========================
collected 1 item

test_context_manager.py::test_patch_as_context_manager PASSED

========================= 1 passed in 0.01s ==========================

The context manager approach is especially useful when your test needs to verify behavior both with and without the mock in the same test function. Inside the with block, the mock is active. Outside it, the original object is restored. This gives you precise control that the decorator cannot match.
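As a small illustration of that scoping, the sketch below patches os.getcwd from the standard library instead of an HTTP client so it runs anywhere. The function project_name and the path /srv/fake-project are made up for the example.

```python
import os
from unittest.mock import patch

def project_name():
    """Return the name of the current working directory."""
    return os.path.basename(os.getcwd())

real_name = project_name()

with patch("os.getcwd", return_value="/srv/fake-project") as mock_getcwd:
    # Inside the block, os.getcwd is the mock.
    mocked_name = project_name()
    mock_getcwd.assert_called_once_with()

# Outside the block, the original os.getcwd is back.
restored_name = project_name()

print(mocked_name)                 # fake-project
print(restored_name == real_name)  # True
```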

Loop Larry confused between two glowing doors
patch() target left or right? Choose wisely, or your tests will gaslight you.

Side Effects: Simulating Errors and Dynamic Responses

Real APIs do not always return happy responses. They time out, return 500 errors, send malformed JSON, and rate-limit your requests. The side_effect parameter on a mock lets you simulate all of these scenarios so your code handles failures gracefully.

# test_error_handling.py
from unittest.mock import patch, MagicMock
import requests
from github_client import get_repo_stars

@patch("github_client.requests.get")
def test_api_timeout(mock_get):
    mock_get.side_effect = requests.exceptions.Timeout("Connection timed out")

    try:
        get_repo_stars("python", "cpython")
        assert False, "Should have raised Timeout"
    except requests.exceptions.Timeout:
        pass  # Expected behavior

@patch("github_client.requests.get")
def test_api_404(mock_get):
    mock_response = MagicMock()
    mock_response.raise_for_status.side_effect = requests.exceptions.HTTPError(
        "404 Not Found"
    )
    mock_get.return_value = mock_response

    try:
        get_repo_stars("nonexistent", "repo")
        assert False, "Should have raised HTTPError"
    except requests.exceptions.HTTPError:
        pass  # Expected behavior

@patch("github_client.requests.get")
def test_api_returns_different_responses(mock_get):
    response_1 = MagicMock()
    response_1.json.return_value = {"stargazers_count": 100}
    response_1.raise_for_status.return_value = None

    response_2 = MagicMock()
    response_2.json.return_value = {"stargazers_count": 200}
    response_2.raise_for_status.return_value = None

    mock_get.side_effect = [response_1, response_2]

    assert get_repo_stars("owner", "repo1") == 100
    assert get_repo_stars("owner", "repo2") == 200

Output:

$ pytest test_error_handling.py -v
========================= test session starts =========================
collected 3 items

test_error_handling.py::test_api_timeout PASSED
test_error_handling.py::test_api_404 PASSED
test_error_handling.py::test_api_returns_different_responses PASSED

========================= 3 passed in 0.02s ==========================

When side_effect is an exception class or instance, the mock raises that exception when called. When it is a list, the mock returns each item in sequence on successive calls. This is incredibly powerful for testing retry logic, fallback behavior, and error recovery paths that would be nearly impossible to trigger with real API calls.
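For instance, a side_effect list can mix exceptions and return values to exercise a retry loop. This is a minimal sketch: fetch_with_retry is a hypothetical helper, and the built-in TimeoutError stands in for requests.exceptions.Timeout so the snippet runs without third-party packages.

```python
from unittest.mock import Mock

def fetch_with_retry(fetch, attempts=3):
    """Call fetch() until it succeeds or the attempts run out."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except TimeoutError as exc:
            last_error = exc
    raise last_error

# Two timeouts followed by a success: exceptions in the list are raised,
# any other item is returned.
flaky_fetch = Mock(
    side_effect=[TimeoutError("first"), TimeoutError("second"), {"stargazers_count": 100}]
)
result = fetch_with_retry(flaky_fetch)

# A mock that always times out exhausts the retries and re-raises.
dead_fetch = Mock(side_effect=TimeoutError("down"))
try:
    fetch_with_retry(dead_fetch, attempts=2)
    gave_up = False
except TimeoutError:
    gave_up = True

print(result, flaky_fetch.call_count)  # {'stargazers_count': 100} 3
print(gave_up, dead_fetch.call_count)  # True 2
```

Checking call_count alongside the result confirms that the retries actually happened, not only that the final value was correct.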

Verifying How Your Code Calls the API

Mocking is not just about controlling what comes back from the API — it also lets you verify exactly how your code called it. Did it use the right URL? Did it send the correct headers? Did it call the API the expected number of times? The mock object records every call for inspection.

# notification_service.py
import requests

def send_notification(user_email, message, urgent=False):
    """Send a notification via the company API."""
    payload = {
        "to": user_email,
        "body": message,
        "priority": "high" if urgent else "normal"
    }
    headers = {"Authorization": "Bearer fake-token-123"}
    response = requests.post(
        "https://api.notifications.internal/send",
        json=payload,
        headers=headers,
        timeout=5
    )
    response.raise_for_status()
    return response.json()

# test_notification_service.py
from unittest.mock import patch, MagicMock
from notification_service import send_notification

@patch("notification_service.requests.post")
def test_sends_correct_payload(mock_post):
    mock_response = MagicMock()
    mock_response.json.return_value = {"status": "sent", "id": "msg-123"}
    mock_response.raise_for_status.return_value = None
    mock_post.return_value = mock_response

    result = send_notification("alice@example.com", "Hello!", urgent=True)

    mock_post.assert_called_once_with(
        "https://api.notifications.internal/send",
        json={
            "to": "alice@example.com",
            "body": "Hello!",
            "priority": "high"
        },
        headers={"Authorization": "Bearer fake-token-123"},
        timeout=5
    )
    assert result["status"] == "sent"

@patch("notification_service.requests.post")
def test_normal_priority_by_default(mock_post):
    mock_response = MagicMock()
    mock_response.json.return_value = {"status": "sent"}
    mock_response.raise_for_status.return_value = None
    mock_post.return_value = mock_response

    send_notification("bob@example.com", "Update available")

    actual_call = mock_post.call_args
    assert actual_call.kwargs["json"]["priority"] == "normal"

Output:

$ pytest test_notification_service.py -v
========================= test session starts =========================
collected 2 items

test_notification_service.py::test_sends_correct_payload PASSED
test_notification_service.py::test_normal_priority_by_default PASSED

========================= 2 passed in 0.01s ==========================

The assert_called_once_with method checks both that the mock was called exactly once and that it received the exact arguments you specified. For more flexible inspection, call_args gives you the actual positional and keyword arguments from the most recent call. This is how you verify that your code is building the right request body, sending the correct headers, and using proper timeout values — all without any network traffic.
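When the code under test makes several calls, call_args_list and assert_has_calls extend the same idea. A minimal sketch, assuming a hypothetical notify_all helper that takes the sender as a parameter and a placeholder URL; both are for illustration only and are not part of notification_service.py.

```python
from unittest.mock import ANY, Mock, call

SEND_URL = "https://api.example.invalid/send"  # placeholder URL for the sketch

def notify_all(post, emails, message):
    """Send the same message to every address using the given post callable."""
    for email in emails:
        post(SEND_URL, json={"to": email, "body": message}, timeout=5)

mock_post = Mock()
notify_all(mock_post, ["alice@example.com", "bob@example.com"], "Hello!")

# Check the exact sequence of calls.
mock_post.assert_has_calls([
    call(SEND_URL, json={"to": "alice@example.com", "body": "Hello!"}, timeout=5),
    call(SEND_URL, json={"to": "bob@example.com", "body": "Hello!"}, timeout=5),
])

# ANY matches arguments you do not care about in a given assertion.
mock_post.assert_any_call(SEND_URL, json=ANY, timeout=5)

# call_args_list keeps every call, in order, for custom checks.
recipients = [c.kwargs["json"]["to"] for c in mock_post.call_args_list]
print(mock_post.call_count, recipients)  # 2 ['alice@example.com', 'bob@example.com']
```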

Cache Katie pointing at a clipboard of green checkmarks
Every assert_called_with() that passes is a tiny victory for determinism.

The responses Library: Mocking at the HTTP Level

While unittest.mock works at the Python object level (replacing requests.get itself), the responses library works at the HTTP level — it intercepts outgoing HTTP requests and returns configured responses. This is closer to how the real code works and requires less boilerplate for request-heavy tests.

# test_with_responses.py
import responses
import requests

def fetch_todos(user_id):
    """Fetch todos for a user from the API."""
    url = f"https://jsonplaceholder.typicode.com/todos?userId={user_id}"
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    todos = response.json()
    return [t for t in todos if not t["completed"]]

@responses.activate
def test_fetch_incomplete_todos():
    responses.add(
        responses.GET,
        "https://jsonplaceholder.typicode.com/todos",
        json=[
            {"id": 1, "userId": 1, "title": "Buy groceries", "completed": False},
            {"id": 2, "userId": 1, "title": "Walk the dog", "completed": True},
            {"id": 3, "userId": 1, "title": "Write tests", "completed": False},
        ],
        status=200
    )

    incomplete = fetch_todos(1)
    assert len(incomplete) == 2
    assert incomplete[0]["title"] == "Buy groceries"
    assert incomplete[1]["title"] == "Write tests"

@responses.activate
def test_fetch_todos_server_error():
    responses.add(
        responses.GET,
        "https://jsonplaceholder.typicode.com/todos",
        json={"error": "Internal Server Error"},
        status=500
    )

    try:
        fetch_todos(1)
        assert False, "Should have raised HTTPError"
    except requests.exceptions.HTTPError:
        pass

Output:

$ pytest test_with_responses.py -v
========================= test session starts =========================
collected 2 items

test_with_responses.py::test_fetch_incomplete_todos PASSED
test_with_responses.py::test_fetch_todos_server_error PASSED

========================= 2 passed in 0.03s ==========================

The @responses.activate decorator intercepts all HTTP requests made through the requests library during the test. You register expected responses with responses.add(), specifying the HTTP method, URL, response body, and status code. If your code tries to make a request to an unregistered URL, responses raises a ConnectionError, which catches accidental real API calls. Install it with pip install responses.

Pyro Pete building a colorful block wall between server towers
Building walls between your code and the real API since unittest.mock was cool.

Combining Mocks With pytest Fixtures

When multiple tests share the same mock setup, pytest fixtures eliminate the repetition. You can create fixtures that set up mocks and inject them into any test that needs them.

# test_with_fixtures.py
import pytest
from unittest.mock import patch, MagicMock
from github_client import get_repo_stars, is_popular

@pytest.fixture
def mock_github_api():
    """Fixture that patches requests.get for GitHub API tests."""
    with patch("github_client.requests.get") as mock_get:
        mock_response = MagicMock()
        mock_response.raise_for_status.return_value = None
        mock_get.return_value = mock_response
        yield {"mock_get": mock_get, "mock_response": mock_response}

def test_repo_with_many_stars(mock_github_api):
    mock_github_api["mock_response"].json.return_value = {"stargazers_count": 50000}
    assert get_repo_stars("big", "project") == 50000

def test_repo_with_few_stars(mock_github_api):
    mock_github_api["mock_response"].json.return_value = {"stargazers_count": 3}
    assert get_repo_stars("tiny", "project") == 3

def test_popular_repo(mock_github_api):
    mock_github_api["mock_response"].json.return_value = {"stargazers_count": 9999}
    assert is_popular("big", "project", threshold=5000) is True

def test_unpopular_repo(mock_github_api):
    mock_github_api["mock_response"].json.return_value = {"stargazers_count": 10}
    assert is_popular("tiny", "project", threshold=5000) is False

Output:

$ pytest test_with_fixtures.py -v
========================= test session starts =========================
collected 4 items

test_with_fixtures.py::test_repo_with_many_stars PASSED
test_with_fixtures.py::test_repo_with_few_stars PASSED
test_with_fixtures.py::test_popular_repo PASSED
test_with_fixtures.py::test_unpopular_repo PASSED

========================= 4 passed in 0.02s ==========================

The fixture uses yield inside the with patch() context manager, which means the mock is active while the test runs and automatically cleaned up afterward. Each test only needs to configure the specific return value it cares about — the common setup (patching, creating the mock response, wiring up raise_for_status) is handled once in the fixture. Put shared fixtures like this in conftest.py to make them available across multiple test files.

Real-Life Example: Testing a GitHub Profile Fetcher

Let us build a complete module that fetches GitHub user profiles and formats them for display, then write a comprehensive test suite covering happy paths, error handling, and edge cases.

# github_profile.py
import requests

class GitHubProfileError(Exception):
    """Custom exception for GitHub profile fetching errors."""
    pass

class GitHubProfile:
    API_BASE = "https://api.github.com"

    def __init__(self, username):
        self.username = username
        self._data = None

    def fetch(self):
        """Fetch the user profile from GitHub API."""
        try:
            response = requests.get(
                f"{self.API_BASE}/users/{self.username}",
                headers={"Accept": "application/vnd.github.v3+json"},
                timeout=10
            )
            response.raise_for_status()
            self._data = response.json()
        except requests.exceptions.Timeout:
            raise GitHubProfileError(f"Timeout fetching profile for {self.username}")
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 404:
                raise GitHubProfileError(f"User '{self.username}' not found")
            raise GitHubProfileError(f"API error: {e}")
        return self

    def summary(self):
        """Return a formatted summary string."""
        if not self._data:
            raise GitHubProfileError("Profile not fetched yet. Call fetch() first.")
        # GitHub returns null for unset fields, so fall back on None as well as on a missing key
        name = self._data.get("name") or self.username
        bio = self._data.get("bio") or "No bio available"
        repos = self._data.get("public_repos", 0)
        followers = self._data.get("followers", 0)
        return f"{name} | {repos} repos | {followers} followers | {bio}"

    @property
    def is_prolific(self):
        """Check if the user has more than 50 public repos."""
        if not self._data:
            return False
        return self._data.get("public_repos", 0) > 50

Now the test suite that exercises the full class:

# test_github_profile.py
import pytest
from unittest.mock import patch, MagicMock
import requests
from github_profile import GitHubProfile, GitHubProfileError

@pytest.fixture
def mock_api():
    with patch("github_profile.requests.get") as mock_get:
        mock_response = MagicMock()
        mock_response.raise_for_status.return_value = None
        mock_get.return_value = mock_response
        yield {"get": mock_get, "response": mock_response}

@pytest.fixture
def sample_profile_data():
    return {
        "login": "octocat",
        "name": "The Octocat",
        "bio": "GitHub mascot and occasional developer",
        "public_repos": 85,
        "followers": 12000
    }

# --- Fetch Tests ---

def test_fetch_sets_data(mock_api, sample_profile_data):
    mock_api["response"].json.return_value = sample_profile_data
    profile = GitHubProfile("octocat").fetch()
    assert profile._data == sample_profile_data

def test_fetch_uses_correct_url(mock_api, sample_profile_data):
    mock_api["response"].json.return_value = sample_profile_data
    GitHubProfile("torvalds").fetch()
    mock_api["get"].assert_called_once_with(
        "https://api.github.com/users/torvalds",
        headers={"Accept": "application/vnd.github.v3+json"},
        timeout=10
    )

def test_fetch_timeout(mock_api):
    mock_api["get"].side_effect = requests.exceptions.Timeout("timed out")
    with pytest.raises(GitHubProfileError, match="Timeout"):
        GitHubProfile("slowuser").fetch()

def test_fetch_user_not_found(mock_api):
    error_response = MagicMock()
    error_response.status_code = 404
    mock_api["response"].raise_for_status.side_effect = (
        requests.exceptions.HTTPError(response=error_response)
    )
    with pytest.raises(GitHubProfileError, match="not found"):
        GitHubProfile("ghost").fetch()

# --- Summary Tests ---

def test_summary_format(mock_api, sample_profile_data):
    mock_api["response"].json.return_value = sample_profile_data
    profile = GitHubProfile("octocat").fetch()
    result = profile.summary()
    assert "The Octocat" in result
    assert "85 repos" in result
    assert "12000 followers" in result

def test_summary_without_fetch():
    profile = GitHubProfile("someone")
    with pytest.raises(GitHubProfileError, match="not fetched"):
        profile.summary()

def test_summary_missing_bio(mock_api):
    mock_api["response"].json.return_value = {
        "login": "minimal", "name": "Min User",
        "public_repos": 1, "followers": 0
    }
    profile = GitHubProfile("minimal").fetch()
    assert "No bio available" in profile.summary()

# --- Property Tests ---

@pytest.mark.parametrize("repo_count, expected", [
    (100, True),
    (51, True),
    (50, False),
    (0, False),
])
def test_is_prolific(mock_api, repo_count, expected):
    mock_api["response"].json.return_value = {"public_repos": repo_count}
    profile = GitHubProfile("user").fetch()
    assert profile.is_prolific is expected

def test_is_prolific_without_fetch():
    profile = GitHubProfile("someone")
    assert profile.is_prolific is False

Output:

$ pytest test_github_profile.py -v
========================= test session starts =========================
collected 12 items

test_github_profile.py::test_fetch_sets_data PASSED
test_github_profile.py::test_fetch_uses_correct_url PASSED
test_github_profile.py::test_fetch_timeout PASSED
test_github_profile.py::test_fetch_user_not_found PASSED
test_github_profile.py::test_summary_format PASSED
test_github_profile.py::test_summary_without_fetch PASSED
test_github_profile.py::test_summary_missing_bio PASSED
test_github_profile.py::test_is_prolific[100-True] PASSED
test_github_profile.py::test_is_prolific[51-True] PASSED
test_github_profile.py::test_is_prolific[50-False] PASSED
test_github_profile.py::test_is_prolific[0-False] PASSED
test_github_profile.py::test_is_prolific_without_fetch PASSED

========================= 12 passed in 0.03s =========================

This test suite demonstrates all the mocking techniques from the article working together. The mock_api fixture handles common setup, side_effect simulates timeouts and HTTP errors, assert_called_once_with verifies the request details, and parametrize covers the boundary cases for is_prolific. Every test runs without internet access and completes in milliseconds.

Sudo Sam standing confidently before a wall of green shields
A full test suite with zero network calls. Your CI pipeline just shed a tear of joy.

Frequently Asked Questions

Should I use unittest.mock or the responses library?

Use unittest.mock when you need general-purpose mocking that works with any library or object, not just HTTP calls. Use responses when you are specifically testing code that uses the requests library and want a cleaner syntax for defining mock HTTP responses. For most projects, start with unittest.mock since it is built in and covers all use cases. Add responses when you have many request-heavy tests and the mock setup becomes repetitive.

Why does my patch not seem to work?

The most common reason is patching the wrong target. You must patch where the object is used, not where it is defined. If your module my_app.py does import requests and calls requests.get, you patch "my_app.requests.get", not "requests.get". If the module does from requests import get, you patch "my_app.get" instead. Check your import style and make sure the patch target matches it.

How do I mock async API calls with aiohttp or httpx?

For aiohttp, use the aioresponses library which works like responses but for async HTTP. For httpx, use respx. Both follow the same pattern: register expected URLs with mock responses, run your async code, and verify the calls. You can also use unittest.mock.AsyncMock (Python 3.8+) for general async mocking with patch.
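A minimal sketch of the AsyncMock route, using only the standard library. The client object and its fetch_json method are hypothetical stand-ins for whatever async HTTP client your code wraps.

```python
import asyncio
from unittest.mock import AsyncMock

async def get_username(client, user_id):
    """Await the client's JSON fetch and pull out the name field."""
    data = await client.fetch_json(f"/users/{user_id}")
    return data["name"]

# Attributes of an AsyncMock are themselves awaitable mocks.
client = AsyncMock()
client.fetch_json.return_value = {"id": 1, "name": "Leanne Graham"}

name = asyncio.run(get_username(client, 1))

# The awaited-variant assertions check that the coroutine was actually awaited.
client.fetch_json.assert_awaited_once_with("/users/1")

print(name)  # Leanne Graham
```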

Can I mock only some API calls and let others go through?

With unittest.mock, you can use side_effect with a function that conditionally returns a mock or calls the real implementation. With responses, call responses.add_passthru("https://allowed-api.com") to let requests whose URL starts with that prefix go through while mocking the others. However, mixing real and mocked calls in tests is generally a sign that you should split the test into separate unit and integration tests.
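The side_effect approach can be sketched as follows. real_get, selective_get, and both URLs are hypothetical; real_get stands in for the genuine client call so the example runs offline.

```python
from unittest.mock import Mock

def real_get(url):
    """Stand-in for the real HTTP call (hypothetical, no network involved)."""
    return f"real response from {url}"

def selective_get(url):
    """Return a canned response for one host and defer to the real call otherwise."""
    if url.startswith("https://mocked.example.invalid/"):
        return "canned response"
    return real_get(url)

# When side_effect is a function, the mock calls it with the same arguments
# and returns its result.
mock_get = Mock(side_effect=selective_get)

mocked = mock_get("https://mocked.example.invalid/users/1")
passed_through = mock_get("https://allowed.example.invalid/status")

print(mocked)          # canned response
print(passed_through)  # real response from https://allowed.example.invalid/status
```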

How many things should I mock in a single test?

Mock only the external boundaries — the things that cross your application’s edge (HTTP calls, database queries, file I/O, system clocks). Do not mock internal functions or classes within your own codebase unless you have a specific reason. Over-mocking makes tests brittle because they break whenever you refactor internal code, even if the external behavior stays the same. A good rule of thumb: if you are mocking more than two things in one test, the function under test might be doing too much and should be refactored.

Conclusion

We covered the complete toolkit for mocking API calls in Python: patching with decorators and context managers, configuring return values and side effects with MagicMock, verifying call arguments with assert_called_once_with, using the responses library for HTTP-level mocking, combining mocks with pytest fixtures, and simulating errors like timeouts and 404s. The GitHub profile project showed how all these techniques work together in a realistic codebase.

Try extending the GitHub profile tests as practice: add a method that fetches the user’s repositories, handle pagination, or add caching with a TTL. Each new feature gives you more opportunities to practice mocking different response shapes and error conditions.

For the complete unittest.mock documentation, visit the official Python docs at docs.python.org/3/library/unittest.mock. For the responses library, see its GitHub page at github.com/getsentry/responses.

How To Send Emails with Python Using smtplib and Gmail

Beginner

You have built a Python script that generates a report, scrapes a website, or monitors a server, and now you need it to tell you what happened. Maybe you want a daily summary email, an alert when something breaks, or a confirmation that a scheduled job finished successfully. Sending email programmatically is one of those skills every Python developer eventually needs, and the good news is that Python has everything you need built right in.

Python’s standard library includes smtplib for connecting to mail servers and email for building properly formatted messages. You do not need to install any third-party packages. All you need is a Gmail account with an App Password (we will walk through setting that up) and about 10 lines of code to send your first email.

You need both: email.message.EmailMessage builds a correctly formatted email (headers, body, attachments), and smtplib delivers it to the mail server.

Module | Purpose | Part of Standard Library?
smtplib | Connect to SMTP server, authenticate, send | Yes
email.message | Build email messages (headers, body, MIME) | Yes
email.mime | Legacy API for building MIME messages | Yes (use EmailMessage instead)
ssl | Secure socket layer for encrypted connections | Yes

The modern approach uses EmailMessage (introduced in Python 3.6) instead of the older MIMEText/MIMEMultipart classes. EmailMessage handles plain text, HTML, and attachments through a single clean API. We will use it throughout this tutorial.
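To get a feel for EmailMessage before any mail server is involved, here is a small sketch that builds a plain-text message and inspects it locally. The addresses and wording are placeholders; nothing is sent.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Nightly job finished"
msg["From"] = "sender@example.com"      # placeholder address
msg["To"] = "recipient@example.com"     # placeholder address
msg.set_content("The nightly job completed without errors.")

# Inspect the message before handing it to smtplib
print(msg["Subject"])
print(msg.get_content_type())
print(msg.get_content())
```

Printing the message this way is a handy check that headers and body look right before you connect to Gmail.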

Setting Up Gmail App Passwords

Gmail does not allow you to log in with your regular password from a script — it requires an App Password instead. An App Password is a 16-character code that gives your script access to your Gmail account without exposing your main password. Here is how to set one up.

First, you need to enable 2-Step Verification on your Google account if you have not already. Go to myaccount.google.com/security, scroll to “How you sign in to Google,” and turn on 2-Step Verification. Once that is active, go to myaccount.google.com/apppasswords, enter a name like “Python Script,” and click Create. Google will show you a 16-character password — copy it immediately because you will not see it again.

# secure_config.py
import os

# Store your App Password as an environment variable -- never hardcode it
# Set it in your terminal first:
#   export GMAIL_APP_PASSWORD="xxxx xxxx xxxx xxxx"
#   export GMAIL_ADDRESS="your_email@gmail.com"

gmail_address = os.environ.get("GMAIL_ADDRESS")
gmail_password = os.environ.get("GMAIL_APP_PASSWORD")

if not gmail_address or not gmail_password:
    raise ValueError("Set GMAIL_ADDRESS and GMAIL_APP_PASSWORD environment variables")

print(f"Configured for: {gmail_address}")
print(f"Password loaded: {'*' * len(gmail_password)}")

Output:

Configured for: your_email@gmail.com
Password loaded: ****************

Never hardcode your App Password directly in your Python files. Use environment variables or a .env file (with python-dotenv) to keep credentials out of your source code. If you accidentally commit a password to Git, revoke the App Password immediately from your Google account settings and create a new one.

SMTP_SSL vs STARTTLS: Choosing a Connection Method

There are two ways to establish a secure connection to an SMTP server: SMTP_SSL and SMTP with STARTTLS. Both encrypt your email traffic, but they work differently.

Method | Port | How It Works | When to Use
SMTP_SSL | 465 | Encrypted from the start | Preferred for Gmail
SMTP + starttls() | 587 | Starts unencrypted, upgrades | Some corporate servers

Here is how the STARTTLS approach looks in practice. The connection starts as plaintext on port 587, then upgrades to TLS before sending any sensitive data.

# starttls_example.py
import smtplib
import os
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "STARTTLS Test"
msg["From"] = os.environ["GMAIL_ADDRESS"]
msg["To"] = os.environ["GMAIL_ADDRESS"]  # Send to yourself for testing
msg.set_content("This email was sent using STARTTLS on port 587.")

with smtplib.SMTP("smtp.gmail.com", 587) as server:
    server.ehlo()       # Identify ourselves to the server
    server.starttls()   # Upgrade connection to TLS
    server.ehlo()       # Re-identify after TLS upgrade
    server.login(os.environ["GMAIL_ADDRESS"], os.environ["GMAIL_APP_PASSWORD"])
    server.send_message(msg)

print("Email sent via STARTTLS!")

Output:

Email sent via STARTTLS!

For Gmail, SMTP_SSL on port 465 is simpler and preferred — it encrypts the connection from the first byte. Use the STARTTLS approach only if your email provider specifically requires port 587, or if you need to connect to a server that does not support direct SSL.
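For comparison, the SMTP_SSL variant might look like the sketch below. The helper names send_via_ssl and build_test_message are made up for this example, and the explicit ssl.create_default_context() call is one reasonable way to set up certificate verification, not the only one. The actual send is left commented out so the snippet runs without credentials.

```python
import os
import smtplib
import ssl
from email.message import EmailMessage


def send_via_ssl(msg, host="smtp.gmail.com", port=465):
    """Send msg over a connection that is TLS-encrypted from the start."""
    context = ssl.create_default_context()  # verifies the server certificate
    with smtplib.SMTP_SSL(host, port, context=context, timeout=30) as server:
        server.login(os.environ["GMAIL_ADDRESS"], os.environ["GMAIL_APP_PASSWORD"])
        server.send_message(msg)


def build_test_message(address):
    """Build a small message addressed to yourself for testing."""
    msg = EmailMessage()
    msg["Subject"] = "SMTP_SSL Test"
    msg["From"] = address
    msg["To"] = address
    msg.set_content("This email was sent using SMTP_SSL on port 465.")
    return msg


msg = build_test_message("your_email@gmail.com")
print(msg["Subject"])

# Uncomment once GMAIL_ADDRESS and GMAIL_APP_PASSWORD are set:
# send_via_ssl(msg)
```

Compared with the STARTTLS version, there are no ehlo() or starttls() calls: the connection is encrypted before any SMTP command is exchanged.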

SMTP_SSL wraps the whole conversation in encryption. STARTTLS hopes nobody is listening to the handshake.

Sending HTML-Formatted Emails

Plain text emails get the job done, but HTML emails let you include formatting, links, tables, and images. The EmailMessage class makes it easy to send an email with both a plain-text fallback and an HTML version — email clients that support HTML will display the rich version, while older clients fall back to plain text.

# html_email.py
import smtplib
import os
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Weekly Python Report"
msg["From"] = os.environ["GMAIL_ADDRESS"]
msg["To"] = os.environ["GMAIL_ADDRESS"]

# Plain text version (fallback)
msg.set_content("Your weekly report: 42 scripts ran, 0 failures, 15.2s avg runtime.")

# HTML version (preferred by most email clients)
html_content = """\
<html>
<body>
<h2>Weekly Python Report</h2>
<p>Here is your automated summary for the week:</p>
<table>
<tr><th>Metric</th><th>Value</th></tr>
<tr><td>Scripts Executed</td><td>42</td></tr>
<tr><td>Failures</td><td>0</td></tr>
<tr><td>Avg Runtime</td><td>15.2 seconds</td></tr>
</table>
<p>All systems operational. Have a great week!</p>
</body>
</html>
"""
msg.add_alternative(html_content, subtype="html")

with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login(os.environ["GMAIL_ADDRESS"], os.environ["GMAIL_APP_PASSWORD"])
    server.send_message(msg)

print("HTML email sent!")

Output:

HTML email sent!

The key method here is msg.add_alternative(html_content, subtype="html"). This tells the email that it has two versions: the plain text you set with set_content() and the HTML alternative. Always provide both -- some email clients strip HTML entirely, and a plain-text fallback ensures your message is readable everywhere.
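If you want to see what add_alternative() does to the message structure, the following sketch builds a two-version message and lists its parts without sending anything. The short HTML string is just a placeholder.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Weekly Python Report"
msg.set_content("Plain-text fallback.")
msg.add_alternative("<p><b>HTML</b> version.</p>", subtype="html")

# The message is now a container holding both versions
print(msg.get_content_type())
for part in msg.iter_parts():
    print(" ", part.get_content_type())

# get_body() picks the part matching the preference list
html_part = msg.get_body(preferencelist=("html",))
print(html_part.get_content())
```

The plain-text part comes first and the HTML part last, which is the order email clients expect: they display the last version they are able to render.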

Adding File Attachments

Sending reports, logs, or CSV files as email attachments is a common automation task. The EmailMessage class handles this with add_attachment(), which encodes the file content correctly once you tell it the MIME type; mimetypes.guess_type() can work that type out from the file name.

# attachment_email.py
import smtplib
import os
import mimetypes
from email.message import EmailMessage
from pathlib import Path

def send_email_with_attachment(to_address, subject, body, file_path):
    """Send an email with a file attachment."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = os.environ["GMAIL_ADDRESS"]
    msg["To"] = to_address
    msg.set_content(body)

    # Read and attach the file
    filepath = Path(file_path)
    if not filepath.exists():
        raise FileNotFoundError(f"Attachment not found: {file_path}")

    mime_type, _ = mimetypes.guess_type(str(filepath))
    if mime_type is None:
        mime_type = "application/octet-stream"
    maintype, subtype = mime_type.split("/")

    with open(filepath, "rb") as f:
        msg.add_attachment(
            f.read(),
            maintype=maintype,
            subtype=subtype,
            filename=filepath.name
        )

    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(os.environ["GMAIL_ADDRESS"], os.environ["GMAIL_APP_PASSWORD"])
        server.send_message(msg)

    print(f"Email sent to {to_address} with attachment: {filepath.name}")


# Create a sample CSV file for testing
sample_csv = "name,score,grade\nAlice,95,A\nBob,87,B\nCharlie,72,C\n"
Path("report.csv").write_text(sample_csv)

# Send it
send_email_with_attachment(
    to_address=os.environ["GMAIL_ADDRESS"],
    subject="Monthly Report Attached",
    body="Please find the monthly report attached to this email.",
    file_path="report.csv"
)

Output:

Email sent to your_email@gmail.com with attachment: report.csv

The mimetypes.guess_type() function automatically detects the correct MIME type from the file extension -- text/csv for CSV files, application/pdf for PDFs, image/png for images, and so on. If the type cannot be determined, we fall back to application/octet-stream which tells the email client to treat it as a generic binary file. You can attach multiple files by calling add_attachment() multiple times on the same message.
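Here is a small sketch of the multiple-attachment case, including the fallback path. The attach_bytes helper, the file names, and the in-memory payloads are all invented for illustration; real code would read the bytes from disk as in the example above.

```python
import mimetypes
from email.message import EmailMessage


def attach_bytes(msg, data, filename):
    """Attach raw bytes, guessing the MIME type from the filename."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "application/octet-stream"
    maintype, subtype = mime_type.split("/", 1)
    msg.add_attachment(data, maintype=maintype, subtype=subtype, filename=filename)


msg = EmailMessage()
msg["Subject"] = "Two attachments"
msg.set_content("See the attached files.")

# In-memory placeholder payloads; real code would read files from disk
attach_bytes(msg, b"%PDF-1.4 placeholder", "summary.pdf")
attach_bytes(msg, b"\x00\x01\x02", "data.unknownext")

for part in msg.iter_attachments():
    print(part.get_filename(), part.get_content_type())
```

The second file has an extension mimetypes does not recognize, so it is attached as a generic binary file.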

add_attachment() handles the MIME type guessing. You just hand it the file and hope for the best.

Sending to Multiple Recipients

Often you need to send the same email to several people -- maybe a team notification or a batch of personalized messages. There are two approaches: sending one email to multiple recipients (everyone sees all addresses) or sending individual emails (each person sees only their address).

# multiple_recipients.py
import smtplib
import os
from email.message import EmailMessage

def send_to_group(recipients, subject, body):
    """Send one email to multiple recipients (all visible in To field)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = os.environ["GMAIL_ADDRESS"]
    msg["To"] = ", ".join(recipients)
    msg.set_content(body)

    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(os.environ["GMAIL_ADDRESS"], os.environ["GMAIL_APP_PASSWORD"])
        server.send_message(msg)

    print(f"Group email sent to {len(recipients)} recipients")


def send_individual(recipients, subject_template, body_template):
    """Send personalized emails to each recipient individually."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(os.environ["GMAIL_ADDRESS"], os.environ["GMAIL_APP_PASSWORD"])

        for name, email_addr in recipients:
            msg = EmailMessage()
            msg["Subject"] = subject_template.format(name=name)
            msg["From"] = os.environ["GMAIL_ADDRESS"]
            msg["To"] = email_addr
            msg.set_content(body_template.format(name=name))
            server.send_message(msg)
            print(f"Sent to {name} ({email_addr})")

    print(f"All {len(recipients)} individual emails sent!")


# Example: group email
team = ["alice@example.com", "bob@example.com", "charlie@example.com"]
send_to_group(team, "Team Update", "Sprint review meeting moved to 3 PM.")

# Example: personalized emails
contacts = [("Alice", "alice@example.com"), ("Bob", "bob@example.com")]
send_individual(
    contacts,
    subject_template="Hey {name}, your weekly summary",
    body_template="Hi {name},\n\nHere is your personalized weekly summary.\n\nBest regards"
)

Real-Life Example: Automated Error Alert System

Let us tie everything together with a practical project. This script monitors a log file for errors and sends an HTML alert email with a summary of any issues found. It combines plain text processing, HTML email formatting, and the EmailNotifier class pattern you can reuse in any project.

import smtplib
from datetime import datetime
from email.message import EmailMessage
from pathlib import Path


class EmailNotifier:
    """Reusable email notification system."""

    # Minimal sketch of the notifier; adapt host, port, and error handling as needed
    def __init__(self, address, password, host="smtp.gmail.com", port=465):
        self.address = address
        self.password = password
        self.host = host
        self.port = port

    def send(self, to_address, subject, body, html=None):
        """Send a plain-text email with an optional HTML alternative."""
        msg = EmailMessage()
        msg["Subject"] = subject
        msg["From"] = self.address
        msg["To"] = to_address
        msg.set_content(body)
        if html:
            msg.add_alternative(html, subtype="html")
        with smtplib.SMTP_SSL(self.host, self.port, timeout=30) as server:
            server.login(self.address, self.password)
            server.send_message(msg)


# Create a sample log file for demonstration
sample_log = """2026-03-31 08:00:01 INFO  Starting data pipeline
2026-03-31 08:00:05 INFO  Connected to database
2026-03-31 08:00:12 WARNING  Slow query detected (2.3s)
2026-03-31 08:00:15 ERROR  Failed to fetch API data: ConnectionTimeout
2026-03-31 08:00:16 ERROR  Retry 1 of 3 failed
2026-03-31 08:00:20 INFO  Retry 2 succeeded
2026-03-31 08:00:45 CRITICAL  Database connection lost
2026-03-31 08:00:46 INFO  Reconnecting to database...
2026-03-31 08:01:00 INFO  Pipeline completed with errors
"""
Path("app.log").write_text(sample_log)


def check_log_for_errors(log_path):
    """Scan a log file and return any lines containing ERROR or CRITICAL."""
    errors = []
    path = Path(log_path)
    if not path.exists():
        return errors

    with open(path, "r") as f:
        for line_num, line in enumerate(f, 1):
            stripped = line.strip()
            if "ERROR" in stripped or "CRITICAL" in stripped:
                errors.append({"line": line_num, "text": stripped})

    return errors


def build_alert_html(log_file, errors):
    """Build an HTML alert email from error entries."""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    rows = ""
    for err in errors:
        rows += f'<tr><td>{err["line"]}</td>'
        rows += f'<td>{err["text"]}</td></tr>\n'

    # Minimal markup; the reported length depends on the template you use
    return f"""\
<html>
<body>
<h2>Error Alert: {log_file}</h2>
<p>{len(errors)} issue(s) found at {timestamp}.</p>
<table>
<tr><th>Line</th><th>Message</th></tr>
{rows}</table>
</body>
</html>
"""


# Check the log and send an alert if errors are found
errors = check_log_for_errors("app.log")
if errors:
    print(f"Found {len(errors)} error(s) in app.log:")
    for err in errors:
        print(f"  Line {err['line']}: {err['text']}")

    html = build_alert_html("app.log", errors)
    print(f"\nHTML alert built ({len(html)} chars)")
    print("In production: notifier.send(admin_email, subject, body, html)")
else:
    print("No errors found. All clear!")

Output:

Found 3 error(s) in app.log:
  Line 4: 2026-03-31 08:00:15 ERROR  Failed to fetch API data: ConnectionTimeout
  Line 5: 2026-03-31 08:00:16 ERROR  Retry 1 of 3 failed
  Line 7: 2026-03-31 08:00:45 CRITICAL  Database connection lost

HTML alert built (698 chars)
In production: notifier.send(admin_email, subject, body, html)

This notification system is designed to be dropped into any existing project. The EmailNotifier class handles all the SMTP details, check_log_for_errors() scans for problems, and build_alert_html() creates a readable alert email. You could schedule this to run every hour with cron or the schedule library, and you would have a lightweight monitoring system without any third-party services.

SMTP isn't fancy. It just works. Sometimes that's enough.

Gmail Sending Limits and Best Practices

Gmail enforces rate limits on how many emails you can send per day. Knowing these limits prevents your script from getting temporarily blocked.

Account Type | Daily Limit | Per Minute | Max Recipients Per Email
Free Gmail | 500 emails | ~20 | 500
Google Workspace | 2,000 emails | ~30 | 2,000

If you hit these limits, Gmail returns an SMTPDataError with code 421 or 550. Your script should catch this and wait before retrying. For high-volume sending (marketing emails, large mailing lists), use a dedicated email service like SendGrid, Mailgun, or Amazon SES instead of Gmail -- they are designed for bulk sending and provide analytics, bounce handling, and higher limits.
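One way to implement the catch-and-wait advice is sketched below. The send_with_retry helper, its parameters, and the exponential backoff schedule are assumptions made for this example, and it retries on the two codes mentioned above. The send function is passed in as a callable so the retry logic can be exercised with a fake sender instead of a real SMTP connection.

```python
import smtplib
import time

RETRYABLE_CODES = {421, 550}  # codes named above as rate-limit signals


def send_with_retry(send_func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call send_func(); on a rate-limit rejection, wait and try again."""
    for attempt in range(1, retries + 1):
        try:
            send_func()
            return attempt  # number of attempts it took
        except smtplib.SMTPDataError as exc:
            # Give up on other codes, or once the retries are used up
            if exc.smtp_code not in RETRYABLE_CODES or attempt == retries:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...


# Demonstration with a fake sender that fails twice, then succeeds
calls = {"count": 0}


def flaky_send():
    calls["count"] += 1
    if calls["count"] < 3:
        raise smtplib.SMTPDataError(421, b"Try again later")


delays = []
attempts = send_with_retry(flaky_send, sleep=delays.append)
print(attempts, delays)
```

In a real script, send_func would be a small function or lambda that opens the SMTP_SSL connection and calls send_message(), and you would likely use much longer delays than the ones shown here.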

Frequently Asked Questions

How do I create a Gmail App Password?

Go to myaccount.google.com/apppasswords after enabling 2-Step Verification on your account. Enter a name like "Python Script" and click Create. Copy the 16-character password and use it in your server.login() call instead of your regular Gmail password. You can revoke it anytime from the same page.

Why does Gmail reject my login with SMTPAuthenticationError?

This almost always means you are using your regular Gmail password instead of an App Password. Google disabled "Less Secure App Access" permanently in 2022. You must use an App Password (see the section above) or switch to OAuth2 for more complex applications. Double-check that there are no extra spaces in your password string.

My script hangs when connecting to the SMTP server. What is wrong?

Add a timeout parameter to your SMTP connection: smtplib.SMTP_SSL("smtp.gmail.com", 465, timeout=30). This prevents the script from hanging forever if the server is unreachable. Common causes include corporate firewalls blocking port 465 or 587, VPN interference, or DNS resolution failures. Try pinging smtp.gmail.com from your terminal to verify connectivity.

How do I send emails with special characters or non-English text?

The EmailMessage class handles Unicode correctly by default. Just pass your text normally: msg.set_content("Bonjour! Voici votre rapport."). The message will be encoded as UTF-8 automatically. If you are using the older MIMEText API, explicitly set the charset: MIMEText(body, "plain", "utf-8").
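A quick local check of the Unicode handling might look like this. The addresses and text are placeholders, and nothing is sent.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Rapport hebdomadaire: résumé"
msg["From"] = "sender@example.com"   # placeholder address
msg["To"] = "recipient@example.com"  # placeholder address
msg.set_content("Bonjour! Voici votre rapport. Grüße aus München.")

# Headers and body round-trip without any manual encoding step
print(msg["Subject"])
print(msg.get_content_charset())
print(msg.get_content())
```

The accented characters survive in both the subject header and the body, and the charset is set for you.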

How can I test email sending without actually sending emails?

Python has a built-in debugging SMTP server that prints emails to the terminal instead of sending them. Run python -m smtpd -n -c DebuggingServer localhost:1025 in one terminal, then connect your script to localhost:1025 using smtplib.SMTP("localhost", 1025) (no SSL, no login). You will see the full email content printed to the terminal. For Python 3.12+, use aiosmtpd instead since the built-in smtpd module was removed.
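Another option for automated tests is to replace the SMTP class with a mock so no connection is made at all. The sketch below assumes a hypothetical send_report function and placeholder credentials; the patching itself uses only unittest.mock from the standard library.

```python
import smtplib
from email.message import EmailMessage
from unittest.mock import patch


def send_report(address, password, body):
    """Send a short report email to yourself over SMTP_SSL."""
    msg = EmailMessage()
    msg["Subject"] = "Report"
    msg["From"] = address
    msg["To"] = address
    msg.set_content(body)
    with smtplib.SMTP_SSL("smtp.gmail.com", 465, timeout=30) as server:
        server.login(address, password)
        server.send_message(msg)


# Replace SMTP_SSL with a mock so no network connection is made
with patch("smtplib.SMTP_SSL") as mock_smtp:
    send_report("me@example.com", "app-password", "All good.")
    # The object bound by "with ... as server" is the mock's __enter__ result
    server = mock_smtp.return_value.__enter__.return_value
    server.login.assert_called_once_with("me@example.com", "app-password")
    sent_msg = server.send_message.call_args.args[0]

print(sent_msg["Subject"])
```

This verifies that your code logs in and hands over the right message, while the debugging server approach is better when you want to eyeball the full email as it would go over the wire.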

Conclusion

You now have everything you need to send emails from Python scripts: plain text messages with set_content(), HTML-formatted emails with add_alternative(), file attachments with add_attachment(), and a robust error-handling wrapper with retry logic. The notification system project gives you a ready-to-use template for monitoring any automated task.

Try extending the notification system to watch multiple log files, send daily digest emails instead of per-error alerts, or add Slack webhook notifications alongside email. The EmailNotifier class is designed to be subclassed and customized for your specific needs.

For the complete API reference, see the official Python documentation for smtplib and email.message.

How To Use Type Hints in Python with Mypy

Intermediate

Python is known for its simplicity and readability, but this comes at a cost — it’s dynamically typed, which means you can assign any type of value to a variable at any time. While this flexibility is powerful, it can lead to subtle bugs that only appear at runtime. Imagine debugging a function that crashes because you accidentally passed a string where an integer was expected, only to discover the issue after hours of investigation. Type hints solve this problem by letting you specify what types your functions and variables should accept, catching errors before your code ever runs.

If you’re worried that adding type hints will make your Python code feel like Java or C++, rest assured: Python’s type hints are optional, unobtrusive, and not enforced at runtime. They exist purely for documentation, IDE support, and static analysis. Your code runs exactly the same with or without them, but with type hints, you unlock powerful tools like Mypy that can catch entire categories of bugs before deployment.

In this tutorial, you’ll learn everything you need to start using type hints effectively. We’ll cover the basics of annotating variables and functions, explore the typing module, understand how Mypy validates your code, and work through real-world examples that demonstrate the power of static type checking. By the end, you’ll understand why type hints are becoming standard practice in professional Python codebases.

Quick Example

Here’s a minimal example that shows type hints in action. Don’t worry if it looks unfamiliar — we’ll break down each component in detail:

# file: quick_example.py
def greet(name: str, age: int) -> str:
    return f"Hello {name}, you are {age} years old"

def add_numbers(a: int, b: int) -> int:
    return a + b

result = add_numbers(5, 10)
message = greet("Alice", 30)
print(message)
print(result)

Without type hints, Python wouldn’t catch it if you accidentally called `add_numbers("5", 10)` with a string instead of an integer. With type hints and Mypy, this error is caught before you run the code.
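To see the difference concretely, here is a small sketch of what happens at runtime when the wrong type slips through. Running `mypy` on a file containing the bad call should report an incompatible argument type for `add_numbers` without executing anything; the snippet below shows the alternative, where the mistake only appears once the line actually runs.

```python
def add_numbers(a: int, b: int) -> int:
    return a + b


try:
    add_numbers("5", 10)  # wrong argument type: str instead of int
except TypeError as exc:
    # Without a type checker, the mistake only surfaces when this line runs
    runtime_error = str(exc)
    print(f"Runtime failure: {runtime_error}")
```

In a small script that is easy to spot, but in a code path that rarely executes, a static check is often the first thing to catch it.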

Type hints catch the bugs your future self forgets to.

What Are Type Hints and Why Use Them?

Type hints are annotations that specify what types your functions, variables, and return values should be. They’re written using Python’s typing syntax and don’t affect how your code executes: the interpreter stores them but never enforces them at runtime. Their purpose is to help you, your team, and automated tools understand what types of data flow through your code.
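A short sketch makes the "not enforced at runtime" point tangible. The `double` function is a throwaway example: its annotations are recorded on the function object, yet a call with the wrong type still runs.

```python
def double(value: int) -> int:
    return value * 2


# Annotations are stored on the function object...
print(double.__annotations__)

# ...but nothing checks them when the function is called
print(double(21))
print(double("ab"))  # runs fine at runtime; a type checker would flag it
```

This is why a separate tool such as Mypy is needed: the annotations are only information until something reads and checks them.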

The primary benefits of type hints include catching bugs before runtime, improving code readability, enabling better IDE autocompletion, serving as documentation, and refactoring with confidence. When you annotate your code with types, tools like Mypy can analyze it statically and warn you about potential type mismatches without executing the code.

Let’s compare typed versus untyped code side by side:

Without Type Hints | With Type Hints
def process(data): | def process(data: list) -> int:
Unclear what types are expected | Clear intent and expectations
IDE can’t provide smart autocomplete | IDE knows what methods are available
Runtime errors when wrong types passed | Static analysis catches type errors
Hard to refactor safely | Mypy ensures refactoring doesn’t break contracts

Type hints pay off most in larger codebases. A function with type hints serves as a contract: callers know exactly what to pass, and the function author knows exactly what to expect. This reduces bugs, improves collaboration, and makes code easier to maintain.

Basic Type Hints: Built-in Types

Let’s start with the simplest type hints — annotations for built-in Python types. You can annotate variables when you create them, and annotate function parameters and return values.

# file: basic_types.py
# Simple variable annotations
name: str = "Alice"
age: int = 30
height: float = 5.8
is_active: bool = True

# Function with type hints
def calculate_age_in_days(age: int) -> int:
    return age * 365

def greet_user(name: str, age: int) -> str:
    days = calculate_age_in_days(age)
    return f"{name} is {days} days old"

# Using the functions
print(greet_user("Bob", 25))
print(calculate_age_in_days(40))

Output:

Bob is 9125 days old
14600

In the example above, the colon (:) separates the parameter name from its type. For return types, the arrow (->) comes after the parameter list. These annotations tell anyone reading the code, and tools like Mypy, exactly what types are expected. If you tried to call `calculate_age_in_days("thirty")`, Mypy would immediately flag it as an error.

The basic types you’ll use most often are `str`, `int`, `float`, and `bool`. But what if you need to work with collections like lists or dictionaries? That’s where things get interesting.

mypy is the friend who tells you about the bug before it ships.

Collections: Lists, Dicts, and Tuples

When you annotate a list, writing just `list` says nothing about what the list contains. To specify the item type, use the generic types from the `typing` module: `List`, `Dict`, and `Tuple` let you state what a collection holds.

# file: collections_example.py
from typing import List, Dict, Tuple

# List of integers
scores: List[int] = [95, 87, 92, 88]

# List of strings
names: List[str] = ["Alice", "Bob", "Charlie"]

# Dictionary with string keys and integer values
age_map: Dict[str, int] = {"Alice": 30, "Bob": 25, "Charlie": 28}

# Tuple with fixed types
location: Tuple[float, float] = (40.7128, -74.0060)

# Function that processes a list
def sum_scores(scores: List[int]) -> int:
    return sum(scores)

# Function that works with dictionaries
def get_age(person_name: str, ages: Dict[str, int]) -> int:
    return ages[person_name]

# Using the functions
total = sum_scores(scores)
alice_age = get_age("Alice", age_map)
print(f"Total scores: {total}")
print(f"Alice's age: {alice_age}")
print(f"Location: {location}")

Output:

Total scores: 362
Alice's age: 30
Location: (40.7128, -74.006)

The syntax `List[int]` means “a list containing integers”. Similarly, `Dict[str, int]` means “a dictionary with string keys and integer values”, and `Tuple[float, float]` means “a tuple containing exactly two floats”. This specificity is what makes type checking powerful — Mypy can now verify that you’re not accidentally passing a list of strings to a function expecting a list of integers.

Optional and Union Types

Sometimes a function might return either a value or `None`, or it might accept multiple different types. Python provides `Optional` and `Union` for these scenarios. `Optional[T]` is shorthand for “either a value of type T or None”, while `Union` lets you specify multiple possible types.

# file: optional_union.py
from typing import Optional, Union

# Function that might return None
def find_user_age(name: str, users: dict) -> Optional[int]:
    if name in users:
        return users[name]["age"]
    return None

# Function that accepts multiple types
def process_value(value: Union[int, str]) -> str:
    if isinstance(value, int):
        return f"Number: {value * 2}"
    else:
        return f"Text: {value.upper()}"

# Dictionary of user data
users_db = {
    "Alice": {"age": 30},
    "Bob": {"age": 25}
}

# Using Optional
age = find_user_age("Alice", users_db)
print(f"Alice's age: {age}")

missing_age = find_user_age("Charlie", users_db)
print(f"Charlie's age: {missing_age}")

# Using Union
result1 = process_value(10)
result2 = process_value("hello")
print(result1)
print(result2)

Output:

Alice's age: 30
Charlie's age: None
Number: 20
Text: HELLO

The `Optional` type is essential in Python because None is a valid value in many scenarios. By marking a return type as `Optional[int]`, you’re telling callers “this function might return an integer or None, and you should handle both cases”. This prevents a whole class of bugs where code forgets to check for None before using a value.
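Handling both cases usually means checking for None before using the value. The sketch below uses a simplified `find_user_age` that looks ages up in a flat dictionary, plus a hypothetical `describe_age` caller, to show the pattern.

```python
from typing import Dict, Optional


def find_user_age(name: str, ages: Dict[str, int]) -> Optional[int]:
    """Return the age for name, or None if the name is unknown."""
    return ages.get(name)


def describe_age(name: str, ages: Dict[str, int]) -> str:
    age = find_user_age(name, ages)
    if age is None:
        # Handle the missing case explicitly before using the value
        return f"{name}: age unknown"
    # After the None check, a type checker treats age as a plain int
    return f"{name} will be {age + 1} next year"


ages = {"Alice": 30, "Bob": 25}
print(describe_age("Alice", ages))
print(describe_age("Charlie", ages))
```

If you removed the `if age is None` branch, Mypy would complain about `age + 1`, because adding 1 to a possible None is not a valid operation.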

When you guess the type and Python disagrees at runtime.

The Typing Module: List, Dict, Tuple, and More

We briefly introduced `List`, `Dict`, and `Tuple` from the typing module. Let’s explore more capabilities and understand when to use them. In Python 3.9+, you can use the built-in `list`, `dict`, and `tuple` directly for type hints, but the `typing` module versions also work on Python 3.8 and earlier.

# file: typing_module.py
from typing import List, Dict, Set, Tuple, Any

# Specific typed collections
numbers: List[int] = [1, 2, 3, 4, 5]
name_ages: Dict[str, int] = {"Alice": 30, "Bob": 25}
unique_tags: Set[str] = {"python", "tutorial", "typing"}
coordinates: Tuple[int, int, int] = (10, 20, 30)

# Using Any when type is truly unknown (use sparingly!)
# This disables type checking for this variable
unknown_value: Any = "could be anything"

# Function with complex typing
def process_data(
    items: List[Dict[str, Any]],
    filters: Set[str]
) -> List[str]:
    results = []
    for item in items:
        if item.get("type") in filters:
            results.append(item.get("name", "Unknown"))
    return results

# Sample data
data = [
    {"name": "Alice", "type": "user"},
    {"name": "Bob", "type": "admin"},
    {"name": "Document", "type": "file"}
]

# Using the function
filtered = process_data(data, {"user", "admin"})
print(f"Filtered results: {filtered}")

Output:

Filtered results: ['Alice', 'Bob']

The `Any` type is a special case that essentially says “this can be any type” and disables type checking for that variable. Use `Any` sparingly — it defeats the purpose of type hints. It’s useful for truly dynamic situations or when working with third-party code you can’t control, but typed alternatives are almost always better.
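As mentioned at the start of this section, Python 3.9 and newer accept the built-in collection types in annotations. Here is a brief sketch of the same kinds of hints written without any `typing` imports; the `top_scores` function is a made-up example.

```python
# Requires Python 3.9 or newer
scores: list[int] = [95, 87, 92]
age_map: dict[str, int] = {"Alice": 30, "Bob": 25}
location: tuple[float, float] = (40.7128, -74.0060)


def top_scores(values: list[int], limit: int) -> list[int]:
    """Return the highest `limit` scores in descending order."""
    return sorted(values, reverse=True)[:limit]


print(top_scores(scores, 2))
```

Mypy treats `list[int]` and `List[int]` the same way, so which spelling you choose mostly depends on the oldest Python version you need to support.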

Function Annotations: Parameters and Returns

Function annotations are where type hints shine. By annotating parameters and return types, you create a contract that documents what a function expects and what it produces. This makes functions self-documenting and enables powerful static analysis.

# file: function_annotations.py
from typing import List, Optional

def calculate_average(scores: List[float]) -> float:
    """Calculate the average of a list of scores."""
    if not scores:
        return 0.0
    return sum(scores) / len(scores)

def find_maximum(numbers: List[int]) -> Optional[int]:
    """Return the maximum number, or None if list is empty."""
    return max(numbers) if numbers else None

def format_report(
    title: str,
    items: List[str],
    show_count: bool = True
) -> str:
    """Format items into a report string."""
    report = f"=== {title} ===\n"
    for item in items:
        report += f"- {item}\n"
    if show_count:
        report += f"Total: {len(items)}"
    return report

# Using the functions
test_scores = [85.5, 90.0, 78.5, 92.0]
average = calculate_average(test_scores)
print(f"Average score: {average}")

numbers = [45, 23, 89, 12, 56]
max_num = find_maximum(numbers)
print(f"Maximum number: {max_num}")

report = format_report("Tasks", ["Write code", "Review PR", "Deploy"], show_count=True)
print(report)

Output:

Average score: 86.5
Maximum number: 89
=== Tasks ===
- Write code
- Review PR
- Deploy
Total: 3

Notice that we’re using `Optional[int]` for functions that might return None, and `List[float]` for functions that accept collections. Default parameter values (like `show_count: bool = True`) work naturally with type hints — the type annotation comes before the equals sign.

Class Annotations and Instance Variables

Type hints work wonderfully with classes. You can annotate instance variables, method parameters, and return types. This makes class structure clear and helps catch errors when using class instances.

# file: class_annotations.py
from typing import List, Optional
from datetime import datetime

class Person:
    """Represents a person with type-annotated attributes."""

    # Class-level type annotations
    name: str
    age: int
    email: Optional[str]

    def __init__(self, name: str, age: int, email: Optional[str] = None) -> None:
        self.name = name
        self.age = age
        self.email = email

    def get_info(self) -> str:
        """Return a formatted string with person information."""
        return f"{self.name} ({self.age} years old)"

    def is_adult(self) -> bool:
        """Check if the person is an adult."""
        return self.age >= 18

class Team:
    """Represents a team of people."""

    name: str
    members: List[Person]

    def __init__(self, name: str) -> None:
        self.name = name
        self.members = []

    def add_member(self, person: Person) -> None:
        """Add a person to the team."""
        self.members.append(person)

    def get_adult_members(self) -> List[Person]:
        """Return only adult team members."""
        return [m for m in self.members if m.is_adult()]

    def member_count(self) -> int:
        """Return the number of team members."""
        return len(self.members)

# Using the classes
alice = Person("Alice", 30, "alice@example.com")
bob = Person("Bob", 17)
charlie = Person("Charlie", 25, "charlie@example.com")

team = Team("Development")
team.add_member(alice)
team.add_member(bob)
team.add_member(charlie)

print(f"Team: {team.name}")
print(f"Total members: {team.member_count()}")
print(f"Adults: {len(team.get_adult_members())}")
for member in team.members:
    print(f"  - {member.get_info()}")

Output:

Team: Development
Total members: 3
Adults: 2
  - Alice (30 years old)
  - Bob (17 years old)
  - Charlie (25 years old)

Class annotations make the structure of your objects immediately clear. Anyone reading this code knows exactly what attributes a `Person` has and what types they hold. The `__init__` method also has type hints showing what it expects and that it returns `None` (`__init__` only initializes an instance that already exists, so it must always return `None`).
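
One detail is easy to check for yourself: a bare class-level annotation such as `name: str` records the type but does not create an attribute. The sketch below uses a hypothetical `Point` class (not part of the example above) to show the difference between an annotation alone and an annotation with a value.

```python
class Point:
    """Hypothetical class used only to illustrate class-level annotations."""

    x: int          # annotation only: no class attribute is created
    y: int = 0      # annotation plus a value: a real class attribute

    def __init__(self, x: int) -> None:
        self.x = x  # the instance attribute is created here

# Annotations are recorded on the class, in declaration order
print(Point.__annotations__)   # {'x': <class 'int'>, 'y': <class 'int'>}

# Only 'y' exists on the class itself
print(hasattr(Point, "x"))     # False
print(hasattr(Point, "y"))     # True

p = Point(3)
print(p.x, p.y)                # 3 0
```

This is why the `Person` class still assigns every attribute inside `__init__`: the annotations document the shape, and the assignments create the data.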

Generics Basics: Writing Flexible Type-Safe Code

Generics allow you to write functions and classes that work with multiple types while maintaining type safety. Instead of using `Any`, you can use type variables to specify “this function works with lists of any type, but the type must be consistent”.

# file: generics_example.py
from typing import TypeVar, List, Generic

# Define a type variable -- T is a placeholder for any type
T = TypeVar('T')

# Generic function that works with any type
def get_first(items: List[T]) -> T:
    """Return the first item from a list."""
    if not items:
        raise ValueError("List is empty")
    return items[0]

def reverse_list(items: List[T]) -> List[T]:
    """Return a reversed copy of the list."""
    return items[::-1]

# Generic class
class Container(Generic[T]):
    """A simple container that holds one item of any type."""

    def __init__(self, item: T) -> None:
        self.item = item

    def get_item(self) -> T:
        return self.item

    def set_item(self, item: T) -> None:
        self.item = item

# Using generic functions
int_list = [10, 20, 30, 40]
str_list = ["apple", "banana", "cherry"]

first_int = get_first(int_list)  # Type checker knows this is int
first_str = get_first(str_list)  # Type checker knows this is str

print(f"First int: {first_int}")
print(f"First string: {first_str}")
print(f"Reversed ints: {reverse_list(int_list)}")

# Using generic class
int_container = Container(42)
str_container = Container("hello")

print(f"Int container: {int_container.get_item()}")
print(f"String container: {str_container.get_item()}")

Output:

First int: 10
First string: apple
Reversed ints: [40, 30, 20, 10]
Int container: 42
String container: hello

Generics are powerful because they preserve type information. When you call `get_first(int_list)`, type checkers understand that the return value is an `int`, not just some unknown `T`. This is much safer than using `Any` and provides excellent IDE support — your editor can offer correct autocompletion based on the actual type.
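
In the `Container` example the type checker infers `T` from the constructor argument. When there is no argument to infer from, you can state the type yourself. Here is a minimal sketch using a hypothetical `Stack` class (an illustration, not part of the code above):

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")

class Stack(Generic[T]):
    """Hypothetical LIFO stack used only for illustration."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

# No constructor argument to infer T from, so state it explicitly
numbers: Stack[int] = Stack()
numbers.push(1)
numbers.push(2)
print(numbers.pop())  # 2

# A type checker such as Mypy should reject the next line,
# although Python itself would run it without complaint:
# numbers.push("three")
```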

Installing and Running Mypy

Mypy is a static type checker for Python that analyzes your code without running it. Installation is straightforward using pip, and running it is even simpler. Let’s set up Mypy and check our type hints.

First, install Mypy using pip:

# file: terminal
pip install mypy

Once installed, you can check a single file or an entire directory. Create a test file with some intentional type errors to see how Mypy catches them:

# file: mypy_test.py
def add_numbers(a: int, b: int) -> int:
    return a + b

# This is correct
result1 = add_numbers(5, 10)
print(result1)

# This will cause a Mypy error
result2 = add_numbers("5", 10)
print(result2)

Now run Mypy on this file:

# file: terminal
mypy mypy_test.py

Mypy will output something like:

mypy_test.py:10: error: Argument 1 to "add_numbers" has incompatible type "str"; expected "int"

This error tells you exactly where the problem is: line 10 (counting the `# file:` comment as line 1), argument 1 of the `add_numbers` call. At runtime this call would fail with a `TypeError`, because Python cannot add a `str` and an `int`, but Mypy caught the mismatch before you ran the code. For larger projects, you can run mypy on an entire directory:

# file: terminal
mypy your_project/

You can also configure Mypy’s strictness using a `mypy.ini` file. A basic configuration might look like:

# file: mypy.ini
[mypy]
python_version = 3.9
warn_return_any = True
warn_unused_configs = True
disallow_untyped_defs = True

The `disallow_untyped_defs = True` option enforces that every function must have type hints. This is strict but catches a lot of bugs in larger codebases.

Common Mypy Errors and How to Fix Them

Let’s explore the most common type errors you’ll encounter with Mypy and how to fix them. Understanding these patterns will help you write type-safe code quickly.

Error: Incompatible Type Assignment

This is the most common error — you’re assigning a value of the wrong type to a variable:

# file: error_incompatible.py
# WRONG: str assigned to int variable
count: int = "five"

# CORRECT: assign actual integer
count: int = 5

# WRONG: list of strings assigned to list of ints
numbers: list[int] = ["1", "2", "3"]

# CORRECT: list of integers
numbers: list[int] = [1, 2, 3]

Fix: Make sure the value type matches the annotated type. Convert types if needed:

# file: fix_incompatible.py
# Convert before assigning
count: int = int("5")  # int() raises ValueError for non-numeric text, but the type is correct
numbers: list[int] = [int(x) for x in ["1", "2", "3"]]

Error: Missing Return Statement

When a function doesn’t explicitly return a value on all code paths, Mypy complains:

# file: error_missing_return.py
def check_age(age: int) -> str:
    if age >= 18:
        return "Adult"
    # Missing return statement -- what if age < 18?

# CORRECT:
def check_age(age: int) -> str:
    if age >= 18:
        return "Adult"
    else:
        return "Minor"

Fix: Ensure all code paths return a value, or change the return type to `Optional[str]` if None is acceptable:

# file: fix_missing_return.py
from typing import Optional

def check_age(age: int) -> Optional[str]:
    if age >= 18:
        return "Adult"
    return None  # Explicitly return None

Error: Argument Has Incompatible Type

You’re passing the wrong type to a function:

# file: error_argument.py
def process_list(items: list[int]) -> int:
    return sum(items)

# WRONG: passing list of strings
result = process_list(["1", "2", "3"])

# CORRECT: convert to integers first
result = process_list([1, 2, 3])

Fix: Convert the argument to the correct type or check your function call:

# file: fix_argument.py
def process_list(items: list[int]) -> int:
    return sum(items)

# Convert strings to ints
result = process_list([int(x) for x in ["1", "2", "3"]])
print(result)  # Output: 6

Error: Item Is None (Need Optional Check)

You’re accessing an attribute on something that might be None:

# file: error_none_access.py
from typing import Optional

def get_name(person: Optional[dict]) -> str:
    return person["name"]  # person could be None!

# CORRECT:
def get_name(person: Optional[dict]) -> Optional[str]:
    if person is not None:
        return person.get("name")
    return None

Fix: Always check for None before using Optional values:

# file: fix_none_access.py
from typing import Optional

def get_name(person: Optional[dict]) -> str:
    if person is None:
        return "Unknown"
    return person.get("name", "Unknown")

# Test it
result = get_name(None)
print(result)  # Output: Unknown
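
The same check also helps the type checker: after an `is None` test with an early return, the value is treated as the non-optional type for the rest of the function. A small sketch, assuming a hypothetical `find_email` lookup that is not part of the examples above:

```python
from typing import Dict, Optional

def find_email(directory: Dict[str, str], name: str) -> Optional[str]:
    """Hypothetical lookup: returns None when the name is unknown."""
    return directory.get(name)

def email_domain(directory: Dict[str, str], name: str) -> str:
    email = find_email(directory, name)
    if email is None:
        return "no email on file"
    # From here on, type checkers treat `email` as str, not Optional[str]
    return email.split("@")[1]

directory = {"alice": "alice@example.com"}
print(email_domain(directory, "alice"))  # example.com
print(email_domain(directory, "bob"))    # no email on file
```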

Error: Return Type Mismatch

Your function returns a different type than what’s annotated:

# file: error_return_mismatch.py
def calculate(value: int) -> str:
    # WRONG: returning int instead of str
    return value * 2

# CORRECT:
def calculate(value: int) -> str:
    return str(value * 2)

Fix: Ensure your return statement returns the correct type, or change the annotation:

# file: fix_return_mismatch.py
def calculate(value: int) -> str:
    result = value * 2
    return str(result)

print(calculate(5))  # Output: 10

Real-Life Example: Type-Safe Contact Manager

Let’s bring everything together with a practical example — a contact manager application with full type hints. This demonstrates how type hints make complex code safe and maintainable:

# file: contact_manager.py
from typing import List, Optional, Dict
from datetime import datetime

class Contact:
    """Represents a single contact with full type annotations."""

    name: str
    email: str
    phone: Optional[str]
    created_at: datetime

    def __init__(self, name: str, email: str, phone: Optional[str] = None) -> None:
        if not name or not email:
            raise ValueError("Name and email are required")
        self.name = name
        self.email = email
        self.phone = phone
        self.created_at = datetime.now()

    def get_display_name(self) -> str:
        """Return formatted contact name."""
        return self.name.upper()

    def has_phone(self) -> bool:
        """Check if contact has phone number."""
        return self.phone is not None

class ContactManager:
    """Manages a collection of contacts."""

    contacts: List[Contact]

    def __init__(self) -> None:
        self.contacts = []

    def add_contact(self, contact: Contact) -> None:
        """Add a new contact to the manager."""
        self.contacts.append(contact)

    def find_by_email(self, email: str) -> Optional[Contact]:
        """Find a contact by email address."""
        for contact in self.contacts:
            if contact.email == email:
                return contact
        return None

    def find_by_name(self, name: str) -> List[Contact]:
        """Find all contacts matching a name (partial match)."""
        return [c for c in self.contacts if name.lower() in c.name.lower()]

    def get_all_with_phone(self) -> List[Contact]:
        """Return contacts that have phone numbers."""
        return [c for c in self.contacts if c.has_phone()]

    def get_contact_summary(self) -> Dict[str, int]:
        """Return summary statistics about contacts."""
        return {
            "total": len(self.contacts),
            "with_phone": len(self.get_all_with_phone()),
            "without_phone": len(self.contacts) - len(self.get_all_with_phone())
        }

    def export_emails(self) -> List[str]:
        """Export all contact emails."""
        return [c.email for c in self.contacts]

# Using the contact manager
manager = ContactManager()

# Add contacts
alice = Contact("Alice Johnson", "alice@example.com", "+1-555-0101")
bob = Contact("Bob Smith", "bob@example.com")
charlie = Contact("Charlie Brown", "charlie@example.com", "+1-555-0103")

manager.add_contact(alice)
manager.add_contact(bob)
manager.add_contact(charlie)

# Query contacts
print(f"Total contacts: {len(manager.contacts)}")
print(f"Contacts with phones: {len(manager.get_all_with_phone())}")

found = manager.find_by_email("alice@example.com")
if found:
    print(f"Found: {found.name} ({found.phone})")

johns = manager.find_by_name("john")
print(f"Contacts named 'john': {len(johns)}")

summary = manager.get_contact_summary()
print(f"Summary: {summary}")

emails = manager.export_emails()
print(f"All emails: {emails}")

Output:

Total contacts: 3
Contacts with phones: 2
Found: Alice Johnson (+1-555-0101)
Contacts named 'john': 1
Summary: {'total': 3, 'with_phone': 2, 'without_phone': 1}
All emails: ['alice@example.com', 'bob@example.com', 'charlie@example.com']

This contact manager demonstrates several key principles: every method has clear type annotations, return types are explicit (including `Optional` and `List`), class attributes are type-annotated, and the code is self-documenting. If you run Mypy on this file, it will validate that every function returns the correct type and every variable receives compatible values. This gives you confidence that the code works as intended without having to manually trace through every function call.

Best Practices for Type Hints

Now that you understand the mechanics of type hints, here are some best practices to follow in your projects. First, be consistent: if you use type hints in one file, use them throughout your project. Second, use the most specific type possible; don’t settle for `Any` when `List[int]` would work. Third, add type hints to every public function; you can be more relaxed with private helper functions. Fourth, combine docstrings with type hints; while type hints show what types a function expects, docstrings explain what it does.

Fifth, use `Optional` only when None is truly an acceptable value. If a function should always return a string, don’t use `Optional[str]` just to be safe. Sixth, keep your types as simple as possible: deeply nested types like `Dict[str, List[Tuple[int, Optional[str]]]]` become hard to read. Consider breaking these into type aliases or separate functions. Finally, run a type checker such as Mypy in your CI/CD pipeline to catch type errors automatically before code is merged; a linter like pylint complements it with style and error checks but is not a type checker.
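
As a rough illustration of the type-alias advice, the nested type mentioned above can be split into named layers. The names `Score`, `ScoreBook`, and `best_score` are hypothetical and chosen only for this sketch:

```python
from typing import Dict, List, Optional, Tuple

# Aliases give each layer of the nested type a name
Score = Tuple[int, Optional[str]]      # (points, optional comment)
ScoreBook = Dict[str, List[Score]]     # student name -> list of scores

def best_score(book: ScoreBook, student: str) -> Optional[int]:
    """Return the highest points value for a student, or None if absent."""
    scores = book.get(student)
    if not scores:
        return None
    return max(points for points, _comment in scores)

book: ScoreBook = {"alice": [(90, None), (95, "retake")], "bob": []}
print(best_score(book, "alice"))  # 95
print(best_score(book, "bob"))    # None
print(best_score(book, "carol"))  # None
```

The signature `best_score(book: ScoreBook, ...)` reads more easily than the fully expanded type, and the alias can be reused wherever the same structure appears.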

Frequently Asked Questions

Do type hints affect performance or runtime behavior?

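No. Python does not enforce type hints at runtime, so a call that violates an annotation still runs. The hints are not discarded, though: the interpreter stores them in the `__annotations__` attribute of the function, class, or module so that tools can read them. The cost of doing so is negligible for ordinary code. Type hints exist mainly for documentation and for static analysis by tools like Mypy.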
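
You can confirm this behavior in a few lines. The `double` function below is a hypothetical example written for this answer:

```python
def double(value: int) -> int:
    return value * 2

# Python does not check the annotation at call time,
# so a str argument is accepted and simply repeated:
print(double("ab"))            # abab

# The hints are stored on the function for tools to read
print(double.__annotations__)  # {'value': <class 'int'>, 'return': <class 'int'>}
```

A static checker such as Mypy is what reports the `double("ab")` call as an error; the interpreter never will.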

Can I use type hints with older Python versions?

The `typing` module was introduced in Python 3.5, so any Python 3.5+ supports basic function type hints; annotations on variables and class attributes require Python 3.6+. Some newer features, like unions written with the pipe operator (`int | str`), require Python 3.10+. For maximum compatibility, use the `typing` module imports like `Union[int, str]`.
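
A short sketch of the compatible spelling, using two hypothetical helper functions. On Python 3.10+ the same annotations could be written as `int | str` and `str | None`:

```python
from typing import Optional, Union

def describe(value: Union[int, str]) -> str:
    # isinstance narrows the union for type checkers
    if isinstance(value, int):
        return f"number {value}"
    return f"text {value!r}"

def parse_port(raw: Optional[str]) -> int:
    # Optional[str] is shorthand for Union[str, None]
    return int(raw) if raw is not None else 8080

print(describe(3))        # number 3
print(describe("3"))      # text '3'
print(parse_port(None))   # 8080
print(parse_port("443"))  # 443
```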

What’s the difference between `List` from typing and built-in `list`?

In Python 3.9+, you can use built-in `list[int]` instead of `typing.List[int]`. They’re equivalent, but the built-in versions are preferred in newer code. The typing module versions work in older Python versions, so use those if you need to support Python 3.8 and earlier.
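
To make the comparison concrete, here is the same hypothetical word-count function written both ways. The second version assumes Python 3.9 or newer:

```python
from typing import Dict, List

def tally_typing(words: List[str]) -> Dict[str, int]:
    """typing-module spelling, for code that must run on Python 3.8 or earlier."""
    counts: Dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

def tally_builtin(words: list[str]) -> dict[str, int]:
    """Built-in spelling: requires Python 3.9+."""
    counts: dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

words = ["a", "b", "a"]
print(tally_typing(words))   # {'a': 2, 'b': 1}
print(tally_builtin(words))  # {'a': 2, 'b': 1}
```

Both functions behave identically at runtime; only the annotation syntax differs.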

How strict should I be with type hints?

Start with type hints on all public functions and class methods. As your codebase grows and you become comfortable with types, increase strictness. Mypy has a `disallow_untyped_defs` option that enforces types everywhere, but it’s strict and requires more discipline. Find a balance that works for your team.

Can I type hint dictionaries with multiple value types?

Yes, use `Dict[str, Union[int, str]]` to indicate a dictionary with string keys and values that can be either int or str. You can also use `Any` if values are truly unknown, but try to be more specific when possible.
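
When each key has its own fixed type, `typing.TypedDict` (available since Python 3.8) can describe the dictionary more precisely than a `Union` of value types. This goes a step beyond the answer above, and the `Settings` and `UserRecord` names are hypothetical:

```python
from typing import Dict, TypedDict, Union

# Homogeneous description: any key may hold an int or a str
Settings = Dict[str, Union[int, str]]

# Per-key description: each key has its own declared type
class UserRecord(TypedDict):
    name: str
    age: int

settings: Settings = {"retries": 3, "mode": "fast"}
user: UserRecord = {"name": "Alice", "age": 30}

# At runtime a TypedDict is an ordinary dict
print(type(user) is dict)  # True
print(user["age"] + 1)     # 31
```

With `Settings`, a type checker only knows that `settings["retries"]` is an int or a str. With `UserRecord`, it knows that `user["age"]` is an int.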

Should I use type hints in scripts and small projects?

Even small projects benefit from type hints, especially if you’ll return to them later or share them with others. Type hints serve as documentation and help you catch bugs. The investment in adding them pays off quickly.

Conclusion

Type hints are a powerful tool for writing safer, more maintainable Python code. They transform Python from a language where type errors hide until runtime into one where you catch them during development. Combined with Mypy, type hints let you refactor code with confidence, understand complex codebases faster, and collaborate more effectively with teammates.

The journey to type-safe Python starts simple with basic annotations and grows as your codebase becomes more complex. Begin by adding type hints to your public functions, run Mypy regularly, and gradually increase your type coverage. The investment in type hints pays dividends in code quality and developer productivity.

To learn more, check out the official Python typing module documentation and the Mypy documentation. Both resources provide comprehensive references and advanced patterns for type hints.

Related Python Tutorials

Continue learning with these related guides:

How To Write Unit Tests with pytest in Python

Skill Level: Beginner

Introduction

Writing unit tests is one of the most important practices in modern software development, yet many beginners skip it thinking it slows them down. The truth is the opposite — testing saves time by catching bugs early, making refactoring safer, and helping you understand your own code better. In this guide, you will learn how pytest makes testing so simple that you will actually enjoy writing tests.

If you are worried that testing is complex or requires special knowledge, put that fear to rest. pytest is designed to be intuitive and beginner-friendly. You will write test functions that look almost identical to regular Python functions, using plain assertions instead of cryptic methods. No need to memorize a dozen different assertion types or inherit from test base classes.

In this tutorial, we will start with a quick working example so you see testing in action immediately. Then we will explore what pytest is, install it, write various types of tests, and work through a complete real-world example. By the end, you will understand how to test your Python code effectively and confidently.

Quick Example

Let us jump straight into a working pytest test. This minimal example shows how simple testing can be:

# test_quick.py
def add(a, b):
    return a + b

def test_add_positive_numbers():
    assert add(2, 3) == 5

def test_add_negative_numbers():
    assert add(-1, -2) == -3

def test_add_mixed():
    assert add(5, -2) == 3

Output:

$ pytest test_quick.py
============================= test session starts ==============================
collected 3 items

test_quick.py::test_add_positive_numbers PASSED                        [ 33%]
test_quick.py::test_add_negative_numbers PASSED                        [ 66%]
test_quick.py::test_add_mixed PASSED                                   [100%]

============================== 3 passed in 0.02s ===============================

That is it! Three tests passing. Notice there is no special TestCase class to inherit from, no setUp methods, and no assertEquals calls. Just plain functions and simple assertions. This simplicity is what makes pytest so powerful.

What is pytest and Why Use It?

pytest is a testing framework that makes writing and running tests in Python delightfully simple. It is now the de facto standard for Python testing, used by companies like Mozilla, Stripe, and countless open-source projects. pytest shines because it reduces boilerplate, makes test discovery automatic, and provides powerful features like fixtures and parametrization built in.

Python comes with a built-in testing module called unittest, which is powerful but verbose. It requires you to create classes, inherit from TestCase, and use assertion methods like assertEqual. By contrast, pytest uses simple functions and the plain assert statement. Here is a comparison:

| Feature | unittest | pytest |
| --- | --- | --- |
| Test discovery | Requires naming convention | Automatic (test_* or Test*) |
| Assertions | assertEqual, assertTrue, etc. | Plain assert statement |
| Class requirement | Must inherit from TestCase | Simple functions |
| Setup/teardown | setUp/tearDown methods | Fixtures (more flexible) |
| Parametrization | Use subTest or external tools | @pytest.mark.parametrize |
| Learning curve | Moderate | Gentle |

For beginners, pytest removes friction. You write less boilerplate, learn fewer concepts, and get productive faster. For experienced developers, pytest provides industrial-strength capabilities through fixtures, parametrization, and its plugin ecosystem.

Installing pytest

Before we write any tests, we need to install pytest. Open your terminal and run:

# install_pytest.sh
pip install pytest

Output:

Collecting pytest
  Downloading pytest-7.4.0-py3-none-any.whl (298 kB)
Successfully installed pytest-7.4.0

Verify the installation:

# verify.sh
pytest --version

Output:

pytest 7.4.0

You are ready to start writing tests. If you are using a virtual environment (recommended), activate it before running pip install.

Writing Basic Tests

Test files in pytest must be named test_*.py or *_test.py so pytest can discover them automatically. A basic test is simply a function starting with test_ that uses assert statements:

# test_calculator.py
def multiply(a, b):
    return a * b

def test_multiply_basic():
    result = multiply(3, 4)
    assert result == 12

def test_multiply_by_zero():
    result = multiply(5, 0)
    assert result == 0

def test_multiply_negatives():
    result = multiply(-2, -3)
    assert result == 6

Output:

$ pytest test_calculator.py -v
======================== test session starts ==========================
collected 3 items

test_calculator.py::test_multiply_basic PASSED                  [ 33%]
test_calculator.py::test_multiply_by_zero PASSED                [ 66%]
test_calculator.py::test_multiply_negatives PASSED              [100%]

======================== 3 passed in 0.01s ===========================

The -v flag shows verbose output with each test listed individually. Each test here is independent: none relies on state left behind by another, so they would pass in any order.

Mastering Assertions

The assert statement is the heart of testing. Here are the most common patterns:

# test_assertions.py
def test_equality():
    assert 5 == 5
    assert "hello" == "hello"
    assert [1, 2, 3] == [1, 2, 3]

def test_truthiness():
    assert True
    assert not False
    assert [1, 2, 3]  # non-empty list is truthy
    assert not []  # empty list is falsy

def test_membership():
    assert 2 in [1, 2, 3]
    assert "key" in {"key": "value"}

def test_type_checking():
    assert isinstance(5, int)
    assert isinstance("hello", str)

Output:

$ pytest test_assertions.py -v
======================== test session starts ==========================
test_assertions.py::test_equality PASSED                        [ 25%]
test_assertions.py::test_truthiness PASSED                      [ 50%]
test_assertions.py::test_membership PASSED                      [ 75%]
test_assertions.py::test_type_checking PASSED                   [100%]

======================== 4 passed in 0.01s ===========================

When an assertion fails, pytest provides detailed error messages showing exactly what went wrong, including the values on both sides of the comparison.
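
One pattern the list above does not cover is comparing floating-point numbers, where plain equality can fail because of rounding. A minimal sketch using `pytest.approx`; the test name is hypothetical:

```python
import pytest

def test_float_sum():
    total = 0.1 + 0.2
    # Binary floating point cannot represent these values exactly
    assert total != 0.3
    # pytest.approx compares within a small tolerance instead
    assert total == pytest.approx(0.3)
```

Use `pytest.approx` whenever the expected value comes from floating-point arithmetic, such as prices, averages, or percentages.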

Using Fixtures

Fixtures are reusable pieces of test setup. Instead of repeating setup code in every test, define it once and inject it where needed. Think of fixtures as the pytest way of doing setup and teardown:

# test_database.py
import pytest

class Database:
    def __init__(self):
        self.connected = False
        self.data = {}

    def connect(self):
        self.connected = True

    def disconnect(self):
        self.connected = False

    def store(self, key, value):
        if not self.connected:
            raise RuntimeError("Not connected")
        self.data[key] = value

    def retrieve(self, key):
        if not self.connected:
            raise RuntimeError("Not connected")
        return self.data.get(key)

@pytest.fixture
def db():
    database = Database()
    database.connect()
    yield database  # code after yield runs as teardown
    database.disconnect()

def test_store_and_retrieve(db):
    db.store("name", "Alice")
    assert db.retrieve("name") == "Alice"

def test_store_overwrites(db):
    db.store("age", 25)
    db.store("age", 26)
    assert db.retrieve("age") == 26

def test_retrieve_nonexistent(db):
    assert db.retrieve("missing") is None

Output:

$ pytest test_database.py -v
======================== test session starts ==========================
test_database.py::test_store_and_retrieve PASSED               [ 33%]
test_database.py::test_store_overwrites PASSED                 [ 66%]
test_database.py::test_retrieve_nonexistent PASSED             [100%]

======================== 3 passed in 0.01s ===========================

The fixture uses yield instead of return. Code before yield runs before the test (setup), code after yield runs after (teardown). Each test gets a fresh database connection, so tests never interfere with each other.

Parametrized Tests

Parametrization runs the same test with different input values — DRY in action:

# test_parametrize.py
import pytest

def is_even(num):
    return num % 2 == 0

@pytest.mark.parametrize("number,expected", [
    (2, True),
    (4, True),
    (1, False),
    (3, False),
    (0, True),
    (-2, True),
])
def test_is_even(number, expected):
    assert is_even(number) == expected

Output:

$ pytest test_parametrize.py -v
======================== test session starts ==========================
test_parametrize.py::test_is_even[2-True] PASSED              [ 16%]
test_parametrize.py::test_is_even[4-True] PASSED              [ 33%]
test_parametrize.py::test_is_even[1-False] PASSED             [ 50%]
test_parametrize.py::test_is_even[3-False] PASSED             [ 66%]
test_parametrize.py::test_is_even[0-True] PASSED              [ 83%]
test_parametrize.py::test_is_even[-2-True] PASSED             [100%]

======================== 6 passed in 0.01s ===========================

pytest creates one test per parameter set and labels each one, making it easy to identify which specific input caused a failure.
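
If the automatic labels such as `[2-True]` are hard to read, you can name each case yourself with `pytest.param` and its `id` argument. A short sketch with hypothetical case names:

```python
import pytest

def is_even(num):
    return num % 2 == 0

@pytest.mark.parametrize(
    "number,expected",
    [
        pytest.param(0, True, id="zero"),
        pytest.param(-2, True, id="negative-even"),
        pytest.param(7, False, id="odd"),
    ],
)
def test_is_even_named(number, expected):
    assert is_even(number) == expected
```

With `pytest -v`, these cases are reported as `test_is_even_named[zero]`, `test_is_even_named[negative-even]`, and `test_is_even_named[odd]`.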

Testing Exceptions

Sometimes correct behavior means raising an exception. Use pytest.raises to verify exceptions:

# test_exceptions.py
import pytest

def validate_age(age):
    if not isinstance(age, int):
        raise TypeError("Age must be an integer")
    if age < 0:
        raise ValueError("Age cannot be negative")
    if age > 150:
        raise ValueError("Age must be realistic")
    return True

def test_valid_age():
    assert validate_age(25) is True

def test_negative_age():
    with pytest.raises(ValueError, match="cannot be negative"):
        validate_age(-5)

def test_invalid_type():
    with pytest.raises(TypeError, match="must be an integer"):
        validate_age("twenty-five")

Output:

$ pytest test_exceptions.py -v
======================== test session starts ==========================
test_exceptions.py::test_valid_age PASSED                      [ 33%]
test_exceptions.py::test_negative_age PASSED                   [ 66%]
test_exceptions.py::test_invalid_type PASSED                   [100%]

======================== 3 passed in 0.01s ===========================

The match parameter verifies the exception message using regex, ensuring not just the right type but the right message is raised.
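
When a regex match is not enough, you can capture the exception and inspect it directly. The sketch below uses a trimmed-down copy of `validate_age` so that it runs on its own:

```python
import pytest

def validate_age(age):
    # Trimmed-down version of the function above, for illustration
    if age < 0:
        raise ValueError("Age cannot be negative")
    return True

def test_negative_age_details():
    with pytest.raises(ValueError) as excinfo:
        validate_age(-5)
    # excinfo.value is the exception instance that was raised
    assert str(excinfo.value) == "Age cannot be negative"
    assert excinfo.type is ValueError
```

Note that the assertions on `excinfo` go after the `with` block, because the exception ends the block as soon as it is raised.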

Mocking Basics

Mocking replaces real dependencies with controlled fakes so you can test in isolation:

# test_mocking.py
from unittest.mock import Mock, patch

def fetch_user(user_id):
    import requests
    response = requests.get(f"https://jsonplaceholder.typicode.com/users/{user_id}")
    return response.json()

def test_fetch_user_with_mock():
    with patch("requests.get") as mock_get:
        mock_response = Mock()
        mock_response.json.return_value = {"id": 1, "name": "Alice"}
        mock_get.return_value = mock_response

        result = fetch_user(1)

        assert result["name"] == "Alice"
        mock_get.assert_called_once_with(
            "https://jsonplaceholder.typicode.com/users/1"
        )

Output:

$ pytest test_mocking.py -v
======================== test session starts ==========================
test_mocking.py::test_fetch_user_with_mock PASSED             [100%]

======================== 1 passed in 0.01s ===========================

The patch context manager replaces the real requests.get with a mock. The mock tracks how it was called and what it returns, letting you test API-dependent code without network requests.
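
Mocks can also simulate failures through `side_effect`, which is useful for testing error handling. The sketch below is one possible approach, not the only one: it assumes a hypothetical `get_username` function that receives its client as a parameter, so no patching is needed.

```python
from unittest.mock import Mock

def get_username(client, user_id):
    """Hypothetical function: `client` is any object with a get_user() method."""
    try:
        return client.get_user(user_id)["name"]
    except ConnectionError:
        return "unknown"

def test_get_username_success():
    client = Mock()
    client.get_user.return_value = {"id": 1, "name": "Alice"}
    assert get_username(client, 1) == "Alice"
    client.get_user.assert_called_once_with(1)

def test_get_username_connection_error():
    client = Mock()
    # side_effect makes the mock raise instead of returning a value
    client.get_user.side_effect = ConnectionError("network down")
    assert get_username(client, 1) == "unknown"
```

Passing the dependency in as an argument keeps the test independent of import paths, which is a common source of confusion with `patch`.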


Real-Life Example: Testing a Shopping Cart

Here is a complete shopping cart with comprehensive tests demonstrating fixtures, parametrization, and exception testing together:

# shopping_cart.py
class Product:
    def __init__(self, name, price):
        self.name = name
        self.price = price

class ShoppingCart:
    def __init__(self):
        self.items = []

    def add_item(self, product, quantity=1):
        if quantity <= 0:
            raise ValueError("Quantity must be positive")
        self.items.append({"product": product, "quantity": quantity})

    def remove_item(self, product_name):
        self.items = [i for i in self.items if i["product"].name != product_name]

    def get_total(self):
        return sum(i["product"].price * i["quantity"] for i in self.items)

    def apply_discount(self, percent):
        if percent < 0 or percent > 100:
            raise ValueError("Discount must be between 0 and 100")
        return self.get_total() * (1 - percent / 100)

    def is_empty(self):
        return len(self.items) == 0

# test_shopping_cart.py
import pytest
from shopping_cart import Product, ShoppingCart

@pytest.fixture
def cart():
    return ShoppingCart()

@pytest.fixture
def laptop():
    return Product("Laptop", 999.99)

@pytest.fixture
def mouse():
    return Product("Mouse", 29.99)

def test_cart_starts_empty(cart):
    assert cart.is_empty()
    assert cart.get_total() == 0

def test_add_single_item(cart, laptop):
    cart.add_item(laptop)
    assert not cart.is_empty()
    assert cart.get_total() == 999.99

def test_add_multiple_items(cart, laptop, mouse):
    cart.add_item(laptop)
    cart.add_item(mouse)
    assert cart.get_total() == pytest.approx(1029.98)  # approx avoids float rounding surprises

@pytest.mark.parametrize("quantity", [0, -1, -10])
def test_invalid_quantity(cart, laptop, quantity):
    with pytest.raises(ValueError, match="must be positive"):
        cart.add_item(laptop, quantity=quantity)

def test_remove_item(cart, laptop, mouse):
    cart.add_item(laptop)
    cart.add_item(mouse)
    cart.remove_item("Mouse")
    assert cart.get_total() == 999.99

@pytest.mark.parametrize("discount,expected", [
    (10, 900), (50, 500), (100, 0),
])
def test_apply_discount(cart, discount, expected):
    cart.add_item(Product("Item", 1000))
    assert cart.apply_discount(discount) == pytest.approx(expected)

Output:

$ pytest test_shopping_cart.py -v
======================== test session starts ==========================
test_shopping_cart.py::test_cart_starts_empty PASSED           [ 10%]
test_shopping_cart.py::test_add_single_item PASSED             [ 20%]
test_shopping_cart.py::test_add_multiple_items PASSED          [ 30%]
test_shopping_cart.py::test_invalid_quantity[0] PASSED         [ 40%]
test_shopping_cart.py::test_invalid_quantity[-1] PASSED        [ 50%]
test_shopping_cart.py::test_invalid_quantity[-10] PASSED       [ 60%]
test_shopping_cart.py::test_remove_item PASSED                 [ 70%]
test_shopping_cart.py::test_apply_discount[10-900] PASSED     [ 80%]
test_shopping_cart.py::test_apply_discount[50-500] PASSED     [ 90%]
test_shopping_cart.py::test_apply_discount[100-0] PASSED      [100%]

======================== 10 passed in 0.02s ===========================

This example ties together everything: fixtures for reusable setup, parametrization for multiple cases, and exception testing for error handling. Notice we test both happy paths and error paths.

Frequently Asked Questions

How do I run a single test file?

Use pytest followed by the filename: pytest test_calculator.py. To run a specific function: pytest test_calculator.py::test_multiply_basic.

How do I run tests matching a pattern?

Use the -k flag: pytest -k "multiply". This runs all tests with “multiply” in their name.

What does the -v flag do?

The -v (verbose) flag lists each test individually with its result. Use -vv for even more detail, such as full, untruncated assertion diffs.

Can I stop on the first failure?

Yes, use pytest -x. This stops as soon as one test fails, useful for quick feedback during development.

How do I see print output from my tests?

By default pytest captures print statements. Use pytest -s to show all output during test execution.

What is the difference between a fixture and a helper function?

Fixtures are managed by pytest and support setup/teardown via yield. Helper functions are regular Python functions. Use fixtures for shared setup, helpers for reusable test logic.
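The difference is easier to see side by side. The sketch below is illustrative only: make_user and user_file are hypothetical names, and tmp_path is pytest's built-in temporary-directory fixture.

```python
import pytest


def make_user(name, age=30):
    """Helper: a plain function that a test calls explicitly."""
    return {"name": name, "age": age}


@pytest.fixture
def user_file(tmp_path):
    """Fixture: pytest runs the setup, hands over the value, then runs the teardown."""
    path = tmp_path / "user.txt"
    path.write_text("Alice")
    yield path                      # the test body runs at this point
    path.unlink(missing_ok=True)    # teardown, runs after the test finishes


def test_helper_builds_user():
    # The helper is called like any other function
    assert make_user("Bob", age=41) == {"name": "Bob", "age": 41}


def test_fixture_provides_file(user_file):
    # The fixture is requested by naming it as a parameter
    assert user_file.read_text() == "Alice"
```

The helper takes arguments at the call site, while the fixture is injected by name and cleaned up by pytest.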

How do I test async functions?

Install pytest-asyncio (pip install pytest-asyncio), then mark tests with @pytest.mark.asyncio and use async def.
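A minimal sketch of what that looks like, assuming pytest-asyncio is installed. The fetch_greeting coroutine is a hypothetical stand-in for your own async code.

```python
import asyncio

import pytest


async def fetch_greeting(name: str) -> str:
    """Hypothetical coroutine standing in for real async I/O."""
    await asyncio.sleep(0)  # yield control to the event loop once
    return f"Hello, {name}"


@pytest.mark.asyncio
async def test_fetch_greeting():
    # pytest-asyncio runs this coroutine inside an event loop
    assert await fetch_greeting("Ada") == "Hello, Ada"
```

Depending on how pytest-asyncio's mode is configured in your project, the marker may be optional; check the plugin's documentation for the version you use.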

Conclusion

You now have a solid foundation in pytest. You understand how to write tests with assertions, use fixtures for setup and teardown, parametrize tests for multiple cases, verify exceptions are raised correctly, and mock external dependencies. More importantly, you understand that testing does not have to be complicated.

Start by writing tests for new code, then gradually add tests to existing code. For more advanced topics, visit the official pytest documentation at https://docs.pytest.org/.

Fixtures: The Killer Feature

Fixtures are pytest’s way of providing setup data to tests. Declare them once, request them as function parameters — pytest wires up dependencies automatically:

# File: conftest.py — fixtures available to all tests in this directory
import pytest

@pytest.fixture
def sample_user():
    return {"id": 1, "name": "Alice", "age": 30}

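@pytest.fixture
def db_session():
    # create_test_session() stands in for your project's own session factory
    session = create_test_session()
    yield session          # the test runs here
    session.rollback()     # teardown: undo anything the test wrote
    session.close()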

# File: test_users.py
def test_get_user(sample_user):
    assert sample_user["name"] == "Alice"

def test_save_user(db_session, sample_user):
    db_session.add(sample_user)
    db_session.flush()
    assert db_session.get_user(1) is not None

The yield pattern separates setup from teardown. Code before yield runs before the test; code after runs after — even if the test fails.
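To see why teardown still runs after a failure, it can help to model a yield fixture as a plain generator. The sketch below is a simplified mental model and not pytest's actual implementation; run_with_fixture and the other names are hypothetical.

```python
def session_fixture(log):
    log.append("setup")        # code before yield: runs before the test
    yield "session"
    log.append("teardown")     # code after yield: runs after the test


def run_with_fixture(fixture_gen, test_func):
    value = next(fixture_gen)          # advance to the yield (setup)
    try:
        test_func(value)               # run the test with the yielded value
    finally:
        next(fixture_gen, None)        # resume past the yield (teardown)


def failing_test(session):
    raise AssertionError("test failed")


log = []
try:
    run_with_fixture(session_fixture(log), failing_test)
except AssertionError:
    pass  # the failure is still reported, but teardown has already run

print(log)  # ['setup', 'teardown']
```

The try/finally around the test call is what guarantees the code after yield executes regardless of the test outcome.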

Fixture Scopes

By default fixtures rebuild for every test. For expensive resources (database connections, web drivers), set a wider scope:

import pytest
from sqlalchemy import create_engine, text

from myapp.models import Base  # assumed: your project's SQLAlchemy declarative base

@pytest.fixture(scope="session")
def engine():
    # Created once for the entire test run
    return create_engine("sqlite:///test.db")

@pytest.fixture(scope="module")
def schema(engine):
    # Created once per test file
    Base.metadata.create_all(engine)
    yield
    Base.metadata.drop_all(engine)

@pytest.fixture(scope="function")  # default: one per test
def fresh_user(engine, schema):
    with engine.connect() as conn:
        # Raw SQL strings must be wrapped in text() in SQLAlchemy 2.x
        result = conn.execute(text("INSERT INTO users ..."))
        yield {"id": result.lastrowid}

Scopes, from narrowest to widest: function (default), class, module, package, session. Pick the widest that’s still safe for test isolation.

Parametrization

Test the same logic with multiple inputs via @pytest.mark.parametrize:

import pytest
from math import factorial  # stand-in for the function under test

@pytest.mark.parametrize("n,expected", [
    (0, 1),
    (1, 1),
    (5, 120),
    (10, 3628800),
])
def test_factorial(n, expected):
    assert factorial(n) == expected

# Stacked parametrize decorators combine into a matrix
@pytest.mark.parametrize("payment_method", ["card", "paypal", "bank"])
@pytest.mark.parametrize("amount", [10, 100, 1000])
def test_checkout(payment_method, amount):
    # 9 tests total: 3 methods x 3 amounts
    process_payment(payment_method, amount)
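Stacking decorators produces the Cartesian product of the two parameter lists. The sketch below makes that explicit with itertools.product; the fee function and its rates are hypothetical and exist only for illustration.

```python
import itertools

import pytest

METHODS = ["card", "paypal", "bank"]
AMOUNTS = [10, 100, 1000]


def fee(payment_method, amount):
    """Hypothetical fee rule, used only for this illustration."""
    rate = {"card": 0.03, "paypal": 0.04, "bank": 0.0}[payment_method]
    return round(amount * rate, 2)


@pytest.mark.parametrize("payment_method", METHODS)
@pytest.mark.parametrize("amount", AMOUNTS)
def test_fee_is_never_negative(payment_method, amount):
    assert fee(payment_method, amount) >= 0


# The same set of combinations, spelled out by hand
combinations = list(itertools.product(METHODS, AMOUNTS))
print(len(combinations))  # 9
```

Keep an eye on the multiplication: adding a third stacked parameter with four values would turn 9 tests into 36.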

Marks: skip, xfail, slow

Marks tag tests for special handling or selective running. skip, skipif, and xfail are built in; slow is a custom mark:

import sys

import pytest

@pytest.mark.skip(reason="API endpoint not yet deployed")
def test_new_endpoint():
    ...

@pytest.mark.skipif(sys.version_info < (3, 11), reason="Requires 3.11+")
def test_new_feature():
    ...

@pytest.mark.xfail(reason="Known bug, fix in PR #234")
def test_buggy():
    assert broken_thing() == "expected"

@pytest.mark.slow
def test_full_integration():
    ...  # 30 seconds

# Run only fast tests by default:  pytest -m "not slow"
# Run only slow:                    pytest -m slow
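Custom marks such as slow should be registered; otherwise pytest emits an unknown-mark warning, or an error when --strict-markers is set. One way to register them, sketched here under the assumption that you keep a conftest.py at the root of your tests, is the pytest_configure hook:

```python
# conftest.py
def pytest_configure(config):
    """Register custom marks so pytest recognizes them."""
    config.addinivalue_line(
        "markers", "slow: tests that take several seconds"
    )
    config.addinivalue_line(
        "markers", "integration: tests that hit real external services"
    )
```

Registering marks in a configuration file works equally well; use whichever keeps your project's settings in one place.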

Mocking with monkeypatch and mocker

For tests that need to replace dependencies (external APIs, system calls, current time):

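# Assumes mymodule imports requests and exposes fetch_data() and signup_user()
from mymodule import fetch_data, signup_user

class FakeResp:
    # Minimal stand-in for a requests.Response (assumes fetch_data returns .json())
    def json(self):
        return {"status": "ok"}

def test_api_call(monkeypatch):
    monkeypatch.setenv("API_KEY", "test-key")
    monkeypatch.setattr("mymodule.requests.get", lambda url, **kwargs: FakeResp())
    assert fetch_data() == {"status": "ok"}

# With pytest-mock (more powerful)
def test_with_mocker(mocker):
    # Patch the name where mymodule looks it up
    mock_send = mocker.patch("mymodule.send_email")
    mock_send.return_value = True

    result = signup_user("alice@example.com")
    mock_send.assert_called_once_with("alice@example.com", "Welcome")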
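For a version you can run without any project code, here is a self-contained sketch that patches an environment variable and the system clock. build_auth_header and seconds_since are hypothetical functions written for this example.

```python
import os
import time

import pytest


def build_auth_header():
    """Hypothetical function that reads a secret from the environment."""
    return {"Authorization": f"Bearer {os.environ['API_KEY']}"}


def seconds_since(start):
    """Hypothetical function that depends on the current time."""
    return time.time() - start


def test_build_auth_header(monkeypatch):
    monkeypatch.setenv("API_KEY", "test-key")
    assert build_auth_header() == {"Authorization": "Bearer test-key"}


def test_seconds_since(monkeypatch):
    # Freeze the clock so the result is deterministic
    monkeypatch.setattr(time, "time", lambda: 105.0)
    assert seconds_since(100.0) == 5.0
```

monkeypatch undoes every change when the test ends, so the real environment and the real clock are restored for the next test.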

Test Discovery and Configuration

pytest auto-discovers tests in files named test_*.py or *_test.py, classes named Test* (without an __init__ method), and functions or methods named test_*. Configure discovery and default options in pyproject.toml:

# pyproject.toml
[tool.pytest.ini_options]
minversion = "7.0"
testpaths = ["tests"]
python_files = "test_*.py"
python_classes = "Test*"
python_functions = "test_*"
addopts = "-ra --strict-markers --cov=myapp"  # --cov requires the pytest-cov plugin
markers = [
    "slow: slow tests (deselect with -m 'not slow')",
    "integration: tests that hit real external services",
]

Common Pitfalls

  • Shared state between tests. Module-level globals that tests modify create order-dependent bugs. Use fixtures with function scope to reset state per test.
  • Fixture scope mismatch. A session-scoped fixture that mutates state breaks isolation. Mutable fixtures should be function-scoped.
  • Catching too-broad exceptions. Using pytest.raises(Exception) instead of pytest.raises(SpecificError) lets unrelated failures satisfy the test and swallows bugs.
  • Slow imports in conftest. pytest imports conftest.py before any test runs. Heavy imports there slow every pytest invocation.
  • Ignoring -ra output. The short summary at the end lists every test that failed, errored, was skipped, xfailed, or unexpectedly passed, next to the warnings summary. Read it; it's where flaky tests hide.
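The too-broad exception pitfall is easy to demonstrate. In the sketch below, parse_age is a hypothetical function; the first test pins down the exact error, while the second passes for the wrong reason.

```python
import pytest


def parse_age(value):
    """Hypothetical function under test."""
    age = int(value)
    if age < 0:
        raise ValueError("age must be non-negative")
    return age


def test_rejects_negative_age():
    # Specific exception type plus a message check
    with pytest.raises(ValueError, match="non-negative"):
        parse_age("-1")


def test_too_broad_hides_a_bug():
    # Passes, but only because int(None) raises TypeError.
    # The validation logic was never exercised.
    with pytest.raises(Exception):
        parse_age(None)
```

Tightening the second test to pytest.raises(ValueError) would make it fail and expose the unhandled None input.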

FAQ

Q: pytest or unittest?
A: pytest — better fixtures, parametrization, plugin ecosystem, simpler assertion syntax. unittest is fine for tiny projects or if you can't add dependencies.

Q: How do I run only failing tests?
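A: pytest --lf (last-failed) reruns only the tests that failed on the previous run. Use --ff (failed-first) instead when you want the whole suite to run with the previous failures first, which gives fast feedback while debugging.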

Q: How do I measure code coverage?
A: pip install pytest-cov then pytest --cov=myapp --cov-report=html. The HTML report tells you which lines weren't hit.

Q: How do I test async code?
A: pip install pytest-asyncio. Mark tests with @pytest.mark.asyncio and use async def. Fixtures can be async too.

Q: How do I parallelize tests?
A: pip install pytest-xdist then pytest -n auto uses all CPU cores. Works great for independent unit tests; less great for tests that share databases.

Wrapping Up

pytest's superpower is fixtures plus parametrization: together they remove almost all test boilerplate. Add pytest-cov for coverage, pytest-mock for mocking, and pytest-xdist for parallelism. The ecosystem is huge (over 1000 plugins), but those three usually cover the vast majority of testing needs. Master fixtures first; everything else flows from there.