PostgreSQL is one of the most powerful open-source relational databases, and Python developers interact with it constantly — whether building web APIs, running data pipelines, or managing application state. If you have ever needed to store structured data beyond what SQLite can handle, PostgreSQL is usually the next step.
The good news is that psycopg (version 3, the modern successor to the venerable psycopg2) makes connecting Python to PostgreSQL straightforward and safe. It supports parameterized queries out of the box, handles connection pooling, and works beautifully with async code. You can install it with a single pip install psycopg[binary] command and be running queries in minutes.
In this article, we will cover everything you need to connect Python to PostgreSQL. We will start with a quick example showing a basic connection and query, then explain the psycopg library and why it is the recommended adapter. From there, we will walk through CRUD operations (Create, Read, Update, Delete), parameterized queries for security, connection pooling for performance, error handling patterns, and finish with a complete real-life project that builds a task manager backed by PostgreSQL.
Connecting Python to PostgreSQL: Quick Example
Here is a minimal working example that connects to a PostgreSQL database, creates a table, inserts a row, and reads it back. This gives you the core pattern you will use in every PostgreSQL project.
# quick_example.py
import psycopg

# Connect to PostgreSQL (adjust these for your setup)
conn_string = "host=localhost dbname=testdb user=postgres password=postgres"

with psycopg.connect(conn_string) as conn:
    with conn.cursor() as cur:
        # Create a simple table
        cur.execute("""
            CREATE TABLE IF NOT EXISTS greetings (
                id SERIAL PRIMARY KEY,
                message TEXT NOT NULL
            )
        """)
        # Insert a row
        cur.execute("INSERT INTO greetings (message) VALUES (%s)", ("Hello from Python!",))
        conn.commit()
        # Read it back
        cur.execute("SELECT id, message FROM greetings ORDER BY id DESC LIMIT 1")
        row = cur.fetchone()
        print(f"ID: {row[0]}, Message: {row[1]}")
Output:
ID: 1, Message: Hello from Python!
The key things to notice: we use psycopg.connect() with a connection string, wrap everything in with blocks for automatic cleanup, and use %s placeholders for parameterized queries (never string formatting). The conn.commit() call makes the insert permanent. Want to go deeper? Below we cover connection options, all four CRUD operations, pooling, and a complete project.
One connection string. Infinite queries. Zero SQL injection.
What Is psycopg and Why Use It?
psycopg is the most popular PostgreSQL adapter for Python. Version 3 (just called psycopg) is a complete rewrite of the classic psycopg2 that powered Django, Flask, and countless Python applications for over a decade. The new version brings a cleaner API, native async support, and better type handling while keeping the reliability developers trusted.
Here is how psycopg compares to other options for connecting Python to PostgreSQL:
Feature | psycopg (v3) | psycopg2 | asyncpg
Python 3.7+ support | Yes | Yes | Yes
Async support | Built-in | No (needs wrappers) | Async only
Connection pooling | Separate psycopg_pool package | Built-in (psycopg2.pool) | Built-in
Parameterized queries | %s and named | %s and named | $1, $2 style
COPY support | Excellent | Good | Good
Active development | Yes (recommended) | Maintenance only | Yes
Django/Flask compatible | Yes | Yes | Limited
For most Python developers, psycopg (v3) is the right choice. It handles both sync and async workflows, has excellent documentation, and is the version the psycopg project recommends for new development. The rest of this article uses psycopg v3 exclusively.
Installing psycopg
The easiest way to install psycopg is with the binary package, which bundles the C library so you do not need PostgreSQL development headers installed:
pip install "psycopg[binary]"
If you prefer to compile from source (common in production Docker images), install the C extension with pip install "psycopg[c]" and make sure a C compiler and the libpq development headers (libpq-dev on Debian and Ubuntu) are available. For development and tutorials, the binary option is the fastest path.
Connecting to PostgreSQL
psycopg offers several ways to specify your connection. The most common patterns are a connection string (DSN) and keyword arguments. Both produce identical results — choose whichever reads better in your codebase.
# connection_methods.py
import psycopg

# Method 1: Connection string (DSN)
conn1 = psycopg.connect("host=localhost dbname=myapp user=appuser password=secret")

# Method 2: Keyword arguments
conn2 = psycopg.connect(
    host="localhost",
    dbname="myapp",
    user="appuser",
    password="secret",
    port=5432
)

# Method 3: PostgreSQL URI format
conn3 = psycopg.connect("postgresql://appuser:secret@localhost:5432/myapp")

# Always use context managers for automatic cleanup
with psycopg.connect("host=localhost dbname=myapp user=appuser password=secret") as conn:
    print(f"Connected to: {conn.info.dbname}")
    print(f"Server version: {conn.info.server_version}")

conn1.close()
conn2.close()
conn3.close()
Output:
Connected to: myapp
Server version: 160001
The context manager pattern (with psycopg.connect(...) as conn) is strongly recommended. It automatically commits the transaction on success, rolls back on exception, and closes the connection when the block exits. This prevents connection leaks and orphaned transactions — two of the most common PostgreSQL headaches in production.
conn = psycopg.connect() — three seconds to production-ready database access.
CRUD Operations with psycopg
CREATE: Inserting Data
Inserting data uses cursor.execute() with parameterized queries. Always use %s placeholders — never f-strings or string concatenation. Parameterized queries prevent SQL injection and handle type conversion automatically.
# insert_data.py
import psycopg

with psycopg.connect("host=localhost dbname=testdb user=postgres password=postgres") as conn:
    with conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS users (
                id SERIAL PRIMARY KEY,
                name TEXT NOT NULL,
                email TEXT UNIQUE NOT NULL,
                age INTEGER
            )
        """)
        # Insert a single row with parameterized query
        cur.execute(
            "INSERT INTO users (name, email, age) VALUES (%s, %s, %s)",
            ("Alice Chen", "alice@example.com", 28)
        )
        # Insert multiple rows efficiently with executemany
        new_users = [
            ("Bob Park", "bob@example.com", 34),
            ("Carol Smith", "carol@example.com", 22),
            ("Dave Wilson", "dave@example.com", 45),
        ]
        cur.executemany(
            "INSERT INTO users (name, email, age) VALUES (%s, %s, %s)",
            new_users
        )
        conn.commit()
        print(f"Inserted {1 + len(new_users)} users successfully")
Output:
Inserted 4 users successfully
The executemany() method is cleaner than looping with individual execute() calls, and psycopg optimizes it internally. For truly large batches (thousands of rows), look into cursor.copy() which uses PostgreSQL’s COPY protocol and is dramatically faster.
READ: Querying Data
Reading data involves executing a SELECT query and fetching results. psycopg gives you several fetch options depending on how much data you expect.
# read_data.py
import psycopg
from psycopg.rows import dict_row

with psycopg.connect("host=localhost dbname=testdb user=postgres password=postgres") as conn:
    with conn.cursor() as cur:
        # Fetch all rows
        cur.execute("SELECT id, name, email, age FROM users ORDER BY name")
        all_users = cur.fetchall()
        print("All users:")
        for user in all_users:
            print(f" {user[0]}: {user[1]} ({user[2]}), age {user[3]}")
        # Fetch one row
        cur.execute("SELECT name, age FROM users WHERE email = %s", ("alice@example.com",))
        alice = cur.fetchone()
        print(f"\nFound: {alice[0]}, age {alice[1]}")

    # Use a row factory for named columns (much more readable)
    with conn.cursor(row_factory=dict_row) as cur:
        cur.execute("SELECT name, email, age FROM users WHERE age > %s", (25,))
        older_users = cur.fetchall()
        print("\nUsers over 25:")
        for u in older_users:
            print(f" {u['name']}: {u['email']}, age {u['age']}")
Output:
All users:
1: Alice Chen (alice@example.com), age 28
2: Bob Park (bob@example.com), age 34
3: Carol Smith (carol@example.com), age 22
4: Dave Wilson (dave@example.com), age 45
Found: Alice Chen, age 28
Users over 25:
Alice Chen: alice@example.com, age 28
Bob Park: bob@example.com, age 34
Dave Wilson: dave@example.com, age 45
The dict_row row factory is a game-changer for readability. Instead of accessing columns by index (row[0], row[1]), you use names (row['name'], row['email']). This makes your code self-documenting and resilient to column order changes.
UPDATE: Modifying Data
Updates follow the same parameterized pattern. The rowcount attribute tells you how many rows were affected.
# update_data.py
import psycopg

with psycopg.connect("host=localhost dbname=testdb user=postgres password=postgres") as conn:
    with conn.cursor() as cur:
        # Update a single user
        cur.execute(
            "UPDATE users SET age = %s WHERE email = %s",
            (29, "alice@example.com")
        )
        print(f"Updated {cur.rowcount} row(s)")
        # Update multiple rows with a condition
        cur.execute(
            "UPDATE users SET age = age + 1 WHERE age < %s",
            (30,)
        )
        print(f"Birthday bump: {cur.rowcount} user(s) aged up")
        conn.commit()
Output:
Updated 1 row(s)
Birthday bump: 2 user(s) aged up
Always check cur.rowcount after updates and deletes. If it returns 0 when you expected changes, your WHERE clause might be wrong -- and catching that early saves hours of debugging.
DELETE: Removing Data
Deletes work the same way. Be cautious with DELETE statements -- a missing WHERE clause deletes everything in the table.
# delete_data.py
import psycopg

with psycopg.connect("host=localhost dbname=testdb user=postgres password=postgres") as conn:
    with conn.cursor() as cur:
        # Delete a specific user
        cur.execute(
            "DELETE FROM users WHERE email = %s",
            ("dave@example.com",)
        )
        print(f"Deleted {cur.rowcount} user(s)")
        # Verify the deletion
        cur.execute("SELECT COUNT(*) FROM users")
        count = cur.fetchone()[0]
        print(f"Remaining users: {count}")
        conn.commit()
Output:
Deleted 1 user(s)
Remaining users: 3
Four operations, infinite applications. CRUD is the backbone of every database app.
Error Handling
Database operations fail in predictable ways -- duplicate keys, connection drops, malformed queries. psycopg raises specific exception types for each, so you can handle them precisely.
# error_handling.py
import psycopg
from psycopg import errors

conn_string = "host=localhost dbname=testdb user=postgres password=postgres"

try:
    with psycopg.connect(conn_string) as conn:
        with conn.cursor() as cur:
            # This will fail if email already exists (UNIQUE constraint)
            cur.execute(
                "INSERT INTO users (name, email, age) VALUES (%s, %s, %s)",
                ("Alice Chen", "alice@example.com", 28)
            )
            conn.commit()
except errors.UniqueViolation as e:
    print(f"Duplicate entry: {e.diag.message_detail}")
except errors.OperationalError as e:
    print(f"Connection problem: {e}")
except errors.ProgrammingError as e:
    print(f"SQL error: {e}")
except Exception as e:
    print(f"Unexpected error: {type(e).__name__}: {e}")
The psycopg.errors module maps every PostgreSQL error code to a Python exception class. UniqueViolation, ForeignKeyViolation, CheckViolation -- they are all there. This lets you show users a friendly "email already taken" message instead of a raw database error.
Connection Pooling
Creating a new database connection for every request is slow: each connection involves a TCP handshake, authentication, and memory allocation on the server. Connection pooling solves this by maintaining a set of open connections that get reused across requests. In psycopg 3 the pool lives in the separate psycopg_pool package, which you can install with pip install "psycopg[pool]".
# connection_pool.py
from psycopg_pool import ConnectionPool

# Create a pool with min 2, max 10 connections
pool = ConnectionPool(
    "host=localhost dbname=testdb user=postgres password=postgres",
    min_size=2,
    max_size=10
)

# Use connections from the pool
with pool.connection() as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM users")
        count = cur.fetchone()[0]
        print(f"User count: {count}")

# The connection is returned to the pool, not closed
with pool.connection() as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT name FROM users LIMIT 1")
        name = cur.fetchone()[0]
        print(f"First user: {name}")

# Get pool stats
stats = pool.get_stats()
print(f"Pool size: {stats['pool_size']}, available: {stats['pool_available']}")

pool.close()
Output:
User count: 3
First user: Alice Chen
Pool size: 2, available: 2
In a web application (Flask, FastAPI, Django), you would create the pool once at startup and share it across all request handlers. This dramatically reduces latency since connections are reused instead of created fresh for every HTTP request. The max_size parameter prevents your application from overwhelming the database with too many simultaneous connections.
One pool, ten connections, a thousand requests. Connection pooling is free performance.
Working with Transactions
By default, psycopg wraps every operation in a transaction. The context manager commits on success and rolls back on failure. But sometimes you need more control -- for example, when multiple operations must succeed or fail together.
# transactions.py
import psycopg

conn_string = "host=localhost dbname=testdb user=postgres password=postgres"

with psycopg.connect(conn_string) as conn:
    # Explicit transaction control
    try:
        with conn.transaction():
            with conn.cursor() as cur:
                # Both operations must succeed
                cur.execute(
                    "UPDATE users SET age = age - 1 WHERE name = %s",
                    ("Alice Chen",)
                )
                cur.execute(
                    "UPDATE users SET age = age + 1 WHERE name = %s",
                    ("Bob Park",)
                )
        print("Both updates committed together")
    except Exception as e:
        print(f"Transaction rolled back: {e}")

    # Nested savepoints
    with conn.transaction() as tx1:
        with conn.cursor() as cur:
            cur.execute("INSERT INTO users (name, email, age) VALUES (%s, %s, %s)",
                        ("Eve Brown", "eve@example.com", 31))
            try:
                with conn.transaction() as tx2:
                    cur.execute("INSERT INTO users (name, email, age) VALUES (%s, %s, %s)",
                                ("Eve Brown", "eve-duplicate@example.com", 31))
                    # This inner transaction can fail without killing the outer one
            except Exception:
                print("Inner savepoint rolled back, outer transaction continues")
    conn.commit()
    print("Eve inserted successfully")
Output:
Both updates committed together
Eve inserted successfully
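The conn.transaction() context manager creates a savepoint when nested. In the run above both inserts succeed because the two email addresses differ, so the except branch never fires; had the inner insert violated the UNIQUE constraint, only the inner savepoint would have been rolled back and the first insert would still commit. This is useful for "try this, but if it fails, keep going" patterns, which are common in data import pipelines where you want to skip bad rows without losing the entire batch.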
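The skip-bad-rows pattern can be sketched without a PostgreSQL server. The example below is a stand-in, not psycopg code: it uses the standard library's sqlite3 module with explicit SAVEPOINT statements (and ? placeholders instead of %s) so it runs anywhere. The import_rows helper, the table layout, and the sample rows are assumptions made for illustration. With psycopg, the nested with conn.transaction() block plays the role of the explicit SAVEPOINT and ROLLBACK TO statements.

```python
import sqlite3

def import_rows(conn, rows):
    """Insert rows in one transaction, skipping any row that violates a constraint."""
    skipped = []
    conn.execute("BEGIN")
    for name, email in rows:
        conn.execute("SAVEPOINT one_row")
        try:
            conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))
        except sqlite3.IntegrityError:
            # Undo only this row; earlier rows in the batch are untouched.
            conn.execute("ROLLBACK TO SAVEPOINT one_row")
            skipped.append(email)
        finally:
            conn.execute("RELEASE SAVEPOINT one_row")
    conn.execute("COMMIT")
    return skipped

# isolation_level=None lets the code issue BEGIN/COMMIT by hand.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (name TEXT NOT NULL, email TEXT UNIQUE NOT NULL)")

rows = [
    ("Eve Brown", "eve@example.com"),
    ("Eve Again", "eve@example.com"),    # duplicate email, should be skipped
    ("Frank Lee", "frank@example.com"),
]
skipped = import_rows(conn, rows)
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(f"Skipped: {skipped}, rows stored: {remaining}")
```

The duplicate row is reported and skipped while the other two rows are committed together.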
Real-Life Example: Building a Task Manager CLI
Let us tie everything together with a complete task manager that stores tasks in PostgreSQL. This project uses connection pooling, parameterized queries, error handling, and all four CRUD operations.
A complete CRUD app with pooling and error handling. Not bad for 50 lines.
# task_manager.py
import psycopg
from psycopg_pool import ConnectionPool
from psycopg import errors
from datetime import datetime

DB_URL = "host=localhost dbname=testdb user=postgres password=postgres"


def setup_database(pool):
    """Create the tasks table if it does not exist."""
    with pool.connection() as conn:
        with conn.cursor() as cur:
            cur.execute("""
                CREATE TABLE IF NOT EXISTS tasks (
                    id SERIAL PRIMARY KEY,
                    title TEXT NOT NULL,
                    description TEXT DEFAULT '',
                    status TEXT DEFAULT 'pending',
                    created_at TIMESTAMP DEFAULT NOW(),
                    completed_at TIMESTAMP
                )
            """)
            conn.commit()


def add_task(pool, title, description=""):
    """Add a new task and return its ID."""
    with pool.connection() as conn:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO tasks (title, description) VALUES (%s, %s) RETURNING id",
                (title, description)
            )
            task_id = cur.fetchone()[0]
            conn.commit()
            return task_id


def list_tasks(pool, status_filter=None):
    """List tasks, optionally filtered by status."""
    with pool.connection() as conn:
        with conn.cursor(row_factory=psycopg.rows.dict_row) as cur:
            if status_filter:
                cur.execute(
                    "SELECT id, title, status, created_at FROM tasks WHERE status = %s ORDER BY created_at",
                    (status_filter,)
                )
            else:
                cur.execute("SELECT id, title, status, created_at FROM tasks ORDER BY created_at")
            return cur.fetchall()


def complete_task(pool, task_id):
    """Mark a task as completed."""
    with pool.connection() as conn:
        with conn.cursor() as cur:
            cur.execute(
                "UPDATE tasks SET status = %s, completed_at = %s WHERE id = %s",
                ("completed", datetime.now(), task_id)
            )
            conn.commit()
            return cur.rowcount > 0


def delete_task(pool, task_id):
    """Delete a task by ID."""
    with pool.connection() as conn:
        with conn.cursor() as cur:
            cur.execute("DELETE FROM tasks WHERE id = %s", (task_id,))
            conn.commit()
            return cur.rowcount > 0


# Demo usage
pool = ConnectionPool(DB_URL, min_size=2, max_size=5)
setup_database(pool)

# Add some tasks
id1 = add_task(pool, "Learn psycopg", "Complete the PostgreSQL tutorial")
id2 = add_task(pool, "Build REST API", "Create FastAPI endpoints for tasks")
id3 = add_task(pool, "Write tests", "Add pytest coverage for database layer")
print(f"Created tasks: {id1}, {id2}, {id3}")

# List all tasks
print("\nAll tasks:")
for task in list_tasks(pool):
    print(f" [{task['status']}] #{task['id']}: {task['title']}")

# Complete a task
complete_task(pool, id1)
print(f"\nCompleted task #{id1}")

# List pending tasks only
print("\nPending tasks:")
for task in list_tasks(pool, "pending"):
    print(f" #{task['id']}: {task['title']}")

# Delete a task
delete_task(pool, id3)
print(f"\nDeleted task #{id3}")

# Final count
print(f"\nTotal tasks remaining: {len(list_tasks(pool))}")

pool.close()
Output:
Created tasks: 1, 2, 3
All tasks:
[pending] #1: Learn psycopg
[pending] #2: Build REST API
[pending] #3: Write tests
Completed task #1
Pending tasks:
#2: Build REST API
#3: Write tests
Deleted task #3
Total tasks remaining: 2
This task manager demonstrates every concept from the article: connecting with a pool, parameterized queries for safety, dict_row for readable results, RETURNING clauses for getting generated IDs, and proper transaction handling. You could extend this into a full web application by wrapping these functions in FastAPI or Flask endpoints.
Frequently Asked Questions
Should I use psycopg2 or psycopg (v3)?
For new projects, use psycopg v3 (installed with pip install psycopg). It has built-in async support, an official connection pool package (psycopg_pool), a cleaner API, and is actively developed. psycopg2 is in maintenance mode -- it still works, but new features and improvements only land in v3. The migration is straightforward since the core concepts (parameterized queries, cursors, context managers) are the same.
How do I prevent SQL injection with psycopg?
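Always use parameterized queries with %s placeholders: cur.execute("SELECT * FROM users WHERE id = %s", (user_id,)). Never use f-strings, string concatenation, or format() to build SQL. psycopg handles quoting and type conversion automatically, so injection through values is prevented as long as you use placeholders consistently. Placeholders only stand in for values; for dynamic table or column names, compose the query with the psycopg.sql module (for example sql.Identifier) rather than string formatting.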
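To see why the placeholder matters, the sketch below builds the same query both ways without touching a database. The hostile input string and the users table name are illustrative assumptions; only the string handling is shown.

```python
# Hypothetical hostile input: a "name" that closes the quote and adds a condition.
user_input = "x' OR '1'='1"

# Unsafe: the input is pasted into the SQL text and becomes part of the statement.
unsafe_query = f"SELECT * FROM users WHERE name = '{user_input}'"
print(unsafe_query)

# Safe: the SQL text stays fixed and the value travels separately as a parameter.
safe_query = "SELECT * FROM users WHERE name = %s"
params = (user_input,)
print(safe_query, params)
```

The unsafe string now contains an extra OR condition that matches every row. Passing safe_query and params to cur.execute() keeps the value as data, so its quote characters never change the structure of the statement.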
Can I use psycopg with async/await?
Yes. psycopg v3 has a built-in async module: from psycopg import AsyncConnection. Use await AsyncConnection.connect() and await cursor.execute(). It works with asyncio, FastAPI, and any other async framework. The async connection pool is AsyncConnectionPool from psycopg_pool.
How many connections should my pool have?
A good starting point is min_size=2, max_size=10 for small applications. The PostgreSQL wiki suggests a rule of thumb for the number of active connections: connections = (core_count * 2) + effective_spindle_count, where core_count is the number of CPU cores on the database server. In practice, most web applications work well with 10-20 connections in the pool. Monitor your PostgreSQL pg_stat_activity view to see actual connection usage and tune from there.
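As a quick worked example of that rule of thumb, here is a small helper. The function name suggested_pool_size and the sample hardware figures are assumptions for illustration; treat the result as a starting point to tune, not a fixed answer.

```python
def suggested_pool_size(core_count, effective_spindle_count=1):
    """Starting-point estimate: (core_count * 2) + effective_spindle_count."""
    return core_count * 2 + effective_spindle_count

# Example: a database server with 4 cores and one SSD counted as a single spindle.
print(suggested_pool_size(4, 1))
```

The core count here refers to the database server, not the machine running your Python application.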
How do I store database credentials securely?
Never hardcode credentials in your source code. Use environment variables (os.environ['DATABASE_URL']), a .env file loaded with python-dotenv, or a secrets manager (AWS Secrets Manager, HashiCorp Vault). PostgreSQL also supports a ~/.pgpass file for local development. For connection strings, the standard DATABASE_URL environment variable works with most frameworks and deployment platforms.
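A minimal sketch of the environment-variable approach follows. The helper name get_database_url is an assumption, and the sample mapping stands in for the real environment so the snippet runs on its own; in an application you would call it with no arguments.

```python
import os

def get_database_url(env=None):
    """Read the connection string from DATABASE_URL, failing loudly if it is missing."""
    env = os.environ if env is None else env
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    return url

# A sample mapping is used here instead of the real environment.
sample_env = {"DATABASE_URL": "postgresql://appuser:secret@localhost:5432/myapp"}
print(get_database_url(sample_env))
```

The returned string can be passed directly to psycopg.connect() or to ConnectionPool, since both accept a PostgreSQL URI.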
Conclusion
You now have a solid foundation for connecting Python to PostgreSQL with psycopg. We covered the essential workflow: installing psycopg[binary], establishing connections with context managers, running all four CRUD operations with parameterized queries, handling database errors gracefully, and using connection pooling for production performance. The task manager project ties all these concepts into a practical, extensible application.
From here, try extending the task manager with features like priority levels, due dates, or full-text search using PostgreSQL's tsvector type. Psycopg handles all of these naturally since it passes your SQL through to PostgreSQL without limiting which features you can use.
For the complete API reference and advanced topics like COPY operations, async usage, and custom type adapters, check the official psycopg documentation at www.psycopg.org/psycopg3/docs/.
Imagine you’ve written a Python script that processes data files. Right now, you have the filename hard-coded inside your script. Tomorrow, you need to process a different file. Today, you run it manually every morning and copy-paste results into a spreadsheet. What if your script could accept the filename, output format, and processing options directly from the terminal? Command line arguments turn a rigid script into a flexible tool that integrates seamlessly into automation pipelines, cron jobs, and CI/CD systems.
Good news: Python makes this straightforward. You already have everything you need in the standard library. The sys module gives you raw access to command line arguments via sys.argv, and for more complex tools, the argparse module handles parsing, validation, and automatic help text generation. Both are built-in — no external packages required.
In this article, you’ll learn how to capture and use command line arguments in your Python scripts. We’ll start with sys.argv for simple cases, explore why Python doesn’t have argc, then dive deep into argparse for professional-grade CLI tools. You’ll see how to add required and optional arguments, set defaults, enforce type conversion, create mutually exclusive groups, and build subcommands like git commit and git push. By the end, you’ll build a complete file-processing CLI tool and understand when to reach for third-party libraries like click and typer.
Command Line Arguments in Python: Quick Example
Here’s the fastest way to access command line arguments in Python:
# quick_example.py
import sys

# sys.argv is a list of strings
# sys.argv[0] is the script name
# sys.argv[1:] are the arguments passed to the script
if len(sys.argv) < 2:
    print("Usage: python quick_example.py <name>")
    sys.exit(1)

name = sys.argv[1]
print(f"Hello, {name}!")
When you run a Python script from the terminal with arguments, those arguments end up in a list called sys.argv. The first element (index 0) is always the name of your script. The rest are whatever you typed after the script name. This is the foundation for all command line input in Python.
For simple scripts with one or two arguments, this is perfectly fine. But for tools with multiple arguments, flags, and options, you’ll want argparse. Let’s explore what’s actually happening under the hood first.
sys.argv[0] is your script name. sys.argv[1:] is everything else.
What Are Command Line Arguments and Why Use Them?
Command line arguments are values you pass to a program when you run it from the terminal. They’re the text that appears after your program name. For example:
$ python process_data.py input.csv output.json --verbose --format=json
In this case, process_data.py is the script name, input.csv and output.json are positional arguments, and --verbose and --format=json are optional flag arguments.
Command line arguments are essential because they let users control your script without editing code. They make your script reusable, testable, and compatible with automation tools. A script that only processes one hard-coded file is a toy. A script that accepts a file path as an argument is a real tool that others can use in pipelines and cron jobs.
Three types of command line arguments exist:
Type | Example | Purpose
Positional | python script.py input.txt | Required values passed in order (like function arguments)
Optional flags | python script.py --verbose | Boolean switches or named options (prefixed with -- or -)
Subcommands | git commit -m "msg" | Different commands with their own arguments (like git push vs git pull)
Python’s sys.argv gives you raw access to everything. The argparse module wraps that complexity and handles validation, type conversion, help text, and error messages for you.
Understanding sys.argv: The Foundation
sys.argv is a simple list. When Python runs your script, it automatically populates this list with everything typed on the command line. Let’s see what’s actually in it:
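The listing for inspect_argv.py is not shown here, so the following is a reconstruction that would produce output of the shape that follows. The describe_argv helper is an assumption introduced so the logic can be exercised with any argument list.

```python
# inspect_argv.py (reconstructed sketch)
import sys

def describe_argv(argv):
    """Return the report lines for a given argument list."""
    return [
        "sys.argv contents:",
        str(argv),
        f"Script name (argv[0]): {argv[0]}",
        f"All arguments (argv[1:]): {argv[1:]}",
        f"Number of arguments: {len(argv) - 1}",
    ]

# Report on whatever was actually typed on the command line.
print("\n".join(describe_argv(sys.argv)))
```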
$ python inspect_argv.py hello world 42
sys.argv contents:
['inspect_argv.py', 'hello', 'world', '42']
Script name (argv[0]): inspect_argv.py
All arguments (argv[1:]): ['hello', 'world', '42']
Number of arguments: 3
This reveals something important: every element in sys.argv is a string. Even though you typed 42, it’s stored as the string '42'. If you need an integer, you must convert it yourself using int(). This is why argparse exists — it handles type conversion automatically.
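Here is a small sketch of the manual conversion. The parse_count helper and its choice to return None for missing or invalid input are assumptions; a real script might print a usage message and exit instead.

```python
def parse_count(argv):
    """Convert the first argument to an int, or return None if it is missing or invalid."""
    if len(argv) < 2:
        return None
    try:
        return int(argv[1])
    except ValueError:
        return None

# A sample argument list stands in for sys.argv here.
count = parse_count(["script.py", "42"])
print(count + 1)  # arithmetic works because count is now an int, not the string '42'
```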
Why Python Doesn’t Have argc (And What To Use Instead)
You might know that C programs receive both argc (argument count) and argv (argument values). Python doesn’t have argc because sys.argv is a list, and lists have a built-in length. To get the argument count in Python, you simply use len(sys.argv). Note that this count includes the script name at index 0, just as argc does in C.
Here’s the comparison:
Language | Get argument count | Get argument value
C | argc | argv[0], argv[1], …
JavaScript (Node.js) | process.argv.length | process.argv[2] (index 2, since indexes 0 and 1 hold the node executable and the script path)
Python | len(sys.argv) | sys.argv[0], sys.argv[1], …
Since Python gives you a list directly, you get the count for free. This is more Pythonic — simpler, fewer moving parts.
Parsing sys.argv Manually for Simple Scripts
For a script with just one or two arguments, manual parsing is often clearer than adding argparse:
# backup.py
import sys
import shutil
from pathlib import Path

if len(sys.argv) < 2:
    print("Usage: python backup.py <source> [destination]")
    sys.exit(1)

source = sys.argv[1]
# Default destination is the source name with ".backup" appended
destination = sys.argv[2] if len(sys.argv) > 2 else f"{source}.backup"

source_path = Path(source)
if not source_path.exists():
    print(f"Error: {source} does not exist")
    sys.exit(1)

shutil.copy(source_path, destination)
print(f"Backed up {source} to {destination}")
Output:
$ python backup.py config.json
Backed up config.json to config.json.backup
$ python backup.py config.json config_v2.json
Backed up config.json to config_v2.json
$ python backup.py nonexistent.json
Error: nonexistent.json does not exist
This pattern works: check the length of sys.argv, extract arguments by index, validate them, and exit with an error code if anything is wrong. The downside is that you’re building your own help text, validation, and error messages. When your script grows to five or more arguments, argparse becomes worth the overhead.
Manual argv parsing scales to about three arguments. After that, argparse saves your sanity.
Introducing argparse: The Standard Solution
The argparse module is Python’s built-in tool for building professional command line interfaces. It handles parsing, validation, type conversion, help text generation, and error messages. Here’s a minimal example:
# greet.py
import argparse

parser = argparse.ArgumentParser(description="Greet someone by name")
parser.add_argument("name", help="The name to greet")
parser.add_argument("--formal", action="store_true", help="Use formal greeting")
args = parser.parse_args()

if args.formal:
    print(f"Good day, {args.name}. How do you do?")
else:
    print(f"Hey {args.name}!")
Output:
$ python greet.py Alice
Hey Alice!
$ python greet.py Alice --formal
Good day, Alice. How do you do?
$ python greet.py --help
usage: greet.py [-h] [--formal] name

Greet someone by name

positional arguments:
  name        The name to greet

optional arguments:
  -h, --help  show this help message and exit
  --formal    Use formal greeting
Notice what just happened: you didn’t write any help text manually. argparse generated it from the description and help parameters you provided. It also validated that the required name argument was provided, parsed the --formal flag, and made the values accessible as attributes on the args object.
The structure is always the same: create a parser, add arguments to it, then call parse_args() to get back an object with the parsed values.
Adding Positional Arguments
Positional arguments are required values that users pass in order. They’re like function parameters:
# rename_file.py
import argparse
import os
import sys

parser = argparse.ArgumentParser(description="Rename a file")
parser.add_argument("old_name", help="Current filename")
parser.add_argument("new_name", help="New filename")
args = parser.parse_args()

if not os.path.exists(args.old_name):
    print(f"Error: {args.old_name} not found")
    sys.exit(1)

os.rename(args.old_name, args.new_name)
print(f"Renamed {args.old_name} to {args.new_name}")
Output:
$ echo "test" > original.txt
$ python rename_file.py original.txt renamed.txt
Renamed original.txt to renamed.txt
$ python rename_file.py nonexistent.txt backup.txt
Error: nonexistent.txt not found
$ python rename_file.py
usage: rename_file.py [-h] old_name new_name
rename_file.py: error: the following arguments are required: old_name, new_name
Positional arguments are mandatory by default. If the user doesn’t provide them, argparse exits with an error automatically. The order matters — the first argument becomes args.old_name, the second becomes args.new_name.
Adding Optional Arguments and Flags
Optional arguments are prefixed with -- (long form) or - (short form). They’re not required and can appear in any order:
# list_files.py
import argparse
import os
import sys

parser = argparse.ArgumentParser(description="List files with filtering")
parser.add_argument("directory", help="Directory to list")
parser.add_argument("--extension", "-e", help="Filter by file extension (e.g., .py)")
parser.add_argument("--verbose", "-v", action="store_true", help="Show file sizes")
parser.add_argument("--limit", type=int, default=None, help="Max number of files to show")
args = parser.parse_args()

if not os.path.isdir(args.directory):
    print(f"Error: {args.directory} is not a directory")
    sys.exit(1)

files = os.listdir(args.directory)
if args.extension:
    files = [f for f in files if f.endswith(args.extension)]
# Compare against None so that an explicit --limit 0 is still honored
if args.limit is not None:
    files = files[:args.limit]

for filename in files:
    if args.verbose:
        filepath = os.path.join(args.directory, filename)
        size = os.path.getsize(filepath)
        print(f"{filename} ({size} bytes)")
    else:
        print(filename)
Output:
$ python list_files.py . --extension .py
script1.py
script2.py
$ python list_files.py . -e .py -v
script1.py (248 bytes)
script2.py (512 bytes)
$ python list_files.py . -e .py --limit 1
script1.py
$ python list_files.py . --help
usage: list_files.py [-h] [--extension EXTENSION] [--verbose] [--limit LIMIT] directory

List files with filtering

positional arguments:
  directory             Directory to list

optional arguments:
  -h, --help            show this help message and exit
  --extension EXTENSION, -e EXTENSION
                        Filter by file extension (e.g., .py)
  --verbose, -v         Show file sizes
  --limit LIMIT         Max number of files to show
Key observations: --extension accepts a value (the extension string), --verbose is a boolean flag using action="store_true", and --limit has type=int for automatic conversion. The short forms -e and -v work alongside the long forms.
argparse generates help text, validates arguments, and converts types. You just define them.
Type Conversion and Default Values
One of argparse’s strengths is automatic type conversion. Specify a type parameter and argparse converts the string input for you:
# process_config.py
import argparse
import json
parser = argparse.ArgumentParser(description="Process configuration")
parser.add_argument("--workers", type=int, default=4, help="Number of worker threads")
parser.add_argument("--timeout", type=float, default=30.0, help="Timeout in seconds")
parser.add_argument("--enable-cache", action="store_true", help="Enable caching")
parser.add_argument("--tags", type=str, default="", help="Comma-separated tags")
args = parser.parse_args()
# All values are now the correct type
config = {
    "workers": args.workers,
    "timeout": args.timeout,
    "cache_enabled": args.enable_cache,
    "tags": [t.strip() for t in args.tags.split(",") if t.strip()]
}
print("Configuration:")
print(json.dumps(config, indent=2))
The type=int and type=float parameters tell argparse to convert strings to those types. If conversion fails, argparse exits with a clear error message. Default values are provided with the default parameter and are used when the argument isn’t provided on the command line.
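The type parameter is not limited to built-ins such as int and float: argparse calls whatever callable you pass with the raw string. As a minimal sketch (the positive_int helper and its rule are illustrative assumptions, not part of process_config.py), a custom converter can reject out-of-range values:

```python
import argparse


def positive_int(text):
    """Hypothetical converter: accept only integers greater than zero."""
    value = int(text)  # a ValueError here is reported by argparse as an invalid value
    if value <= 0:
        raise argparse.ArgumentTypeError(f"{text!r} is not a positive integer")
    return value


parser = argparse.ArgumentParser(description="Process configuration")
parser.add_argument("--workers", type=positive_int, default=4,
                    help="Number of worker threads")

# Explicit lists stand in for the command line so the sketch runs anywhere
print(parser.parse_args(["--workers", "8"]).workers)  # 8
print(parser.parse_args([]).workers)                  # 4 (the default)
```

Passing --workers 0 would make argparse print a usage message and exit with status 2, the same way a failed int conversion does.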
Restricting Values with Choices
The choices parameter restricts an argument to a fixed set of allowed values:
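The deploy.py listing itself does not appear at this point, so here is a minimal sketch that is consistent with the output shown next. The description text, the help strings, the describe helper, and the "info" default are assumptions inferred from that output rather than the original script:

```python
# deploy.py (hypothetical sketch reconstructed from the output shown next)
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Deploy the application")
    # Positional argument limited to three environments
    parser.add_argument("environment", choices=["dev", "staging", "prod"],
                        help="Target environment")
    # Optional flag limited to four log levels; "info" is the assumed default
    parser.add_argument("--log-level", choices=["debug", "info", "warning", "error"],
                        default="info", help="Logging verbosity")
    return parser


def describe(argv=None):
    """Parse argv (or sys.argv[1:] when argv is None) and build the status line."""
    args = build_parser().parse_args(argv)
    return f"Deploying to {args.environment} with log level {args.log_level}"


# In a real script this would be print(describe()); explicit lists keep the sketch self-contained
print(describe(["staging"]))
print(describe(["--log-level", "debug", "staging"]))
```

Run from the command line, such a script produces output like the following: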
$ python deploy.py staging
Deploying to staging with log level info
$ python deploy.py --log-level debug staging
Deploying to staging with log level debug
$ python deploy.py testing
usage: deploy.py [-h] [--log-level {debug,info,warning,error}] {dev,staging,prod}
deploy.py: error: argument environment: invalid choice: 'testing' (choose from 'dev', 'staging', 'prod')
The choices parameter automatically validates input and displays allowed values in the help text. This prevents invalid configuration from reaching your code.
Making Optional Arguments Required
By default, arguments prefixed with -- are optional. You can make them required with required=True:
# download.py
import argparse
parser = argparse.ArgumentParser(description="Download a file")
parser.add_argument("--url", required=True, help="URL to download from")
parser.add_argument("--output", "-o", required=True, help="Output filename")
parser.add_argument("--timeout", type=int, default=30, help="Timeout in seconds")
args = parser.parse_args()
print(f"Downloading from {args.url} to {args.output} (timeout: {args.timeout}s)")
Output:
$ python download.py --url https://example.com/file.zip --output file.zip
Downloading from https://example.com/file.zip to file.zip (timeout: 30s)
$ python download.py --output file.zip
usage: download.py [-h] --url URL --output OUTPUT [--timeout TIMEOUT]
download.py: error: the following arguments are required: --url
This pattern is useful when you want semantic clarity — using --url=value is more explicit than a positional argument, but sometimes you still want to make it mandatory.
Mutually exclusive groups: pick one path or the other, never both.
Mutually Exclusive Argument Groups
Sometimes arguments conflict with each other. You want users to provide either option A or option B, but not both. Use a mutually exclusive group:
# format_converter.py
import argparse
parser = argparse.ArgumentParser(description="Convert data format")
parser.add_argument("input_file", help="Input file to convert")
# Create a mutually exclusive group
output_group = parser.add_mutually_exclusive_group(required=True)
output_group.add_argument("--to-json", action="store_true", help="Convert to JSON")
output_group.add_argument("--to-csv", action="store_true", help="Convert to CSV")
output_group.add_argument("--to-xml", action="store_true", help="Convert to XML")
args = parser.parse_args()
format_name = "json" if args.to_json else "csv" if args.to_csv else "xml"
print(f"Converting {args.input_file} to {format_name}")
Output:
$ python format_converter.py data.txt --to-json
Converting data.txt to json
$ python format_converter.py data.txt --to-json --to-csv
usage: format_converter.py [-h] (--to-json | --to-csv | --to-xml) input_file
format_converter.py: error: argument --to-csv: not allowed with argument --to-json
$ python format_converter.py data.txt
usage: format_converter.py [-h] (--to-json | --to-csv | --to-xml) input_file
format_converter.py: error: one of the arguments --to-json --to-csv --to-xml is required
add_mutually_exclusive_group(required=True) creates a group where exactly one option must be chosen. With required=False (the default), at most one option may be chosen, and choosing none is also acceptable. The error messages are automatically clear about the conflict.
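To see the non-required variant in action, here is a small sketch using two hypothetical flags, --verbose and --quiet, that are not part of format_converter.py:

```python
import argparse

parser = argparse.ArgumentParser(description="Optional mutually exclusive group")
# required defaults to False: at most one of these flags, possibly none
volume = parser.add_mutually_exclusive_group()
volume.add_argument("--verbose", action="store_true", help="Print extra detail")
volume.add_argument("--quiet", action="store_true", help="Print errors only")

print(parser.parse_args([]))           # neither flag: accepted
print(parser.parse_args(["--quiet"]))  # one flag: accepted
# parser.parse_args(["--verbose", "--quiet"]) would exit with a "not allowed with" error
```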
Building Subcommands (Like git commit, git push)
Complex tools like git use subcommands: git commit, git push, and git pull are all different commands with different arguments. argparse supports this with subparsers:
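The git_like.py listing is not reproduced at this point, so the following is a minimal sketch consistent with the output shown next. The default values ("Anonymous", "origin", 10), the argument help strings, and the run helper are assumptions:

```python
# git_like.py (hypothetical sketch; default values are assumptions)
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Git-like tool")
    # dest="command" records which subcommand was chosen
    subparsers = parser.add_subparsers(dest="command", required=True,
                                       help="Available commands")

    commit = subparsers.add_parser("commit", help="Create a commit")
    commit.add_argument("message", help="Commit message")
    commit.add_argument("--author", default="Anonymous", help="Author name")

    push = subparsers.add_parser("push", help="Push commits")
    push.add_argument("branch", help="Branch to push")
    push.add_argument("--remote", default="origin", help="Remote name")

    log = subparsers.add_parser("log", help="Show commit history")
    log.add_argument("--limit", type=int, default=10, help="Number of commits to show")
    return parser


def run(argv=None):
    """Parse argv (sys.argv[1:] when None) and return the line to print."""
    args = build_parser().parse_args(argv)
    if args.command == "commit":
        return f"Committing: '{args.message}' by {args.author}"
    if args.command == "push":
        return f"Pushing {args.branch} to {args.remote}"
    return f"Showing last {args.limit} commits"


# In a real script this would be print(run()); explicit lists keep the sketch self-contained
print(run(["commit", "Fix bug", "--author", "Alice"]))
print(run(["push", "main", "--remote", "upstream"]))
print(run(["log", "--limit", "5"]))
```

Run from the command line, such a script behaves like this: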
$ python git_like.py commit "Fix bug" --author Alice
Committing: 'Fix bug' by Alice
$ python git_like.py push main --remote upstream
Pushing main to upstream
$ python git_like.py log --limit 5
Showing last 5 commits
$ python git_like.py --help
usage: git_like.py [-h] {commit,push,log} ...
Git-like tool
positional arguments:
{commit,push,log} Available commands
commit Create a commit
push Push commits
log Show commit history
optional arguments:
-h, --help show this help message and exit
The add_subparsers() method creates a sub-parser for each command. Each subparser has its own arguments and help text. The dest="command" stores which subcommand was chosen in args.command. This pattern scales to tools with dozens of commands.
Real-Life Example: A File Processing CLI Tool
Let’s build a realistic tool that accepts input and output files, processes them with various options, and validates everything:
# file_processor.py
import argparse
import sys
from pathlib import Path
import json
parser = argparse.ArgumentParser(
description="Process text files with various transformations"
)
# Positional arguments
parser.add_argument("input_file", help="Input file to process")
parser.add_argument("output_file", help="Output file")
# Optional arguments
parser.add_argument("--transform", choices=["uppercase", "lowercase", "reverse"],
default="lowercase", help="Text transformation to apply")
parser.add_argument("--add-line-numbers", action="store_true",
help="Prepend line numbers")
parser.add_argument("--exclude-empty-lines", action="store_true",
help="Skip empty lines")
parser.add_argument("--max-lines", type=int, default=None,
help="Process only first N lines")
parser.add_argument("--encoding", default="utf-8",
help="File encoding")
parser.add_argument("--stats", action="store_true",
help="Print processing statistics")
args = parser.parse_args()
# Validate input file exists
input_path = Path(args.input_file)
if not input_path.exists():
    print(f"Error: Input file '{args.input_file}' not found", file=sys.stderr)
    sys.exit(1)
# Process the file
try:
    with open(input_path, "r", encoding=args.encoding) as f:
        lines = f.readlines()
except UnicodeDecodeError as e:
    print(f"Error: Could not decode file with {args.encoding} encoding", file=sys.stderr)
    sys.exit(1)
# Apply transformations
processed_lines = []
original_count = len(lines)
skipped_count = 0
for line_num, line in enumerate(lines, 1):
    # Check line limit
    if args.max_lines and line_num > args.max_lines:
        break
    # Skip empty lines if requested
    if args.exclude_empty_lines and line.strip() == "":
        skipped_count += 1
        continue
    # Apply transformation
    content = line.rstrip("\n")
    if args.transform == "uppercase":
        content = content.upper()
    elif args.transform == "lowercase":
        content = content.lower()
    elif args.transform == "reverse":
        content = content[::-1]
    # Add line numbers if requested
    if args.add_line_numbers:
        content = f"{line_num}: {content}"
    processed_lines.append(content + "\n")
# Write output file
output_path = Path(args.output_file)
try:
    with open(output_path, "w", encoding=args.encoding) as f:
        f.writelines(processed_lines)
except IOError as e:
    print(f"Error: Could not write to '{args.output_file}': {e}", file=sys.stderr)
    sys.exit(1)
# Print statistics if requested
if args.stats:
    stats = {
        "input_file": args.input_file,
        "output_file": args.output_file,
        "original_lines": original_count,
        "processed_lines": len(processed_lines),
        "skipped_lines": skipped_count,
        "transformation": args.transform,
        "line_numbers_added": args.add_line_numbers,
        "encoding": args.encoding
    }
    print("\nProcessing Statistics:")
    print(json.dumps(stats, indent=2))
else:
    print(f"Processed {len(processed_lines)} lines, output written to {args.output_file}")
Output:
$ cat input.txt
Hello World
This is a test

Keep going
$ python file_processor.py input.txt output.txt --transform uppercase --add-line-numbers --stats
Processing Statistics:
{
  "input_file": "input.txt",
  "output_file": "output.txt",
  "original_lines": 4,
  "processed_lines": 4,
  "skipped_lines": 0,
  "transformation": "uppercase",
  "line_numbers_added": true,
  "encoding": "utf-8"
}
$ cat output.txt
1: HELLO WORLD
2: THIS IS A TEST
3:
4: KEEP GOING
$ python file_processor.py input.txt output.txt --transform lowercase --exclude-empty-lines --max-lines 2
Processed 2 lines, output written to output.txt
$ cat output.txt
hello world
this is a test
This example demonstrates several key patterns: input validation, defensive file I/O with error handling, type-safe argument conversion, and combining multiple options. The tool is flexible (users can apply transformations, filter lines, add statistics) while remaining simple to understand and extend.
A CLI tool that accepts arguments is infinitely more useful than one with hard-coded paths.
Third-Party Alternatives: click and typer
For even more powerful CLI tools, the Python community has built two popular third-party libraries:
click is a decorator-based framework for building CLI tools. It handles groups, commands, options, and context passing with minimal boilerplate, and it is widely used in established projects; Flask’s command-line interface, for example, is built on it.
typer is the modern alternative, built on top of Click but with a focus on type hints and fewer decorators. If you’re comfortable with Python’s type annotation syntax, Typer feels more natural.
Here’s a quick comparison:
Feature        | argparse             | click                     | typer
---------------|----------------------|---------------------------|---------------------
Built-in       | Yes                  | No (pip install)          | No (pip install)
Syntax         | Verbose, class-based | Decorator-based           | Type hints
Subcommands    | Good                 | Excellent                 | Excellent
Context/State  | Manual               | Built-in                  | Built-in
Auto-help text | Yes                  | Yes                       | Yes
Learning curve | Moderate             | Low (for decorator style) | Low (for type hints)
For production scripts and tools that ship with your project, stick with argparse — no external dependencies. For internal tools, microservices, and CLIs meant for other developers, click and typer often reduce boilerplate and improve readability.
Frequently Asked Questions
How do I access sys.argv at any point in my code?
sys.argv is a global list that persists for the entire run of your script. You can import sys and access it anywhere. However, argparse is better because it parses arguments once, validates them, and gives you structured access. With argparse, you pass the args object to functions instead of having functions depend on sys.argv directly. This makes testing easier and your code more modular.
Can I make a positional argument optional?
Yes, use the nargs="?" parameter: parser.add_argument("name", nargs="?", default="World"). This makes the argument optional with a default value. If the user provides it, your code uses that value; if not, the default is used. argparse marks the argument as optional by showing it in square brackets in the usage line ([name]), but a script with several optional positionals quickly becomes hard to use because their meaning depends on order. In that case, use optional flags with -- instead for clarity.
How do I handle a variable number of arguments?
Use nargs="*" (zero or more), nargs="+" (one or more), or nargs=3 (exactly three). For example: parser.add_argument("files", nargs="+", help="Files to process") requires at least one file and stores them as a list in args.files.
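As a short sketch of how these nargs values behave together (the --tags and --size options are hypothetical names chosen for illustration):

```python
import argparse

parser = argparse.ArgumentParser(description="nargs demo")
parser.add_argument("files", nargs="+", help="One or more files to process")
parser.add_argument("--tags", nargs="*", default=[], help="Zero or more tags")
parser.add_argument("--size", nargs=2, type=int, metavar=("WIDTH", "HEIGHT"),
                    help="Exactly two integers")

# An explicit list stands in for the command line
args = parser.parse_args(["a.txt", "b.txt", "--tags", "x", "y", "--size", "800", "600"])
print(args.files)  # ['a.txt', 'b.txt']
print(args.tags)   # ['x', 'y']
print(args.size)   # [800, 600]
```

One thing to watch for: an option with nargs="*" or nargs="+" consumes every value up to the next flag, so placing the positional files first, as here, avoids ambiguity.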
How do I pass arguments with spaces or special characters?
Quote them on the command line: python script.py "hello world" --message "test message". The shell treats quoted strings as single arguments. Python receives them correctly in sys.argv or through argparse.
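To preview how a quoted command line is split into arguments, the standard library's shlex module applies POSIX shell quoting rules. This is only an approximation of what your actual shell does, and the command string below is the example from the answer above:

```python
import shlex

command = 'python script.py "hello world" --message "test message"'
# Each quoted string becomes a single list element, with the quotes removed
print(shlex.split(command))
# ['python', 'script.py', 'hello world', '--message', 'test message']
```

Inside the script, sys.argv would hold everything from 'script.py' onward.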
How do I test a script that uses argparse?
Mock sys.argv in your tests or call parse_args() with a list of strings instead of using the default (which reads sys.argv). Example: args = parser.parse_args(["input.txt", "--verbose"]). This lets you test different argument combinations without running the script from the command line.
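A minimal sketch of that approach with unittest follows. The build_parser function is a hypothetical stand-in for your own parser; putting parser construction in a function is what makes it importable from a test:

```python
import argparse
import unittest


def build_parser():
    """Hypothetical parser under test."""
    parser = argparse.ArgumentParser()
    parser.add_argument("input_file")
    parser.add_argument("--verbose", action="store_true")
    return parser


class ParserTests(unittest.TestCase):
    def test_flags_are_parsed(self):
        args = build_parser().parse_args(["input.txt", "--verbose"])
        self.assertEqual(args.input_file, "input.txt")
        self.assertTrue(args.verbose)

    def test_missing_positional_exits(self):
        # argparse reports errors by raising SystemExit with status 2
        with self.assertRaises(SystemExit) as caught:
            build_parser().parse_args([])
        self.assertEqual(caught.exception.code, 2)
```

Saved in a test file, these cases can be run with python -m unittest.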
Conclusion
Command line arguments transform your scripts from one-off tools into reusable, composable utilities. You’ve learned the fundamentals: sys.argv for raw access, the reasons Python doesn’t need argc, and why argparse is the standard library’s powerful answer to building professional CLI tools. You’ve seen how to parse positional arguments, optional flags, enforce type conversion, restrict choices, create mutually exclusive groups, and build subcommands. The file-processing tool example shows how these patterns combine in real code.
Now take the real-life example and extend it. Add a --config flag that reads settings from a JSON file. Build a tool that accepts multiple input files and processes them in parallel. Create a command with subcommands like your own mini git. These exercises will solidify your understanding and show you the flexibility of command line argument handling.
Email is everywhere in modern software — from order confirmations to password resets to automated reports. If you’re building a Python application that needs to send messages to users, you don’t need to pay for a third-party email service right away. Gmail, which most developers already use, has a built-in SMTP (Simple Mail Transfer Protocol) server that you can connect to directly. This opens up a world of possibilities: send alerts when your scripts finish, notify team members of important events, or automate bulk communication — all without leaving Python.
The good news: you don’t need to understand SMTP inside and out to get started. Python’s smtplib library and the email module handle the complex parts, and Gmail provides clear documentation for developers. You’ll need to set up Gmail for programmatic access (it’s a one-time configuration), but after that, it takes just a few lines of Python to send your first email.
This article covers the complete journey: setting up Gmail for Python access, connecting via SMTP, sending plain text and HTML emails, attaching files, handling errors gracefully, and using secure authentication practices. We’ll start with a working example you can run in 30 seconds, then dive into each concept in detail. By the end, you’ll be able to send formatted emails with attachments, implement proper error handling, and understand the security best practices that separate a toy script from production-ready code.
How To Send Emails From Gmail: Quick Example
Before we dive into the details, here’s a working script that sends a simple email from Gmail. This is the absolute minimum to get a message sent:
# quick_gmail_send.py
import smtplib
from email.mime.text import MIMEText
import os
sender_email = os.getenv('GMAIL_EMAIL')
sender_password = os.getenv('GMAIL_PASSWORD')
recipient = "recipient@example.com"
message = MIMEText("This is the body of the email.")
message['Subject'] = "Hello from Python"
message['From'] = sender_email
message['To'] = recipient
with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login(sender_email, sender_password)
    server.send_message(message)
print("Email sent successfully!")
# Expected Output:
# Email sent successfully!
The script creates a MIMEText message (MIME stands for Multipurpose Internet Mail Extensions — it’s the standard email format), connects to Gmail’s SMTP server using SSL encryption on port 465, authenticates with your email and password, and sends. The with statement handles closing the connection automatically.
Three critical things are happening here: (1) we’re reading the email and password from environment variables, not hardcoding them into the script — this keeps your credentials safe; (2) we’re using port 465 with SMTP_SSL for secure, encrypted communication; and (3) we’re using the send_message method instead of the older sendmail, which is cleaner and handles headers automatically. The next sections explain each piece in depth.
What Is SMTP and Why Use Gmail?
SMTP is the protocol computers use to send email across the internet. When you hit “send” in your email client, it connects to an SMTP server, authenticates, and hands off your message. The server then relays it, again over SMTP, to the recipient’s mail server; the recipient’s email client later fetches it with IMAP or POP3, which is outside our scope.
Gmail’s SMTP server is smtp.gmail.com on port 465 (for SSL/TLS encryption) or port 587 (for STARTTLS). Most developers use port 465 because it’s simpler: the connection is encrypted from the start. You authenticate using your Gmail address and a special app password (more on that in the next section), and Gmail handles delivery for you.
The advantage: you get a reliable, professional email infrastructure without hosting your own mail server or paying for a service like SendGrid. The trade-off: Gmail has rate limits (you can send up to 500 emails per day for a typical account), and bulk email is better handled by a service built for that purpose. For automating scripts, notifications, and moderate-volume communication, Gmail is perfect.
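If a script sends mail in a loop, it can help to track the daily budget yourself instead of waiting for Gmail to reject messages. The following is a minimal in-memory sketch. DailyQuota is a hypothetical helper, the 500 figure is the typical-account limit mentioned above, and a long-running service would need to persist the counter somewhere:

```python
import datetime


class DailyQuota:
    """Hypothetical helper: refuse to send once a per-day budget is used up."""

    def __init__(self, limit=500):
        self.limit = limit
        self.day = None
        self.sent = 0

    def try_consume(self, count=1, today=None):
        """Return True and record the send if it fits in today's budget."""
        today = today or datetime.date.today()
        if today != self.day:  # first send of a new day resets the counter
            self.day = today
            self.sent = 0
        if self.sent + count > self.limit:
            return False
        self.sent += count
        return True


quota = DailyQuota(limit=500)
print(quota.try_consume(499))  # True
print(quota.try_consume(2))    # False: would exceed the daily budget
print(quota.try_consume(1))    # True
```

Call try_consume() before each send and skip or defer the message when it returns False.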
Approach           | Setup Complexity | Cost          | Volume Limit                   | Use Case
-------------------|------------------|---------------|--------------------------------|----------------------------------------------
Gmail SMTP         | Low              | Free          | 500/day                        | Notifications, automated alerts, low-volume
SendGrid / Mailgun | Medium           | Pay-as-you-go | Higher limits                  | Production bulk email, webhooks, analytics
Gmail API + OAuth2 | High             | Free          | 500/day                        | Production apps, user consent, best practices
Self-hosted SMTP   | Very High        | Server costs  | Unlimited (delivery dependent) | Enterprise, full control
For the purposes of this article, we’re focusing on SMTP — it’s direct, easy to understand, and enough for most use cases. If you’re building a production app that sends email on behalf of users, you’ll eventually want to move to the Gmail API with OAuth2 (we’ll touch on that at the end).
App passwords exist for a reason. Don’t use your main Gmail password in code.
Setting Up Gmail for Programmatic Access
Step 1: Enable Two-Factor Authentication
Gmail no longer allows you to use your regular password in third-party apps for security reasons. First, you need to enable Two-Factor Authentication (2FA) on your Gmail account — this is a one-time setup. Go to your Google Account security page, find “How you sign in to Google,” and enable 2-Step Verification. You’ll need a phone to receive a verification code. Once that’s done, you’re ready for the next step.
Step 2: Generate an App Password
After 2FA is enabled, Google will give you the option to create “App Passwords.” An App Password is a 16-character random password that grants access to your Gmail account without ever sharing your real password. Go back to the security page, find “App passwords” (it appears under “How you sign in to Google” once 2FA is on), and create a new one. Depending on the version of the page you see, you either type a name for the app or select “Mail” and “Windows Computer” (or your device). Google then generates a unique password. Copy this password and save it somewhere safe, because you’ll only see it once.
Why use an App Password instead of your real password? If your script is compromised (or worse, your source code is leaked on GitHub), the attacker does not learn your real password, and you can revoke that one App Password without changing anything else on your account. It is still a powerful credential (it also works for reading mail over IMAP), so treat it as a secret. Always use App Passwords for programmatic access.
Step 3: Store Credentials Securely
Now you have an app password. Never hardcode it in your script. If your script ends up on GitHub or in a log file, your credentials are exposed. Instead, store them in environment variables. Create a .env file in your project directory (and add .env to your .gitignore so it’s never committed):
GMAIL_EMAIL=your-email@gmail.com
GMAIL_PASSWORD=your-16-character-app-password
In your Python script, read these values using the os module or the python-dotenv library (which loads .env automatically). Here’s the secure pattern:
# secure_email_setup.py
import os
from dotenv import load_dotenv
load_dotenv() # Loads GMAIL_EMAIL and GMAIL_PASSWORD from .env
sender_email = os.getenv('GMAIL_EMAIL')
sender_password = os.getenv('GMAIL_PASSWORD')
if not sender_email or not sender_password:
    raise ValueError("GMAIL_EMAIL and GMAIL_PASSWORD must be set in environment.")
print(f"Using email: {sender_email}")
# Expected Output:
# Using email: your-email@gmail.com
Install python-dotenv with pip install python-dotenv if it’s not already available. The load_dotenv() call reads your .env file and makes the variables available via os.getenv(). Checking that both values exist with the if not guard prevents confusing errors later if someone forgets to set up their .env file.
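When a script needs more than two settings, the guard can be generalized so the error names exactly which variables are missing. A minimal sketch follows, where require_env is a hypothetical helper and not part of python-dotenv:

```python
import os


def require_env(*names, environ=None):
    """Return the requested variables as a dict, or raise naming every missing one."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in names}


# A plain dict stands in for os.environ so the sketch runs without a .env file
fake_env = {"GMAIL_EMAIL": "your-email@gmail.com", "GMAIL_PASSWORD": ""}
try:
    require_env("GMAIL_EMAIL", "GMAIL_PASSWORD", environ=fake_env)
except ValueError as exc:
    print(exc)  # Missing environment variables: GMAIL_PASSWORD
```

In a real script you would call load_dotenv() first and then require_env("GMAIL_EMAIL", "GMAIL_PASSWORD") with no environ argument.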
Connecting to Gmail’s SMTP Server
Python’s smtplib module is your gateway to sending email. Let’s break down the connection pattern:
# connect_to_gmail_smtp.py
import smtplib
import os
sender_email = os.getenv('GMAIL_EMAIL')
sender_password = os.getenv('GMAIL_PASSWORD')
# Method 1: SMTP_SSL (port 465, encrypted from start)
with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login(sender_email, sender_password)
    print("Connected and authenticated!")
# Expected Output:
# Connected and authenticated!
SMTP_SSL creates a secure connection to Gmail’s SMTP server on port 465. The connection is encrypted immediately, and the with statement ensures the connection closes automatically when done. The login() method authenticates using your email and app password. If the credentials are wrong, smtplib raises an SMTPAuthenticationError.
There’s also an alternative: SMTP on port 587 with starttls():
# connect_with_starttls.py
import smtplib
import os
sender_email = os.getenv('GMAIL_EMAIL')
sender_password = os.getenv('GMAIL_PASSWORD')
# Method 2: SMTP with STARTTLS (port 587, upgrade to encryption)
with smtplib.SMTP("smtp.gmail.com", 587) as server:
    server.starttls()  # Upgrade to encrypted connection
    server.login(sender_email, sender_password)
    print("Connected and authenticated via STARTTLS!")
# Expected Output:
# Connected and authenticated via STARTTLS!
Both methods are secure. SMTP_SSL (port 465) is simpler and preferred; STARTTLS (port 587) starts with a plain connection then upgrades to encryption. For Gmail, use SMTP_SSL unless your network blocks port 465 (rare, but it happens). The rest of this article uses port 465.
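Both connection styles accept an explicit TLS context. As a sketch, the hypothetical open_gmail helper below (not part of smtplib) wraps the two methods and passes ssl.create_default_context(), which verifies the server certificate and hostname:

```python
import smtplib
import ssl


def open_gmail(port=465, timeout=10):
    """Return a connected, encrypted (not yet logged in) SMTP object for Gmail."""
    if port not in (465, 587):
        raise ValueError("Gmail accepts SMTP submissions on port 465 or 587")
    context = ssl.create_default_context()  # verifies certificate and hostname
    if port == 465:
        # Encrypted from the first byte
        return smtplib.SMTP_SSL("smtp.gmail.com", port, timeout=timeout, context=context)
    server = smtplib.SMTP("smtp.gmail.com", port, timeout=timeout)
    try:
        server.starttls(context=context)  # upgrade the plain connection to TLS
    except Exception:
        server.close()
        raise
    return server
```

Usage would look like with open_gmail(587) as server: followed by server.login(sender_email, sender_password), exactly as in the listings above.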
Port 465? Port 587? Both work. Pick one and move on.
Sending Plain Text Emails
The simplest email is plain text. You create a MIMEText message, set the subject and recipients, and send. Here’s the complete flow:
# send_plain_text_email.py
import smtplib
from email.mime.text import MIMEText
import os
sender_email = os.getenv('GMAIL_EMAIL')
sender_password = os.getenv('GMAIL_PASSWORD')
recipient = "recipient@example.com"
# Create the email message
message = MIMEText("This is the body of a plain text email.")
message['Subject'] = "Hello from Python"
message['From'] = sender_email
message['To'] = recipient
# Send via Gmail SMTP
with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login(sender_email, sender_password)
    server.send_message(message)
print("Plain text email sent successfully!")
# Expected Output:
# Plain text email sent successfully!
The MIMEText() constructor takes the email body as a string. We then set the standard email headers: Subject, From, and To. These headers are visible to the recipient and email clients. The send_message() method (added in Python 3.2) is cleaner than the older sendmail() method because it extracts the sender and recipients from the message headers automatically.
You can send to multiple recipients by setting To as a comma-separated string and passing a list to send_message():
# send_to_multiple_recipients.py
import smtplib
from email.mime.text import MIMEText
import os
sender_email = os.getenv('GMAIL_EMAIL')
sender_password = os.getenv('GMAIL_PASSWORD')
recipients = ["alice@example.com", "bob@example.com"]
message = MIMEText("Hello everyone!")
message['Subject'] = "Group notification"
message['From'] = sender_email
message['To'] = ", ".join(recipients)
with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login(sender_email, sender_password)
    # to_addrs sets the envelope recipients explicitly
    server.send_message(message, to_addrs=recipients)
print(f"Email sent to {len(recipients)} recipients!")
# Expected Output:
# Email sent to 2 recipients!
The ", ".join(recipients) line converts the list into a comma-separated string for the To header, making it readable in the recipient’s email client. Passing the original list as to_addrs tells SMTP exactly which addresses to deliver to; if you omit it, send_message() falls back to the addresses in the To, Cc, and Bcc headers.
Sending HTML-Formatted Emails
Plain text is fine for simple messages, but modern emails are formatted with HTML: colors, images, links, bold text, and multi-column layouts. The MIMEText constructor accepts a second argument, _subtype='html', which tells email clients to render the content as HTML instead of plain text.
# send_html_email.py
import smtplib
from email.mime.text import MIMEText
import os
sender_email = os.getenv('GMAIL_EMAIL')
sender_password = os.getenv('GMAIL_PASSWORD')
recipient = "recipient@example.com"
# HTML body
html_body = """
<html>
<body>
<h1 style="color: #0066cc;">Welcome!</h1>
<p>This is an <strong>HTML email</strong> with <em>formatting</em>.</p>
<a href="https://pythonhowtoprogram.com">Visit our site</a>
</body>
</html>
"""
message = MIMEText(html_body, 'html')
message['Subject'] = "Formatted HTML Email"
message['From'] = sender_email
message['To'] = recipient
with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login(sender_email, sender_password)
    server.send_message(message)
print("HTML email sent successfully!")
# Expected Output:
# HTML email sent successfully!
The key difference: MIMEText(html_body, 'html') tells MIME that this is HTML content. Email clients that support HTML will render the formatted version; older clients fall back to plain text (the raw HTML appears, but at least the message is readable). Always make sure your HTML is valid and test in multiple email clients, as Gmail, Outlook, Apple Mail, and mobile clients each have slightly different HTML rendering engines.
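To give clients that cannot render HTML a proper plain-text version instead of raw markup, you can send both in one multipart/alternative message. The sketch below uses the standard library's EmailMessage class rather than the MIMEText class used in this article, and the addresses are placeholders:

```python
from email.message import EmailMessage

message = EmailMessage()
message["Subject"] = "Formatted HTML Email"
message["From"] = "sender@example.com"  # placeholder address
message["To"] = "recipient@example.com"
# Plain-text part first: shown by clients that cannot render HTML
message.set_content("Welcome! This is the plain-text version.")
# HTML part second: clients that support HTML prefer it
message.add_alternative(
    "<html><body><h1>Welcome!</h1><p>This is the <strong>HTML</strong> version.</p></body></html>",
    subtype="html",
)

print(message.get_content_type())  # multipart/alternative
print([part.get_content_type() for part in message.iter_parts()])
# ['text/plain', 'text/html']
```

The resulting object can be passed to server.send_message(message) in the same way as a MIMEText message.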
For production emails, consider using a templating approach — write your HTML in a separate file and load it into the script:
# send_html_from_template.py
import smtplib
from email.mime.text import MIMEText
import os
sender_email = os.getenv('GMAIL_EMAIL')
sender_password = os.getenv('GMAIL_PASSWORD')
recipient = "recipient@example.com"
# Load HTML template
with open("email_template.html", "r") as f:
    html_body = f.read()
message = MIMEText(html_body, 'html')
message['Subject'] = "Email from template"
message['From'] = sender_email
message['To'] = recipient
with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login(sender_email, sender_password)
    server.send_message(message)
print("Templated email sent!")
# Expected Output:
# Templated email sent!
Keeping templates in separate files makes your code cleaner and easier to update without touching Python logic. For even more power, use a library like Jinja2 to insert variables into templates: pip install jinja2, then Template(html_body).render(user_name="Alice").
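Jinja2 is a third-party package. For simple substitutions, the standard library's string.Template is a lighter stand-in; note that it is a different tool with its own $name placeholder syntax, not Jinja2. A minimal sketch, with an inline string standing in for the contents of email_template.html:

```python
import html
from string import Template

# Stands in for the text read from email_template.html; $user_name is a placeholder
html_template = Template("<html><body><p>Hello, $user_name!</p></body></html>")

# string.Template does no escaping, so escape values that come from users
html_body = html_template.substitute(user_name=html.escape("Alice & Bob"))
print(html_body)  # <html><body><p>Hello, Alice &amp; Bob!</p></body></html>
```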
Style your emails, but remember: Outlook ignores 90% of your CSS.
Adding Attachments
Emails often carry files — invoices, PDFs, images, spreadsheets. To attach files, you need to use MIMEMultipart instead of just MIMEText. A multipart message can contain multiple components: text body, attachments, embedded images, etc.
# send_email_with_attachment.py
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email import encoders
import os
sender_email = os.getenv('GMAIL_EMAIL')
sender_password = os.getenv('GMAIL_PASSWORD')
recipient = "recipient@example.com"
# Create multipart message (can contain text + attachments)
message = MIMEMultipart()
message['Subject'] = "Email with PDF attachment"
message['From'] = sender_email
message['To'] = recipient
# Add text body
body = "Please find the report attached."
message.attach(MIMEText(body, 'plain'))
# Attach a file
filename = "report.pdf"
if os.path.exists(filename):
    with open(filename, 'rb') as attachment:
        part = MIMEBase('application', 'octet-stream')
        part.set_payload(attachment.read())
    encoders.encode_base64(part)
    # Let the email package format and quote the filename parameter
    part.add_header('Content-Disposition', 'attachment', filename=filename)
    message.attach(part)
# Send
with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login(sender_email, sender_password)
    server.send_message(message)
print(f"Email with {filename} sent successfully!")
# Expected Output:
# Email with report.pdf sent successfully!
This pattern uses MIMEBase for generic attachments and MIMEText for the body. The file is read in binary mode ('rb'), the bytes are base64-encoded (so they survive email transmission as text), and a Content-Disposition header tells email clients it’s an attachment with a filename. The os.path.exists() check ensures the file actually exists before trying to read it, a bit of defensive programming that prevents crashes on missing files.
For common file types, Python provides shortcuts:
# send_email_with_image_attachment.py
import smtplib
from email.mime.text import MIMEText
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
import os
sender_email = os.getenv('GMAIL_EMAIL')
sender_password = os.getenv('GMAIL_PASSWORD')
recipient = "recipient@example.com"
message = MIMEMultipart()
message['Subject'] = "Email with image"
message['From'] = sender_email
message['To'] = recipient
body = "Here's a photo:"
message.attach(MIMEText(body, 'plain'))
# Attach an image
image_file = "screenshot.png"
if os.path.exists(image_file):
    with open(image_file, 'rb') as img:
        part = MIMEImage(img.read())
    # Let the email package format and quote the filename parameter
    part.add_header('Content-Disposition', 'attachment', filename=image_file)
    message.attach(part)
with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login(sender_email, sender_password)
    server.send_message(message)
print("Email with image sent!")
# Expected Output:
# Email with image sent!
MIMEImage is simpler for images than MIMEBase: it detects the image subtype from the data and base64-encodes the payload for you. For PDFs, Word docs, and other binary formats, use MIMEBase with 'application', 'octet-stream' (a generic binary type). For plain text files, attach a MIMEText part containing the file’s text to the multipart message; no manual base64 step is needed.
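The standard library also offers a higher-level route: EmailMessage.add_attachment() handles the base64 encoding and the Content-Disposition header in one call. The sketch below is an alternative to the MIME classes used above, not a replacement the article prescribes. attach_bytes is a hypothetical helper, and placeholder bytes stand in for a real PDF file:

```python
import mimetypes
from email.message import EmailMessage


def attach_bytes(message, data, filename):
    """Attach raw bytes, guessing the MIME type from the filename."""
    guessed, _ = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown
    maintype, subtype = (guessed or "application/octet-stream").split("/", 1)
    message.add_attachment(data, maintype=maintype, subtype=subtype, filename=filename)


message = EmailMessage()
message["Subject"] = "Email with PDF attachment"
message.set_content("Please find the report attached.")
attach_bytes(message, b"%PDF-1.4 placeholder bytes", "report.pdf")

for part in message.iter_attachments():
    print(part.get_filename(), part.get_content_type())
```

In a real script you would read the bytes with open(filename, 'rb') and then pass the message to server.send_message() as before.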
Error Handling and Debugging
Email sending can fail for many reasons: wrong credentials, network issues, recipient address is invalid, rate limits hit, or the SMTP server is temporarily down. Good error handling makes debugging easier and prevents your scripts from crashing silently.
# send_email_with_error_handling.py
import smtplib
from email.mime.text import MIMEText
import os
sender_email = os.getenv('GMAIL_EMAIL')
sender_password = os.getenv('GMAIL_PASSWORD')
recipient = "recipient@example.com"
try:
    message = MIMEText("Test email body.")
    message['Subject'] = "Test"
    message['From'] = sender_email
    message['To'] = recipient
    with smtplib.SMTP_SSL("smtp.gmail.com", 465, timeout=10) as server:
        server.login(sender_email, sender_password)
        server.send_message(message)
    print("Email sent successfully!")
except smtplib.SMTPAuthenticationError:
    print("Error: Invalid email or password.")
except smtplib.SMTPException as e:
    print(f"SMTP error occurred: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
# Expected Output (on success):
# Email sent successfully!
The key exceptions to catch: SMTPAuthenticationError (wrong credentials), SMTPException (SMTP-level issues like invalid recipients or server errors), and generic Exception as a catch-all. The timeout=10 parameter tells Python to wait up to 10 seconds for a server response before giving up. Without a timeout, a hung connection can block your script forever.
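Some of these failures are transient (a dropped connection, a timeout), so retrying after a short pause is often enough. A minimal sketch of that idea follows. send_with_retry is a hypothetical helper, the attempt count and delays are arbitrary assumptions, and a fake sender stands in for the real SMTP call so the sketch runs offline:

```python
import smtplib
import time


def send_with_retry(send, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, backing off between transient failures."""
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except smtplib.SMTPAuthenticationError:
            raise  # wrong credentials will not fix themselves
        except (smtplib.SMTPException, OSError):
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...


calls = []


def flaky_send():
    """Fake sender that fails twice before succeeding."""
    calls.append(1)
    if len(calls) < 3:
        raise smtplib.SMTPServerDisconnected("connection dropped")
    return "sent"


# sleep is replaced with a no-op so the demo finishes instantly
print(send_with_retry(flaky_send, sleep=lambda seconds: None))  # sent
```

In a real script, send would be a function that opens the SMTP_SSL connection, logs in, and calls send_message().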
Common exceptions and their causes:
Exception               | Cause                      | Fix
------------------------|----------------------------|----------------------------------------------------------
SMTPAuthenticationError | Wrong email/password       | Verify credentials in .env file. Regenerate app password.
SMTPNotSupportedError   | SMTP command not supported | Check that the command matches the connection type (for example, do not call starttls() on an SMTP_SSL connection).
socket.timeout          | Connection timeout         | Check internet connection; increase timeout value.
ConnectionRefusedError  | Can’t reach SMTP server    | Verify SMTP server address; check firewall/network.
# debug_smtp_connection.py
import smtplib
from email.mime.text import MIMEText
import os
sender_email = os.getenv('GMAIL_EMAIL')
sender_password = os.getenv('GMAIL_PASSWORD')
try:
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.set_debuglevel(1)  # Print all SMTP commands and responses
        server.login(sender_email, sender_password)
        message = MIMEText("Test")
        message['Subject'] = "Test"
        message['From'] = sender_email
        message['To'] = "recipient@example.com"
        server.send_message(message)
except Exception as e:
    print(f"Error: {e}")
# Expected Output (with debug info):
# send: b'ehlo [your.ip.address]\r\n'
# reply: b'250-smtp.gmail.com at your service...'
# ... (many more debug lines)
The set_debuglevel(1) call prints every command sent to the server and every response received. This is invaluable for understanding what’s happening under the hood. Use it when your script fails unexpectedly.
set_debuglevel(1) reveals everything the SMTP server is thinking. Useful at 3am.
Security Best Practices
Sending email is straightforward, but there are security pitfalls that can compromise your account or expose user data.
Never Hardcode Credentials
This is rule #1. If you commit credentials to GitHub, you’ve publicly leaked them, even if you delete them later (GitHub’s history is searchable). Always use environment variables or a secrets management system:
# bad_example.py (DO NOT DO THIS)
sender_email = "my-email@gmail.com" # Exposed on GitHub!
sender_password = "xxxxxx" # Exposed on GitHub!
# good_example.py
import os
sender_email = os.getenv('GMAIL_EMAIL')
sender_password = os.getenv('GMAIL_PASSWORD')
For local development, use a .env file (remember to add it to .gitignore). For production (servers, CI/CD pipelines, cloud environments), use your platform’s native secrets: GitHub Secrets for Actions, AWS Secrets Manager for Lambda, Google Secret Manager for Cloud Functions, etc.
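To make the .env idea concrete, here is a minimal sketch of a loader built only on the standard library. It assumes a simple file of KEY=VALUE lines; load_env_file is a hypothetical helper written for illustration, and many projects use the third-party python-dotenv package for this instead.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path, environ=os.environ):
    """Load KEY=VALUE lines into environ without overriding existing keys."""
    loaded = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        environ.setdefault(key, value)
    return loaded


# Demo against a throwaway file and a plain dict, so the real environment
# is left untouched.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        '# local secrets\nGMAIL_EMAIL=me@example.com\n'
        'GMAIL_PASSWORD="abcd efgh ijkl mnop"\n',
        encoding="utf-8",
    )
    demo_env = {}
    load_env_file(env_path, environ=demo_env)

print(sorted(demo_env))
```

Using setdefault means values already set in the real environment (for example by a CI system) take precedence over the file.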
Use App Passwords, Not Your Real Password
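Google App Passwords are designed for apps that cannot use Google’s interactive sign-in. Each one is a separate credential that you can revoke individually without changing your main password, and it cannot be used to sign in to your Google Account in a browser or change account settings. An App Password is not limited to sending, though: it also works for IMAP and POP, so a leaked one can still expose your mailbox. Treat it like any other secret, but prefer it over your real Gmail password, whose leak could let an attacker take over your entire account. Always use App Passwords for programmatic access.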
Validate Recipient Addresses
User input for email addresses should be validated. A simple regex check catches obvious typos:
# validate_email_addresses.py
import re
def is_valid_email(email):
    # Simple shape check: local part, "@", domain, and a 2+ letter TLD
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return re.match(pattern, email) is not None
recipients = ["alice@example.com", "bob@example", "charlie@domain.co.uk"]
valid_recipients = [e for e in recipients if is_valid_email(e)]
invalid_recipients = [e for e in recipients if not is_valid_email(e)]
print(f"Valid: {valid_recipients}")
print(f"Invalid: {invalid_recipients}")
# Expected Output:
# Valid: ['alice@example.com', 'charlie@domain.co.uk']
# Invalid: ['bob@example']
This regex is simple and covers most real email formats. It’s not bulletproof (the RFC 5322 standard for email addresses is insanely complex), but it catches common mistakes. For production systems, consider sending a confirmation email and only adding to your list after the user clicks a link in the confirmation.
Be Aware of Rate Limits
Gmail limits you to 500 emails per day for standard accounts (business/workspace accounts have higher limits). If you hit this limit, Gmail temporarily blocks further sends. For bulk email, you’ll need a specialized service like SendGrid or AWS SES. For monitoring, keep a log of sent emails:
# log_sent_emails.py
import smtplib
from email.mime.text import MIMEText
import os
import json
from datetime import datetime
sender_email = os.getenv('GMAIL_EMAIL')
sender_password = os.getenv('GMAIL_PASSWORD')
log_file = "email_log.json"
daily_count = 0
# Count today's emails
if os.path.exists(log_file):
    with open(log_file, 'r') as f:
        logs = json.load(f)
    today = datetime.now().strftime("%Y-%m-%d")
    daily_count = sum(1 for log in logs if log['date'] == today)
if daily_count >= 500:
    print("Error: Daily email limit reached.")
else:
    # Send email
    message = MIMEText("Test")
    message['Subject'] = "Test"
    message['From'] = sender_email
    message['To'] = "recipient@example.com"
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(sender_email, sender_password)
        server.send_message(message)
    # Log the send
    log_entry = {
        'date': datetime.now().strftime("%Y-%m-%d"),
        'time': datetime.now().strftime("%H:%M:%S"),
        'to': "recipient@example.com"
    }
    logs = []
    if os.path.exists(log_file):
        with open(log_file, 'r') as f:
            logs = json.load(f)
    logs.append(log_entry)
    with open(log_file, 'w') as f:
        json.dump(logs, f, indent=2)
    print(f"Email sent. Daily count: {daily_count + 1}/500")
# Expected Output:
# Email sent. Daily count: 1/500
This script maintains a JSON log of sends and checks the count before sending. For production, a database is more robust, but a file works for simple scripts.
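As a rough illustration of the database option, the same bookkeeping can be done with the standard library's sqlite3 module. This is a sketch under assumptions: the sends table, its column names, and the helper functions are all hypothetical, and the demo uses an in-memory database with fixed dates so it runs without sending anything.

```python
import sqlite3
from datetime import datetime

DAILY_LIMIT = 500  # the standard-account limit quoted above


def open_log(path=":memory:"):
    """Open (or create) the send log. The schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sends ("
        "sent_on TEXT NOT NULL, sent_at TEXT NOT NULL, recipient TEXT NOT NULL)"
    )
    return conn


def sends_on(conn, day):
    """Count logged sends for a YYYY-MM-DD date string."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM sends WHERE sent_on = ?", (day,)
    ).fetchone()
    return count


def record_send(conn, recipient, now=None):
    """Log one send; call this only after send_message() succeeded."""
    now = now or datetime.now()
    conn.execute(
        "INSERT INTO sends VALUES (?, ?, ?)",
        (now.strftime("%Y-%m-%d"), now.strftime("%H:%M:%S"), recipient),
    )
    conn.commit()


def can_send(conn, day, limit=DAILY_LIMIT):
    return sends_on(conn, day) < limit


conn = open_log()
record_send(conn, "recipient@example.com", now=datetime(2026, 3, 29, 14, 0, 0))
print(sends_on(conn, "2026-03-29"), can_send(conn, "2026-03-29"))
```

Parameterized queries (the ? placeholders) keep recipient addresses from being interpreted as SQL.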
Real-Life Example: Automated Report Sender
Let’s combine all the concepts into a practical project: an automated script that generates a daily report and emails it to team members. This is a common pattern for data analysis, monitoring, and notifications.
Your daily report just landed in their inbox. No manual copy-paste required.
# daily_report_sender.py
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
import os
from datetime import datetime
class ReportSender:
    def __init__(self):
        self.sender_email = os.getenv('GMAIL_EMAIL')
        self.sender_password = os.getenv('GMAIL_PASSWORD')
        if not self.sender_email or not self.sender_password:
            raise ValueError("GMAIL_EMAIL and GMAIL_PASSWORD not set.")
    def generate_report(self):
        """Generate a sample daily report."""
        report_data = {
            'date': datetime.now().strftime("%Y-%m-%d"),
            'items_processed': 1250,
            'errors': 3,
            'success_rate': 99.76
        }
        return report_data
    def create_html_report(self, data):
        """Create HTML-formatted report."""
        html = f"""
        <html>
        <body style="font-family: Arial, sans-serif;">
        <h2>Daily Report - {data['date']}</h2>
        <table border="1" cellpadding="10">
        <tr>
        <td><strong>Items Processed</strong></td>
        <td>{data['items_processed']}</td>
        </tr>
        <tr>
        <td><strong>Errors</strong></td>
        <td>{data['errors']}</td>
        </tr>
        <tr>
        <td><strong>Success Rate</strong></td>
        <td>{data['success_rate']}%</td>
        </tr>
        </table>
        <p><em>Report generated by your Python automation script.</em></p>
        </body>
        </html>
        """
        return html
    def send_report(self, recipients, report_data):
        """Send the report to recipients."""
        try:
            message = MIMEMultipart('alternative')
            message['Subject'] = f"Daily Report - {report_data['date']}"
            message['From'] = self.sender_email
            message['To'] = ", ".join(recipients)
            # Create both plain text and HTML versions
            text_body = f"Daily Report: {report_data['items_processed']} items, {report_data['errors']} errors."
            html_body = self.create_html_report(report_data)
            message.attach(MIMEText(text_body, 'plain'))
            message.attach(MIMEText(html_body, 'html'))
            with smtplib.SMTP_SSL("smtp.gmail.com", 465, timeout=10) as server:
                server.login(self.sender_email, self.sender_password)
                server.send_message(message)
            return True, f"Report sent to {len(recipients)} recipients."
        except smtplib.SMTPAuthenticationError:
            return False, "Authentication failed. Check credentials."
        except smtplib.SMTPException as e:
            return False, f"SMTP error: {e}"
        except Exception as e:
            return False, f"Unexpected error: {e}"
# Main execution
if __name__ == "__main__":
    try:
        sender = ReportSender()
        report_data = sender.generate_report()
        recipients = ["alice@example.com", "bob@example.com"]
        success, message = sender.send_report(recipients, report_data)
        print(message)
    except ValueError as e:
        print(f"Setup error: {e}")
# Expected Output:
# Report sent to 2 recipients.
This example demonstrates several best practices: class-based organization separates concerns, the generate_report() method can be extended to pull real data, the create_html_report() method creates a professional-looking email, and error handling returns success/failure status. For production, you’d schedule this with cron (Unix/Linux), Task Scheduler (Windows), or a cloud scheduler (AWS EventBridge, Google Cloud Scheduler).
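One way to extend generate_report() is to derive the success rate from raw counts rather than hardcoding it. The sketch below assumes the counts come from somewhere else in your system (a database query, a log parser); build_report_data is a hypothetical helper, and it only reproduces the dictionary shape used by the class above.

```python
from datetime import date


def build_report_data(items_processed, errors, report_date):
    """Return a report dict in the same shape generate_report() produces."""
    if items_processed == 0:
        success_rate = 0.0  # avoid dividing by zero on an idle day
    else:
        success_rate = round(100 * (items_processed - errors) / items_processed, 2)
    return {
        'date': report_date.strftime("%Y-%m-%d"),
        'items_processed': items_processed,
        'errors': errors,
        'success_rate': success_rate,
    }


# The sample numbers from the article: 3 errors out of 1250 items.
sample_report = build_report_data(1250, 3, date(2026, 3, 29))
print(sample_report)
```

With the article's sample numbers this yields the same 99.76 success rate that the hardcoded dictionary contains.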
Alternative: Using the Gmail API with OAuth2
For production applications where your script sends email on behalf of users (not just from your own account), the Gmail API with OAuth2 is the right approach. It’s more complex than SMTP but offers better security, built-in analytics, and compliance with Google’s policies.
The difference: SMTP requires storing your password (or app password) in the script. The Gmail API uses OAuth2, where users grant permission through Google’s login flow, and you receive a token that expires. If the token is compromised, it only works for the specific permissions granted and only for a limited time.
Here’s the high-level flow: (1) Register your app in Google Cloud Console, (2) Configure OAuth2 credentials, (3) Direct users to Google’s login page where they grant permission, (4) Receive an access token, (5) Use the Gmail API (not SMTP) to send email on their behalf.
For detailed instructions, follow Google’s Gmail API sending guide. The google-auth-oauthlib and google-auth-httplib2 libraries handle the OAuth2 flow. SMTP is simpler for personal scripts and low-volume automation; the Gmail API is essential when you’re handling user accounts.
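The one piece of step (5) that can be shown without credentials is how the message itself is packaged. The Gmail API's send method takes the full RFC 2822 message, base64url-encoded, in a field named raw. The sketch below builds only that request body; build_gmail_api_body is a hypothetical helper, and the OAuth2 flow and the actual API call are deliberately left out.

```python
import base64
from email.mime.text import MIMEText


def build_gmail_api_body(sender, recipient, subject, body_text):
    """Build the JSON body for a Gmail API send request (no network calls)."""
    message = MIMEText(body_text)
    message["To"] = recipient
    message["From"] = sender
    message["Subject"] = subject
    # The API expects URL-safe base64, not the standard alphabet.
    raw = base64.urlsafe_b64encode(message.as_bytes()).decode("ascii")
    return {"raw": raw}


api_body = build_gmail_api_body(
    "me@example.com", "recipient@example.com", "Test", "Test email body."
)
decoded = base64.urlsafe_b64decode(api_body["raw"])
print(decoded.decode("ascii"))
```

The same email.mime objects used for SMTP are reused here; only the transport differs.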
Frequently Asked Questions
My app password isn’t working. What do I check first?
Most likely culprits: (1) Two-factor authentication isn’t enabled on your Gmail account yet — go to myaccount.google.com/security and enable it. (2) You copied the app password with extra spaces — the 16-character password is sensitive to trailing/leading whitespace. (3) You’re using your regular Gmail password instead of the app password — they’re different; always use the app password for scripts. (4) Your environment variables aren’t being loaded — verify print(os.getenv('GMAIL_PASSWORD')) returns the password, not None.
I hit Gmail’s 500-email limit. How do I recover?
Gmail applies the limit over a rolling 24-hour window rather than resetting at a fixed time of day, so wait about a day and you can send again. If you regularly need to send more than 500 emails per day, you need a transactional email service: SendGrid (100/month free, then $20+/month), Mailgun, AWS SES, or similar. These services are designed for bulk email and have much higher limits (thousands per day).
My HTML email renders differently in Gmail vs Outlook. Why?
Email clients have inconsistent CSS and HTML support. Gmail strips `
Why Logging Matters in Python
You’re debugging a production issue, but your application is silent. You added a few print() statements weeks ago, the messages got buried in the terminal, and now you have no idea what’s happening. Or worse: your app is logging to console, but the logs disappear the moment the process restarts. You need a way to capture what your application is doing: when it’s doing it, at what severity level, and where it should be recorded.
This is where Python’s built-in logging module becomes essential. Unlike print() statements, which are crude and destructive once you delete them, the logging module is a professional-grade system designed for production applications. It comes built-in to Python, requires no external dependencies, and provides granular control over message levels, formatting, and output destinations.
In this article, you’ll learn how to set up the logging module to output messages simultaneously to both your console (for immediate feedback during development) and to a file (for long-term record-keeping and debugging). We’ll cover logging levels, handlers, formatters, log rotation to prevent massive log files, and the patterns used in real multi-module projects. By the end, you’ll understand how to instrument your code with logging that developers trust.
How To Set Up Logging: Quick Example
Here’s a minimal example that outputs log messages to both console and file:
# quick_logging_example.py
import logging
# Create a logger
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
# File handler
file_handler = logging.FileHandler("app.log")
file_handler.setLevel(logging.DEBUG)
# Console handler
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)
# Formatter
formatter = logging.Formatter(
    "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
file_handler.setFormatter(formatter)
console_handler.setFormatter(formatter)
# Add handlers to logger
logger.addHandler(file_handler)
logger.addHandler(console_handler)
# Log some messages
logger.debug("Debug message (goes to file only)")
logger.info("Info message (goes to both)")
logger.warning("Warning message (goes to both)")
logger.error("Error message (goes to both)")
logger.critical("Critical message (goes to both)")
Output (to console):
2026-03-29 14:22:15,342 - __main__ - INFO - Info message (goes to both)
2026-03-29 14:22:15,343 - __main__ - WARNING - Warning message (goes to both)
2026-03-29 14:22:15,344 - __main__ - ERROR - Error message (goes to both)
2026-03-29 14:22:15,344 - __main__ - CRITICAL - Critical message (goes to both)
Output (written to app.log):
2026-03-29 14:22:15,341 - __main__ - DEBUG - Debug message (goes to file only)
2026-03-29 14:22:15,342 - __main__ - INFO - Info message (goes to both)
2026-03-29 14:22:15,343 - __main__ - WARNING - Warning message (goes to both)
2026-03-29 14:22:15,344 - __main__ - ERROR - Error message (goes to both)
2026-03-29 14:22:15,344 - __main__ - CRITICAL - Critical message (goes to both)
Notice the key pattern: we created a logger, attached two separate handlers (one for files, one for console), set different levels for each, and applied a formatter that includes timestamps and severity levels. This is the foundation for everything that follows. The sections below show you how to customize each piece.
Good logs are how you debug code you wrote six months ago and forgot about.
What is Python Logging and Why Use It?
The logging module is Python’s standard library tool for recording events that happen during program execution. Unlike print statements, logging provides:
Multiple outputs: send logs to files, console, email, syslog, or custom handlers simultaneously
Formatting control: include timestamps, function names, line numbers, and custom metadata
Filtering: selectively log messages based on logger name, level, or custom criteria
No side effects: unlike print, you can leave logging code in production without cluttering output
The alternative, using print() for debugging, breaks down immediately:
Aspect | print() Statements | logging Module
Disable in production | Must manually remove | Adjust level, keep code in place
Output destination | Always stdout | File, console, email, or custom
Timestamps | Manual string concatenation | Automatic, customizable format
Severity levels | None | DEBUG, INFO, WARNING, ERROR, CRITICAL
Performance | Always evaluates | Can be filtered; lazy evaluation
Multi-module coordination | No built-in support | Hierarchical logger names
The logging module is designed for exactly what you need: professional-grade event recording that stays in your code indefinitely.
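The "lazy evaluation" entry in the table deserves a quick demonstration. When you pass arguments separately (the %s style), the logging module only formats the message if the record is actually going to be emitted. The sketch below uses a hypothetical Expensive class that counts how often it is converted to a string; the logger name lazy_demo is likewise just for illustration.

```python
import logging


class Expensive:
    """Stand-in for an object whose string form is costly to build."""

    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return "expensive value"


lazy_logger = logging.getLogger("lazy_demo")
lazy_logger.setLevel(logging.INFO)        # DEBUG records are filtered out
lazy_logger.addHandler(logging.NullHandler())
lazy_logger.propagate = False

state = Expensive()

# Arguments passed separately: the record is dropped before formatting,
# so __str__ is never called.
lazy_logger.debug("State: %s", state)
calls_after_lazy = state.calls

# f-string: Python builds the string before logging even sees it.
lazy_logger.debug(f"State: {state}")
calls_after_fstring = state.calls

print(calls_after_lazy, calls_after_fstring)
```

This is why the %s style is usually recommended for log calls on hot paths, even though f-strings read more naturally.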
Understanding Logging Levels
Python’s logging module defines five standard severity levels, plus a catch-all NOTSET. Each level has a numeric value, and loggers will only record messages at or above their configured level:
Level | Numeric Value | When to Use | Example
DEBUG | 10 | Detailed diagnostic info for debugging | Variable values, function entry/exit, loop iterations
INFO | 20 | General informational messages | Application startup, config loaded, request received
WARNING | 30 | Something unexpected or potentially harmful | Deprecated API usage, missing optional config, retrying failed request
ERROR | 40 | A serious problem; some operation failed | File not found, API returned 500, database connection lost
CRITICAL | 50 | A very serious error; program may not continue | Out of memory, permissions denied, unrecoverable system error
When you set a logger’s level to INFO, it will log INFO, WARNING, ERROR, and CRITICAL messages, but not DEBUG messages. This is how you control verbosity.
# logging_levels_demo.py
import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
# Add a console handler so we can see output
handler = logging.StreamHandler()
handler.setLevel(logging.WARNING)
formatter = logging.Formatter("%(levelname)s - %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)
# These will NOT appear (level is below WARNING)
logger.debug("This is a debug message")
logger.info("This is an info message")
# These WILL appear
logger.warning("This is a warning message")
logger.error("This is an error message")
logger.critical("This is a critical message")
Output:
WARNING - This is a warning message
ERROR - This is an error message
CRITICAL - This is a critical message
Notice: the logger itself has one level (DEBUG), but the console handler has a different level (WARNING). You can filter messages at multiple levels: first at the logger, then at each handler. This is crucial for sending different messages to different outputs (e.g., all DEBUG messages to a debug log file, only ERROR+ to a critical alert file).
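The two filtering stages can be observed directly by pointing two handlers at in-memory streams. This is a small sketch, with the logger name level_demo and the two StringIO buffers standing in for a general log and an alert channel.

```python
import io
import logging

demo_logger = logging.getLogger("level_demo")
demo_logger.setLevel(logging.INFO)   # stage 1: the logger drops DEBUG
demo_logger.propagate = False

all_stream, alert_stream = io.StringIO(), io.StringIO()

all_handler = logging.StreamHandler(all_stream)      # no level set: accepts
                                                     # whatever the logger passes
alert_handler = logging.StreamHandler(alert_stream)
alert_handler.setLevel(logging.ERROR)                # stage 2: ERROR and above

for handler in (all_handler, alert_handler):
    handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
    demo_logger.addHandler(handler)

demo_logger.debug("cache miss")                  # stopped by the logger
demo_logger.info("request received")             # reaches all_handler only
demo_logger.error("database connection lost")    # reaches both handlers

all_lines = all_stream.getvalue().splitlines()
alert_lines = alert_stream.getvalue().splitlines()
print(all_lines)
print(alert_lines)
```

A record must pass the logger's level first; only then does each handler apply its own threshold.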
Handlers and Formatters: Controlling Where and How Logs Go
A logger is just a container. The actual work happens in handlers and formatters:
Handler: an output destination. FileHandler writes to a file, StreamHandler writes to console, etc.
Formatter: defines how log messages are formatted: which fields to include (timestamp, function name, etc.) and in what order
You create a handler, assign a formatter to it, set a level, and attach it to a logger. A single logger can have multiple handlers, each with different levels and formatters.
2026-03-29 14:25:30,123 - myapp - INFO - Application started
2026-03-29 14:25:30,124 - myapp - WARNING - This is a warning
2026-03-29 14:25:30,125 - myapp - ERROR - An error occurred
The %(asctime)s token automatically includes a timestamp. Other useful tokens include %(funcName)s (the function name), %(lineno)d (line number), and %(module)s (the module filename).
After running this, check your app.log file. All four messages will be there because the file handler’s level is DEBUG.
Output (written to app.log):
2026-03-29 14:27:01,456 - myapp - DEBUG - Debug: application starting
2026-03-29 14:27:01,457 - myapp - INFO - Info: loading configuration
2026-03-29 14:27:01,458 - myapp - WARNING - Warning: deprecated API used
2026-03-29 14:27:01,459 - myapp - ERROR - Error: failed to connect to database
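As a compact illustration of the handler and formatter workflow described in this section, the sketch below wires one handler with a format string that uses the %(module)s, %(funcName)s, and %(lineno)d tokens. It writes to an in-memory stream so it can run anywhere; the logger name handler_demo and the load_config function are made up for the example.

```python
import io
import logging

stream = io.StringIO()

# 1. Create a handler (the destination) and give it a level.
handler = logging.StreamHandler(stream)
handler.setLevel(logging.DEBUG)

# 2. Create a formatter (the layout) and assign it to the handler.
handler.setFormatter(
    logging.Formatter(
        "%(levelname)s %(module)s.%(funcName)s:%(lineno)d - %(message)s"
    )
)

# 3. Attach the handler to a logger.
app_logger = logging.getLogger("handler_demo")
app_logger.setLevel(logging.DEBUG)
app_logger.propagate = False
app_logger.addHandler(handler)


def load_config():
    app_logger.info("loading configuration")


load_config()
formatted = stream.getvalue().strip()
print(formatted)
```

The module name and line number in the result depend on the file you run this from, which is exactly what makes these tokens useful when tracing a message back to its source.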
Handlers are traffic directors: DEBUG takes the file fork, ERROR takes the console.
Logging to Console and File Simultaneously
The most common pattern in production is to send all logs to a file (for permanent record) and only show WARNING+ messages on the console (for immediate visibility during operation). Here’s how:
# console_and_file_logging.py
import logging
import os
# Create a logger
logger = logging.getLogger("myapp")
logger.setLevel(logging.DEBUG)
# Create log directory if it doesn't exist
log_dir = "logs"
os.makedirs(log_dir, exist_ok=True)
# File handler: captures all messages
file_handler = logging.FileHandler(os.path.join(log_dir, "app.log"))
file_handler.setLevel(logging.DEBUG)
# Console handler: shows only warnings and above
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.WARNING)
# Shared formatter for both handlers
formatter = logging.Formatter(
    "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
)
file_handler.setFormatter(formatter)
console_handler.setFormatter(formatter)
# Attach handlers to logger
logger.addHandler(file_handler)
logger.addHandler(console_handler)
# Log messages at different levels
logger.debug("Starting application initialization")
logger.info("Configuration loaded successfully")
logger.info("Database connection established")
logger.warning("API response time is higher than usual")
logger.error("Failed to write to cache, continuing without cache")
logger.critical("Memory usage exceeded safe threshold")
Output (to console):
2026-03-29 14:30:12 - myapp - WARNING - API response time is higher than usual
2026-03-29 14:30:12 - myapp - ERROR - Failed to write to cache, continuing without cache
2026-03-29 14:30:12 - myapp - CRITICAL - Memory usage exceeded safe threshold
Output (written to logs/app.log):
2026-03-29 14:30:12 - myapp - DEBUG - Starting application initialization
2026-03-29 14:30:12 - myapp - INFO - Configuration loaded successfully
2026-03-29 14:30:12 - myapp - INFO - Database connection established
2026-03-29 14:30:12 - myapp - WARNING - API response time is higher than usual
2026-03-29 14:30:12 - myapp - ERROR - Failed to write to cache, continuing without cache
2026-03-29 14:30:12 - myapp - CRITICAL - Memory usage exceeded safe threshold
This pattern is powerful: you get a permanent record of everything (including debug messages developers need when troubleshooting), but the console stays clean during normal operation, showing only problems that need immediate attention. When a warning or error occurs, developers see it right away.
Custom Log Formatting with Timestamps and Metadata
The formatter string controls what information appears in each log message. The most useful format tokens are:
Token | Meaning | Example
%(asctime)s | Timestamp (human-readable) | 2026-03-29 14:30:12,456
%(name)s | Logger name | myapp.database
%(levelname)s | Severity level | INFO, WARNING, ERROR
%(message)s | The actual log message | Database query completed
%(funcName)s | Name of function that logged | connect_to_db
%(filename)s | Source filename | database.py
%(lineno)d | Line number in source | 42
%(module)s | Module name | database
%(process)d | Process ID | 12345
%(thread)d | Thread ID | 140256789012345
Here are some practical format examples:
# formatting_examples.py
import logging
logger = logging.getLogger("myapp")
logger.setLevel(logging.DEBUG)
# Example 1: Detailed format with function and line number
handler1 = logging.StreamHandler()
formatter1 = logging.Formatter(
    "%(asctime)s [%(levelname)s] %(funcName)s:%(lineno)d - %(message)s"
)
handler1.setFormatter(formatter1)
# Example 2: Compact format (good for production)
handler2_formatter = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
# Example 3: Include module name (useful in multi-file projects)
handler3_formatter = (
    "[%(asctime)s] %(module)s - %(levelname)s - %(message)s"
)
# Example 4: ISO 8601-style timestamp (append %z to datefmt to include the UTC offset)
handler4 = logging.StreamHandler()
formatter4 = logging.Formatter(
    "%(asctime)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S"
)
handler4.setFormatter(formatter4)
# Only handler1 is attached, so the output uses the Example 1 format
logger.addHandler(handler1)
def process_payment(user_id):
    logger.info(f"Processing payment for user {user_id}")
    logger.debug("Validating card information")
    logger.info("Payment submitted to processor")
    return True
process_payment(12345)
Output (Example 1 format):
2026-03-29 14:32:45,123 [INFO] process_payment:55 - Processing payment for user 12345
2026-03-29 14:32:45,124 [DEBUG] process_payment:56 - Validating card information
2026-03-29 14:32:45,125 [INFO] process_payment:57 - Payment submitted to processor
The detailed format is invaluable when debugging: you know exactly which function logged the message and on which line. For production systems receiving hundreds of requests per second, the compact format reduces file size while keeping essential information.
Every log message is a snapshot in time. Good formatting makes the snapshot useful.
Preventing Massive Log Files with RotatingFileHandler
If your application runs 24/7, a single FileHandler will eventually create a multi-gigabyte log file. The solution is RotatingFileHandler, which automatically archives old log files and starts a new one when the current file reaches a size limit.
# rotating_file_handler_example.py
import logging
from logging.handlers import RotatingFileHandler
import os
logger = logging.getLogger("myapp")
logger.setLevel(logging.DEBUG)
# Create log directory
log_dir = "logs"
os.makedirs(log_dir, exist_ok=True)
# RotatingFileHandler: max 1 MB per file, keep 5 backups
rotating_handler = RotatingFileHandler(
    filename=os.path.join(log_dir, "app.log"),
    maxBytes=1024 * 1024,  # 1 MB
    backupCount=5  # Keep app.log.1, app.log.2, ..., app.log.5
)
rotating_handler.setLevel(logging.DEBUG)
formatter = logging.Formatter(
    "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
rotating_handler.setFormatter(formatter)
logger.addHandler(rotating_handler)
# Simulate some logging activity
for i in range(100):
    logger.info(f"Processing item {i}: " + "X" * 100)  # Verbose message
When app.log reaches 1 MB, the handler automatically renames it to app.log.1, creates a fresh app.log, and continues logging. After 5 rotations, the oldest file is deleted. This keeps your disk usage bounded while preserving recent log history.
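To actually watch a rollover happen without writing a megabyte of logs, you can shrink the limits. The sketch below is an illustration only: it uses a deliberately tiny maxBytes of 200, a temporary directory, and a logger named rotation_demo, none of which you would use in production.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "app.log")
    # Tiny limits so rotation triggers after a handful of messages.
    handler = RotatingFileHandler(log_path, maxBytes=200, backupCount=2)
    handler.setFormatter(logging.Formatter("%(message)s"))

    rot_logger = logging.getLogger("rotation_demo")
    rot_logger.setLevel(logging.INFO)
    rot_logger.propagate = False
    rot_logger.addHandler(handler)

    for i in range(50):
        rot_logger.info("Processing item %d", i)

    # Close the handler before the temporary directory is cleaned up.
    rot_logger.removeHandler(handler)
    handler.close()

    rotated_files = sorted(os.listdir(tmp_dir))
    largest = max(
        os.path.getsize(os.path.join(tmp_dir, name)) for name in rotated_files
    )

print(rotated_files, largest)
```

Even though far more than 600 bytes were logged, only the current file and two backups remain, and none of them exceeds the size limit.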
For time-based rotation (e.g., “create a new log file each day”), use TimedRotatingFileHandler:
# timed_rotating_file_handler_example.py
import logging
from logging.handlers import TimedRotatingFileHandler
import os
logger = logging.getLogger("myapp")
logger.setLevel(logging.DEBUG)
log_dir = "logs"
os.makedirs(log_dir, exist_ok=True)
# Create a new log file every day at midnight
timed_handler = TimedRotatingFileHandler(
    filename=os.path.join(log_dir, "app.log"),
    when="midnight",  # Rotate at midnight
    interval=1,  # Every 1 day
    backupCount=7  # Keep 7 days of logs
)
timed_handler.setLevel(logging.DEBUG)
formatter = logging.Formatter(
    "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
timed_handler.setFormatter(formatter)
logger.addHandler(timed_handler)
logger.info("Daily log rotation is configured")
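The when parameter accepts values like "midnight" (daily, at midnight), "W0" (weekly, on Monday), "H" (hourly), etc. Rotated files get a date-based suffix appended to the filename instead of a number. This is the preferred approach for long-running services where you want to correlate logs with calendar time.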
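To see how different when values translate into rotation periods and file suffixes, you can inspect a handler after constructing it. A caveat on this sketch: interval and suffix are attributes that CPython's TimedRotatingFileHandler sets internally, so treat them as an inspection aid rather than a documented interface. The describe_rotation helper is hypothetical, and delay=True is used so no log file is opened or created.

```python
from logging.handlers import TimedRotatingFileHandler


def describe_rotation(when):
    """Return (seconds between rollovers, strftime suffix) for a when value."""
    # delay=True postpones opening the file, so nothing is written to disk.
    handler = TimedRotatingFileHandler("app.log", when=when, delay=True)
    try:
        return handler.interval, handler.suffix
    finally:
        handler.close()


rotation_table = {
    when: describe_rotation(when) for when in ("midnight", "H", "W0")
}
for when, (seconds, suffix) in rotation_table.items():
    print(f"{when}: every {seconds} s, suffix {suffix}")
```

A rotated daily file therefore ends up with a name like app.log followed by a dot and the date, which makes it easy to find the log for a given day.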
Web scrapers talk a lot. Good logging turns chatter into intelligence.
Real-World Pattern: Logging in Multi-Module Projects
Most projects have multiple modules. The best practice is to:
Configure logging once in your main module (or in a centralized config module)
In each module, create a logger with logging.getLogger(__name__)
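The two steps above can be sketched in a single file. This is an assumption-laden illustration: in a real project the two getLogger calls would live in separate modules (where __name__ would evaluate to names like myapp.database), and configure_logging is a hypothetical function you would call once from your entry point. An in-memory stream replaces the console so the result can be inspected.

```python
import io
import logging


def configure_logging(stream):
    """Configure the top-level 'myapp' logger once, at startup."""
    app_logger = logging.getLogger("myapp")
    app_logger.setLevel(logging.DEBUG)
    app_logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(name)s - %(levelname)s - %(message)s")
    )
    app_logger.addHandler(handler)
    return app_logger


# In myapp/database.py this would be logging.getLogger(__name__).
db_logger = logging.getLogger("myapp.database")
# In myapp/api.py, likewise.
api_logger = logging.getLogger("myapp.api")

buffer = io.StringIO()
configure_logging(buffer)

# Neither child logger has handlers of its own; records propagate up
# to "myapp" because of the dotted names.
db_logger.info("connection established")
api_logger.warning("slow response")

lines = buffer.getvalue().splitlines()
print(lines)
```

The modules never touch handlers or formatters, so changing the logging setup later means editing one function.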
Every developer has experienced the monotony of repetitive tasks: renaming thousands of files, backing up project folders on schedule, generating weekly reports, or scanning for files that need processing. These are the moments when you wish a robot would just handle it while you focus on actual coding. The good news? Python makes this incredibly straightforward, and you already have everything you need in the standard library.
Python was designed with automation in mind. Libraries like os, shutil, pathlib, and smtplib give you powerful tools to interact with the file system, schedule tasks, and send notifications. You don’t need to learn complex shell scripts or invest in expensive automation software. A few lines of Python can save you hours of manual work.
In this guide, we’ll explore practical automation patterns starting with file operations and building toward a real-world automated backup system. By the end, you’ll have a toolkit for automating any repetitive task in your workflow.
Quick Example: Rename Files in Bulk
Before diving deep, let’s see automation in action. Imagine you have 500 image files named like IMG_0001.jpg, IMG_0002.jpg, and you want to prefix them with today’s date. Without automation, this takes hours. With Python, it takes seconds:
# bulk_rename.py
import os
import datetime
directory = "./photos"
prefix = datetime.date.today().strftime("%Y%m%d_")
for filename in os.listdir(directory):
    if filename.endswith(".jpg"):
        old_path = os.path.join(directory, filename)
        new_filename = prefix + filename
        new_path = os.path.join(directory, new_filename)
        os.rename(old_path, new_path)
        print(f"Renamed: {filename} -> {new_filename}")
That script runs instantly and accomplishes what would take manual clicking for hours. This is the power of automation.
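One thing to watch with a script like this is that running it twice on the same day adds the prefix twice. A possible variant, sketched below with pathlib, skips files that already carry the prefix and offers a dry-run mode. The add_date_prefix function and its parameters are hypothetical, and the demo runs against a temporary directory rather than ./photos.

```python
import datetime
import tempfile
from pathlib import Path


def add_date_prefix(directory, today=None, dry_run=False):
    """Prefix *.jpg files with YYYYMMDD_, skipping files already prefixed."""
    prefix = (today or datetime.date.today()).strftime("%Y%m%d_")
    renamed = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(directory).glob("*.jpg")):
        if path.name.startswith(prefix):
            continue  # already handled on an earlier run
        target = path.with_name(prefix + path.name)
        if not dry_run:
            path.rename(target)
        renamed.append((path.name, target.name))
    return renamed


with tempfile.TemporaryDirectory() as tmp:
    for name in ("IMG_0001.jpg", "IMG_0002.jpg", "notes.txt"):
        (Path(tmp) / name).touch()
    day = datetime.date(2026, 3, 29)
    first_run = add_date_prefix(tmp, today=day)
    second_run = add_date_prefix(tmp, today=day)   # nothing left to do
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(first_run)
print(second_run)
```

Passing dry_run=True returns the planned renames without touching the files, which is a cheap safety check before a bulk operation.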
Python automation: where boredom goes to die.
Why Automate with Python?
You might be wondering: why Python instead of shell scripts, scheduled tasks, or other tools? The answer is clarity, portability, and power. Here’s how they compare:
Task Aspect | Manual Process | Shell Script | Python Script
Development Time | Hours per occurrence | 30-60 minutes | 15-30 minutes
Readability | N/A | Cryptic syntax | Human-readable code
Cross-Platform | N/A | Linux/Mac only | Windows, Mac, Linux
Debugging | N/A | Difficult | Easy with proper logging
Email Integration | Manual setup | Complex | Built-in libraries
Maintainability | N/A | Hard to modify | Easy to extend and modify
Python wins for most automation tasks because it balances simplicity with power. You can read Python code six months later and understand what it does, and you can add new features without rewriting everything.
Working with Files and Directories
Using os and pathlib Modules
Python provides two ways to work with file paths and directories: the older os module and the modern pathlib module. pathlib is more intuitive and handles cross-platform differences automatically, but os is still widely used. Let’s explore both:
# file_operations.py
import os
from pathlib import Path
# Using os module
print("Using os module:")
current_dir = os.getcwd()
print(f"Current directory: {current_dir}")
# List files
for item in os.listdir("."):
    if os.path.isfile(item):
        print(f"File: {item}")
# Using pathlib (modern approach)
print("\nUsing pathlib:")
current_path = Path(".")
for item in current_path.iterdir():
    if item.is_file():
        print(f"File: {item.name}")
        print(f"Size: {item.stat().st_size} bytes")
        print(f"Extension: {item.suffix}")
Output:
Using os module:
Current directory: /home/user/projects
File: script.py
File: data.csv
Using pathlib:
File: script.py
Size: 1245 bytes
Extension: .py
File: data.csv
Size: 5678 bytes
Extension: .csv
pathlib.Path is generally preferred because it’s more readable and handles path separators automatically (backslash on Windows, forward slash on Unix). However, both work fine depending on your preference and existing codebase.
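The separator handling can be seen without switching operating systems by using pathlib's pure path classes, which only manipulate path strings and never touch the disk. The directory and file names below are arbitrary examples.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same "/" operator builds the path on either flavour.
posix_path = PurePosixPath("logs") / "2026" / "app.log"
windows_path = PureWindowsPath("logs") / "2026" / "app.log"

print(posix_path)     # forward slashes
print(windows_path)   # backslashes

# Path components are exposed as attributes instead of string slicing.
print(posix_path.suffix, posix_path.stem, windows_path.parent.name)
```

With plain Path, Python picks the right flavour for the machine the script runs on, so the joining code stays identical across platforms.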
Renaming and Organizing Files
One of the most common automation tasks is organizing files by type, date, or naming convention. The shutil module and os.rename() make this simple:
# organize_files.py
import os
import shutil
from pathlib import Path
download_dir = "./downloads"
# Create subdirectories if they don't exist (parents=True also creates download_dir)
for category in ["Images", "Documents", "Archives", "Other"]:
    Path(download_dir, category).mkdir(parents=True, exist_ok=True)
# Organize files by extension
for filename in os.listdir(download_dir):
    if filename.startswith("."):
        continue
    filepath = os.path.join(download_dir, filename)
    if not os.path.isfile(filepath):
        continue
    # Determine category based on extension
    ext = os.path.splitext(filename)[1].lower()
    if ext in [".jpg", ".png", ".gif", ".webp"]:
        category = "Images"
    elif ext in [".pdf", ".doc", ".docx", ".txt"]:
        category = "Documents"
    elif ext in [".zip", ".rar", ".7z"]:
        category = "Archives"
    else:
        category = "Other"
    # Move file to appropriate directory
    dest_path = os.path.join(download_dir, category, filename)
    shutil.move(filepath, dest_path)
    print(f"Moved {filename} to {category}/")
Output:
Moved vacation.jpg to Images/
Moved resume.pdf to Documents/
Moved backup.zip to Archives/
Moved config.txt to Documents/
This script is the foundation of smart file organization. In a real system, you’d add error handling, logging, and checks to avoid overwriting files. The Path.mkdir(exist_ok=True) pattern creates each directory only if it is missing and raises no error when it already exists. Note that the downloads folder itself must already exist, because parents=True is not passed.
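One way to add the overwrite check is to pick a free file name before moving. The sketch below is only an illustration: unique_destination and safe_move are hypothetical helper names, and the check-then-move sequence is not safe against another process creating the same name in between.

```python
import shutil
from pathlib import Path


def unique_destination(dest: Path) -> Path:
    """Return dest, or dest with a numeric suffix if that name is taken."""
    if not dest.exists():
        return dest
    counter = 1
    while True:
        candidate = dest.with_name(f"{dest.stem}_{counter}{dest.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1


def safe_move(src: Path, dest_dir: Path) -> Path:
    """Move src into dest_dir without overwriting an existing file."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = unique_destination(dest_dir / src.name)
    shutil.move(str(src), str(target))
    return target
```

With this in place, moving a second resume.pdf into Documents would produce resume_1.pdf instead of replacing the first one.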
Watching for File Changes with watchdog
Sometimes you need to react the moment a file appears or changes. The watchdog library monitors file system events in real-time. First, install it:
pip install watchdog
Now create a file watcher that triggers actions when new files appear:
# watch_folder.py
import time
from pathlib import Path
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
class FileProcessor(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory:
            filename = Path(event.src_path).name
            print(f"New file detected: {filename}")
            print(f"Full path: {event.src_path}")
    def on_modified(self, event):
        if not event.is_directory:
            filename = Path(event.src_path).name
            print(f"File modified: {filename}")
# Watch the current directory
observer = Observer()
observer.schedule(FileProcessor(), path=".", recursive=False)
observer.start()
print("Watching for file changes. Press Ctrl+C to stop.")
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()
Output (after creating/modifying files):
Watching for file changes. Press Ctrl+C to stop.
New file detected: report.pdf
Full path: ./report.pdf
File modified: report.pdf
The watchdog library is perfect for implementing “drop a file to process it” workflows, such as converting documents, generating thumbnails, or triggering CI/CD pipelines.
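One practical wrinkle with drop-folder workflows is that a "created" event can arrive while the file is still being written. A common heuristic, sketched here with only the standard library, is to wait until the file size stops changing before processing it. The function name wait_until_stable and its default values are assumptions for illustration, not part of watchdog.

```python
import time
from pathlib import Path


def wait_until_stable(path, checks=3, interval=0.5, timeout=30.0):
    """Return True once the file size is unchanged for `checks` polls in a row."""
    path = Path(path)
    deadline = time.monotonic() + timeout
    last_size = -1
    stable_count = 0
    while time.monotonic() < deadline:
        try:
            size = path.stat().st_size
        except FileNotFoundError:
            # The file may have been removed or renamed mid-write.
            size = -1
        if size >= 0 and size == last_size:
            stable_count += 1
            if stable_count >= checks:
                return True
        else:
            stable_count = 0
        last_size = size
        time.sleep(interval)
    return False
```

Inside on_created you could call wait_until_stable(event.src_path) and start processing only when it returns True. This is a heuristic and will not catch every case, such as a writer that pauses for longer than the polling window.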
Scheduling Tasks with the schedule Library
Many automation tasks need to run at specific times or intervals: daily backups, hourly data syncs, or weekly reports. The schedule library makes this elegant:
pip install schedule
Here’s how to create a task scheduler:
# task_scheduler.py
import schedule
import time
from datetime import datetime
def backup_database():
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f"[{timestamp}] Running database backup...")
    # Actual backup logic here
def clean_temp_files():
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f"[{timestamp}] Cleaning temporary files...")
    # Actual cleanup logic here
def generate_report():
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f"[{timestamp}] Generating daily report...")
    # Actual report generation here
# Schedule tasks
schedule.every().day.at("02:00").do(backup_database)
schedule.every().hour.do(clean_temp_files)
schedule.every().monday.at("09:00").do(generate_report)
# Keep scheduler running
print("Scheduler started. Tasks will run according to schedule.")
while True:
    schedule.run_pending()
    time.sleep(60)  # Check every minute
Output (sample execution):
Scheduler started. Tasks will run according to schedule.
[2026-03-29 02:00:12] Running database backup...
[2026-03-29 03:00:05] Cleaning temporary files...
[2026-03-30 09:00:00] Generating daily report...
The schedule library is straightforward but doesn’t persist across system restarts. For production systems, consider using cron (Linux/Mac) or Task Scheduler (Windows) to run your Python script, or use a more robust library like APScheduler.
Sending Email Notifications with smtplib
Automating tasks is great, but you need to know when something fails or completes. Python’s built-in smtplib module sends email notifications:
# send_email.py
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
def send_notification(recipient, subject, body):
    sender_email = "automation@example.com"
    sender_password = "your_app_password_here"
    # Create message
    message = MIMEMultipart()
    message["From"] = sender_email
    message["To"] = recipient
    message["Subject"] = subject
    message.attach(MIMEText(body, "plain"))
    # Send email
    try:
        with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
            server.login(sender_email, sender_password)
            server.send_message(message)
        print(f"Email sent to {recipient}")
    except Exception as e:
        print(f"Error sending email: {e}")
# Usage
send_notification(
    "admin@example.com",
    "Backup Complete",
    "Daily backup completed successfully at 2026-03-29 02:15:30."
)
Output:
Email sent to admin@example.com
Important: Never hardcode passwords in scripts. Use environment variables or a configuration file outside version control. For Gmail, generate an “App Password” in your account settings rather than using your actual password.
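As a rough sketch of the environment-variable approach, the helper below reads the sender address and password at run time and fails with a clear message when either is missing. The variable name EMAIL_SENDER and both function names are assumptions chosen for this example. The message is built with the standard library's EmailMessage class, so nothing here touches the network.

```python
import os
from email.message import EmailMessage


def load_smtp_credentials(env=os.environ):
    """Read the sender address and password from environment variables."""
    # EMAIL_SENDER and EMAIL_PASSWORD are assumed names; pick your own.
    sender = env.get("EMAIL_SENDER")
    password = env.get("EMAIL_PASSWORD")
    if not sender or not password:
        raise RuntimeError("Set EMAIL_SENDER and EMAIL_PASSWORD before running.")
    return sender, password


def build_message(sender, recipient, subject, body):
    """Build a plain-text email without sending it."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message
```

The returned message can be passed to server.send_message() exactly as in the script above, with the credentials from load_smtp_credentials() used for server.login().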
Working with CSV and Excel Files for Reports
Automated reporting is a huge time-saver. Python handles CSV files natively and can create Excel files with the openpyxl library:
# generate_report.py
import csv
from datetime import datetime
# Sample data (from database or API in real scenario)
sales_data = [
    {"date": "2026-03-29", "product": "Widget A", "sales": 150},
    {"date": "2026-03-29", "product": "Widget B", "sales": 200},
    {"date": "2026-03-29", "product": "Widget C", "sales": 175},
]
# Generate CSV report
report_date = datetime.now().strftime("%Y%m%d")
report_filename = f"sales_report_{report_date}.csv"
with open(report_filename, "w", newline="") as csvfile:
    fieldnames = ["date", "product", "sales"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(sales_data)
print(f"Report generated: {report_filename}")
For more complex reports with formatting, install openpyxl: pip install openpyxl. This lets you create Excel files with colors, formulas, and multiple sheets.
Running System Commands with subprocess
Sometimes you need to call external programs from Python. The subprocess module handles this safely:
# run_commands.py
import subprocess
import sys
# Run a simple command (sys.executable is the interpreter running this script)
result = subprocess.run([sys.executable, "--version"], capture_output=True, text=True)
print(f"Python version: {result.stdout.strip()}")
# Run a command and capture output
result = subprocess.run(["ls", "-la"], capture_output=True, text=True)
print("Directory listing:")
print(result.stdout)
# Check if command succeeded (exit code 0 means git ran without error,
# not that the working tree is clean)
result = subprocess.run(["git", "status"], capture_output=True)
if result.returncode == 0:
    print("Inside a git repository")
else:
    print("Not a git repository or git error")
Output (Linux/Mac):
Python version: Python 3.10.6
Directory listing:
total 48
drwxr-xr-x 5 user user 4096 Mar 29 10:15 .
drwxr-xr-x 8 user user 4096 Mar 29 09:00 ..
-rw-r--r-- 1 user user 1245 Mar 29 10:12 script.py
Inside a git repository
Use capture_output=True to collect program output and text=True to get strings instead of bytes. Always check the return code to verify success.
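If you would rather have failures raise an exception than check the return code by hand, subprocess.run accepts check=True, and a timeout guards against commands that hang. The wrapper below is a minimal sketch. run_command is a hypothetical helper name, and the demo commands call the current interpreter so they behave the same on every platform.

```python
import subprocess
import sys


def run_command(args, timeout=10):
    """Run a command and return its stdout, raising on failure or timeout."""
    completed = subprocess.run(
        args,
        capture_output=True,
        text=True,
        check=True,       # raise CalledProcessError on a non-zero exit code
        timeout=timeout,  # raise TimeoutExpired if the command runs too long
    )
    return completed.stdout.strip()


# A command that succeeds.
print(run_command([sys.executable, "-c", "print('hello')"]))

# A command that exits with a non-zero code.
try:
    run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
except subprocess.CalledProcessError as exc:
    print(f"Command failed with exit code {exc.returncode}")
```

Passing the command as a list rather than a single string avoids shell interpretation of the arguments.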
Real-Life Example: Automated Backup System
Now let’s build a complete backup system that runs on a schedule and creates timestamped ZIP archives of a directory. This example combines several of the techniques covered above:
# backup_system.py
import os
import zipfile
import smtplib
import schedule
import time
from pathlib import Path
from datetime import datetime
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
class BackupManager:
    def __init__(self, source_dir, backup_dir, email_to):
        self.source_dir = source_dir
        self.backup_dir = backup_dir
        self.email_to = email_to
        Path(backup_dir).mkdir(exist_ok=True)
    def create_backup(self):
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        backup_filename = f"backup_{timestamp}.zip"
        backup_path = os.path.join(self.backup_dir, backup_filename)
        try:
            with zipfile.ZipFile(backup_path, "w", zipfile.ZIP_DEFLATED) as zipf:
                for root, dirs, files in os.walk(self.source_dir):
                    for file in files:
                        file_path = os.path.join(root, file)
                        # Store paths relative to the source directory
                        arcname = os.path.relpath(file_path, self.source_dir)
                        zipf.write(file_path, arcname)
            file_size = os.path.getsize(backup_path) / (1024 * 1024)
            print(f"Backup created: {backup_filename} ({file_size:.2f} MB)")
            self.send_notification(
                "Backup Success",
                f"Backup created successfully: {backup_filename}\nSize: {file_size:.2f} MB"
            )
            # Cleanup old backups (keep last 7)
            self.cleanup_old_backups()
        except Exception as e:
            print(f"Backup failed: {e}")
            self.send_notification("Backup Failed", f"Error: {str(e)}")
    def cleanup_old_backups(self):
        # Timestamped names sort chronologically, so the oldest come first
        backups = sorted(Path(self.backup_dir).glob("backup_*.zip"))
        if len(backups) > 7:
            for old_backup in backups[:-7]:
                old_backup.unlink()
                print(f"Deleted old backup: {old_backup.name}")
    def send_notification(self, subject, body):
        sender_email = "backup@example.com"
        # Read the password from the environment instead of hardcoding it
        sender_password = os.getenv("EMAIL_PASSWORD")
        try:
            message = MIMEMultipart()
            message["From"] = sender_email
            message["To"] = self.email_to
            message["Subject"] = subject
            message.attach(MIMEText(body, "plain"))
            with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
                server.login(sender_email, sender_password)
                server.send_message(message)
        except Exception as e:
            print(f"Could not send email: {e}")
# Setup and run
if __name__ == "__main__":
    manager = BackupManager(
        source_dir="./important_files",
        backup_dir="./backups",
        email_to="admin@example.com"
    )
    # Schedule daily backups at 2 AM
    schedule.every().day.at("02:00").do(manager.create_backup)
    print("Backup system started. Waiting for scheduled time...")
    while True:
        schedule.run_pending()
        time.sleep(60)
Output (sample):
Backup system started. Waiting for scheduled time...
Backup created: backup_20260329_020015.zip (45.32 MB)
Deleted old backup: backup_20260322_020012.zip
This system handles the full lifecycle: creating backups, managing disk space, and notifying you of success or failure. In production, you’d run this as a background service using systemd (Linux), launchd (Mac), or Task Scheduler (Windows).
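A backup is only useful if it can be read back, so one possible extension is an integrity check after each archive is written. The sketch below uses the standard library's zipfile.is_zipfile and ZipFile.testzip. verify_backup is a hypothetical helper and is not part of the BackupManager class above.

```python
import zipfile
from pathlib import Path


def verify_backup(zip_path):
    """Return True if the file is a ZIP archive and every member passes its CRC check."""
    zip_path = Path(zip_path)
    if not zipfile.is_zipfile(zip_path):
        return False
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        return archive.testzip() is None
```

You could call verify_backup(backup_path) right after the archive is closed and send the failure notification when it returns False.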
Frequently Asked Questions
How do I run a Python script in the background?
Linux/Mac: Use nohup to ignore hangup signals: nohup python backup_system.py &. Or use screen or tmux for interactive backgrounds. Better: use cron to schedule it properly.
Windows: Use Task Scheduler to run the script with python.exe. Create a task that runs at startup or on a schedule without showing a window.
Should I add error handling to automation scripts?
Absolutely. Always wrap file operations in try-except blocks. Log errors to a file so you can debug later. For critical tasks, send notifications on failure. Here’s a pattern:
try:
    # Your automation code
    do_something()
except Exception as e:
    logger.error(f"Task failed: {e}")
    send_alert_email(f"Error: {e}")
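The pattern above leaves logger and send_alert_email undefined. A self-contained version might look like the following sketch, where run_safely is a hypothetical helper and the alert function is passed in as a callback, so an email sender or any other notifier can be plugged in.

```python
import logging

logger = logging.getLogger("automation")


def run_safely(task, on_failure=None):
    """Run task(); log and report any exception. Returns True on success."""
    try:
        task()
        return True
    except Exception as exc:
        # logger.exception records the message plus the full traceback.
        logger.exception("Task failed: %s", exc)
        if on_failure is not None:
            on_failure(f"Error: {exc}")
        return False


# Demo: a task that always fails, with alerts collected in a list.
alerts = []
ok = run_safely(lambda: 1 / 0, on_failure=alerts.append)
print(ok, alerts)  # False ['Error: division by zero']
```

To write the log to a file rather than the console, call logging.basicConfig(filename="automation.log", level=logging.INFO) once at startup; the file name here is a placeholder.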
Is it safe to put passwords in automation scripts?
No. Use environment variables, config files outside version control, or credential managers. For email, use app-specific passwords instead of your real password. Never commit secrets to GitHub.
import os
password = os.getenv("EMAIL_PASSWORD") # Load from environment
How do I write automation that works on Windows, Mac, and Linux?
Use pathlib.Path instead of string path concatenation; it handles separators automatically. Use subprocess carefully since some commands differ between platforms. Test on all platforms or use Docker for consistency.
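As a small illustration of both points, the sketch below checks whether an external tool exists before calling it, and lists files with pathlib instead of shelling out to a platform-specific command such as ls. Both function names are hypothetical.

```python
import shutil
from pathlib import Path


def find_tool(name):
    """Return the full path to an executable on PATH, or None if it is missing."""
    return shutil.which(name)


def list_files(directory="."):
    """Portable replacement for shelling out to `ls` or `dir`."""
    return sorted(p.name for p in Path(directory).iterdir() if p.is_file())


if find_tool("git") is None:
    print("git is not installed; skipping repository checks")
```

Checking with shutil.which first lets a script skip or report a missing dependency instead of crashing when subprocess cannot find the program.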
What if the user’s system doesn’t have the libraries I need?
Create a requirements.txt file listing dependencies, then users can install them with pip install -r requirements.txt. For standalone scripts, use PyInstaller to bundle Python and libraries into a single executable.
Conclusion
Python automation transforms tedious manual tasks into reliable, repeatable processes. You’ve learned to work with files and directories using os and pathlib, schedule tasks with the schedule library, send email notifications via smtplib, and build complete systems like automated backups. The key is starting simple: automate your most painful task first, then gradually expand your automation toolkit.
For deeper learning, explore the official documentation: os module, pathlib, shutil, and smtplib are all built-in. For external libraries, check schedule and watchdog on PyPI. The automation possibilities are endless once you see Python as your personal robot assistant.