Beginner
CSV (Comma-Separated Values) files are the most universal format for tabular data. Excel exports CSV. Databases export CSV. Every data analytics tool imports CSV. When a colleague sends you “the data,” there’s a good chance it’s a .csv file. If you work with spreadsheets, databases, or any form of tabular data, you’ll read and write CSV files all the time.
Python’s built-in csv module handles CSV reading and writing cleanly. It manages quoting, delimiters, line endings, and encoding edge cases that would break a naive split(',') approach — like fields that contain commas inside quotes, or newlines inside values. Just import csv and you’re ready.
In this article we’ll cover reading with csv.reader and csv.DictReader, writing with csv.writer and csv.DictWriter, handling different delimiters and encodings, dealing with real-world CSV quirks, filtering and transforming CSV data, and a complete sales report generator as a practical example. By the end, you’ll be fluent with CSV handling in Python for both simple and complex files.
Reading CSV in Python: Quick Example
Let’s read a CSV file and print its contents in just a few lines of code:
# quick_csv.py
import csv

# Create a sample CSV file to read
with open('people.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Age', 'City'])
    writer.writerow(['Alice', '30', 'Sydney'])
    writer.writerow(['Bob', '25', 'Melbourne'])
    writer.writerow(['Charlie', '35', 'Brisbane'])

# Read and print it
with open('people.csv', 'r', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
Output:
['Name', 'Age', 'City']
['Alice', '30', 'Sydney']
['Bob', '25', 'Melbourne']
['Charlie', '35', 'Brisbane']
Note the newline='' argument when opening files for writing with the csv module: it prevents extra blank lines between rows on Windows, and the documentation recommends it on every platform. csv.reader returns each row as a list of strings. We’ll look at csv.DictReader shortly, which gives you named fields as dicts instead of positional lists.
What Is CSV and Why Use Python’s csv Module?
A CSV file stores tabular data as plain text with values separated by a delimiter (usually a comma, but sometimes a tab, semicolon, or pipe). The first row is typically a header row with column names. While it looks simple, CSV has many edge cases that a naive line.split(',') approach gets wrong.
| CSV Edge Case | Example | What Breaks split(',') |
|---|---|---|
| Field contains comma | "Smith, John",30 | Splits in the wrong place |
| Field contains newline | "line1\nline2",30 | Breaks row detection |
| Field contains quotes | "say ""hello""",30 | Escaping ignored |
| Tab-separated values | Alice\t30\tSydney | Delimiter mismatch |
| Different encodings | Accented chars in latin-1 | UnicodeDecodeError |
Python’s csv module handles all of these correctly. Its default 'excel' dialect closely follows RFC 4180 (the CSV standard), and it lets you configure delimiters, quoting, and line terminators through the dialect system.
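To make the difference concrete, here is a minimal sketch comparing a naive split(',') with csv.reader on a single line that exercises the first three edge cases from the table:

```python
# quoting_demo.py
# Compare naive splitting with proper CSV parsing on a tricky line.
import csv
import io

line = '"Smith, John",30,"say ""hello"""'

# Naive approach: splits inside the quoted field and keeps the quote marks
print(line.split(','))
# -> ['"Smith', ' John"', '30', '"say ""hello"""']

# csv module: the quoted comma and escaped quotes are parsed correctly
row = next(csv.reader(io.StringIO(line)))
print(row)
# -> ['Smith, John', '30', 'say "hello"']
```

io.StringIO is used here so the sketch needs no file on disk; csv.reader accepts any iterable of lines.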
Reading CSV with DictReader
csv.DictReader is the preferred way to read CSV files for most use cases. It reads the header row and returns each subsequent row as a dict (an OrderedDict before Python 3.8) with column names as keys. No more remembering which index is which.
# dict_reader.py
import csv

# Create a sample CSV with product data
csv_data = """id,product,price,stock,category
1,Python Handbook,29.99,150,Books
2,USB-C Hub,49.99,75,Electronics
3,Mechanical Keyboard,89.99,40,Electronics
4,Standing Desk Pad,24.99,200,Office
5,Monitor Light,35.00,90,Electronics
"""

with open('products.csv', 'w', newline='', encoding='utf-8') as f:
    f.write(csv_data)

# Read with DictReader
with open('products.csv', 'r', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    products = list(reader)

print(f'Loaded {len(products)} products\n')

# Access by column name
for p in products:
    name = p['product']
    price = float(p['price'])
    stock = int(p['stock'])
    print(f' {name}: ${price:.2f} ({stock} in stock)')

# Filter: electronics under $60
print('\nElectronics under $60:')
electronics = [p for p in products
               if p['category'] == 'Electronics' and float(p['price']) < 60]
for p in electronics:
    print(f' {p["product"]} - ${p["price"]}')
Output:
Loaded 5 products

 Python Handbook: $29.99 (150 in stock)
 USB-C Hub: $49.99 (75 in stock)
 Mechanical Keyboard: $89.99 (40 in stock)
 Standing Desk Pad: $24.99 (200 in stock)
 Monitor Light: $35.00 (90 in stock)

Electronics under $60:
 USB-C Hub - $49.99
 Monitor Light - $35.00
Remember that all values from DictReader are strings -- use int() or float() to convert numeric fields before arithmetic. A common defensive pattern: wrap conversions in try/except or use a helper like safe_float = lambda x: float(x) if x else 0.0 to handle empty or malformed fields.
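The defensive pattern above can be written as a small reusable helper. A minimal sketch; safe_float and safe_int are illustrative names, not part of the csv module:

```python
# safe_convert.py
# Defensive conversion helpers for string fields coming out of DictReader.
def safe_float(value, default=0.0):
    """Convert a CSV field to float, falling back on empty/malformed input."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default

def safe_int(value, default=0):
    """Same idea for integer fields."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default

# Simulated DictReader rows, including malformed fields
rows = [
    {'product': 'Hub', 'price': '49.99', 'stock': '75'},
    {'product': 'Pad', 'price': '', 'stock': 'n/a'},
]
for row in rows:
    print(row['product'], safe_float(row['price']), safe_int(row['stock']))
# -> Hub 49.99 75
# -> Pad 0.0 0
```

A def-based helper is also friendlier than a named lambda: it gets a docstring and a proper name in tracebacks.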
Writing CSV with DictWriter
csv.DictWriter lets you write dicts to CSV rows without tracking column order manually. You specify the field names once and then write rows as dicts.
# dict_writer.py
import csv

# Data to write
orders = [
    {'order_id': 'ORD-001', 'customer': 'Alice', 'amount': 149.97, 'date': '2026-04-16', 'status': 'shipped'},
    {'order_id': 'ORD-002', 'customer': 'Bob', 'amount': 89.99, 'date': '2026-04-16', 'status': 'pending'},
    {'order_id': 'ORD-003', 'customer': 'Charlie', 'amount': 24.99, 'date': '2026-04-15', 'status': 'delivered'},
]

fieldnames = ['order_id', 'customer', 'amount', 'date', 'status']

with open('orders.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()       # Write the header row
    writer.writerows(orders)   # Write all rows at once

print('Written to orders.csv')

# Verify: read it back
with open('orders.csv', 'r', encoding='utf-8') as f:
    print(f.read())
Output:
Written to orders.csv
order_id,customer,amount,date,status
ORD-001,Alice,149.97,2026-04-16,shipped
ORD-002,Bob,89.99,2026-04-16,pending
ORD-003,Charlie,24.99,2026-04-15,delivered
writer.writeheader() writes the field names as the first row. writer.writerows(orders) writes all rows in one call, which is more concise than looping and calling writer.writerow() for each item. If a dict has extra keys not in fieldnames, DictWriter raises a ValueError by default; pass extrasaction='ignore' to silently drop them.
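A small sketch of those knobs using an in-memory buffer. extrasaction controls extra keys; restval (not mentioned above, but it pairs naturally here) fills in missing ones:

```python
# extrasaction_demo.py
# Show DictWriter's handling of extra and missing dict keys.
import csv
import io

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['id', 'name'],
                        restval='N/A',           # value for missing keys
                        extrasaction='ignore')   # silently drop unknown keys
writer.writeheader()
writer.writerow({'id': 1, 'name': 'Alice', 'note': 'this extra key is dropped'})
writer.writerow({'id': 2})  # 'name' missing -> written as 'N/A'
print(buf.getvalue())
# id,name
# 1,Alice
# 2,N/A
```

With the default extrasaction='raise', the first writerow call above would raise a ValueError because of the unexpected 'note' key.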
Handling Different Delimiters and Encodings
Not all "CSV" files use commas. Tab-separated files (.tsv) are common in bioinformatics. Semicolon-separated files appear frequently in European locales (where commas are decimal separators). The delimiter parameter handles all of these.
# custom_delimiter.py
import csv

# Write a tab-separated file
tsv_data = [
    ['gene_id', 'chromosome', 'start', 'end', 'strand'],
    ['BRCA1', 'chr17', '43044295', '43125482', '-'],
    ['TP53', 'chr17', '7661779', '7687538', '-'],
    ['EGFR', 'chr7', '55086725', '55275031', '+'],
]

with open('genes.tsv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerows(tsv_data)

# Read it back
with open('genes.tsv', 'r', encoding='utf-8') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        print(' | '.join(f'{col:15}' for col in row))
Output:
gene_id         | chromosome      | start           | end             | strand
BRCA1           | chr17           | 43044295        | 43125482        | -
TP53            | chr17           | 7661779         | 7687538         | -
EGFR            | chr7            | 55086725        | 55275031        | +
For Windows-generated CSV files, you may encounter cp1252 encoding instead of UTF-8. If you get a UnicodeDecodeError, try encoding='cp1252' or encoding='latin-1'. For files where you don't know the encoding, the chardet library (install via pip) can detect it automatically.
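When you receive a file with an unknown delimiter, csv.Sniffer can often guess it from a sample of the text. A minimal sketch, using made-up semicolon-separated data for illustration:

```python
# sniff_dialect.py
# Use csv.Sniffer to detect an unknown delimiter before parsing.
import csv

sample = "name;age;city\nAlice;30;Sydney\nBob;25;Melbourne\n"
with open('unknown.csv', 'w', newline='', encoding='utf-8') as f:
    f.write(sample)

with open('unknown.csv', 'r', encoding='utf-8') as f:
    dialect = csv.Sniffer().sniff(f.read(1024))  # sniff the first 1 KB
    f.seek(0)                                    # rewind before reading rows
    print('Detected delimiter:', repr(dialect.delimiter))
    for row in csv.reader(f, dialect):
        print(row)
```

Sniffing is a heuristic: it works well on regular files like this one, but for unusual data it can guess wrong, so passing an explicit delimiter is still the safest option when you know it.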
Real-Life Example: Monthly Sales Report Generator
Here's a complete script that reads a raw sales CSV, filters and aggregates the data, and writes a formatted monthly summary report.
# sales_report.py
import csv
from collections import defaultdict

# Create sample sales data
raw_sales = [
    ['date', 'product', 'category', 'quantity', 'unit_price', 'region'],
    ['2026-04-01', 'Python Handbook', 'Books', '3', '29.99', 'NSW'],
    ['2026-04-01', 'USB-C Hub', 'Electronics', '2', '49.99', 'VIC'],
    ['2026-04-02', 'Standing Desk Pad', 'Office', '5', '24.99', 'QLD'],
    ['2026-04-03', 'Python Handbook', 'Books', '1', '29.99', 'NSW'],
    ['2026-04-03', 'Monitor Light', 'Electronics', '4', '35.00', 'VIC'],
    ['2026-04-04', 'Mechanical Keyboard', 'Electronics', '2', '89.99', 'NSW'],
    ['2026-04-05', 'USB-C Hub', 'Electronics', '3', '49.99', 'WA'],
    ['2026-04-06', 'Python Handbook', 'Books', '2', '29.99', 'VIC'],
]

with open('sales_raw.csv', 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerows(raw_sales)

# --- Process the data ---
category_totals = defaultdict(float)
region_totals = defaultdict(float)
product_units = defaultdict(int)
grand_total = 0.0

with open('sales_raw.csv', 'r', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        try:
            qty = int(row['quantity'])
            price = float(row['unit_price'])
            revenue = qty * price
        except (ValueError, KeyError) as e:
            print(f'Skipping malformed row: {row} | Error: {e}')
            continue
        category_totals[row['category']] += revenue
        region_totals[row['region']] += revenue
        product_units[row['product']] += qty
        grand_total += revenue

# --- Write the summary report ---
report_rows = []
report_rows.append(['=== MONTHLY SALES REPORT: April 2026 ===', '', ''])
report_rows.append(['', '', ''])
report_rows.append(['REVENUE BY CATEGORY', '', ''])
report_rows.append(['Category', 'Revenue', '% of Total'])
for cat, rev in sorted(category_totals.items(), key=lambda x: -x[1]):
    pct = rev / grand_total * 100
    report_rows.append([cat, f'${rev:.2f}', f'{pct:.1f}%'])

report_rows.append(['', '', ''])
report_rows.append(['REVENUE BY REGION', '', ''])
report_rows.append(['Region', 'Revenue', '% of Total'])
for region, rev in sorted(region_totals.items(), key=lambda x: -x[1]):
    pct = rev / grand_total * 100
    report_rows.append([region, f'${rev:.2f}', f'{pct:.1f}%'])

report_rows.append(['', '', ''])
report_rows.append(['GRAND TOTAL', f'${grand_total:.2f}', '100.0%'])

with open('sales_report.csv', 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerows(report_rows)

# Print summary to console
print('Sales Report Generated\n')
print('Revenue by Category:')
for cat, rev in sorted(category_totals.items(), key=lambda x: -x[1]):
    print(f' {cat:20} ${rev:8.2f}')
print(f'\n {"TOTAL":20} ${grand_total:8.2f}')
print('\nReport written to sales_report.csv')
Output:
Sales Report Generated

Revenue by Category:
 Electronics          $  569.93
 Books                $  179.94
 Office               $  124.95

 TOTAL                $  874.82

Report written to sales_report.csv
This script demonstrates the complete CSV pipeline: reading raw data row by row using DictReader, aggregating with defaultdict, handling malformed rows gracefully, and writing a structured multi-section report using csv.writer. The same pattern scales to millions of rows with minimal modification.
Frequently Asked Questions
Why does my CSV file have blank lines between rows on Windows?
This happens when you open the file without newline=''. Without it, Python's universal newline handling adds an extra \r\n, and the csv module adds another, resulting in double line endings. Always use open('file.csv', 'w', newline='', encoding='utf-8') when writing CSV files to prevent this issue.
My CSV file has garbled characters. What's wrong?
The file was probably saved with a different encoding -- most commonly Windows' cp1252 or latin-1. Try encoding='cp1252' or encoding='latin-1' when opening the file. Excel in particular often exports as cp1252 rather than UTF-8. If you're not sure, install the chardet library and run chardet.detect(open('file.csv', 'rb').read()) to detect the encoding automatically.
How do I handle a very large CSV file without running out of memory?
The csv.reader and csv.DictReader objects are iterators -- they read one row at a time, so they don't load the entire file into memory. Don't call list(reader) on a huge file; instead, process rows in a for loop. For multi-gigabyte files, this approach uses only a few KB of memory regardless of file size.
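The streaming pattern can be sketched like this (big.csv is just a stand-in name; the same loop works unchanged on multi-gigabyte inputs):

```python
# stream_large.py
# Process a CSV row by row without holding the file in memory.
import csv

# Build a small stand-in file for demonstration
with open('big.csv', 'w', newline='', encoding='utf-8') as f:
    w = csv.writer(f)
    w.writerow(['id', 'value'])
    w.writerows([[i, i * 2] for i in range(1000)])

# Stream: only one row is in memory at any moment
total = 0
with open('big.csv', 'r', encoding='utf-8') as f:
    for row in csv.DictReader(f):
        total += int(row['value'])
print(total)
# -> 999000
```

The key point is that the for loop pulls rows lazily from the reader; nothing here ever materialises the whole file the way list(reader) or f.read() would.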
How do I handle fields that contain commas or newlines?
The csv module handles this automatically. When you write a field that contains a comma, quote, or newline, the writer automatically wraps it in double quotes. When you read it back, the reader correctly identifies the entire quoted field as a single value. You don't need to do any manual escaping -- just let the csv module handle it.
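A quick round-trip sketch of that automatic quoting, using an in-memory buffer:

```python
# quote_roundtrip.py
# Write fields containing a comma, a newline, and quotes, then read them back.
import csv
import io

buf = io.StringIO()
csv.writer(buf).writerow(['Smith, John', 'line1\nline2', 'say "hi"'])
print(repr(buf.getvalue()))  # the writer quoted and escaped every tricky field

buf.seek(0)
row = next(csv.reader(buf))
print(row)
# -> ['Smith, John', 'line1\nline2', 'say "hi"']
```

All three fields survive the round trip intact, with no manual escaping on either side.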
Should I use the csv module or pandas for CSV?
For simple reading/writing and lightweight processing, the built-in csv module is perfect -- no dependencies, fast startup, and minimal memory overhead. For heavy data manipulation (filtering, grouping, joining, sorting large datasets), pandas is faster and more expressive. The csv module is the right choice for scripts that need to run on any machine without installing dependencies.
Conclusion
Python's csv module is a reliable, production-ready tool for all CSV work. We covered csv.reader and csv.writer for row-level access, csv.DictReader and csv.DictWriter for named-field access, handling custom delimiters like tabs and semicolons, always using newline='' on Windows, encoding issues and how to debug them, defensive parsing with try/except, and a complete aggregation-based sales report generator. The csv module handles all the quoting and escaping edge cases that would trip up a naive split-based approach.
Try extending the sales report example to generate a pivot table by region and category, or add a chart using matplotlib based on the aggregated data. CSV processing and data visualization are a natural combination.
For the complete API including dialect configuration and custom quoting behavior, see the official csv module documentation.