Last Updated: June 01, 2026
Beginner
Python developer and educator with 15+ years building production systems across data engineering, web APIs, and AI tooling. Founder of Python How To Program — 270+ in-depth tutorials covering the modern Python stack.
View all tutorials by Pubs →How To Work With CSV Files in Python Using the csv Module and Pandas
Part of the Python Data Stack Hub. See the full hub for related Python tutorials.
Quick Example (TLDR)
Reading a CSV file with Python is simple. Here’s the fastest way using pandas:
# The quick way: pandas
import pandas as pd
# Read CSV file
df = pd.read_csv('data.csv')
# Access data
print(df.head())
print(df['column_name'].mean())
Output:
name age salary
0 Alice 28 65000
1 Bob 34 72000
2 Carol 29 68000
3 David 45 95000
4 Eve 31 71000
Understanding CSV Files
CSV stands for “Comma-Separated Values.” It’s the simplest way to store tabular data: each row is a line, columns are separated by commas. Here’s what a CSV file looks like inside:
name,age,salary,department
Alice,28,65000,Engineering
Bob,34,72000,Sales
Carol,29,68000,Engineering
David,45,95000,Management
Reading CSV with the csv Module
Python’s built-in csv module is lightweight and doesn’t require external dependencies:
import csv
# Open and read CSV file
with open('employees.csv', 'r') as file:
# csv.reader returns an iterator over rows
csv_reader = csv.reader(file)
# Get the header row
headers = next(csv_reader)
print("Headers:", headers)
# Process each data row
for row in csv_reader:
print(f"Name: {row[0]}, Age: {row[1]}, Salary: {row[2]}")
Output:
Headers: ['name', 'age', 'salary', 'department']
Name: Alice, Age: 28, Salary: 65000
Name: Bob, Age: 34, Salary: 72000
Name: Carol, Age: 29, Salary: 68000
Using DictReader for Cleaner Code
DictReader automatically treats the first row as headers and returns dictionaries instead of lists:
import csv
# DictReader uses first row as keys
with open('employees.csv', 'r') as file:
dict_reader = csv.DictReader(file)
for row in dict_reader:
# Access by column name instead of index
print(f"{row['name']} earns $" + row['salary'] + " in " + row['department'])
Output:
Alice earns $65000 in Engineering
Bob earns $72000 in Sales
Carol earns $68000 in Engineering
David earns $95000 in Management
Writing CSV Files
Creating a CSV file is equally straightforward:
import csv
# Data to write
employees = [
{'name': 'Alice', 'age': 28, 'salary': 65000},
{'name': 'Bob', 'age': 34, 'salary': 72000},
{'name': 'Carol', 'age': 29, 'salary': 68000},
]
# Write to CSV
with open('new_employees.csv', 'w', newline='') as file:
fieldnames = ['name', 'age', 'salary']
writer = csv.DictWriter(file, fieldnames=fieldnames)
# Write header row
writer.writeheader()
# Write data rows
writer.writerows(employees)
print("File written successfully!")
Output:
File written successfully!
Working with Pandas for Advanced Operations
Pandas makes it much easier to filter, transform, and analyze data:
import pandas as pd
# Read CSV into DataFrame
df = pd.read_csv('employees.csv')
# Basic info about the data
print(f"Total rows: {len(df)}")
print(f"Average salary: ${df['salary'].mean():.2f}")
# Filter data: salaries above 70000
high_earners = df[df['salary'] > 70000]
print("
High earners:")
print(high_earners)
# Group by department
print("
Average salary by department:")
print(df.groupby('department')['salary'].mean())
Output:
Total rows: 4
Average salary: $75000.00
High earners:
name age salary department
1 Bob 34 72000 Sales
3 David 45 95000 Management
Average salary by department:
department
Engineering 66500.0
Management 95000.0
Sales 72000.0
Name: salary, dtype: float64
Handling Large CSV Files Efficiently
For massive files that don’t fit in memory, use chunking with pandas:
import pandas as pd
# Read large file in chunks
chunk_size = 10000
# Process file in batches
for chunk in pd.read_csv('huge_file.csv', chunksize=chunk_size):
# Process each chunk
print(f"Processing chunk with {len(chunk)} rows")
# Do something with the chunk
high_value = chunk[chunk['amount'] > 1000]
print(f"Found {len(high_value)} high-value transactions")
Output:
Processing chunk with 10000 rows
Found 2345 high-value transactions
Processing chunk with 10000 rows
Found 2412 high-value transactions
Processing chunk with 5234 rows
Found 1123 high-value transactions
Real-Life Example: Cleaning and Merging Sales Reports
Here’s a practical example of reading, cleaning, and merging sales data from multiple CSV files:
import pandas as pd
# Read sales data from multiple sources
sales_q1 = pd.read_csv('sales_q1.csv')
sales_q2 = pd.read_csv('sales_q2.csv')
# Combine the datasets
all_sales = pd.concat([sales_q1, sales_q2], ignore_index=True)
# Data cleaning: remove duplicates
all_sales = all_sales.drop_duplicates(subset=['order_id'])
print(f"After removing duplicates: {len(all_sales)} records")
# Clean: remove rows with missing values
all_sales = all_sales.dropna(subset=['customer_id', 'amount'])
print(f"After removing null values: {len(all_sales)} records")
# Transform: add new column for commission (5% of amount)
all_sales['commission'] = all_sales['amount'] * 0.05
# Filter: only successful orders (status='completed')
completed_sales = all_sales[all_sales['status'] == 'completed']
# Analysis: sales by region
print("
Sales by region:")
region_totals = completed_sales.groupby('region')['amount'].sum()
print(region_totals)
# Sort by amount and show top 10
top_sales = completed_sales.nlargest(5, 'amount')
print("
Top 5 sales:")
print(top_sales[['order_id', 'customer_name', 'amount', 'region']])
# Save cleaned data
all_sales.to_csv('cleaned_sales.csv', index=False)
print("
Cleaned data saved to cleaned_sales.csv")
Output:
After removing duplicates: 1997 records
After removing null values: 1985 records
Sales by region:
region
North 45230.50
South 38920.75
East 52340.25
West 41230.00
Name: amount, dtype: float64
Top 5 sales:
order_id customer_name amount region
5 ORD005 Acme Corp 8500.00 East
12 ORD012 TechStart 7200.50 North
18 ORD018 GlobalCo 6950.25 West
24 ORD024 InnovateLabs 6800.00 East
31 ORD031 CloudSys 6550.75 North
Cleaned data saved to cleaned_sales.csv
FAQ
Q: Should I use csv module or pandas?
A: Use csv for simple operations and to avoid dependencies. Use pandas when you need analysis, filtering, or complex transformations. Pandas makes data manipulation much easier and faster to code.
Q: How do I handle CSV files with different delimiters?
A: With csv module: csv.reader(file, delimiter=’;’) or with pandas: pd.read_csv(‘file.csv’, sep=’;’)
Q: What if my CSV has special characters or encoding issues?
A: Specify encoding: pd.read_csv(‘file.csv’, encoding=’utf-8′) or pd.read_csv(‘file.csv’, encoding=’latin-1′)
Q: Can I read CSV directly from a URL?
A: Yes! df = pd.read_csv(‘https://example.com/data.csv’) works directly with pandas.
Q: How do I export a pandas DataFrame to different formats?
A: DataFrame has methods for many formats: to_csv(), to_excel(), to_json(), to_html(), and more.
Conclusion
CSV files are everywhere in data work, and Python makes handling them simple. Start with the built-in csv module for basic needs, then graduate to pandas when you need real analysis power. The combination of these tools covers everything from simple data reading to complex transformations.
References
