Beginner
How To Work With CSV Files in Python Using the csv Module and Pandas
Quick Example (TLDR)
Reading a CSV file with Python is simple. Here’s the fastest way using pandas:
# The quick way: pandas
import pandas as pd
# Read CSV file
df = pd.read_csv('data.csv')
# Access data
print(df.head())
print(df['salary'].mean())
Output:
    name  age  salary
0  Alice   28   65000
1    Bob   34   72000
2  Carol   29   68000
3  David   45   95000
4    Eve   31   71000
74200.0
Understanding CSV Files
CSV stands for “Comma-Separated Values.” It’s the simplest way to store tabular data: each row is a line, columns are separated by commas. Here’s what a CSV file looks like inside:
name,age,salary,department
Alice,28,65000,Engineering
Bob,34,72000,Sales
Carol,29,68000,Engineering
David,45,95000,Management
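If you want to follow along, a few lines of Python will create that sample file (the filename employees.csv is simply the one used in the examples below):

```python
# Create the sample employees.csv used throughout this article
sample = (
    "name,age,salary,department\n"
    "Alice,28,65000,Engineering\n"
    "Bob,34,72000,Sales\n"
    "Carol,29,68000,Engineering\n"
    "David,45,95000,Management\n"
)

with open('employees.csv', 'w') as f:
    f.write(sample)
```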
Reading CSV with the csv Module
Python’s built-in csv module is lightweight and doesn’t require external dependencies:
import csv
# Open and read CSV file
with open('employees.csv', 'r') as file:
    # csv.reader returns an iterator over rows
    csv_reader = csv.reader(file)

    # Get the header row
    headers = next(csv_reader)
    print("Headers:", headers)

    # Process each data row
    for row in csv_reader:
        print(f"Name: {row[0]}, Age: {row[1]}, Salary: {row[2]}")
Output:
Headers: ['name', 'age', 'salary', 'department']
Name: Alice, Age: 28, Salary: 65000
Name: Bob, Age: 34, Salary: 72000
Name: Carol, Age: 29, Salary: 68000
Using DictReader for Cleaner Code
DictReader automatically treats the first row as headers and returns dictionaries instead of lists:
import csv
# DictReader uses first row as keys
with open('employees.csv', 'r') as file:
    dict_reader = csv.DictReader(file)

    for row in dict_reader:
        # Access by column name instead of index
        print(f"{row['name']} earns ${row['salary']} in {row['department']}")
Output:
Alice earns $65000 in Engineering
Bob earns $72000 in Sales
Carol earns $68000 in Engineering
David earns $95000 in Management
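One gotcha worth knowing: both reader and DictReader return every field as a string, even numeric columns, so convert before doing arithmetic. A minimal sketch using an in-memory file (the data mirrors the employees.csv shown earlier):

```python
import csv
from io import StringIO

# In-memory stand-in for employees.csv
data = StringIO(
    "name,age,salary,department\n"
    "Alice,28,65000,Engineering\n"
    "Bob,34,72000,Sales\n"
    "Carol,29,68000,Engineering\n"
    "David,45,95000,Management\n"
)

total = 0
for row in csv.DictReader(data):
    # row['salary'] is the string '65000', not an int
    total += int(row['salary'])

print(f"Total payroll: ${total}")  # Total payroll: $300000
```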
Writing CSV Files
Creating a CSV file is equally straightforward:
import csv
# Data to write
employees = [
    {'name': 'Alice', 'age': 28, 'salary': 65000},
    {'name': 'Bob', 'age': 34, 'salary': 72000},
    {'name': 'Carol', 'age': 29, 'salary': 68000},
]

# Write to CSV; newline='' prevents blank lines between rows on Windows
with open('new_employees.csv', 'w', newline='') as file:
    fieldnames = ['name', 'age', 'salary']
    writer = csv.DictWriter(file, fieldnames=fieldnames)

    # Write header row
    writer.writeheader()
    # Write data rows
    writer.writerows(employees)

print("File written successfully!")
Output:
File written successfully!
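If your data is a list of lists rather than dictionaries, plain csv.writer does the same job (a small sketch; the filename here is arbitrary):

```python
import csv

rows = [
    ['name', 'age', 'salary'],   # header row first
    ['Alice', 28, 65000],
    ['Bob', 34, 72000],
]

# newline='' prevents blank lines between rows on Windows
with open('employees_list.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(rows)
```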
Working with Pandas for Advanced Operations
Pandas makes it much easier to filter, transform, and analyze data:
import pandas as pd
# Read CSV into DataFrame
df = pd.read_csv('employees.csv')
# Basic info about the data
print(f"Total rows: {len(df)}")
print(f"Average salary: ${df['salary'].mean():.2f}")

# Filter data: salaries above 70000
high_earners = df[df['salary'] > 70000]
print("\nHigh earners:")
print(high_earners)

# Group by department
print("\nAverage salary by department:")
print(df.groupby('department')['salary'].mean())
Output:
Total rows: 4
Average salary: $75000.00
High earners:
    name  age  salary  department
1    Bob   34   72000       Sales
3  David   45   95000  Management

Average salary by department:
department
Engineering    66500.0
Management     95000.0
Sales          72000.0
Name: salary, dtype: float64
Handling Large CSV Files Efficiently
For massive files that don’t fit in memory, use chunking with pandas:
import pandas as pd
# Read large file in chunks
chunk_size = 10000
# Process file in batches
for chunk in pd.read_csv('huge_file.csv', chunksize=chunk_size):
    # Process each chunk
    print(f"Processing chunk with {len(chunk)} rows")

    # Do something with the chunk
    high_value = chunk[chunk['amount'] > 1000]
    print(f"Found {len(high_value)} high-value transactions")
Output:
Processing chunk with 10000 rows
Found 2345 high-value transactions
Processing chunk with 10000 rows
Found 2412 high-value transactions
Processing chunk with 5234 rows
Found 1123 high-value transactions
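Printing per chunk is fine for monitoring, but to compute a statistic over the whole file you accumulate across chunks instead. A sketch under stated assumptions: it first builds a small stand-in file (demo_amounts.csv, an invented name) so it runs as-is; with a real huge_file.csv you would skip that step.

```python
import pandas as pd

# Stand-in file so the sketch is runnable; skip this with a real dataset
pd.DataFrame({'amount': range(3000)}).to_csv('demo_amounts.csv', index=False)

total = 0.0
count = 0
for chunk in pd.read_csv('demo_amounts.csv', chunksize=1000):
    # Accumulate running totals instead of holding the whole file in memory
    total += chunk['amount'].sum()
    count += len(chunk)

print(f"Average amount over {count} rows: {total / count:.2f}")
```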
Real-Life Example: Cleaning and Merging Sales Reports
Here’s a practical example of reading, cleaning, and merging sales data from multiple CSV files:
import pandas as pd
# Read sales data from multiple sources
sales_q1 = pd.read_csv('sales_q1.csv')
sales_q2 = pd.read_csv('sales_q2.csv')
# Combine the datasets
all_sales = pd.concat([sales_q1, sales_q2], ignore_index=True)
# Data cleaning: remove duplicates
all_sales = all_sales.drop_duplicates(subset=['order_id'])
print(f"After removing duplicates: {len(all_sales)} records")
# Clean: remove rows with missing values
all_sales = all_sales.dropna(subset=['customer_id', 'amount'])
print(f"After removing null values: {len(all_sales)} records")
# Transform: add new column for commission (5% of amount)
all_sales['commission'] = all_sales['amount'] * 0.05
# Filter: only successful orders (status='completed')
completed_sales = all_sales[all_sales['status'] == 'completed']
# Analysis: sales by region
print("\nSales by region:")
region_totals = completed_sales.groupby('region')['amount'].sum()
print(region_totals)

# Sort by amount and show the top 5
top_sales = completed_sales.nlargest(5, 'amount')
print("\nTop 5 sales:")
print(top_sales[['order_id', 'customer_name', 'amount', 'region']])

# Save cleaned data
all_sales.to_csv('cleaned_sales.csv', index=False)
print("\nCleaned data saved to cleaned_sales.csv")
Output:
After removing duplicates: 1997 records
After removing null values: 1985 records

Sales by region:
region
East     52340.25
North    45230.50
South    38920.75
West     41230.00
Name: amount, dtype: float64

Top 5 sales:
   order_id customer_name   amount region
5    ORD005     Acme Corp  8500.00   East
12   ORD012     TechStart  7200.50  North
18   ORD018      GlobalCo  6950.25   West
24   ORD024  InnovateLabs  6800.00   East
31   ORD031      CloudSys  6550.75  North

Cleaned data saved to cleaned_sales.csv
FAQ
Q: Should I use csv module or pandas?
A: Use csv for simple operations and to avoid dependencies. Use pandas when you need analysis, filtering, or complex transformations. Pandas makes data manipulation much easier and faster to code.
Q: How do I handle CSV files with different delimiters?
A: With the csv module: csv.reader(file, delimiter=';'). With pandas: pd.read_csv('file.csv', sep=';').
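For example, reading semicolon-separated data with the csv module (a minimal sketch using an in-memory string):

```python
import csv
from io import StringIO

# Semicolon-delimited data, common in European locales
data = StringIO("name;age\nAlice;28\nBob;34\n")

reader = csv.reader(data, delimiter=';')
rows = list(reader)
print(rows)  # [['name', 'age'], ['Alice', '28'], ['Bob', '34']]
```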
Q: What if my CSV has special characters or encoding issues?
A: Specify the encoding: pd.read_csv('file.csv', encoding='utf-8') or pd.read_csv('file.csv', encoding='latin-1').
Q: Can I read CSV directly from a URL?
A: Yes! df = pd.read_csv('https://example.com/data.csv') works directly with pandas.
Q: How do I export a pandas DataFrame to different formats?
A: DataFrame has methods for many formats: to_csv(), to_excel(), to_json(), to_html(), and more.
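A quick sketch of writing the same DataFrame to a few of those formats (filenames here are arbitrary; to_excel additionally requires an engine such as openpyxl to be installed, so it is left out):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'salary': [65000, 72000]})

df.to_csv('out.csv', index=False)          # plain CSV
df.to_json('out.json', orient='records')   # list of JSON objects
html_table = df.to_html(index=False)       # HTML <table> as a string

print(html_table.splitlines()[0])
```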
Conclusion
CSV files are everywhere in data work, and Python makes handling them simple. Start with the built-in csv module for basic needs, then graduate to pandas when you need real analysis power. The combination of these tools covers everything from simple data reading to complex transformations.