Beginner

Excel files are everywhere in business environments, from financial reports and inventory lists to customer databases and sales analytics. While Excel is a powerful tool for data visualization and quick calculations, Python offers automation capabilities that can save hours of manual work. The openpyxl library is the most popular Python package for reading, writing, and modifying Excel files programmatically. This tutorial will guide you through everything you need to know about working with Excel files in Python, from basic operations to advanced formatting and formulas.

Whether you’re dealing with simple CSV-like data or complex workbooks with multiple sheets and intricate formatting, openpyxl provides an intuitive interface that mirrors Excel’s own structure. You’ll learn how to create workbooks from scratch, read existing files, apply professional formatting, insert formulas, and even generate charts—all without opening Excel. By the end of this guide, you’ll be able to automate your Excel workflows and handle data manipulation tasks that would take minutes manually in just seconds with Python.

The beauty of using openpyxl is that it maintains compatibility with Excel’s native features while being lightweight and easy to learn. Unlike some alternatives that require Excel to be installed on your system, openpyxl works independently, making it perfect for server-side automation, data processing pipelines, and batch file generation. You’ll also discover how to handle real-world scenarios like generating sales reports, updating employee databases, and creating formatted spreadsheets for stakeholders—all through simple Python code.

Quick Example

Let’s start with a quick glimpse of what’s possible with openpyxl. In just a few lines of code, you can create an Excel file, add data, format cells, and save it:

# quick_example.py
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill

# Create a new workbook
wb = Workbook()
ws = wb.active
ws.title = "Sales"

# Add headers
headers = ["Product", "Quantity", "Price", "Total"]
ws.append(headers)

# Style the header row
for cell in ws[1]:
    cell.font = Font(bold=True, color="FFFFFF")
    cell.fill = PatternFill(start_color="366092", end_color="366092", fill_type="solid")

# Add data
ws.append(["Laptop", 5, 1200, 6000])
ws.append(["Mouse", 15, 25, 375])
ws.append(["Keyboard", 10, 75, 750])

# Adjust column widths
ws.column_dimensions['A'].width = 15
ws.column_dimensions['B'].width = 12
ws.column_dimensions['C'].width = 12
ws.column_dimensions['D'].width = 12

# Save the file
wb.save("sales_data.xlsx")
print("File created successfully!")

Output:

File created successfully!

This simple script creates a professional-looking spreadsheet with formatted headers and data. When you open the resulting sales_data.xlsx file in Excel, you’ll see a properly formatted table with colors and sizing already applied. That’s the power of openpyxl—automation with style.

What is openpyxl?

openpyxl is a Python library designed specifically for reading and writing Excel 2010+ files (the modern .xlsx format). Excel files are actually compressed XML documents, and openpyxl handles all the complexity of parsing and writing this format so you don’t have to. The library provides a clean, Pythonic API that allows you to work with Excel files just as you would in the Excel application itself—through workbooks, sheets, rows, columns, and cells.

The main advantages of openpyxl over alternatives include its comprehensive feature support, active maintenance, and the fact that it doesn’t require Excel to be installed on your system. Whether you’re running Python on Windows, macOS, or Linux, openpyxl works seamlessly. It’s particularly valuable for server applications, data processing pipelines, and automated reporting systems where Excel isn’t available.

Here’s how openpyxl compares to other popular options for working with Excel files in Python:

Library Format Support Writing Support Formatting Requires Excel Best For
openpyxl .xlsx, .xlsm Yes, full support Extensive (fonts, colors, borders, etc.) No Creating and modifying formatted Excel files
xlrd .xls, .xlsx No, read-only Limited No Reading older Excel files
pandas .xlsx, .xls, .csv Yes, limited Minimal No Data analysis and transformation
pywin32 .xlsx, .xls Yes, full support Extensive Yes (Windows only) Enterprise automation with Excel integration

For this tutorial, we’ll focus on openpyxl because it offers the best balance of features, ease of use, and cross-platform compatibility. Let’s get started by installing it and creating your first workbook.

Installation and Setup

Before you can use openpyxl, you need to install it on your system. This is straightforward using pip, Python’s package manager. Open your terminal or command prompt and run the following command:

# install_openpyxl.sh
pip install openpyxl

Output:

Successfully installed openpyxl-3.1.2

Once installed, you can import openpyxl in your Python scripts. The installation includes all necessary dependencies, so you won’t need to install anything else. If you’re using a virtual environment (which is recommended for Python projects), make sure you activate it before installing openpyxl.

Setting up openpyxl for Excel automation
pip install openpyxl — three words between you and never opening Excel again.

Creating Workbooks from Scratch

Creating a new Excel workbook with openpyxl is simple and intuitive. A workbook is the Excel file itself, and it can contain one or more sheets. Let’s explore how to create workbooks and add data to them:

# create_workbook.py
from openpyxl import Workbook

# Create a new workbook
wb = Workbook()

# Access the active sheet (first sheet)
ws = wb.active
print(f"Active sheet name: {ws.title}")

# You can also change the sheet name
ws.title = "Employee Data"

# Add data to cells
ws['A1'] = "Name"
ws['B1'] = "Department"
ws['C1'] = "Salary"

ws['A2'] = "Alice Johnson"
ws['B2'] = "Engineering"
ws['C2'] = 95000

ws['A3'] = "Bob Smith"
ws['B3'] = "Marketing"
ws['C3'] = 75000

# Save the workbook
wb.save("employees.xlsx")
print("Workbook created and saved!")

Output:

Active sheet name: Sheet
Workbook created and saved!

In this example, we created a new workbook, accessed its active sheet, renamed it to “Employee Data”, and added information in a table format. Notice how we accessed cells using Excel-style notation like A1, B2, etc. This makes the code very readable if you’re familiar with Excel.

You can also create multiple sheets in a single workbook, which is useful for organizing related data:

# multiple_sheets.py
from openpyxl import Workbook

wb = Workbook()
ws1 = wb.active
ws1.title = "Q1 Sales"

# Create additional sheets
ws2 = wb.create_sheet("Q2 Sales")
ws3 = wb.create_sheet("Q3 Sales")

# Add data to each sheet
for ws, quarter in [(ws1, "Q1"), (ws2, "Q2"), (ws3, "Q3")]:
    ws['A1'] = f"{quarter} Revenue"
    ws['A2'] = 150000
    ws['B1'] = f"{quarter} Expenses"
    ws['B2'] = 75000

wb.save("quarterly_report.xlsx")
print("Multi-sheet workbook created!")

Output:

Multi-sheet workbook created!

Reading Existing Excel Files

Working with existing Excel files is just as straightforward as creating new ones. openpyxl allows you to load a workbook and access its data in various ways:

# read_existing_file.py
from openpyxl import load_workbook

# Load an existing workbook
wb = load_workbook("employees.xlsx")

# Get a sheet by name
ws = wb["Employee Data"]

# Or get the active sheet
# ws = wb.active

# Iterate through all rows
print("Employee List:")
for row in ws.iter_rows(values_only=True):
    print(row)

# Access specific cells
print(f"\nFirst employee: {ws['A2'].value}")
print(f"Department: {ws['B2'].value}")

Output:

Employee List:
('Name', 'Department', 'Salary')
('Alice Johnson', 'Engineering', 95000)
('Bob Smith', 'Marketing', 75000)

First employee: Alice Johnson
Department: Engineering

The iter_rows() method is particularly useful for processing large amounts of data. The values_only=True parameter returns just the cell values without the cell objects, making it easier to work with the data.

Reading and inspecting Excel spreadsheet data
iter_rows(values_only=True) — because cell objects have feelings you don’t need.

Cell Formatting and Styling

Excel’s power lies not just in data storage but in presentation. openpyxl provides extensive formatting capabilities to make your spreadsheets professional and readable. Let’s explore fonts, colors, borders, and alignment:

# cell_formatting.py
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Border, Side, Alignment

wb = Workbook()
ws = wb.active

# Font styling
ws['A1'] = "Bold and Italic"
ws['A1'].font = Font(name='Arial', size=14, bold=True, italic=True, color="FFFFFF")

# Background color (fill)
ws['A1'].fill = PatternFill(start_color="0066CC", end_color="0066CC", fill_type="solid")

# Borders
thin_border = Border(
    left=Side(style='thin'),
    right=Side(style='thin'),
    top=Side(style='thin'),
    bottom=Side(style='thin')
)
ws['A1'].border = thin_border

# Alignment
ws['A1'].alignment = Alignment(horizontal='center', vertical='center', wrap_text=True)

# Apply to a range of cells
for row in ws.iter_rows(min_row=2, max_row=5, min_col=1, max_col=3):
    for cell in row:
        cell.fill = PatternFill(start_color="E8F0FF", end_color="E8F0FF", fill_type="solid")
        cell.border = thin_border
        cell.font = Font(size=11)

ws.save("formatted.xlsx")
print("Formatted workbook saved!")

Output:

Formatted workbook saved!

Colors in openpyxl are specified using hex codes (like “0066CC”). You can find color codes online or use a color picker to match your brand colors. The formatting capabilities extend to number formats, alignment options, and even special effects like gradients.

Working with Formulas

One of the most powerful features of Excel is its ability to store formulas that automatically calculate values. openpyxl allows you to insert formulas that will be evaluated when the file is opened in Excel:

# formulas.py
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Create a simple invoice
ws['A1'] = "Item"
ws['B1'] = "Price"
ws['C1'] = "Quantity"
ws['D1'] = "Total"

items = [
    ("Laptop", 1200, 2),
    ("Mouse", 25, 5),
    ("Keyboard", 75, 3)
]

row = 2
for item, price, qty in items:
    ws[f'A{row}'] = item
    ws[f'B{row}'] = price
    ws[f'C{row}'] = qty
    # Insert a formula for total (Price * Quantity)
    ws[f'D{row}'] = f'=B{row}*C{row}'
    row += 1

# Add a grand total formula
total_row = row
ws[f'A{total_row}'] = "Grand Total"
ws[f'D{total_row}'] = f'=SUM(D2:D{row-1})'

# Make it bold
from openpyxl.styles import Font
ws[f'A{total_row}'].font = Font(bold=True)
ws[f'D{total_row}'].font = Font(bold=True)

wb.save("invoice.xlsx")
print("Invoice with formulas created!")

Output:

Invoice with formulas created!

When you open the resulting Excel file, you’ll see that the formulas are active and update automatically if you change the prices or quantities. The formulas use standard Excel syntax, so you can use any Excel function including SUM, AVERAGE, IF, VLOOKUP, and many more.

Creating charts with openpyxl in Python
=SUM(D2:D99) hits different when Python wrote every formula.

Creating Charts

Charts make data visualization intuitive and professional. openpyxl supports creating various chart types programmatically:

# creating_charts.py
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

wb = Workbook()
ws = wb.active
ws.title = "Sales Data"

# Add headers
ws['A1'] = "Month"
ws['B1'] = "Revenue"

# Add sales data
months = ["January", "February", "March", "April", "May"]
revenue = [45000, 52000, 48000, 61000, 58000]

for idx, (month, rev) in enumerate(zip(months, revenue), start=2):
    ws[f'A{idx}'] = month
    ws[f'B{idx}'] = rev

# Create a bar chart
chart = BarChart()
chart.title = "Monthly Revenue"
chart.x_axis.title = "Month"
chart.y_axis.title = "Revenue ($)"

# Add data to the chart
data = Reference(ws, min_col=2, min_row=1, max_row=6)
cats = Reference(ws, min_col=1, min_row=2, max_row=6)
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)

# Position the chart
ws.add_chart(chart, "D2")

wb.save("sales_chart.xlsx")
print("Workbook with chart created!")

Output:

Workbook with chart created!

openpyxl supports multiple chart types including bar charts, line charts, pie charts, scatter plots, and more. Charts automatically update when data changes, just like in Excel, providing dynamic data visualization.

Merging Cells

Sometimes you want to merge cells to create headers or improve layout. openpyxl makes this straightforward:

# merging_cells.py
from openpyxl import Workbook
from openpyxl.styles import Font, Alignment, PatternFill

wb = Workbook()
ws = wb.active

# Merge cells for a title
ws.merge_cells('A1:D1')
ws['A1'] = "Quarterly Sales Report"
ws['A1'].font = Font(size=16, bold=True)
ws['A1'].alignment = Alignment(horizontal='center', vertical='center')
ws['A1'].fill = PatternFill(start_color="366092", end_color="366092", fill_type="solid")
ws['A1'].font = Font(size=16, bold=True, color="FFFFFF")

# Set row height for the title
ws.row_dimensions[1].height = 30

# Add column headers
headers = ["Q1", "Q2", "Q3", "Q4"]
for col, header in enumerate(headers, start=1):
    ws.cell(row=2, column=col, value=header)
    ws.cell(row=2, column=col).font = Font(bold=True)

wb.save("merged_cells.xlsx")
print("Workbook with merged cells created!")

Output:

Workbook with merged cells created!

When merging cells, the content is placed in the top-left cell of the merged range. Be careful when merging as it can affect how data is read back—make sure to reference the correct cell when accessing merged cell values.

Conditional Formatting

Conditional formatting automatically applies styles based on cell values, making it easy to highlight important data. Here’s how to implement it with openpyxl:

# conditional_formatting.py
from openpyxl import Workbook
from openpyxl.formatting.rule import CellIsRule
from openpyxl.styles import PatternFill, Font

wb = Workbook()
ws = wb.active
ws.title = "Sales Performance"

# Add headers
ws['A1'] = "Salesperson"
ws['B1'] = "Sales Amount"

# Add data
salespeople = [
    ("Alice", 85000),
    ("Bob", 42000),
    ("Charlie", 95000),
    ("Diana", 55000),
    ("Edward", 78000)
]

for idx, (name, sales) in enumerate(salespeople, start=2):
    ws[f'A{idx}'] = name
    ws[f'B{idx}'] = sales

# Create a rule to highlight high performers (>75000)
high_fill = PatternFill(start_color="00B050", end_color="00B050", fill_type="solid")
high_font = Font(bold=True, color="FFFFFF")
high_rule = CellIsRule(operator='greaterThan', formula=['75000'], fill=high_fill, font=high_font)

# Apply the rule
ws.conditional_formatting.add(f'B2:B{len(salespeople)+1}', high_rule)

wb.save("conditional_format.xlsx")
print("Workbook with conditional formatting created!")

Output:

Workbook with conditional formatting created!

Conditional formatting is powerful for highlighting trends, outliers, and important values at a glance. You can create complex rules with multiple conditions, color scales, and data bars.

Formatting Excel cells with openpyxl styles
PatternFill, Font, Border, Alignment — CSS for spreadsheets, basically.

Real-World Example: Sales Report Generator

Let’s build a practical application that demonstrates all the concepts we’ve learned. This script generates a professional sales report from raw data:

# sales_report_generator.py
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Border, Side, Alignment
from openpyxl.formatting.rule import CellIsRule
from openpyxl.chart import BarChart, Reference
from datetime import datetime

def generate_sales_report(data, filename="sales_report.xlsx"):
    """
    Generate a professional sales report.

    Args:
        data: List of tuples (product, quantity, unit_price, region)
        filename: Output Excel filename
    """

    wb = Workbook()
    ws = wb.active
    ws.title = "Sales Report"

    # Create title
    ws.merge_cells('A1:E1')
    title = ws['A1']
    title.value = f"Sales Report - {datetime.now().strftime('%B %Y')}"
    title.font = Font(size=16, bold=True, color="FFFFFF")
    title.fill = PatternFill(start_color="1F4E78", end_color="1F4E78", fill_type="solid")
    title.alignment = Alignment(horizontal='center', vertical='center')
    ws.row_dimensions[1].height = 25

    # Create headers
    headers = ["Product", "Quantity", "Unit Price", "Total Sales", "Region"]
    for col, header in enumerate(headers, start=1):
        cell = ws.cell(row=3, column=col, value=header)
        cell.font = Font(bold=True, color="FFFFFF")
        cell.fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")
        cell.alignment = Alignment(horizontal='center')

    # Add data
    border = Border(left=Side(style='thin'), right=Side(style='thin'),
                   top=Side(style='thin'), bottom=Side(style='thin'))

    total_sales = 0
    for idx, (product, qty, price, region) in enumerate(data, start=4):
        ws[f'A{idx}'] = product
        ws[f'B{idx}'] = qty
        ws[f'C{idx}'] = price
        ws[f'D{idx}'] = f'=B{idx}*C{idx}'
        ws[f'E{idx}'] = region

        for col in range(1, 6):
            ws.cell(row=idx, column=col).border = border
            if col in [2, 3, 4]:
                ws.cell(row=idx, column=col).alignment = Alignment(horizontal='right')

    # Grand total
    last_row = len(data) + 4
    ws[f'A{last_row}'] = "TOTAL SALES"
    ws[f'D{last_row}'] = f'=SUM(D4:D{last_row-1})'
    ws[f'A{last_row}'].font = Font(bold=True, size=12)
    ws[f'D{last_row}'].font = Font(bold=True, size=12)

    # Format currency columns
    for row in range(4, last_row + 1):
        ws[f'C{row}'].number_format = '$#,##0.00'
        ws[f'D{row}'].number_format = '$#,##0.00'

    # Conditional formatting for high sales
    high_fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
    rule = CellIsRule(operator='greaterThan', formula=['50000'], fill=high_fill)
    ws.conditional_formatting.add(f'D4:D{last_row-1}', rule)

    # Adjust column widths
    ws.column_dimensions['A'].width = 15
    ws.column_dimensions['B'].width = 12
    ws.column_dimensions['C'].width = 12
    ws.column_dimensions['D'].width = 15
    ws.column_dimensions['E'].width = 12

    # Save the workbook
    wb.save(filename)
    print(f"Report generated: {filename}")

# Sample data
sales_data = [
    ("Laptop Pro", 15, 1500, "North America"),
    ("USB Mouse", 45, 25, "Europe"),
    ("Mechanical Keyboard", 32, 120, "Asia Pacific"),
    ("Monitor 4K", 12, 400, "North America"),
    ("Webcam HD", 58, 80, "Europe"),
    ("External SSD", 28, 150, "Asia Pacific"),
    ("Laptop Stand", 40, 45, "North America"),
    ("Wireless Charger", 66, 35, "Europe"),
]

# Generate the report
generate_sales_report(sales_data)

Output:

Report generated: sales_report.xlsx

This comprehensive example creates a professional sales report with headers, formatted data, formulas for calculations, currency formatting, conditional highlighting, and proper styling. It demonstrates how all the features we’ve learned work together to create a polished, business-ready spreadsheet.

Automating Excel report generation
Report that took 20 minutes by hand now runs in 0.4 seconds. You’re welcome, accounting.

Frequently Asked Questions

How do I handle large Excel files efficiently?

For very large files, openpyxl’s default mode can consume significant memory. You can use read-only or write-only modes to process large files more efficiently. For read operations, use load_workbook(filename, read_only=True, data_only=True). For write operations, use Workbook(write_only=True). These modes stream data instead of loading everything into memory at once.

Why aren’t my formulas calculating when I open the file?

Excel doesn’t recalculate formulas automatically when a file is created by openpyxl. When you open the file in Excel, you’ll typically get a prompt to recalculate. If you want to see calculated values when reading the file back with openpyxl, you need to open it in Excel first to trigger the calculation, or use data_only=True when loading the workbook (though this requires the file to have been opened and saved in Excel previously).

Can I protect sheets or workbooks?

Yes, openpyxl supports sheet and workbook protection. You can protect a sheet with ws.protection.sheet = True and optionally set a password with ws.protection.password = "your_password". Similarly, you can protect the workbook with wb.security.workbookProtection.workbookPassword = "password". Note that these are basic protections and not cryptographically strong.

How do I properly handle dates and times in Excel cells?

Excel stores dates as numbers representing days since a reference date. When writing dates with openpyxl, use Python’s datetime objects directly: ws['A1'] = datetime.now(). openpyxl automatically handles the conversion. You can format the cell with ws['A1'].number_format = 'mm/dd/yyyy' to control how the date displays.

Is openpyxl compatible with .xls files (older Excel format)?

openpyxl only works with the modern .xlsx format (Excel 2010 and later). For older .xls files, you would need to use the xlrd library for reading or xlwt for writing. However, the easiest approach is often to convert old .xls files to .xlsx using Excel itself before processing with Python.

Can I hide rows or columns?

Yes, you can hide rows and columns in openpyxl. Use ws.row_dimensions[1].hidden = True to hide a row, or ws.column_dimensions['A'].hidden = True to hide a column. You can also freeze rows and columns for easier navigation in large spreadsheets using ws.freeze_panes = 'B2' to freeze the first row and first column.

Conclusion

You now have a comprehensive understanding of how to work with Excel files in Python using openpyxl. From creating simple spreadsheets to generating complex, professionally-formatted reports, openpyxl provides all the tools you need. The key takeaways are: start with the basics of creating workbooks and reading existing files, progress to styling and formatting for professional appearance, leverage formulas and charts for data analysis, and finally, combine everything into automated reporting solutions.

The real power of openpyxl shines when you use it to automate repetitive Excel tasks. Instead of manually creating reports, updating spreadsheets, or formatting data, you can write a Python script that does it in seconds. This skill becomes invaluable when working with data pipelines, generating client reports, or maintaining business intelligence systems.

For more information and advanced features, visit the official openpyxl documentation at https://openpyxl.readthedocs.io/. The documentation includes detailed API references, examples, and solutions to edge cases you might encounter in production environments.

Related Python Tutorials

Continue learning with these related guides: