Python’s itertools module is a powerhouse for creating efficient iterators that save memory and speed up your data processing. Instead of building entire lists in memory, itertools generates values on the fly, making it perfect for working with large datasets or creating complex iteration patterns.
If you have ever needed to combine multiple lists, group data by a key, or generate all possible permutations of a set, itertools has a clean, optimized solution ready to go. These tools are implemented in C under the hood, so they run significantly faster than equivalent pure Python code.
In this tutorial, you will learn the most practical itertools functions through real examples. We will cover infinite iterators, combinatoric generators, data grouping, chaining, and filtering — everything you need to write more Pythonic and memory-efficient code.
Quick Answer
The itertools module provides fast, memory-efficient iterator building blocks. Key functions include chain() for combining iterables, groupby() for grouping data, product() for cartesian products, combinations() and permutations() for combinatorics, and islice() for slicing iterators. Import with from itertools import chain, groupby, product, combinations.
Quick Example
from itertools import chain, islice, count
# Chain multiple iterables together seamlessly
combined = chain([1, 2, 3], ['a', 'b'], [True, False])
print(list(combined))
# Take the first 5 even numbers from an infinite counter
evens = count(0, 2)
first_five = list(islice(evens, 5))
print(first_five)
[1, 2, 3, 'a', 'b', True, False]
[0, 2, 4, 6, 8]
The chain() function combines three separate iterables into one seamless stream without creating a new list in memory. The islice() function safely takes a slice from an infinite iterator, something you cannot do with regular list slicing.
What Is the Itertools Module?
The itertools module is part of Python’s standard library and provides a collection of fast, memory-efficient tools for creating and working with iterators. The module is inspired by constructs from functional programming languages like APL, Haskell, and SML.
The key advantage of itertools is lazy evaluation. Instead of building a complete list in memory, each function produces values one at a time as they are requested. This means you can work with datasets much larger than your available RAM, process infinite sequences, and build complex data pipelines that remain efficient.
The functions in itertools fall into three categories: infinite iterators that produce values forever, finite iterators that process input sequences, and combinatoric generators that produce arrangements of elements. All of them are implemented in C for maximum performance.
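To make the three categories concrete, here is a minimal sketch with one function from each (count is infinite, accumulate is finite, permutations is combinatoric):

```python
from itertools import count, islice, accumulate, permutations

# Infinite iterator: produces values forever, so we cap it with islice
print(list(islice(count(1), 3)))       # [1, 2, 3]

# Finite iterator: transforms an input sequence into running totals
print(list(accumulate([1, 2, 3, 4])))  # [1, 3, 6, 10]

# Combinatoric generator: every ordering of the elements
print(list(permutations("AB")))        # [('A', 'B'), ('B', 'A')]
```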
Infinite Iterators
count — Infinite Counter
The count() function generates an endless sequence of numbers starting from a given value with a specified step. You must always use it with something that limits the output, like islice() or a break in a loop.
from itertools import count, islice
# Count from 10 with step 5
counter = count(10, 5)
print(list(islice(counter, 6)))
# Useful for generating IDs
def id_generator(prefix="ID"):
    for num in count(1):
        yield f"{prefix}-{num:04d}"
ids = id_generator("ORD")
print([next(ids) for _ in range(4)])
[10, 15, 20, 25, 30, 35]
['ORD-0001', 'ORD-0002', 'ORD-0003', 'ORD-0004']
cycle — Repeat a Sequence Forever
The cycle() function takes an iterable and repeats it infinitely. This is perfect for round-robin scheduling, alternating patterns, or rotating through a fixed set of options.
from itertools import cycle, islice
# Alternate between teams for task assignment
teams = cycle(["Alpha", "Beta", "Gamma"])
tasks = ["Deploy v2.1", "Fix login bug", "Update docs",
         "Database migration", "API refactor", "Write tests", "Code review"]
assignments = {task: next(teams) for task in tasks}
for task, team in assignments.items():
    print(f" {team}: {task}")
Alpha: Deploy v2.1
Beta: Fix login bug
Gamma: Update docs
Alpha: Database migration
Beta: API refactor
Gamma: Write tests
Alpha: Code review
repeat — Repeat a Value
The repeat() function produces the same value over and over, either infinitely or a specified number of times. It is commonly used with map() or zip() to provide a constant value alongside changing data.
from itertools import repeat
# Create a list of default configurations
# (note: repeat yields the same object each time, so all four entries
# share one dict; mutating one mutates them all)
defaults = list(repeat({"enabled": True, "retries": 3}, 4))
print(defaults)
# Use with map for element-wise operations
import operator
bases = [2, 3, 4, 5]
squared = list(map(operator.pow, bases, repeat(2)))
print(f"Squared: {squared}")
cubed = list(map(operator.pow, bases, repeat(3)))
print(f"Cubed: {cubed}")
[{'enabled': True, 'retries': 3}, {'enabled': True, 'retries': 3}, {'enabled': True, 'retries': 3}, {'enabled': True, 'retries': 3}]
Squared: [4, 9, 16, 25]
Cubed: [8, 27, 64, 125]
Finite Iterators
chain — Combine Multiple Iterables
The chain() function links multiple iterables together into a single continuous stream. It processes the first iterable completely, then moves to the second, and so on — all without creating an intermediate list.
from itertools import chain
# Merge data from multiple sources
database_users = ["alice", "bob"]
api_users = ["charlie", "diana"]
file_users = ["eve"]
all_users = list(chain(database_users, api_users, file_users))
print(f"All users: {all_users}")
# chain.from_iterable flattens a list of lists
nested_data = [["python", "java"], ["rust", "go"], ["ruby"]]
flat = list(chain.from_iterable(nested_data))
print(f"Flattened: {flat}")
All users: ['alice', 'bob', 'charlie', 'diana', 'eve']
Flattened: ['python', 'java', 'rust', 'go', 'ruby']
groupby — Group Consecutive Elements
The groupby() function groups consecutive elements that share the same key. The data must be sorted by the grouping key first, or you will get unexpected results.
from itertools import groupby
from operator import itemgetter
# Group transactions by category
transactions = [
    {"category": "food", "amount": 25.50},
    {"category": "food", "amount": 12.00},
    {"category": "transport", "amount": 35.00},
    {"category": "transport", "amount": 15.50},
    {"category": "entertainment", "amount": 45.00},
    {"category": "food", "amount": 8.75},
]
# Sort first, then group
sorted_trans = sorted(transactions, key=itemgetter("category"))
for category, group in groupby(sorted_trans, key=itemgetter("category")):
    items = list(group)
    total = sum(t["amount"] for t in items)
    print(f" {category}: {len(items)} transactions, ${total:.2f}")
entertainment: 1 transactions, $45.00
food: 3 transactions, $46.25
transport: 2 transactions, $50.50
islice — Slice Any Iterator
The islice() function works like regular list slicing but on any iterator, including infinite ones and generators. Unlike list slicing, it does not support negative indices because iterators cannot go backwards.
from itertools import islice
# Slice a generator (can't use regular slicing)
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
# Get fibonacci numbers 10 through 15
fib_slice = list(islice(fibonacci(), 10, 16))
print(f"Fibonacci 10-15: {fib_slice}")
# Read first 3 lines from a large dataset (simulated)
data_lines = (f"Row {i}: data_{i}" for i in range(1000000))
preview = list(islice(data_lines, 3))
print(f"Preview: {preview}")
Fibonacci 10-15: [55, 89, 144, 233, 377, 610]
Preview: ['Row 0: data_0', 'Row 1: data_1', 'Row 2: data_2']
Combinatoric Iterators
product — Cartesian Product
The product() function computes the cartesian product of input iterables, equivalent to nested for loops. This is perfect for generating all combinations of options.
from itertools import product
# Generate all t-shirt variants
sizes = ["S", "M", "L"]
colors = ["Red", "Blue"]
styles = ["V-neck", "Crew"]
variants = list(product(sizes, colors, styles))
print(f"Total variants: {len(variants)}")
for v in variants[:6]:
    print(f" {v[0]} {v[1]} {v[2]}")
# Generate grid coordinates
grid = list(product(range(3), range(3)))
print(f"\n3x3 grid: {grid}")
Total variants: 12
S Red V-neck
S Red Crew
S Blue V-neck
S Blue Crew
M Red V-neck
M Red Crew
3x3 grid: [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
combinations and permutations
The combinations() function generates all unique groups of a given size where order does not matter. The permutations() function generates all arrangements where order does matter.
from itertools import combinations, permutations
# Pick 2-person teams from 4 candidates
candidates = ["Alice", "Bob", "Charlie", "Diana"]
teams = list(combinations(candidates, 2))
print(f"Possible teams ({len(teams)}):")
for team in teams:
    print(f" {team[0]} & {team[1]}")
# Order matters for race finish positions
runners = ["A", "B", "C"]
finishes = list(permutations(runners))
print(f"\nPossible race finishes ({len(finishes)}):")
for f in finishes:
    print(f" 1st: {f[0]}, 2nd: {f[1]}, 3rd: {f[2]}")
Possible teams (6):
Alice & Bob
Alice & Charlie
Alice & Diana
Bob & Charlie
Bob & Diana
Charlie & Diana
Possible race finishes (6):
1st: A, 2nd: B, 3rd: C
1st: A, 2nd: C, 3rd: B
1st: B, 2nd: A, 3rd: C
1st: B, 2nd: C, 3rd: A
1st: C, 2nd: A, 3rd: B
1st: C, 2nd: B, 3rd: A
Real-Life Project: Building a Data Pipeline with Itertools
Let us build a practical data processing pipeline that uses multiple itertools functions to efficiently process log data. This pipeline chains, filters, groups, and summarizes data while keeping memory usage minimal.
from itertools import chain, groupby, islice, accumulate
from operator import itemgetter
import operator
# Simulated log data from multiple servers
server1_logs = [
    {"timestamp": "2024-01-15 10:01", "level": "INFO", "service": "auth", "message": "User login"},
    {"timestamp": "2024-01-15 10:02", "level": "ERROR", "service": "auth", "message": "Invalid token"},
    {"timestamp": "2024-01-15 10:05", "level": "INFO", "service": "api", "message": "Request processed"},
    {"timestamp": "2024-01-15 10:07", "level": "WARNING", "service": "db", "message": "Slow query"},
]
server2_logs = [
    {"timestamp": "2024-01-15 10:01", "level": "INFO", "service": "api", "message": "Health check"},
    {"timestamp": "2024-01-15 10:03", "level": "ERROR", "service": "api", "message": "Timeout"},
    {"timestamp": "2024-01-15 10:04", "level": "ERROR", "service": "db", "message": "Connection lost"},
    {"timestamp": "2024-01-15 10:06", "level": "INFO", "service": "auth", "message": "Token refreshed"},
]
# Step 1: Chain all logs together
all_logs = list(chain(server1_logs, server2_logs))
print(f"Total log entries: {len(all_logs)}")
# Step 2: Sort and group by service
sorted_by_service = sorted(all_logs, key=itemgetter("service"))
print("\nLogs by service:")
for service, logs in groupby(sorted_by_service, key=itemgetter("service")):
    log_list = list(logs)
    error_count = sum(1 for l in log_list if l["level"] == "ERROR")
    print(f" {service}: {len(log_list)} entries ({error_count} errors)")
# Step 3: Sort and group by log level
sorted_by_level = sorted(all_logs, key=itemgetter("level"))
print("\nLogs by level:")
for level, logs in groupby(sorted_by_level, key=itemgetter("level")):
    count = len(list(logs))
    print(f" {level}: {count}")
# Step 4: Running error count using accumulate
error_flags = [1 if log["level"] == "ERROR" else 0 for log in all_logs]
running_errors = list(accumulate(error_flags, operator.add))
print(f"\nRunning error count: {running_errors}")
print(f"Total errors: {running_errors[-1]}")
# Step 5: Get the latest 3 entries
latest = list(islice(sorted(all_logs, key=itemgetter("timestamp"), reverse=True), 3))
print("\nLatest 3 entries:")
for entry in latest:
    print(f" [{entry['level']}] {entry['timestamp']} - {entry['service']}: {entry['message']}")
Total log entries: 8
Logs by service:
api: 3 entries (1 errors)
auth: 3 entries (1 errors)
db: 2 entries (1 errors)
Logs by level:
ERROR: 3
INFO: 4
WARNING: 1
Running error count: [0, 1, 1, 1, 1, 2, 3, 3]
Total errors: 3
Latest 3 entries:
[WARNING] 2024-01-15 10:07 - db: Slow query
[INFO] 2024-01-15 10:06 - auth: Token refreshed
[INFO] 2024-01-15 10:05 - api: Request processed
Common Pitfalls and Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| groupby returns unexpected groups | Data not sorted by grouping key | Always sort by the same key before calling groupby |
| Iterator exhausted after first use | Iterators can only be consumed once | Convert to list if you need multiple passes, or use tee() |
| Memory error with product() | Cartesian product of large sets creates huge output | Use islice() to limit output, or process items one at a time |
| Infinite loop with count() or cycle() | No termination condition | Always pair infinite iterators with islice(), takewhile(), or break |
| accumulate gives wrong type | Initial value type mismatch | Pass an explicit initial value matching your expected type |
| combinations_with_replacement unexpected | Confused with regular combinations | Use combinations() for no repeats, combinations_with_replacement() when repeats are allowed |
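The fixes in the last three rows are easy to demonstrate. This sketch shows islice() capping a large cartesian product, takewhile() giving an infinite counter a termination condition, and the difference between the two combinations variants:

```python
from itertools import (product, islice, count, takewhile,
                       combinations, combinations_with_replacement)

# Cap a million-element cartesian product without materializing it
print(list(islice(product(range(1000), range(1000)), 3)))
# [(0, 0), (0, 1), (0, 2)]

# takewhile stops the infinite counter once the condition fails
print(list(takewhile(lambda x: x < 10, count(0, 3))))  # [0, 3, 6, 9]

# combinations() never repeats an element; the _with_replacement
# variant allows an element to pair with itself
print(list(combinations("AB", 2)))                   # [('A', 'B')]
print(list(combinations_with_replacement("AB", 2)))  # [('A', 'A'), ('A', 'B'), ('B', 'B')]
```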
Frequently Asked Questions
How is chain() different from concatenating lists with the + operator?
chain() creates a lazy iterator that processes elements one at a time without creating a new list in memory. List concatenation with + creates an entirely new list containing all elements from both sources. For large datasets, chain is significantly more memory-efficient because it never builds a combined copy of the data.
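One way to see the difference is to compare object sizes with sys.getsizeof (exact byte counts vary by Python version, so treat the numbers as illustrative):

```python
import sys
from itertools import chain

a = list(range(100_000))
b = list(range(100_000))

concatenated = a + b   # builds a new 200,000-element list
chained = chain(a, b)  # lightweight iterator over both lists

print(sys.getsizeof(concatenated))    # hundreds of kilobytes
print(sys.getsizeof(chained))         # a few dozen bytes
print(list(chained) == concatenated)  # True: same elements either way
```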
Can I use itertools with pandas?
Yes, but with some caveats. Itertools functions work with any iterable, so you can use them on DataFrame columns, rows from iterrows(), or index values. However, pandas has its own optimized methods like groupby, merge, and concat that are usually faster for DataFrame operations. Use itertools when working with pure Python iterables or when pandas does not have an equivalent function.
How do I restart an iterator that has already been consumed?
You cannot restart a consumed iterator. Instead, use itertools.tee() to create independent copies before consuming, store the data in a list if it fits in memory, or recreate the iterator from the original source. For generators, call the generator function again to get a fresh iterator.
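A short sketch of the tee() approach:

```python
from itertools import tee

numbers = iter([1, 2, 3, 4])
first, second = tee(numbers, 2)  # two independent iterators from one source

print(list(first))   # [1, 2, 3, 4]
print(list(second))  # [1, 2, 3, 4] - unaffected by consuming `first`

# Caveat: once tee() is called, don't touch the original iterator again,
# or the copies will silently miss the values it consumes.
```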
When should I use product() instead of nested for loops?
product() is cleaner and more readable than nested loops when you need all combinations of multiple iterables. It also makes it easy to dynamically change the number of nested dimensions. Use nested for loops when you need complex logic between iterations, early exits, or when only some combinations should be processed.
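The dimension flexibility comes from product()'s repeat keyword, which collapses any depth of identical nested loops into a single call:

```python
from itertools import product

# Equivalent to: for a in (0, 1): for b in (0, 1): for c in (0, 1): ...
bits = list(product([0, 1], repeat=3))
print(len(bits))  # 8
print(bits[:3])   # [(0, 0, 0), (0, 0, 1), (0, 1, 0)]
```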
Are itertools functions faster than list comprehensions?
For simple operations, list comprehensions and itertools have similar speed. The main advantage of itertools is memory efficiency rather than raw speed. When processing millions of items, itertools avoids building large intermediate lists, which prevents memory issues and can actually be faster due to reduced memory allocation overhead. For small datasets, the difference is negligible.
Conclusion
The itertools module transforms how you handle iteration in Python. You learned how infinite iterators like count() and cycle() create endless streams, how chain() and groupby() organize data efficiently, how combinatoric tools like product() and combinations() generate arrangements, and how islice() safely limits output from any iterator.
The data pipeline example showed how these tools compose naturally to build efficient processing chains. Start by replacing list concatenation with chain() and nested loops with product() in your existing code — those two changes will immediately make your code more readable and memory-efficient. As you get comfortable, explore groupby() and accumulate() for more advanced data processing patterns.