
Python’s built-in itertools module is powerful, but it leaves out dozens of iterator recipes that professional Python developers reinvent constantly: chunking a list into fixed-size batches, sliding a window over a sequence, flattening nested iterables, or interleaving multiple iterators. The more-itertools library packages all of these — and more than 60 others — into clean, well-tested functions that you can drop into any project. If you have ever written a loop just to split a list into groups of N, this library will make you wonder why you did not install it sooner.

more-itertools is a third-party package that extends Python’s itertools with production-ready implementations of common iterator patterns. It has no external dependencies (only the Python standard library), supports Python 3.8+, and is used by large projects like pytest and pip. Install it with pip install more-itertools. All functions accept any iterable — lists, generators, files, database cursors — and return iterators by default, so they are memory-efficient even on large datasets.

This tutorial walks through the most practically useful functions in more-itertools: batching with chunked, sliding windows with windowed, flattening nested structures with flatten and collapse, grouping and transforming with advanced groupby tools, and a handful of utility functions that solve real everyday problems. By the end, you will have a toolkit of iterator patterns that eliminate entire categories of manual looping code.

more-itertools Quick Example

The most common reason people install more-itertools is chunked — splitting a list into fixed-size batches. Here it is alongside three other one-liners that replace 10-line loops:

# more_itertools_quick.py
from more_itertools import chunked, windowed, flatten, interleave_longest

items = list(range(1, 11))  # [1, 2, 3, ..., 10]

# Split into batches of 3
print("chunked:", list(chunked(items, 3)))

# Sliding window of width 4
print("windowed:", list(windowed(items, 4))[:3], "...")

# Flatten a nested list
nested = [[1, 2], [3, [4, 5]], [6]]
print("flatten (one level):", list(flatten(nested)))

# Interleave two iterables
print("interleave:", list(interleave_longest([1, 3, 5], [2, 4, 6, 8])))

Output:

chunked: [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
windowed: [(1, 2, 3, 4), (2, 3, 4, 5), (3, 4, 5, 6)] ...
flatten (one level): [1, 2, 3, [4, 5], 6]
interleave: [1, 2, 3, 4, 5, 6, 8]

Each of these replaces a manual loop. Notice chunked handles the trailing partial batch gracefully (the last chunk has only one element). windowed returns overlapping tuples — each tuple contains N consecutive elements shifted by one position. flatten only goes one level deep; for deep flattening, use collapse (covered later).

chunked(range(1000), 3): because slice math is a war crime.

What Is more-itertools?

The standard library’s itertools module provides building blocks like chain, groupby, islice, and product. These are low-level primitives. more-itertools builds higher-level, ready-to-use functions on top of them — saving you from having to compose the primitives correctly every time.

Category | more-itertools functions | What they solve
Batching | chunked, batched, grouper | Split iterables into fixed-size groups
Windowing | windowed, sliding_window, pairwise | Sliding N-element views over a sequence
Flattening | flatten, collapse, roundrobin | Eliminate nested structures
Grouping | groupby_transform, bucket, partition | Split by predicate or key
Filtering | unique_everseen, distinct_permutations | Deduplicate while preserving order
Combining | interleave, zip_broadcast, zip_equal | Merge iterables safely
Utility | first, last, one, only, exactly_n | Safe access to elements

Most functions in more-itertools return a lazy iterator unless you explicitly wrap the result in list() (element-access helpers such as first and one are the natural exceptions — they return a single value). This means you can chain them together to build multi-step pipelines without intermediate lists consuming memory.
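As a quick illustration of such chaining (the nested data here is made up), flatten and chunked compose without ever materializing the full intermediate sequence:

```python
from more_itertools import chunked, flatten

nested = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

# flatten() and chunked() both return iterators, so no intermediate
# list of all nine elements is ever built -- work happens only when
# the final list() pulls values through the pipeline.
pipeline = chunked(flatten(nested), 4)

print(list(pipeline))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
```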

Batching and Grouping: chunked, grouper, batched

Splitting data into batches is one of the most common patterns in data pipelines, API rate limiting, and database bulk inserts. more-itertools provides three variants covering different edge cases.

# more_itertools_batching.py
from more_itertools import chunked, grouper, batched

data = list(range(1, 12))  # 11 items

# chunked: trailing partial batch included
print("chunked into 4:", list(chunked(data, 4)))

# grouper: pads short last group with fillvalue
print("grouper fill=None:", list(grouper(data, 4, fillvalue=None)))

# batched (Python 3.12+ std, but available here for 3.8+)
print("batched into 4:", list(batched(data, 4)))

# Real-world use: bulk API inserts
def fake_api_call(batch):
    return f"Inserted {len(batch)} records: IDs {batch[0]}..{batch[-1]}"

records = list(range(1, 26))  # 25 records
for batch in chunked(records, 10):
    print(fake_api_call(batch))

Output:

chunked into 4: [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
grouper fill=None: [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, None)]
batched into 4: [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11)]
Inserted 10 records: IDs 1..10
Inserted 10 records: IDs 11..20
Inserted 5 records: IDs 21..25

Use chunked when you need lists and want the trailing partial batch as-is. Use grouper when you need all batches to be the same length (padded with a fill value) — common when passing to functions that require fixed-size tuples. batched behaves like chunked but returns tuples.

Sliding Windows: windowed, sliding_window, pairwise

Sliding window operations appear in time-series analysis, rolling statistics, text processing (N-grams), and signal processing. The standard library has no general sliding-window built-in (itertools.pairwise, added in Python 3.10, covers only width-2 windows); before more-itertools, most developers wrote fragile index-arithmetic loops.

# more_itertools_windows.py
from more_itertools import windowed, pairwise

temps = [18.5, 19.2, 20.1, 17.8, 16.4, 21.3, 22.0, 19.5]

# 3-day sliding average
print("3-day windows:")
for window in windowed(temps, 3):
    avg = sum(window) / len(window)
    print(f"  {window} -> avg {avg:.2f}")

print()

# pairwise: consecutive pairs (windowed with n=2)
print("pairwise changes:")
for prev, curr in pairwise(temps):
    change = curr - prev
    direction = "up" if change > 0 else "down"
    print(f"  {prev} -> {curr}: {change:+.1f} ({direction})")

Output:

3-day windows:
  (18.5, 19.2, 20.1) -> avg 19.27
  (19.2, 20.1, 17.8) -> avg 19.03
  (20.1, 17.8, 16.4) -> avg 18.10
  (17.8, 16.4, 21.3) -> avg 18.50
  (16.4, 21.3, 22.0) -> avg 19.90
  (21.3, 22.0, 19.5) -> avg 20.93

pairwise changes:
  18.5 -> 19.2: +0.7 (up)
  19.2 -> 20.1: +0.9 (up)
  20.1 -> 17.8: -2.3 (down)
  17.8 -> 16.4: -1.4 (down)
  16.4 -> 21.3: +4.9 (up)
  21.3 -> 22.0: +0.7 (up)
  22.0 -> 19.5: -2.5 (down)

When the iterable has at least n elements, windowed yields only complete windows (with the default step=1) — it stops once a full window can no longer be formed. If the iterable is shorter than n, it yields a single window padded with fillvalue, which defaults to None; pass a different sentinel if None is a legitimate value in your data. pairwise produces the same overlapping pairs as windowed(iterable, 2) (without padding short inputs) and was added to the standard library in Python 3.10, but more-itertools provides it for older Python versions too.
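A quick check makes both edge cases concrete — a too-short input pads a single window, and the optional step parameter can leave a padded final window (the inputs here are illustrative):

```python
from more_itertools import windowed

# Input shorter than the window: one window, padded with fillvalue
print(list(windowed([1, 2], 4)))
# [(1, 2, None, None)]

# step=2: windows advance two positions at a time; the final window
# is padded when the data runs out partway through it
print(list(windowed([1, 2, 3, 4, 5, 6], 3, step=2)))
# [(1, 2, 3), (3, 4, 5), (5, 6, None)]
```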

windowed(temps, 3): rolling average without reinventing the index loop.

Flattening Nested Structures: flatten and collapse

flatten removes one level of nesting from an iterable of iterables. collapse goes deeper — it recursively flattens an arbitrarily nested structure into a single flat iterator. Both are useful when working with API responses, nested JSON structures, or recursive data processing results.

# more_itertools_flatten.py
from more_itertools import flatten, collapse

# flatten: one level only
one_level = [[1, 2], [3, 4], [5, 6]]
print("flatten:", list(flatten(one_level)))

# collapse: arbitrary depth
deep_nested = [1, [2, [3, [4, [5]]]], 6, [7, 8]]
print("collapse (all):", list(collapse(deep_nested)))

# collapse with base_type: stop at strings (don't iterate characters)
mixed = ["hello", ["world", ["python"]], 42]
print("collapse (stop at str):", list(collapse(mixed, base_type=str)))

# Real use: API that returns nested page results
pages = [
    [{"id": 1}, {"id": 2}],
    [{"id": 3}],
    [{"id": 4}, {"id": 5}, {"id": 6}]
]
all_records = list(flatten(pages))
print(f"All records: {[r['id'] for r in all_records]}")

Output:

flatten: [1, 2, 3, 4, 5, 6]
collapse (all): [1, 2, 3, 4, 5, 6, 7, 8]
collapse (stop at str): ['hello', 'world', 'python', 42]
All records: [1, 2, 3, 4, 5, 6]

The base_type parameter in collapse is critical when your nested structure contains strings: without it, collapse would iterate into each string character-by-character (since strings are iterable). Pass base_type=str to treat strings as atomic values.
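collapse also accepts a levels parameter that caps how deep the recursion goes; a small sketch (the nested data is made up):

```python
from more_itertools import collapse

deep = [1, [2, [3, [4]]]]

# levels=1 behaves like flatten: only one level of nesting removed
print(list(collapse(deep, levels=1)))  # [1, 2, [3, [4]]]

# levels=2 goes one step further
print(list(collapse(deep, levels=2)))  # [1, 2, 3, [4]]
```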

Safe Element Access: first, last, one, only

Iterators do not support indexing. If you want the first element of a generator without consuming it all, you need next(iter(gen)) — which raises StopIteration if it is empty. more-itertools provides readable, defensive alternatives.

# more_itertools_safe_access.py
from more_itertools import first, last, one, only, first_true

items = [10, 20, 30, 40]
empty = []

# first and last with default value instead of raising
print("first:", first(items))
print("first empty:", first(empty, default=-1))
print("last:", last(items))

# one: asserts exactly one item exists
single = [42]
print("one:", one(single))

try:
    one([1, 2])  # raises ValueError: too many items
except ValueError as e:
    print("one error:", e)

# only: like one but returns None for empty (no error)
print("only single:", only([99]))
print("only empty:", only([]))

# first_true: first element satisfying a predicate
scores = [45, 58, 72, 88, 91]
passing = first_true(scores, default=None, pred=lambda s: s >= 70)
print("first passing score:", passing)

Output:

first: 10
first empty: -1
last: 40
one: 42
one error: Expected exactly one item in iterable, but got 1, 2, and perhaps more.
only single: 99
only empty: None
first passing score: 72

Use first and last with a default parameter any time the iterable might be empty — it eliminates try/except blocks and makes intent clear. Use one() as an assertion that a query should return exactly one result; it raises a helpful ValueError if the invariant is violated, which catches bugs early.
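A sketch of that query-assertion pattern, with made-up user records standing in for database rows:

```python
from more_itertools import one

users = [
    {"id": 1, "name": "ada"},
    {"id": 2, "name": "grace"},
]

# Exactly one user should match this id; one() enforces that invariant
# and raises ValueError if zero or multiple records come back.
match = one(u for u in users if u["id"] == 2)
print(match["name"])  # grace
```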

first(results, default=None): because next(iter([])) spawns stack traces.

Real-Life Example: ETL Pipeline with Batched API Writes

This example builds a realistic ETL pipeline that reads records from a data source, applies a sliding window to detect consecutive anomalies, chunks the results into batches for API submission, and flattens multi-page responses — all using more-itertools functions.

# more_itertools_etl_pipeline.py
from more_itertools import chunked, windowed, flatten, first_true

# Fixed sensor readings (in real code these would come from a database or file)
values = [72.1, 88.4, 90.2, 87.6, 45.0, 91.3, 86.8, 88.0,
          89.9, 30.5, 55.2, 86.1, 87.3, 88.8, 40.0]
sensor_readings = [
    {"sensor_id": f"S{i:03d}", "value": v, "ts": f"2025-05-{i + 1:02d}"}
    for i, v in enumerate(values)
]

# Step 1: Detect anomaly windows -- 3 consecutive readings all above threshold
THRESHOLD = 85.0

def is_anomaly_window(window):
    return all(r["value"] > THRESHOLD for r in window)

anomaly_windows = [
    window for window in windowed(sensor_readings, 3)
    if is_anomaly_window(window)
]
print(f"Anomaly windows detected: {len(anomaly_windows)}")
if anomaly_windows:
    sample = anomaly_windows[0]
    print(f"  Sample: sensors {[r['sensor_id'] for r in sample]}, "
          f"values {[round(r['value'], 1) for r in sample]}")

# Step 2: Extract unique anomalous sensor IDs (flatten windows, deduplicate;
# sorted() keeps the output deterministic, unlike raw set ordering)
all_anomalous = sorted({r["sensor_id"] for window in anomaly_windows for r in window})
print(f"Unique anomalous sensors: {len(all_anomalous)}")

# Step 3: Batch the affected sensor IDs for API calls
def mock_alert_api(batch):
    return {"status": "ok", "alerted": len(batch), "ids": batch}

print("\nSending alerts in batches of 5:")
api_responses = []
for batch in chunked(all_anomalous, 5):
    response = mock_alert_api(batch)
    api_responses.append(response)
    print(f"  Batch: {response}")

# Step 4: Flatten the per-batch responses back into a flat list of IDs
all_alerted_ids = list(flatten(r["ids"] for r in api_responses))
print(f"\nTotal sensors alerted: {len(all_alerted_ids)}")

# Step 5: Confirm a specific sensor was alerted using first_true
# (default=None so a miss is distinguishable from a hit)
target = all_anomalous[0] if all_anomalous else "S000"
found = first_true(all_alerted_ids, default=None, pred=lambda sid: sid == target)
print(f"Sensor {target} alerted: {found is not None}")

Output:

Anomaly windows detected: 4
  Sample: sensors ['S001', 'S002', 'S003'], values [88.4, 90.2, 87.6]
Unique anomalous sensors: 10

Sending alerts in batches of 5:
  Batch: {'status': 'ok', 'alerted': 5, 'ids': ['S001', 'S002', 'S003', 'S005', 'S006']}
  Batch: {'status': 'ok', 'alerted': 5, 'ids': ['S007', 'S008', 'S011', 'S012', 'S013']}

Total sensors alerted: 10
Sensor S001 alerted: True

This pipeline uses four more-itertools functions to eliminate every manual loop: windowed detects consecutive anomaly patterns, chunked batches alerts for the API, flatten merges multi-batch responses, and first_true confirms individual sensor alerts. Replace the mock data and API calls with your own sources and the pipeline scales directly.

Frequently Asked Questions

When should I use more-itertools vs standard itertools?

Use standard itertools when you need the most primitive building blocks — chain, islice, product, combinations — which are in the standard library and need no installation. Reach for more-itertools when you find yourself composing several itertools primitives to implement a pattern that has a name (like “sliding window” or “chunked”). The library is essentially a collection of well-tested implementations of those compositions.
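To make the composition point concrete, here is roughly what chunked saves you from writing by hand with islice — a sketch, not the library's actual source:

```python
from itertools import islice
from more_itertools import chunked

def chunked_by_hand(iterable, n):
    """Compose itertools primitives to get chunked-like behavior."""
    it = iter(iterable)
    # islice pulls at most n items per pass; an empty list means exhaustion
    while chunk := list(islice(it, n)):
        yield chunk

data = list(range(1, 8))
assert list(chunked_by_hand(data, 3)) == list(chunked(data, 3))
print(list(chunked_by_hand(data, 3)))  # [[1, 2, 3], [4, 5, 6], [7]]
```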

Are more-itertools functions memory-efficient?

Yes — the iterator-building functions return lazy iterators by default, so they do not load the entire input into memory. windowed(huge_file_lines, 3) keeps only 3 lines in memory at any moment. One caveat is last(): it must consume the entire iterator to find the final element, so it is O(n) in time — though still constant in memory, because for inputs that cannot be indexed it falls back to the deque(iterable, maxlen=1) trick, which retains only a single element at a time.

What happens when the sequence passed to windowed is too short?

If the iterable has fewer than n elements, windowed yields a single window padded with fillvalue (None by default) — windowed([1, 2], 4) produces (1, 2, None, None). When the iterable has at least n elements and step is 1 (the default), only complete windows are produced and no padding occurs. With step > 1, the final window may also be padded if the data runs out partway through it.

Does chunked handle empty iterables?

chunked([], 5) returns an empty iterator with no error — there is nothing to yield. Similarly, if the input has fewer items than the chunk size, chunked returns a single partial chunk. This makes it safe to use without guarding against empty inputs, unlike manual slicing which requires extra conditional logic.
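A quick check of both edge cases:

```python
from more_itertools import chunked

# Empty input: nothing to yield, no error
print(list(chunked([], 5)))      # []

# Fewer items than the chunk size: one partial chunk
print(list(chunked([1, 2], 5)))  # [[1, 2]]
```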

Does more-itertools conflict with other packages?

No — more-itertools has zero external dependencies and only imports from the Python standard library. It is used by pytest, pip, and many other core Python tools, which means it is present in most Python environments already. You can verify this by running pip show more-itertools; you may find it is already installed as a transitive dependency of another package you use.

Conclusion

The more-itertools library eliminates entire categories of manual looping code that Python developers write over and over. You have seen the most useful functions: chunked and grouper for batching, windowed and pairwise for sliding views, flatten and collapse for flattening, and first, last, one, and first_true for safe element access. Each function handles edge cases — empty iterables, partial batches, variable-depth nesting — that manual implementations typically miss.

Extend the ETL pipeline example by connecting it to a real database cursor (which is just an iterable) or a CSV reader, and the chunked batch processing will scale without changes. Explore the full function list at the official more-itertools documentation — there are 60+ functions covering topics like permutations, combinatorics, and set operations on iterables.