You need to process a list in batches of 50. You need a sliding window over a time series. You need to group consecutive items by a category without sorting the entire list first. The standard library’s itertools module gets you partway there, but you end up writing the same 5-line helper function in every project: the one that chunks a list into fixed-size groups. That function already exists, is battle-tested, and has 100+ siblings in a library called more-itertools.
more-itertools is a pure-Python library that extends the standard itertools with practical, production-ready iteration utilities. It is the most downloaded iteration utility library in the Python ecosystem with tens of millions of downloads per month, and it ships with functions that cover the patterns you repeatedly hand-roll: chunked batching, sliding windows, grouping consecutive items, interleaving sequences, and more. Most functions return lazy iterators, so they can handle large sequences without loading them into memory.
In this article, we will cover installing more-itertools, the most useful functions by category, how they compare to hand-written alternatives, and a real-world data pipeline that ties them together. By the end, you will recognize at least five patterns in your current codebase that more-itertools can replace with a single, well-named function call.
more-itertools: Quick Example
The most common use case: splitting a list into fixed-size chunks for batch processing.
# quick_example.py
from more_itertools import chunked

data = list(range(1, 22))  # 21 items
for batch in chunked(data, 5):
    print(batch)
Output:
[1, 2, 3, 4, 5]
[6, 7, 8, 9, 10]
[11, 12, 13, 14, 15]
[16, 17, 18, 19, 20]
[21]
chunked(iterable, n) splits any iterable into lists of at most n items. The final batch contains whatever remains — it is not padded. Compare this to writing [data[i:i+5] for i in range(0, len(data), 5)], which only works on sequences with a length; chunked works on any iterable including generators and file objects.
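To make the comparison concrete, here is a minimal sketch of the kind of hand-rolled helper that chunked replaces. The name chunk_stdlib is hypothetical and the helper uses only itertools.islice; it illustrates the pattern and is not a claim about how more-itertools implements chunked internally.

```python
from itertools import islice


def chunk_stdlib(iterable, n):
    """Hypothetical hand-rolled batching helper (illustration only)."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))  # pull up to n items from the source
        if not batch:                # source exhausted, stop cleanly
            return
        yield batch


# A generator has no len() and cannot be sliced, so the
# data[i:i+5] idiom would raise TypeError on this input.
squares = (x * x for x in range(7))
batches = list(chunk_stdlib(squares, 3))
print(batches)  # [[0, 1, 4], [9, 16, 25], [36]]
```

The helper is short, but it is exactly the code that tends to get copied between projects with small variations.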
Read on for windowed slices, grouping utilities, interleaving, and a full data pipeline example combining several of these tools.
What Is more-itertools and Why Use It?
more-itertools is a community library maintained as a complement to Python’s built-in itertools module. Where itertools provides the low-level combinatorial primitives (chain, product, combinations), more-itertools focuses on practical, higher-level patterns that appear constantly in real data processing code. Most functions return lazy iterators compatible with the rest of the itertools ecosystem; a few, such as first, last, and one, return a single value instead.
| Function | What it does | stdlib alternative |
|---|---|---|
| chunked(it, n) | Split into n-sized lists | Manual slice loop |
| windowed(it, n) | Sliding window of size n | Manual deque loop |
| grouper(it, n) | Fixed-size tuples, pads last | zip_longest + repeat |
| run_length.encode(it) | RLE compression of runs | groupby + len |
| flatten(it) | One level of nesting removed | chain.from_iterable |
| interleave(*its) | Interleave multiple iterables | zip + chain |
| peekable(it) | Look ahead without consuming | No clean stdlib option |
| first(it) | First item or default | next() + StopIteration |
The main benefit over writing these yourself is correctness: more-itertools handles edge cases (empty iterables, short final chunks, single-item sequences) that hand-rolled versions often miss. Install once, import by name, never write another chunk-splitter.
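The edge cases mentioned above can be checked directly. This is a small sketch, assuming more-itertools is installed; the variable names are only for illustration.

```python
from more_itertools import chunked, first, flatten

empty_batches = list(chunked([], 5))       # empty input: no batches at all
short_batches = list(chunked([1, 2], 5))   # fewer than n items: one short batch
first_or_none = first([], default=None)    # default instead of an exception
flat = list(flatten([[], [1], []]))        # empty inner lists contribute nothing

print(empty_batches, short_batches, first_or_none, flat)
# [] [[1, 2]] None [1]
```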
Installation
# install.sh
pip install more-itertools
python -c "import more_itertools; print(more_itertools.__version__)"
Output:
10.x.x
No dependencies beyond the Python standard library. The module is imported as more_itertools (underscore, not hyphen). Most production projects import specific functions rather than the whole module to keep imports explicit.
Sliding Windows with windowed()
A sliding window moves one step at a time over a sequence, returning overlapping tuples. This is essential for time series analysis, moving averages, and sequence pattern matching:
# windowed_example.py
from more_itertools import windowed

temperatures = [22, 24, 19, 21, 25, 23, 20, 22]

print("3-day sliding window:")
for window in windowed(temperatures, 3):
    avg = sum(window) / len(window)
    print(f" {window} -> avg {avg:.1f}")
Output:
3-day sliding window:
(22, 24, 19) -> avg 21.7
(24, 19, 21) -> avg 21.3
(19, 21, 25) -> avg 21.7
(21, 25, 23) -> avg 23.0
(25, 23, 20) -> avg 22.7
(23, 20, 22) -> avg 21.7
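The equivalent using only the standard library requires a collections.deque and manual appending, roughly 8 lines. windowed(it, n) reduces this to a single call. By default, if the iterable is shorter than the window size, windowed pads the missing positions with None; pass fillvalue=0 to use a different pad value. If you never want a padded window, use sliding_window from the same library, which yields only complete windows and yields nothing for a too-short input. Note that windowed_complete is a different tool: it yields (beginning, middle, end) splits around each window rather than filtering incomplete ones.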
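As a rough sketch of both points, the block below writes the deque-based helper by hand and then shows how windowed pads short input. The name sliding_stdlib is hypothetical; the final call mirrors the fillvalue and step example from the more-itertools documentation.

```python
from collections import deque

from more_itertools import windowed


def sliding_stdlib(iterable, n):
    """Hypothetical deque-based window helper (illustration only)."""
    window = deque(maxlen=n)
    for item in iterable:
        window.append(item)       # the oldest item drops out automatically
        if len(window) == n:
            yield tuple(window)


temps = [22, 24, 19, 21]
same = list(sliding_stdlib(temps, 3)) == list(windowed(temps, 3))
print(same)  # True

# Input shorter than the window: windowed pads, the helper yields nothing.
short_default = list(windowed([1, 2], 3))               # [(1, 2, None)]
short_zero = list(windowed([1, 2], 3, fillvalue=0))     # [(1, 2, 0)]
short_helper = list(sliding_stdlib([1, 2], 3))          # []

# step=2 moves the window two items at a time; the tail is padded.
stepped = list(windowed([1, 2, 3, 4, 5, 6], 3, fillvalue="!", step=2))
print(stepped)  # [(1, 2, 3), (3, 4, 5), (5, 6, '!')]
```

The difference on short input is the kind of detail that is easy to overlook when the helper is rewritten per project.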
Grouping Consecutive Items
Two functions handle consecutive grouping: run_length.encode() compresses repeated values, and consecutive_groups() finds runs of consecutive integers. Both operate lazily:
# grouping_example.py
from more_itertools import run_length, consecutive_groups

# Run-length encoding: compress repeated values
signals = ['A', 'A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C']
encoded = list(run_length.encode(signals))
print("RLE encoded:", encoded)
decoded = list(run_length.decode(encoded))
print("RLE decoded:", decoded)
print()

# Consecutive groups: group sequential integers
page_numbers = [1, 2, 3, 7, 8, 9, 10, 15, 16]
print("Consecutive page ranges:")
for group in consecutive_groups(page_numbers):
    pages = list(group)
    if len(pages) == 1:
        print(f" Page {pages[0]}")
    else:
        print(f" Pages {pages[0]}-{pages[-1]}")
Output:
RLE encoded: [('A', 3), ('B', 2), ('A', 1), ('C', 4)]
RLE decoded: ['A', 'A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C']
Consecutive page ranges:
Pages 1-3
Pages 7-10
Pages 15-16
run_length.encode() returns (value, count) pairs, while run_length.decode() reverses the process. The consecutive_groups() function detects gaps in integer sequences — useful for summarizing page ranges, time slot gaps, or any integer-indexed data where runs matter.
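For comparison, both results can be reproduced with itertools.groupby alone. This is a sketch of the standard-library approach, not a description of the library's internals; the index-minus-value key is a well-known groupby idiom for finding runs of consecutive integers.

```python
from itertools import groupby

signals = ["A", "A", "A", "B", "B", "A", "C", "C", "C", "C"]

# Run-length encoding: groupby yields one group per run of equal values.
encoded = [(value, sum(1 for _ in run)) for value, run in groupby(signals)]
print(encoded)  # [('A', 3), ('B', 2), ('A', 1), ('C', 4)]

# Consecutive integers: value minus position is constant within a run.
pages = [1, 2, 3, 7, 8, 9, 10, 15, 16]
ranges = []
for _, run in groupby(enumerate(pages), key=lambda pair: pair[1] - pair[0]):
    values = [value for _, value in run]
    ranges.append((values[0], values[-1]))
print(ranges)  # [(1, 3), (7, 10), (15, 16)]
```

Both versions work, but the intent is much less obvious than a call named run_length.encode or consecutive_groups.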
Lookahead with peekable()
Sometimes you need to inspect the next item in an iterator without consuming it — a common pattern in parsers and state machines. peekable wraps any iterator and adds a .peek() method:
# peekable_example.py
from more_itertools import peekable


def process_stream(items):
    it = peekable(items)
    results = []
    while it:
        current = next(it)
        # Peek at next item without consuming it
        try:
            upcoming = it.peek()
            if upcoming > current:
                results.append(f"{current} (next is higher: {upcoming})")
            else:
                results.append(f"{current} (next is same/lower: {upcoming})")
        except StopIteration:
            results.append(f"{current} (last item)")
    return results


output = process_stream([3, 5, 5, 2, 8, 1])
for line in output:
    print(line)
Output:
3 (next is higher: 5)
5 (next is same/lower: 5)
5 (next is same/lower: 2)
2 (next is higher: 8)
8 (next is same/lower: 1)
1 (last item)
peekable also supports prepending values back with it.prepend(value), which is useful when you have consumed an item and need to “unread” it. The object is truthy when items remain, so while it: works as a clean loop termination condition. Without peekable, implementing lookahead requires maintaining a separate next_item variable and careful StopIteration handling.
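A short sketch of the "unread" pattern and of peeking with a default, assuming more-itertools is installed; the sample values are arbitrary.

```python
from more_itertools import peekable

it = peekable(["a", "b", "c"])
taken = next(it)        # consumes "a"
it.prepend(taken)       # push it back to the front of the stream
restored = list(it)
print(restored)         # ['a', 'b', 'c']

# peek() accepts a default, which avoids the StopIteration handling
# when the iterator may already be exhausted.
empty = peekable([])
fallback = empty.peek("nothing left")
print(fallback)         # nothing left
print(bool(empty))      # False
```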
Essential Utility Functions
Here are several more functions that cover common one-off patterns:
# utilities_example.py
from more_itertools import (
    flatten, interleave, first, last, one,
    unique_everseen, only
)

# flatten: remove one level of nesting
nested = [[1, 2], [3, 4], [5]]
print("flatten:", list(flatten(nested)))

# interleave: merge multiple iterables element by element
evens = [2, 4, 6]
odds = [1, 3, 5]
print("interleave:", list(interleave(evens, odds)))

# first / last: safe access with defaults
items = [10, 20, 30]
print("first:", first(items, default=0))
print("last:", last(items, default=0))
print("first of empty:", first([], default=-1))

# unique_everseen: deduplicate while preserving order
dupes = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
print("unique:", list(unique_everseen(dupes)))

# one: assert the iterable contains exactly one item
numbers = [42]
print("one:", one(numbers))  # Returns 42; raises ValueError if 0 or 2+ items

# only: like one, but returns the default for an empty iterable
print("only:", only([99], default=None))
Output:
flatten: [1, 2, 3, 4, 5]
interleave: [2, 1, 4, 3, 6, 5]
first: 10
last: 30
first of empty: -1
unique: [3, 1, 4, 5, 9, 2, 6]
one: 42
only: 99
unique_everseen is particularly useful as an order-preserving deduplication — it keeps the first occurrence of each value, unlike set() which destroys ordering. first and last cleanly handle empty iterables with a default parameter, avoiding the try/next(iter(...))/except StopIteration pattern. one() is a semantic assertion: “I expect exactly one item here” — it raises a clear error if that invariant is violated.
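The difference between one and only shows up on empty and oversized inputs. The sketch below records what each call does for three inputs; try_call is a hypothetical helper used only to capture the exception.

```python
from more_itertools import one, only


def try_call(func, values):
    """Return func(values), or the string 'ValueError' if it raises one."""
    try:
        return func(values)
    except ValueError:
        return "ValueError"


cases = {"empty": [], "single": [42], "many": [1, 2]}
one_results = {label: try_call(one, values) for label, values in cases.items()}
only_results = {label: try_call(only, values) for label, values in cases.items()}

print(one_results)   # {'empty': 'ValueError', 'single': 42, 'many': 'ValueError'}
print(only_results)  # {'empty': None, 'single': 42, 'many': 'ValueError'}
```

Use one when an empty result is a bug, and only when "zero or one" is the expected shape.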
Real-Life Example: Batch API Processor with Progress Tracking
Here is a practical pipeline that uses chunked and peekable together to batch-process API requests with rate limiting and look-ahead logging:
# batch_api_processor.py
import json
import time
import urllib.request

from more_itertools import chunked, peekable


def fetch_post(post_id: int) -> dict:
    """Fetch a single post from a test API."""
    url = f"https://jsonplaceholder.typicode.com/posts/{post_id}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read())


def process_posts_in_batches(post_ids: list[int], batch_size: int = 5):
    """
    Fetch posts in batches with rate limiting.
    Uses chunked() to group IDs, peekable() to detect the last batch.
    """
    batches = peekable(chunked(post_ids, batch_size))
    batch_num = 0
    while batches:
        batch_ids = next(batches)
        batch_num += 1
        is_last = not batches  # peekable: truthy if more items remain
        print(f"\nBatch {batch_num}: fetching IDs {batch_ids}")
        results = []
        for post_id in batch_ids:
            post = fetch_post(post_id)
            results.append({
                "id": post["id"],
                # Truncate long titles to 40 characters plus an ellipsis
                "title": (post["title"][:40] + "...") if len(post["title"]) > 40 else post["title"],
                "user": post["userId"]
            })
        for r in results:
            print(f" [{r['id']}] user={r['user']}: {r['title']}")
        if not is_last:
            print(" Rate limiting: sleeping 0.5s...")
            time.sleep(0.5)
        else:
            print(" Last batch -- no sleep needed.")
    return batch_num


if __name__ == "__main__":
    ids_to_fetch = list(range(1, 12))  # 11 posts -> 3 batches of 5, 5, 1
    total_batches = process_posts_in_batches(ids_to_fetch, batch_size=5)
    print(f"\nDone. Processed {total_batches} batches.")
Output (truncated for brevity):
Batch 1: fetching IDs [1, 2, 3, 4, 5]
[1] user=1: sunt aut facere repellat provident occae...
[2] user=1: qui est esse
...
Rate limiting: sleeping 0.5s...
Batch 2: fetching IDs [6, 7, 8, 9, 10]
[6] user=1: dolorem eum magni eos aperiam quia
...
Rate limiting: sleeping 0.5s...
Batch 3: fetching IDs [11]
[11] user=2: et ea vero quia laudantium autem
Last batch -- no sleep needed.
Done. Processed 3 batches.
The combination of chunked and peekable enables a clean pattern: split work into batches of the right size, then use look-ahead to detect the last batch and skip the unnecessary trailing sleep. The peekable wrapper’s truthiness check (not batches) replaces the awkward “am I on the last iteration?” tracking that normally requires a counter comparison. Swap jsonplaceholder.typicode.com for any real API and adjust the batch size and sleep time for that API’s rate limits.
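If the only look-ahead you need is "is this the last batch?", more-itertools also provides mark_ends, which yields (is_first, is_last, item) triples. The sketch below is an alternative to the peekable version, with the network calls left out so it runs offline; plan_batches is a hypothetical name and the function only records which batches would be followed by a sleep.

```python
from more_itertools import chunked, mark_ends


def plan_batches(ids, batch_size):
    """Return (batch, sleep_after) pairs; hypothetical offline stand-in."""
    plan = []
    for _is_first, is_last, batch in mark_ends(chunked(ids, batch_size)):
        # Sleep after every batch except the final one.
        plan.append((batch, not is_last))
    return plan


plan = plan_batches(range(1, 12), 5)
for batch, sleep_after in plan:
    print(batch, "sleep" if sleep_after else "done")
# [1, 2, 3, 4, 5] sleep
# [6, 7, 8, 9, 10] sleep
# [11] done
```

peekable remains the better fit when you need to inspect the upcoming batch itself rather than just know whether one exists.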
Frequently Asked Questions
How does more-itertools relate to the stdlib itertools module?
more-itertools is a pure addition, not a replacement. It does not re-export the standard library functions: import chain, groupby, and the other built-ins from itertools, and import the extended functions from more_itertools. The library started as a collection of the recipes from the official itertools documentation, which were commonly hand-written but never included in the standard library, and it has grown to 100+ functions over the years.
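A minimal sketch of the two modules working together, importing each name from the module that defines it; this form works regardless of what more_itertools chooses to export.

```python
from itertools import chain

from more_itertools import chunked

# chain comes from the standard library, chunked from more-itertools;
# both operate on plain iterators, so they compose directly.
combined = chain([1, 2, 3], [4, 5])
pairs = list(chunked(combined, 2))
print(pairs)  # [[1, 2], [3, 4], [5]]
```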
When should I use chunked() vs grouper()?
Use chunked(it, n) when you want the final partial chunk as a shorter list — the last group might have fewer than n items. Use grouper(it, n, fillvalue=None) when you need all groups to be exactly the same length — the final group is padded with the fill value. For batch database inserts where a short final batch is fine, use chunked. For fixed-width record processing where every row must be the same length, use grouper.
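The difference is easiest to see side by side. A small sketch, assuming a recent more-itertools release where grouper takes the iterable first:

```python
from more_itertools import chunked, grouper

items = range(7)

as_chunks = list(chunked(items, 3))
print(as_chunks)  # [[0, 1, 2], [3, 4, 5], [6]]

as_groups = list(grouper(items, 3, fillvalue=None))
print(as_groups)  # [(0, 1, 2), (3, 4, 5), (6, None, None)]
```

Note the two also differ in container type: chunked yields lists, grouper yields tuples.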
Does more-itertools load everything into memory?
No. Like the standard itertools library, most functions return lazy iterators. chunked yields one list at a time and windowed yields one tuple at a time; neither buffers the entire input. Some functions still have costs that grow with the input: unique_everseen is lazy but keeps a set of every distinct value it has seen, and last must consume the whole iterable before it can return, although it only holds on to the final item. For truly large iterables, keep these costs in mind when choosing functions.
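One way to convince yourself of the laziness is to feed these functions an infinite iterator. The sketch below would never finish if chunked or unique_everseen tried to read all of their input first.

```python
from itertools import count, islice

from more_itertools import chunked, unique_everseen

# count() never ends; islice stops after two batches have been produced.
first_two = list(islice(chunked(count(), 3), 2))
print(first_two)  # [[0, 1, 2], [3, 4, 5]]

# unique_everseen is lazy too, but it remembers every distinct value seen.
remainders = (n % 4 for n in count())
unique_prefix = list(islice(unique_everseen(remainders), 4))
print(unique_prefix)  # [0, 1, 2, 3]
```

Asking for a fifth unique value in the second example would loop forever, because only four distinct remainders exist; laziness does not protect against that.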
Is more-itertools slow compared to hand-written code?
more-itertools is pure Python with no C extensions, so it is not faster than equivalent optimized code. For hot loops processing millions of items, the overhead of Python function calls in the iterator protocol can matter. In those cases, NumPy vectorized operations or Polars expressions will outperform any pure-Python iteration. But for typical batch sizes (hundreds to thousands of items), more-itertools performance is indistinguishable from hand-written loops, and the readability benefit is significant.
Which functions should I learn first?
Start with the ones you hand-write most often: chunked for batching, windowed for sliding analysis, flatten for one level of nesting, first and last for safe endpoint access, and unique_everseen for order-preserving deduplication. Then browse the docs for peekable, one, and consecutive_groups — these cover patterns that are tricky to get right with the standard library and that come up more often than you expect once you know they exist.
Conclusion
In this article, we covered more-itertools from installation through practical use: chunked for batch splitting, windowed for sliding windows, run_length.encode and consecutive_groups for grouping, peekable for look-ahead iteration, and a set of utility functions including flatten, unique_everseen, first, and one. The real-life example combined chunked and peekable in a rate-limited batch API processor.
The consistent theme is that more-itertools names patterns you already use. Once you know chunked exists, you stop writing the slice loop. Once you know peekable exists, you stop maintaining a next_item variable by hand. The library’s value is not raw performance — it is replacing ad-hoc implementations with functions that handle edge cases correctly and that any Python developer can read and understand immediately.
Browse the full function list in the official more-itertools documentation — there are 100+ functions, and a few of them will solve a problem you are currently solving the hard way.