Imagine you need to process a log file with 10 million lines. The naive approach — reading the whole file into a list first — would use several gigabytes of memory before you even start processing. Python generators solve this elegantly: instead of building the entire sequence in memory, they produce values one at a time, on demand. This “lazy evaluation” approach lets you work with datasets of any size using a constant, tiny amount of memory.
Generators are a core Python feature, and the lazy-evaluation idea behind them is everywhere in the standard library: zip(), enumerate(), and file iteration all produce values lazily, and range() is a lazy sequence rather than a list. Understanding generators not only helps you write memory-efficient code, it also makes you a better Python developer, because you’ll understand why these built-in tools work the way they do. No installation is needed — generators are a built-in language feature.
In this tutorial, you’ll learn how to create generators with yield, understand how the generator protocol works under the hood, use generator expressions as compact alternatives to list comprehensions, delegate to sub-generators with yield from, build generator pipelines for data processing, and apply all of this in a practical log file analysis project.
Generators: Quick Example
Here’s a generator function next to its equivalent list-based function, demonstrating the memory difference:
# generators_quick.py
import sys
# List version: builds entire sequence in memory
def first_n_squares_list(n):
    return [i * i for i in range(n)]
# Generator version: produces values one at a time
def first_n_squares_gen(n):
    for i in range(n):
        yield i * i
# Compare memory usage
squares_list = first_n_squares_list(1000000)
squares_gen = first_n_squares_gen(1000000)
print(f"List size: {sys.getsizeof(squares_list):,} bytes")
print(f"Generator size: {sys.getsizeof(squares_gen):,} bytes")
# Use the generator just like any iterable
total = sum(squares_gen)
print(f"Sum of first 1M squares: {total:,}")
Output:
List size: 8,448,728 bytes
Generator size: 104 bytes
Sum of first 1M squares: 333,332,833,333,500,000
The generator object itself is only 104 bytes regardless of how many values it will produce. The list consumed over 8 MB. Both can be iterated the same way — sum() works with any iterable. The key difference is when and how the values are created.
What Are Generators and How Do They Work?
A generator function looks like a regular function, but uses yield instead of return. When you call a generator function, it doesn’t execute the function body — it returns a generator object. The body executes lazily, only when you ask for the next value.
| Feature | Regular Function | Generator Function |
|---|---|---|
| Returns | A value immediately | A generator object immediately |
| Execution | Runs completely on call | Runs step by step, pausing at each yield |
| Memory | All values in memory at once | One value in memory at a time |
| Re-usable | Yes, call it again | No — exhausted after one iteration |
| Syntax | return value | yield value |
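The re-usability row is worth seeing in action. This short sketch (using a hypothetical `squares` generator, not code from the listings above) shows that a generator is exhausted after one pass, while a regular function simply returns a fresh list every call:

```python
def squares(n):
    for i in range(n):
        yield i * i

gen = squares(3)
print(list(gen))  # [0, 1, 4] -- first pass consumes all values
print(list(gen))  # [] -- the generator is exhausted

# A regular function hands back a new list on every call
def squares_list(n):
    return [i * i for i in range(n)]

nums = squares_list(3)
print(nums)  # [0, 1, 4]
print(nums)  # [0, 1, 4] -- lists can be iterated as often as you like
```

To "reset" a generator, you call the generator function again and get a brand-new generator object.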
The yield keyword does two things: it sends a value out of the generator, and it pauses execution at that point. The generator’s local variables and execution state are preserved between yields. When next() is called again, execution resumes from right after the yield statement.
The Generator Protocol
Generators implement Python’s iterator protocol: they define both __iter__() (returning themselves) and __next__(). You can call next() manually to see exactly how this works step by step:
# generator_protocol.py
def countdown(n):
    print(f"Starting countdown from {n}")
    while n > 0:
        print(f"  About to yield {n}")
        yield n
        print(f"  Resumed after yielding {n}")
        n -= 1
    print("Countdown complete!")
# Create the generator object (nothing runs yet)
gen = countdown(3)
print(f"Generator object: {gen}")
print()
# Manually advance the generator
val1 = next(gen)
print(f"Got: {val1}\n")
val2 = next(gen)
print(f"Got: {val2}\n")
val3 = next(gen)
print(f"Got: {val3}\n")
# One more next() raises StopIteration
try:
    next(gen)
except StopIteration:
    print("Generator exhausted -- StopIteration raised")
Output:
Generator object: <generator object countdown at 0x7f8b1c2d3a50>

Starting countdown from 3
  About to yield 3
Got: 3

  Resumed after yielding 3
  About to yield 2
Got: 2

  Resumed after yielding 2
  About to yield 1
Got: 1

  Resumed after yielding 1
Countdown complete!
Generator exhausted -- StopIteration raised
This output reveals the exact sequence: calling countdown(3) did nothing. The first next(gen) started execution, ran until the first yield 3, and paused. The second next(gen) resumed right after that yield. For loops handle StopIteration automatically — they call next() and stop when the exception is raised.
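A for loop performs this same dance for you. The sketch below shows a rough desugaring of `for val in countdown(3)` into explicit protocol calls (using a simplified `countdown` without the print statements):

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

# What `for val in countdown(3): ...` does, roughly:
gen = iter(countdown(3))   # get an iterator (a generator is its own iterator)
collected = []
while True:
    try:
        val = next(gen)    # advance to the next yield
    except StopIteration:  # the loop ends when the generator is exhausted
        break
    collected.append(val)

print(collected)  # [3, 2, 1]
```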
Generator Expressions
Generator expressions are the compact syntax for creating simple generators — they look exactly like list comprehensions but use parentheses instead of square brackets. They’re ideal for single-use transformations passed directly to functions:
# generator_expressions.py
# List comprehension: builds all values immediately
squares_list = [x*x for x in range(10)]
print(f"List: {squares_list}")
# Generator expression: lazy, no brackets
squares_gen = (x*x for x in range(10))
print(f"Generator: {squares_gen}")
print(f"Sum via generator: {sum(squares_gen)}")
# Use generator expressions inline -- no extra parentheses needed
total = sum(x*x for x in range(1000000))
print(f"Sum of 1M squares: {total:,}")
# Filter with generator expressions
big_squares = (x*x for x in range(100) if x*x > 500)
print(f"Squares > 500: {list(big_squares)[:5]}...")
# Chained transformations (memory-efficient pipeline)
data = range(1, 1000001)
even_nums = (x for x in data if x % 2 == 0)
squared = (x*x for x in even_nums)
under_million = (x for x in squared if x < 1_000_000)
result = list(under_million)
print(f"Even squares under 1M: {len(result)} values")
Output:
List: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Generator: <generator object <genexpr> at 0x7f8b1c2d3b60>
Sum via generator: 285
Sum of 1M squares: 333,332,833,333,500,000
Squares > 500: [529, 576, 625, 676, 729]...
Even squares under 1M: 499 values
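Generator expressions pair especially well with short-circuiting consumers like any() and all(), because evaluation stops the moment the answer is known. This sketch uses a small hypothetical `counting` wrapper to show how few values are actually produced:

```python
# any() stops pulling values at the first True result
found = any(x * x > 10_000 for x in range(10_000_000))
print(found)  # True

# Wrap the source to count how many values were really pulled
def counting(iterable, counter):
    for item in iterable:
        counter[0] += 1
        yield item

counter = [0]
any(x * x > 10_000 for x in counting(range(10_000_000), counter))
print(counter[0])  # 102 -- not 10 million; 101*101 is the first square over 10,000
```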
Delegating with yield from
yield from lets a generator delegate to another iterable -- it's cleaner than looping and yielding each item manually. This is especially useful when building recursive generators or combining multiple generators:
# yield_from.py
# Without yield from -- verbose
def flatten_manual(nested):
    for sublist in nested:
        for item in sublist:
            yield item
# With yield from -- cleaner
def flatten(nested):
    for sublist in nested:
        yield from sublist
data = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print("Flattened:", list(flatten(data)))
# Chain multiple generators with yield from
def chain_generators(*iterables):
    for it in iterables:
        yield from it
gen1 = (x for x in range(3))
gen2 = (x*10 for x in range(3))
gen3 = ['a', 'b', 'c']
chained = list(chain_generators(gen1, gen2, gen3))
print("Chained:", chained)
Output:
Flattened: [1, 2, 3, 4, 5, 6, 7, 8, 9]
Chained: [0, 1, 2, 0, 10, 20, 'a', 'b', 'c']
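yield from also shines in recursive generators. This sketch flattens arbitrarily nested lists (assuming only lists nest — strings and other iterables are treated as leaf values):

```python
def deep_flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from deep_flatten(item)  # delegate to the recursive call
        else:
            yield item

data = [1, [2, [3, [4]], 5], [6]]
print(list(deep_flatten(data)))  # [1, 2, 3, 4, 5, 6]
```

Without yield from, the recursive case would need its own inner for loop around each recursive call.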
Real-Life Example: Log File Analyzer
Here's a practical generator pipeline that processes a large log file without loading it all into memory at once. This pattern handles files of any size efficiently:
# log_analyzer.py
import re
from datetime import datetime
# Sample log data (representing a large file in real use)
SAMPLE_LOG = """2026-04-17 09:00:01 INFO User alice logged in
2026-04-17 09:01:15 ERROR Database connection timeout
2026-04-17 09:01:16 ERROR Retrying connection (attempt 1/3)
2026-04-17 09:01:17 INFO Database reconnected successfully
2026-04-17 09:02:33 WARNING High memory usage: 87%
2026-04-17 09:03:45 INFO User bob logged in
2026-04-17 09:04:12 ERROR Disk write failed: /var/log/app.log
2026-04-17 09:05:00 INFO Backup completed successfully
2026-04-17 09:06:22 ERROR Authentication failed for user charlie
2026-04-17 09:07:11 INFO Scheduled job completed in 1.23s"""
# Generator: read lines one at a time (use open() for real files)
def read_lines(text):
    for line in text.strip().splitlines():
        yield line
# Generator: parse each line into a structured dict
LOG_PATTERN = re.compile(
    r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+)\s+(.+)'
)
def parse_logs(lines):
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            timestamp_str, level, message = match.groups()
            yield {
                'timestamp': datetime.strptime(timestamp_str, '%Y-%m-%d %H:%M:%S'),
                'level': level,
                'message': message,
            }
# Generator: filter to only error entries
def filter_level(entries, level):
    for entry in entries:
        if entry['level'] == level:
            yield entry
# Build the pipeline (nothing runs yet)
lines = read_lines(SAMPLE_LOG)
parsed = parse_logs(lines)
errors = filter_level(parsed, 'ERROR')
# Consume the pipeline -- data flows through all stages
print("ERROR log entries:")
for entry in errors:
    time_str = entry['timestamp'].strftime('%H:%M:%S')
    print(f"  [{time_str}] {entry['message']}")
# Rebuild and get counts (generator is exhausted after one pass)
lines = read_lines(SAMPLE_LOG)
all_entries = list(parse_logs(lines))
counts = {}
for entry in all_entries:
    counts[entry['level']] = counts.get(entry['level'], 0) + 1
print("\nLog level summary:")
for level, count in sorted(counts.items()):
    print(f"  {level}: {count}")
Output:
ERROR log entries:
  [09:01:15] Database connection timeout
  [09:01:16] Retrying connection (attempt 1/3)
  [09:04:12] Disk write failed: /var/log/app.log
  [09:06:22] Authentication failed for user charlie
Log level summary:
  ERROR: 4
  INFO: 5
  WARNING: 1
This pipeline pattern is memory-efficient because at any moment only one log line exists in memory as it flows through read_lines -> parse_logs -> filter_level. For a 10 GB log file, this approach uses the same tiny amount of memory as it does for a 10-line file. To use it with a real log file, replace read_lines(SAMPLE_LOG) with an open file object (ideally inside a with statement) -- file objects are themselves lazy iterators that yield one line at a time.
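To make the file-based version concrete, here is a self-contained sketch that writes a tiny sample log to a temporary file and runs the same parse-and-filter stages over it (the filename and log lines are illustrative, and timestamps are kept as strings to keep the sketch short):

```python
import os
import re
import tempfile

LOG_PATTERN = re.compile(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+)\s+(.+)')

def parse_logs(lines):
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            timestamp_str, level, message = match.groups()
            yield {'timestamp': timestamp_str, 'level': level, 'message': message}

def filter_level(entries, level):
    return (e for e in entries if e['level'] == level)

# Write a small illustrative log file
path = os.path.join(tempfile.mkdtemp(), 'app.log')
with open(path, 'w') as f:
    f.write("2026-04-17 09:00:01 INFO Service started\n")
    f.write("2026-04-17 09:01:15 ERROR Connection timeout\n")

# The open file object replaces read_lines(): it yields lines lazily
with open(path) as f:
    for entry in filter_level(parse_logs(f), 'ERROR'):
        print(entry['message'])  # Connection timeout
```

Note that LOG_PATTERN.match() stops at the newline character, so the trailing '\n' on each file line never reaches the parsed message.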
Frequently Asked Questions
Can I iterate a generator multiple times?
No -- generators are single-use. Once exhausted, calling next() always raises StopIteration. If you need to iterate multiple times, either store the values in a list first (items = list(my_gen())), or call the generator function again to create a fresh generator object. This is why generator expressions are usually passed directly to functions like sum() or list() that consume them in one pass.
What is the difference between a generator and an iterator?
Every generator is an iterator, but not every iterator is a generator. An iterator is any object that implements the __iter__() and __next__() methods. A generator is a specific way to create an iterator -- a function with yield -- and Python implements the iterator protocol for you automatically. Generators are the most convenient way to create iterators in Python.
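To make the distinction concrete, here is the same countdown written both ways — a hand-rolled iterator class and a generator function (a sketch for illustration; the class names are made up):

```python
class CountdownIterator:
    """Iterator written by hand: explicit state plus protocol methods."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        value = self.n
        self.n -= 1
        return value

def countdown_gen(n):
    """Same behavior; Python implements the protocol for us."""
    while n > 0:
        yield n
        n -= 1

print(list(CountdownIterator(3)))  # [3, 2, 1]
print(list(countdown_gen(3)))      # [3, 2, 1]
```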
What does generator.send() do?
The .send(value) method resumes a generator and sends a value in as the result of the current yield expression. This turns generators into coroutines -- two-way communication channels. It's used in advanced patterns like cooperative multitasking. For most use cases, you'll never need .send() -- standard next() iteration is sufficient.
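A minimal sketch of .send() in action — a running-average coroutine. Note the priming next() call, which advances the generator to its first yield before any value can be sent in:

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # receives whatever .send() passes in
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)            # prime: run to the first yield (yields None)
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0
```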
When should I use a generator vs. a list?
Use a generator when: you're processing a large sequence and don't need all values in memory at once, you're reading from a file or network stream, you only need to iterate once, or you're building a pipeline of transformations. Use a list when: you need to index into the sequence, iterate multiple times, get its length with len(), or pass it to code that explicitly expects a list.
Can generators produce infinite sequences?
Yes -- this is one of the most powerful uses of generators. A generator can loop indefinitely, yielding values forever, because it never builds a finite collection in memory. The standard library's itertools.count() and itertools.cycle() are examples of infinite generators. Just make sure your consuming code has a termination condition (like islice or a loop with a break) to stop pulling values.
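An infinite generator with itertools.islice as the termination condition — a minimal sketch using a Fibonacci stream:

```python
from itertools import islice

def fibonacci():
    """Yields Fibonacci numbers forever; the caller decides when to stop."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# islice pulls only the first 10 values from the infinite stream
print(list(islice(fibonacci(), 10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```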
Conclusion
Generators are one of Python's most elegant features. In this tutorial, you learned how yield turns a function into a lazy generator, how the generator protocol works with next() and StopIteration, how generator expressions provide compact syntax for simple generators, how yield from delegates to sub-iterables, and how to compose generators into data processing pipelines.
The log analyzer project shows the real-world payoff: a memory-efficient pipeline that scales to gigabyte-sized files with no code changes. Try extending it to count errors per hour, find the longest gap between errors, or write the filtered entries to a new file.
For more on generators and iteration in Python, see the Python HOWTO: Generators and the itertools documentation for powerful generator-based utilities.