Imagine you need to process a log file with 10 million lines. The naive approach — reading the whole file into a list first — would use several gigabytes of memory before you even start processing. Python generators solve this elegantly: instead of building the entire sequence in memory, they produce values one at a time, on demand. This “lazy evaluation” approach lets you work with datasets of any size using a constant, tiny amount of memory.
Generators are a core Python feature, and the lazy-evaluation idea behind them is everywhere in the standard library: zip(), enumerate(), and file iteration all produce values lazily, and range() is a lazy sequence rather than a list. Understanding generators not only helps you write memory-efficient code, it also makes you a better Python developer, because you’ll understand why these built-in tools work the way they do. No installation is needed — generators are a built-in language feature.
In this tutorial, you’ll learn how to create generators with yield, understand how the generator protocol works under the hood, use generator expressions as compact alternatives to list comprehensions, delegate to sub-generators with yield from, build generator pipelines for data processing, and apply all of this in a practical log file analysis project.
Generators: Quick Example
Here’s a generator function next to its equivalent list-based function, demonstrating the memory difference:
# generators_quick.py
import sys
# List version: builds entire sequence in memory
def first_n_squares_list(n):
    return [i * i for i in range(n)]
# Generator version: produces values one at a time
def first_n_squares_gen(n):
    for i in range(n):
        yield i * i
# Compare memory usage
squares_list = first_n_squares_list(1000000)
squares_gen = first_n_squares_gen(1000000)
print(f"List size: {sys.getsizeof(squares_list):,} bytes")
print(f"Generator size: {sys.getsizeof(squares_gen):,} bytes")
# Use the generator just like any iterable
total = sum(squares_gen)
print(f"Sum of first 1M squares: {total:,}")
Output:
List size: 8,448,728 bytes
Generator size: 104 bytes
Sum of first 1M squares: 333,332,833,333,500,000
The generator object itself is only 104 bytes regardless of how many values it will produce. The list consumed over 8 MB. Both can be iterated the same way — sum() works with any iterable. The key difference is when and how the values are created.
What Are Generators and How Do They Work?
A generator function looks like a regular function, but uses yield instead of return. When you call a generator function, it doesn’t execute the function body — it returns a generator object. The body executes lazily, only when you ask for the next value.
| Feature | Regular Function | Generator Function |
|---|---|---|
| Returns | A value immediately | A generator object immediately |
| Execution | Runs completely on call | Runs step by step, pausing at each yield |
| Memory | All values in memory at once | One value in memory at a time |
| Re-usable | Yes, call it again | No — exhausted after one iteration |
| Syntax | return value | yield value |
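The re-usability row is worth seeing in action. This short sketch (using a hypothetical `squares` generator, not code from the listings above) shows that a generator is exhausted after one pass, while a regular function simply returns a fresh list every call:

```python
def squares(n):
    for i in range(n):
        yield i * i

gen = squares(3)
print(list(gen))  # [0, 1, 4] -- first pass consumes all values
print(list(gen))  # [] -- the generator is exhausted

# A regular function hands back a new list on every call
def squares_list(n):
    return [i * i for i in range(n)]

nums = squares_list(3)
print(nums)  # [0, 1, 4]
print(nums)  # [0, 1, 4] -- lists can be iterated as often as you like
```

To "reset" a generator, you call the generator function again and get a brand-new generator object.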
The yield keyword does two things: it sends a value out of the generator, and it pauses execution at that point. The generator’s local variables and execution state are preserved between yields. When next() is called again, execution resumes from right after the yield statement.
The Generator Protocol
Generators implement Python’s iterator protocol: they define both __iter__() (returning themselves) and __next__(). You can call next() manually to see exactly how this works step by step:
# generator_protocol.py
def countdown(n):
    print(f"Starting countdown from {n}")
    while n > 0:
        print(f"  About to yield {n}")
        yield n
        print(f"  Resumed after yielding {n}")
        n -= 1
    print("Countdown complete!")
# Create the generator object (nothing runs yet)
gen = countdown(3)
print(f"Generator object: {gen}")
print()
# Manually advance the generator
val1 = next(gen)
print(f"Got: {val1}\n")
val2 = next(gen)
print(f"Got: {val2}\n")
val3 = next(gen)
print(f"Got: {val3}\n")
# One more next() raises StopIteration
try:
    next(gen)
except StopIteration:
    print("Generator exhausted -- StopIteration raised")
Output:
Generator object: <generator object countdown at 0x7f8b1c2d3a50>

Starting countdown from 3
  About to yield 3
Got: 3

  Resumed after yielding 3
  About to yield 2
Got: 2

  Resumed after yielding 2
  About to yield 1
Got: 1

  Resumed after yielding 1
Countdown complete!
Generator exhausted -- StopIteration raised
This output reveals the exact sequence: calling countdown(3) did nothing. The first next(gen) started execution, ran until the first yield 3, and paused. The second next(gen) resumed right after that yield. For loops handle StopIteration automatically — they call next() and stop when the exception is raised.
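A for loop performs this same dance for you. The sketch below shows a rough desugaring of `for val in countdown(3)` into explicit protocol calls (using a simplified `countdown` without the print statements):

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

# What `for val in countdown(3): ...` does, roughly:
gen = iter(countdown(3))   # get an iterator (a generator is its own iterator)
collected = []
while True:
    try:
        val = next(gen)    # advance to the next yield
    except StopIteration:  # the loop ends when the generator is exhausted
        break
    collected.append(val)

print(collected)  # [3, 2, 1]
```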
Generator Expressions
Generator expressions are the compact syntax for creating simple generators — they look exactly like list comprehensions but use parentheses instead of square brackets. They’re ideal for single-use transformations passed directly to functions:
# generator_expressions.py
# List comprehension: builds all values immediately
squares_list = [x*x for x in range(10)]
print(f"List: {squares_list}")
# Generator expression: lazy, no brackets
squares_gen = (x*x for x in range(10))
print(f"Generator: {squares_gen}")
print(f"Sum via generator: {sum(squares_gen)}")
# Use generator expressions inline -- no extra parentheses needed
total = sum(x*x for x in range(1000000))
print(f"Sum of 1M squares: {total:,}")
# Filter with generator expressions
big_squares = (x*x for x in range(100) if x*x > 500)
print(f"Squares > 500: {list(big_squares)[:5]}...")
# Chained transformations (memory-efficient pipeline)
data = range(1, 1000001)
even_nums = (x for x in data if x % 2 == 0)
squared = (x*x for x in even_nums)
under_million = (x for x in squared if x < 1_000_000)
result = list(under_million)
print(f"Even squares under 1M: {len(result)} values")
Output:
List: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Generator: <generator object <genexpr> at 0x7f8b1c2d3b60>
Sum via generator: 285
Sum of 1M squares: 333,332,833,333,500,000
Squares > 500: [529, 576, 625, 676, 729]...
Even squares under 1M: 499 values
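Generator expressions pair especially well with short-circuiting consumers like any() and all(), because evaluation stops the moment the answer is known. This sketch uses a small hypothetical `counting` wrapper to show how few values are actually produced:

```python
# any() stops pulling values at the first True result
found = any(x * x > 10_000 for x in range(10_000_000))
print(found)  # True

# Wrap the source to count how many values were really pulled
def counting(iterable, counter):
    for item in iterable:
        counter[0] += 1
        yield item

counter = [0]
any(x * x > 10_000 for x in counting(range(10_000_000), counter))
print(counter[0])  # 102 -- not 10 million; 101*101 is the first square over 10,000
```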
Delegating with yield from
yield from lets a generator delegate to another iterable -- it's cleaner than looping and yielding each item manually. This is especially useful when building recursive generators or combining multiple generators:
# yield_from.py
# Without yield from -- verbose
def flatten_manual(nested):
    for sublist in nested:
        for item in sublist:
            yield item
# With yield from -- cleaner
def flatten(nested):
    for sublist in nested:
        yield from sublist
data = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print("Flattened:", list(flatten(data)))
# Chain multiple generators with yield from
def chain_generators(*iterables):
    for it in iterables:
        yield from it
gen1 = (x for x in range(3))
gen2 = (x*10 for x in range(3))
gen3 = ['a', 'b', 'c']
chained = list(chain_generators(gen1, gen2, gen3))
print("Chained:", chained)
Output:
Flattened: [1, 2, 3, 4, 5, 6, 7, 8, 9]
Chained: [0, 1, 2, 0, 10, 20, 'a', 'b', 'c']
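yield from also shines in recursive generators. This sketch flattens arbitrarily nested lists (assuming only lists nest — strings and other iterables are treated as leaf values):

```python
def deep_flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from deep_flatten(item)  # delegate to the recursive call
        else:
            yield item

data = [1, [2, [3, [4]], 5], [6]]
print(list(deep_flatten(data)))  # [1, 2, 3, 4, 5, 6]
```

Without yield from, the recursive case would need its own inner for loop around each recursive call.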
Real-Life Example: Log File Analyzer
Here's a practical generator pipeline that processes a large log file without loading it all into memory at once. This pattern handles files of any size efficiently:
# log_analyzer.py
import re
from datetime import datetime
# Sample log data (representing a large file in real use)
SAMPLE_LOG = """2026-04-17 09:00:01 INFO User alice logged in
2026-04-17 09:01:15 ERROR Database connection timeout
2026-04-17 09:01:16 ERROR Retrying connection (attempt 1/3)
2026-04-17 09:01:17 INFO Database reconnected successfully
2026-04-17 09:02:33 WARNING High memory usage: 87%
2026-04-17 09:03:45 INFO User bob logged in
2026-04-17 09:04:12 ERROR Disk write failed: /var/log/app.log
2026-04-17 09:05:00 INFO Backup completed successfully
2026-04-17 09:06:22 ERROR Authentication failed for user charlie
2026-04-17 09:07:11 INFO Scheduled job completed in 1.23s"""
# Generator: read lines one at a time (use open() for real files)
def read_lines(text):
    for line in text.strip().splitlines():
        yield line
# Generator: parse each line into a structured dict
LOG_PATTERN = re.compile(
    r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+)\s+(.+)'
)
def parse_logs(lines):
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            timestamp_str, level, message = match.groups()
            yield {
                'timestamp': datetime.strptime(timestamp_str, '%Y-%m-%d %H:%M:%S'),
                'level': level,
                'message': message,
            }
# Generator: filter to only error entries
def filter_level(entries, level):
    for entry in entries:
        if entry['level'] == level:
            yield entry
# Build the pipeline (nothing runs yet)
lines = read_lines(SAMPLE_LOG)
parsed = parse_logs(lines)
errors = filter_level(parsed, 'ERROR')
# Consume the pipeline -- data flows through all stages
print("ERROR log entries:")
for entry in errors:
    time_str = entry['timestamp'].strftime('%H:%M:%S')
    print(f"  [{time_str}] {entry['message']}")
# Rebuild and get counts (generator is exhausted after one pass)
lines = read_lines(SAMPLE_LOG)
all_entries = list(parse_logs(lines))
counts = {}
for entry in all_entries:
    counts[entry['level']] = counts.get(entry['level'], 0) + 1
print("\nLog level summary:")
for level, count in sorted(counts.items()):
    print(f"  {level}: {count}")
Output:
ERROR log entries:
  [09:01:15] Database connection timeout
  [09:01:16] Retrying connection (attempt 1/3)
  [09:04:12] Disk write failed: /var/log/app.log
  [09:06:22] Authentication failed for user charlie
Log level summary:
  ERROR: 4
  INFO: 5
  WARNING: 1
This pipeline pattern is memory-efficient because at any moment only one log line exists in memory as it flows through read_lines -> parse_logs -> filter_level. For a 10 GB log file, this approach uses the same tiny amount of memory as it does for a 10-line file. To use it with a real log file, replace read_lines(SAMPLE_LOG) with an open file object (ideally inside a with statement) -- file objects are themselves lazy iterators that yield one line at a time.
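To make the file-based version concrete, here is a self-contained sketch that writes a tiny sample log to a temporary file and runs the same parse-and-filter stages over it (the filename and log lines are illustrative, and timestamps are kept as strings to keep the sketch short):

```python
import os
import re
import tempfile

LOG_PATTERN = re.compile(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+)\s+(.+)')

def parse_logs(lines):
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            timestamp_str, level, message = match.groups()
            yield {'timestamp': timestamp_str, 'level': level, 'message': message}

def filter_level(entries, level):
    return (e for e in entries if e['level'] == level)

# Write a small illustrative log file
path = os.path.join(tempfile.mkdtemp(), 'app.log')
with open(path, 'w') as f:
    f.write("2026-04-17 09:00:01 INFO Service started\n")
    f.write("2026-04-17 09:01:15 ERROR Connection timeout\n")

# The open file object replaces read_lines(): it yields lines lazily
with open(path) as f:
    for entry in filter_level(parse_logs(f), 'ERROR'):
        print(entry['message'])  # Connection timeout
```

Note that LOG_PATTERN.match() stops at the newline character, so the trailing '\n' on each file line never reaches the parsed message.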
Frequently Asked Questions
Can I iterate a generator multiple times?
No -- generators are single-use. Once exhausted, calling next() always raises StopIteration. If you need to iterate multiple times, either store the values in a list first (items = list(my_gen())), or call the generator function again to create a fresh generator object. This is why generator expressions are usually passed directly to functions like sum() or list() that consume them in one pass.
What is the difference between a generator and an iterator?
Every generator is an iterator, but not every iterator is a generator. An iterator is any object that implements the __iter__() and __next__() methods. A generator is a specific way to create an iterator -- a function with yield -- and Python implements the iterator protocol for you automatically. Generators are the most convenient way to create iterators in Python.
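To make the distinction concrete, here is the same countdown written both ways — a hand-rolled iterator class and a generator function (a sketch for illustration; the class names are made up):

```python
class CountdownIterator:
    """Iterator written by hand: explicit state plus protocol methods."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        value = self.n
        self.n -= 1
        return value

def countdown_gen(n):
    """Same behavior; Python implements the protocol for us."""
    while n > 0:
        yield n
        n -= 1

print(list(CountdownIterator(3)))  # [3, 2, 1]
print(list(countdown_gen(3)))      # [3, 2, 1]
```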
What does generator.send() do?
The .send(value) method resumes a generator and sends a value in as the result of the current yield expression. This turns generators into coroutines -- two-way communication channels. It's used in advanced patterns like cooperative multitasking. For most use cases, you'll never need .send() -- standard next() iteration is sufficient.
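A minimal sketch of .send() in action — a running-average coroutine. Note the priming next() call, which advances the generator to its first yield before any value can be sent in:

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # receives whatever .send() passes in
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)            # prime: run to the first yield (yields None)
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0
```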
When should I use a generator vs. a list?
Use a generator when: you're processing a large sequence and don't need all values in memory at once, you're reading from a file or network stream, you only need to iterate once, or you're building a pipeline of transformations. Use a list when: you need to index into the sequence, iterate multiple times, get its length with len(), or pass it to code that explicitly expects a list.
Can generators produce infinite sequences?
Yes -- this is one of the most powerful uses of generators. A generator can loop indefinitely, yielding values forever, because it never builds a finite collection in memory. The standard library's itertools.count() and itertools.cycle() are examples of infinite generators. Just make sure your consuming code has a termination condition (like islice or a loop with a break) to stop pulling values.
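An infinite generator with itertools.islice as the termination condition — a minimal sketch using a Fibonacci stream:

```python
from itertools import islice

def fibonacci():
    """Yields Fibonacci numbers forever; the caller decides when to stop."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# islice pulls only the first 10 values from the infinite stream
print(list(islice(fibonacci(), 10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```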
Conclusion
Generators are one of Python's most elegant features. In this tutorial, you learned how yield turns a function into a lazy generator, how the generator protocol works with next() and StopIteration, how generator expressions provide compact syntax for simple generators, how yield from delegates to sub-iterables, and how to compose generators into data processing pipelines.
The log analyzer project shows the real-world payoff: a memory-efficient pipeline that scales to gigabyte-sized files with no code changes. Try extending it to count errors per hour, find the longest gap between errors, or write the filtered entries to a new file.
For more on generators and iteration in Python, see the Python HOWTO: Generators and the itertools documentation for powerful generator-based utilities.