Advanced
You’ve profiled your Python code and a particular function keeps showing up in the hot path. You’ve tried rewriting it a few different ways and timeit says one version is 30% faster, but you’re not sure why. Or you’re curious what Python is actually doing when you write a list comprehension versus a for loop, or what the difference is between x += 1 and x = x + 1 at the bytecode level. The dis module (short for “disassemble”) lets you answer these questions by showing you the CPython bytecode that any Python function compiles to.
The dis module is part of Python’s standard library and works with any function, method, class, module, or code string. It doesn’t modify your code; it translates the compiled bytecode stored in a code object back into a human-readable instruction listing. You don’t need to be a CPython internals expert to use it: the instruction names are descriptive enough that you can reason about them with basic Python knowledge.
In this tutorial, you’ll learn how to disassemble functions with dis.dis(), read and interpret bytecode output, inspect code objects with __code__ attributes, compare implementations by their bytecode, understand how closures and global lookups work at the bytecode level, and use dis as a performance diagnosis tool. By the end, you’ll have a concrete mental model of what CPython does with your code.
Python dis Module: Quick Example
Here’s a basic disassembly of a simple function to show what the output looks like:
```python
# dis_quick.py
import dis
def add_numbers(a, b):
    result = a + b
    return result
dis.dis(add_numbers)
```
Output:
```
  3           0 RESUME                   0

  4           2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (b)
              6 BINARY_OP                0 (+)
             10 STORE_FAST               2 (result)

  5          12 LOAD_FAST                2 (result)
             14 RETURN_VALUE
```
Each row is one bytecode instruction. The columns are: source line number (left), byte offset in the code object, instruction name (opcode), argument, and the argument’s human-readable value in parentheses. LOAD_FAST pushes a local variable onto the evaluation stack. BINARY_OP pops two values, applies the operator (0 = +), and pushes the result. STORE_FAST pops the top of the stack into a local variable. RETURN_VALUE returns the top of the stack to the caller.
What Is CPython Bytecode?
When CPython compiles your Python source code, it produces bytecode — a sequence of fixed-width instructions for the CPython virtual machine (VM). The VM then interprets these instructions in a loop, maintaining an evaluation stack where operands are pushed and popped. Bytecode is stored in .pyc files and in the __code__ attribute of function objects.
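You can see this raw form directly: co_code on any code object is a bytes object, with each instruction unit occupying two bytes (opcode byte, then argument byte) since Python 3.6. A minimal sketch (the `add` function is just for illustration):

```python
import dis

def add(a, b):
    return a + b

raw = add.__code__.co_code       # the raw bytecode as a bytes object
print(type(raw), len(raw))       # bytes, always an even number of bytes
# Decode the first instruction pair by hand: opcode byte -> mnemonic name
print(dis.opname[raw[0]], raw[1])
```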
| Concept | What It Is | Example |
|---|---|---|
| Opcode | A single VM instruction | LOAD_FAST, CALL |
| Offset | Byte position of instruction in code object | 0, 2, 4… |
| Argument | Integer parameter to the opcode | Variable index, constant index |
| Stack | LIFO buffer for operands and results | LOAD pushes, STORE/CALL pops |
| Code object | Compiled bytecode + metadata for a function | func.__code__ |
| co_varnames | Tuple of local variable names | ('a', 'b', 'result') |
Understanding the stack model is the key to reading bytecode. Every LOAD instruction pushes a value; every STORE, CALL, or binary operator pops one or more values. The stack depth at any point tells you how many “in-flight” values exist. The compiler computes the maximum stack depth for each code object at compile time and stores it in co_stacksize, so the VM can allocate the evaluation stack up front.
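You can explore per-instruction stack effects yourself with dis.stack_effect(), which reports the net change in stack depth for an opcode (and argument, where one is required):

```python
import dis

# Net stack-depth change for a given opcode
print(dis.stack_effect(dis.opmap["LOAD_FAST"], 0))   # pushes one value
print(dis.stack_effect(dis.opmap["POP_TOP"]))        # pops one value
```

Summing stack effects along every execution path is essentially how the compiler arrives at co_stacksize.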
Reading the dis Output
Let’s look at a more complex function to practice reading bytecode:
```python
# dis_read.py
import dis
def categorize_score(score):
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    else:
        return "C"
print("=== categorize_score ===")
dis.dis(categorize_score)
# Compare with a threshold-table version
GRADE_THRESHOLDS = [(90, "A"), (80, "B"), (0, "C")]
def categorize_score_v2(score):
    for threshold, grade in GRADE_THRESHOLDS:
        if score >= threshold:
            return grade
print("\n=== categorize_score_v2 ===")
dis.dis(categorize_score_v2)
```
Output:
```
=== categorize_score ===
  3           0 RESUME                   0

  4           2 LOAD_FAST                0 (score)
              4 LOAD_CONST               1 (90)
              6 COMPARE_OP               5 (>=)
             10 POP_JUMP_IF_FALSE       12 (to 36)

  5          12 LOAD_CONST               2 ('A')
             14 RETURN_VALUE

  6          16 LOAD_FAST                0 (score)
...
(abbreviated)

=== categorize_score_v2 ===
 14           0 RESUME                   0

 15           2 LOAD_GLOBAL              0 (GRADE_THRESHOLDS)
             12 GET_ITER
        >>   14 FOR_ITER                28 (to 72)
             18 UNPACK_SEQUENCE          2
             22 STORE_FAST               1 (threshold)
             24 STORE_FAST               2 (grade)
...
```
The POP_JUMP_IF_FALSE instruction is how Python implements if statements in bytecode: it pops the boolean result of the comparison and jumps to the target offset if it’s False, otherwise continues to the next instruction. In v2, the LOAD_GLOBAL instruction for GRADE_THRESHOLDS does a dictionary lookup in the global namespace on every call — that’s slower than LOAD_FAST which reads from a local slot. This is why moving frequently-accessed globals into local variables inside tight loops is a valid micro-optimization.
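You can verify the global-versus-local distinction programmatically. A quick sketch (the function and helper names here are hypothetical, chosen for illustration):

```python
import dis

N = 3

def uses_global():
    return N + 1            # N resolved via LOAD_GLOBAL on every call

def uses_local(n=N):
    return n + 1            # n captured at def time, read via LOAD_FAST

def opnames(fn):
    return [i.opname for i in dis.get_instructions(fn)]

print("LOAD_GLOBAL" in opnames(uses_global))  # True
print("LOAD_GLOBAL" in opnames(uses_local))   # False
```

The default-argument trick works because defaults are evaluated once, when the def statement runs, so the global's value is baked into a local slot.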
Inspecting Code Objects
Every function has a __code__ attribute that exposes the raw code object. The code object contains not just the bytecode bytes but also metadata: constant values, variable names, closure variables, and source information.
```python
# dis_code_object.py
import dis
def make_adder(n):
    """Return a function that adds n to its argument."""
    def adder(x):
        return x + n
    return adder
add5 = make_adder(5)
# Inspect the outer function
outer_code = make_adder.__code__
print("=== make_adder code object ===")
print(f"co_name: {outer_code.co_name}")
print(f"co_varnames: {outer_code.co_varnames}")   # local variables
print(f"co_freevars: {outer_code.co_freevars}")   # closed-over variables
print(f"co_cellvars: {outer_code.co_cellvars}")   # variables used by inner funcs
print(f"co_consts: {outer_code.co_consts}")       # constants (includes inner code)
print(f"co_flags: {outer_code.co_flags:#010b}")
# Inspect the inner adder function
inner_code = add5.__code__
print("\n=== adder code object ===")
print(f"co_name: {inner_code.co_name}")
print(f"co_varnames: {inner_code.co_varnames}")
print(f"co_freevars: {inner_code.co_freevars}")   # 'n' is a free variable
print("\n=== adder bytecode ===")
dis.dis(add5)
```
Output:
```
=== make_adder code object ===
co_name: make_adder
co_varnames: ('n', 'adder')
co_freevars: ()
co_cellvars: ('n',)
co_consts: ('Return a function that adds n to its argument.', <code object adder at 0x...>)
co_flags: 0b00000011

=== adder code object ===
co_name: adder
co_varnames: ('x',)
co_freevars: ('n',)

=== adder bytecode ===
  5           0 COPY_FREE_VARS           1
              2 RESUME                   0

  6           4 LOAD_FAST                0 (x)
              6 LOAD_DEREF               1 (n)
              8 BINARY_OP                0 (+)
             12 RETURN_VALUE
```
LOAD_DEREF is the opcode for accessing a closure variable (a free variable). It reads from a “cell” object that’s shared between the outer and inner function, which is how closures maintain state after the outer function returns. This is slower than LOAD_FAST but necessary for closures. co_cellvars in the outer function lists variables that are “wrapped in cells” for sharing with inner functions; co_freevars in the inner function lists the variables it accesses from the cell.
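The cells themselves are visible on the function object via its __closure__ attribute, which pairs up index-for-index with co_freevars:

```python
def make_adder(n):
    def adder(x):
        return x + n
    return adder

add5 = make_adder(5)
print(add5.__code__.co_freevars)          # ('n',)
print(add5.__closure__[0].cell_contents)  # 5 -- the value LOAD_DEREF reads
print(add5(10))                           # 15
```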
Comparing Implementations
One of the most practical uses of dis is comparing two implementations of the same function. Fewer instructions usually means faster execution (though the relationship isn’t perfect — some instructions are heavier than others).
```python
# dis_compare.py
import dis
# Three ways to join a list of strings
def join_v1(words):
    result = ""
    for word in words:
        result += word + " "
    return result.strip()
def join_v2(words):
    return " ".join(words)
def join_v3(words):
    parts = []
    for word in words:
        parts.append(word)
    return " ".join(parts)
# Count instructions for each
for fn in [join_v1, join_v2, join_v3]:
    instructions = list(dis.get_instructions(fn))
    real_instructions = [i for i in instructions if i.opname != "RESUME"]
    print(f"{fn.__name__}: {len(real_instructions)} instructions")
print("\n=== join_v2 bytecode (winner) ===")
dis.dis(join_v2)
```
Output:
```
join_v1: 18 instructions
join_v2: 5 instructions
join_v3: 17 instructions

=== join_v2 bytecode (winner) ===
  9           0 RESUME                   0

 10           2 LOAD_CONST               1 (' ')
              4 LOAD_ATTR                1 (NULL|self + join)
             14 LOAD_FAST                0 (words)
             16 CALL                     1
             24 RETURN_VALUE
```
join_v2 uses just 5 real instructions: load the string ' ', load its join method, load the words argument, call the method, and return. The other versions loop in Python bytecode, which means executing multiple instructions per element. str.join() does the concatenation in C, which is why it’s significantly faster than loop-based string building for large lists.
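Instruction counts only matter if the implementations are actually equivalent, so it's worth a sanity check before declaring a winner. A quick sketch, with the three functions redefined so the snippet is self-contained:

```python
def join_v1(words):
    result = ""
    for word in words:
        result += word + " "
    return result.strip()

def join_v2(words):
    return " ".join(words)

def join_v3(words):
    parts = []
    for word in words:
        parts.append(word)
    return " ".join(parts)

# All three should agree on typical inputs before we compare bytecode
words = ["to", "be", "or", "not"]
print(join_v1(words) == join_v2(words) == join_v3(words))  # True
```

Note the equivalence is not perfect in every corner case: join_v1 also strips whitespace that was already at the ends of the first and last words, which " ".join() does not.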
Other dis Module Tools
Beyond dis.dis(), the module provides several other useful functions for programmatic bytecode analysis:
```python
# dis_tools.py
import dis
def sample(x, y=10):
    z = x * y
    return z if z > 0 else -z
# get_instructions() -- iterator of Instruction namedtuples
print("=== get_instructions ===")
for instr in dis.get_instructions(sample):
    print(f"  {instr.offset:3d} {instr.opname:<20} {instr.argrepr}")
# dis.code_info() -- compact summary of the code object
print("\n=== code_info ===")
print(dis.code_info(sample))
# dis.show_code() -- prints code_info to stdout
print("\n=== show_code ===")
dis.show_code(sample)
# Disassemble a code string directly
print("\n=== disassemble from string ===")
code_str = "x = [i**2 for i in range(5)]"
dis.dis(compile(code_str, "<string>", "exec"))
```
Output (abridged):
```
=== get_instructions ===
    0 RESUME
    2 LOAD_FAST            x
    4 LOAD_FAST            y
    6 BINARY_OP            *
...

=== code_info ===
Name:              sample
Filename:          dis_tools.py
Argument count:    2
...
Constants: (None, 0)
Variable names: x, y, z
```
dis.get_instructions() returns Instruction namedtuples with fields opname, opcode, arg, argval, argrepr, offset, starts_line, and is_jump_target. This is the programmatic interface for tools that analyze bytecode -- linters, optimizers, and coverage tools all use it.
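For example, a few lines with collections.Counter give you an opcode histogram, which is a handy first look at any function you're auditing (the `sample` function here is just for illustration):

```python
import dis
from collections import Counter

def sample(items):
    total = 0
    for item in items:
        total += item
    return total

# Tally how often each opcode appears in the compiled function
counts = Counter(i.opname for i in dis.get_instructions(sample))
for op, n in counts.most_common():
    print(f"{op:<20} {n}")
```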
Real-Life Example: Bytecode-Based Performance Audit
This audit tool identifies functions in a module that use slow patterns -- global variable access in loops, attribute chaining, or excessive stack operations -- and reports them with instruction counts.
```python
# bytecode_audit.py
import dis

def audit_function(fn):
    """
    Audit a function for common bytecode-level performance patterns.
    Returns a dict of findings.
    """
    instructions = list(dis.get_instructions(fn))
    opnames = [i.opname for i in instructions]
    # Count meaningful instructions (skip RESUME)
    total = sum(1 for op in opnames if op != "RESUME")
    # Detect LOAD_GLOBAL inside a FOR_ITER loop
    global_in_loop = 0
    in_loop = False
    for instr in instructions:
        if instr.opname == "FOR_ITER":
            in_loop = True
        if instr.opname == "RETURN_VALUE":
            in_loop = False
        if in_loop and instr.opname == "LOAD_GLOBAL":
            global_in_loop += 1
    # Count closure variable accesses
    deref_count = opnames.count("LOAD_DEREF") + opnames.count("STORE_DEREF")
    # Count function calls (CALL on 3.11+, CALL_FUNCTION on older versions)
    call_count = opnames.count("CALL") + opnames.count("CALL_FUNCTION")
    return {
        "name": fn.__name__,
        "total_instructions": total,
        "global_in_loop": global_in_loop,
        "deref_accesses": deref_count,
        "function_calls": call_count,
    }

# Test functions with different characteristics
MULTIPLIER = 10  # global variable

def process_list_slow(data):
    """Uses global in loop -- flagged by audit."""
    result = []
    for item in data:
        result.append(item * MULTIPLIER)  # LOAD_GLOBAL each iteration
    return result

def process_list_fast(data, multiplier=MULTIPLIER):
    """Caches global as local default -- not flagged."""
    return [item * multiplier for item in data]  # LOAD_FAST

def nested_closure(x):
    """Creates a closure -- deref accesses flagged."""
    factor = x * 2
    def inner(y):
        return y + factor  # LOAD_DEREF
    return inner

# Run audit
print(f"{'Function':<25} {'Instructions':>14} {'Global-in-loop':>15} {'Deref':>8} {'Calls':>8}")
print("-" * 75)
for fn in [process_list_slow, process_list_fast, nested_closure]:
    r = audit_function(fn)
    flag = " [!]" if r["global_in_loop"] > 0 else ""
    print(f"{r['name']:<25} {r['total_instructions']:>14} {r['global_in_loop']:>15}{flag} "
          f"{r['deref_accesses']:>8} {r['function_calls']:>8}")
```
Output (exact instruction counts vary by CPython version):
```
Function                    Instructions  Global-in-loop    Deref    Calls
---------------------------------------------------------------------------
process_list_slow                     10               1 [!]        0        2
process_list_fast                      6               0            0        0
nested_closure                         6               0            1        0
```
process_list_slow is flagged because it accesses the global MULTIPLIER inside a FOR_ITER loop. Each LOAD_GLOBAL does a dictionary lookup in the global namespace, while LOAD_FAST (used in the fast version) reads from a pre-allocated local slot -- typically 3-5x faster. The fix is simple: assign the global to a local variable before the loop, or use a default argument as process_list_fast does.
Frequently Asked Questions
Does bytecode change between Python versions?
Yes -- CPython's bytecode is not guaranteed to be stable between minor versions. Python 3.11 overhauled the instruction set (for example, CALL_FUNCTION and CALL_METHOD were replaced by CALL), and 3.12 and 3.13 made further changes, including ones for the free-threaded build. The dis module always reflects the bytecode of the running interpreter, so your disassembly output will match what CPython actually executes. Never rely on specific bytecode sequences across Python versions in production code.
Why is LOAD_FAST faster than LOAD_GLOBAL?
LOAD_FAST reads a variable from a fixed-size local variable array using an integer index -- it's essentially an array lookup with no hashing. LOAD_GLOBAL performs a hash table lookup in the module's __dict__, then falls back to builtins.__dict__ if not found there. The hash table lookup involves computing a hash, finding the slot, and handling potential collisions -- much slower for tight loops. The micro-optimization of caching globals as locals (_len = len before a loop) can matter in hot paths that execute millions of times.
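The effect of that caching pattern shows up directly in the static bytecode. A sketch with hypothetical slow/fast helpers (these are static instruction counts in the compiled code, not runtime event counts):

```python
import dis

def slow(items):
    out = []
    for item in items:
        out.append(len(str(item)))    # LOAD_GLOBAL for len and str, inside the loop body
    return out

def fast(items, _len=len, _str=str):
    out = []
    for item in items:
        out.append(_len(_str(item)))  # LOAD_FAST for the cached names
    return out

def count_global_loads(fn):
    return sum(1 for i in dis.get_instructions(fn) if i.opname == "LOAD_GLOBAL")

print(count_global_loads(slow))  # 2
print(count_global_loads(fast))  # 0
```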
Does CPython optimize bytecode?
Yes. CPython applies compile-time optimizations -- constant folding and dead-code elimination -- when it compiles your source. For example, 2 * 3 in source code becomes the constant 6 in bytecode (no runtime multiplication), and if False: ... branches are eliminated entirely. Separately, Python 3.11 introduced the specializing adaptive interpreter (PEP 659), which "quickens" frequently-executed instructions at runtime, replacing them with specialized variants based on the operand types it observes; pass adaptive=True to dis.dis() on 3.11+ to see those specialized forms. By default, dis shows the post-compilation bytecode.
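Constant folding is easy to confirm for yourself: on modern CPython the folded result lands in co_consts, and the original operands don't appear at all (the `folded` function here is just for illustration):

```python
def folded():
    return 60 * 60 * 24   # folded to 86400 at compile time

# The compiler stored only the final constant, not 60 or 24
print(folded.__code__.co_consts)
```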
Can I modify bytecode at runtime?
Technically yes -- you can create a new types.CodeType object with modified bytecode and assign it to a function's __code__. Libraries like codetransformer and bytecode provide higher-level APIs for this. In practice, modifying bytecode is fragile (it breaks across Python versions), slow to implement correctly, and almost always unnecessary -- if you need runtime code transformation, Python's decorator system, ast module, or metaclasses are safer and more maintainable alternatives.
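The simplest (and least fragile) form of this is swapping in a whole code object from another function rather than editing raw bytes. A minimal sketch showing that __code__ is writable:

```python
def greet():
    return "hello"

def shout():
    return "HELLO"

# Replace greet's code object wholesale; the functions must have
# compatible signatures and closures for this to work
greet.__code__ = shout.__code__
print(greet())  # HELLO
```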
How does bytecode relate to PyPy or Cython?
CPython's bytecode is interpreted by the CPython VM -- each instruction is dispatched by a large switch/computed-goto loop written in C. PyPy uses a JIT compiler that compiles frequently-executed code to native machine code at runtime, which is why it can be 5-10x faster for CPU-bound loops. Cython is a different approach: it compiles Python-like code to C extensions that bypass bytecode entirely. The dis module only shows CPython bytecode -- PyPy and Cython have their own internal representations that dis doesn't apply to.
Conclusion
The dis module gives you a window into what CPython actually does with your Python source code. The key tools are dis.dis() for human-readable bytecode output, dis.get_instructions() for programmatic analysis, dis.code_info() for code object metadata, and the __code__ attribute for direct code object inspection. The performance audit above is a practical starting point -- extend it to scan all functions in a module, add checks for attribute chaining inside loops, or integrate it into your CI pipeline as a performance regression detector.
The dis module is most valuable when combined with profiling: use cProfile or line_profiler to find the hot path, then use dis to understand why one implementation is faster than another. Together they give you both the "where" and the "why" of Python performance.
Official documentation: https://docs.python.org/3/library/dis.html