Your Python service passes all tests and runs fine in development — then hits production and balloons to 4 GB of RAM. You restart it, it climbs again, and now you have a memory leak you cannot locate. You add a few print(sys.getsizeof(...)) calls, but they only measure individual objects, not the full allocation tree. You try the standard tracemalloc module and get a list of the top 10 allocations with no call-stack context. The problem could be anywhere across 50 modules and 300 functions.

memray is a memory profiler for Python developed by Bloomberg Engineering. It instruments every memory allocation and deallocation in your program — including C extensions and native code — and records a full call stack for each one. After a run you get a flame graph showing exactly which call path is responsible for each byte: not just which object, but which function called which function that eventually caused the allocation. It supports command-line profiling, pytest integration, and a live tracking mode so you can watch allocations happen in real time. Install it with pip install memray.

In this article we will cover how memray works and how it differs from tracemalloc, how to profile a script from the command line, how to read the flame graph and table reports, how to track live allocations, how to use the pytest-memray plugin to add memory limits to tests, how to profile specific code blocks with the Python API, and how to interpret results to find and fix real leaks. By the end you will have a complete toolkit for diagnosing memory problems in any Python application.

memray Quick Example: Finding a Memory Hog in 30 Seconds

Here is the minimum setup to profile a script and open the flame graph. First, create a script that has an obvious memory problem:

# leaky_script.py
def load_big_list():
    return [i * 2 for i in range(5_000_000)]

def process(data):
    # Creates a second copy -- doubles memory
    return [str(x) for x in data]

if __name__ == "__main__":
    nums = load_big_list()
    strs = process(nums)
    print(f"Processed {len(strs)} items")

Run it under memray from the terminal:

# run_memray.sh
python -m memray run leaky_script.py
Writing profile results into memray-leaky_script.py.1234.bin
[memray] Successfully generated profile results.

 Run id: 1234
 Command line: leaky_script.py
 Start time: 2026-06-11 09:00:00.000
 End time: 2026-06-11 09:00:02.345
 Duration: 2.345 seconds
 Total allocations: 5,000,132
 Total memory allocated: 312.4 MB
 Peak memory usage: 271.2 MB

Then generate the flame graph report and open it in a browser:

# generate_flamegraph.sh
python -m memray flamegraph memray-leaky_script.py.1234.bin
Wrote flamegraph to memray-flamegraph-leaky_script.py.1234.html

The flame graph shows two large bars: load_big_list responsible for ~150 MB (the integer list) and process responsible for ~162 MB (the string list). The call path is clear — both functions allocate heavily and neither releases until the script exits. The sections below cover the full memray toolkit.
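As a rough illustration of why a single sys.getsizeof call misses most of this memory, the sketch below compares the size of a list object alone with the size of the list plus its elements. The helper name shallow_and_deep_size is made up for this example, and the byte counts are CPython implementation details that vary by version and platform (small cached integers are also counted as if they were private to the list).

```python
# shallow_vs_deep_size.py
import sys

def shallow_and_deep_size(items):
    """Return (container-only size, container plus element sizes) in bytes."""
    shallow = sys.getsizeof(items)  # the list's own pointer array only
    deep = shallow + sum(sys.getsizeof(x) for x in items)  # add each element
    return shallow, deep

nums = [i * 2 for i in range(100_000)]
shallow, deep = shallow_and_deep_size(nums)
print(f"list object alone:  {shallow / 1e6:.2f} MB")
print(f"list plus its ints: {deep / 1e6:.2f} MB")
```

The second number is several times the first, which is the gap a whole-program profiler closes for you without manual bookkeeping.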

What Is memray and How Does It Work?

memray is a deterministic memory profiler. Unlike sampling profilers that check memory usage at intervals, memray intercepts every call to the Python allocator (and to malloc/free in C extensions) and records the exact call stack at the moment of each allocation. This gives complete, lossless data rather than a statistical sample.

Python already ships with tracemalloc, which also tracks allocations. The key differences are scope and output. tracemalloc tracks only allocations made through Python's own memory allocators and reports them as text statistics grouped by file, line, or traceback. memray tracks both Python and native (C) allocations, records the full call chain, and produces interactive flame graphs, timeline views, and summary tables that show not just what allocated memory but the entire path through your code that led to the allocation.
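For comparison, here is a minimal sketch of what tracemalloc gives you on its own. When started with a frame count it keeps a multi-frame traceback per allocation, but it only sees allocations made through Python's allocators and its output is plain text. The function names make_strings and caller are placeholders for this example.

```python
# tracemalloc_traceback.py
import tracemalloc

def make_strings(n):
    return [str(i) for i in range(n)]

def caller(n):
    return make_strings(n)

tracemalloc.start(10)  # keep up to 10 frames per allocation
data = caller(50_000)
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group allocations by full traceback and show the largest group
top = snapshot.statistics("traceback")[0]
print(f"{top.size / 1e6:.2f} MB in {top.count} blocks")
for line in top.traceback.format():
    print(line)
```

This is often enough for pure-Python code; the case for memray is strongest when native extensions are involved or when you want a visual report.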

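Feature                        | memray              | tracemalloc | memory_profiler
-------------------------------|---------------------|-------------|--------------------
Tracks C extension allocations | Yes                 | No          | No
Full call stack per allocation | Yes                 | Partial     | No
Flame graph output             | Yes (HTML)          | No          | No
Live tracking mode             | Yes                 | No          | No
pytest integration             | Yes (pytest-memray) | No          | No
Performance overhead           | Moderate (2-5x)     | Low         | High (line-by-line)
Platform support               | Linux, macOS        | All         | All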

memray hooks the allocator at the C level: it intercepts malloc, free, and related functions inside the running process using platform-specific symbol interposition on Linux and macOS. That dependence on platform-specific linking machinery is why it works on Linux and macOS but not Windows. It writes a compact binary trace file that you then convert to reports using the memray CLI.

[Figure: Python developer inspecting memory allocation blocks with a magnifying glass. Caption: "tracemalloc showed the symptom. memray shows the crime scene."]

Installing memray

Install memray with pip. It requires Python 3.7+ and works on Linux and macOS (not Windows):

# install_memray.sh
pip install memray
Successfully installed memray-1.13.0

Verify the installation:

# verify_memray.sh
python -m memray --version
memray, version 1.13.0

To use the pytest integration, also install the plugin:

# install_pytest_memray.sh
pip install pytest-memray
Successfully installed pytest-memray-1.6.0

Command-Line Profiling

The simplest way to profile a script is the memray run subcommand. It runs your script and writes a binary trace to a .bin file in the current directory:

# profile_script.sh
python -m memray run my_script.py
Writing profile results into memray-my_script.py.9821.bin

You can also profile a module with -m, exactly like running Python normally:

# profile_module.sh
python -m memray run -m pytest tests/
Writing profile results into memray-run-pytest.9955.bin

Key flags for memray run:

# memray_run_flags.sh

# Custom output file
python -m memray run -o my_profile.bin leaky_script.py

# Also record native (C/C++) stack frames
python -m memray run --native leaky_script.py

# Compress the output file
python -m memray run --compress-on-exit leaky_script.py

# Also track allocations in child processes created by fork
python -m memray run --follow-fork leaky_script.py

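The --native flag records native (C/C++) stack frames alongside the Python frames. Use it when you suspect NumPy, Pandas, or other native extensions are the source of a leak. Without it, memray still records the allocations made inside those libraries, but it attributes them to the Python line that called into the extension, so you cannot see which native function did the allocating.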

[Figure: Python developer watching terminal memory profiling output. Caption: "memray run. Then coffee. Then answers."]

Reading the Reports: Flame Graph and Stats Table

memray produces several report types from the same .bin trace file. The flame graph is the most useful for finding the source of large allocations:

# generate_reports.sh

# Flame graph (HTML -- open in browser)
python -m memray flamegraph memray-leaky_script.py.1234.bin
# Output: memray-flamegraph-leaky_script.py.1234.html

# Summary table (terminal output)
python -m memray stats memray-leaky_script.py.1234.bin

# Tree view (allocations as a call tree)
python -m memray tree memray-leaky_script.py.1234.bin

The stats command gives a quick summary directly in the terminal without opening a browser:

# stats_output.txt

---- Top 10 allocations by size ----

1) size=162.4 MB, allocated in process (leaky_script.py:7)
       -> process (leaky_script.py:8)

2) size=150.1 MB, allocated in load_big_list (leaky_script.py:2)
       -> __main__ (leaky_script.py:11)

3) size=1.2 MB, allocated in _bootstrap (importlib._bootstrap:1)
       -> ...

Total allocations: 5,000,132
Total memory allocated: 312.4 MB
Peak memory: 271.2 MB

In the flame graph HTML, each box represents a function. The width of the box is proportional to the amount of memory allocated by that function and all its callees. Click a box to zoom in. Each stack runs from your entry point (usually __main__) to the function that performed the allocation. Depending on the memray version and the layout selected in the report, the entry point sits at the top (icicle layout) or at the bottom (classic flame layout), and the widest box at the opposite end is the function doing the most allocation. Use the “Show only allocations” toggle to filter out functions that only pass memory through without allocating.

Live Tracking Mode

Instead of profiling a full run and analyzing afterward, live mode streams allocations to a terminal UI in real time. This is especially useful for long-running servers or scripts where you want to watch memory grow and correlate it with specific operations:

# live_tracking.sh
python -m memray run --live leaky_script.py
   Allocation            Location                              Size        Count
---------------------------------------------------------------------------
              leaky_script.py:2                  148.2 MB    5,000,000
              leaky_script.py:7                  122.1 MB    3,892,451
   list_to_str          leaky_script.py:8                    8.4 MB      108,441
   ...

 Peak memory: 271.2 MB     Current: 249.8 MB     [q]uit  [r]eset

The live view updates every 0.1 seconds. Press q to stop the run early. To view the stream from a second terminal, start the program with --live-remote (optionally choosing the port with --live-port) and attach with memray live. This is useful for profiling a server process without tying the viewer to the terminal that runs it:

# live_remote.sh

# In terminal 1 -- start the server and wait for a viewer to connect
python -m memray run --live-remote --live-port 5001 server.py

# In terminal 2 -- attach the viewer
python -m memray live 5001

[Figure: Python developer watching a live memory tracking dashboard. Caption: "--live mode. Because grep-ing production logs is a last resort."]

Using the Python API for Targeted Profiling

If you only want to profile a specific section of a larger application — not the whole program — use memray’s Python context manager. This avoids noise from startup, shutdown, and unrelated code paths:

# targeted_profiling.py
import memray

def build_index(documents):
    """Build an inverted index from a list of documents."""
    index = {}
    for doc_id, text in enumerate(documents):
        for word in text.lower().split():
            if word not in index:
                index[word] = []
            index[word].append(doc_id)
    return index

def search(index, query):
    """Return document IDs matching all query terms."""
    terms = query.lower().split()
    results = set(index.get(terms[0], []))
    for term in terms[1:]:
        results &= set(index.get(term, []))
    return list(results)

# Sample data
docs = [
    "Python memory profiling with memray",
    "How to find memory leaks in Python",
    "memray flame graph tutorial",
] * 10_000

# Profile only the index build -- not the search or data setup
with memray.Tracker("index_build.bin"):
    index = build_index(docs)

# Analysis later:
# python -m memray flamegraph index_build.bin
print(f"Index built: {len(index)} unique terms")
print(f"Search results: {search(index, 'python memory')}")
Index built: 13 unique terms
Search results: [0, 1, 3, 4, ...]

The memray.Tracker context manager starts recording on entry and finishes writing the .bin file on exit. To capture native stack frames as well, pass native_traces=True: memray.Tracker("profile.bin", native_traces=True). Use the targeted approach in services where you cannot afford to instrument the entire process: wrap only the suspicious function or request handler.
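One way to make targeted profiling reusable is a small wrapper around memray.Tracker. The sketch below is an assumption-laden convenience, not part of memray: the name profile_block, the out_dir parameter, and the file-naming scheme are invented here. It gives each trace a unique name, since Tracker by default will not overwrite an existing file, and it degrades to a no-op when memray is not installed.

```python
# profile_block.py
import contextlib
import tempfile
import time
from pathlib import Path

try:
    import memray  # third-party; only needed when profiling is wanted
except ImportError:
    memray = None

@contextlib.contextmanager
def profile_block(label, out_dir):
    """Profile the enclosed block with memray if it is installed.

    Yields the trace path, or None when memray is unavailable.
    """
    if memray is None:
        yield None
        return
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # Unique name per run so an earlier trace is never in the way
    path = Path(out_dir) / f"{label}-{time.time_ns()}.bin"
    with memray.Tracker(str(path)):
        yield path

# Usage: only the list construction is traced
with profile_block("index_build", out_dir=tempfile.mkdtemp()) as trace_path:
    squares = [n * n for n in range(100_000)]

print(f"trace written to: {trace_path}")
```

The resulting file can be passed to python -m memray flamegraph like any other trace.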

Testing Memory Usage with pytest-memray

pytest-memray integrates memray into your test suite. Run your tests with memory profiling and optionally enforce per-test memory limits:

# test_memory_limits.py
import pytest

def build_report(n_rows):
    """Build a report dict with n_rows entries."""
    return {f"row_{i}": {"value": i, "label": f"Item {i}"} for i in range(n_rows)}

# This test will fail if it allocates more than 50 MB
@pytest.mark.limit_memory("50 MB")
def test_small_report_memory():
    report = build_report(100_000)
    assert len(report) == 100_000

# This test passes -- no limit set, just profiled
def test_large_report_memory():
    report = build_report(1_000_000)
    assert len(report) == 1_000_000

Run with the --memray flag to enable profiling:

# run_memray_tests.sh
pytest tests/test_memory_limits.py --memray
FAILED test_memory_limits.py::test_small_report_memory - Failed: Test was
limited to 50.0MB but allocated 89.3MB

PASSED test_memory_limits.py::test_large_report_memory
 - Total memory allocated: 421.7MB

========== 1 failed, 1 passed in 2.34s ==========

The @pytest.mark.limit_memory("50 MB") decorator sets a hard ceiling. If the test allocates more than the limit, it fails with a clear message showing the actual allocation. Add this marker to any function that processes large data structures in a tight loop — it turns memory regressions into CI failures instead of production surprises. You can also pass --memray-bin-path=./profiles/ to save trace files from all tests for post-run analysis.

[Figure: Python developer checking failed CI memory limit test results. Caption: "@pytest.mark.limit_memory. Because ‘it worked fine locally’ ends here."]

Identifying and Fixing Memory Leaks

A genuine memory leak in Python is usually one of three things: a growing container that is never cleared, a reference cycle that keeps objects alive until the cyclic garbage collector runs (or indefinitely if the collector is disabled or the cycle is still reachable from a long-lived object), or a native extension that leaks memory at the C level. memray’s flame graph makes all three visible.

Here is a realistic example of a container-based leak and how memray exposes it:

# cache_leak.py
import memray

# Global cache that is never evicted
_query_cache = {}

def expensive_query(key):
    """Simulates a database query with a result cache."""
    if key not in _query_cache:
        # Caches a 10KB result for every unique key -- forever
        _query_cache[key] = b"x" * 10_240
    return _query_cache[key]

def handle_requests(n):
    """Simulates n incoming requests with unique keys."""
    for i in range(n):
        result = expensive_query(f"user:{i}:profile")
    return len(_query_cache)

with memray.Tracker("cache_leak.bin"):
    total = handle_requests(5_000)

print(f"Cache size after run: {total} entries")
# python -m memray flamegraph cache_leak.bin
# Flame graph will show _query_cache holding ~50 MB with no deallocation path
Cache size after run: 5000 entries

When you open the flame graph, expensive_query will show a wide bar attributed to the line that builds and stores the cached value. Since there is no eviction, all of the roughly 50 MB stays live until the process exits. The fix is to bound the cache with functools.lru_cache or cachetools.LRUCache. After fixing, run the same profile and verify that peak memory drops dramatically.
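A minimal sketch of the bounded version using functools.lru_cache follows. The maxsize of 1,000 is an arbitrary value chosen for illustration; size it to your own memory budget.

```python
# bounded_cache.py
from functools import lru_cache

@lru_cache(maxsize=1_000)  # illustrative bound: at most 1,000 cached results
def expensive_query(key):
    """Simulates a database query; old results are evicted LRU-first."""
    return b"x" * 10_240

def handle_requests(n):
    """Simulates n incoming requests with unique keys."""
    for i in range(n):
        expensive_query(f"user:{i}:profile")
    return expensive_query.cache_info()

info = handle_requests(5_000)
print(info)
# CacheInfo(hits=0, misses=5000, maxsize=1000, currsize=1000)
```

With 1,000 entries of about 10 KB each, the cache now tops out near 10 MB regardless of how many unique keys arrive.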

Real-Life Example: Profiling a Data Processing Pipeline

Here is a realistic data pipeline that reads a large dataset and produces an aggregate report. We will use memray to identify which stage uses the most memory and then refactor to reduce the peak.

# data_pipeline.py
import memray
import csv
import io
import random

# --- Generate sample CSV data in memory ---
def make_sample_csv(n_rows=500_000):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["user_id", "product_id", "amount", "category"])
    categories = ["electronics", "clothing", "food", "books", "sports"]
    for i in range(n_rows):
        writer.writerow([
            f"user_{i % 10_000}",
            f"prod_{random.randint(1, 1000)}",
            round(random.uniform(1, 500), 2),
            random.choice(categories),
        ])
    buf.seek(0)
    return buf

# --- Stage 1: Load everything into memory (naive approach) ---
def load_all_rows(csv_buf):
    reader = csv.DictReader(csv_buf)
    return list(reader)   # entire dataset in a list of dicts

# --- Stage 2: Aggregate totals by category ---
def aggregate_by_category(rows):
    totals = {}
    for row in rows:
        cat = row["category"]
        amt = float(row["amount"])
        if cat not in totals:
            totals[cat] = {"count": 0, "total": 0.0}
        totals[cat]["count"] += 1
        totals[cat]["total"] += amt
    return totals

with memray.Tracker("pipeline.bin"):
    csv_data = make_sample_csv()
    rows = load_all_rows(csv_data)          # Stage 1
    report = aggregate_by_category(rows)    # Stage 2

for cat, stats in sorted(report.items()):
    avg = stats["total"] / stats["count"]
    print(f"{cat:12s}: {stats['count']:6d} orders, avg ${avg:.2f}")
books       :  99812 orders, avg $250.33
clothing    : 100203 orders, avg $249.88
electronics :  99876 orders, avg $250.14
food        : 100442 orders, avg $249.71
sports      :  99667 orders, avg $249.92

Running python -m memray flamegraph pipeline.bin will show load_all_rows responsible for the majority of peak memory, because all 500,000 rows are held in a list of dicts at the same time. The fix is to stream the CSV row by row instead of loading it all at once. Replace load_all_rows with a streaming aggregator and the row storage (about 200 MB in this run) shrinks to one row at a time plus the small totals dict, on the order of ~2 MB. Note that in this script the in-memory StringIO buffer built by make_sample_csv is created inside the Tracker block and still counts toward the peak; reading from a file on disk removes that as well. This is the memray workflow in practice: profile, identify the stage, refactor, re-profile to confirm the improvement.
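A streaming aggregator along those lines could look like the sketch below. The function name aggregate_streaming and the three-row sample are made up for illustration; the column names match the pipeline above.

```python
# streaming_pipeline.py
import csv
import io

def aggregate_streaming(csv_file):
    """Aggregate totals by category while holding one row at a time."""
    totals = {}
    for row in csv.DictReader(csv_file):  # rows are read lazily, never stored
        entry = totals.setdefault(row["category"], {"count": 0, "total": 0.0})
        entry["count"] += 1
        entry["total"] += float(row["amount"])
    return totals

# Tiny in-memory sample; in practice pass an open file object instead
sample = io.StringIO(
    "user_id,product_id,amount,category\n"
    "user_1,prod_7,10.00,books\n"
    "user_2,prod_9,30.00,books\n"
    "user_3,prod_2,5.50,food\n"
)
report = aggregate_streaming(sample)
for cat, stats in sorted(report.items()):
    print(f"{cat}: {stats['count']} orders, total ${stats['total']:.2f}")
# books: 2 orders, total $40.00
# food: 1 orders, total $5.50
```

Because csv.DictReader yields rows one at a time, memory use depends on the number of categories rather than the number of rows.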

[Figure: Two data pipeline approaches compared, bulk load vs streaming. Caption: "list(reader) vs. for row in reader. One of these is a 200 MB decision."]

Frequently Asked Questions

Does memray work on Windows?

No. memray hooks the allocator using linker-level mechanisms that are specific to Linux and macOS, and it has no Windows backend. On Windows, tracemalloc covers Python-level allocations. If you develop on Windows and deploy to Linux, you can run memray via WSL2 or in a Docker container for profiling purposes while keeping your main development on Windows.

How much does memray slow down my code?

Expect 2x to 5x slowdown in programs that allocate heavily. Programs that allocate infrequently (mostly numeric computation on pre-allocated arrays) may see only 10-20% overhead. memray is not designed for production use — run it in a staging or development environment. If you need in-production memory monitoring, use a metrics approach (periodic psutil.Process().memory_info().rss readings) rather than a deterministic profiler.
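The metrics approach can be as simple as sampling RSS on a timer and checking the trend. In the sketch below, sample_rss and grew_steadily are hypothetical helper names, and the reader function is injected so the example runs without psutil; in a real service you would pass read_rss_with_psutil and ship the readings to your metrics system.

```python
# rss_sampler.py
import time

def read_rss_with_psutil():
    """Current resident set size in bytes (requires third-party psutil)."""
    import psutil
    return psutil.Process().memory_info().rss

def sample_rss(read_rss, samples, interval_s):
    """Collect periodic (elapsed_seconds, rss_bytes) readings."""
    start = time.monotonic()
    readings = []
    for _ in range(samples):
        readings.append((time.monotonic() - start, read_rss()))
        time.sleep(interval_s)
    return readings

def grew_steadily(readings):
    """True if every reading is larger than the one before it."""
    values = [rss for _, rss in readings]
    return all(b > a for a, b in zip(values, values[1:]))

# Stand-in reader with made-up values so the sketch runs without psutil
fake_values = iter([100, 150, 210])
readings = sample_rss(lambda: next(fake_values), samples=3, interval_s=0.01)
print(grew_steadily(readings))
# True
```

A steady upward trend in production is the signal to reproduce the workload in staging under memray.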

Does memray track garbage-collected objects?

memray tracks allocations and deallocations at the allocator level, which includes objects collected by Python’s cyclic garbage collector. When gc.collect() frees a cycle, memray records those deallocations. You can see “temporary” allocations (objects allocated and freed shortly afterward within the profiled window) by passing the --temporary-allocations flag to the flame graph command. This is useful for diagnosing churn: code that creates and throws away millions of short-lived objects, driving CPU time in the allocator even if peak memory looks normal.

Does memray work with async code and FastAPI/aiohttp?

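Yes. Since memray hooks the allocator at the C level, it is transparent to Python’s async machinery. Wrap your ASGI/WSGI app with memray.Tracker for a fixed profiling window, or use memray run --live to watch allocations as requests come in. For per-request profiling in FastAPI, a middleware can start a Tracker at request start and stop it at response end, writing one trace file per request to a temp directory. Only one Tracker can be active in a process at a time, so such a middleware has to skip requests that arrive while a trace is running, and each trace also includes allocations from any other work the process does during that window.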
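A framework-neutral sketch of such a middleware, written against the plain ASGI interface, is shown below. Everything here is illustrative: the class name, the tracker_factory parameter (which would be memray.Tracker in real use), and the file naming are assumptions, and the demonstration uses a stand-in tracker so it runs without memray or a web server. Because memray records the whole process, a trace taken this way also contains allocations from requests running concurrently.

```python
# one_request_profiler.py
import asyncio
import contextlib
import itertools
from pathlib import Path

class ProfileOneRequestAtATime:
    """ASGI middleware sketch: trace a request only if no trace is running.

    tracker_factory is any callable that takes a file path and returns a
    context manager, for example memray.Tracker.
    """

    def __init__(self, app, tracker_factory, out_dir):
        self.app = app
        self.tracker_factory = tracker_factory
        self.out_dir = Path(out_dir)
        self._busy = False
        self._ids = itertools.count(1)

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http" or self._busy:
            # Not an HTTP request, or a trace is already active: pass through
            await self.app(scope, receive, send)
            return
        self._busy = True
        path = self.out_dir / f"request-{next(self._ids)}.bin"
        try:
            with self.tracker_factory(str(path)):
                await self.app(scope, receive, send)
        finally:
            self._busy = False

# Demonstration with stand-ins for the tracker and the application
traced_paths = []

@contextlib.contextmanager
def fake_tracker(path):
    traced_paths.append(path)
    yield

async def demo_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})

async def main():
    sent = []

    async def send(message):
        sent.append(message)

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    app = ProfileOneRequestAtATime(demo_app, fake_tracker, "traces")
    await app({"type": "http", "method": "GET", "path": "/"}, receive, send)
    return sent

messages = asyncio.run(main())
print(len(traced_paths), len(messages))
# 1 2
```

In a real deployment you would likely also sample (trace one request in N) rather than trace every request that finds the tracker idle, given the overhead discussed above.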

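Why does memray show less memory than my system monitor for my NumPy script?

NumPy stores array data in native memory allocated by its C code, separate from the small Python wrapper objects. memray records those allocations, but without --native it attributes them only to the Python line that triggered them. Run with --native (python -m memray run --native script.py) to see the C-level call stacks behind them. A gap between memray’s numbers and the resident memory reported by an OS tool such as top or Activity Monitor can also come from memory the allocator is holding but has not returned to the operating system, and from shared libraries mapped into the process, neither of which memray counts as live allocations.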

Can I use memray inside Docker?

Yes. Profiling with memray run or memray.Tracker works inside a container like any other Python tool. The SYS_PTRACE capability matters when you attach memray to an already running process, because attaching relies on a debugger that needs ptrace permission. In that case, add --cap-add SYS_PTRACE to your docker run command or add cap_add: [SYS_PTRACE] to your docker-compose.yml service. For Kubernetes deployments, add capabilities.add: ["SYS_PTRACE"] to the container’s securityContext.

Conclusion

memray turns memory debugging from a guessing game into a structured investigation. Run python -m memray run script.py to capture the full allocation trace, generate a flame graph with python -m memray flamegraph *.bin, and follow the widest call paths down to the function doing the actual allocating. The Python API’s memray.Tracker context manager lets you surgically profile one subsystem without the noise of a full run, and pytest-memray prevents memory regressions from reaching production by turning allocation spikes into CI failures.

The real-life pipeline example shows the workflow end to end: profile, read the flame graph, refactor the offending stage, re-profile to confirm the improvement. Try extending it by adding a streaming version of load_all_rows using a generator, re-running the profile, and comparing the two flame graphs side by side. The official documentation at bloomberg.github.io/memray covers advanced topics including attaching to running processes, the timeline view, and custom reporters.