Beginner

Scripts that process data often need somewhere to park a file briefly — during a test run, between two processing steps, or while waiting for a slow external service. You could create files in the current directory with a hardcoded name, but that creates a minefield: name collisions between parallel runs, files left behind after crashes, and security vulnerabilities on shared systems where other processes can predict and intercept your file. Every one of these problems is solved by Python’s tempfile module.

The tempfile module creates files and directories in a platform-appropriate temp location (typically /tmp on Linux, a per-user folder on macOS, %TEMP% on Windows) with unpredictable names that make guessing attacks impractical. The objects it returns are context managers, so files are automatically deleted when you are done, even if an exception interrupts your script halfway through. The module is in the standard library, no installation required.
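You can check where temp files will land on your machine with a one-liner:

```python
import tempfile

# Ask the module which directory it will use for temp files on this system
print(tempfile.gettempdir())   # e.g. /tmp on a typical Linux box
```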

In this tutorial you will learn to use NamedTemporaryFile for named temp files, TemporaryDirectory for scratch directories, mkstemp and mkdtemp for manual control, and SpooledTemporaryFile for memory-first buffering. We will also build a real-world data-processing script that uses temp files as a safe intermediate storage layer.

Temporary Files in Python: Quick Example

Here is the simplest useful pattern — write data to a temp file, read it back, and let the context manager clean up automatically.

# tempfile_quick.py
import tempfile
import json
import os

data = {'user': 'alice', 'score': 42, 'tags': ['python', 'beginner']}

# mode='w+' opens the file for writing AND reading back
with tempfile.NamedTemporaryFile(mode='w+', suffix='.json', delete=True) as f:
    json.dump(data, f)
    f.flush()
    path = f.name
    print(f"Temp file created: {path}")

    # Read it back while still open
    f.seek(0)
    loaded = json.load(f)
    print(f"Data loaded: {loaded}")

print(f"File exists after context: {os.path.exists(path)}")
Temp file created: /tmp/tmpk7x3q8ab.json
Data loaded: {'user': 'alice', 'score': 42, 'tags': ['python', 'beginner']}
File exists after context: False

The file was created in the system temp directory with a random name and the .json suffix we requested. The moment we exited the with block, the file was deleted. Two details make the read-back work: the file must be opened in a mode that supports both writing and reading (mode='w+' rather than write-only 'w'), and f.flush() must be called before seeking back to position 0 so the buffered data actually reaches the file first.

What Is the tempfile Module and Why Use It?

The tempfile module provides a safe, cross-platform way to create temporary files and directories. “Safe” here has two meanings: security (names are unpredictable, files are created with restricted permissions) and resource management (files are automatically deleted when no longer needed).

Function / Class      | Returns                        | Auto-delete?     | Best for
NamedTemporaryFile    | File object with .name         | Yes (by default) | Passing to external programs by path
TemporaryFile         | File object (no name on disk)  | Yes              | Pure in-process buffering
SpooledTemporaryFile  | File object (starts in memory) | Yes              | Small data that might not need disk
TemporaryDirectory    | Path string                    | Yes              | Scratch directories for multiple files
mkstemp               | (fd, path) tuple               | Manual           | When you need the OS file descriptor
mkdtemp               | Path string                    | Manual           | Directory with manual lifetime control

The context manager versions (NamedTemporaryFile, TemporaryDirectory) are the right choice for almost all use cases because they guarantee cleanup. Use mkstemp/mkdtemp only when you need the low-level OS file descriptor or the temp item must outlive the creating scope.
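TemporaryFile from the table above is worth a quick look. It behaves like NamedTemporaryFile but offers no usable path (on Linux the file can even be created already unlinked), which is exactly what you want for pure in-process buffering. A minimal sketch:

```python
import tempfile

# TemporaryFile: an anonymous scratch file -- no usable path, pure buffering
with tempfile.TemporaryFile(mode='w+') as f:
    f.write("line 1\nline 2\n")
    f.seek(0)            # rewind before reading back
    print(f.read())      # prints the two lines we just wrote
```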

NamedTemporaryFile in Depth

The most useful class in the module is NamedTemporaryFile. Its key feature is that the file has a real path on the filesystem (f.name), which means you can pass the path to external programs and libraries that need to open files themselves rather than receiving file objects.

# tempfile_named.py
import tempfile
import os

# --- Basic usage: write binary data ---
with tempfile.NamedTemporaryFile(
    mode='wb',
    suffix='.bin',
    prefix='myapp_',
    dir='/tmp',        # force a specific directory
    delete=True
) as f:
    f.write(b'\x00\x01\x02\x03' * 100)
    print(f"Name: {f.name}")
    print(f"Size: {os.path.getsize(f.name)} bytes")

# --- On Windows: keep open for reading by another process ---
# On Windows, the file CANNOT be opened by another process while
# NamedTemporaryFile holds it. Use delete=False to work around this:
with tempfile.NamedTemporaryFile(
    mode='w', suffix='.txt', delete=False
) as f:
    f.write("Important intermediate data\n")
    temp_path = f.name

# Now the file is closed and another process could open it
print(f"Temp path: {temp_path}")
with open(temp_path) as f:
    print(f"Contents: {f.read().strip()}")

# Clean up manually since delete=False
os.unlink(temp_path)
print(f"Cleaned up: {not os.path.exists(temp_path)}")
Name: /tmp/myapp_tmp8k2xp19z.bin
Size: 400 bytes
Temp path: /tmp/tmpq7w3r5mn.txt
Contents: Important intermediate data
Cleaned up: True

The prefix and suffix parameters let you give the file a recognizable name pattern for debugging. The dir parameter overrides the default temp directory — useful when the external program needs the file on the same filesystem or partition. When using delete=False, you become responsible for calling os.unlink(path) yourself — always do this in a finally block so it runs even on exceptions.
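The delete=False-plus-finally pattern described above can be sketched as follows; the external consumer is simulated here with a plain open() call:

```python
import os
import tempfile

# delete=False: the file survives the with block, so WE own the cleanup
f = tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False)
path = f.name
try:
    f.write("handed off to an external consumer\n")
    f.close()                  # close so another process could open it

    # Simulate an external program reading the file by path
    with open(path) as reader:
        print(reader.read().strip())
finally:
    os.unlink(path)            # runs even if the work above raised
```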

TemporaryDirectory for Scratch Workspaces

When your script needs multiple related files — a batch of generated reports, intermediate data files, or test fixtures — create a TemporaryDirectory and write everything inside it. The whole directory tree is wiped on exit.

# tempfile_directory.py
import tempfile
import os
import json

students = [
    {'name': 'Alice', 'grade': 'A', 'score': 95},
    {'name': 'Bob',   'grade': 'B', 'score': 82},
    {'name': 'Carol', 'grade': 'A', 'score': 91},
]

with tempfile.TemporaryDirectory(prefix='grades_') as tmpdir:
    print(f"Working in: {tmpdir}")

    # Write one file per student
    for student in students:
        filename = os.path.join(tmpdir, f"{student['name'].lower()}.json")
        with open(filename, 'w') as f:
            json.dump(student, f)

    # List what we created
    files = os.listdir(tmpdir)
    print(f"Files created: {files}")

    # Read them back
    all_data = []
    for fname in files:
        with open(os.path.join(tmpdir, fname)) as f:
            all_data.append(json.load(f))

    top_students = [s for s in all_data if s['grade'] == 'A']
    print(f"A-grade students: {[s['name'] for s in top_students]}")

print(f"Dir exists after: {os.path.exists(tmpdir)}")
Working in: /tmp/grades_tmpx9k2m1pq
Files created: ['alice.json', 'bob.json', 'carol.json']
A-grade students: ['Alice', 'Carol']
Dir exists after: False

The entire grades_tmp... directory and all three JSON files inside it were deleted when we exited the with block. This is the correct pattern for any batch processing task that generates intermediate files — create everything inside a TemporaryDirectory, do your work, and let Python handle the cleanup.
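If you prefer pathlib over os.path.join, the same pattern reads nicely with Path objects. A small variant sketch (the file names here are just illustrative):

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory(prefix='grades_') as tmpdir:
    workdir = Path(tmpdir)

    # Path's / operator replaces os.path.join
    (workdir / 'alice.json').write_text(json.dumps({'name': 'Alice', 'score': 95}))
    (workdir / 'bob.json').write_text(json.dumps({'name': 'Bob', 'score': 82}))

    # glob finds every JSON file we just wrote
    names = sorted(p.stem for p in workdir.glob('*.json'))
    print(names)   # ['alice', 'bob']
```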

mkstemp and mkdtemp for Manual Control

mkstemp creates a temporary file and returns an OS-level file descriptor alongside the path. You get more control but you must manage cleanup yourself. Use it when you need to pass the raw file descriptor to a C library or OS call, or when the file must persist after the creating scope ends.

# tempfile_mkstemp.py
import tempfile
import os

# mkstemp returns (file_descriptor, absolute_path)
fd, path = tempfile.mkstemp(suffix='.csv', prefix='export_')

try:
    # Write via os.write (works with raw file descriptors)
    header = b"name,score,grade\n"
    os.write(fd, header)
    os.write(fd, b"Alice,95,A\n")
    os.write(fd, b"Bob,82,B\n")
    os.close(fd)  # MUST close the fd before re-opening

    # Now open normally for verification
    with open(path) as f:
        print(f.read())
finally:
    # Always clean up manually
    if os.path.exists(path):
        os.unlink(path)
        print("Temp file deleted.")

# mkdtemp for directories
tmpdir = tempfile.mkdtemp(prefix='build_')
print(f"Build dir: {tmpdir}")
# ... do work in tmpdir ...
import shutil
shutil.rmtree(tmpdir)  # Manual cleanup required
print(f"Cleaned up: {not os.path.exists(tmpdir)}")
name,score,grade
Alice,95,A
Bob,82,B

Temp file deleted.
Build dir: /tmp/build_tmpm3r9k2qa
Cleaned up: True

Two critical rules for mkstemp: always close the returned file descriptor with os.close(fd) before reopening the file with open(path), and always delete the file in a finally block. Forgetting either step is a common source of “too many open files” errors or disk filling up with uncleaned temp files in long-running services.
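If you want mkstemp's manual lifetime control but would rather use a normal file object than raw os.write calls, you can wrap the descriptor with os.fdopen. Closing the wrapper closes the underlying fd, which sidesteps the double-close pitfall; a sketch:

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.log', prefix='wrap_')
try:
    # os.fdopen adopts the descriptor; closing the file object closes the fd
    with os.fdopen(fd, 'w') as f:
        f.write("wrapped descriptor write\n")

    # Reopen normally to verify
    with open(path) as f:
        print(f.read().strip())
finally:
    os.unlink(path)   # cleanup is still our job with mkstemp
```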

SpooledTemporaryFile for Memory-First Buffering

SpooledTemporaryFile starts the file in memory and only spills to disk when the data exceeds a size threshold you set. This is ideal for processing data that is usually small but occasionally large — you avoid disk I/O for the common case while still handling large payloads safely.

# tempfile_spooled.py
import tempfile
import sys

def process_data(data: bytes) -> bytes:
    """Simulate processing that reads from a file-like object."""
    with tempfile.SpooledTemporaryFile(
        max_size=1024 * 10,  # 10 KB stays in memory; larger spills to disk
        mode='w+b'
    ) as f:
        f.write(data)
        f.seek(0)

        # Check if we spilled to disk. _rolled is a private attribute --
        # fine for a demo, but not a stable public API.
        spilled = hasattr(f, '_rolled') and f._rolled
        src = "on disk" if spilled else "in memory"
        print(f"  Data size: {len(data):,} bytes -- stored {src}")

        return f.read()

small = b"x" * 5000    # 5 KB  -- stays in memory
large = b"y" * 50000   # 50 KB -- spills to disk

r1 = process_data(small)
r2 = process_data(large)
print(f"Small result: {len(r1):,} bytes")
print(f"Large result: {len(r2):,} bytes")
  Data size: 5,000 bytes -- stored in memory
  Data size: 50,000 bytes -- stored on disk
Small result: 5,000 bytes
Large result: 50,000 bytes

The transition from memory to disk is handled automatically and transparently. Your code reads and writes the SpooledTemporaryFile exactly like any regular file object — the mode switch happens behind the scenes. The max_size threshold is tunable per use case: a web server handling file uploads might set it to 1 MB, while a batch ETL process might use 64 MB.
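You can also force the spill explicitly with rollover(), which is useful when you know a disk-backed handoff is coming. The sketch below peeks at _rolled, a private attribute, purely to observe the switch:

```python
import tempfile

with tempfile.SpooledTemporaryFile(max_size=1024, mode='w+b') as f:
    f.write(b"tiny payload")   # well under max_size: stays in memory
    print(f._rolled)           # False -- still in memory (private attr, demo only)

    f.rollover()               # force the data onto a real disk file
    print(f._rolled)           # True -- now backed by disk

    f.seek(0)
    print(f.read())            # data survives the rollover intact
```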

Real-Life Example: Safe CSV Transformation Pipeline

Here is a practical script that downloads CSV data, transforms it in a temporary directory, and produces a final output file — using temp files as safe intermediate storage throughout.

# tempfile_pipeline.py
import tempfile
import csv
import os
import json
from pathlib import Path

# Sample raw data (in practice this would come from a file upload or API)
RAW_CSV = """id,name,raw_score,notes
1,Alice,"  95 ","top student"
2,Bob," 82","needs follow-up"
3,Carol," 91 ","excellent work"
4,Dave,"  67","below average"
"""

def clean_row(row: dict) -> dict:
    """Clean a single CSV row: strip whitespace, convert types."""
    score = float(row['raw_score'].strip())
    grade = ('A' if score >= 90 else
             'B' if score >= 80 else
             'C' if score >= 70 else 'D')
    return {
        'id': int(row['id']),
        'name': row['name'].strip(),
        'score': score,
        'notes': row['notes'].strip(),
        'grade': grade,
    }

def run_pipeline(raw_data: str, output_path: str) -> int:
    """Transform raw CSV into clean JSON, using temp files as intermediates."""
    with tempfile.TemporaryDirectory(prefix='csv_pipeline_') as tmpdir:
        # Step 1: Write raw data to temp CSV
        raw_path = os.path.join(tmpdir, 'raw.csv')
        with open(raw_path, 'w', newline='') as f:
            f.write(raw_data)

        # Step 2: Clean and transform
        clean_path = os.path.join(tmpdir, 'clean.json')
        with open(raw_path) as infile, open(clean_path, 'w') as outfile:
            reader = csv.DictReader(infile)
            cleaned = [clean_row(row) for row in reader]
            json.dump(cleaned, outfile, indent=2)

        # Step 3: Summarize
        summary_path = os.path.join(tmpdir, 'summary.txt')
        with open(clean_path) as f:
            data = json.load(f)

        grade_counts = {}
        for student in data:
            g = student['grade']
            grade_counts[g] = grade_counts.get(g, 0) + 1

        with open(summary_path, 'w') as f:
            f.write("Grade distribution:\n")
            for grade, count in sorted(grade_counts.items()):
                f.write(f"  {grade}: {count} student(s)\n")

        # Step 4: Copy final output from temp dir to permanent location
        import shutil
        shutil.copy(clean_path, output_path)
        print(f"Pipeline complete. Output written to: {output_path}")

        # Temp directory auto-deleted here
    return len(data)

total = run_pipeline(RAW_CSV, '/tmp/students_final.json')
print(f"Processed {total} students.")

with open('/tmp/students_final.json') as f:
    print(json.dumps(json.load(f), indent=2))
Pipeline complete. Output written to: /tmp/students_final.json
Processed 4 students.
[
  {"id": 1, "name": "Alice", "score": 95.0, "notes": "top student", "grade": "A"},
  {"id": 2, "name": "Bob", "score": 82.0, "notes": "needs follow-up", "grade": "B"},
  {"id": 3, "name": "Carol", "score": 91.0, "notes": "excellent work", "grade": "A"},
  {"id": 4, "name": "Dave", "score": 67.0, "notes": "below average", "grade": "D"}
]

The entire multi-step transformation happens inside a TemporaryDirectory. Only the final, verified output is copied to a permanent location. If anything fails mid-pipeline, the temp directory and all its intermediate files are cleaned up automatically — no stale partials left on disk.

Frequently Asked Questions

Why does NamedTemporaryFile fail on Windows when I try to open it again?

On Windows, a file opened by one process cannot be opened by another while the first handle is open (file locking is stricter than on Unix). NamedTemporaryFile holds an open handle while the context is active, so opening f.name in a subprocess or via a separate open() call will fail with a permission error. The classic fix is delete=False: write and close the file, let the external process open it, then delete it manually with os.unlink(path) in a finally block. On Python 3.12 and later, NamedTemporaryFile(delete=True, delete_on_close=False) offers a cleaner option: the file can be closed and reopened mid-context but is still deleted automatically when the with block exits.

When should I use delete=False?

Use delete=False when the temp file must outlive the with block — for example, to pass to an external program that opens files by path, or to hand off to another function that will manage deletion. Always pair delete=False with explicit cleanup code in a try/finally block. Forgetting the cleanup is the number one source of disk accumulation issues in long-running Python services.

Can I change where temp files are created?

Yes, in three ways. Per-call: pass dir='/your/path' to any tempfile function. Per-process: set the environment variable TMPDIR (Unix), TEMP, or TMP (Windows). Globally in code: call tempfile.tempdir = '/your/path' before any temp file calls. Using a custom directory is important when you need the temp file on the same filesystem as the final destination (to allow atomic rename) or when the default temp directory is too small for your data.
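Here is a quick sketch of the per-call option; the custom location is itself a throwaway TemporaryDirectory so the example cleans up after itself:

```python
import os
import tempfile

print(f"Default temp dir: {tempfile.gettempdir()}")

# Use a scratch directory as the custom location for a temp file
with tempfile.TemporaryDirectory() as custom_dir:
    with tempfile.NamedTemporaryFile(dir=custom_dir, suffix='.dat') as f:
        # The file really lives under the directory we chose
        print(os.path.dirname(f.name) == custom_dir)   # True
```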

Are tempfile names actually unpredictable?

For practical purposes, yes. tempfile generates random name suffixes that other processes cannot feasibly guess, and, more importantly, it creates files with the exclusive-creation flag (O_CREAT | O_EXCL) and permissions readable only by your user, so creation fails outright if something already exists at that name instead of following it. This prevents symlink attacks and race conditions. The older approach of constructing /tmp/myapp_12345.tmp from a PID was vulnerable: an attacker could predict the name and create a symlink before your script did. Always use the tempfile module rather than rolling your own temp file names.

Can I use tempfile without the with statement?

Yes, but you lose automatic cleanup. f = tempfile.NamedTemporaryFile(delete=False) gives you the file object, and you must call f.close() and os.unlink(f.name) yourself. A safer pattern for non-context use is try/finally: create the file before the try block, do your work inside it, and put all cleanup in the finally clause so it runs whether the code succeeds or raises an exception.

Conclusion

The tempfile module gives you safe, cross-platform temporary storage with automatic cleanup. You learned NamedTemporaryFile for files with real paths, TemporaryDirectory for scratch workspaces, SpooledTemporaryFile for memory-first buffering, and mkstemp/mkdtemp for manual lifecycle control. The context manager forms are the right default in almost every situation — they eliminate the “forgot to clean up” bug class entirely.

To deepen your understanding, try adding error injection to the pipeline example — raise an exception mid-processing and confirm that the temp directory is still cleaned up. Then extend the pipeline to support multiple output formats by writing both JSON and CSV to the temp directory before choosing which to copy as the final output.

For the full API reference, see the official documentation at https://docs.python.org/3/library/tempfile.html.