
Python’s Global Interpreter Lock — the GIL — is the reason your multi-threaded Python code doesn’t actually run in parallel. Since CPython’s early days, the GIL has ensured that only one thread executes Python bytecode at a time, making the interpreter simpler and C extension development safer, but effectively capping CPU-bound throughput at one core. Developers who need real parallelism have always had to reach for multiprocessing (high overhead), concurrent.futures.ProcessPoolExecutor (still processes under the hood), or abandon CPython entirely for a GIL-free implementation like Jython or IronPython. It’s been a persistent, well-known limitation that every Python developer learns to work around.

Python 3.13 changes the equation. It ships an experimental free-threaded build — a separate CPython variant compiled with --disable-gil — that removes the GIL and allows multiple threads to genuinely execute Python bytecode in parallel on multiple CPU cores. This is PEP 703’s “Making the Global Interpreter Lock Optional in CPython,” and it’s the most significant change to Python’s threading model in the language’s history. The free-threaded build is marked experimental in 3.13 and 3.14, meaning some C extensions won’t work yet and there’s a modest single-threaded performance overhead, but it’s already practical to evaluate for CPU-bound workloads that can tolerate those constraints.

In this article we’ll cover what the GIL is and why removing it matters, how to install and use the free-threaded Python build, how to benchmark the difference, what breaks or changes without the GIL, and when free-threading actually helps (and when it doesn’t). By the end you’ll know exactly when to reach for the free-threaded build and how to write thread-safe code that takes advantage of it.

Free-Threaded Python: Quick Example

Here’s a benchmark that shows the core difference. Run this first with standard Python 3.13, then with the free-threaded build (python3.13t):

# threading_benchmark.py
import threading
import time
import sys

def cpu_work(n: int) -> int:
    """CPU-bound work: count down from n."""
    total = 0
    for i in range(n):
        total += i * i
    return total

WORKLOAD = 5_000_000
THREADS = 4

# Sequential baseline
start = time.perf_counter()
for _ in range(THREADS):
    cpu_work(WORKLOAD)
seq_time = time.perf_counter() - start

# Parallel threads
threads = [threading.Thread(target=cpu_work, args=(WORKLOAD,)) for _ in range(THREADS)]
start = time.perf_counter()
for t in threads: t.start()
for t in threads: t.join()
thread_time = time.perf_counter() - start

gil_status = "ENABLED" if sys._is_gil_enabled() else "DISABLED"
print(f"Python {sys.version.split()[0]} | GIL: {gil_status}")
print(f"Sequential ({THREADS}x): {seq_time:.2f}s")
print(f"Parallel ({THREADS} threads): {thread_time:.2f}s")
print(f"Speedup: {seq_time / thread_time:.2f}x")

Output (standard Python 3.13, GIL enabled):

Python 3.13.0 | GIL: ENABLED
Sequential (4x): 3.84s
Parallel (4 threads): 3.91s
Speedup: 0.98x

Output (free-threaded Python 3.13t, GIL disabled):

Python 3.13.0 | GIL: DISABLED
Sequential (4x): 4.10s
Parallel (4 threads): 1.12s
Speedup: 3.66x

With the GIL enabled, 4 threads doing CPU work takes essentially the same time as doing it sequentially — no benefit at all. With the GIL disabled, you get near-linear scaling across cores. That 3.66x speedup on 4 cores is exactly what we’d expect from true parallelism. The sequential time is slightly higher in the free-threaded build due to the overhead of finer-grained locking, but the parallel case more than compensates.

What Is the GIL and Why Does Removing It Matter?

The Global Interpreter Lock is a mutex — a mutual exclusion lock — that protects CPython’s internal state. CPython uses reference counting for memory management: every Python object carries a count of how many references point to it, and when the count hits zero, the object is freed. Without protection, two threads incrementing and decrementing reference counts simultaneously would corrupt memory. The GIL is the simplest possible solution: only one thread runs Python at a time, so reference counts are always modified by exactly one thread.

The GIL doesn’t block all concurrency. I/O-bound threads (network requests, file reads, database queries) release the GIL while waiting, so multiple threads can make progress concurrently on I/O — that’s why threading is genuinely useful for web scrapers and API clients. But a CPU-bound thread holds the GIL for as long as it runs, giving it up only when the interpreter forces a switch at the end of each switch interval (5 ms by default, tunable via sys.setswitchinterval), so CPU-bound threads take turns rather than making real parallel progress.
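That difference is easy to measure even on a standard, GIL-enabled build. A minimal sketch using time.sleep as a stand-in for blocking I/O, since it releases the GIL the same way a blocking socket read does:

```python
import threading
import time

def io_work(seconds: float) -> None:
    # time.sleep releases the GIL while waiting, like real blocking I/O
    time.sleep(seconds)

N, DELAY = 4, 0.2

# Sequential: waits add up
start = time.perf_counter()
for _ in range(N):
    io_work(DELAY)
sequential = time.perf_counter() - start

# Threaded: waits overlap, even with the GIL enabled
threads = [threading.Thread(target=io_work, args=(DELAY,)) for _ in range(N)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
parallel = time.perf_counter() - start

print(f"Sequential: {sequential:.2f}s, threaded: {parallel:.2f}s")
```

On any build, the threaded version finishes in roughly one DELAY instead of four — which is why I/O workloads never needed the GIL removed in the first place.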

| Workload type                 | Standard Python (GIL on) | Free-threaded (GIL off) | Better choice   |
|-------------------------------|--------------------------|-------------------------|-----------------|
| CPU-bound (math, parsing, ML) | No parallel speedup      | Near-linear scaling     | Free-threaded   |
| I/O-bound (HTTP, files, DB)   | Good concurrency         | Same or better          | Either works    |
| Mixed CPU + I/O               | Partial benefit          | Full benefit            | Free-threaded   |
| Single-threaded               | Baseline speed           | ~5-10% slower           | Standard Python |

Free-threading replaces the single GIL with many smaller, finer-grained locks protecting individual objects and data structures. This is safer for reference counting but adds overhead per-operation. The result is that single-threaded code runs slightly slower in the free-threaded build, while multi-threaded CPU-bound code can be dramatically faster.

[Image: threads queued up waiting for a single lock to release. Caption: “The GIL: nature’s most passive-aggressive traffic cop”]

Installing the Free-Threaded Python Build

The free-threaded build is a separate Python binary. On most systems it installs alongside your regular Python and is accessible as python3.13t (the t suffix stands for “free-threaded”). Here’s how to get it on different platforms:

Linux (pyenv)

On Linux, pyenv is the recommended way to manage multiple Python versions side-by-side. It lets you install the free-threaded build without touching your system Python. Note the t suffix in 3.13t — that’s what tells pyenv to install the experimental no-GIL variant specifically.

# install_freethreaded.sh
# Install pyenv if you don't have it already
curl https://pyenv.run | bash

# Install the free-threaded variant (suffix 't')
pyenv install 3.13t

# Activate it globally or per-project
pyenv global 3.13t
python --version     # Python 3.13.x (free-threaded build)
python -c "import sys; print('GIL:', sys._is_gil_enabled())"

Output:

Python 3.13.2
GIL: False

macOS (python.org installer)

On macOS, the official python.org installer is the simplest path. The free-threaded variant isn’t installed by default: click Customize during installation and tick the “Free-threaded Python (experimental)” checkbox. This installs both python3.13 (standard) and python3.13t (no-GIL) as separate binaries, so you can use either one on a per-project basis.

# install_freethreaded_mac.sh
# Download the macOS installer from python.org/downloads
# During install, click Customize and check "Free-threaded Python (experimental)"
# This gives you both python3.13 and python3.13t

# Verify
python3.13t --version
python3.13t -c "import sys; print('GIL:', sys._is_gil_enabled())"

Windows

On Windows, download the installer from python.org, choose “Customize installation”, and under Advanced Options check “Download free-threaded binaries (experimental)” along with “Add python.exe to PATH”. This registers both python3.13 and python3.13t with the py.exe launcher so you can select which variant to run per terminal session or project.

# Download from https://www.python.org/downloads/windows/
# Choose "Customize installation", then under Advanced Options
# check "Download free-threaded binaries (experimental)"
# This adds py.exe launcher entries for both 3.13 and 3.13t

# In PowerShell:
py -3.13t --version
py -3.13t -c "import sys; print('GIL:', sys._is_gil_enabled())"

You can verify at runtime whether the GIL is enabled using sys._is_gil_enabled(). This is useful for diagnostics and for writing code that adapts its strategy to the build it’s running on. The leading underscore signals that this is a private, unstable API that may change. Note that the free-threaded binary can also run with the GIL re-enabled — set the PYTHON_GIL=1 environment variable or pass -X gil=1 — which is handy for isolating whether a bug is GIL-related.
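For example, here is a small helper that adapts gracefully — the gil_enabled wrapper is illustrative, not a standard API, and it assumes that interpreters older than 3.13 (where sys._is_gil_enabled does not exist) always have the GIL:

```python
import sys

def gil_enabled() -> bool:
    """True when the GIL is active; assumes pre-3.13 interpreters always have it."""
    checker = getattr(sys, "_is_gil_enabled", None)  # only exists on 3.13+
    return True if checker is None else checker()

# Pick a strategy: CPU-bound threads only scale when the GIL is off
cpu_workers = 4 if not gil_enabled() else 1
print(f"GIL enabled: {gil_enabled()}, CPU worker threads: {cpu_workers}")
```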

[Image: programmer celebrating the removal of the GIL with a shattered padlock. Caption: “No GIL. No rules. Pure chaos — the productive kind.”]

Thread Safety Without the GIL

The GIL was a safety blanket. With it gone, code that was accidentally thread-safe because of the GIL may now have race conditions. Python’s built-in types like list, dict, and set remain thread-safe for individual operations (append, update, add) because they have their own internal locks in the free-threaded build. But compound operations — read-then-write sequences — are not atomic and need explicit locking:

# thread_safety.py
import threading
import sys

# UNSAFE: read-modify-write without a lock
counter_unsafe = 0

def increment_unsafe(n: int):
    global counter_unsafe
    for _ in range(n):
        counter_unsafe += 1  # NOT atomic: read, add, write — race condition!

# SAFE: protected with a lock
counter_safe = 0
lock = threading.Lock()

def increment_safe(n: int):
    global counter_safe
    for _ in range(n):
        with lock:
            counter_safe += 1

ITERATIONS = 100_000
THREADS = 4

# Run unsafe version
counter_unsafe = 0
threads = [threading.Thread(target=increment_unsafe, args=(ITERATIONS,)) for _ in range(THREADS)]
for t in threads: t.start()
for t in threads: t.join()

# Run safe version
counter_safe = 0
threads = [threading.Thread(target=increment_safe, args=(ITERATIONS,)) for _ in range(THREADS)]
for t in threads: t.start()
for t in threads: t.join()

expected = THREADS * ITERATIONS
print(f"GIL enabled: {sys._is_gil_enabled()}")
print(f"Expected: {expected:,}")
print(f"Unsafe counter: {counter_unsafe:,} (correct: {counter_unsafe == expected})")
print(f"Safe counter:   {counter_safe:,}   (correct: {counter_safe == expected})")

Output (free-threaded build):

GIL enabled: False
Expected: 400,000
Unsafe counter: 287,453 (correct: False)
Safe counter:   400,000   (correct: True)

The unsafe counter loses updates because counter += 1 compiles to several bytecode instructions (a load, an add, and a store), and another thread can run between any two of them. With the GIL, thread switches only happened at bytecode boundaries when the switch interval expired, which accidentally protected this pattern most of the time. Without the GIL, real data races happen. Use threading.Lock(), threading.RLock(), or queue.Queue for any shared mutable state.
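For producer/consumer patterns, queue.Queue is often the simplest of those options because it does all its locking internally. A minimal sketch with a sentinel-based shutdown:

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker() -> None:
    while True:
        item = tasks.get()
        if item is None:          # sentinel: time to shut down
            break
        results.put(item * item)  # Queue handles its own locking

NUM_WORKERS = 4
threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for n in range(100):
    tasks.put(n)
for _ in threads:
    tasks.put(None)               # one sentinel per worker
for t in threads:
    t.join()

squares = []
while not results.empty():        # safe here: all workers have exited
    squares.append(results.get())
print(sum(squares))               # sum of 0²..99² = 328350
```

No explicit Lock appears anywhere, yet every shared access is synchronized — which is why queues are usually the first tool to reach for when porting code to the free-threaded build.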

When Free-Threading Actually Helps

The free-threaded build shines in specific workloads. Understanding which workloads benefit helps you decide whether the switch is worth it:

# workload_comparison.py
import threading
import time
import sys
import math

def cpu_bound_task(n: int) -> float:
    """Pure CPU: computing square roots and logs."""
    result = 0.0
    for i in range(1, n):
        result += math.sqrt(i) + math.log(i)
    return result

def run_parallel(task, n: int, num_threads: int) -> float:
    threads = [threading.Thread(target=task, args=(n,)) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads: t.start()
    for t in threads: t.join()
    return time.perf_counter() - start

def run_sequential(task, n: int, num_threads: int) -> float:
    start = time.perf_counter()
    for _ in range(num_threads):
        task(n)
    return time.perf_counter() - start

N = 500_000
THREADS = 4

seq = run_sequential(cpu_bound_task, N, THREADS)
par = run_parallel(cpu_bound_task, N, THREADS)

print(f"GIL: {'ON' if sys._is_gil_enabled() else 'OFF'}")
print(f"Sequential: {seq:.3f}s")
print(f"Parallel:   {par:.3f}s")
print(f"Speedup:    {seq/par:.2f}x")

Output (GIL ON):

GIL: ON
Sequential: 2.41s
Parallel:   2.38s
Speedup:    1.01x

Output (GIL OFF):

GIL: OFF
Sequential: 2.58s
Parallel:   0.70s
Speedup:    3.69x

For CPU-bound work, the free-threaded build delivers near-linear scaling. The sequential time is about 7% slower (the cost of finer-grained locking), but the parallel case is 3.7x faster on 4 cores — a net win any time you’re running multiple CPU-bound threads. Single-threaded scripts will run marginally slower, so for command-line tools and scripts that don’t use threading, stick with the standard build.

[Image: multiple workers processing tasks in parallel on conveyor belts. Caption: “When all your threads finally get to work at the same time”]

C Extensions and Compatibility

The biggest practical limitation of the free-threaded build in 3.13 is C extension compatibility. Extensions compiled against the standard CPython ABI need to be recompiled for the free-threaded ABI, and many popular packages haven’t done this yet. Check for cp313t wheels on PyPI, or look up package status at py-free-threading.github.io. Major scientific Python packages (NumPy, pandas, SciPy, Pillow) shipped free-threaded wheels for 3.13, and the ecosystem is catching up quickly for 3.14.
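To check which kind of build your own interpreter is — independent of whether the GIL is currently enabled at runtime — you can read the build-time configuration flag:

```python
import sysconfig

# 1 on free-threaded builds, 0 on standard 3.13+, None on older versions
flag = sysconfig.get_config_var("Py_GIL_DISABLED")
print("free-threaded build:", bool(flag))
```

This is the flag extension authors key off when deciding whether to compile for the cp313t ABI, so it is also a quick way to confirm your environment before debugging a wheel-compatibility problem.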

Real-Life Example: Parallel Image Processing

Here’s a realistic use case where free-threading shines: processing a batch of images in parallel. With the standard build you’d need multiprocessing for CPU parallelism. With the free-threaded build, threads are enough:

# parallel_image_processing.py
import threading
import time
import sys
import math
from dataclasses import dataclass
from typing import List

@dataclass
class ImageResult:
    filename: str
    mean_brightness: float
    sharpness_score: float

def simulate_image_processing(filename: str, width: int, height: int) -> ImageResult:
    """Simulate CPU-bound image analysis: brightness and sharpness."""
    pixels = width * height

    # Simulate brightness calculation
    total_brightness = 0.0
    for i in range(pixels):
        r = (i * 73) % 256
        g = (i * 137) % 256
        b = (i * 199) % 256
        total_brightness += 0.299 * r + 0.587 * g + 0.114 * b
    mean_brightness = total_brightness / pixels

    # Simulate sharpness score
    sharpness = sum(
        abs(math.sin(i * 0.1) * math.cos(i * 0.07))
        for i in range(1, min(pixels, 10000))
    ) / 10000

    return ImageResult(filename=filename, mean_brightness=mean_brightness,
                       sharpness_score=sharpness)

def process_batch_threaded(images: List[tuple]) -> List[ImageResult]:
    results = [None] * len(images)

    def worker(idx: int, fname: str, w: int, h: int):
        results[idx] = simulate_image_processing(fname, w, h)

    threads = [threading.Thread(target=worker, args=(i, f, w, h))
               for i, (f, w, h) in enumerate(images)]
    for t in threads: t.start()
    for t in threads: t.join()
    return results

image_batch = [(f"photo_{i:03d}.jpg", 800, 600) for i in range(8)]

start = time.perf_counter()
seq_results = [simulate_image_processing(f, w, h) for f, w, h in image_batch]
seq_time = time.perf_counter() - start

start = time.perf_counter()
thread_results = process_batch_threaded(image_batch)
thread_time = time.perf_counter() - start

print(f"GIL: {'ON' if sys._is_gil_enabled() else 'OFF'}")
print(f"Sequential: {seq_time:.2f}s")
print(f"Threaded:   {thread_time:.2f}s")
print(f"Speedup:    {seq_time/thread_time:.2f}x")
print(f"Sample — {thread_results[0].filename}: brightness={thread_results[0].mean_brightness:.1f}")

Output (free-threaded, 8-core machine):

GIL: OFF
Sequential: 6.84s
Threaded:   0.98s
Speedup:    6.98x
Sample — photo_000.jpg: brightness=127.5

Near-linear scaling across all 8 cores using nothing but threading.Thread — no process spawning, no shared memory setup, no pickling. This is exactly the use case that sent developers to multiprocessing for years.

Frequently Asked Questions

Is free-threaded Python production-ready in 3.13?

The free-threaded build is marked “experimental” in Python 3.13 and 3.14, meaning the core interpreter is functional but C extension support is incomplete. For CPU-bound workloads using well-supported packages (NumPy, SciPy, pandas), it’s practical to evaluate in production. For workloads that depend on packages without free-threaded wheels, wait until those packages ship cp313t support. Check the status at py-free-threading.github.io.

Does asyncio benefit from free-threading?

Not significantly. asyncio is single-threaded by design — it uses cooperative multitasking on one event loop thread, which never competes with other threads for the GIL. If you’re running CPU-bound tasks inside asyncio via loop.run_in_executor() with a ThreadPoolExecutor, those threads will benefit from free-threading. Pure async I/O code sees minimal change.
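Here is a minimal sketch of that executor pattern; cpu_task and the worker count are illustrative:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def cpu_task(n: int) -> int:
    # Pure CPU work; blocks its thread, so it must not run on the event loop
    return sum(i * i for i in range(n))

async def main() -> list:
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # The event loop stays responsive; on the free-threaded build
        # these four tasks can also use four separate cores
        return await asyncio.gather(
            *(loop.run_in_executor(pool, cpu_task, 100_000) for _ in range(4))
        )

results = asyncio.run(main())
print(len(results), "tasks done")
```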

Should I replace multiprocessing with free-threaded threads?

For CPU-bound parallelism on a single machine, yes — threads in the free-threaded build are simpler, lower-overhead, and share memory naturally. You don’t need to pickle data, spawn new interpreter processes, or use multiprocessing.Queue for communication. For work that needs to scale across multiple machines, or that needs to isolate failures, multiprocessing is still the right choice.
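Because ThreadPoolExecutor and ProcessPoolExecutor share the same Executor interface, the migration is often a one-line import change. A minimal sketch, with crunch standing in for your CPU-bound function:

```python
from concurrent.futures import ThreadPoolExecutor  # was: ProcessPoolExecutor

def crunch(n: int) -> int:
    return sum(i * i for i in range(n))

# Same interface as ProcessPoolExecutor, but no process spawning and
# no pickling: arguments and results stay in shared memory
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(crunch, [10_000, 20_000, 30_000, 40_000]))
print(results)
```

On the standard build this swap would serialize the CPU work behind the GIL; on the free-threaded build it keeps the parallelism while shedding multiprocessing’s overhead.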

Do I need to change my existing threading code?

Code that already uses proper synchronization (locks, queues, events) should work correctly. Code that relied on the GIL’s accidental thread safety — particularly compound read-modify-write operations on shared variables — may develop race conditions. Run your test suite on the free-threaded build and look for intermittent failures in tests that use global state or counters.

How much slower is single-threaded code in the free-threaded build?

Benchmarks show roughly 5–10% slower single-threaded performance in Python 3.13t compared to 3.13. This is due to finer-grained per-object locking replacing the coarse GIL. The CPython team is actively working to reduce this overhead in 3.14 and beyond — the goal is parity with the standard build by Python 3.15 or 3.16. For pure single-threaded scripts, use the standard build.

Conclusion

Python 3.13’s free-threaded mode is the most significant change to Python’s threading model since the GIL was introduced decades ago. By removing the single global interpreter lock and replacing it with finer-grained per-object locking, CPython finally enables true parallel execution of Python bytecode across multiple CPU cores using the familiar threading module.

We covered what the GIL is and why it limited parallelism, how to install the free-threaded python3.13t build, the thread safety responsibilities that come without the GIL, benchmarks showing near-linear scaling for CPU-bound workloads, and a realistic image processing example showing threads replacing multiprocessing. The ecosystem of free-threaded C extension wheels is growing rapidly.

For deeper reading, see PEP 703 and the Python 3.13 What’s New page on free-threaded CPython.