You have a Python script that makes 50 API calls or processes 100 image files, and it runs painfully slowly because it does each task one at a time. Every developer hits this wall. The fix is parallelism, but Python’s threading and multiprocessing modules can be verbose and error-prone. The concurrent.futures module is the modern answer — a clean, high-level interface that makes parallel execution almost as simple as a regular function call.
The concurrent.futures module ships with Python 3.2+ and requires zero installation. It gives you two executor classes: ThreadPoolExecutor for I/O-bound tasks (network requests, file operations) and ProcessPoolExecutor for CPU-bound tasks (image processing, number crunching). Both share the same API, so switching between them is usually a one-line change.
In this guide, we’ll cover how both executors work, when to choose threads vs processes, how to submit tasks and collect results with map() and submit(), how to handle errors gracefully, and how to process results as they complete with as_completed(). By the end, you’ll be able to turn any slow sequential loop into a fast parallel pipeline.
concurrent.futures: Quick Example
Here is a minimal example that downloads 5 URLs in parallel using a thread pool, replacing a sequential loop that would take several times longer:
```python
# quick_concurrent.py
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URLS = [
    "https://httpbin.org/delay/1",
    "https://httpbin.org/get",
    "https://httpbin.org/ip",
    "https://httpbin.org/uuid",
    "https://httpbin.org/user-agent",
]

def fetch(url):
    with urllib.request.urlopen(url, timeout=10) as response:
        return url, response.status

with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(fetch, URLS))

for url, status in results:
    print(f"{status} -- {url}")
```
Output:
200 -- https://httpbin.org/delay/1
200 -- https://httpbin.org/get
200 -- https://httpbin.org/ip
200 -- https://httpbin.org/uuid
200 -- https://httpbin.org/user-agent
The with ThreadPoolExecutor(max_workers=5) as executor: block creates a pool of 5 threads and shuts them down cleanly when the block exits. executor.map(fetch, URLS) dispatches all 5 calls in parallel and returns results in the same order as the input. Sequentially, the run would take the sum of all five response times; in parallel it takes roughly as long as the slowest request, about 1 second here, dominated by the /delay/1 endpoint.
What Is concurrent.futures and When Should You Use It?
The concurrent.futures module provides a unified interface for running callables asynchronously. Under the hood, it manages worker threads or processes for you — no manual threading.Thread creation, no Queue wiring, no join() calls. You describe what to run, and the executor handles the rest.
The key question is which executor to use. Python's Global Interpreter Lock (GIL) means only one thread can execute Python bytecode at a time, so threads give no speedup for pure computation. However, the GIL is released during I/O operations, so threads do speed up I/O-bound work dramatically. Processes each have their own interpreter and GIL and run on separate CPU cores, making them the right choice for CPU-bound work.
| Task Type | Examples | Best Executor | Why |
|---|---|---|---|
| I/O-bound | HTTP requests, file reads, DB queries | ThreadPoolExecutor | GIL released during I/O; threads are lightweight |
| CPU-bound | Image processing, parsing, math | ProcessPoolExecutor | True parallelism across CPU cores; bypasses GIL |
| Mixed | Download + process | Both in pipeline | Thread pool to download, process pool to compute |
If you are unsure, start with ThreadPoolExecutor. It is simpler (no pickling overhead) and works well for most real-world tasks that involve any I/O at all.
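Because both executors share one API, trying the other is a one-line change. Here is a minimal sketch of that swap; the file name and the `run` helper are illustrative, not from the original article:

```python
# executor_swap.py -- hypothetical sketch: the two executor classes
# share the same interface, so switching is a one-line change.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def square(n):
    return n * n

def run(executor_cls, data):
    # Works identically with either executor class.
    with executor_cls(max_workers=4) as executor:
        return list(executor.map(square, data))

if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(run(ThreadPoolExecutor, data))   # [1, 4, 9, 16]
    # The one-line change for CPU-bound work:
    print(run(ProcessPoolExecutor, data))  # [1, 4, 9, 16]
```

Note that `square` must be a module-level function for the process-pool variant to work, for the pickling reasons covered later in this guide.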
ThreadPoolExecutor: Running I/O Tasks in Parallel
The ThreadPoolExecutor is the workhorse for network-heavy Python code. Create one with max_workers to control how many threads run simultaneously. A good starting number for HTTP requests is 10-20; going higher risks hitting server rate limits or exhausting local ports.
Using executor.map() for Uniform Tasks
executor.map(fn, iterable) is the easiest pattern. It mirrors Python’s built-in map() but runs the function in parallel. Results are returned in the same order as the input, even if some tasks finish earlier.
```python
# thread_map.py
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch_length(url):
    """Return the byte length of a URL's response body."""
    try:
        with urllib.request.urlopen(url, timeout=10) as r:
            return url, len(r.read())
    except Exception as e:
        return url, f"ERROR: {e}"

urls = [
    "https://httpbin.org/get",
    "https://httpbin.org/headers",
    "https://httpbin.org/ip",
    "https://httpbin.org/uuid",
    "https://httpbin.org/anything",
]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(fetch_length, urls))
elapsed = time.perf_counter() - start

for url, length in results:
    print(f"{length:>8} {url}")
print(f"\nCompleted in {elapsed:.2f}s")
```
Output:
312 https://httpbin.org/get
183 https://httpbin.org/headers
45 https://httpbin.org/ip
53 https://httpbin.org/uuid
401 https://httpbin.org/anything
Completed in 0.61s
Running these 5 requests sequentially would take 2-4 seconds depending on network latency. In parallel, they all run at once and finish in the time it takes the slowest one to respond. The try/except inside fetch_length is important — if any URL fails and raises an exception inside executor.map(), the exception re-raises when you iterate the results.
Using executor.submit() for Flexible Futures
executor.submit(fn, *args) gives you more control. It returns a Future object immediately — a handle to a computation that may not have finished yet. You can collect futures and inspect them later, which is useful when tasks have different arguments or you want to process results as they arrive.
```python
# thread_submit.py
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed

def check_url(url, timeout=5):
    """Return (url, status_code) or (url, error_message)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as r:
            return url, r.status
    except urllib.error.HTTPError as e:
        # urlopen raises HTTPError for 4xx/5xx; the status code is still available
        return url, e.code
    except Exception as e:
        return url, f"FAILED: {type(e).__name__}"

urls = [
    "https://httpbin.org/status/200",
    "https://httpbin.org/status/404",
    "https://httpbin.org/status/500",
    "https://httpbin.org/delay/0",
    "https://httpbin.org/get",
]

with ThreadPoolExecutor(max_workers=5) as executor:
    # Submit all tasks and keep a dict mapping Future -> url
    future_to_url = {executor.submit(check_url, url): url for url in urls}

    # Process results as each Future completes (not in submission order)
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        try:
            _, status = future.result()
            print(f"[{status}] {url}")
        except Exception as exc:
            print(f"[EXCEPTION] {url}: {exc}")
```
Output (order varies by completion time):
[200] https://httpbin.org/get
[200] https://httpbin.org/delay/0
[404] https://httpbin.org/status/404
[500] https://httpbin.org/status/500
[200] https://httpbin.org/status/200
as_completed(future_to_url) yields futures in the order they finish, not the order they were submitted. This is ideal for displaying progress or handling results the moment they are ready. The future.result() call either returns the return value of your function or re-raises any exception that occurred inside the worker.
ProcessPoolExecutor: True Parallel CPU Work
For CPU-bound tasks, threads provide no speedup because the GIL prevents true parallel execution. ProcessPoolExecutor spawns separate Python interpreter processes, each with its own GIL and memory space, enabling genuine multi-core parallelism.
```python
# process_pool.py
import math
import time
from concurrent.futures import ProcessPoolExecutor

def is_prime(n):
    """CPU-intensive primality test."""
    if n < 2:
        return n, False
    if n == 2:
        return n, True
    if n % 2 == 0:
        return n, False
    for i in range(3, math.isqrt(n) + 1, 2):
        if n % i == 0:
            return n, False
    return n, True

# Large numbers that require real computation to check
numbers = [
    999_999_937,
    999_999_929,
    999_999_893,
    999_999_883,
    999_999_877,
    999_999_613,
    999_999_541,
    999_999_527,
]

# The __main__ guard is required wherever the spawn start method is used
# (Windows and macOS): without it, each subprocess re-imports the module
# and Python raises a RuntimeError.
if __name__ == "__main__":
    start = time.perf_counter()
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(is_prime, numbers))
    elapsed = time.perf_counter() - start

    for n, prime in results:
        status = "PRIME" if prime else "composite"
        print(f"{n:>15,} {status}")
    print(f"\nChecked {len(numbers)} numbers in {elapsed:.2f}s")
```
Output:
999,999,937 PRIME
999,999,929 PRIME
999,999,893 PRIME
999,999,883 PRIME
999,999,877 PRIME
999,999,613 PRIME
999,999,541 PRIME
999,999,527 PRIME
Checked 8 numbers in 0.38s
Without a process pool, checking 8 large primes sequentially might take 1-2 seconds on a single core. With ProcessPoolExecutor, all 8 run on separate cores simultaneously. Note that code that creates processes must sit inside an if __name__ == '__main__': guard wherever the spawn start method is used, which is the default on Windows and, since Python 3.8, on macOS. Without the guard, each subprocess re-imports the module, tries to spawn its own children, and Python aborts with a RuntimeError. On Linux, where fork is the default, the guard is not strictly required but is still good practice.
The Pickling Constraint
Everything passed to a ProcessPoolExecutor must be picklable — Python’s serialization format used to send data between processes. This means functions defined at the module level (not inside other functions or as lambdas), and arguments that are built-in types, dataclasses, or picklable objects. This is the main gotcha that catches developers switching from ThreadPoolExecutor.
```python
# pickling_gotcha.py
from concurrent.futures import ProcessPoolExecutor

# This works fine -- module-level function
def double(x):
    return x * 2

# This will FAIL -- lambda is not picklable
transform = lambda x: x * 3

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        # OK:
        results = list(executor.map(double, [1, 2, 3, 4, 5]))
        print("double results:", results)

        # This raises PicklingError:
        # results = list(executor.map(transform, [1, 2, 3]))  # DO NOT DO THIS
```
Output:
double results: [2, 4, 6, 8, 10]
Timeouts and Cancellation
Production code must handle slow or hanging tasks. Both executors support deadlines: future.result(timeout=N) raises TimeoutError if that future has not finished within N seconds, and as_completed(futures, timeout=N) raises TimeoutError if any future is still pending N seconds after the call. Neither cancels the task itself, which keeps running in the background, but your main thread can move on.
```python
# timeout_example.py
import urllib.request
from concurrent.futures import ThreadPoolExecutor, TimeoutError, as_completed

def slow_fetch(url, delay_seconds=3):
    """Fetch a URL that deliberately delays the response."""
    full_url = f"https://httpbin.org/delay/{delay_seconds}"
    try:
        with urllib.request.urlopen(full_url, timeout=10) as r:
            return url, r.status
    except Exception as e:
        return url, f"ERROR: {e}"

tasks = [
    ("fast", 0),
    ("medium", 2),
    ("slow", 5),
]

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {
        executor.submit(slow_fetch, name, delay): name
        for name, delay in tasks
    }
    try:
        # 3-second deadline for the whole batch
        for future in as_completed(futures, timeout=3):
            name = futures[future]
            _, status = future.result()
            print(f"[OK] {name}: {status}")
    except TimeoutError:
        for future, name in futures.items():
            if not future.done():
                print(f"[TIMEOUT] {name}: took too long")
```
Output:
[OK] fast: 200
[OK] medium: 200
[TIMEOUT] slow: took too long
The "slow" task requested a 5-second delay, but as_completed(futures, timeout=3) gives up after 3 seconds, raises TimeoutError, and we report every future that has not finished. Note that calling future.result(timeout=N) on a future yielded by as_completed can never time out, because as_completed only yields futures that are already done; put the deadline on as_completed itself, as here, or call result(timeout=N) on futures directly. The underlying thread is still running after the timeout, and the with block will still wait for it on exit. It is your responsibility to design workers that can be abandoned safely. For true cancellation, consider using asyncio with task cancellation support instead.
Real-Life Example: Parallel Website Health Checker
Let’s build a practical tool that checks a list of URLs in parallel, reports status codes and response times, and flags any that fail or respond too slowly.
```python
# url_health_checker.py
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass

@dataclass
class CheckResult:
    url: str
    status: int
    elapsed_ms: float
    error: str = ""

def check_url(url):
    """Check a URL and return a CheckResult."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=8) as response:
            elapsed = (time.perf_counter() - start) * 1000
            return CheckResult(url=url, status=response.status, elapsed_ms=round(elapsed, 1))
    except urllib.error.HTTPError as e:
        elapsed = (time.perf_counter() - start) * 1000
        return CheckResult(url=url, status=e.code, elapsed_ms=round(elapsed, 1))
    except Exception as e:
        elapsed = (time.perf_counter() - start) * 1000
        return CheckResult(url=url, status=0, elapsed_ms=round(elapsed, 1), error=str(e))

def run_health_check(urls, max_workers=10, slow_threshold_ms=2000):
    """Run parallel health checks and print a report."""
    print(f"Checking {len(urls)} URLs with {max_workers} workers...\n")
    print(f"{'Status':<8} {'Time (ms)':>10} {'URL'}")
    print("-" * 60)

    results = []
    start_total = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_url = {executor.submit(check_url, url): url for url in urls}
        for future in as_completed(future_to_url):
            result = future.result()
            results.append(result)
            flag = " [SLOW]" if result.elapsed_ms > slow_threshold_ms else ""
            flag = " [ERROR]" if result.error else flag
            flag = " [DOWN]" if result.status >= 500 else flag
            print(f"{result.status:<8} {result.elapsed_ms:>10.1f} {result.url}{flag}")

    total_elapsed = time.perf_counter() - start_total
    ok = sum(1 for r in results if 200 <= r.status < 300)
    print(f"\nDone in {total_elapsed:.2f}s | OK: {ok}/{len(urls)}")
    return results

if __name__ == "__main__":
    urls_to_check = [
        "https://httpbin.org/get",
        "https://httpbin.org/status/200",
        "https://httpbin.org/status/404",
        "https://httpbin.org/delay/1",
        "https://httpbin.org/ip",
        "https://httpbin.org/uuid",
    ]
    run_health_check(urls_to_check, max_workers=6)
```
Output:
Checking 6 URLs with 6 workers...
Status Time (ms) URL
------------------------------------------------------------
200 312.4 https://httpbin.org/get
200 198.7 https://httpbin.org/ip
200 201.1 https://httpbin.org/uuid
200 203.9 https://httpbin.org/status/200
404 195.3 https://httpbin.org/status/404
200 1203.8 https://httpbin.org/delay/1
Done in 1.21s | OK: 5/6
This checker runs all 6 requests in parallel, prints each result as it arrives (thanks to as_completed), and provides a summary. The @dataclass makes the result clean and typed. You can extend it by adding CSV export, retry logic for 5xx errors, or a configurable slow_threshold_ms alert.
Frequently Asked Questions
How many workers should I use?
For ThreadPoolExecutor with I/O-bound tasks, a common rule is 10-50 workers depending on the task. The default (when max_workers is omitted) is min(32, os.cpu_count() + 4) in Python 3.8+. For ProcessPoolExecutor with CPU-bound tasks, use os.cpu_count() or leave it as default (which matches CPU count). Too many workers adds overhead and can trigger rate limiting on the server side.
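You can inspect what a default pool actually created. A quick sketch, with the caveat that `_max_workers` is a private CPython attribute used here purely for illustration:

```python
# worker_defaults.py -- sketch: inspecting default pool sizes.
# Assumes CPython; _max_workers is an internal attribute and may change.
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

print("CPU count:", os.cpu_count())

with ThreadPoolExecutor() as tpe:
    # Python 3.8-3.12 default: min(32, os.cpu_count() + 4)
    print("Default thread workers:", tpe._max_workers)

with ProcessPoolExecutor() as ppe:
    # Default matches the number of CPUs
    print("Default process workers:", ppe._max_workers)
```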
When should I use map() vs submit()?
Use executor.map(fn, iterable) when all tasks are the same function with a single iterable argument and you want results in order. Use executor.submit(fn, *args) when tasks have different arguments, you need the results as they complete (via as_completed), or you want to inspect Future objects individually. For most batch processing, map() is simpler; for monitoring progress or mixed tasks, use submit().
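The contrast can be sketched in a few lines. Note that map() also accepts multiple iterables, zipping them into per-call arguments; the file name here is illustrative:

```python
# map_vs_submit.py -- hypothetical sketch contrasting the two patterns.
from concurrent.futures import ThreadPoolExecutor, as_completed

def add(a, b):
    return a + b

with ThreadPoolExecutor(max_workers=3) as executor:
    # map(): one function applied across iterables, results in input order.
    ordered = list(executor.map(add, [1, 2, 3], [10, 20, 30]))

    # submit(): arbitrary per-task arguments, results as they complete.
    futures = [executor.submit(add, a, b) for a, b in [(1, 1), (2, 2), (3, 3)]]
    completed = sorted(f.result() for f in as_completed(futures))

print(ordered)    # [11, 22, 33]
print(completed)  # [2, 4, 6]
```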
How do exceptions work inside workers?
Any exception raised inside a worker function is captured and stored in the Future. It is re-raised when you call future.result() or when iterating executor.map() results. With map(), the exception is raised at the point you access the failing result in the iterator -- so wrap the iteration in a try/except. With submit() and as_completed(), wrap each future.result() call individually so one failing task does not stop the rest.
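A minimal sketch of both behaviors, using a deliberately failing function of my own invention:

```python
# exception_demo.py -- sketch: where worker exceptions surface.
from concurrent.futures import ThreadPoolExecutor

def risky(n):
    if n == 3:
        raise ValueError(f"bad input: {n}")
    return n * 10

with ThreadPoolExecutor(max_workers=2) as executor:
    # With map(), the exception is re-raised while iterating,
    # at the position of the failing item.
    results = []
    try:
        for value in executor.map(risky, [1, 2, 3, 4]):
            results.append(value)
    except ValueError as e:
        print(f"map() raised: {e}")

    # With submit(), the exception is re-raised by future.result().
    future = executor.submit(risky, 3)
    try:
        future.result()
    except ValueError as e:
        print(f"result() raised: {e}")
```

Because map() raises at the failing position, `results` holds only the values gathered before the failure ([10, 20] here), which is why per-future handling with submit() is the safer pattern when partial results matter.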
When should I use concurrent.futures vs asyncio?
Use concurrent.futures when you have synchronous (blocking) functions you want to run in parallel without rewriting them. It works with any existing code. Use asyncio when you are writing new I/O-heavy code from scratch and want maximum concurrency with minimal thread overhead -- asyncio can handle thousands of concurrent connections in a single thread. You can also combine them: loop.run_in_executor() (or the higher-level asyncio.to_thread() in Python 3.9+) lets you run blocking code in a thread pool from inside an async function.
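A small sketch of that bridge, with time.sleep standing in for any blocking call:

```python
# asyncio_bridge.py -- sketch: awaiting blocking functions from async
# code via a thread pool.
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_io(n):
    time.sleep(0.1)  # stands in for a blocking network or disk call
    return n * 2

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Schedule the blocking calls on the pool and await them together.
        tasks = [loop.run_in_executor(pool, blocking_io, n) for n in range(4)]
        return await asyncio.gather(*tasks)

results = asyncio.run(main())
print(results)  # [0, 2, 4, 6]
```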
What does the with block do for executors?
Using with ThreadPoolExecutor() as executor: calls executor.shutdown(wait=True) when the block exits. This waits for all submitted futures to complete before proceeding. If you create an executor without the context manager, you must call executor.shutdown() manually or risk leaving threads/processes running after your script ends. The context manager is the safer and recommended pattern.
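The manual equivalent looks like this; a sketch only, since the with block is the recommended form:

```python
# manual_shutdown.py -- sketch: what the with block does for you.
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
future = executor.submit(pow, 2, 10)

# Without a with block you must shut down explicitly;
# wait=True blocks until all pending futures have finished.
executor.shutdown(wait=True)
print(future.result())  # 1024
```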
Conclusion
The concurrent.futures module gives you clean, high-level parallelism with minimal code. You have learned when to use ThreadPoolExecutor for I/O-bound tasks and ProcessPoolExecutor for CPU-bound work, how executor.map() delivers ordered results effortlessly, and how executor.submit() with as_completed() lets you handle results the moment they arrive. You also know how to handle timeouts, exceptions, and the pickling constraint that affects process pools.
The health checker example is a real starting point -- extend it to check your own URLs, write results to a CSV, or send Slack alerts when a site goes down. The pattern scales from 5 URLs to 5,000 with a single max_workers change. For the full API reference, see the Python concurrent.futures documentation.