Intermediate
Sometimes Python alone isn’t enough — you need to run a shell command, launch another program, or query your operating system directly from a script. Maybe you want to zip a directory, ping a server, run a linter, or call a tool that only exists as a command-line binary. Python’s subprocess module is exactly what you need for all of these tasks, and it’s built right into the standard library.
The subprocess module lets you spawn new processes, connect to their input/output/error pipes, and obtain their return codes. It replaced the older os.system() and os.popen() functions with a cleaner, more powerful API. The primary function you’ll use is subprocess.run(), introduced in Python 3.5, which covers the vast majority of use cases. No third-party packages required — just import subprocess.
In this article we’ll cover the core subprocess.run() function, capturing standard output and error, handling return codes and exceptions, running commands with shell features, using Popen for advanced control, and building a real-world disk usage scanner. By the end, you’ll be able to integrate shell commands seamlessly into any Python script.
Running a Command: Quick Example
Let’s get a win immediately. Here’s how to run a simple command and capture its output in a few lines of Python:
# quick_subprocess.py
import subprocess
result = subprocess.run(['echo', 'Hello from subprocess!'], capture_output=True, text=True)
print(result.stdout)
print('Return code:', result.returncode)
Output:
Hello from subprocess!
Return code: 0
We pass the command as a list of strings (the command and its arguments separately), set capture_output=True to capture stdout and stderr, and text=True to decode the output as a string instead of bytes. A return code of 0 means the command succeeded. We’ll dive deeper into each of these parameters below.
What Is subprocess and When Should You Use It?
The subprocess module is Python’s interface for spawning external processes — programs that run independently from your Python interpreter but whose output and status you can monitor and collect. Think of it as Python holding a terminal in one hand, running a command, and handing you the results.
Common use cases include: running shell utilities (ls, grep, curl), calling compiled binaries or other language tools, automating build pipelines, interacting with version control (git commands), and checking system status (disk usage, process lists, network info).
| Approach | When to Use | Notes |
|---|---|---|
| subprocess.run() | Most use cases | Waits for process to finish, returns CompletedProcess |
| subprocess.Popen() | Need streaming I/O or async control | Low-level, non-blocking |
| os.system() | Legacy code only | No output capture, avoid in new code |
| os.popen() | Legacy code only | Captures stdout only; exit status is available only from close() |
In virtually all new code, use subprocess.run(). The older os.system() and os.popen() functions are still available, but os.system() cannot capture output, os.popen() captures only stdout, and neither offers timeouts or exception-based error handling.
Capturing stdout and stderr
Capturing command output is the most common subprocess task. You need to see what a command printed before your script can act on it — whether you’re parsing a list of files, checking a version number, or validating command output.
# capture_output.py
import subprocess
# Run 'python3 --version' and capture the output
result = subprocess.run(['python3', '--version'], capture_output=True, text=True)
print('stdout:', result.stdout.strip())
print('stderr:', result.stderr.strip())
print('returncode:', result.returncode)
Output:
stdout: Python 3.12.0
stderr:
returncode: 0
When capture_output=True, both stdout and stderr are captured as strings (with text=True) and stored in the result.stdout and result.stderr attributes. Without capture_output=True, output goes directly to the terminal and you can’t read it in Python.
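As a minimal sketch of what these two flags do, the snippet below runs the same command with and without text=True. It uses sys.executable (an assumption made here so the example does not depend on a python3 binary being on PATH) and writes capture_output=True out as its documented equivalent, stdout=PIPE plus stderr=PIPE, for the second call.

```python
import subprocess
import sys

# sys.executable is the interpreter running this script (used for portability)
cmd = [sys.executable, "-c", "print('hello')"]

# text=True decodes the captured streams to str
as_text = subprocess.run(cmd, capture_output=True, text=True)

# Without text=True the captured streams are raw bytes
as_bytes = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

print(type(as_text.stdout).__name__, repr(as_text.stdout.strip()))
print(type(as_bytes.stdout).__name__, as_bytes.stdout.strip() == b"hello")
```

If you forget text=True, string operations such as 'hello' in result.stdout raise a TypeError because the captured value is bytes.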
Capturing Error Output
Some commands write their useful output to stderr (like ffmpeg, git, and many C tools). Always capture both streams:
# capture_stderr.py
import subprocess
# A command that writes to stderr -- 'ls' on a nonexistent directory
result = subprocess.run(
    ['ls', '/nonexistent/path'],
    capture_output=True,
    text=True
)
print('stdout:', repr(result.stdout))
print('stderr:', repr(result.stderr))
print('returncode:', result.returncode)
Output:
stdout: ''
stderr: "ls: cannot access '/nonexistent/path': No such file or directory\n"
returncode: 2
A non-zero return code (here, 2) signals that the command failed. The error message landed in stderr, not stdout — which is why capturing both is important.
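When you simply want everything a command printed in one place, one option is to merge stderr into stdout. The sketch below uses a small inline child program (written here only for illustration) that prints one line to each stream.

```python
import subprocess
import sys

# Inline child program that writes one line to each stream
child_code = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"

# stderr=subprocess.STDOUT redirects the child's stderr into the stdout pipe
result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)

print(result.stdout)
print(result.stderr)  # None, because stderr was not captured separately
```

The relative order of the two lines depends on how the child buffers its streams, so do not rely on it.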
Handling Errors and Return Codes
Checking whether a subprocess succeeded is essential for any production script. There are two patterns: checking returncode manually, or using check=True to raise an exception automatically on failure.
# handle_errors.py
import subprocess
# Pattern 1: Check return code manually
result = subprocess.run(['ls', '/tmp'], capture_output=True, text=True)
if result.returncode == 0:
    print('Files in /tmp:')
    print(result.stdout[:200])
else:
    print('Error:', result.stderr)
# Pattern 2: Raise exception on failure (CalledProcessError)
try:
    subprocess.run(['ls', '/nonexistent'], capture_output=True, text=True, check=True)
except subprocess.CalledProcessError as e:
    print(f'Command failed with code {e.returncode}: {e.stderr.strip()}')
Output:
Files in /tmp:
tmpXYZ123
tmpabc456
Command failed with code 2: ls: cannot access '/nonexistent': No such file or directory
Use check=True when failure should abort your script. Use manual return code checking when you want to handle errors gracefully and keep running. The CalledProcessError exception carries .returncode, .stdout, and .stderr for full context.
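To see what the exception carries without depending on ls, here is a small sketch with an inline child program (an assumption for the example) that writes to stderr and exits with status 3.

```python
import subprocess
import sys

# Child that prints an error message and exits with status 3
failing_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('boom\\n'); sys.exit(3)",
]

try:
    subprocess.run(failing_cmd, capture_output=True, text=True, check=True)
    outcome = "succeeded"
except subprocess.CalledProcessError as exc:
    # exc.cmd, exc.returncode, exc.stdout and exc.stderr describe the failure
    outcome = f"failed with code {exc.returncode}: {exc.stderr.strip()}"

print(outcome)
```

Note that exc.stdout and exc.stderr are only populated when the streams were captured; without capture_output=True they are None.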
Running Shell Commands with shell=True
Sometimes you need shell features: pipes (|), redirects (>), glob expansion (*.txt), or environment variable substitution. Passing a command string with shell=True runs it through the system shell, giving you access to all of these.
# shell_true.py
import subprocess
# Using shell=True to pipe commands together
result = subprocess.run(
    'echo "line1\nline2\nline3" | wc -l',
    shell=True,
    capture_output=True,
    text=True
)
print('Line count:', result.stdout.strip())
# Using shell=True for glob expansion
result2 = subprocess.run(
    'ls /tmp/*.log 2>/dev/null | head -5',
    shell=True,
    capture_output=True,
    text=True
)
print('Log files:', result2.stdout.strip() or 'None found')
Output:
Line count: 3
Log files: None found
When using shell=True, pass the command as a single string rather than a list. This is convenient for complex shell pipelines but carries a security risk: if any part of the string comes from user input, an attacker could inject malicious commands. For production code, always use the list form when possible and reserve shell=True for controlled, developer-written strings.
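The difference is easiest to see with a hostile-looking value. In this sketch the untrusted string is a made-up example; the list form delivers it to the child as a single argument, and shlex.quote() shows one way to escape it if a POSIX shell string really cannot be avoided.

```python
import shlex
import subprocess
import sys

# Hypothetical untrusted value, e.g. a filename typed by a user
untrusted = "notes.txt; echo INJECTED"

# List form: the value reaches the child as exactly one argument, no shell involved
echo_arg = [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted]
received = subprocess.run(
    echo_arg, capture_output=True, text=True, check=True
).stdout.strip()
print(received)

# If a shell string is unavoidable, shlex.quote() escapes the value for POSIX shells
command = f"ls -l {shlex.quote(untrusted)}"
print(command)
```

shlex.quote() targets POSIX shells only; it is not a safe way to build command strings for cmd.exe on Windows.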
Setting Timeouts
External commands can hang: a network request that never finishes, a process waiting for user input, or a badly-behaved tool. The timeout parameter raises subprocess.TimeoutExpired if the process doesn’t complete in time.
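A minimal sketch of the timeout parameter in action. The slow child below is an inline stand-in for a command that hangs, and the 1-second limit is an arbitrary value chosen to keep the example quick.

```python
import subprocess
import sys

# Child that sleeps for 5 seconds (stands in for a command that hangs)
slow_cmd = [sys.executable, "-c", "import time; time.sleep(5)"]

try:
    subprocess.run(slow_cmd, capture_output=True, text=True, timeout=1)
    timed_out = False
except subprocess.TimeoutExpired as exc:
    # run() kills the child before re-raising, so nothing is left running
    timed_out = True
    print(f"Command timed out after {exc.timeout} seconds")
```

Catching TimeoutExpired lets the script decide whether to retry, skip the step, or abort.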
Always set a timeout when calling network tools, external APIs, or any process that might hang indefinitely, and adjust it to the execution time you expect.
Using Popen for Advanced Control
When you need to interact with a process while it is still running (streaming its output line by line, writing to its stdin, or running it in the background), use subprocess.Popen directly. subprocess.run() is built on top of Popen and waits for the process to finish; Popen gives you non-blocking control.
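A minimal sketch of streaming with Popen, assuming an inline child program that prints three lines and flushes after each one. It is an illustration of the pattern, not a reconstruction of any particular command.

```python
import subprocess
import sys

# Inline child that prints three lines, flushing after each one
child_code = "for i in range(3): print('step', i, flush=True)"

lines = []
with subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    text=True,
) as proc:
    # Iterating over proc.stdout yields lines as the child produces them
    for line in proc.stdout:
        lines.append(line.rstrip())
        print("got:", line.rstrip())

# Leaving the with-block closes the pipe and waits for the child
print("Return code:", proc.returncode)
```

Using Popen as a context manager is a design choice here: it guarantees the pipe is closed and the child is waited on even if the loop raises.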
The key difference is that Popen lets you read output line by line as the process produces it, rather than waiting until the process exits. This is essential for long-running commands, progress reporting, or interactive processes where you need to respond to output in real time.
Passing Custom Environment Variables
You can pass custom environment variables to a subprocess with the env parameter.
# env_variables.py
import subprocess
import os
# Copy the current environment, then add our own variables
custom_env = os.environ.copy()
custom_env['MY_APP_MODE'] = 'production'
custom_env['DEBUG'] = '0'
result = subprocess.run(
    ['env'],
    capture_output=True,
    text=True,
    env=custom_env
)
# Filter and show just our custom vars
for line in result.stdout.splitlines():
    if 'MY_APP_MODE' in line or 'DEBUG=0' in line:
        print(line)
Output:
MY_APP_MODE=production
DEBUG=0
Always start from os.environ.copy() rather than building a fresh dict — most programs need PATH and other standard variables to function at all. Add or override specific keys on top of the copy.
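The same idea can be sketched in a portable way by letting a child Python process read the variable back. DEMO_GREETING is a made-up variable name used only for this example.

```python
import os
import subprocess
import sys

# Start from the parent's environment and add one key.
# DEMO_GREETING is a hypothetical name chosen for this example.
child_env = {**os.environ, "DEMO_GREETING": "hello from parent"}

result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['DEMO_GREETING'])"],
    capture_output=True,
    text=True,
    env=child_env,
    check=True,
)
print(result.stdout.strip())
```

Because the dictionary is a copy, the parent's own os.environ is left untouched; only the child sees the extra variable.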
Real-Life Example: Disk Usage Scanner
Here’s a practical script that uses subprocess to scan directories for disk usage, sorts them by size, and formats the results in a human-readable report. This is the kind of system automation tool where subprocess truly shines.
# disk_scanner.py
import subprocess
import sys
import os
def get_disk_usage(path='.'):
    """Return a list of (size_bytes, path) tuples for top-level directories in path."""
    try:
        result = subprocess.run(
            # -b reports bytes; --max-depth=1 lists top-level directories (GNU du).
            # -s is omitted because GNU du rejects it together with --max-depth=1.
            ['du', '-b', '--max-depth=1', path],
            capture_output=True,
            text=True,
            timeout=30,
            check=True
        )
    except subprocess.CalledProcessError as e:
        print(f'Error scanning {path}: {e.stderr.strip()}')
        return []
    except subprocess.TimeoutExpired:
        print(f'Scan timed out for {path}')
        return []
    items = []
    for line in result.stdout.strip().splitlines():
        parts = line.split('\t', 1)
        if len(parts) == 2:
            size_bytes = int(parts[0])
            dir_path = parts[1]
            if dir_path != path:  # Skip the total line
                items.append((size_bytes, dir_path))
    return sorted(items, reverse=True)
def format_size(size_bytes):
    """Convert bytes to human-readable string."""
    for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
        if size_bytes < 1024:
            return f'{size_bytes:.1f} {unit}'
        size_bytes /= 1024
    return f'{size_bytes:.1f} PB'
def print_report(path='.', top_n=10):
    """Print a disk usage report for the given directory."""
    scan_path = os.path.abspath(path)
    print(f'\nDisk Usage Report: {scan_path}')
    print('-' * 50)
    items = get_disk_usage(scan_path)
    if not items:
        print('No items found or scan failed.')
        return
    top_items = items[:top_n]
    largest = max(top_items[0][0], 1)  # Avoid dividing by zero for empty directories
    for i, (size_bytes, item_path) in enumerate(top_items, start=1):
        name = os.path.basename(item_path)
        size_str = format_size(size_bytes)
        bar = '#' * min(int(size_bytes / largest * 30), 30)
        print(f'{i:2}. {size_str:>10} {bar:<30} {name}')
    # Sum only the items shown so the label matches the number
    total = sum(s for s, _ in top_items)
    print('-' * 50)
    print(f'Total (top {len(top_items)} items): {format_size(total)}')
if __name__ == '__main__':
    scan_dir = sys.argv[1] if len(sys.argv) > 1 else '/tmp'
    print_report(scan_dir, top_n=10)
Output:
Disk Usage Report: /tmp
--------------------------------------------------
1. 2.4 MB ############################## cache
2. 512.0 KB ###### logs
3. 64.0 KB # tmp_abc123
--------------------------------------------------
Total (top 3 items): 2.9 MB
This script demonstrates several subprocess best practices in one place: always passing check=True and timeout, handling both CalledProcessError and TimeoutExpired, using the list form of the command, and building a clean output format from parsed command output. Note that -b and --max-depth are GNU du options, so the script as written targets Linux; on macOS or BSD the du flags would need adjusting. You can extend this with email alerts when disk usage exceeds a threshold, or integrate it into a monitoring dashboard.
Frequently Asked Questions
Should I pass the command as a list or a string?
Use a list in almost all cases: ['ls', '-la', '/tmp'] instead of 'ls -la /tmp'. The list form avoids shell injection risks and handles arguments with spaces correctly without needing to escape them. Only use a string when you need shell features like pipes or glob expansion, and pair it with shell=True.
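If you already have a command written as a string, the standard library's shlex.split() can turn it into the list form using POSIX shell quoting rules. A small sketch, with a made-up grep command as the input:

```python
import shlex

# A command written the way you would type it in a POSIX shell
command_string = 'grep -n "hello world" notes.txt'

# shlex.split() honours the quotes, so the phrase stays one argument
args = shlex.split(command_string)
print(args)
```

The resulting list can be passed straight to subprocess.run() without shell=True.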
Is shell=True dangerous?
It can be, if the command string includes any input from users or external sources. An attacker could inject ; rm -rf / into a shell command and it would execute. When you control the entire command string yourself (no user input), shell=True is fine. When any part of the string is dynamic, sanitize it carefully or restructure using the list form instead.
How do I pass input to a subprocess?
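Use the input parameter of subprocess.run(): subprocess.run(['cat'], input='hello world', text=True, capture_output=True). The string is written to the process’s stdin. For interactive processes that need back-and-forth communication, use Popen with stdin=PIPE and stdout=PIPE and read and write the pipes yourself, flushing after each write. Popen.communicate() is the safer choice when you only need to send input once and collect all the output, because it avoids pipe deadlocks.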
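Here is the input parameter as a portable sketch, using an inline child program (an assumption for the example) that upper-cases whatever arrives on stdin.

```python
import subprocess
import sys

# Child that upper-cases whatever arrives on stdin
upper_cmd = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]

result = subprocess.run(
    upper_cmd,
    input="hello world",
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

With text=True the input must be a str; without it, pass bytes instead.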
How do I run a process in the background?
Use Popen without calling wait(): proc = subprocess.Popen(['my_long_process']). The process runs independently while your Python script continues. Call proc.poll() to check if it’s finished, proc.wait() to block until it finishes, or proc.terminate() to stop it. This is useful for parallel jobs or launching daemon processes.
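A minimal sketch of that lifecycle, assuming an inline child that would sleep for 30 seconds if left alone. The short sleep in the parent stands in for whatever other work your script does.

```python
import subprocess
import sys
import time

# Child that would run for 30 seconds if left alone
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])

time.sleep(0.2)  # the parent keeps working in the meantime
still_running = proc.poll() is None  # poll() returns None while the child is alive
print("Still running:", still_running)

proc.terminate()      # ask the child to stop
proc.wait(timeout=5)  # reap it so no zombie process is left behind
print("Finished:", proc.returncode is not None)
```

The exact return code after terminate() differs between platforms, so the sketch only checks that one has been set.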
Does subprocess work the same on Windows?
Mostly yes, with some differences. On Windows, many Unix commands (ls, grep) aren’t available by default, and commands such as dir and echo are cmd.exe built-ins rather than programs, so they only work with shell=True. Use dir instead of ls, or install Git Bash/WSL. The shell=True parameter uses cmd.exe on Windows instead of /bin/sh. For cross-platform scripts, consider using Python’s pathlib, os, and shutil modules for filesystem operations instead of shelling out to OS-specific commands.
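One defensive pattern for cross-platform scripts is to check that a command exists before running it. The helper name run_if_available below is hypothetical, introduced only for this sketch; shutil.which() and sys.executable are standard library features.

```python
import shutil
import subprocess
import sys

def run_if_available(command, *args):
    """Run command with args if it is on PATH; return None otherwise."""
    executable = shutil.which(command)
    if executable is None:
        return None
    return subprocess.run([executable, *args], capture_output=True, text=True)

# A command that should not exist on any system
missing = run_if_available("definitely-not-a-real-command-xyz")
print(missing)

# sys.executable always points at the running interpreter, so no lookup is needed
version = subprocess.run([sys.executable, "--version"], capture_output=True, text=True)
print(version.stdout.strip() or version.stderr.strip())
```

Returning None for a missing tool lets the caller fall back to a pure-Python alternative instead of crashing with FileNotFoundError.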
Conclusion
Python’s subprocess module is a powerful bridge between your Python code and the operating system. We covered subprocess.run() for most use cases, capturing stdout and stderr with capture_output=True, handling errors via return codes and check=True, running shell pipelines with shell=True, preventing hangs with timeout, streaming output with Popen, and passing custom environment variables. Together, these tools cover most everyday subprocess needs.
Try extending the disk usage scanner to send an email alert when usage exceeds 80%, or build a deployment script that chains together git pull, pip install, and service restart commands with proper error handling at each step. The subprocess module makes all of this straightforward.
For the full API reference including communicate(), DEVNULL, and platform-specific notes, see the official subprocess documentation.