Every real Python project touches the file system sooner or later — reading config files, creating output directories, scanning folders for data files, or reading environment variables set by deployment tools. Without the right tools, these tasks involve fragile hardcoded paths that break when you move the project to another machine or operating system. Python’s built-in os module gives you a portable, consistent interface to file system operations that works the same on Windows, macOS, and Linux.

The os module is part of Python’s standard library — no installation needed. It provides functions for working with file paths, creating and removing directories, listing directory contents, reading environment variables, and running system-level operations. For modern path handling, Python 3.4+ also offers the pathlib module as an object-oriented alternative — but understanding os is essential because you’ll encounter it in virtually every Python codebase.

In this tutorial, you’ll learn how to navigate and inspect the file system with os.path, create and remove directories, list and filter files, read environment variables, walk directory trees recursively, and apply all of this in a practical file organizer project. By the end, you’ll have a solid foundation for writing scripts that interact reliably with the file system on any platform.

Quick Example: File System Basics

Here’s a quick demonstration of the most commonly used os functions:

# os_quick.py
import os

# Current working directory
cwd = os.getcwd()
print(f"Working directory: {cwd}")

# List files and folders
entries = os.listdir('.')
print(f"Entries in current dir: {len(entries)}")

# Build a cross-platform path
config_path = os.path.join(cwd, 'config', 'settings.json')
print(f"Config path: {config_path}")

# Check what exists
print(f"Path exists: {os.path.exists(config_path)}")
print(f"Is file: {os.path.isfile(config_path)}")
print(f"Is directory: {os.path.isdir(os.path.dirname(config_path))}")

# Read an environment variable with a default
# (HOME is set on Unix; Windows sets USERPROFILE instead)
home = os.environ.get('HOME', '/tmp')
print(f"Home directory: {home}")

Output:

Working directory: /home/user/myproject
Entries in current dir: 12
Config path: /home/user/myproject/config/settings.json
Path exists: False
Is file: False
Is directory: False
Home directory: /home/user

Notice that os.path.join() assembles paths using the correct separator for your operating system (forward slash on Unix, backslash on Windows). This is the correct way to build file paths — never concatenate strings with hardcoded slashes.

What Is the os Module?

The os module is Python’s interface to operating system functionality. It abstracts over the differences between Windows, macOS, and Linux, letting you write cross-platform code that works on any system. Think of it as a Python wrapper around the file system commands you’d normally type in a terminal.

| Category | Functions | Purpose |
| --- | --- | --- |
| Navigation | getcwd(), chdir() | Get or change the current working directory |
| Listing | listdir(), scandir() | List files and directories |
| Paths | os.path.join(), os.path.exists() | Build and inspect paths |
| Directories | mkdir(), makedirs(), rmdir() | Create and remove directories |
| Files | remove(), rename() | Delete and rename files |
| Environment | environ, environ.get() | Read environment variables |
| Walking | walk() | Recursively traverse directories |
For the pure path manipulation parts (join, exists, basename, etc.), the newer pathlib module provides a more ergonomic object-oriented interface. But os is still the right tool for operations like walking directories and reading environment variables.
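As a quick side-by-side, here is the same path built both ways (a minimal sketch; the 'data' and 'file.txt' segments are made up):

```python
import os
from pathlib import Path

# os.path style: string-based functions composed together
cfg_os = os.path.join(os.path.expanduser('~'), 'data', 'file.txt')

# pathlib style: the / operator joins path segments
cfg_pl = Path.home() / 'data' / 'file.txt'

# Both describe the same location
print(cfg_os)
print(cfg_pl)
```

Path.home() is implemented in terms of os.path.expanduser('~'), so the two results are identical strings.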

Working with Paths

The os.path submodule contains functions for inspecting and manipulating file paths. These are the most frequently used functions in day-to-day Python scripting.

Core Path Functions

Use these functions to check what exists and extract information from paths without actually opening files:

# path_functions.py
import os

# Build paths portably
base_dir = '/home/user/project'
data_path = os.path.join(base_dir, 'data', 'input.csv')
print(f"Full path: {data_path}")

# Extract parts of a path
print(f"Directory: {os.path.dirname(data_path)}")
print(f"Filename:  {os.path.basename(data_path)}")
name, ext = os.path.splitext(os.path.basename(data_path))
print(f"Name: {name}, Extension: {ext}")

# Expand special shortcuts
home_config = os.path.expanduser('~/.bashrc')
print(f"Home config: {home_config}")

# Absolute path (resolves ./ and ../ references)
relative = '../data/file.txt'
absolute = os.path.abspath(relative)
print(f"Absolute: {absolute}")

# Check file/directory properties
test_path = '/etc/hosts'  # Exists on most Unix systems
print(f"\n/etc/hosts exists: {os.path.exists(test_path)}")
print(f"Is file: {os.path.isfile(test_path)}")
print(f"Is dir:  {os.path.isdir(os.path.dirname(test_path))}")

Output:

Full path: /home/user/project/data/input.csv
Directory: /home/user/project/data
Filename:  input.csv
Name: input, Extension: .csv
Home config: /home/user/.bashrc
Absolute: /home/user/data/file.txt

/etc/hosts exists: True
Is file: True
Is dir:  True

Creating and Managing Directories

Scripts frequently need to create output directories before writing files. The key is using makedirs with exist_ok=True so your script doesn’t crash if the directory already exists:

# directories.py
import os

# Create a single directory
os.makedirs('output', exist_ok=True)
print("Created 'output/' directory (or it already existed)")

# Create nested directories in one call
nested = os.path.join('output', '2026', 'april', 'reports')
os.makedirs(nested, exist_ok=True)
print(f"Created nested directories: {nested}")

# List what we created
for item in os.listdir('output'):
    item_path = os.path.join('output', item)
    kind = 'DIR' if os.path.isdir(item_path) else 'FILE'
    print(f"  [{kind}] {item}")

# Remove an empty directory
os.rmdir(nested)
print("\nRemoved 'reports' directory")
# Note: os.rmdir() only removes EMPTY directories
# For non-empty directories, use shutil.rmtree()

Output:

Created 'output/' directory (or it already existed)
Created nested directories: output/2026/april/reports
  [DIR] 2026
Removed 'reports' directory
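If a directory still has contents, os.rmdir() raises OSError; shutil.rmtree() handles that case. A minimal sketch (demo_tree is a made-up name):

```python
import os
import shutil

# Build a small tree with a file inside it
os.makedirs(os.path.join('demo_tree', 'sub'), exist_ok=True)
with open(os.path.join('demo_tree', 'sub', 'note.txt'), 'w') as f:
    f.write('hello')

# os.rmdir() refuses to remove a non-empty directory
try:
    os.rmdir('demo_tree')
except OSError:
    print("rmdir failed: directory not empty")

# shutil.rmtree() removes the directory and everything inside it
shutil.rmtree('demo_tree')
print(f"demo_tree exists: {os.path.exists('demo_tree')}")
```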

Listing and Filtering Files

Two functions list directory contents: os.listdir() returns filenames as strings, while os.scandir() returns DirEntry objects with built-in file metadata — which is more efficient when you need to check whether entries are files or directories.

# list_files.py
import os

# Create some test files to list
os.makedirs('test_dir', exist_ok=True)
for fname in ['report.csv', 'data.json', 'notes.txt', 'archive.zip']:
    open(os.path.join('test_dir', fname), 'w').close()

# listdir: simple string list
all_entries = os.listdir('test_dir')
print("All entries:", all_entries)

# Filter by extension
csv_files = [f for f in all_entries if f.endswith('.csv')]
print("CSV files:", csv_files)

# scandir: more efficient when you need metadata
print("\nUsing scandir:")
with os.scandir('test_dir') as entries:
    for entry in entries:
        if entry.is_file():
            size = entry.stat().st_size
            print(f"  {entry.name} ({size} bytes)")

# Clean up test files
import shutil
shutil.rmtree('test_dir')

Output:

All entries: ['report.csv', 'data.json', 'notes.txt', 'archive.zip']
CSV files: ['report.csv']

Using scandir:
  archive.zip (0 bytes)
  data.json (0 bytes)
  notes.txt (0 bytes)
  report.csv (0 bytes)
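For pattern-based filtering, the standard-library glob module is an alternative to the manual endswith() check. A sketch with made-up file names:

```python
import glob
import os
import shutil

# Create a folder with mixed file types
os.makedirs('glob_demo', exist_ok=True)
for fname in ['report.csv', 'summary.csv', 'notes.txt']:
    open(os.path.join('glob_demo', fname), 'w').close()

# '*.csv' matches any name ending in .csv inside glob_demo
csv_files = sorted(glob.glob(os.path.join('glob_demo', '*.csv')))
print([os.path.basename(p) for p in csv_files])

shutil.rmtree('glob_demo')
```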

Reading Environment Variables

Environment variables are the standard way to pass configuration to Python scripts — API keys, database URLs, feature flags, and deployment settings. Never hardcode secrets in your code; read them from the environment instead:

# env_vars.py
import os

# Read an environment variable (returns None if not set)
api_key = os.environ.get('MY_API_KEY')
print(f"API key set: {api_key is not None}")

# Read with a default value
debug_mode = os.environ.get('DEBUG', 'false').lower() == 'true'
print(f"Debug mode: {debug_mode}")

# A required variable: os.environ['KEY'] raises KeyError if missing
try:
    db_url = os.environ['DATABASE_URL']
except KeyError:
    print("DATABASE_URL not set -- using default SQLite")
    db_url = 'sqlite:///local.db'

print(f"Database: {db_url}")

# Set an environment variable for child processes
os.environ['APP_ENV'] = 'testing'
print(f"APP_ENV: {os.environ.get('APP_ENV')}")

# List all environment variables (just the keys)
env_keys = sorted(os.environ.keys())
print(f"\nTotal env vars: {len(env_keys)}")

Output:

API key set: False
Debug mode: False
DATABASE_URL not set -- using default SQLite
Database: sqlite:///local.db
APP_ENV: testing

Total env vars: 47

Use os.environ.get('KEY', 'default') for optional settings and os.environ['KEY'] (which raises KeyError if missing) for required settings. This way, your code fails fast with a clear error when required configuration is absent rather than failing later with a cryptic message.
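That fail-fast pattern can be wrapped in a small helper (a sketch; require_env is a made-up name, not part of the standard library):

```python
import os

def require_env(name):
    """Return an environment variable's value or exit with a clear error."""
    try:
        return os.environ[name]
    except KeyError:
        raise SystemExit(f"Missing required environment variable: {name}")

os.environ['SERVICE_PORT'] = '8080'  # simulate deployment configuration
port = int(require_env('SERVICE_PORT'))
print(f"Listening on port {port}")
```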

Walking Directory Trees with os.walk

os.walk() recursively traverses a directory tree, yielding a tuple of (dirpath, dirnames, filenames) for every directory it visits. This is invaluable for finding files deep in nested folder structures:

# walk_example.py
import os

def find_files_by_extension(root_dir, extension):
    """Find all files with a given extension in a directory tree."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        # Skip hidden directories (those starting with '.')
        dirnames[:] = [d for d in dirnames if not d.startswith('.')]

        for filename in filenames:
            if filename.endswith(extension):
                full_path = os.path.join(dirpath, filename)
                found.append(full_path)
    return found

# Example: find all Python files in the current directory
python_files = find_files_by_extension('.', '.py')
for f in python_files[:5]:  # Show first 5
    print(f)

print(f"\nTotal .py files found: {len(python_files)}")

Output:

./walk_example.py
./os_quick.py
./path_functions.py

Total .py files found: 3

The key trick here is dirnames[:] = [...] — modifying the list in-place tells os.walk() which subdirectories to skip. This “prune” technique prevents the walker from descending into directories you don’t want to search (like .git, __pycache__, or node_modules).
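The same pruning idea works for other walk-based tasks, such as totaling file sizes while skipping cache directories. A sketch (the skip list and directory names are illustrative):

```python
import os
import shutil

SKIP_DIRS = {'.git', '__pycache__', 'node_modules'}

def tree_size(root):
    """Total size in bytes of all files under root, skipping cache dirs."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped dirs
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for filename in filenames:
            total += os.path.getsize(os.path.join(dirpath, filename))
    return total

# Build a tiny tree: 5 bytes of real data plus a cache dir that gets skipped
os.makedirs(os.path.join('size_demo', '__pycache__'), exist_ok=True)
with open(os.path.join('size_demo', 'data.txt'), 'w') as f:
    f.write('hello')
with open(os.path.join('size_demo', '__pycache__', 'junk.bin'), 'w') as f:
    f.write('ignored')

size = tree_size('size_demo')
print(f"Tree size: {size} bytes")
shutil.rmtree('size_demo')
```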

Real-Life Example: File Organizer Script

Let’s build a script that organizes files in a folder by sorting them into subdirectories based on their file extension — the classic “Downloads folder cleaner” project:

# file_organizer.py
import os
import shutil

# Map extensions to folder names
EXTENSION_MAP = {
    '.pdf':  'Documents',
    '.docx': 'Documents',
    '.txt':  'Documents',
    '.jpg':  'Images',
    '.jpeg': 'Images',
    '.png':  'Images',
    '.gif':  'Images',
    '.mp4':  'Videos',
    '.mov':  'Videos',
    '.mp3':  'Audio',
    '.wav':  'Audio',
    '.zip':  'Archives',
    '.tar':  'Archives',
    '.py':   'Code',
    '.js':   'Code',
    '.csv':  'Data',
    '.json': 'Data',
}

def organize_folder(source_dir):
    """Sort files in source_dir into subdirectories by type."""
    moved = 0
    skipped = 0

    for filename in os.listdir(source_dir):
        src_path = os.path.join(source_dir, filename)

        # Skip directories and hidden files
        if os.path.isdir(src_path) or filename.startswith('.'):
            skipped += 1
            continue

        # Get the file extension
        _, ext = os.path.splitext(filename)
        folder_name = EXTENSION_MAP.get(ext.lower(), 'Other')

        # Create destination directory if needed
        dest_dir = os.path.join(source_dir, folder_name)
        os.makedirs(dest_dir, exist_ok=True)

        # Move the file
        dest_path = os.path.join(dest_dir, filename)
        if not os.path.exists(dest_path):  # Don't overwrite
            shutil.move(src_path, dest_path)
            moved += 1
            print(f"  Moved: {filename} -> {folder_name}/")
        else:
            print(f"  Skipped (exists): {filename}")
            skipped += 1

    print(f"\nDone: {moved} files moved, {skipped} skipped.")

# Set up a test directory with sample files
test_dir = 'test_downloads'
os.makedirs(test_dir, exist_ok=True)
for fname in ['report.pdf', 'photo.jpg', 'backup.zip',
              'script.py', 'data.csv', 'video.mp4']:
    open(os.path.join(test_dir, fname), 'w').close()

print(f"Organizing '{test_dir}'...")
organize_folder(test_dir)

# Show result
print("\nResulting structure:")
for item in sorted(os.listdir(test_dir)):
    print(f"  {item}/")

# Clean up test dir
shutil.rmtree(test_dir)

Output:

Organizing 'test_downloads'...
  Moved: backup.zip -> Archives/
  Moved: data.csv -> Data/
  Moved: photo.jpg -> Images/
  Moved: report.pdf -> Documents/
  Moved: script.py -> Code/
  Moved: video.mp4 -> Videos/

Done: 6 files moved, 0 skipped.

Resulting structure:
  Archives/
  Code/
  Data/
  Documents/
  Images/
  Videos/

This project uses os.listdir(), os.path.join(), os.path.isdir(), os.path.splitext(), and os.makedirs() together. The shutil.move() function handles the actual file moving (the shutil module complements os for higher-level file operations). Extend this by reading the extension-to-folder mapping from a JSON config file, adding a dry-run mode that shows what would be moved without doing it, or recursively organizing nested subfolders.
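Of those extensions, the dry-run mode is the smallest change: compute each destination but only print it. A sketch of the idea (plan_moves is a made-up helper, with the extension map trimmed for brevity):

```python
import os
import shutil

# Trimmed mapping for the sketch
EXTENSION_MAP = {'.pdf': 'Documents', '.jpg': 'Images'}

def plan_moves(source_dir):
    """Return (filename, destination folder) pairs without moving anything."""
    plan = []
    for filename in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, filename)
        if os.path.isdir(src) or filename.startswith('.'):
            continue
        _, ext = os.path.splitext(filename)
        plan.append((filename, EXTENSION_MAP.get(ext.lower(), 'Other')))
    return plan

os.makedirs('dryrun_demo', exist_ok=True)
for fname in ['a.pdf', 'b.jpg', 'c.xyz']:
    open(os.path.join('dryrun_demo', fname), 'w').close()

plan = plan_moves('dryrun_demo')
for filename, folder in plan:
    print(f"Would move: {filename} -> {folder}/")

shutil.rmtree('dryrun_demo')
```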

Frequently Asked Questions

Should I use os.path or pathlib?

For new code targeting Python 3.4+, pathlib is generally preferred for path operations because it’s more readable and object-oriented. For example, Path.home() / 'data' / 'file.txt' is cleaner than os.path.join(os.path.expanduser('~'), 'data', 'file.txt'). However, os remains necessary for environment variables, os.walk(), and other OS-level operations. In practice, most projects use both.

How do I delete a non-empty directory?

os.rmdir() only removes empty directories and raises OSError if the directory has contents. To delete a directory and all its contents, use shutil.rmtree(path) from the shutil module. Be careful — this is irreversible and doesn’t send files to the Recycle Bin. Always double-check the path before calling rmtree.

How do I get a file’s size and modification time?

Use os.stat(path) to read file metadata. The result has st_size (bytes), st_mtime (last modification time as a Unix timestamp), and other fields. Use datetime.fromtimestamp(os.stat(p).st_mtime) to convert the timestamp to a readable date. When using scandir(), you can call entry.stat() directly without an extra system call.
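Putting that together in a minimal sketch (the file name is made up):

```python
import os
from datetime import datetime

# Create a small file to inspect (11 bytes of content)
with open('stat_demo.txt', 'w') as f:
    f.write('hello world')

info = os.stat('stat_demo.txt')
print(f"Size: {info.st_size} bytes")

# st_mtime is a Unix timestamp; convert it to a readable datetime
modified = datetime.fromtimestamp(info.st_mtime)
print(f"Modified: {modified:%Y-%m-%d %H:%M:%S}")

os.remove('stat_demo.txt')
```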

How do I rename or move a file?

os.rename(src, dst) renames or moves a file within the same filesystem. To move across filesystems or drives, use shutil.move(src, dst) instead. os.rename() overwrites the destination on Unix if it already exists but raises an error on Windows; os.replace(src, dst) overwrites on both platforms, so prefer it when portability matters.
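os.replace(src, dst), available since Python 3.3, overwrites the destination on both Unix and Windows. A minimal sketch of the write-then-swap pattern (file names are made up):

```python
import os

# Write the new content to a temporary file first
with open('settings.tmp', 'w') as f:
    f.write('updated')
with open('settings.cfg', 'w') as f:
    f.write('old')

# os.replace() swaps the temp file into place, overwriting the old file
os.replace('settings.tmp', 'settings.cfg')

with open('settings.cfg') as f:
    content = f.read()
print(f"settings.cfg now contains: {content}")

os.remove('settings.cfg')
```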

When should I use os.getcwd() vs os.path.abspath?

os.getcwd() returns the process’s current working directory. Use it when you need to know where the script is running from. os.path.abspath(path) resolves a relative path against the cwd and normalizes any ../ segments. If you’re building paths from a relative reference, use abspath to make them unambiguous.

Conclusion

You’ve seen the core functions of Python’s os module: getcwd() and chdir() for navigation, the os.path submodule for portable path manipulation, makedirs() and rmdir() for directory management, listdir() and scandir() for listing files, environ.get() for reading environment variables, and walk() for recursive directory traversal. Take the file organizer example and extend it with a JSON config file, a dry-run flag, or a log file that records every file move.

As your projects grow, you’ll find yourself reaching for pathlib for new path-manipulation code but sticking with os for environment variables and walking. Both modules are well-documented in the official Python documentation.