Beginner

Python’s os module handles file paths and directory listings, but when you need to actually copy a file, move a directory tree, delete an entire folder with its contents, or create a zip archive of a project — that’s shutil. Short for “shell utilities,” the shutil module provides high-level file operations that go beyond what os offers. It’s the Python equivalent of shell commands like cp -r, mv, rm -rf, and zip.

shutil is part of Python’s standard library, so no installation is needed. The same calls work on Windows, macOS, and Linux, although some details (permission bits, symlink handling, and which metadata copy2() can preserve) still depend on the platform and filesystem. Combined with pathlib for paths, it covers most file management tasks a Python script needs.

In this tutorial you’ll learn how to copy files and directories with shutil.copy(), shutil.copy2(), and shutil.copytree(), move files with shutil.move(), delete directories with shutil.rmtree(), create and extract archives with shutil.make_archive() and shutil.unpack_archive(), and check disk usage with shutil.disk_usage(). By the end you’ll have a complete file management toolkit using only the standard library.

Python shutil: Quick Example

Here’s shutil solving three common file tasks in a few lines:

# shutil_quick.py
import shutil
import os

# Create test files
os.makedirs("source_dir/subdir", exist_ok=True)
with open("source_dir/report.txt", "w") as f:
    f.write("Monthly report data\n")
with open("source_dir/subdir/data.csv", "w") as f:
    f.write("col1,col2\n1,2\n")

# 1. Copy a single file
shutil.copy("source_dir/report.txt", "report_backup.txt")
print("Copied: report_backup.txt exists:", os.path.exists("report_backup.txt"))

# 2. Copy entire directory tree
shutil.copytree("source_dir", "source_dir_backup")
print("Copied tree:", os.listdir("source_dir_backup"))

# 3. Create a zip archive
archive = shutil.make_archive("project_backup", "zip", "source_dir")
print("Archive created:", archive)
print("Archive size:", os.path.getsize(archive), "bytes")

# Cleanup
shutil.rmtree("source_dir")
shutil.rmtree("source_dir_backup")
os.remove("report_backup.txt")
os.remove("project_backup.zip")
print("Cleaned up all test files")

Output:

Copied: report_backup.txt exists: True
Copied tree: ['report.txt', 'subdir']
Archive created: /path/to/project_backup.zip
Archive size: 412 bytes
Cleaned up all test files

shutil.copy() copies the file content and permissions but not metadata (timestamps). shutil.copy2() also preserves the original file’s modification time — use it when preserving timestamps matters. shutil.copytree() recursively copies an entire directory. shutil.rmtree() deletes a directory and all its contents — there is no undo, so use it carefully.

What Is shutil and What Does It Cover?

The shutil module fills the gap between os (low-level OS interface) and shell scripts (file manipulation commands). It handles the most common file management operations in a cross-platform, Pythonic way.

Task                  shutil function                       Shell equivalent
Copy file             shutil.copy(src, dst)                 cp src dst
Copy file + metadata  shutil.copy2(src, dst)                cp -p src dst
Copy directory        shutil.copytree(src, dst)             cp -r src dst
Move file/dir         shutil.move(src, dst)                 mv src dst
Delete directory      shutil.rmtree(path)                   rm -rf path
Create archive        shutil.make_archive(name, fmt, dir)   zip -r name.zip dir
Extract archive       shutil.unpack_archive(file, dst)      unzip file -d dst
Check disk space      shutil.disk_usage(path)               df path

The key difference from shell scripts: shutil is cross-platform. The same Python code runs correctly on Windows (which uses backslashes and has different permission semantics) and Unix (which uses forward slashes and POSIX permissions) without modification.

shutil.move() works across filesystems. os.rename() does not. Know the difference.

Copying Files: copy, copy2, and copyfile

There are several copy functions in shutil, each with different behavior around metadata and permissions. Choosing the right one prevents subtle bugs when file timestamps or permissions matter.

# shutil_copy.py
import shutil
import os
import stat
import time

# Create a source file with specific content
with open("original.txt", "w") as f:
    f.write("This is the original file content.\n" * 100)

# Wait to make the modification time clearly different
time.sleep(0.1)

# shutil.copy() -- copies content and permissions, NOT timestamps
shutil.copy("original.txt", "copy_basic.txt")

# shutil.copy2() -- copies content, permissions, AND timestamps
shutil.copy2("original.txt", "copy_with_meta.txt")

# shutil.copyfile() -- copies ONLY content, no permissions
shutil.copyfile("original.txt", "copy_content_only.txt")

# Compare timestamps
orig_mtime = os.path.getmtime("original.txt")
basic_mtime = os.path.getmtime("copy_basic.txt")
meta_mtime = os.path.getmtime("copy_with_meta.txt")

print(f"Original mtime:    {orig_mtime:.3f}")
print(f"copy() mtime:      {basic_mtime:.3f} (different -- new file)")
print(f"copy2() mtime:     {meta_mtime:.3f} (same as original: {abs(orig_mtime - meta_mtime) < 0.01})")

# Copy into a directory (destination is a dir, not a file)
os.makedirs("backup_dir", exist_ok=True)
result = shutil.copy2("original.txt", "backup_dir/")  # trailing slash = copy into dir
print(f"\nCopied to directory: {result}")

# Copy with permissions
os.chmod("original.txt", 0o644)
shutil.copy("original.txt", "copy_perms.txt")
print(f"Source perms: {oct(stat.S_IMODE(os.stat('original.txt').st_mode))}")
print(f"copy() perms: {oct(stat.S_IMODE(os.stat('copy_perms.txt').st_mode))}")

# Cleanup
for f in ["original.txt", "copy_basic.txt", "copy_with_meta.txt",
          "copy_content_only.txt", "copy_perms.txt"]:
    os.remove(f)
shutil.rmtree("backup_dir")

Output:

Original mtime:    1745456234.125
copy() mtime:      1745456234.235 (different -- new file)
copy2() mtime:     1745456234.125 (same as original: True)

Copied to directory: backup_dir/original.txt
Source perms: 0o644
copy() perms: 0o644

Use copy2() when you're making a backup and want to preserve when the file was last modified. Use copy() for general copying where timestamps don't matter. Use copyfile() when you only need the bytes and want to skip permission copying entirely. If the destination is a directory (with or without trailing slash), shutil places the file inside that directory using the original filename.

Copying and Deleting Directory Trees

shutil.copytree() recursively copies an entire directory structure. In Python 3.8+, the dirs_exist_ok=True parameter lets you copy into an existing destination directory instead of requiring it to not exist.

# shutil_trees.py
import shutil
import os
from pathlib import Path

# Create a source directory structure
src = Path("project_src")
(src / "src" / "utils").mkdir(parents=True, exist_ok=True)
(src / "tests").mkdir(exist_ok=True)
(src / "src" / "main.py").write_text("# main module\n")
(src / "src" / "utils" / "helpers.py").write_text("# helpers\n")
(src / "tests" / "test_main.py").write_text("# tests\n")
(src / "README.md").write_text("# Project\n")
(src / ".env").write_text("SECRET=abc123\n")  # should be excluded

# Basic copytree
shutil.copytree("project_src", "project_dst")
print("Full copy:")
for p in sorted(Path("project_dst").rglob("*")):
    print(f"  {p.relative_to('project_dst')}")

shutil.rmtree("project_dst")

# copytree with ignore pattern -- exclude .env files and __pycache__
ignore_patterns = shutil.ignore_patterns(".env", "__pycache__", "*.pyc")
shutil.copytree("project_src", "project_dst_clean", ignore=ignore_patterns)
print("\nCopy without .env:")
for p in sorted(Path("project_dst_clean").rglob("*")):
    print(f"  {p.relative_to('project_dst_clean')}")

shutil.rmtree("project_dst_clean")

# copytree with dirs_exist_ok (Python 3.8+)
os.makedirs("project_merge", exist_ok=True)  # destination already exists...
Path("project_merge/existing_file.txt").write_text("existing\n")  # ...and holds a file
shutil.copytree("project_src", "project_merge", dirs_exist_ok=True)
print("\nMerge into existing dir: OK")

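# rmtree with an error handler (handles read-only files on Windows)
def handle_readonly(func, path, exc_info):
    """Clear the read-only flag, then retry the operation that failed."""
    import stat
    os.chmod(path, stat.S_IWRITE)
    func(path)

# The handler only runs if rmtree hits an error.
# Python 3.12 added onexc as the replacement for onerror and deprecated onerror.
shutil.rmtree("project_src", onerror=handle_readonly)
shutil.rmtree("project_merge", onerror=handle_readonly)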
print("Cleanup done")

Output:

Full copy:
  .env
  README.md
  src
  src/main.py
  src/utils
  src/utils/helpers.py
  tests
  tests/test_main.py

Copy without .env:
  README.md
  src
  src/main.py
  src/utils
  src/utils/helpers.py
  tests
  tests/test_main.py

Merge into existing dir: OK
Cleanup done

shutil.ignore_patterns() returns a callable that you pass to copytree(ignore=...). It accepts glob patterns and is the standard way to exclude files like .env, .git, __pycache__, and *.pyc from copies. The handle_readonly handler only takes effect when you pass it to rmtree() as onerror=handle_readonly (Python 3.12 added onexc as the replacement for onerror). The pattern matters on Windows, where files created by Git or certain tools are marked read-only and rmtree() fails on them without it.
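For rules that glob patterns cannot express, copytree() also accepts any callable that takes a directory and the list of names in it and returns the names to skip. The following is a minimal sketch of that idea; ignore_large_files and the 1 KB limit are hypothetical, chosen only for illustration.

```python
import os
import shutil
import tempfile
from pathlib import Path


def ignore_large_files(max_bytes):
    """Build a copytree() ignore callable that skips files above max_bytes.

    Hypothetical helper for illustration; it is not part of shutil.
    """
    def _ignore(directory, names):
        skipped = []
        for name in names:
            full_path = os.path.join(directory, name)
            # Only files are filtered; directories are always descended into.
            if os.path.isfile(full_path) and os.path.getsize(full_path) > max_bytes:
                skipped.append(name)
        return skipped
    return _ignore


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    (src / "nested").mkdir(parents=True)
    (src / "small.txt").write_text("ok\n")
    (src / "nested" / "big.bin").write_bytes(b"\0" * 2048)

    dst = Path(tmp) / "dst"
    shutil.copytree(src, dst, ignore=ignore_large_files(1024))

    # Paths relative to the destination, with forward slashes on every platform
    copied = sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*"))
    print(copied)
```

The nested directory is still created in the destination even though its only file was skipped, because the callable filters names one directory at a time.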

shutil.make_archive() handles zip, tar, gztar, bztar, xztar. Pick your poison.

Creating and Extracting Archives

shutil.make_archive() creates zip archives and tar archives (uncompressed, or compressed with gzip, bzip2, or xz). shutil.unpack_archive() picks the format from the file extension and extracts any supported format. Both work without needing to import zipfile or tarfile directly.

# shutil_archives.py
import shutil
import os
from pathlib import Path

# Create sample project to archive
project = Path("sample_project")
project.mkdir(exist_ok=True)
(project / "main.py").write_text("print('hello')\n")
(project / "data").mkdir(exist_ok=True)
(project / "data" / "config.json").write_text('{"version": "1.0"}\n')

# Supported formats
print("Supported archive formats:")
for fmt, description in shutil.get_archive_formats():
    print(f"  {fmt}: {description}")

# Create archives in different formats
formats = [
    ("project_backup", "zip"),
    ("project_backup", "gztar"),  # .tar.gz
]

for name, fmt in formats:
    archive_path = shutil.make_archive(
        base_name=name,        # archive file name (without extension)
        format=fmt,            # zip, tar, gztar, bztar, xztar
        root_dir=".",          # directory to change to before archiving
        base_dir="sample_project"  # directory to archive
    )
    size_kb = os.path.getsize(archive_path) / 1024
    print(f"Created {archive_path} ({size_kb:.1f} KB)")

# Extract an archive
os.makedirs("extracted", exist_ok=True)
shutil.unpack_archive("project_backup.zip", "extracted")
print("\nExtracted contents:")
for p in sorted(Path("extracted").rglob("*")):
    print(f"  {p.relative_to('extracted')}")

# Check disk usage
usage = shutil.disk_usage(".")
print(f"\nDisk usage:")
print(f"  Total: {usage.total / (1024**3):.1f} GB")
print(f"  Used:  {usage.used / (1024**3):.1f} GB")
print(f"  Free:  {usage.free / (1024**3):.1f} GB")

# Cleanup
shutil.rmtree("sample_project")
shutil.rmtree("extracted")
for ext in [".zip", ".tar.gz"]:
    path = f"project_backup{ext}"
    if os.path.exists(path):
        os.remove(path)

Output:

Supported archive formats:
  bztar: bzip2'ed tar-file
  gztar: gzip'ed tar-file
  tar: uncompressed tar file
  xztar: xz'ed tar-file
  zip: ZIP file

Created project_backup.zip (0.7 KB)
Created project_backup.tar.gz (0.3 KB)

Extracted contents:
  sample_project
  sample_project/data
  sample_project/data/config.json
  sample_project/main.py

Disk usage:
  Total: 465.8 GB
  Used:  112.4 GB
  Free:  353.4 GB

shutil.make_archive() handles all the compression and format details internally. The root_dir/base_dir split controls the archive's internal structure: root_dir is where the archiver "cd"s to before running, and base_dir is the directory to include. This means paths inside the archive will be relative (e.g., sample_project/main.py instead of an absolute path).
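One way to see the effect of root_dir and base_dir is to build the same folder two ways and list the member names with zipfile. This is a small sketch; the archive names with_prefix and flat are arbitrary.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    project = root / "sample_project"
    project.mkdir()
    (project / "main.py").write_text("print('hello')\n")

    # Variant A: root_dir is the parent, base_dir names the folder to include,
    # so every member is stored under "sample_project/".
    with_prefix = shutil.make_archive(
        str(root / "with_prefix"), "zip",
        root_dir=str(root), base_dir="sample_project",
    )

    # Variant B: root_dir is the project folder itself and base_dir is omitted,
    # so members are stored without a top-level folder.
    flat = shutil.make_archive(str(root / "flat"), "zip", root_dir=str(project))

    with zipfile.ZipFile(with_prefix) as zf:
        names_with_prefix = zf.namelist()
    with zipfile.ZipFile(flat) as zf:
        names_flat = zf.namelist()

print(names_with_prefix)
print(names_flat)
```

Variant A is usually the friendlier choice for distribution, since extracting it creates a single folder instead of scattering files into the current directory.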

Real-Life Example: Automated Backup Script

This example builds a backup utility that creates a timestamped archive of a project directory, manages retention (keeps only the last N backups), and reports disk usage.

Three lines of shutil replace fifty lines of shell script. And it runs on Windows.

# backup_manager.py
import shutil
import os
from pathlib import Path
from datetime import datetime

def create_backup(
    source_dir: str | Path,
    backup_root: str | Path,
    format: str = "zip",
    keep_last: int = 5
) -> Path:
    """
    Create a timestamped backup archive of source_dir.
    Keeps only the last 'keep_last' backups.
    Returns the path of the new archive.
    """
    source = Path(source_dir)
    backup_dir = Path(backup_root)
    backup_dir.mkdir(parents=True, exist_ok=True)

    if not source.exists():
        raise FileNotFoundError(f"Source directory not found: {source}")

    # Create timestamped archive name
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    archive_name = f"{source.name}_{timestamp}"

    # Create the archive
    archive_base = backup_dir / archive_name
    archive_path = shutil.make_archive(
        base_name=str(archive_base),
        format=format,
        root_dir=str(source.parent),
        base_dir=source.name
    )
    archive_path = Path(archive_path)
    print(f"  Created: {archive_path.name} ({archive_path.stat().st_size / 1024:.1f} KB)")

    # Enforce retention: remove oldest archives beyond keep_last
    ext = archive_path.name[len(archive_name):]  # real extension for any format, e.g. ".zip" or ".tar.gz"
    existing = sorted(backup_dir.glob(f"{source.name}_*{ext}"))
    if len(existing) > keep_last:
        for old_archive in existing[:-keep_last]:
            old_archive.unlink()
            print(f"  Removed old backup: {old_archive.name}")

    return archive_path

def show_backup_status(backup_root: str | Path) -> None:
    """Print all backups and disk usage."""
    backup_dir = Path(backup_root)
    archives = sorted(backup_dir.glob("*"))
    total_size = sum(f.stat().st_size for f in archives)
    usage = shutil.disk_usage(backup_dir)

    print(f"\nBackup directory: {backup_dir}")
    print(f"Archives ({len(archives)}):")
    for a in archives:
        print(f"  {a.name:40s} {a.stat().st_size / 1024:6.1f} KB")
    print(f"Total backup size: {total_size / 1024:.1f} KB")
    print(f"Free disk space:   {usage.free / (1024**3):.1f} GB")

# Demo
import time

# Create a sample project
source = Path("my_project")
source.mkdir(exist_ok=True)
(source / "app.py").write_text("# application\n" * 50)
(source / "config.yaml").write_text("debug: true\n")

print("=== Creating 3 backups with keep_last=2 ===")
backup_dir = Path("backups")

for i in range(3):
    print(f"\nBackup {i+1}:")
    create_backup("my_project", "backups", format="zip", keep_last=2)
    time.sleep(1)  # ensure different timestamps

show_backup_status("backups")

# Cleanup
shutil.rmtree("my_project")
shutil.rmtree("backups")

Output:

=== Creating 3 backups with keep_last=2 ===

Backup 1:
  Created: my_project_20260424_143501.zip (1.2 KB)

Backup 2:
  Created: my_project_20260424_143502.zip (1.2 KB)

Backup 3:
  Created: my_project_20260424_143503.zip (1.2 KB)
  Removed old backup: my_project_20260424_143501.zip

Backup directory: backups
Archives (2):
  my_project_20260424_143502.zip                  1.2 KB
  my_project_20260424_143503.zip                  1.2 KB
Total backup size: 2.4 KB
Free disk space:   353.4 GB

This backup manager is a compact starting point: the two functions come to under 60 lines. Extend it by sending a notification when disk space drops below a threshold (usage.free < threshold), adding checksums (MD5 or, preferably, SHA-256) to verify archive integrity, supporting incremental backups by only including files changed since the last backup, or scheduling it with cron for automated daily backups.
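Two of those extensions can be sketched with the standard library alone. The helpers file_checksum and has_free_space are hypothetical names, the 100 MiB threshold is arbitrary, and the sketch uses SHA-256 from hashlib rather than MD5.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

MIN_FREE_BYTES = 100 * 1024 * 1024  # illustrative threshold: 100 MiB


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_free_space(path, min_free_bytes):
    """True if the filesystem containing path has at least min_free_bytes free."""
    return shutil.disk_usage(path).free >= min_free_bytes


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data"
    src.mkdir()
    (src / "app.py").write_text("# application\n")
    archive = Path(
        shutil.make_archive(str(Path(tmp) / "data_backup"), "zip", root_dir=str(src))
    )

    checksum = file_checksum(archive)
    # Store the digest next to the archive so it can be re-checked later.
    archive.with_name(archive.name + ".sha256").write_text(checksum + "\n")

    verified = file_checksum(archive) == checksum
    enough_space = has_free_space(tmp, MIN_FREE_BYTES)
    print("verified:", verified, "| enough space:", enough_space)
```

In a real script the stored digest would be compared again before restoring from the archive, and a failed free-space check would trigger whatever notification channel you use.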

Frequently Asked Questions

When should I use copy() vs copy2()?

Use copy2() when the copy should keep the original file's modification time (and other metadata, where the platform allows it). This is essential for backups where you want to track when files were originally changed. Use copy() when the copy's timestamp should reflect when the copy was made (e.g., exporting files for distribution). The difference only matters if you check file modification times in your workflow.

How do I safely use rmtree() without accidentally deleting the wrong directory?

Resolve the path with Path.resolve() before passing it to rmtree(), and check that it lies inside an expected parent directory with path.is_relative_to(expected_parent) (Python 3.9+). Raise an exception rather than relying on assert, since assertions are skipped when Python runs with -O. Never pass user input directly to rmtree() without validation. rmtree() has no dry-run mode, so for extra safety list the files first and review them, and consider a trash library (send2trash) that moves files to the OS trash instead of deleting them permanently.
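A guard along those lines might look like the sketch below. safe_rmtree is a hypothetical wrapper, not a shutil function, and it assumes Python 3.9+ for Path.is_relative_to().

```python
import shutil
import tempfile
from pathlib import Path


def safe_rmtree(target, allowed_root):
    """Delete target only if it is a directory strictly inside allowed_root.

    Hypothetical helper for illustration; requires Python 3.9+.
    """
    target = Path(target).resolve()
    allowed_root = Path(allowed_root).resolve()
    if target == allowed_root or not target.is_relative_to(allowed_root):
        raise ValueError(f"Refusing to delete {target}: not inside {allowed_root}")
    if not target.is_dir():
        raise NotADirectoryError(f"Not a directory: {target}")
    shutil.rmtree(target)


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "workspace"
    (workspace / "build" / "cache").mkdir(parents=True)

    # Inside the allowed root: deleted.
    safe_rmtree(workspace / "build", allowed_root=workspace)
    build_removed = not (workspace / "build").exists()

    # The parent of the allowed root: rejected before anything is touched.
    try:
        safe_rmtree(tmp, allowed_root=workspace)
        rejected = False
    except ValueError:
        rejected = True

    print("build removed:", build_removed, "| outside path rejected:", rejected)
```

Resolving both paths first means that ".." segments and symlinks cannot be used to escape the allowed root.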

Should I use shutil or pathlib for file operations?

pathlib handles path manipulation, reading, writing, and listing directories. shutil handles copying, moving, and archiving. Use them together: construct paths with pathlib.Path, then pass them to shutil functions. For example, shutil.copy(src_path, dst_path) where both are Path objects works perfectly -- shutil accepts both string paths and Path objects.

How does shutil handle large files?

shutil.copy() and shutil.copy2() stream the file in chunks rather than loading it into memory. Since Python 3.8 the default chunk size is 64 KiB (1 MiB on Windows); older versions used 16 KiB. To control the buffer size yourself, call shutil.copyfileobj(src_file, dst_file, length=1024*1024) on open file objects. Since Python 3.8, shutil also uses platform-specific fast-copy system calls where available (os.sendfile() on Linux, for example), which copy in kernel space without userspace buffering and can be noticeably faster for large files.
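As a rough sketch of the copyfileobj() route, the snippet below copies a file through open file objects with an explicit chunk size. The 1 MiB value and the file names are illustrative only.

```python
import shutil
import tempfile
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an illustrative value

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "large.bin"
    dst = Path(tmp) / "large_copy.bin"
    src.write_bytes(b"x" * (3 * CHUNK_SIZE + 123))  # a bit over 3 MiB

    # copyfileobj works on open file objects, not on paths, and copies
    # only the bytes: no permissions or timestamps are transferred.
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        shutil.copyfileobj(fsrc, fdst, length=CHUNK_SIZE)

    sizes_match = src.stat().st_size == dst.stat().st_size
    contents_match = src.read_bytes() == dst.read_bytes()
    print("sizes match:", sizes_match, "| contents match:", contents_match)
```

Because it takes file objects, the same call also works for sources that are not regular files, such as a network response or an in-memory buffer.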

Does shutil.move() work across filesystems?

Yes. shutil.move() first tries os.rename(), which is near-instant on the same filesystem because only the directory entry changes. If that fails because source and destination are on different filesystems, it copies the source to the destination (copy2() for a file, copytree() for a directory) and then removes the original. This makes shutil.move() reliable across filesystems, while os.rename() raises an OSError for cross-filesystem moves.
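A short sketch of shutil.move() in use, showing the two destination forms. The directory and file names are made up for the demo, and everything happens inside a temporary directory, so both moves stay on one filesystem.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "inbox").mkdir()
    (root / "archive").mkdir()
    report = root / "inbox" / "report.txt"
    report.write_text("Monthly report data\n")

    # Destination is an existing directory: the file keeps its name.
    moved_path = Path(shutil.move(str(report), str(root / "archive")))

    # Destination is a full file path: move and rename in one step.
    renamed_path = Path(
        shutil.move(str(moved_path), str(root / "archive" / "report_final.txt"))
    )

    source_gone = not report.exists()
    final_name = renamed_path.name
    final_text = renamed_path.read_text()
    print(final_name, "| source gone:", source_gone)
```

shutil.move() returns the final destination path, which is handy when the destination was given as a directory and you need the full path of the moved file.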

Conclusion

The shutil module covers the high-level file operations most Python scripts need: copy()/copy2() for files, copytree() with ignore_patterns() for directories, move() for cross-filesystem moves, rmtree() for recursive deletion, make_archive()/unpack_archive() for archives, and disk_usage() for storage monitoring. Combine it with pathlib for clean, cross-platform file management scripts that replace dozens of lines of shell script.

The backup manager above is a starting template: extend it with scheduling, checksums, and notification logic to build a more complete automated backup solution.

Official documentation: https://docs.python.org/3/library/shutil.html