Intermediate

Your application needs to react when a file changes: reload configuration when config.yaml is updated, trigger processing when a new CSV appears in a watched folder, or invalidate a cache when a template file is modified. Polling with a loop and os.stat() works but wastes CPU, misses rapid changes, and is painful to write correctly. There is a better way.

The watchdog library provides cross-platform filesystem monitoring using native OS APIs (inotify on Linux, kqueue on macOS, ReadDirectoryChangesW on Windows) with a clean Python event handler interface. You define what to watch and what to do, and watchdog calls your handler the moment a change happens — no polling, no missed events.

This article covers installing watchdog, creating a basic file watcher, filtering events by file type and path, using the pattern-based PatternMatchingEventHandler, watching multiple directories, debouncing rapid events, and building a real-world auto-processing pipeline. By the end you will have a complete toolkit for reacting to filesystem changes in any Python application.

Python watchdog: Quick Example

Here is a minimal watchdog script that prints a message whenever any file in the current directory changes:

# quick_watch.py
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
import time

class MyHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if not event.is_directory:
            print(f"Modified: {event.src_path}")

    def on_created(self, event):
        if not event.is_directory:
            print(f"Created:  {event.src_path}")

observer = Observer()
observer.schedule(MyHandler(), path=".", recursive=False)
observer.start()

print("Watching current directory. Press Ctrl+C to stop.")
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()
Watching current directory. Press Ctrl+C to stop.
Created:  ./test.txt
Modified: ./test.txt
Modified: ./config.json

The pattern is always: subclass FileSystemEventHandler, override the event methods you care about, create an Observer, schedule the handler with observer.schedule(), and call observer.start(). The observer runs in a background thread; your main thread stays free for other work.

Python watchdog tutorial illustration 1
inotify under the hood. Your CPU says thank you.

What Is watchdog and How Does It Work?

watchdog is a Python library that wraps OS-level filesystem notification APIs to detect file and directory changes in real time. Unlike a polling approach that checks file modification times on a timer, watchdog is event-driven: the OS notifies watchdog the instant a change occurs.

Platform        Watchdog Backend      API Used
Linux           InotifyObserver       inotify
macOS           FSEventsObserver      FSEvents / kqueue
Windows         WindowsApiObserver    ReadDirectoryChangesW
All (fallback)  PollingObserver       stat() polling

When you use from watchdog.observers import Observer, watchdog automatically selects the best backend for your platform. The PollingObserver is the cross-platform fallback and works everywhere, but it uses stat() polling so it is less efficient. The native backends are event-driven and have essentially zero CPU overhead when files are not changing.
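You can confirm which backend was selected on your machine: the Observer name is an alias for the platform-specific class.

```python
from watchdog.observers import Observer

# On Linux this typically prints InotifyObserver; on macOS, FSEventsObserver;
# on Windows, WindowsApiObserver.
print(Observer.__name__)
```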

Installation

pip install watchdog
Successfully installed watchdog-4.0.1

On Linux, watchdog uses the kernel’s inotify API, which caps the number of watches each user can hold. If you watch large directory trees recursively, you may need to raise the limit: echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf && sudo sysctl -p.

Understanding Event Types

watchdog fires different event types depending on what happened. Override the corresponding method in your handler:

# event_types.py
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
import time

class VerboseHandler(FileSystemEventHandler):
    """Override all event methods to see what watchdog reports."""

    def on_created(self, event):
        kind = "directory" if event.is_directory else "file"
        print(f"CREATED  [{kind}]: {event.src_path}")

    def on_deleted(self, event):
        kind = "directory" if event.is_directory else "file"
        print(f"DELETED  [{kind}]: {event.src_path}")

    def on_modified(self, event):
        if not event.is_directory:   # Ignore directory modify events (very noisy)
            print(f"MODIFIED [file]: {event.src_path}")

    def on_moved(self, event):
        print(f"MOVED    : {event.src_path} -> {event.dest_path}")

    def on_any_event(self, event):
        # Called for every event -- useful for logging
        pass   # No-op here so output is not duplicated by the methods above

observer = Observer()
observer.schedule(VerboseHandler(), path="./watched", recursive=True)
observer.start()
print("Watching ./watched recursively...")
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()
Watching ./watched recursively...
CREATED  [file]: watched/report.csv
MODIFIED [file]: watched/report.csv
CREATED  [file]: watched/subdir/data.json
DELETED  [file]: watched/old_file.txt
MOVED    : watched/draft.txt -> watched/final.txt

Directory modification events fire whenever a file inside the directory is created or deleted (because the directory’s metadata changes). They are almost always noise for file-processing use cases, which is why filtering with if not event.is_directory is the standard pattern. The on_moved event includes both event.src_path (original path) and event.dest_path (new path), making it easy to handle renames.

Filtering by File Pattern with PatternMatchingEventHandler

Instead of writing if/else logic in your handler to filter file types, use PatternMatchingEventHandler. It handles pattern filtering for you and only calls your methods for files that match.

# pattern_handler.py
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler
import time

class CSVHandler(PatternMatchingEventHandler):
    def __init__(self):
        super().__init__(
            patterns=["*.csv", "*.tsv"],      # Only watch CSV and TSV files
            ignore_patterns=["*.tmp", "~*"],  # Skip temp files
            ignore_directories=True,          # Skip directory events
            case_sensitive=False              # Case-insensitive on all platforms
        )

    def on_created(self, event):
        print(f"New data file: {event.src_path}")
        self.process_file(event.src_path)

    def on_modified(self, event):
        print(f"Updated data file: {event.src_path}")
        self.process_file(event.src_path)

    def process_file(self, filepath):
        """Simulate processing the file."""
        print(f"  -> Processing {filepath}...")

observer = Observer()
observer.schedule(CSVHandler(), path="./data_inbox", recursive=False)
observer.start()
print("Watching ./data_inbox for CSV/TSV files...")
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()
Watching ./data_inbox for CSV/TSV files...
New data file: ./data_inbox/sales_april.csv
  -> Processing ./data_inbox/sales_april.csv...
New data file: ./data_inbox/inventory.tsv
  -> Processing ./data_inbox/inventory.tsv...

PatternMatchingEventHandler uses Unix shell-style wildcards (*, ?, [seq]) via Python’s fnmatch module. The ignore_patterns parameter prevents processing of temporary files that editors create while saving (many editors write to a ~file.txt or file.txt.tmp before renaming to the final name, which would otherwise trigger your handler multiple times per save).
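You can verify the matching rules directly with the fnmatch module (fnmatchcase is used here so the results do not depend on the platform’s case rules):

```python
from fnmatch import fnmatchcase

# The same wildcard rules PatternMatchingEventHandler applies:
# * matches anything, ? matches one character, [seq] matches a character set.
print(fnmatchcase("report.csv", "*.csv"))             # True
print(fnmatchcase("report.csv.tmp", "*.csv"))         # False (suffix must match)
print(fnmatchcase("~report.csv", "~*"))               # True  (editor temp file)
print(fnmatchcase("data_2024.csv", "data_????.csv"))  # True  (? = one character)
```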

Python watchdog tutorial illustration 2
ignore_patterns=["*.tmp"] — because editors are messy and so are you.

Debouncing Rapid Events

A common problem: when a user saves a large file, watchdog may fire on_modified several times in rapid succession as the OS writes data in chunks. If your handler does expensive work (like reloading a config or triggering a build), you want to process it only once after the writes settle down.

# debounce_handler.py
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler
import threading
import time

class DebouncedHandler(PatternMatchingEventHandler):
    def __init__(self, debounce_seconds=0.5):
        super().__init__(patterns=["*.json", "*.yaml"],
                         ignore_directories=True)
        self.debounce_seconds = debounce_seconds
        self._timers = {}   # path -> threading.Timer

    def on_any_event(self, event):
        if event.is_directory:
            return
        path = event.src_path
        # Cancel existing timer for this path (reset the debounce window)
        if path in self._timers:
            self._timers[path].cancel()
        # Start a new timer
        timer = threading.Timer(
            self.debounce_seconds,
            self._process,
            args=[path, event.event_type]
        )
        timer.start()
        self._timers[path] = timer

    def _process(self, path, event_type):
        """Called once after rapid events settle down."""
        print(f"[{event_type}] Processing settled: {path}")
        self._timers.pop(path, None)   # pop() avoids a KeyError if a newer timer replaced this entry

observer = Observer()
observer.schedule(DebouncedHandler(debounce_seconds=0.5), path="./config", recursive=False)
observer.start()
print("Watching ./config with 500ms debounce...")
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()
Watching ./config with 500ms debounce...
[modified] Processing settled: ./config/settings.json

The debounce pattern uses threading.Timer to delay execution. Each time an event fires for the same path, the existing timer is cancelled and a new one is started. Only after the events stop for 500ms (no new events in the debounce window) does _process() actually run. This ensures your handler runs exactly once per “save session” regardless of how many OS-level write events the save generates.
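The technique is not watchdog-specific. A minimal standalone sketch (the Debouncer class and the 0.2-second window here are illustrative, not part of watchdog):

```python
import threading
import time

class Debouncer:
    """Run a callback once, debounce_seconds after the last call to trigger()."""
    def __init__(self, debounce_seconds, callback):
        self.debounce_seconds = debounce_seconds
        self.callback = callback
        self._timer = None

    def trigger(self, *args):
        if self._timer is not None:
            self._timer.cancel()          # Reset the debounce window
        self._timer = threading.Timer(self.debounce_seconds, self.callback, args=args)
        self._timer.start()

calls = []
d = Debouncer(0.2, calls.append)
for _ in range(5):                        # Five rapid triggers...
    d.trigger("settings.json")
    time.sleep(0.01)
time.sleep(0.5)
print(calls)                              # ...one callback: ['settings.json']
```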

Real-Life Example: Auto-Processing Data Inbox

Here is a production-style script that watches an inbox folder for new CSV files, processes them, and moves them to a processed archive — a common pattern in data pipeline and ETL workflows.

# data_inbox_processor.py
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler
import os
import shutil
import time
import threading
from datetime import datetime

INBOX_DIR  = "./inbox"
DONE_DIR   = "./processed"
ERROR_DIR  = "./errors"

# Create directories if they don't exist
for d in [INBOX_DIR, DONE_DIR, ERROR_DIR]:
    os.makedirs(d, exist_ok=True)

class InboxHandler(PatternMatchingEventHandler):
    def __init__(self):
        super().__init__(
            patterns=["*.csv"],
            ignore_patterns=["*.tmp", ".~*"],
            ignore_directories=True
        )
        # Debounce: wait 1 second after last event before processing
        self._pending = {}

    def on_created(self, event):
        self._schedule(event.src_path)

    def on_moved(self, event):
        # File was renamed/moved into the watched folder
        if event.dest_path.endswith(".csv"):
            self._schedule(event.dest_path)

    def _schedule(self, path):
        if path in self._pending:
            self._pending[path].cancel()
        t = threading.Timer(1.0, self._process, args=[path])
        t.start()
        self._pending[path] = t

    def _process(self, filepath):
        """Process a single CSV file."""
        self._pending.pop(filepath, None)
        if not os.path.exists(filepath):
            return  # File disappeared before we could process it

        filename = os.path.basename(filepath)
        print(f"[{datetime.now():%H:%M:%S}] Processing: {filename}")
        try:
            row_count = self._count_rows(filepath)
            print(f"  Rows: {row_count}")

            # Archive to processed folder with timestamp
            ts = datetime.now().strftime("%Y%m%d_%H%M%S")
            dest = os.path.join(DONE_DIR, f"{ts}_{filename}")
            shutil.move(filepath, dest)
            print(f"  Archived to: {dest}")

        except Exception as e:
            print(f"  ERROR: {e}")
            dest = os.path.join(ERROR_DIR, filename)
            shutil.move(filepath, dest)
            print(f"  Moved to errors: {dest}")

    def _count_rows(self, filepath):
        """Count non-header rows in a CSV file."""
        with open(filepath, "r") as f:
            return sum(1 for line in f) - 1   # Subtract header row

if __name__ == "__main__":
    observer = Observer()
    observer.schedule(InboxHandler(), path=INBOX_DIR, recursive=False)
    observer.start()
    print("Data inbox processor running.")
    print(f"  Watching: {os.path.abspath(INBOX_DIR)}")
    print(f"  Archive : {os.path.abspath(DONE_DIR)}")
    print(f"  Errors  : {os.path.abspath(ERROR_DIR)}")
    print("Drop CSV files into the inbox folder to process them.")
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
    print("Processor stopped.")
Data inbox processor running.
  Watching: /home/user/project/inbox
  Archive : /home/user/project/processed
  Errors  : /home/user/project/errors
Drop CSV files into the inbox folder to process them.
[10:15:33] Processing: sales_data.csv
  Rows: 847
  Archived to: processed/20260430_101534_sales_data.csv

This is the pattern used in ETL pipelines, document processing systems, and data ingestion services. The 1-second debounce handles editors that write temp files before the final rename, the on_moved override catches files copied from another location, and the error directory ensures bad files are preserved for investigation rather than silently dropped.

Python watchdog tutorial illustration 3
Filesystem events are free. Your polling loop costs you every second.

Frequently Asked Questions

What does recursive=True do and when should I use it?

When recursive=True is passed to observer.schedule(), watchdog monitors all subdirectories under the watched path, not just the top-level directory. Use it when files may arrive in subdirectories. Be careful with very large directory trees — on Linux, each subdirectory consumes an inotify watch slot. For deep trees with thousands of subdirectories, increase the inotify limit or use a more targeted path.

When should I use PollingObserver instead of Observer?

Use PollingObserver when watching network-mounted filesystems (NFS, SMB/CIFS, Docker volume mounts) because inotify and FSEvents do not receive events for remote changes. Import it with from watchdog.observers.polling import PollingObserver. It polls every second by default; pass a timeout argument to change the interval. Also use it when running in Docker containers where the native filesystem events may not propagate from the host.
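A sketch of swapping in the polling backend (the watched path here is a placeholder; in practice it would be your network mount):

```python
from watchdog.observers.polling import PollingObserver
from watchdog.events import FileSystemEventHandler

class ShareHandler(FileSystemEventHandler):
    def on_modified(self, event):
        print(f"Modified: {event.src_path}")

# timeout is the polling interval in seconds (the default is 1)
observer = PollingObserver(timeout=5)
observer.schedule(ShareHandler(), path=".", recursive=True)  # e.g. an NFS mount
observer.start()
# ... main thread does other work ...
observer.stop()
observer.join()
```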

Does watchdog follow symlinks?

No, watchdog watches the path as given and does not follow symlinks by default. If you watch a symlink, you watch the link itself, not the target directory. To watch the real path, resolve the symlink first with os.path.realpath(path) before passing it to observer.schedule().
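Resolving a symlink before scheduling, as a short sketch (the paths are temporary stand-ins created just for the example):

```python
import os
import tempfile

target = tempfile.mkdtemp()                      # the real directory
link = os.path.join(tempfile.mkdtemp(), "data")  # a symlink pointing at it
os.symlink(target, link)

# Resolve the link so the observer watches the real directory
watch_path = os.path.realpath(link)
print(watch_path == os.path.realpath(target))    # True
```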

Can I watch multiple directories with one observer?

Yes. Call observer.schedule() multiple times with different paths and handlers. One Observer can manage dozens of watches efficiently. You can also use the same handler for multiple paths. The event.src_path attribute always tells you which file triggered the event, so you can differentiate between watched directories in a single handler if needed.
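A sketch of one observer managing several watches (the directory names are illustrative, created under a temp root so the example is self-contained):

```python
import os
import tempfile
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class LogHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        # src_path tells you which watched tree the event came from
        print(f"{event.event_type}: {event.src_path}")

base = tempfile.mkdtemp()
paths = [os.path.join(base, name) for name in ("inbox", "config", "templates")]
for p in paths:
    os.makedirs(p)

handler = LogHandler()
observer = Observer()
for p in paths:                      # one observer, one handler, three watches
    observer.schedule(handler, path=p, recursive=False)
observer.start()
time.sleep(0.1)
observer.stop()
observer.join()
```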

Why does my handler fire twice when I save a file in a text editor?

Most text editors use an atomic save pattern: write to a temporary file, then rename it to the final name. This generates two events: a created event for the temp file and a moved event for the rename. Some editors also fire an extra modified event. Use debouncing (as shown above) or filter by filename to ignore temp files (ignore_patterns=["*.tmp", ".~*", "#*"]). The debounce approach is more robust because temp file naming conventions vary by editor.

Conclusion

watchdog gives you efficient, event-driven filesystem monitoring using the native OS APIs on every platform. You have seen how to subclass FileSystemEventHandler for full control, use PatternMatchingEventHandler for clean file-type filtering, implement debouncing to handle rapid successive events, and build a complete data inbox pipeline that watches, processes, archives, and handles errors. The official watchdog documentation at python-watchdog.readthedocs.io covers advanced topics like custom observers and event queue management.