Intermediate

You have a list of tasks where some must run before others — database migrations before app startup, CSS before JavaScript, package A before package B. Keeping track of these dependencies manually is a recipe for mysterious bugs and circular import nightmares. What you really need is a topological sort: an ordering of nodes in a directed graph such that every dependency comes before the thing that depends on it.

Python 3.9 added the graphlib module to the standard library, giving you a clean, built-in way to solve exactly this problem. No third-party libraries, no reimplementing Kahn’s algorithm from scratch — just import TopologicalSorter and describe your dependency graph.

In this article, we will cover how graphlib.TopologicalSorter works, how to build and traverse dependency graphs, how to detect circular dependencies, and how to parallelize independent tasks. By the end you will have a solid understanding of topological sorting and a reusable pattern for dependency-driven task execution.

Python graphlib: Quick Example

Here is the fastest path from zero to a sorted dependency order using graphlib:

# quick_graphlib.py
from graphlib import TopologicalSorter

# Define a dependency graph: key depends on all values
graph = {
    "build": {"test"},
    "test": {"lint", "compile"},
    "lint": set(),
    "compile": {"lint"},
}

sorter = TopologicalSorter(graph)
order = list(sorter.static_order())
print("Build order:", order)

Output:

Build order: ['lint', 'compile', 'test', 'build']

The TopologicalSorter takes a dictionary where each key depends on its value set. static_order() returns a single valid execution order — no parallelism, just a flat sequence. This is the simplest mode and is perfect when you want a deterministic list.

The sections below cover more advanced use: dynamic mode for parallel scheduling, cycle detection, and real-world patterns for build systems and package managers.

What Is graphlib and Why Use It?

Topological sorting solves the question: “given a set of tasks with dependencies, in what order should I run them so every dependency is satisfied before the task that needs it?” It is named after topology because it works on directed acyclic graphs (DAGs) — graphs with no cycles.

Think of it like a university course planner. You cannot take Algorithms without taking Data Structures first, and you cannot take Data Structures without completing Introduction to Programming. A topological sort finds the valid sequence of courses to take.

Approach	Pros	Cons
Manual ordering	Simple for small graphs	Error-prone, doesn’t scale
Roll your own DFS/BFS	Full control	30+ lines, subtle bugs
`graphlib.TopologicalSorter`	Standard library, clean API, cycle detection	Python 3.9+ only
NetworkX	Full graph algorithms	Third-party dependency, heavy

For most dependency resolution use cases, graphlib hits the sweet spot: it is built in, readable, and handles cycle detection out of the box.

Building Dependency Graphs

The TopologicalSorter accepts a dictionary where each key is a node and each value is a set (or any iterable) of nodes that the key depends on. You can also build the graph incrementally using the add() method, which is useful when you discover dependencies at runtime.

# build_graph.py
from graphlib import TopologicalSorter

sorter = TopologicalSorter()

# Add nodes one at a time: add(node, *dependencies)
sorter.add("deploy", "build", "test")
sorter.add("build", "compile", "lint")
sorter.add("compile", "setup")
sorter.add("lint", "setup")
sorter.add("test", "build")
sorter.add("setup")  # no dependencies

order = list(sorter.static_order())
print("Deployment pipeline:", order)

Output:

Deployment pipeline: ['setup', 'compile', 'lint', 'build', 'test', 'deploy']

The add(node, *deps) signature is convenient when you are reading dependency data from a config file or database. You do not need to construct the full dictionary upfront — just call add() as you discover each dependency relationship.

Dependency graphs: when the order matters and your build system needs to know it.

Detecting Circular Dependencies

A topological sort is only possible on a directed acyclic graph — if you have a cycle (A depends on B depends on A), there is no valid ordering. graphlib raises a CycleError when it detects one, giving you the list of nodes involved in the cycle.

# cycle_detection.py
from graphlib import TopologicalSorter, CycleError

graph = {
    "A": {"B"},
    "B": {"C"},
    "C": {"A"},  # creates a cycle: A -> B -> C -> A
}

sorter = TopologicalSorter(graph)

try:
    order = list(sorter.static_order())
    print("Order:", order)
except CycleError as e:
    print(f"Circular dependency detected: {e}")
    print(f"Cycle involves: {e.args[1]}")

Output:

Circular dependency detected: ('nodes are in a cycle', ['A', 'C', 'B', 'A'])
Cycle involves: ['A', 'C', 'B', 'A']

The CycleError exception includes the list of nodes forming the cycle as e.args[1]. This makes it straightforward to surface a useful error message to the user — “Task A and Task C have a circular dependency” is far more actionable than a generic crash.

Dynamic Mode for Parallel Execution

The static_order() method gives you one valid sequence, but it is inherently serial. If some tasks can run in parallel (because they have no dependency between them), you are leaving performance on the table. graphlib provides a dynamic mode for exactly this use case.

In dynamic mode, you call prepare() first, then iterate using get_ready() and done(). The sorter yields batches of tasks that are currently ready to execute (all their dependencies have been marked done), allowing you to dispatch them concurrently.

# parallel_sorter.py
from graphlib import TopologicalSorter
import concurrent.futures
import time

def run_task(task_name):
    """Simulate a task with a short delay."""
    print(f"  Running: {task_name}")
    time.sleep(0.1)
    return task_name

graph = {
    "deploy": {"build", "test"},
    "build": {"lint", "compile"},
    "test": {"compile"},
    "compile": {"setup"},
    "lint": {"setup"},
    "setup": set(),
}

sorter = TopologicalSorter(graph)
sorter.prepare()

with concurrent.futures.ThreadPoolExecutor() as pool:
    futures = {}
    while sorter.is_active():
        # Get all tasks whose dependencies are done
        for task in sorter.get_ready():
            future = pool.submit(run_task, task)
            futures[future] = task

        # Wait for any task to finish, then mark it done
        done_futures, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED
        )
        for future in done_futures:
            task = futures.pop(future)
            sorter.done(task)
            print(f"  Completed: {task}")

print("All tasks finished.")

Output:

  Running: setup
  Completed: setup
  Running: lint
  Running: compile
  Completed: lint
  Completed: compile
  Running: build
  Running: test
  Completed: test
  Completed: build
  Running: deploy
  Completed: deploy
All tasks finished.

The dynamic mode correctly dispatches lint and compile in parallel once setup finishes, then build and test in parallel once compile finishes. You get maximum parallelism automatically based on the graph structure.

get_ready() hands you independent tasks in batches. ThreadPoolExecutor handles the rest.

Real-Life Example: Package Dependency Resolver

Let’s build a simplified package dependency resolver — the kind of logic that sits at the heart of pip, npm, and apt. Given a list of packages with their dependencies, it outputs a valid installation order and detects any circular dependencies.

# package_resolver.py
from graphlib import TopologicalSorter, CycleError
from dataclasses import dataclass, field
from typing import Dict, Set, List

@dataclass
class Package:
    name: str
    version: str
    requires: Set[str] = field(default_factory=set)

def resolve_install_order(packages: List[Package]) -> List[str]:
    """
    Given a list of packages, return a valid installation order
    that satisfies all dependencies.

    Raises CycleError if circular dependencies are detected.
    Raises ValueError if an unknown package is referenced.
    """
    known = {pkg.name for pkg in packages}

    # Validate: check all dependencies exist in the package list
    for pkg in packages:
        for dep in pkg.requires:
            if dep not in known:
                raise ValueError(
                    f"Package '{pkg.name}' depends on '{dep}', "
                    f"which is not in the package list"
                )

    # Build graph: each package depends on its requires set
    graph: Dict[str, Set[str]] = {
        pkg.name: pkg.requires for pkg in packages
    }

    try:
        sorter = TopologicalSorter(graph)
        return list(sorter.static_order())
    except CycleError as e:
        cycle_nodes = e.args[1]
        raise CycleError(
            f"Circular dependency detected involving: "
            f"{' -> '.join(cycle_nodes)}"
        )

# --- Example usage ---
packages = [
    Package("myapp",     "1.0.0", requires={"requests", "sqlalchemy"}),
    Package("requests",  "2.31.0", requires={"urllib3", "certifi"}),
    Package("sqlalchemy","2.0.0",  requires={"greenlet"}),
    Package("urllib3",   "2.0.7",  requires=set()),
    Package("certifi",   "2024.2", requires=set()),
    Package("greenlet",  "3.0.0",  requires=set()),
]

order = resolve_install_order(packages)
print("Installation order:")
for i, pkg in enumerate(order, 1):
    print(f"  {i}. {pkg}")

Output:

Installation order:
  1. urllib3
  2. certifi
  3. greenlet
  4. requests
  5. sqlalchemy
  6. myapp

This resolver validates that all referenced packages exist, builds the dependency graph from the package metadata, and uses TopologicalSorter to produce a valid installation order. The same pattern applies to build systems, CI pipeline stages, database migration runners, and microservice startup orchestrators. Extend it by adding version conflict detection or by switching to dynamic mode for parallel installs.

Dependency resolution: urllib3 before requests before myapp. Every time.

Frequently Asked Questions

What Python version does graphlib require?

The graphlib module was added in Python 3.9. If you are on Python 3.8 or earlier, you will need to either upgrade or implement topological sorting manually (using DFS or Kahn’s algorithm). On Python 3.9+, it is part of the standard library with no installation needed.

When should I use static_order() vs dynamic mode?

Use static_order() when you need a simple serial execution order and do not care about parallelism — it is one line and very readable. Use dynamic mode (prepare() + get_ready() + done()) when you want to run independent tasks concurrently. Dynamic mode requires more code but gives you maximum throughput on multi-core systems.

Are there multiple valid topological orders?

Yes. For most graphs there are several valid orderings. For example, if A and B both depend only on C, you can run A before B or B before A — both are correct. graphlib does not guarantee which valid order it returns, so do not write code that relies on a specific tie-breaking behavior. If you need a deterministic order among independent nodes, sort the node names before adding them to the graph.

How do I handle CycleError gracefully?

Catch graphlib.CycleError and read e.args[1] to get the list of nodes forming the cycle. Format this into a human-readable error message that tells the user which tasks are in a circular dependency. Never swallow the exception silently — a circular dependency is a configuration bug that must be fixed, not ignored.

Does graphlib scale to large graphs?

Yes. TopologicalSorter uses Kahn’s algorithm internally, which runs in O(V + E) time where V is the number of nodes and E is the number of edges. This scales linearly and handles graphs with thousands of nodes efficiently. For extremely large graphs (millions of nodes), consider a dedicated graph library like NetworkX, but for typical build system and dependency resolution use cases, graphlib is more than adequate.

Conclusion

The graphlib module gives you a clean, standard-library solution to dependency resolution. We covered TopologicalSorter with both static_order() for simple serial ordering and dynamic mode for parallel task scheduling. We used CycleError to catch circular dependencies before they cause subtle runtime bugs, and built a complete package dependency resolver as a real-world example.

Try extending the package resolver to detect version conflicts, or wire the parallel scheduler to asyncio tasks instead of threads. The dynamic mode API maps naturally to async coroutines — just replace the ThreadPoolExecutor with asyncio.gather.

For the full API reference, see the Python graphlib documentation.

Post Views: 3

How To Use Python graphlib for Dependency Resolution

Python graphlib: Quick Example

What Is graphlib and Why Use It?

Building Dependency Graphs

Detecting Circular Dependencies

Dynamic Mode for Parallel Execution

Real-Life Example: Package Dependency Resolver

Frequently Asked Questions

What Python version does graphlib require?

When should I use static_order() vs dynamic mode?

Are there multiple valid topological orders?

How do I handle CycleError gracefully?

Does graphlib scale to large graphs?

Conclusion

Submit a Comment Cancel reply

How To Use Python graphlib for Dependency Resolution

Python graphlib: Quick Example

What Is graphlib and Why Use It?

Building Dependency Graphs

Detecting Circular Dependencies

Dynamic Mode for Parallel Execution

Real-Life Example: Package Dependency Resolver

Frequently Asked Questions

What Python version does graphlib require?

When should I use static_order() vs dynamic mode?

Are there multiple valid topological orders?

How do I handle CycleError gracefully?

Does graphlib scale to large graphs?

Conclusion

Related Articles

Submit a Comment Cancel reply