Intermediate
You have a list of tasks where some must run before others — database migrations before app startup, CSS before JavaScript, package A before package B. Keeping track of these dependencies manually is a recipe for mysterious bugs and circular import nightmares. What you really need is a topological sort: an ordering of nodes in a directed graph such that every dependency comes before the thing that depends on it.
Python 3.9 added the graphlib module to the standard library, giving you a clean, built-in way to solve exactly this problem. No third-party libraries, no reimplementing Kahn’s algorithm from scratch — just import TopologicalSorter and describe your dependency graph.
In this article, we will cover how graphlib.TopologicalSorter works, how to build and traverse dependency graphs, how to detect circular dependencies, and how to parallelize independent tasks. By the end you will have a solid understanding of topological sorting and a reusable pattern for dependency-driven task execution.
Python graphlib: Quick Example
Here is the fastest path from zero to a sorted dependency order using graphlib:
# quick_graphlib.py
from graphlib import TopologicalSorter
# Define a dependency graph: key depends on all values
graph = {
"build": {"test"},
"test": {"lint", "compile"},
"lint": set(),
"compile": {"lint"},
}
sorter = TopologicalSorter(graph)
order = list(sorter.static_order())
print("Build order:", order)
Output:
Build order: ['lint', 'compile', 'test', 'build']
The TopologicalSorter takes a dictionary where each key depends on its value set. static_order() returns a single valid execution order — no parallelism, just a flat sequence. This is the simplest mode and is perfect when you want a deterministic list.
The sections below cover more advanced use: dynamic mode for parallel scheduling, cycle detection, and real-world patterns for build systems and package managers.
What Is graphlib and Why Use It?
Topological sorting solves the question: “given a set of tasks with dependencies, in what order should I run them so every dependency is satisfied before the task that needs it?” It is named after topology because it works on directed acyclic graphs (DAGs) — graphs with no cycles.
Think of it like a university course planner. You cannot take Algorithms without taking Data Structures first, and you cannot take Data Structures without completing Introduction to Programming. A topological sort finds the valid sequence of courses to take.
| Approach | Pros | Cons |
|---|---|---|
| Manual ordering | Simple for small graphs | Error-prone, doesn’t scale |
| Roll your own DFS/BFS | Full control | 30+ lines, subtle bugs |
graphlib.TopologicalSorter | Standard library, clean API, cycle detection | Python 3.9+ only |
| NetworkX | Full graph algorithms | Third-party dependency, heavy |
For most dependency resolution use cases, graphlib hits the sweet spot: it is built in, readable, and handles cycle detection out of the box.
Building Dependency Graphs
The TopologicalSorter accepts a dictionary where each key is a node and each value is a set (or any iterable) of nodes that the key depends on. You can also build the graph incrementally using the add() method, which is useful when you discover dependencies at runtime.
# build_graph.py
from graphlib import TopologicalSorter
sorter = TopologicalSorter()
# Add nodes one at a time: add(node, *dependencies)
sorter.add("deploy", "build", "test")
sorter.add("build", "compile", "lint")
sorter.add("compile", "setup")
sorter.add("lint", "setup")
sorter.add("test", "build")
sorter.add("setup") # no dependencies
order = list(sorter.static_order())
print("Deployment pipeline:", order)
Output:
Deployment pipeline: ['setup', 'compile', 'lint', 'build', 'test', 'deploy']
The add(node, *deps) signature is convenient when you are reading dependency data from a config file or database. You do not need to construct the full dictionary upfront — just call add() as you discover each dependency relationship.

Detecting Circular Dependencies
A topological sort is only possible on a directed acyclic graph — if you have a cycle (A depends on B depends on A), there is no valid ordering. graphlib raises a CycleError when it detects one, giving you the list of nodes involved in the cycle.
# cycle_detection.py
from graphlib import TopologicalSorter, CycleError
graph = {
"A": {"B"},
"B": {"C"},
"C": {"A"}, # creates a cycle: A -> B -> C -> A
}
sorter = TopologicalSorter(graph)
try:
order = list(sorter.static_order())
print("Order:", order)
except CycleError as e:
print(f"Circular dependency detected: {e}")
print(f"Cycle involves: {e.args[1]}")
Output:
Circular dependency detected: ('nodes are in a cycle', ['A', 'C', 'B', 'A'])
Cycle involves: ['A', 'C', 'B', 'A']
The CycleError exception includes the list of nodes forming the cycle as e.args[1]. This makes it straightforward to surface a useful error message to the user — “Task A and Task C have a circular dependency” is far more actionable than a generic crash.
Dynamic Mode for Parallel Execution
The static_order() method gives you one valid sequence, but it is inherently serial. If some tasks can run in parallel (because they have no dependency between them), you are leaving performance on the table. graphlib provides a dynamic mode for exactly this use case.
In dynamic mode, you call prepare() first, then iterate using get_ready() and done(). The sorter yields batches of tasks that are currently ready to execute (all their dependencies have been marked done), allowing you to dispatch them concurrently.
# parallel_sorter.py
from graphlib import TopologicalSorter
import concurrent.futures
import time
def run_task(task_name):
"""Simulate a task with a short delay."""
print(f" Running: {task_name}")
time.sleep(0.1)
return task_name
graph = {
"deploy": {"build", "test"},
"build": {"lint", "compile"},
"test": {"compile"},
"compile": {"setup"},
"lint": {"setup"},
"setup": set(),
}
sorter = TopologicalSorter(graph)
sorter.prepare()
with concurrent.futures.ThreadPoolExecutor() as pool:
futures = {}
while sorter.is_active():
# Get all tasks whose dependencies are done
for task in sorter.get_ready():
future = pool.submit(run_task, task)
futures[future] = task
# Wait for any task to finish, then mark it done
done_futures, _ = concurrent.futures.wait(
futures, return_when=concurrent.futures.FIRST_COMPLETED
)
for future in done_futures:
task = futures.pop(future)
sorter.done(task)
print(f" Completed: {task}")
print("All tasks finished.")
Output:
Running: setup
Completed: setup
Running: lint
Running: compile
Completed: lint
Completed: compile
Running: build
Running: test
Completed: test
Completed: build
Running: deploy
Completed: deploy
All tasks finished.
The dynamic mode correctly dispatches lint and compile in parallel once setup finishes, then build and test in parallel once compile finishes. You get maximum parallelism automatically based on the graph structure.

Real-Life Example: Package Dependency Resolver
Let’s build a simplified package dependency resolver — the kind of logic that sits at the heart of pip, npm, and apt. Given a list of packages with their dependencies, it outputs a valid installation order and detects any circular dependencies.
# package_resolver.py
from graphlib import TopologicalSorter, CycleError
from dataclasses import dataclass, field
from typing import Dict, Set, List
@dataclass
class Package:
name: str
version: str
requires: Set[str] = field(default_factory=set)
def resolve_install_order(packages: List[Package]) -> List[str]:
"""
Given a list of packages, return a valid installation order
that satisfies all dependencies.
Raises CycleError if circular dependencies are detected.
Raises ValueError if an unknown package is referenced.
"""
known = {pkg.name for pkg in packages}
# Validate: check all dependencies exist in the package list
for pkg in packages:
for dep in pkg.requires:
if dep not in known:
raise ValueError(
f"Package '{pkg.name}' depends on '{dep}', "
f"which is not in the package list"
)
# Build graph: each package depends on its requires set
graph: Dict[str, Set[str]] = {
pkg.name: pkg.requires for pkg in packages
}
try:
sorter = TopologicalSorter(graph)
return list(sorter.static_order())
except CycleError as e:
cycle_nodes = e.args[1]
raise CycleError(
f"Circular dependency detected involving: "
f"{' -> '.join(cycle_nodes)}"
)
# --- Example usage ---
packages = [
Package("myapp", "1.0.0", requires={"requests", "sqlalchemy"}),
Package("requests", "2.31.0", requires={"urllib3", "certifi"}),
Package("sqlalchemy","2.0.0", requires={"greenlet"}),
Package("urllib3", "2.0.7", requires=set()),
Package("certifi", "2024.2", requires=set()),
Package("greenlet", "3.0.0", requires=set()),
]
order = resolve_install_order(packages)
print("Installation order:")
for i, pkg in enumerate(order, 1):
print(f" {i}. {pkg}")
Output:
Installation order:
1. urllib3
2. certifi
3. greenlet
4. requests
5. sqlalchemy
6. myapp
This resolver validates that all referenced packages exist, builds the dependency graph from the package metadata, and uses TopologicalSorter to produce a valid installation order. The same pattern applies to build systems, CI pipeline stages, database migration runners, and microservice startup orchestrators. Extend it by adding version conflict detection or by switching to dynamic mode for parallel installs.

Frequently Asked Questions
What Python version does graphlib require?
The graphlib module was added in Python 3.9. If you are on Python 3.8 or earlier, you will need to either upgrade or implement topological sorting manually (using DFS or Kahn’s algorithm). On Python 3.9+, it is part of the standard library with no installation needed.
When should I use static_order() vs dynamic mode?
Use static_order() when you need a simple serial execution order and do not care about parallelism — it is one line and very readable. Use dynamic mode (prepare() + get_ready() + done()) when you want to run independent tasks concurrently. Dynamic mode requires more code but gives you maximum throughput on multi-core systems.
Are there multiple valid topological orders?
Yes. For most graphs there are several valid orderings. For example, if A and B both depend only on C, you can run A before B or B before A — both are correct. graphlib does not guarantee which valid order it returns, so do not write code that relies on a specific tie-breaking behavior. If you need a deterministic order among independent nodes, sort the node names before adding them to the graph.
How do I handle CycleError gracefully?
Catch graphlib.CycleError and read e.args[1] to get the list of nodes forming the cycle. Format this into a human-readable error message that tells the user which tasks are in a circular dependency. Never swallow the exception silently — a circular dependency is a configuration bug that must be fixed, not ignored.
Does graphlib scale to large graphs?
Yes. TopologicalSorter uses Kahn’s algorithm internally, which runs in O(V + E) time where V is the number of nodes and E is the number of edges. This scales linearly and handles graphs with thousands of nodes efficiently. For extremely large graphs (millions of nodes), consider a dedicated graph library like NetworkX, but for typical build system and dependency resolution use cases, graphlib is more than adequate.
Conclusion
The graphlib module gives you a clean, standard-library solution to dependency resolution. We covered TopologicalSorter with both static_order() for simple serial ordering and dynamic mode for parallel task scheduling. We used CycleError to catch circular dependencies before they cause subtle runtime bugs, and built a complete package dependency resolver as a real-world example.
Try extending the package resolver to detect version conflicts, or wire the parallel scheduler to asyncio tasks instead of threads. The dynamic mode API maps naturally to async coroutines — just replace the ThreadPoolExecutor with asyncio.gather.
For the full API reference, see the Python graphlib documentation.