Intermediate

When you need to repeatedly pull the highest- or lowest-priority item from a collection, a regular sorted list becomes a performance trap. Every time you add an item and re-sort, you’re doing O(n log n) work. A heap structure reduces that to O(log n) per insertion and O(log n) per removal — a massive difference when processing thousands of tasks, events, or jobs. Python’s built-in heapq module provides a min-heap implementation directly on top of ordinary Python lists, with no external dependencies required.

Priority queues show up everywhere in real software: task schedulers process the most urgent job first, Dijkstra’s algorithm uses a heap to find the shortest path, event simulation systems process events by timestamp, and streaming data pipelines use heaps to track the top-N results efficiently. Understanding heapq gives you the foundation for all of these patterns.

In this tutorial, we’ll cover how a heap works internally, how to use heapq.heappush() and heappop() to build a priority queue, how to find the N largest or smallest items efficiently with nlargest() and nsmallest(), how to handle custom priority objects, and a practical task scheduler project that ties it all together.

Quick Example: Priority Queue in 10 Lines

Here’s a minimal working priority queue using heapq:

# heapq_quick.py
import heapq

tasks = []
heapq.heappush(tasks, (3, 'Low priority task'))
heapq.heappush(tasks, (1, 'Critical task'))
heapq.heappush(tasks, (2, 'Medium priority task'))

while tasks:
    priority, task = heapq.heappop(tasks)
    print(f"[Priority {priority}] {task}")

Output:

[Priority 1] Critical task
[Priority 2] Medium priority task
[Priority 3] Low priority task

Tasks always come out in ascending priority order (lowest number = highest priority). The heap maintains this invariant automatically as you push and pop — you never need to sort manually. The tuple format (priority, item) is the standard pattern: Python compares tuples element by element, so the first element determines ordering.

What Is a Heap and How Does heapq Work?

A heap is a binary tree stored as a flat list where every parent node is smaller than or equal to its children. Python’s heapq implements a min-heap — the smallest element is always at index 0. This property lets the module find and remove the minimum in O(1) time and restore the heap property after a push or pop in O(log n) time.

The heap is stored as a regular Python list — there’s no special class or data structure. You just use the heapq functions to maintain the heap invariant on that list. You can inspect the raw list at any time, though the order looks scrambled because the heap is not fully sorted:

# heap_internals.py
import heapq

nums = [5, 3, 8, 1, 4, 2, 7]
heapq.heapify(nums)  # Convert list to heap in-place, O(n)

print("Heap:", nums)   # [1, 3, 2, 5, 4, 8, 7] -- heap-ordered, not sorted
print("Min:", nums[0]) # 1 -- always the smallest

heapq.heappush(nums, 0)
print("After push 0:", nums)
print("New min:", nums[0])  # 0

Output:

Heap: [1, 3, 2, 5, 4, 8, 7]
Min: 1
After push 0: [0, 3, 1, 5, 4, 8, 7, 2]  (positions may vary)
New min: 0

heapq.heapify() converts an existing list into a valid heap in O(n) time — much faster than pushing elements one by one. Use it when you’re starting from a collection you’ve already built.

FunctionDescriptionTime Complexity
heapify(list)Convert list to heap in-placeO(n)
heappush(heap, item)Push item onto heapO(log n)
heappop(heap)Pop smallest itemO(log n)
heappushpop(heap, item)Push then pop (faster than two calls)O(log n)
heapreplace(heap, item)Pop then push (faster than two calls)O(log n)
nlargest(n, iterable)Find n largest itemsO(k log n)
nsmallest(n, iterable)Find n smallest itemsO(k log n)

Finding Top-N Items with nlargest and nsmallest

Two of the most useful heapq functions are nlargest() and nsmallest(). They return the N highest or lowest items from any iterable, using a heap internally to avoid sorting the full dataset. This is significantly faster than sorted(data)[-n:] when n is small relative to the dataset size.

# top_n.py
import heapq

scores = [45, 89, 23, 67, 91, 12, 78, 55, 34, 88, 100, 3, 72]

top3 = heapq.nlargest(3, scores)
bottom3 = heapq.nsmallest(3, scores)

print("Top 3 scores:    ", top3)    # [100, 91, 89]
print("Bottom 3 scores: ", bottom3) # [3, 12, 23]

# Works with any iterable, supports key function
students = [
    {'name': 'Alice', 'gpa': 3.9},
    {'name': 'Bob',   'gpa': 3.5},
    {'name': 'Carol', 'gpa': 3.7},
    {'name': 'Dave',  'gpa': 3.2},
    {'name': 'Eve',   'gpa': 3.8},
]

top2 = heapq.nlargest(2, students, key=lambda s: s['gpa'])
print("Top 2 students:", [s['name'] for s in top2])  # ['Alice', 'Eve']

Output:

Top 3 scores:     [100, 91, 89]
Bottom 3 scores:  [3, 12, 23]
Top 2 students: ['Alice', 'Eve']

The key parameter works just like sorted() — pass any callable that extracts the comparison value. For very small N (like 1), min() and max() are faster. For N close to the length of the dataset, sorted() is faster. For everything in between, nlargest() and nsmallest() are the right tool.

Custom Priority Objects

When items in your priority queue are complex objects (not simple numbers), you need a way to tell the heap how to compare them. The cleanest approach is to push tuples of (priority, counter, item). The counter breaks ties in priority and prevents Python from falling back to comparing the items themselves (which would fail if items don’t support comparison).

# priority_objects.py
import heapq
import itertools

class Task:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority
    def __repr__(self):
        return f"Task({self.name!r}, priority={self.priority})"

class PriorityQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker

    def push(self, item, priority):
        # Lower priority number = processed first
        count = next(self._counter)
        heapq.heappush(self._heap, (priority, count, item))

    def pop(self):
        priority, count, item = heapq.heappop(self._heap)
        return item

    def __len__(self):
        return len(self._heap)

pq = PriorityQueue()
pq.push(Task('Deploy database migration', 2), priority=2)
pq.push(Task('Fix critical security bug', 1), priority=1)
pq.push(Task('Update documentation', 3), priority=3)
pq.push(Task('Restart hung service', 1), priority=1)  # tie on priority

print("Processing tasks:")
while pq:
    print(" -", pq.pop())

Output:

Processing tasks:
 - Task('Fix critical security bug', priority=1)
 - Task('Restart hung service', priority=1)
 - Task('Deploy database migration', priority=2)
 - Task('Update documentation', priority=3)

The itertools.count() counter ensures that when two items share the same priority, they come out in FIFO order (first-in, first-out). Without the counter, Python would try to compare the Task objects directly, which would raise a TypeError since Task doesn’t define __lt__. Using a counter is the idiomatic pattern recommended in the official heapq documentation.

Simulating a Max-Heap

Python’s heapq only supports min-heaps. To simulate a max-heap (where the largest item comes out first), negate the priority values before pushing:

# max_heap.py
import heapq

scores = [50, 80, 30, 95, 60]
max_heap = []

for score in scores:
    heapq.heappush(max_heap, -score)  # negate to invert order

print("Scores in descending order:")
while max_heap:
    score = -heapq.heappop(max_heap)  # un-negate on pop
    print(score, end=' ')
print()

Output:

Scores in descending order:
95 80 60 50 30

Negating integers and floats works cleanly. For strings or complex objects, this trick doesn’t apply directly — use the tuple-with-counter pattern instead and negate the numeric priority component.

Real-Life Example: Job Scheduler with Aging

A realistic priority queue for a job scheduler that implements “aging” — low-priority jobs gradually increase in priority the longer they wait, preventing starvation:

# job_scheduler.py
import heapq
import itertools
import time

class JobScheduler:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()
        self._aging_interval = 5   # seconds before priority boost
        self._aging_boost = 1      # boost amount per interval

    def submit(self, job_name, priority):
        count = next(self._counter)
        submit_time = time.time()
        heapq.heappush(self._heap, (priority, count, job_name, submit_time))
        print(f"  Submitted '{job_name}' with priority {priority}")

    def _effective_priority(self, priority, count, job_name, submit_time):
        """Reduce priority number (boost urgency) based on wait time."""
        wait_seconds = time.time() - submit_time
        boosts = int(wait_seconds / self._aging_interval)
        return max(0, priority - (boosts * self._aging_boost))

    def process_next(self):
        if not self._heap:
            return None
        # Re-heapify with effective priorities
        updated = []
        for entry in self._heap:
            priority, count, name, submit_time = entry
            eff = self._effective_priority(priority, count, name, submit_time)
            updated.append((eff, count, name, submit_time))
        heapq.heapify(updated)
        eff_priority, count, name, submit_time = heapq.heappop(updated)
        self._heap = updated
        return name, eff_priority

# Simulate scheduler
scheduler = JobScheduler()
print("Submitting jobs:")
scheduler.submit("Generate monthly report", priority=5)
scheduler.submit("Fix login page crash", priority=1)
scheduler.submit("Update user profile endpoint", priority=3)
scheduler.submit("Archive old logs", priority=7)

print("\nProcessing in priority order:")
for _ in range(4):
    result = scheduler.process_next()
    if result:
        name, eff_priority = result
        print(f"  Processing: '{name}' (effective priority: {eff_priority})")

Output:

Submitting jobs:
  Submitted 'Generate monthly report' with priority 5
  Submitted 'Fix login page crash' with priority 1
  Submitted 'Update user profile endpoint' with priority 3
  Submitted 'Archive old logs' with priority 7

Processing in priority order:
  Processing: 'Fix login page crash' (effective priority: 1)
  Processing: 'Update user profile endpoint' (effective priority: 3)
  Processing: 'Generate monthly report' (effective priority: 5)
  Processing: 'Archive old logs' (effective priority: 7)

This scheduler can be extended to run in a background thread that continuously calls process_next() and dispatches work to a thread pool. The aging mechanism ensures that a high-priority flood of urgent jobs doesn’t permanently starve low-priority maintenance tasks.

Frequently Asked Questions

When should I use heapq vs sorted()?

Use heapq when you’re repeatedly adding items and pulling the minimum — like a task queue or streaming algorithm. Use sorted() when you need the full collection in order, or when you process everything at once. Heaps shine in incremental “add some, pop the min, add more” workloads. If you’re only sorting once, sorted() is simpler and fast enough.

Is heapq thread-safe?

No. heapq operations on a plain list are not thread-safe. If multiple threads push and pop concurrently, you’ll get race conditions and heap corruption. For thread-safe priority queues, use queue.PriorityQueue from the standard library, which wraps a heap with locking. asyncio.PriorityQueue is the equivalent for async code.

When is nlargest faster than sorted()?

nlargest(k, data) uses an internal heap of size k, making it O(n log k). It’s faster than sorted(data, reverse=True)[:k] (which is O(n log n)) when k is much smaller than n. For a list of 1,000,000 items, finding the top 10 with nlargest is dramatically faster. But if k is close to n, sorted() wins because it can use Timsort’s optimizations.

What is heappushpop and when should I use it?

heapq.heappushpop(heap, item) pushes an item onto the heap and then pops the smallest item, all in one operation. It’s more efficient than calling heappush followed by heappop because it avoids an extra heap-property restoration step. It’s particularly useful for maintaining a fixed-size heap of the N largest items seen so far in a stream.

Does heapq preserve insertion order for equal priorities?

Not by default. If two items have the same priority, the heap may return them in any order. To get FIFO behavior for equal priorities, use the tuple pattern (priority, counter, item) where counter is an incrementing integer from itertools.count(). This guarantees equal-priority items are returned in the order they were inserted.

Conclusion

Python’s heapq module gives you an efficient priority queue with O(log n) push and pop directly on a standard Python list. The key functions are heappush(), heappop(), and heapify() for building and using priority queues, plus nlargest() and nsmallest() for finding the top-N items from any iterable without sorting the whole collection.

The real-life job scheduler shows how to combine heapq with itertools.count() for stable ordering and add aging logic to prevent starvation — patterns directly applicable to production systems. You can extend this to distributed task queues, real-time bid processing, or any system where “most important first” processing matters.

For more details, see the official documentation at docs.python.org/3/library/heapq.html. For thread-safe use cases, also read docs.python.org/3/library/queue.html.