Back to Notes

Heap & Priority Queue

When to Use

  • "Top K", "Kth largest/smallest", "K most frequent"
  • Need to repeatedly get min or max efficiently
  • Merging multiple sorted streams
  • Median in a dynamic stream
  • Minimize total cost by always combining smallest

Signal phrases: "top K", "Kth largest/smallest", "K frequent", "merge K sorted", "median from stream", "minimum cost to connect"


Array Representation (How Heaps Are Stored)

Heap = complete binary tree stored as array. No gaps → efficient index math.

For element at index i (0-based):

  • Left child: 2i + 1
  • Right child: 2i + 2
  • Parent: (i - 1) // 2

Interviewers ask "implement priority queue from scratch" — these formulas are the only way.


Heapify: O(N) vs O(N log N) — Interview Trap

MethodComplexityWhen
Build from array (heapq.heapify(arr))O(N)Have all elements upfront
Insert one by one (heappush N times)O(N log N)Elements arrive one at a time

Interview tip: Given a static array, heapify first O(N) then poll K times O(K log N) = O(N + K log N). Faster than inserting one by one O(N log N).


Time Complexity Cheat Sheet

OperationComplexityReason
Insert (heappush)O(log N)Bubble up to maintain heap property
Remove min/max (heappop)O(log N)Bubble down new root
Peek (heap[0])O(1)Root always at index 0
SearchO(N)Not a BST — no order between siblings
Build (heapify)O(N)Bottom-up sift is amortized linear
SpaceO(N)Store all elements

Python heapq Basics

import heapq

# Min-heap (default)
heap = [3, 1, 2]
heapq.heapify(heap)        # O(N) in-place
heapq.heappush(heap, 0)    # O(log N)
heapq.heappop(heap)        # O(log N) — returns smallest
heap[0]                    # O(1) peek

# Max-heap: negate values
heapq.heappush(heap, -val)
max_val = -heapq.heappop(heap)

# Tuple heap — sorted by first element
heapq.heappush(heap, (freq, num))   # compare freq first

# heapreplace = pop + push in one operation (faster than separate)
heapq.heapreplace(heap, new_val)

Pattern 1 — Top K Elements

Keep a min-heap of size K. If next element > heap root → replace root. Final heap contains K largest.

import heapq
from collections import Counter

# LC 215 — Kth Largest Element
def findKthLargest(nums, k):
    heap = []
    for n in nums:
        heapq.heappush(heap, n)
        if len(heap) > k:
            heapq.heappop(heap)   # evict smallest
    return heap[0]    # Kth largest = min of top-K

# LC 347 — Top K Frequent Elements
def topKFrequent(nums, k):
    count = Counter(nums)
    heap = []
    for num, freq in count.items():
        heapq.heappush(heap, (freq, num))
        if len(heap) > k:
            heapq.heappop(heap)
    return [num for freq, num in heap]

# LC 973 — K Closest Points to Origin
def kClosest(points, k):
    heap = []
    for x, y in points:
        dist = -(x*x + y*y)     # negate for max-heap behavior
        heapq.heappush(heap, (dist, x, y))
        if len(heap) > k:
            heapq.heappop(heap)
    return [[x, y] for _, x, y in heap]

Time: O(N log K) — heap stays size K.


Pattern 2 — Two Heaps (Median in Data Stream)

Max-heap (left half) + min-heap (right half). Median = top of max-heap or avg of both tops.

import heapq

# LC 295 — Find Median from Data Stream
class MedianFinder:
    def __init__(self):
        self.small = []   # max-heap (negated) — stores smaller half
        self.large = []   # min-heap — stores larger half

    def addNum(self, num):
        heapq.heappush(self.small, -num)
        # Ensure small's max <= large's min
        if self.small and self.large and (-self.small[0] > self.large[0]):
            heapq.heappush(self.large, -heapq.heappop(self.small))
        # Balance sizes — small can have at most 1 extra
        if len(self.small) > len(self.large) + 1:
            heapq.heappush(self.large, -heapq.heappop(self.small))
        if len(self.large) > len(self.small):
            heapq.heappush(self.small, -heapq.heappop(self.large))

    def findMedian(self):
        if len(self.small) > len(self.large):
            return -self.small[0]
        return (-self.small[0] + self.large[0]) / 2

Key invariant: len(small) == len(large) or len(small) == len(large) + 1.


Pattern 3 — Merge K Sorted (Min-Heap with Source Tracking)

Push first element of each list. Pop min, push next from the same list.

import heapq

# LC 23 — Merge K Sorted Lists
def mergeKLists(lists):
    heap = []
    for i, node in enumerate(lists):
        if node:
            heapq.heappush(heap, (node.val, i, node))

    dummy = ListNode(0)
    curr = dummy
    while heap:
        val, i, node = heapq.heappop(heap)
        curr.next = node
        curr = curr.next
        if node.next:
            heapq.heappush(heap, (node.next.val, i, node.next))

    return dummy.next

# For arrays: push (value, list_index, element_index)
def mergeKArrays(arrays):
    result = []
    heap = [(arrays[i][0], i, 0) for i in range(len(arrays)) if arrays[i]]
    heapq.heapify(heap)
    while heap:
        val, i, j = heapq.heappop(heap)
        result.append(val)
        if j + 1 < len(arrays[i]):
            heapq.heappush(heap, (arrays[i][j+1], i, j+1))
    return result

Time: O(N log K) where N = total elements, K = number of lists.


Pattern 4 — Frequency Sort / Reorganize

Count frequencies → max-heap by frequency → greedily place most-frequent first.

import heapq
from collections import Counter

# LC 767 — Reorganize String (no two adjacent chars same)
def reorganizeString(s):
    count = Counter(s)
    max_heap = [(-freq, char) for char, freq in count.items()]
    heapq.heapify(max_heap)

    result = []
    prev_freq, prev_char = 0, ''

    while max_heap or prev_freq < 0:
        if not max_heap:
            return ""    # impossible — prev char can't be placed
        freq, char = heapq.heappop(max_heap)
        result.append(char)
        if prev_freq < 0:
            heapq.heappush(max_heap, (prev_freq, prev_char))
        prev_freq, prev_char = freq + 1, char    # decrement count

    return ''.join(result)

# LC 451 — Sort Characters by Frequency
def frequencySort(s):
    count = Counter(s)
    max_heap = [(-freq, char) for char, freq in count.items()]
    heapq.heapify(max_heap)
    result = []
    while max_heap:
        freq, char = heapq.heappop(max_heap)
        result.append(char * (-freq))
    return ''.join(result)

When NOT to Use a Heap

NeedUse InsteadWhy
Closest value to XSortedList (Python sortedcontainers)Heap only knows absolute min/max
Sorted order frequentlySortedListExtracting all from heap destroys it
Search for elementset / dictHeap search is O(N), not O(1)
Range queriesSegment tree / BITHeap has no range structure

Problems

Top K Pattern

ProblemLCKey
Kth Largest Element#215Min-heap size K
K Closest Points to Origin#973Max-heap size K on dist
Top K Frequent Elements#347Counter + min-heap size K
Top K Frequent Words#692Break ties by lexicographic order
Kth Largest in a Stream#703Maintain heap of size K
Reorganize String#767Max-heap by freq
Frequency Sort#451Max-heap by freq

Merge K Sorted Pattern

ProblemLCKey
Merge K Sorted Lists#23Heap with (val, list_idx, node)
K Pairs with Smallest Sums#373Heap initialized with first column
Kth Smallest in Sorted Matrix#378Heap + row advance

Two Heaps Pattern

ProblemLCKey
Find Median from Data Stream#295Max-heap + min-heap balanced
Sliding Window Median#480Two heaps + lazy removal
Maximize Capital (IPO)#502Two heaps: one by capital, one by profit

Minimum Number Pattern (Connect/Schedule)

ProblemLCKey
Min Cost to Connect Sticks#1167Merge two smallest repeatedly
Meeting Rooms II#253Min-heap of end times
Task Scheduler#621Max-heap + cooldown queue
Minimum Cost to Hire K Workers#857Sort by ratio + min-heap

Related

  • [[DSA/DSA Techniques for Interviews/13. Greedy]] — greedy + heap combos (task scheduler, connect ropes)
  • [[DSA/Data Structures/Trees & BST]] — heap is a complete binary tree
  • [[DSA/DSA Topics]] — index