Heap & Priority Queue

When to Use

"Top K", "Kth largest/smallest", "K most frequent"
Need to repeatedly get min or max efficiently
Merging multiple sorted streams
Median in a dynamic stream
Minimize total cost by always combining smallest

Signal phrases: "top K", "Kth largest/smallest", "K frequent", "merge K sorted", "median from stream", "minimum cost to connect"

Array Representation (How Heaps Are Stored)

Heap = complete binary tree stored as array. No gaps → efficient index math.

For element at index i (0-based):

Left child: 2i + 1
Right child: 2i + 2
Parent: (i - 1) // 2

Interviewers ask "implement priority queue from scratch" — these formulas are the only way.

Heapify: O(N) vs O(N log N) — Interview Trap

Method	Complexity	When
Build from array (`heapq.heapify(arr)`)	O(N)	Have all elements upfront
Insert one by one (`heappush` N times)	O(N log N)	Elements arrive one at a time

Interview tip: Given a static array, heapify first O(N) then poll K times O(K log N) = O(N + K log N). Faster than inserting one by one O(N log N).

Time Complexity Cheat Sheet

Operation	Complexity	Reason
Insert (`heappush`)	O(log N)	Bubble up to maintain heap property
Remove min/max (`heappop`)	O(log N)	Bubble down new root
Peek (`heap[0]`)	O(1)	Root always at index 0
Search	O(N)	Not a BST — no order between siblings
Build (`heapify`)	O(N)	Bottom-up sift is amortized linear
Space	O(N)	Store all elements

Python heapq Basics

import heapq

# Min-heap (default)
heap = [3, 1, 2]
heapq.heapify(heap)        # O(N) in-place
heapq.heappush(heap, 0)    # O(log N)
heapq.heappop(heap)        # O(log N) — returns smallest
heap[0]                    # O(1) peek

# Max-heap: negate values
heapq.heappush(heap, -val)
max_val = -heapq.heappop(heap)

# Tuple heap — sorted by first element
heapq.heappush(heap, (freq, num))   # compare freq first

# heapreplace = pop + push in one operation (faster than separate)
heapq.heapreplace(heap, new_val)

Pattern 1 — Top K Elements

Keep a min-heap of size K. If next element > heap root → replace root. Final heap contains K largest.

import heapq
from collections import Counter

# LC 215 — Kth Largest Element
def findKthLargest(nums, k):
    heap = []
    for n in nums:
        heapq.heappush(heap, n)
        if len(heap) > k:
            heapq.heappop(heap)   # evict smallest
    return heap[0]    # Kth largest = min of top-K

# LC 347 — Top K Frequent Elements
def topKFrequent(nums, k):
    count = Counter(nums)
    heap = []
    for num, freq in count.items():
        heapq.heappush(heap, (freq, num))
        if len(heap) > k:
            heapq.heappop(heap)
    return [num for freq, num in heap]

# LC 973 — K Closest Points to Origin
def kClosest(points, k):
    heap = []
    for x, y in points:
        dist = -(x*x + y*y)     # negate for max-heap behavior
        heapq.heappush(heap, (dist, x, y))
        if len(heap) > k:
            heapq.heappop(heap)
    return [[x, y] for _, x, y in heap]

Time: O(N log K) — heap stays size K.

Pattern 2 — Two Heaps (Median in Data Stream)

Max-heap (left half) + min-heap (right half). Median = top of max-heap or avg of both tops.

import heapq

# LC 295 — Find Median from Data Stream
class MedianFinder:
    def __init__(self):
        self.small = []   # max-heap (negated) — stores smaller half
        self.large = []   # min-heap — stores larger half

    def addNum(self, num):
        heapq.heappush(self.small, -num)
        # Ensure small's max <= large's min
        if self.small and self.large and (-self.small[0] > self.large[0]):
            heapq.heappush(self.large, -heapq.heappop(self.small))
        # Balance sizes — small can have at most 1 extra
        if len(self.small) > len(self.large) + 1:
            heapq.heappush(self.large, -heapq.heappop(self.small))
        if len(self.large) > len(self.small):
            heapq.heappush(self.small, -heapq.heappop(self.large))

    def findMedian(self):
        if len(self.small) > len(self.large):
            return -self.small[0]
        return (-self.small[0] + self.large[0]) / 2

Key invariant: len(small) == len(large) or len(small) == len(large) + 1.

Pattern 3 — Merge K Sorted (Min-Heap with Source Tracking)

Push first element of each list. Pop min, push next from the same list.

import heapq

# LC 23 — Merge K Sorted Lists
def mergeKLists(lists):
    heap = []
    for i, node in enumerate(lists):
        if node:
            heapq.heappush(heap, (node.val, i, node))

    dummy = ListNode(0)
    curr = dummy
    while heap:
        val, i, node = heapq.heappop(heap)
        curr.next = node
        curr = curr.next
        if node.next:
            heapq.heappush(heap, (node.next.val, i, node.next))

    return dummy.next

# For arrays: push (value, list_index, element_index)
def mergeKArrays(arrays):
    result = []
    heap = [(arrays[i][0], i, 0) for i in range(len(arrays)) if arrays[i]]
    heapq.heapify(heap)
    while heap:
        val, i, j = heapq.heappop(heap)
        result.append(val)
        if j + 1 < len(arrays[i]):
            heapq.heappush(heap, (arrays[i][j+1], i, j+1))
    return result

Time: O(N log K) where N = total elements, K = number of lists.

Pattern 4 — Frequency Sort / Reorganize

Count frequencies → max-heap by frequency → greedily place most-frequent first.

import heapq
from collections import Counter

# LC 767 — Reorganize String (no two adjacent chars same)
def reorganizeString(s):
    count = Counter(s)
    max_heap = [(-freq, char) for char, freq in count.items()]
    heapq.heapify(max_heap)

    result = []
    prev_freq, prev_char = 0, ''

    while max_heap or prev_freq < 0:
        if not max_heap:
            return ""    # impossible — prev char can't be placed
        freq, char = heapq.heappop(max_heap)
        result.append(char)
        if prev_freq < 0:
            heapq.heappush(max_heap, (prev_freq, prev_char))
        prev_freq, prev_char = freq + 1, char    # decrement count

    return ''.join(result)

# LC 451 — Sort Characters by Frequency
def frequencySort(s):
    count = Counter(s)
    max_heap = [(-freq, char) for char, freq in count.items()]
    heapq.heapify(max_heap)
    result = []
    while max_heap:
        freq, char = heapq.heappop(max_heap)
        result.append(char * (-freq))
    return ''.join(result)

When NOT to Use a Heap

Need	Use Instead	Why
Closest value to X	`SortedList` (Python `sortedcontainers`)	Heap only knows absolute min/max
Sorted order frequently	`SortedList`	Extracting all from heap destroys it
Search for element	`set` / `dict`	Heap search is O(N), not O(1)
Range queries	Segment tree / BIT	Heap has no range structure

Problems

Top K Pattern

Problem	LC	Key
Kth Largest Element	#215	Min-heap size K
K Closest Points to Origin	#973	Max-heap size K on dist
Top K Frequent Elements	#347	Counter + min-heap size K
Top K Frequent Words	#692	Break ties by lexicographic order
Kth Largest in a Stream	#703	Maintain heap of size K
Reorganize String	#767	Max-heap by freq
Frequency Sort	#451	Max-heap by freq

Merge K Sorted Pattern

Problem	LC	Key
Merge K Sorted Lists	#23	Heap with (val, list_idx, node)
K Pairs with Smallest Sums	#373	Heap initialized with first column
Kth Smallest in Sorted Matrix	#378	Heap + row advance

Two Heaps Pattern

Problem	LC	Key
Find Median from Data Stream	#295	Max-heap + min-heap balanced
Sliding Window Median	#480	Two heaps + lazy removal
Maximize Capital (IPO)	#502	Two heaps: one by capital, one by profit

Minimum Number Pattern (Connect/Schedule)

Problem	LC	Key
Min Cost to Connect Sticks	#1167	Merge two smallest repeatedly
Meeting Rooms II	#253	Min-heap of end times
Task Scheduler	#621	Max-heap + cooldown queue
Minimum Cost to Hire K Workers	#857	Sort by ratio + min-heap

[[DSA/DSA Techniques for Interviews/13. Greedy]] — greedy + heap combos (task scheduler, connect ropes)
[[DSA/Data Structures/Trees & BST]] — heap is a complete binary tree
[[DSA/DSA Topics]] — index