Heap & Priority Queue
When to Use
- "Top K", "Kth largest/smallest", "K most frequent"
- Need to repeatedly get min or max efficiently
- Merging multiple sorted streams
- Median in a dynamic stream
- Minimize total cost by always combining smallest
Signal phrases: "top K", "Kth largest/smallest", "K frequent", "merge K sorted", "median from stream", "minimum cost to connect"
Array Representation (How Heaps Are Stored)
Heap = complete binary tree stored as array. No gaps → efficient index math.
For element at index i (0-based):
- Left child:
2i + 1 - Right child:
2i + 2 - Parent:
(i - 1) // 2
Interviewers ask "implement priority queue from scratch" — these formulas are the only way.
Heapify: O(N) vs O(N log N) — Interview Trap
| Method | Complexity | When |
|---|---|---|
Build from array (heapq.heapify(arr)) | O(N) | Have all elements upfront |
Insert one by one (heappush N times) | O(N log N) | Elements arrive one at a time |
Interview tip: Given a static array, heapify first O(N) then poll K times O(K log N) = O(N + K log N). Faster than inserting one by one O(N log N).
Time Complexity Cheat Sheet
| Operation | Complexity | Reason |
|---|---|---|
Insert (heappush) | O(log N) | Bubble up to maintain heap property |
Remove min/max (heappop) | O(log N) | Bubble down new root |
Peek (heap[0]) | O(1) | Root always at index 0 |
| Search | O(N) | Not a BST — no order between siblings |
Build (heapify) | O(N) | Bottom-up sift is amortized linear |
| Space | O(N) | Store all elements |
Python heapq Basics
import heapq
# Min-heap (default)
heap = [3, 1, 2]
heapq.heapify(heap) # O(N) in-place
heapq.heappush(heap, 0) # O(log N)
heapq.heappop(heap) # O(log N) — returns smallest
heap[0] # O(1) peek
# Max-heap: negate values
heapq.heappush(heap, -val)
max_val = -heapq.heappop(heap)
# Tuple heap — sorted by first element
heapq.heappush(heap, (freq, num)) # compare freq first
# heapreplace = pop + push in one operation (faster than separate)
heapq.heapreplace(heap, new_val)
Pattern 1 — Top K Elements
Keep a min-heap of size K. If next element > heap root → replace root. Final heap contains K largest.
import heapq
from collections import Counter
# LC 215 — Kth Largest Element
def findKthLargest(nums, k):
heap = []
for n in nums:
heapq.heappush(heap, n)
if len(heap) > k:
heapq.heappop(heap) # evict smallest
return heap[0] # Kth largest = min of top-K
# LC 347 — Top K Frequent Elements
def topKFrequent(nums, k):
count = Counter(nums)
heap = []
for num, freq in count.items():
heapq.heappush(heap, (freq, num))
if len(heap) > k:
heapq.heappop(heap)
return [num for freq, num in heap]
# LC 973 — K Closest Points to Origin
def kClosest(points, k):
heap = []
for x, y in points:
dist = -(x*x + y*y) # negate for max-heap behavior
heapq.heappush(heap, (dist, x, y))
if len(heap) > k:
heapq.heappop(heap)
return [[x, y] for _, x, y in heap]
Time: O(N log K) — heap stays size K.
Pattern 2 — Two Heaps (Median in Data Stream)
Max-heap (left half) + min-heap (right half). Median = top of max-heap or avg of both tops.
import heapq
# LC 295 — Find Median from Data Stream
class MedianFinder:
def __init__(self):
self.small = [] # max-heap (negated) — stores smaller half
self.large = [] # min-heap — stores larger half
def addNum(self, num):
heapq.heappush(self.small, -num)
# Ensure small's max <= large's min
if self.small and self.large and (-self.small[0] > self.large[0]):
heapq.heappush(self.large, -heapq.heappop(self.small))
# Balance sizes — small can have at most 1 extra
if len(self.small) > len(self.large) + 1:
heapq.heappush(self.large, -heapq.heappop(self.small))
if len(self.large) > len(self.small):
heapq.heappush(self.small, -heapq.heappop(self.large))
def findMedian(self):
if len(self.small) > len(self.large):
return -self.small[0]
return (-self.small[0] + self.large[0]) / 2
Key invariant: len(small) == len(large) or len(small) == len(large) + 1.
Pattern 3 — Merge K Sorted (Min-Heap with Source Tracking)
Push first element of each list. Pop min, push next from the same list.
import heapq
# LC 23 — Merge K Sorted Lists
def mergeKLists(lists):
heap = []
for i, node in enumerate(lists):
if node:
heapq.heappush(heap, (node.val, i, node))
dummy = ListNode(0)
curr = dummy
while heap:
val, i, node = heapq.heappop(heap)
curr.next = node
curr = curr.next
if node.next:
heapq.heappush(heap, (node.next.val, i, node.next))
return dummy.next
# For arrays: push (value, list_index, element_index)
def mergeKArrays(arrays):
result = []
heap = [(arrays[i][0], i, 0) for i in range(len(arrays)) if arrays[i]]
heapq.heapify(heap)
while heap:
val, i, j = heapq.heappop(heap)
result.append(val)
if j + 1 < len(arrays[i]):
heapq.heappush(heap, (arrays[i][j+1], i, j+1))
return result
Time: O(N log K) where N = total elements, K = number of lists.
Pattern 4 — Frequency Sort / Reorganize
Count frequencies → max-heap by frequency → greedily place most-frequent first.
import heapq
from collections import Counter
# LC 767 — Reorganize String (no two adjacent chars same)
def reorganizeString(s):
count = Counter(s)
max_heap = [(-freq, char) for char, freq in count.items()]
heapq.heapify(max_heap)
result = []
prev_freq, prev_char = 0, ''
while max_heap or prev_freq < 0:
if not max_heap:
return "" # impossible — prev char can't be placed
freq, char = heapq.heappop(max_heap)
result.append(char)
if prev_freq < 0:
heapq.heappush(max_heap, (prev_freq, prev_char))
prev_freq, prev_char = freq + 1, char # decrement count
return ''.join(result)
# LC 451 — Sort Characters by Frequency
def frequencySort(s):
count = Counter(s)
max_heap = [(-freq, char) for char, freq in count.items()]
heapq.heapify(max_heap)
result = []
while max_heap:
freq, char = heapq.heappop(max_heap)
result.append(char * (-freq))
return ''.join(result)
When NOT to Use a Heap
| Need | Use Instead | Why |
|---|---|---|
| Closest value to X | SortedList (Python sortedcontainers) | Heap only knows absolute min/max |
| Sorted order frequently | SortedList | Extracting all from heap destroys it |
| Search for element | set / dict | Heap search is O(N), not O(1) |
| Range queries | Segment tree / BIT | Heap has no range structure |
Problems
Top K Pattern
| Problem | LC | Key |
|---|---|---|
| Kth Largest Element | #215 | Min-heap size K |
| K Closest Points to Origin | #973 | Max-heap size K on dist |
| Top K Frequent Elements | #347 | Counter + min-heap size K |
| Top K Frequent Words | #692 | Break ties by lexicographic order |
| Kth Largest in a Stream | #703 | Maintain heap of size K |
| Reorganize String | #767 | Max-heap by freq |
| Frequency Sort | #451 | Max-heap by freq |
Merge K Sorted Pattern
| Problem | LC | Key |
|---|---|---|
| Merge K Sorted Lists | #23 | Heap with (val, list_idx, node) |
| K Pairs with Smallest Sums | #373 | Heap initialized with first column |
| Kth Smallest in Sorted Matrix | #378 | Heap + row advance |
Two Heaps Pattern
| Problem | LC | Key |
|---|---|---|
| Find Median from Data Stream | #295 | Max-heap + min-heap balanced |
| Sliding Window Median | #480 | Two heaps + lazy removal |
| Maximize Capital (IPO) | #502 | Two heaps: one by capital, one by profit |
Minimum Number Pattern (Connect/Schedule)
| Problem | LC | Key |
|---|---|---|
| Min Cost to Connect Sticks | #1167 | Merge two smallest repeatedly |
| Meeting Rooms II | #253 | Min-heap of end times |
| Task Scheduler | #621 | Max-heap + cooldown queue |
| Minimum Cost to Hire K Workers | #857 | Sort by ratio + min-heap |
Related
- [[DSA/DSA Techniques for Interviews/13. Greedy]] — greedy + heap combos (task scheduler, connect ropes)
- [[DSA/Data Structures/Trees & BST]] — heap is a complete binary tree
- [[DSA/DSA Topics]] — index