Back to Notes

Arrays & Hashing

When to Use

  • Need O(1) lookup (does X exist? how many times?)
  • Reducing O(N²) brute force to O(N) by trading space for time
  • Grouping elements by a shared property (anagrams, frequencies)
  • Finding relationships between elements (complement, difference, pair)

Signal phrases: "two sum", "duplicate", "anagram", "group by", "frequency", "count", "consecutive", "seen before", "unique"


Pattern 1 — Frequency Map

Count occurrences, then query.

from collections import Counter, defaultdict

# Counter (cleanest)
freq = Counter(nums)          # Counter({'a': 3, 'b': 1})
freq['z']                     # 0 if not present — no KeyError

# defaultdict (explicit control)
freq = defaultdict(int)
for n in nums:
    freq[n] += 1

# Manual dict (most explicit)
freq = {}
for n in nums:
    freq[n] = freq.get(n, 0) + 1

Problems: Valid Anagram (Counter(s) == Counter(t)), Group Anagrams, Top K Frequent Elements


Pattern 2 — Complement Lookup (Two Sum)

Store seen elements. For each new element, check if its complement already exists.

# LC 1 — Two Sum — O(N) time, O(N) space
def twoSum(nums, target):
    seen = {}  # value -> index
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []

Key insight: Process left-to-right. By the time you check complement in seen, you've never added the current element yet — so you can't accidentally pair an element with itself.


Pattern 3 — Grouping by Canonical Key

Map elements that share a property to the same key. Most common keys: sorted tuple, character-count tuple.

# LC 49 — Group Anagrams
from collections import defaultdict

def groupAnagrams(strs):
    groups = defaultdict(list)
    for word in strs:
        key = tuple(sorted(word))     # "eat" → ('a','e','t')
        groups[key].append(word)
    return list(groups.values())

# Optimal key — count array avoids O(M log M) sort per word
def groupAnagramsOptimal(strs):
    groups = defaultdict(list)
    for word in strs:
        count = [0] * 26
        for c in word:
            count[ord(c) - ord('a')] += 1
        groups[tuple(count)].append(word)  # hashable
    return list(groups.values())

Pattern 4 — Set Membership + Sequence Head Trick

Build a set for O(1) membership. Only start processing from "heads" of sequences to avoid O(N²).

# LC 128 — Longest Consecutive Sequence — O(N)
def longestConsecutive(nums):
    num_set = set(nums)
    longest = 0

    for n in num_set:
        if n - 1 not in num_set:       # only process sequence heads
            length = 1
            while n + length in num_set:
                length += 1
            longest = max(longest, length)

    return longest

Key insight: n - 1 not in num_set skips elements that are in the middle of a sequence. Without this guard, every element triggers a full count → O(N²) worst case.


Key Problems

ProblemPatternLCOne-liner / Key
Contains DuplicateSet#217len(set(nums)) != len(nums)
Valid AnagramFrequency map#242Counter(s) == Counter(t)
Two SumComplement lookup#1Store index in hashmap
Group AnagramsCanonical key#49tuple(sorted(word))
Top K Frequent ElementsFreq + min-heap#347Counter + heapq size K
Product of Array Except SelfPrefix + suffix#238No division — see [[Prefix Sum]]
Longest Consecutive SequenceSet + head trick#128Skip non-heads
Valid Sudoku3 sets per constraint#36Row/col/box tracking
Encode and Decode StringsLength-prefix#271"4#cats3#dog"

Complexity

OperationAverageWorst
dict / set lookupO(1)O(N) hash collision
Counter(s)O(M)O(M)
Sorted-tuple keyO(M log M) per word
Count-tuple keyO(M) per word

HashSet vs HashMap: Use set for existence check only. Use dict when you need to store a value alongside (index, count, group).


Related

  • [[DSA/DSA Techniques for Interviews/02. Two Pointers]] — pairs with hashing for O(N) pair problems
  • [[DSA/DSA Techniques for Interviews/17. Prefix Sum]] — alternative for range-sum problems
  • [[DSA/Data Structures/Hash Map]] — data structure internals