Hash Map / Set
Use a hash set for O(1) existence checks. Detect duplicates, find the first missing element, validate uniqueness constraints, or mark visited nodes without sorting.
LeetCode 217 gives you an array of integers and asks whether any value appears at least twice. This is the most direct application of a hash set: as you iterate through the array, check if you have seen the current element before. If yes, return true. If not, add it to the set and keep going.
def containsDuplicate(nums):
seen = set()
for num in nums:
if num in seen:
return True
seen.add(num)
return FalseThis is almost trivially simple, but it illustrates a principle that powers dozens of harder problems.
The reason hash sets matter so much in algorithm design comes down to one fact: checking whether an element exists in a hash set takes average time. Compare this to the alternatives:
This is why so many interview problems that look like they require nested loops can be solved in a single pass with the right data structure. The moment you find yourself writing a nested loop where the inner loop is just searching for something, ask yourself: can I replace that inner loop with a set or map lookup?
LeetCode 128 asks you to find the length of the longest sequence of consecutive integers in an unsorted array. For example, given [100, 4, 200, 1, 3, 2], the longest consecutive sequence is [1, 2, 3, 4], so the answer is 4.
A sorting-based approach would work in , but the problem specifically asks for . The key insight is both simple and clever: only start counting from the beginning of a sequence.
Put all numbers into a set. Then, for each number, check if num - 1 exists in the set. If it does, this number is not the start of a sequence, so skip it. If num - 1 does not exist, then this number is the start of a consecutive sequence, and you count forward: check for num + 1, num + 2, and so on.
First, build the set: {100, 4, 200, 1, 3, 2}.
Longest sequence: 4.
def longestConsecutive(nums):
num_set = set(nums)
best = 0
for num in num_set:
if num - 1 not in num_set:
current = num
length = 1
while current + 1 in num_set:
current += 1
length += 1
best = max(best, length)
return bestAt first glance, the while loop inside the for loop looks like it could be . But look more carefully: the while loop only runs when a number is a sequence start, and each number in the array is visited by the while loop at most once across the entire execution. If a number is part of a sequence of length 5, it is visited once as part of that expansion, never again. Every element is touched at most twice total (once by the for loop, once by a while loop), so the overall time is .
Consider LeetCode 202: a happy number is defined by the following process: starting with any positive integer, replace the number by the sum of the squares of its digits, and repeat until the number either equals 1 (in which case it is happy) or loops endlessly in a cycle (in which case it is not). For example, 19 is happy: , , , .
The hash set approach is straightforward: keep a set of numbers you have already seen. At each step, compute the next number in the sequence. If it is 1, return true. If it is already in the set, you have entered a cycle, so return false. Otherwise, add it to the set and continue.
def isHappy(n):
seen = set()
while n != 1:
if n in seen:
return False
seen.add(n)
n = sum(int(d) ** 2 for d in str(n))
return TrueAn alternative approach uses Floyd's cycle detection (fast/slow pointers) to detect the cycle without extra space, but the set-based solution is simpler and perfectly efficient for this problem since the numbers converge to a small range quickly.
Beyond simple membership testing, sets support powerful bulk operations that are useful in specific problem types:
a & b): elements in both sets. Useful for finding common elements between two collections, such as common characters or shared values.a | b): elements in either set. Useful for merging collections while automatically removing duplicates.a - b): elements in a but not in b. Useful for finding missing elements or filtering out known values.These operations typically run in for intersection and for union and difference. They are especially handy in problems involving multiple arrays or groups where you need to compare membership across collections.
Both data structures give you fast lookup, but they have different tradeoffs:
In most interview problems involving pure existence checks or deduplication, a hash set is the right choice. Reach for sorted structures when the problem requires ordering, closest-value queries, or range-based operations.
Some problems ask whether two sequences have the same structure. That is, whether you can map each distinct element of one sequence to a distinct element of the other while preserving order. The core technique is to establish a bijection using two hash maps: one mapping from sequence A to sequence B, and one from B to A. If at any point a mapping contradicts an earlier one, the sequences are not isomorphic.
This pattern covers problems like Isomorphic Strings (LC 205), Word Pattern (LC 290), and Find and Replace Pattern (LC 890). In each case, the question reduces to: do the two sequences share the same "shape"?
Given s = "egg" and t = "add":
e -> a and a -> e. No conflicts, record both mappings.g -> d and d -> g. No conflicts, record both.g already maps to d, and t[2] is d. Consistent. Return true.Now consider s = "foo" and t = "bar":
f -> b, b -> f. Fine.o -> a, a -> o. Fine.o already maps to a, but t[2] is r. Contradiction. Return false.def isIsomorphic(s, t):
map_s_to_t = {}
map_t_to_s = {}
for c1, c2 in zip(s, t):
if c1 in map_s_to_t and map_s_to_t[c1] != c2:
return False
if c2 in map_t_to_s and map_t_to_s[c2] != c1:
return False
map_s_to_t[c1] = c2
map_t_to_s[c2] = c1
return TrueTwo maps are necessary because a single map only enforces one direction of the mapping. Without the reverse map, s = "ab" and t = "aa" would incorrectly pass since both a -> a and b -> a are individually fine, but the mapping is not a bijection.
When a problem asks you to find common elements or exclusive elements between two arrays, sets give you the cleanest and most efficient solution. Convert each array to a set, then apply set intersection (&) or difference (-).
This pattern applies to Intersection of Two Arrays (LC 349), Intersection of Two Arrays II (LC 350), and Find the Difference of Two Arrays (LC 2215). For LC 349, you need unique common elements. For LC 350, you need common elements with multiplicity (requiring a counter/map instead of a plain set). For LC 2215, you need elements exclusive to each array.
def intersection(nums1, nums2):
return list(set(nums1) & set(nums2))That is the entire solution. Converting to sets removes duplicates and gives membership checks. The intersection operator walks the smaller set and checks each element against the larger one, running in .
For LC 350 where duplicates matter, use a hash map (counter) instead:
from collections import Counter
def intersect(nums1, nums2):
counts = Counter(nums1)
result = []
for num in nums2:
if counts[num] > 0:
result.append(num)
counts[num] -= 1
return resultThe basic Contains Duplicate check asks "does any value repeat?" The follow-up problems add constraints that require more nuanced data structures.
Now you must determine if there are two distinct indices i and j such that nums[i] == nums[j] and abs(i - j) <= k. The key idea: maintain a sliding window set of size at most k. As you advance through the array, add the current element to the set and remove the element that just left the window.
def containsNearbyDuplicate(nums, k):
window = set()
for i, num in enumerate(nums):
if num in window:
return True
window.add(num)
if len(window) > k:
window.remove(nums[i - k])
return FalseThe set never holds more than k + 1 elements, so each operation is and the total time is .
This variant asks if there exist indices i and j with abs(i - j) <= k and abs(nums[i] - nums[j]) <= t. A plain set cannot answer range queries efficiently. Two approaches work:
t. This gives time.t + 1. Two numbers within range t must fall in the same bucket or adjacent buckets. This gives time by maintaining a hash map of bucket ID to value over the sliding window.The bucket approach is elegant because it converts a range query into a constant number of hash map lookups.
Some problems generate a sequence of states and ask you to detect when the sequence starts repeating. The pattern is straightforward: serialize each state, store it in a set, and check membership before adding. Once you see a repeated state, you have found the cycle.
This applies to Prison Cells After N Days (LC 957) and Fraction to Recurring Decimal (LC 166), among others. The cycle length lets you skip ahead using modular arithmetic instead of simulating every step.
When computing long division of, say, 1/6, you track the remainder at each step. If the remainder repeats, the decimal repeats from that point:
Result: 0.1(6).
def fractionToDecimal(numerator, denominator):
if numerator % denominator == 0:
return str(numerator // denominator)
sign = "-" if (numerator < 0) ^ (denominator < 0) else ""
numerator, denominator = abs(numerator), abs(denominator)
integer_part = numerator // denominator
remainder = numerator % denominator
decimal = []
remainder_map = {}
while remainder and remainder not in remainder_map:
remainder_map[remainder] = len(decimal)
remainder *= 10
decimal.append(str(remainder // denominator))
remainder %= denominator
if remainder:
idx = remainder_map[remainder]
decimal.insert(idx, "(")
decimal.append(")")
return f"{sign}{integer_part}.{''.join(decimal)}"The hash map records which position in the decimal expansion each remainder first appeared, so you know exactly where the repeating block starts.
When a problem asks you to group, count, or deduplicate strings that are "equivalent" under some transformation, the technique is canonicalization: transform each string into a standard form, then use a set or map keyed by that canonical form.
Unique Morse Code Words (LC 804) asks how many distinct Morse representations exist for a list of words. Each word maps to a Morse string; the canonical form is simply the concatenated Morse codes:
def uniqueMorseRepresentations(words):
morse = [".-","-...","-.-.","-..",".","..-.","--.","....",
"..",".---","-.-",".-..","--","-.","---",".--.",
"--.-",".-.","...","-","..-","...-",".--","-..-",
"-.--","--.."]
return len(set(
"".join(morse[ord(c) - ord('a')] for c in word)
for word in words
))For Find and Replace Pattern (LC 890), the canonical form is the "shape" of a word: the sequence of first-occurrence indices. For example, "abb" becomes (0, 1, 1) and "mee" also becomes (0, 1, 1), so they match. This is essentially the same bijection idea from the isomorphic strings pattern, but expressed as a single canonical key instead of a pair of maps:
def findAndReplacePattern(words, pattern):
def canonicalize(word):
mapping = {}
return tuple(mapping.setdefault(c, len(mapping)) for c in word)
target = canonicalize(pattern)
return [w for w in words if canonicalize(w) == target]For all the problems covered here:
The hash set is one of the simplest data structures, but its impact on algorithm design is enormous. Whenever a brute-force solution involves searching for elements inside a loop, a set or map can almost always eliminate that inner search and drop the complexity by a factor of n.
You're probably looking at hash set lookup / deduplication when:
Common templates:
x when x - 1 is absent. Example: Longest Consecutive Sequence.Practice Problems (108)