Hash Map / Set

General Lookup / Deduplication

Use a hash set for O(1) existence checks. Detect duplicates, find the first missing element, validate uniqueness constraints, or mark visited nodes without sorting.

Hash Set Lookup and Deduplication

Contains Duplicate: The Simplest Hash Set Problem

LeetCode 217 gives you an array of integers and asks whether any value appears at least twice. This is the most direct application of a hash set: as you iterate through the array, check if you have seen the current element before. If yes, return true. If not, add it to the set and keep going.

1

2

3

4

5

6

7

def containsDuplicate(nums):
    seen = set()
    for num in nums:
        if num in seen:
            return True
        seen.add(num)
    return False

This is almost trivially simple, but it illustrates a principle that powers dozens of harder problems.

Why $O(1)$ Lookup Changes Everything

The reason hash sets matter so much in algorithm design comes down to one fact: checking whether an element exists in a hash set takes $O(1)$ average time. Compare this to the alternatives:

Searching an unsorted list: $O(n)$ per lookup. If you do this for every element, your total time is $O(n^2)$ .
Sorting first, then using binary search: $O(n \log n)$ for the sort plus $O(\log n)$ per lookup. Better, but still not linear.
Hash set: $O(n)$ total. One pass, one $O(1)$ check per element.

This is why so many interview problems that look like they require nested loops can be solved in a single pass with the right data structure. The moment you find yourself writing a nested loop where the inner loop is just searching for something, ask yourself: can I replace that inner loop with a set or map lookup?

Longest Consecutive Sequence

LeetCode 128 asks you to find the length of the longest sequence of consecutive integers in an unsorted array. For example, given [100, 4, 200, 1, 3, 2], the longest consecutive sequence is [1, 2, 3, 4], so the answer is 4.

A sorting-based approach would work in $O(n \log n)$ , but the problem specifically asks for $O(n)$ . The key insight is both simple and clever: only start counting from the beginning of a sequence.

The Insight: Sequence Starts

Put all numbers into a set. Then, for each number, check if num - 1 exists in the set. If it does, this number is not the start of a sequence, so skip it. If num - 1 does not exist, then this number is the start of a consecutive sequence, and you count forward: check for num + 1, num + 2, and so on.

Walkthrough: [100, 4, 200, 1, 3, 2]

First, build the set: {100, 4, 200, 1, 3, 2}.

num = 100: Is 99 in the set? No, so 100 is a sequence start. Check 101, not in set. Sequence length: 1.
num = 4: Is 3 in the set? Yes, so 4 is not a sequence start. Skip.
num = 200: Is 199 in the set? No, so 200 is a sequence start. Check 201, not in set. Sequence length: 1.
num = 1: Is 0 in the set? No, so 1 is a sequence start. Check 2, yes. Check 3, yes. Check 4, yes. Check 5, no. Sequence length: 4.
num = 3: Is 2 in the set? Yes. Skip.
num = 2: Is 1 in the set? Yes. Skip.

Longest sequence: 4.

Code

def longestConsecutive(nums):
    num_set = set(nums)
    best = 0
    for num in num_set:
        if num - 1 not in num_set:
            current = num
            length = 1
            while current + 1 in num_set:
                current += 1
                length += 1
            best = max(best, length)
    return best

Why This Is $O(n)$ Despite the Nested While Loop

At first glance, the while loop inside the for loop looks like it could be $O(n^2)$ . But look more carefully: the while loop only runs when a number is a sequence start, and each number in the array is visited by the while loop at most once across the entire execution. If a number is part of a sequence of length 5, it is visited once as part of that expansion, never again. Every element is touched at most twice total (once by the for loop, once by a while loop), so the overall time is $O(n)$ .

Happy Number

Consider LeetCode 202: a happy number is defined by the following process: starting with any positive integer, replace the number by the sum of the squares of its digits, and repeat until the number either equals 1 (in which case it is happy) or loops endlessly in a cycle (in which case it is not). For example, 19 is happy: $1^2 + 9^2 = 82$ , $8^2 + 2^2 = 68$ , $6^2 + 8^2 = 100$ , $1^2 + 0^2 + 0^2 = 1$ .

The hash set approach is straightforward: keep a set of numbers you have already seen. At each step, compute the next number in the sequence. If it is 1, return true. If it is already in the set, you have entered a cycle, so return false. Otherwise, add it to the set and continue.

def isHappy(n):
    seen = set()
    while n != 1:
        if n in seen:
            return False
        seen.add(n)
        n = sum(int(d) ** 2 for d in str(n))
    return True

An alternative approach uses Floyd's cycle detection (fast/slow pointers) to detect the cycle without extra space, but the set-based solution is simpler and perfectly efficient for this problem since the numbers converge to a small range quickly.

Set Operations: Intersection, Union, Difference

Beyond simple membership testing, sets support powerful bulk operations that are useful in specific problem types:

Intersection (a & b): elements in both sets. Useful for finding common elements between two collections, such as common characters or shared values.
Union (a | b): elements in either set. Useful for merging collections while automatically removing duplicates.
Difference (a - b): elements in a but not in b. Useful for finding missing elements or filtering out known values.

These operations typically run in $O(\min(\text{len}(a), \text{len}(b)))$ for intersection and $O(\text{len}(a) + \text{len}(b))$ for union and difference. They are especially handy in problems involving multiple arrays or groups where you need to compare membership across collections.

Hash Set vs. Sorted Array + Binary Search

Both data structures give you fast lookup, but they have different tradeoffs:

Hash set: $O(1)$ average lookup, $O(n)$ average insertion, $O(n)$ space. Unordered: you cannot iterate in sorted order or find the nearest element efficiently. Best when you only need existence checks.
Sorted array + binary search: $O(\log n)$ lookup, $O(n)$ insertion (due to shifting), $O(n)$ space. Ordered: you can find predecessors, successors, and ranges. Best when you need ordering or range queries.

In most interview problems involving pure existence checks or deduplication, a hash set is the right choice. Reach for sorted structures when the problem requires ordering, closest-value queries, or range-based operations.

Isomorphic Strings and Pattern Matching

Some problems ask whether two sequences have the same structure. That is, whether you can map each distinct element of one sequence to a distinct element of the other while preserving order. The core technique is to establish a bijection using two hash maps: one mapping from sequence A to sequence B, and one from B to A. If at any point a mapping contradicts an earlier one, the sequences are not isomorphic.

This pattern covers problems like Isomorphic Strings (LC 205), Word Pattern (LC 290), and Find and Replace Pattern (LC 890). In each case, the question reduces to: do the two sequences share the same "shape"?

Walkthrough: Isomorphic Strings

Given s = "egg" and t = "add":

Index 0: e -> a and a -> e. No conflicts, record both mappings.
Index 1: g -> d and d -> g. No conflicts, record both.
Index 2: g already maps to d, and t[2] is d. Consistent. Return true.

Now consider s = "foo" and t = "bar":

Index 0: f -> b, b -> f. Fine.
Index 1: o -> a, a -> o. Fine.
Index 2: o already maps to a, but t[2] is r. Contradiction. Return false.

Code

def isIsomorphic(s, t):
    map_s_to_t = {}
    map_t_to_s = {}
    for c1, c2 in zip(s, t):
        if c1 in map_s_to_t and map_s_to_t[c1] != c2:
            return False
        if c2 in map_t_to_s and map_t_to_s[c2] != c1:
            return False
        map_s_to_t[c1] = c2
        map_t_to_s[c2] = c1
    return True

Two maps are necessary because a single map only enforces one direction of the mapping. Without the reverse map, s = "ab" and t = "aa" would incorrectly pass since both a -> a and b -> a are individually fine, but the mapping is not a bijection.

Intersection and Difference of Collections

When a problem asks you to find common elements or exclusive elements between two arrays, sets give you the cleanest and most efficient solution. Convert each array to a set, then apply set intersection (&) or difference (-).

This pattern applies to Intersection of Two Arrays (LC 349), Intersection of Two Arrays II (LC 350), and Find the Difference of Two Arrays (LC 2215). For LC 349, you need unique common elements. For LC 350, you need common elements with multiplicity (requiring a counter/map instead of a plain set). For LC 2215, you need elements exclusive to each array.

Code: Intersection of Two Arrays

1

2

def intersection(nums1, nums2):
    return list(set(nums1) & set(nums2))

That is the entire solution. Converting to sets removes duplicates and gives $O(1)$ membership checks. The intersection operator walks the smaller set and checks each element against the larger one, running in $O(\min(m, n))$ .

For LC 350 where duplicates matter, use a hash map (counter) instead:

from collections import Counter

def intersect(nums1, nums2):
    counts = Counter(nums1)
    result = []
    for num in nums2:
        if counts[num] > 0:
            result.append(num)
            counts[num] -= 1
    return result

Contains Duplicate Variants

The basic Contains Duplicate check asks "does any value repeat?" The follow-up problems add constraints that require more nuanced data structures.

Contains Duplicate II (LC 219): Within Distance k

Now you must determine if there are two distinct indices i and j such that nums[i] == nums[j] and abs(i - j) <= k. The key idea: maintain a sliding window set of size at most k. As you advance through the array, add the current element to the set and remove the element that just left the window.

def containsNearbyDuplicate(nums, k):
    window = set()
    for i, num in enumerate(nums):
        if num in window:
            return True
        window.add(num)
        if len(window) > k:
            window.remove(nums[i - k])
    return False

The set never holds more than k + 1 elements, so each operation is $O(1)$ and the total time is $O(n)$ .

Contains Duplicate III (LC 220): Within Distance k and Value Range t

This variant asks if there exist indices i and j with abs(i - j) <= k and abs(nums[i] - nums[j]) <= t. A plain set cannot answer range queries efficiently. Two approaches work:

Sorted set (balanced BST): Maintain a sorted container of the window. For each new element, query for the closest value and check if it is within t. This gives $O(n \log k)$ time.
Bucket sort: Divide numbers into buckets of size t + 1. Two numbers within range t must fall in the same bucket or adjacent buckets. This gives $O(n)$ time by maintaining a hash map of bucket ID to value over the sliding window.

The bucket approach is elegant because it converts a range query into a constant number of hash map lookups.

Cycle Detection via Hash Set

Some problems generate a sequence of states and ask you to detect when the sequence starts repeating. The pattern is straightforward: serialize each state, store it in a set, and check membership before adding. Once you see a repeated state, you have found the cycle.

This applies to Prison Cells After N Days (LC 957) and Fraction to Recurring Decimal (LC 166), among others. The cycle length lets you skip ahead using modular arithmetic instead of simulating every step.

Walkthrough: Fraction to Recurring Decimal

When computing long division of, say, 1/6, you track the remainder at each step. If the remainder repeats, the decimal repeats from that point:

1 / 6 = 0, remainder 1. Seen remainders: \{\}.
10 / 6 = 1, remainder 4. Seen: \{1\}.
40 / 6 = 6, remainder 4. We have seen remainder 4 before, so the digit sequence from that point repeats.

Result: 0.1(6).

def fractionToDecimal(numerator, denominator):
    if numerator % denominator == 0:
        return str(numerator // denominator)
    sign = "-" if (numerator < 0) ^ (denominator < 0) else ""
    numerator, denominator = abs(numerator), abs(denominator)
    integer_part = numerator // denominator
    remainder = numerator % denominator
    decimal = []
    remainder_map = {}
    while remainder and remainder not in remainder_map:
        remainder_map[remainder] = len(decimal)
        remainder *= 10
        decimal.append(str(remainder // denominator))
        remainder %= denominator
    if remainder:
        idx = remainder_map[remainder]
        decimal.insert(idx, "(")
        decimal.append(")")
    return f"{sign}{integer_part}.{''.join(decimal)}"

The hash map records which position in the decimal expansion each remainder first appeared, so you know exactly where the repeating block starts.

String Canonicalization

When a problem asks you to group, count, or deduplicate strings that are "equivalent" under some transformation, the technique is canonicalization: transform each string into a standard form, then use a set or map keyed by that canonical form.

Unique Morse Code Words (LC 804) asks how many distinct Morse representations exist for a list of words. Each word maps to a Morse string; the canonical form is simply the concatenated Morse codes:

def uniqueMorseRepresentations(words):
    morse = [".-","-...","-.-.","-..",".","..-.","--.","....",
             "..",".---","-.-",".-..","--","-.","---",".--.",
             "--.-",".-.","...","-","..-","...-",".--","-..-",
             "-.--","--.."]
    return len(set(
        "".join(morse[ord(c) - ord('a')] for c in word)
        for word in words
    ))

For Find and Replace Pattern (LC 890), the canonical form is the "shape" of a word: the sequence of first-occurrence indices. For example, "abb" becomes (0, 1, 1) and "mee" also becomes (0, 1, 1), so they match. This is essentially the same bijection idea from the isomorphic strings pattern, but expressed as a single canonical key instead of a pair of maps:

1

2

3

4

5

6

def findAndReplacePattern(words, pattern):
    def canonicalize(word):
        mapping = {}
        return tuple(mapping.setdefault(c, len(mapping)) for c in word)
    target = canonicalize(pattern)
    return [w for w in words if canonicalize(w) == target]

Complexity Analysis

For all the problems covered here:

Time: $O(n)$ : each element is processed a constant number of times, and each set operation is $O(1)$ amortized.
Space: $O(n)$ : the set can grow up to the size of the input in the worst case.

The hash set is one of the simplest data structures, but its impact on algorithm design is enormous. Whenever a brute-force solution involves searching for elements inside a loop, a set or map can almost always eliminate that inner search and drop the complexity by a factor of n.

Recognizing this pattern

You're probably looking at hash set lookup / deduplication when:

The question is "does X exist," "is there a duplicate," "are all elements unique," or "how many distinct."
The brute force has a nested loop whose inner body is just "is this element in the rest of the array."
You need to detect a cycle in a generated sequence of states (numbers, remainders, board configurations) without caring about cycle length.
Two collections need to be compared for shared/exclusive elements and ordering is irrelevant.
The problem talks about "equivalent under a transformation" (anagrams, isomorphism, same pattern), where the canonical form becomes a set/map key.
You need $O(1)$ membership over an arbitrary value domain where bitmap/array indexing won't fit.

Common templates:

Seen-so-far set: iterate once, return true on a hit, else add. Example: Contains Duplicate.
Sequence-start detection: put values in a set, only expand from x when x - 1 is absent. Example: Longest Consecutive Sequence.
State cycle detection: hash each visited state, stop when a state repeats. Example: Happy Number.
Bijection via two maps: enforce a consistent mapping in both directions. Example: Isomorphic Strings.
Canonical key dedupe: transform each input into a normalized form and dump into a set. Example: Unique Morse Code Words.
Sliding-window set: drop elements as they leave a window of width k. Example: Contains Duplicate II.