← back

Hash Map / Set

General Lookup / Deduplication

Use a hash set for O(1) existence checks. Detect duplicates, find the first missing element, validate uniqueness constraints, or mark visited nodes without sorting.

Hash Set Lookup and Deduplication

Contains Duplicate: The Simplest Hash Set Problem

LeetCode 217 gives you an array of integers and asks whether any value appears at least twice. This is the most direct application of a hash set: as you iterate through the array, check if you have seen the current element before. If yes, return true. If not, add it to the set and keep going.

1
2
3
4
5
6
7
def containsDuplicate(nums):
    seen = set()
    for num in nums:
        if num in seen:
            return True
        seen.add(num)
    return False

This is almost trivially simple, but it illustrates a principle that powers dozens of harder problems.

Why O(1)O(1) Lookup Changes Everything

The reason hash sets matter so much in algorithm design comes down to one fact: checking whether an element exists in a hash set takes O(1)O(1) average time. Compare this to the alternatives:

  • Searching an unsorted list: O(n)O(n) per lookup. If you do this for every element, your total time is O(n2)O(n^2).
  • Sorting first, then using binary search: O(nlogn)O(n \log n) for the sort plus O(logn)O(\log n) per lookup. Better, but still not linear.
  • Hash set: O(n)O(n) total. One pass, one O(1)O(1) check per element.

This is why so many interview problems that look like they require nested loops can be solved in a single pass with the right data structure. The moment you find yourself writing a nested loop where the inner loop is just searching for something, ask yourself: can I replace that inner loop with a set or map lookup?

Longest Consecutive Sequence

LeetCode 128 asks you to find the length of the longest sequence of consecutive integers in an unsorted array. For example, given [100, 4, 200, 1, 3, 2], the longest consecutive sequence is [1, 2, 3, 4], so the answer is 4.

A sorting-based approach would work in O(nlogn)O(n \log n), but the problem specifically asks for O(n)O(n). The key insight is both simple and clever: only start counting from the beginning of a sequence.

The Insight: Sequence Starts

Put all numbers into a set. Then, for each number, check if num - 1 exists in the set. If it does, this number is not the start of a sequence, so skip it. If num - 1 does not exist, then this number is the start of a consecutive sequence, and you count forward: check for num + 1, num + 2, and so on.

Walkthrough: [100, 4, 200, 1, 3, 2]

First, build the set: {100, 4, 200, 1, 3, 2}.

  • num = 100: Is 99 in the set? No, so 100 is a sequence start. Check 101, not in set. Sequence length: 1.
  • num = 4: Is 3 in the set? Yes, so 4 is not a sequence start. Skip.
  • num = 200: Is 199 in the set? No, so 200 is a sequence start. Check 201, not in set. Sequence length: 1.
  • num = 1: Is 0 in the set? No, so 1 is a sequence start. Check 2, yes. Check 3, yes. Check 4, yes. Check 5, no. Sequence length: 4.
  • num = 3: Is 2 in the set? Yes. Skip.
  • num = 2: Is 1 in the set? Yes. Skip.

Longest sequence: 4.

Code

1
2
3
4
5
6
7
8
9
10
11
12
def longestConsecutive(nums):
    num_set = set(nums)
    best = 0
    for num in num_set:
        if num - 1 not in num_set:
            current = num
            length = 1
            while current + 1 in num_set:
                current += 1
                length += 1
            best = max(best, length)
    return best

Why This Is O(n)O(n) Despite the Nested While Loop

At first glance, the while loop inside the for loop looks like it could be O(n2)O(n^2). But look more carefully: the while loop only runs when a number is a sequence start, and each number in the array is visited by the while loop at most once across the entire execution. If a number is part of a sequence of length 5, it is visited once as part of that expansion, never again. Every element is touched at most twice total (once by the for loop, once by a while loop), so the overall time is O(n)O(n).

Happy Number

Consider LeetCode 202: a happy number is defined by the following process: starting with any positive integer, replace the number by the sum of the squares of its digits, and repeat until the number either equals 1 (in which case it is happy) or loops endlessly in a cycle (in which case it is not). For example, 19 is happy: 12+92=821^2 + 9^2 = 82, 82+22=688^2 + 2^2 = 68, 62+82=1006^2 + 8^2 = 100, 12+02+02=11^2 + 0^2 + 0^2 = 1.

The hash set approach is straightforward: keep a set of numbers you have already seen. At each step, compute the next number in the sequence. If it is 1, return true. If it is already in the set, you have entered a cycle, so return false. Otherwise, add it to the set and continue.

1
2
3
4
5
6
7
8
def isHappy(n):
    seen = set()
    while n != 1:
        if n in seen:
            return False
        seen.add(n)
        n = sum(int(d) ** 2 for d in str(n))
    return True

An alternative approach uses Floyd's cycle detection (fast/slow pointers) to detect the cycle without extra space, but the set-based solution is simpler and perfectly efficient for this problem since the numbers converge to a small range quickly.

Set Operations: Intersection, Union, Difference

Beyond simple membership testing, sets support powerful bulk operations that are useful in specific problem types:

  • Intersection (a & b): elements in both sets. Useful for finding common elements between two collections, such as common characters or shared values.
  • Union (a | b): elements in either set. Useful for merging collections while automatically removing duplicates.
  • Difference (a - b): elements in a but not in b. Useful for finding missing elements or filtering out known values.

These operations typically run in O(min(len(a),len(b)))O(\min(\text{len}(a), \text{len}(b))) for intersection and O(len(a)+len(b))O(\text{len}(a) + \text{len}(b)) for union and difference. They are especially handy in problems involving multiple arrays or groups where you need to compare membership across collections.

Hash Set vs. Sorted Array + Binary Search

Both data structures give you fast lookup, but they have different tradeoffs:

  • Hash set: O(1)O(1) average lookup, O(n)O(n) average insertion, O(n)O(n) space. Unordered: you cannot iterate in sorted order or find the nearest element efficiently. Best when you only need existence checks.
  • Sorted array + binary search: O(logn)O(\log n) lookup, O(n)O(n) insertion (due to shifting), O(n)O(n) space. Ordered: you can find predecessors, successors, and ranges. Best when you need ordering or range queries.

In most interview problems involving pure existence checks or deduplication, a hash set is the right choice. Reach for sorted structures when the problem requires ordering, closest-value queries, or range-based operations.

Isomorphic Strings and Pattern Matching

Some problems ask whether two sequences have the same structure. That is, whether you can map each distinct element of one sequence to a distinct element of the other while preserving order. The core technique is to establish a bijection using two hash maps: one mapping from sequence A to sequence B, and one from B to A. If at any point a mapping contradicts an earlier one, the sequences are not isomorphic.

This pattern covers problems like Isomorphic Strings (LC 205), Word Pattern (LC 290), and Find and Replace Pattern (LC 890). In each case, the question reduces to: do the two sequences share the same "shape"?

Walkthrough: Isomorphic Strings

Given s = "egg" and t = "add":

  • Index 0: e -> a and a -> e. No conflicts, record both mappings.
  • Index 1: g -> d and d -> g. No conflicts, record both.
  • Index 2: g already maps to d, and t[2] is d. Consistent. Return true.

Now consider s = "foo" and t = "bar":

  • Index 0: f -> b, b -> f. Fine.
  • Index 1: o -> a, a -> o. Fine.
  • Index 2: o already maps to a, but t[2] is r. Contradiction. Return false.

Code

1
2
3
4
5
6
7
8
9
10
11
def isIsomorphic(s, t):
    map_s_to_t = {}
    map_t_to_s = {}
    for c1, c2 in zip(s, t):
        if c1 in map_s_to_t and map_s_to_t[c1] != c2:
            return False
        if c2 in map_t_to_s and map_t_to_s[c2] != c1:
            return False
        map_s_to_t[c1] = c2
        map_t_to_s[c2] = c1
    return True

Two maps are necessary because a single map only enforces one direction of the mapping. Without the reverse map, s = "ab" and t = "aa" would incorrectly pass since both a -> a and b -> a are individually fine, but the mapping is not a bijection.

Intersection and Difference of Collections

When a problem asks you to find common elements or exclusive elements between two arrays, sets give you the cleanest and most efficient solution. Convert each array to a set, then apply set intersection (&) or difference (-).

This pattern applies to Intersection of Two Arrays (LC 349), Intersection of Two Arrays II (LC 350), and Find the Difference of Two Arrays (LC 2215). For LC 349, you need unique common elements. For LC 350, you need common elements with multiplicity (requiring a counter/map instead of a plain set). For LC 2215, you need elements exclusive to each array.

Code: Intersection of Two Arrays

1
2
def intersection(nums1, nums2):
    return list(set(nums1) & set(nums2))

That is the entire solution. Converting to sets removes duplicates and gives O(1)O(1) membership checks. The intersection operator walks the smaller set and checks each element against the larger one, running in O(min(m,n))O(\min(m, n)).

For LC 350 where duplicates matter, use a hash map (counter) instead:

1
2
3
4
5
6
7
8
9
10
from collections import Counter

def intersect(nums1, nums2):
    counts = Counter(nums1)
    result = []
    for num in nums2:
        if counts[num] > 0:
            result.append(num)
            counts[num] -= 1
    return result

Contains Duplicate Variants

The basic Contains Duplicate check asks "does any value repeat?" The follow-up problems add constraints that require more nuanced data structures.

Contains Duplicate II (LC 219): Within Distance k

Now you must determine if there are two distinct indices i and j such that nums[i] == nums[j] and abs(i - j) <= k. The key idea: maintain a sliding window set of size at most k. As you advance through the array, add the current element to the set and remove the element that just left the window.

1
2
3
4
5
6
7
8
9
def containsNearbyDuplicate(nums, k):
    window = set()
    for i, num in enumerate(nums):
        if num in window:
            return True
        window.add(num)
        if len(window) > k:
            window.remove(nums[i - k])
    return False

The set never holds more than k + 1 elements, so each operation is O(1)O(1) and the total time is O(n)O(n).

Contains Duplicate III (LC 220): Within Distance k and Value Range t

This variant asks if there exist indices i and j with abs(i - j) <= k and abs(nums[i] - nums[j]) <= t. A plain set cannot answer range queries efficiently. Two approaches work:

  1. Sorted set (balanced BST): Maintain a sorted container of the window. For each new element, query for the closest value and check if it is within t. This gives O(nlogk)O(n \log k) time.
  2. Bucket sort: Divide numbers into buckets of size t + 1. Two numbers within range t must fall in the same bucket or adjacent buckets. This gives O(n)O(n) time by maintaining a hash map of bucket ID to value over the sliding window.

The bucket approach is elegant because it converts a range query into a constant number of hash map lookups.

Cycle Detection via Hash Set

Some problems generate a sequence of states and ask you to detect when the sequence starts repeating. The pattern is straightforward: serialize each state, store it in a set, and check membership before adding. Once you see a repeated state, you have found the cycle.

This applies to Prison Cells After N Days (LC 957) and Fraction to Recurring Decimal (LC 166), among others. The cycle length lets you skip ahead using modular arithmetic instead of simulating every step.

Walkthrough: Fraction to Recurring Decimal

When computing long division of, say, 1/6, you track the remainder at each step. If the remainder repeats, the decimal repeats from that point:

  • 1 / 6 = 0, remainder 1. Seen remainders: \{\}.
  • 10 / 6 = 1, remainder 4. Seen: \{1\}.
  • 40 / 6 = 6, remainder 4. We have seen remainder 4 before, so the digit sequence from that point repeats.

Result: 0.1(6).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
def fractionToDecimal(numerator, denominator):
    if numerator % denominator == 0:
        return str(numerator // denominator)
    sign = "-" if (numerator < 0) ^ (denominator < 0) else ""
    numerator, denominator = abs(numerator), abs(denominator)
    integer_part = numerator // denominator
    remainder = numerator % denominator
    decimal = []
    remainder_map = {}
    while remainder and remainder not in remainder_map:
        remainder_map[remainder] = len(decimal)
        remainder *= 10
        decimal.append(str(remainder // denominator))
        remainder %= denominator
    if remainder:
        idx = remainder_map[remainder]
        decimal.insert(idx, "(")
        decimal.append(")")
    return f"{sign}{integer_part}.{''.join(decimal)}"

The hash map records which position in the decimal expansion each remainder first appeared, so you know exactly where the repeating block starts.

String Canonicalization

When a problem asks you to group, count, or deduplicate strings that are "equivalent" under some transformation, the technique is canonicalization: transform each string into a standard form, then use a set or map keyed by that canonical form.

Unique Morse Code Words (LC 804) asks how many distinct Morse representations exist for a list of words. Each word maps to a Morse string; the canonical form is simply the concatenated Morse codes:

1
2
3
4
5
6
7
8
9
def uniqueMorseRepresentations(words):
    morse = [".-","-...","-.-.","-..",".","..-.","--.","....",
             "..",".---","-.-",".-..","--","-.","---",".--.",
             "--.-",".-.","...","-","..-","...-",".--","-..-",
             "-.--","--.."]
    return len(set(
        "".join(morse[ord(c) - ord('a')] for c in word)
        for word in words
    ))

For Find and Replace Pattern (LC 890), the canonical form is the "shape" of a word: the sequence of first-occurrence indices. For example, "abb" becomes (0, 1, 1) and "mee" also becomes (0, 1, 1), so they match. This is essentially the same bijection idea from the isomorphic strings pattern, but expressed as a single canonical key instead of a pair of maps:

1
2
3
4
5
6
def findAndReplacePattern(words, pattern):
    def canonicalize(word):
        mapping = {}
        return tuple(mapping.setdefault(c, len(mapping)) for c in word)
    target = canonicalize(pattern)
    return [w for w in words if canonicalize(w) == target]

Complexity Analysis

For all the problems covered here:

  • Time: O(n)O(n): each element is processed a constant number of times, and each set operation is O(1)O(1) amortized.
  • Space: O(n)O(n): the set can grow up to the size of the input in the worst case.

The hash set is one of the simplest data structures, but its impact on algorithm design is enormous. Whenever a brute-force solution involves searching for elements inside a loop, a set or map can almost always eliminate that inner search and drop the complexity by a factor of n.

Recognizing this pattern

You're probably looking at hash set lookup / deduplication when:

  • The question is "does X exist," "is there a duplicate," "are all elements unique," or "how many distinct."
  • The brute force has a nested loop whose inner body is just "is this element in the rest of the array."
  • You need to detect a cycle in a generated sequence of states (numbers, remainders, board configurations) without caring about cycle length.
  • Two collections need to be compared for shared/exclusive elements and ordering is irrelevant.
  • The problem talks about "equivalent under a transformation" (anagrams, isomorphism, same pattern), where the canonical form becomes a set/map key.
  • You need O(1)O(1) membership over an arbitrary value domain where bitmap/array indexing won't fit.

Common templates:

  • Seen-so-far set: iterate once, return true on a hit, else add. Example: Contains Duplicate.
  • Sequence-start detection: put values in a set, only expand from x when x - 1 is absent. Example: Longest Consecutive Sequence.
  • State cycle detection: hash each visited state, stop when a state repeats. Example: Happy Number.
  • Bijection via two maps: enforce a consistent mapping in both directions. Example: Isomorphic Strings.
  • Canonical key dedupe: transform each input into a normalized form and dump into a set. Example: Unique Morse Code Words.
  • Sliding-window set: drop elements as they leave a window of width k. Example: Contains Duplicate II.

Practice Problems (108)

36 Valid Sudoku
128 Longest Consecutive Sequence
166 Fraction to Recurring Decimal
202 Happy Number
205 Isomorphic Strings
217 Contains Duplicate
219 Contains Duplicate II
220 Contains Duplicate III
290 Word Pattern
349 Intersection of Two Arrays
350 Intersection of Two Arrays II
500 Keyboard Row
535 Encode and Decode TinyURL
599 Minimum Index Sum of Two Lists
609 Find Duplicate File in System
705 Design HashSet
771 Jewels and Stones
804 Unique Morse Code Words
822 Card Flipping Game
859 Buddy Strings
888 Fair Candy Swap
890 Find and Replace Pattern
939 Minimum Area Rectangle
957 Prison Cells After N Days
970 Powerful Integers
1001 Grid Illumination
1002 Find Common Characters
1138 Alphabet Board Path
1207 Unique Number of Occurrences
1346 Check If N and Its Double Exist
1348 Tweet Counts Per Frequency
1357 Apply Discount Every n Orders
1396 Design Underground System
1410 HTML Entity Parser
1418 Display Table of Food Orders in a Restaurant
1436 Destination City
1452 People Whose List of Favorite Companies Is Not a Subset of Another List
1487 Making File Names Unique
1488 Avoid Flood in The City
1496 Path Crossing
1562 Find Latest Group of Size M
1624 Largest Substring Between Two Equal Characters
1640 Check Array Formation Through Concatenation
1684 Count the Number of Consistent Strings
1743 Restore the Array From Adjacent Pairs
1796 Second Largest Digit in a String
1797 Design Authentication Manager
1805 Number of Different Integers in a String
1807 Evaluate the Bracket Pairs of a String
1832 Check if the Sentence Is Pangram
1935 Maximum Number of Words You Can Type
1980 Find Unique Binary String
2133 Check if Every Row and Column Contains All Numbers
2154 Keep Multiplying Found Values by Two
2215 Find the Difference of Two Arrays
2261 K Divisible Elements Subarrays
2295 Replace Elements in an Array
2301 Match Substring After Replacement
2309 Greatest English Letter in Upper and Lower Case
2325 Decode the Message
2349 Design a Number Container System
2351 First Letter to Appear Twice
2352 Equal Row and Column Pairs
2367 Number of Arithmetic Triplets
2395 Find Subarrays With Equal Sum
2397 Maximum Rows Covered by Columns
2399 Check Distances Between Same Letters
2405 Optimal Partition of String
2424 Longest Uploaded Prefix
2441 Largest Positive Integer That Exists With Its Negative
2442 Count Number of Distinct Integers After Reverse Operations
2451 Odd String Difference
2554 Maximum Number of Integers to Choose From a Range I
2564 Substring XOR Queries
2605 Form Smallest Number From Two Digit Arrays
2661 First Completely Painted Row or Column
2670 Find the Distinct Difference Array
2716 Minimize String Length
2744 Find Maximum Number of String Pairs
2766 Relocate Marbles
2956 Find Common Elements Between Two Arrays
2965 Find Missing and Repeated Values
2975 Maximum Square Area by Removing Fences From a Field
2996 Smallest Missing Integer Greater Than Sequential Prefix Sum
3002 Maximum Size of a Set After Removals
3083 Existence of a Substring in a String and Its Reverse
3120 Count the Number of Special Characters I
3121 Count the Number of Special Characters II
3146 Permutation Difference between Two Strings
3159 Find Occurrences of an Element in an Array
3160 Find the Number of Distinct Colors Among the Balls
3217 Delete Nodes From Linked List Present in Array
3295 Report Spam Message
3375 Minimum Operations to Make Array Values Equal to K
3396 Minimum Number of Operations to Make Elements in Array Distinct
3447 Assign Elements to Groups with Constraints
3471 Find the Largest Almost Missing Integer
3484 Design Spreadsheet
3508 Implement Router
3606 Coupon Code Validator
3668 Restore Finishing Order
3678 Smallest Absent Positive Greater Than Average
3718 Smallest Missing Multiple of K
3731 Find Missing Elements
3740 Minimum Distance Between Three Equal Elements I
3741 Minimum Distance Between Three Equal Elements II
3761 Minimum Absolute Distance Between Mirror Pairs
3779 Minimum Number of Operations to Have Distinct Elements