Hash Map / Set

Frequency Counting

Build a frequency map (Counter) to count occurrences of each element. Used for anagram detection, character counts, majority element, finding duplicates, and grouping by frequency.

You are given a list of strings and asked to group together all words that are anagrams of each other. "eat", "tea", and "ate" go in one group; "tan" and "nat" in another; "bat" by itself. This is LeetCode 49, Group Anagrams, and it is a perfect entry point into one of the most broadly useful patterns in algorithm problems: frequency counting.

The grouping problem and canonical forms

The challenge with grouping anagrams is defining what makes two words "the same." You cannot compare them directly, since "eat" and "tea" are different strings. But they share a property: they contain exactly the same characters in exactly the same quantities. If you could reduce each word to a canonical form that captures this property, then all anagrams would map to the same key, and you could group them with a hash map.

There are two natural canonical forms for anagrams. The first is the sorted string: sort the characters of each word alphabetically, and "eat", "tea", and "ate" all become "aet". The second is a character count tuple: count the frequency of each letter and use that count vector as the key. For "eat", the count tuple would be (0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0): one 'e', one 'a', one 't', zeros elsewhere.

Sorted strings are simpler to implement. Character count tuples avoid the $O(k \log k)$ sorting cost per word (where $k$ is the word length) and instead run in $O(k)$ , but they produce longer keys. In practice, since words are typically short, sorted strings are the most common choice.

Code for Group Anagrams

from collections import defaultdict

def groupAnagrams(strs):
    groups = defaultdict(list)
    for s in strs:
        key = tuple(sorted(s))
        groups[key].append(s)
    return list(groups.values())

For each string, sort its characters to produce a canonical key, then append the original string to the list stored at that key. At the end, return all the lists. The time complexity is $O(n \cdot k \log k)$ where $n$ is the number of strings and $k$ is the maximum string length. Space is $O(n \cdot k)$ to store all the strings in the hash map.

The fundamental building block: counting frequencies

At the heart of Group Anagrams is a more primitive operation: counting how often each element appears. Python's collections.Counter makes this trivial, but the concept is language-independent. You iterate through a collection, and for each element, increment its count in a hash map.

1

2

3

4

5

6

from collections import Counter

def count_frequencies(arr):
    freq = Counter(arr)
    # freq is a dict-like object: {element: count, ...}
    return freq

This $O(n)$ operation is the starting point for a surprising number of problems. Once you have a frequency map, you can answer questions like: What is the most common element? Are two collections permutations of each other? How many elements appear exactly k times? Can you make all frequencies unique by deleting characters?

Top K Frequent Elements

Consider LeetCode 347, Top K Frequent Elements: given an array and an integer k, return the k most frequent elements. For example, given nums = [1,1,1,2,2,3] and k = 2, the answer is [1, 2] because 1 appears three times and 2 appears twice.

The most intuitive approach is to count frequencies and then sort by frequency. That works but costs $O(n \log n)$ . You can do better with a heap: push all (frequency, element) pairs into a min-heap of size k, which gives $O(n \log k)$ . But the slickest approach is bucket sort, which runs in $O(n)$ .

The bucket sort idea is this: the maximum possible frequency of any element is n (the array length). Create an array of n+1 buckets, where bucket i holds all elements that appear exactly i times. Then walk the buckets from right to left (highest frequency first) and collect elements until you have k of them.

Walked example: bucket sort for Top K

Given nums = [1,1,1,2,2,3] and k = 2:

Step 1, count frequencies: {1: 3, 2: 2, 3: 1}.

Step 2, build buckets. The array length is 6, so create 7 buckets (indices 0 through 6). Bucket 3 gets [1] (because 1 appears 3 times). Bucket 2 gets [2]. Bucket 1 gets [3]. All other buckets are empty.

Step 3, collect from right to left. Start at bucket 6 (empty), bucket 5 (empty), bucket 4 (empty), bucket 3 (contains [1]), and add 1 to result. Bucket 2 (contains [2]) adds 2 to result. We now have k = 2 elements: [1, 2]. Done.

Code for Top K Frequent Elements

from collections import Counter

def topKFrequent(nums, k):
    freq = Counter(nums)
    buckets = [[] for _ in range(len(nums) + 1)]
    for num, count in freq.items():
        buckets[count].append(num)

    result = []
    for i in range(len(buckets) - 1, -1, -1):
        for num in buckets[i]:
            result.append(num)
            if len(result) == k:
                return result
    return result

This is $O(n)$ time and $O(n)$ space. The bucket array has at most n+1 entries, and each element is placed in exactly one bucket.

Valid Anagram

The simplest frequency comparison problem is LeetCode 242, Valid Anagram. Given two strings s and t, determine if t is an anagram of s. The approach is direct: count the character frequencies of both strings and compare the two frequency maps. If they are identical, the strings are anagrams.

1

2

3

4

from collections import Counter

def isAnagram(s, t):
    return Counter(s) == Counter(t)

That is the entire solution. Under the hood, this is $O(n)$ time where $n$ is the combined length of the strings, and $O(1)$ space if the character set is fixed (26 lowercase letters). You could also do this by sorting both strings and comparing, but that costs $O(n \log n)$ , which is unnecessary when frequency counting gives you a linear solution.

The same frequency comparison idea applies to LeetCode 383, Ransom Note. Given a ransom note string and a magazine string, determine if the ransom note can be constructed from the letters in the magazine. This is a one-sided anagram check: every character in the ransom note must appear in the magazine at least as many times.

from collections import Counter

def canConstruct(ransomNote, magazine):
    mag_freq = Counter(magazine)
    for ch in ransomNote:
        if mag_freq[ch] <= 0:
            return False
        mag_freq[ch] -= 1
    return True

A natural extension is LeetCode 1347, Minimum Number of Steps to Make Two Strings Anagram. Given two strings of equal length, count how many characters in t must be replaced so that t becomes an anagram of s. Count frequencies for both, then sum the positive differences where s has more of a character than t.

1

2

3

4

5

6

from collections import Counter

def minSteps(s, t):
    freq_s = Counter(s)
    freq_t = Counter(t)
    return sum(max(0, freq_s[ch] - freq_t[ch]) for ch in freq_s)

Majority Element and Boyer-Moore voting

For LeetCode 169, Majority Element, the task is to find the element that appears more than $n/2$ times. You could solve it with a frequency map: count everything, then find the element with count greater than $n/2$ . That is $O(n)$ time and $O(n)$ space.

But when you only need the winner and not the full frequency distribution, you can do better on space. The Boyer-Moore voting algorithm finds the majority element in $O(n)$ time and $O(1)$ space. The idea is to maintain a candidate and a count. Walk through the array: if the count is zero, set the current element as the candidate. If the current element equals the candidate, increment the count; otherwise, decrement it. At the end, the candidate is the majority element (assuming one exists).

The intuition is that the majority element can "survive" being canceled out by all other elements combined, because it appears more than half the time. For a deeper treatment of Boyer-Moore voting, see the dedicated article on that strategy.

Frequency of frequencies

Some problems operate not on the raw frequency map but on the distribution of frequencies. A good illustration is LeetCode 1647, Minimum Deletions to Make Character Frequencies Unique. You are given a string and need to delete the minimum number of characters so that no two distinct characters have the same frequency.

The approach is to build a frequency map, then look at the set of frequency values. If two characters share the same frequency, you must reduce one of them (by deleting characters from the string) until its frequency is unique. A greedy strategy works: sort the frequencies in descending order, and for each frequency, if it is already taken, reduce it to the largest available value below it.

from collections import Counter

def minDeletions(s):
    freq = Counter(s)
    counts = sorted(freq.values(), reverse=True)
    deletions = 0
    used = set()
    for count in counts:
        while count > 0 and count in used:
            count -= 1
            deletions += 1
        used.add(count)
    return deletions

This pattern, computing statistics over the frequency map itself, shows up in several problems. Whenever you see a problem that talks about "making frequencies unique" or "characters appearing the same number of times," think about building the frequency map first and then reasoning about the distribution of its values.

A simpler variant is LeetCode 1207, Unique Number of Occurrences, where you determine whether the number of occurrences of each value in an array is unique. Count frequencies, then check if the set of frequency values has the same size as the list of frequency values.

1

2

3

4

5

6

from collections import Counter

def uniqueOccurrences(arr):
    freq = Counter(arr)
    counts = freq.values()
    return len(counts) == len(set(counts))

Even more direct is LeetCode 1941, Check if All Characters Have Equal Number of Occurrences. Count frequencies and verify they are all the same.

1

2

3

4

5

from collections import Counter

def areOccurrencesEqual(s):
    freq = Counter(s)
    return len(set(freq.values())) == 1

First unique and k-th distinct

Some problems ask you to find elements that appear exactly once. The strategy is always the same two-pass approach: count frequencies first, then scan in the original order to find the element you need.

Take LeetCode 387, First Unique Character in a String, which asks for the index of the first character that does not repeat. Count all character frequencies, then walk the string from left to right and return the index of the first character with count 1.

from collections import Counter

def firstUniqChar(s):
    freq = Counter(s)
    for i, ch in enumerate(s):
        if freq[ch] == 1:
            return i
    return -1

This idea generalizes in LeetCode 2053, Kth Distinct String in an Array. An element is distinct if it appears exactly once. Count frequencies, then walk the array and collect distinct elements in order. Return the k-th one.

from collections import Counter

def kthDistinct(arr, k):
    freq = Counter(arr)
    for s in arr:
        if freq[s] == 1:
            k -= 1
            if k == 0:
                return s
    return ""

Reconstruct from frequencies

A more creative use of frequency counting is reconstruction: given a scrambled or encoded input, use character frequencies to deduce what elements must be present. The key insight is that certain characters uniquely identify certain elements, so you can peel them off in the right order.

With LeetCode 423, Reconstruct Original Digits from English, you are given a string containing the English words for digits 0 through 9 in jumbled order. You must recover which digits are present. The trick is that some letters appear in only one digit word: 'z' only appears in "zero", 'w' only in "two", 'u' only in "four", 'x' only in "six", 'g' only in "eight". Count those first, subtract their letters, then handle the remaining digits whose letters are now unique.

from collections import Counter

def originalDigits(s):
    freq = Counter(s)
    count = [0] * 10
    # Unique-letter digits first
    count[0] = freq['z']
    count[2] = freq['w']
    count[4] = freq['u']
    count[6] = freq['x']
    count[8] = freq['g']
    # Then digits identifiable after subtraction
    count[1] = freq['o'] - count[0] - count[2] - count[4]
    count[3] = freq['h'] - count[8]
    count[5] = freq['f'] - count[4]
    count[7] = freq['s'] - count[6]
    count[9] = freq['i'] - count[5] - count[6] - count[8]
    return ''.join(str(d) * count[d] for d in range(10))

The same idea applies to LeetCode 1419, Minimum Number of Frogs Croaking, but with a sequence of characters. Each frog says "croak" in order, and multiple frogs can overlap. Track the frequency of each stage character (c, r, o, a, k) as a pipeline: when you see a character, the previous stage count decreases and the current stage count increases. The maximum number of frogs active simultaneously is the answer.

def minNumberOfFrogs(croakOfFrogs):
    stages = {'c': 0, 'r': 1, 'o': 2, 'a': 3, 'k': 4}
    count = [0] * 5
    frogs = 0
    max_frogs = 0
    for ch in croakOfFrogs:
        idx = stages[ch]
        if idx == 0:
            count[0] += 1
            frogs += 1
            max_frogs = max(max_frogs, frogs)
        else:
            if count[idx - 1] == 0:
                return -1
            count[idx - 1] -= 1
            count[idx] += 1
            if idx == 4:
                count[4] -= 1
                frogs -= 1
    return max_frogs if all(c == 0 for c in count) else -1

Sliding window with frequency map

When a problem asks you to find all substrings that are anagrams of a pattern, or to check whether any permutation of a pattern exists as a substring, you combine frequency counting with a sliding window. Maintain a frequency map over a window of size equal to the pattern, and compare it against the pattern's frequency map as you slide.

A classic application is LeetCode 438, Find All Anagrams in a String, which asks for all start indices where an anagram of string p begins in string s. The approach: count the frequencies of p, then slide a window of length len(p) across s. As you add a character on the right and remove one on the left, update a running frequency map. When the running map matches the target, record the index.

from collections import Counter

def findAnagrams(s, p):
    if len(p) > len(s):
        return []
    p_freq = Counter(p)
    window = Counter(s[:len(p)])
    result = []
    if window == p_freq:
        result.append(0)
    for i in range(len(p), len(s)):
        window[s[i]] += 1
        left = s[i - len(p)]
        window[left] -= 1
        if window[left] == 0:
            del window[left]
        if window == p_freq:
            result.append(i - len(p) + 1)
    return result

Its close relative, LeetCode 567, Permutation in String, is the boolean version of the same problem: return True if any permutation of s1 is a substring of s2. The code is nearly identical. You just return True the first time the window matches instead of collecting all indices.

from collections import Counter

def checkInclusion(s1, s2):
    if len(s1) > len(s2):
        return False
    target = Counter(s1)
    window = Counter(s2[:len(s1)])
    if window == target:
        return True
    for i in range(len(s1), len(s2)):
        window[s2[i]] += 1
        left = s2[i - len(s1)]
        window[left] -= 1
        if window[left] == 0:
            del window[left]
        if window == target:
            return True
    return False

An optimization for both problems is to track a "matches" counter, the number of characters whose window count equals the target count, instead of comparing full dictionaries each time. This brings the per-step work from $O(26)$ to $O(1)$ .

Complexity analysis

Frequency counting with a hash map is always $O(n)$ time and $O(n)$ space, where $n$ is the number of elements. The space can be considered $O(1)$ when the domain of elements is fixed and small (like 26 lowercase letters).

For anagram grouping, the time depends on how you compute canonical keys. Sorted keys give $O(n \cdot k \log k)$ , and character count keys give $O(n \cdot k)$ , where $k$ is the maximum element length. For Top K Frequent with bucket sort, the time is $O(n)$ . For problems that involve sorting or heaping the frequency values, the time is $O(n + d \log d)$ where $d$ is the number of distinct elements.

The versatility of frequency counting makes it one of the first tools to reach for when a problem involves counting, grouping, or comparing elements. Build the frequency map, and then ask: what question am I trying to answer about these frequencies?

Recognizing this pattern

You're probably looking at frequency counting when:

The problem mentions "anagram," "permutation of," "rearrangement," or "same characters." These are all questions about multisets, which frequency maps describe exactly.
You need to group elements by an equivalence relation (anagrams, same shape, same multiset), and count vectors or sorted keys give you the canonical form.
The question is "most frequent," "kth most frequent," "first non-repeating," or "appears exactly once."
The element domain is small and fixed (26 lowercase letters, digits, byte values), making counting effectively $O(1)$ space.
A naive solution sorts the whole input just to compare or group, and frequency counts replace the sort with a single linear pass.

Common templates:

Canonical form group-by: map each element to a sorted/count-vector key and bucket into a hash map. Example: Group Anagrams.
Frequency equality check: build counters for two collections and compare directly. Example: Valid Anagram.
Bucket sort by count: index elements by their frequency and walk buckets from high to low. Example: Top K Frequent Elements.
Two-pass scan with frequencies: count first, then re-scan in original order to find the first/kth element matching a frequency condition. Example: First Unique Character in a String.
Sliding window with frequency map: maintain counts over a moving window and compare against a target counter. Example: Find All Anagrams in a String.