String Algorithms
Compute a hash for the first window and roll it across the string by adding the new character and removing the old one in O(1) per step. Used for repeated substring detection and multiple-pattern matching.
String matching sounds simple: slide a window of length m across a text of length n and check each position. The problem is that each character-by-character comparison costs , pushing the total to . The Rabin-Karp algorithm escapes this trap by replacing direct comparison with hash comparison. Two strings can only be equal if their hashes match, and a well-chosen hash can be updated in as the window slides, bringing the expected runtime down to .
The hash of a string is computed by treating each character as a digit in a base-B number system. For a string s of length m:
hash = s[0]*B^(m-1) + s[1]*B^(m-2) + ... + s[m-1]*B^0All arithmetic is done modulo a large prime M to keep values bounded. A common setup is B = 31 (or 26 for lowercase letters) and . Converting characters to integers typically subtracts ord('a') - 1 so that 'a' maps to 1, avoiding zero-valued characters that would collapse distinct strings to the same hash.
The key insight is that when the window advances one position to the right, you only need to remove the contribution of the leftmost character and fold in the new rightmost character:
new_hash = (old_hash - s[i] * B^(m-1)) * B + s[i + m]The term B^(m-1) is a constant that can be precomputed once. Every slide is a fixed-cost arithmetic operation rather than a fresh traversal of m characters.
Suppose the text is "abcabc" and the pattern is "cab". Let B = 31, .
First, compute the pattern hash and the hash of the first window "abc". Then slide:
"abc": hash doesn't match pattern hash "cab", skip.'a', add 'a' → window "bca", still no match.'b', add 'b' → window "cab", hash matches. Verify characters: "cab" == "cab". Found at index 2.The verification step on hash match is cheap (one comparison of length-m strings) and guards against false positives.
def rabin_karp(text: str, pattern: str) -> int:
base, mod = 31, 10**9 + 7
m, n = len(pattern), len(text)
if m > n:
return -1
# Precompute base^(m-1) mod M
power = pow(base, m - 1, mod)
# Build initial hashes
p_hash = t_hash = 0
for i in range(m):
p_hash = (p_hash * base + ord(pattern[i])) % mod
t_hash = (t_hash * base + ord(text[i])) % mod
for i in range(n - m + 1):
if t_hash == p_hash and text[i:i + m] == pattern:
return i
if i + m < n:
# Roll the window: evict text[i], admit text[i + m]
t_hash = (t_hash - ord(text[i]) * power) % mod
t_hash = (t_hash * base + ord(text[i + m])) % mod
t_hash %= mod # ensure non-negative
return -1The check text[i:i+m] == pattern only executes on hash matches. In practice, with a good hash function, this almost never triggers on non-matching windows, keeping the average runtime at .
A collision happens when two different strings produce the same hash. With a single large prime modulus the probability per window is roughly , which is usually acceptable. For problems where correctness is critical (competitive programming, adversarial inputs), use double hashing: maintain two independent hashes with different base/modulus pairs and only accept a match when both agree. The collision probability drops to roughly .
def rabin_karp_double(text: str, pattern: str) -> int:
B1, M1 = 31, 10**9 + 7
B2, M2 = 37, 10**9 + 9
m, n = len(pattern), len(text)
pw1 = pow(B1, m - 1, M1)
pw2 = pow(B2, m - 1, M2)
ph1 = ph2 = th1 = th2 = 0
for i in range(m):
c = ord(pattern[i])
ph1 = (ph1 * B1 + c) % M1
ph2 = (ph2 * B2 + c) % M2
c2 = ord(text[i])
th1 = (th1 * B1 + c2) % M1
th2 = (th2 * B2 + c2) % M2
for i in range(n - m + 1):
if th1 == ph1 and th2 == ph2:
return i # collision probability negligible; skip verify
if i + m < n:
th1 = (th1 - ord(text[i]) * pw1) * B1 % M1 + ord(text[i + m])
th1 %= M1
th2 = (th2 - ord(text[i]) * pw2) * B2 % M2 + ord(text[i + m])
th2 %= M2
return -1LeetCode 1044. Longest Duplicate Substring asks for the longest substring that appears at least twice. A naive approach is in the length of candidates. Rolling hash enables a cleaner solution by combining binary search with a hash set.
The key observation: if a duplicate substring of length k exists, one of length k - 1 certainly exists too. This monotone property makes binary search valid: search for the largest k such that some window of that length repeats.
def longest_dup_substring(s: str) -> str:
def has_duplicate(length: int) -> str:
base, mod = 31, 10**9 + 7
power = pow(base, length - 1, mod)
h = 0
for c in s[:length]:
h = (h * base + ord(c)) % mod
seen = {h: [0]}
for i in range(1, len(s) - length + 1):
h = (h - ord(s[i - 1]) * power) % mod
h = (h * base + ord(s[i + length - 1])) % mod
if h in seen:
# Verify to avoid false positives
candidate = s[i:i + length]
for j in seen[h]:
if s[j:j + length] == candidate:
return candidate
seen[h].append(i)
else:
seen[h] = [i]
return ""
lo, hi, result = 1, len(s) - 1, ""
while lo <= hi:
mid = (lo + hi) // 2
found = has_duplicate(mid)
if found:
result = found
lo = mid + 1
else:
hi = mid - 1
return resultEach call to has_duplicate runs in ; binary search calls it times, giving overall.
Time: expected for single-pattern search ( worst case if every window produces a hash collision). for longest duplicate substring via binary search.
Space: extra for basic Rabin-Karp; for the hash set in the duplicate-substring variant.
LeetCode 187. Repeated DNA Sequences is the textbook Rabin-Karp warm-up: find every 10-letter substring that appears more than once. A rolling hash over a length-10 window plus a set captures every repeat in one pass.
LeetCode 1147. Longest Chunked Palindrome Decomposition greedily matches a growing prefix against the corresponding suffix. Each comparison is one rolling-hash equality check, keeping the whole algorithm linear.
LeetCode 1923. Longest Common Subpath finds the longest path appearing in every one of input paths. Binary search on the answer length combined with rolling hash gives where is total path length, infeasible without hashing.
You're probably looking at rolling hash when:
Common templates:
Practice Problems (18)