String Algorithms
Precompute the longest proper prefix-suffix (LPS) array for the pattern. Use it to skip ahead in the text when a mismatch occurs, avoiding redundant comparisons. Achieves O(n + m) matching.
The naive approach to finding a pattern of length m in a text of length n is : slide the pattern one position at a time and compare from scratch at each position. On adversarial inputs, like searching for "aaaab" in "aaaaaaaaaa", the algorithm redoes enormous amounts of work. The Knuth-Morris-Pratt algorithm (KMP) eliminates this wasted effort and runs in by making one key observation: when a mismatch occurs after matching k characters, you already know the last k characters of the text. If those characters contain a prefix of the pattern, you can resume matching there instead of starting over.
The mechanism that makes this work is the LPS array (Longest Proper Prefix which is also a Suffix). For a pattern of length m, lps[i] stores the length of the longest proper prefix of pattern[0..i] that is also a suffix of that same substring. "Proper" means the prefix cannot be the entire string itself.
For the pattern "ABABCABAB":
lps[0] = 0: "A" has no proper prefix.lps[1] = 0: "AB": "A" ≠ "B".lps[2] = 1: "ABA": prefix "A" equals suffix "A".lps[3] = 2: "ABAB": prefix "AB" equals suffix "AB".lps[4] = 0: "ABABC": no match.lps[5] = 1: "ABABCA": prefix "A" equals suffix "A".lps[6] = 2: "ABABCAB": prefix "AB" equals suffix "AB".lps[7] = 3: "ABABCABA": prefix "ABA" equals suffix "ABA".lps[8] = 4: "ABABCABAB": prefix "ABAB" equals suffix "ABAB".So lps = [0, 0, 1, 2, 0, 1, 2, 3, 4]. When a mismatch occurs after matching j characters, the LPS entry lps[j-1] tells you the longest prefix of the pattern that you already know still matches. You jump j back to that position and continue without advancing the text pointer.
The LPS array is built by running essentially the same two-pointer logic you'll use during search, but applied to the pattern matching against itself.
def build_lps(pattern: str) -> list[int]:
lps = [0] * len(pattern)
j = 0 # length of the current matching prefix
for i in range(1, len(pattern)):
while j and pattern[i] != pattern[j]:
j = lps[j - 1] # fall back using previously computed values
if pattern[i] == pattern[j]:
j += 1
lps[i] = j
return lpsj tracks how long a matching prefix we currently have. When a mismatch occurs, rather than resetting to zero, we consult lps[j-1] to find the best fallback: the longest prefix that was also a suffix of the portion we already matched. This fallback might itself mismatch, so we loop until we either find a match or exhaust fallbacks down to zero.
With lps in hand, the search loop mirrors the construction loop exactly, just with two different strings.
def kmp_search(text: str, pattern: str) -> list[int]:
if not pattern:
return []
lps = build_lps(pattern)
j = 0
matches = []
for i in range(len(text)):
while j and text[i] != pattern[j]:
j = lps[j - 1]
if text[i] == pattern[j]:
j += 1
if j == len(pattern):
matches.append(i - j + 1)
j = lps[j - 1] # look for overlapping matches
return matchesWhen j reaches len(pattern), a full match is recorded at position i - j + 1. We then reset j = lps[j-1] to keep searching for overlapping occurrences. The matched suffix may serve as the prefix of the next match.
Both the build and search phases have the same amortized argument. The text pointer i (or its equivalent during build) advances exactly n times total. The pattern pointer j can only increase by 1 per iteration of the outer loop, and each fallback via lps[j-1] strictly decreases j. So j decreases at most as many times as it increases, meaning the total number of while iterations across the entire loop is bounded by n. The build phase costs by the same argument, giving overall.
LeetCode 28. Find the Index of the First Occurrence in a String is plain substring search: kmp_search returning the first index instead of all matches.
LeetCode 214. Shortest Palindrome. To find the shortest palindrome formed by prepending characters to a string s, you need the longest prefix of s that is itself a palindrome. Concatenate s + '#' + reverse(s) and run the LPS build, and lps[-1] gives the length of that longest palindromic prefix directly. The '#' separator prevents the reversed half from bleeding into the prefix computation.
LeetCode 459. Repeated Substring Pattern. A string s of length n is built from a repeating unit if and only if n % (n - lps[n-1]) == 0. Build the LPS array for s itself: lps[n-1] is the length of the longest proper prefix-suffix of the whole string. The candidate period is n - lps[n-1], and if it divides n evenly, the string is periodic.
def repeated_substring_pattern(s: str) -> bool:
lps = build_lps(s)
n = len(s)
period = n - lps[n - 1]
return period != n and n % period == 0You're probably looking at KMP when:
Common templates:
n - lps[n-1] is the candidate period; divides n iff the string repeats. Example: Repeated Substring Pattern.s + '#' + reverse(s) and read off the last entry. Example: Shortest Palindrome.Practice Problems (10)