← back

Dynamic Programming

String Matching DP (LCS / Edit Distance)

2D dp[i][j] for aligning two strings. Covers longest common subsequence, edit distance, regex/wildcard matching, interleaving strings, and distinct subsequences.

When you type "recieve" into a text editor and it suggests "receive," something under the hood is computing how similar two strings are. When a version control tool shows you a diff between two files, it is finding the longest sequence of lines shared between the old and new versions. These tasks, measuring string similarity, transforming one string into another, aligning two sequences, all share a common DP structure: a 2D table where rows represent characters from one string and columns represent characters from the other.

This is one of the most important DP patterns because it appears constantly in disguise. Edit distance, longest common subsequence, interleaving strings, regex matching, and wildcard matching are all the same skeleton with different transition rules.

Edit distance (Levenshtein distance)

LeetCode 72 defines the edit distance between two strings as the minimum number of single-character operations needed to transform one into the other. The allowed operations are:

  • Insert a character
  • Delete a character
  • Replace a character with a different one

Let us work through a concrete example: transforming "horse" into "ros."

Define dp[i][j] as the edit distance between the first i characters of "horse" and the first j characters of "ros." We build a table with rows 0 through 5 (for "horse") and columns 0 through 3 (for "ros").

Base cases: dp[0][j] = j (inserting j characters to build the first j characters of "ros" from nothing) and dp[i][0] = i (deleting i characters from the first i characters of "horse" to reach the empty string).

The table fills out like this:

1
2
3
4
5
6
7
      ""  r  o  s
  ""   0  1  2  3
  h    1  1  2  3
  o    2  2  1  2
  r    3  2  2  2
  s    4  3  3  2
  e    5  4  4  3

The answer is dp[5][3] = 3. The three operations: replace "h" with "r," delete "r" (now at position 2), delete "e." Let us see how the recurrence produced this table.

The three operations and their cells

At each cell dp[i][j], we ask: does s1[i-1] match s2[j-1]? (We use 1-indexed DP but 0-indexed strings.)

If the characters match (s1[i-1] == s2[j-1]), no operation is needed for this pair. We inherit the cost from the diagonal: dp[i][j] = dp[i-1][j-1].

If they do not match, we take the cheapest of three options, each corresponding to one operation:

  • dp[i-1][j-1] + 1: Replace s1[i-1] with s2[j-1]. We "use up" both characters and pay one operation.
  • dp[i-1][j] + 1: Delete s1[i-1]. We advance through s1 but stay at the same position in s2, hoping that subsequent characters will match.
  • dp[i][j-1] + 1: Insert s2[j-1] into s1. We stay at the same position in s1 but advance through s2, because the inserted character matches.

A helpful mental model: the diagonal neighbor is replace, the cell above is delete, and the cell to the left is insert. Getting these three neighbors right is the most common source of bugs.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
def edit_distance(s1, s2):
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i-1] == s2[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = 1 + min(dp[i-1][j-1], dp[i-1][j], dp[i][j-1])
    return dp[m][n]

Longest common subsequence (LCS)

LeetCode 1143 asks: what is the longest sequence of characters that appears in both strings, in order, but not necessarily contiguous? For "abcde" and "ace," the LCS is "ace" with length 3.

The table structure is the same: dp[i][j] is the length of the LCS of s1[:i] and s2[:j]. The recurrence is:

  • If s1[i-1] == s2[j-1]: dp[i][j] = dp[i-1][j-1] + 1. A match extends the subsequence by one.
  • Otherwise: dp[i][j] = max(dp[i-1][j], dp[i][j-1]). We skip either the current character of s1 or the current character of s2, whichever yields a longer subsequence.

Notice the difference from edit distance: on a match, edit distance copies the diagonal (no cost), while LCS adds one to the diagonal (we found a matching character, so the subsequence grows). On a mismatch, edit distance takes a min with +1 (an operation was performed), while LCS takes a max without adding anything: no character was matched, we just pick the better option from skipping one string or the other.

1
2
3
4
5
6
7
8
9
10
def lcs(s1, s2):
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i-1] == s2[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])
    return dp[m][n]

The connection between LCS and edit distance

These two problems are more closely related than they first appear. If you only allow insertions and deletions (no replacements), the edit distance between two strings equals m + n - 2 * LCS(s1, s2). You delete the characters in s1 that are not in the LCS (m - LCS deletions) and insert the characters in s2 that are not in the LCS (n - LCS insertions). This relationship is useful when a problem asks about converting one string to another using only inserts and deletes.

Space optimization: the rolling row trick

The 2D table uses O(mn)O(m \cdot n) space, but look at the recurrence: dp[i][j] depends on three cells, namely dp[i-1][j-1], dp[i-1][j], and dp[i][j-1]. All of these are either in the current row or the immediately previous row. We never need rows before i-1, so we can use a single 1D array and overwrite it row by row.

There is one catch. When we iterate left to right within a row, dp[j] currently holds the value from the previous row (dp[i-1][j]), which is our "cell above." And dp[j-1] has already been updated to the current row (dp[i][j-1]), which is our "cell to the left." But the diagonal value dp[i-1][j-1] was the old dp[j-1] before we overwrote it. We need to save it in a prev variable before the update.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
def edit_distance_optimized(s1, s2):
    m, n = len(s1), len(s2)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev = dp[0]
        dp[0] = i
        for j in range(1, n + 1):
            temp = dp[j]
            if s1[i-1] == s2[j-1]:
                dp[j] = prev
            else:
                dp[j] = 1 + min(prev, dp[j], dp[j-1])
            prev = temp
    return dp[n]

The variable prev holds the diagonal value (dp[i-1][j-1]). Before we overwrite dp[j], we save its current value in temp, which becomes prev for the next column. Threading this correctly is the only tricky part. The space drops from O(mn)O(m \cdot n) to O(min(m,n))O(\min(m, n)). You can always swap the strings so the shorter one determines the array length.

Interleaving strings

Consider LeetCode 97: given strings s1, s2, and s3, determine if s3 is formed by interleaving s1 and s2, where characters from s1 and s2 appear in s3 in their original order, but can be woven together in any way.

This is another 2D string DP. Define dp[i][j] as: can the first i characters of s1 and the first j characters of s2 form the first i + j characters of s3?

The transition: dp[i][j] is True if either:

  • dp[i-1][j] is True and s1[i-1] == s3[i+j-1] (the next character in s3 comes from s1), or
  • dp[i][j-1] is True and s2[j-1] == s3[i+j-1] (the next character in s3 comes from s2).

Same 2D table, same row-by-row fill, just different transition logic. The pattern is recognizable once you see that the state needs to track independent progress through two sequences.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
def isInterleave(s1, s2, s3):
    m, n = len(s1), len(s2)
    if m + n != len(s3):
        return False
    dp = [False] * (n + 1)
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                dp[j] = True
            elif i == 0:
                dp[j] = dp[j-1] and s2[j-1] == s3[j-1]
            elif j == 0:
                dp[j] = dp[j] and s1[i-1] == s3[i-1]
            else:
                dp[j] = (dp[j] and s1[i-1] == s3[i+j-1]) or                          (dp[j-1] and s2[j-1] == s3[i+j-1])
    return dp[n]

When the pattern appears in disguise

Several well-known problems are 2D string DP wearing a costume:

Regular expression matching (LeetCode 10): dp[i][j] = does s[:i] match pattern p[:j]? The transitions handle literal characters, . (matches any single character), and * (matches zero or more of the preceding character). The * case is the tricky one: it can either consume zero characters (dp[i][j-2]) or consume one more character if it matches (dp[i-1][j] when s[i-1] matches p[j-2]).

Wildcard matching (LeetCode 44): similar structure but simpler rules. ? matches exactly one character, * matches any sequence (including empty). The * transition is: dp[i][j] = dp[i][j-1] (star matches empty) or dp[i-1][j] (star matches one more character).

Distinct subsequences (LeetCode 115): count the number of ways s2 appears as a subsequence of s1. On a match, dp[i][j] = dp[i-1][j-1] + dp[i-1][j]: either use this match or skip it. On a mismatch, dp[i][j] = dp[i-1][j], which skips the character in s1.

In every case, the skeleton is the same: two sequences, a 2D table tracking progress through both, and transitions based on whether characters at the current positions match, mismatch, or involve a special wildcard/operator.

Distinct subsequences

For LeetCode 115, given strings s and t, count the number of distinct subsequences of s that equal t. In other words, how many ways can you delete characters from s to produce t?

Define dp[i][j] as the number of ways to form t[0..j-1] from s[0..i-1]. Base cases: dp[i][0] = 1 for all i (there is exactly one way to form the empty string, which is to delete everything), and dp[0][j] = 0 for j > 0 (you cannot form a non-empty string from nothing).

The recurrence: if s[i-1] == t[j-1], we can either use this character match (dp[i-1][j-1]) or skip s[i-1] (dp[i-1][j]). If they do not match, we must skip s[i-1]: dp[i][j] = dp[i-1][j].

For s = "rabbbit", t = "rabbit": the answer is 3, corresponding to deleting each of the three bs in turn.

1
2
3
4
5
6
7
8
9
def numDistinct(s, t):
    m, n = len(s), len(t)
    dp = [0] * (n + 1)
    dp[0] = 1
    for i in range(1, m + 1):
        for j in range(n, 0, -1):  # iterate right to left to avoid overwriting
            if s[i-1] == t[j-1]:
                dp[j] += dp[j-1]
    return dp[n]

Note the right-to-left inner loop. Since we only use a 1D array, iterating left to right would overwrite dp[j-1] before we need it. Reversing the direction preserves the "previous row" values we depend on.

Regular expression matching

LeetCode 10 asks whether string s matches pattern p where . matches any single character and * matches zero or more of the preceding element.

Define dp[i][j] as: does s[0..i-1] match p[0..j-1]? Base case: dp[0][0] = True. For the first row, patterns like a*b* can match the empty string, so dp[0][j] = True if p[j-1] == '*' and dp[0][j-2] is True (the star eliminates its preceding element).

The transitions:

  • If p[j-1] is a literal or .: dp[i][j] = dp[i-1][j-1] when s[i-1] matches p[j-1].
  • If p[j-1] == '*': two choices. The star matches zero of the preceding element: dp[i][j-2]. Or, if s[i-1] matches p[j-2], the star matches one more character: dp[i-1][j].

The dp[i-1][j] transition for the star case is the subtle part. It says "the star already matched everything up to s[i-2], and s[i-1] matches the preceding element too, so extend the match."

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
def isMatch(s, p):
    m, n = len(s), len(p)
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = True
    for j in range(2, n + 1):
        if p[j-1] == '*':
            dp[0][j] = dp[0][j-2]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if p[j-1] == '*':
                dp[i][j] = dp[i][j-2]  # zero occurrences
                if p[j-2] == '.' or p[j-2] == s[i-1]:
                    dp[i][j] = dp[i][j] or dp[i-1][j]  # one more occurrence
            elif p[j-1] == '.' or p[j-1] == s[i-1]:
                dp[i][j] = dp[i-1][j-1]
    return dp[m][n]

Minimum ASCII delete sum

With LeetCode 712, given two strings, find the lowest ASCII sum of deleted characters to make the two strings equal. This is a weighted variant of LCS: instead of maximizing the count of matched characters, you minimize the cost of characters you throw away.

Define dp[i][j] as the minimum ASCII delete cost to make s1[0..i-1] and s2[0..j-1] equal. Base cases: dp[i][0] is the sum of ASCII values of s1[0..i-1] (delete everything from s1), and dp[0][j] is the sum of ASCII values of s2[0..j-1].

If s1[i-1] == s2[j-1], no deletion needed: dp[i][j] = dp[i-1][j-1]. Otherwise, delete whichever character costs less: dp[i][j] = min(dp[i-1][j] + ord(s1[i-1]), dp[i][j-1] + ord(s2[j-1])).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
def minimumDeleteSum(s1, s2):
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i-1][0] + ord(s1[i-1])
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j-1] + ord(s2[j-1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i-1] == s2[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = min(dp[i-1][j] + ord(s1[i-1]),
                               dp[i][j-1] + ord(s2[j-1]))
    return dp[m][n]

An equivalent approach: find the LCS by maximum ASCII value instead of maximum count, then the answer is total_ascii(s1) + total_ascii(s2) - 2 * max_ascii_lcs. Both formulations yield the same result.

Longest palindromic subsequence

Take LeetCode 516: it asks for the longest palindromic subsequence of a single string. There are two clean ways to frame this:

Approach 1: LCS with the reverse. A palindromic subsequence of s is a common subsequence of s and reverse(s). So the answer is LCS(s, s[::-1]). You already have the LCS code above, so simply feed it the string and its reverse. This runs in O(n2)O(n^2) time and is trivial to implement.

Approach 2: Interval DP. Define dp[i][j] as the longest palindromic subsequence in s[i..j]. Base case: every single character is a palindrome of length 1 (dp[i][i] = 1). If s[i] == s[j], the two endpoints can frame a palindrome: dp[i][j] = dp[i+1][j-1] + 2. Otherwise, try excluding one end: dp[i][j] = max(dp[i+1][j], dp[i][j-1]).

1
2
3
4
5
6
7
8
9
10
11
12
13
def longestPalindromeSubseq(s):
    n = len(s)
    dp = [[0] * n for _ in range(n)]
    for i in range(n):
        dp[i][i] = 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if s[i] == s[j]:
                dp[i][j] = dp[i+1][j-1] + 2
            else:
                dp[i][j] = max(dp[i+1][j], dp[i][j-1])
    return dp[0][n-1]

The interval DP approach processes substrings by increasing length. It is worth knowing because it generalizes to other palindrome problems (e.g., minimum insertions to make a string a palindrome, which equals n - longestPalindromeSubseq(s)).

Complexity analysis

Time: O(mn)O(m \cdot n) where m and n are the lengths of the two strings. Every cell in the table is filled in O(1)O(1) time.

Space: O(min(m,n))O(\min(m, n)) with the rolling row optimization. Without it, O(mn)O(m \cdot n). For problems where you need to reconstruct the actual sequence (not just its length or distance), you typically need the full 2D table to backtrack through, bringing space back to O(mn)O(m \cdot n).

Recognizing this pattern

You're probably looking at string-matching DP when:

  • The problem hands you two strings (or sequences) and asks how they relate: similarity, edit cost, alignment, interleaving.
  • The natural state is "first i chars of A vs. first j chars of B," giving a clean 2D grid.
  • The transition at (i, j) splits on whether A[i-1] == B[j-1]: a match takes from the diagonal, and a mismatch chooses among the three neighbors.
  • Brute-force enumeration of subsequences would be 2n2^n but constraints are around m,n103m, n \le 10^3, fitting an O(mn)O(m \cdot n) table.

Common templates:

  • LCS skeleton: dp[i][j] = dp[i-1][j-1] + 1 on match, else max(dp[i-1][j], dp[i][j-1]). Example: Longest Common Subsequence.
  • Edit distance: three neighbors represent insert/delete/replace, take min + 1. Example: Edit Distance.
  • Distinct subsequences counting: dp[i][j] = dp[i-1][j-1] + dp[i-1][j] on match, else dp[i-1][j]. Example: Distinct Subsequences.
  • Wildcard / regex matching: match characters one-by-one but special tokens like * introduce extra transitions. Example: Regular Expression Matching.
  • Interleaving / merge check: dp[i][j] asks whether a third string can be formed from prefixes of two others. Example: Interleaving String.