← back

Optimal String Alignment Distance

#51 · NLP · Medium

⊣ Solve on deep-ml.com

Problem

Implement the Optimal String Alignment (OSA) distance between two strings. OSA distance is similar to Damerau-Levenshtein distance but does not allow applying multiple edit operations on the same substring. It supports insertions, deletions, substitutions, and transpositions of adjacent characters.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
def optimal_string_alignment(s1, s2):
    len1 = len(s1)
    len2 = len(s2)
    dp = [[0] * (len2 + 1) for _ in range(len1 + 1)]

    for i in range(len1 + 1):
        dp[i][0] = i
    for j in range(len2 + 1):
        dp[0][j] = j

    for i in range(1, len1 + 1):
        for j in range(1, len2 + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1

            dp[i][j] = min(
                dp[i - 1][j] + 1,       # deletion
                dp[i][j - 1] + 1,       # insertion
                dp[i - 1][j - 1] + cost  # substitution
            )

            # transposition
            if i > 1 and j > 1 and s1[i - 1] == s2[j - 2] and s1[i - 2] == s2[j - 1]:
                dp[i][j] = min(dp[i][j], dp[i - 2][j - 2] + cost)

    return dp[len1][len2]

Explanation

  1. Build a DP table of size (len1+1) x (len2+1). Base cases: transforming an empty string requires i or j insertions.
  2. For each cell, consider three standard operations: deletion, insertion, and substitution.
  3. Additionally check for transposition: if the current and previous characters are swapped between the two strings, we can transpose at the cost of dp[i-2][j-2] + cost.
  4. The final answer is in dp[len1][len2].

Complexity

  • Time: O(m * n) where m and n are the lengths of the two strings
  • Space: O(m * n) for the DP table