Exact Match Score with Normalization

#325 · NLP · Easy

⊣ Solve on deep-ml.com

Problem

Implement an exact match scoring function with text normalization. Given a predicted answer and a reference answer, normalize both (lowercasing, removing articles, punctuation, and extra whitespace) and check for an exact match.

Solution

import re
import string

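def normalize_text(text: str) -> str:
    """Lowercase, strip ASCII punctuation, drop articles, and collapse whitespace."""
    # Lowercase so the comparison is case-insensitive
    text = text.lower()
    # Delete ASCII punctuation (note: "well-known" becomes "wellknown")
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Replace standalone articles with a space so neighboring words stay separated
    text = re.sub(r'\b(a|an|the)\b', ' ', text)
    # Collapse whitespace runs; split() also drops leading and trailing whitespace
    return ' '.join(text.split())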

def exact_match_score(prediction: str, reference: str) -> float:
    return 1.0 if normalize_text(prediction) == normalize_text(reference) else 0.0

def batch_exact_match(
    predictions: list[str],
    references: list[str]
) -> dict:
    """Score parallel lists of predictions and references pair by pair."""
    if len(predictions) != len(references):
        raise ValueError("Lists must have equal length")

    scores = [exact_match_score(p, r) for p, r in zip(predictions, references)]
    return {
        "scores": scores,
        # An empty batch reports 0.0 rather than dividing by zero
        "average": sum(scores) / len(scores) if scores else 0.0,
        "total_matches": int(sum(scores)),
        "total": len(scores)
    }

Explanation

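  1. Normalize both the prediction and the reference, in this order: convert to lowercase, delete ASCII punctuation, remove the articles (a, an, the), and collapse runs of whitespace into single spaces.
  2. Compare the normalized strings; return 1.0 if they are identical and 0.0 otherwise.
  3. The batch version applies this to two parallel lists and returns a dictionary with the per-pair scores, their average, the number of matches, and the number of pairs. It raises a ValueError if the lists differ in length and reports an average of 0.0 for an empty batch.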
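
To make the order of the normalization steps concrete, here is a small sketch that records the string after each stage. The helper name normalize_steps and the sample inputs are illustrative assumptions, not part of the solution above; the four stages mirror normalize_text.

```python
import re
import string

def normalize_steps(text: str) -> list[str]:
    """Hypothetical helper: return the string after each normalization stage."""
    lowered = text.lower()
    # Delete ASCII punctuation characters
    no_punct = lowered.translate(str.maketrans('', '', string.punctuation))
    # Replace standalone articles with a space
    no_articles = re.sub(r'\b(a|an|the)\b', ' ', no_punct)
    # Collapse whitespace runs and trim both ends
    collapsed = ' '.join(no_articles.split())
    return [lowered, no_punct, no_articles, collapsed]

prediction_steps = normalize_steps("The  Eiffel Tower!")
reference_steps = normalize_steps("eiffel tower")

for stage in prediction_steps:
    print(repr(stage))

# The final stages are equal, so this pair would score 1.0
print(prediction_steps[-1] == reference_steps[-1])
```

Both inputs end up as the same normalized string, which is why surface differences in case, punctuation, articles, and spacing do not affect the score.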

Complexity

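  • Time: O(n) per comparison, where n is the combined length of the prediction and reference; O(B * n) for a batch of B pairs, with n the longest pair
  • Space: O(n) for the normalized copies, plus O(B) for the list of scores in the batch version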