← back

Math Answer Verification with Equivalence Checking

#319 · LLM · Medium

⊣ Solve on deep-ml.com

Problem

Verify whether a student's math answer is equivalent to a reference answer. Handle numeric comparisons (with tolerance), symbolic equivalence for simple expressions, and string normalization for exact matching.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
import re
from typing import Union

def normalize_answer(answer: str) -> str:
    answer = answer.strip()
    answer = answer.replace(" ", "")
    answer = re.sub(r'\\left|\\right', '', answer)
    answer = answer.replace("\\frac", "frac")
    answer = answer.replace("\\cdot", "*")
    answer = answer.replace("\\times", "*")
    return answer

def try_parse_number(s: str) -> Union[float, None]:
    s = s.strip().replace(",", "")
    # Handle fractions like 3/4
    frac_match = re.match(r'^(-?\d+)\s*/\s*(-?\d+)$', s)
    if frac_match:
        num, den = int(frac_match.group(1)), int(frac_match.group(2))
        return num / den if den != 0 else None
    try:
        return float(s)
    except ValueError:
        return None

def math_answer_verify(
    student_answer: str,
    reference_answer: str,
    tolerance: float = 1e-6
) -> bool:
    s_num = try_parse_number(student_answer)
    r_num = try_parse_number(reference_answer)

    if s_num is not None and r_num is not None:
        return abs(s_num - r_num) <= tolerance

    s_norm = normalize_answer(student_answer)
    r_norm = normalize_answer(reference_answer)

    return s_norm == r_norm

Explanation

  1. First, attempt to parse both answers as numbers (handling fractions like 3/4).
  2. If both parse as numbers, compare with a floating-point tolerance.
  3. Otherwise, normalize both strings by removing spaces, stripping LaTeX commands like \left and \right, and unifying multiplication symbols.
  4. Compare the normalized strings for exact match.

Complexity

  • Time: O(n) where n is the length of the answer strings
  • Space: O(n) for normalized copies