Pass@k and Majority Voting Evaluation Metrics

#226 · Reinforcement Learning · Easy

Problem

Implement Pass@k and Majority Voting evaluation metrics for code generation. Pass@k estimates the probability that at least one of k generated samples is correct. Majority Voting selects the most common answer from k samples.

Solution

from collections import Counter
import math

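def pass_at_k(n: int, c: int, k: int) -> float:
    """
    Unbiased estimator of pass@k.
    n: total samples generated
    c: number of correct samples (0 <= c <= n)
    k: number of samples to consider (1 <= k <= n)
    """
    # Without this guard, k > n with c == 0 would wrongly return 1.0 below.
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k incorrect samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # pass@k = 1 - C(n-c, k) / C(n, k)
    # C(n-c, k) / C(n, k) equals the product of (n-c-i) / (n-i) for i in 0..k-1.
    # Sum the logs of those factors so no large binomial coefficient is formed.
    log_numerator = 0.0
    log_denominator = 0.0
    for i in range(k):
        log_numerator += math.log(n - c - i)
        log_denominator += math.log(n - i)
    return round(1.0 - math.exp(log_numerator - log_denominator), 6)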


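def majority_voting(answers: list[str]) -> str:
    """
    Return the most common answer from a list of generated answers.
    Returns "" for an empty list. Ties go to the answer seen first.
    """
    if not answers:
        return ""
    # Counter tallies each distinct answer in one pass over the list.
    counts = Counter(answers)
    # most_common(1) yields [(answer, count)] for the top-ranked answer.
    return counts.most_common(1)[0][0]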

Explanation

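  1. Pass@k uses the unbiased estimator 1 - C(n-c, k) / C(n, k), where n is the total number of samples, c is the number of correct ones, and k is the number drawn. The ratio C(n-c, k) / C(n, k) is the probability that a random size-k subset of the n samples contains no correct sample.
  2. The ratio is computed in log space as a sum of k log terms, which avoids forming very large binomial coefficients.
  3. If fewer than k samples are incorrect (n - c < k), every size-k subset contains at least one correct sample, so pass@k is exactly 1.0.
  4. Majority Voting counts occurrences and returns the most frequent answer, a simple consensus mechanism. An empty input returns the empty string.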
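
As a sanity check on the estimator in step 1, the formula can also be evaluated directly with exact integer binomials. The sketch below is not part of the original solution: `pass_at_k_exact` is a hypothetical helper name, and it assumes Python 3.8+ for `math.comb`. For n = 10, c = 3, k = 5 it gives 1 - C(7, 5) / C(10, 5) = 1 - 21/252, about 0.916667.

```python
import math

def pass_at_k_exact(n: int, c: int, k: int) -> float:
    """Hypothetical cross-check: evaluate 1 - C(n-c, k) / C(n, k) directly.

    Uses exact integer binomials, so it is only meant for validating a
    log-space implementation on small or moderate inputs.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the "trivially 1.0" case
    # from step 3 falls out of the formula without a special branch.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Worked example from the lead-in: C(7, 5) = 21, C(10, 5) = 252.
print(round(pass_at_k_exact(10, 3, 5), 6))

# With k = 1 the estimator reduces to the plain success rate c / n.
print(round(pass_at_k_exact(10, 3, 1), 6))
```

Comparing this against the log-space version on a handful of (n, c, k) triples is a cheap way to catch off-by-one mistakes in the loop bounds.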

Complexity

  • Time: O(k) for pass@k; O(n) for majority voting, where n is the number of answers
  • Space: O(1) for pass@k; O(n) for majority voting in the worst case, when every answer is distinct
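
The single counting pass above says nothing about how decisive the vote was. The sketch below shows one way to surface that within the same O(n) pass. `majority_vote_with_share` is a hypothetical helper, not part of the original solution, and it relies on `Counter.most_common()` listing equal counts in first-seen order (documented behavior since Python 3.7).

```python
from collections import Counter

def majority_vote_with_share(answers: list[str]) -> tuple[str, float, bool]:
    """Hypothetical helper: return (winner, vote_share, tied).

    winner     - most frequent answer ("" for empty input)
    vote_share - fraction of all answers that match the winner
    tied       - True if another answer has the same top count
    """
    if not answers:
        return "", 0.0, False
    # most_common() with no argument returns every (answer, count) pair,
    # sorted by count descending; equal counts keep first-seen order.
    ranked = Counter(answers).most_common()
    winner, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    return winner, top_count / len(answers), tied

# "42" and "17" both appear twice; "42" wins because it was seen first.
print(majority_vote_with_share(["42", "17", "42", "17", "3"]))
```

A caller could treat a tied or low-share result as low confidence, though where to set that threshold is a design choice the original problem does not specify.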