Pass@k and Majority Voting Evaluation Metrics

#226 · Reinforcement Learning · Easy

Problem

Implement two evaluation metrics for code generation: Pass@k and Majority Voting. Pass@k estimates the probability that at least one of k samples is correct, computed from n generated samples of which c are correct. Majority Voting returns the most common answer among the generated samples.

Solution

from collections import Counter
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """
    Unbiased estimator of pass@k.
    n: total samples generated
    c: number of correct samples
    k: number of samples to consider
    """
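    # The estimator is only defined for 0 <= c <= n and 0 <= k <= n;
    # without this check, k > n with c == 0 would wrongly return 1.0.
    if not 0 <= c <= n or not 0 <= k <= n:
        raise ValueError("require 0 <= c <= n and 0 <= k <= n")
    # Fewer than k incorrect samples: every draw of k includes a correct one.
    if n - c < k:
        return 1.0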
    # pass@k = 1 - C(n-c, k) / C(n, k)
    # Compute in log space for numerical stability
    numerator = 0.0
    denominator = 0.0
    for i in range(k):
        numerator += math.log(n - c - i)
        denominator += math.log(n - i)
    return round(1.0 - math.exp(numerator - denominator), 6)


def majority_voting(answers: list[str]) -> str:
    """
    Return the most common answer from a list of generated answers.
    """
    if not answers:
        return ""
    counts = Counter(answers)
    return counts.most_common(1)[0][0]

Explanation

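Pass@k uses the unbiased estimator 1 - C(n-c, k) / C(n, k), where n is the total number of samples, c is the number of correct samples, and k is the number of samples drawn. The ratio C(n-c, k) / C(n, k) is the probability that k samples drawn without replacement from the n are all incorrect.
The ratio equals the product of (n-c-i) / (n-i) for i = 0, ..., k-1, so the code sums the logs of these k factors and exponentiates once, which avoids forming large binomial coefficients.
If fewer than k incorrect samples exist (n - c < k), every draw of k samples must include a correct one, so pass@k is exactly 1.0. This check also keeps math.log from receiving a non-positive argument.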
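As a sanity check on the formula above, the log-space product can be compared against a direct evaluation with math.comb (Python 3.8+). This is a minimal sketch; the names pass_at_k_exact and pass_at_k_log are hypothetical helpers introduced for illustration and are not part of the solution.

```python
import math

def pass_at_k_exact(n: int, c: int, k: int) -> float:
    """Direct evaluation of 1 - C(n-c, k) / C(n, k) (hypothetical helper)."""
    # math.comb returns 0 when k > n - c, so the "always passes" case
    # needs no separate branch here.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

def pass_at_k_log(n: int, c: int, k: int) -> float:
    """Same quantity via the log-space product (hypothetical helper)."""
    if n - c < k:
        return 1.0
    # Sum of log((n-c-i) / (n-i)) for i = 0, ..., k-1.
    log_ratio = sum(math.log(n - c - i) - math.log(n - i) for i in range(k))
    return 1.0 - math.exp(log_ratio)

# n = 10 samples, c = 3 correct, k = 5 drawn:
# C(7, 5) / C(10, 5) = 21 / 252 = 1/12, so pass@5 = 11/12.
print(pass_at_k_exact(10, 3, 5))
print(pass_at_k_log(10, 3, 5))
```

Both calls should agree up to floating-point rounding; for k = 1 the estimator reduces to c / n.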
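Majority Voting counts how often each answer occurs and returns the most frequent one, which acts as a simple consensus mechanism. On a tie, Counter.most_common returns the answer that was encountered first; an empty input returns an empty string.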
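The voting behaviour, including the tie case, can be illustrated with a short self-contained sketch. The sample lists below are made up for illustration, and the function body is a compact restatement of the solution rather than a new method.

```python
from collections import Counter

def majority_voting(answers: list[str]) -> str:
    """Return the most common answer, or "" for an empty list."""
    if not answers:
        return ""
    return Counter(answers).most_common(1)[0][0]

# Illustrative data: "42" appears 3 times out of 5.
samples = ["42", "41", "42", "7", "42"]
print(majority_voting(samples))

# Tie at two votes each: the answer seen first ("b") is returned.
tied = ["b", "a", "a", "b"]
print(majority_voting(tied))
```

If first-seen tie-breaking is not wanted, the caller would need an explicit rule (for example, preferring the shortest answer), since the metric itself does not define one.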

Complexity

Time: O(k) for pass@k; O(n) for majority voting, where n is the number of answers
Space: O(1) for pass@k; O(n) for majority voting in the worst case, when all answers are distinct
