#226 · Reinforcement Learning · Easy
Implement Pass@k and Majority Voting evaluation metrics for code generation. Pass@k estimates the probability that at least one of k generated samples is correct. Majority Voting selects the most common answer among k samples.
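The ratio of binomial coefficients in the pass@k estimator reduces to a product of k factors. This is why a log-space implementation can accumulate `log(n - c - i)` and `log(n - i)` over `i = 0..k-1`. The product is the probability that k samples drawn without replacement from the n generated ones are all incorrect. The worked example uses illustrative values (n = 10, c = 3, k = 5) chosen here, not taken from the problem:

$$
\text{pass@}k = 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}},
\qquad
\frac{\binom{n-c}{k}}{\binom{n}{k}}
= \frac{(n-c)!\,/\,(n-c-k)!}{n!\,/\,(n-k)!}
= \prod_{i=0}^{k-1} \frac{n-c-i}{n-i}
$$

$$
n=10,\ c=3,\ k=5:\quad
\prod_{i=0}^{4} \frac{7-i}{10-i}
= \frac{7\cdot 6\cdot 5\cdot 4\cdot 3}{10\cdot 9\cdot 8\cdot 7\cdot 6}
= \frac{2520}{30240} = \frac{1}{12},
\qquad \text{pass@}5 = \frac{11}{12} \approx 0.916667
$$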
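```python
from collections import Counter
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """
    Unbiased estimator of pass@k.
    n: total samples generated
    c: number of correct samples
    k: number of samples to consider
    """
    # Fewer than k incorrect samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # pass@k = 1 - C(n-c, k) / C(n, k)
    #        = 1 - prod_{i=0}^{k-1} (n - c - i) / (n - i)
    # Sum the logs of the factors instead of multiplying them, for numerical stability.
    numerator = 0.0
    denominator = 0.0
    for i in range(k):
        numerator += math.log(n - c - i)
        denominator += math.log(n - i)
    return round(1.0 - math.exp(numerator - denominator), 6)


def majority_voting(answers: list[str]) -> str:
    """
    Return the most common answer from a list of generated answers.
    """
    if not answers:
        return ""
    counts = Counter(answers)
    # Ties go to the answer encountered first (Counter keeps insertion order).
    return counts.most_common(1)[0][0]
```

The estimator is $\text{pass@}k = 1 - \binom{n-c}{k} / \binom{n}{k}$, where $n$ is the total number of samples, $c$ is the number of correct ones, and $k$ is how many we pick. The subtracted term is the probability that a random size-$k$ subset of the $n$ samples contains no correct sample.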
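One way to sanity-check a floating-point pass@k is to compare it against exact integer arithmetic. Both metrics are also typically reported as averages over a benchmark rather than for a single problem. The sketch below makes two assumptions about the input: pass@k takes one `(n, c)` pair per problem, and majority voting takes one list of sampled answers plus one reference answer per problem. The names `pass_at_k_exact`, `mean_pass_at_k`, and `majority_accuracy` are hypothetical helpers, not part of the problem's required interface.

```python
from collections import Counter
from fractions import Fraction
from math import comb


def pass_at_k_exact(n: int, c: int, k: int) -> Fraction:
    """Exact pass@k = 1 - C(n-c, k) / C(n, k) using integer binomials.

    Hypothetical reference helper. math.comb returns 0 when k > n - c,
    so in that case the result is exactly 1.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return 1 - Fraction(comb(n - c, k), comb(n, k))


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k; results holds one (n, c) pair per problem."""
    if not results:
        raise ValueError("need at least one problem")
    scores = [pass_at_k_exact(n, c, k) for n, c in results]
    return float(sum(scores) / len(scores))


def majority_accuracy(samples: list[list[str]], references: list[str]) -> float:
    """Fraction of problems whose most common sampled answer equals the reference.

    Ties go to the answer seen first, because Counter.most_common orders
    equal counts by first occurrence. Problems with no samples count as misses.
    """
    if not references or len(samples) != len(references):
        raise ValueError("need one sample list per reference, at least one problem")
    hits = 0
    for answers, ref in zip(samples, references):
        if answers and Counter(answers).most_common(1)[0][0] == ref:
            hits += 1
    return hits / len(references)


# Usage with made-up data:
print(pass_at_k_exact(10, 3, 5))                    # 11/12
print(mean_pass_at_k([(10, 3), (10, 0)], k=1))      # 0.15
print(majority_accuracy([["4", "4", "5"], ["x", "y"]], ["4", "y"]))  # 0.5
```

`Fraction` avoids rounding entirely at the cost of large integers for big `n`, whereas the log-space `pass_at_k` trades exactness for speed; the two should agree to within the 6-decimal rounding. One deliberate difference: `pass_at_k` returns 1.0 whenever `n - c < k` (including `k > n`), while this helper rejects `k > n` as invalid input.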