Implement Precision-Recall Curve

#278 · Machine Learning · Medium

Problem

Implement the Precision-Recall Curve calculation. Given true binary labels and predicted probabilities (scores), compute precision and recall at each distinct score threshold.
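As a reference point, precision and recall at a single threshold can be computed by direct counting. The sketch below uses a hypothetical helper, `precision_recall_at`, and assumes a sample is predicted positive when its score is `>=` the threshold. It also assumes precision is reported as 0.0 when nothing is predicted positive.

```python
def precision_recall_at(
    y_true: list[int],
    y_scores: list[float],
    threshold: float,
) -> tuple[float, float]:
    # Hypothetical helper: a sample is predicted positive when score >= threshold.
    tp = sum(1 for y, s in zip(y_true, y_scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, y_scores) if s < threshold and y == 1)
    # Assumed convention: 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


# Scores 0.9, 0.8, 0.7 clear the threshold: TP = 2, FP = 1, FN = 1.
print(precision_recall_at([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.4, 0.3], 0.5))
```

The curve is this computation repeated for every candidate threshold; the solution below avoids the repeated O(n) counting by sorting once.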

Solution

Sort predictions by score in descending order, then sweep the threshold down through each distinct score, computing precision and recall at each step.
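The sweep amounts to keeping running TP and FP counts while lowering the threshold. All samples that share a score must become positive at the same time. A minimal sketch of just the counting step, using a hypothetical `cumulative_counts` helper built on `itertools.groupby`:

```python
from itertools import groupby


def cumulative_counts(
    y_true: list[int],
    y_scores: list[float],
) -> list[tuple[float, int, int]]:
    """Return (threshold, tp, fp) for each distinct score, highest first."""
    paired = sorted(zip(y_scores, y_true), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    rows = []
    # Sorting makes tied scores adjacent, so groupby yields one group per distinct score.
    for score, group in groupby(paired, key=lambda p: p[0]):
        labels = [label for _, label in group]
        tp += sum(labels)                # positives newly above the threshold
        fp += len(labels) - sum(labels)  # negatives newly above the threshold
        rows.append((score, tp, fp))
    return rows


print(cumulative_counts([1, 0, 1, 0], [0.9, 0.6, 0.6, 0.2]))
# [(0.9, 1, 0), (0.6, 2, 1), (0.2, 2, 2)]
```

Precision at each row is `tp / (tp + fp)` and recall is `tp / total_positives`. Grouping by score is the same effect the solution gets by skipping ahead while the next score is tied.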

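def precision_recall_curve(
    y_true: list[int],
    y_scores: list[float],
) -> dict[str, list[float]]:
    """Return precision, recall, and thresholds at every distinct score.

    precision[0] / recall[0] is the prepended start point (recall=0,
    precision=1); precision[k] / recall[k] for k >= 1 correspond to
    thresholds[k - 1]. A sample counts as positive when score >= threshold.
    """
    total_pos = sum(y_true)

    # With no positive labels recall is undefined; return a degenerate curve
    if total_pos == 0:
        return {"precision": [0.0], "recall": [0.0], "thresholds": []}

    # Sort (score, label) pairs by score, highest first
    paired = sorted(zip(y_scores, y_true), key=lambda x: x[0], reverse=True)
    n = len(paired)

    precisions = []
    recalls = []
    thresholds = []

    tp = 0  # positives with score >= current threshold
    fp = 0  # negatives with score >= current threshold

    for i, (score, label) in enumerate(paired):
        if label == 1:
            tp += 1
        else:
            fp += 1

        # Tied scores share one threshold: emit a point only after the last of them
        if i + 1 < n and paired[i + 1][0] == score:
            continue

        precision = tp / (tp + fp)  # tp + fp >= 1 at this point
        recall = tp / total_pos

        precisions.append(round(precision, 6))
        recalls.append(round(recall, 6))
        thresholds.append(round(score, 6))

    # Prepend the start point (recall=0, precision=1), which has no threshold
    precisions.insert(0, 1.0)
    recalls.insert(0, 0.0)

    return {
        "precision": precisions,
        "recall": recalls,
        "thresholds": thresholds,
    }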
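One way to sanity-check the sweep is to compare it with a brute-force version that recounts TP and predicted positives from scratch at every distinct score. The sketch below, `precision_recall_curve_naive`, is a hypothetical O(n²) reference assumed to follow the same conventions: `>=` thresholds, a prepended (0, 1) start point, and 6-digit rounding.

```python
def precision_recall_curve_naive(
    y_true: list[int],
    y_scores: list[float],
) -> dict[str, list[float]]:
    # Hypothetical brute-force reference: recount at each distinct score.
    total_pos = sum(y_true)
    if total_pos == 0:
        return {"precision": [0.0], "recall": [0.0], "thresholds": []}
    precisions, recalls, thresholds = [1.0], [0.0], []  # start point first
    for t in sorted(set(y_scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_scores) if s >= t and y == 1)
        predicted = sum(1 for s in y_scores if s >= t)  # >= 1 since t is a score
        precisions.append(round(tp / predicted, 6))
        recalls.append(round(tp / total_pos, 6))
        thresholds.append(round(t, 6))
    return {"precision": precisions, "recall": recalls, "thresholds": thresholds}


print(precision_recall_curve_naive([1, 0, 1, 0], [0.9, 0.6, 0.6, 0.2]))
# {'precision': [1.0, 1.0, 0.666667, 0.5], 'recall': [0.0, 0.5, 1.0, 1.0], 'thresholds': [0.9, 0.6, 0.2]}
```

On this input the tied 0.6 scores produce a single point, and the sorted sweep yields the same three thresholds and values.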

Explanation

Sort all predictions by score in descending order.
Sweep the threshold from high to low. At each distinct score, every sample with that score becomes a predicted positive, so tied samples are added together.
Precision = TP / (TP + FP) — fraction of positive predictions that are correct.
Recall = TP / total positives — fraction of actual positives that are detected.
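The curve starts at (recall=0, precision=1) by convention. This start point has no threshold, so thresholds has one fewer entry than precision and recall. As the threshold drops, recall never decreases, while precision tends to fall, though not monotonically.
Average Precision (AP), a step-wise summary of the area under the PR curve, is especially useful for imbalanced datasets because precision and recall ignore true negatives.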
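AP is commonly defined as AP = Σₖ (Rₖ − Rₖ₋₁) · Pₖ over consecutive curve points, which weights each precision by the recall gained at that step. No interpolation is applied. A minimal sketch, with `average_precision` as a hypothetical helper that assumes its inputs follow the curve format above (start point first, recall non-decreasing):

```python
def average_precision(precision: list[float], recall: list[float]) -> float:
    """Step-wise AP: sum of (R_k - R_{k-1}) * P_k over consecutive points.

    Assumes recall is non-decreasing and index 0 is the (recall=0) start point.
    """
    return sum(
        (recall[k] - recall[k - 1]) * precision[k]
        for k in range(1, len(recall))
    )


# Curve for y_true=[1, 0, 1, 0], y_scores=[0.9, 0.6, 0.6, 0.2]:
# 0.5 * 1.0 + 0.5 * 0.666667 + 0.0 * 0.5, roughly 0.8333
print(average_precision([1.0, 1.0, 0.666667, 0.5], [0.0, 0.5, 1.0, 1.0]))
```

This step sum differs from integrating the curve with the trapezoidal rule. The trapezoidal rule can overestimate the area because it linearly interpolates precision between points.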

Complexity

Time: O(n log n), dominated by the sort; the sweep itself is O(n)
Space: O(n) for the curve points
