← back

Calculate AUC (Area Under ROC Curve)

#277 · Machine Learning · Medium

⊣ Solve on deep-ml.com

Problem

Calculate the AUC (Area Under the ROC Curve) from scratch. Given the true binary labels and predicted probabilities, compute the AUC score using the trapezoidal rule.

Solution

Compute the ROC curve (FPR and TPR at various thresholds), then integrate using the trapezoidal rule.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
def compute_auc(y_true: list[int], y_scores: list[float]) -> float:
    n = len(y_true)
    total_pos = sum(y_true)
    total_neg = n - total_pos

    if total_pos == 0 or total_neg == 0:
        return 0.0

    # Sort by score descending
    paired = sorted(zip(y_scores, y_true), key=lambda x: -x[0])

    fpr_list = [0.0]
    tpr_list = [0.0]

    tp = 0
    fp = 0
    prev_score = None

    for score, label in paired:
        if score != prev_score and prev_score is not None:
            fpr_list.append(fp / total_neg)
            tpr_list.append(tp / total_pos)
        if label == 1:
            tp += 1
        else:
            fp += 1
        prev_score = score

    fpr_list.append(fp / total_neg)
    tpr_list.append(tp / total_pos)

    # Trapezoidal rule for AUC
    auc = 0.0
    for i in range(1, len(fpr_list)):
        dx = fpr_list[i] - fpr_list[i - 1]
        auc += dx * (tpr_list[i] + tpr_list[i - 1]) / 2.0

    return round(auc, 6)

Explanation

  1. Compute the ROC curve by sorting predictions descending and sweeping thresholds.
  2. At each distinct threshold, record (FPR, TPR).
  3. Apply the trapezoidal rule to integrate: sum of (FPR[i] - FPR[i-1]) * (TPR[i] + TPR[i-1]) / 2.
  4. AUC = 1.0 for a perfect classifier, 0.5 for random guessing, and < 0.5 for worse-than-random.
  5. AUC is threshold-independent — it measures overall ranking quality.

Complexity

  • Time: O(n log n) for sorting
  • Space: O(n) for ROC curve points