Calculate Expected Calibration Error (ECE)

#260 · Machine Learning · Medium

Problem

Calculate the Expected Calibration Error (ECE) for a classification model. ECE measures how well predicted probabilities match actual outcomes by binning predictions and comparing average confidence to average accuracy in each bin.

Solution

Partition predictions into equal-width confidence bins, compute the absolute difference between accuracy and confidence in each bin, and take the weighted average.
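Written as a formula, the procedure looks roughly like the following. The notation is an assumption introduced here for illustration, not taken from the problem statement: M is the number of bins (n_bins), B_m is the set of samples in bin m, n is the total number of samples, y_i is the true label, and p_i is the predicted probability.

```latex
\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n}\,
  \bigl|\, \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \,\bigr|,
\qquad
\mathrm{acc}(B_m) = \frac{1}{|B_m|} \sum_{i \in B_m} y_i,
\qquad
\mathrm{conf}(B_m) = \frac{1}{|B_m|} \sum_{i \in B_m} p_i
```

Empty bins contribute nothing to the sum, which matches the `continue` in the solution code.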

def expected_calibration_error(
    y_true: list[int],
    y_pred_prob: list[float],
    n_bins: int = 10,
) -> float:
    n = len(y_true)
    if n == 0:
        return 0.0

    bin_boundaries = [i / n_bins for i in range(n_bins + 1)]

    ece = 0.0
    for b in range(n_bins):
        lo = bin_boundaries[b]
        hi = bin_boundaries[b + 1]

        # Collect samples in this bin. Bins are half-open, [lo, hi), except the
        # last one, which is closed so that a probability of exactly 1.0 is counted.
        is_last = b == n_bins - 1
        indices = [
            i
            for i in range(n)
            if lo <= y_pred_prob[i] < hi or (is_last and y_pred_prob[i] == hi)
        ]

        if not indices:
            continue

        bin_size = len(indices)
        avg_confidence = sum(y_pred_prob[i] for i in indices) / bin_size
        avg_accuracy = sum(y_true[i] for i in indices) / bin_size

        ece += (bin_size / n) * abs(avg_accuracy - avg_confidence)

    return round(ece, 6)

Explanation

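  1. Divide the probability range [0, 1] into n_bins equal-width intervals. Each bin is half-open, [lo, hi), except the last, which also includes 1.0.
  2. Assign each prediction to the bin that contains its predicted probability.
  3. For each non-empty bin, compute the average confidence (mean predicted probability) and the average accuracy. The code treats y_pred_prob as the probability of the positive class, so accuracy here is the fraction of samples in the bin whose true label is 1.
  4. ECE = sum of |accuracy - confidence| across bins, each term weighted by the fraction of all samples that fall in that bin (bin_size / n).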
  5. A perfectly calibrated model has ECE = 0.
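
The steps above can be traced by hand on a small input. The sketch below uses hypothetical toy data (five samples, five bins of width 0.2) and, like the solution, takes accuracy in a bin to be the mean of the binary labels. It uses bisect_right on the bin edges as one way to find a sample's bin; that is a choice made for this illustration.

```python
from bisect import bisect_right

# Hypothetical toy data: binary labels and predicted positive-class probabilities.
y_true = [1, 0, 1, 1, 0]
y_pred_prob = [0.9, 0.2, 0.8, 0.6, 0.3]
n_bins = 5  # five bins of width 0.2

# Step 1: equal-width bin edges 0.0, 0.2, ..., 1.0.
edges = [i / n_bins for i in range(n_bins + 1)]

# Step 2: assign each prediction to a bin; p == 1.0 is clamped into the last bin.
bins = {}
for label, p in zip(y_true, y_pred_prob):
    b = min(bisect_right(edges, p) - 1, n_bins - 1)
    bins.setdefault(b, []).append((label, p))

# Steps 3 and 4: per-bin averages, then the size-weighted sum of the gaps.
n = len(y_true)
ece = 0.0
rows = []
for b in sorted(bins):
    members = bins[b]
    avg_confidence = sum(p for _, p in members) / len(members)
    avg_accuracy = sum(label for label, _ in members) / len(members)
    gap = abs(avg_accuracy - avg_confidence)
    ece += (len(members) / n) * gap
    rows.append(
        (b, len(members), round(avg_confidence, 2), round(avg_accuracy, 2), round(gap, 2))
    )

for row in rows:
    print(row)  # (bin index, size, avg confidence, avg accuracy, gap)
print(round(ece, 6))
```

Three bins are non-empty: bin 1 holds 0.2 and 0.3 (gap 0.25), bin 3 holds 0.6 (gap 0.4), and bin 4 holds 0.8 and 0.9 (gap 0.15). The weighted sum is 0.4 * 0.25 + 0.2 * 0.4 + 0.4 * 0.15 = 0.24.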

Complexity

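  • Time: O(n * b), where n is the number of samples and b is the number of bins, because every bin scans all n samples
  • Space: O(n + b): the bin boundaries take O(b), and the indices list for a single bin can hold up to n entries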
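
If the O(n * b) scan matters, one possible alternative is to visit each sample once and keep running totals per bin. The sketch below is an assumption-laden variant, not part of the original solution: the function name expected_calibration_error_single_pass is hypothetical, and it locates each bin with bisect_right on the same boundaries, giving O(n log b) time and O(b) extra space. Values outside [0, 1] are skipped, mirroring the solution, which never places them in a bin.

```python
from bisect import bisect_right


def expected_calibration_error_single_pass(
    y_true: list[int],
    y_pred_prob: list[float],
    n_bins: int = 10,
) -> float:
    """Sketch: same binning rule as the solution, one pass over the samples."""
    n = len(y_true)
    if n == 0:
        return 0.0

    bin_boundaries = [i / n_bins for i in range(n_bins + 1)]
    counts = [0] * n_bins        # number of samples per bin
    conf_sums = [0.0] * n_bins   # sum of predicted probabilities per bin
    label_sums = [0] * n_bins    # sum of labels (count of positives) per bin

    for label, p in zip(y_true, y_pred_prob):
        if not 0.0 <= p <= 1.0:
            continue  # out-of-range values fall in no bin
        # bisect_right picks the half-open bin [lo, hi); p == 1.0 is clamped
        # into the last bin, which is closed on the right.
        b = min(bisect_right(bin_boundaries, p) - 1, n_bins - 1)
        counts[b] += 1
        conf_sums[b] += p
        label_sums[b] += label

    ece = 0.0
    for count, conf_sum, label_sum in zip(counts, conf_sums, label_sums):
        if count:
            avg_confidence = conf_sum / count
            avg_accuracy = label_sum / count
            ece += (count / n) * abs(avg_accuracy - avg_confidence)

    return round(ece, 6)


result = expected_calibration_error_single_pass(
    [1, 0, 1, 1, 0], [0.9, 0.2, 0.8, 0.6, 0.3], n_bins=5
)
print(result)
```

For typical bin counts such as 10, the difference from the nested-loop version is small; the variant mainly helps when n or b is large.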