Calculate Expected Calibration Error (ECE)

#260 · Machine Learning · Medium

Problem

Calculate the Expected Calibration Error (ECE) for a classification model. ECE measures how well predicted probabilities match actual outcomes by binning predictions and comparing average confidence to average accuracy in each bin.
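
Written as a formula, the quantity looks roughly as follows. The symbols are notation assumed here, not taken from the problem statement: n is the number of samples, B the number of bins, S_b the set of samples whose predicted probability p_i falls in bin b, and y_i the label of sample i. The sum runs over non-empty bins only.

```latex
\[
\mathrm{ECE} = \sum_{b=1}^{B} \frac{|S_b|}{n}\,
  \bigl|\operatorname{acc}(S_b) - \operatorname{conf}(S_b)\bigr|,
\qquad
\operatorname{acc}(S_b) = \frac{1}{|S_b|}\sum_{i \in S_b} y_i,
\quad
\operatorname{conf}(S_b) = \frac{1}{|S_b|}\sum_{i \in S_b} p_i
\]
```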

Solution

Partition predictions into equal-width confidence bins, compute the absolute difference between accuracy and confidence in each bin, and average those differences, weighting each bin by its share of the samples.

def expected_calibration_error(
    y_true: list[int],
    y_pred_prob: list[float],
    n_bins: int = 10,
) -> float:
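    """Return the ECE of y_pred_prob against y_true.

    The range [0, 1] is split into n_bins equal-width bins; a probability
    outside [0, 1] matches no bin and contributes nothing to the sum.
    """
    n = len(y_true)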
    if n == 0:
        return 0.0

    bin_boundaries = [i / n_bins for i in range(n_bins + 1)]

    ece = 0.0
    for b in range(n_bins):
        lo = bin_boundaries[b]
        hi = bin_boundaries[b + 1]

        # Collect samples in this bin. Bins are half-open, [lo, hi), except
        # the last one, which is closed so that a probability of 1.0 counts.
        is_last = b == n_bins - 1
        indices = [
            i
            for i in range(n)
            if lo <= y_pred_prob[i] < hi or (is_last and y_pred_prob[i] == hi)
        ]

        if not indices:
            continue

        bin_size = len(indices)
        avg_confidence = sum(y_pred_prob[i] for i in indices) / bin_size
        avg_accuracy = sum(y_true[i] for i in indices) / bin_size

        ece += (bin_size / n) * abs(avg_accuracy - avg_confidence)

    return round(ece, 6)

Explanation

Divide the probability range [0, 1] into n_bins equal-width intervals.
Assign each prediction to its corresponding bin.
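For each non-empty bin, compute the average confidence (mean predicted probability) and the average accuracy, which the code takes as the mean of y_true in the bin (the fraction of positive labels when y_true holds binary 0/1 labels).
ECE = sum of |accuracy - confidence| over the non-empty bins, each term weighted by the bin's share of the samples (bin_size / n).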
A perfectly calibrated model has ECE = 0.
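
As a small worked example of the steps above, the sketch below lists the per-bin statistics for four samples and two bins, then combines them into the ECE. The helper name calibration_bins and the sample data are hypothetical, chosen only for illustration; the binning rule mirrors the solution's, with the last bin closed on the right.

```python
def calibration_bins(y_true, y_pred_prob, n_bins=10):
    """Return (lo, hi, count, avg_confidence, avg_accuracy) per non-empty bin.

    Hypothetical helper for illustration; not part of the original solution.
    """
    rows = []
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        is_last = b == n_bins - 1
        idx = [
            i
            for i, p in enumerate(y_pred_prob)
            if lo <= p < hi or (is_last and p == hi)
        ]
        if idx:
            conf = sum(y_pred_prob[i] for i in idx) / len(idx)
            acc = sum(y_true[i] for i in idx) / len(idx)
            rows.append((lo, hi, len(idx), conf, acc))
    return rows


# Made-up data: one low-probability sample, three high-probability samples.
y_true = [0, 1, 0, 1]
y_pred_prob = [0.25, 0.625, 0.75, 0.875]

rows = calibration_bins(y_true, y_pred_prob, n_bins=2)
# Bin [0.0, 0.5): 1 sample,  avg confidence 0.25, avg accuracy 0.0
# Bin [0.5, 1.0]: 3 samples, avg confidence 0.75, avg accuracy 2/3

n = len(y_true)
ece = sum((count / n) * abs(acc - conf) for _, _, count, conf, acc in rows)
# (1/4) * 0.25 + (3/4) * (0.75 - 2/3) is approximately 0.125
```

Each bin here contributes about 0.0625: the first because its lone prediction is off by 0.25, the second because it holds three quarters of the samples but has a much smaller gap.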

Complexity

Time: O(n * b) where n is the number of samples and b is the number of bins
Space: O(n) in the worst case for indices per bin
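
The O(n * b) time comes from rescanning every sample for each bin. One possible alternative, sketched below under the assumption that all probabilities lie in [0, 1], computes each sample's bin index directly and accumulates per-bin sums in a single pass, giving O(n + b) time and O(b) extra space. The function name is hypothetical. Because the index comes from int(p * n_bins) rather than boundary comparisons, a value sitting exactly on a bin edge may, through floating-point rounding, land in a different bin than in the solution above.

```python
def expected_calibration_error_single_pass(y_true, y_pred_prob, n_bins=10):
    """Single-pass ECE sketch; assumes every probability lies in [0, 1]."""
    n = len(y_true)
    if n == 0:
        return 0.0

    counts = [0] * n_bins        # samples per bin
    conf_sums = [0.0] * n_bins   # sum of predicted probabilities per bin
    label_sums = [0.0] * n_bins  # sum of labels per bin

    for label, p in zip(y_true, y_pred_prob):
        # Map p to its bin; min() folds p == 1.0 into the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        counts[b] += 1
        conf_sums[b] += p
        label_sums[b] += label

    ece = 0.0
    for b in range(n_bins):
        if counts[b]:
            # (count / n) * |label_sum / count - conf_sum / count|
            # simplifies to |label_sum - conf_sum| / n.
            ece += abs(label_sums[b] - conf_sums[b]) / n

    return round(ece, 6)


# Made-up data, two bins: both bins contribute 0.0625.
result = expected_calibration_error_single_pass(
    [0, 1, 0, 1], [0.25, 0.625, 0.75, 0.875], n_bins=2
)
```

Unlike the solution above, this sketch does not drop out-of-range probabilities: values above 1 fall into the last bin, and negative values are assigned by truncation toward zero, so inputs should be validated first if that matters.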
