Expected Calibration Error

Compute the Expected Calibration Error (ECE) of a binary classifier.

What is calibration?

A classifier is perfectly calibrated if, among all examples to which it assigns probability p, exactly a fraction p are positive. In other words, “confidence 0.7” should mean the true label is 1 about 70% of the time.
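As a small illustration of the definition above, the sketch below measures the observed positive rate among examples that received a given probability. The helper name positive_rate_at, its tol parameter, and the toy data are hypothetical and chosen only for this example.

```python
import numpy as np

def positive_rate_at(probs, labels, p, tol=1e-9):
    """Fraction of positives among examples whose probability equals p (within tol)."""
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    mask = np.abs(probs - p) <= tol
    if not mask.any():
        return float("nan")  # assumption: no matching examples means the rate is undefined
    return float(labels[mask].mean())

# Hypothetical toy data: ten examples that all received probability 0.7,
# seven of which are actually positive.
toy_probs = np.full(10, 0.7)
toy_labels = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0], dtype=np.float64)
rate = positive_rate_at(toy_probs, toy_labels, 0.7)
```

On this toy data the observed rate matches the assigned probability, so these predictions would count as calibrated at p = 0.7. If the same labels had all been assigned 0.9 instead, the observed rate would still be 0.7 and the model would be overconfident at that level.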

Deep neural networks are often overconfident: they output probabilities near 0 or 1 even when they are wrong. ECE summarizes the gap between predicted confidence and observed accuracy as a single number.

Reference

Guo et al. 2017, “On Calibration of Modern Neural Networks.”

Algorithm

Divide [0, 1] into num_bins equal-width bins, indexed b = 0, …, num_bins − 1:

bin b covers [b/num_bins, (b+1)/num_bins)
last bin (b = num_bins − 1) is closed on the right: includes 1.0
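The binning rule above can be sketched with NumPy as follows. This is one possible approach, not necessarily the reference implementation; the function name assign_bins is an assumption made for illustration.

```python
import numpy as np

def assign_bins(probs, num_bins=10):
    """Map each probability to a bin index in [0, num_bins - 1]."""
    probs = np.asarray(probs, dtype=np.float64)
    # floor(p * num_bins) is the index of the half-open bin
    # [b/num_bins, (b+1)/num_bins) that contains p.
    idx = np.floor(probs * num_bins).astype(np.int64)
    # p == 1.0 would map to index num_bins, which does not exist;
    # clipping folds it into the last bin, making that bin closed on the right.
    return np.clip(idx, 0, num_bins - 1)

bins = assign_bins([0.0, 0.05, 0.1, 0.55, 0.999, 1.0], num_bins=10)
```

Because of floating-point rounding, a probability that sits exactly on a bin edge may land on either side of it depending on whether the index is computed by multiplication, as here, or by comparing against precomputed edges. For typical model outputs this rarely matters, but it can explain tiny discrepancies between implementations.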

For each non-empty bin b, let mask_b select the examples whose probability falls in that bin:

pred    = 1 if prob > 0.5, else 0
correct = (pred == label)
conf_b  = mean(probs[mask_b])
acc_b   = mean(correct[mask_b])

Sum the weighted absolute gaps:

ECE = Σ_b  (|B_b| / N) × |conf_b − acc_b|

Here |B_b| is the number of examples in bin b and N is the total number of examples. Empty bins contribute 0.
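Putting the steps above together, a minimal NumPy sketch might look like the following. The function name expected_calibration_error and the handling of an empty input are assumptions; the binning, thresholding, and weighting follow the description in this section.

```python
import numpy as np

def expected_calibration_error(probs, labels, num_bins=10):
    """ECE of a binary classifier, following the binning scheme described above."""
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    n = probs.shape[0]
    if n == 0:
        return 0.0  # assumption: treat an empty input as perfectly calibrated

    # Hard predictions use a strict threshold: prob > 0.5 means class 1.
    preds = (probs > 0.5).astype(np.float64)
    correct = (preds == labels).astype(np.float64)

    # Equal-width bins; clipping places prob == 1.0 in the last bin.
    bin_idx = np.clip(np.floor(probs * num_bins).astype(np.int64), 0, num_bins - 1)

    ece = 0.0
    for b in range(num_bins):
        mask = bin_idx == b
        count = int(mask.sum())
        if count == 0:
            continue  # empty bins contribute 0
        conf_b = probs[mask].mean()   # mean predicted probability in the bin
        acc_b = correct[mask].mean()  # fraction of correct predictions in the bin
        ece += (count / n) * abs(conf_b - acc_b)
    return float(ece)

ece = expected_calibration_error([0.9, 0.8, 0.2, 0.6], [1.0, 1.0, 0.0, 0.0])
```

In this example each probability falls in its own bin, so the result is the average of the four gaps |0.9 − 1|, |0.8 − 1|, |0.2 − 1| and |0.6 − 0|, which is 0.425. Note that conf_b, as defined in this section, averages the raw positive-class probability, so a correct low-probability prediction such as 0.2 still produces a large gap.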

When to use ECE

Detecting overconfident models (common in neural networks).
Evaluating calibration after temperature scaling or Platt scaling.
Any application where probability estimates drive downstream decisions (medical diagnosis, fraud detection, risk scoring).

Inputs

probs: shape (N,) — binary classifier output probabilities (post-sigmoid), in [0, 1].
labels: shape (N,) — {0, 1} true labels (delivered as floats).
num_bins: int — number of equal-width bins over [0, 1] (default 10).

Output

Scalar ECE (float ≥ 0).


Hints