medium primitives

Expected Calibration Error

Compute the Expected Calibration Error (ECE) of a binary classifier.

What is calibration?

A classifier is perfectly calibrated if, among all examples to which it assigns probability p, exactly a fraction p are positive. In other words, a confidence of 0.7 should mean the true label is 1 about 70% of the time.

Deep neural networks are often overconfident: they output probabilities near 0 or 1 even when they are wrong. ECE quantifies the gap between a model's stated confidence and its observed accuracy.

Reference

Guo et al. 2017, “On Calibration of Modern Neural Networks.”

Algorithm

Divide [0, 1] into num_bins equal-width bins, indexed b = 0, …, num_bins − 1:

  • Bin b covers [b/num_bins, (b+1)/num_bins).
  • The last bin (b = num_bins − 1) is closed on the right, so it includes 1.0.
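The binning rule above can be sketched in a few lines of Python. This is only an illustration: the helper name bin_index is hypothetical, and flooring prob * num_bins is assumed to be an acceptable stand-in for comparing against the bin edges directly (the two can disagree by floating-point rounding for values that sit exactly on an edge).

```python
def bin_index(prob, num_bins=10):
    """Return the index of the equal-width bin that contains prob.

    Hypothetical helper: bins are [b/num_bins, (b+1)/num_bins), and the
    last bin is closed on the right so that prob == 1.0 is included.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must lie in [0, 1]")
    # int() floors for non-negative values; clamping sends 1.0 to the last bin.
    return min(int(prob * num_bins), num_bins - 1)
```

With the default of 10 bins, 0.05 falls in bin 0, 0.5 in bin 5, and both 0.95 and 1.0 in bin 9.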

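First threshold each probability to get a prediction and check it against the label:

pred    = 1 if prob > 0.5, else 0
correct = (pred == label)

Then, for each non-empty bin, with mask_b selecting the examples whose prob falls in bin b:

conf_b = mean(probs[mask_b])
acc_b  = mean(correct[mask_b])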

Sum the weighted absolute gaps, where |B_b| is the number of examples in bin b and N is the total number of examples:

ECE = Σ_b  (|B_b| / N) × |conf_b − acc_b|

Empty bins contribute 0.
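Putting the steps together, one possible pure-Python sketch is shown below. It follows the algorithm as written here, not any official solution. The function name expected_calibration_error and the choice to return 0.0 for empty input are assumptions made for illustration.

```python
def expected_calibration_error(probs, labels, num_bins=10):
    """Sketch of ECE for a binary classifier, following the steps above."""
    n = len(probs)
    if n == 0:
        return 0.0  # assumption: empty input has zero calibration error

    conf_sum = [0.0] * num_bins   # sum of probs per bin
    correct_sum = [0] * num_bins  # number of correct predictions per bin
    count = [0] * num_bins        # |B_b|, number of examples per bin

    for prob, label in zip(probs, labels):
        # Equal-width bins; clamping puts prob == 1.0 in the last bin.
        b = min(int(prob * num_bins), num_bins - 1)
        pred = 1 if prob > 0.5 else 0
        conf_sum[b] += prob
        correct_sum[b] += 1 if pred == int(label) else 0
        count[b] += 1

    ece = 0.0
    for b in range(num_bins):
        if count[b] == 0:
            continue  # empty bins contribute 0
        conf_b = conf_sum[b] / count[b]
        acc_b = correct_sum[b] / count[b]
        ece += (count[b] / n) * abs(conf_b - acc_b)
    return ece
```

As a worked example, four examples with prob 0.9 and labels 1, 1, 1, 0 all land in the last bin, giving conf_b = 0.9, acc_b = 0.75, and ECE = 1 × |0.9 − 0.75| = 0.15. Note that conf_b here is the mean of the raw positive-class probability, as specified above. Guo et al. define confidence as the probability of the predicted class, so a variant using max(p, 1 − p) would score confident negative predictions differently.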

When to use ECE

  • Detecting overconfident models (common in neural networks).
  • Evaluating calibration after temperature scaling or Platt scaling.
  • Any application where probability estimates drive downstream decisions (medical diagnosis, fraud detection, risk scoring).
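To make the temperature-scaling use case concrete, here is a minimal sketch of how a temperature might be applied in the binary case before ECE is re-measured. It assumes access to the pre-sigmoid logits (the problem itself only provides probabilities), and apply_temperature is a hypothetical helper name. In practice the temperature would be fitted on a held-out set, which this sketch does not do.

```python
import math

def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def apply_temperature(logits, temperature):
    """Divide each logit by the temperature, then apply the sigmoid.

    A temperature above 1 pulls probabilities toward 0.5 (less confident);
    a temperature below 1 pushes them toward 0 or 1 (more confident).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return [_sigmoid(z / temperature) for z in logits]
```

Comparing ECE on the original probabilities with ECE on the rescaled ones is one way to check whether the scaling step actually improved calibration.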

Inputs

  • probs: shape (N,) — binary classifier output probabilities (post-sigmoid), in [0, 1].
  • labels: shape (N,), true labels in {0, 1} (delivered as floats).
  • num_bins: int — number of equal-width bins over [0, 1] (default 10).

Output

Scalar ECE (float ≥ 0).
