Expected Calibration Error
Compute the Expected Calibration Error (ECE) of a binary classifier.
What is calibration?
A classifier is perfectly calibrated if, among all examples to which it
assigns probability p, exactly a fraction p are positive. In other
words, “confidence 0.7” should mean the true label is 1 roughly 70% of
the time.
Deep neural networks are often overconfident: they output probabilities near 0 or 1 even when they are wrong. ECE quantifies this gap.
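To make the definition concrete, here is a minimal sketch on made-up data: ten examples that all receive probability 0.7, of which seven are positive. The helper name positive_rate_at is hypothetical and not part of the problem statement.

```python
import numpy as np

# Toy data (invented for illustration): ten examples, all scored 0.7,
# seven of which are actually positive.
probs = np.full(10, 0.7)
labels = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0], dtype=float)


def positive_rate_at(probs, labels, p):
    """Fraction of positives among examples assigned probability p."""
    mask = np.isclose(probs, p)
    return float(labels[mask].mean())


rate = positive_rate_at(probs, labels, 0.7)
# On this toy set the observed positive rate matches the assigned
# probability, which is what perfect calibration requires at p = 0.7.
print(rate)
```

If only five of the ten examples were positive, the same score of 0.7 would be overconfident by 0.2.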
Reference
Guo et al. 2017, “On Calibration of Modern Neural Networks.”
Algorithm
Divide [0, 1] into num_bins equal-width bins, indexed b = 0 to num_bins − 1:
- bin b covers [b/num_bins, (b+1)/num_bins)
- the last bin (b = num_bins − 1) is closed on the right, so it includes 1.0
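One possible way to turn the binning rule above into code is to compute a bin index per example. This is a sketch, not the reference solution; the function name bin_indices is an assumption. It relies on floor(p × num_bins), which can differ from an explicit edge comparison when p sits within floating-point error of a bin boundary.

```python
import numpy as np


def bin_indices(probs, num_bins=10):
    """Map each probability in [0, 1] to a bin index in 0..num_bins-1."""
    probs = np.asarray(probs, dtype=float)
    # floor(p * num_bins) yields b for p in [b/num_bins, (b+1)/num_bins).
    idx = np.floor(probs * num_bins).astype(int)
    # p == 1.0 would map to index num_bins; clipping folds it into the
    # last bin, which makes that bin closed on the right.
    return np.clip(idx, 0, num_bins - 1)


print(bin_indices([0.0, 0.05, 0.5, 0.95, 1.0]))
```

A boolean mask for bin b is then simply bin_indices(probs, num_bins) == b.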
For each non-empty bin b, with mask_b selecting the examples whose probability falls in bin b:
- conf_b = mean(probs[mask_b])
- acc_b = mean(correct[mask_b]), where correct = (pred == label)
- pred = 1 if prob > 0.5, else 0
Sum the weighted absolute gaps:
ECE = Σ_b (|B_b| / N) × |conf_b − acc_b|
Empty bins contribute 0.
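The steps above can be sketched in NumPy as follows. This is a minimal illustration that follows the definitions in this section literally, not the hidden reference solution; the function name expected_calibration_error and the choice to return 0.0 for empty input are assumptions.

```python
import numpy as np


def expected_calibration_error(probs, labels, num_bins=10):
    """ECE with equal-width bins, following the steps described above."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n = probs.shape[0]
    if n == 0:
        return 0.0  # assumption: no examples means no calibration gap

    # pred = 1 if prob > 0.5, else 0; correct = (pred == label)
    preds = (probs > 0.5).astype(float)
    correct = (preds == labels).astype(float)

    ece = 0.0
    for b in range(num_bins):
        lo = b / num_bins
        hi = (b + 1) / num_bins
        if b == num_bins - 1:
            # The last bin is closed on the right so that 1.0 is included.
            mask = (probs >= lo) & (probs <= hi)
        else:
            mask = (probs >= lo) & (probs < hi)
        count = int(mask.sum())
        if count == 0:
            continue  # empty bins contribute 0
        conf_b = probs[mask].mean()
        acc_b = correct[mask].mean()
        ece += (count / n) * abs(conf_b - acc_b)
    return float(ece)


# Four examples in one bin: conf = 0.9, acc = 0.75, so the gap is 0.15.
example = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
print(example)
```

Because conf_b is the mean of the raw positive-class probability, a correct negative prediction (for example prob 0.2 with label 0) still contributes a gap under this definition.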
When to use ECE
- Detecting overconfident models (common in neural networks).
- Evaluating calibration after temperature scaling or Platt scaling.
- Any application where probability estimates drive downstream decisions (medical diagnosis, fraud detection, risk scoring).
Inputs
- probs: shape (N,), binary classifier output probabilities (post-sigmoid), in [0, 1].
- labels: shape (N,), true labels in {0, 1} (delivered as floats).
- num_bins: int, number of equal-width bins over [0, 1] (default 10).
Output
Scalar ECE (float ≥ 0).