Precision, Recall & F1
Compute per-class precision, recall, and F1, plus their macro-averages.
Definitions
For class i, define:
- True Positives (TP): examples whose true label and predicted label are both `i`, i.e. `M[i, i]` in the confusion matrix.
- False Positives (FP): examples predicted as `i` but with a different true label (the rest of column `i`).
- False Negatives (FN): examples whose true label is `i` but that were predicted as something else (the rest of row `i`).
Precision[i] = TP / (TP + FP) = M[i, i] / sum(M[:, i])
Recall[i] = TP / (TP + FN) = M[i, i] / sum(M[i, :])
F1[i] = 2 · P[i] · R[i] / (P[i] + R[i])
0 / 0 convention: if the denominator is zero, the metric is defined as
0.0 (avoids NaN).
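The definitions above can be sketched in plain Python. This is a minimal illustration rather than the reference solution: it assumes the confusion matrix is a nested list `M` with rows indexed by true label and columns by predicted label, and the helper names `safe_div` and `per_class_metrics` are hypothetical.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when den is zero (the 0 / 0 convention)."""
    return num / den if den != 0 else 0.0


def per_class_metrics(M):
    """Per-class precision, recall and F1 from a square confusion matrix.

    Assumption: M[i][j] counts examples with true label i and predicted label j.
    """
    C = len(M)
    precision, recall, f1 = [], [], []
    for i in range(C):
        tp = M[i][i]
        col_sum = sum(M[r][i] for r in range(C))  # TP + FP: everything predicted as i
        row_sum = sum(M[i])                       # TP + FN: everything truly i
        p = safe_div(tp, col_sum)
        r = safe_div(tp, row_sum)
        precision.append(p)
        recall.append(r)
        f1.append(safe_div(2 * p * r, p + r))
    return precision, recall, f1


# Toy 3-class matrix; class 2 never occurs and is never predicted,
# so all three of its metrics fall back to 0.0.
M = [[2, 1, 0],
     [0, 1, 0],
     [0, 0, 0]]
precision, recall, f1 = per_class_metrics(M)
```

Class 2 exercises the zero-denominator convention: without `safe_div`, each of its three divisions would raise `ZeroDivisionError` in plain Python (or yield NaN in most tensor libraries).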
Macro-averaging
Macro-averaging computes each metric independently per class and then takes a simple (unweighted) mean:
Macro-P = mean(P[0], …, P[C-1])
Macro-R = mean(R[0], …, R[C-1])
Macro-F1 = mean(F1[0], …, F1[C-1])
This treats every class equally regardless of its frequency. The alternative — micro-averaging — pools all TP / FP / FN counts before dividing, which weights by class size. Use macro when class imbalance should not dominate the reported metric.
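To see how the two averaging schemes diverge under class imbalance, here is a small sketch for precision only. The TP / FP counts are invented for illustration, and the function names `macro_precision` and `micro_precision` are hypothetical.

```python
def macro_precision(tp, fp):
    """Unweighted mean of per-class precision (0.0 when a class has no predictions)."""
    per_class = [t / (t + f) if (t + f) else 0.0 for t, f in zip(tp, fp)]
    return sum(per_class) / len(per_class)


def micro_precision(tp, fp):
    """Pool TP and FP over all classes before dividing."""
    total = sum(tp) + sum(fp)
    return sum(tp) / total if total else 0.0


# Invented counts: class 0 is frequent and well predicted,
# class 1 is rare and poorly predicted.
tp = [90, 1]
fp = [10, 9]

macro = macro_precision(tp, fp)  # mean of 0.9 and 0.1
micro = micro_precision(tp, fp)  # 91 / 110
```

Here the macro average is pulled down to about 0.5 by the rare class, while the micro average stays near 0.83 because the frequent class contributes most of the pooled counts.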
When to use precision vs recall vs F1
- Precision matters when false positives are costly (e.g. spam filter: you do not want legitimate email flagged).
- Recall matters when false negatives are costly (e.g. cancer screening: you do not want a positive case missed).
- F1 is the harmonic mean of the two — useful when you want a single number that balances both concerns.
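A quick numeric sketch shows why the harmonic mean is used rather than the arithmetic mean: it stays low unless both precision and recall are high. The values below are invented, and `f1_score` is a hypothetical helper that follows the 0 / 0 convention stated earlier.

```python
def f1_score(p, r):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# A model with perfect precision but very poor recall.
p, r = 1.0, 0.1
harmonic = f1_score(p, r)   # 0.2 / 1.1
arithmetic = (p + r) / 2    # what a plain average would report
```

The arithmetic mean reports 0.55 for this model, while F1 is roughly 0.18, much closer to the weaker of the two values.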
Inputs
- `predictions`: shape `(N,)`, predicted class indices (delivered as floats by the test harness).
- `labels`: shape `(N,)`, true class indices (delivered as floats).
- `num_classes`: integer `C`.
Output
Tensor of shape (3, C+1):
| Row | Contents |
|---|---|
| 0 | `P[0], …, P[C-1], macro-P` |
| 1 | `R[0], …, R[C-1], macro-R` |
| 2 | `F1[0], …, F1[C-1], macro-F1` |
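Putting the pieces together, one possible end-to-end sketch follows. It is not the hidden reference solution: it uses plain Python lists in place of tensors, returns a nested list with 3 rows and `C + 1` columns, assumes `num_classes >= 1`, and the function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(predictions, labels, num_classes):
    """Return [[P..., macro-P], [R..., macro-R], [F1..., macro-F1]].

    Assumptions: inputs are equal-length sequences of class indices stored
    as floats, and num_classes >= 1.
    """
    C = num_classes

    # Confusion matrix: rows are true labels, columns are predicted labels.
    M = [[0] * C for _ in range(C)]
    for pred, true in zip(predictions, labels):
        M[int(true)][int(pred)] += 1  # cast the float indices back to int

    def safe_div(num, den):
        # 0 / 0 convention: a zero denominator yields 0.0 instead of NaN.
        return num / den if den != 0 else 0.0

    P, R, F = [], [], []
    for i in range(C):
        tp = M[i][i]
        p = safe_div(tp, sum(M[r][i] for r in range(C)))  # TP / column sum
        r = safe_div(tp, sum(M[i]))                       # TP / row sum
        P.append(p)
        R.append(r)
        F.append(safe_div(2 * p * r, p + r))

    # Append the unweighted (macro) mean as the last column of each row.
    return [row + [sum(row) / C] for row in (P, R, F)]


# Four examples, three classes; class 2 never appears.
predictions = [0.0, 0.0, 1.0, 1.0]
labels = [0.0, 1.0, 1.0, 1.0]
result = precision_recall_f1(predictions, labels, 3)
```

Because class 2 is absent, its zeros still count toward the macro averages, which is the "every class equally" behavior described above. A tensor-based version would follow the same steps with vectorized row and column sums.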