AUC-ROC

Compute the Area Under the ROC Curve (AUC-ROC) for a binary classifier.

What is the ROC curve?

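The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (TPR / Recall) on the y-axis against the False Positive Rate (FPR) on the x-axis as you sweep the classification threshold from above the highest score down to below the lowest (from 1 down to 0 when the scores are probabilities):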

TPR = TP / (TP + FN)   (fraction of positives correctly retrieved)
FPR = FP / (FP + TN)   (fraction of negatives incorrectly flagged)
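
As a rough illustration of the threshold sweep, the sketch below lowers the threshold one distinct score at a time and records an (FPR, TPR) point after each step. The function name roc_points is hypothetical, and raising ValueError when a class is missing is an assumption made for this sketch only.

```python
def roc_points(scores, labels):
    """Return ROC points as (FPR, TPR) pairs, from the highest threshold to the lowest."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        # Assumption for this sketch: refuse to build a curve with a missing class.
        raise ValueError("need at least one positive and one negative example")

    # Sort by descending score so that lowering the threshold admits examples in order.
    ordered = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    tp = fp = 0
    i = 0
    while i < len(ordered):
        threshold = ordered[i][0]
        # Admit every example tied at this score before recording a point.
        while i < len(ordered) and ordered[i][0] == threshold:
            if ordered[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))  # (FPR, TPR) at this threshold
    return points
```

With scores [0.9, 0.8, 0.7, 0.6] and labels [1, 0, 1, 0], the curve steps up, right, up, right from (0, 0) to (1, 1).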

A random classifier follows the diagonal (AUC = 0.5). A perfect classifier hugs the top-left corner (AUC = 1.0). A classifier whose ranking is systematically inverted falls below the diagonal (AUC < 0.5).

AUC interpretation

AUC equals the probability that a randomly drawn positive example scores higher than a randomly drawn negative example, with ties counted as one half:

AUC = P(score(pos) > score(neg)) + 0.5 × P(score(pos) = score(neg))

This makes it threshold-independent: you do not need to pick a decision boundary before computing it. It is also insensitive to the class ratio, because TPR and FPR are each computed within a single class.

Rank-based formula (equivalent to trapezoidal AUC)

Instead of integrating the curve, count concordant pairs directly:

AUC = [Σ_{i∈pos, j∈neg} (1 if score_i > score_j, else 0.5 if equal, else 0)]
      / (n_pos × n_neg)
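
A direct transcription of this formula might look like the sketch below. The function name auc_pairwise is hypothetical; the loop costs O(n_pos × n_neg), so it is intended as a readable reference rather than an efficient implementation. It returns 0.0 when a class is empty, following the convention stated under "Edge case".

```python
def auc_pairwise(scores, labels):
    """AUC by counting concordant (positive, negative) pairs; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        return 0.0  # convention from the "Edge case" section

    total = 0.0
    for s_pos in pos:
        for s_neg in neg:
            if s_pos > s_neg:
                total += 1.0   # concordant pair
            elif s_pos == s_neg:
                total += 0.5   # tied pair
            # discordant pairs contribute 0
    return total / (len(pos) * len(neg))
```

For scores [0.9, 0.8, 0.7, 0.6] with labels [1, 0, 1, 0], three of the four positive-negative pairs are concordant, so the result is 0.75.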

This is identical to the trapezoidal rule applied to the ROC curve and is the same quantity computed by sklearn.metrics.roc_auc_score.
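
To see why the two views agree, here is a minimal sketch of the trapezoidal computation, assuming tied scores are admitted together as one threshold step. The name auc_trapezoid is hypothetical. Areas are accumulated in raw counts and normalized once at the end, which makes the link to pair counting visible: each new false positive adds the number of positives ranked above it, plus half the positives tied with it.

```python
def auc_trapezoid(scores, labels):
    """AUC by trapezoidal integration of the ROC curve, one step per distinct score."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.0  # convention from the "Edge case" section

    ordered = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    area = 0.0  # area in units of (FP count) x (TP count)
    tp = fp = 0
    i = 0
    while i < len(ordered):
        prev_tp, prev_fp = tp, fp
        threshold = ordered[i][0]
        # Admit all examples tied at this score in one step.
        while i < len(ordered) and ordered[i][0] == threshold:
            if ordered[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # Trapezoid: width = newly admitted FPs, height = mean of old and new TP.
        area += (fp - prev_fp) * (tp + prev_tp) / 2.0
    return area / (n_pos * n_neg)
```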

Edge case

If there are no positive examples (n_pos == 0) or no negative examples (n_neg == 0), TPR or FPR has a zero denominator and AUC is undefined. Return 0.0 in that case.

When to use AUC-ROC

Imbalanced binary classification (spam, fraud, medical diagnosis).
Comparing models without committing to a threshold.
When both TPR and FPR matter equally across all operating points.

Inputs

scores: shape (N,) — binary classifier scores (higher = more positive).
labels: shape (N,) — {0, 1} true labels (delivered as floats).

Output

Scalar AUC in [0, 1].
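
One possible way to meet this input/output contract in O(N log N) is the rank-sum (Mann-Whitney U) form of the pair count, sketched below. This is a swapped-in technique that the text above does not spell out: it assigns each example its average rank among all scores (midranks for ties), sums the ranks of the positives, and subtracts the smallest sum the positives could have. The function name auc_roc is hypothetical, and the sketch uses plain Python sequences rather than any particular array library.

```python
def auc_roc(scores, labels):
    """AUC via midranks: (rank sum of positives - n_pos*(n_pos+1)/2) / (n_pos*n_neg)."""
    n = len(scores)
    n_pos = sum(1 for y in labels if y == 1)  # labels may arrive as 0.0 / 1.0 floats
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.0  # convention from the "Edge case" section

    order = sorted(range(n), key=lambda idx: scores[idx])  # ascending by score
    pos_rank_sum = 0.0
    i = 0
    while i < n:
        # Find the run order[i:j] of examples sharing the same score.
        j = i
        while j < n and scores[order[j]] == scores[order[i]]:
            j += 1
        # The run occupies 1-based ranks i+1 .. j; every member gets their mean.
        midrank = (i + 1 + j) / 2.0
        positives_in_run = sum(1 for k in range(i, j) if labels[order[k]] == 1)
        pos_rank_sum += midrank * positives_in_run
        i = j

    # Subtract the minimum possible rank sum for n_pos items to get the pair count U.
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

For example, auc_roc([0.9, 0.8, 0.7, 0.6], [1.0, 0.0, 1.0, 0.0]) gives 0.75, matching the three concordant pairs out of four.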

Hints