Train Binary Classifier End-to-End

Train a binary linear classifier end-to-end using sigmoid + binary cross-entropy (BCE) loss and full-batch gradient descent.

The model

Given a feature matrix x of shape (N, d), binary labels y of shape (N,) with values in {0, 1}, and a weight vector w of shape (d,), the model computes the probability of the positive class for each row x_i of x:

$$p_i = \sigma(x_i^\top w) = \frac{1}{1 + e^{-x_i^\top w}}$$
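The forward pass above can be sketched in NumPy as follows. This is a minimal illustration, not a required implementation: the split on the sign of z is one common way to avoid overflow in the exponential, and the toy values of x and w are made up for the example.

```python
import numpy as np

def sigmoid(z):
    """Numerically stable logistic function for an array of logits."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) <= 1, so the direct formula cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use exp(z) / (1 + exp(z)) so the exponent stays negative.
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

# Toy data (hypothetical): N = 3 samples, d = 2 features.
x = np.array([[1.0, 2.0], [0.0, 0.0], [-1.0, 3.0]])
w = np.array([0.5, -0.25])

p = sigmoid(x @ w)  # shape (N,), one probability per row of x
```

For moderate logits the plain expression 1 / (1 + exp(-z)) gives the same values; the stable form only matters when |z| is large.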

The gradient of the mean BCE loss with respect to w has a simple closed form, where X denotes the feature matrix x and p the vector of probabilities p_i:

$$\nabla_w \mathcal{L} = \frac{1}{N} X^\top (p - y)$$

This is the residual (p - y) projected back through the feature matrix and averaged over the N samples, which takes one line of code.
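For reference, the closed form follows from the chain rule. The sketch below assumes the usual definition of mean BCE, which the text names but does not write out, and introduces z_i = x_i^\top w for the logit:

$$\mathcal{L}(w) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]$$

$$\frac{\partial \mathcal{L}}{\partial p_i} = \frac{1}{N} \cdot \frac{p_i - y_i}{p_i (1 - p_i)}, \qquad \frac{\partial p_i}{\partial z_i} = p_i (1 - p_i), \qquad \frac{\partial z_i}{\partial w} = x_i$$

$$\nabla_w \mathcal{L} = \sum_{i=1}^{N} \frac{\partial \mathcal{L}}{\partial p_i} \frac{\partial p_i}{\partial z_i} \frac{\partial z_i}{\partial w} = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)\, x_i = \frac{1}{N} X^\top (p - y)$$

The factor p_i (1 - p_i) from the sigmoid derivative cancels the denominator of the loss derivative, which is why no sigmoid derivative appears in the final expression.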

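N = x.shape[0]                       # number of samples
w = w0                               # start from the supplied weights
for epoch in range(n_epochs):
    z    = x @ w                     # logits, shape (N,)
    p    = sigmoid(z)                # predicted probabilities, shape (N,)
    grad = (1/N) * x.T @ (p - y)     # gradient of the mean BCE loss, shape (d,)
    w    = w - lr * grad             # full-batch gradient descent step
return w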
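One way to gain confidence in the grad line of the loop is to compare it against central finite differences of the loss. The sketch below is illustrative only: the helper names bce_loss, analytic_grad and numeric_grad are hypothetical, the loss is assumed to be the standard mean BCE, and the random data, seed and step size eps are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, x, y):
    """Mean binary cross-entropy of the linear model at weights w."""
    p = sigmoid(x @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_grad(w, x, y):
    """Closed-form gradient (1/N) * x.T @ (p - y)."""
    N = x.shape[0]
    return x.T @ (sigmoid(x @ w) - y) / N

def numeric_grad(w, x, y, eps=1e-6):
    """Central finite differences, one coordinate of w at a time."""
    g = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        g[j] = (bce_loss(w + step, x, y) - bce_loss(w - step, x, y)) / (2 * eps)
    return g

# Small synthetic problem; the seed and sizes are arbitrary.
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3))
y = (rng.random(20) < 0.5).astype(float)
w = rng.normal(size=3)

max_diff = np.max(np.abs(analytic_grad(w, x, y) - numeric_grad(w, x, y)))
print(f"max |analytic - numeric| = {max_diff:.2e}")
```

If the two gradients disagree by much more than rounding error, the usual suspects are a missing 1/N factor or a sign flip in (p - y).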

Note: w0 is passed explicitly as an argument so tests are deterministic — no random initialisation inside the function.

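Inputs: feature matrix x of shape (N, d), binary labels y of shape (N,), initial weights w0 of shape (d,), learning rate lr, and number of epochs n_epochs.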

Output: final weight vector w of shape (d,).

Edge cases: lr=0 or n_epochs=0 → return w0 unchanged.
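Putting the pieces together, a minimal runnable sketch of the whole routine might look like this. The function name train_binary_classifier and the argument order are assumptions, since the specification does not fix them, and the toy dataset is made up for illustration.

```python
import numpy as np

def sigmoid(z):
    # Plain logistic function; adequate for moderate |z|.
    return 1.0 / (1.0 + np.exp(-z))

def train_binary_classifier(x, y, w0, lr, n_epochs):
    """Full-batch gradient descent on mean BCE (hypothetical name and signature).

    Returns the final weight vector of shape (d,).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.array(w0, dtype=float)  # copy, so the caller's w0 is never mutated
    N = x.shape[0]
    for _ in range(n_epochs):
        p = sigmoid(x @ w)          # predicted probabilities, shape (N,)
        grad = x.T @ (p - y) / N    # gradient of the mean BCE loss, shape (d,)
        w = w - lr * grad           # gradient descent step
    return w

# Toy, linearly separable data (hypothetical): the label follows the sign
# of the first feature.
x = np.array([[1.0, 0.0], [2.0, 1.0], [-1.0, 0.5], [-2.0, -1.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])
w0 = np.zeros(2)

w = train_binary_classifier(x, y, w0, lr=0.5, n_epochs=200)
preds = (sigmoid(x @ w) >= 0.5).astype(float)

# Edge cases: neither call changes the weights.
w_no_epochs = train_binary_classifier(x, y, w0, lr=0.5, n_epochs=0)
w_zero_lr = train_binary_classifier(x, y, w0, lr=0.0, n_epochs=10)
```

Both edge cases fall out of the loop without special handling: with n_epochs=0 the loop body never runs, and with lr=0 every update subtracts zero. Copying w0 at the start is a design choice that keeps the caller's array intact.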