Logistic Regression From Scratch

Implement logistic regression trained by gradient descent from scratch.

Logistic regression is one of the simplest linear models for binary classification. It outputs the probability of the positive class, $p_i = P(y_i = 1 \mid x_i)$, as a sigmoid-transformed linear combination of the features:

$$p_i = \sigma(x_i^\top w) = \frac{1}{1 + e^{-x_i^\top w}}$$
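As a rough illustration of the sigmoid above, here is a minimal NumPy sketch. The function name `sigmoid` and the two-branch evaluation are assumptions made for this example, not part of the task statement; the plain form `1 / (1 + np.exp(-z))` computes the same values but can emit overflow warnings for large negative `z`.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function 1 / (1 + exp(-z)) for a 1-D or 2-D array z."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) <= 1, so the textbook form cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use the equivalent form exp(z) / (1 + exp(z)),
    # which avoids taking exp of a large positive number.
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

# Hypothetical scores x_i^T w for three examples.
scores = np.array([-2.0, 0.0, 2.0])
probs = sigmoid(scores)
```

Note that $\sigma(0) = 0.5$ and $\sigma(-z) = 1 - \sigma(z)$, so the first and last entries of `probs` sum to 1.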

The model is trained by minimising the binary cross-entropy (BCE) loss:

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} \left[y_i \log p_i + (1 - y_i) \log(1 - p_i)\right]$$
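The loss above could be computed along the following lines. This is a sketch only: the name `bce_loss` and the clipping constant `eps` are assumptions introduced here (clipping guards against `log(0)` and is not part of the formula itself).

```python
import numpy as np

def bce_loss(p, y, eps=1e-12):
    """Mean binary cross-entropy between probabilities p and labels y in {0, 1}."""
    # Assumption: clip probabilities away from 0 and 1 so the logs stay finite.
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# With every prediction at 0.5 (as happens when w = 0), the loss is log(2).
loss_at_zero = bce_loss([0.5, 0.5, 0.5], [0, 1, 1])
```

A loss of $\log 2 \approx 0.693$ is therefore a useful baseline: a trained model should end up below it.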

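The gradient of this loss with respect to $w$ has a clean closed form. With $X$ the $N \times d$ matrix whose rows are the $x_i^\top$, and $p$ and $y$ the length-$N$ vectors of predictions and labels: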

$$\nabla_w \mathcal{L} = \frac{1}{N} X^\top (p - y)$$
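One way to sanity-check the closed-form gradient is to compare it against a central finite-difference approximation of the loss. The sketch below does this on small random data; the helper names (`loss`, `grad`, `numeric_grad`), the step size `h`, and the random test data are all assumptions for illustration.

```python
import numpy as np

def loss(w, X, y):
    """Binary cross-entropy of the model sigma(Xw) on (X, y)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad(w, X, y):
    """Closed-form gradient X^T (p - y) / N."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(y)

def numeric_grad(w, X, y, h=1e-6):
    """Central finite-difference approximation, one coordinate at a time."""
    g = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = h
        g[j] = (loss(w + e, X, y) - loss(w - e, X, y)) / (2 * h)
    return g

# Small random problem (made-up data, fixed seed for reproducibility).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) < 0.5).astype(float)
w = rng.normal(size=3)

max_abs_diff = np.max(np.abs(grad(w, X, y) - numeric_grad(w, X, y)))
```

If the formula is implemented correctly, `max_abs_diff` should be tiny compared with the gradient entries themselves.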

Each gradient descent step is then:

$$w \leftarrow w - \eta \cdot \nabla_w \mathcal{L}$$

where $\eta$ is the learning rate (lr).

No bias term is fit separately. If you want a bias, augment x by prepending a column of ones — the first weight will act as the intercept.
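For instance, the augmentation could be done with NumPy as sketched below. The array values and the name `X_aug` are made up for illustration.

```python
import numpy as np

# Three examples with two features each (illustrative values).
X = np.array([[2.0, 3.0],
              [4.0, 5.0],
              [6.0, 7.0]])

# Prepend a column of ones so the first weight plays the role of the intercept.
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
```

After this step `X_aug` has shape `(3, 3)`, and the fitted weight vector has one extra leading entry for the bias.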

Algorithm:

Initialise $w = \mathbf{0}_d$
For n_steps iterations:
- $p = \sigma(Xw)$
- $\text{grad} = X^\top (p - y) / N$
- $w \leftarrow w - \text{lr} \cdot \text{grad}$
Return final $w$

Inputs:

x: feature matrix of shape (N, d) — augment with a column of 1s yourself for a bias term
y: binary labels of shape (N,), with values in $\{0, 1\}$
lr: float — learning rate (step size)
n_steps: int — number of gradient steps

Output: weight vector w of shape (d,).
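Putting the algorithm and the input/output description together, one possible implementation is sketched below. The function name `logistic_regression` is hypothetical, and the sketch uses the plain sigmoid form for brevity, which may emit overflow warnings when entries of `Xw` are very negative; treat it as a starting point rather than a reference solution.

```python
import numpy as np

def logistic_regression(x, y, lr, n_steps):
    """Fit weights by batch gradient descent on the BCE loss.

    x: (N, d) feature matrix, y: (N,) labels in {0, 1},
    lr: learning rate, n_steps: number of gradient steps.
    Returns the weight vector w of shape (d,).
    """
    X = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    N, d = X.shape
    w = np.zeros(d)                          # initialise w = 0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # p = sigma(Xw)
        grad = X.T @ (p - y) / N             # gradient of the BCE loss
        w = w - lr * grad                    # gradient descent step
    return w

# Toy usage: one feature plus a prepended column of ones for the bias.
X_toy = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y_toy = np.array([0, 0, 1, 1])
w_toy = logistic_regression(X_toy, y_toy, lr=0.5, n_steps=200)
preds = (X_toy @ w_toy > 0).astype(int)      # threshold p at 0.5
```

On this linearly separable toy set the thresholded predictions should match `y_toy`. Because the data are separable, the weights keep growing as `n_steps` increases, so the number of steps effectively acts as a regulariser here.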

Hints