Ridge Regression

Implement ridge regression — L2-regularized least squares — in closed form.

Ridge regression minimises the regularised objective:

$$\hat{w} = \arg\min_w \|y - Xw\|^2 + \lambda \|w\|^2$$

The closed-form solution adds a scaled identity to $X^\top X$ before inverting:

$$\hat{w} = (X^\top X + \lambda I)^{-1} X^\top y$$
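
As a short derivation sketch of where this formula comes from (writing $J(w)$ for the objective, a name introduced here for convenience and not used in the original statement), set the gradient of the objective to zero:

$$J(w) = \|y - Xw\|^2 + \lambda \|w\|^2$$

$$\nabla_w J(w) = -2 X^\top (y - Xw) + 2 \lambda w = 0$$

$$(X^\top X + \lambda I)\, w = X^\top y$$

Solving this linear system for $w$ gives the closed-form expression above.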

Why ridge?

Prevents overfitting by penalising large weights (bias toward zero).
Fixes the singular-matrix problem when $N < d$ (underdetermined) or when columns of $X$ are linearly dependent or highly correlated: $X^\top X$ is then singular or ill-conditioned, but for any $\lambda > 0$ the matrix $X^\top X + \lambda I$ is positive-definite, so the system has a unique solution.
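
The second point can be checked numerically. The following is a minimal sketch, assuming PyTorch and a small random matrix with fewer samples than features; the sizes, the seed, and names such as gram, rank_gram, and min_eig_A are illustrative choices, not part of the exercise.

```python
import torch

torch.manual_seed(0)

N, d = 3, 5  # fewer samples than features (underdetermined)
X = torch.randn(N, d, dtype=torch.float64)

# X^T X is d x d but has rank at most N, so it is singular when N < d.
gram = X.T @ X

# Adding lam * I shifts every eigenvalue up by lam.
lam = 0.1
A = gram + lam * torch.eye(d, dtype=torch.float64)

rank_gram = torch.linalg.matrix_rank(gram).item()
min_eig_A = torch.linalg.eigvalsh(A).min().item()

print("rank of X^T X:", rank_gram, "out of", d)
print("smallest eigenvalue of X^T X + lam*I:", min_eig_A)
```

Because $X^\top X$ is positive semi-definite with at least $d - N$ zero eigenvalues here, the smallest eigenvalue of the shifted matrix should sit at roughly lam, up to floating-point error.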

Algorithm:

Let $d$ = number of features (x.shape[1]).
Build the $d \times d$ matrix $A = X^\top X + \lambda I$.
Build the $d$-dim vector $b = X^\top y$.
Solve $A w = b$ for $w$ using torch.linalg.solve (do not explicitly invert $A$; solve is more numerically stable and faster).
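
The steps above can be sketched in PyTorch as follows. This is one possible implementation under stated assumptions, not a reference solution: the function name ridge_regression and the synthetic data in the usage part are hypothetical, and the identity matrix is created with the same dtype and device as x so the addition does not mix types.

```python
import torch


def ridge_regression(x: torch.Tensor, y: torch.Tensor, lam: float) -> torch.Tensor:
    """Closed-form ridge regression (hypothetical helper name).

    x: (N, d) feature matrix, y: (N,) targets, lam: regularisation strength.
    Returns the weight vector w of shape (d,).
    """
    d = x.shape[1]  # number of features
    # A = X^T X + lam * I, shape (d, d)
    A = x.T @ x + lam * torch.eye(d, dtype=x.dtype, device=x.device)
    # b = X^T y, shape (d,)
    b = x.T @ y
    # Solve A w = b without forming the inverse of A.
    return torch.linalg.solve(A, b)


# Usage on noise-free synthetic data (illustrative values only).
torch.manual_seed(0)
x = torch.randn(50, 3, dtype=torch.float64)
w_true = torch.tensor([1.0, -2.0, 0.5], dtype=torch.float64)
y = x @ w_true

w_hat = ridge_regression(x, y, lam=1e-8)
print(w_hat.shape)
```

With a very small lam and noise-free targets, the recovered weights should be close to w_true; larger values of lam shrink them toward zero.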

Numerical note: use torch.linalg.solve(A, b) / jnp.linalg.solve(A, b), not torch.linalg.inv(A) @ b. Both are mathematically equivalent but solve avoids forming the explicit inverse.
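
To illustrate the note, here is a small comparison sketch, assuming PyTorch and a well-conditioned random problem; the data, the seed, and the variable names are made up for the example. On a problem like this the two routes should agree to within floating-point error, and the practical difference is that solve never materialises the inverse of A.

```python
import torch

torch.manual_seed(0)

X = torch.randn(20, 4, dtype=torch.float64)
y = torch.randn(20, dtype=torch.float64)
lam = 0.5

A = X.T @ X + lam * torch.eye(4, dtype=torch.float64)
b = X.T @ y

# Preferred route: solve the linear system directly.
w_solve = torch.linalg.solve(A, b)

# Shown only for comparison: form the explicit inverse, then multiply.
w_inv = torch.linalg.inv(A) @ b

max_diff = (w_solve - w_inv).abs().max().item()
print("largest absolute difference:", max_diff)
```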

Inputs:

x: feature matrix of shape (N, d)
y: target vector of shape (N,)
lam: float, regularisation strength ($\lambda$)

Output: weight vector w of shape (d,).

Hints