Implement mini-batch SGD for linear regression.
Split the data into mini-batches of size batch_size and run one epoch
(one pass through all batches). For each mini-batch, compute gradients and
update weights.
If N is not evenly divisible by batch_size, the last batch may be smaller.
Use the same MSE gradient formulas as standard SGD, averaged over the mini-batch:

$$\frac{\partial L}{\partial w} = \frac{2}{B} \sum_{i \in \text{batch}} (w x_i + b - y_i)\, x_i, \qquad \frac{\partial L}{\partial b} = \frac{2}{B} \sum_{i \in \text{batch}} (w x_i + b - y_i),$$

where $B$ is the current batch size.
Input:
- x: shape (N, 1)
- y: shape (N, 1)
- w_init: shape (1, 1)
- b_init: shape (1,)
- lr: learning rate
- batch_size: mini-batch size

Output:
A dict with the final "w", "b", and "final_loss" after one epoch.
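
A minimal sketch of one possible solution, assuming NumPy arrays with the shapes listed above and the MSE gradients averaged over each mini-batch (the function name `minibatch_sgd` is illustrative, not part of the spec):

```python
import numpy as np

def minibatch_sgd(x, y, w_init, b_init, lr, batch_size):
    # Copy so the caller's initial parameters are not mutated in place.
    w, b = w_init.copy(), b_init.copy()
    N = x.shape[0]

    # One epoch: walk through the data in consecutive mini-batches.
    # The final slice may be shorter when N % batch_size != 0.
    for start in range(0, N, batch_size):
        xb = x[start:start + batch_size]      # (B, 1)
        yb = y[start:start + batch_size]      # (B, 1)
        B = xb.shape[0]                       # current batch size

        err = xb @ w + b - yb                 # residuals, shape (B, 1)
        grad_w = (2.0 / B) * (xb.T @ err)     # shape (1, 1)
        grad_b = (2.0 / B) * err.sum(axis=0)  # shape (1,)

        w -= lr * grad_w
        b -= lr * grad_b

    # Report the MSE over the full dataset after the epoch.
    final_loss = float(np.mean((x @ w + b - y) ** 2))
    return {"w": w, "b": b, "final_loss": final_loss}
```

Note the accumulation order: gradients are computed and applied per batch, so later batches see the parameters already updated by earlier ones, which is what distinguishes this from full-batch gradient descent.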