Implement mini-batch SGD for linear regression.
Split the data into mini-batches of size batch_size and run one epoch
(one pass through all batches). For each mini-batch, compute gradients and
update weights.
If N is not evenly divisible by batch_size, the last batch may be smaller.
Use the same MSE gradient formulas as standard SGD, averaged over the mini-batch:

$$\frac{\partial L}{\partial w} = \frac{2}{B} \sum_{i \in \text{batch}} (w x_i + b - y_i)\, x_i, \qquad \frac{\partial L}{\partial b} = \frac{2}{B} \sum_{i \in \text{batch}} (w x_i + b - y_i),$$

where $B$ is the current batch size.
Input:
- x: shape (N, 1)
- y: shape (N, 1)
- w_init: shape (1, 1)
- b_init: shape (1,)
- lr: learning rate
- batch_size: mini-batch size

Output:
A dict with the final "w", "b", and "final_loss" after one epoch.
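
A minimal sketch of one possible solution, assuming NumPy arrays with the shapes listed above and the MSE gradients averaged over each mini-batch (the function name `minibatch_sgd` is illustrative, not part of the spec):

```python
import numpy as np

def minibatch_sgd(x, y, w_init, b_init, lr, batch_size):
    # Copy so the caller's initial parameters are not mutated in place.
    w, b = w_init.copy(), b_init.copy()
    N = x.shape[0]

    # One epoch: walk through the data in consecutive mini-batches.
    # The final slice may be shorter when N % batch_size != 0.
    for start in range(0, N, batch_size):
        xb = x[start:start + batch_size]      # (B, 1)
        yb = y[start:start + batch_size]      # (B, 1)
        B = xb.shape[0]                       # current batch size

        err = xb @ w + b - yb                 # residuals, shape (B, 1)
        grad_w = (2.0 / B) * (xb.T @ err)     # shape (1, 1)
        grad_b = (2.0 / B) * err.sum(axis=0)  # shape (1,)

        w -= lr * grad_w
        b -= lr * grad_b

    # Report the MSE over the full dataset after the epoch.
    final_loss = float(np.mean((x @ w + b - y) ** 2))
    return {"w": w, "b": b, "final_loss": final_loss}
```

Note the accumulation order: gradients are computed and applied per batch, so later batches see the parameters already updated by earlier ones, which is what distinguishes this from full-batch gradient descent.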