Implement a simple SGD training loop for linear regression.
Given initial weight w and bias b, perform n_steps of full-batch gradient descent on the mean squared error (MSE) loss.
For each step:
1. Compute predictions: y_pred = x @ w + b.
2. Compute the loss: mean((y_pred - y)^2).
3. Compute the gradients of the loss with respect to w and b.
4. Update the parameters: w -= lr * grad_w, b -= lr * grad_b.
Input:
- x: shape (N, 1)
- y: shape (N, 1)
- w_init: shape (1, 1)
- b_init: shape (1,)
- lr: learning rate (float)
- n_steps: number of steps (int)
Output: A dict with final "w" (shape (1, 1)), "b" (shape (1,)), and "final_loss" (scalar).
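A minimal NumPy sketch of the loop described above. The function name `train_linear_regression` is an assumption (the spec does not name the function); it follows the input/output contract given: full-batch gradient descent on the MSE loss, returning the final parameters and loss in a dict.

```python
import numpy as np

def train_linear_regression(x, y, w_init, b_init, lr, n_steps):
    """Full-batch gradient descent for linear regression (name assumed).

    x: (N, 1), y: (N, 1), w_init: (1, 1), b_init: (1,).
    Returns {"w": (1, 1), "b": (1,), "final_loss": float}.
    """
    w = w_init.copy()
    b = b_init.copy()
    N = x.shape[0]
    for _ in range(n_steps):
        y_pred = x @ w + b                     # (N, 1) predictions
        err = y_pred - y                       # (N, 1) residuals
        grad_w = (2.0 / N) * (x.T @ err)       # (1, 1) d(MSE)/dw
        grad_b = (2.0 / N) * err.sum(axis=0)   # (1,)   d(MSE)/db
        w = w - lr * grad_w
        b = b - lr * grad_b
    final_loss = float(np.mean((x @ w + b - y) ** 2))
    return {"w": w, "b": b, "final_loss": final_loss}
```

With a small enough learning rate, the loop converges to the least-squares fit; for example, data generated from y = 2x + 1 should recover w ≈ 2 and b ≈ 1.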