Train with Adam End-to-End
Train a linear regressor end-to-end using Adam — implemented from
scratch. No optim.Adam; you own every line of the update rule.
The model
Given a feature matrix X (the input x) of shape (N, d) and targets y of shape (N,),
the MSE loss is $\mathcal{L}(w) = \frac{1}{N}\lVert Xw - y \rVert^2$, and its gradient at weights w is:
$$\nabla_w \mathcal{L} = \frac{2}{N} X^\top (Xw - y)$$
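One way to sanity-check the gradient formula is to compare it against central finite differences on a small random problem. The sketch below assumes NumPy; the helper names `mse_loss` and `mse_grad` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def mse_loss(x, y, w):
    """Mean squared error of the linear model x @ w against y."""
    return np.mean((x @ w - y) ** 2)

def mse_grad(x, y, w):
    """Analytic gradient (2/N) * X^T (Xw - y)."""
    n = x.shape[0]
    return (2.0 / n) * x.T @ (x @ w - y)

# Hypothetical toy data, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
y = rng.normal(size=5)
w = rng.normal(size=3)

# Central finite differences, one coordinate direction at a time.
h = 1e-6
numeric = np.array([
    (mse_loss(x, y, w + h * e) - mse_loss(x, y, w - h * e)) / (2 * h)
    for e in np.eye(3)
])
analytic = mse_grad(x, y, w)
print(np.max(np.abs(numeric - analytic)))  # should be tiny
```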
Adam update rule (1-indexed step t)
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$
$$w_t = w_{t-1} - \eta\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
Bias correction (dividing by 1 - beta^t) counteracts the bias toward zero that
m and v carry in early steps when they are initialized to zero. The divisor
1 - beta^t approaches 1 as t grows, so the correction fades out.
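A small scalar example may make the effect concrete. The sketch below assumes a constant gradient `g` (a hypothetical value) and the common defaults beta1 = 0.9, beta2 = 0.999. Under that assumption the raw moments start far below `g` and `g**2`, while the bias-corrected values recover them at every step.

```python
beta1, beta2 = 0.9, 0.999  # common defaults, assumed here
g = 0.5                    # hypothetical constant gradient

m = v = 0.0                # zero-initialized Adam state
for t in range(1, 4):
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)  # corrected first moment
    v_hat = v / (1 - beta2**t)  # corrected second moment
    print(t, m, m_hat, v, v_hat)

# The divisor itself tends to 1 for large t, so the correction vanishes.
late_divisor = 1 - beta1**100
```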
Algorithm
m, v = m0, v0
w = w0
for t in 1 .. n_steps:
    grad = (2/N) * x.T @ (x @ w - y)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (sqrt(v_hat) + eps)
return concat(w, m, v)
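The pseudocode translates almost line for line into NumPy. The following is a minimal sketch, not the platform's reference solution: the function name `adam_linear_regression` and the toy data are assumptions, and the expected tensor library may differ (for example PyTorch tensors instead of NumPy arrays).

```python
import numpy as np

def adam_linear_regression(x, y, w0, m0, v0, lr, beta1, beta2, eps, n_steps):
    """Run n_steps of full-batch Adam on MSE linear regression.

    Returns concat(w, m, v) with shape (3*d,).
    """
    n = x.shape[0]
    # Copy into float arrays so the caller's inputs are never modified.
    w = np.array(w0, dtype=float)
    m = np.array(m0, dtype=float)
    v = np.array(v0, dtype=float)
    for t in range(1, n_steps + 1):  # t is 1-indexed for bias correction
        grad = (2.0 / n) * x.T @ (x @ w - y)
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad**2
        m_hat = m / (1 - beta1**t)
        v_hat = v / (1 - beta2**t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return np.concatenate([w, m, v])

# Hypothetical noise-free toy problem: y is an exact linear function of x.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = x @ w_true
d = x.shape[1]

out = adam_linear_regression(
    x, y, np.zeros(d), np.zeros(d), np.zeros(d),
    lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=500,
)
final_w = out[:d]
print(final_w)  # expected to move from zero toward w_true
```

Rebinding `w`, `m`, and `v` on each iteration, rather than updating them in place, keeps the function free of side effects on its arguments.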
Inputs
- x, shape (N, d): feature matrix.
- y, shape (N,): regression targets.
- w0, shape (d,): initial weights.
- m0, v0, shape (d,): initial Adam state (typically zeros).
- lr, beta1, beta2, eps, floats: Adam hyperparameters.
- n_steps, int: number of update steps.
Output
Returns shape (3*d,) — the concatenation of (final_w, final_m, final_v)
flattened. This makes the full optimizer state checkable in a single tensor.
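Because the three parts have equal length d, a caller can recover them by splitting the result into thirds. A brief sketch, assuming a NumPy array and a made-up output for d = 2:

```python
import numpy as np

# Hypothetical output for d = 2, laid out as [w | m | v].
out = np.array([1.0, 2.0, 0.1, 0.2, 0.01, 0.04])

# np.split with an integer cuts the array into that many equal parts.
final_w, final_m, final_v = np.split(out, 3)
```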
Edge cases
- n_steps=0: the loop never runs; the output is concat(w0, m0, v0).
- lr=0: w never changes, but m and v still update each step (the Adam state is tracked even when no weight update occurs).
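Both edge cases can be checked directly. The sketch below uses a condensed restatement of the algorithm above under the hypothetical name `run_adam`, with made-up inputs; it assumes NumPy.

```python
import numpy as np

def run_adam(x, y, w, m, v, lr, beta1, beta2, eps, n_steps):
    # Condensed form of the algorithm; run_adam is a hypothetical name.
    for t in range(1, n_steps + 1):
        g = (2.0 / len(y)) * x.T @ (x @ w - y)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        w = w - lr * (m / (1 - beta1**t)) / (np.sqrt(v / (1 - beta2**t)) + eps)
    return np.concatenate([w, m, v])

# Made-up inputs for illustration.
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
w0 = np.array([0.5, -0.5])
zeros = np.zeros(2)
hp = dict(beta1=0.9, beta2=0.999, eps=1e-8)

# n_steps=0: the inputs come back unchanged, concatenated.
no_steps = run_adam(x, y, w0, zeros, zeros, lr=0.1, n_steps=0, **hp)

# lr=0: weights stay at w0 while m and v accumulate gradient statistics.
frozen = run_adam(x, y, w0, zeros, zeros, lr=0.0, n_steps=3, **hp)
```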
Reference
Kingma & Ba, “Adam: A Method for Stochastic Optimization” (2014).