
Implement Adam Optimizer Step

Implement a single step of the Adam optimizer.

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}$$

$$\hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$

$$w_t = w_{t-1} - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

Input:

  • weights: current parameters
  • gradients: current gradients
  • m: first moment estimate (exponential moving average of gradients)
  • v: second moment estimate (exponential moving average of squared gradients)
  • t: current timestep (integer, starting from 1)
  • lr: learning rate (default 0.001)
  • beta1: first moment decay (default 0.9)
  • beta2: second moment decay (default 0.999)
  • eps: small constant (default 1e-8)

Output: A map with keys new_weights, new_m, and new_v, containing the updated parameters and the updated moment estimates.
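One possible solution is a direct transcription of the update equations above into NumPy. The function name `adam_step` and the use of a dictionary for the result are assumptions about the expected interface; adapt them to the grader's signature if it differs.

```python
import numpy as np

def adam_step(weights, gradients, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Perform a single Adam update and return the new weights and moments."""
    weights = np.asarray(weights, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    m = np.asarray(m, dtype=float)
    v = np.asarray(v, dtype=float)

    # Biased moment updates: exponential moving averages of g and g^2
    new_m = beta1 * m + (1 - beta1) * gradients
    new_v = beta2 * v + (1 - beta2) * gradients ** 2

    # Bias correction: the moments start at zero, so early steps are scaled up
    m_hat = new_m / (1 - beta1 ** t)
    v_hat = new_v / (1 - beta2 ** t)

    # Parameter update
    new_weights = weights - lr * m_hat / (np.sqrt(v_hat) + eps)

    return {"new_weights": new_weights, "new_m": new_m, "new_v": new_v}
```

Note that bias correction matters most at small t: at t = 1 with zero-initialized moments, m_hat equals the raw gradient exactly, so the very first step moves each weight by roughly lr in the direction opposite the gradient's sign.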
