medium primitives

Implement Momentum Update

Implement a single step of SGD with momentum.

$$v_{t+1} = \mu \cdot v_t + \nabla w_t$$

$$w_{t+1} = w_t - \eta \cdot v_{t+1}$$

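where $\mu$ is the momentum coefficient, $\eta$ is the learning rate, $v_t$ is the current velocity (momentum buffer), and $\nabla w_t$ is the gradient with respect to the current weights $w_t$.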
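
As a quick sanity check of the two update rules, here is a worked scalar example. The numbers are chosen only for illustration and are not part of the problem statement: $\mu = 0.9$, $\eta = 0.1$, $w_t = 1.0$, $v_t = 0.5$, $\nabla w_t = 0.2$.

$$v_{t+1} = 0.9 \cdot 0.5 + 0.2 = 0.65$$

$$w_{t+1} = 1.0 - 0.1 \cdot 0.65 = 0.935$$

Note that the weight update uses the new velocity $v_{t+1}$, not $v_t$.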

Input:

  • weights: current parameters
  • gradients: current gradients
  • velocity: current velocity (momentum buffer)
  • lr: learning rate
  • momentum: momentum coefficient

Output: A map/tuple with new_weights and new_velocity
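
The following is a minimal sketch of one possible solution, not a reference implementation. It assumes the weights, gradients, and velocity arrive as flat lists of floats of equal length, and that the result is returned as a dict with the keys new_weights and new_velocity; the function name momentum_step and the length check are illustrative choices, since the problem statement does not fix them.

```python
from typing import Dict, List


def momentum_step(
    weights: List[float],
    gradients: List[float],
    velocity: List[float],
    lr: float,
    momentum: float,
) -> Dict[str, List[float]]:
    """Perform one SGD-with-momentum step on flat lists of floats.

    Assumption: all three lists have the same length. The name
    `momentum_step` and the dict return type are illustrative.
    """
    if not (len(weights) == len(gradients) == len(velocity)):
        raise ValueError("weights, gradients and velocity must have equal length")

    # v_{t+1} = mu * v_t + grad
    new_velocity = [momentum * v + g for v, g in zip(velocity, gradients)]

    # w_{t+1} = w_t - lr * v_{t+1}  (uses the NEW velocity)
    new_weights = [w - lr * v for w, v in zip(weights, new_velocity)]

    return {"new_weights": new_weights, "new_velocity": new_velocity}


if __name__ == "__main__":
    result = momentum_step(
        weights=[1.0, 2.0],
        gradients=[0.2, -0.4],
        velocity=[0.5, 0.0],
        lr=0.1,
        momentum=0.9,
    )
    print(result)
```

The inputs are not modified in place; new lists are built so the caller can keep the previous state if needed. With array inputs (for example NumPy arrays), the same two lines of arithmetic would apply elementwise without the list comprehensions.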

Hints

optimization momentum sgd