Implement a single step of vanilla gradient descent.
$$w_{t+1} = w_t - \eta \cdot \nabla L(w_t)$$

where $\nabla L(w_t)$ is the gradient of the loss with respect to the weights at step $t$.
Input:
- `weights`: current parameter tensor
- `gradients`: gradient tensor (same shape as `weights`)
- `lr`: learning rate $\eta$

Output:
- Updated weights after one gradient descent step
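
A minimal sketch in NumPy, assuming `weights` and `gradients` are arrays of the same shape; the function name `gradient_descent_step` is illustrative, not specified by the problem:

```python
import numpy as np

def gradient_descent_step(weights: np.ndarray,
                          gradients: np.ndarray,
                          lr: float) -> np.ndarray:
    # One vanilla update: w_{t+1} = w_t - eta * gradient.
    # Returns a new array rather than mutating the input.
    return weights - lr * gradients

# Example usage with hypothetical values:
w = np.array([0.5, -1.0, 2.0])
g = np.array([0.1, -0.2, 0.4])
print(gradient_descent_step(w, g, lr=0.1))  # [ 0.49 -0.98  1.96]
```

Because NumPy broadcasting handles `weights - lr * gradients` elementwise, the same one-liner works for parameter tensors of any shape.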