Implement gradient clipping by norm.
Given a list of gradient tensors, compute the global norm and clip if it
exceeds max_norm:
Global norm: $\|g\| = \sqrt{\sum_i \sum_j g_{ij}^2}$ (L2 norm of all gradients concatenated)
Clip: If $\|g\| > \text{max\_norm}$, scale all gradients by $\frac{\text{max\_norm}}{\|g\|}$
Input:
gradients: a list of tensors (the gradients)
max_norm: maximum allowed norm (float)
Output:
A dict with:
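
The clipping rule above can be sketched as follows. This is a minimal NumPy sketch, not a reference solution; the function name `clip_by_global_norm` and the output keys `"clipped_gradients"` and `"global_norm"` are assumptions, since the output spec is truncated.

```python
import numpy as np

def clip_by_global_norm(gradients, max_norm):
    # Global L2 norm of all gradients concatenated:
    # sqrt of the sum of squares over every element of every tensor.
    global_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in gradients)))
    # Scale all gradients by max_norm / ||g|| only when the norm exceeds max_norm.
    if global_norm > max_norm:
        scale = max_norm / global_norm
        gradients = [g * scale for g in gradients]
    # Output keys are hypothetical -- the original spec is cut off.
    return {"clipped_gradients": gradients, "global_norm": global_norm}
```

For example, a single gradient `[3.0, 4.0]` has global norm 5.0; with `max_norm = 2.5` it is scaled by 0.5 to `[1.5, 2.0]`.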