
Gradient Clipping

Implement gradient clipping by norm.

Given a list of gradient tensors, compute the global norm and clip if it exceeds max_norm:

  1. Global norm: $\|g\| = \sqrt{\sum_i \sum_j g_{ij}^2}$ (L2 norm of all gradients concatenated)

  2. Clip: If $\|g\| > \text{max\_norm}$, scale all gradients by $\frac{\text{max\_norm}}{\|g\|}$

Input:

  • gradients: a list of tensors (the gradients)
  • max_norm: maximum allowed norm (float)

Output: A dict with:

  • "clipped_gradients": list of clipped gradient tensors (same shapes as the inputs)
  • "global_norm": the original (pre-clipping) global norm (scalar)
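The two steps above can be sketched in NumPy, using plain arrays in place of framework tensors (the function name `clip_gradients` is illustrative, not part of any required API):

```python
import numpy as np

def clip_gradients(gradients, max_norm):
    """Clip a list of gradient arrays by their global L2 norm."""
    # Step 1 -- global norm: sqrt of the sum of squares over every
    # element of every tensor, as if all gradients were concatenated.
    global_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in gradients)))

    # Step 2 -- scale only when the norm exceeds the threshold;
    # otherwise return the gradients unchanged.
    if global_norm > max_norm:
        scale = max_norm / global_norm
        clipped = [g * scale for g in gradients]
    else:
        clipped = [g.copy() for g in gradients]

    return {"clipped_gradients": clipped, "global_norm": global_norm}

# Example: norms 5 and 12 combine to a global norm of 13.
grads = [np.array([3.0, 4.0]), np.array([[0.0], [12.0]])]
result = clip_gradients(grads, max_norm=1.0)
# result["global_norm"] is 13.0; the clipped gradients have global norm 1.0
```

Note that the reported "global_norm" is the norm *before* clipping, and when the norm is already within bounds the gradients pass through unscaled.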
