Implement gradient clipping by norm.
Given a list of gradient tensors, compute the global norm and clip if it
exceeds max_norm:
Global norm: $\|g\| = \sqrt{\sum_i \sum_j g_{ij}^2}$ (L2 norm of all gradients concatenated)
Clip: If $\|g\| > \text{max\_norm}$, scale all gradients by $\frac{\text{max\_norm}}{\|g\|}$
Input:
gradients: a list of tensors (the gradients)
max_norm: maximum allowed norm (float)
Output:
A dict with:
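
The clipping rule above can be sketched as follows. This is a minimal NumPy sketch, not a reference solution; the function name `clip_by_global_norm` and the output keys `"clipped_gradients"` and `"global_norm"` are assumptions, since the output spec is truncated.

```python
import numpy as np

def clip_by_global_norm(gradients, max_norm):
    # Global L2 norm of all gradients concatenated:
    # sqrt of the sum of squares over every element of every tensor.
    global_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in gradients)))
    # Scale all gradients by max_norm / ||g|| only when the norm exceeds max_norm.
    if global_norm > max_norm:
        scale = max_norm / global_norm
        gradients = [g * scale for g in gradients]
    # Output keys are hypothetical -- the original spec is cut off.
    return {"clipped_gradients": gradients, "global_norm": global_norm}
```

For example, a single gradient `[3.0, 4.0]` has global norm 5.0; with `max_norm = 2.5` it is scaled by 0.5 to `[1.5, 2.0]`.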