Implement RMSNorm from “Root Mean Square Layer Normalization” (Zhang & Sennrich, 2019).
RMSNorm is a simpler alternative to LayerNorm used in LLaMA, Gemma, and other modern LLMs. Unlike LayerNorm, it skips mean-centering and the bias term, normalizing only by the root mean square:
$$\text{RMSNorm}(x) = \frac{x}{\text{RMS}(x) + \epsilon} \cdot \gamma$$
where $\text{RMS}(x) = \sqrt{\frac{1}{d} \sum_{i=1}^{d} x_i^2}$
Input:
- x: shape (batch, d)
- gamma: shape (d,) — learnable scale parameter
- eps: float, small constant (default 1e-6)
Output: Tensor of shape (batch, d).
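A minimal NumPy sketch of the formula above (the function name `rms_norm` is my own; note that some implementations, including LLaMA's, instead add eps inside the square root before taking `rsqrt`):

```python
import numpy as np

def rms_norm(x, gamma, eps=1e-6):
    # RMS over the feature dimension: sqrt(mean(x_i^2)), keepdims for broadcasting
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True))
    # Normalize by RMS (eps guards against division by zero), then scale by gamma
    return x / (rms + eps) * gamma

x = np.array([[3.0, 4.0]])      # RMS = sqrt((9 + 16) / 2) = sqrt(12.5)
gamma = np.ones(2)
out = rms_norm(x, gamma)
```

Because there is no centering, the operation is invariant to rescaling the input (up to eps): `rms_norm(c * x, gamma)` ≈ `rms_norm(x, gamma)` for any c > 0.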