Implement RMSNorm from “Root Mean Square Layer Normalization” (Zhang & Sennrich, 2019).
RMSNorm is a simpler alternative to LayerNorm used in LLaMA, Gemma, and other modern LLMs. Unlike LayerNorm, it skips mean-centering and the bias term, normalizing only by the root mean square:
$$\text{RMSNorm}(x) = \frac{x}{\text{RMS}(x) + \epsilon} \cdot \gamma$$
where $\text{RMS}(x) = \sqrt{\frac{1}{d} \sum_{i=1}^{d} x_i^2}$
Input:
- x: shape (batch, d)
- gamma: shape (d,) — learnable scale parameter
- eps: float, small constant (default 1e-6)
Output: Tensor of shape (batch, d).
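A minimal NumPy sketch of the formula above (the function name `rms_norm` is my own; note that some implementations, including LLaMA's, instead add eps inside the square root before taking `rsqrt`):

```python
import numpy as np

def rms_norm(x, gamma, eps=1e-6):
    # RMS over the feature dimension: sqrt(mean(x_i^2)), keepdims for broadcasting
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True))
    # Normalize by RMS (eps guards against division by zero), then scale by gamma
    return x / (rms + eps) * gamma

x = np.array([[3.0, 4.0]])      # RMS = sqrt((9 + 16) / 2) = sqrt(12.5)
gamma = np.ones(2)
out = rms_norm(x, gamma)
```

Because there is no centering, the operation is invariant to rescaling the input (up to eps): `rms_norm(c * x, gamma)` ≈ `rms_norm(x, gamma)` for any c > 0.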