
Implement Layer Normalization

Given an input tensor, learnable scale and shift parameters, and a small numerical-stability constant, compute the layer-normalized output.

Unlike Batch Normalization, which normalizes each feature across the batch dimension, Layer Normalization normalizes across the feature dimension for each sample independently:

$$\hat{x}_i = \frac{x_i - \mu_i}{\sqrt{\sigma_i^2 + \epsilon}}$$ $$y_i = \gamma \hat{x}_i + \beta$$

where $\mu_i$ and $\sigma_i^2$ are the mean and variance computed over the feature dimension of sample $i$, and $\gamma$, $\beta$ are learnable per-feature parameters.

Input:

  • x: tensor of shape (N, D)
  • gamma: scale parameter of shape (D,)
  • beta: shift parameter of shape (D,)
  • eps: small constant (default 1e-5)

Output: Normalized tensor of shape (N, D)
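A minimal NumPy sketch of one possible solution (the function name `layer_norm` is our choice; the problem only fixes the input/output shapes):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Layer-normalize each row of x over the feature dimension, then scale and shift."""
    mu = x.mean(axis=-1, keepdims=True)       # per-sample mean, shape (N, 1)
    var = x.var(axis=-1, keepdims=True)       # per-sample variance, shape (N, 1)
    x_hat = (x - mu) / np.sqrt(var + eps)     # normalized values, shape (N, D)
    return gamma * x_hat + beta               # (D,) params broadcast over the N rows
```

Using `keepdims=True` makes the per-sample statistics shape `(N, 1)`, so NumPy broadcasting subtracts each row's own mean; with `gamma = np.ones(D)` and `beta = np.zeros(D)`, every output row should have mean ≈ 0 and variance ≈ 1.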
