
Implement Layer Normalization

Given an input tensor, learnable scale and shift parameters, and a small numerical-stability constant, compute the layer-normalized output.

Unlike Batch Normalization, which normalizes each feature across the batch dimension, Layer Normalization normalizes across the feature dimension for each sample independently:

$$\hat{x}_i = \frac{x_i - \mu_i}{\sqrt{\sigma_i^2 + \epsilon}}$$ $$y_i = \gamma \hat{x}_i + \beta$$

where $\mu_i$ and $\sigma_i^2$ are the mean and variance computed over the feature dimension of sample $i$, and $\gamma$, $\beta$ are learnable per-feature parameters.

Input:

  • x: tensor of shape (N, D)
  • gamma: scale parameter of shape (D,)
  • beta: shift parameter of shape (D,)
  • eps: small constant (default 1e-5)

Output: Normalized tensor of shape (N, D)
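A minimal NumPy sketch of one possible solution (the function name `layer_norm` is our choice; the problem only fixes the input/output shapes):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Layer-normalize each row of x over the feature dimension, then scale and shift."""
    mu = x.mean(axis=-1, keepdims=True)       # per-sample mean, shape (N, 1)
    var = x.var(axis=-1, keepdims=True)       # per-sample variance, shape (N, 1)
    x_hat = (x - mu) / np.sqrt(var + eps)     # normalized values, shape (N, D)
    return gamma * x_hat + beta               # (D,) params broadcast over the N rows
```

Using `keepdims=True` makes the per-sample statistics shape `(N, 1)`, so NumPy broadcasting subtracts each row's own mean; with `gamma = np.ones(D)` and `beta = np.zeros(D)`, every output row should have mean ≈ 0 and variance ≈ 1.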
