Implement Layer Normalization.
Unlike Batch Normalization, which normalizes across the batch dimension, Layer Normalization normalizes across the feature dimension of each sample independently:
$$\hat{x}_i = \frac{x_i - \mu_i}{\sqrt{\sigma_i^2 + \epsilon}}$$ $$y_i = \gamma \hat{x}_i + \beta$$
where $\mu_i$ and $\sigma_i^2$ are the mean and variance computed over the feature dimension for sample $i$.
Input:
- x: input tensor of shape (N, D)
- gamma: scale parameter of shape (D,)
- beta: shift parameter of shape (D,)
- eps: small constant for numerical stability (default 1e-5)
Output: Normalized tensor of shape (N, D)
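A minimal NumPy sketch of one possible solution (function name `layer_norm` is an assumption; the exercise does not fix a signature). It computes the per-sample mean and variance over the feature axis, normalizes, then applies the learnable scale and shift:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Per-sample statistics over the feature dimension (axis=1)
    mu = x.mean(axis=1, keepdims=True)      # shape (N, 1)
    var = x.var(axis=1, keepdims=True)      # shape (N, 1)
    # Normalize each row to roughly zero mean, unit variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # shape (N, D)
    # Scale and shift; gamma and beta broadcast over the batch
    return gamma * x_hat + beta             # shape (N, D)
```

With `gamma = 1` and `beta = 0`, each output row has mean approximately 0 and standard deviation approximately 1 (slightly below 1 because of `eps`).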