Implement the GELU (Gaussian Error Linear Unit) activation from “Gaussian Error Linear Units (GELUs)” (Hendrycks & Gimpel, 2016).
GELU is the default activation in BERT, GPT-2, and most modern transformers.
The exact formula is: $$\text{GELU}(x) = x \cdot \Phi(x)$$
where $\Phi(x)$ is the CDF of the standard normal distribution.
Implement the widely used tanh approximation: $$\text{GELU}(x) \approx 0.5 \cdot x \cdot \left(1 + \tanh\left(\sqrt{\frac{2}{\pi}} \cdot (x + 0.044715 \cdot x^3)\right)\right)$$
Input: Tensor x of any shape.
Output: Tensor of same shape.
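A minimal NumPy sketch of both forms, assuming the input is array-like; `gelu_exact` uses the erf identity $\Phi(x) = \tfrac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$, and `gelu_tanh` is the approximation above. Function names are illustrative, not part of the problem statement.

```python
import numpy as np
from numpy import vectorize
from math import erf, sqrt

# Exact GELU: x * Phi(x), with Phi written via the error function.
_erf = vectorize(erf)  # math.erf is scalar-only; vectorize for arrays

def gelu_exact(x):
    x = np.asarray(x, dtype=np.float64)
    return x * 0.5 * (1.0 + _erf(x / sqrt(2.0)))

# Tanh approximation from the problem statement.
def gelu_tanh(x):
    x = np.asarray(x, dtype=np.float64)
    inner = np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)
    return 0.5 * x * (1.0 + np.tanh(inner))
```

Both are elementwise, so the output shape matches the input for any shape; for ordinary inputs the two versions agree to roughly three decimal places, e.g. `gelu_tanh(1.0)` is about `0.8412`.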