easy
research
GELU Activation
Implement the GELU (Gaussian Error Linear Unit) activation from “Gaussian Error Linear Units (GELUs)” (Hendrycks & Gimpel, 2016).
GELU is the activation used in BERT and GPT-2, and it remains common in transformer architectures.
The exact formula is: $$\text{GELU}(x) = x \cdot \Phi(x)$$
where $\Phi(x)$ is the CDF of the standard normal distribution.
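For reference, the standard normal CDF can be written with the error function, which gives a closed form for the exact GELU that most math libraries can evaluate directly:

$$\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right) \quad\Rightarrow\quad \text{GELU}(x) = \frac{x}{2}\left(1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right)$$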
Use the common tanh approximation: $$\text{GELU}(x) \approx 0.5 \cdot x \cdot \left(1 + \tanh\left(\sqrt{\frac{2}{\pi}} \cdot (x + 0.044715 \cdot x^3)\right)\right)$$
Input: Tensor x of any shape.
Output: Tensor of same shape.
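A minimal sketch of the tanh approximation follows, written in pure Python so it runs without any tensor library. It is not the reference solution: the names `gelu_tanh`, `gelu_exact`, and `apply_elementwise` are hypothetical, and nested lists stand in for a tensor of arbitrary shape. In a tensor library, the same expression would be applied elementwise to the whole tensor at once.

```python
import math

# Constant factor sqrt(2 / pi) from the tanh approximation.
SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU for a single scalar."""
    inner = SQRT_2_OVER_PI * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gelu_exact(x: float) -> float:
    """Exact GELU, x * Phi(x), with Phi expressed through erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def apply_elementwise(fn, x):
    """Apply fn to every scalar in a nested list, preserving its shape.

    Nested lists are an assumption used here in place of a real tensor type.
    """
    if isinstance(x, (list, tuple)):
        return [apply_elementwise(fn, item) for item in x]
    return fn(float(x))


if __name__ == "__main__":
    x = [[-2.0, -1.0, 0.0], [0.5, 1.0, 2.0]]
    approx = apply_elementwise(gelu_tanh, x)
    exact = apply_elementwise(gelu_exact, x)
    # Print approximation, exact value, and absolute difference per element.
    for row_a, row_e in zip(approx, exact):
        for a, e in zip(row_a, row_e):
            print(f"{a:+.6f}  {e:+.6f}  {abs(a - e):.2e}")
```

Comparing against `gelu_exact` is a convenient sanity check: the two functions should agree closely across typical input ranges, and the output always has the same nesting as the input.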
gelu
activation
hendrycks-gimpel-2016
transformer