medium
research
SwiGLU Activation
Implement the SwiGLU activation from “GLU Variants Improve Transformer” (Shazeer, 2020).
SwiGLU is used in modern LLMs such as PaLM and LLaMA. It computes two linear projections of the input and uses the Swish of one to gate the other:
$$\text{SwiGLU}(x, W, V, b, c) = \text{Swish}(xW + b) \otimes (xV + c)$$
Here Swish(z) = z * sigmoid(z), and ⊗ denotes element-wise multiplication.
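As a minimal sketch of the Swish definition, the scalar version below uses only the standard library. The names `sigmoid` and `swish` are illustrative choices, not part of the problem statement, and the sign branch is an added precaution so that `math.exp` is never called on a large positive argument.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-z)), branching on sign to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0 here, so exp(z) is at most 1 and cannot overflow
    return e / (1.0 + e)


def swish(z: float) -> float:
    """Swish(z) = z * sigmoid(z)."""
    return z * sigmoid(z)
```

Swish is zero at the origin, approaches the identity for large positive inputs, and approaches zero for large negative inputs.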
For simplicity, implement the core operation given pre-computed linear projections:
- gate: shape (batch, d), the xW + b projection
- value: shape (batch, d), the xV + c projection
$$\text{SwiGLU}(\text{gate}, \text{value}) = (\text{gate} \cdot \sigma(\text{gate})) \otimes \text{value}$$
Output: Tensor of shape (batch, d).
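The core operation can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the site's reference solution: tensors are modeled as nested lists of floats, and the `Matrix` alias, the `_swish` helper, and the shape check are hypothetical additions for this example.

```python
import math
from typing import List

# Assumption: a (batch, d) tensor is represented as a list of rows of floats.
Matrix = List[List[float]]


def _swish(z: float) -> float:
    """z * sigmoid(z), branching on sign so math.exp never overflows."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def swiglu(gate: Matrix, value: Matrix) -> Matrix:
    """Element-wise (gate * sigmoid(gate)) * value for two (batch, d) inputs."""
    same_rows = len(gate) == len(value)
    if not same_rows or any(len(g) != len(v) for g, v in zip(gate, value)):
        raise ValueError("gate and value must have the same shape")
    return [
        [_swish(g) * v for g, v in zip(g_row, v_row)]
        for g_row, v_row in zip(gate, value)
    ]
```

With a tensor library the same computation is a single element-wise expression. In PyTorch, for example, it is commonly written as `torch.nn.functional.silu(gate) * value`, since SiLU is the same function as Swish.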
swiglu
glu
shazeer-2020
activation
llm