ALiBi Position Bias
Implement ALiBi (Attention with Linear Biases) from “Train Short, Test Long” (Press et al., 2022).
ALiBi adds a linear position bias to attention scores instead of using position embeddings. For head h with slope m_h, the bias for query position i and key position j is:
$$\text{bias}(i, j) = -m_h \cdot |i - j|$$
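As a small worked case, assume seq_len = 3 (a value chosen only for illustration). The distances |i - j| form a symmetric matrix with zeros on the diagonal, so the bias for head h is:
$$\text{bias} = -m_h \begin{pmatrix} 0 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{pmatrix}$$
Each position is not penalized for attending to itself, and the penalty grows linearly with distance at rate m_h.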
The slopes are geometric: $m_h = \frac{1}{2^{8h/H}}$ for head index h (0-based) and H total heads.
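The slope formula can be sketched in plain Python. This is only an illustration of the formula as stated here; the function name `alibi_slopes` is hypothetical and not part of the problem. Note that with a 0-based head index the first head gets slope 1.

```python
def alibi_slopes(n_heads: int) -> list:
    """Return m_h = 1 / 2**(8*h/H) for h = 0 .. H-1 (0-based head index)."""
    return [2.0 ** (-8.0 * h / n_heads) for h in range(n_heads)]


# With 8 heads the exponent 8*h/H reduces to h, so each slope is half the previous one.
slopes = alibi_slopes(8)
print(slopes)
```

Because the sequence is geometric, later heads have smaller slopes and therefore penalize distant keys less strongly.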
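Given attention scores S of shape (n_heads, seq_len, seq_len) and the number of heads n_heads, add the ALiBi bias to the scores, so that entry (h, i, j) of the output is S[h, i, j] + bias(i, j) computed with slope m_h.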
Output: Tensor of shape (n_heads, seq_len, seq_len) — biased attention scores.
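One possible way to put the pieces together is sketched below. It assumes NumPy arrays rather than any particular tensor library, and the function name `add_alibi_bias` is hypothetical; the problem's intended solution may differ.

```python
import numpy as np


def add_alibi_bias(scores: np.ndarray, n_heads: int) -> np.ndarray:
    """Add bias(i, j) = -m_h * |i - j| to scores of shape (n_heads, seq_len, seq_len)."""
    seq_len = scores.shape[-1]

    # Slopes m_h = 1 / 2**(8*h/H) for 0-based head index h, shape (n_heads,).
    slopes = 2.0 ** (-8.0 * np.arange(n_heads) / n_heads)

    # Pairwise distances |i - j|, shape (seq_len, seq_len).
    positions = np.arange(seq_len)
    distance = np.abs(positions[:, None] - positions[None, :])

    # Broadcast slopes over the distance matrix: shape (n_heads, seq_len, seq_len).
    bias = -slopes[:, None, None] * distance[None, :, :]
    return scores + bias


# Usage: zero scores make the added bias directly visible.
biased = add_alibi_bias(np.zeros((2, 3, 3)), n_heads=2)
print(biased.shape)
```

Building the distance matrix once and broadcasting the per-head slopes over it avoids an explicit loop over heads.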