medium
research
Relative Position Encoding
Implement relative position encoding from “Self-Attention with Relative Position Representations” (Shaw et al., 2018).
Instead of absolute position embeddings, add a learned bias based on the relative distance between query and key positions.
Given:
- scores: shape (seq_len, seq_len), raw attention scores (Q @ K^T / sqrt(d))
- rel_bias: shape (2*max_dist+1,), learned bias for relative positions [-max_dist, …, -1, 0, 1, …, max_dist]
- max_dist: integer, maximum relative distance to consider (larger distances are clamped)
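For query position i (row of scores) and key position j (column of scores), the relative position is clipped to the range [-max_dist, max_dist]:
$$r = \text{clip}(j - i, -\text{max\_dist}, \text{max\_dist})$$
The bias for the pair (i, j) is then read from rel_bias at index r + max_dist: $\text{rel\_bias}[r + \text{max\_dist}]$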
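As a small worked example, take seq_len = 4 and max_dist = 2 (sizes chosen here purely for illustration, not part of the problem statement). With rows indexed by i and columns by j, the index r + max_dist for every pair (i, j) is:
$$\begin{pmatrix} 2 & 3 & 4 & 4 \\ 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 2 \end{pmatrix}$$
The diagonal (j = i) always maps to index max_dist, and pairs further apart than max_dist reuse the first or last entry of rel_bias, here indices 0 and 4.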
Output: Tensor of shape (seq_len, seq_len) — scores with relative position bias added.
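The computation described above can be sketched as follows. This is a plain-Python reference sketch, not the site's official solution: it assumes scores is a nested list rather than a tensor, and the function name add_relative_position_bias is hypothetical. The explicit loops are there to make the index arithmetic visible; a tensor library would typically do the same with broadcasting and indexed lookup.

```python
def add_relative_position_bias(scores, rel_bias, max_dist):
    """Return a new (seq_len x seq_len) nested list: scores plus relative bias.

    Hypothetical helper for illustration; inputs are plain Python lists.
    """
    seq_len = len(scores)
    if len(rel_bias) != 2 * max_dist + 1:
        raise ValueError("rel_bias must have 2*max_dist+1 entries")

    biased = []
    for i in range(seq_len):              # i: query position (row)
        row = []
        for j in range(seq_len):          # j: key position (column)
            # Clip the relative position j - i to [-max_dist, max_dist].
            r = max(-max_dist, min(max_dist, j - i))
            # Shift r into [0, 2*max_dist] to index rel_bias.
            row.append(scores[i][j] + rel_bias[r + max_dist])
        biased.append(row)
    return biased


# Usage: zero scores make the added bias pattern directly visible.
scores = [[0.0] * 3 for _ in range(3)]
rel_bias = [-1.0, 0.0, 1.0]               # biases for r = -1, 0, +1
biased = add_relative_position_bias(scores, rel_bias, max_dist=1)
for row in biased:
    print(row)
# [0.0, 1.0, 1.0]
# [-1.0, 0.0, 1.0]
# [-1.0, -1.0, 0.0]
```

Because the bias depends only on j - i, every diagonal of the result receives the same value, and the corner entries at distance 2 reuse the biases for distance 1 through clipping.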
relative-position
shaw-2018
position-encoding
attention