medium
research
Sliding Window Attention
Implement Sliding Window Attention from “Longformer: The Long-Document Transformer” (Beltagy et al., 2020).
Instead of attending to all positions, each query only attends to a local window
of w positions on each side (total window size = 2w+1).
Given:
- Q: shape (seq_len, d_k)
- K: shape (seq_len, d_k)
- V: shape (seq_len, d_k)
- window_size: integer w; position i attends to positions [i-w, i+w]
For position i, only attend to keys in range [max(0, i-w), min(seq_len-1, i+w)]. Positions outside the window get -infinity before softmax.
Output: Tensor of shape (seq_len, d_k).
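One way to approach the specification above is a dense-mask sketch in NumPy: compute the full score matrix, set entries with |i - j| > w to -infinity, then apply a row-wise softmax. This is only an illustration, not the reference solution. The function name `sliding_window_attention` and the `scale` flag are assumptions, and the 1/sqrt(d_k) scaling follows standard scaled dot-product attention because the problem statement does not say whether scores are scaled. The sketch also uses O(seq_len^2) memory, so it reproduces the masking semantics rather than the banded, memory-efficient computation that motivates Longformer.

```python
import numpy as np


def sliding_window_attention(Q, K, V, window_size, scale=True):
    """Dense-mask sketch of sliding window attention.

    Q, K, V: arrays of shape (seq_len, d_k).
    window_size: integer w; query i attends to keys j with |i - j| <= w.
    scale: assumption, divide scores by sqrt(d_k) as in standard attention.
    """
    Q, K, V = (np.asarray(x, dtype=np.float64) for x in (Q, K, V))
    seq_len, d_k = Q.shape

    # Raw attention scores, shape (seq_len, seq_len).
    scores = Q @ K.T
    if scale:
        scores = scores / np.sqrt(d_k)

    # Mask every key outside [i - w, i + w]. Clipping to [0, seq_len - 1]
    # is implicit because indices outside that range do not exist.
    idx = np.arange(seq_len)
    outside = np.abs(idx[:, None] - idx[None, :]) > window_size
    scores = np.where(outside, -np.inf, scores)

    # Numerically stable row-wise softmax. The diagonal is never masked
    # for w >= 0, so each row has at least one finite score.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ V
```

Two sanity checks follow from the definition: with `window_size=0` each query attends only to its own position, so the output equals V; with `window_size >= seq_len - 1` nothing is masked and the result matches ordinary full attention.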
sliding-window
longformer
beltagy-2020
local-attention
efficiency