We can't find the internet
Attempting to reconnect
Something went wrong!
Attempting to reconnect
hard
framework
Efficient Attention with Masking
Implement causal (autoregressive) scaled dot-product attention using framework APIs.
Apply a causal mask so that position i can only attend to positions <= i.
Masked positions should be filled with -inf before the softmax.
Input:
-
Q: Query tensor of shape(seq_len, d_k) -
K: Key tensor of shape(seq_len, d_k) -
V: Value tensor of shape(seq_len, d_v)
Output: A tensor of shape (seq_len, d_v) — the attention output.
Steps:
-
Compute scores =
Q @ K^T / sqrt(d_k) - Create a causal mask (upper triangle = True)
-
Fill masked positions with
-inf - Apply softmax along the last dimension
- Multiply by V
API Reference:
-
PyTorch:
torch.triu,masked_fill,torch.softmax -
JAX:
jnp.triu,jnp.where,jax.nn.softmax
Hints
attention
causal-mask
torch.triu
jnp.triu
softmax
Sign in to attempt this problem and view the solution.