We can't find the internet
Attempting to reconnect
Something went wrong!
Attempting to reconnect
hard
end_to_end
Self-Attention Layer
Implement scaled dot-product self-attention.
Given input X of shape (seq_len, d_model):
- Project to queries, keys, values: $Q = X \cdot W_Q$, $K = X \cdot W_K$, $V = X \cdot W_V$
- Compute attention scores: $\text{scores} = \frac{Q K^T}{\sqrt{d_k}}$
- Apply softmax row-wise: $\text{attn} = \text{softmax}(\text{scores})$
- Compute output: $\text{out} = \text{attn} \cdot V$
Input:
-
X: input of shape(seq_len, d_model) -
W_Q,W_K,W_V: projection matrices of shape(d_model, d_k)
Output: Attention output of shape (seq_len, d_k).
Hints
self-attention
transformer
softmax
scaled-dot-product
Sign in to attempt this problem and view the solution.