hard
primitives
Implement Scaled Dot-Product Attention
Implement the Scaled Dot-Product Attention mechanism from “Attention Is All You Need”.
$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
Input:
- Q: query tensor of shape (seq_len_q, d_k)
- K: key tensor of shape (seq_len_k, d_k)
- V: value tensor of shape (seq_len_k, d_v)
Output: Attention output of shape (seq_len_q, d_v)
Note: $d_k$ is the dimension of the key vectors (last dimension of Q and K).
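The formula can be sketched directly from its three steps: scaled scores, a row-wise softmax, and a weighted sum of the value rows. The version below is a minimal, dependency-free illustration that assumes Q, K, and V are plain Python lists of lists of floats. It is not the site's reference solution, and the helper name `softmax` is our own. Subtracting the row maximum before exponentiating is a common stability trick that the problem statement does not mention.

```python
import math


def softmax(row):
    # Subtract the row max before exponentiating (assumed stability trick;
    # it does not change the result mathematically).
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    # Q: (seq_len_q, d_k), K: (seq_len_k, d_k), V: (seq_len_k, d_v)
    d_k = len(Q[0])
    d_v = len(V[0])
    scale = math.sqrt(d_k)
    output = []
    for q in Q:
        # One row of QK^T / sqrt(d_k): dot product of q with every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        # Attention weights over the keys; each row sums to 1.
        weights = softmax(scores)
        # Weighted sum of the value rows gives one output row of length d_v.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)]
        )
    return output  # shape (seq_len_q, d_v)
```

As a sanity check, an all-zero query scores every key equally, so its output row is the plain average of the rows of V. With array libraries, the same computation is usually written as two matrix products around a row-wise softmax.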
attention
transformer
self-attention