medium
research
Cross Attention
Implement cross-attention as used in transformer decoders.
In cross-attention, queries come from the decoder and keys/values come from the encoder. This is the mechanism that allows the decoder to attend to encoder outputs.
Given:
- Q: shape (tgt_len, d_k), decoder queries
- K: shape (src_len, d_k), encoder keys
- V: shape (src_len, d_k), encoder values
$$\text{CrossAttn}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right) V$$
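The formula can be sketched directly in plain Python. This is a minimal illustration, not the site's reference solution. It assumes row-major matrices stored as lists of lists (no NumPy or PyTorch). The function name `cross_attention` and the `Matrix` alias are hypothetical. Softmax is taken over each query's row of scores, which is the `src_len` axis, and the row maximum is subtracted first for numerical stability.

```python
import math
from typing import List

# Hypothetical alias: a matrix is a list of equal-length rows.
Matrix = List[List[float]]


def cross_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Sketch of softmax(Q K^T / sqrt(d_k)) V.

    Q: tgt_len x d_k (decoder queries)
    K: src_len x d_k (encoder keys)
    V: src_len x d_k (encoder values)
    Returns a tgt_len x d_k matrix.
    """
    if len(K) != len(V):
        raise ValueError("K and V must share the same src_len")
    d_k = len(Q[0])
    scale = 1.0 / math.sqrt(d_k)
    out: Matrix = []
    for q in Q:  # one decoder position at a time
        # Scaled dot product of this query with every encoder key (length src_len).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        # Numerically stable softmax over the src_len axis.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output row: attention-weighted sum of the encoder value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


# Usage: a zero query scores every key equally, so the output is the mean of V's rows.
print(cross_attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]))
# [[2.0, 3.0]]
```

The double loop mirrors the math one query at a time. A real implementation would instead batch this as two matrix multiplications.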
Note: tgt_len and src_len can be different!
Output: Tensor of shape (tgt_len, d_k).
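Tracking shapes through the formula shows why the two lengths may differ. The score matrix pairs every decoder position with every encoder position. The product with $V$ then contracts away the $\text{src\_len}$ axis, with softmax applied row-wise over $\text{src\_len}$:

$$\underbrace{QK^T}_{(\text{tgt\_len} \times d_k)(d_k \times \text{src\_len})} \in \mathbb{R}^{\text{tgt\_len} \times \text{src\_len}}, \qquad \underbrace{\text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right) V}_{(\text{tgt\_len} \times \text{src\_len})(\text{src\_len} \times d_k)} \in \mathbb{R}^{\text{tgt\_len} \times d_k}$$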
Tags: cross-attention, decoder, encoder-decoder, transformer