Implement scaled dot-product self-attention.
Input:
  X: input of shape (seq_len, d_model)
  W_Q, W_K, W_V: projection matrices of shape (d_model, d_k)
Output: Attention output of shape (seq_len, d_k).
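One way to solve this is the standard formulation Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where Q = X W_Q, K = X W_K, V = X W_V. A minimal NumPy sketch (the function name and the use of a numerically stable softmax are my choices, not part of the prompt):

```python
import numpy as np

def scaled_dot_product_self_attention(X, W_Q, W_K, W_V):
    """Scaled dot-product self-attention.

    X: (seq_len, d_model); W_Q, W_K, W_V: (d_model, d_k).
    Returns an array of shape (seq_len, d_k).
    """
    Q = X @ W_Q                      # (seq_len, d_k)
    K = X @ W_K                      # (seq_len, d_k)
    V = X @ W_V                      # (seq_len, d_k)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq_len, seq_len)
    # Numerically stable softmax over each row of scores.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # (seq_len, d_k)
```

Each output row is a convex combination of the value rows, so the result has shape (seq_len, d_k) as required; subtracting the row-wise max before exponentiating avoids overflow for large scores.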