
Implement Positional Encoding

Implement sinusoidal positional encoding as described in “Attention Is All You Need”.

$$PE_{(pos, 2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

$$PE_{(pos, 2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

where $pos$ is the position and $i$ indexes dimension *pairs*, so $2i$ and $2i+1$ together range over all $d_{model}$ dimensions (sine in the even dimensions, cosine in the odd ones).

Input:

  • seq_len: length of the sequence (integer)
  • d_model: dimension of the model (integer, must be even)

Output: A tensor of shape (seq_len, d_model) with positional encodings
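One possible solution is sketched below in NumPy; the function name `positional_encoding` is chosen for illustration, and the exercise may expect a different tensor framework. It computes one angle matrix of shape (seq_len, d_model/2) and interleaves sine and cosine into the even and odd columns:

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding of shape (seq_len, d_model)."""
    assert d_model % 2 == 0, "d_model must be even"
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]         # (1, d_model/2) pair index
    angles = pos / (10000 ** (2 * i / d_model))  # broadcast to (seq_len, d_model/2)
    pe = np.empty((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions: sin
    pe[:, 1::2] = np.cos(angles)  # odd dimensions: cos
    return pe
```

At position 0 every angle is 0, so row 0 alternates 0 (sin) and 1 (cos), which is a quick sanity check on the interleaving.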
