Sinusoidal Position Encoding
Why this matters
Self-attention is permutation-equivariant: shuffle the tokens, the output rows shuffle the same way. Without extra signal, the model can’t tell “the cat sat on the mat” from “the mat sat on the cat”. Position encoding fixes this by adding a position-dependent vector to the embeddings before attention.
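The equivariance claim can be checked numerically. The following is a minimal sketch, assuming a single unmasked attention head with random weights; the function name self_attention and all shapes are illustrative choices, not part of the original text.

```python
import jax
import jax.numpy as jnp

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention: no mask, no position signal.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(scores, axis=-1) @ v

kx, kq, kk, kv = jax.random.split(jax.random.PRNGKey(0), 4)
T, D = 5, 8  # illustrative sizes
x = jax.random.normal(kx, (T, D))
wq = jax.random.normal(kq, (D, D))
wk = jax.random.normal(kk, (D, D))
wv = jax.random.normal(kv, (D, D))

perm = jnp.array([3, 0, 4, 1, 2])  # an arbitrary reordering of the 5 tokens
out = self_attention(x, wq, wk, wv)
out_perm = self_attention(x[perm], wq, wk, wv)

# Shuffling the input rows shuffles the output rows the same way.
equivariant = bool(jnp.allclose(out_perm, out[perm], atol=1e-4))
```

If equivariant is True, the head by itself carries no information about token order, which is the gap position encoding is meant to fill.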
Vaswani et al. (2017) introduced sinusoidal position encoding: it is deterministic, parameter-free, and defined for any position, so it can be computed for sequences longer than those seen in training. Modern models often use learned position embeddings or RoPE instead, but the sinusoidal scheme remains the canonical reference.
The formula
For position pos ∈ [0, T) and dimension i ∈ [0, D/2):
PE[pos, 2i] = sin(pos / 10000^(2i/D))
PE[pos, 2i+1] = cos(pos / 10000^(2i/D))
Even-indexed dims get sin, odd-indexed dims get cos. The frequencies
1 / 10000^(2i/D) span a geometric sequence: dim 0 oscillates fast (high
frequency); dim D-1 oscillates slowly (low frequency, near-constant
across small pos).
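To make the geometric sequence concrete, here is a small sketch that computes the per-pair frequencies for D = 8 (an illustrative size chosen so the values come out as powers of ten). The variable names are assumptions for this example only.

```python
import jax.numpy as jnp

D = 8
i = jnp.arange(D // 2)                 # pair index: 0, 1, 2, 3
freq = 1.0 / 10000.0 ** (2 * i / D)    # one frequency per (sin, cos) pair
ratio = freq[1:] / freq[:-1]           # constant ratio => geometric sequence
wavelength = 2 * jnp.pi / freq         # positions per full cycle
```

For D = 8 the frequencies are approximately 1, 0.1, 0.01, 0.001, so each pair oscillates ten times more slowly than the one before it.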
Why sin/cos pairs?
Two key properties:
- Bounded magnitude: |PE[pos, j]| ≤ 1 everywhere, so there are no scale issues.
- Linear shift invariance: PE[pos+k] is a linear function of PE[pos]. Concretely, for each frequency, the (sin, cos) pair rotates by a fixed angle when pos increases by k. This lets the model learn relative-position attention via dot products.
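The rotation property follows from the angle-addition identities and can be verified for a single frequency. This is a sketch under assumed values: the helper pe_pair and the numbers w, pos, k are hypothetical and chosen only for illustration.

```python
import jax.numpy as jnp

def pe_pair(pos, w):
    # The (sin, cos) pair for one frequency w at position pos.
    return jnp.array([jnp.sin(w * pos), jnp.cos(w * pos)])

w, pos, k = 0.01, 7.0, 3.0  # illustrative frequency, position, offset

# This matrix depends on the offset k and frequency w, but not on pos:
#   sin(w(pos+k)) =  cos(wk) sin(w pos) + sin(wk) cos(w pos)
#   cos(w(pos+k)) = -sin(wk) sin(w pos) + cos(wk) cos(w pos)
shift = jnp.array([[jnp.cos(w * k), jnp.sin(w * k)],
                   [-jnp.sin(w * k), jnp.cos(w * k)]])

shifted = shift @ pe_pair(pos, w)   # apply the fixed linear map
direct = pe_pair(pos + k, w)        # evaluate the encoding at pos + k
matches = bool(jnp.allclose(shifted, direct, atol=1e-5))
```

The same shift matrix works for any starting position, which is what makes a fixed offset k expressible as a linear map on the encoding.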
Worked example
import jax.numpy as jnp

T, D = 4, 4
pos = jnp.arange(T)[:, None]                 # (4, 1) positions 0..3
i = jnp.arange(D // 2)[None, :]              # (1, 2) pair indices 0..1
div = 10000.0 ** (2 * i / D)                 # (1, 2): [1, 100]
angles = pos / div                           # (4, 2), broadcast over both axes
PE = jnp.zeros((T, D))
PE = PE.at[:, 0::2].set(jnp.sin(angles))     # even dims: sin
PE = PE.at[:, 1::2].set(jnp.cos(angles))     # odd dims: cos
Row 0 is [sin 0, cos 0, sin 0, cos 0] = [0, 1, 0, 1] — the zero
position is always [0, 1] repeated. Row 1 is
[sin 1, cos 1, sin 0.01, cos 0.01] — the high-freq dims move fast.
Common pitfalls
- Off-by-one on the exponent: it’s 2i / D, not i / D or 2i / D-1.
- Even vs odd interleaving: PE[:, 0::2] = sin, PE[:, 1::2] = cos. Some implementations stack [sin..., cos...] instead; that’s a different (but valid) convention. Stick to interleaving here.
- Integer division on D: 2i/D must be float arithmetic. Cast early or use 2.0 * i / float(D).
- D not even: this formulation requires D % 2 == 0 so the sin/cos pairs cover all dims.
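The two layout conventions mentioned above hold the same values in a different column order. The sketch below builds both for illustrative sizes T = 4, D = 6; the variable names are assumptions made for this example.

```python
import jax.numpy as jnp

T, D = 4, 6  # illustrative sizes
assert D % 2 == 0, "sin/cos pairing needs an even D"

pos = jnp.arange(T)[:, None]
i = jnp.arange(D // 2)[None, :]
angles = pos / 10000.0 ** (2 * i / D)   # (T, D/2), float arithmetic

# Interleaved (the convention used here): sin, cos, sin, cos, ...
interleaved = jnp.stack([jnp.sin(angles), jnp.cos(angles)], axis=-1).reshape(T, D)

# Stacked (the other convention): all sines, then all cosines.
stacked = jnp.concatenate([jnp.sin(angles), jnp.cos(angles)], axis=-1)

# Reordering columns as evens-then-odds turns one layout into the other.
col_order = jnp.concatenate([jnp.arange(0, D, 2), jnp.arange(1, D, 2)])
same_up_to_permutation = bool(jnp.array_equal(interleaved[:, col_order], stacked))
```

Because the layouts differ only by a column permutation, mixing them up produces output of the right shape with the wrong column order, so it is worth checking a row by hand.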
Problem
Implement sinusoidal_pos(seed, T, d_model):
- Cast T, d_model to int. (seed is unused; it is kept for signature consistency.)
- Build the (T, D) matrix following the Vaswani formula above.
- Even-indexed dims 0, 2, 4, ... get sin; odd-indexed dims get cos.
- Return the matrix flattened with .reshape(-1).
Inputs:
- seed: int (unused).
- T: int, the number of positions.
- d_model: int, the embedding dim (must be even).
Output: 1-D array of length T * D.
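One possible way to satisfy this specification is sketched below. It is not the site's reference solution, only a minimal implementation that follows the steps listed above and reuses the approach from the worked example.

```python
import jax.numpy as jnp

def sinusoidal_pos(seed, T, d_model):
    # seed is accepted but unused, as the problem statement specifies.
    T, D = int(T), int(d_model)
    pos = jnp.arange(T)[:, None]                # (T, 1)
    i = jnp.arange(D // 2)[None, :]             # (1, D/2)
    angles = pos / 10000.0 ** (2 * i / D)       # (T, D/2)
    pe = jnp.zeros((T, D))
    pe = pe.at[:, 0::2].set(jnp.sin(angles))    # even dims: sin
    pe = pe.at[:, 1::2].set(jnp.cos(angles))    # odd dims: cos
    return pe.reshape(-1)                       # flatten to length T * D

flat = sinusoidal_pos(0, 4, 4)
```

With T = 4 and d_model = 4, the first four entries of flat are row 0 of the worked example, [0, 1, 0, 1].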