Bidirectional RNN

Implement a bidirectional vanilla RNN that runs two passes over a sequence — one forward, one backward — and concatenates their hidden states at every timestep.

Why bidirectional?

A standard (unidirectional) RNN can only see past context: at timestep t the hidden state summarises x_0, …, x_t. A bidirectional RNN addresses this by running a second RNN from t = T-1 down to t = 0, whose hidden state at timestep t summarises x_t, …, x_{T-1}. Concatenating the two hidden states gives each output position access to both past and future context, which matters for tasks like named-entity recognition, part-of-speech tagging, and the encoder side of classic seq2seq models.

Cell rule (vanilla RNN)

$$h_t = \tanh\!\left([x_t;\, h_{t-1}]\, W\right)$$

where $[x_t; h_{t-1}]$ is the concatenation of $x_t \in \mathbb{R}^{d_{in}}$ and $h_{t-1} \in \mathbb{R}^{d_h}$, giving a vector of length $d_{in} + d_h$, and $W \in \mathbb{R}^{(d_{in}+d_h) \times d_h}$.

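The same rule applies in both directions, each with its own weight matrix (w_fwd and w_bwd). In the backward direction the previous state is the one computed at the following timestep, so $h_{t-1}$ in the rule above is replaced by $h_{t+1}$.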
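The cell rule can be sketched as a single NumPy function. This is a minimal illustration, not a reference solution: the name rnn_step, the use of NumPy, and the sizes in the usage lines are assumptions made for the example.

```python
import numpy as np

def rnn_step(x_t, h_prev, w):
    """One vanilla RNN step: h_t = tanh([x_t; h_prev] @ w).

    rnn_step is a hypothetical helper name used only for this sketch.
    x_t: (N, d_in), h_prev: (N, d_h), w: (d_in + d_h, d_h).
    """
    # Concatenate input and previous state along the feature axis.
    xh = np.concatenate([x_t, h_prev], axis=-1)  # (N, d_in + d_h)
    return np.tanh(xh @ w)                       # (N, d_h)

# Illustrative sizes (assumed, not from the problem statement).
rng = np.random.default_rng(0)
N, d_in, d_h = 2, 3, 4
x_t = rng.standard_normal((N, d_in))
h_prev = np.zeros((N, d_h))
w = rng.standard_normal((d_in + d_h, d_h))

h_t = rnn_step(x_t, h_prev, w)
print(h_t.shape)
```

Because the weights are stacked as one matrix, the first d_in rows of w act on x_t and the remaining d_h rows act on the previous hidden state.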

Algorithm

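Forward pass: initialise h_fwd to h0_fwd, then iterate $t = 0, 1, \ldots, T-1$, updating h_fwd with the cell rule and w_fwd. Store each h_fwd_t.

Backward pass: initialise h_bwd to h0_bwd, then iterate $t = T-1, T-2, \ldots, 0$, updating h_bwd with the cell rule and w_bwd. Store each h_bwd_t at index t, not in the order it was computed.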

Output: for each timestep $t$, concatenate [h_fwd_t, h_bwd_t] along the last dimension. Return a tensor of shape (N, T, 2 * d_h).
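The three steps above can be sketched in NumPy as follows. This is one possible implementation under stated assumptions, not the expected solution: the function name bidirectional_rnn, the choice of NumPy over another tensor library, and the sizes in the usage lines are all illustrative.

```python
import numpy as np

def bidirectional_rnn(x, w_fwd, w_bwd, h0_fwd, h0_bwd):
    """Sketch of a bidirectional vanilla RNN (function name is assumed).

    x: (N, T, d_in); w_fwd, w_bwd: (d_in + d_h, d_h);
    h0_fwd, h0_bwd: (N, d_h). Returns an array of shape (N, T, 2 * d_h).
    """
    N, T, _ = x.shape
    d_h = h0_fwd.shape[1]
    out = np.empty((N, T, 2 * d_h))

    # Forward pass: t = 0 .. T-1, writing into the first d_h channels.
    h = h0_fwd
    for t in range(T):
        h = np.tanh(np.concatenate([x[:, t], h], axis=-1) @ w_fwd)
        out[:, t, :d_h] = h

    # Backward pass: t = T-1 .. 0, writing into the last d_h channels.
    # Each state is stored at its own timestep index t.
    h = h0_bwd
    for t in range(T - 1, -1, -1):
        h = np.tanh(np.concatenate([x[:, t], h], axis=-1) @ w_bwd)
        out[:, t, d_h:] = h

    return out

# Illustrative sizes (assumed for the example).
rng = np.random.default_rng(0)
N, T, d_in, d_h = 2, 5, 3, 4
x = rng.standard_normal((N, T, d_in))
w_fwd = rng.standard_normal((d_in + d_h, d_h))
w_bwd = rng.standard_normal((d_in + d_h, d_h))
h0_fwd = np.zeros((N, d_h))
h0_bwd = np.zeros((N, d_h))

out = bidirectional_rnn(x, w_fwd, w_bwd, h0_fwd, h0_bwd)
print(out.shape)
```

Writing both directions into a preallocated output avoids a separate concatenation step. A useful sanity check is that the backward half equals a forward RNN run on the time-reversed sequence, with the result reversed back.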

Inputs

x: shape (N, T, d_in) — batch of sequences.
w_fwd: shape (d_in + d_h, d_h) — forward direction weights.
w_bwd: shape (d_in + d_h, d_h) — backward direction weights.
h0_fwd: shape (N, d_h) — initial hidden state for forward pass.
h0_bwd: shape (N, d_h) — initial hidden state for backward pass.

Output: shape (N, T, 2 * d_h).

No bias term is used. This simplifies the interface; a bias can always be folded into the weight matrix by appending a constant feature of 1 to the input.
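The bias remark can be checked numerically. The sketch below assumes a bias vector b of shape (d_h,) and shows that appending a constant 1 to x_t, and inserting b as the matching row of the weight matrix, reproduces the biased update. The names b, x_aug, and w_aug are illustrative and not part of the interface.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d_in, d_h = 2, 3, 4  # illustrative sizes (assumed)
x_t = rng.standard_normal((N, d_in))
h_prev = rng.standard_normal((N, d_h))
w = rng.standard_normal((d_in + d_h, d_h))
b = rng.standard_normal(d_h)  # hypothetical bias, not in the interface

# Explicit bias: tanh([x_t; h_prev] @ w + b).
with_bias = np.tanh(np.concatenate([x_t, h_prev], axis=-1) @ w + b)

# Folded bias: append a constant-1 feature to x_t (d_in grows by one)
# and insert b as the weight row that multiplies that feature.
x_aug = np.concatenate([x_t, np.ones((N, 1))], axis=-1)           # (N, d_in + 1)
w_aug = np.concatenate([w[:d_in], b[None, :], w[d_in:]], axis=0)  # (d_in + 1 + d_h, d_h)
folded = np.tanh(np.concatenate([x_aug, h_prev], axis=-1) @ w_aug)

print(np.allclose(with_bias, folded))
```

The bias row sits between the input rows and the hidden-state rows because the constant feature is appended to x_t, which comes first in the concatenation.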

Hints