
Bidirectional RNN

Implement a bidirectional vanilla RNN that runs two passes over a sequence, one forward and one backward, and concatenates their hidden states at every timestep.

Why bidirectional?

A standard (unidirectional) RNN sees only past context: at timestep $t$ the hidden state summarises $x_0, \ldots, x_t$. A bidirectional RNN adds a second RNN that runs from $t = T-1$ down to $t = 0$, so its hidden state at timestep $t$ summarises $x_t, \ldots, x_{T-1}$. Concatenating the two hidden states gives each output position access to both past and future context, which matters for tasks like named-entity recognition, part-of-speech tagging, and the encoder side of classic seq2seq models.

Cell rule (vanilla RNN)

$$h_t = \tanh\!\left([x_t;\, h_{t-1}]\, W\right)$$

where $[x_t; h_{t-1}]$ is the concatenation of $x_t \in \mathbb{R}^{d_{in}}$ and $h_{t-1} \in \mathbb{R}^{d_h}$, giving a vector of length $d_{in} + d_h$, and $W \in \mathbb{R}^{(d_{in}+d_h) \times d_h}$.

The same rule applies for both directions, each with its own weight matrix (w_fwd and w_bwd).
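As a minimal sketch of the cell rule, assuming NumPy arrays with a leading batch dimension, a single step could look like the following. The name rnn_step and the toy sizes are illustrative and not part of the problem's interface. The second half shows that the concatenated form is the same as using separate input and recurrent weight blocks, which can help when checking which rows of $W$ act on $x_t$ and which on $h_{t-1}$.

```python
import numpy as np

def rnn_step(x_t, h_prev, w):
    """One vanilla RNN step: h_t = tanh([x_t; h_prev] @ w).

    x_t:    (N, d_in)
    h_prev: (N, d_h)
    w:      (d_in + d_h, d_h)
    """
    xh = np.concatenate([x_t, h_prev], axis=-1)  # (N, d_in + d_h)
    return np.tanh(xh @ w)                       # (N, d_h)

# Toy sizes, chosen only for illustration.
N, d_in, d_h = 2, 3, 4
rng = np.random.default_rng(0)
x_t = rng.standard_normal((N, d_in))
h_prev = rng.standard_normal((N, d_h))
w = rng.standard_normal((d_in + d_h, d_h))

h_t = rnn_step(x_t, h_prev, w)

# Equivalent split form: the first d_in rows of w multiply x_t,
# the remaining d_h rows multiply h_prev.
w_x, w_h = w[:d_in], w[d_in:]
h_split = np.tanh(x_t @ w_x + h_prev @ w_h)
```

Because the input comes first in the concatenation, the row order of the weight matrix is fixed: swapping the order of x_t and h_prev would silently pair each block of rows with the wrong vector.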

Algorithm

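Forward pass: start from h0_fwd and iterate $t = 0, 1, \ldots, T-1$, updating h_fwd with w_fwd. Store each h_fwd_t.

Backward pass: start from h0_bwd and iterate $t = T-1, T-2, \ldots, 0$, updating h_bwd with w_bwd. Store each h_bwd_t at index $t$, so that both passes stay aligned to the original time order.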

Output: for each timestep $t$, concatenate [h_fwd_t, h_bwd_t] along the last dimension. Return a tensor of shape (N, T, 2 * d_h).
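The three steps above can be sketched in NumPy as follows. This is one possible reading of the specification, not the reference solution: the function name bidirectional_rnn, the float64 output buffer, and the toy sizes in the usage lines are assumptions made for illustration.

```python
import numpy as np

def bidirectional_rnn(x, w_fwd, w_bwd, h0_fwd, h0_bwd):
    """Run a forward and a backward vanilla RNN and concatenate their states.

    x:       (N, T, d_in)
    w_fwd:   (d_in + d_h, d_h)
    w_bwd:   (d_in + d_h, d_h)
    h0_fwd:  (N, d_h)
    h0_bwd:  (N, d_h)
    returns: (N, T, 2 * d_h)
    """
    N, T, _ = x.shape
    d_h = h0_fwd.shape[1]
    # Output buffer; float64 is an assumption of this sketch.
    out = np.empty((N, T, 2 * d_h), dtype=float)

    # Forward pass: t = 0, 1, ..., T-1. Fills the first d_h channels.
    h = h0_fwd
    for t in range(T):
        h = np.tanh(np.concatenate([x[:, t], h], axis=1) @ w_fwd)
        out[:, t, :d_h] = h

    # Backward pass: t = T-1, ..., 0. Each state is written at index t,
    # so it lines up with the forward state for the same timestep.
    h = h0_bwd
    for t in range(T - 1, -1, -1):
        h = np.tanh(np.concatenate([x[:, t], h], axis=1) @ w_bwd)
        out[:, t, d_h:] = h

    return out

# Usage with toy sizes.
N, T, d_in, d_h = 2, 5, 3, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((N, T, d_in))
w_fwd = rng.standard_normal((d_in + d_h, d_h))
w_bwd = rng.standard_normal((d_in + d_h, d_h))
h0_fwd = np.zeros((N, d_h))
h0_bwd = np.zeros((N, d_h))

out = bidirectional_rnn(x, w_fwd, w_bwd, h0_fwd, h0_bwd)
print(out.shape)  # (2, 5, 8)
```

Writing each backward state directly at index t avoids a separate reversal step. An alternative is to reverse the sequence along the time axis, run the same forward loop with w_bwd, and reverse the result again; both give the same output.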

Inputs

  • x: shape (N, T, d_in), a batch of sequences.
  • w_fwd: shape (d_in + d_h, d_h), forward direction weights.
  • w_bwd: shape (d_in + d_h, d_h), backward direction weights.
  • h0_fwd: shape (N, d_h), initial hidden state for the forward pass.
  • h0_bwd: shape (N, d_h), initial hidden state for the backward pass.

Output: shape (N, T, 2 * d_h).
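Since every shape above is tied to the same four sizes, a small validation step can catch mismatches before the recurrence runs. The helper below is hypothetical (check_shapes is not part of the problem's interface) and assumes NumPy arrays. It derives N, T and d_in from x and d_h from h0_fwd, then checks the remaining arguments against them.

```python
import numpy as np

def check_shapes(x, w_fwd, w_bwd, h0_fwd, h0_bwd):
    """Hypothetical helper: validate input shapes, return (N, T, d_in, d_h)."""
    if x.ndim != 3:
        raise ValueError(f"x: expected shape (N, T, d_in), got {x.shape}")
    if h0_fwd.ndim != 2:
        raise ValueError(f"h0_fwd: expected shape (N, d_h), got {h0_fwd.shape}")

    N, T, d_in = x.shape
    d_h = h0_fwd.shape[1]

    # Each entry pairs an array with the shape the specification requires.
    expected = {
        "w_fwd": (w_fwd, (d_in + d_h, d_h)),
        "w_bwd": (w_bwd, (d_in + d_h, d_h)),
        "h0_fwd": (h0_fwd, (N, d_h)),
        "h0_bwd": (h0_bwd, (N, d_h)),
    }
    for name, (arr, shape) in expected.items():
        if arr.shape != shape:
            raise ValueError(f"{name}: expected shape {shape}, got {arr.shape}")

    return N, T, d_in, d_h

# Usage with toy sizes: N=2, T=5, d_in=3, d_h=4, so the weights are (7, 4).
dims = check_shapes(
    np.zeros((2, 5, 3)),
    np.zeros((7, 4)),
    np.zeros((7, 4)),
    np.zeros((2, 4)),
    np.zeros((2, 4)),
)
print(dims)  # (2, 5, 3, 4)
```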

No bias term is used. This simplifies the interface; a bias can always be baked into the weight matrix by appending a constant feature to the input.
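To illustrate the constant-feature trick, the sketch below compares a step with an explicit bias against a bias-free step on an augmented input. The bias vector b, the augmented names x_aug and w_aug, and the toy sizes are assumptions for this example only. Because $x_t$ comes first in the concatenation, the bias row is inserted after the first d_in rows of the weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d_in, d_h = 2, 3, 4
x_t = rng.standard_normal((N, d_in))
h_prev = rng.standard_normal((N, d_h))
w = rng.standard_normal((d_in + d_h, d_h))
b = rng.standard_normal(d_h)  # hypothetical bias vector

# Step with an explicit bias term.
h_bias = np.tanh(np.concatenate([x_t, h_prev], axis=1) @ w + b)

# Bias-free step on an augmented input: append a constant 1 to x_t ...
x_aug = np.concatenate([x_t, np.ones((N, 1))], axis=1)  # (N, d_in + 1)

# ... and insert b as an extra row of w, right after the d_in input rows.
w_aug = np.concatenate([w[:d_in], b[None, :], w[d_in:]], axis=0)  # (d_in + 1 + d_h, d_h)

h_aug = np.tanh(np.concatenate([x_aug, h_prev], axis=1) @ w_aug)

print(np.allclose(h_bias, h_aug))  # True
```

With this augmentation the input size becomes d_in + 1, so the weight shapes listed above still hold with d_in replaced by d_in + 1.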

