medium end_to_end

LSTM Cell

Implement one step of an LSTM cell and process a sequence.

For each time step t, compute the pre-activations of all four gates at once: $$\text{gates} = x_t \cdot W_x + h_{t-1} \cdot W_h + b$$

Split gates into 4 equal parts, each of size H = hidden_dim:

  • $f = \sigma(\text{gates}[0:H])$ — forget gate
  • $i = \sigma(\text{gates}[H:2H])$ — input gate
  • $g = \tanh(\text{gates}[2H:3H])$ — candidate
  • $o = \sigma(\text{gates}[3H:4H])$ — output gate

Then: $$c_t = f \odot c_{t-1} + i \odot g$$ $$h_t = o \odot \tanh(c_t)$$
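
As a quick sanity check of these update rules, suppose every entry of gates is zero (for example, when all weights and biases are zero). Since $\sigma(0) = 0.5$ and $\tanh(0) = 0$, the gates become $f = i = o = 0.5$ and $g = 0$, so each step halves the cell state:

$$c_t = 0.5 \, c_{t-1} \qquad h_t = 0.5 \tanh(0.5 \, c_{t-1})$$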

Input:

  • x: shape (seq_len, input_dim)
  • W_x: shape (input_dim, 4*hidden_dim)
  • W_h: shape (hidden_dim, 4*hidden_dim)
  • b: shape (4*hidden_dim,)
  • h0: shape (hidden_dim,), c0: shape (hidden_dim,)

Output: The final hidden state h_T, where T = seq_len, of shape (hidden_dim,).
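
The specification above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: it assumes the inputs are plain nested Python lists rather than arrays, and the names `sigmoid`, `lstm_step`, and `lstm_forward` are hypothetical helpers chosen for this example. With an array library, the explicit sums would typically become matrix products.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function for a scalar.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    """One LSTM step. x_t: (input_dim,), h_prev and c_prev: (hidden_dim,)."""
    H = len(h_prev)
    # gates = x_t . W_x + h_prev . W_h + b, a vector of length 4*H
    gates = [
        sum(x_t[k] * W_x[k][j] for k in range(len(x_t)))
        + sum(h_prev[k] * W_h[k][j] for k in range(H))
        + b[j]
        for j in range(4 * H)
    ]
    f = [sigmoid(z) for z in gates[0:H]]            # forget gate
    i = [sigmoid(z) for z in gates[H:2 * H]]        # input gate
    g = [math.tanh(z) for z in gates[2 * H:3 * H]]  # candidate
    o = [sigmoid(z) for z in gates[3 * H:4 * H]]    # output gate
    # Element-wise state updates.
    c_t = [f[j] * c_prev[j] + i[j] * g[j] for j in range(H)]
    h_t = [o[j] * math.tanh(c_t[j]) for j in range(H)]
    return h_t, c_t


def lstm_forward(x, W_x, W_h, b, h0, c0):
    """Run the cell over x of shape (seq_len, input_dim); return h_T."""
    h, c = list(h0), list(c0)
    for x_t in x:
        h, c = lstm_step(x_t, h, c, W_x, W_h, b)
    return h
```

Under these assumptions, an empty sequence simply returns h0, since the loop body never runs.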

lstm recurrent gates sequence-model