← back

Implement Long Short-Term Memory (LSTM) Network

#59 · Deep Learning · Medium

⊣ Solve on deep-ml.com

Problem

Implement a Long Short-Term Memory (LSTM) network from scratch. Given input data, initial hidden and cell states, and weight/bias parameters for the forget, input, candidate, and output gates, compute the forward pass of a single LSTM cell for each time step and return the final hidden state.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_forward(input_seq, initial_hidden, initial_cell,
                 Wf, Uf, bf,
                 Wi, Ui, bi,
                 Wc, Uc, bc,
                 Wo, Uo, bo):
    h = initial_hidden
    c = initial_cell

    for x in input_seq:
        f = sigmoid(np.dot(Wf, x) + np.dot(Uf, h) + bf)
        i = sigmoid(np.dot(Wi, x) + np.dot(Ui, h) + bi)
        c_hat = np.tanh(np.dot(Wc, x) + np.dot(Uc, h) + bc)
        o = sigmoid(np.dot(Wo, x) + np.dot(Uo, h) + bo)

        c = f * c + i * c_hat
        h = o * np.tanh(c)

    return h

Explanation

  1. Initialize hidden state h and cell state c from provided initial values.
  2. For each time step in the input sequence, compute the four gates:
  3. - Forget gate f: decides what to discard from the cell state.
  4. - Input gate i: decides which new values to store.
  5. - Candidate c_hat: new candidate values via tanh.
  6. - Output gate o: decides what part of the cell state to output.
  7. Update the cell state: c = f * c + i * c_hat.
  8. Compute the new hidden state: h = o * tanh(c).
  9. Return the final hidden state after processing all time steps.

Complexity

  • Time: O(T * n^2) where T is sequence length and n is hidden size (due to matrix multiplications)
  • Space: O(n) for storing hidden and cell states