Understanding Long Short-Term Memory (LSTM) Networks
Recurrent Neural Networks (RNNs) are powerful, but they suffer from short-term memory: as sequences grow longer, gradients vanish and early information is lost. LSTMs were designed to solve this via a unique internal mechanism called gates.
In this article, we'll demystify the mathematics behind LSTMs and take a "first principles" approach to understanding how they process sequences.
The Architecture
Unlike standard feedforward neural networks, LSTMs have feedback connections. They can process not only single data points (such as images), but also entire sequences of data (such as speech or video).
1. The Forget Gate
The first step in our LSTM is to decide what information we're going to throw away from the cell state. This decision is made by a sigmoid layer called the "forget gate layer".
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(x, h_prev, W_f, b_f):
    # Concatenate previous hidden state and current input
    concat = np.concatenate((h_prev, x))
    # Sigmoid squashes each element to (0, 1): 0 means "forget", 1 means "keep"
    f_t = sigmoid(np.dot(W_f, concat) + b_f)
    return f_t
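As a quick sanity check, we can run the forget gate on toy dimensions (the sizes and random weights here are illustrative, not trained values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(x, h_prev, W_f, b_f):
    concat = np.concatenate((h_prev, x))
    return sigmoid(np.dot(W_f, concat) + b_f)

rng = np.random.default_rng(0)
hidden, inputs = 4, 3  # illustrative sizes
W_f = rng.standard_normal((hidden, hidden + inputs))
b_f = np.zeros(hidden)
f_t = forget_gate(rng.standard_normal(inputs),
                  rng.standard_normal(hidden), W_f, b_f)
print(f_t.shape)  # one gate value per cell-state element
```

Note that every entry of `f_t` lies strictly between 0 and 1, which is what lets the gate scale each cell-state element independently.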
2. The Input Gate
The next step is to decide what new information we're going to store in the cell state. This has two parts. First, a sigmoid layer called the "input gate layer" decides which values we'll update. Second, a tanh layer creates a vector of new candidate values that could be added to the state.
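Following the same pattern as the forget gate, both parts can be sketched as below (the function and variable names are my own choices, not a standard API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gate(x, h_prev, W_i, b_i, W_c, b_c):
    concat = np.concatenate((h_prev, x))
    # Sigmoid layer: which positions of the cell state to update (0 to 1)
    i_t = sigmoid(np.dot(W_i, concat) + b_i)
    # Tanh layer: candidate values that could be written to the state (-1 to 1)
    c_tilde = np.tanh(np.dot(W_c, concat) + b_c)
    return i_t, c_tilde

# Illustrative sizes and untrained random weights
rng = np.random.default_rng(1)
hidden, inputs = 4, 3
W_i = rng.standard_normal((hidden, hidden + inputs))
W_c = rng.standard_normal((hidden, hidden + inputs))
b_i = np.zeros(hidden)
b_c = np.zeros(hidden)
i_t, c_tilde = input_gate(rng.standard_normal(inputs),
                          rng.standard_normal(hidden),
                          W_i, b_i, W_c, b_c)
```

The elementwise product `i_t * c_tilde` is what gets added to the (forget-gated) cell state in the full update.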
Conclusion
LSTMs are a significant step forward in what we can achieve with RNNs. While Transformers have taken over NLP, LSTMs remain useful for time-series analysis and other sequential tasks.