Long Short-Term Memory (LSTM) Networks using PyTorch

Last Updated : 9 Oct, 2025

Long Short-Term Memory (LSTM) networks are a special type of Recurrent Neural Network (RNN) designed to address the vanishing gradient problem, which makes it difficult for traditional RNNs to learn long-term dependencies in sequential data.

LSTMs use memory cells controlled by three gates:

Input Gate: decides what new information should be stored.
Forget Gate: decides what information should be discarded.
Output Gate: decides what information to output at each step.
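
The three gates above can be written out explicitly. The block below is the standard LSTM formulation; the notation is an assumption chosen to match the PyTorch nn.LSTM documentation and is not taken from this article. Here x_t is the input at step t, h_{t-1} the previous hidden state, c_t the cell state, \sigma the sigmoid function and \odot the element-wise product.

```latex
\begin{aligned}
i_t &= \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) && \text{(input gate)} \\
f_t &= \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) && \text{(forget gate)} \\
g_t &= \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) && \text{(candidate values)} \\
o_t &= \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) && \text{(output gate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The forget gate scales the old cell state, the input gate scales the new candidate values, and the output gate decides how much of the updated cell state is exposed as the hidden state.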

This structure allows LSTMs to remember useful information for long periods while ignoring irrelevant details. In this article, we will learn how to implement an LSTM in PyTorch for sequence prediction on synthetic sine wave data.

Long Short-Term Memory (LSTM) Networks using PyTorch

LSTMs are widely used for sequence modeling tasks because of their ability to capture long-term dependencies. PyTorch provides a clean and flexible API to build and train LSTM models. In PyTorch, the nn.LSTM module handles the recurrence logic, while the rest of the architecture (such as fully connected layers, dropout, etc.) can be customized as needed.

Key Components

1. Input Size: Number of features in the input sequence at each time step.

2. Hidden Size: Number of features in the hidden state.

3. Number of Layers: Number of stacked LSTM layers. Each additional layer takes the previous layer's hidden states as its input, which deepens the model.

4. Batch First: If set to True, input/output tensors are provided as (batch, seq_len, features) instead of (seq_len, batch, features).

5. Outputs:

Output Sequence: Hidden states of the last LSTM layer at each time step.
Hidden State: Final hidden state of each layer.
Cell State: Final memory cell state of each layer.
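
The components above map directly onto the arguments and return values of nn.LSTM. The following is a minimal sketch for checking the tensor shapes; the sizes (batch of 4, 10 time steps, 16 hidden units, 2 layers) are illustrative assumptions and not values used elsewhere in this article.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; none of these values come from the article.
batch_size, seq_len, input_size = 4, 10, 1
hidden_size, num_layers = 16, 2

lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
               num_layers=num_layers, batch_first=True)

# With batch_first=True the input is (batch, seq_len, features).
x = torch.randn(batch_size, seq_len, input_size)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # (batch, seq_len, hidden_size): last layer, every time step
print(h_n.shape)     # (num_layers, batch, hidden_size): final hidden state per layer
print(c_n.shape)     # (num_layers, batch, hidden_size): final cell state per layer
```

For a unidirectional LSTM, the last time step of the output sequence is the same tensor as the final hidden state of the top layer, which is why many models feed output[:, -1, :] into a prediction layer.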

Implementation

Let's implement an LSTM network using PyTorch.

Step 1: Import Libraries and Prepare Data

We first import the required libraries (torch, numpy and matplotlib) and create a sine wave dataset. The series is split into overlapping input sequences of length 10, and for each sequence the target is the value that immediately follows it.

np.linspace(): generates evenly spaced points.
np.sin(): creates sine values.
create_sequences(): prepares input-output pairs.
torch.tensor(): converts NumPy arrays into PyTorch tensors.
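
The data preparation described above could look like the sketch below. The function name create_sequences and the sequence length of 10 come from the text. The range of the sine wave (1000 points on [0, 100]) and the tensor names trainX and trainY are assumptions made for illustration.

```python
import numpy as np
import torch

# 1000 evenly spaced points on [0, 100]; range and count are assumed values.
t = np.linspace(0, 100, 1000)
data = np.sin(t)

def create_sequences(data, seq_length):
    """Slide a window over `data`: each window is an input, the next value its target."""
    xs, ys = [], []
    for i in range(len(data) - seq_length):
        xs.append(data[i:i + seq_length])
        ys.append(data[i + seq_length])
    return np.array(xs), np.array(ys)

seq_length = 10
X, y = create_sequences(data, seq_length)

# Add a trailing feature dimension: inputs become (batch, seq_len, 1), targets (batch, 1).
trainX = torch.tensor(X[:, :, None], dtype=torch.float32)
trainY = torch.tensor(y[:, None], dtype=torch.float32)

print(trainX.shape, trainY.shape)
```

With 1000 points and a window of 10, this yields 990 input-output pairs.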

Step 2: Define the LSTM Model

We define an LSTM model using PyTorch’s nn.Module.

nn.LSTM: processes sequential data.
nn.Linear: maps hidden state outputs to predictions.
forward(): runs the data through LSTM + Fully Connected layer.
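
One possible way to write such a model is sketched below. The class name LSTMModel, the argument names and the optional state argument are assumptions for illustration. Only nn.LSTM, nn.Linear and the forward() structure are taken from the description above.

```python
import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super().__init__()
        # batch_first=True means inputs are (batch, seq_len, features).
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x, state=None):
        # `state` is an optional (h0, c0) tuple; nn.LSTM starts from zeros when it is None.
        out, (hn, cn) = self.lstm(x, state)
        # Use only the last time step's hidden state to predict the next value.
        pred = self.fc(out[:, -1, :])
        return pred, (hn, cn)

# Quick shape check on random data (batch of 8 sequences, 10 steps, 1 feature).
model = LSTMModel(input_dim=1, hidden_dim=100, layer_dim=1, output_dim=1)
dummy = torch.randn(8, 10, 1)
pred, (hn, cn) = model(dummy)
print(pred.shape, hn.shape, cn.shape)
```

Returning the final state alongside the prediction lets the caller decide whether to carry it into the next call or discard it.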

Step 3: Initialize Model, Loss Function, and Optimizer

Model: 1 input feature, 100 hidden units, 1 LSTM layer, 1 output value.
Loss Function: Mean Squared Error (MSE) for regression.
Optimizer: Adam optimizer for efficient training.
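
The setup above might be written as follows. This is a sketch: the model class is a compact stand-in repeated here so the snippet runs on its own, and the learning rate of 0.01 is an assumed value that the text does not specify.

```python
import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    """Compact stand-in for the model from Step 2 (names are illustrative)."""
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x, state=None):
        out, state = self.lstm(x, state)
        return self.fc(out[:, -1, :]), state

torch.manual_seed(0)  # reproducible weight initialization

model = LSTMModel(input_dim=1, hidden_dim=100, layer_dim=1, output_dim=1)
criterion = nn.MSELoss()                                   # regression loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # lr is an assumed value

# Count trainable parameters as a sanity check on the configuration.
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(n_params)
```

Counting parameters is a cheap way to confirm that the sizes passed to the constructor are the ones intended.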

Step 4: Train the LSTM Model

We train the model for 100 epochs.

Forward pass: model makes predictions.
Loss calculation: compare predicted vs. actual values.
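Backpropagation and optimizer step: compute gradients of the loss, then update the weights.
Detach hidden states: when hidden states are carried over from one iteration to the next, detach them so gradients do not flow back through earlier iterations.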
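
The four steps above can be sketched as a single full-batch training loop. The data and model are rebuilt here so the snippet runs on its own. The learning rate, the random seed, and the choice to carry the LSTM state from one epoch to the next (which is what makes the detach step necessary) are assumptions of this sketch.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

# Sine-wave windows of length 10, as in Step 1 (range and count are assumed values).
data = np.sin(np.linspace(0, 100, 1000))
seq_length = 10
X = np.array([data[i:i + seq_length] for i in range(len(data) - seq_length)])
y = data[seq_length:]
trainX = torch.tensor(X[:, :, None], dtype=torch.float32)
trainY = torch.tensor(y[:, None], dtype=torch.float32)

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x, state=None):
        out, state = self.lstm(x, state)
        return self.fc(out[:, -1, :]), state

model = LSTMModel(1, 100, 1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # assumed learning rate

num_epochs = 100
state = None   # nn.LSTM starts from a zero state when none is given
losses = []

for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()

    outputs, state = model(trainX, state)   # forward pass
    loss = criterion(outputs, trainY)       # loss calculation
    loss.backward()                         # backpropagation: compute gradients
    optimizer.step()                        # update weights

    # Detach the carried state so the next backward() does not reach into this graph.
    state = tuple(s.detach() for s in state)

    losses.append(loss.item())
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}")
```

Without the detach step, the second call to loss.backward() would try to backpropagate through the first epoch's graph, which PyTorch has already freed. If the state is not carried between iterations, no detach is needed.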

Output: [Figure: Training]


Step 5: Evaluate and Plot Predictions

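We switch the model to evaluation mode with model.eval(), compute predictions on the input sequences without tracking gradients, and plot them against the original sine wave.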
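
A self-contained sketch of the evaluation step follows. It repeats a short version of the data preparation and training so that it runs on its own. The learning rate, the carried state, the Agg backend (chosen so the script also runs without a display) and the output file name lstm_predictions.png are all assumptions of this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to use an interactive window
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

t = np.linspace(0, 100, 1000)   # assumed range and number of points
data = np.sin(t)
seq_length = 10
X = np.array([data[i:i + seq_length] for i in range(len(data) - seq_length)])
trainX = torch.tensor(X[:, :, None], dtype=torch.float32)
trainY = torch.tensor(data[seq_length:, None], dtype=torch.float32)

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x, state=None):
        out, state = self.lstm(x, state)
        return self.fc(out[:, -1, :]), state

model = LSTMModel(1, 100, 1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # assumed learning rate

# Short training run, same structure as Step 4.
state, first_loss = None, None
for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    outputs, state = model(trainX, state)
    loss = criterion(outputs, trainY)
    loss.backward()
    optimizer.step()
    state = tuple(s.detach() for s in state)
    if first_loss is None:
        first_loss = loss.item()

# Evaluation: no dropout/batch-norm updates and no gradient tracking.
model.eval()
with torch.no_grad():
    predicted, _ = model(trainX, state)
mse = criterion(predicted, trainY).item()
print(f"MSE on the training sequences: {mse:.6f}")

# Predictions start at index seq_length because the first window needs 10 past values.
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(t, data, label="Original data")
ax.plot(t[seq_length:], predicted.squeeze(1).numpy(), linestyle="--", label="Predicted")
ax.set_xlabel("t")
ax.set_ylabel("sin(t)")
ax.legend()
fig.savefig("lstm_predictions.png")
```

Note that this evaluates on the same sequences the model was trained on, so it shows how well the model fits the data and not how well it generalizes. Holding out the last part of the series would give a fairer picture.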

Output: [Figure: Plot]


Applications

Natural Language Processing (NLP): Machine translation, text generation, sentiment analysis, and speech-to-text.
Time-Series Forecasting: Stock price prediction, weather forecasting, energy demand forecasting.
Healthcare: Patient monitoring (heart rate, ECG), disease progression modeling, medical event prediction.
Finance: Credit risk analysis, fraud detection, algorithmic trading.
Speech & Audio Processing: Speech recognition, voice assistants, music generation.
Anomaly Detection: Detecting unusual patterns in IoT sensors, cybersecurity logs, or industrial equipment.

Advantages

Easy Debugging: Dynamic computation graphs allow native Python debugging.
Flexible Architecture: Works well with varying input lengths.
Balanced API: Provides both high- and low-level control.
Strong Backing: Maintained by Meta with frequent updates.
Active Community: Large ecosystem of tutorials and examples.
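
As one illustration of the "varying input lengths" point above, PyTorch ships utilities for padding and packing sequences of different lengths so that nn.LSTM skips the padded steps. The sketch below uses made-up data and arbitrary sizes, and none of it comes from the tutorial's sine wave example.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Three sequences of different lengths, one feature per step (made-up data).
seqs = [torch.randn(5, 1), torch.randn(3, 1), torch.randn(2, 1)]
lengths = torch.tensor([len(s) for s in seqs])

# Zero-pad to a common length, then pack so the LSTM ignores the padding.
padded = pad_sequence(seqs, batch_first=True)   # shape (3, 5, 1)
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)

lstm = nn.LSTM(input_size=1, hidden_size=8, batch_first=True)
packed_out, (h_n, c_n) = lstm(packed)

# Unpack back to a padded tensor plus the original lengths.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape)
print(out_lengths)
```

With packing, h_n holds the hidden state at each sequence's true last step, not at the padded end.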

Limitations

Less Mature than TensorFlow: Fewer built-in enterprise-level deployment tools.
Fewer Advanced Resources: Limited advanced tutorials for LSTMs.
Manual Optimization: Hyperparameters such as hidden size, number of layers and learning rate must be tuned manually for best performance.
Version Gaps: API changes between releases may break older code.

URL: https://www.geeksforgeeks.org/deep-learning/long-short-term-memory-networks-using-pytorch/
