seq2seq Model

Last Updated : 8 Apr, 2026

Sequence-to-Sequence (Seq2Seq) models are neural networks built on an encoder-decoder architecture that transform one sequence into another, even when the input and output lengths differ.

Processes an input sequence and generates a corresponding output sequence.
Handles variable-length input and output sequences.
Applies to NLP tasks such as machine translation, as well as speech recognition and time-series prediction.

Figure: Seq2Seq model

Encoder and Decoder Stack in seq2seq model

Both the input and the output are treated as sequences of varying lengths and the model is composed of two parts:

1. Encoder:

Processes the input sequence token by token.
Encodes the entire sequence into a fixed-length context vector (or a series of hidden states) that summarizes the important information from the input.

2. Decoder:

Takes the context vector as input.
Generates the output sequence one token at a time, predicting each token based on the context vector and previously generated tokens.

The model is commonly used in tasks that map between sequences of varying lengths, such as translating a sentence from one language to another or predicting a sequence of future events from past data (time-series forecasting).

Seq2Seq with RNNs

In the simplest Seq2Seq model, RNNs are used in both the encoder and the decoder to process sequential data. For a given input sequence $(x_1, \dots, x_T)$, an RNN generates a sequence of outputs $(y_1, \dots, y_T)$ through iterative computation based on the following equations:

$$h_t = \sigma(W_{hx} x_t + W_{hh} h_{t-1})$$
$$y_t = W_{yh} h_t$$

Here

$h_t$ represents the hidden state at time step t
$x_t$ represents the input at time step t
$W_{hx}$, $W_{hh}$ and $W_{yh}$ represent the weight matrices
$h_{t-1}$ represents the hidden state from the previous time step (t-1)
$\sigma$ represents the activation function (commonly tanh for RNN hidden states)
$y_t$ represents the output at time step t
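
The recurrence described above can be sketched in plain Python. This is only an illustration: the weight names (W_hx, W_hh, W_yh), the toy values and the helper functions matvec and rnn_step are assumptions made for this example, and tanh is used as the activation.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(x_t, h_prev, W_hx, W_hh, W_yh):
    """One recurrence step: returns the new hidden state and the output."""
    pre = [a + b for a, b in zip(matvec(W_hx, x_t), matvec(W_hh, h_prev))]
    h_t = [math.tanh(p) for p in pre]   # activation applied element-wise
    y_t = matvec(W_yh, h_t)             # linear read-out of the hidden state
    return h_t, y_t

# Hypothetical toy weights: 2-dimensional input, 2-dimensional hidden state, 1 output
W_hx = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [0.0, 0.2]]
W_yh = [[1.0, -1.0]]

h = [0.0, 0.0]                                  # initial hidden state
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # a sequence of three input vectors
outputs = []
for x in inputs:
    h, y = rnn_step(x, h, W_hx, W_hh, W_yh)     # the hidden state carries over between steps
    outputs.append(y)

print(outputs)
```

The same hidden state h is threaded through every step, which is what lets later outputs depend on earlier inputs.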

Limitations of Vanilla RNNs:

Vanilla RNNs struggle with long-term dependencies due to the vanishing gradient problem.
To overcome this, advanced RNN variants like LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Unit) are used in Seq2Seq models. These architectures are better at capturing long-range dependencies.
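
In PyTorch, the vanilla RNN, GRU and LSTM layers share nearly the same interface, so switching between them is mostly a one-line change. The sketch below assumes sequence-first tensors and arbitrary toy sizes; the only difference it highlights is that nn.LSTM returns a tuple of hidden and cell states.

```python
import torch
import torch.nn as nn

x = torch.randn(5, 2, 8)   # (seq_len, batch, input_size), toy sizes chosen for illustration
shapes = {}
for layer_cls in (nn.RNN, nn.GRU, nn.LSTM):
    layer = layer_cls(input_size=8, hidden_size=16)
    output, state = layer(x)
    # nn.LSTM returns (h_n, c_n); nn.RNN and nn.GRU return h_n only
    h_n = state[0] if isinstance(state, tuple) else state
    shapes[layer_cls.__name__] = (tuple(output.shape), tuple(h_n.shape))
    print(layer_cls.__name__, shapes[layer_cls.__name__])
```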

How Does the Seq2Seq Model Work?

A Sequence-to-Sequence (Seq2Seq) model works in two primary phases: encoding the input sequence and decoding it into an output sequence. During training, the decoding phase is usually combined with teacher forcing.

1. Encoding the Input Sequence

The encoder processes the input sequence token by token, updating its internal state at each step.
After processing the entire sequence, the encoder produces a context vector, i.e. a fixed-length representation summarizing the important information from the input.

2. Decoding the Output Sequence

The decoder takes the context vector and generates the output sequence one token at a time. For example, in machine translation:

Input: "I am learning"
Output: "J'apprends"

Each token is predicted based on the context vector and previously generated tokens.

3. Teacher Forcing

During training, teacher forcing is commonly used. Instead of feeding the decoder’s own previous prediction as the next input, the actual target token from the training data is provided.

Benefits:

Accelerates training
Reduces error propagation

Teacher forcing is used only during training and not during inference, where the model relies on its own previous predictions.
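
The difference between the two modes can be seen in a toy sketch. Everything here is hypothetical: toy_decoder stands in for a trained decoder step and is deliberately wrong for one input, so that the effect of feeding back a wrong prediction is visible.

```python
def toy_decoder(prev_token):
    """Hypothetical decoder step: predicts prev_token + 1,
    but makes a mistake after token 2 (predicts 9 instead of 3)."""
    return 9 if prev_token == 2 else prev_token + 1

def decode(target, use_teacher_forcing, start_token=0):
    """Run the decoder for len(target) steps and return its predictions."""
    predictions = []
    decoder_input = start_token
    for true_token in target:
        predicted = toy_decoder(decoder_input)
        predictions.append(predicted)
        # Teacher forcing feeds the ground-truth token; otherwise the prediction is fed back
        decoder_input = true_token if use_teacher_forcing else predicted
    return predictions

target = [1, 2, 3, 4, 5]
forced = decode(target, use_teacher_forcing=True)
free = decode(target, use_teacher_forcing=False)
print(forced)   # [1, 2, 9, 4, 5]
print(free)     # [1, 2, 9, 10, 11]
```

With teacher forcing, the single mistake stays isolated because the next input is the true token. Without it, the wrong token is fed back and every later prediction is affected.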

Step-by-Step Seq2Seq Implementation

Step 1: Import libraries

We will import PyTorch.

Step 2: Encoder

We will define the encoder:

Each input token is converted to a dense vector (embedding).
The GRU processes the sequence one token at a time, updating its hidden state.
The final hidden state is returned as the context vector, summarizing the input sequence.
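
A minimal sketch of such an encoder in PyTorch is shown below. The class name, the parameter names (input_dim, emb_dim, hidden_dim), the toy sizes and the sequence-first tensor layout are assumptions made for this example rather than the article's original listing.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim, emb_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, emb_dim)   # token ID -> dense vector
        self.rnn = nn.GRU(emb_dim, hidden_dim)              # expects (seq_len, batch, emb_dim)

    def forward(self, src):
        # src: (seq_len, batch) tensor of token IDs
        embedded = self.embedding(src)      # (seq_len, batch, emb_dim)
        _, hidden = self.rnn(embedded)      # hidden: (1, batch, hidden_dim)
        return hidden                       # final hidden state, used as the context vector

encoder = Encoder(input_dim=10, emb_dim=8, hidden_dim=16)
src = torch.randint(0, 10, (5, 2))   # 5 tokens per sequence, batch of 2
context = encoder(src)
print(context.shape)   # torch.Size([1, 2, 16])
```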

Step 3: Decoder

We will define the decoder:

Takes the current input token and converts it to an embedding.
GRU uses the previous hidden state (or context vector initially) to compute the new hidden state.
The output is passed through a linear layer to get a score (logit) for each vocabulary token; applying softmax to these scores gives the predicted token probabilities.
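
The decoder step described above might look like the following sketch. As with the encoder, the names, toy sizes and tensor layout are assumptions for illustration. The linear layer returns unnormalized scores, and softmax is applied separately to obtain probabilities.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, output_dim, emb_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(output_dim, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)   # hidden state -> one score per token

    def forward(self, token, hidden):
        # token: (batch,) current input token IDs; hidden: (1, batch, hidden_dim)
        embedded = self.embedding(token.unsqueeze(0))   # (1, batch, emb_dim)
        output, hidden = self.rnn(embedded, hidden)     # output: (1, batch, hidden_dim)
        logits = self.fc(output.squeeze(0))             # (batch, output_dim)
        return logits, hidden

decoder = Decoder(output_dim=10, emb_dim=8, hidden_dim=16)
token = torch.zeros(2, dtype=torch.long)   # start token 0 for a batch of 2
hidden = torch.zeros(1, 2, 16)             # placeholder for the encoder's context vector
logits, hidden = decoder(token, hidden)
probs = torch.softmax(logits, dim=1)       # each row sums to 1
print(logits.shape, hidden.shape)
```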

Step 4: Seq2Seq Model with Teacher Forcing

We will define the Seq2Seq model:

Batch size and vocabulary size: read from the input tensor and the decoder's output layer.
Encoding: input sequence → encoder → context vector (hidden).
Start token: the decoder's first input is token 0.
Loop over max_len steps: the decoder predicts the next token, top1 (the token with the highest probability) is appended to outputs, and the next decoder input is chosen.
Teacher forcing: sometimes (with a set probability) the true target token is fed as the next input instead of top1.
Return predictions: the per-step token IDs concatenated into one sequence.
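
The steps above can be sketched as a wrapper module. This is one possible arrangement, not necessarily the original listing: it assumes an encoder that maps a (src_len, batch) tensor to a hidden state and a decoder that maps (token, hidden) to (logits, hidden), and the parameter name teacher_forcing_ratio is an assumption. The vocabulary size is not read explicitly here because argmax works on whatever logits the decoder returns.

```python
import random
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, src, trg=None, max_len=10, teacher_forcing_ratio=0.5):
        # src: (src_len, batch); trg: (trg_len, batch), or None at inference
        batch_size = src.shape[1]
        hidden = self.encoder(src)                       # context vector
        token = torch.zeros(batch_size, dtype=torch.long, device=src.device)  # start token 0
        outputs = []
        for t in range(max_len):
            logits, hidden = self.decoder(token, hidden)
            top1 = logits.argmax(dim=1)                  # most likely token per sequence
            outputs.append(top1.unsqueeze(0))
            # Teacher forcing: with the given probability, feed the true token next
            use_teacher = (trg is not None and t < trg.shape[0]
                           and random.random() < teacher_forcing_ratio)
            token = trg[t] if use_teacher else top1
        return torch.cat(outputs, dim=0)                 # (max_len, batch) token IDs
```

Calling the model without trg (or with teacher_forcing_ratio=0) gives plain greedy decoding, which matches the inference behaviour described earlier.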

Step 5: Usage Example with Outputs

We test the model with an example:

src: random input token IDs.
trg: random target token IDs (used for teacher forcing).
outputs: predicted token IDs for each sequence.
.T: transpose to show batch sequences as rows.
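
A compact end-to-end sketch of this test follows. So that it runs on its own, it restates condensed versions of the encoder, decoder and wrapper. The sizes (vocabulary of 10, 5 source tokens, 6 target tokens, batch of 2) are arbitrary assumptions, and the model is untrained, so the printed token IDs carry no meaning.

```python
import random
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim)

    def forward(self, src):
        _, hidden = self.rnn(self.embedding(src))
        return hidden

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token, hidden):
        output, hidden = self.rnn(self.embedding(token.unsqueeze(0)), hidden)
        return self.fc(output.squeeze(0)), hidden

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder

    def forward(self, src, trg=None, max_len=10, teacher_forcing_ratio=0.5):
        hidden = self.encoder(src)
        token = torch.zeros(src.shape[1], dtype=torch.long)   # start token 0
        outputs = []
        for t in range(max_len):
            logits, hidden = self.decoder(token, hidden)
            top1 = logits.argmax(dim=1)
            outputs.append(top1.unsqueeze(0))
            forced = (trg is not None and t < trg.shape[0]
                      and random.random() < teacher_forcing_ratio)
            token = trg[t] if forced else top1
        return torch.cat(outputs, dim=0)

VOCAB_SIZE, EMB_DIM, HIDDEN_DIM = 10, 8, 16
model = Seq2Seq(Encoder(VOCAB_SIZE, EMB_DIM, HIDDEN_DIM),
                Decoder(VOCAB_SIZE, EMB_DIM, HIDDEN_DIM))

src = torch.randint(0, VOCAB_SIZE, (5, 2))   # random input token IDs: (src_len, batch)
trg = torch.randint(0, VOCAB_SIZE, (6, 2))   # random target token IDs: (trg_len, batch)
outputs = model(src, trg, max_len=6)         # predicted token IDs: (max_len, batch)
print(outputs.T)                             # one row per sequence in the batch
```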

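Output: a tensor of predicted token IDs with one row per sequence in the batch. Because the input tokens are random, the exact values vary from run to run.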

Applications

Machine Translation: Converts text between languages like English to French.
Text Summarization: Produces concise summaries of documents or news articles.
Speech Recognition: Transcribes spoken language into text.
Image Captioning: Generates captions for images by combining visual features with sequence generation.
Time-Series Prediction: Predicts future sequences based on past temporal data.

Advantages

Flexibility: Handles tasks like translation, summarization and captioning with variable-length sequences.
Handling Sequential Data: Ideal for sequential data like natural language, speech and time series.
Context Awareness: Encoder-decoder structure captures input context effectively.
Attention Mechanism: When extended with attention, the decoder can focus on the most relevant parts of the input at each step, improving performance on long sequences.

Disadvantages

Computationally Expensive: Requires significant resources to train and optimize.
Limited Interpretability: Hard to understand the model's decision-making process.
Overfitting: Prone to overfitting without proper regularization.
Rare Word Handling: Struggles with rare words not seen during training.

