Prompt tuning is a technique for adapting pre-trained language models to downstream tasks without modifying the model's weights. Instead of fine-tuning all of the model's parameters, prompt tuning optimizes only a small set of learnable prompt tokens (soft prompts) while the rest of the model stays frozen. In this article we will look at how prompt tuning works, the math behind it and a small Python implementation.
Let's understand it step by step:
The foundation of prompt tuning is a pre-trained language model. These models are trained on vast amounts of text data and encode general linguistic knowledge. Examples include GPT-3, BERT and T5.
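Soft prompts are a small number of learnable embedding vectors that live in the model's embedding space; unlike ordinary prompt words, they do not have to correspond to any token in the vocabulary. They are concatenated with the embedded input text before being passed to the model, creating a composite input sequence in which the soft prompts act as a task-specific prefix. For example, with four soft prompt tokens and the input "The movie was great", the model sees [P1] [P2] [P3] [P4] The movie was great, where each [Pi] is a learned vector rather than a word.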
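A minimal NumPy sketch of this concatenation step, assuming a tiny random embedding table in place of a real model's and hypothetical token ids for the example sentence. Because the soft prompt lives in the same embedding space as the tokens, prepending it is a plain concatenation along the sequence axis:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, num_prompt_tokens = 100, 16, 4

# Frozen embedding table (random stand-in for a pre-trained model's embeddings)
embedding_table = rng.normal(size=(vocab_size, embed_dim))

# Soft prompt: num_prompt_tokens learnable vectors in the same embedding space
soft_prompt = rng.normal(scale=0.1, size=(num_prompt_tokens, embed_dim))


def build_model_input(token_ids: np.ndarray) -> np.ndarray:
    """Prepend the soft prompt to the embedded input tokens along the sequence axis."""
    input_embeddings = embedding_table[token_ids]  # shape (seq_len, embed_dim)
    return np.concatenate([soft_prompt, input_embeddings], axis=0)


# Hypothetical token ids standing in for "The movie was great"
token_ids = np.array([12, 7, 55, 3])
model_input = build_model_input(token_ids)
print(model_input.shape)  # (8, 16): 4 prompt vectors + 4 token embeddings
```

Only `soft_prompt` would be trained; `embedding_table` and everything downstream stay fixed.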
During training, the model's output is compared to the ground truth using a loss function such as cross-entropy for classification tasks. Gradients are then backpropagated through the frozen model, but only the soft prompts are updated; the rest of the model's parameters stay unchanged.
Once the soft prompts are optimized they can be reused for inference on new inputs for the same task. The frozen model generates predictions based on the learned soft prompts, effectively adapting to the task without requiring full fine-tuning.
To better understand prompt tuning let’s break it down mathematically.
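Consider a pre-trained LLM $f_\theta$, where $\theta$ denotes its pre-trained parameters, $x$ is the input text and $y$ is the target label for the task.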
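To understand the benefits of prompt tuning, let's compare it with fine-tuning from a mathematical point of view. For a task loss $\mathcal{L}$, fine-tuning updates all parameters $\theta$ of the model $f_\theta$:

$$\theta^* = \arg\min_{\theta} \mathcal{L}\big(f_\theta(x), y\big)$$

Prompt tuning keeps $\theta$ frozen and optimizes only a learnable prompt $p$ that is prepended to the input:

$$p^* = \arg\min_{p} \mathcal{L}\big(f_\theta([p; x]), y\big)$$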
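To put rough numbers on this comparison, the sketch below counts trainable parameters for a hypothetical model with 1 billion parameters, an embedding size of 2048 and a 20-token soft prompt (all three numbers are illustrative assumptions, not properties of any specific model). Fine-tuning trains every parameter of the model, while prompt tuning trains only the k × d soft-prompt matrix:

```python
def prompt_tuning_params(num_prompt_tokens: int, embed_dim: int) -> int:
    """Prompt tuning trains only the (num_prompt_tokens x embed_dim) soft prompt."""
    return num_prompt_tokens * embed_dim


# Hypothetical sizes, chosen only for illustration
total_model_params = 1_000_000_000
embed_dim = 2048
num_prompt_tokens = 20

fine_tuning = total_model_params  # fine-tuning updates every parameter
prompt_tuning = prompt_tuning_params(num_prompt_tokens, embed_dim)

print(f"fine-tuning:   {fine_tuning:,} trainable parameters")
print(f"prompt tuning: {prompt_tuning:,} trainable parameters")  # 40,960
print(f"ratio: {prompt_tuning / fine_tuning:.6%}")
```

The same arithmetic explains why one frozen model can serve many tasks: each task only needs its own small prompt matrix stored alongside it.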
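Instead of feeding the input text $x$ directly into the model, we prepend a learnable prompt $p$ made of $k$ embedding vectors. These embeddings are initialized randomly and optimized during training to guide the model toward the desired task. The final input $x'$ to the model becomes:

$$x' = [p; x] = [p_1, p_2, \dots, p_k, x_1, x_2, \dots, x_n]$$

Here $p_1, \dots, p_k$ are the learnable prompt tokens and $x_1, \dots, x_n$ are the (embedded) tokens of the original input text.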
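The LLM processes the concatenated input $x'$ to produce an output $\hat{y}$. For example, if the task is sentiment classification, the model might output a probability for each class:

$$\hat{y}_{\text{positive}} = P(\text{positive} \mid x'), \qquad \hat{y}_{\text{negative}} = P(\text{negative} \mid x') = 1 - \hat{y}_{\text{positive}}$$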
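The model's output is compared to the true label $y$ using a loss function. For binary classification, where $y \in \{0, 1\}$, the cross-entropy loss is commonly used:

$$\mathcal{L} = -\big[\, y \log \hat{y}_{\text{positive}} + (1 - y) \log\big(1 - \hat{y}_{\text{positive}}\big) \big]$$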
For example, if $y = 1$ (positive sentiment) and $\hat{y}_{\text{positive}} = 0.7$, the loss is:

$$\mathcal{L} = -\log(0.7) \approx 0.357$$
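The same calculation as a small Python sketch, using the natural logarithm (the usual convention for cross-entropy); `binary_cross_entropy` is a helper defined here for illustration. The second call shows that a confident wrong prediction is penalized more heavily:

```python
import math


def binary_cross_entropy(y: int, p_positive: float) -> float:
    """Binary cross-entropy for one example, using the natural log."""
    return -(y * math.log(p_positive) + (1 - y) * math.log(1 - p_positive))


# True label is positive and the model assigns it probability 0.7
print(round(binary_cross_entropy(y=1, p_positive=0.7), 3))  # 0.357

# True label is negative but the model still says 0.7 positive
print(round(binary_cross_entropy(y=0, p_positive=0.7), 3))  # 1.204
```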
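The optimization process uses gradient descent to adjust the learnable prompt $p$. We update the prompt embeddings using the gradient of the loss with respect to $p$:

$$p \leftarrow p - \eta \, \nabla_p \mathcal{L}$$

where $\eta$ is the learning rate and $\nabla_p \mathcal{L}$ is the gradient of the loss with respect to the prompt embeddings. The model's own parameters are not updated.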
After enough iterations the prompt p converges to a set of embeddings that guide the model to classify the sentiment of the input text more accurately.
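To see what a single update $p \leftarrow p - \eta \nabla_p \mathcal{L}$ (learning rate $\eta$) does, assume for illustration that the positive-class probability comes from a sigmoid over a logit $z$ that depends on the prompt, $\hat{y}_{\text{positive}} = \sigma(z)$. Applying the chain rule to the binary cross-entropy loss gives:

$$\frac{\partial \mathcal{L}}{\partial z} = \hat{y}_{\text{positive}} - y, \qquad \nabla_p \mathcal{L} = \big(\hat{y}_{\text{positive}} - y\big)\, \nabla_p z$$

With the earlier numbers, $y = 1$ and $\hat{y}_{\text{positive}} = 0.7$:

$$\nabla_p \mathcal{L} = -0.3\, \nabla_p z \quad\Longrightarrow\quad p \leftarrow p + 0.3\, \eta\, \nabla_p z$$

So each step nudges the prompt in the direction that raises the positive logit, and the nudge shrinks as the prediction approaches the correct label.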
Let's walk through a simple Python example in which we optimize a learnable prompt (p) to guide a model for sentiment classification. The goal is to classify whether a sentence has positive or negative sentiment.
First, we import the necessary libraries: NumPy, TensorFlow and Matplotlib.
Next, we define a simple neural network model using TensorFlow's Keras API. This model will mimic the behavior of a large language model (LLM) for our example.
Now, we prepare the input data and define the learnable prompt (p). The prompt will be optimized during training to guide the model.
We initialize the model, define the loss function and set up the optimizer for training.
We then train by iteratively adjusting the learnable prompt (p) to minimize the loss.
During training, the learnable prompt (p) is prepended to the input text embeddings. After training, we plot the loss values to see how the model improved over time and print the optimized prompt.
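A minimal sketch of these steps, assuming synthetic data and a toy frozen model. It swaps TensorFlow/Keras for plain NumPy with a hand-derived gradient so that it runs without a deep-learning framework. The stand-in "model" (mean pooling plus a fixed logistic layer whose bias is deliberately mis-set), the synthetic labels, the sizes and the learning rate are all hypothetical choices for illustration:

```python
import numpy as np

# Hypothetical toy setup: all sizes and data are synthetic stand-ins, not real text
rng = np.random.default_rng(0)
num_examples, seq_len, embed_dim, num_prompt_tokens = 200, 6, 8, 4
learning_rate, num_steps = 5.0, 200

# Synthetic "sentence embeddings": each example is seq_len token vectors
X = rng.normal(size=(num_examples, seq_len, embed_dim))

# Frozen stand-in model: mean-pool the sequence, then a fixed logistic layer.
# Its weights point in a useful direction, but its bias is badly off for this task.
w = rng.normal(size=embed_dim)
b = -3.0
# Synthetic labels: positive when the pooled input points along w
y = (X.mean(axis=1) @ w > 0).astype(float)


def predict(prompt, X):
    """Return P(positive | x') for every example, where x' = [prompt; x] is mean-pooled."""
    pooled = (prompt.sum(axis=0) + X.sum(axis=1)) / (prompt.shape[0] + X.shape[1])
    return 1.0 / (1.0 + np.exp(-(pooled @ w + b)))


def bce_loss(y, y_hat, eps=1e-12):
    """Mean binary cross-entropy over the batch."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))


# Learnable soft prompt, randomly initialized: the only tensor that gets updated
prompt = rng.normal(scale=0.1, size=(num_prompt_tokens, embed_dim))
initial_accuracy = float(np.mean((predict(prompt, X) > 0.5) == y))

losses = []
for step in range(num_steps):
    y_hat = predict(prompt, X)
    losses.append(bce_loss(y, y_hat))
    # dL/dscore for each example, averaged over the batch
    dscore = (y_hat - y) / num_examples
    # Each prompt row reaches the score through the mean pool, so every row
    # gets the same gradient: sum(dscore) * w / (k + n)
    row_grad = dscore.sum() * w / (num_prompt_tokens + seq_len)
    prompt -= learning_rate * row_grad  # broadcast over all rows; w and b never change

final_accuracy = float(np.mean((predict(prompt, X) > 0.5) == y))
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(f"accuracy: {initial_accuracy:.2f} -> {final_accuracy:.2f}")
print("optimized prompt:\n", prompt.round(3))
```

To draw the loss curve, pass `losses` to `matplotlib.pyplot.plot`. Because this stand-in model only mean-pools its input, the prompt here can do little more than shift the classifier's score; in a real transformer the soft prompt interacts with every input token through attention, which is what gives prompt tuning its flexibility.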
By following these steps you can implement prompt tuning in Python and adapt a pre-trained model to a specific task such as sentiment analysis. The approach is lightweight and efficient, since only the prompt embeddings are trained, and it preserves the general knowledge of the original model because the model's weights are never changed.
Prompt tuning offers several key benefits over traditional fine-tuning:
While prompt tuning offers several advantages, it is not without limitations:
As NLP models continue to grow in size and complexity, techniques like prompt tuning play an important role in making these models accessible and practical for real-world applications.