
URL: https://www.analyticsvidhya.com/blog/2022/01/hugging-face-transformers-pipeline-functions-advanced-nlp/




Hugging Face Transformers Pipeline Functions | Advanced NLP

Deepak Last Updated : 05 Jan, 2022
5 min read

This article was published as a part of the Data Science Blogathon.

Objective

In this blog post, we will learn how to use Hugging Face Transformers pipeline functions to perform advanced Natural Language Processing tasks.

Prerequisites

Knowledge of Deep Learning and Natural Language Processing (NLP)

Introduction

The Transformer was introduced in the paper "Attention Is All You Need". It is an encoder-decoder architecture: the input is processed (encoded) by one stack, and that encoded representation is used (decoded) by another stack to generate the output.
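The core operation from that paper is scaled dot-product attention, softmax(QKᵀ/√d_k)V, which is also how a decoder position "looks at" the encoder's outputs (cross-attention). The dependency-free sketch below illustrates it with made-up toy vectors; the function names are hypothetical, and real implementations add learned projections, multiple heads, and tensor libraries.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries: Matrix, keys: Matrix, values: Matrix):
    """Return (outputs, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy "encoder outputs" for a 3-token input (used as both keys and values here)
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# One decoder query attending over the encoder states (cross-attention)
decoder_query = [[1.0, 0.0]]

outputs, weights = scaled_dot_product_attention(decoder_query, encoder_states, encoder_states)
print(weights)
print(outputs)
```

The query matches the first and third encoder states equally, so those two positions receive the same, larger attention weight.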

There are variants of the Transformer architecture that use only the encoder stack, as in BERT (Bidirectional Encoder Representations from Transformers), or only the decoder stack, as in GPT (Generative Pre-trained Transformer). T5 (Text-to-Text Transfer Transformer), created by Google, uses both the encoder and decoder stacks.
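One practical difference between the two families is the attention mask: an encoder-only model like BERT lets every token attend to every other token (bidirectional), while a decoder-only model like GPT uses a causal mask so each position sees only itself and earlier positions. A minimal sketch of the two masks (1 = may attend, 0 = masked); the helper names are illustrative only.

```python
from typing import List

def bidirectional_mask(n: int) -> List[List[int]]:
    # Encoder-only (BERT-style): every position may attend to every position
    return [[1] * n for _ in range(n)]

def causal_mask(n: int) -> List[List[int]]:
    # Decoder-only (GPT-style): position i may attend only to positions 0..i
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

for row in causal_mask(4):
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```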

The Hugging Face Transformers library provides a large collection of pre-trained models for tasks across text, vision, and audio. It provides APIs to download and experiment with these pre-trained models, and we can even fine-tune them on our own datasets.

Why Use Transformers Library?

  • Easy-to-use state-of-the-art models: High performance on Natural Language Understanding (NLU) and Generation (NLG), Computer Vision, and audio tasks.
  • Lower compute costs, smaller carbon footprint: Researchers can share trained models instead of retraining.
  • Choose the proper framework for every part of a model’s lifetime: Train state-of-the-art models in 3 lines of code, pick the appropriate framework for training, evaluation, and production.
  • Easily customize a model or an example to our needs: It provides examples for each architecture to reproduce the results published by its original authors.

Source

Transformers Pipeline

Pipelines are an abstraction over the complex code behind the Transformers library and the easiest way to use pre-trained models for inference. They provide easy-to-use functions for a variety of tasks, including, but not limited to, Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction, and Question Answering.
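Conceptually, a pipeline bundles three steps behind one call: preprocessing the raw input, running the model, and postprocessing the model's raw output into something readable. The sketch below is a hypothetical, dependency-free illustration of that idea only: ToyPipeline, the word lists, and the keyword-counting "model" are made up and are not how the Transformers library is implemented.

```python
from typing import Callable, Dict, List

class ToyPipeline:
    """Hypothetical three-stage pipeline: preprocess -> forward -> postprocess."""

    def __init__(self, preprocess: Callable, forward: Callable, postprocess: Callable):
        self.preprocess = preprocess
        self.forward = forward
        self.postprocess = postprocess

    def __call__(self, inputs):
        # Accept a single string or a list of strings; always return a list
        batch = [inputs] if isinstance(inputs, str) else list(inputs)
        return [self.postprocess(self.forward(self.preprocess(x))) for x in batch]

POSITIVE_WORDS = {"good", "great", "loved"}   # hypothetical lexicon
NEGATIVE_WORDS = {"bad", "worst", "boring"}

def preprocess(text: str) -> List[str]:
    # Stand-in for tokenization: lowercase, drop periods, split on whitespace
    return text.lower().replace(".", " ").split()

def forward(tokens: List[str]) -> int:
    # Stand-in for the model: a crude word count instead of neural network logits
    return sum(t in POSITIVE_WORDS for t in tokens) - sum(t in NEGATIVE_WORDS for t in tokens)

def postprocess(score: int) -> Dict[str, str]:
    # Map the raw "model" output to a human-readable label
    return {"label": "POSITIVE" if score >= 0 else "NEGATIVE"}

toy = ToyPipeline(preprocess, forward, postprocess)
print(toy(["This is a really good movie", "Worst movie I ever saw"]))
# [{'label': 'POSITIVE'}, {'label': 'NEGATIVE'}]
```

The point of the design is that callers only ever see `toy(...)`; swapping in a real tokenizer and model changes the stages, not the interface.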

In a typical machine learning/deep learning experiment, we need to preprocess the data, train the model, and write an inference script. With pipeline functions, in contrast, we only need to import the pipeline and pass it our raw data. The pipeline preprocesses the data in the background, including tokenization, padding, and any other steps the model's input requires, and returns the output with a single call.
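Padding is needed because sentences of different lengths must become equal-length rows before they can be batched through a model, and the model must be told which positions are padding. Below is a minimal sketch of that step under simplifying assumptions: the token ids are made up, padding id 0 is an assumption (each real tokenizer defines its own), and pad_batch is a hypothetical helper. The output keys input_ids and attention_mask mirror the names Hugging Face tokenizers use.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical padding id; real tokenizers define their own

def pad_batch(batch: List[List[int]], pad_id: int = PAD_ID) -> Dict[str, List[List[int]]]:
    """Right-pad token-id sequences to equal length and build an attention mask."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Two "tokenized" sentences of different lengths (ids are made up)
encoded = pad_batch([[101, 2023, 2003, 102], [101, 5409, 102]])
print(encoded["input_ids"])       # [[101, 2023, 2003, 102], [101, 5409, 102, 0]]
print(encoded["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```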

We need to install the Transformers library to use these fantastic pipeline functions. Open a Jupyter notebook, either locally or in Google Colab (preferred).

Install the library using pip

!pip install transformers

Now, let’s unwrap the magic box and see how it surprises us.

First, import the pipeline function from the transformers library.

from transformers import pipeline

Let’s begin with Sentiment Analysis.

Sentiment Analysis

Sentiment analysis predicts the sentiment of a text, that is, whether it is positive or negative. To perform sentiment analysis with a pipeline, we initialize it with the ‘sentiment-analysis’ task as follows.

# Create a sentiment-analysis pipeline; the default model is downloaded on first use
sentimentAnalysis_pipeline = pipeline("sentiment-analysis")
test_sentence = "This is a really good movie. I loved it and will watch it again"
print(sentimentAnalysis_pipeline(test_sentence))


Source: Author
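The score in the output can be read as a probability. For a single-label classifier like the default sentiment model, the network produces one raw logit per class, and postprocessing applies a softmax and reports the most probable label. A minimal sketch, assuming a two-class model with the label order NEGATIVE/POSITIVE and made-up logit values; logits_to_prediction is a hypothetical helper, not the library's code.

```python
import math
from typing import Dict, List

LABELS = ["NEGATIVE", "POSITIVE"]  # assumed label order for a binary classifier

def logits_to_prediction(logits: List[float]) -> Dict[str, object]:
    # Softmax turns raw logits into probabilities that sum to 1
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Report the most probable class and its probability as the score
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}

prediction = logits_to_prediction([-2.0, 3.0])  # made-up logits
print(prediction)  # label 'POSITIVE', score about 0.9933
```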

We can also pass a list of sentences, and the pipeline will return a prediction for each example in the list.

test_sentence1 = "This is a really good movie. I loved it and will watch it again"
test_sentence2 = "Worst movie I ever saw"
# Passing a list returns one prediction per sentence
print(sentimentAnalysis_pipeline([test_sentence1, test_sentence2]))

Source: Author

The first time it runs, the pipeline downloads the underlying model. We can choose a different model with the model parameter; by default, it uses the ‘distilbert-base-uncased-finetuned-sst-2-english’ model.

See how easy that was? We can also fine-tune a model on a custom dataset; check out my blog on fine-tuning the BERT model for sentiment analysis to learn how.

Have you ever imagined being a writer or poet? If not, the following pipeline might help bring out that side of you.

Let’s build a text generation pipeline.

Text Generation

Given a few words or a sentence as a prompt, the model generates the text that follows, one token at a time.

We need to initialize the Pipeline with the ‘text-generation’ task.

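# GPT-2 continues the prompt, generating one token at a time
text_gen_pipeline = pipeline('text-generation', model='gpt2')
prompt = 'Before we proceed any further, hear me speak'
# max_length is measured in tokens and includes the prompt
print(text_gen_pipeline(prompt, max_length=60))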

Source: Author

By default, it returns a single generated sequence of at most max_length tokens. However, we can set the num_return_sequences parameter to return as many sequences as we want.
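Under the hood, generation is an autoregressive loop: predict the next token, append it, and feed the longer sequence back in until max_length (which counts the prompt) is reached. The toy sketch below mimics that loop with a made-up bigram table over whole words rather than GPT-2's subword tokens and neural probabilities, and samples randomly so that num_return_sequences yields different outputs. BIGRAMS and generate are hypothetical names for illustration only.

```python
import random
from typing import Dict, List

# Hypothetical bigram "language model": possible next words for each word
BIGRAMS: Dict[str, List[str]] = {
    "hear": ["me"],
    "me": ["speak", "now"],
    "speak": ["softly", "now"],
    "now": ["and"],
    "softly": ["and"],
    "and": ["hear", "speak"],
}

def generate(prompt: str, max_length: int, num_return_sequences: int = 1, seed: int = 0) -> List[str]:
    """Sample continuations word by word; max_length counts the prompt words too."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(num_return_sequences):
        words = prompt.split()
        # Autoregressive loop: each new word is conditioned on the previous one
        while len(words) < max_length:
            candidates = BIGRAMS.get(words[-1].lower())
            if not candidates:  # no known continuation: stop early
                break
            words.append(rng.choice(candidates))
        outputs.append(" ".join(words))
    return outputs

for text in generate("Before we proceed any further, hear me speak", max_length=12, num_return_sequences=2):
    print(text)
```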

To learn how to build a text generation model using an LSTM, check out the GitHub repository.

Now let’s build our last Pipeline for the question-answering task.

Question Answering

Given a text (the context) and a question, the model extracts the answer from the context.
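A common decoding scheme for this kind of extractive QA: the model scores every token as a possible answer start and as a possible answer end, and the answer is the span (start ≤ end, not too long) with the highest combined score. A minimal sketch with made-up logits over a hypothetical tokenized sentence; best_span is an illustrative helper, and the default length cap of 15 is an assumption chosen to mirror the pipeline's max_answer_len parameter.

```python
from typing import List, Tuple

def best_span(start_logits: List[float], end_logits: List[float], max_answer_len: int = 15) -> Tuple[int, int]:
    """Pick (start, end) token indices maximizing start_logits[s] + end_logits[e] with s <= e."""
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        # Only consider ends at or after the start and within the length limit
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

tokens = ["the", "meeting", "starts", "at", "10", "am", "today"]
start_logits = [0.1, 0.0, 0.2, 0.3, 4.0, 0.5, -1.0]  # made-up model outputs
end_logits = [0.0, 0.1, 0.0, 0.2, 1.0, 3.5, 0.2]
s, e = best_span(start_logits, end_logits)
print(" ".join(tokens[s:e + 1]))  # 10 am
```

Adding logits ranks spans the same way as multiplying the softmaxed start and end probabilities, since the softmax normalizers are constants.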

For question answering, we need to initialize the pipeline with the “question-answering” task.

context = '''
Total fees for all services paid by the 
Company and its subsidiaries, on a 
consolidated basis, to statutory auditors 
of the Company and other firms in the 
network entity of which the statutory 
auditors are a part, during the year 
ended March 31, 2021, is 59.73 crore.
During the financial year 2020-21, the 
company issued on private placement 
basis and allotted, Unsecured 
Redeemable Non-Convertible 
Debentures (NCDs) of the face value of 
10,00,000/- (Rupees Ten lakh) each, 
aggregating 24,955 crores in seven 
tranches as per the terms of issue of the 
respective tranches. Further, the third 
tranche of 500 crores was received from 
the holders of partly paid NCDs (Series 
IA). The funds raised through NCDs 
have been utilized for repayment of 
existing borrowings and other purposes 
in the ordinary course of business.
'''
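# Initialize the question-answering pipeline (downloads a default extractive QA model)
ques_ans_pipeline = pipeline("question-answering")
ans = ques_ans_pipeline(question='What is the total fees paid by the company to auditors?',
                        context=context)
print(ans)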

Source: Author

Excellent, the model has accurately extracted the answer to the question. It also returns the offsets (start and end) where the answer appears in the context, and a score indicating how confident the model is in the extracted answer.
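The start and end offsets are character positions in the context string, so the answer can be recovered by slicing, and the score can gate whether an answer is trusted at all. A minimal sketch, assuming a result dict shaped like the pipeline's output with hypothetical values (not from a real run); accept_answer and the 0.5 threshold are illustrative assumptions.

```python
from typing import Dict, Optional

def accept_answer(result: Dict, context: str, min_score: float = 0.5) -> Optional[str]:
    """Return the answer text if the score is high enough, recovered from the character offsets."""
    # start/end are character offsets into the context string
    span = context[result["start"]:result["end"]]
    if result["score"] < min_score:
        return None  # too uncertain: e.g. route the question to a human reviewer
    return span

context = "The meeting starts at 10 am today."
# Hypothetical result, shaped like the question-answering pipeline's output dict
result = {"score": 0.91, "start": 22, "end": 27, "answer": "10 am"}
print(accept_answer(result, context))                    # 10 am
print(accept_answer({**result, "score": 0.2}, context))  # None
```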

The above context is taken from the 2020-21 Annual Report of Reliance Industries Limited (linked in the References section). This is just an example, but the same approach could be used in the financial industry to analyze long, tedious reports simply by asking the model the right questions.
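A full annual report is far longer than what a BERT-style QA model can read in one pass. The question-answering pipeline handles long contexts by splitting them into overlapping chunks (see its max_seq_len and doc_stride parameters) and keeping the best answer across chunks. The sketch below shows only the chunking idea over whole words; sliding_windows is a hypothetical helper, and its overlap argument is roughly analogous to doc_stride, not an exact reimplementation.

```python
from typing import List

def sliding_windows(tokens: List[str], window: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of `window` items; consecutive windows share `overlap` items."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        # Stop once the current window reaches the end of the sequence
        if start + window >= len(tokens):
            break
        start += step
    return chunks

words = "the third tranche of 500 crores was received from the holders".split()
for chunk in sliding_windows(words, window=5, overlap=2):
    print(chunk)
```

The overlap ensures an answer that straddles a chunk boundary still appears whole in at least one chunk, provided it is no longer than the overlap.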

End Notes

We can just as easily use other pipelines, including text summarization, named entity recognition, language translation, and many more. With this powerful Transformers functionality, we can build excellent applications with very little code. One advantage of using pre-trained models is that we don’t have to train models from scratch, which can take days on large volumes of data; this reduces our resource consumption and, ultimately, our running costs.
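As one example of what those other pipelines do in postprocessing: a named entity recognition model tags each token (for example with B-/I- prefixes marking the beginning and inside of an entity), and the tags are then merged into whole entities, which is what the NER pipeline's aggregation_strategy option controls. A minimal sketch of BIO-style grouping on a made-up, fictional sentence; group_entities is hypothetical, and the entity_group key only mirrors the name used in the pipeline's aggregated output.

```python
from typing import Dict, List, Optional, Tuple

def group_entities(tagged: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Merge B-/I- tagged tokens into entity spans (BIO scheme); 'O' tokens are skipped."""
    entities: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for token, tag in tagged:
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and (current is None or current["entity_group"] != tag[2:])
        )
        if starts_new:
            # Begin a new entity (also when an I- tag appears without a matching B-)
            current = {"entity_group": tag[2:], "word": token}
            entities.append(current)
        elif tag.startswith("I-") and current is not None:
            # Continue the current entity
            current["word"] += " " + token
        else:
            current = None  # an 'O' tag closes any open entity
    return entities

tagged = [("Acme", "B-ORG"), ("Corp", "I-ORG"), ("opened", "O"),
          ("in", "O"), ("New", "B-LOC"), ("Delhi", "I-LOC")]
print(group_entities(tagged))
# [{'entity_group': 'ORG', 'word': 'Acme Corp'}, {'entity_group': 'LOC', 'word': 'New Delhi'}]
```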

References

🤗 Transformers (huggingface.co)

Pipelines (huggingface.co)

AnnualReport_2020-21.aspx (ril.com)

About Me

I am a Machine Learning Engineer, solving challenging business problems through data and machine learning. Feel free to connect with me on LinkedIn.


The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.
