URL: https://www.analyticsvidhya.com/blog/2021/04/automate-nlp-tasks-using-evalml-library/

Automate NLP Tasks using EvalML Library

Ronil Last Updated : 29 Apr, 2021
4 min read

“The quality of your communication shapes the quality of your life.” With this line, let’s begin and look at what this article covers. In one of my earlier articles, I explained how to automate a machine learning problem statement using EvalML. In this article, we ask: is it possible to automate an NLP task using EvalML?

What is EvalML?

EvalML is an AutoML library that builds, optimizes, and evaluates machine learning pipelines using domain-specific objective functions. It removes the need to train and tune models by hand by automating those steps. Internally it uses data tables, which are reported to be about 20x faster than data frames. It also constructs multiple pipelines that include state-of-the-art preprocessing, feature engineering, feature selection, and a variety of modeling techniques.

Install from PyPI

pip install evalml

Dataset Description: In this article we use the Spam and Ham dataset, which is available here. The dataset has one dependent feature (Category) and one independent feature (Message). We aim to predict whether a message is spam or ham. Here is a glance at the dataset.

Let’s start the coding part:

# urlopen lets us read the dataset directly from a URL
from urllib.request import urlopen
import pandas as pd
# open the remote CSV file
input_data = urlopen('https://featurelabs-static.s3.amazonaws.com/spam_text_messages_modified.csv')
# read the CSV file into a DataFrame
data = pd.read_csv(input_data)
# preview the first five rows
data.head()

Now that the data is loaded from the CSV file, we can build a model with EvalML and find the best model for our problem statement. There is no need to do text featurization manually; EvalML performs it automatically. First, let’s divide our dataset into dependent and independent features.

# Separating our dataset into independent and dependent features
X = data.drop('Category', axis=1)
y = data['Category']

Let’s check whether our dataset is balanced:

# Check whether dataset is balanced or not
y.value_counts(normalize=True)
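
To make the balance check concrete, here is a minimal standard-library sketch of what normalized value counts represent: each class count divided by the total number of rows. The labels are made-up toy data, not the article's dataset, and `class_proportions` is a hypothetical helper introduced only for illustration.

```python
from collections import Counter

def class_proportions(labels):
    """Return each label's share of the total, most frequent first."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.most_common()}

# Toy labels (hypothetical): 6 ham and 2 spam messages
toy_labels = ["ham"] * 6 + ["spam"] * 2
proportions = class_proportions(toy_labels)
print(proportions)  # {'ham': 0.75, 'spam': 0.25}
```

A result far from an even split, as in this toy case, suggests the classes are imbalanced and that accuracy alone may be a misleading metric.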

Let’s split our dataset into training and testing parts:

import evalml
# Split the dataset into training and testing parts using evalml.
# problem_type tells evalml what kind of problem we are trying to solve.
X_train, X_test, y_train, y_test = evalml.preprocessing.split_data(X, y, problem_type='binary')
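
For a classification problem, a train/test split usually tries to keep the class ratio the same in both parts. The sketch below illustrates that idea in plain Python. It is not EvalML's implementation: `stratified_split` is a hypothetical helper, and both the stratification and the 20% test share are assumptions made for the example, so check the library documentation for the actual behavior.

```python
import random
from collections import defaultdict

def stratified_split(rows, labels, test_size=0.2, seed=0):
    """Split rows so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Pick the test rows class by class so the class ratio is preserved
    test_indices = set()
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_indices.update(indices[:n_test])

    train = [(rows[i], labels[i]) for i in range(len(rows)) if i not in test_indices]
    test = [(rows[i], labels[i]) for i in range(len(rows)) if i in test_indices]
    return train, test

# Toy data (hypothetical): 80 ham and 20 spam messages
rows = [f"message {i}" for i in range(100)]
labels = ["ham"] * 80 + ["spam"] * 20
train, test = stratified_split(rows, labels)
spam_in_test = sum(1 for _, label in test if label == "spam")
print(len(train), len(test), spam_in_test)  # 80 20 4
```

The test part holds 4 spam messages out of 20, the same 20% spam share as the full toy dataset.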

Check out the documentation to learn more about parameters.

Let’s check the problem types available in EvalML:

# list all problem types available in the evalml library
evalml.problem_types.ProblemTypes.all_problem_types

Let’s run the automated search and find the most suitable model for our problem statement:

# AutoMLSearch must be imported before it can be used
from evalml.automl import AutoMLSearch
# Search for the most suitable algorithm for our problem statement
automl = AutoMLSearch(X_train=X_train, y_train=y_train, problem_type='binary', max_batches=1, optimize_thresholds=True)
automl.search()
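
The `optimize_thresholds=True` argument in the call above concerns the cut-off that turns a predicted probability into a spam or ham label. The sketch below shows the general idea of threshold tuning in plain Python. It is an illustration under assumptions, not EvalML's code: the helper names, the toy probabilities, and the choice of F1 as the metric to maximize are all hypothetical.

```python
def f1_at_threshold(probabilities, truths, threshold):
    """F1 score when probabilities >= threshold are labelled positive (1)."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    tp = sum(1 for y, yhat in zip(truths, predictions) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(truths, predictions) if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in zip(truths, predictions) if y == 1 and yhat == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(probabilities, truths, candidates):
    """Return the candidate threshold with the highest F1 (first wins ties)."""
    return max(candidates, key=lambda t: f1_at_threshold(probabilities, truths, t))

# Toy spam probabilities and true labels (hypothetical; 1 = spam)
probabilities = [0.05, 0.10, 0.35, 0.40, 0.45, 0.80, 0.90]
truths = [0, 0, 0, 1, 1, 1, 1]
candidates = [0.3, 0.4, 0.5, 0.6]
chosen = best_threshold(probabilities, truths, candidates)
print(chosen)  # 0.4
```

On this toy data the usual 0.5 cut-off misses two spam messages, while 0.4 separates the classes perfectly, which is why tuning the threshold can matter.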

The search gives us the algorithm that works best for our problem statement. It takes all the algorithms that are suitable for the problem type, automatically creates many pipelines, and tunes their parameters. Among all of the pipelines, it selects the one with the best score on the search objective, which for this run is Log Loss Binary, as the output below reports. AutoMLSearch accepts several parameters; you can dig deeper into them using the documentation available here.

We can also tune our models toward particular objectives. Here, objectives are the metrics used to measure the quality of a model, such as AUC, F1 score, recall, or precision. You can experiment with the “objective” and “additional_objectives” parameters. EvalML gives priority to the metric passed to “objective”: pipelines are tuned and ranked on it, while the metrics listed in “additional_objectives” are also computed and reported for each pipeline. In this problem statement, the best pipeline is:

Best pipeline: Random Forest Classifier w/ Text Featurization Component
Best pipeline Log Loss Binary: 0.154849
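
The "Log Loss Binary" value in the output is a score where lower is better. As a rough illustration of how such a score is computed, here is a small sketch of the standard binary log loss formula on toy values. The data and the `binary_log_loss` helper are hypothetical and are not taken from EvalML.

```python
import math

def binary_log_loss(truths, probabilities, eps=1e-15):
    """Mean negative log-likelihood of the true labels (lower is better)."""
    total = 0.0
    for y, p in zip(truths, probabilities):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(truths)

# Toy labels (1 = spam) and two sets of predicted spam probabilities
truths = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
hesitant = [0.6, 0.4, 0.6, 0.4]

confident_loss = binary_log_loss(truths, confident)
hesitant_loss = binary_log_loss(truths, hesitant)
print(confident_loss < hesitant_loss)  # True
```

Both sets of predictions are on the right side of 0.5 for every message, yet the confident ones earn a lower loss, because log loss rewards well-calibrated probabilities rather than only correct labels.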

Let’s see how the candidate pipelines rank against each other:

# return a table showing different models based on their ranking
automl.rankings

Let’s get the best pipeline:

# get best pipeline
best_pipeline = automl.best_pipeline

# Output : 
# GeneratedPipeline(parameters={'Random Forest Classifier':{'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1},})

Let’s describe our best pipeline in detail:

# describe best pipeline in more detail.
automl.describe_pipeline(automl.rankings.iloc[0]["id"])

 

Let’s evaluate the best pipeline on the test data:

# evaluate on the test data using the core binary objectives
scores = best_pipeline.score(X_test, y_test, objectives=evalml.objectives.get_core_objectives('binary'))
print(f'Accuracy Binary: {scores["Accuracy Binary"]}')

# Output:
# Accuracy Binary: 0.9732441471571907
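
For reference, accuracy is simply the fraction of test messages whose predicted label matches the true label. A minimal sketch on toy labels (hypothetical data, with `accuracy` as an illustrative helper rather than an EvalML function):

```python
def accuracy(truths, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, yhat in zip(truths, predictions) if y == yhat)
    return correct / len(truths)

# Toy true labels and predictions (hypothetical)
truths = ["ham", "ham", "spam", "ham", "spam"]
predictions = ["ham", "ham", "spam", "spam", "spam"]
score = accuracy(truths, predictions)
print(score)  # 0.8
```

Four of the five toy predictions are correct, giving 0.8.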

Finally, save the model in a pickle file, and let’s finish this beautiful journey:

# save the model in model.pkl file.
best_pipeline.save("model.pkl")
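
A saved model is only useful if it can be read back later. The sketch below shows a generic save and load round trip with Python's standard `pickle` module, using a plain dictionary of the parameters shown earlier as a stand-in for the pipeline. This is an assumption-laden illustration: a real EvalML pipeline should be restored with the library's own loading function, so check the EvalML documentation for the exact call.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model: the parameters reported for the best pipeline
params = {"Random Forest Classifier": {"n_estimators": 100, "max_depth": 6, "n_jobs": -1}}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model.pkl")
    # Save the object to disk
    with open(path, "wb") as handle:
        pickle.dump(params, handle)
    # Load it back, as a later session would
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(restored == params)  # True
```

Only load pickle files that come from a source you trust, since unpickling can execute arbitrary code.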

I hope this article helps you understand the basics of EvalML and how to use it to find the most suitable model for your problem statements. I believe that “Knowledge shared is knowledge squared.” So if you have any questions, don’t hesitate to drop a comment below, or connect with me on LinkedIn. If you want suggestions related to Data Science, Deep Learning, or Machine Learning, feel free to ping me on LinkedIn and I will try my best to help.
