
URL: https://www.analyticsvidhya.com/blog/2020/12/feature-engineering-feature-improvements-scaling/

Feature Engineering: Feature Improvements using Scaling



Feature Engineering (Feature Improvements – Scaling)

Shanthababu Pandian Last Updated : 25 Oct, 2024

Introduction

The Data Science life cycle revolves around using various analytical methods to produce insights, followed by applying Machine Learning techniques to make predictions from data collected from various sources. Through this, we can meet major objectives, tackle challenges, and deliver value-added solutions for specific business problem statements. The entire process involves several steps, such as data cleaning, preparation, modelling, and model evaluation.

We can break the Data Science life cycle down by percentage, as in the pie chart below, to better understand how each stage plays an important role in building a model for prediction or classification.

Depending on the nature of the data and its source, these percentages may change a little, so there is no need to stick to this pattern. I hope you get my point.

What is a Feature in Data Science/Machine Learning? (Try to understand this first)

  1. A FEATURE is a characteristic, or a set of characteristics, of a given dataset.
  2. Put simply, features are the columns/fields/attributes of the given dataset once they have been converted into a measurable, that is numeric, form.
    1. Columns/fields/attributes that are already numeric can mostly be used directly as features for the analysis.
    2. Sometimes character/string or other non-numeric columns are also converted into a measurable (numeric) form so that the dataset can be analyzed.
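
As a small illustration of point 2, here is a minimal sketch of turning a string column into numeric features. One-hot encoding is only one of several possible conversions, and the city names below are hypothetical values made up for this example.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column of city names (illustrative values only).
cities = np.array([["London"], ["Chennai"], ["London"]])

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; convert it to a dense array.
encoded = encoder.fit_transform(cities).toarray()

print(encoder.categories_[0])  # categories learned from the column
print(encoded)                 # one numeric 0/1 column per category
```

Each row now holds a 1 in the column of its city and 0 elsewhere, so the column can be used as a set of numeric features.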

Why should we understand the features of the dataset?

Once we understand the nature of the features and the feature engineering process for the given dataset, we can extract notable information and insights, as mentioned earlier. This helps us choose the right algorithms and build a successful model for the given problem statement.

4W-1H of Feature Engineering

What is Feature Engineering?

  1. It is one of the major processes in the Data Science/Machine Learning life cycle. Here we transform the given data into a reasonable form that is easier to interpret.
  2. It makes the data more transparent, which helps the Machine Learning model.
  3. It creates new features to enhance the model.

Why Feature Engineering?

  1. The NUMBER OF FEATURES can significantly impact the model, so feature engineering is an important task in the Data Science life cycle.
  2. Done well, FE IMPROVES THE PERFORMANCE of machine learning models.

Where and When Do We Use Feature Engineering?

  1. When we have a LOT OF FEATURES in the given dataset, feature engineering can become quite challenging, and interesting too.
  2. Because the number of features can significantly affect the model, feature engineering is an important task in the Data Science life cycle.

Feature Improvements – Scaling

Under Feature Improvements there are many things to discuss. Here I am picking just Scaling, since it involves mathematics and statistics.

πŸ‘ Feature scaling

Before applying Machine Learning algorithms to the dataset, we have to carefully understand the magnitude of all key features, which matters for feature selection and for identifying the independent and dependent variables. We then scale the features accordingly for the analysis and model preparation. This process of adjusting the magnitude of the features is called SCALING, or Feature Scaling.

Scaling is an important approach that lets us limit the wide range of values in a feature using a particular mathematical approach. Three common scalers are:

  1. Standard Scaler
  2. Min-Max Scaler
  3. Robust Scaler

StandardScaler: Standardizes a feature by subtracting the mean and then scaling to unit variance. Scaling to unit variance means dividing all the values by the standard deviation. After the transformation, the feature has a mean of 0 and a standard deviation of 1. If the feature is approximately normally distributed, about 68% of the values will lie between -1 and 1.
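
To make the arithmetic concrete, here is a minimal sketch that standardizes the sample array used in the Code Samples section by hand and compares it with StandardScaler. The variable name `manual` is introduced only for this illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data1 = np.array([[-100.3], [27.5], [0], [-200.9], [1000]])

# Manual standardization: z = (x - mean) / std.
# np.std uses the population standard deviation (ddof=0) by default,
# which is also what StandardScaler uses.
manual = (data1 - data1.mean(axis=0)) / data1.std(axis=0)

scaled = StandardScaler().fit_transform(data1)

print(np.allclose(manual, scaled))              # both approaches agree
print(scaled.mean(axis=0), scaled.std(axis=0))  # mean ~0, std ~1
```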

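MinMaxScaler/Normalization: Transforms each value in the column proportionally into a given range, [0, 1] by default. Because the transformation is linear, it preserves the shape of the original distribution (no distortion), which makes it a reasonable first choice for transforming a feature. Keep in mind that the minimum and maximum define the range, so it is sensitive to outliers.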
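
The same transformation can be written out by hand. This is a minimal sketch on the sample array from the Code Samples section, with the default target range [0, 1]; the names `lo`, `hi`, and `manual` are introduced only for this illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data1 = np.array([[-100.3], [27.5], [0], [-200.9], [1000]])

lo, hi = 0, 1  # target range (the MinMaxScaler default)
x_min, x_max = data1.min(axis=0), data1.max(axis=0)

# Manual min-max scaling: map [x_min, x_max] linearly onto [lo, hi].
manual = (data1 - x_min) / (x_max - x_min) * (hi - lo) + lo

scaled = MinMaxScaler(feature_range=(lo, hi)).fit_transform(data1)

print(np.allclose(manual, scaled))  # both approaches agree
print(scaled.min(), scaled.max())   # smallest value maps to lo, largest to hi
```

Changing `lo` and `hi` to 1 and 2 gives the range used in the Min-Max Scaler code sample.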

Scaling Process

Robust Scaler: RobustScaler is designed specifically to handle outliers, which the other scaling methods do not handle effectively. It removes the median and scales the data according to the QUANTILE RANGE, which defaults to the IQR, the Interquartile Range (25% to 75%).

The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile). Note that the outliers themselves are still present in the transformed dataset.
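
Here is a minimal sketch of the same calculation done by hand with the median and the IQR, on the sample array from the Code Samples section. The names `median`, `q1`, `q3`, and `manual` are introduced only for this illustration.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

data1 = np.array([[-100.3], [27.5], [0], [-200.9], [1000]])

median = np.median(data1, axis=0)
# 25th and 75th percentiles of the column; their difference is the IQR.
q1, q3 = np.percentile(data1, [25, 75], axis=0)

# Manual robust scaling: subtract the median, divide by the IQR.
manual = (data1 - median) / (q3 - q1)

scaled = RobustScaler().fit_transform(data1)

print(np.allclose(manual, scaled))  # both approaches agree
print(scaled.ravel())               # the outlier (1000) is still far from the rest
```

The outlier does not disappear. It simply no longer influences the centre and scale used for the other values.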

Code Samples

Standard Scaler

import numpy as np
from sklearn import preprocessing

# Single feature (one column) that includes a large outlier (1000)
data1 = np.array([[-100.3],
                  [27.5],
                  [0],
                  [-200.9],
                  [1000]])
print('Before Scaling\n', data1)

# Subtract the mean and divide by the standard deviation
standard_scaler = preprocessing.StandardScaler()
scaled = standard_scaler.fit_transform(data1)
print('\nAfter Standard Scaler\n', scaled)

Min-Max Scaler

import numpy as np
from sklearn import preprocessing

data1 = np.array([[-100.3],
                  [27.5],
                  [0],
                  [-200.9],
                  [1000]])
print('Before Scaling\n', data1)

# feature_range=(1, 2) overrides the default range of (0, 1)
minmax_scale = preprocessing.MinMaxScaler(feature_range=(1, 2))
scaled = minmax_scale.fit_transform(data1)
print('\nAfter Min-Max Scaler\n', scaled)

Output:

Before Scaling
 [[-100.3]
 [  27.5]
 [   0. ]
 [-200.9]
 [1000. ]]

After Min-Max Scaler
 [[1.08377051]
 [1.19019069]
 [1.1672912 ]
 [1.        ]
 [2.        ]]

Robust Scaler

import numpy as np
from sklearn import preprocessing

data1 = np.array([[-100.3],
                  [27.5],
                  [0],
                  [-200.9],
                  [1000]])
print('Before Scaling\n', data1)

# Subtract the median and divide by the IQR (the default quantile range)
robust_scaler = preprocessing.RobustScaler()
scaled = robust_scaler.fit_transform(data1)
print('\nAfter Robust Scaler\n', scaled)

Output:

Before Scaling
 [[-100.3]
 [  27.5]
 [   0. ]
 [-200.9]
 [1000. ]]

After Robust Scaler
 [[-0.78482003]
 [ 0.21517997]
 [ 0.        ]
 [-1.57198748]
 [ 7.82472613]]

Quick Comparison (StandardScaler vs MinMaxScaler/Normalization)

[Image: standard scaler vs min max scaler]

Feature Engineering is a very vast area. Feature Improvements is a subdivision of Feature Engineering, and Scaling is a small portion of it. So try to understand how important this topic is for Data Scientists and Machine Learning Engineers. I will discuss more in upcoming blogs!

Shanthababu Pandian has over 23 years of IT experience, specializing in data architecting, engineering, analytics, DQ&G, data science, ML, and Gen AI. He holds a BE in electronics and communication engineering and three Master’s degrees (M.Tech, MBA, M.S.) from a prestigious Indian university. He has completed postgraduate programs in AIML from the University of Texas and data science from IIT Guwahati. He is a director of data and AI in London, UK, leading data-driven transformation programs focusing on team building and nurturing AIML and Gen AI. He helps global clients achieve business value through scalable data engineering and AI technologies. He is also a national and international speaker, author, technical reviewer, and blogger.


Responses From Readers

Mohammed Shaik

Superb .. great effort

Joohnse Komala

Well presented,easy to understand,overall very good material. Worth reading.

Sanjay Mohan

Machine Learning enables the user to customize the products and make the life easier.
