VOOZH about

URL: https://www.analyticsvidhya.com/blog/2020/08/exploratory-data-analysiseda-from-scratch-in-python/

⇱ Exploratory Data Analysis(EDA) from Scratch | With Pythin Implementation


India's Most Futuristic AI Conference Is Back – Bigger, Sharper, Bolder

  • d
  • :
  • h
  • :
  • m
  • :
  • s

Reading list

Exploratory Data Analysis(EDA) in Python!

Guest Blog Last Updated : 30 Jul, 2021
6 min read

Introduction

πŸ‘ Exploratory Data Analysis

Exploratory Data Analysis(EDA)

The major topics to be covered are below:

– Handle Missing value
– Removing duplicates
– Outlier Treatment
– Normalizing and Scaling( Numerical Variables)
– Encoding Categorical variables( Dummy Variables)
– Bivariate Analysis

πŸ‘ Box-plot after removing outliers

Box-plot after removing outliers

Handling Duplicate records

Handling Outlier

We Generally identify outliers with the help of boxplot, so here box plot shows some of the data points outside the range of the data.

πŸ‘ Image for post

Box-plot before removing outliers

Bivariate Analysis

  1. Two Categorical Variables

    1. Bar chart
    2. Grouped bar chart
    3. Point plot

πŸ‘ Image for post

Correlation between all the variables

Normalizing and Scaling

ENCODING

About the Author

πŸ‘ Image

Ritika Singh – Data Scientist

I am a Data scientist by profession and a Blogger by passion. I have been working on machine learning projects for more than 2 years. Here you will find articles on β€œMachine Learning, Statistics, Deep Learning, NLP and Artificial Intelligence”.

Login to continue reading and enjoy expert-curated content.

Free Courses

Exploratory Data Analysis with Python & GenAI

Learn EDA with Python: Transform data into insights using PandasAI & more.

Analyzing Data with Power BI

Turn raw data into insights with Power BI - dashboards, reports & more!

Responses From Readers

Why did you treat postal code as a numerical variable? It is not meaningful to represent it that way, since a numerical value for postal code will be misinterpreted by any machine learning algorithm. For example, the postal code "90049" will be matched with a label based on the correlation and the postal code "300" will be matched to the other label since it has a lower value, which is incorrect. It would be better represented as a categorical variable, even if there are many unique observations.

Hi Ritika, Can you pls. help me with the csv file that you used for this tutorial? I would like to use the file to learn the steps taught here.

rohith gaddam

cool and clear its easy to understand tq for the explanation i fall in love with ur blog

Flagship Programs

GenAI Pinnacle Program| GenAI Pinnacle Plus Program| AI/ML BlackBelt Program| Agentic AI Pioneer Program

Free Courses

Generative AI| DeepSeek| OpenAI Agent SDK| LLM Applications using Prompt Engineering| DeepSeek from Scratch| Stability.AI| SSM & MAMBA| RAG Systems using LlamaIndex| Building LLMs for Code| Python| Microsoft Excel| Machine Learning| Deep Learning| Mastering Multimodal RAG| Introduction to Transformer Model| Bagging & Boosting| Loan Prediction| Time Series Forecasting| Tableau| Business Analytics| Vibe Coding in Windsurf| Model Deployment using FastAPI| Building Data Analyst AI Agent| Getting started with OpenAI o3-mini| Introduction to Transformers and Attention Mechanisms

Popular Categories

AI Agents| Generative AI| Prompt Engineering| Generative AI Application| News| Technical Guides| AI Tools| Interview Preparation| Research Papers| Success Stories| Quiz| Use Cases| Listicles

Generative AI Tools and Techniques

GANs| VAEs| Transformers| StyleGAN| Pix2Pix| Autoencoders| GPT| BERT| Word2Vec| LSTM| Attention Mechanisms| Diffusion Models| LLMs| SLMs| Encoder Decoder Models| Prompt Engineering| LangChain| LlamaIndex| RAG| Fine-tuning| LangChain AI Agent| Multimodal Models| RNNs| DCGAN| ProGAN| Text-to-Image Models| DDPM| Document Question Answering| Imagen| T5 (Text-to-Text Transfer Transformer)| Seq2seq Models| WaveNet| Attention Is All You Need (Transformer Architecture) | WindSurf| Cursor

Popular GenAI Models

Llama 4| Llama 3.1| GPT 4.5| GPT 4.1| GPT 4o| o3-mini| Sora| DeepSeek R1| DeepSeek V3| Janus Pro| Veo 2| Gemini 2.5 Pro| Gemini 2.0| Gemma 3| Claude Sonnet 3.7| Claude 3.5 Sonnet| Phi 4| Phi 3.5| Mistral Small 3.1| Mistral NeMo| Mistral-7b| Bedrock| Vertex AI| Qwen QwQ 32B| Qwen 2| Qwen 2.5 VL| Qwen Chat| Grok 3

AI Development Frameworks

n8n| LangChain| Agent SDK| A2A by Google| SmolAgents| LangGraph| CrewAI| Agno| LangFlow| AutoGen| LlamaIndex| Swarm| AutoGPT

Data Science Tools and Techniques

Python| R| SQL| Jupyter Notebooks| TensorFlow| Scikit-learn| PyTorch| Tableau| Apache Spark| Matplotlib| Seaborn| Pandas| Hadoop| Docker| Git| Keras| Apache Kafka| AWS| NLP| Random Forest| Computer Vision| Data Visualization| Data Exploration| Big Data| Common Machine Learning Algorithms| Machine Learning| Google Data Science Agent
πŸ‘ Av Logo White

Continue your learning for FREE

Forgot your password?
πŸ‘ Av Logo White

Enter OTP sent to

Edit

Wrong OTP.

Enter the OTP

Resend OTP

Resend OTP in 45s

πŸ‘ Popup Banner
πŸ‘ AI Popup Banner