Scatter Plot Visualization in Python using matplotlib

Himanshu Pathak Last Updated : 08 Feb, 2024
6 min read

Introduction

Scatter plots are a powerful tool in a data scientist’s arsenal, allowing us to visualize the relationship between two variables. This blog explores how to create and customize scatter plots in Python using Matplotlib. Scatter plots help uncover patterns, trends, and correlations within datasets, which makes them an essential component of exploratory data analysis.

Understanding the Basics of Scatter Plots

A scatter plot draws one marker per observation, positioned by the values of two numerical variables: one on the x-axis and one on the y-axis. The shape of the resulting point cloud shows whether the variables move together, whether the data forms clusters, and which points are outliers. Matplotlib’s pyplot interface can produce such a plot in a few lines of Python.

Creating a Simple Scatter Plot

To create a simple scatter plot in Matplotlib, we can use the `scatter` function from `matplotlib.pyplot`. It takes two equal-length arrays of data points, one for the x-axis and one for the y-axis, and draws each (x, y) pair as an individual point. Let’s walk through a basic example.

Example

Creating a Scatter plot with IRIS Dataset

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
# Load the iris dataset
iris = load_iris()
# Columns of iris.data: 0 = sepal length, 1 = sepal width,
# 2 = petal length, 3 = petal width (all in cm)
sepal_length = iris.data[:, 0]
petal_length = iris.data[:, 2]
# Create the scatter plot
plt.scatter(sepal_length, petal_length)
# Add labels, title, and grid
plt.xlabel("Sepal Length (cm)")
plt.ylabel("Petal Length (cm)")
plt.title("Sepal Length vs. Petal Length in Iris Dataset")
plt.grid(True)
# Show the plot
plt.show()

Output
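As a minimal sketch of what `scatter` does with its two input arrays, the snippet below plots a handful of made-up points through Matplotlib's object-oriented interface (`plt.subplots` and `ax.scatter`) and inspects the object that comes back. The data values and variable names are assumptions chosen for illustration; they are not part of the iris example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample data: five (x, y) pairs
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

fig, ax = plt.subplots()
points = ax.scatter(x, y)  # returns a PathCollection holding all markers
ax.set_xlabel("x")
ax.set_ylabel("y")

# The collection stores one (x, y) offset per plotted point
offsets = points.get_offsets()
```

Calling `plt.show()` afterwards displays the figure. Keeping a reference to the returned collection is useful later, for example when attaching a colorbar or legend to it.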

Also read: A Beginner’s Guide to matplotlib for Data Visualization and Exploration in Python.

Customizing Scatter Plot Markers and Colors

One key advantage of using Matplotlib for scatter plots is the ability to customize the appearance of the data points. We can change the markers’ size, shape, and color to convey additional information or enhance the visual appeal of the plot. This section will explore various customization options available in Matplotlib for scatter plots.

Examples

The color is changed to red and the marker is changed to '>' (triangle right).

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
# Load the iris dataset
iris = load_iris()
# Extract sepal length (column 0) and petal length (column 2)
sepal_length = iris.data[:, 0]
petal_length = iris.data[:, 2]
# Create the scatter plot with customizations
plt.scatter(
    sepal_length,
    petal_length,
    c='red',  # Use one color for every point
    s=50,  # Adjust marker size (area in points squared)
    alpha=0.7,  # Set transparency
    linewidths=0,  # Remove border around markers (optional)
    marker='>',  # Right-pointing triangle
)
# Add labels, title, and grid
plt.xlabel("Sepal Length (cm)")
plt.ylabel("Petal Length (cm)")
plt.title("Sepal Length vs. Petal Length in Iris Dataset")
plt.grid(True)
# Show the plot
plt.show()

Output
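Marker size can also carry information. The sketch below passes an array to `s` so that each point's area reflects a third variable. The random data and the `weight` variable are hypothetical, and the scaling factor of 20 is an arbitrary choice made for readability.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: 30 random points plus a third variable per point
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = rng.uniform(0, 10, size=30)
weight = rng.uniform(1, 5, size=30)

# `s` is the marker area in points squared, one value per point
sizes = 20 * weight

fig, ax = plt.subplots()
bubbles = ax.scatter(x, y, s=sizes, c="tab:blue", alpha=0.6, linewidths=0)
ax.set_xlabel("x")
ax.set_ylabel("y")
```

Because `s` is an area rather than a radius, doubling a value doubles the marker's area, not its width.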

Colors can be specified in several ways. In the snippets below, x and y stand for any two equal-length arrays of values.

Named colors such as red, blue, and green

Example

plt.scatter(x, y, c='red')
plt.scatter(x, y, c='blue')
plt.scatter(x, y, c='green')

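RGB/RGBA Tuples

Each component is a float between 0 and 1, and the optional fourth value is the alpha (opacity). A bare tuple passed as c can be mistaken for an array of data values to be colormapped, so it is safer to pass it through the color keyword.

Example

plt.scatter(x, y, color=(1, 0, 0))  # Red
plt.scatter(x, y, color=(0, 0, 1))  # Blue
plt.scatter(x, y, color=(0, 1, 0))  # Pure green (brighter than the named color 'green')
plt.scatter(x, y, color=(1, 0, 0, 0.5))  # Semi-transparent red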

Hexadecimal Colors

Example

plt.scatter(x, y, c='#FF0000')  # Red
plt.scatter(x, y, c='#0000FF')  # Blue
plt.scatter(x, y, c='#00FF00')  # Pure green (the named color 'lime')
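To check that these notations describe the same color, a small sketch with the converters in `matplotlib.colors` can help. It normalizes each form to an RGBA tuple or a hex string. The variable names are chosen here for illustration only.

```python
from matplotlib.colors import to_hex, to_rgba

# Three notations for the same red, normalized to an RGBA tuple
red_named = to_rgba("red")
red_tuple = to_rgba((1, 0, 0))
red_hex = to_rgba("#FF0000")

# An explicit alpha can be supplied during conversion
half_red = to_rgba("red", alpha=0.5)

# The named color 'green' is darker than pure green, which is named 'lime'
green_named = to_hex("green")
lime_named = to_hex("lime")
```

Here the three red values compare equal, while `green_named` and `lime_named` differ, which is worth remembering when switching between named colors and tuples.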

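Colormaps

A colormap applies only when c is given as an array of numbers, one per point. Without c, the cmap argument is ignored.

Example

plt.scatter(x, y, c=y, cmap='viridis')  # Color each point by its y value
plt.scatter(x, y, c=x, cmap='inferno')  # Color each point by its x value, using another colormap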
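A colormapped scatter is easier to read with a colorbar. The following sketch, which uses a made-up sine curve as data, colors each point by its y value and attaches a colorbar to the collection returned by `scatter`. Fixing `vmin` and `vmax` is an optional choice made here so the color scale spans exactly -1 to 1.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: 50 points along a sine curve
x = np.linspace(0, 10, 50)
y = np.sin(x)

fig, ax = plt.subplots()
# c=y supplies one number per point; cmap turns those numbers into colors
sc = ax.scatter(x, y, c=y, cmap="viridis", vmin=-1, vmax=1)
# Building the colorbar from the scatter keeps both on the same scale
cbar = fig.colorbar(sc, ax=ax, label="y value")
```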

The markers we can use are:

Marker                Description
"."                   point
","                   pixel
"o"                   circle
"v"                   triangle_down
"^"                   triangle_up
"<"                   triangle_left
">"                   triangle_right
"1"                   tri_down
"2"                   tri_up
"3"                   tri_left
"4"                   tri_right
"8"                   octagon
"s"                   square
"p"                   pentagon
"P"                   plus (filled)
"*"                   star
"h"                   hexagon1
"H"                   hexagon2
"+"                   plus
"x"                   x
"X"                   x (filled)
"D"                   diamond
"d"                   thin_diamond
"|"                   vline
"_"                   hline
0 (TICKLEFT)          tickleft
1 (TICKRIGHT)         tickright
2 (TICKUP)            tickup
3 (TICKDOWN)          tickdown
4 (CARETLEFT)         caretleft
5 (CARETRIGHT)        caretright
6 (CARETUP)           caretup
7 (CARETDOWN)         caretdown
8 (CARETLEFTBASE)     caretleft (centered at base)
9 (CARETRIGHTBASE)    caretright (centered at base)
10 (CARETUPBASE)      caretup (centered at base)
11 (CARETDOWNBASE)    caretdown (centered at base)
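To see a few of these markers side by side, the sketch below draws one point per marker and looks up each description in `MarkerStyle.markers`, the dictionary Matplotlib uses for its built-in markers. The selection of six markers and the layout are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.markers import MarkerStyle

# A small, arbitrary selection of marker codes from the table above
chosen = ["o", "s", "^", ">", "D", "*"]

fig, ax = plt.subplots()
for i, m in enumerate(chosen):
    # Look up Matplotlib's own description for the marker code
    name = MarkerStyle.markers[m]
    ax.scatter(i, 0, marker=m, s=120, label=f"{m!r}: {name}")
ax.set_yticks([])  # the y position carries no meaning here
ax.legend(ncol=3, loc="upper center")
```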

Using colormaps based on specific column values in the dataset

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
# Load the iris dataset
iris = load_iris()
# Extract sepal length (column 0) and petal length (column 2)
sepal_length = iris.data[:, 0]
petal_length = iris.data[:, 2]
# Species labels (encoded as 0, 1, 2)
species = iris.target
# Create the scatter plot with customizations
scatter = plt.scatter(
    sepal_length,
    petal_length,
    c=species,  # Map colors based on species label
    cmap="viridis",  # Choose a colormap you like
    s=50,  # Adjust marker size
    alpha=0.7,  # Set transparency
    linewidths=0,  # Remove border around markers (optional)
    marker='>',
)
# Add labels, title, and grid
plt.xlabel("Sepal Length (cm)")
plt.ylabel("Petal Length (cm)")
plt.title("Sepal Length vs. Petal Length in Iris Dataset (Colored by Species)")
plt.grid(True)
# Colorbar built from the scatter itself, so its scale matches the species codes
plt.colorbar(scatter, ticks=[0, 1, 2], label="Species")
# Show the plot
plt.show()

Output
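For categorical labels, a discrete legend is often easier to read than a continuous colorbar. The sketch below uses `legend_elements()` on the collection returned by `scatter` to build one legend entry per class. The synthetic three-class data and the class names are assumptions standing in for the iris species.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: three classes of 20 points each, shifted apart
rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 20)
x = rng.normal(loc=labels * 2.0, scale=0.5)
y = rng.normal(loc=labels * 1.5, scale=0.5)

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=labels, cmap="viridis", s=50, alpha=0.7)

# One legend handle per unique value in `labels`
handles, texts = sc.legend_elements()
ax.legend(handles, ["class 0", "class 1", "class 2"], title="Class")
```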

Adding Annotations and Text to Scatter Plots

Annotations and text labels can provide valuable context and insights when visualizing data with scatter plots. Matplotlib offers a range of features to add annotations, text, and labels to the plot, allowing us to highlight specific data points or convey additional information. Let’s explore how to leverage these features to enhance the interpretability of scatter plots.

Annotating the different species in the above example.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
# Load the iris dataset
iris = load_iris()
# Extract sepal length (column 0) and petal length (column 2)
sepal_length = iris.data[:, 0]
petal_length = iris.data[:, 2]
# Species labels (encoded as 0, 1, 2)
species = iris.target
# Create the scatter plot, colored by species label
scatter = plt.scatter(
    sepal_length,
    petal_length,
    c=species,
    cmap="viridis",
    s=50,
    alpha=0.7,
    linewidths=0,
    marker='o',
)
# Annotate the first sample of each species (the dataset is ordered by species)
annotate_indices = [0, 50, 100]  # Modify these indices as needed
annotate_texts = ["Species 0", "Species 1", "Species 2"]
for i, text in zip(annotate_indices, annotate_texts):
    plt.annotate(
        text,
        xy=(sepal_length[i], petal_length[i]),
        xytext=(10, 10),  # Offset in points from the annotated sample
        textcoords="offset points",
        fontsize=8,
        arrowprops=dict(color="red", arrowstyle="->"),
    )
# Add a general title (optional)
plt.title("Sepal Length vs. Petal Length in Iris Dataset (Colored by Species)")
# Add labels and grid
plt.xlabel("Sepal Length (cm)")
plt.ylabel("Petal Length (cm)")
plt.grid(True)
# Colorbar built from the scatter itself, so its scale matches the species codes
plt.colorbar(scatter, ticks=[0, 1, 2], label="Species")
# Show the plot
plt.show()

Output
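Annotating hand-picked indices works, but the label lands wherever that one sample happens to sit. One alternative, sketched below under the assumption of synthetic three-group data, is to annotate each group's centroid (the mean x and mean y of its points). The group names and the `centroids` dictionary are illustrative, not part of the example above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: three groups of 15 points each
rng = np.random.default_rng(2)
labels = np.repeat([0, 1, 2], 15)
x = rng.normal(loc=labels * 3.0, scale=0.4)
y = rng.normal(loc=labels * 2.0, scale=0.4)

fig, ax = plt.subplots()
ax.scatter(x, y, c=labels, cmap="viridis", alpha=0.7)

centroids = {}
for k in np.unique(labels):
    mask = labels == k
    # Centroid of the group: mean of its x and y coordinates
    cx, cy = x[mask].mean(), y[mask].mean()
    centroids[int(k)] = (cx, cy)
    ax.annotate(
        f"Group {k}",
        xy=(cx, cy),
        xytext=(15, 15),  # offset of the text from the centroid, in points
        textcoords="offset points",
        fontsize=8,
        arrowprops=dict(arrowstyle="->", color="red"),
    )
```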

Also read: Introduction to Matplotlib using Python for Beginners

Handling Multiple Groups in Scatter Plots

In real-world scenarios, we often encounter datasets with multiple groups or categories. Visualizing multiple groups in a single scatter plot can help us compare the relationships between different variables and identify group patterns. Matplotlib provides several techniques to handle multiple groups in scatter plots, such as using different colors or markers for each group.

Example

import matplotlib.pyplot as plt
# Sample data (modify as needed)
groups = ["Group A", "Group B", "Group C"]
x_data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
y_data = [[4, 6, 7], [2, 3, 5], [8, 5, 7]]
# Create the plot
plt.figure(figsize=(8, 6))  # Adjust figure size if needed
# Loop through groups and plot data points
for i, group in enumerate(groups):
    plt.scatter(x_data[i], y_data[i], label=group, marker='o', alpha=0.7)
# Add labels, title, and legend
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot with Multiple Groups")
plt.legend()
# Grid (optional)
plt.grid(True)
# Show the plot
plt.show()

Output
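Real datasets usually store the group as a label column rather than as separate lists. A possible sketch for that layout is shown below: it filters one pair of arrays with a boolean mask per group and gives each group its own marker. The `group` array and the `styles` mapping are assumptions for illustration, and the numbers reuse the sample values from the example above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same sample values as above, stored as flat arrays plus a group label per point
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([4, 6, 7, 2, 3, 5, 8, 5, 7])
group = np.array(["A"] * 3 + ["B"] * 3 + ["C"] * 3)

# Hypothetical mapping from group name to marker shape
styles = {"A": "o", "B": "s", "C": "^"}

fig, ax = plt.subplots(figsize=(8, 6))
for name, marker in styles.items():
    mask = group == name  # True for the points that belong to this group
    ax.scatter(x[mask], y[mask], marker=marker, alpha=0.7, label=f"Group {name}")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.legend()
```

Using a distinct marker per group keeps the plot readable in grayscale, where color alone would not separate the groups.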

Conclusion

In this blog, we’ve delved into the world of scatter plot visualization using the Matplotlib library in Python. We’ve covered the basics of creating simple scatter plots, customizing markers and colors, adding annotations and text, and handling multiple groups. With this knowledge, you’re well-equipped to create scatter plots that effectively communicate insights from your data. 

If you are looking for a Python course online, then explore: Learn Python for Data Science
