URL: https://www.geeksforgeeks.org/machine-learning/implementing-dbscan-algorithm-using-sklearn/

Implementing DBSCAN algorithm using Sklearn

Last Updated : 1 Apr, 2026

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups closely packed points and marks points in low-density regions as outliers. It does not require the number of clusters to be specified in advance and can detect clusters of arbitrary shape. In scikit-learn, it is available as the DBSCAN class and can be used to identify clusters and detect noise in data.

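  • Identifies core points: a point is a core point if at least min_samples points (counting the point itself) lie within distance eps of it.
  • Expands clusters from core points by adding every point that is reachable through a chain of core points, each within eps of the next.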
  • Marks points not belonging to any cluster as noise or outliers.
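The core-point rule in the first bullet can be illustrated with a small NumPy-only sketch. The helper find_core_points and the toy data are hypothetical and written only for illustration; scikit-learn's DBSCAN uses its own, more efficient neighbor search.

```python
import numpy as np

def find_core_points(X, eps, min_samples):
    """Return a boolean mask marking the core points of X (illustrative helper)."""
    # Pairwise Euclidean distances between all points (fine for small data only).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Count neighbors within eps; each point counts itself, as in scikit-learn.
    neighbor_counts = (dists <= eps).sum(axis=1)
    return neighbor_counts >= min_samples

# Toy data (hypothetical): a tight group of four points plus one far-away point.
X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
core_mask = find_core_points(X_toy, eps=0.2, min_samples=3)
print(core_mask)  # the four grouped points are core points, the isolated one is not
```

The isolated point has no core point within eps, so DBSCAN would label it as noise.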

Step By Step Implementation

Here we implement the DBSCAN clustering algorithm on a moon-shaped dataset using Scikit-learn and visualize the results.

Step 1: Import Required Libraries

Import the necessary libraries: numpy for numerical operations, matplotlib.pyplot for visualization, make_moons to create a sample dataset, DBSCAN for clustering and NearestNeighbors to estimate distances when choosing epsilon.
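A minimal sketch of these imports, using the standard scikit-learn module paths:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons        # synthetic two-moons dataset
from sklearn.cluster import DBSCAN             # density-based clustering
from sklearn.neighbors import NearestNeighbors # used for the k-distance graph
```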

Step 2: Generate Moon-shaped Dataset

Here we generate a 2D moon-shaped dataset with 5000 points and some noise.
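One way to generate such a dataset is sketched below. The text only fixes the number of points at 5000; the noise level of 0.05 and the random seed are assumptions chosen for illustration.

```python
from sklearn.datasets import make_moons

# n_samples comes from the text; noise=0.05 and random_state=42 are assumed values.
X, y = make_moons(n_samples=5000, noise=0.05, random_state=42)

print(X.shape)  # one row per point, two coordinates each
```

The array y holds the true moon membership (0 or 1). DBSCAN never sees it; it is useful only for checking the result afterwards.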

Step 3: Visualize the Dataset

Before clustering, we plot the dataset to understand its structure. Smaller markers and semi-transparency help in visualizing large datasets clearly.
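A possible plotting sketch follows. The marker size, transparency and axis labels are illustrative choices, and the dataset parameters other than n_samples are assumptions.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons

# Same assumed dataset parameters as in the generation step.
X, _ = make_moons(n_samples=5000, noise=0.05, random_state=42)

fig, ax = plt.subplots(figsize=(8, 6))
# Small, semi-transparent markers keep 5000 points readable.
ax.scatter(X[:, 0], X[:, 1], s=5, alpha=0.6)
ax.set_title("Moon-shaped dataset")
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
plt.show()
```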

Output:

[Figure: Dataset]

Step 4: Plot k-distance Graph for Epsilon Selection

The k-distance graph plots each point's distance to its k-th nearest neighbor, sorted in ascending order. The "elbow" of this curve, where the distance starts to rise sharply, is a reasonable choice for epsilon (eps) in DBSCAN.
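A sketch of how the k-distance graph could be computed with NearestNeighbors. The value k = 5 is an assumption (a common heuristic is to set k equal to the intended min_samples), as are the dataset parameters other than n_samples.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

X, _ = make_moons(n_samples=5000, noise=0.05, random_state=42)

k = 5  # assumed value; often set equal to the intended min_samples
# Querying the training data returns each point as its own nearest neighbor
# (distance 0), so the last column is the k-th closest point counting itself.
nn = NearestNeighbors(n_neighbors=k).fit(X)
distances, _ = nn.kneighbors(X)
k_distances = np.sort(distances[:, -1])

fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(k_distances)
ax.set_title(f"{k}-distance graph")
ax.set_xlabel("Points sorted by distance")
ax.set_ylabel(f"Distance to {k}-th nearest neighbor")
plt.show()
```

Reading eps off the elbow is a visual heuristic, so it is worth trying a few nearby values and comparing the resulting clusters.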

Output:

[Figure: K-Distance Graph]

Step 5: Apply DBSCAN Clustering

Here we apply DBSCAN on the dataset with the chosen eps and min_samples. DBSCAN automatically identifies clusters of varying shapes and sizes and labels noise points as -1.
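A minimal sketch of this step. The values eps=0.1 and min_samples=5 are assumptions for illustration, not the article's confirmed settings; in practice eps would be read off the k-distance graph.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=5000, noise=0.05, random_state=42)

# eps and min_samples are assumed values; tune them for your own data.
dbscan = DBSCAN(eps=0.1, min_samples=5)
# fit_predict returns one label per point: 0, 1, ... for clusters, -1 for noise.
labels = dbscan.fit_predict(X)

print(sorted(set(labels.tolist())))
```

After fitting, the same labels are also available as dbscan.labels_, and dbscan.core_sample_indices_ lists the indices of the core points.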

Step 6: Visualize DBSCAN Clustering Results

Plot each cluster with a unique color. Noise points are highlighted in red. Smaller marker size and transparency make the visualization clear for 5000 points.
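The plot might be produced along these lines. The colormap, marker settings and clustering parameters are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=5000, noise=0.05, random_state=42)
labels = DBSCAN(eps=0.1, min_samples=5).fit_predict(X)  # assumed parameters

fig, ax = plt.subplots(figsize=(8, 6))

# One color per cluster; the noise label (-1) is handled separately below.
cluster_ids = [c for c in np.unique(labels) if c != -1]
colors = plt.cm.viridis(np.linspace(0, 1, max(len(cluster_ids), 1)))
for cluster_id, color in zip(cluster_ids, colors):
    mask = labels == cluster_id
    ax.scatter(X[mask, 0], X[mask, 1], s=5, alpha=0.6,
               color=color, label=f"Cluster {cluster_id}")

# Highlight noise points in red, if there are any.
noise_mask = labels == -1
if noise_mask.any():
    ax.scatter(X[noise_mask, 0], X[noise_mask, 1], s=5, alpha=0.6,
               color="red", label="Noise")

ax.set_title("DBSCAN clustering results")
ax.legend()
plt.show()
```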

Output:

[Figure: Result]

Step 7: Summary of Clusters

Here we summarize the number of clusters detected by DBSCAN and the points that were classified as noise.
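A sketch of how these counts can be derived from the label array. The dataset and clustering parameters are assumed values, so the exact counts may differ with other settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=5000, noise=0.05, random_state=42)
labels = DBSCAN(eps=0.1, min_samples=5).fit_predict(X)  # assumed parameters

# The noise label (-1) is not a cluster, so exclude it from the count.
n_clusters = len(set(labels.tolist())) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

print(f"Number of clusters found: {n_clusters}")
print(f"Number of noise points: {n_noise}")
```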

Output:

Number of clusters found: 2

Number of noise points: 0
