Davies-Bouldin Index

Last Updated : 23 Jul, 2025

Model evaluation is a crucial part of the machine learning process. When it comes to evaluating clustering in machine learning, optimizing the cluster count is a critical component. The Davies-Bouldin Index is a well-known metric to perform the same - it enables one to assess the clustering quality by comparing the average similarity between the pairwise most similar clusters. This article will explore the concepts related to the Davies-Bouldin Index and its implementation in Scikit-Learn.

Clustering

Clustering is a type of unsupervised machine learning that involves arranging/grouping the most similar data points into a fixed number of clusters. It is suitable for finding any structures or patterns within the data without requiring any prior knowledge about the grouping logic. It is very useful in segmentation tasks and finds a wide variety of applications like image compression, anomaly detection, and customer segmentation.

Davies-Bouldin Index

The Davies-Bouldin Index is a validation metric that is used to evaluate clustering models. It is calculated as the average similarity measure of each cluster with the cluster most similar to it. In this context, similarity is defined as the ratio between inter-cluster and intra-cluster distances. As such, this index ranks well-separated clusters with less dispersion as having a better score.

For a dataset , the Davies-Bouldin Index for k number of clusters can be calculated as

where:

is the intracluster distance within the cluster .

is the intercluster distance between the clusters .

The Davies-Bouldin index is very effective compared to other clustering evaluation metrics because:

It is flexible and works for any number of clusters.
It makes no assumptions about the shape of the clusters, unlike Silhouette Score evaluation metric.
It is easy to use and intuitive.

Syntax

The davies_bouldin_score is provided within the sklearn.metrics module of the scikit-learn library. The following syntax is to be followed:

sklearn.metrics.davies_bouldin_score(X, labels)

The method accepts two arguments - X (a list of data points with n features), and labels (a list of predicted labels for each of the n samples).
The method returns a float value representing the Davies-Bouldin score for the given data.

How to calculate Davies-Bouldin Index?

Calculate the average distance between points in each cluster and the centroid of the cluster.
Calculate the distance between the centroids of each cluster and the nearest cluster.
Calculate the ratio of the average distance between points in a cluster and the centroid of the cluster to the distance between the centroids of the cluster and the nearest cluster.
Repeat steps 1-3 for each cluster.
Calculate the average of the ratios for all clusters.

Lower vs. Higher DB Index Values

The Davies-Bouldin Index (DBI) is a metric for evaluating the validity of a clustering solution. It is a relative clustering validity index, meaning that it compares the clustering results to a hypothetical "ideal" clustering. A lower DBI value indicates a better clustering solution.

Higher DB index values correspond to poorer clustering solutions. This is because a higher DBI value indicates that the clusters are not well-separated and/or that the clusters are not compact.
However, a lower DB index value is desirable. It indicates that the clusters are well-separated and compact, which is often a good indication of a successful clustering solution.

Implementation

Below is the Python implementation of DB index using the sklearn library

Scikit-Learn - Scikit-Learn is a popular Python3 library that provides a variety of methods to enable implementation of machine learning techniques like clustering, regression and classification. It provides various model evaluation scores, such as the Davies-Bouldin score, as part of its sklearn.metrics module.

The code performs Agglomerative Hierarchical Clustering (AHC) on the Wine dataset and calculates the Davies-Bouldin Index (DBI) to evaluate the clustering results. It also plots a dendrogram to visualize the hierarchical structure of the data.

Output:

👁 Dendogram -Geeksforgeeks

Implementation 2

The code performs K-Means clustering on the Iris dataset and calculates the Davies-Bouldin Index (DBI) to evaluate the clustering results. It also plots a scatter plot to visualize the clustering results.

Output:

👁 Screenshot-2023-10-31-164703

Implementation 3

The code performs Gaussian Mixture Model (GMM) clustering on a synthetic dataset generated using scikit-learn's make_blobs function. It also calculates the Davies-Bouldin Index (DBI) to evaluate the clustering results and plots a scatter plot to visualize the clustering results.

Output:

👁 Screenshot-2023-10-31-164716

Advantages of the Davies-Bouldin Index

The DBI is a relative metric, which means that it can be used to compare the clustering results of different algorithms on the same dataset.
The DBI has a clear interpretation: a lower DBI value indicates more compact and well-separated clusters.
It is a relatively simple metric to calculate.
It provides a global measure of the quality of a clustering solution.
It is relatively insensitive to the choice of distance metric.

Limitations of the Davies-Bouldin Index

The Davies-Bouldin Index is sensitive to outliers and noise in the data.
The Davies-Bouldin Index does not take into account the structure or distribution of data, such as clusters within clusters or non-linear relationships.
The Davies-Bouldin Index only considers the pairwise distances between cluster centroids and cluster members.
The DBI can be computationally expensive to calculate for large datasets.
The DBI is not suitable for all clustering tasks. For example, it is not well-suited for clustering tasks where the clusters are not well-separated.

Comment

Article Tags:

Explore

Machine Learning Basics

Python for Machine Learning

Feature Engineering

Supervised Learning

Unsupervised Learning

Model Evaluation and Tuning

Advanced Techniques

Machine Learning Practice

Courses

URL: https://www.geeksforgeeks.org/machine-learning/davies-bouldin-index/

⇱ Davies-Bouldin Index - GeeksforGeeks

Davies-Bouldin Index

Clustering

Davies-Bouldin Index

Syntax

How to calculate Davies-Bouldin Index?

Lower vs. Higher DB Index Values

Implementation

👁 Dendogram -Geeksforgeeks

Implementation 2

Implementation 3

Advantages of the Davies-Bouldin Index

Limitations of the Davies-Bouldin Index

Explore