
URL: https://www.geeksforgeeks.org/machine-learning/classifying-sonar-data-rocks-vs-mines-using-machine-learning/

Classifying Sonar Data: Rocks vs. Mines Using Machine Learning - GeeksforGeeks



Classifying Sonar Data: Rocks vs. Mines Using Machine Learning

Last Updated : 23 Jul, 2025

The objective of this project is to classify sonar data to differentiate between rocks and mines using machine learning techniques. Sonar data, collected through sound waves, is processed to detect underwater objects. Machine learning models can analyze this data to predict whether an object is a rock or a mine.

Classification Project: Differentiating Between Rocks and Mines

Dataset Description

The dataset used in this project is sonar.all-data.csv, which contains sonar signal readings collected to distinguish between rocks and mines.

Dataset link: SonarData

  • This data is essential for training machine learning models that can predict whether a sonar reading corresponds to a rock or a mine.
  • The dataset consists of 60 numerical feature columns followed by a class label column.
  • Each feature represents a measurement from the sonar signals, and the class label indicates whether the reading corresponds to a rock (R) or a mine (M).

General Approach

  1. Exploratory Data Analysis (EDA): Understand the structure and characteristics of the dataset.
  2. Data Preparation: Prepare the dataset for machine learning models.
  3. Model Development: Train and evaluate machine learning models to classify sonar data.
  4. Model Evaluation: Check the performance of the trained models.

Let's implement this project step by step and classify sonar readings as rocks or mines.

Step 1: Exploratory Data Analysis (EDA)

Understand the structure and characteristics of the dataset.

Import Libraries and Dataset

  • We begin by importing necessary Python libraries such as NumPy, Pandas, and Matplotlib for data manipulation and visualization.
  • Next, we load the sonar dataset into a Pandas DataFrame from a CSV file.
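A minimal sketch of this step, assuming the file has been downloaded as sonar.all-data.csv into the working directory (the path and the helper name load_sonar are illustrative, not taken from the original code). The file has no header row, so header=None keeps the first reading as data and numbers the columns 0 to 60.

```python
from pathlib import Path

import matplotlib.pyplot as plt  # used later for the accuracy plot
import numpy as np               # used later for array handling
import pandas as pd

DATA_PATH = Path("sonar.all-data.csv")  # assumed location of the downloaded file


def load_sonar(path=DATA_PATH):
    """Read the sonar CSV without a header row (columns become 0..60)."""
    return pd.read_csv(path, header=None)


# Load only if the file is present, so the sketch does not fail without it.
if DATA_PATH.exists():
    df = load_sonar()
```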

Check the Preview of Data

Display the first few rows of the dataset to understand its structure.
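As a sketch, head() returns the first five rows by default. The small random frame below is only a stand-in with the sonar layout (60 numeric columns plus a label column) so the snippet runs without the CSV; its values will differ from the real data.

```python
import numpy as np
import pandas as pd

# Stand-in frame; with the real file use
# df = pd.read_csv("sonar.all-data.csv", header=None) instead.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((8, 60)))
df[60] = ["R", "M"] * 4  # label column

print("First few rows of the dataset:")
print(df.head())  # first 5 rows; wide frames are abbreviated with "..."
```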

Output:

First few rows of the dataset:
0 1 2 3 4 5 6 7 8 ... 52 53 54 55 56 57 58 59 60
0 0.0200 0.0371 0.0428 0.0207 0.0954 0.0986 0.1539 0.1601 0.3109 ... 0.0065 0.0159 0.0072 0.0167 0.0180 0.0084 0.0090 0.0032 R
1 0.0453 0.0523 0.0843 0.0689 0.1183 0.2583 0.2156 0.3481 0.3337 ... 0.0089 0.0048 0.0094 0.0191 0.0140 0.0049 0.0052 0.0044 R
2 0.0262 0.0582 0.1099 0.1083 0.0974 0.2280 0.2431 0.3771 0.5598 ... 0.0166 0.0095 0.0180 0.0244 0.0316 0.0164 0.0095 0.0078 R
3 0.0100 0.0171 0.0623 0.0205 0.0205 0.0368 0.1098 0.1276 0.0598 ... 0.0036 0.0150 0.0085 0.0073 0.0050 0.0044 0.0040 0.0117 R
4 0.0762 0.0666 0.0481 0.0394 0.0590 0.0649 0.1209 0.2467 0.3564 ... 0.0054 0.0105 0.0110 0.0015 0.0072 0.0048 0.0107 0.0094 R

[5 rows x 61 columns]

Check the Dataset Shape

Check the number of rows and columns to get an overview of the dataset’s size.
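The shape attribute gives (rows, columns). The stand-in frame below is random data built with the same dimensions as the sonar file, so the sketch runs without the CSV.

```python
import numpy as np
import pandas as pd

# Stand-in with the sonar dimensions: 208 readings, 60 features + 1 label.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((208, 60)))
df[60] = ["R", "M"] * 104

print("Shape of the dataset:", df.shape)  # (208, 61)
```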

Output:

Shape of the dataset: (208, 61)

View the Statistical Summary

Use describe() to obtain statistical metrics like mean, standard deviation, and percentiles.
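A short sketch on stand-in data (random values, not the real readings). By default describe() summarizes only the numeric columns, which is why the label column does not appear in the summary.

```python
import numpy as np
import pandas as pd

# Stand-in frame with the sonar layout; replace with the loaded DataFrame.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((208, 60)))
df[60] = ["R", "M"] * 104

summary = df.describe()  # count, mean, std, min, quartiles, max per numeric column
print(summary)
```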

Output:

 0 1 2 3 4 5 ... 54 55 56 57 58 59
count 208.000000 208.000000 208.000000 208.000000 208.000000 208.000000 ... 208.000000 208.000000 208.000000 208.000000 208.000000 208.000000
mean 0.029164 0.038437 0.043832 0.053892 0.075202 0.104570 ... 0.009290 0.008222 0.007820 0.007949 0.007941 0.006507
std 0.022991 0.032960 0.038428 0.046528 0.055552 0.059105 ... 0.007088 0.005736 0.005785 0.006470 0.006181 0.005031
min 0.001500 0.000600 0.001500 0.005800 0.006700 0.010200 ... 0.000600 0.000400 0.000300 0.000300 0.000100 0.000600
25% 0.013350 0.016450 0.018950 0.024375 0.038050 0.067025 ... 0.004150 0.004400 0.003700 0.003600 0.003675 0.003100
50% 0.022800 0.030800 0.034300 0.044050 0.062500 0.092150 ... 0.007500 0.006850 0.005950 0.005800 0.006400 0.005300
75% 0.035550 0.047950 0.057950 0.064500 0.100275 0.134125 ... 0.012100 0.010575 0.010425 0.010350 0.010325 0.008525
max 0.137100 0.233900 0.305900 0.426400 0.401000 0.382300 ... 0.044700 0.039400 0.035500 0.044000 0.036400 0.043900

Step 2: Data Preparation

Prepare the dataset for machine learning models.

Separate Features and Target

The last column is the target variable (rock or mine), and the rest are features.
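One way to do this is positional slicing with iloc, sketched here on a small stand-in frame. The variable names X and y are conventional choices, not confirmed by the original code.

```python
import numpy as np
import pandas as pd

# Stand-in frame with the sonar layout (60 features + label in the last column).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((8, 60)))
df[60] = ["R", "M"] * 4

X = df.iloc[:, :-1].to_numpy()  # every column except the last (features 0..59)
y = df.iloc[:, -1].to_numpy()   # last column only (class labels)

print(X.shape, y.shape)  # (8, 60) (8,)
```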

Encode Labels

Convert categorical labels ('R' for rock and 'M' for mine) into numerical format.
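A possible sketch using scikit-learn's LabelEncoder on a toy label list; whether the original code used LabelEncoder or a manual mapping is an assumption. LabelEncoder sorts the classes, so 'M' maps to 0 and 'R' maps to 1.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["R", "R", "M", "R", "M"]  # toy label column

encoder = LabelEncoder()
y = encoder.fit_transform(labels)  # classes are sorted, so 'M' -> 0 and 'R' -> 1

mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)     # {'M': 0, 'R': 1}
print(y.tolist())  # [1, 1, 0, 1, 0]
```

encoder.inverse_transform can turn numeric predictions back into 'M' and 'R' when reporting results.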

Split Data

Divide the dataset into training and testing sets to evaluate model performance.
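A sketch with train_test_split on stand-in arrays of the sonar dimensions. The test_size of 0.2 is an assumption, although it is consistent with the 42 test samples that the confusion matrices in Step 4 add up to; random_state is also an assumed value, fixed only for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays with the sonar dimensions (208 readings, 60 features).
rng = np.random.default_rng(0)
X = rng.random((208, 60))
y = rng.integers(0, 2, size=208)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # both values are assumptions
)

print(X_train.shape, X_test.shape)  # (166, 60) (42, 60)
```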

Step 3: Model Development

Train and evaluate machine learning models to classify sonar data.

k-Nearest Neighbors (kNN)

  • Initialize Variables: Define a range of neighbor values to test and prepare to store accuracy results.

Train kNN Model

Fit kNN models with different neighbor values and record accuracy.
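The initialization and training loop might look like the following sketch. The range of k values (1 to 10) and the use of the test split for scoring are assumptions, and the data is a synthetic stand-in generated with make_classification.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data shaped like the sonar set (208 x 60); swap in the real split.
X, y = make_classification(n_samples=208, n_features=60, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

neighbors = list(range(1, 11))  # assumed range of k values to try
accuracies = []                 # one accuracy per k, in the same order

for k in neighbors:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    accuracies.append(knn.score(X_test, y_test))  # mean accuracy on held-out data
```

Scoring each k on the test split keeps the sketch short; cross-validation on the training data would avoid choosing k with the same samples used for the final score.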

Plot Results

Visualize accuracy for different neighbor values to select the best k.
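A minimal plotting sketch with Matplotlib. The accuracy values are placeholders chosen for illustration, not results from this project, and the axis labels are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for a window
import matplotlib.pyplot as plt

neighbors = [1, 2, 3, 4, 5]
accuracies = [0.80, 0.78, 0.83, 0.81, 0.79]  # placeholder values, not real results

fig, ax = plt.subplots()
ax.plot(neighbors, accuracies, marker="o")
ax.set_xlabel("Number of neighbors (k)")
ax.set_ylabel("Accuracy")
ax.set_title("kNN accuracy for different k")
ax.set_xticks(neighbors)
# Call plt.show() (interactive backend) or fig.savefig("knn_accuracy.png") to view it.
```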

Output:

Figure 1: kNN accuracy for different numbers of neighbors (k)

Final kNN Model

Train the kNN model with the optimal number of neighbors and make predictions.
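One way to pick the optimal k is to take the value with the highest recorded accuracy and refit, as in this sketch on synthetic stand-in data. The names best_k and knn_pred are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data shaped like the sonar set; swap in the real split.
X, y = make_classification(n_samples=208, n_features=60, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

neighbors = list(range(1, 11))  # assumed range of k values
accuracies = [
    KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_test, y_test)
    for k in neighbors
]

best_k = neighbors[int(np.argmax(accuracies))]  # first k with the highest accuracy
knn = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
knn_pred = knn.predict(X_test)
```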

Logistic Regression

Fit a logistic regression model to the training data.
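A sketch of this step on synthetic stand-in data. Raising max_iter to 1000 is an assumption made to give the solver room to converge; the original settings are not shown.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data shaped like the sonar set; swap in the real split.
X, y = make_classification(n_samples=208, n_features=60, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

log_reg = LogisticRegression(max_iter=1000)  # max_iter is an assumed setting
log_reg.fit(X_train, y_train)
log_reg_pred = log_reg.predict(X_test)
```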

Principal Component Analysis (PCA)

Reduce the feature dimensions using PCA and fit a Logistic Regression model to the reduced features.
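A sketch that chains PCA and Logistic Regression in a scikit-learn pipeline, so the projection is learned from the training data only and then reused on the test data. The number of components (10) is an assumed value, and the data is a synthetic stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Stand-in data shaped like the sonar set; swap in the real split.
X, y = make_classification(n_samples=208, n_features=60, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# n_components=10 is an assumption; tune it for the real data.
pca_model = make_pipeline(PCA(n_components=10), LogisticRegression(max_iter=1000))
pca_model.fit(X_train, y_train)       # PCA is fitted on the training split only
pca_pred = pca_model.predict(X_test)  # test data is projected, then classified
```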

Support Vector Machines (SVM)
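The original gives no details for this step, so the following is only a sketch: it fits scikit-learn's SVC with its default RBF kernel on synthetic stand-in data. The kernel and other hyperparameters actually used are unknown.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data shaped like the sonar set; swap in the real split.
X, y = make_classification(n_samples=208, n_features=60, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

svm = SVC()  # default settings (RBF kernel); an assumption, not the original config
svm.fit(X_train, y_train)
svm_pred = svm.predict(X_test)
```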

Step 4: Model Evaluation

Check the performance of the trained models.

Evaluate kNN

Compute the accuracy of the kNN model on the test set.
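Accuracy is the fraction of test samples predicted correctly. The sketch below uses accuracy_score on toy labels (not the project's data) to show the call; the same call applies to the predictions of any of the models.

```python
from sklearn.metrics import accuracy_score

# Toy labels for illustration: 5 of the 6 predictions match.
y_test = [0, 0, 1, 1, 0, 1]
knn_pred = [0, 0, 1, 0, 0, 1]

accuracy = accuracy_score(y_test, knn_pred)  # correct predictions / total
print("Toy accuracy:", round(accuracy, 4))   # 0.8333
```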

Output:

kNN Accuracy: 0.8809523809523809

Confusion Matrix

Display the confusion matrix to understand prediction results.
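A sketch of confusion_matrix on toy labels (not the project's data). In scikit-learn's convention, rows are the true classes and columns are the predicted classes, so the diagonal counts the correct predictions.

```python
from sklearn.metrics import confusion_matrix

# Toy labels for illustration.
y_test = [0, 0, 1, 1, 0, 1]
knn_pred = [0, 0, 1, 0, 0, 1]

cm = confusion_matrix(y_test, knn_pred)  # rows: true class, columns: predicted class
print(cm.tolist())  # [[3, 0], [1, 2]]
```

Here all three class-0 samples are predicted correctly, while one of the three class-1 samples is predicted as class 0.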

Output:

kNN Confusion Matrix:
[[25 1]
[ 4 12]]

Evaluate Logistic Regression

Compute the accuracy of the logistic regression model.

Output:

Logistic Regression Accuracy: 0.7857142857142857

Confusion Matrix

Show the confusion matrix for logistic regression results.

Output:

Logistic Regression Confusion Matrix:
[[19 7]
[ 2 14]]

Evaluate the PCA-based model

Output:

PCA + Logistic Regression Accuracy: 0.7619047619047619
PCA + Logistic Regression Confusion Matrix:
[[18 8]
[ 2 14]]

Evaluate the SVM model

Output:

SVM Accuracy: 0.8571428571428571
SVM Confusion Matrix:
[[22 4]
[ 2 14]]
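As a consistency check, each reported accuracy can be recomputed from its confusion matrix as the diagonal sum divided by the total number of test samples. The sketch below uses only the matrices printed above.

```python
# Confusion matrices copied from the outputs above.
matrices = {
    "kNN": [[25, 1], [4, 12]],
    "Logistic Regression": [[19, 7], [2, 14]],
    "PCA + Logistic Regression": [[18, 8], [2, 14]],
    "SVM": [[22, 4], [2, 14]],
}

recomputed = {}
for name, matrix in matrices.items():
    correct = matrix[0][0] + matrix[1][1]       # diagonal: correct predictions
    total = sum(sum(row) for row in matrix)     # all test samples
    recomputed[name] = correct / total
    print(f"{name}: {correct}/{total} = {recomputed[name]:.4f}")

# kNN: 37/42 = 0.8810
# Logistic Regression: 33/42 = 0.7857
# PCA + Logistic Regression: 32/42 = 0.7619
# SVM: 36/42 = 0.8571
```

Each matrix sums to 42 test samples, and the recomputed values agree with the accuracies reported above.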

Conclusion

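In this project, the k-Nearest Neighbors (kNN) algorithm achieved the highest test accuracy (about 88.1%), followed by SVM (85.7%), Logistic Regression (78.6%), and PCA + Logistic Regression (76.2%). On this 42-sample test set, kNN was therefore the most suitable of the four models for classifying sonar data into rocks and mines, although its lead over SVM amounts to a single test prediction.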
