The objective of this project is to classify sonar data to differentiate between rocks and mines using machine learning techniques. Sonar data, collected through sound waves, is processed to detect underwater objects. Machine learning models can analyze this data to predict whether an object is a rock or a mine.
The dataset used in this project is sonar.all-data.csv, which contains sonar signal readings collected to distinguish between rocks and mines.
Dataset Link - SonarData
The last column of each row labels the object as a rock (R) or a mine (M). Let's implement this project step by step and classify the signals as rocks or mines.
Understand the structure and characteristics of the dataset.
Import Libraries and Dataset
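A minimal sketch of the loading step. It assumes the CSV has no header row (the preview shows integer column names 0 to 60) and that the file sits in the working directory; `load_sonar` is a hypothetical helper name, not something defined by the original article.

```python
import pandas as pd


def load_sonar(path="sonar.all-data.csv"):
    """Read the sonar CSV into a DataFrame.

    header=None is an assumption: it tells pandas the first line is data,
    so columns are named 0, 1, 2, ... as in the preview.
    """
    return pd.read_csv(path, header=None)
```

Typical usage would be `df = load_sonar()`, after which `df` holds one sonar reading per row.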
Check the Preview of Data
Display the first few rows of the dataset to understand its structure.
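One way to produce this preview, sketched with a hypothetical helper `show_preview` that takes an already loaded DataFrame:

```python
import pandas as pd


def show_preview(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Print and return the first n rows of the DataFrame."""
    head = df.head(n)
    print("First few rows of the dataset:")
    print(head)
    return head
```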
Output:
First few rows of the dataset:
0 1 2 3 4 5 6 7 8 ... 52 53 54 55 56 57 58 59 60
0 0.0200 0.0371 0.0428 0.0207 0.0954 0.0986 0.1539 0.1601 0.3109 ... 0.0065 0.0159 0.0072 0.0167 0.0180 0.0084 0.0090 0.0032 R
1 0.0453 0.0523 0.0843 0.0689 0.1183 0.2583 0.2156 0.3481 0.3337 ... 0.0089 0.0048 0.0094 0.0191 0.0140 0.0049 0.0052 0.0044 R
2 0.0262 0.0582 0.1099 0.1083 0.0974 0.2280 0.2431 0.3771 0.5598 ... 0.0166 0.0095 0.0180 0.0244 0.0316 0.0164 0.0095 0.0078 R
3 0.0100 0.0171 0.0623 0.0205 0.0205 0.0368 0.1098 0.1276 0.0598 ... 0.0036 0.0150 0.0085 0.0073 0.0050 0.0044 0.0040 0.0117 R
4 0.0762 0.0666 0.0481 0.0394 0.0590 0.0649 0.1209 0.2467 0.3564 ... 0.0054 0.0105 0.0110 0.0015 0.0072 0.0048 0.0107 0.0094 R
[5 rows x 61 columns]
Check the Dataset Shape
Check the number of rows and columns to get an overview of the dataset’s size.
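A short sketch of the shape check; `show_shape` is a hypothetical helper name. `DataFrame.shape` is a `(rows, columns)` tuple.

```python
import pandas as pd


def show_shape(df: pd.DataFrame) -> tuple:
    """Print and return the (rows, columns) tuple of the DataFrame."""
    print("Shape of the dataset:", df.shape)
    return df.shape
```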
Output:
Shape of the dataset: (208, 61)
View the Statistical Summary
Use describe() to obtain statistical metrics like mean, standard deviation, and percentiles.
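A sketch of the summary step, using a hypothetical helper `summarize`. With default arguments, `describe()` on a mixed DataFrame reports only the numeric columns, which is why the label column (60) does not appear in the summary.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return count, mean, std, min, quartiles and max for numeric columns."""
    return df.describe()
```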
Output:
0 1 2 3 4 5 ... 54 55 56 57 58 59
count 208.000000 208.000000 208.000000 208.000000 208.000000 208.000000 ... 208.000000 208.000000 208.000000 208.000000 208.000000 208.000000
mean 0.029164 0.038437 0.043832 0.053892 0.075202 0.104570 ... 0.009290 0.008222 0.007820 0.007949 0.007941 0.006507
std 0.022991 0.032960 0.038428 0.046528 0.055552 0.059105 ... 0.007088 0.005736 0.005785 0.006470 0.006181 0.005031
min 0.001500 0.000600 0.001500 0.005800 0.006700 0.010200 ... 0.000600 0.000400 0.000300 0.000300 0.000100 0.000600
25% 0.013350 0.016450 0.018950 0.024375 0.038050 0.067025 ... 0.004150 0.004400 0.003700 0.003600 0.003675 0.003100
50% 0.022800 0.030800 0.034300 0.044050 0.062500 0.092150 ... 0.007500 0.006850 0.005950 0.005800 0.006400 0.005300
75% 0.035550 0.047950 0.057950 0.064500 0.100275 0.134125 ... 0.012100 0.010575 0.010425 0.010350 0.010325 0.008525
max 0.137100 0.233900 0.305900 0.426400 0.401000 0.382300 ... 0.044700 0.039400 0.035500 0.044000 0.036400 0.043900
Prepare the dataset for machine learning models.
Separate Features and Target
The last column is the target variable (rock or mine), and the rest are features.
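This separation can be sketched with positional indexing; `split_features_target` is a hypothetical helper name.

```python
import pandas as pd


def split_features_target(df: pd.DataFrame):
    """Split a DataFrame into features (all but last column) and target (last column)."""
    X = df.iloc[:, :-1]  # every column except the last
    y = df.iloc[:, -1]   # the last column holds the R/M label
    return X, y
```

On the sonar data this would give `X` with 60 columns and `y` with the R/M labels.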
Encode Labels
Convert categorical labels ('R' for rock and 'M' for mine) into numerical format.
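A minimal sketch of the encoding step, assuming scikit-learn's `LabelEncoder` is used (the original does not say which encoder). `LabelEncoder` sorts class names, so under this assumption M maps to 0 and R maps to 1; `encode_labels` is a hypothetical helper name.

```python
from sklearn.preprocessing import LabelEncoder


def encode_labels(y):
    """Encode string labels as integers; returns the codes and the fitted encoder."""
    encoder = LabelEncoder()
    y_encoded = encoder.fit_transform(y)  # classes are sorted: 'M' -> 0, 'R' -> 1
    return y_encoded, encoder
```

Keeping the fitted encoder around allows predictions to be mapped back to letters later with `encoder.inverse_transform`.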
Split Data
Divide the dataset into training and testing sets to evaluate model performance.
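A sketch of the split, using a hypothetical helper `split_data`. The 20% test fraction is an assumption, chosen because it yields 42 test rows out of 208, which matches the totals of the confusion matrices shown later; the `random_state` value is also an assumption and only makes the split repeatable.

```python
from sklearn.model_selection import train_test_split


def split_data(X, y, test_size=0.2, random_state=42):
    """Return X_train, X_test, y_train, y_test.

    test_size and random_state are assumed values, not taken from the article.
    """
    return train_test_split(X, y, test_size=test_size, random_state=random_state)
```

Passing `stratify=y` to `train_test_split` would additionally keep the rock/mine ratio the same in both parts, which can help on a dataset this small.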
Train and evaluate machine learning models to classify sonar data.
Train kNN Model
Fit kNN models with different neighbor values and record accuracy.
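The search over neighbor values might look like the following sketch. The range of k (1 to 20) and the helper name `knn_accuracy_by_k` are assumptions.

```python
from sklearn.neighbors import KNeighborsClassifier


def knn_accuracy_by_k(X_train, y_train, X_test, y_test, k_values=range(1, 21)):
    """Fit one kNN model per k and return a dict mapping k to test accuracy."""
    scores = {}
    for k in k_values:
        model = KNeighborsClassifier(n_neighbors=k)
        model.fit(X_train, y_train)
        scores[k] = model.score(X_test, y_test)  # mean accuracy on the held-out data
    return scores
```

Scoring on the test set mirrors the description above; choosing k on a separate validation split or with cross-validation would give a less optimistic estimate.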
Plot Results
Visualize accuracy for different neighbor values to select the best k.
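A possible plotting sketch with matplotlib. It assumes the accuracies are held in a dict mapping k to accuracy; `plot_knn_scores` is a hypothetical helper, and the axis labels and title are illustrative.

```python
import matplotlib.pyplot as plt


def plot_knn_scores(scores):
    """Plot accuracy against k and return the figure and the best k.

    On ties, the smallest k with the highest accuracy is returned.
    """
    ks = sorted(scores)
    fig, ax = plt.subplots()
    ax.plot(ks, [scores[k] for k in ks], marker="o")
    ax.set_xlabel("Number of neighbors (k)")
    ax.set_ylabel("Accuracy")
    ax.set_title("kNN accuracy by k")
    best_k = max(ks, key=lambda k: scores[k])
    return fig, best_k
```

Calling `plt.show()` after the function displays the figure in an interactive session.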
Output: a line plot of accuracy against the number of neighbors (k).
Final kNN Model
Train the kNN model with the optimal number of neighbors and make predictions.
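A sketch of the final fit, with the chosen number of neighbors passed in as a parameter because the original does not state which value was selected. `fit_final_knn` is a hypothetical helper name.

```python
from sklearn.neighbors import KNeighborsClassifier


def fit_final_knn(X_train, y_train, X_test, n_neighbors):
    """Fit kNN with the chosen k and return the model and its test predictions."""
    model = KNeighborsClassifier(n_neighbors=n_neighbors)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return model, y_pred
```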
Logistic Regression
Fit a logistic regression model to the training data.
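A minimal sketch of this step. Raising `max_iter` to 1000 is an assumption made to reduce the chance of convergence warnings; the original settings are not given, and `fit_logreg` is a hypothetical helper name.

```python
from sklearn.linear_model import LogisticRegression


def fit_logreg(X_train, y_train, max_iter=1000):
    """Fit a logistic regression classifier on the training data."""
    model = LogisticRegression(max_iter=max_iter)  # max_iter is an assumed setting
    model.fit(X_train, y_train)
    return model
```

Predictions for the test set would then come from `model.predict(X_test)`.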
Principal Component Analysis (PCA)
Reduce the feature dimensions using PCA and fit a Logistic Regression model to the reduced features.
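One way to sketch this is a scikit-learn pipeline, which fits PCA on the training data only and applies the same projection at prediction time. The number of components (10 here) is an assumption, since the original does not state it; `fit_pca_logreg` is a hypothetical helper name.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def fit_pca_logreg(X_train, y_train, n_components=10):
    """Project features onto n_components principal components, then fit
    logistic regression on the reduced features."""
    model = make_pipeline(
        PCA(n_components=n_components),      # assumed component count
        LogisticRegression(max_iter=1000),   # assumed iteration cap
    )
    model.fit(X_train, y_train)
    return model
```

Because the pipeline bundles both steps, `model.predict(X_test)` applies the learned projection before classifying, so the test data never influences the components.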
Support Vector Machines (SVM)
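A minimal sketch of fitting a support vector classifier. The kernel is not stated in the original, so this assumes scikit-learn's default RBF kernel; `fit_svm` is a hypothetical helper name.

```python
from sklearn.svm import SVC


def fit_svm(X_train, y_train, kernel="rbf"):
    """Fit a support vector classifier; the RBF kernel is an assumed choice."""
    model = SVC(kernel=kernel)
    model.fit(X_train, y_train)
    return model
```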
Check the performance of the trained models.
Evaluate kNN
Compute the accuracy of the kNN model on the test set.
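Accuracy is the fraction of test samples predicted correctly. A sketch using scikit-learn's `accuracy_score`, with a hypothetical helper `report_accuracy` that works for any of the models:

```python
from sklearn.metrics import accuracy_score


def report_accuracy(name, y_true, y_pred):
    """Print and return the share of predictions that match the true labels."""
    acc = accuracy_score(y_true, y_pred)
    print(f"{name} Accuracy: {acc}")
    return acc
```

For the kNN model the call would look like `report_accuracy("kNN", y_test, y_pred)`, assuming those variable names for the test labels and predictions.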
Output:
kNN Accuracy: 0.8809523809523809
Confusion Matrix
Display the confusion matrix to understand prediction results.
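A sketch using scikit-learn's `confusion_matrix`, where rows are true classes and columns are predicted classes. The second helper recovers accuracy from a matrix as correct predictions (the diagonal) divided by all predictions; both helper names are hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def report_confusion(name, y_true, y_pred):
    """Print and return the confusion matrix (rows: true, columns: predicted)."""
    cm = confusion_matrix(y_true, y_pred)
    print(f"{name} Confusion Matrix:")
    print(cm)
    return cm


def accuracy_from_confusion(cm):
    """Accuracy equals the diagonal sum divided by the total count."""
    cm = np.asarray(cm)
    return np.trace(cm) / cm.sum()
```

As a consistency check, a matrix of `[[25, 1], [4, 12]]` has 37 correct predictions out of 42, which agrees with an accuracy of about 0.881.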
Output:
kNN Confusion Matrix:
[[25 1]
[ 4 12]]
Evaluate Logistic Regression
Compute the accuracy of the logistic regression model.
Output:
Logistic Regression Accuracy: 0.7857142857142857
Confusion Matrix
Show the confusion matrix for logistic regression results.
Output:
Logistic Regression Confusion Matrix:
[[19 7]
[ 2 14]]
Evaluate the PCA-based model
Output:
PCA + Logistic Regression Accuracy: 0.7619047619047619
PCA + Logistic Regression Confusion Matrix:
[[18 8]
[ 2 14]]
Evaluate the SVM model
Output:
SVM Accuracy: 0.8571428571428571
SVM Confusion Matrix:
[[22 4]
[ 2 14]]
In this project, the k-Nearest Neighbors (kNN) model achieved the highest test accuracy (about 88.1%), followed by SVM (85.7%), Logistic Regression (78.6%), and PCA with Logistic Regression (76.2%). On this 42-sample test set, kNN was therefore the most suitable of the models tried for classifying sonar data into rocks and mines.