VOOZH about

URL: https://www.geeksforgeeks.org/machine-learning/alzheimers-detection-using-machine-learning/

⇱ Alzheimer's detection Using Machine Learning - GeeksforGeeks


  • Courses
  • Tutorials
  • Interview Prep

Alzheimer's detection Using Machine Learning

Last Updated : 23 Jul, 2025

Alzheimer's disease is a neurodegenerative disorder characterized by progressive cognitive decline and memory loss. Detecting Alzheimer's disease early is crucial for timely intervention and treatment. While I can provide some general information on Alzheimer's detection, it's important to consult medical professionals for accurate diagnosis and advice.

Alzheimer's detection

Machine learning has shown promise in aiding the detection of Alzheimer's disease. In this article, we will apply RandomForestClassifier and LocalOutlierFactor to detect Alzheimer's disease.

Steps:

  • Launch the Jupyter Notebook or Colab
  • Download the dataset from the given link.
  • Import required libraries.
  • Preprocess the dataset.
  • Perform Outlier detection.
  • Train and Test model.
  • Evaluate the model.
  • You can use visualization techniques (option - if needed)

COL

FULL-FORMS

Group

Nondemented or Demented

EDUC

Years of education

SES

Socioeconomic Status

MMSE

Mini-Mental State Examination

CDR

Clinical Dementia Rating

eTIV

Estimated Total Intracranial Volume

nWBV

Normalize Whole Brain Volume

ASF

Atlas Scaling Factor

The dataset contains 373 rows with 14 fields which contain Group fields with classes Demented and Non-Demented where we need to work on them.

Import required libraries:

We have imported the required packages.

Load the dataset

Output:

 Subject ID MRI ID Group Visit MR Delay M/F Hand Age EDUC \
0 OAS2_0001 OAS2_0001_MR1 Nondemented 1 0 M R 87 14
1 OAS2_0001 OAS2_0001_MR2 Nondemented 2 457 M R 88 14
2 OAS2_0002 OAS2_0002_MR1 Demented 1 0 M R 75 12
3 OAS2_0002 OAS2_0002_MR2 Demented 2 560 M R 76 12
4 OAS2_0002 OAS2_0002_MR3 Demented 3 1895 M R 80 12
SES MMSE CDR eTIV nWBV ASF
0 2.0 27.0 0.0 1987 0.696 0.883
1 2.0 30.0 0.0 2004 0.681 0.876
2 NaN 23.0 0.5 1678 0.736 1.046
3 NaN 28.0 0.5 1738 0.713 1.010
4 NaN 22.0 0.5 1698 0.701 1.034

Check the data info

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 373 entries, 0 to 372
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Subject ID 373 non-null object
1 MRI ID 373 non-null object
2 Group 373 non-null object
3 Visit 373 non-null int64
4 MR Delay 373 non-null int64
5 M/F 373 non-null object
6 Hand 373 non-null object
7 Age 373 non-null int64
8 EDUC 373 non-null int64
9 SES 354 non-null float64
10 MMSE 371 non-null float64
11 CDR 373 non-null float64
12 eTIV 373 non-null int64
13 nWBV 373 non-null float64
14 ASF 373 non-null float64
dtypes: float64(5), int64(5), object(5)
memory usage: 43.8+ KB

Preprocessing the dataset

Dropping irrelevant columns

Output:

(373, 10)

Check Group Features

Output:

Group
Nondemented 190
Demented 146
Converted 37
Name: count, dtype: int64

Replace the "Converted" with "Demented"

Replace Categorical String Data with Numerical Data

  • df['M/F'] = df['M/F'].replace(['F', 'M'], [0, 1]): Replaces the values in the 'M/F' (Male/Female) column, 'F' with 0 and 'M' with 1. This helps to convert categorical values into binary format.
  • df['Group'] = df['Group'].replace(['Demented', 'Nondemented'], [1, 0]): Replaces the values 'Demented' with 1 and 'Nondemented' with 0 in the 'Group' column for binary classification.

Check for Null Values

Output:

Group 0
M/F 0
Age 0
EDUC 0
SES 19
MMSE 2
CDR 0
eTIV 0
nWBV 0
ASF 0
dtype: int64

Imputating the missing values

As we can see that only the SES column has null values. Because SES denotes ranks of Socioeconomic Status, It is a categorical feature. So, We can replace the null values of SES with its mode.

Fill the MMSE null values with its Median

Separate Input Features and Target Variables Group

Y represents the target variable - labels as a Numpy array and X represents the list of selected columns as a matrix which is used for the classification task.

Numerical and Categorical Features

Treating Outliers

outlier_detector = LocalOutlierFactor(contamination=0.05): Creating an instance of the Local Outlier Factor (LOF) algorithm for outlier detection. The contamination parameter is set to 0.05, which means we are expecting 5% outliers.

outliers = outlier_detector.fit_predict(X): This step calculates the outlier scores for each sample. This also gives and assigns a label of -1 for outliers and 1 for inliers.

Standardize and One Hot Encoding

The next two steps represent that we are selecting only inliers for further processing. This step of code creates transformers for categorical numerical and numerical features and helps in training the model consistently.

  • MinMaxScalar helps to scale numerical features.
  • OneHotEncoder is used to encode categorical features.
  • These transformers are merged using a ColumnTransformer.

Visualizing dataset target class details:

Output:

👁 Target class distribution-Geeksforgeeks
Target class distribution

Splitting data for training and testing:

By default, 75% for training and 25% for testing.

Build the Pipeline

Building a pipeline for usage with Random Forest classifier and Preprocessor, then fit the model.

Predictions:

Evaluating the model:

Output:

Accuracy: 0.8651685393258427
Precision: 0.825
Recall: 0.868421052631579
F1 Score: 0.8461538461538461

Confusion Matrix:

Output:

👁 Confusion Matrix-Geeksforgeeks
Confusion Matrix
Comment