![]() |
VOOZH | about |
Linear Discriminant Analysis (LDA) also known as Normal Discriminant Analysis is supervised classification problem that helps separate two or more classes by converting higher-dimensional data space into a lower-dimensional space. It is used to identify a linear combination of features that best separates classes within a dataset.
For example we have two classes that need to be separated efficiently. Each class may have multiple features and using a single feature to classify them may result in overlapping. To solve this LDA is used as it uses multiple features to improve classification accuracy. LDA works by some assumptions and we are required to understand them so that we have a better understanding of its working.
For LDA to perform effectively, certain assumptions are made:
LDA can produce very good results if it meets these assumptions. For example when data points belonging to two classes are plotted, if they are not linearly separable LDA will attempt to find a projection that maximizes class separability.
The image shows classes (black and green) that are not linearly separable. LDA finds a new axis (red dashed line) that maximizes the distance between class means while minimizing within-class variance, improving class separation for better classification.
The perpendicular distance from the decision boundary to the data points shows how LDA reduces within-class variation and increases class separability. The data points are then projected onto the new axis, as shown in the figure below.
This shows how LDA creates a new axis to project the data and separate two classes along a linear path. However, when class distributions share the same mean, LDA cannot find a separating axis and non-linear discriminant analysis is needed.
LDA works by finding directions in the feature space that best separate the classes. It does this by maximizing the difference between the class means while minimizing the spread within each class.
Letβs assume we have two classes with d-dimensional samples such as where:
If β represents a data point its projection onto the line represented by the unit vector is β. Let the means of class and class β before projection be β and respectively. After projection the new means are and β.
Our aim to normalize the difference to maximize the class separation. The scatter for samples of class β is calculated as:
Similarly for class β:
The goal is to maximize the ratio of the between-class scatter to the within-class scatter, which leads us to the following criteria:
For the best separation we calculate the eigenvector corresponding to the highest eigenvalue of the scatter matrices .
We will perform linear discriminant analysis using Scikit-learn library on the Iris dataset.
We import all necessary libraries for data processing, visualization, dimensionality reduction, and modeling
We load the Iris dataset and convert it into a Pandas DataFrame. Features X and target labels y are separated
We scale the features and encode the target labels, then split the dataset into training and testing sets
We create a 3D scatter plot using the first three features to visualize the original data distribution
Output:
We train a Random Forest on the original features (first two features only for visualization) and plot the decision boundary
Output:
We reduce the feature space to 2 components to maximize class separability
We train a Random Forest on LDA-transformed features and plot its decision boundary
Output: