Manifold learning is a dimensionality reduction technique that turns complex, high-dimensional data into a simpler form while keeping important patterns and features. It works well when the data has curved or non-linear shapes that linear methods like PCA can't handle. It has several features like:
Scikit-learn provides several manifold learning algorithms. We will use the digits dataset from Scikit-learn, which contains images of the digits 0 to 9. Each image is 8×8 pixels, so every sample has 64 features. It consists of various steps:
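As a starting point, the dataset described above can be loaded as shown in this minimal sketch. The variable names `digits`, `X` and `y` are illustrative choices, not taken from the original article.

```python
from sklearn.datasets import load_digits

# Load the handwritten digits dataset bundled with scikit-learn
digits = load_digits()
X = digits.data      # flattened images, one row per sample
y = digits.target    # digit labels from 0 to 9

print(X.shape)                 # (1797, 64): 1797 samples, 64 pixel features
print(digits.images[0].shape)  # (8, 8): the original image grid
```

Each row of `X` is one 8×8 image flattened into a 64-value vector, and this is the high-dimensional input that the manifold learning methods reduce to two dimensions.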
t-SNE (t-distributed Stochastic Neighbor Embedding) is an effective method for visualizing high-dimensional data by reducing it to a 2D or 3D representation. It converts pairwise similarities between data points into probability distributions and tries to minimize the Kullback-Leibler divergence between the distribution in the high-dimensional space and the one in the low-dimensional space. The result is a 2D or 3D visualization that retains much of the local structure of the data.
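The idea can be tried on the digits data with scikit-learn's `TSNE` class. This is a minimal sketch, not the article's original code: the 500-sample subset, `perplexity=30` and `random_state=42` are assumptions chosen to keep the run short and repeatable.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

# Assumption: use only the first 500 samples so the example runs quickly
X_small, y_small = X[:500], y[:500]

# Reduce the 64 pixel features to 2 dimensions
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X_small)

print(X_tsne.shape)         # (500, 2)
print(tsne.kl_divergence_)  # final value of the objective being minimized
```

The rows of `X_tsne` can be passed to any scatter-plot routine and colored by `y_small` to see how the digit classes group together. Note that `TSNE` has no separate `transform` method, so new points cannot be projected into an existing embedding.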
Output:
Interpretation of clusters:
Isomap is a dimensionality reduction approach based on the idea of geodesic distance, that is, the distance measured along the manifold rather than in a straight line through the surrounding space. When mapping data points from a higher-dimensional space to a lower-dimensional space, it attempts to retain the geodesic distances between them.
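A minimal sketch of Isomap on the digits data follows. The 500-sample subset and `n_neighbors=10` are illustrative assumptions; the neighbor count controls the graph used to approximate geodesic distances and usually needs tuning.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap

X, y = load_digits(return_X_y=True)

# Assumption: a 500-sample subset keeps the shortest-path computation fast
X_small, y_small = X[:500], y[:500]

# Build a 10-nearest-neighbor graph, estimate geodesic distances along it,
# then embed the points in 2 dimensions
iso = Isomap(n_neighbors=10, n_components=2)
X_iso = iso.fit_transform(X_small)

print(X_iso.shape)             # (500, 2)
print(iso.dist_matrix_.shape)  # (500, 500) matrix of estimated geodesic distances
```

Unlike t-SNE, a fitted `Isomap` object also offers `transform`, so unseen samples can be mapped into the same embedding.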
Output:
Interpretation of clusters:
Locally Linear Embedding (LLE) is a dimensionality reduction method that seeks to preserve the local structure of the data. It works by attempting to map each point to a lower-dimensional space while maintaining its local neighborhood relationships.
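The sketch below applies scikit-learn's `LocallyLinearEmbedding` to the digits data. The subset size, `n_neighbors=10` and `eigen_solver="dense"` are assumptions made for this example; the dense solver is chosen only because it is robust on a small sample and is not a general recommendation.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import LocallyLinearEmbedding

X, y = load_digits(return_X_y=True)

# Assumption: a 500-sample subset, small enough for the dense eigen-solver
X_small, y_small = X[:500], y[:500]

# Each point is reconstructed from its 10 nearest neighbors; the embedding
# keeps those reconstruction weights as intact as possible in 2 dimensions
lle = LocallyLinearEmbedding(
    n_neighbors=10,
    n_components=2,
    eigen_solver="dense",
    random_state=42,
)
X_lle = lle.fit_transform(X_small)

print(X_lle.shape)                # (500, 2)
print(lle.reconstruction_error_)  # error associated with the embedding
```

The `method` parameter of the same class selects variants such as `"modified"` or `"hessian"`, which can behave better when the standard algorithm produces a distorted embedding.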
Output:
Interpretation of clusters:
Multi-Dimensional Scaling (MDS) is a dimensionality reduction method that attempts to preserve pairwise distances between points while projecting them into a lower-dimensional space. It is particularly useful when you want to retain the pairwise relationships between data points.
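A minimal sketch of metric MDS on the digits data is shown here. The 300-sample subset, `n_init=1` and `max_iter=200` are assumptions chosen to keep the iterative optimization short, since its cost grows quickly with the number of samples.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import MDS

X, y = load_digits(return_X_y=True)

# Assumption: a 300-sample subset, because MDS works on all pairwise distances
X_small, y_small = X[:300], y[:300]

# Find 2D coordinates whose pairwise distances match the original ones
# as closely as possible
mds = MDS(n_components=2, n_init=1, max_iter=200, random_state=42)
X_mds = mds.fit_transform(X_small)

print(X_mds.shape)  # (300, 2)
print(mds.stress_)  # remaining mismatch between original and embedded distances
```

A lower `stress_` value means the embedded distances match the original distances more closely, which makes it a convenient number for comparing runs with different settings.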
Output:
Interpretation of clusters:
Here is a quick comparison of the methods covered above.
| Method | Strengths | Weaknesses | Ideal Use Cases |
|---|---|---|---|
| t-SNE | Excellent for visualization of complex, high-dimensional data. Preserves local structures well. | Computationally expensive, can be slow with large datasets, lacks interpretability. | Visualizing high-dimensional data like image or text datasets. |
| Isomap | Retains geodesic distances, works well with smooth, non-linear manifolds. | Can be slow for large datasets, sensitive to noise. | Data with manifold-like geometry, such as speech data or certain physical phenomena. |
| LLE | Preserves local neighborhood relationships, good for non-linear data. | Struggles with high curvature or data that doesn't lie on a smooth manifold. | Non-linear data where local relationships are more important than global structure. |
| MDS | Preserves pairwise distances, useful for metric data. | Less effective for non-linear data, computationally intensive. | Metric data where preserving distances between points is crucial. |
Manifold learning methods like t-SNE, Isomap, LLE and MDS are tools for reducing the dimensionality of high-dimensional data, especially when dealing with non-linear structures. Each method has its strengths and weaknesses, and choosing the right technique depends on the characteristics of the data and the specific analysis goals.