Latent Semantic Analysis (LSA) is a technique used to find hidden relationships and meanings in text data. It analyzes how words appear in different documents and helps computers understand the context of words instead of only matching exact words.
Converts text into a document-term matrix.
Uses mathematical methods to reduce unnecessary data.
Finds relationships between similar words and documents.
LSA first builds a Document-Term Matrix (DTM) that records how often each word occurs in each document. It then applies Singular Value Decomposition (SVD) to reduce the number of dimensions, discard less important information and expose hidden relationships between words and documents that tend to occur in similar contexts.
1. Document-Term Matrix
The first step in LSA is creating a Document-Term Matrix (DTM).
Each row represents a document and each column represents a word (term), so each cell shows how many times that word appears in that document.
Sometimes, TF-IDF scores are used instead of normal word counts to give more importance to meaningful and less common words.
This matrix helps identify patterns and relationships between words and documents.
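The construction described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production pipeline: the three-sentence corpus is invented, tokenization is a plain whitespace split, and the TF-IDF weighting uses raw count times log(N / df), which is only one of several common variants (libraries often apply smoothing or normalization on top).

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

# Tokenize by lowercasing and splitting on whitespace (a deliberate simplification).
tokenized = [doc.lower().split() for doc in docs]

# Vocabulary: every distinct term, sorted so the column order is reproducible.
vocab = sorted({term for tokens in tokenized for term in tokens})

# Document-term matrix: one row per document, one column per term.
dtm = []
for tokens in tokenized:
    counts = Counter(tokens)
    dtm.append([counts[term] for term in vocab])

# Optional TF-IDF weighting (assumed variant: count * log(N / df)).
n_docs = len(dtm)
doc_freq = [sum(1 for row in dtm if row[j] > 0) for j in range(len(vocab))]
tfidf = [
    [row[j] * math.log(n_docs / doc_freq[j]) for j in range(len(vocab))]
    for row in dtm
]
```

In the first document, "the" has the highest raw count, but after weighting it scores lower than "cat", because "the" also appears in another document while "cat" appears in only one.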
2. Dimensionality Reduction
A Document-Term Matrix is usually very large and sparse, because each document contains only a small fraction of the full vocabulary. To simplify it, Latent Semantic Analysis uses Singular Value Decomposition (SVD), which factorizes the matrix into three matrices and keeps only the components with the largest singular values.
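As a rough sketch of this step, the snippet below applies NumPy's `numpy.linalg.svd` to a small count matrix and keeps the top `k` components. The matrix, its column labels and the choice `k = 2` are all assumptions made for illustration. On a real corpus a sparse truncated SVD routine would normally be used instead of a full decomposition.

```python
import numpy as np

# Hypothetical 4-document x 5-term count matrix, invented for illustration.
# Columns (assumed labels): cat, kitten, pet, engine, car
A = np.array([
    [2, 1, 1, 0, 0],
    [1, 2, 1, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 0, 1, 2],
], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt, with singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # number of latent dimensions to keep (an arbitrary choice here)

# Keep only the top-k components.
doc_vectors = U[:, :k] * s[:k]   # each row: one document in the latent space
term_vectors = Vt[:k, :].T       # each row: one term in the same space

# Rank-k approximation of the original matrix.
A_k = doc_vectors @ Vt[:k, :]

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_same_topic = cosine(doc_vectors[0], doc_vectors[1])
sim_other_topic = cosine(doc_vectors[0], doc_vectors[2])
```

In this toy matrix the first two documents share one set of terms and the last two share another, so `sim_same_topic` comes out close to 1 while `sim_other_topic` is close to 0. The same `cosine` helper can be applied to rows of `term_vectors` to compare words.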