Probabilistic Matrix Factorization (PMF) is a collaborative filtering technique used in recommendation systems. Unlike traditional matrix factorization, PMF incorporates probability theory to model uncertainty in user-item interactions. This makes it highly effective for sparse datasets, where only a fraction of the ratings are observed. PMF extends MF by introducing a probabilistic model:
- Latent factors are assumed to follow Gaussian distributions.
- Observed ratings are modeled as Gaussian distributions around the predicted dot product.
- Noise & uncertainty in data are naturally captured.
Matrix Factorization
Matrix Factorization (MF) is a collaborative filtering approach that predicts missing entries in a user-item interaction matrix (R) by decomposing it into two smaller matrices.
- : user latent factor matrix.
- : item latent factor matrix.
- Each user and item is represented as a vector in a latent space.
- The dot product predicts the preference of user for item .
Key Concepts
1. Latent Factors
Latent factors represent hidden features of users and items.
- Each user is represented by a vector .
- Each item is represented by a vector .
- Their dot product gives the predicted rating.
Where,
- : number of latent dimensions (e.g., genre preferences for movies).
- : hidden representation of user i.
- : hidden representation of item j.
- : predicted rating.
2. Gaussian Priors
PMF assumes that latent factors are drawn from Gaussian (normal) distributions.
Where,
- : user latent factor matrix.
- : item latent factor matrix.
- : variance terms controlling spread.
- This prior ensures regularization, keeping factor values from becoming too large.
3. Observed Ratings
Each rating is modeled as a Gaussian centered on the dot product of user and item vectors.
Where,
- : observed rating given by user i to item j.
- : expected rating (mean of distribution).
- : variance (captures noise in user preferences).
- This means ratings are not exact but have uncertainty.
4. Objective Function
The goal is to find latent factors that maximize likelihood (fit the observed ratings) while avoiding overfitting.
Where,
- First term: squared error between actual and predicted ratings.
- Second term: : regularization for users.
- Third term: : regularization for items.
- : control the strength of regularization.
- Minimizing LLL balances accuracy (fit to data) and simplicity (avoid large values).
Implementation
We will implement PMF using Stochastic Gradient Descent (SGD) in NumPy.
Step 1: Import Library
We will import the necessary library such as NumPy.
Step 2: Define PMF Class
We will define the PMF model with parameters for ratings, latent dimensions, learning rate, regularization and training epochs.
- Initializes latent factors for users and items.
- Uses Stochastic Gradient Descent to update latent factors based on observed ratings.
- Computes Mean Squared Error (MSE) at the end of each epoch to track training progress.
- predict: estimates a single rating using the dot product of user and item vectors.
- compute_mse: calculates error only on observed entries.
- full_matrix: reconstructs the entire rating matrix with predicted values.
Step 3: Generate Synthetic Data and Train Model
We will
- Creates synthetic latent factors for users and items to simulate a rating dataset.
- Adds Gaussian noise to make the data realistic.
- Masks some entries to simulate missing values.
- Trains the PMF model with Stochastic Gradient Descent.
- Reconstructs the full rating matrix, filling in missing values with predictions.
Output:
Applications
- Movie Recommendations: Predict user ratings for unseen movies (e.g., Netflix, Movielens).
- E-commerce: Suggest products based on previous purchase history.
- Music Streaming: Recommend songs/playlists (Spotify, Last.fm).
- Social Media: Suggest friends, groups or content.
- Online Learning: Recommend courses/resources tailored to learner profiles.
Limitations
- Assumes linear interactions between latent factors.
- Computationally expensive for large datasets.
- Struggles with cold-start problem (new users/items with no history).
- Hyperparameter tuning (learning rate, regularization, factors) can be tricky.
- May converge to local minima if poorly initialized.