Probabilistic Matrix Factorization

Last Updated : 30 Sep, 2025

Probabilistic Matrix Factorization (PMF) is a collaborative filtering technique used in recommendation systems. Unlike traditional matrix factorization, PMF incorporates probability theory to model uncertainty in user-item interactions. This makes it highly effective for sparse datasets, where only a fraction of the ratings are observed. PMF extends MF by introducing a probabilistic model:

Latent factors are assumed to follow Gaussian distributions.
Observed ratings are modeled as Gaussian distributions around the predicted dot product.
Noise & uncertainty in data are naturally captured.

Matrix Factorization

Matrix Factorization (MF) is a collaborative filtering approach that predicts missing entries in a user-item interaction matrix (R) by decomposing it into two smaller matrices.

: user latent factor matrix.
: item latent factor matrix.
Each user and item is represented as a vector in a latent space.
The dot product predicts the preference of user for item .

Key Concepts

1. Latent Factors

Latent factors represent hidden features of users and items.

Each user is represented by a vector .
Each item is represented by a vector .
Their dot product gives the predicted rating.

Where,

: number of latent dimensions (e.g., genre preferences for movies).
: hidden representation of user i.
: hidden representation of item j.
: predicted rating.

2. Gaussian Priors

PMF assumes that latent factors are drawn from Gaussian (normal) distributions.

Where,

: user latent factor matrix.
: item latent factor matrix.
: variance terms controlling spread.
This prior ensures regularization, keeping factor values from becoming too large.

3. Observed Ratings

Each rating is modeled as a Gaussian centered on the dot product of user and item vectors.

Where,

: observed rating given by user i to item j.
: expected rating (mean of distribution).
: variance (captures noise in user preferences).
This means ratings are not exact but have uncertainty.

4. Objective Function

The goal is to find latent factors that maximize likelihood (fit the observed ratings) while avoiding overfitting.

Where,

First term: squared error between actual and predicted ratings.
Second term: : regularization for users.
Third term: : regularization for items.
: control the strength of regularization.
Minimizing LLL balances accuracy (fit to data) and simplicity (avoid large values).

Implementation

We will implement PMF using Stochastic Gradient Descent (SGD) in NumPy.

Step 1: Import Library

We will import the necessary library such as NumPy.

Step 2: Define PMF Class

We will define the PMF model with parameters for ratings, latent dimensions, learning rate, regularization and training epochs.

Initializes latent factors for users and items.
Uses Stochastic Gradient Descent to update latent factors based on observed ratings.
Computes Mean Squared Error (MSE) at the end of each epoch to track training progress.

predict: estimates a single rating using the dot product of user and item vectors.
compute_mse: calculates error only on observed entries.
full_matrix: reconstructs the entire rating matrix with predicted values.

Step 3: Generate Synthetic Data and Train Model

We will

Creates synthetic latent factors for users and items to simulate a rating dataset.
Adds Gaussian noise to make the data realistic.
Masks some entries to simulate missing values.
Trains the PMF model with Stochastic Gradient Descent.
Reconstructs the full rating matrix, filling in missing values with predictions.

Output:

Applications

Movie Recommendations: Predict user ratings for unseen movies (e.g., Netflix, Movielens).
E-commerce: Suggest products based on previous purchase history.
Music Streaming: Recommend songs/playlists (Spotify, Last.fm).
Social Media: Suggest friends, groups or content.
Online Learning: Recommend courses/resources tailored to learner profiles.

Limitations

Assumes linear interactions between latent factors.
Computationally expensive for large datasets.
Struggles with cold-start problem (new users/items with no history).
Hyperparameter tuning (learning rate, regularization, factors) can be tricky.
May converge to local minima if poorly initialized.

Comment

Article Tags:

Explore

Machine Learning Basics

Python for Machine Learning

Feature Engineering

Supervised Learning

Unsupervised Learning

Model Evaluation and Tuning

Advanced Techniques

Machine Learning Practice

Courses

URL: https://www.geeksforgeeks.org/machine-learning/probabilistic-matrix-factorization/