Likelihood Function

Last Updated : 23 Jul, 2025

The likelihood function is an important concept in statistics and machine learning and forms the basis of many key methods such as maximum likelihood estimation (MLE), Bayesian inference and model selection techniques like AIC and BIC. While it is often confused with probability, the likelihood function has a distinct interpretation and serves a unique role in statistical modeling.

Probability vs. Likelihood

Before defining the likelihood function, it is important to differentiate between probability and likelihood:

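Probability treats the parameters as fixed and describes how probable different data outcomes are, i.e. $P(X \mid \theta)$.
Likelihood treats the observed data as fixed and the parameters as variable, i.e. it measures $L(\theta \mid X)$, how well each parameter value explains the data.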

Mathematical Definition of the Likelihood Function

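Let $X_1, X_2, \dots, X_n$ be a random sample from a probability distribution with probability density function (pdf) or probability mass function (pmf) $f(x; \theta)$, where θ is an unknown parameter or vector of parameters.

Given a realization $x_1, x_2, \dots, x_n$, the likelihood function is defined as:

$$L(\theta) = L(\theta \mid x_1, \dots, x_n) = \prod_{i=1}^{n} f(x_i; \theta)$$

Alternatively, the log-likelihood function is often used for computational convenience, because it turns the product into a sum and avoids numerical underflow when many small values are multiplied:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta)$$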
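The computational point can be seen in a few lines of Python. This is a minimal sketch assuming a hypothetical sample of 2000 fair-coin tosses (`bernoulli_pmf` is an illustrative helper, not a library function): the raw product of 2000 probabilities underflows to zero in double precision, while the sum of their logarithms stays comfortably in range.

```python
import math

def bernoulli_pmf(x, p):
    """P(X = x) for a Bernoulli(p) variable, with x in {0, 1} (hypothetical helper)."""
    return p if x == 1 else 1 - p

# Hypothetical sample: 2000 coin tosses, alternating heads (1) and tails (0).
xs = [i % 2 for i in range(2000)]
p = 0.5

# Likelihood as a raw product of 2000 factors of 0.5.
likelihood = math.prod(bernoulli_pmf(x, p) for x in xs)
# Log-likelihood as a sum of 2000 logarithms.
log_likelihood = sum(math.log(bernoulli_pmf(x, p)) for x in xs)

print(likelihood)                # 0.0, because 0.5**2000 underflows in double precision
print(round(log_likelihood, 2))  # -1386.29
```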

Example: Coin Toss

Suppose we toss a coin 10 times and get 7 heads.

Let $p$ be the probability of heads. The likelihood function is:

$$L(p) = \binom{10}{7} p^{7} (1-p)^{3}$$

This function shows how likely it is to observe 7 heads, depending on the value of $p$. Plotting it shows which value of $p$ makes the observed data most likely; here the maximum is at $p = 7/10 = 0.7$. Finding this maximizing value is called Maximum Likelihood Estimation.
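A quick numerical check, sketched in Python with only the standard library: instead of plotting, it evaluates $L(p)$ on a grid of 101 candidate values ($p = 0.00, 0.01, \dots, 1.00$, an arbitrary choice of resolution) and picks the largest.

```python
from math import comb

def binomial_likelihood(p, heads=7, n=10):
    """L(p) = C(n, heads) * p^heads * (1 - p)^(n - heads) for the coin-toss data."""
    return comb(n, heads) * p**heads * (1 - p) ** (n - heads)

# Evaluate L(p) on a grid of candidate values p = 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]
values = [binomial_likelihood(p) for p in grid]

best_p = grid[values.index(max(values))]
print(best_p)                               # 0.7
print(round(binomial_likelihood(0.7), 4))   # 0.2668
print(round(binomial_likelihood(0.5), 4))   # 0.1172
```

The fair-coin value $p = 0.5$ still explains the data reasonably well, just less well than $p = 0.7$.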

Maximum Likelihood Estimation (MLE)

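The most common use of the likelihood function is in maximum likelihood estimation. MLE seeks the parameter value that maximizes the likelihood function:

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} L(\theta)$$

Or equivalently, since the logarithm is strictly increasing and therefore does not move the location of the maximum:

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} \ell(\theta)$$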

Example: Bernoulli Distribution

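Suppose $X_1, \dots, X_n \sim \text{Bernoulli}(p)$ independently, where $p$ is unknown. Then,

$$L(p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^{\sum_i x_i}(1-p)^{\,n-\sum_i x_i}$$

$$\ell(p) = \Big(\sum_{i=1}^{n} x_i\Big)\log p + \Big(n - \sum_{i=1}^{n} x_i\Big)\log(1-p)$$

Taking the derivative, setting it to zero and solving for $p$:

$$\frac{d\ell}{dp} = \frac{\sum_i x_i}{p} - \frac{n - \sum_i x_i}{1-p} = 0 \quad\Longrightarrow\quad \hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

This is simply the sample mean, i.e. the proportion of successes (the second derivative is negative, so it is indeed a maximum).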
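To see that the derivative condition really lands on the sample mean, the sketch below solves $d\ell/dp = 0$ numerically by bisection instead of algebraically. The data are hypothetical, and the helper names (`bernoulli_score`, `mle_by_bisection`) are illustrative. Bisection works here because the derivative is strictly decreasing on $(0, 1)$ whenever the sample contains both successes and failures.

```python
def bernoulli_score(p, xs):
    """Derivative of the Bernoulli log-likelihood with respect to p."""
    s = sum(xs)
    n = len(xs)
    return s / p - (n - s) / (1 - p)

def mle_by_bisection(xs, tol=1e-12):
    """Find the root of the score on (0, 1) by bisection.

    Assumes 0 < sum(xs) < len(xs); otherwise the maximum lies on the boundary.
    """
    lo, hi = 1e-12, 1 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # The score is decreasing in p: a positive score means the root lies to the right.
        if bernoulli_score(mid, xs) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # hypothetical sample: 7 successes in 10 trials
p_hat = mle_by_bisection(data)
print(round(p_hat, 6))        # 0.7
print(sum(data) / len(data))  # 0.7
```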

Likelihood Function in Continuous Distributions

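For continuous distributions, the likelihood has the same form, but it is built from probability densities rather than probabilities, so individual factors can exceed 1. For instance, for the normal distribution:

Let $X_1, \dots, X_n \sim \mathcal{N}(\mu, \sigma^2)$ independently. Then,

$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)$$

The log-likelihood becomes:

$$\ell(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$$

Maximizing with respect to μ and σ² yields the MLEs:

$$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

Note that $\hat{\sigma}^2$ divides by $n$ rather than $n-1$, so it slightly underestimates $\sigma^2$ in small samples.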
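The closed-form MLEs can be checked in Python with a short sketch on hypothetical data. It computes $\hat{\mu}$ and $\hat{\sigma}^2$ directly, compares them with the standard library's `statistics.pvariance` (which also divides by $n$) and `statistics.variance` (which divides by $n-1$), and confirms that nudging either parameter away from the MLE lowers the log-likelihood.

```python
import math
import statistics

def normal_log_likelihood(mu, sigma2, xs):
    """l(mu, sigma^2) = -n/2 * log(2*pi*sigma^2) - sum((x - mu)^2) / (2*sigma^2)."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -n / 2 * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data

mu_hat = sum(xs) / len(xs)
sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / len(xs)  # divides by n, not n - 1

print(mu_hat, sigma2_hat)          # 5.0 4.0
print(statistics.pvariance(xs))    # 4.0 (population variance, divides by n)
print(statistics.variance(xs))     # larger, about 4.57 (sample variance, divides by n - 1)

# Nudging either parameter away from the MLE lowers the log-likelihood.
best = normal_log_likelihood(mu_hat, sigma2_hat, xs)
for mu, s2 in [(4.9, 4.0), (5.1, 4.0), (5.0, 3.8), (5.0, 4.2)]:
    assert normal_log_likelihood(mu, s2, xs) < best
```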

Properties of the Likelihood Function

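1. Invariance under Reparameterization: If $\hat{\theta}$ maximizes $L(\theta)$ and $g$ is a bijective transformation, then $g(\hat{\theta})$ maximizes the likelihood expressed in terms of $\eta = g(\theta)$. For example, the MLE of $\sigma$ is $\sqrt{\hat{\sigma}^2}$.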

2. Not a Probability: Viewed as a function of θ, the likelihood does not in general integrate (or sum) to 1 over the parameter space. It is a function of the parameters, not a probability distribution over them.

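3. Relative Scale Matters: The absolute value of the likelihood carries little meaning on its own; what matters is the ratio of likelihoods at different parameter values. Multiplying $L(\theta)$ by any factor that does not depend on θ (such as the binomial coefficient in the coin-toss example) changes neither the MLE nor these ratios.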
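All three properties can be checked numerically on the coin-toss data (7 heads in 10 tosses). In this sketch the log-odds reparameterization $\eta = \log\frac{p}{1-p}$, the grid resolutions and the helper names are illustrative choices: the grid maximum over $\eta$ lands at $\log(0.7/0.3)$, the likelihood integrates to $1/11$ over $p \in [0, 1]$ rather than 1, and dropping the binomial coefficient leaves the likelihood ratio unchanged.

```python
import math

def coin_likelihood(p, heads=7, n=10):
    """Binomial likelihood of `heads` heads in n tosses, as a function of p."""
    return math.comb(n, heads) * p**heads * (1 - p) ** (n - heads)

def sequence_likelihood(p, heads=7, n=10):
    """Likelihood of one specific toss sequence (no binomial coefficient)."""
    return p**heads * (1 - p) ** (n - heads)

def coin_likelihood_logodds(eta):
    """Same likelihood, reparameterized by the log-odds eta = log(p / (1 - p))."""
    p = 1 / (1 + math.exp(-eta))  # inverse of the log-odds transform
    return coin_likelihood(p)

# 1. Invariance: maximize over eta on a grid covering [-5, 5] in steps of 0.001.
etas = [-5 + i / 1000 for i in range(10001)]
eta_hat = max(etas, key=coin_likelihood_logodds)
print(round(eta_hat, 3), round(math.log(0.7 / 0.3), 3))  # 0.847 0.847

# 2. Not a probability distribution: midpoint-rule integral of L(p) over [0, 1].
m = 10_000
area = sum(coin_likelihood((j + 0.5) / m) for j in range(m)) / m
print(round(area, 6))  # 0.090909, i.e. 1/11, not 1

# 3. Only ratios matter: dropping the constant changes L but not L(0.7) / L(0.5).
r1 = coin_likelihood(0.7) / coin_likelihood(0.5)
r2 = sequence_likelihood(0.7) / sequence_likelihood(0.5)
print(round(r1, 4), round(r2, 4))  # 2.2769 2.2769
```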

Likelihood vs Posterior in Bayesian Inference

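In Bayesian inference, the likelihood plays a central role in updating beliefs:

$$P(\theta \mid X) = \frac{P(X \mid \theta)\,P(\theta)}{P(X)}$$

Where:

$P(\theta)$ is the prior,
$P(\theta \mid X)$ is the posterior,
$P(X \mid \theta)$ is the likelihood,
$P(X) = \int P(X \mid \theta)\,P(\theta)\,d\theta$ is the marginal likelihood or evidence.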

Thus, the likelihood is the bridge between prior and posterior.
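The update can be carried out numerically on a grid. This sketch assumes the coin-toss data (7 heads in 10 tosses) and, as an illustrative choice, a uniform prior on $p$ (a Beta(1, 1) distribution). It computes the evidence by the midpoint rule, divides likelihood × prior by it, and recovers the posterior mean $8/12 \approx 0.6667$ of the conjugate Beta(8, 4) posterior.

```python
import math

def likelihood(p, heads=7, n=10):
    """Binomial likelihood P(X | p) for the coin-toss data."""
    return math.comb(n, heads) * p**heads * (1 - p) ** (n - heads)

def prior(p):
    """Uniform prior on [0, 1], i.e. Beta(1, 1); an illustrative assumption."""
    return 1.0

m = 10_000
grid = [(j + 0.5) / m for j in range(m)]  # midpoints of m equal cells covering [0, 1]
width = 1 / m

# Evidence P(X): integral of likelihood * prior over p (midpoint rule).
unnormalized = [likelihood(p) * prior(p) for p in grid]
evidence = sum(unnormalized) * width

# Posterior density on the grid: likelihood * prior / evidence.
posterior = [u / evidence for u in unnormalized]
posterior_mean = sum(p * d for p, d in zip(grid, posterior)) * width

print(round(evidence, 4))        # 0.0909 (= 1/11)
print(round(posterior_mean, 4))  # 0.6667, the mean of Beta(8, 4)
```

With a non-uniform prior only `prior` changes; the likelihood term is the same, which is exactly the sense in which it links prior and posterior.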

Visualization of Likelihood Functions in Python

1D Likelihood Visualization – Bernoulli Distribution

A Bernoulli model for the 1D case (estimating probability p).

Figure: Likelihood Function (Bernoulli)

Inference:

The peak of the likelihood curve occurs at the MLE, $\hat{p}$ (the observed proportion of successes).
The width of the curve reflects the precision of the estimate: a narrower curve means more information, which comes from a larger sample size n.
The shape is unimodal and smooth, indicating a unique MLE.
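The "narrower = more information" point can be quantified without a plotting library. This sketch (hypothetical data, illustrative helper names) computes the relative likelihood $L(p)/L(\hat{p})$ on the log scale and measures the width of the region where it stays above one half, for the same sample proportion 0.7 at $n = 10$ and $n = 100$. Passing `grid` and the relative-likelihood values to a plotting function would draw the curves themselves.

```python
import math

def relative_likelihood(p, heads, n):
    """L(p) / L(p_hat) for a Bernoulli sample, computed on the log scale."""
    p_hat = heads / n
    log_l = heads * math.log(p) + (n - heads) * math.log(1 - p)
    log_l_hat = heads * math.log(p_hat) + (n - heads) * math.log(1 - p_hat)
    return math.exp(log_l - log_l_hat)

def width_above(threshold, heads, n, steps=10_000):
    """Length of the interval of p where the relative likelihood is at least `threshold`."""
    grid = [(k + 0.5) / steps for k in range(steps)]
    inside = [p for p in grid if relative_likelihood(p, heads, n) >= threshold]
    return max(inside) - min(inside)  # valid because the curve is unimodal

# Same sample proportion (0.7), ten times more data.
w10 = width_above(0.5, heads=7, n=10)
w100 = width_above(0.5, heads=70, n=100)
print(round(w10, 2), round(w100, 2))
```

The $n = 100$ curve comes out roughly three times narrower, in line with the standard error shrinking like $1/\sqrt{n}$.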

2D Likelihood Visualization – Normal Distribution in Python

A normal model with two unknown parameters for the 2D case (estimating μ and σ).

Figure: Log-likelihood Contour

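Inference:

The peak of the contour surface marks the MLE $(\hat{\mu}, \hat{\sigma})$.
The elliptical contours reflect the curvature of the log-likelihood near its peak: ellipses aligned with the parameter axes indicate little correlation between the estimates, while tilted ellipses indicate correlated estimates.
Flat or strongly elongated contours indicate that the data constrain the parameters poorly (high uncertainty) or that the parameters are nearly confounded.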
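The numbers behind such a contour plot can be sketched in Python on hypothetical data. The sketch evaluates $\ell(\mu, \sigma)$ over a grid (the grid ranges are arbitrary choices), locates the peak, and checks the mixed second derivative there with central differences. A value near zero means the Hessian is diagonal at the peak, so for the normal model the contours around the MLE are axis-aligned ellipses.

```python
import math

def normal_log_likelihood(mu, sigma, xs):
    """Log-likelihood of i.i.d. N(mu, sigma^2) data, parameterized by sigma."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -n * math.log(sigma) - n / 2 * math.log(2 * math.pi) - ss / (2 * sigma**2)

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data

# Grid of (mu, sigma) values: these are the arrays a contour plot would use.
mus = [3 + i / 100 for i in range(401)]     # mu in [3, 7]
sigmas = [1 + j / 100 for j in range(251)]  # sigma in [1, 3.5]
surface = [[normal_log_likelihood(m, s, xs) for m in mus] for s in sigmas]

# Locate the peak of the surface.
best_val, mu_peak, sigma_peak = max(
    (surface[j][i], mus[i], sigmas[j])
    for j in range(len(sigmas))
    for i in range(len(mus))
)
print(round(mu_peak, 2), round(sigma_peak, 2))  # 5.0 2.0

def f(m, s):
    return normal_log_likelihood(m, s, xs)

# Mixed partial d^2 l / (d mu d sigma) at the peak via central differences.
h = 1e-4
mixed = (f(mu_peak + h, sigma_peak + h) - f(mu_peak + h, sigma_peak - h)
         - f(mu_peak - h, sigma_peak + h) + f(mu_peak - h, sigma_peak - h)) / (4 * h * h)
print(abs(mixed) < 1e-4)  # True: no tilt, so the contours near the peak are axis-aligned
```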

Applications of the Likelihood Function

1. Model Fitting

Likelihood-based methods are used to fit models in:

Regression (logistic, Poisson, etc.),
Time series (ARIMA),
Hidden Markov models (HMMs),
Neural networks (via cross-entropy, which is derived from the log-likelihood).
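The link between cross-entropy and the log-likelihood can be checked directly. In this sketch the labels and predicted probabilities are hypothetical: the mean cross-entropy between one-hot label distributions and the predicted distributions equals the mean negative Bernoulli log-likelihood, so minimizing one is the same as maximizing the other.

```python
import math

def bernoulli_log_likelihood(labels, probs):
    """Sum over samples of log P(y_i | p_i), where p_i is the predicted P(y_i = 1)."""
    return sum(math.log(p) if y == 1 else math.log(1 - p) for y, p in zip(labels, probs))

def cross_entropy(target, predicted):
    """H(q, p) = -sum_c q_c * log(p_c) for two distributions over the same classes."""
    return -sum(q * math.log(p) for q, p in zip(target, predicted) if q > 0)

labels = [1, 0, 1, 1]         # hypothetical binary labels
probs = [0.9, 0.2, 0.6, 0.8]  # hypothetical predicted probabilities of class 1

# Mean cross-entropy between one-hot labels [P(0), P(1)] and predictions [1 - p, p].
mean_ce = sum(cross_entropy([1 - y, y], [1 - p, p]) for y, p in zip(labels, probs)) / len(labels)
# Mean negative log-likelihood of the same data under the Bernoulli model.
mean_nll = -bernoulli_log_likelihood(labels, probs) / len(labels)

print(math.isclose(mean_ce, mean_nll))  # True
```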

2. Hypothesis Testing

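The likelihood ratio test (LRT) compares nested models, where the restricted (null) model, with MLE $\hat{\theta}_0$, is a special case of the full model, with MLE $\hat{\theta}$:

$$\Lambda = -2\log\frac{L(\hat{\theta}_0)}{L(\hat{\theta})} = 2\left[\ell(\hat{\theta}) - \ell(\hat{\theta}_0)\right]$$

Under the null model and standard regularity conditions, $\Lambda$ is approximately $\chi^2$-distributed with degrees of freedom equal to the number of restricted parameters.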
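As a sketch of the test, the coin-toss data (7 heads in 10 tosses) can be used to compare the restricted model "the coin is fair" ($p = 0.5$) with the full model where $p$ is free. One parameter is restricted, so the statistic is compared with a $\chi^2$ distribution with 1 degree of freedom, whose tail probability can be written with the standard library's `math.erfc`.

```python
import math

def log_likelihood(p, heads=7, n=10):
    """Bernoulli log-likelihood of a toss sequence with `heads` heads in n tosses."""
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

p_hat = 7 / 10  # MLE under the full model (p free)
p_null = 0.5    # restricted model H0: p = 0.5 (fair coin)

# LRT statistic: 2 * [l(full) - l(restricted)]
lrt = 2 * (log_likelihood(p_hat) - log_likelihood(p_null))

# For 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lrt / 2))
print(round(lrt, 3), round(p_value, 3))
```

The statistic is about 1.65 with a p-value of about 0.20, so 7 heads in 10 tosses gives no strong evidence against a fair coin. With only 10 observations the $\chi^2$ approximation is rough, so the p-value is indicative rather than exact.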

3. Model Selection

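Criteria like AIC and BIC are based on the maximized likelihood; lower values indicate a better trade-off between fit and complexity:

AIC: $\text{AIC} = 2k - 2\log L(\hat{\theta})$
BIC: $\text{BIC} = k\log n - 2\log L(\hat{\theta})$

Where:

$k$ = number of parameters,
$n$ = number of observations.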
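A small worked comparison, sketched on the coin-toss data: the "fair" model fixes $p = 0.5$ ($k = 0$), while the "free" model estimates $p$ by its MLE ($k = 1$). The model names are illustrative labels.

```python
import math

def log_likelihood(p, heads=7, n=10):
    """Bernoulli log-likelihood of a toss sequence with `heads` heads in n tosses."""
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

n = 10  # number of observations (tosses)

# Candidate models: (k = number of free parameters, maximized log-likelihood).
models = {
    "fair": (0, log_likelihood(0.5)),     # p fixed at 0.5, nothing estimated
    "free": (1, log_likelihood(7 / 10)),  # p estimated by its MLE, 0.7
}

scores = {}
for name, (k, loglik) in models.items():
    aic = 2 * k - 2 * loglik
    bic = k * math.log(n) - 2 * loglik
    scores[name] = (aic, bic)
    print(f"{name}: AIC = {aic:.2f}, BIC = {bic:.2f}")
```

Here the fair model scores about 13.86 on both criteria, against about 14.22 (AIC) and 14.52 (BIC) for the free model, so both prefer the simpler model: estimating $p$ raises the log-likelihood by only about 0.82, which does not pay for the extra parameter.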

Limitations and Challenges

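Overfitting: In small samples, MLE can fit noise; for example, three heads in three tosses gives $\hat{p} = 1$, which assigns zero probability to tails.
Non-identifiability and multimodality: The likelihood may be flat (several parameter values fit the data equally well) or have several local maxima, so numerical optimizers may stop at a local rather than the global maximum.
Computational issues: Products of many small probabilities underflow for large datasets (one reason to work with the log-likelihood), and complex models often have no closed-form MLE, requiring iterative optimization that can be slow or numerically unstable.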


URL: https://www.geeksforgeeks.org/data-science/likelihood-function/