The likelihood function is an important concept in statistics and machine learning and forms the basis of many key methods, such as maximum likelihood estimation (MLE), Bayesian inference, and model selection criteria like AIC and BIC. Although it is often confused with probability, the likelihood function has a distinct interpretation and plays a unique role in statistical modeling.
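Before defining the likelihood function, it is important to distinguish between probability and likelihood:
Probability: the parameter $\theta$ is fixed, and we ask how probable different outcomes $x$ are.
Likelihood: the observed data $x$ are fixed, and we ask how plausible different values of $\theta$ are.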
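Let $X_1, X_2, \ldots, X_n$ be a random sample (independent and identically distributed) from a probability distribution with probability density function (pdf) or probability mass function (pmf) $f(x \mid \theta)$, where $\theta$ is an unknown parameter or vector of parameters.

Given a realization $x_1, x_2, \ldots, x_n$, the likelihood function is defined as:

$$L(\theta \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta)$$

Alternatively, the log-likelihood function is often used for computational convenience, because it turns the product into a sum:

$$\ell(\theta) = \log L(\theta \mid x_1, \ldots, x_n) = \sum_{i=1}^{n} \log f(x_i \mid \theta)$$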
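To make the "computational convenience" concrete, the sketch below multiplies many Bernoulli probabilities directly and compares the result with the sum of their logs. The data are simulated (2,000 hypothetical draws with $p = 0.3$), and the helper names `bernoulli_pmf`, `likelihood`, and `log_likelihood` are illustrative, not from any library.

```python
import math
import random

# Hypothetical data: 2,000 draws from a Bernoulli(0.3) model (fixed seed for repeatability)
random.seed(0)
data = [1 if random.random() < 0.3 else 0 for _ in range(2000)]

def bernoulli_pmf(x, p):
    """Probability of a single 0/1 observation under Bernoulli(p)."""
    return p if x == 1 else 1.0 - p

def likelihood(p, xs):
    """Raw likelihood: a product of n probabilities, each below 1."""
    result = 1.0
    for x in xs:
        result *= bernoulli_pmf(x, p)
    return result

def log_likelihood(p, xs):
    """Log-likelihood: a sum of log-probabilities, which stays in a safe numeric range."""
    return sum(math.log(bernoulli_pmf(x, p)) for x in xs)

print(likelihood(0.3, data))      # underflows to 0.0 in double precision
print(log_likelihood(0.3, data))  # a finite negative number
```

Because the logarithm is strictly increasing, maximizing $\ell(\theta)$ gives the same answer as maximizing $L(\theta)$, without the underflow.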
Suppose we toss a coin 10 times and get 7 heads.
Let $p$ be the probability of heads. The likelihood function is:

$$L(p) = \binom{10}{7} p^{7} (1 - p)^{3}$$

This function shows how likely it is to observe 7 heads for each value of $p$. Plotting it shows which value of $p$ makes the observed data most likely; here the peak is at $p = 0.7$. Finding this peak is called maximum likelihood estimation.
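A minimal Python sketch of this example, evaluating $L(p)$ on a grid of candidate values instead of plotting it. The function name `coin_likelihood` and the grid resolution of 0.01 are illustrative assumptions.

```python
from math import comb

def coin_likelihood(p, heads=7, tosses=10):
    """Binomial likelihood L(p) = C(n, k) p^k (1 - p)^(n - k) for the fixed data."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)

# Evaluate L(p) on the grid p = 0.00, 0.01, ..., 1.00 and pick the largest value
grid = [i / 100 for i in range(101)]
values = [coin_likelihood(p) for p in grid]
best_p = grid[values.index(max(values))]

for p in (0.3, 0.5, 0.7, 0.9):
    print(f"L({p}) = {coin_likelihood(p):.4f}")
print("grid maximizer:", best_p)
```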
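The most common use of the likelihood function is in maximum likelihood estimation (MLE). MLE seeks the parameter value that maximizes the likelihood function:

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} L(\theta \mid x_1, \ldots, x_n)$$

Or equivalently, since the logarithm is strictly increasing:

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} \ell(\theta)$$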
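Suppose $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$, where $p$ is unknown. Then,

$$L(p) = \prod_{i=1}^{n} p^{x_i} (1 - p)^{1 - x_i} = p^{\sum_i x_i} (1 - p)^{\,n - \sum_i x_i}$$

$$\ell(p) = \left(\sum_{i=1}^{n} x_i\right) \log p + \left(n - \sum_{i=1}^{n} x_i\right) \log(1 - p)$$

Taking the derivative and setting it to zero to solve for $p$:

$$\frac{d\ell}{dp} = \frac{\sum_i x_i}{p} - \frac{n - \sum_i x_i}{1 - p} = 0 \quad\Longrightarrow\quad \hat{p} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

This is simply the sample mean, i.e., the observed proportion of successes.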
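As a quick numerical check of this derivation, the sketch below evaluates the derivative $d\ell/dp$ (the score) for a small hypothetical 0/1 sample. It should be zero (up to floating-point rounding) at $\hat{p}$, positive to its left, and negative to its right. The sample and the helper `score` are made up for illustration.

```python
# Score function: derivative of the Bernoulli log-likelihood,
# d/dp l(p) = S/p - (n - S)/(1 - p), where S is the number of successes
def score(p, xs):
    s, n = sum(xs), len(xs)
    return s / p - (n - s) / (1 - p)

xs = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # hypothetical sample with 7 successes
p_hat = sum(xs) / len(xs)            # closed-form MLE: the sample mean

print(p_hat)
print(score(p_hat, xs))    # zero up to rounding at the MLE
print(score(0.5, xs) > 0)  # True: log-likelihood still increasing below p_hat
print(score(0.9, xs) < 0)  # True: log-likelihood decreasing above p_hat
```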
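For continuous distributions, the likelihood function has the same form, but it is built from density values rather than probabilities, so individual factors can exceed 1. For instance, for the normal distribution:

Let $X_1, \ldots, X_n \sim \mathcal{N}(\mu, \sigma^2)$. Then,

$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$

The log-likelihood becomes:

$$\ell(\mu, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$$

Maximizing with respect to $\mu$ and $\sigma^2$ yields the MLEs (note the divisor $n$ rather than $n - 1$ in the variance):

$$\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$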
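The sketch below computes these closed-form normal MLEs for a small set of hypothetical measurements and confirms that nudging either parameter away from them lowers the log-likelihood. The data values and the function name `normal_log_likelihood` are assumptions for illustration.

```python
import math

def normal_log_likelihood(mu, sigma2, xs):
    """l(mu, sigma^2) = -(n/2) log(2 pi sigma^2) - sum((x - mu)^2) / (2 sigma^2)"""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

xs = [4.9, 5.6, 5.1, 4.4, 5.8, 5.0, 4.7, 5.3]  # hypothetical measurements

# Closed-form MLEs (variance uses the 1/n divisor)
mu_hat = sum(xs) / len(xs)
sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / len(xs)

best = normal_log_likelihood(mu_hat, sigma2_hat, xs)
# Moving either parameter away from the MLE lowers the log-likelihood
for d_mu, d_s2 in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    assert normal_log_likelihood(mu_hat + d_mu, sigma2_hat + d_s2, xs) < best
print(mu_hat, sigma2_hat)
```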
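1. Invariance under Reparameterization: If $\hat{\theta}$ maximizes $L(\theta)$ and $g$ is a bijective transformation, then $\hat{\eta} = g(\hat{\theta})$ maximizes the likelihood expressed in terms of $\eta = g(\theta)$, namely $L(g^{-1}(\eta))$. In other words, the MLE of $g(\theta)$ is $g(\hat{\theta})$.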
2. Not a Probability: The likelihood need not integrate (or sum) to 1 over $\theta$. It is a function of the parameters, not a probability distribution over them.
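3. Relative Scale Matters: The absolute value of the likelihood is rarely meaningful on its own, since it depends on the sample size and on constant factors. What matters is how likelihoods compare across different parameter values, for example through the ratio $L(\theta_1)/L(\theta_2)$.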
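Properties 2 and 3 can be checked numerically on the coin example. The sketch below integrates $L(p)$ over $[0, 1]$ with a simple midpoint rule (the exact value is $1/11$, not 1) and shows that dropping the constant $\binom{10}{7}$ leaves the likelihood ratio unchanged. The number of integration steps is an arbitrary choice.

```python
from math import comb

def coin_likelihood(p, heads=7, tosses=10):
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)

# Property 2: integrate L(p) over p in [0, 1] with the midpoint rule
m = 100_000
area = sum(coin_likelihood((i + 0.5) / m) for i in range(m)) / m
print(round(area, 6))  # about 1/11, clearly not 1

# Property 3: rescaling L by a constant (dropping C(10, 7)) leaves ratios unchanged
ratio_full = coin_likelihood(0.7) / coin_likelihood(0.5)
ratio_kernel = (0.7**7 * 0.3**3) / (0.5**7 * 0.5**3)
print(ratio_full, ratio_kernel)  # the same ratio, up to rounding
```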
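In Bayesian inference, the likelihood plays a central role in updating beliefs:

$$P(\theta \mid x) = \frac{P(x \mid \theta)\, P(\theta)}{P(x)}$$

where $P(\theta \mid x)$ is the posterior distribution, $P(x \mid \theta)$ is the likelihood, $P(\theta)$ is the prior distribution, and $P(x)$ is the marginal likelihood (evidence), which normalizes the posterior.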
Thus, the likelihood is the bridge between prior and posterior.
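A minimal sketch of this update for the coin example (7 heads in 10 tosses), assuming a uniform prior on $p$ and approximating the posterior on a grid of values. The grid approximation is an illustrative choice, not something the text prescribes; with a uniform prior, the exact posterior is $\text{Beta}(8, 4)$, whose mean is $8/12$.

```python
from math import comb

# Grid of candidate values for p (midpoints of 1,000 equal cells in [0, 1])
grid = [(i + 0.5) / 1000 for i in range(1000)]
prior = [1.0 for _ in grid]                                 # assumed flat prior P(p)
like = [comb(10, 7) * p**7 * (1 - p) ** 3 for p in grid]    # likelihood P(x | p)

# Bayes' rule on the grid: posterior is proportional to likelihood times prior
unnormalized = [l * pr for l, pr in zip(like, prior)]
normalizer = sum(unnormalized)                   # proportional to P(x) on this grid
posterior = [u / normalizer for u in unnormalized]  # P(p | x), sums to 1

post_mean = sum(p * w for p, w in zip(grid, posterior))
print(round(post_mean, 3))  # close to the exact Beta(8, 4) mean 8/12
```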
A Bernoulli model for the 1D case (estimating probability p).
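One way to illustrate the 1D case: simulate hypothetical Bernoulli data (assumed true $p = 0.3$, 200 draws), maximize the log-likelihood numerically, and compare the result with the closed-form MLE $\hat{p} = \bar{x}$. Golden-section search is a generic 1D optimizer swapped in here for illustration; the helper names are hypothetical.

```python
import math
import random

random.seed(42)
true_p = 0.3                                   # hypothetical true parameter
xs = [1 if random.random() < true_p else 0 for _ in range(200)]
s, n = sum(xs), len(xs)

def log_lik(p):
    """Bernoulli log-likelihood: S log p + (n - S) log(1 - p)."""
    return s * math.log(p) + (n - s) * math.log(1 - p)

def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2           # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c                        # maximum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                        # maximum lies in [c, b]
            d = a + inv_phi * (b - a)
    return (a + b) / 2

p_numeric = golden_section_max(log_lik, 1e-9, 1 - 1e-9)
print(p_numeric, s / n)  # numerical optimum vs. closed-form MLE (sample mean)
```

The Bernoulli log-likelihood is concave in $p$, so any reasonable 1D maximizer converges to the same point as the closed form.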
A normal model for the 2D case (jointly estimating the two parameters μ and σ).
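For the 2D case, the sketch below simulates hypothetical normal data (assumed $\mu = 10$, $\sigma = 2$, 500 draws), searches a grid of $(\mu, \sigma)$ pairs for the largest log-likelihood, and compares the grid optimum with the closed-form MLEs. The grid ranges and spacing (0.05) are arbitrary assumptions; the sum of squares is computed from the sums $\sum x_i$ and $\sum x_i^2$ so that each grid point is cheap to evaluate.

```python
import math
import random

random.seed(1)
xs = [random.gauss(10.0, 2.0) for _ in range(500)]  # hypothetical data: mu = 10, sigma = 2
n = len(xs)
sx = sum(xs)                  # precomputed sums give sum((x - mu)^2) in O(1)
sxx = sum(x * x for x in xs)

def log_lik(mu, sigma):
    """Normal log-likelihood in terms of (mu, sigma) rather than (mu, sigma^2)."""
    ss = sxx - 2 * mu * sx + n * mu * mu     # equals sum((x - mu)^2)
    return -n * math.log(sigma) - 0.5 * n * math.log(2 * math.pi) - ss / (2 * sigma**2)

# Brute-force search over a 2-D grid of candidate (mu, sigma) pairs
mus = [8 + 0.05 * i for i in range(81)]      # 8.00 ... 12.00
sigmas = [1 + 0.05 * j for j in range(61)]   # 1.00 ... 4.00
best_mu, best_sigma = max(
    ((m, s) for m in mus for s in sigmas), key=lambda ms: log_lik(*ms)
)

# Closed-form MLEs for comparison (divisor n, not n - 1)
mu_hat = sx / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)
print(best_mu, best_sigma)  # grid optimum
print(mu_hat, sigma_hat)    # should agree to within about one grid step
```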
Likelihood-based methods are widely used to fit statistical and machine learning models.
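The likelihood ratio test (LRT) compares nested models, where the null model is a restricted version of the full model:

$$\Lambda = -2 \log \frac{L(\hat{\theta}_0)}{L(\hat{\theta})} = 2\left[\ell(\hat{\theta}) - \ell(\hat{\theta}_0)\right]$$

Here $\hat{\theta}_0$ is the MLE under the restricted (null) model and $\hat{\theta}$ is the MLE under the full model. Under the null hypothesis and standard regularity conditions, $\Lambda$ is approximately $\chi^2$-distributed with degrees of freedom equal to the difference in the number of free parameters.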
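Applied to the coin example, the sketch below tests $H_0\colon p = 0.5$ (no free parameters) against the unrestricted model (one free parameter), so the statistic is compared with a $\chi^2$ distribution with 1 degree of freedom. It uses the identity $P(\chi^2_1 > t) = \operatorname{erfc}(\sqrt{t/2})$ from the standard library instead of a statistics package; the null value 0.5 is an illustrative choice.

```python
import math

heads, n = 7, 10

def log_lik(p):
    # Bernoulli-sequence log-likelihood; the binomial coefficient cancels in the ratio
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

p_null = 0.5          # H0: fair coin (restricted model, no free parameters)
p_hat = heads / n     # unrestricted MLE (one free parameter)

lrt = 2 * (log_lik(p_hat) - log_lik(p_null))
# Chi-square with 1 degree of freedom: P(chi2_1 > t) = erfc(sqrt(t / 2))
p_value = math.erfc(math.sqrt(lrt / 2))
print(round(lrt, 3), round(p_value, 3))
```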
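Criteria like AIC and BIC are based on the likelihood:

$$\text{AIC} = 2k - 2\log L(\hat{\theta}), \qquad \text{BIC} = k \log n - 2\log L(\hat{\theta})$$

where $k$ is the number of estimated parameters, $n$ is the sample size, and $L(\hat{\theta})$ is the maximized likelihood. Lower values indicate a better trade-off between goodness of fit and model complexity.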
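Continuing the coin example, the sketch below compares a model with $p$ fixed at 0.5 ($k = 0$) against one where $p$ is estimated ($k = 1$), using $n = 10$ tosses. The model labels and helper functions are hypothetical; lower AIC or BIC is preferred.

```python
import math

heads, n = 7, 10

def log_lik(p):
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik

# Model A: p fixed at 0.5 (k = 0). Model B: p estimated by MLE (k = 1).
models = {"fixed p=0.5": (log_lik(0.5), 0), "fitted p": (log_lik(heads / n), 1)}
for name, (ll, k) in models.items():
    print(f"{name}: AIC={aic(ll, k):.3f}  BIC={bic(ll, k, n):.3f}")
```

With only 10 tosses, both criteria favor the simpler model here: fitting $p$ raises the log-likelihood by only about 0.82, which is less than the penalty for one extra parameter.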