![]() |
VOOZH | about |
Markov Chain Monte Carlo (MCMC) is a method to sample from a probability distribution when direct sampling is hard. It builds a Markov chain that moves step by step, visiting points that follow the target distribution. The more steps taken, the closer the samples get to the true distribution. It is composed of two components- Monte Carlo and Markov Chain. Lets understand them separately.
Monte Carlo Sampling is a technique for sampling a probability distribution and then using those samples to approximate desired quantity, i.e it uses randomness to estimate some deterministic quantity of interest.
Example: To find the area under the curve in figure-1, instead of complex integration we use the Monte Carlo method. We randomly place green dots inside the rectangle to improve accuracy, then find the ratio of dots under the curve to total dots. Multiplying this ratio by the rectangleβs area gives an estimate of the area under the curve.
Lets understand Monte Carlo method Mathematically, suppose we have Expectation (s) to estimate, this might be a highly complex integral or challenging to be estimated whereas using the Monte Carlo method we get the estimated values by taking the average of multiple random samples. Original expectations can be calculated by
whereas the approximated expectation that would be generated by stimulating large samples of f(x) can be achieved by:
Computing the average over a large number of samples could reduce the standard error and give us a fairly accurate approximation.
Markov Chains can be understood as a process of moving step-by-step through states where the choice of the next state depends only on the current state and the probability distribution of possible next states. Let's have a look at Markov Property,
Lets consider a system of 4 states as shown, 'Rain' or 'Car Wash" causing the 'Wet Ground' which is the followed by 'Slip'. Markov property simply makes an assumption that the probability of jumping from one state to the next state depends only on the current state not on the sequence of previous states which lead to this state. Mathematically it is:
It is quite evident from the mathematical equation that the Markov Property assumption could potentially save computational energy and time. If a process exhibits Markov Property then it is known as Markov Chain.
Markov Chain Monte Carlo is widely used in Bayesian inference to approximate posterior distributions that are often hard to compute exactly. Lets understand the challenge of Bayesian Inference. Bayes theorem lets us update our beliefs about unknown parameters by combining prior knowledge with observed data. Mathematically the posterior distribution is proportional to the likelihood multiplied by the prior:
But to get the exact posterior we must divide by the marginal likelihood or evidence:
This marginal probability acts as a normalisation constant to ensure the posterior integrates to one. But calculating this integral is often computationally expensive or impossible for complex models. To avoid directly calculating the normalisation constant, MCMC constructs a Markov chain whose long-run behaviour matches the target posterior distribution. The basic idea is:
Markov Chain Monte Carlo enforces the detailed balance condition so as to guarantee that the chain settles into the target distribution. This condition requires that the flow of probability from one state A to another B equals the flow from B back to A:
Here Ο is the target distribution (posterior) and T (x β y) is the probability of moving from state x to state y.
Suppose we are sampling from distribution p(x) = f(x) / Z where Z is the normalization constant. Our objective is to sample from p(x) in such a way that involves making use of numerator alone and avoids having to estimate denominator. Looking at the proposal probability(g) we will start.
Step-by-step process:
1.Start at an initial state X1 : We choose a starting point.
2.Propose a new state X2 : Generate a candidate state from a proposal distribution like a normal distribution centered at X1.
3.Evaluate the move: Compute the unnormalized density ratio:
And the proposal distribution ratio:
4. Decide to accept or reject: Calculate the acceptance probability:
Accept X2 with this probability otherwise remain at X1.
5.Burn-in period: Discard initial samples until the chain reaches a stationary state.
6. Collect samples: Use the remaining samples to estimate properties of the target distribution.
Replace p(x) with unnormalised target density f(x)/Z (since Z cancels out):
Rearranging:
Using shorthand:
Assuming:
(if the reverse move is always accepted)
Final Acceptance Rule:
And if the proposal distribution is symmetric(like Normal), then:
And the rule simplifies to:
Feature | MCMC | Rejection Sampling | Importance Sampling |
|---|---|---|---|
Scalability to High Dimensions | High- handles complex, high dimensional spaces | Poor- becomes inefficient as dimensions grow | Poor- performance degrades in high dimensions |
Adaptability to Posterior Shape | High-Samples according to target shape. | Low | Moderate- sensitive to choice of proposal distribution |
Flexibility in Model Complexity | High | Limited- not suited for complex models | Moderate |
Computational Efficiency | Moderate- computationally intensive but accurate | Low- many samples are rejected | Moderate- efficiency depends on proposal |