Survival analysis is a branch of statistics that analyzes the expected duration until one or more events happen, such as death in biological organisms or failure in mechanical systems. An important step in survival analysis is estimating the survival function, which gives the probability of surviving past a certain point in time. One of the most widely used non-parametric methods for this purpose is the Kaplan-Meier estimator.
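Let $T$ be a non-negative random variable representing the time until the event. The survival function is:

$$S(t) = P(T > t)$$

where:
- $T$ is the time until the event occurs,
- $t$ is a specific point in time,
- $S(t)$ is the probability that the event has not occurred by time $t$.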
In real-life data, we often encounter censoring, which means we don't observe the exact survival time for all individuals. The most common type is right-censoring, where a subject has not yet experienced the event by the end of the study or is lost to follow-up.
For example, if a clinical trial ends after 2 years and a patient is still alive at that time, their survival time is said to be censored at 2 years.
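To make the encoding of right-censored data concrete, here is a minimal sketch in Python. The function name `right_censor` and the survival times are hypothetical, chosen only for illustration; the sketch assumes every subject enters at time 0 and that the only source of censoring is the end of the study.

```python
def right_censor(true_times, study_end):
    """Return (observed_time, event) pairs under end-of-study right-censoring.

    event is 1 if the event happened on or before study_end, else 0 (censored).
    """
    observed = []
    for t in true_times:
        if t <= study_end:
            observed.append((t, 1))          # event observed at its true time
        else:
            observed.append((study_end, 0))  # still event-free when the study ends
    return observed


# Hypothetical true survival times in years; the study ends after 2 years.
true_times = [0.5, 1.2, 3.4, 1.9, 5.1]
data = right_censor(true_times, study_end=2.0)
print(data)
```

Each subject is reduced to a pair of observed time and event indicator, which is the form of input the Kaplan-Meier estimator works with.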
The Kaplan-Meier estimator is a non-parametric statistic used to estimate the survival function from censored data. It constructs a step function that drops at each observed event time, incorporating both complete and censored data.
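Given ordered event times $t_1 < t_2 < \dots < t_k$, with $d_i$ events at time $t_i$ and $n_i$ individuals at risk just before $t_i$, the Kaplan-Meier estimate is:

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$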
The Kaplan-Meier estimator treats survival as a product of conditional survival probabilities. At each event time, it calculates the probability of survival, considering only those who were still at risk.
The product of these conditional probabilities gives an overall estimate of the survival function, accounting for both events and censored observations. When a subject is censored, they are removed from the risk set after the time of censoring and do not contribute to the event count $d_i$.
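The product-of-conditional-probabilities idea can be sketched in plain Python. This is an illustrative implementation, not a reference one: the function name `kaplan_meier` and the sample data are assumptions, and subjects censored at an event time are counted as still at risk at that time, which is the usual convention.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return a list of (event_time, survival_estimate) pairs.

    times  : observed times (event or censoring)
    events : 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)   # every subject leaves the risk set at their own time
    at_risk = len(times)       # number at risk just before the earliest time
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Conditional probability of surviving past t, given at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Both events and censored subjects drop out after time t.
        at_risk -= leaving[t]
    return curve


# Hypothetical data: six subjects, two of them censored.
curve = kaplan_meier([1, 1, 2, 3, 3, 4], [1, 0, 1, 1, 1, 0])
for t, s in curve:
    print(f"S({t}) = {s:.4f}")
```

Censored subjects never trigger a drop in the curve; they only shrink the risk set used at later event times.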
Suppose we observe 5 individuals with the following survival times (in months) and censoring indicators:
Subject | Time | Event (1 = event, 0 = censored) |
|---|---|---|
1 | 2 | 1 |
2 | 3 | 0 |
3 | 4 | 1 |
4 | 5 | 1 |
5 | 6 | 0 |
We observe events at the following ordered times:

$$t_1 = 2, \quad t_2 = 4, \quad t_3 = 5$$
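At each event time $t_i$, the number at risk $n_i$ and the number of observed events $d_i$ are:

| $t_i$ | $n_i$ | $d_i$ |
|---|---|---|
| 2 | 5 | 1 |
| 4 | 3 | 1 |
| 5 | 2 | 1 |

Subject 2 is censored at month 3, so only three subjects remain at risk at $t_2 = 4$.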
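The counts at risk and the event counts can be derived mechanically from the five observations. The sketch below assumes the data are stored as `(time, event)` pairs; the helper name `risk_table` is hypothetical.

```python
# Data from the table above: (time in months, event indicator).
data = [(2, 1), (3, 0), (4, 1), (5, 1), (6, 0)]


def risk_table(data):
    """Return rows (t_i, n_i, d_i) for each distinct event time."""
    event_times = sorted({t for t, e in data if e == 1})
    rows = []
    for t_i in event_times:
        # At risk just before t_i: everyone whose observed time is >= t_i.
        n_i = sum(1 for t, _ in data if t >= t_i)
        # Events exactly at t_i (censored observations are not counted).
        d_i = sum(1 for t, e in data if t == t_i and e == 1)
        rows.append((t_i, n_i, d_i))
    return rows


for t_i, n_i, d_i in risk_table(data):
    print(f"t={t_i}: at risk={n_i}, events={d_i}")
```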
The Kaplan-Meier estimator is computed recursively by:

$$\hat{S}(t_i) = \hat{S}(t_{i-1}) \left(1 - \frac{d_i}{n_i}\right), \qquad \hat{S}(t_0) = 1$$
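We now compute the values step-by-step:

$$\hat{S}(2) = 1 \times \left(1 - \frac{1}{5}\right) = 0.8$$

$$\hat{S}(4) = 0.8 \times \left(1 - \frac{1}{3}\right) \approx 0.533$$

$$\hat{S}(5) = 0.533 \times \left(1 - \frac{1}{2}\right) \approx 0.267$$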
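As a check on the arithmetic, the recursion can be run with exact fractions. This is a small sketch using Python's standard `fractions` module; the `rows` list hard-codes the event times 2, 4 and 5 months with 5, 3 and 2 subjects at risk and one event each.

```python
from fractions import Fraction

# (t_i, n_i, d_i) rows for the example.
rows = [(2, 5, 1), (4, 3, 1), (5, 2, 1)]

survival = Fraction(1)   # S(t_0) = 1 before any event has occurred
estimates = []
for t_i, n_i, d_i in rows:
    # One step of the recursion: multiply by the conditional survival probability.
    survival *= 1 - Fraction(d_i, n_i)
    estimates.append((t_i, survival))
    print(f"S({t_i}) = {survival} (about {float(survival):.4f})")
```

The exact values are 4/5, 8/15 and 4/15, which round to 0.8, 0.533 and 0.267.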
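The Kaplan-Meier estimator is a step function defined as:

$$\hat{S}(t) = \begin{cases} 1 & 0 \le t < 2 \\ 0.8 & 2 \le t < 4 \\ 0.533 & 4 \le t < 5 \\ 0.267 & t \ge 5 \end{cases}$$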
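Evaluating the step function at an arbitrary time amounts to finding the last event time that does not exceed it. A minimal sketch using the standard `bisect` module follows; the name `survival_at` is hypothetical, and the values are the exact fractions 4/5, 8/15 and 4/15 for the event times 2, 4 and 5.

```python
from bisect import bisect_right

event_times = [2, 4, 5]
values = [4 / 5, 8 / 15, 4 / 15]   # estimate in force from each event time onward


def survival_at(t):
    """Evaluate the step function: the value set at the last event time <= t."""
    k = bisect_right(event_times, t)   # number of event times <= t
    return 1.0 if k == 0 else values[k - 1]


for t in [1, 2, 3.5, 4, 5.5]:
    print(f"S({t}) = {survival_at(t):.3f}")
```

Because the curve is right-continuous, the drop takes effect exactly at each event time, so `survival_at(2)` already returns 0.8.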
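For the Kaplan-Meier estimator to be valid, the following assumptions must hold:
- Censoring is non-informative: censored subjects have the same survival prospects as those who remain under observation.
- Survival probabilities are the same for subjects who enter the study early and those who enter late.
- Event times are recorded accurately, so each event is attributed to the time at which it actually occurred.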
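The Kaplan-Meier estimator is widely used across disciplines, for example in medicine to estimate patient survival in clinical trials and in engineering to estimate the time to failure of mechanical systems.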