
URL: https://www.geeksforgeeks.org/data-science/kaplan-meier-estimator-survival-analysis/

Kaplan-Meier Estimator (Survival Analysis)

Last Updated : 17 Jun, 2025

Survival analysis is a branch of statistics that analyzes the expected time until one or more events happen, such as death in biological organisms or failure in mechanical systems. An important step in survival analysis is estimating the survival function, which gives the probability of surviving past a certain point in time. One of the most widely used non-parametric methods for this purpose is the Kaplan-Meier estimator.

The Survival Function

Let $T$ be a non-negative random variable representing the time until the event. The survival function is:

$$S(t) = P(T > t)$$

where:

  • $T$: A non-negative random variable representing the time until an event occurs (e.g. death, failure).
  • $t$: A specific time point.
  • $S(t)$: The probability that the event has not occurred by time $t$; i.e., the subject survives beyond time $t$.
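When every event time is fully observed, the survival function can be estimated directly as the fraction of subjects whose time exceeds $t$. The sketch below illustrates this under that assumption; the function name `empirical_survival` and the sample times are hypothetical.

```python
def empirical_survival(times, t):
    """Fraction of observed event times strictly greater than t.

    Only meaningful when every time is a fully observed event time
    (no censoring).
    """
    if not times:
        raise ValueError("times must be non-empty")
    return sum(1 for x in times if x > t) / len(times)


# Hypothetical, fully observed failure times in months.
failure_times = [2, 3, 4, 5, 6]

print(empirical_survival(failure_times, 0))  # 1.0
print(empirical_survival(failure_times, 4))  # 0.4
```

This simple proportion stops working once some times are only partially observed, which is the situation the rest of the article addresses.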

The Problem of Censoring

In real-life data, we often encounter censoring, which means we don’t observe the exact survival time for all individuals. The most common type is right-censoring, where a subject has not yet experienced the event by the end of the study or is lost to follow-up.

For example, if a clinical trial ends after 2 years and a patient is still alive at that time, their survival time is said to be censored at 2 years.
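As a rough illustration of how right-censored data is usually recorded, the sketch below turns simulated "true" survival times into (observed time, event indicator) pairs for a study that ends at 2 years. The helper `right_censor` and the sample values are hypothetical; in a real study the true time of a censored subject is never known.

```python
def right_censor(true_time, study_end):
    """Return (observed_time, event) for one subject.

    event is 1 if the event happened on or before study_end,
    otherwise 0 (right-censored at study_end).
    """
    if true_time <= study_end:
        return true_time, 1
    return study_end, 0


# Hypothetical true survival times in years; the trial stops at 2 years.
true_times = [0.5, 1.7, 3.2, 2.0, 4.1]
observed = [right_censor(t, study_end=2.0) for t in true_times]
print(observed)
# [(0.5, 1), (1.7, 1), (2.0, 0), (2.0, 1), (2.0, 0)]
```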

The Kaplan-Meier Estimator

The Kaplan-Meier estimator is a non-parametric statistic used to estimate the survival function from censored data. It constructs a step function that drops at each observed event time, incorporating both complete and censored data.

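Given ordered event times $t_1 < t_2 < \dots < t_k$, with $d_i$ events at time $t_i$ and $n_i$ individuals at risk just before $t_i$, the Kaplan-Meier estimate is:

$$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$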

The Kaplan-Meier estimator treats survival as a product of conditional survival probabilities. At each event time, it calculates the probability of survival, considering only those who were still at risk.

The product of these conditional probabilities gives an overall estimate of the survival function, accounting for both events and censored observations. A censored subject counts as at risk for every event time up to their censoring time and is removed from the risk set afterwards; they never contribute to the event count $d_i$.
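The product form can be motivated by the chain rule of probability. The following is a sketch of that reasoning, writing $t_1 < \dots < t_k$ for the ordered event times, $d_i$ for the number of events at $t_i$, and $n_i$ for the number at risk just before $t_i$, and estimating each conditional survival probability by $1 - d_i / n_i$:

```latex
\begin{aligned}
S(t_k) &= P(T > t_k) \\
       &= P(T > t_1)\, P(T > t_2 \mid T > t_1) \cdots P(T > t_k \mid T > t_{k-1}) \\
       &\approx \prod_{i=1}^{k} \left(1 - \frac{d_i}{n_i}\right)
\end{aligned}
```

The second line is exact because surviving past $t_i$ implies surviving past $t_{i-1}$; the approximation enters only when each factor is replaced by its observed proportion.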

Example Calculation

Suppose we observe 5 individuals with the following survival times (in months) and censoring indicators:

Subject   Time (months)   Event (1 = event, 0 = censored)
1         2               1
2         3               0
3         4               1
4         5               1
5         6               0

Ordered Event Times and Risk Sets

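We observe events at the following ordered times:

$$t_1 = 2, \quad t_2 = 4, \quad t_3 = 5$$

At each event time $t_i$, the number at risk $n_i$ and the number of observed events $d_i$ are:

$t_i$   $n_i$   $d_i$
2       5       1
4       3       1
5       2       1

Subject 2 is censored at month 3, so the risk set drops from 5 to 3 between the first and second event times without an event being counted for that subject.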
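These counts can be reproduced with a short sketch. It assumes the five (time, event) pairs from the table above; the helper name `risk_table` is illustrative.

```python
# Example data from the table above: (time in months, event indicator).
data = [(2, 1), (3, 0), (4, 1), (5, 1), (6, 0)]


def risk_table(data):
    """Return [(t_i, n_i, d_i)] for each distinct time with at least one event."""
    event_times = sorted({t for t, e in data if e == 1})
    rows = []
    for t_i in event_times:
        # Subjects still under observation just before t_i.
        n_i = sum(1 for t, _ in data if t >= t_i)
        # Events recorded exactly at t_i.
        d_i = sum(1 for t, e in data if t == t_i and e == 1)
        rows.append((t_i, n_i, d_i))
    return rows


for row in risk_table(data):
    print(row)
# (2, 5, 1)
# (4, 3, 1)
# (5, 2, 1)
```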

Kaplan-Meier Estimates

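The Kaplan-Meier estimator is computed recursively by:

$$\hat{S}(t_i) = \hat{S}(t_{i-1}) \left(1 - \frac{d_i}{n_i}\right), \qquad \hat{S}(t_0) = 1$$

We now compute the values step-by-step, using the risk sets of 5, 3, and 2 subjects at months 2, 4, and 5 (one event each):

$$\hat{S}(2) = 1 \times \left(1 - \frac{1}{5}\right) = 0.8$$

$$\hat{S}(4) = 0.8 \times \left(1 - \frac{1}{3}\right) \approx 0.5333$$

$$\hat{S}(5) = 0.5333 \times \left(1 - \frac{1}{2}\right) \approx 0.2667$$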
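The recursion can be checked with exact arithmetic. This is a minimal sketch that takes the example's event times, risk-set sizes, and event counts as given and uses Python's `fractions` module to avoid rounding; the variable names are illustrative.

```python
from fractions import Fraction

# (t_i, n_i, d_i) for the worked example: event time, number at risk, events.
rows = [(2, 5, 1), (4, 3, 1), (5, 2, 1)]

survival = Fraction(1)  # the estimate equals 1 before the first event
estimates = []
for t_i, n_i, d_i in rows:
    # Multiply by the conditional probability of surviving past t_i.
    survival *= 1 - Fraction(d_i, n_i)
    estimates.append((t_i, survival))

for t_i, s in estimates:
    print(t_i, s, round(float(s), 4))
# 2 4/5 0.8
# 4 8/15 0.5333
# 5 4/15 0.2667
```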

Survival Function

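The Kaplan-Meier estimator is a step function defined as:

$$\hat{S}(t) = \begin{cases} 1, & 0 \le t < 2 \\ 0.8, & 2 \le t < 4 \\ 0.5333, & 4 \le t < 5 \\ 0.2667, & 5 \le t \le 6 \end{cases}$$

The last observation (subject 5 at month 6) is censored, so the estimate never reaches zero and is not extended beyond $t = 6$.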
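One way to evaluate such a step function at an arbitrary time is a binary search over the event times. The sketch below assumes the event times 2, 4, and 5 with estimates 4/5, 8/15, and 4/15 from the worked example; `km_step` is a hypothetical helper name.

```python
from bisect import bisect_right

# Event times and Kaplan-Meier values from the worked example.
event_times = [2, 4, 5]
km_values = [0.8, 8 / 15, 4 / 15]


def km_step(t):
    """Evaluate the right-continuous step function at time t >= 0."""
    idx = bisect_right(event_times, t)  # number of event times <= t
    return 1.0 if idx == 0 else km_values[idx - 1]


for t in (0, 1.9, 2, 3, 4, 5, 6):
    print(t, round(km_step(t), 3))
# 0 1.0
# 1.9 1.0
# 2 0.8
# 3 0.8
# 4 0.533
# 5 0.267
# 6 0.267
```

Using `bisect_right` makes the function drop exactly at each event time, which matches the convention that the estimate at $t$ includes events occurring at $t$.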

Assumptions of Kaplan-Meier Estimator

For the Kaplan-Meier estimator to be valid, the following assumptions must hold:

  1. Independent Censoring: Censored subjects have the same survival prospects as those who continue to be followed.
  2. Events Occur at Recorded Times: Exact times of events are known.
  3. Subjects Are Identically Distributed: The population is homogeneous with respect to survival distribution.

Implementation in Python
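The following is a minimal sketch of the estimator in plain Python, applied to the five-subject example. It is one possible implementation, not a reference one: the function names `kaplan_meier` and `plot_kaplan_meier` are illustrative, and the plotting helper assumes matplotlib is installed.

```python
def kaplan_meier(durations, events):
    """Return [(t_i, S_hat(t_i))] at each distinct event time.

    durations: observed times; events: 1 if the event occurred, 0 if censored.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Gather every subject whose observed time equals t (handles ties).
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t.
        at_risk -= removed
    return curve


def plot_kaplan_meier(curve):
    """Draw the step curve; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt

    times = [0] + [t for t, _ in curve]
    probs = [1.0] + [s for _, s in curve]
    plt.step(times, probs, where="post")
    plt.ylim(0, 1.05)
    plt.xlabel("Time (months)")
    plt.ylabel("Survival probability")
    plt.title("Kaplan-Meier Survival Curve")
    plt.show()


durations = [2, 3, 4, 5, 6]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(t, round(s, 4))
# 2 0.8
# 4 0.5333
# 5 0.2667
```

Calling `plot_kaplan_meier(curve)` draws the estimate as a step curve.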

Output:

Figure: Kaplan-Meier Survival Curve

Applications

The Kaplan-Meier estimator is widely used across disciplines:

  • Medicine: Estimating patient survival, time to recurrence, or drug efficacy.
  • Engineering: Estimating time-to-failure of machines or components.
  • Economics: Modeling unemployment durations.
  • Ecology: Studying animal survival or migration patterns.

Advantages

  • Handles censored data effectively.
  • Does not assume any underlying distribution.
  • Provides a simple visual representation of survival over time.

Limitations

  • Cannot incorporate covariates directly (unlike Cox models).
  • Assumes independence and identical distribution among subjects.
  • Estimates can become unstable in the tail (few individuals at risk).