The Cox model is a core part of survival modeling and is important in predictive analytics for modeling the time until an event occurs. In this article, we discuss the Cox model in detail and show how to implement it in the R programming language.
The Cox Proportional Hazards Model is one of the most powerful statistical methods in survival analysis. Its key feature is that it relates survival time (or failure time) to one or more predictor variables.
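For readers who want the relationship in symbols, the standard form of the model can be written as below. The notation (baseline hazard $h_0(t)$, covariates $x_1, \dots, x_p$, coefficients $\beta_1, \dots, \beta_p$) is the usual textbook notation and is not taken from this article.

$$
h(t \mid x_1, \dots, x_p) = h_0(t)\, \exp\left(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\right)
$$

Under this form, $\exp(\beta_j)$ is the hazard ratio associated with a one-unit increase in $x_j$ while the other covariates are held fixed.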
Now we will prepare the data for the survival analysis. Three types of data are used.
The R 'survival' package provides functions for creating a survival object, fitting Cox proportional hazards models, and plotting survival curves. Here we show how to use these functions in a concrete survival analysis.
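The code that produced the output below is not shown in the text. A minimal sketch that loads the package and inspects the data, assuming the 'lung' dataset that ships with the 'survival' package, might look like this:

```r
# Load the survival package (assumed to be installed).
library(survival)

# Work on a local copy of the 'lung' dataset bundled with the package.
lung <- survival::lung

# Inspect the structure: column names, types, and the first few values.
str(lung)
```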
Output:
'data.frame': 228 obs. of 10 variables:
$ inst : num 3 3 3 5 1 12 7 11 1 7 ...
$ time : num 306 455 1010 210 883 ...
$ status : num 2 2 1 2 2 1 2 2 2 2 ...
$ age : num 74 68 56 57 60 74 68 71 53 61 ...
$ sex : num 1 1 1 1 1 1 2 2 1 1 ...
$ ph.ecog : num 1 0 0 1 0 1 2 2 1 2 ...
$ ph.karno : num 90 90 90 90 100 50 70 60 70 70 ...
$ pat.karno: num 100 90 90 60 90 80 60 80 80 70 ...
$ meal.cal : num 1175 1225 NA 1150 NA ...
$ wt.loss : num NA 15 15 11 0 0 10 1 16 34 ...
Before fitting the Cox model, the data must be prepared: the time-to-event variable and its censoring indicator need to be combined into a properly formatted survival object.
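A sketch of this step, assuming the survival object is built with 'Surv()' and stored in a column named surv_obj (the column name that appears in the output below), could be:

```r
library(survival)
lung <- survival::lung

# In this dataset status is coded 1 = censored, 2 = dead,
# so the event indicator is status == 2.
lung$surv_obj <- Surv(time = lung$time, event = lung$status == 2)

# Censored times are printed with a trailing "+".
head(lung)
```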
Output:
  inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss surv_obj
1    3  306      2  74   1       1       90       100     1175      NA      306
2    3  455      2  68   1       0       90        90     1225      15      455
3    3 1010      1  56   1       0       90        90       NA      15    1010+
4    5  210      2  57   1       1       90        60     1150      11      210
5    1  883      2  60   1       0      100        90       NA       0      883
6   12 1022      1  74   1       1       50        80      513       0    1022+
Feature engineering for Cox proportional hazards modeling encompasses handling missing values, scaling numeric predictor variables, and creating interaction terms. The model fitted in the next step includes one such interaction term, age_sex.
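The three steps can be sketched as follows. The choices here are assumptions made for illustration, since the original code is not shown: median imputation for meal.cal and wt.loss, standardizing age, and defining age_sex as the product of scaled age and sex.

```r
library(survival)
lung <- survival::lung

# Handle missing values: impute two numeric covariates with their median
# (an assumed strategy; other approaches are equally possible).
for (col in c("meal.cal", "wt.loss")) {
  med <- median(lung[[col]], na.rm = TRUE)
  lung[[col]][is.na(lung[[col]])] <- med
}

# Scale age to mean 0 and standard deviation 1.
lung$age <- as.numeric(scale(lung$age))

# Interaction term between scaled age and sex (assumed definition).
lung$age_sex <- lung$age * lung$sex
```

With age standardized, its coefficient is read per standard deviation of age rather than per year.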
Fit Cox proportional hazards models using the 'coxph()' function.
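A sketch of the fitting call, using the formula shown in the output below, follows. The preparation lines are repeated so the block runs on its own; the object name cox_model and the way age and age_sex are constructed are assumptions.

```r
library(survival)
lung <- survival::lung

# Preparation (assumed): survival object, scaled age, interaction term.
lung$surv_obj <- Surv(lung$time, lung$status == 2)
lung$age <- as.numeric(scale(lung$age))
lung$age_sex <- lung$age * lung$sex

# Fit the Cox proportional hazards model.
# Rows with a missing covariate are dropped by default.
cox_model <- coxph(surv_obj ~ age + sex + ph.ecog + age_sex, data = lung)

# Coefficients, hazard ratios, confidence intervals, and global tests.
summary(cox_model)
```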
Output:
Call:
coxph(formula = surv_obj ~ age + sex + ph.ecog + age_sex, data = lung)
n= 227, number of events= 164
(1 observation deleted due to missingness)
           coef exp(coef) se(coef)      z Pr(>|z|)
age      0.4021    1.4950   0.2507  1.604  0.10876
sex     -0.5402    0.5826   0.1684 -3.207  0.00134 **
ph.ecog  0.4920    1.6355   0.1160  4.241 2.22e-05 ***
age_sex -0.2248    0.7986   0.1745 -1.289  0.19752
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
        exp(coef) exp(-coef) lower .95 upper .95
age        1.4950     0.6689    0.9146    2.4439
sex        0.5826     1.7164    0.4188    0.8105
ph.ecog    1.6355     0.6114    1.3030    2.0530
age_sex    0.7986     1.2521    0.5673    1.1243
Concordance= 0.651 (se = 0.025 )
Likelihood ratio test= 32.14 on 4 df, p=2e-06
Wald test = 32.65 on 4 df, p=1e-06
Score (logrank) test = 33.33 on 4 df, p=1e-06
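The summary of the Cox proportional hazards model provides several key pieces of information: the estimated coefficients with their standard errors and p-values, the hazard ratios (exp(coef)) with 95% confidence intervals, the concordance (0.651), and three global tests (likelihood ratio, Wald, and score), all of which reject the null model here.
Overall, the significant variables are sex (hazard ratio 0.58, 95% CI 0.42 to 0.81, p = 0.0013) and ph.ecog (hazard ratio 1.64, 95% CI 1.30 to 2.05, p < 0.001), indicating these have a meaningful impact on survival in the dataset. The age and age_sex terms are not significant at the 0.05 level.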
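To work with the hazard ratios programmatically rather than reading them off the printed summary, they can be pulled out of the summary object. This is a sketch; the names cox_model and hr_table, and the way age and age_sex are built, are assumptions.

```r
library(survival)
lung <- survival::lung
lung$surv_obj <- Surv(lung$time, lung$status == 2)
lung$age <- as.numeric(scale(lung$age))
lung$age_sex <- lung$age * lung$sex

cox_model <- coxph(surv_obj ~ age + sex + ph.ecog + age_sex, data = lung)
model_summary <- summary(cox_model)

# Matrix of hazard ratios with their 95% confidence limits.
hr_table <- model_summary$conf.int

# P-values from the coefficient table, one per term.
p_values <- model_summary$coefficients[, "Pr(>|z|)"]

# Terms whose p-value is below 0.05.
names(p_values)[p_values < 0.05]
```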
Survival curves can be plotted to see how the estimated survival probability changes over time.
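One way to draw such a curve is to pass the fitted model to 'survfit()' and plot the result. This is a sketch under the same assumed preparation as before; the axis labels and title are illustrative choices.

```r
library(survival)
lung <- survival::lung
lung$surv_obj <- Surv(lung$time, lung$status == 2)
lung$age <- as.numeric(scale(lung$age))
lung$age_sex <- lung$age * lung$sex

cox_model <- coxph(surv_obj ~ age + sex + ph.ecog + age_sex, data = lung)

# Estimated survival curve implied by the fitted Cox model.
fit_curve <- survfit(cox_model)

# Plot survival probability against time (recorded in days in this dataset).
plot(fit_curve,
     xlab = "Days",
     ylab = "Estimated survival probability",
     main = "Survival curve from the Cox model")
```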
Output:
The plot shows one or more survival curves, depending on the number of strata in the model. Each curve represents the estimated survival probability over time for a group of subjects in the dataset. Such a plot helps visualize the survival dynamics within the dataset and how the variables in the Cox model influence survival probabilities.
The Cox model is one of the strongest tools for understanding time-to-event data, and it finds applications across a wide range of disciplines. Appropriate data preparation, model fitting, and validation are all important for obtaining reliable answers with such methods. Ongoing development of statistical methodology and of computational tools such as R continues to extend the applicability and efficiency of survival analysis on real-world datasets.