--- title: "Introduction to PMLE4SCR" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to PMLE4SCR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(kableExtra) ``` ## Overview The `PMLE4SCR` package implements two-stage pseudo maximum likelihood estimation (PMLE) for copula-based regression models with semi-competing risks data. In semi-competing risks data, a non-terminal event (e.g., disease relapse) is subject to dependent censoring by a terminal event (e.g., death), but not vice versa. The package provides two main functions: - `PMLE4SCR()`: Two-stage pseudo maximum likelihood estimation - `MLE4SCR()`: Simultaneous maximum likelihood estimation ## Data We illustrate the package using the Bone Marrow Transplant (BMT) dataset from the `SemiCompRisks` package. This dataset contains 137 patients with acute leukemia aged 7 to 52. The non-terminal event is leukemia relapse and the terminal event is death. Three disease groups are considered: AML-low risk, AML-high risk, and ALL (acute lymphoblastic leukemia), with AML-low as the baseline group. ```{r data} library(PMLE4SCR) data(BMT, package = "SemiCompRisks") BMT$g <- factor(BMT$g, levels = c(2, 3, 1), labels = c("AML-low", "AML-high", "ALL")) ``` The key variables in the dataset are described below: ```{r data-table, echo = FALSE} var_desc <- data.frame( Variable = c("T1", "T2", "delta1", "delta2", "g"), Description = c( "Time to death (terminal event)", "Time to leukemia relapse (non-terminal event)", "Death indicator (1 = observed, 0 = censored)", "Relapse indicator (1 = observed, 0 = censored)", "Disease group (AML-low, AML-high, ALL)" ) ) knitr::kable(var_desc, caption = "Table 1: Key variables in the BMT dataset") ``` A preview of the first six observations: ```{r head} head(BMT[, c("T1", "T2", "delta1", "delta2", "g")]) ``` ## Fitting the Model with PMLE We fit a copula-based model using the Clayton copula, with proportional hazards (PH) marginals for both relapse and death. The disease group `g` is included as a covariate for both marginals and the copula parameter. ```{r pmle} myfit <- PMLE4SCR(BMT, time = "T2", death = "T1", status_time = "delta2", status_death = "delta1", T.fmla = ~ g, D.fmla = ~ g, copula.family = "Clayton", copula.control = list(formula = ~ g)) ``` ## Results ### Dependence between relapse and death The estimated copula parameters and corresponding Kendall's $\tau$ values measure the dependence between relapse and death times for each disease group. A larger $\tau$ indicates stronger dependence. ```{r gamma} knitr::kable(myfit$gamma, digits = 3, caption = "Table 2: Estimated copula parameters (PMLE)") ``` ### Regression coefficients for relapse (non-terminal event) The estimated regression coefficients for the marginal distribution of relapse time. Positive values indicate higher risk of relapse compared to the AML-low baseline group. ```{r betaT, echo = FALSE} tab <- myfit$betaT rownames(tab) <- NULL tab$para <- c("AML-high", "ALL") knitr::kable(tab, col.names = c("Group", "Estimate", "SE"), digits = 3, caption = "Table 3: Regression coefficients for relapse time (PMLE)") ``` ### Regression coefficients for death (terminal event) ```{r betaD, echo = FALSE} tab <- myfit$betaD rownames(tab) <- NULL tab$para <- c("AML-high", "ALL") knitr::kable(tab, col.names = c("Group", "Estimate", "SE"), digits = 3, caption = "Table 4: Regression coefficients for death time (PMLE)") ``` ## Comparison: PMLE vs Simultaneous MLE For comparison, we also fit the simultaneous MLE and compare the estimates from both methods side by side. ```{r mle} myfit_mle <- MLE4SCR(BMT, time = "T2", death = "T1", status_time = "delta2", status_death = "delta1", T.fmla = ~ g, D.fmla = ~ g, copula.family = "Clayton", copula.control = list(formula = ~ g)) ``` ```{r comparison, echo = FALSE} comp <- data.frame( Parameter = c("Copula (Intercept)", "Copula (AML-high)", "Copula (ALL)", "Relapse: AML-high", "Relapse: ALL", "Death: AML-high", "Death: ALL"), PMLE_Est = c(myfit$gamma$est, myfit$betaT$est, myfit$betaD$est), PMLE_SE = c(myfit$gamma$se, myfit$betaT$se, myfit$betaD$se), MLE_Est = c(myfit_mle$gamma$est, myfit_mle$betaT$est, myfit_mle$betaD$est), MLE_SE = c(myfit_mle$gamma$se, myfit_mle$betaT$se, myfit_mle$betaD$se) ) knitr::kable(comp, digits = 3, col.names = c("Parameter", "Estimate", "SE", "Estimate", "SE"), caption = "Table 5: Comparison of PMLE and simultaneous MLE estimates") |> kableExtra::add_header_above(c(" " = 1, "PMLE" = 2, "Simultaneous MLE" = 2)) ``` The two methods produce similar estimates. The PMLE is computationally more efficient and robust against copula misspecification for the marginal parameters. ## Reference Arachchige, S. J., Chen, X., and Zhou, Q. M. (2025). Two-stage pseudo maximum likelihood estimation of semiparametric copula-based regression models for semi-competing risks data. *Lifetime Data Analysis*, 31, 52-75.