Journal Article

Semiparametric inference of competing risks data with additive hazards and missing cause of failure under MCAR or MAR assumptions

01 Jan 2014-Electronic Journal of Statistics (The Institute of Mathematical Statistics and the Bernoulli Society)-Vol. 8, Iss: 1, pp 41-95
TL;DR: In this article, the authors considered a semiparametric model for lifetime data with competing risks and missing causes of death, and derived estimators of the regression and functional parameters under the missing at random (MAR) mechanism.
Abstract: In this paper, we consider a semiparametric model for lifetime data with competing risks and missing causes of death. We assume that an additive hazards model holds for each cause-specific hazard rate function and that random right censoring occurs. Our goal is to estimate the regression parameters as well as functional parameters such as the baseline and cause-specific cumulative hazard rate functions and the cumulative incidence functions. We first introduce preliminary estimators of the unknown (Euclidean and functional) parameters when the cause-of-death indicators are missing completely at random (MCAR). These estimators are obtained using only the observations with known cause of failure. The advantage of the MCAR model is that the information carried by the observed lifetimes with unknown failure cause can be used to improve the preliminary estimates so that they attain an asymptotic optimality criterion. This is the main purpose of our work. However, since a missing at random (MAR) mechanism is often more realistic, we also derive estimators of the regression and functional parameters under the MAR model. We study the large sample properties of our estimators using martingale and empirical process techniques. We also provide a simulation study comparing the behavior of our three types of estimators under the different missingness mechanisms. It shows that our improved estimators under the MCAR assumption are quite robust when only the MAR assumption holds. Finally, three illustrations on real datasets are given.
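For orientation, the setting described in the abstract can be sketched as follows. The notation is chosen here for illustration and is not taken from the paper: $T$ is the lifetime, $\varepsilon \in \{1, \dots, J\}$ the cause of failure, $Z$ a time-fixed covariate vector, $\delta$ the non-censoring indicator, and $R$ the indicator that the cause of failure is observed. The block writes out a standard additive cause-specific hazards model, the cumulative incidence function it implies, and the usual MCAR and MAR conditions on $R$.

```latex
% Additive hazards model for each cause-specific hazard (assumed notation)
\lambda_j(t \mid Z) = \lambda_{0j}(t) + \beta_j^{\top} Z,
\qquad j = 1, \dots, J .

% Cause-specific cumulative hazard for a time-fixed covariate Z
\Lambda_j(t \mid Z) = \Lambda_{0j}(t) + t\, \beta_j^{\top} Z,
\qquad \Lambda_{0j}(t) = \int_0^t \lambda_{0j}(u)\, du .

% Cumulative incidence function of cause j
F_j(t \mid Z) = P(T \le t,\ \varepsilon = j \mid Z)
  = \int_0^t \exp\Big\{ -\sum_{k=1}^{J} \Lambda_k(u \mid Z) \Big\}\,
    \lambda_j(u \mid Z)\, du .

% Missingness of the cause among uncensored subjects
% MCAR: the probability of observing the cause is a constant \pi
P(R = 1 \mid T, \varepsilon, Z, \delta = 1) = \pi ,
% MAR: it may depend on observed quantities, but not on the cause itself
P(R = 1 \mid T, \varepsilon, Z, \delta = 1) = \pi(T, Z) .
```

Under this reading, MCAR is the special case of MAR in which $\pi(T, Z)$ is constant, which is why estimators built for MCAR can be compared with MAR estimators on the same data.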
Citations
Journal Article
TL;DR: Simulation studies show that the estimators perform well even in the presence of a large fraction of missing cause of failures, and that the regression coefficient estimator can be substantially more efficient compared to the previously proposed augmented inverse probability weighting estimator.
Abstract: The cause of failure in cohort studies that involve competing risks is frequently incompletely observed. To address this, several methods have been proposed for the semiparametric proportional cause-specific hazards model under a missing at random assumption. However, these proposals provide inference for the regression coefficients only, and do not consider the infinite dimensional parameters, such as the covariate-specific cumulative incidence function. Nevertheless, the latter quantity is essential for risk prediction in modern medicine. In this paper we propose a unified framework for inference about both the regression coefficients of the proportional cause-specific hazards model and the covariate-specific cumulative incidence functions under missing at random cause of failure. Our approach is based on a novel computationally efficient maximum pseudo-partial-likelihood estimation method for the semiparametric proportional cause-specific hazards model. Using modern empirical process theory we derive the asymptotic properties of the proposed estimators for the regression coefficients and the covariate-specific cumulative incidence functions, and provide methodology for constructing simultaneous confidence bands for the latter. Simulation studies show that our estimators perform well even in the presence of a large fraction of missing cause of failures, and that the regression coefficient estimator can be substantially more efficient compared to the previously proposed augmented inverse probability weighting estimator. The method is applied using data from an HIV cohort study and a bladder cancer clinical trial.

6 citations

Journal Article
TL;DR: In this paper, the authors proposed a unified framework for inference about both the regression coefficients of the proportional cause-specific hazards model and the covariate-specific cumulative incidence functions under missing at random cause of failure.
Abstract: The cause of failure in cohort studies that involve competing risks is frequently incompletely observed. To address this, several methods have been proposed for the semiparametric proportional cause-specific hazards model under a missing at random assumption. However, these proposals provide inference for the regression coefficients only, and do not consider the infinite dimensional parameters, such as the covariate-specific cumulative incidence function. Nevertheless, the latter quantity is essential for risk prediction in modern medicine. In this paper we propose a unified framework for inference about both the regression coefficients of the proportional cause-specific hazards model and the covariate-specific cumulative incidence functions under missing at random cause of failure. Our approach is based on a novel computationally efficient maximum pseudo-partial-likelihood estimation method for the semiparametric proportional cause-specific hazards model. Using modern empirical process theory we derive the asymptotic properties of the proposed estimators for the regression coefficients and the covariate-specific cumulative incidence functions, and provide methodology for constructing simultaneous confidence bands for the latter. Simulation studies show that our estimators perform well even in the presence of a large fraction of missing cause of failures, and that the regression coefficient estimator can be substantially more efficient compared to the previously proposed augmented inverse probability weighting estimator. The method is applied using data from an HIV cohort study and a bladder cancer clinical trial.

5 citations

Book Chapter
01 Jan 2017
TL;DR: In this article, some statistical inference procedures used when the cause of failure is missing or masked for some units are reviewed.
Abstract: Competing risks data arise when the study units are exposed to several risks at the same time but it is assumed that the eventual failure of a unit is due to only one of these risks, which is called the “cause of failure”. Statistical inference procedures when the time to failure and the cause of failure are observed for each unit are well documented. In some applications, it is possible that the cause of failure is either missing or masked for some units. In this article, we review some statistical inference procedures used when the cause of failure is missing or masked for some units.
Posted Content
TL;DR: In this paper, a maximum partial pseudolikelihood estimator under a missing at random assumption was proposed for population-averaged analysis with clustered competing risks data with informative cluster size and missing causes of failure.
Abstract: Clustered competing risks data are commonly encountered in multicenter studies. The analysis of such data is often complicated due to informative cluster size, a situation where the outcomes under study are associated with the size of the cluster. In addition, cause of failure is frequently incompletely observed in real-world settings. To the best of our knowledge, there is no methodology for population-averaged analysis with clustered competing risks data with informative cluster size and missing causes of failure. To address this problem, we consider the semiparametric marginal proportional cause-specific hazards model and propose a maximum partial pseudolikelihood estimator under a missing at random assumption. To make the latter assumption more plausible in practice, we allow for auxiliary variables that may be related to the probability of missingness. The proposed method does not impose assumptions regarding the within-cluster dependence and allows for informative cluster size. The asymptotic properties of the proposed estimators for both regression coefficients and infinite-dimensional parameters, such as the marginal cumulative incidence functions, are rigorously established. Simulation studies show that the proposed method performs well and that methods that ignore the within-cluster dependence and the informative cluster size lead to invalid inferences. The proposed method is applied to competing risks data from a large multicenter HIV study in sub-Saharan Africa where a significant portion of causes of failure is missing.
References
Journal Article
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Abstract: Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

272,030 citations

Journal Article
TL;DR: Earlier methods combine estimates of the cause-specific hazard functions under the proportional hazards formulation, but they do not allow the analyst to directly assess the effect of a covariate on the marginal probability function (the cumulative incidence), which is the gap this article addresses.
Abstract: With explanatory covariates, the standard analysis for competing risks data involves modeling the cause-specific hazard functions via a proportional hazards assumption. Unfortunately, the cause-specific hazard function does not have a direct interpretation in terms of survival probabilities for the particular failure type. In recent years many clinicians have begun using the cumulative incidence function, the marginal failure probabilities for a particular cause, which is intuitively appealing and more easily explained to the nonstatistician. The cumulative incidence is especially relevant in cost-effectiveness analyses in which the survival probabilities are needed to determine treatment utility. Previously, authors have considered methods for combining estimates of the cause-specific hazard functions under the proportional hazards formulation. However, these methods do not allow the analyst to directly assess the effect of a covariate on the marginal probability function. In this article we pro

11,109 citations


"Semiparametric inference of competi..." refers to background in this paper

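  • ...converges weakly in $D[0, \tau]$, when $n \to +\infty$, to a zero mean Gaussian process with covariance matrix $\Sigma_5(t)\,\Sigma_{W'_p,\infty}(t)\,\Sigma_5^{T}(t)$, where $\Sigma_{W'_p,\infty}(t) = \frac{1}{\alpha} \begin{pmatrix} \Sigma^{(11)}_{W'_p,\infty}(t) & \Sigma^{(12)}_{W'_p,\infty}(t) \\ \Sigma^{(12)}_{W'_p,\infty}(t) & \Sigma^{(22)}_{W'_p,\infty}(t) \end{pmatrix}$,...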

    [...]

  • ...$\begin{pmatrix} \Sigma^{(11)}_{W_{p+1},\infty}(t) & \Sigma^{(12)}_{W_{p+1},\infty}(t) \\ \Sigma^{(12)}_{W_{p+1},\infty}(t) & \Sigma^{(22)}_{W_{p+1},\infty}(t) \end{pmatrix}$...

    [...]

  • ...On the other hand, Gao and Tsiatis (2005) have considered a linear transformation competing risks model whereas Bakoyannis et al. (2010) have focused on the well-known Fine and Gray (2009) model....

    [...]

Journal Article
TL;DR: In this article, it was shown that ignoring the process that causes missing data when making sampling distribution inferences about the parameter of the data, θ, is generally appropriate if and only if the missing data are missing at random and the observed data are observed at random, and then such inferences are generally conditional on the observed pattern of missing data.
Abstract: Two results are presented concerning inference when data may be missing. First, ignoring the process that causes missing data when making sampling distribution inferences about the parameter of the data, θ, is generally appropriate if and only if the missing data are “missing at random” and the observed data are “observed at random,” and then such inferences are generally conditional on the observed pattern of missing data. Second, ignoring the process that causes missing data when making Bayesian inferences about θ is generally appropriate if and only if the missing data are missing at random and the parameter of the missing data is “independent” of θ. Examples and discussion indicating the implications of these results are included.

8,197 citations


"Semiparametric inference of competi..." refers to background in this paper

  • ...But we only assume a MAR assumption (Rubin, 1976) on the mechanism of missing information on the cause of death (and not a MCAR assumption as in previous sections)....

    [...]

Journal Article
TL;DR: In this article, the Cox regression model for censored survival data is extended to a model where covariate processes have a proportional effect on the intensity process of a multivariate counting process, allowing for complicated censoring patterns and time dependent covariates.
Abstract: The Cox regression model for censored survival data specifies that covariates have a proportional effect on the hazard function of the life-time distribution of an individual. In this paper we discuss how this model can be extended to a model where covariate processes have a proportional effect on the intensity process of a multivariate counting process. This permits a statistical regression analysis of the intensity of a recurrent event allowing for complicated censoring patterns and time dependent covariates. Furthermore, this formulation gives rise to proofs with very simple structure using martingale techniques for the asymptotic properties of the estimators from such a model. Finally an example of a statistical analysis is included.

3,719 citations
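
The counting process extension described in this abstract is usually written as below. This is a sketch in standard textbook notation that is assumed here rather than quoted from the article: $N_i(t)$ counts the events of subject $i$, $Y_i(t)$ is the at-risk indicator, $Z_i(t)$ the (possibly time-dependent) covariate process, and $\lambda_0$ an unspecified baseline intensity.

```latex
% Multiplicative intensity model: covariates act proportionally on the intensity
\lambda_i(t) = Y_i(t)\, \lambda_0(t)\, \exp\{ \beta^{\top} Z_i(t) \} .

% Compensated counting process, a martingale under the model
M_i(t) = N_i(t) - \int_0^t \lambda_i(u)\, du .

% Partial likelihood score for \beta, written as a stochastic integral
U(\beta, t) = \sum_{i=1}^{n} \int_0^t
  \big\{ Z_i(u) - \bar{Z}(\beta, u) \big\}\, dN_i(u),
\qquad
\bar{Z}(\beta, u) =
  \frac{\sum_{l} Y_l(u)\, Z_l(u)\, \exp\{\beta^{\top} Z_l(u)\}}
       {\sum_{l} Y_l(u)\, \exp\{\beta^{\top} Z_l(u)\}} .
```

At the true parameter value, replacing $dN_i$ by $dM_i$ in the score leaves it unchanged, so the score is a martingale transform. This is the structure that makes the martingale proofs mentioned in the abstract comparatively simple.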