scispace - formally typeset

Journal ArticleDOI

Regression analysis of competing risks data with general missing pattern in failure types

01 Mar 2016-Statistical Methodology (Elsevier)-Vol. 29, pp 18-31

Abstract: In competing risks data, missing failure types (causes) is a very common phenomenon. In a general missing pattern, if a failure type is not observed, one observes a set of possible types containing the true type along with the failure time. Dewanji and Sengupta (2003) considered nonparametric estimation of the cause-specific hazard rates and suggested a Nelson–Aalen type estimator under such general missing pattern. In this work, we deal with the regression problem, in which the cause-specific hazard rates may depend on some covariates, and consider estimation of the regression coefficients and the cause-specific baseline hazards under the general missing pattern using some semi-parametric models. We consider two different proportional hazards type semi-parametric models for our analysis. Simulation studies from both the models are carried out to investigate the finite sample properties of the estimators. We also consider an example from an animal experiment to illustrate our methodology.
Topics: Missing data (63%), Nelson–Aalen estimator (58%), Proportional hazards model (56%), Regression analysis (52%)

Summary (3 min read)

1 Introduction

  • Competing risks models are usually employed to analyse such type of data.
  • Due to inadequacy on the diagnostic mechanism one often uncertain about the true failure type or it is reluctant to reveal the value of J for certain individuals.
  • In carcinogenicity studies, death of individuals can be classified into death s with tumour or deaths due to other reasons.
  • See Dinse(1986) and Lagakos and Louis (1988) for details.
  • They have suggested a two-stage modeling in which the first stage involves reducing the number of parameters by imposing a natural structure on the underlying disease types and the second stage involves inference through a general extension of the partial likelihood based estimating equation (See Goetghebeur and Ryan, 1995).

2 Data and Models

  • 1. Note that these hazard rates in (3) can also be viewed as another set of cause-specific hazard rates with the different g’s in G representing the different failure types.
  • This may be a strong assumption; but the authors relax this to some extent in the second model.
  • The baseline cause-specific hazard rates, λ0j(t)’s, are now assumed to be proportional to each other under this model with eγj ’s being the proportionality constants.
  • Both the models can be independently tested from standard competing risks data without any missingness.
  • Under the general missing pattern as discussed, none of these can be tested.

3 Estimation under Model 1

  • Also, these cause-specific hazard rates for the ‘modified’ competing risks problem are of the same semi-parametric form as those for the original cause-specific hazard rates in (4).
  • Hence, the following partial likelihood is the most appropriate to estimate the regression parameters β, in the absence of any knowledge on the baseline cause-specific hazards λ∗0g(t), based on ‘modified’ competing risks data which is available without any missing failure type (Kalbfleisch and Prentice, 1980, Sec. 7.2.3).

3.1 Estimation of regression parameters

  • Then, at each of these g-type failure times, say t(gi), the authors consider the conditional probability that the individual (gi) with covariate value z(gi) fails at time t(gi), given the history up to time t(gi)− and that one failure with missing pattern g occurs at time t(gi).
  • The method of partial likelihood then gives LP1(β) = ∏ g∈G kg∏ i=1 [ λ∗g(t(gi), z(gi))∑ l∈R(t(gi)) λ ∗ g(t(gi), zl) ] = ∏ g∈G kg∏ i=1 [ ez(gi)β∑ l∈R(t(gi)) e zlβ ] , (7) where R(t(gi)) is the risk set at time t(gi)− and zl denotes the covariate value for the lth individual.
  • Clearly, this partial likelihood (7) can accommodate tied failure times with different missing pattern, but an approximation may be needed to deal with tied failure times with the same missing pattern.
  • Note that the standard asymptotic likelihood techniques can be applied to this partial likelihood (7) and to the estimate β̂ to make inference on β.
  • See Andersen and Gill (1982), Andersen and Borgan (1985) and Andersen et al. (1993, Ch. VII.2).

3.2 Estimation of baseline cumulative cause-specific hazards

  • The baseline cumulative cause-specific hazards Λ∗0g(t) = ∫ t 0 λ∗0g(u)du for the ‘modified’ competing risks problem can also be estimated as follows.
  • Under the same set of regularity conditions, as required for the asymptotic normality of β̂, the process( Λ̂∗0g(t)−.
  • There have been some works concerning estimation of these masking probabilities usually requiring either additional modeling assumptions or secondary data.
  • In practice, one can use “pooling-the-adjacent-violators” algorithm to achieve monotonicity.
  • If some of the {Ng(t)}’s are not observed to have any jump during the study, the corresponding Λ̂∗0g(t)’s, and the associated entries of Σ̂(t), turn out to be zero; the corresponding rows of P are also estimated, as given in Dewanji and Sengupta (2003), to be zero.

4 Estimation under Model 2

  • These have the similar semi-parametric form as those for the original cause-specific hazard rates in (5), except that the parametric component fg(z, t, θ ∼ ), for different g’s, are not of the simple exponential form.
  • Nevertheless, from Kalbfleisch and Prentice (1980, Sec. 7.2.3), the partial likelihood in (13) is the most appropriate to estimate the vector of regression parameters θ ∼ , in the absence of any knowledge on the baseline cause-specific hazards λ0(t), based on the ‘modified’ competing risks data which is available without any missingness.

4.1 Estimation of regression parameters

  • Then, at each of these failure times, say t(i), the authors consider the conditional probability that the individual (i) with covariate value z(i) fails at time t(i) with missing pattern g(i), given the history up to time t(i)− and that one failure occurs at time t(i).
  • See Dewanji (1992) for a special case of this partial likelihood.
  • An approximation may be needed to deal with tied failure times.
  • Note that the model (12) cannot be written as the underlying multiplicative hazard competing risks model of Andersen et al. (1993, Ch. VII.2) for the asymptotic results therein to be readily available, as for the model in (6).
  • The proofs of the asymptotic results follow the similar steps, as those in Andersen and Gill (1982) and Andersen et al. (1993, Ch VII.2), with little modification, as worked out by Prentice and Self (1983) in the context of ordinary survival data with general relative risk form.

4.2 Estimation of baseline cumulative cause-specific hazards

  • (16) The proof is similar to that in Andersen et al. (1993, p503-505) with little modification.
  • Scandinavian Journal of Statistics 12, 97-158. Basu, S. (2009).
  • Biometrika 97, 683-698. Dewanji, A. (1992).
  • A note on a test for competing risks with missing failure type.
  • Nonparametric prevalence and mortality estimators for animal experiments with incomplete cause of death data.

Did you find this useful? Give us your feedback

Content maybe subject to copyright    Report

Regression Analysis of Competing Risks Data with General Missing
Pattern in Failure Types
Anup Dewanji
1
, P. G. Sankaran
2
, Debasis Sengupta
1
and Bappa Karmakar
1
1. Applied Statistics Unit, Indian Statistical Institute, Kolkata 700108, India
2. Department of Statistics, Cochin University of Science and Technology, Cochin-22, India
Summary. In competing risks data, missing failure types (causes) is a very common phenomenon. In a general
missing pattern, if a failure type is not observed, one observes a set of possible types containing the true type along
with the failure time. Dewanji and Sengupta (2003) considered nonparametric estimation of the cause-specific
hazard rates and suggested a Nelson-Aalen type estimator under such general missing pattern. In this work, we
deal with the regression problem, in which the cause-specific hazard rates may depend on some covariates, and
consider estimation of the regression coefficients and the cause-specific baseline hazards under the general missing
pattern using some semi-parametric models. We consider two different proportional hazards type semi-parametric
models for our analysis. We also consider an example from an animal experiment to illustrate our methodology.
Keywords: Competing risks; Cause-specific hazard; Missing failure type; Missing at random; Nelson-Aalen
estimator; Semi-parametric model
1 Introduction
In survival studies, it is common that death of individuals may be attributable to more than one cause or factor.
Competing risks models are usually employed to analyse such type of data. In competing risks set up. For each
individual, we observe a random vector (T,J) where T is possibly censored survival time and J represents cause
for death( exactly one of say m possible causes). However, due to inadequacy on the diagnostic mechanism one
often uncertain about the true failure type or it is reluctant to reveal the value of J for certain individuals. For
example, in carcinogenicity studies, death of individuals can be classified into death s with tumour or deaths due
to other reasons. However, there is uncertainty in assigning these causes of death even if the presence of tumour
can be ascertained. See Dinse(1986) and Lagakos and Louis (1988) for details. In certain cases, one cannot even
distinguish deaths with tumour and death without tumour because they are totally cannibalized.
In this work, we deal with the regression problem, in which the cause-specific hazard rates may depend on
some covariates, and consider estimation of the regression coefficients under some proportional hazards type semi-
parametric models, when observation on the failure type exhibits the general missing pattern as discussed before.
Recently, Chatterjee et al. (2010) have considered a similar problem in the context of partially observed disease
classification data with possibly large number of types. They have suggested a two-stage modeling in which the
first stage involves reducing the number of parameters by imposing a natural structure on the underlying disease
types and the second stage involves inference through a general extension of the partial likelihood based estimat-
ing equation (See Goetghebeur and Ryan, 1995). Apparently, however, they need to make certain assumptions
regarding the missing probabilities like most of the work on this issue. Also, Sen et al. (2010) have developed
a semiparametric Bayesian approach, where the partial information about the cause of death is incorporated by
means of latent variables, and proposed a simulation-based method using Markov Chain Monte Carlo (MCMC)
techniques to implement the Bayesian methodology.
2 Data and Models
For the m competing causes of failure, the corresponding cause-specific hazard rates, given the covariate value
Z = z, are defined as
λ
j
(t, z) = lim
t0
1
t
Pr [T [t, t + t), J = j|T t, z] , (1)
for j = 1, ···, m, where T denotes the failure time, J the failure type and Z the covariate that may be a vector. The
survival function S(t, z), for an individual with covariate value Z = z, can be written in terms of the cause-specific
hazard rates as S(t, z) = exp
h
R
t
0
P
m
j=1
λ
j
(u, z)du
i
.
Suppose that the data consists of the covariate value Z, the failure or censoring time X = min(T, C), where C
denotes the censoring time, a censoring indicator δ (1 for failure and 0 for censoring), and, if failure occurs, only
1
Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session STS009) p.1229

partial information about the failure type is available in the form of a set G {1, ···, m} representing the possible
types of failure. This information on failure type is complete when the observed set G = g is a singleton set. Let
(x
i
, δ
i
, g
i
, z
i
) be an observation, where the components are the observed values of X, δ, G and Z, respectively, for
the ith individual, for i = 1, ···, n.
Let us write the masking probability p
gj
(t, z) as
p
gj
(t, z) = Pr [G = g|T = t, J = j, δ = 1, Z = z] , (2)
which is the conditional probability of observing the set g as the set of possible failure types, given that there
is a failure of type j at time t with covariate value z, for j = 1, ···, m and g 3 j. If g does not contain j, this
probability is zero so that, for a fixed j,
P
g
p
gj
(t, z) = 1. Note that the assumption p
gj
(t, z) = p
gj
0
(t, z), for all
j 6= j
0
g, is same as the missing-at-random assumption (Little and Rubin, 1987, p90) in this context. Assuming
that the missing mechanism is independent of the censoring mechanism and, for simplicity, the covariate value, the
probability p
gj
(t, z) equals Pr[G = g|T = t, J = j] = p
gj
(t), say, which is the conditional probability of observing
g 3 j as the set of possible failure types, given failure of type j at time t. Therefore, the missing pattern is allowed
to be non-ignorable. Noting that λ
j
(t, z)dt is the conditional probability of instantaneous failure due to type j at
time t, given survival up to time t and z, it follows that the hazard rate for failure due to cause j at time t and
with g 3 j observed as the set of possible causes is p
gj
(t)λ
j
(t, z). Hence, the hazard rate for failure at time t with
g observed as the set of possible causes, given covariate value z, is
λ
g
(t, z) =
X
jg
p
gj
(t)λ
j
(t, z) . (3)
As expected, the sum of the hazards given by (3), over the set G of all non-empty subsets g of {1, ···, m}, is the
total hazard λ
·
(t, z) =
P
m
j=1
λ
j
(t, z) since the coefficient of λ
j
(t, z), for a fixed j, in that sum is
P
g3j
p
gj
(t) = 1.
Note that these hazard rates in (3) can also be viewed as another set of cause-specific hazard rates with the different
g’s in G representing the different failure types. It is just another decomposition of the total hazard rate λ
·
(t, z)
resulting in a different competing risks problem. However, while information on the original failure types may be
masked, the same on these ‘modified’ failure types is directly available, thereby making estimation of the modified
cause-specific hazard rates λ
g
(t, z), given by (3), easier.
We consider two different proportional hazards type semi-parametric models to describe the competing risks
data. In the first model, referred to as Model 1, we have
λ
j
(t, z) = λ
0j
(t)e
zβ
, (4)
where λ
0j
(t) is an unknown and arbitrary function of t depending on j representing the baseline cause-specific
hazard rate for type j and zβ is a linear combination of the components of z with β being the vector of coefficient
parameters (assumed to be the same for all j). The vector of coefficient parameters β measures the effect of the
covariate vector (the kth parameter measuring the effect of the kth covariate) on the different cause-specific hazard
rates and this effect is assumed to be the same for all the failure types. This may be a strong assumption; but we
relax this to some extent in the second model.
Alternatively, in the second model, referred to as Model 2, we have
λ
j
(t, z) = λ
0
(t)e
γ
j
+zβ
j
, (5)
with γ
1
= 0 for identifiability and where λ
0
(t) is an unknown and arbitrary function of t. The vector of coefficient
parameters β
j
depends on j, thereby relaxing the assumption of same effect of the covariate vector on the different
cause-specific hazard rates, as in Model 1. However, the baseline cause-specific hazard rates, λ
0j
(t)’s, are now
assumed to be proportional to each other under this model with e
γ
j
’s being the proportionality constants. Formally,
λ
0j
(t) = λ
0
(t)e
γ
j
, for j = 2, ···, m, and λ
01
(t) = λ
0
(t). The constants γ
2
, ···, γ
m
are unknown and treated as
parameters along with the β
j
’s. Let us write
γ
= (γ
2
, ···, γ
m
)
T
, β
= (β
1
, ···, β
m
)
T
and
θ
= (
γ
, β
).
Note that although Model 2, given by (5), has more number of regression parameters than Model 1, given by
(4), Model 1 has m arbitrary functions (the baseline cause-specific hazard rates) to be estimated while Model 2
has only one. Therefore, clearly, neither model is a special case of the other; both the models are of some interest
depending on the situation. Both the models can be independently tested from standard competing risks data
without any missingness. However, under the general missing pattern as discussed, none of these can be tested. It
may not be possible to analyze the more general model, in which λ
j
(t, z) = λ
0j
(t)e
zβ
j
, at least by the technique
we use in this work.
3 Estimation under Model 1
Under Model 1, using (3), the hazard rate for failure at time t with g observed as the set of possible failure types,
given covariate value z, is given by
λ
g
(t, z) = λ
0g
(t)e
zβ
, (6)
2
Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session STS009) p.1230

where λ
0g
(t) =
P
jg
p
gj
(t)λ
0j
(t), for all g G are arbitrary and unknown functions since λ
0j
(t)’s are. Also, these
cause-specific hazard rates for the ‘modified’ competing risks problem are of the same semi-parametric form as those
for the original cause-specific hazard rates in (4). Hence, the following partial likelihood is the most appropriate
to estimate the regression parameters β, in the absence of any knowledge on the baseline cause-specific hazards
λ
0g
(t), based on ‘modified’ competing risks data which is available without any missing failure type (Kalbfleisch
and Prentice, 1980, Sec. 7.2.3).
3.1 Estimation of regression parameters
Let t
(g1)
< ··· < t
(gk
g
)
be the k
g
ordered observed failure times (assuming no tie) with missing pattern g (that is,
with g being the set of possible failure types) and let z
(g1)
, ···, z
(gk
g
)
denote the corresponding covariate values.
Then, at each of these g-type failure times, say t
(gi)
, we consider the conditional probability that the individual
(gi) with covariate value z
(gi)
fails at time t
(gi)
, given the history up to time t
(gi)
and that one failure with
missing pattern g occurs at time t
(gi)
. The method of partial likelihood then gives
L
P 1
(β) =
Y
g∈G
k
g
Y
i=1
"
λ
g
(t
(gi)
, z
(gi)
)
P
lR(t
(gi)
)
λ
g
(t
(gi)
, z
l
)
#
=
Y
g∈G
k
g
Y
i=1
"
e
z
(gi)
β
P
lR(t
(gi)
)
e
z
l
β
#
, (7)
where R(t
(gi)
) is the risk set at time t
(gi)
and z
l
denotes the covariate value for the lth individual. Clearly, this
partial likelihood (7) can accommodate tied failure times with different missing pattern, but an approximation may
be needed to deal with tied failure times with the same missing pattern. This partial likelihood is maximized to get
an estimate of β. Let us denote it by
ˆ
β. Note that the standard asymptotic likelihood techniques can be applied to
this partial likelihood (7) and to the estimate
ˆ
β to make inference on β. See Andersen and Gill (1982), Andersen
and Borgan (1985) and Andersen et al. (1993, Ch. VII.2). In particular, under some regularity conditions, the
asymptotic distribution of (
ˆ
β β) is approximately a multivariate normal with mean zero and covariance matrix
that may be consistently estimated by I
1
(
ˆ
β), where I(β) is the matrix of second order partial derivatives of
log L
P 1
(β).
3.2 Estimation of baseline cumulative cause-specific hazards
The baseline cumulative cause-specific hazards Λ
0g
(t) =
R
t
0
λ
0g
(u)du for the ‘modified’ competing risks problem
can also be estimated as follows. Let us consider the n(2
m
1)-dimensional counting process {N
gi
(t), g G, i =
1, ···, n}, where N
gi
(t) counts the number of failures with missing pattern g up to time t for individual i. Consider
the multiplicative intensity model with covariates, in which the corresponding intensity process is given by
α
gi
(t) = λ
g
(t, z
i
)Y
i
(t) = λ
0g
(t)e
z
i
β
Y
i
(t),
where Y
i
(t) = 1 if the ith individual is at risk at time t and λ
0g
(t) is as in (6). We have, for each non-empty
subset g of {1, ···, m} and for each i,
dN
gi
(t) = α
gi
(t)dt + dM
gi
(t),
where M
gi
(t)’s are local square integrable martingales. Then, following Andersen and Borgan (1985), the general-
ized Nelson-Aalen estimate for Λ
0g
(t) =
R
t
0
λ
0g
(u)du, for g G, is given by
ˆ
Λ
0g
(t) =
Z
t
0
J(u)
nS
(0)
(
ˆ
β, u)
dN
g
(u), (8)
where N
g
(t) =
P
n
i=1
N
gi
(t), Y (t) =
P
n
i=1
Y
i
(t), J(u) = I{Y (u) > 0} and S
(0)
(
ˆ
β, u) = n
1
P
n
i=1
Y
i
(u)e
z
i
ˆ
β
. This
integral formula for the estimate reduces to a finite sum (Lawless, 2003, p449)
ˆ
Λ
0g
(t) =
X
i:t
(gi)
t
1
P
n
l=1
Y
l
(t
(gi)
)e
z
l
ˆ
β
.
Under the same set of regularity conditions, as required for the asymptotic normality of
ˆ
β, the process
ˆ
Λ
0g
(t) Λ
0g
(t), g G
converges weakly to a (2
m
1)-variate mean zero Gaussian process (Andersen et al., 1993,
Ch VII.2.2). In particular, for fixed g, g
0
G and time t, the covariance of
ˆ
Λ
0g
(t) Λ
0g
(t),
ˆ
Λ
0g
0
(t) Λ
0g
0
(t)
can
be consistently estimated by ˆσ
gg
0
(t) =
δ
gg
0
Z
t
0
S
(0)
(
ˆ
β, u)
2
dN
g
(u) +
Z
t
0
X(
ˆ
β, u)dN
g
(u)
T
I
1
(
ˆ
β)
Z
t
0
X(
ˆ
β, u)dN
g
0
(u)
, (9)
3
Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session STS009) p.1231

where δ
gg
0
= 1, if g and g
0
are the same set, and 0 otherwise, and X(
ˆ
β, u) = n
1
S
(1)
(
ˆ
β, u)× S
(0)
(
ˆ
β, u)
2
with
S
(1)
(
ˆ
β, u) = n
1
P
n
i=1
z
i
Y
i
(u)e
z
i
ˆ
β
being a vector of the same size as that of the covariate Z. Note that this
expression (9) for the estimated covariance also reduces to a finite sum.
Note that the masking probabilities p
gj
(t)’s are usually not known and need to be estimated. There have
been some works concerning estimation of these masking probabilities usually requiring either additional modeling
assumptions or secondary data. See Basu (2009) for a review on this, which also finds that the performance of
model-based estimates is less than desirable. Nevertheless, in order to be able to estimate these probabilities in
practice, as in Dewanji and Sengupta (2003), we make the simplifying assumption that p
gj
(t)’s are independent
of time t, though it may depend on g and j. We will write p
gj
(t) as p
gj
in the subsequent discussion and denote
the (2
m
1) × m matrix of p
gj
’s by P . Then, writing
Λ
0
(t) as the (2
m
1) × 1 vector of Λ
0g
(t)’s and
Λ
0
(t) as
the m × 1 vector of the original cumulative baseline cause-specific hazards Λ
0j
(t)’s, where Λ
0j
(t) =
R
t
0
λ
0j
(u)du,
we have, from (3),
Λ
0
(t) = P
Λ
0
(t). (10)
Then, following Dewanji and Sengupta (2003), we have a consistent estimate of Λ
0
(t) as
ˆ
Λ
0
(t) =
b
P
T
ˆ
Σ
1
(t)
b
P
1
b
P
T
ˆ
Σ
1
(t)
b
Λ
0
(t), (11)
where
ˆ
Σ(t) is the (2
m
1) × (2
m
1) matrix of ˆσ
gg
0
(t)’s,
b
P denotes a consistent estimate of P and
ˆ
Λ
0
(t) is the
(2
m
1) × 1 vector of
ˆ
Λ
0g
(t)’s.
Note that
ˆ
Λ
0
(t)
Λ
0
(t)
=
b
P
T
ˆ
Σ
1
(t)
b
P
1
b
P
T
ˆ
Σ
1
(t)
b
Λ
0
(t)
b
P
T
ˆ
Σ
1
(t)
b
P
1
b
P
T
ˆ
Σ
1
(t)
b
P
Λ
0
(t)
=
b
A(t)
ˆ
Λ
0
(t)
Λ
0
(t)
+
b
A(t)
P
b
P
Λ
0
(t),
where
b
A(t) =
b
P
T
ˆ
Σ
1
(t)
b
P
1
b
P
T
ˆ
Σ
1
(t) is a consistent estimate of
P
T
Σ
1
(t)P
1
P
T
Σ
1
(t) with Σ(t) being
the asymptotic covariance matrix of
ˆ
Λ
0
(t)
Λ
0
(t). Hence, using the weak convergence result of
ˆ
Λ
0
(t)
Λ
0
(t)
and the fact that
b
P is a consistent estimate of P , one can establish weak convergence of
ˆ
Λ
0
(t)
Λ
0
(t) to a m-
variate mean zero Gaussian process with the covariance matrix at time t given by
P
T
Σ
1
(t)P
1
, which can be
consistently estimated by
b
P
T
ˆ
Σ
1
(t)
b
P
1
. This result on weak convergence to a Gaussian process similarly holds
for the estimate of Dewanji and Sengupta (2003) and can be useful for nonparametric one- and k-sample tests for
the cumulative cause-specific hazards in the spirit of Andersen and Borgan (1985, Section 5) and Andersen et al.
(1993, Ch. V).
Note, however, that this estimate cannot be guaranteed to be non-decreasing, although it is expected to be
so for large sample because of its consistency. In practice, one can use “pooling-the-adjacent-violators” algorithm
to achieve monotonicity. If some of the {N
g
(t)}’s are not observed to have any jump during the study, the
corresponding
b
Λ
0g
(t)’s, and the associated entries of
ˆ
Σ(t), turn out to be zero; the corresponding rows of P are
also estimated, as given in Dewanji and Sengupta (2003), to be zero. The same estimation procedure goes through
with the observed {N
g
(t)}’s as long as the resulting
b
P is of full column-rank. Even if the rank of
b
P is less than
m, some components of
Λ
0
(t) may be estimable.
4 Estimation under Model 2
A similar method can be developed under Model 2. Using (3), the cause-specific hazard rate for failure at time t
with g observed as the set of possible failure types, given covariate value z, is given by
λ
g
(t, z) = λ
0
(t)f
g
(z, t,
θ
), (12)
where f
g
(z, t,
θ
) = f
g
(z, t,
γ
, β
) =
X
jg
p
gj
(t)e
γ
j
+zβ
j
, for all g G. These have the similar semi-parametric form
as those for the original cause-specific hazard rates in (5), except that the parametric component f
g
(z, t,
θ
), for
different g’s, are not of the simple exponential form. In particular, when the masking probabilities (p
gj
(t)’s) are
independent of time, the parametric component f
g
(z, t,
θ
) is also independent of time and is written as f
g
(z,
θ
).
4
Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session STS009) p.1232

Nevertheless, from Kalbfleisch and Prentice (1980, Sec. 7.2.3), the partial likelihood in (13) is the most appropriate
to estimate the vector of regression parameters
θ
, in the absence of any knowledge on the baseline cause-specific
hazards λ
0
(t), based on the ‘modified’ competing risks data which is available without any missingness.
4.1 Estimation of regression parameters
Let t
(1)
< ··· < t
(k)
be the k ordered observed failure times with covariates z
(1)
, ···, z
(k)
and missing patterns
g
(1)
, ···, g
(k)
, respectively. Then, at each of these failure times, say t
(i)
, we consider the conditional probability
that the individual (i) with covariate value z
(i)
fails at time t
(i)
with missing pattern g
(i)
, given the history up to
time t
(i)
and that one failure occurs at time t
(i)
. The method of partial likelihood then gives
L
P 2
(
θ
) =
k
Y
i=1
"
λ
g
(i)
(t
(i)
, z
(i)
)
P
lR(t
(i)
)
P
g∈G
λ
g
(t
(i)
, z
l
)
#
=
k
Y
i=1
f
g
(i)
(z
(i)
,
θ
)
P
lR(t
(i)
)
P
g∈G
f
g
(z
l
,
θ
)
, (13)
where R(t
(i)
) is the risk set at time t
(i)
. See Dewanji (1992) for a special case of this partial likelihood. An
approximation may be needed to deal with tied failure times. This partial likelihood (13) is maximized to get
an estimate of
θ
, denoted by
ˆ
θ
. Note that the model (12) cannot be written as the underlying multiplicative
hazard competing risks model of Andersen et al. (1993, Ch. VII.2) for the asymptotic results therein to be readily
available, as for the model in (6). However, similar asymptotic likelihood techniques can still be applied to this
partial likelihood (13) and to the estimate
ˆ
θ
for making inference on
θ
. The proofs of the asymptotic results
follow the similar steps, as those in Andersen and Gill (1982) and Andersen et al. (1993, Ch VII.2), with little
modification, as worked out by Prentice and Self (1983) in the context of ordinary survival data with general
relative risk form. In particular, under some regularity conditions as mentioned in Prentice and Self (1983), the
asymptotic distribution of (
ˆ
θ
θ
0
), where
θ
0
denotes the true value of
θ
, is approximately a multivariate normal
with mean zero and covariance matrix that may be consistently estimated by I
1
(
ˆ
θ
), where I(
θ
) is the matrix
of second order partial derivatives of log L
P 2
(
θ
).
4.2 Estimation of baseline cumulative cause-specific hazards
Estimation of Λ
0
(t) =
R
t
0
λ
0
(u)du follows the similar argument as those in Andersen et al. (1993, p482) and, given
the maximum partial likelihood estimate
ˆ
θ
, a natural estimator for Λ
0
(t) is
ˆ
Λ
0
(t) =
Z
t
0
J(u)
n
P
g∈G
S
(0)
g
(
ˆ
θ
, u)
dN(u), (14)
where N(t) =
P
n
i
P
g∈G
N
gi
(t) and S
(0)
g
(
θ
, u) = n
1
n
X
i=1
f
g
(z
i
,
θ
)Y
i
(u). Under some regularity conditions, as
required for asymptotic normality of
ˆ
θ
, the process
n
ˆ
Λ
0
(t) Λ
0
(t)
converges weakly to a mean zero Gaussian
process with variance function σ
2
(t) along with a consistent estimate ˆσ
2
(t) which are given by
σ
2
(t) =
Z
t
0
X
g
s
(0)
g
(
θ
0
, u)
!
1
λ
0
(u)du +
Z
t
0
P
g
s
(1)
g
(
θ
0
, u)
P
g
s
(0)
g
(
θ
0
, u)
λ
0
(u)du
T
Σ
1
Z
t
0
P
g
s
(1)
g
(
θ
0
, u)
P
g
s
(0)
g
(
θ
0
, u)
λ
0
(u)du
,
(15)
and
ˆσ
2
(t) = n
1
Z
t
0
dN(u)
P
g
S
(0)
g
(
ˆ
θ
, u)
2
+
Z
t
0
P
g
S
(1)
g
(
ˆ
θ
, u)
P
g
S
(0)
g
(
ˆ
θ
, u)
2
dN(u)
T
I
1
(
ˆ
θ
)
Z
t
0
P
g
S
(1)
g
(
ˆ
θ
, u)
P
g
S
(0)
g
(
ˆ
θ
, u)
2
dN(u)
.
(16)
5
Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session STS009) p.1233

Citations
More filters

Book ChapterDOI
Isha Dewan1, Uttara Naik-Nimbalkar2Institutions (2)
01 Jan 2017
TL;DR: In this article, some statistical inference procedures used when the cause of failure is missing or masked for some units are reviewed.
Abstract: Competing risks data arise when the study units are exposed to several risks at the same time but it is assumed that the eventual failure of a unit is due to only one of these risks, which is called the “cause of failure”. Statistical inference procedures when the time to failure and the cause of failure are observed for each unit are well documented. In some applications, it is possible that the cause of failure is either missing or masked for some units. In this article, we review some statistical inference procedures used when the cause of failure is missing or masked for some units.

Journal ArticleDOI
01 Jun 2017
TL;DR: This paper considers the nonparametric estimation of cumulative cause specific reversed hazard rates for left censored competing risks data under masked causes of failure with maximum likelihood estimators and least squares type estimators.
Abstract: In the analysis of competing risks data, it is common that the exact cause of failure for certain study subjects is missing. This problem of missing failure type may be due to inadequacy in the diagnostic mechanism or reluctance to report the exact cause of failure. In the present paper, we consider the nonparametric estimation of cumulative cause specific reversed hazard rates for left censored competing risks data under masked causes of failure. We first develop maximum likelihood estimators of cumulative cause specific reversed hazard rates. We then consider the least squares type estimators for cumulative cause specific reversed hazard rates, when the information about the conditional probability of exact failure type given a set of possible failure types is available. Simulation studies are conducted to assess the performance of the proposed estimators. We illustrate the applicability of the proposed methods using a data set. Abstract

References
More filters


Book
01 Jan 1987
TL;DR: This work states that maximum Likelihood for General Patterns of Missing Data: Introduction and Theory with Ignorable Nonresponse and large-Sample Inference Based on Maximum Likelihood Estimates is likely to be high.
Abstract: Preface.PART I: OVERVIEW AND BASIC APPROACHES.Introduction.Missing Data in Experiments.Complete-Case and Available-Case Analysis, Including Weighting Methods.Single Imputation Methods.Estimation of Imputation Uncertainty.PART II: LIKELIHOOD-BASED APPROACHES TO THE ANALYSIS OF MISSING DATA.Theory of Inference Based on the Likelihood Function.Methods Based on Factoring the Likelihood, Ignoring the Missing-Data Mechanism.Maximum Likelihood for General Patterns of Missing Data: Introduction and Theory with Ignorable Nonresponse.Large-Sample Inference Based on Maximum Likelihood Estimates.Bayes and Multiple Imputation.PART III: LIKELIHOOD-BASED APPROACHES TO THE ANALYSIS OF MISSING DATA: APPLICATIONS TO SOME COMMON MODELS.Multivariate Normal Examples, Ignoring the Missing-Data Mechanism.Models for Robust Estimation.Models for Partially Classified Contingency Tables, Ignoring the Missing-Data Mechanism.Mixed Normal and Nonnormal Data with Missing Values, Ignoring the Missing-Data Mechanism.Nonignorable Missing-Data Models.References.Author Index.Subject Index.

18,186 citations




Book
27 Nov 2002
Abstract: Basic Concepts and Models. Observation Schemes, Censoring and Likelihood. Some Nonparametric and Graphical Procedures. Inference Procedures for Parametric Models. Inference procedures for Log-Location-Scale Distributions. Parametric Regression Models. Semiparametric Multiplicative Hazards Regression Models. Rank-Type and Other Semiparametric Procedures for Log-Location-Scale Models. Multiple Modes of Failure. Goodness of Fit Tests. Beyond Univariate Survival Analysis. Appendix A. Glossary of Notation and Abbreviations. Appendix B. Asymptotic Variance Formulas, Gamma Functions and Order Statistics. Appendix C. Large Sample Theory for Likelihood and Estimating Function Methods. Appendix D. Computational Methods and Simulation. Appendix E. Inference in Location-Scale Parameter Models. Appendix F. Martingales and Counting Processes. Appendix G. Data Sets. References.

4,151 citations