BAYESIAN ANALYSIS AND MODEL SELECTION FOR INTERVAL-CENSORED SURVIVAL DATA

by

Debajyoti Sinha, Ming-Hui Chen, and Sujit K. Ghosh

Institute of Statistics Mimeograph Series No. 2298
May 1997

NORTH CAROLINA STATE UNIVERSITY
Raleigh, North Carolina

Bayesian Analysis and Model Selection for Interval-censored Survival Data

Debajyoti Sinha*, Ming-Hui Chen† and Sujit K. Ghosh‡

April 10, 1997

Abstract
Interval-censored data occur in survival analysis when the survival time of each patient is only known to be within an interval and these censoring intervals differ from patient to patient. Data of this kind pose some challenges to semiparametric analysis and model diagnostics. For such data, we present some Bayesian discretized semiparametric models, incorporating proportional and non-proportional hazards structures, along with the associated statistical analyses and tools for model selection using sampling-based methods. The scope of these methodologies is illustrated through a re-analysis of the historical data set from Finkelstein (1986).
Key Words: CPO, Gibbs sampler, Prior process.

1 Introduction

Many clinical trials and medical studies use periodic scheduled follow-ups of each patient to monitor the time to an event of interest or disease (i.e., the survival time T of the patient) whose occurrence is not apparent from outside. The occurrence of such an event can be detected only through some invasive procedure (such as testing blood or tissue samples) performed during these clinic visits. Medical researchers often come across interval censoring in such studies when patients miss some of the scheduled appointments for reasons not related to the survival times, and the observed censoring intervals containing their survival times frequently overlap with each other. Interval-censored survival data have recently received much attention in the biostatistical and statistical literature due to diseases such as AIDS and some forms of cancer. For recent reviews, see Satten (1996) and Frydman (1995).

* Department of Mathematics, University of New Hampshire, Durham, NH 03824-3591. Dr. Sinha's research was supported by the grant R29-CA69222-02 from NCI.
† Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, MA 01609-2276.
‡ Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203.

The data set in Table 3 of Finkelstein and Wolfe (1985) is a historical set of interval-censored data. In this data set, 46 early breast cancer patients receiving only radiotherapy (covariate value z = 0) and 48 patients receiving radio-chemotherapy (z = 1) were monitored for cosmetic change during weekly clinic visits. However, some patients missed some of their weekly visits, so the data on survival time are typically recorded as, for example, (7, 18]: at the clinic visit in the 7th week the patient had shown no change, and at the next clinic visit, in the 18th week, the patient's tissue showed that the change had already occurred. Since the clinic visits of different patients occurred at different times, the censoring intervals in the data set often overlap. We are interested in the effect of the covariate z associated with the patient on the survival time T.
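
As a minimal sketch of how such records might be held in code, each observation below is a triple (left, right, z) meaning that the event time lies in (left, right]. The names `Obs` and `overlaps`, the covariate values, and every record other than the interval (7, 18] quoted in the text are hypothetical and are not taken from the actual data set. Using an infinite right endpoint for a patient who never showed the change is likewise an assumption of this sketch.

```python
from typing import List, Tuple

# One observation: (left, right, z); the event time lies in (left, right].
# right = float("inf") marks a patient whose change was never observed.
Obs = Tuple[float, float, int]

def overlaps(a: Obs, b: Obs) -> bool:
    """True when the half-open intervals (left, right] of a and b intersect."""
    return max(a[0], b[0]) < min(a[1], b[1])

# Hypothetical records for illustration only.
data: List[Obs] = [
    (7.0, 18.0, 0),
    (10.0, 25.0, 1),
    (18.0, float("inf"), 0),
]

# Index pairs of patients whose censoring intervals overlap.
pairs = [(i, j)
         for i in range(len(data))
         for j in range(i + 1, len(data))
         if overlaps(data[i], data[j])]
```

Because the intervals are open on the left and closed on the right, (7, 18] and (18, ∞) share no point, whereas (10, 25] overlaps both of the others.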
A popular semiparametric approach to modeling survival time in the presence of covariate effects is Cox's (1972) proportional hazards model, given by $\lambda(t \mid z) = \lambda_0(t) e^{\beta z}$. Here $\lambda(t \mid z) = -\frac{d}{dt} \log P(T > t \mid z)$ is the hazard function of T given z, $\beta$ is the time-independent regression coefficient for the covariate z, and $\lambda_0(t)$ is the baseline hazard function. Finkelstein (1986) and Satten (1996) analyzed interval-censored data under the assumption of the Cox model. However, such an assumption of a time-independent regression coefficient may not always be valid.
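
To make the content of the time-independence assumption concrete, the following standard identity (stated here as a textbook consequence of the proportional hazards form, not as a formula quoted from the source) expresses the survival function in terms of a baseline survival function $S_0$, a symbol introduced only for this illustration:

```latex
S(t \mid z) = P(T > t \mid z)
            = \exp\Big\{-\int_0^t \lambda_0(u)\, e^{\beta z}\, du\Big\}
            = S_0(t)^{\exp(\beta z)},
\qquad
S_0(t) = \exp\Big\{-\int_0^t \lambda_0(u)\, du\Big\}.
```

For a binary covariate, the survival curve for z = 1 is therefore the curve for z = 0 raised to the same power $e^{\beta}$ at every t. A time-varying coefficient relaxes exactly this restriction.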
The major contribution of the present paper is twofold. First, with the advancement of sampling-based computational tools, it is now feasible to consider more general models that incorporate time-varying coefficients. Secondly, while powerful computational tools enable us to fit remarkably complex models, we should not lose sight of the need to make suitably parsimonious choices. So, we develop some Bayesian tools for model selection and model validation. So far, to our knowledge, there is no formal statistical method to select among the models we propose or to check any modeling assumption, such as a time-independent coefficient, for interval-censored data. In addition, the Bayesian method enables us to obtain exact small-sample inference on the parameter of interest (i.e., the regression coefficient) from a moderate-sized data set, even with a high-dimensional nuisance parameter (i.e., the baseline hazard).
In Section 2, we propose a Bayesian version of the discretized Cox model and a model with time-varying coefficients. In Section 3, we describe model fitting using sampling-based methods. In Section 4, we present some Bayesian model selection and model checking methods. In Section 5, we illustrate the proposed methods by reanalyzing the breast cancer data of Finkelstein and Wolfe (1985). Section 6 concludes with some remarks.

2 Models

We take the hazard to be a piecewise constant function with $\lambda(t \mid z) = \lambda_k \theta_k^z$ for $t \in I_k$, where $\theta_k = e^{\beta_k}$, $I_k = (a_{k-1}, a_k]$ for $k = 1, 2, \ldots, g$, $0 = a_0 < a_1 < \cdots < a_g = \infty$, and $g$ is the total number of grid intervals. The length of each grid interval can be taken to be sufficiently small to approximate any hazard function for all practical purposes. We now present two Bayesian semiparametric discretized models, viz. a discretized version of the Cox model (which we call $M_0$) and a discretized hazard model with a time-dependent regression coefficient (which we call $M_1$). More precisely, these features are captured through their prior specifications as follows:

$M_0$: (1) $\lambda_k \overset{\text{indep}}{\sim} \mathrm{Gamma}(\eta_k, \gamma_k)$ for $k = 1, \ldots, g$;
       (2) $\beta_1 = \cdots = \beta_g = \beta$, and $\beta$ is a priori $N(\beta_0, w_0^2)$, independent of $\lambda = (\lambda_1, \ldots, \lambda_g)$.

$M_1$: (1) $\lambda$ has the same prior as in $M_0$;
       (2) $\beta_{k+1} \mid \beta_1, \ldots, \beta_k \sim N(\beta_k, w_k^2)$ for $k = 0, \ldots, g-1$; the $\beta_k$'s are a priori independent of $\lambda$.

In the above, we assume that the hyperparameters of these models, viz. the $\eta_k$'s, $\gamma_k$'s, $w_k$'s and $\beta_0$, are known in advance.
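
The piecewise constant hazard can be turned into a short computation. The sketch below integrates the hazard to obtain the survival function and then forms P(left < T <= right | z) as a difference of survival probabilities. This difference is the usual contribution of an interval-censored observation under noninformative censoring, offered here as an assumption, since the fitting procedure itself is described later in the source. The function names, the grid, and all parameter values are hypothetical.

```python
import math
from typing import Sequence

def cum_hazard(t: float, z: float, cuts: Sequence[float],
               lam: Sequence[float], beta: Sequence[float]) -> float:
    """Integrate lambda(u|z) = lam[k] * exp(beta[k] * z) over (0, t].

    cuts = [a_0, ..., a_g] with a_0 = 0 and a_g = inf; position k of lam
    and beta (0-based) belongs to the grid interval (cuts[k], cuts[k+1]].
    """
    total = 0.0
    for k in range(len(lam)):
        lo, hi = cuts[k], cuts[k + 1]
        if t <= lo:
            break
        total += lam[k] * math.exp(beta[k] * z) * (min(t, hi) - lo)
    return total

def survival(t: float, z: float, cuts, lam, beta) -> float:
    """P(T > t | z) under the piecewise constant hazard."""
    return math.exp(-cum_hazard(t, z, cuts, lam, beta))

def interval_prob(left: float, right: float, z: float, cuts, lam, beta) -> float:
    """P(left < T <= right | z); right = inf gives a right-censored term."""
    return survival(left, z, cuts, lam, beta) - survival(right, z, cuts, lam, beta)

# Hypothetical grid and parameter values, chosen only for illustration.
cuts = [0.0, 10.0, 20.0, math.inf]
lam = [0.10, 0.05, 0.02]
beta_m0 = [0.5, 0.5, 0.5]   # M_0: one common coefficient in every interval
beta_m1 = [0.2, 0.5, 0.9]   # M_1: coefficient differs across intervals

p0 = interval_prob(7.0, 18.0, 0, cuts, lam, beta_m0)
p1 = interval_prob(7.0, 18.0, 1, cuts, lam, beta_m1)
```

Passing a constant list for `beta` reproduces the discretized Cox structure, while a varying list gives the time-dependent structure, so the same routine serves both models.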
$M_0$ is a discretized version of the Cox model with a discretized version of the gamma process prior (Kalbfleisch 1978) for the baseline hazard $\lambda_0(\cdot)$, where $\eta_k/\gamma_k$ is the prior mean and $\eta_k/\gamma_k^2$ is the prior variance of $\lambda_k$. When the grid intervals are sufficiently small, this discretized version will be indistinguishable from the actual time-continuous gamma process.
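
The stated prior mean and variance imply that Gamma(η_k, γ_k) is parameterized by shape η_k and rate γ_k. A small simulation, sketched below with hypothetical hyperparameter values and a hypothetical helper name `draw_baseline`, checks this reading. Python's `random.gammavariate` takes a shape and a scale, so the scale is 1/γ_k.

```python
import random
import statistics
from typing import List, Sequence

def draw_baseline(eta: Sequence[float], gamma: Sequence[float],
                  rng: random.Random) -> List[float]:
    """One prior draw of (lambda_1, ..., lambda_g), independent across k.

    Gamma(eta_k, gamma_k) is read as shape eta_k and rate gamma_k, so the
    scale passed to gammavariate is 1 / gamma_k.
    """
    return [rng.gammavariate(e, 1.0 / g) for e, g in zip(eta, gamma)]

# Hypothetical hyperparameters, not the values used in the source.
rng = random.Random(0)
eta, gamma = [2.0], [4.0]
draws = [draw_baseline(eta, gamma, rng)[0] for _ in range(20000)]

sample_mean = statistics.mean(draws)      # should be near eta/gamma = 0.5
sample_var = statistics.variance(draws)   # should be near eta/gamma**2 = 0.125
```

Swapping rate for scale by mistake would give a mean of η_k γ_k instead, which is an easy error to make when moving between software packages.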
The discretized autocorrelated prior process for the $\beta_k$'s in $M_1$ allows the covariate effect to change over time, but also incorporates the prior information that the values of the coefficient $\beta$ in adjacent intervals are expected to be somewhat close and that the dependence among the $\beta$'s decreases as the intervals become further apart. This assumption seems to be in accordance with studies where the covariate effect may change over time but is not expected to change too wildly.
The parameters $w_k$ can be used as a tuning device to express our prior opinion about the possible change in the magnitude of $\beta$ over time. For example, a priori we expect $\beta_{k+1}$ to be within approximately $1.96 w_k$ of $\beta_k$ with 95% probability. The $w_k$'s should depend on the lengths of the $I_k$'s, allowing the coefficient to change more for bigger grid intervals.
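
The role of the $w_k$'s as a tuning device can be seen by simulating the prior for the coefficients directly. The sketch below draws a path from the random-walk prior and records how often a step stays within 1.96 w_k of the previous value. The helper name `draw_beta_path`, the constant step size, and the artificially long path are hypothetical choices made only so that the empirical fraction is stable.

```python
import random
from typing import List, Sequence

def draw_beta_path(beta0: float, w: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Draw (beta_1, ..., beta_g) with beta_{k+1} | beta_k ~ N(beta_k, w_k^2).

    w = [w_0, ..., w_{g-1}] holds standard deviations, and beta0 is the
    prior mean of beta_1.
    """
    path, current = [], beta0
    for w_k in w:
        current = rng.gauss(current, w_k)
        path.append(current)
    return path

# Hypothetical values: a long path with constant w_k = 0.3, used only to
# check the step-size statement empirically.
rng = random.Random(1)
w = [0.3] * 5000
path = draw_beta_path(0.0, w, rng)

# Successive changes beta_{k+1} - beta_k, starting from beta_0.
steps = [b - a for a, b in zip([0.0] + path, path)]
frac_within = sum(abs(s) <= 1.96 * w_k for s, w_k in zip(steps, w)) / len(w)
```

Smaller values of w_k pull the path toward a constant coefficient, which is the sense in which the Cox structure is a limiting case of the time-varying one.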
It is also possible to use an autocorrelated prior process for the baseline hazard. For details on the use and properties of an autocorrelated process, see Sinha and Dey (1997) and Sargent (1996).
Our major interest is to compare the Cox model ($M_0$) with the time-varying coefficient model ($M_1$). For the example of the breast cancer data, we consider the following values of the hyperparameters.