Bayesian analysis and model selection for interval-censored survival data.

doi:10.1111/J.0006-341X.1999.00585.X


BAYESIAN ANALYSIS AND MODEL SELECTION FOR INTERVAL-CENSORED SURVIVAL DATA

by

Debajyoti Sinha, Ming-Hui Chen, and Sujit K. Ghosh

Institute of Statistics Mimeograph Series No. 2298

May 1997

NORTH CAROLINA STATE UNIVERSITY

Raleigh, North Carolina


Bayesian Analysis and Model Selection for Interval-censored Survival Data

Debajyoti Sinha,* Ming-Hui Chen,† and Sujit K. Ghosh‡

April 10, 1997

Abstract

Interval-censored data occur in survival analysis when the survival time of each patient is known only to lie within an interval and these censoring intervals differ from patient to patient. Data of this kind pose some challenges to semiparametric analysis and model diagnostics. For such data, we present some Bayesian discretized semiparametric models, incorporating proportional and non-proportional hazards structures, along with the associated statistical analyses and tools for model selection using sampling-based methods. The scope of these methodologies is illustrated through a re-analysis of the historical data set from Finkelstein (1986).

Key Words: CPO, Gibbs sampler, Prior process.

1 Introduction

Many clinical trials and medical studies use periodic scheduled follow-ups of each patient to monitor the time to an event of interest or disease (i.e., the survival time T of the patient) whose occurrence is not outwardly apparent. The occurrence of such an event can be detected only through some invasive procedure (such as testing blood or tissue samples) performed during these clinic visits. Medical researchers often come across interval censoring in such studies when patients miss some of the scheduled appointments for reasons not related to the survival times, and the observed censoring intervals containing their survival times frequently overlap with each other. Interval-censored survival data have recently received much attention in the biostatistical and statistical literature due to diseases such as AIDS and some forms of cancer. For recent reviews, see Satten (1996) and Frydman (1995).

* Department of Mathematics, University of New Hampshire, Durham, NH 03824-3591. Dr. Sinha's research was supported by the grant R29-CA69222-02 from NCI.

† Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, MA 01609-2276.

‡ Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203.

The data set in Table 3 of Finkelstein and Wolfe (1985) is a historical example of interval-censored data. In this data set, 46 early breast cancer patients receiving only radiotherapy (covariate value z = 0) and 48 patients receiving radio-chemotherapy (z = 1) were monitored for cosmetic change during weekly clinic visits. Some patients, however, missed some of their weekly visits, so the survival times are typically recorded as, for example, (7, 18]: at the clinic visit in the 7th week the patient had shown no change, and at the next clinic visit, in the 18th week, the patient's tissue showed that the change had already occurred. Since the clinic visits of different patients occurred at different times, the censoring intervals in the data set often overlap.
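To make the data structure concrete, the following is a minimal Python sketch that stores each observation as a half-open interval (left, right] together with the treatment indicator z, and checks whether two censoring intervals overlap. Apart from the interval (7, 18] quoted in the text, the records are made-up illustrations and are not values from Finkelstein and Wolfe (1985); the names `Observation` and `overlaps` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    left: float   # last visit (week) at which no change was seen
    right: float  # first visit (week) at which the change was seen
    z: int        # 0 = radiotherapy only, 1 = radio-chemotherapy


def overlaps(a: Observation, b: Observation) -> bool:
    """True if the half-open intervals (a.left, a.right] and (b.left, b.right] intersect."""
    return max(a.left, b.left) < min(a.right, b.right)


# Illustrative records only; the treatment labels are invented.
obs = [
    Observation(7, 18, 0),
    Observation(15, 25, 1),
    Observation(18, 30, 1),
]

print(overlaps(obs[0], obs[1]))  # intervals (7, 18] and (15, 25]
print(overlaps(obs[0], obs[2]))  # intervals (7, 18] and (18, 30]
```

Because the intervals are open on the left and closed on the right, (7, 18] and (18, 30] share no time point, whereas (7, 18] and (15, 25] do.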

We are interested in the effect of the covariate z associated with the patient on the survival time T. A popular semiparametric approach to modeling survival time in the presence of covariate effects is Cox's (1972) proportional hazards model, given by $\lambda(t \mid z) = \lambda_0(t) e^{\beta z}$. Here $\lambda(t \mid z) = -\frac{d}{dt} \log P(T > t \mid z)$ is the hazard function of T given z, $\beta$ is the time-independent regression coefficient for the covariate z, and $\lambda_0(t)$ is the baseline hazard function. Finkelstein (1986) and Satten (1996) analyzed interval-censored data under the assumption of the Cox model. Such an assumption of a time-independent regression coefficient, however, may not always be valid.
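As a short worked step (not part of the original text), integrating the hazard shows what the time-independent coefficient implies for the survival function. The symbols $\Lambda_0$ and $S_0$ are notation introduced here for the cumulative baseline hazard and the baseline survival function.

```latex
\begin{aligned}
S(t \mid z) &= P(T > t \mid z)
             = \exp\Big\{-\int_0^t \lambda(u \mid z)\,du\Big\}
             = \exp\big\{-e^{\beta z}\,\Lambda_0(t)\big\}
             = S_0(t)^{\,e^{\beta z}}, \\
\Lambda_0(t) &= \int_0^t \lambda_0(u)\,du, \qquad
S_0(t) = \exp\{-\Lambda_0(t)\}, \qquad
\frac{\lambda(t \mid z = 1)}{\lambda(t \mid z = 0)} = e^{\beta}.
\end{aligned}
```

The hazard ratio between the two treatment groups is therefore the same at every t, which is exactly the restriction a time-varying coefficient would relax.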

The major contribution of the present paper is twofold. First, with the advancement of sampling-based computational tools, it is now feasible to consider more general models that incorporate time-varying coefficients. Second, while powerful computational tools enable us to fit remarkably complex models, we should not lose sight of the need to make suitably parsimonious choices. We therefore develop some Bayesian tools for model selection and model validation. To our knowledge, there is so far no formal statistical method to select among the models we propose or to check a modeling assumption such as a time-independent coefficient for interval-censored data. In addition, the Bayesian method enables us to obtain exact small-sample inference on the parameter of interest (i.e., the regression coefficient) from a moderately sized data set, even with a high-dimensional nuisance parameter (i.e., the baseline hazard).

In Section 2, we propose a Bayesian version of the discretized Cox model and a model with time-varying coefficients. In Section 3, we describe model fitting using sampling-based methods. In Section 4, we present some Bayesian model selection and model checking methods. In Section 5, we illustrate the proposed methods by reanalyzing the breast cancer data of Finkelstein and Wolfe (1985). Section 6 concludes with some remarks.

2 Models

We take the hazard to be a piecewise constant function with $\lambda(t \mid z) = \lambda_k \theta_k^{z}$ for $t \in I_k$, where $\theta_k = e^{\beta_k}$, $I_k = (a_{k-1}, a_k]$ for $k = 1, 2, \ldots, g$, $0 = a_0 < a_1 < \cdots < a_g = \infty$, and $g$ is the total number of grid intervals. The length of each grid interval can be taken sufficiently small to approximate any hazard function for all practical purposes. We now present two Bayesian semiparametric discretized models: a discretized version of the Cox model (which we call $M_0$) and a discretized hazard model with a time-dependent regression coefficient (which we call $M_1$). More precisely, these features are captured through their prior specifications as follows:

$M_0$: (1) $\lambda_k \overset{\text{indep}}{\sim} \text{Gamma}(\eta_k, \gamma_k)$ for $k = 1, \ldots, g$; (2) $\beta \sim N(\beta_0, w_0^2)$ and $\beta$ is a priori independent of $\lambda = (\lambda_1, \ldots, \lambda_g)$.

$M_1$: (1) $\lambda$ has the same prior as in $M_0$; (2) $\beta_{k+1} \mid \beta_1, \ldots, \beta_k \sim N(\beta_k, w_k^2)$ for $k = 0, \ldots, g-1$, and the $\beta_k$'s are a priori independent of $\lambda$.

Above, we assume that the hyperparameters of these models, viz., the $\eta_k$'s, $\gamma_k$'s, $w_k$'s and $\beta_0$, are known in advance.
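The piecewise constant hazard makes the probability of an observed censoring interval easy to evaluate. The sketch below, in Python, computes the survival function and the quantity P(left < T <= right | z) = S(left | z) - S(right | z), which is the standard likelihood contribution of an interval-censored observation; that form is assumed here rather than quoted from the text. The grid, the hazard values and the coefficients are made up for illustration, the function names are hypothetical, and M_0 is read as the special case in which every interval shares one coefficient.

```python
import math


def cumulative_hazard(t, z, cuts, lam, beta):
    """Integral of lambda(u | z) over (0, t] for a piecewise constant hazard.

    cuts = [a_0, ..., a_g] with a_0 = 0 and a_g = math.inf;
    lam[k] and beta[k] belong to the grid interval (cuts[k], cuts[k + 1]].
    """
    total = 0.0
    for k, (lo, hi) in enumerate(zip(cuts[:-1], cuts[1:])):
        if t <= lo:
            break
        exposure = min(t, hi) - lo  # time spent in this interval before t
        total += lam[k] * math.exp(beta[k] * z) * exposure
    return total


def survival(t, z, cuts, lam, beta):
    """S(t | z) = exp(-cumulative hazard)."""
    return math.exp(-cumulative_hazard(t, z, cuts, lam, beta))


def interval_probability(left, right, z, cuts, lam, beta):
    """P(left < T <= right | z): contribution of one observation (left, right]."""
    return survival(left, z, cuts, lam, beta) - survival(right, z, cuts, lam, beta)


# Illustrative (made-up) grid and parameter values.
cuts = [0.0, 10.0, 20.0, math.inf]
lam = [0.01, 0.02, 0.03]
beta_m0 = [0.5, 0.5, 0.5]  # M_0 read as a common coefficient in every interval
beta_m1 = [0.2, 0.5, 0.9]  # M_1: coefficient varies by interval

p0 = interval_probability(7.0, 18.0, 0, cuts, lam, beta_m0)
p1 = interval_probability(7.0, 18.0, 1, cuts, lam, beta_m1)
print(p0, p1)
```

With a common coefficient, `survival(t, 1, ...)` equals `survival(t, 0, ...)` raised to the power exp(beta), which is the proportional hazards property; with interval-specific coefficients it generally does not.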

$M_0$ is a discretized version of the Cox model with a discretized version of the gamma process prior (Kalbfleisch 1978) for the baseline hazard $\lambda_0(\cdot)$, where $\eta_k/\gamma_k$ is the prior mean and $\eta_k/\gamma_k^2$ is the prior variance of $\lambda_k$. When the grid intervals are sufficiently small, this discretized version will be indistinguishable from the actual time-continuous gamma process.
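The stated prior mean and variance correspond to a gamma distribution parameterized by shape $\eta_k$ and rate $\gamma_k$. The following Python sketch draws the baseline hazards from such a prior and checks the two moments by simulation. The hyperparameter values are made up (the paper's own values are not used here), and `draw_baseline_hazards` is a hypothetical helper. Note that the standard library's `gammavariate` takes a scale rather than a rate, so the rate is inverted.

```python
import random
import statistics


def draw_baseline_hazards(eta, gamma, rng):
    """One prior draw of (lambda_1, ..., lambda_g), independent Gamma(eta_k, gamma_k).

    gamma_k is treated as a rate, so the scale passed to gammavariate is 1 / gamma_k.
    """
    return [rng.gammavariate(e, 1.0 / g) for e, g in zip(eta, gamma)]


rng = random.Random(0)
eta = [2.0, 2.0, 2.0]       # made-up shape values
gamma = [10.0, 10.0, 10.0]  # made-up rate values

draws = [draw_baseline_hazards(eta, gamma, rng) for _ in range(20000)]
first = [d[0] for d in draws]

prior_mean = statistics.fmean(first)    # should be near eta_1 / gamma_1 = 0.2
prior_var = statistics.variance(first)  # should be near eta_1 / gamma_1**2 = 0.02
print(prior_mean, prior_var)
```

Holding the ratio $\eta_k/\gamma_k$ fixed while increasing both values keeps the prior mean unchanged and shrinks the prior variance, which is one way to express stronger prior belief about the baseline hazard.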

The discretized autocorrelated prior process for the $\beta_k$'s in $M_1$ allows the covariate effect to change over time, but also incorporates the prior information that the values of the coefficient $\beta$ in adjacent intervals are expected to be somewhat close and that the dependence among the $\beta_k$'s decreases as the intervals become further apart. This assumption seems to be in accordance with studies in which the covariate effect may change over time but is not expected to change too wildly.

The parameters $w_k$ can be used as a tuning device to express our prior opinion about the possible change in the magnitude of $\beta$ over time. For example, a priori we expect $\beta_{k+1}$ to be within approximately $1.96\,w_k$ of $\beta_k$ with probability 0.95. The $w_k$'s should depend on the lengths of the $I_k$'s, allowing the coefficient to change more over longer grid intervals.
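The autocorrelated prior on the coefficients is a Gaussian random walk, which can be simulated directly. The Python sketch below draws coefficient paths and estimates how often an increment falls within 1.96 times its prior standard deviation. The values of `w` and the starting value are made up for illustration, and `draw_beta_path` is a hypothetical helper rather than anything defined in the paper.

```python
import random


def draw_beta_path(beta0, w, rng):
    """One prior draw of (beta_1, ..., beta_g).

    w = [w_0, ..., w_{g-1}] are the prior standard deviations of the increments.
    """
    path, prev = [], beta0
    for w_k in w:
        prev = rng.gauss(prev, w_k)  # beta_{k+1} | beta_k ~ N(beta_k, w_k^2)
        path.append(prev)
    return path


rng = random.Random(1)
w = [2.0] + [0.5] * 9  # made-up values: vague first step, tighter later steps
paths = [draw_beta_path(0.0, w, rng) for _ in range(5000)]

# Share of later increments that fall within 1.96 * w_k of the previous value.
steps = [(p[k + 1] - p[k], w[k + 1]) for p in paths for k in range(len(w) - 1)]
share = sum(abs(d) <= 1.96 * s for d, s in steps) / len(steps)
print(share)
```

Smaller values in `w` pull neighbouring coefficients together, and setting every increment's standard deviation to zero forces all coefficients to equal the starting value, so the path is constant over time.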

It is also possible to use an autocorrelated prior process for the baseline hazard. For details on the use and properties of an autocorrelated process, see Sinha and Dey (1997) and Sargent (1996).

Our major interest is to compare the Cox model ($M_0$) with the time-varying coefficient model ($M_1$). For the breast cancer data example, we consider the following values of the hyperparameters.


References

Regression Models and Life-Tables

Applied Statistical Decision Theory

Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review

Adaptive Rejection Sampling for Gibbs Sampling

Tools for statistical inference
