TL;DR: It is argued that regression models with random coefficients offer a more scientifically defensible framework for epidemiologic analysis than the fixed-effects models now prevalent in epidemiology.
Abstract: Regression models with random coefficients arise naturally in both frequentist and Bayesian approaches to estimation problems. They are becoming widely available in standard computer packages under the headings of generalized linear mixed models, hierarchical models, and multilevel models. I here argue that such models offer a more scientifically defensible framework for epidemiologic analysis than the fixed-effects models now prevalent in epidemiology. The argument invokes an antiparsimony principle attributed to L. J. Savage, which is that models should be rich enough to reflect the complexity of the relations under study. It also invokes the countervailing principle that you cannot estimate anything if you try to estimate everything (often used to justify parsimony). Regression with random coefficients offers a rational compromise between these principles as well as an alternative to analyses based on standard variable-selection algorithms and their attendant distortion of uncertainty assessments. These points are illustrated with an analysis of data on diet, nutrition, and breast cancer.
TL;DR: Regression models are frequently used to develop diagnostic, prognostic, and health resource utilization models in clinical, health services, outcomes, pharmacoeconomic, and epidemiologic research, and in a multitude of non-health-related areas.
Abstract: Regression models are frequently used to develop diagnostic, prognostic, and health resource utilization models in clinical, health services, outcomes, pharmacoeconomic, and epidemiologic research, and in a multitude of non-health-related areas. Regression models are also used to adjust for patient heterogeneity in randomized clinical trials, to obtain tests that are more powerful and valid than unadjusted treatment comparisons.
TL;DR: It is demonstrated that fixed-effect meta-regression is likely to produce seriously misleading results in the presence of heterogeneity; a permutation test is recommended before a statistically significant relationship is claimed from a standard meta-regression analysis.
Abstract: Meta-regression has become a commonly used tool for investigating whether study characteristics may explain heterogeneity of results among studies in a systematic review. However, such explorations of heterogeneity are prone to misleading false-positive results. It is unclear how many covariates can reliably be investigated, and how this might depend on the number of studies, the extent of the heterogeneity and the relative weights awarded to the different studies. Our objectives in this paper are two-fold. First, we use simulation to investigate the type I error rate of meta-regression in various situations. Second, we propose a permutation test approach for assessing the true statistical significance of an observed meta-regression finding. Standard meta-regression methods suffer from substantially inflated false-positive rates when heterogeneity is present, when there are few studies and when there are many covariates. These are typical of situations in which meta-regressions are routinely employed. We demonstrate in particular that fixed effect meta-regression is likely to produce seriously misleading results in the presence of heterogeneity. The permutation test appropriately tempers the statistical significance of meta-regression findings. We recommend its use before a statistically significant relationship is claimed from a standard meta-regression analysis.
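The permutation approach the authors propose can be sketched as follows: shuffle the study-level covariate across studies and recompute the meta-regression test statistic. This is a minimal illustration using a simple inverse-variance-weighted (fixed-effect) meta-regression; the function names and toy data are assumptions, not the paper's own code.

```python
import numpy as np

rng = np.random.default_rng(0)

def meta_regression_z(y, v, x):
    """z-statistic for the slope in a fixed-effect (inverse-variance
    weighted) meta-regression of study effects y on a covariate x."""
    w = 1.0 / v                                   # inverse-variance weights
    X = np.column_stack([np.ones_like(x), x])
    XtWX = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (w * y))   # weighted least squares
    se = np.sqrt(np.linalg.inv(XtWX)[1, 1])       # model-based slope SE
    return beta[1] / se

def permutation_p(y, v, x, n_perm=2000):
    """Two-sided permutation p-value: shuffle the covariate across studies."""
    z_obs = abs(meta_regression_z(y, v, x))
    z_perm = [abs(meta_regression_z(y, v, rng.permutation(x)))
              for _ in range(n_perm)]
    return (1 + sum(z >= z_obs for z in z_perm)) / (1 + n_perm)

# toy example: 8 heterogeneous studies, covariate unrelated to the effects
y = rng.normal(0.3, 0.4, size=8)      # effects with extra heterogeneity
v = rng.uniform(0.01, 0.05, size=8)   # within-study variances
x = rng.normal(size=8)                # study-level covariate
p = permutation_p(y, v, x)
```

Because the within-study variances understate total variation when heterogeneity is present, the naive z-test is anticonservative; the permutation reference distribution tempers it, which is the paper's recommendation.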
TL;DR: A simulation approach was used to clarify the application of random effects under three common situations for telemetry studies; random intercepts accounted for unbalanced sample designs, and models with random intercepts and coefficients improved model fit given the variation in selection among individuals and functional responses in selection.
Abstract:
1. Resource selection estimated by logistic regression is used increasingly in studies to identify critical resources for animal populations and to predict species occurrence.
2. Most frequently, individual animals are monitored and pooled to estimate population-level effects without regard to group or individual-level variation. Pooling assumes that both observations and their errors are independent, and that resource selection is constant given individual variation in resource availability.
3. Although researchers have identified ways to minimize autocorrelation, variation between individuals caused by differences in selection or available resources, including functional responses in resource selection, has not been well addressed.
4. Here we review random-effects models and their application to resource selection modelling to overcome these common limitations. We present a simple case study of an analysis of resource selection by grizzly bears in the foothills of the Canadian Rocky Mountains with and without random effects.
5. Both categorical and continuous variables in the grizzly bear model differed in interpretation, in both statistical significance and coefficient sign, depending on how a random effect was included. We used a simulation approach to clarify the application of random effects under three common situations for telemetry studies: (a) discrepancies in sample sizes among individuals; (b) differences among individuals in selection where availability is constant; and (c) differences in availability with and without a functional response in resource selection.
6. We found that random intercepts accounted for unbalanced sample designs, and models with random intercepts and coefficients improved model fit given the variation in selection among individuals and functional responses in selection.
7. Our empirical example and simulations demonstrate how including random effects in resource selection models can aid interpretation and address difficult assumptions limiting their generality. This approach will allow researchers to appropriately estimate marginal (population) and conditional (individual) responses, and account for complex grouping, unbalanced sample designs and autocorrelation.
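The contrast drawn above between pooled (marginal) and individual (conditional) selection estimates can be illustrated with simulated telemetry-style data. This is a sketch under assumed data-generating values, using a hand-rolled logistic fit and a crude two-stage summary rather than a true mixed-model fit:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_logistic(X, y, n_iter=25):
    """Plain Newton-Raphson maximum-likelihood logistic regression."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        H = X.T @ ((p * (1 - p))[:, None] * X)   # observed information
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta

# 20 simulated animals whose true selection coefficient varies individually
n_animals, n_obs = 20, 200
true_slopes = rng.normal(1.0, 0.8, n_animals)
X_all, y_all, per_animal = [], [], []
for b1 in true_slopes:
    x = rng.normal(size=n_obs)                       # a habitat covariate
    X = np.column_stack([np.ones(n_obs), x])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(0.5 - b1 * x)))
    per_animal.append(fit_logistic(X, y)[1])         # conditional slope
    X_all.append(X)
    y_all.append(y)

# naive pooling ignores the individual-level variation in selection
pooled_slope = fit_logistic(np.vstack(X_all), np.concatenate(y_all))[1]
slope_spread = float(np.std(per_animal))             # between-animal variation
```

A true random-coefficients fit (e.g. a mixed logistic model) would also shrink the noisy per-animal slopes toward the population mean; the point of the sketch is only that `pooled_slope` hides the between-animal spread that `slope_spread` reveals.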
718 citations
Cites background from "When Should Epidemiologic Regressio..."
...Thus, we believe that measures of model fit will be critical to assessing where functional responses in RSF occur when there is no a priori decision to consider particular random effects (see also Greenland 2000)....
TL;DR: The present article describes how most existing methods for valid estimation of risks, risk ratios, and risk differences, including methods that remain valid when the outcome is common, can be subsumed under a general formulation that also encompasses traditional standardization methods and methods for projecting the impact of partially successful interventions.
Abstract: Some recent articles have discussed biased methods for estimating risk ratios from adjusted odds ratios when the outcome is common, and the problem of setting confidence limits for risk ratios. These articles have overlooked the extensive literature on valid estimation of risks, risk ratios, and risk differences from logistic and other models, including methods that remain valid when the outcome is common, and methods for risk and rate estimation from case-control studies. The present article describes how most of these methods can be subsumed under a general formulation that also encompasses traditional standardization methods and methods for projecting the impact of partially successful interventions. Approximate variance formulas for the resulting estimates allow interval estimation; these intervals can be closely approximated by rapid simulation procedures that require only standard software functions.
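The standardization idea in this general formulation can be sketched: fit a logistic model, average the predicted risks over the cohort with exposure set to each level, and take the ratio. The simulated data and helper functions below are illustrative assumptions, not the article's own implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson maximum-likelihood logistic regression."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        H = X.T @ ((p * (1 - p))[:, None] * X)
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta

# simulated cohort with a common outcome and a single confounder z
n = 5000
z = rng.normal(size=n)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * z)))                 # exposure
y = rng.binomial(1, 1.0 / (1.0 + np.exp(0.5 - 0.7 * a - 0.5 * z)))  # outcome
X = np.column_stack([np.ones(n), a, z])
beta = fit_logistic(X, y)

def standardized_risk(a_value):
    """Average predicted risk with everyone's exposure set to a_value."""
    Xa = X.copy()
    Xa[:, 1] = a_value
    return float(np.mean(1.0 / (1.0 + np.exp(-Xa @ beta))))

risk_ratio = standardized_risk(1) / standardized_risk(0)  # marginal RR
odds_ratio = float(np.exp(beta[1]))                       # adjusted OR
```

With an outcome this common, `odds_ratio` noticeably exceeds `risk_ratio`, which is exactly why direct standardization of risks matters here.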
TL;DR: The Bayesian false-discovery probability (BFDP) shares the ease of calculation of the recently proposed false-positive report probability (FPRP) but uses more information, has a noteworthy threshold defined naturally in terms of the costs of false discovery and nondiscovery, and has a sound methodological foundation.
Abstract: In light of the vast amounts of genomic data that are now being generated, we propose a new measure, the Bayesian false-discovery probability (BFDP), for assessing the noteworthiness of an observed association. BFDP shares the ease of calculation of the recently proposed false-positive report probability (FPRP) but uses more information, has a noteworthy threshold defined naturally in terms of the costs of false discovery and nondiscovery, and has a sound methodological foundation. In addition, in a multiple-testing situation, it is straightforward to estimate the expected numbers of false discoveries and false nondiscoveries. We provide an in-depth discussion of FPRP, including a comparison with the q value, and examine the empirical behavior of these measures, along with BFDP, via simulation. Finally, we use BFDP to assess the association between 131 single-nucleotide polymorphisms and lung cancer in a case-control study.
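In the commonly cited form of this calculation (assumed here: a normal N(0, W) prior on the log odds ratio under the alternative, and an approximate Bayes factor built from two normal densities), BFDP takes only a few lines:

```python
import math

def bfdp(theta_hat, se, prior_sd, pi0=0.99):
    """Bayesian false-discovery probability for an estimated log odds ratio.

    theta_hat: estimated log OR; se: its standard error;
    prior_sd: sd of the N(0, W) prior on the log OR under the alternative;
    pi0: prior probability that the null is true.
    """
    V, W = se ** 2, prior_sd ** 2

    def norm_pdf(x, var):
        return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

    abf = norm_pdf(theta_hat, V) / norm_pdf(theta_hat, V + W)  # BF for H0 vs H1
    posterior_odds = abf * pi0 / (1.0 - pi0)
    return posterior_odds / (1.0 + posterior_odds)

# e.g. a SNP with estimated OR 1.3, SE 0.08, and a sceptical prior putting
# 95% of alternative log ORs within +/- log(1.5)
val = bfdp(math.log(1.3), 0.08, prior_sd=math.log(1.5) / 1.96)
```

A BFDP above a chosen threshold, set by the relative costs of false discovery and nondiscovery as the paper describes, means the association is not noteworthy.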
TL;DR: A comprehensive textbook covering the fundamentals of Bayesian inference, hierarchical models, Markov chain simulation, and regression models.
Abstract (table of contents):
FUNDAMENTALS OF BAYESIAN INFERENCE: Probability and Inference; Single-Parameter Models; Introduction to Multiparameter Models; Asymptotics and Connections to Non-Bayesian Approaches; Hierarchical Models.
FUNDAMENTALS OF BAYESIAN DATA ANALYSIS: Model Checking; Evaluating, Comparing, and Expanding Models; Modeling Accounting for Data Collection; Decision Analysis.
ADVANCED COMPUTATION: Introduction to Bayesian Computation; Basics of Markov Chain Simulation; Computationally Efficient Markov Chain Simulation; Modal and Distributional Approximations.
REGRESSION MODELS: Introduction to Regression Models; Hierarchical Linear Models; Generalized Linear Models; Models for Robust Inference; Models for Missing Data.
NONLINEAR AND NONPARAMETRIC MODELS: Parametric Nonlinear Models; Basis Function Models; Gaussian Process Models; Finite Mixture Models; Dirichlet Process Models.
APPENDICES: A: Standard Probability Distributions; B: Outline of Proofs of Asymptotic Theorems; C: Computation in R and Stan.
Bibliographic Notes and Exercises appear at the end of each chapter.
16,079 citations
"When Should Epidemiologic Regressio..." refers background in this paper
…is recognized but whose use of statistics remains largely primitive; implementation details can be found in textbooks under the topic of hierarchical modeling (e.g., Gelman et al., 1995, Section 13.4; Leonard and Hsu, 1999, Section 6.3) though not at a level accessible to most epidemiologists….
TL;DR: Statistical Methods in Cancer Research.
Abstract: Statistical methods in cancer research.
TL;DR: A policy of not making adjustments for multiple comparisons is preferable because it will lead to fewer errors of interpretation when the data under evaluation are not random numbers but actual observations on nature.
Abstract: Adjustments for making multiple comparisons in large bodies of data are recommended to avoid rejecting the null hypothesis too readily. Unfortunately, reducing the type I error for null associations increases the type II error for those associations that are not null. The theoretical basis for advocating a routine adjustment for multiple comparisons is the "universal null hypothesis" that "chance" serves as the first-order explanation for observed phenomena. This hypothesis undermines the basic premises of empirical research, which holds that nature follows regular laws that may be studied through observations. A policy of not making adjustments for multiple comparisons is preferable because it will lead to fewer errors of interpretation when the data under evaluation are not random numbers but actual observations on nature. Furthermore, scientists should not be so reluctant to explore leads that may turn out to be wrong that they penalize themselves by missing possibly important findings.
4,854 citations
"When Should Epidemiologic Regressio..." refers background in this paper
...Prominent epidemiologists have condemned such adjustments as unscientific and have even denied there are multiple-comparisons problems (Rothman, 1990; Cole, 1993; Savitz and Olshan, 1995)....