BIOMETRICS 56, 915-921
September 2000

When Should Epidemiologic Regressions Use Random Coefficients?

Sander Greenland
Department of Epidemiology, UCLA School of Public Health, Los Angeles, California 90095-1772, U.S.A.

SUMMARY. Regression models with random coefficients arise naturally in both frequentist and Bayesian approaches to estimation problems. They are becoming widely available in standard computer packages under the headings of generalized linear mixed models, hierarchical models, and multilevel models. I here argue that such models offer a more scientifically defensible framework for epidemiologic analysis than the fixed-effects models now prevalent in epidemiology. The argument invokes an antiparsimony principle attributed to L. J. Savage, which is that models should be rich enough to reflect the complexity of the relations under study. It also invokes the countervailing principle that you cannot estimate anything if you try to estimate everything (often used to justify parsimony). Regression with random coefficients offers a rational compromise between these principles as well as an alternative to analyses based on standard variable-selection algorithms and their attendant distortion of uncertainty assessments. These points are illustrated with an analysis of data on diet, nutrition, and breast cancer.

KEY WORDS: Bayesian statistics; Causal inference; Empirical Bayes estimators; Epidemiologic methods; Hierarchical regression; Mixed models; Multilevel modeling; Random-coefficient regression; Relative risk; Risk assessment; Shrinkage; Variance components.
1. Introduction

When should epidemiologic regressions use random coefficients? I will argue that they are advisable whenever the analysis objective is estimation of multiple causal effects and some sort of dimensionality-reduction strategy is needed. My arguments are not of mathematical or simulation form because there are many technical studies that support my thesis (cf. the citations in Greenland (1998, pp. 428-430)); I will instead focus on the scientific advantages of mixed modeling that those studies reflect. I have derived these arguments from writings of Box (1976), Leamer (1978), Good (1983), and other pragmatic Bayesians or Bayesians with reservations and compromises (e.g., Rubin, 1984; Draper, 1995), though any oversights are my own. What follows is an attempt to apply these ideas in epidemiology, an often controversial and idiosyncratic field whose importance is recognized but whose use of statistics remains largely primitive; implementation details can be found in textbooks under the topic of hierarchical modeling (e.g., Gelman et al., 1995, Section 13.4; Leonard and Hsu, 1999, Section 6.3), though not at a level accessible to most epidemiologists.
Causal effects are usually underidentified by epidemiologic data in that any realistic model for the effects cannot be fit without constraints. This underidentification is concealed by routine analysis strategies but can be addressed openly using models with random coefficients. The issue is important to society at large because of the seriousness with which the public and lay press often respond to epidemiologic studies (Taubes, 1995). For example, massive lawsuits often result from weak suggestions of hazards while dietary fads get launched by even weaker data. I attribute some of this problem to inappropriate modeling strategies that are common in epidemiology today. The example below is intended to show how these strategies lead to illusory significant results. I have encountered others in which this occurs, and I believe many reported findings (including several cited in Taubes (1995)) contain similar modeling artifacts.
I will not contrast fitting methods, which have been the focus of much research. That work, though important, has far outpaced work on connecting models to the scientific context (Hodges, 1996; Mallows, 1998). Nor will I address issues of model-form uncertainty or pure (noncausal) prediction modeling, as considered, e.g., in the literature on model averaging (e.g., Draper, 1995; Raftery, 1996; Buckland, Burnham, and Augustin, 1997), although mixed modeling can be viewed as a model-averaging method (Greenland, 1998, 1999).
2. Complete Confounding in a Study of Food Constituents and Breast Cancer
The example is from a case-control study of diet, food constituents, and breast cancer (Witte et al., 1994); controls are sisters of cases and so the data comprise matched sets with one to five sister controls. The variables include intakes of 35 food constituents (nutrients and suspected carcinogens) computed from 87 diet questionnaire items plus five potential confounders. This study is typical of many: The number of subjects (140 cases, 222 controls) is not much larger than the number of variables. (For further study details, see Ursin et al. (1992).) I will assume for now that only the food constituents are of interest. This still leaves a dimensionality problem, as one should expect with 35 primary plus 5 confounding covariates and only 140 cases (3.5 cases per covariate) available for analysis.
Standard analyses employ conditional logistic modeling with one of the following strategies:

(1) Use all 35 food constituents as candidate variables for some sort of data-based variable-selection procedure, such as stepwise regression, forcing in the five confounders (sometimes the confounders are also subject to selection based on significance testing, but this practice has been condemned for leaving important confounders uncontrolled (Greenland and Neutra, 1980)).

(2) Force all 35 food constituents and the 5 potential confounders into a single model and (if it fits) base inference on this model.
Strategy 1 can be condemned on the grounds that (i) the food constituents are strongly correlated and hence estimates from reduced subsets may be confounded by excluded variables, even if the latter are nonsignificant, and (ii) data-based variable selection leads to nonnormal estimators and to severe downward bias in the P-values and standard errors that come from the final model (e.g., see the studies cited in Buckland et al. (1997) and Greenland (1998, p. 402)). Bootstrapping the selection procedure is occasionally used to address problem (ii), but this approach has its own problems (Freedman, Navidi, and Peters, 1988). Strategy 2 has also been promoted to avoid the shortcomings of strategy 1 but depends on asymptotics whose applicability is dubious given the case/covariate ratio (exact logistic programs exist, but such large problems remain beyond their reach). Here, however, I will focus on a major problem for causal inference that is overlooked by all these strategies, i.e., confounding by residual dietary effects.
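
To make strategy 1 concrete, here is a minimal Python sketch of backward deletion with α-to-remove = 0.10 on simulated data, with the five confounders forced into every fit. It is purely illustrative: ordinary (unconditional) logistic regression from statsmodels stands in for the conditional logistic fits used in the study, and all data and variable names are invented.

```python
# Hypothetical sketch of strategy 1 (backward deletion); not the study's code.
# Unconditional logistic regression stands in for conditional logistic regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, n_food, n_conf = 362, 35, 5           # sizes taken from the example
food = rng.normal(size=(n, n_food))       # simulated constituent intakes
conf = rng.normal(size=(n, n_conf))       # simulated confounders
logit = 0.3 * food[:, 0] - 0.2 * conf[:, 0]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

keep = list(range(n_food))                # candidate food constituents
while True:
    X = sm.add_constant(np.hstack([food[:, keep], conf]))   # confounders forced in
    fit = sm.Logit(y, X).fit(disp=0)
    p_food = np.asarray(fit.pvalues)[1:1 + len(keep)]       # p-values for food terms only
    worst = int(np.argmax(p_food))
    if p_food[worst] <= 0.10 or len(keep) == 1:              # alpha-to-remove = 0.10
        break
    del keep[worst]                                          # drop least significant food term

print("retained food constituents:", keep)
```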
To describe this problem, let X represent the 362 × 87 dietary data matrix, let W be the 362 × 5 confounder data matrix, and let Z = {z_jk} be the 87 × 35 composition matrix for the diet items; element z_jk is the amount of constituent k found in one unit of diet item j. Thus, Z is the table of contents for the diet items and XZ is the 362 × 35 matrix giving the constituent intakes for the subjects. Letting Y be the vector of subject-specific disease indicators, the logistic model underlying the above strategies may be written

    ℓ = logit{E(Y | X, Z, W)} = α + XZπ + Wθ,    (1)

where π is the target parameter vector of constituent coefficients and α is a vector of nuisance parameters that are constant within matched sets. Strategy 2 uses model (1) in its entirety, whereas strategy 1 uses the data to select columns of XZ for use in a reduced model. The models are fit by conditional maximum likelihood to eliminate α, and effects are measured by the vector of odds ratios e^π (Breslow and Day, 1980).
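
To fix the dimensions and notation of model (1), the following sketch (simulated numbers only, not the study data) builds X, Z, and W with the stated shapes and forms the linear predictor α + XZπ + Wθ; the coefficient values are arbitrary placeholders.

```python
# Dimensions from the example: 362 subjects, 87 diet items, 35 constituents, 5 confounders.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 3, size=(362, 87))   # diet-item intakes (simulated)
Z = rng.uniform(0, 1, size=(87, 35))    # constituent content per unit of each diet item
W = rng.normal(size=(362, 5))           # potential confounders (simulated)

XZ = X @ Z                              # 362 x 35 matrix of constituent intakes
pi = rng.normal(scale=0.1, size=35)     # placeholder constituent coefficients
theta = rng.normal(scale=0.1, size=5)   # placeholder confounder coefficients
alpha = 0.0                             # intercept stands in for matched-set nuisance terms

linear_predictor = alpha + XZ @ pi + W @ theta   # model (1): logit of disease probability
print(XZ.shape, linear_predictor.shape)          # (362, 35) (362,)
```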
The first column of Table 1 presents selected results from applying strategy 1 to the food constituents using backward deletion with α-to-remove = 0.10; 15 of the 35 constituents are retained, and 11 of these have P < 0.05. The second column presents conditional maximum-likelihood (CML) estimates of odds ratios from strategy 2 (fit the full model); only 2 of the 35 coefficients have P < 0.05. The first four food constituents are shown because they have received considerable publicity as potential factors in carcinogenesis (possibly protective for ω3 fatty acids, β-carotene, and phytoestrogens and possibly causal for alcohol). The differences in the point estimates from strategies 1 and 2 are trivial relative to the confidence-interval widths, but the intervals from the full model are meaningfully wider for ω3 fatty acids and for alcohol. The differences in widths are unsurprising given the downward bias in standard errors estimated from data-selected models. The latter consideration should be enough to make one prefer the full-model intervals over the backward-deletion intervals. I will argue, however, that even the full-model intervals are misleadingly narrow.
Use of model (1) implicitly assumes absence of any effects of the diet variables X beyond the logit-linear effects mediated through the constituents in Z. There is no scientific basis for this assumption, and there are good reasons to reject it.
Table 1
Estimates of odds ratios e^π from conditional logistic regressions of breast cancer on food constituents (95% confidence limits in parentheses); five potential confounders forced into each model

                                                                          With random diet residuals
Constituent                  Backward deletion(a)   CML, all 35 constituents   τ² = 1/8            τ² = 1/2
ω3 fatty acids (g/day)       0.77 (0.65, 0.92)      0.71 (0.46, 1.1)           0.58 (0.17, 2.0)    0.49 (0.06, 4.3)
β-carotene (mg/day)          1.1 (0.99, 1.2)        1.2 (1.01, 1.3)            1.1 (0.81, 1.6)     1.2 (0.64, 2.1)
Phytoestrogens (mg/day)      0.80 (0.70, 0.92)      0.73 (0.58, 0.93)          0.73 (0.40, 1.3)    0.72 (0.26, 1.9)
Alcohol (3 oz./day)          0.94 (0.88, 1.00)      0.89 (0.63, 1.3)           0.93 (0.37, 2.3)    0.91 (0.18, 4.6)
Carbohydrate (100 g/day)     1 (deleted)            0.97 (0.79, 1.2)           0.99 (0.58, 1.7)    1.0 (0.39, 2.6)

(a) α-to-remove = 0.10; 15 food constituents retained.

Dietary factors that may influence health continue to be discovered, and their effects are not captured by π. While the individual effects of single omitted factors are likely to be small, so are the effects under study. Furthermore, the aggregate confounding due to the omitted effects may be important because of the high positive correlations among healthy dietary habits.

To account for this confounding problem, consider the expanded model

    ℓ = α + XZπ + Xδ + Wθ.    (2)
The term Xδ is intended to capture the residual diet-item effects. Because XZ is a linear function of X, however, the constituent and diet effects are completely confounded in that model (2) is not identified without side constraints. This nonidentification reflects the following fact: To control for other dietary effects using a fixed-effects-only model, one would have to measure the constituents responsible for those effects and add them to model (1); without such measurements, the effects of the measured constituents Z are not logically separable from other dietary effects because those constituents are measured only through diet variables in X. Standard analyses of nutrient effects dodge this logical problem by not looking beyond model (1). Of the two models, however, model (2) is the only scientifically reasonable one for effect estimation. Use of model (1) corresponds to imposing the implausible constraint δ = 0 on model (2), which leads to understatement of uncertainty about e^π.
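
The complete confounding has a simple linear-algebra signature: every column of XZ lies in the column space of X, so the combined design [XZ, X] of model (2) has no more independent columns than X alone. A short check with simulated matrices (illustrative only):

```python
# Non-identification of model (2): the design [XZ, X] is rank-deficient
# because the columns of XZ lie in the column space of X.
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 3, size=(362, 87))   # diet items (simulated)
Z = rng.uniform(0, 1, size=(87, 35))    # composition matrix (simulated)
XZ = X @ Z

print(np.linalg.matrix_rank(XZ))                   # 35
print(np.linalg.matrix_rank(X))                    # 87
print(np.linalg.matrix_rank(np.hstack([XZ, X])))   # still 87, not 87 + 35
```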
Underidentified structures like model (2) are common in epidemiology. Other examples include occupational studies in which X contains job histories and Z is a matrix of exposure levels within jobs, exercise studies in which X contains physical-activity histories and Z is a vector of metabolic expenditures of activities, and other studies in which X contains questionnaire items and Z is a matrix that transforms the items into quantities of focal interest. Most often, the potential effects of X items not captured by XZ are ignored; occasionally, items from X may be tested and added in a forward-selection strategy, although the number that can be added in this way is severely limited by the linear dependence of XZ on X.
3. A Mixed-Modeling Approach

3.1 A Family of Estimators
By treating δ as a vector of random coefficients, we can achieve identification using less restrictive and more plausible constraints than setting components of δ to 0. Perhaps the simplest way to do so is to treat model (2) as a mixed model by specifying δ ~ MVN(μ, T), where μ and T are known or are simple functions of a few unknown parameters. I will here use μ = 0, T = τ²I; a more realistic prior would have the diagonal elements of T vary with diet item (indeed, Witte et al. (1994) constructed a more complex prior for δ based on extensive review of the background nutrition and epidemiology literature). The fact that the components of e^δ represent residual odds ratios after regressing out food-constituent effects makes the zero-correlation (diagonal T) assumption reasonable, because prior correlations among the diet-item effects are, for the most part, due to shared constituents. The normality of the prior is chiefly for computational ease and could be replaced by other assumptions if one had skill with software for Monte Carlo fitting. Assuming normality, however, leads to simple fitting methods such as restricted generalized least squares (Goldstein, 1995), restricted maximum likelihood (Wolfinger and O'Connell, 1993), penalized likelihood with a quadratic penalty for δ (Breslow and Clayton, 1993; Greenland, 1997), data augmentation (Bedrick, Christensen, and Johnson, 1996), and ridge regression with ridge parameters for δ proportional to 1/τ² (Titterington, 1985).
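
As a concrete, minimal sketch of the penalized-likelihood route (assuming simulated data and an ordinary logistic likelihood in place of the conditional likelihood used in the paper), the code below maximizes the log-likelihood minus the quadratic penalty δ'δ/(2τ²), leaving α, π, and θ unpenalized:

```python
# Penalized-likelihood sketch for model (2) with a normal(0, tau^2 I) prior on delta.
# Simulated data; unconditional logistic likelihood replaces the paper's conditional
# likelihood, so this only illustrates the penalty structure.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(3)
n, n_items, n_const, n_conf = 362, 87, 35, 5
X = rng.uniform(0, 3, size=(n, n_items))
Z = rng.uniform(0, 1, size=(n_items, n_const))
W = rng.normal(size=(n, n_conf))
XZ = X @ Z
y = rng.binomial(1, 0.4, size=n)         # placeholder outcomes

tau2 = 1 / 8                              # prior variance for each residual diet effect

def unpack(params):
    a = params[0]
    pi = params[1:1 + n_const]
    theta = params[1 + n_const:1 + n_const + n_conf]
    delta = params[1 + n_const + n_conf:]
    return a, pi, theta, delta

def neg_penalized_loglik(params):
    a, pi, theta, delta = unpack(params)
    eta = a + XZ @ pi + W @ theta + X @ delta
    loglik = y @ eta - np.logaddexp(0, eta).sum()   # Bernoulli log-likelihood
    penalty = delta @ delta / (2 * tau2)            # quadratic penalty = normal prior on delta
    return -loglik + penalty

def grad(params):
    a, pi, theta, delta = unpack(params)
    eta = a + XZ @ pi + W @ theta + X @ delta
    r = expit(eta) - y                              # derivative of -loglik with respect to eta
    return np.concatenate(([r.sum()], XZ.T @ r, W.T @ r, X.T @ r + delta / tau2))

start = np.zeros(1 + n_const + n_conf + n_items)
fit = minimize(neg_penalized_loglik, start, jac=grad, method="L-BFGS-B")
pi_hat = fit.x[1:1 + n_const]
print("converged:", fit.success, " first constituent coefficients:", np.round(pi_hat[:4], 3))
```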
Discussions of penalized likelihood and ridge regression often treat 1/τ² as a tuning or smoothing parameter for solving an ill-conditioned regression problem rather than as an inverse variance component, and thus may appear to finesse the problem of specifying a coefficient distribution. Nonetheless, from a Bayesian perspective, such a distribution is implicit in these methods (Leamer, 1978) and the tuning parameter should reflect the precision of background information. I will thus use the prior information available in the example to assign plausible values to the prior variance of the residual effects in δ.
Let π̂(τ²) denote the penalized conditional likelihood (PCL) estimator of π obtained from fitting model (2) with μ = 0 and the prior variance fixed at τ². The third column of Table 1 gives results using π̂(1/8), i.e., with τ² = {ln(2)/1.96}² = 1/8. The latter number is derived from the context by noting that odds ratios below 1/2 or above 2 are extremely implausible because the components of e^δ are odds ratios for the residual effects for typical intakes of the dietary items in X after regressing out effects mediated by measured constituents. Taking τ² = 1/8 corresponds to assigning 95% prior probability to the odds-ratio interval exp(0 ± 1.96/8^{1/2}) = (1/2, 2) for each component of e^δ. The resulting point estimates differ little from those in the earlier columns, but the PCL intervals are considerably wider. Unlike the results from strategies 1 and 2, no mixed-model estimate has P < 0.05, and the precision of certain results in the first two columns apparently hinges on ignoring residual diet effects. Thus, mixed modeling indicates that there is little information in the data about effects of individual food constituents once we allow for the possibility of even small residual diet effects. As an added benefit, mixed modeling provides intervals for coefficients excluded by backward deletion.
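
The choice τ² = {ln(2)/1.96}² is just the usual normal-quantile calculation; the snippet below reproduces it for the odds-ratio bounds 2 and 4 used in Table 1.

```python
# Translate a 95% prior odds-ratio interval (1/R, R) for a residual effect
# into the prior variance tau^2 of its log odds ratio: tau = ln(R) / 1.96.
import numpy as np

for upper_or in (2.0, 4.0):
    tau2 = (np.log(upper_or) / 1.96) ** 2
    lo, hi = np.exp(-1.96 * np.sqrt(tau2)), np.exp(1.96 * np.sqrt(tau2))
    print(f"R = {upper_or}: tau^2 = {tau2:.3f}, 95% prior OR interval = ({lo:.2f}, {hi:.2f})")
# R = 2 gives tau^2 of about 1/8; R = 4 gives about 1/2, matching the values used in Table 1.
```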
The similarity of the point estimates in this example is not coincidental. A large change in point estimates upon variable deletion requires that the deleted variables have strong relations to both the outcome and the retained variables (cf. Breslow and Day, 1980, Chapter 2). Backward deletion with a high α-to-remove tends to delete only those variables with a weak relation to the outcome. Conversely, addition of random coefficients constrained by a small τ² tends to keep the added coefficients small. Hence, while large changes are possible, both the backward-deletion and the mixed-model point estimates tend to stay close to the full-model point estimates in this example. Nonetheless, the interval estimates differ profoundly, with the naive backward-deletion intervals shrinking as coefficients are removed and the mixed-model intervals growing as random coefficients are added, in accord with results on the impact of variable addition on logistic regression (Robinson and Jewell, 1991).

The mixed-model intervals are preferable for causal inference because model (2) better reflects current lack of knowledge about the diet residuals δ. The CML estimate π̂ under strategy 2 equals π̂(0), the mixed-model estimate obtained when δ is given a degenerate prior concentrated at zero. As uncertainty about the size of these residuals increases, so does uncertainty about π. This relation is illustrated by comparing the third column of Table 1 to the fourth column, which gives results using the contextually large value of τ² = {ln(4)/1.96}² = 1/2; this τ² corresponds to assigning 95% prior probability to exp(0 ± 1.96/2^{1/2}) = (1/4, 4) for each component of e^δ. The variances of the components of π̂(τ²) increase without bound as τ² → ∞, reflecting the linear dependence of the constituents XZ on the diet items X.

As with δ, there is considerable prior information about π and θ in this example. Bayesian philosophy says one should use this information to add priors for π and θ to the analysis, while frequentist theory tells us that the resulting estimators may be superior to any above if that information is valid. Whether or not one finds these arguments compelling, they lack one crucial element in the argument for introducing the prior for δ: Some constraint on δ is needed to get a sensible estimate of π within model (2), whereas a prior for π or θ is not.
3.2 Should the Prior Variance Be Estimated?

What about uncertainty about τ² (or, more generally, T)? Because τ² is a parameter of the prior for δ, uncertainty about τ² is uncertainty about the uncertainty about δ, i.e., it is uncertainty about which prior we should use for δ. From a subjective Bayesian perspective, this hyperuncertainty concerns a parameter τ² that indexes different opinions about δ, and neither τ² nor a distribution for τ² have any objective meaning with respect to δ. In other words, uncertainty about τ² is nothing more than uncertainty about prior opinion. With this view, estimation of τ² is a pointless exercise; instead, uncertainty about τ² should be addressed by repeating the analysis using different values, as in the last two columns of Table 1. Those results suggest that, within the δ ~ MVN(0, τ²I) prior specification, the main qualitative inference (no estimate appears incompatible with chance) should not vary among opinions with τ² > 1/8.

Consider next a frequentist perspective in which one goal is to minimize expected loss in estimating π subject to the mixed-model specification. We don't know what value of τ² will minimize the expected loss of π̂(τ²), so we might attempt to estimate it from the data. Because τ² controls the degree of shrinkage in δ̂(τ²), this approach accommodates intuitions that the data should have some say in how much to shrink δ̂. Unfortunately, common estimators for τ² can have very poor small-sample properties (Greenland, 1993, 1997); furthermore, the estimates they produce often equal no one's prior variance for δ, in which case the resulting odds-ratio estimates have no contextually relevant Bayesian interpretation. For this reason, if one feels compelled to estimate τ², I would recommend giving it a proper prior concentrated among contextually reasonable values.
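
One way to follow this advice is a sensitivity analysis over a small grid of contextually reasonable τ² values. The sketch below uses a linear-model analogue of model (2) with unit error variance (chosen only because it has a closed-form generalized-ridge solution, not because it matches the paper's conditional-logistic analysis) to show the structure of such a loop and the widening of the π intervals as τ² grows.

```python
# Sensitivity analysis over tau^2 using a linear-model analogue of model (2)
# (unit error variance, closed-form generalized-ridge solution); illustrative only.
import numpy as np

rng = np.random.default_rng(4)
n, n_items, n_const = 200, 30, 10
X = rng.uniform(0, 3, size=(n, n_items))
Z = rng.uniform(0, 1, size=(n_items, n_const))
A = np.hstack([X @ Z, X])                  # [XZ, X]: pi block first, then delta block
y = A[:, 0] * 0.3 + rng.normal(size=n)     # simulated outcome

for tau2 in (1 / 8, 1 / 4, 1 / 2):
    D = np.zeros(A.shape[1])
    D[n_const:] = 1 / tau2                 # penalize only the delta block
    post_prec = A.T @ A + np.diag(D)       # penalty makes the singular A'A invertible
    post_cov = np.linalg.inv(post_prec)    # approximate posterior covariance
    pi_se = np.sqrt(np.diag(post_cov)[:n_const])
    print(f"tau^2 = {tau2:.3f}: mean SE of pi estimates = {pi_se.mean():.3f}")
# The pi standard errors grow as tau^2 increases, mirroring the widening intervals in Table 1.
```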
3.3 Mixed Coefficients

So far, I have assumed that the analysis goal is to estimate effects of the composite covariates XZ, treating any residual effects of X as a source of bias. Suppose instead the goal is to estimate the effects of the basic covariates in X. A standard analysis would select columns of X for use in a logistic regression (strategy 1) or use all columns of X (strategy 2), in the model

    ℓ = α + Xβ + Wθ.    (3)

In the example, this is a model for effects of the 87 diet items. Although β is identified without further specification, results from standard analyses are not credible: Upon fitting the full model, 29 components of the CML estimate β̂ have P < 0.05 and many are absurdly inflated (Witte et al., 1994); after backward deletion with α = 0.10, there is much less inflation, but 14 of the 20 retained components still have P < 0.05. For example, the full-model estimate of the odds ratio for eating two oranges per week is 3.1 (95% confidence limits: 1.2, 8.4); after backward deletion, the estimate becomes 1.6 (1.2, 2.2).

Much more plausible results can be obtained by exploiting the information in Z about food composition to shrink the CML estimate of β toward the value expected under model (1), in which foods have no effect beyond that conferred by their measured constituents. Model (2) with δ ~ MVN(0, T) is equivalent to a two-stage hierarchical (multilevel) model in which the first stage is model (3) and the second stage is

    β = Zπ + δ.    (4)
β is now a combination of fixed and random coefficients; an independence structure for the random part, δ, implies that any prior correlations among the diet effects in β are entirely explained by known differences in constituents of the diet items. This implication is a scientific proposition that was evaluated against background literature (Witte et al., 1994).
The mixed coefficient β can be estimated by plugging the mixed-model (model (2)) estimates π̂(τ²) and δ̂(τ²) into equation (4). Since the estimated random vector δ̂(τ²) is shrunk toward the zero vector, β̂(τ²) = Zπ̂(τ²) + δ̂(τ²) is an estimate of β that is shrunk toward Zπ̂(τ²), that portion of the estimated dietary effects due to the constituents Z. With τ² = 1/8, the overall results appear much more ambiguous than those from CML or backward deletion; e.g., only 4 of the 87 components of β̂(1/8) have P < 0.05, and the estimate of the odds ratio for eating two oranges per week is reduced to 1.4 (0.93, 2.0). The degree of shrinkage is controlled by τ²: β̂(0) = Zπ̂, where π̂ is the CML estimate of π under model (1), whereas β̂(τ²) approaches the unconstrained CML estimate of β under model (3) as τ² increases. (For further illustration of these points in the example, see Witte et al. (1994).)
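
The mechanics of equation (4) are a single matrix operation; the fragment below assembles β̂(τ²) = Zπ̂(τ²) + δ̂(τ²) from placeholder fitted values and shows that it collapses to Zπ̂ when δ̂ is fully shrunk to zero.

```python
# Assemble the mixed coefficient estimate beta_hat = Z pi_hat + delta_hat (equation (4)).
# pi_hat and delta_hat are placeholders standing in for fitted values from model (2).
import numpy as np

rng = np.random.default_rng(5)
n_items, n_const = 87, 35
Z = rng.uniform(0, 1, size=(n_items, n_const))
pi_hat = rng.normal(scale=0.05, size=n_const)      # fitted constituent coefficients (placeholder)
delta_hat = rng.normal(scale=0.1, size=n_items)    # fitted residual diet effects (placeholder)

beta_hat = Z @ pi_hat + delta_hat                  # shrunk toward Z pi_hat, not toward zero
beta_hat_tau0 = Z @ pi_hat + np.zeros(n_items)     # tau^2 = 0: delta_hat degenerate at zero
print(beta_hat[:3], beta_hat_tau0[:3])
```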
Use of model (4) does not require prior information as detailed as a diet-nutrient matrix. If that matrix had been unavailable for our analyses, we would have used other, more crude information to construct a second-stage (prior) design matrix Z. For example, we could group the coefficients by food type (vegetables, fruits, white meats, red meats, etc.); Z would then be the matrix of group indicators. As before, the objectives of the prior grouping would be to produce uncorrelated or exchangeable priors for residual effects not captured by the grouping and to minimize bias in any one coefficient as a result of shrinkage toward an inappropriate mean (Greenland, 1992). Because we would expect greater heterogeneity of effects within food-based than within constituent-based groups, however, we would have used a larger value of τ² (or a prior for τ² with a larger mean) with a food-based grouping.
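
For the cruder grouping just described, the second-stage matrix Z is simply a matrix of group indicators; the snippet below builds one from a list of food-type labels (the labels are illustrative, not the study's actual grouping).

```python
# Build a second-stage design matrix Z of group indicators from food-type labels.
import numpy as np

food_types = ["vegetable", "fruit", "white meat", "red meat", "grain",
              "fruit", "vegetable", "red meat"]            # one label per diet item (illustrative)
groups = sorted(set(food_types))
Z = np.array([[1.0 if t == g else 0.0 for g in groups] for t in food_types])

print(groups)       # column order of Z
print(Z)            # each row has a single 1 marking its food-type group
```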
4. Discussion

4.1 Mixed Modeling as an Extension of Established Methods

Epidemiologic regressions occasionally include random effects that are coefficients for a set of group or cluster membership indicators, where the groups are families, geographic areas, or sets of repeated observations on single individuals. The group-indicator coefficients are treated as i.i.d. or as having a specified correlation structure (e.g., exchangeable) because the groups are too numerous and small to allow stable estimation of their coefficients without constraints. In other words, the indicator coefficients are assumed random for the same reason as δ was assumed random in the above example. This assumption is easily accepted in group-indicator cases because (i) the group coefficients are usually regarded as nuisance parameters, which makes the assumption seem to be of only indirect importance, and (ii) the groups constitute a single natural partition of the observations, which makes the prior correlational assumptions seem natural in that the latter reflect symmetries in prior information about the group effects.
Mixed models extend standard random-effects models to include prior information about causal and measurement processes in a model for the source of effect correlations (Greenland, 1992; Searle, Casella, and McCulloch, 1992, p. 330). Such modeling can provide shrinkage estimators superior to the original ridge and James-Stein estimators, which only shrink toward the origin. Consider estimation of dietary effects (β in model (3)). Shrinking β toward the origin is equivalent to using model (4) with π = 0, an incorrect restriction. Mixed modeling allows shrinkage of β toward a manifold Zπ that is contextually determined, which increases coherence of the analysis with prior information. By dropping the incorrect restriction, we should also expect less bias (at a cost of greater variance) in mixed modeling than in classical shrinkage while retaining lower mean-squared error than unconstrained ML estimation.
4.2 The Constraints of Unconstrained ML

Standard epidemiologic analyses often begin and usually end with fixed-effects logistic regression fit by unconstrained maximum likelihood (ML). Unconstrained ML is often defended against shrinkage and Bayesian estimation with claims that it is unbiased and free from dependence on prior information. These claims are misleading because they are based on the assumption that the correct model is known and is the only model used in the analysis. In epidemiology, this assumption is always highly unrealistic, as in the example of estimating the constituent effects in π. Unconstrained ML forces use of an inadequately small fixed-effects model (such as model (1) or a backward-deletion model), whereas shrinkage allows use of a much richer mixed model (such as model (2) with random δ). In practice, then, ML tends to suffer from more bias due to model restrictions.

There is a sense in which this bias reflects an enhanced dependence of unconstrained ML on prior information. Every nonexperimental inference is a function of prior information and data. Although unconstrained ML uses no explicit prior, it does use a prior in the form of restrictions on the class of models available for the analysis (Leamer, 1978; Robins and Greenland, 1986). Mixed modeling expands that class and thus can reduce bias from incorrect model restrictions while facilitating use of plausible restrictions, as in the above example. Classical solutions to such problems involve sharp constraints, such as setting coefficients to zero or imposing absolute bounds, which do not reflect the vagueness of true prior information and which make valid uncertainty evaluation difficult. A smooth prior can be viewed as a probabilistic constraint requiring no sharp bounds. Mixed modeling represents a convenient means of imposing such fuzzy constraints.
4.3 The Parsimony Problem and Model Selection in Causal Inference
The above arguments for model expansion using random coefficients oppose the usual parsimony principle, which says to seek the simplest model for the job. When one attempts a causal analysis of complex and poorly understood relations from observations made without the benefit of randomization (as in most of epidemiology), models need to be complex to capture uncertainty about the relations. In other words, an honest uncertainty assessment requires parameters for all effects that we know may be present. This advice is implicit in an antiparsimony principle often attributed to L. J. Savage, "All models should be as big as an elephant" (see Draper, 1995). When we attempt to operationalize this advice with conventional regression tools, however, we run into another problem: you can't estimate anything well if you try to estimate everything simultaneously without constraints (illustrated by the fact that π in model (2) is not even identified without constraints). This problem drives analysts to search for simple models even if they do not explicitly adopt parsimony as a principle. Mixed models offer an alternative to purely data-driven model simplification and consequent uncertainty understatement.
Results will be sensitive to reasonable model choices whenever one can envision more important parameters than can be identified from the data. This problem is often handled with mechanical selection algorithms that ignore all context and produce models that exclude important parameters. The true sensitivity of causal inferences to model choice is concealed because these algorithms avoid the territory of underidentified models. To address this problem, some authors add a single unidentified parameter for unmeasured effects to a simple model and examine sensitivity of results to variations in this parameter (Rosenbaum, 1995; Copas and Li, 1997; Robins, Rotnitzky, and Scharfstein, 1999), analogous to the use of τ² above. These methods are a welcome advance beyond the usual approach, but as implemented to date, they do not incorporate prior information as rich as that in the above example.
Many analysts recognize that no causal inference is possible from nonexperimental data without external identifying constraints. Placing distributions on coefficients provides more flexible and hence less unrealistic constraints than excluding them entirely. This flexibility can also be advantageous in pure prediction problems, for it allows one to move beyond the all-or-none approach of variable selection. Under a mean-zero, variance-τ² specification for a coefficient,


References

Breslow, N. E. and Day, N. E. (1980). Statistical Methods in Cancer Research, Volume I: The Analysis of Case-Control Studies. Lyon: International Agency for Research on Cancer.

Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (1995). Bayesian Data Analysis. London: Chapman and Hall.

Rothman, K. J. (1990). No adjustments are needed for multiple comparisons. Epidemiology 1, 43-46.