UC San Diego
UC San Diego Previously Published Works
Title
How persuasive is a good fit? A comment on theory testing
Permalink
https://escholarship.org/uc/item/5vt0z72k
Journal
Psychological Review, 107(2)
Authors
Roberts, Seth
Pashler, Harold
Publication Date
2000
Peer reviewed
Psychological Review (in press)
Running head: Testing Theories With Free Parameters
How Persuasive is a Good Fit?
Seth Roberts
University of California, Berkeley
Harold Pashler
University of California, San Diego
Quantitative theories with free parameters often gain credence when they "fit" data closely. This is a
mistake, we argue. A good fit reveals nothing about (a) the flexibility of the theory (how much it cannot
fit), (b) the variability of the data (how firmly the data rule out what the theory cannot fit), and (c) the
likelihood of other outcomes (perhaps the theory could have fit any plausible result)–and a reader needs
to know all three to decide how much the fit should increase belief in the theory. As far as we can tell, the
use of good fits as evidence receives no support from philosophers of science or from the history of
psychology; we have been unable to find examples of a theory supported mainly by good fits that has led
to demonstrable progress. We consider and rebut arguments used to defend the use of good fits as
evidence–for example, that a good fit is meaningful when the number of free parameters is small
compared to the number of data points, or when one model fits better than others. A better way to test a
theory with free parameters is to (a) determine how the theory constrains possible outcomes (i.e., what it
predicts); (b) assess how firmly actual outcomes agree with those constraints; and (c) determine if
plausible alternative outcomes would have been inconsistent with the theory, allowing for the variability
of the data.
How Persuasive is a Good Fit?
Many quantitative psychological
theories with free parameters are supported
mainly or entirely by demonstrations that
they can "fit" data–that the parameters can
be adjusted so that the output of the theory
resembles actual results. The similarity is
often shown via a graph with two functions:
one labeled observed (or data), the other
labeled predicted (or theory or simulated).
That the theory fits data is supposed to show
that the theory should be taken seriously–
should be published, for example.
This type of argument is common;
judging from a search of Psychological
Abstracts, the research literature probably
contains thousands of examples. Early
instances involved sensory processes
(Hecht, 1934) and animal learning (Hull,
1943), but it is now used in many areas.
Here are three recent examples:
1. Cohen, Dunbar, and McClelland
(1990) proposed a parallel-distributed-
processing model to explain the Stroop
effect and related data. The model was
meant to embody a "continuous" view of
automaticity, in contrast to an "all-or-none"
(p. 332) view. The model contained many
adjustable parameters, including number of
units per module, ratio of training
frequencies, learning rate, maximum
response time, initial input weights, indirect
pathway strengths, cascade rate, noise,
magnitude of attentional influence (two
parameters), and response-mechanism
parameters (three). The model was fit to six
data sets. Some parameters (e.g., number of
units per module) were separately adjusted
for each data set; other parameters were
adjusted based on one data set and held
constant for the rest. The function relating
cycle time (model) to average reaction time
(observed) was always linear, but its slope
and intercept varied from one data set to the
next. That the model could fit several data
sets led the authors to conclude that
compared to the all-or-none view, "a more
useful approach is to consider automaticity
in terms of a continuum" (Cohen et al.,
1990, p. 357)–although they did not try to fit
a model based on the all-or-none view.
2. Zhuikov, Couvillon, and Bitterman
(1994) presented a theory to explain goldfish
avoidance conditioning. It is a quantitative
version of Mowrer’s two-process theory, in
which some responses are generated by fear,
some by reinforcement. When some
simplifying assumptions are made, the
theory has three equations and six adjustable
parameters. The authors fit the theory to data
from four experiments, and concluded that
"the good fit suggests that the theory is
worth developing further" (Zhuikov,
Couvillon, & Bitterman, 1994, p. 32).
3. Rodgers and Rowe (1993)
proposed a theory that explains how
teenagers come to engage in various sexual
behaviors for the first time. It emphasizes
contact with other teenagers–a "contagion"
(p. 479) explanation. The theory has eight
equations with twelve free parameters.
Rodgers and Rowe fitted the theory to
survey data about the prevalence of kissing,
petting, and intercourse in boys and girls of
different ages and races and concluded that
the theory "appears to have successfully
captured many of the patterns in two
empirical data sets" (p. 505). This success
was the main support for the theory.
Why the Use of Good Fits as Evidence is Wrong
This type of argument has three
serious problems. First, what the theory
predicts–how much it constrains the fitted
data–is unclear. Theorists who use good fits
as evidence seem to reason as follows: if our
theory is correct, it will be able to fit the
data; our theory fits the data; therefore it is
more likely that our theory is correct.
However, if a theory does not constrain
possible outcomes, the fit is meaningless.
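The inference in the preceding paragraph can be written with Bayes' rule. This formalization is ours, not the authors': T stands for "the theory is correct" and F for "the theory fits the data."

```latex
\frac{P(T \mid F)}{P(\neg T \mid F)}
  = \frac{P(F \mid T)}{P(F \mid \neg T)} \cdot \frac{P(T)}{P(\neg T)}
```

Even when P(F | T) = 1, the fit raises the odds on the theory only to the extent that P(F | ¬T) is well below 1. If the theory could fit almost any plausible result, P(F | ¬T) is close to 1, the ratio is close to 1, and the fit should leave belief in the theory nearly unchanged.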
A prediction is a statement of what a
theory does and does not allow. When a
theory has adjustable parameters, a
particular fit is just one example of what it
allows. To know what a theory predicts for a
particular measurement you need to know
all of what it allows (what else it can fit) and
all of what it does not allow (what it cannot
fit). For example, suppose two measures are
positively correlated, and it is shown that a
certain theory can produce such a relation–
that is, can fit the data. This does not show
that the theory predicts the correlation. A
theory predicts such a relation only if it
could not fit other possible relations between
the two measures–zero correlation, negative
correlation–and this is not shown by fitting a
positive correlation.
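A minimal Python sketch of the correlation example. Everything in it is hypothetical: the two "theories" are straight-line models y = a + b*x fitted by least squares, the function name fit_line is ours, and the data are invented. With the slope free, the model fits positive, flat, and negative relations perfectly, so fitting the positive one shows nothing; constraining the slope to b >= 0 makes the negative relation unfittable, and only that restriction amounts to a prediction.

```python
from statistics import mean


def fit_line(xs, ys, slope_min=None):
    """Least-squares fit of y = a + b*x; return (a, b, sse).

    If slope_min is given, b is constrained to b >= slope_min. Once a is
    chosen optimally for each b, the squared error is a convex quadratic
    in b, so the constrained optimum is the unconstrained slope clipped
    to the bound.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    if slope_min is not None:
        b = max(b, slope_min)
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


xs = [1, 2, 3, 4, 5]
datasets = {  # invented data
    "positive": [2, 4, 6, 8, 10],
    "flat": [6, 6, 6, 6, 6],
    "negative": [10, 8, 6, 4, 2],
}
for label, ys in datasets.items():
    free_sse = fit_line(xs, ys)[2]                      # slope may take any sign
    constrained_sse = fit_line(xs, ys, slope_min=0)[2]  # hypothetical theory forbidding b < 0
    print(f"{label:>8}: free SSE = {free_sse:.1f}, constrained SSE = {constrained_sse:.1f}")
```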
When a theory does constrain
possible outcomes, it is necessary to know
how much. The more constraint–the
narrower the prediction–the more
impressive a confirmation of the constraint
(e.g., Meehl, 1997). Without knowing how
much a theory constrains possible outcomes,
you cannot know how impressed to be when
observation and theory are consistent.
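One rough way to put a number on "how much" a theory constrains outcomes, sketched under stated assumptions: sweep the free parameter over a grid, record every outcome the theory can produce, and see what share of the plausible outcome range (here [0, 1], as for a probability) it reaches. The two one-parameter theories, the grid, and the 100-bin resolution are all hypothetical choices for illustration.

```python
def coverage(model, param_grid, n_bins=100):
    """Estimate the fraction of the outcome range [0, 1] that `model` can
    produce for some parameter value in `param_grid` (coarse binning)."""
    reached = set()
    for theta in param_grid:
        p = model(theta)
        reached.add(min(int(p * n_bins), n_bins - 1))
    return len(reached) / n_bins


grid = [i / 1000 for i in range(1001)]  # free parameter swept over [0, 1]


def flexible(theta):
    # hypothetical theory: the free parameter sets the outcome directly
    return theta


def constrained(theta):
    # hypothetical theory: whatever the parameter, the outcome stays in [0.6, 0.7]
    return 0.6 + 0.1 * theta


print(coverage(flexible, grid))     # whole range reachable
print(coverage(constrained, grid))  # roughly a tenth of the range
```

The smaller the reachable share, the narrower the prediction. A theory with several free parameters would need a grid (or a random sample) over all of them at once.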
Second, the variability of the data
(e.g., between-subject variation) is unclear.
How firmly do the data agree with the
predictions of the theory? Are they
compatible with the outcomes that the
theory rules out? The more conclusively the
data rule out what the theory rules out, the
more impressive the confirmation. For
example, suppose a theory predicts that a
certain measure should be greater than zero.
If the measure is greater than zero, the
shorter the confidence interval, the more
impressive the confirmation. That a theory
fits data does not show how firmly the data
rule out outcomes inconsistent with the
theory; without this information, you cannot
know how impressed to be that theory and
observation are consistent.
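A sketch of the confidence-interval example, assuming per-subject scores on a measure that the theory says must exceed zero. The data are invented, and the interval uses a normal approximation from Python's statistics module rather than a t distribution. Both data sets have the same mean and so would look equally well "fitted," but only the less variable one rules out the values the theory forbids.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def ci95(values):
    """Approximate 95% confidence interval for a population mean.

    Uses the normal critical value (about 1.96); with few subjects a
    t critical value would give a somewhat wider interval.
    """
    m = mean(values)
    se = stdev(values) / sqrt(len(values))
    z = NormalDist().inv_cdf(0.975)
    return m - z * se, m + z * se


# Hypothetical per-subject values of a measure the theory says is > 0.
# Both data sets have mean 1.0; only their variability differs.
tight = [0.90, 1.10, 1.00, 0.95, 1.05, 1.00, 0.98, 1.02]
loose = [3.0, -1.5, 2.5, -0.5, 1.0, 2.0, -1.0, 2.5]

for label, data in [("tight", tight), ("loose", loose)]:
    lo, hi = ci95(data)
    verdict = "rules out" if lo > 0 else "does not rule out"
    print(f"{label}: 95% CI = ({lo:.2f}, {hi:.2f}); {verdict} values <= 0")
```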
Adding error bars may not solve this
problem; it is variability on the constrained
dimension(s) that matters. For example,
suppose a theory predicts that several points
will lie on a straight line. To judge the
accuracy of this prediction, the reader needs
to know the variability of a measure of
curvature (or some other measure of non-
linearity). Adding vertical error bars to each
point is a poor substitute (unless the answer,
linear or non-linear, is very clear); the
vertical position of the points is not what the
theory predicts.
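A sketch of the straight-line example with invented reaction times for five subjects at three equally spaced levels (all hypothetical). The prediction concerns curvature, so the relevant variability is that of a per-subject curvature score; here we assume the standard quadratic contrast y1 - 2*y2 + y3, which is zero whenever the three points lie on a straight line. Its standard error need not resemble the vertical error bars at each level.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_se(values):
    """Mean and standard error of the mean."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical reaction times (ms) for five subjects at three equally
# spaced levels of some independent variable.
subjects = [
    (400, 450, 500),
    (380, 440, 490),
    (420, 465, 515),
    (410, 452, 498),
    (395, 448, 505),
]

# What vertical error bars would show: the spread at each level.
for level in range(3):
    m, se = mean_and_se([s[level] for s in subjects])
    print(f"level {level + 1}: mean = {m:.1f} ms, SE = {se:.1f} ms")

# What the linearity prediction concerns: a per-subject curvature score.
# With equally spaced levels, y1 - 2*y2 + y3 is 0 for points on a straight line.
curvature = [y1 - 2 * y2 + y3 for y1, y2, y3 in subjects]
m, se = mean_and_se(curvature)
print(f"curvature: mean = {m:.1f} ms, SE = {se:.1f} ms")
```

In these made-up numbers the curvature score is less variable than the level means, because a subject who is slow overall is slow at all three levels and that difference cancels in the contrast; with other data the relation could go the other way.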
Figure 1: Four possible relationships between theory and data.
(Measures A and B are both measures of behavior. For both
measures, the axes cover the whole range of possible values. The
dotted areas indicate the range of outcomes that would be
consistent with the theory. The error bars indicate standard errors.
In every case, the theory can closely fit the data, but only when
both theory and data provide substantial constraints does this
provide significant evidence for the theory.)
To further illustrate these points,
Figure 1 shows four ways a "two-
dimensional" prediction–a constraint
involving two measures at once–can be
compatible with data. Measures A and B in
Figure 1 are both derived from
measurements of behavior. Either might be
quite simple (e.g., trials to criterion) or
relatively complex (the quadratic component
of a fitted function); it does not matter. The
axis of each measure covers the entire range
of plausible values of the measure before the
experiment is done (e.g., from 0 to 1, if the
measure is a probability). The dotted area
shows the predictions of the theory, the
range of outcomes that are consistent with
the theory. In the two upper panels of Figure
1, the theory tightly constrains possible
outcomes; in the two lower panels, it does
not. In each case there is one data point. In
the two left-hand panels, the observations
tightly constrain the population value; in the
two right-hand panels, they do not. In every
case, the data are consistent with the theory
(the data point is within the dotted area),
which means in every case the theory can
closely fit the data. But only the situation in
the upper left panel is substantial evidence
for the theory.
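The logic of Figure 1 can be mimicked numerically. In this sketch (all choices hypothetical, not taken from the figure) the theory allows a band |B - A| <= h around the diagonal of the unit square, the single observation sits at (0.5, 0.5), and the data's uncertainty is represented as a box extending two standard errors in each direction. "Theory constrains" means the band covers little of the square; "data constrain" means the box lies wholly inside the band.

```python
def allowed_fraction(half_width):
    """Share of the unit square (both measures scaled to [0, 1]) lying in the
    hypothetical allowed band |B - A| <= half_width."""
    # The excluded region is two right triangles with legs (1 - half_width).
    return 1 - (1 - half_width) ** 2


def box_inside(a, b, se, half_width, k=2):
    """True if the box a +/- k*se, b +/- k*se lies entirely in the band.

    The band is convex, so checking the four corners is enough.
    """
    corners = [(a + da, b + db) for da in (-k * se, k * se) for db in (-k * se, k * se)]
    return all(abs(y - x) <= half_width for x, y in corners)


panels = {  # (band half-width, standard error), hypothetical values
    "upper left (theory and data constrain)": (0.05, 0.01),
    "upper right (only theory constrains)": (0.05, 0.15),
    "lower left (only data constrain)": (0.80, 0.01),
    "lower right (neither constrains)": (0.80, 0.15),
}
for name, (half_width, se) in panels.items():
    frac = allowed_fraction(half_width)
    inside = box_inside(0.5, 0.5, se, half_width)
    print(f"{name}: theory allows {frac:.0%} of outcomes; data box inside band: {inside}")
```

Only the upper-left case combines a narrow band with a box that stays inside it. In the upper right the point is in the band but the box is not; in the lower panels the box is inside, but the theory allows almost every outcome.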
Third, the a priori likelihood that the
theory will fit–the likelihood it will fit
whether or not it is true–is ignored. Perhaps
the theory could fit any plausible result. It is
well-known that a theory gains more support
from the correct prediction of an unlikely
event than from the correct prediction of
something that was expected anyway.
Lakatos (1978) made this point vividly: "It
is no success for Newtonian theory that
stones, when dropped, fall towards the earth,