How persuasive is a good fit? A comment on theory testing.

doi:10.1037/0033-295X.107.2.358


Permalink

https://escholarship.org/uc/item/5vt0z72k

Journal

Psychological Review, 107(2)

Authors

Roberts, Seth

Pashler, Harold

Publication Date

2000

Peer reviewed


Psychological Review (in press)

Running head: Testing Theories With Free Parameters

How Persuasive is a Good Fit?

Seth Roberts

University of California, Berkeley

Harold Pashler

University of California, San Diego

Quantitative theories with free parameters often gain credence when they "fit" data closely. This is a mistake, we argue. A good fit reveals nothing about (a) the flexibility of the theory (how much it cannot fit), (b) the variability of the data (how firmly the data rule out what the theory cannot fit), and (c) the likelihood of other outcomes (perhaps the theory could have fit any plausible result)–and a reader needs to know all three to decide how much the fit should increase belief in the theory. As far as we can tell, the use of good fits as evidence receives no support from philosophers of science or from the history of psychology; we have been unable to find examples of a theory supported mainly by good fits that has led to demonstrable progress. We consider and rebut arguments used to defend the use of good fits as evidence–for example, that a good fit is meaningful when the number of free parameters is small compared to the number of data points, or when one model fits better than others. A better way to test a theory with free parameters is to (a) determine how the theory constrains possible outcomes (i.e., what it predicts); (b) assess how firmly actual outcomes agree with those constraints; and (c) determine whether plausible alternative outcomes would have been inconsistent with the theory, allowing for the variability of the data.

How Persuasive is a Good Fit?

Many quantitative psychological theories with free parameters are supported mainly or entirely by demonstrations that they can "fit" data–that the parameters can be adjusted so that the output of the theory resembles actual results. The similarity is often shown via a graph with two functions: one labeled observed (or data), the other labeled predicted (or theory or simulated). That the theory fits data is supposed to show that the theory should be taken seriously–should be published, for example.

This type of argument is common; judging from a search of Psychological Abstracts, the research literature probably contains thousands of examples. Early instances involved sensory processes (Hecht, 1934) and animal learning (Hull, 1943), but it is now used in many areas. Here are three recent examples:

1. Cohen, Dunbar, and McClelland (1990) proposed a parallel-distributed-processing model to explain the Stroop effect and related data. The model was meant to embody a "continuous" view of automaticity, in contrast to an "all-or-none" (p. 332) view. The model contained many adjustable parameters, including number of units per module, ratio of training frequencies, learning rate, maximum response time, initial input weights, indirect pathway strengths, cascade rate, noise, magnitude of attentional influence (two parameters), and response-mechanism parameters (three). The model was fit to six data sets. Some parameters (e.g., number of units per module) were separately adjusted for each data set; other parameters were adjusted based on one data set and held constant for the rest. The function relating cycle time (model) to average reaction time (observed) was always linear, but its slope and intercept varied from one data set to the next. That the model could fit several data sets led the authors to conclude that, compared to the all-or-none view, "a more useful approach is to consider automaticity in terms of a continuum" (Cohen et al., 1990, p. 357)–although they did not try to fit a model based on the all-or-none view.

2. Zhuikov, Couvillon, and Bitterman (1994) presented a theory to explain goldfish avoidance conditioning. It is a quantitative version of Mowrer’s two-process theory, in which some responses are generated by fear, some by reinforcement. When some simplifying assumptions are made, the theory has three equations and six adjustable parameters. The authors fit the theory to data from four experiments and concluded that "the good fit suggests that the theory is worth developing further" (Zhuikov, Couvillon, & Bitterman, 1994, p. 32).

3. Rodgers and Rowe (1993) proposed a theory that explains how teenagers come to engage in various sexual behaviors for the first time. It emphasizes contact with other teenagers–a "contagion" (p. 479) explanation. The theory has eight equations with twelve free parameters. Rodgers and Rowe fitted the theory to survey data about the prevalence of kissing, petting, and intercourse in boys and girls of different ages and races and concluded that the theory "appears to have successfully captured many of the patterns in two empirical data sets" (p. 505). This success was the main support for the theory.

Why the Use of Good Fits as Evidence is Wrong

This type of argument has three serious problems. First, what the theory predicts–how much it constrains the fitted data–is unclear. Theorists who use good fits as evidence seem to reason as follows: if our theory is correct, it will be able to fit the data; our theory fits the data; therefore it is more likely that our theory is correct. However, if a theory does not constrain possible outcomes, the fit is meaningless.

A prediction is a statement of what a theory does and does not allow. When a theory has adjustable parameters, a particular fit is just one example of what it allows. To know what a theory predicts for a particular measurement, you need to know all of what it allows (what else it can fit) and all of what it does not allow (what it cannot fit). For example, suppose two measures are positively correlated, and it is shown that a certain theory can produce such a relation–that is, can fit the data. This does not show that the theory predicts the correlation. A theory predicts such a relation only if it could not fit other possible relations between the two measures–zero correlation, negative correlation–and this is not shown by fitting a positive correlation.
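The difference between fitting and predicting can be made concrete with a minimal Python sketch. The two "theories" below are hypothetical stand-ins invented for illustration, not models from the literature: each maps a single free parameter `w` to a predicted slope relating measure B to measure A. The helper names `can_fit` and `fits_table`, the parameter grid, and the tolerance are likewise assumptions. The sketch asks, for each theory, which of three candidate outcomes (negative, zero, or positive relation) some parameter value could reproduce.

```python
import math

# Hypothetical parameter grid: w from -2 to 2 in steps of 0.01.
GRID = [i / 100 for i in range(-200, 201)]

# Candidate outcomes for the slope of measure B on measure A.
CANDIDATES = {"negative": -0.4, "zero": 0.0, "positive": 0.4}


def flexible(w):
    """Hypothetical theory whose predicted slope can take either sign."""
    return w


def constrained(w):
    """Hypothetical theory whose predicted slope is always positive."""
    return math.exp(w)


def can_fit(theory, target, grid=GRID, tol=0.02):
    """True if some parameter value on the grid brings the theory within tol of target."""
    return any(abs(theory(w) - target) <= tol for w in grid)


def fits_table(theory):
    """Map each candidate outcome to whether the theory could have fit it."""
    return {name: can_fit(theory, value) for name, value in CANDIDATES.items()}


print("flexible:   ", fits_table(flexible))
# {'negative': True, 'zero': True, 'positive': True}
print("constrained:", fits_table(constrained))
# {'negative': False, 'zero': False, 'positive': True}
```

Both theories fit an observed positive relation, but only the constrained one could not have fit a zero or negative relation, so only for it does the fit amount to a confirmed prediction.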

When a theory does constrain possible outcomes, it is necessary to know how much. The more constraint–the narrower the prediction–the more impressive a confirmation of the constraint (e.g., Meehl, 1997). Without knowing how much a theory constrains possible outcomes, you cannot know how impressed to be when observation and theory are consistent.

Second, the variability of the data (e.g., between-subject variation) is unclear. How firmly do the data agree with the predictions of the theory? Are they compatible with the outcomes that the theory rules out? The more conclusively the data rule out what the theory rules out, the more impressive the confirmation. For example, suppose a theory predicts that a certain measure should be greater than zero. If the measure is greater than zero, the shorter the confidence interval, the more impressive the confirmation. That a theory fits data does not show how firmly the data rule out outcomes inconsistent with the theory; without this information, you cannot know how impressed to be that theory and observation are consistent.
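A minimal Python sketch of the confidence-interval example, using invented scores for eight hypothetical subjects. Both data sets have the same mean (0.5), so both agree with a theory that predicts the measure is greater than zero, but only the less variable set gives a 95% confidence interval that excludes zero. The critical value 2.365 is the two-sided 95% Student's t value for 7 degrees of freedom (n = 8); the function name `ci95` and all data are assumptions made for illustration.

```python
import math
import statistics

T_CRIT_DF7 = 2.365  # two-sided 95% t critical value for df = 7 (n = 8)


def ci95(scores, t_crit=T_CRIT_DF7):
    """95% confidence interval for the mean; t_crit must match len(scores) - 1 df."""
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean - t_crit * se, mean + t_crit * se


# Hypothetical per-subject measures; the theory predicts the population value is > 0.
tight = [0.40, 0.50, 0.60, 0.50, 0.45, 0.55, 0.50, 0.50]
noisy = [1.8, -0.9, 1.1, -0.5, 1.4, -0.2, 0.9, 0.4]

for label, scores in [("tight", tight), ("noisy", noisy)]:
    lo, hi = ci95(scores)
    verdict = "rules out values <= 0" if lo > 0 else "does not rule out values <= 0"
    print(f"{label}: mean {statistics.mean(scores):.2f}, "
          f"CI [{lo:.2f}, {hi:.2f}], {verdict}")
```

In both cases a model could "fit" the mean exactly; what differs is how firmly the data exclude the outcomes the theory forbids.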

Adding error bars may not solve this problem; it is variability on the constrained dimension(s) that matters. For example, suppose a theory predicts that several points will lie on a straight line. To judge the accuracy of this prediction, the reader needs to know the variability of a measure of curvature (or some other measure of non-linearity). Adding vertical error bars to each point is a poor substitute (unless the answer, linear or non-linear, is very clear); the vertical position of the points is not what the theory predicts.
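The following Python sketch, with invented data for four hypothetical subjects each measured at four equally spaced levels, contrasts the two kinds of variability. Each subject's curvature is scored with the standard quadratic orthogonal-polynomial contrast for four equally spaced levels, (1, -1, -1, 1), and its standard error across subjects is what bears on the straight-line prediction. The per-point standard errors, which are what vertical error bars would show, are inflated by between-subject differences in overall level that say nothing about curvature. The data, names, and choice of contrast are assumptions for illustration.

```python
import math
import statistics

# Hypothetical data: one row per subject, measured at four equally spaced levels of x.
data = [
    [10.0, 12.0, 14.1, 16.0],
    [20.0, 21.9, 24.0, 26.0],
    [15.0, 17.0, 19.0, 21.2],
    [25.0, 27.1, 28.9, 30.9],
]

# Quadratic orthogonal-polynomial contrast for four equally spaced levels.
QUAD = (1, -1, -1, 1)


def se(values):
    """Standard error of the mean."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Constrained dimension: one curvature score per subject, then its variability.
curvature = [sum(c * y for c, y in zip(QUAD, row)) for row in data]
print(f"curvature: mean {statistics.mean(curvature):.3f}, SE {se(curvature):.3f}")

# Unconstrained dimension: what vertical error bars at each level would show.
columns = list(zip(*data))
print("per-point SEs:", [round(se(col), 2) for col in columns])
```

With these numbers the curvature standard error is about 0.075, while every per-point standard error is above 3; the vertical error bars would badly misstate how precisely the data test linearity.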

Figure 1: Four possible relationships between theory and data. (Measures A and B are both measures of behavior. For both measures, the axes cover the whole range of possible values. The dotted areas indicate the range of outcomes that would be consistent with the theory. The error bars indicate standard errors. In every case, the theory can closely fit the data, but only when both theory and data provide substantial constraints does this provide significant evidence for the theory.)

To further illustrate these points, Figure 1 shows four ways a "two-dimensional" prediction–a constraint involving two measures at once–can be compatible with data. Measures A and B in Figure 1 are both derived from measurements of behavior. Either might be quite simple (e.g., trials to criterion) or relatively complex (the quadratic component of a fitted function); it does not matter. The axis of each measure covers the entire range of plausible values of the measure before the experiment is done (e.g., from 0 to 1, if the measure is a probability). The dotted area shows the predictions of the theory, the range of outcomes that are consistent with the theory. In the two upper panels of Figure 1, the theory tightly constrains possible outcomes; in the two lower panels, it does not. In each case there is one data point. In the two left-hand panels, the observations tightly constrain the population value; in the two right-hand panels, they do not. In every case, the data are consistent with the theory (the data point is within the dotted area), which means in every case the theory can closely fit the data. But only the situation in the upper left panel is substantial evidence for the theory.
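A one-dimensional analogue of the four panels can be sketched in Python. This is a simplification assumed for illustration, not the figure's actual construction: the plausible range is taken to be 0 to 1, the theory's prediction is a single interval, and the data are summarized as an estimate plus or minus two standard errors. The function `assess` and all numbers are hypothetical. For each panel it reports how much of the plausible range the theory rules out and whether the data interval lies entirely inside what the theory allows.

```python
def assess(allowed, estimate, se, plausible=(0.0, 1.0)):
    """Hypothetical 1-D analogue of one panel of Figure 1.

    allowed  -- (low, high) interval of outcomes the theory permits
    estimate -- observed value; se -- its standard error
    Returns the share of the plausible range the theory rules out, and whether
    the data interval (estimate +/- 2 se) lies entirely inside the allowed interval.
    """
    p_low, p_high = plausible
    ruled_out = 1 - (allowed[1] - allowed[0]) / (p_high - p_low)
    d_low, d_high = estimate - 2 * se, estimate + 2 * se
    inside = allowed[0] <= d_low and d_high <= allowed[1]
    return ruled_out, inside


panels = {
    "upper left  (tight theory, precise data)": ((0.55, 0.75), 0.65, 0.02),
    "upper right (tight theory, noisy data)": ((0.55, 0.75), 0.65, 0.20),
    "lower left  (loose theory, precise data)": ((0.05, 0.95), 0.65, 0.02),
    "lower right (loose theory, noisy data)": ((0.05, 0.95), 0.65, 0.20),
}

for name, (allowed, estimate, se) in panels.items():
    ruled_out, inside = assess(allowed, estimate, se)
    print(f"{name}: theory rules out {ruled_out:.0%}, data inside prediction: {inside}")
```

In all four cases the point estimate (0.65) lies inside the allowed interval, so the theory "fits"; only in the upper-left analogue does the theory rule out most of the plausible range and the data firmly exclude what it rules out.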

Third, the a priori likelihood that the theory will fit–the likelihood it will fit whether or not it is true–is ignored. Perhaps the theory could fit any plausible result. It is well known that a theory gains more support from the correct prediction of an unlikely event than from the correct prediction of something that was expected anyway.
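One common way to formalize this point, which is not the authors' own formalism and is assumed here only for illustration, is Bayes' rule: how much a successful fit should raise belief in a theory depends on the probability of a fit if the theory is true relative to the probability of a fit if it is false. The Python sketch below uses made-up probabilities, and the function name `posterior` is hypothetical.

```python
def posterior(prior, p_fit_if_true, p_fit_if_false):
    """Bayes' rule: probability that the theory is true, given that it fits."""
    joint_true = prior * p_fit_if_true
    joint_false = (1 - prior) * p_fit_if_false
    return joint_true / (joint_true + joint_false)


PRIOR = 0.2  # hypothetical belief in the theory before the data are fit

# Hypothetical flexible theory: it would fit almost any plausible result.
print(round(posterior(PRIOR, 1.0, 0.95), 3))  # 0.208
# Hypothetical constrained theory: a fit would be unlikely if it were false.
print(round(posterior(PRIOR, 1.0, 0.05), 3))  # 0.833
```

When a theory would fit whether or not it is true, the two probabilities are nearly equal and the fit leaves belief almost where it started.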

Lakatos (1978) made this point vividly: "It is no success for Newtonian theory that stones, when dropped, fall towards the earth,
