Journal ArticleDOI

Construct validity in psychological tests.

01 Jul 1955-Psychological Bulletin (Psychol Bull)-Vol. 52, Iss: 4, pp 281-302
TL;DR: The present interpretation of construct validity is not "official" and deals with some areas where the Committee would probably not be unanimous, but the present writers are solely responsible for this attempt to explain the concept and elaborate its implications.
Abstract: Validation of psychological tests has not yet been adequately conceptualized, as the APA Committee on Psychological Tests learned when it undertook (1950-54) to specify what qualities should be investigated before a test is published. In order to make coherent recommendations the Committee found it necessary to distinguish four types of validity, established by different types of research and requiring different interpretation. The chief innovation in the Committee's report was the term construct validity.[2] This idea was first formulated by a subcommittee (Meehl and R. C. Challman) studying how proposed recommendations would apply to projective techniques, and later modified and clarified by the entire Committee (Bordin, Challman, Conrad, Humphreys, Super, and the present writers). The statements agreed upon by the Committee (and by committees of two other associations) were published in the Technical Recommendations (59). The present interpretation of construct validity is not "official" and deals with some areas where the Committee would probably not be unanimous. The present writers are solely responsible for this attempt to explain the concept and elaborate its implications.

Summary (3 min read)

Four Types of Validation

  • The categories into which the Recommendations divide validity studies are: predictive validity, concurrent validity, content validity, and construct validity.
  • The first two of these may be considered together as criterion-oriented validation procedures.
  • Construct validation is important at times for every sort of psychological test: aptitude, achievement, interests, and so on.

Construct validity would be involved in answering such questions as:

  • These questions become relevant when the correlation is advanced as evidence that "test X measures anxiety proneness."
  • Alternative interpretations are possible; e.g., perhaps the test measures "academic aspiration," in which case the authors will expect different results if they induce palmar sweating by economic threat.

Kinds of Constructs

  • At this point the authors should indicate summarily what they mean by a construct, recognizing that much of the remainder of the paper deals with this question.
  • The logic of construct validation is invoked whether the construct is highly systematized or loose, used in ramified theory or a few simple propositions, used in absolute propositions or probability statements.
  • In some situations the criterion is no more valid than the test.
  • Suppose, for example, that the authors want to know if counting the dots on Bender-Gestalt figure five indicates "compulsive rigidity," and that they take psychiatric ratings on this trait as a criterion.
  • Suppose, to extend our example, the authors have four tests on the "predictor" side, over against the psychiatrist's "criterion," and find generally positive correlations among the five variables.

Inadequacy of Validation in Terms of Specific Criteria

  • The proposal to validate constructual interpretations of tests runs counter to suggestions of some others.
  • Validation is replaced by compiling statements as to how strongly the test predicts other observed variables of interest.
  • If two tests are presumed to measure the same construct, a correlation between them is predicted.
  • A matrix of intercorrelations often points out profitable ways of dividing the construct into more meaningful parts, factor analysis being a useful computational method in such studies.

THE NUMERICAL ESTIMATE OF CONSTRUCT VALIDITY

  • This numerical estimate can sometimes be arrived at by a factor analysis, but since present methods of factor analysis are based on linear relations, more general methods will ultimately be needed to deal with many quantitative problems of construct validation.
  • Rarely will it be possible to estimate definite "construct saturations," because no factor corresponding closely to the construct will be available.
  • One can only hope to set upper and lower bounds to the "loading."
  • (The estimate is tentative because the test might overlap with the irrelevant portion of the laboratory measure.)
  • It should be particularly noted that rejecting the null hypothesis does not finish the job of construct validation (35, p. 284).


The Logic of Construct Validation

  • Construct validation takes place when an investigator believes that his instrument reflects a particular construct, to which are attached certain meanings.
  • The proposed interpretation generates specific testable hypotheses, which are a means of confirming or disconfirming the claim.
  • The philosophy of science which the authors believe does most justice to actual scientific practice will now be briefly and dogmatically set forth.

THE NOMOLOGICAL NET

  • The fundamental principles are these: 1. Scientifically speaking, to "make clear what something is" means to set forth the laws in which it occurs.
  • One who claims that his test reflects a construct cannot maintain his claim in the face of recurrent negative results because these results show that his construct is too loosely defined to yield verifiable inferences.
  • In the extreme case the hypothesized laws are formulated entirely in terms of descriptive dimensions although not all of the relevant observations have actually been made.
  • The difficulties in merely "characterizing the surface cluster" are strikingly exhibited by the use of certain special and extreme groups for purposes of construct validation.
  • Chyatte's confirmation of this prediction (10) tends to support both: (a) the theory sketch of "what the Pd factor is, psychologically"; and (b) the claim of the Pd scale to construct validity for this hypothetical factor.


  • This line of thought leads directly to their second important qualification upon the network schema.
  • When the network is very incomplete, having many strands missing entirely and some constructs tied in only by tenuous threads, then the "implicit definition" of these constructs is disturbingly loose; one might say that the meaning of the constructs is underdetermined.
  • Since the meaning of theoretical constructs is set forth by stating the laws in which they occur, their incomplete knowledge of the laws of nature produces a vagueness in their constructs (see Hempel, 30; Kaplan, 34; Pap, 51).
  • The authors will be able to say "what anxiety is" when they know all of the laws involving it; meanwhile, since they are in the process of discovering these laws, they do not yet know precisely what anxiety is.

Conclusions Regarding the Network after Experimentation

  • The proposition that x per cent of test variance is accounted for by the construct is inserted into the accepted network.
  • A predicted empirical relationship permits us to test all the propositions leading to that prediction.
  • Most cases in psychology today lie somewhere between these extremes.
  • The negative finding shows the bridge between the two to be undependable, but this is all the authors can say.
  • Success of these derivations testifies to the inductive power of the test-validity statement, and renders it unlikely that an equally effective alternative can be offered.


L. J. Cronbach and P. E. Meehl

Construct Validity in Psychological Tests

Validation of psychological tests has not yet been adequately conceptualized, as the APA Committee on Psychological Tests learned when it undertook (1950-54) to specify what qualities should be investigated before a test is published. In order to make coherent recommendations the Committee found it necessary to distinguish four types of validity, established by different types of research and requiring different interpretation. The chief innovation in the Committee's report was the term construct validity.* This idea was first formulated by a subcommittee (Meehl and R. C. Challman) studying how proposed recommendations would apply to projective techniques, and later modified and clarified by the entire Committee (Bordin, Challman, Conrad, Humphreys, Super, and the present writers). The statements agreed upon by the Committee (and by committees of two other associations) were published in the Technical Recommendations (59). The present interpretation of construct validity is not "official" and deals with some areas in which the Committee would probably not be unanimous. The present writers are solely responsible for this attempt to explain the concept and elaborate its implications.

Identification of construct validity was not an isolated development. Writers on validity during the preceding decade had shown a great deal of dissatisfaction with conventional notions of validity, and introduced new terms and ideas, but the resulting aggregation of types of validity seems only to have stirred the muddy waters. Portions of the distinctions we shall discuss are implicit in Jenkins' paper, "Validity for What?" (33), Gulliksen's "Intrinsic Validity" (27), Goodenough's distinction between tests as "signs" and "samples" (22), Cronbach's separation of "logical" and "empirical" validity (11), Guilford's "factorial validity" (25), and Mosier's papers on "face validity" and "validity generalization" (49, 50). Helen Peak (52) comes close to an explicit statement of construct validity as we shall present it.

* Referred to in a preliminary report (58) as congruent validity.

NOTE: The second author worked on this problem in connection with his appointment to the Minnesota Center for Philosophy of Science. We are indebted to the other members of the Center (Herbert Feigl, Michael Scriven, Wilfrid Sellars), and to D. L. Thistlethwaite of the University of Illinois, for their major contributions to our thinking and their suggestions for improving this paper. The paper first appeared in Psychological Bulletin, July 1955, and is reprinted here, with minor alterations, by permission of the editor and of the authors.
Four Types of Validation

The categories into which the Recommendations divide validity studies are: predictive validity, concurrent validity, content validity, and construct validity. The first two of these may be considered together as criterion-oriented validation procedures.

The pattern of a criterion-oriented study is familiar. The investigator is primarily interested in some criterion which he wishes to predict. He administers the test, obtains an independent criterion measure on the same subjects, and computes a correlation. If the criterion is obtained some time after the test is given, he is studying predictive validity. If the test score and criterion score are determined at essentially the same time, he is studying concurrent validity. Concurrent validity is studied when one test is proposed as a substitute for another (for example, when a multiple-choice form of spelling test is substituted for taking dictation), or a test is shown to correlate with some contemporary criterion (e.g., psychiatric diagnosis).

Content validity is established by showing that the test items are a sample of a universe in which the investigator is interested. Content validity is ordinarily to be established deductively, by defining a universe of items and sampling systematically within this universe to establish the test.

Construct validation is involved whenever a test is to be interpreted as a measure of some attribute or quality which is not "operationally defined." The problem faced by the investigator is, "What constructs account for variance in test performance?" Construct validity calls for no new scientific approach. Much current research on tests of personality (9) is construct validation, usually without the benefit of a clear formulation of this process.

Construct validity is not to be identified solely by particular investigative procedures, but by the orientation of the investigator. Criterion-oriented validity, as Bechtoldt emphasizes (3, p. 1245), "involves the acceptance of a set of operations as an adequate definition of whatever is to be measured." When an investigator believes that no criterion available to him is fully valid, he perforce becomes interested in construct validity because this is the only way to avoid the "infinite frustration" of relating every criterion to some more ultimate standard (21).
In content validation, acceptance of the universe of content as defining the variable to be measured is essential. Construct validity must be investigated whenever no criterion or universe of content is accepted as entirely adequate to define the quality to be measured. Determining what psychological constructs account for test performance is desirable for almost any test. Thus, although the MMPI was originally established on the basis of empirical discrimination between patient groups and so-called normals (concurrent validity), continuing research has tried to provide a basis for describing the personality associated with each score pattern. Such interpretations permit the clinician to predict performance with respect to criteria which have not yet been employed in empirical validation studies (cf. 46, pp. 49-50, 110-11).

We can distinguish among the four types of validity by noting that each involves a different emphasis on the criterion. In predictive or concurrent validity, the criterion behavior is of concern to the tester, and he may have no concern whatsoever with the type of behavior exhibited in the test. (An employer does not care if a worker can manipulate blocks, but the score on the block test may predict something he cares about.) Content validity is studied when the tester is concerned with the type of behavior involved in the test performance. Indeed, if the test is a work sample, the behavior represented in the test may be an end in itself. Construct validity is ordinarily studied when the tester has no definite criterion measure of the quality with which he is concerned, and must use indirect measures. Here the trait or quality underlying the test is of central importance, rather than either the test behavior or the scores on the criteria (59, p. 14).

Construct validation is important at times for every sort of psychological test: aptitude, achievement, interests, and so on. Thurstone's statement is interesting in this connection:

In the field of intelligence tests, it used to be common to define validity as the correlation between a test score and some outside criterion. We have reached a stage of sophistication where the test-criterion correlation is too coarse. It is obsolete. If we attempted to ascertain the validity of a test for the second space-factor, for example, we would have to get judges [to] make reliable judgments about people as to this factor. Ordinarily their [the available judges'] ratings would be of no value as a criterion. Consequently, validity studies in the cognitive functions now depend on criteria of internal consistency . . . (60, p. 3).
Construct validity would be involved in answering such questions as: To what extent is this test of intelligence culture-free? Does this test of "interpretation of data" measure reading ability, quantitative reasoning, or response sets? How does a person with A in Strong Accountant, and B in Strong CPA, differ from a person who has these scores reversed?

Example of construct validation procedure. Suppose measure X correlates .50 with Y, the amount of palmar sweating induced when we tell a student that he has failed a Psychology I exam. Predictive validity of X for Y is adequately described by the coefficient, and a statement of the experimental and sampling conditions. If someone were to ask, "Isn't there perhaps another way to interpret this correlation?" or "What other kinds of evidence can you bring to support your interpretation?" we would hardly understand what he was asking because no interpretation has been made. These questions become relevant when the correlation is advanced as evidence that "test X measures anxiety proneness." Alternative interpretations are possible; e.g., perhaps the test measures "academic aspiration," in which case we will expect different results if we induce palmar sweating by economic threat. It is then reasonable to inquire about other kinds of evidence.

Add these facts from further studies: Test X correlates .45 with fraternity brothers' ratings on "tenseness." Test X correlates .55 with amount of intellectual inefficiency induced by painful electric shock, and .68 with the Taylor Anxiety Scale. Mean X score decreases among four diagnosed groups in this order: anxiety state, reactive depression, "normal," and psychopathic personality. And palmar sweat under threat of failure in Psychology I correlates .60 with threat of failure in mathematics. Negative results eliminate competing explanations of the X score; thus, findings of negligible correlations between X and social class, vocational aim, and value-orientation make it fairly safe to reject the suggestion that X measures "academic aspiration." We can have substantial confidence that X does measure anxiety proneness if the current theory of anxiety can embrace the variates which yield positive correlations, and does not predict correlations where we found none.
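The pattern of evidence just described can be read as a simple consistency check: the construct claim stands only if every theoretically predicted correlation is substantial and every predicted non-correlation is negligible. The sketch below encodes that check; the variable names and coefficients are the paper's hypothetical ones, and the 0.30 threshold is an arbitrary illustrative cutoff, not anything the authors propose.

```python
# Consistency check for the hypothetical claim "test X measures anxiety
# proneness." Coefficients are the illustrative values from the text;
# the threshold separating "substantial" from "negligible" is assumed.

observed = {
    "palmar_sweat_failure": 0.50,
    "tenseness_ratings": 0.45,
    "shock_inefficiency": 0.55,
    "taylor_anxiety_scale": 0.68,
    "social_class": 0.05,       # negligible, as the theory requires
    "vocational_aim": -0.03,    # negligible
    "value_orientation": 0.02,  # negligible
}

# Theory predicts a substantive positive correlation ("+") or none ("0").
predicted = {
    "palmar_sweat_failure": "+",
    "tenseness_ratings": "+",
    "shock_inefficiency": "+",
    "taylor_anxiety_scale": "+",
    "social_class": "0",
    "vocational_aim": "0",
    "value_orientation": "0",
}

def consistent(observed, predicted, threshold=0.30):
    """Check each variable: predicted links must exceed the threshold,
    predicted non-links must fall below it in absolute value."""
    checks = []
    for var, sign in predicted.items():
        r = observed[var]
        ok = r >= threshold if sign == "+" else abs(r) < threshold
        checks.append((var, r, sign, ok))
    return checks

for var, r, sign, ok in consistent(observed, predicted):
    print(f"{var:22s} r={r:+.2f} predicted={sign} {'OK' if ok else 'VIOLATION'}")
```

A single violation would not refute the construct outright, but, as the text notes, it obliges the investigator to revise either the test's interpretation or the theory of anxiety itself.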
Kinds of Constructs

At this point we should indicate summarily what we mean by a construct, recognizing that much of the remainder of the paper deals with this question. A construct is some postulated attribute of people, assumed to be reflected in test performance. In test validation the attribute about which we make statements in interpreting a test is a construct. We expect a person at any time to possess or not possess a qualitative attribute (amnesia) or structure, or to possess some degree of a quantitative attribute (cheerfulness). A construct has certain associated meanings carried in statements of this general character: Persons who possess this attribute will, in situation X, act in manner Y (with a stated probability). The logic of construct validation is invoked whether the construct is highly systematized or loose, used in ramified theory or a few simple propositions, used in absolute propositions or probability statements. We seek to specify how one is to defend a proposed interpretation of a test; we are not recommending any one type of interpretation.

The constructs in which tests are to be interpreted are certainly not likely to be physiological. Most often they will be traits such as "latent hostility" or "variable in mood," or descriptions in terms of an educational objective, or "ability to plan experiments." For the benefit of readers who may have been influenced by certain exegeses of MacCorquodale and Meehl (40), let us here emphasize: Whether or not an interpretation of a test's properties or relations involves questions of construct validity is to be decided by examining the entire body of evidence offered, together with what is asserted about the test in the context of this evidence. Proposed identifications of constructs allegedly measured by the test with constructs of other sciences (e.g., genetics, neuroanatomy, biochemistry) make up only one class of construct-validity claims, and a rather minor one at present. Space does not permit full analysis of the relation of the present paper to the MacCorquodale-Meehl distinction between hypothetical constructs and intervening variables. The philosophy of science pertinent to the present paper is set forth later in the section entitled, "The nomological network."
The Relation of Constructs to "Criteria"

CRITICAL VIEW OF THE CRITERION IMPLIED

An unquestionable criterion may be found in a practical operation, or may be established as a consequence of an operational definition. Typically, however, the psychologist is unwilling to use the directly operational approach because he is interested in building a theory about a generalized construct. A theorist trying to relate behavior to "hunger" almost certainly invests that term with meanings other than the operation "elapsed-time-since-feeding." If he is concerned with hunger as a tissue need, he will not accept time lapse as equivalent to his construct because it fails to consider, among other things, energy expenditure of the animal.

In some situations the criterion is no more valid than the test. Suppose, for example, that we want to know if counting the dots on Bender-Gestalt figure five indicates "compulsive rigidity," and that we take psychiatric ratings on this trait as a criterion. Even a conventional report on the resulting correlation will say something about the extent and intensity of the psychiatrist's contacts and should describe his qualifications (e.g., diplomate status? analyzed?).

Why report these facts? Because data are needed to indicate whether the criterion is any good. "Compulsive rigidity" is not really intended to mean "social stimulus value to psychiatrists." The implied trait involves a range of behavior-dispositions which may be very imperfectly sampled by the psychiatrist. Suppose dot-counting does not occur in a particular patient and yet we find that the psychiatrist has rated him as "rigid." When questioned the psychiatrist tells us that the patient was a rather easy, free-wheeling sort; however, the patient did lean over to straighten out a skewed desk blotter, and this, viewed against certain other facts, tipped the scale in favor of a "rigid" rating. On the face of it, counting Bender dots may be just as good (or poor) a sample of the compulsive-rigidity domain as straightening desk blotters is.

Suppose, to extend our example, we have four tests on the "predictor" side, over against the psychiatrist's "criterion," and find generally positive correlations among the five variables. Surely it is artificial and arbitrary to impose the "test-should-predict-criterion" pattern on such data. The psychiatrist samples verbal content, expressive pattern, voice, posture, etc. The psychologist samples verbal content, perception, expressive pattern, etc. Our proper conclusion is that, from this evidence, the four tests and the psychiatrist all assess some common factor. The asymmetry between the "test" and the so-designated "criterion" arises only because the terminology of predictive validity has become a commonplace in test analysis. In this study where a construct is the central concern, any distinction between the merit of the test and criterion variables would be justified only if it had already been shown that the psychiatrist's theory and operations were excellent measures of the attribute.
Inadequacy of Validation in Terms of Specific Criteria

The proposal to validate constructual interpretations of tests runs counter to suggestions of some others. Spiker and McCandless (57) favor an operational approach. Validation is replaced by compiling statements as to how strongly the test predicts other observed variables of interest. To avoid requiring that each new variable be investigated completely by itself, they allow two variables to collapse into one whenever the properties of the operationally defined measures are the same: "If a new test is demonstrated to predict the scores on an older, well-established test, then an evaluation of the predictive power of the older test may be used for the new one." But accurate inferences are possible only if the two tests correlate so highly that there is negligible reliable variance in either test, independent of the other. Where the correspondence is less close, one must either retain all the separate variables operationally defined or embark on construct validation.

The practical user of tests must rely on constructs of some generality to make predictions about new situations. Test X could be used to predict palmar sweating in the face of failure without invoking any construct, but a counselor is more likely to be asked to forecast behavior in diverse or even unique situations for which the correlation of test X is unknown. Significant predictions rely on knowledge accumulated around the generalized construct of anxiety. The Technical Recommendations state:

It is ordinarily necessary to evaluate construct validity by integrating evidence from many different sources. The problem of construct validation becomes especially acute in the clinical field since for many of the constructs dealt with it is not a question of finding an imperfect criterion but of finding any criterion at all. The psychologist interested in construct validity for clinical devices is concerned with making an estimate of a hypothetical internal process, factor, system, structure, or state and cannot expect to find a clear unitary behavioral criterion. An attempt to identify any one criterion measure or any composite as the criterion aimed at is, however, usually unwarranted (59, pp. 14-15).

This appears to conflict with arguments for specific criteria prominent at places in the testing literature. Thus Anastasi (2) makes many statements of the latter character: "It is only as a measure of a specifically defined criterion that a test can be objectively validated at all . . . To claim that a test measures anything over and above its criterion is pure speculation" (p. 67). Yet elsewhere this article supports construct validation. Tests can be profitably interpreted if we "know the relationships between the tested behavior . . . and other behavior samples, none of these behavior samples necessarily occupying the preeminent position of a criterion" (p. 75). Factor analysis with several partial criteria might be used to study whether a test measures a postulated "general learning ability." If the data demonstrate specificity of ability instead, such specificity is "useful in its own right in advancing our knowledge of behavior; it should not be construed as a weakness of the tests" (p. 75).

We depart from Anastasi at two points. She writes, "The validity of a psychological test should not be confused with an analysis of the factors which determine the behavior under consideration." We, however, regard such analysis as a most important type of validation. Second, she refers to "the will-o'-the-wisp of psychological processes which are distinct from performance" (2, p. 77). While we agree that psychological processes are elusive, we are sympathetic to attempts to formulate and clarify constructs which are evidenced by performance but distinct from it. Surely an inductive inference based on a pattern of correlations cannot be dismissed as "pure speculation."
SPECIFIC CRITERIA USED TEMPORARILY: THE "BOOTSTRAPS" EFFECT

Even when a test is constructed on the basis of a specific criterion, it may ultimately be judged to have greater construct validity than the criterion. We start with a vague concept which we associate with certain observations. We then discover empirically that these observations co-vary with some other observation which possesses greater reliability or is more intimately correlated with relevant experimental changes than is the original measure, or both.

For example, the notion of temperature arises because some objects feel hotter to the touch than others. The expansion of a mercury column does not have face validity as an index of hotness. But it turns out that (a) there is a statistical relation between expansion and sensed temperature; (b) observers employ the mercury method with good interobserver agreement; (c) the regularity of observed relations is increased by using the thermometer (e.g., melting points of samples of the same material vary little on the thermometer; we obtain nearly linear relations between mercury measures and pressure of a gas). Finally, (d) a theoretical structure involving unobservable microevents (the kinetic theory) is worked out which explains the relation of mercury expansion to heat. This whole process of conceptual enrichment begins with what in retrospect we see as an extremely fallible "criterion": the human temperature sense. That original criterion has now been relegated to a peripheral position. We have lifted ourselves by our bootstraps, but in a legitimate and fruitful way.

Similarly, the Binet scale was first valued because children's scores tended to agree with judgments by schoolteachers. If it had not shown this agreement, it would have been discarded along with reaction time and the other measures of ability previously tried. Teacher judgments once constituted the criterion against which the individual intelligence test was validated. But if today a child's IQ is 135 and three of his teachers complain about how stupid he is, we do not conclude that the test has failed. Quite to the contrary, if no error in test procedure can be argued, we treat the test score as a valid statement about an important quality, and define our task as that of finding out what other variables (personality, study skills, etc.) modify achievement or distort teacher judgment.
Experimentation to Investigate Construct Validity

VALIDATION PROCEDURES

We can use many methods in construct validation. Attention should particularly be drawn to Macfarlane's survey of these methods as they apply to projective devices (41).

Group differences. If our understanding of a construct leads us to expect two groups to differ on the test, this expectation may be tested directly. Thus Thurstone and Chave validated the Scale for Measuring Attitude Toward the Church by showing score differences between church members and nonchurchgoers. Churchgoing is not the criterion of attitude, for the purpose of the test is to measure something other than the crude sociological fact of church attendance; on the other hand, failure to find a difference would have seriously challenged the test.

Only coarse correspondence between test and group designation is expected. Too great a correspondence between the two would indicate that the test is to some degree invalid, because members of the groups are expected to overlap on the test. Intelligence test items are selected initially on the basis of a correspondence to age, but an item that correlates .95 with age in an elementary school sample would surely be suspect.
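A group-differences check of this Thurstone-and-Chave kind amounts to comparing score distributions across a theoretically relevant grouping and asking for a coarse, not perfect, separation. The sketch below simulates such a comparison and computes a standardized mean difference (Cohen's d); the group labels, means, and sample sizes are all invented for illustration and do not reproduce the original attitude-scale data.

```python
import math
import random
import statistics

# Simulated attitude scores for two groups; distribution parameters are
# assumptions for illustration, not the Thurstone-Chave data.
random.seed(0)
members = [random.gauss(3.0, 1.2) for _ in range(60)]        # church members
nonchurchgoers = [random.gauss(5.0, 1.2) for _ in range(60)]

def cohens_d(a, b):
    """Standardized mean difference between two samples (pooled SD)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(b) - statistics.mean(a)) / pooled

d = cohens_d(members, nonchurchgoers)
print(f"group separation d = {d:.2f}")
```

Note the two-sided reading the text demands: a d near zero would challenge the test, but a separation so large that the groups no longer overlap at all would itself be grounds for suspicion, like the item correlating .95 with age.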
Correlation matrices and factor analysis. If two tests are presumed to measure the same construct, a correlation between them is predicted. (An exception is noted where some second attribute has positive loading in the first test and negative loading in the second test; then a low correlation is expected. This is a testable interpretation provided an external measure of either the first or the second variable exists.) If the obtained correlation departs from the expectation, however, there is no way to know whether the fault lies in test A, test B, or the formulation of the construct. A matrix of intercorrelations often points out profitable ways of dividing the construct into more meaningful parts, factor analysis being a useful computational method in such studies.
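The procedure described above can be illustrated with a minimal sketch. The data, test names, and the power-iteration shortcut below are all invented for demonstration; extracting the first principal component of the correlation matrix is only a simple stand-in for the full factor-analytic methods the text refers to.

```python
# Hedged sketch: intercorrelation matrix plus a first-component extraction
# (invented data; power iteration stands in for proper factor analysis).
from statistics import mean

# Hypothetical scores of six examinees on tests A and B, presumed to tap
# the same construct, and a divergent test C.
scores = {
    "A": [10, 12, 14, 13, 9, 11],
    "B": [20, 23, 27, 25, 18, 22],
    "C": [5, 9, 4, 8, 6, 7],
}

def corr(x, y):
    """Pearson correlation between two score lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

names = list(scores)
R = [[corr(scores[i], scores[j]) for j in names] for i in names]

# First principal component of R by power iteration: tests sharing the
# construct should show large loadings of the same sign.
v = [1.0] * len(names)
for _ in range(100):
    w = [sum(R[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
    norm = sum(c * c for c in w) ** 0.5
    v = [c / norm for c in w]

for name, loading in zip(names, v):
    print(f"test {name}: loading {loading:.2f}")
```

In this toy matrix A and B correlate highly and load together on the first component, while C's near-zero correlations with both leave it essentially outside the construct, the kind of partition of the matrix the text says factor methods make visible.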
Guilford (26) has discussed the place of factor analysis in construct validation. His statements may be extracted as follows: "The personnel psychologist wishes to know 'why his tests are valid.' He can place tests and practical criteria in a matrix and factor it to identify 'real dimensions of human personality.' A factorial description is exact and stable; it is economical in explanation; it leads to the creation of pure tests which can be combined to predict complex behaviors." It is clear that factors here function as constructs. Eysenck, in his "criterion analysis" (18), goes farther than Guilford, and shows that factoring can be used explicitly to test hypotheses about constructs.
Factors may or may not be weighted with surplus meaning. Certainly when they are regarded as "real dimensions" a great deal of surplus meaning is implied, and the interpreter must shoulder a substantial burden of proof. The alternative view is to regard factors as defining a working reference frame, located in a convenient manner in the "space" defined by all behaviors of a given type. Which set of factors from a

