Journal ArticleDOI

Construct validity in psychological tests.

01 Jul 1955-Psychological Bulletin (Psychol Bull)-Vol. 52, Iss: 4, pp 281-302
TL;DR: The present interpretation of construct validity is not "official" and deals with some areas where the Committee would probably not be unanimous, but the present writers are solely responsible for this attempt to explain the concept and elaborate its implications.
Abstract: Validation of psychological tests has not yet been adequately conceptualized, as the APA Committee on Psychological Tests learned when it undertook (1950-54) to specify what qualities should be investigated before a test is published. In order to make coherent recommendations the Committee found it necessary to distinguish four types of validity, established by different types of research and requiring different interpretation. The chief innovation in the Committee's report was the term construct validity.[2] This idea was first formulated by a subcommittee (Meehl and R. C. Challman) studying how proposed recommendations would apply to projective techniques, and later modified and clarified by the entire Committee (Bordin, Challman, Conrad, Humphreys, Super, and the present writers). The statements agreed upon by the Committee (and by committees of two other associations) were published in the Technical Recommendations (59). The present interpretation of construct validity is not "official" and deals with some areas where the Committee would probably not be unanimous. The present writers are solely responsible for this attempt to explain the concept and elaborate its implications.

Summary (3 min read)

Four Types of Validation

  • The categories into which the Recommendations divide validity studies are: predictive validity, concurrent validity, content validity, and construct validity.
  • The first two of these may be considered together as criterion-oriented validation procedures.
  • Construct validation is important at times for every sort of psychological test: aptitude, achievement, interests, and so on.

Construct validity would be involved in answering such questions as:

  • These questions become relevant when the correlation is advanced as evidence that "test X measures anxiety proneness."
  • Alternative interpretations are possible; e.g., perhaps the test measures "academic aspiration," in which case the authors will expect different results if they induce palmar sweating by economic threat.

Kinds of Constructs

  • At this point the authors should indicate summarily what they mean by a construct, recognizing that much of the remainder of the paper deals with this question.
  • The logic of construct validation is invoked whether the construct is highly systematized or loose, used in ramified theory or a few simple propositions, used in absolute propositions or probability statements.
  • In some situations the criterion is no more valid than the test.
  • Suppose, for example, that the authors want to know if counting the dots on Bender-Gestalt figure five indicates "compulsive rigidity," and that they take psychiatric ratings on this trait as a criterion.
  • Suppose, to extend our example, the authors have four tests on the "predictor" side, over against the psychiatrist's "criterion," and find generally positive correlations among the five variables.

Inadequacy of Validation in Terms of Specific Criteria

  • The proposal to validate constructual interpretations of tests runs counter to suggestions of some others.
  • Validation is replaced by compiling statements as to how strongly the test predicts other observed variables of interest.
  • If two tests are presumed to measure the same construct, a correlation between them is predicted.
  • A matrix of intercorrelations often points out profitable ways of dividing the construct into more meaningful parts, factor analysis being a useful computational method in such studies.

THE NUMERICAL ESTIMATE OF CONSTRUCT VALIDITY

  • This numerical estimate can sometimes be arrived at by a factor analysis, but since present methods of factor analysis are based on linear relations, more general methods will ultimately be needed to deal with many quantitative problems of construct validation.
  • Rarely will it be possible to estimate definite "construct saturations," because no factor corresponding closely to the construct will be available.
  • One can only hope to set upper and lower bounds to the "loading."
  • (The estimate is tentative because the test might overlap with the irrelevant portion of the laboratory measure.)
  • It should be particularly noted that rejecting the null hypothesis does not finish the job of construct validation (35, p. 284).


The Logic of Construct Validation

  • Construct validation takes place when an investigator believes that his instrument reflects a particular construct, to which are attached certain meanings.
  • The proposed interpretation generates specific testable hypotheses, which are a means of confirming or disconfirming the claim.
  • The philosophy of science which the authors believe does most justice to actual scientific practice will now be briefly and dogmatically set forth.

THE NOMOLOGICAL NET

  • The fundamental principles are these: 1. Scientifically speaking, to "make clear what something is" means to set forth the laws in which it occurs.
  • One who claims that his test reflects a construct cannot maintain his claim in the face of recurrent negative results because these results show that his construct is too loosely defined to yield verifiable inferences.
  • In the extreme case the hypothesized laws are formulated entirely in terms of descriptive dimensions although not all of the relevant observations have actually been made.
  • The difficulties in merely "characterizing the surface cluster" are strikingly exhibited by the use of certain special and extreme groups for purposes of construct validation.
  • Chyatte's confirmation of this prediction (10) tends to support both: (a) the theory sketch of "what the Pd factor is, psychologically"; and (b) the claim of the Pd scale to construct validity for this hypothetical factor.


  • This line of thought leads directly to their second important qualification upon the network schema.
  • When the network is very incomplete, having many strands missing entirely and some constructs tied in only by tenuous threads, then the "implicit definition" of these constructs is disturbingly loose; one might say that the meaning of the constructs is underdetermined.
  • Since the meaning of theoretical constructs is set forth by stating the laws in which they occur, their incomplete knowledge of the laws of nature produces a vagueness in their constructs (see Hempel, 30; Kaplan, 34; Pap, 51).
  • The authors will be able to say "what anxiety is" when they know all of the laws involving it; meanwhile, since they are in the process of discovering these laws, they do not yet know precisely what anxiety is.

Conclusions Regarding the Network after Experimentation

  • The proposition that x per cent of test variance is accounted for by the construct is inserted into the accepted network.
  • A predicted empirical relationship permits us to test all the propositions leading to that prediction.
  • Most cases in psychology today lie somewhere between these extremes.
  • The negative finding shows the bridge between the two to be undependable, but this is all the authors can say.
  • Success of these derivations testifies to the inductive power of the test-validity statement, and renders it unlikely that an equally effective alternative can be offered.


L. J. Cronbach and P. E. Meehl

Construct Validity in Psychological Tests

Validation of psychological tests has not yet been adequately conceptualized, as the APA Committee on Psychological Tests learned when it undertook (1950-54) to specify what qualities should be investigated before a test is published. In order to make coherent recommendations the Committee found it necessary to distinguish four types of validity, established by different types of research and requiring different interpretation. The chief innovation in the Committee's report was the term construct validity.* This idea was first formulated by a subcommittee (Meehl and R. C. Challman) studying how proposed recommendations would apply to projective techniques, and later modified and clarified by the entire Committee (Bordin, Challman, Conrad, Humphreys, Super, and the present writers). The statements agreed upon by the Committee (and by committees of two other associations) were published in the Technical Recommendations (59). The present interpretation of construct validity is not "official" and deals with some areas in which the Committee would probably not be unanimous. The present writers are solely responsible for this attempt to explain the concept and elaborate its implications.

Identification of construct validity was not an isolated development. Writers on validity during the preceding decade had shown a great deal of dissatisfaction with conventional notions of validity, and introduced new terms and ideas, but the resulting aggregation of types of validity seems only to have stirred the muddy waters. Portions of the distinctions we shall discuss are implicit in Jenkins' paper, "Validity for What?" (33), Gulliksen's "Intrinsic Validity" (27), Goodenough's distinction between tests as "signs" and "samples" (22), Cronbach's separation of "logical" and "empirical" validity (11), Guilford's "factorial validity" (25), and Mosier's papers on "face validity" and "validity generalization" (49, 50). Helen Peak (52) comes close to an explicit statement of construct validity as we shall present it.

* Referred to in a preliminary report (58) as congruent validity.

NOTE: The second author worked on this problem in connection with his appointment to the Minnesota Center for Philosophy of Science. We are indebted to the other members of the Center (Herbert Feigl, Michael Scriven, Wilfrid Sellars), and to D. L. Thistlethwaite of the University of Illinois, for their major contributions to our thinking and their suggestions for improving this paper. The paper first appeared in Psychological Bulletin, July 1955, and is reprinted here, with minor alterations, by permission of the editor and of the authors.
Four Types of Validation

The categories into which the Recommendations divide validity studies are: predictive validity, concurrent validity, content validity, and construct validity. The first two of these may be considered together as criterion-oriented validation procedures.

The pattern of a criterion-oriented study is familiar. The investigator is primarily interested in some criterion which he wishes to predict. He administers the test, obtains an independent criterion measure on the same subjects, and computes a correlation. If the criterion is obtained some time after the test is given, he is studying predictive validity. If the test score and criterion score are determined at essentially the same time, he is studying concurrent validity. Concurrent validity is studied when one test is proposed as a substitute for another (for example, when a multiple-choice form of spelling test is substituted for taking dictation), or a test is shown to correlate with some contemporary criterion (e.g., psychiatric diagnosis).

Content validity is established by showing that the test items are a sample of a universe in which the investigator is interested. Content validity is ordinarily to be established deductively, by defining a universe of items and sampling systematically within this universe to establish the test.

Construct validation is involved whenever a test is to be interpreted as a measure of some attribute or quality which is not "operationally defined." The problem faced by the investigator is, "What constructs account for variance in test performance?" Construct validity calls for no new scientific approach. Much current research on tests of personality (9) is construct validation, usually without the benefit of a clear formulation of this process.

Construct validity is not to be identified solely by particular investigative procedures, but by the orientation of the investigator. Criterion-oriented validity, as Bechtoldt emphasizes (3, p. 1245), "involves the acceptance of a set of operations as an adequate definition of whatever is to be measured." When an investigator believes that no criterion available to him is fully valid, he perforce becomes interested in construct validity because this is the only way to avoid the "infinite frustration" of relating every criterion to some more ultimate standard (21).
In content validation, acceptance of the universe of content as defining the variable to be measured is essential. Construct validity must be investigated whenever no criterion or universe of content is accepted as entirely adequate to define the quality to be measured. Determining what psychological constructs account for test performance is desirable for almost any test. Thus, although the MMPI was originally established on the basis of empirical discrimination between patient groups and so-called normals (concurrent validity), continuing research has tried to provide a basis for describing the personality associated with each score pattern. Such interpretations permit the clinician to predict performance with respect to criteria which have not yet been employed in empirical validation studies (cf. 46, pp. 49-50, 110-11).

We can distinguish among the four types of validity by noting that each involves a different emphasis on the criterion. In predictive or concurrent validity, the criterion behavior is of concern to the tester, and he may have no concern whatsoever with the type of behavior exhibited in the test. (An employer does not care if a worker can manipulate blocks, but the score on the block test may predict something he cares about.) Content validity is studied when the tester is concerned with the type of behavior involved in the test performance. Indeed, if the test is a work sample, the behavior represented in the test may be an end in itself. Construct validity is ordinarily studied when the tester has no definite criterion measure of the quality with which he is concerned, and must use indirect measures. Here the trait or quality underlying the test is of central importance, rather than either the test behavior or the scores on the criteria (59, p. 14).

Construct validation is important at times for every sort of psychological test: aptitude, achievement, interests, and so on. Thurstone's statement is interesting in this connection:

In the field of intelligence tests, it used to be common to define validity as the correlation between a test score and some outside criterion. We have reached a stage of sophistication where the test-criterion correlation is too coarse. It is obsolete. If we attempted to ascertain the validity of a test for the second space-factor, for example, we would have to get judges [to] make reliable judgments about people as to this factor. Ordinarily their [the available judges'] ratings would be of no value as a criterion. Consequently, validity studies in the cognitive functions now depend on criteria of internal consistency . . . (60, p. 3).
Construct validity would be involved in answering such questions as: To what extent is this test of intelligence culture-free? Does this test of "interpretation of data" measure reading ability, quantitative reasoning, or response sets? How does a person with A in Strong Accountant, and B in Strong CPA, differ from a person who has these scores reversed?

Example of construct validation procedure. Suppose measure X correlates .50 with Y, the amount of palmar sweating induced when we tell a student that he has failed a Psychology I exam. Predictive validity of X for Y is adequately described by the coefficient, and a statement of the experimental and sampling conditions. If someone were to ask, "Isn't there perhaps another way to interpret this correlation?" or "What other kinds of evidence can you bring to support your interpretation?" we would hardly understand what he was asking because no interpretation has been made. These questions become relevant when the correlation is advanced as evidence that "test X measures anxiety proneness." Alternative interpretations are possible; e.g., perhaps the test measures "academic aspiration," in which case we will expect different results if we induce palmar sweating by economic threat. It is then reasonable to inquire about other kinds of evidence.

Add these facts from further studies: Test X correlates .45 with fraternity brothers' ratings on "tenseness." Test X correlates .55 with amount of intellectual inefficiency induced by painful electric shock, and .68 with the Taylor Anxiety Scale. Mean X score decreases among four diagnosed groups in this order: anxiety state, reactive depression, "normal," and psychopathic personality. And palmar sweat under threat of failure in Psychology I correlates .60 with threat of failure in mathematics. Negative results eliminate competing explanations of the X score; thus, findings of negligible correlations between X and social class, vocational aim, and value-orientation make it fairly safe to reject the suggestion that X measures "academic aspiration." We can have substantial confidence that X does measure anxiety proneness if the current theory of anxiety can embrace the variates which yield positive correlations, and does not predict correlations where we found none.
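The pattern of evidence just described can be read as a simple consistency check: the construct claim stands only if every theoretically predicted correlation is substantial and every predicted non-correlation is negligible. The sketch below encodes that check; the variable names and coefficients are the paper's hypothetical ones, and the 0.30 threshold is an arbitrary illustrative cutoff, not anything the authors propose.

```python
# Consistency check for the hypothetical claim "test X measures anxiety
# proneness." Coefficients are the illustrative values from the text;
# the threshold separating "substantial" from "negligible" is assumed.

observed = {
    "palmar_sweat_failure": 0.50,
    "tenseness_ratings": 0.45,
    "shock_inefficiency": 0.55,
    "taylor_anxiety_scale": 0.68,
    "social_class": 0.05,       # negligible, as the theory requires
    "vocational_aim": -0.03,    # negligible
    "value_orientation": 0.02,  # negligible
}

# Theory predicts a substantive positive correlation ("+") or none ("0").
predicted = {
    "palmar_sweat_failure": "+",
    "tenseness_ratings": "+",
    "shock_inefficiency": "+",
    "taylor_anxiety_scale": "+",
    "social_class": "0",
    "vocational_aim": "0",
    "value_orientation": "0",
}

def consistent(observed, predicted, threshold=0.30):
    """Check each variable: predicted links must exceed the threshold,
    predicted non-links must fall below it in absolute value."""
    checks = []
    for var, sign in predicted.items():
        r = observed[var]
        ok = r >= threshold if sign == "+" else abs(r) < threshold
        checks.append((var, r, sign, ok))
    return checks

for var, r, sign, ok in consistent(observed, predicted):
    print(f"{var:22s} r={r:+.2f} predicted={sign} {'OK' if ok else 'VIOLATION'}")
```

A single violation would not refute the construct outright, but, as the text notes, it obliges the investigator to revise either the test's interpretation or the theory of anxiety itself.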
Kinds of Constructs

At this point we should indicate summarily what we mean by a construct, recognizing that much of the remainder of the paper deals with this question. A construct is some postulated attribute of people, assumed to be reflected in test performance. In test validation the attribute about which we make statements in interpreting a test is a construct. We expect a person at any time to possess or not possess a qualitative attribute (amnesia) or structure, or to possess some degree of a quantitative attribute (cheerfulness). A construct has certain associated meanings carried in statements of this general character: Persons who possess this attribute will, in situation X, act in manner Y (with a stated probability). The logic of construct validation is invoked whether the construct is highly systematized or loose, used in ramified theory or a few simple propositions, used in absolute propositions or probability statements. We seek to specify how one is to defend a proposed interpretation of a test; we are not recommending any one type of interpretation.

The constructs in which tests are to be interpreted are certainly not likely to be physiological. Most often they will be traits such as "latent hostility" or "variable in mood," or descriptions in terms of an educational objective, or "ability to plan experiments." For the benefit of readers who may have been influenced by certain exegeses of MacCorquodale and Meehl (40), let us here emphasize: Whether or not an interpretation of a test's properties or relations involves questions of construct validity is to be decided by examining the entire body of evidence offered, together with what is asserted about the test in the context of this evidence. Proposed identifications of constructs allegedly measured by the test with constructs of other sciences (e.g., genetics, neuroanatomy, biochemistry) make up only one class of construct-validity claims, and a rather minor one at present. Space does not permit full analysis of the relation of the present paper to the MacCorquodale-Meehl distinction between hypothetical constructs and intervening variables. The philosophy of science pertinent to the present paper is set forth later in the section entitled, "The nomological network."
The Relation of Constructs to "Criteria"

CRITICAL VIEW OF THE CRITERION IMPLIED

An unquestionable criterion may be found in a practical operation, or may be established as a consequence of an operational definition. Typically, however, the psychologist is unwilling to use the directly operational approach because he is interested in building a theory about a generalized construct. A theorist trying to relate behavior to "hunger" almost certainly invests that term with meanings other than the operation "elapsed-time-since-feeding." If he is concerned with hunger as a tissue need, he will not accept time lapse as equivalent to his construct because it fails to consider, among other things, energy expenditure of the animal.

In some situations the criterion is no more valid than the test. Suppose, for example, that we want to know if counting the dots on Bender-Gestalt figure five indicates "compulsive rigidity," and that we take psychiatric ratings on this trait as a criterion. Even a conventional report on the resulting correlation will say something about the extent and intensity of the psychiatrist's contacts and should describe his qualifications (e.g., diplomate status? analyzed?).

Why report these facts? Because data are needed to indicate whether the criterion is any good. "Compulsive rigidity" is not really intended to mean "social stimulus value to psychiatrists." The implied trait involves a range of behavior-dispositions which may be very imperfectly sampled by the psychiatrist. Suppose dot-counting does not occur in a particular patient and yet we find that the psychiatrist has rated him as "rigid." When questioned the psychiatrist tells us that the patient was a rather easy, free-wheeling sort; however, the patient did lean over to straighten out a skewed desk blotter, and this, viewed against certain other facts, tipped the scale in favor of a "rigid" rating. On the face of it, counting Bender dots may be just as good (or poor) a sample of the compulsive-rigidity domain as straightening desk blotters is.

Suppose, to extend our example, we have four tests on the "predictor" side, over against the psychiatrist's "criterion," and find generally positive correlations among the five variables. Surely it is artificial and arbitrary to impose the "test-should-predict-criterion" pattern on such data. The psychiatrist samples verbal content, expressive pattern, voice, posture, etc. The psychologist samples verbal content, perception, expressive pattern, etc. Our proper conclusion is that, from this evidence, the four tests and the psychiatrist all assess some common factor. The asymmetry between the "test" and the so-designated "criterion" arises only because the terminology of predictive validity has become a commonplace in test analysis. In this study where a construct is the central concern, any distinction between the merit of the test and criterion variables would be justified only if it had already been shown that the psychiatrist's theory and operations were excellent measures of the attribute.
Inadequacy of Validation in Terms of Specific Criteria

The proposal to validate constructual interpretations of tests runs counter to suggestions of some others. Spiker and McCandless (57) favor an operational approach. Validation is replaced by compiling statements as to how strongly the test predicts other observed variables of interest. To avoid requiring that each new variable be investigated completely by itself, they allow two variables to collapse into one whenever the properties of the operationally defined measures are the same: "If a new test is demonstrated to predict the scores on an older, well-established test, then an evaluation of the predictive power of the older test may be used for the new one." But accurate inferences are possible only if the two tests correlate so highly that there is negligible reliable variance in either test, independent of the other. Where the correspondence is less close, one must either retain all the separate variables operationally defined or embark on construct validation.

The practical user of tests must rely on constructs of some generality to make predictions about new situations. Test X could be used to predict palmar sweating in the face of failure without invoking any construct, but a counselor is more likely to be asked to forecast behavior in diverse or even unique situations for which the correlation of test X is unknown. Significant predictions rely on knowledge accumulated around the generalized construct of anxiety. The Technical Recommendations state:

It is ordinarily necessary to evaluate construct validity by integrating evidence from many different sources. The problem of construct validation becomes especially acute in the clinical field since for many of the constructs dealt with it is not a question of finding an imperfect criterion but of finding any criterion at all. The psychologist interested in construct validity for clinical devices is concerned with making an estimate of a hypothetical internal process, factor, system, structure, or state and cannot expect to find a clear unitary behavioral criterion. An attempt to identify any one criterion measure or any composite as the criterion aimed at is, however, usually unwarranted (59, pp. 14-15).

This appears to conflict with arguments for specific criteria prominent at places in the testing literature. Thus Anastasi (2) makes many statements of the latter character: "It is only as a measure of a specifically defined criterion that a test can be objectively validated at all . . . To claim that a test measures anything over and above its criterion is pure speculation" (p. 67). Yet elsewhere this article supports construct validation. Tests can be profitably interpreted if we "know the relationships between the tested behavior . . . and other behavior samples, none of these behavior samples necessarily occupying the preeminent position of a criterion" (p. 75). Factor analysis with several partial criteria might be used to study whether a test measures a postulated "general learning ability." If the data demonstrate specificity of ability instead, such specificity is "useful in its own right in advancing our knowledge of behavior; it should not be construed as a weakness of the tests" (p. 75).

We depart from Anastasi at two points. She writes, "The validity of a psychological test should not be confused with an analysis of the factors which determine the behavior under consideration." We, however, regard such analysis as a most important type of validation. Second, she refers to "the will-o'-the-wisp of psychological processes which are distinct from performance" (2, p. 77). While we agree that psychological processes are elusive, we are sympathetic to attempts to formulate and clarify constructs which are evidenced by performance but distinct from it. Surely an inductive inference based on a pattern of correlations cannot be dismissed as "pure speculation."
SPECIFIC CRITERIA USED TEMPORARILY: THE "BOOTSTRAPS" EFFECT

Even when a test is constructed on the basis of a specific criterion, it may ultimately be judged to have greater construct validity than the criterion. We start with a vague concept which we associate with certain observations. We then discover empirically that these observations co-vary with some other observation which possesses greater reliability or is more intimately correlated with relevant experimental changes than is the original measure, or both.

For example, the notion of temperature arises because some objects feel hotter to the touch than others. The expansion of a mercury column does not have face validity as an index of hotness. But it turns out that (a) there is a statistical relation between expansion and sensed temperature; (b) observers employ the mercury method with good interobserver agreement; (c) the regularity of observed relations is increased by using the thermometer (e.g., melting points of samples of the same material vary little on the thermometer; we obtain nearly linear relations between mercury measures and pressure of a gas). Finally, (d) a theoretical structure involving unobservable microevents (the kinetic theory) is worked out which explains the relation of mercury expansion to heat. This whole process of conceptual enrichment begins with what in retrospect we see as an extremely fallible "criterion": the human temperature sense. That original criterion has now been relegated to a peripheral position. We have lifted ourselves by our bootstraps, but in a legitimate and fruitful way.

Similarly, the Binet scale was first valued because children's scores tended to agree with judgments by schoolteachers. If it had not shown this agreement, it would have been discarded along with reaction time and the other measures of ability previously tried. Teacher judgments once constituted the criterion against which the individual intelligence test was validated. But if today a child's IQ is 135 and three of his teachers complain about how stupid he is, we do not conclude that the test has failed. Quite to the contrary, if no error in test procedure can be argued, we treat the test score as a valid statement about an important quality, and define our task as that of finding out what other variables (personality, study skills, etc.) modify achievement or distort teacher judgment.
Experimentation to Investigate Construct Validity

VALIDATION PROCEDURES

We can use many methods in construct validation. Attention should particularly be drawn to Macfarlane's survey of these methods as they apply to projective devices (41).

Group differences. If our understanding of a construct leads us to expect two groups to differ on the test, this expectation may be tested directly. Thus Thurstone and Chave validated the Scale for Measuring Attitude Toward the Church by showing score differences between church members and nonchurchgoers. Churchgoing is not the criterion of attitude, for the purpose of the test is to measure something other than the crude sociological fact of church attendance; on the other hand, failure to find a difference would have seriously challenged the test.

Only coarse correspondence between test and group designation is expected. Too great a correspondence between the two would indicate that the test is to some degree invalid, because members of the groups are expected to overlap on the test. Intelligence test items are selected initially on the basis of a correspondence to age, but an item that correlates .95 with age in an elementary school sample would surely be suspect.
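A group-differences check of this Thurstone-and-Chave kind amounts to comparing score distributions across a theoretically relevant grouping and asking for a coarse, not perfect, separation. The sketch below simulates such a comparison and computes a standardized mean difference (Cohen's d); the group labels, means, and sample sizes are all invented for illustration and do not reproduce the original attitude-scale data.

```python
import math
import random
import statistics

# Simulated attitude scores for two groups; distribution parameters are
# assumptions for illustration, not the Thurstone-Chave data.
random.seed(0)
members = [random.gauss(3.0, 1.2) for _ in range(60)]        # church members
nonchurchgoers = [random.gauss(5.0, 1.2) for _ in range(60)]

def cohens_d(a, b):
    """Standardized mean difference between two samples (pooled SD)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(b) - statistics.mean(a)) / pooled

d = cohens_d(members, nonchurchgoers)
print(f"group separation d = {d:.2f}")
```

Note the two-sided reading the text demands: a d near zero would challenge the test, but a separation so large that the groups no longer overlap at all would itself be grounds for suspicion, like the item correlating .95 with age.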
Correlation matrices and factor analysis. If two tests are presumed to measure the same construct, a correlation between them is predicted. (An exception is noted where some second attribute has positive loading in the first test and negative loading in the second test; then a low correlation is expected. This is a testable interpretation provided an external measure of either the first or the second variable exists.) If the obtained correlation departs from the expectation, however, there is no way to know whether the fault lies in test A, test B, or the formulation of the construct. A matrix of intercorrelations often points out profitable ways of dividing the construct into more meaningful parts, factor analysis being a useful computational method in such studies.
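The procedure described above can be illustrated with a minimal sketch. The data, test names, and the power-iteration shortcut below are all invented for demonstration; extracting the first principal component of the correlation matrix is only a simple stand-in for the full factor-analytic methods the text refers to.

```python
# Hedged sketch: intercorrelation matrix plus a first-component extraction
# (invented data; power iteration stands in for proper factor analysis).
from statistics import mean

# Hypothetical scores of six examinees on tests A and B, presumed to tap
# the same construct, and a divergent test C.
scores = {
    "A": [10, 12, 14, 13, 9, 11],
    "B": [20, 23, 27, 25, 18, 22],
    "C": [5, 9, 4, 8, 6, 7],
}

def corr(x, y):
    """Pearson correlation between two score lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

names = list(scores)
R = [[corr(scores[i], scores[j]) for j in names] for i in names]

# First principal component of R by power iteration: tests sharing the
# construct should show large loadings of the same sign.
v = [1.0] * len(names)
for _ in range(100):
    w = [sum(R[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
    norm = sum(c * c for c in w) ** 0.5
    v = [c / norm for c in w]

for name, loading in zip(names, v):
    print(f"test {name}: loading {loading:.2f}")
```

In this toy matrix A and B correlate highly and load together on the first component, while C's near-zero correlations with both leave it essentially outside the construct, the kind of partition of the matrix the text says factor methods make visible.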
Guilford (26) has discussed the place of factor analysis in construct validation. His statements may be extracted as follows: "The personnel psychologist wishes to know 'why his tests are valid.' He can place tests and practical criteria in a matrix and factor it to identify 'real dimensions of human personality.' A factorial description is exact and stable; it is economical in explanation; it leads to the creation of pure tests which can be combined to predict complex behaviors." It is clear that factors here function as constructs. Eysenck, in his "criterion analysis" (18), goes farther than Guilford, and shows that factoring can be used explicitly to test hypotheses about constructs.
Factors may or may not be weighted with surplus meaning. Certainly when they are regarded as "real dimensions" a great deal of surplus meaning is implied, and the interpreter must shoulder a substantial burden of proof. The alternative view is to regard factors as defining a working reference frame, located in a convenient manner in the "space" defined by all behaviors of a given type. Which set of factors from a

