Testing Structural Equation Models or Detection of Misspecifications?¹
Abstract
Assessing the correctness of a structural equation model is essential to avoid drawing wrong conclusions from empirical research. In the past, the chi-square test was recommended for assessing the correctness of the model, but this test has been criticized because of its sensitivity to sample size. As a reaction, an abundance of fit indices has been developed. The result of these developments is that SEM packages now produce a large list of fit measures. One would think that this progression has led to a clear understanding of how to evaluate models with respect to model misspecifications.
However, in this paper we question the validity of approaches to model evaluation based on overall goodness-of-fit indices. The argument against the use of fit indices is that they do not provide an adequate indication of the “size” of the model’s misspecification. That is, they vary dramatically with the values of incidental parameters that are unrelated to the misspecification in the model. This is illustrated using simple but fundamental models. As an alternative method of model evaluation, we suggest using the Expected Parameter Change (EPC) in combination with the Modification Index (MI) and the power of the MI test.
Keywords: Structural Equation Models (SEM), Likelihood Ratio Test (LRT),
Chi-square Goodness-of-fit test, Power, Sensitivity Analysis, Goodness-of-fit Indices,
Expected Parameter Changes (EPC)
¹ Acknowledgements: We appreciate the very useful comments of two anonymous reviewers on an earlier version of this paper.

1. Introduction
In an influential paper, MacCallum, Browne and Sugawara (1996: 131) write: “if the model is truly a good model in terms of its fit in the population, we wish to avoid concluding that the model is a bad one. Alternatively, if the model is truly a bad one, we wish to avoid concluding that it is a good one.” These two types of wrong conclusions correspond to what in statistics are known as Type I and Type II errors, whose probabilities of occurrence are denoted α and β, respectively. Although everybody would agree that α and β should be as small as possible, in the practice of SEM these probabilities are seldom controlled. In this paper we will show the consequences of not controlling the probabilities α and β.
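As a minimal numeric sketch of these two probabilities (in Python with scipy; the degrees of freedom and the noncentrality parameter below are hypothetical illustration values, not taken from the paper):

```python
from scipy.stats import chi2, ncx2

df = 5          # degrees of freedom of the model test (hypothetical)
alpha = 0.05    # chosen Type I error probability
c_alpha = chi2.ppf(1 - alpha, df)   # critical value of the chi-square test

# Under a misspecified model, T is asymptotically noncentral chi-square;
# the noncentrality parameter below is purely illustrative.
ncp = 10.0
beta = ncx2.cdf(c_alpha, df, ncp)   # Type II error: accepting a bad model

print(f"critical value = {c_alpha:.2f}")   # about 11.07
print(f"beta = {beta:.2f}, power = {1 - beta:.2f}")
```

By construction α is fixed at .05, but β depends on the (usually unknown) noncentrality, which is why it is seldom controlled in practice.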
To discuss this issue, we first have to define what good and bad models are in terms of misspecifications. MacCallum et al. (1996) do not give a definition of good and bad models. We suggest that, in this context, good and bad are defined by the absence (good) or presence (bad) of misspecifications in the model, as done by Hu and Bentler (1998: 427), who state that: “a model is said to be misspecified when (a) one or more parameters are estimated whose population values are zeros (i.e. an over-parameterized misspecified model) (b) one or more parameters are fixed to zeros whose population values are non-zeros (i.e. an under-parameterized misspecified model) or both.” In line with Hu and Bentler (1998), we believe that (b) is the type of misspecification with the more serious consequences, so in this paper we discuss only that type of misspecification. In the case of just one parameter of a model being misspecified, the size of the misspecification is the absolute difference between the true value of the parameter and the value specified in the analysis. If more than one parameter is misspecified, the size of the misspecification of the model is likewise determined by the differences between the restricted values in the specified model and the true population values of the parameters under the correct model. This definition of the size of the misspecifications deviates from that of other scholars such as Fan and Sivo (2006), who define the size of the misspecification on the basis of the non-centrality parameter or the power of the test.
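In symbols (our notation for this summary, not taken verbatim from the paper), for a single misspecified parameter this definition reads:

$$\text{size of misspecification} = \lvert \theta^{*} - \theta_{0} \rvert$$

where $\theta^{*}$ is the true population value of the parameter and $\theta_{0}$ is the value at which it is fixed in the specified model (typically 0 in the under-parameterized case).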
Some authors, e.g. Browne and Cudeck (1993) and MacCallum et al. (1996), have argued that models are always simplifications of reality and are therefore always misspecified. Although there is truth in this argument, this is not a good reason to completely change the approach to model testing. What is needed is for (a) models with substantively relevant misspecifications to be rejected and (b) models with substantively irrelevant misspecifications to be accepted.
To make our discussion more concrete, we now provide examples, using population data, of what we mean by substantively relevant and substantively irrelevant misspecifications.
A substantively relevant misspecification
As an example of a substantively relevant misspecification, consider the following fundamental causal model M₁:

y₁ = γ₁₁x₁ + ζ₁        (1)

y₂ = β₂₁y₁ + γ₂₂x₁ + ζ₂        (2)

where E(xᵢ) = E(yᵢ) = E(ζᵢ) = 0, E(xᵢζⱼ) = 0 and E(ζ₁ζ₂) = ψ₂₁, and where all variables are standardized except for the disturbance terms.
The purpose of many studies is to determine whether there is an effect of one variable, y₁, on another, y₂. To test this hypothesis, it is essential to ensure that all variables causing spurious relationships between the two variables have been introduced. If that is not the case, the covariance between the disturbance terms (ψ₂₁) will not be zero.
If ψ₂₁ is other than zero and the researcher specifies a model M₀ in which ψ₂₁ = 0, the effect β₂₁ will be over- or under-estimated² and wrong conclusions about this parameter may be drawn. Depending on the size of the misspecification of the model M₀ (the absolute value of the deviation of ψ₂₁ from zero), a substantial misspecification can be attained, in which case the model should be rejected.
To make the example more complete, consider the following (true) population parameter values for model M₁ (see figure 1): γ₁₁ = .4, β₂₁ = .0, γ₂₂ = .1 and ψ₂₁ = .2. According to Hu and Bentler’s definition of misspecification, the model M₀ (see figure 2) is misspecified since it imposes the incorrect restriction ψ₂₁ = 0.

Figure 1 and 2 about here

² The effect (β₂₁) could also be underestimated if the correlation between the disturbance terms is negative.

The size of the misspecification is .2, i.e. the difference between the value of ψ₂₁ under M₀ and its value under the correct model M₁. Note that the size of the misspecification would always be .2 regardless of the size of the other parameters in the model.
The consequence of the misspecification is that the effect β₂₁ will be overestimated when fitting M₀ instead of the true model M₁. The expected value would be .2 and not .0, and so the wrong conclusion will be drawn that the variable y₁ has an effect on y₂. This example illustrates a case where a misspecification yields wrong conclusions, so this is a case of a “bad model” that should be rejected.
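A small numeric check of this example (a sketch in Python, using only the population values stated above) shows why fitting M₀ leads to a nonzero estimate of β₂₁: the omitted disturbance covariance induces an association between y₁ and y₂ even though β₂₁ = 0.

```python
# Population values of M1 from the text; all observed variables standardized.
g11, b21, g22, psi21 = 0.4, 0.0, 0.1, 0.2

# Model-implied moments (Var(x1) = Var(y1) = 1 by standardization):
cov_x1_y1 = g11                                    # 0.40
cov_x1_y2 = b21 * cov_x1_y1 + g22                  # 0.10
# Cov(y1, y2) = b21*Var(y1) + g22*Cov(y1, x1) + Cov(zeta1, zeta2)
cov_y1_y2 = b21 * 1.0 + g22 * cov_x1_y1 + psi21    # 0.24

print(cov_y1_y2)  # 0.24: a spurious association, since beta21 = 0
```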
A substantively irrelevant misspecification
As an example of a model with an irrelevant misspecification, we use a simple but important example from factor analysis. Consider the following 2-factor model M₁:

x₁ = b₁₁f₁ + e₁

x₂ = b₂₁f₁ + e₂

x₃ = b₃₁f₂ + e₃        (3)

x₄ = b₄₁f₂ + e₄

where E(fᵢ) = 0, E(fᵢ²) = 1, E(fᵢeⱼ) = 0 and E(eᵢeⱼ) = 0, while E(f₁f₂) = ρ.
Suppose that our interest lies in assessing whether this is a one-factor model, i.e. whether the correlation ρ is equal to 1. Let M₀ denote the model that imposes ρ = 1. Suppose that the population values of the loadings all equal .8 and the correlation coefficient (ρ) equals .95. In that case, substantive researchers would agree that the two factors are the same, that is, that there is only one factor and not two. According to the definition stated previously, in this case the size of the misspecification of M₀ is .05, regardless of the size of the other parameters in the model. The size of this misspecification is substantively irrelevant, and therefore one would not like to reject the model M₀, since the model is adequate for all practical purposes even though it is not exactly correct. This illustrates the situation of a model with substantively irrelevant misspecifications which should be accepted. Figures 3 and 4 show the corresponding path diagrams of the true and the approximate models.
Figure 3 and 4 about here
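The irrelevance of this misspecification can also be checked numerically. The sketch below (Python/numpy, assuming all four loadings equal .8 as stated above) compares the correlation matrices implied by the true model (ρ = .95) and by M₀ (ρ = 1):

```python
import numpy as np

def implied_corr(rho: float, loading: float = 0.8) -> np.ndarray:
    """Correlation matrix implied by the 2-factor model in equation (3),
    with equal loadings and E(f1*f2) = rho."""
    b = np.array([[loading, 0.0],
                  [loading, 0.0],
                  [0.0, loading],
                  [0.0, loading]])
    phi = np.array([[1.0, rho],
                    [rho, 1.0]])
    sigma = b @ phi @ b.T
    np.fill_diagonal(sigma, 1.0)   # unique variances bring diagonals to 1
    return sigma

resid = implied_corr(1.0) - implied_corr(0.95)
print(np.abs(resid).max())   # 0.032: the largest discrepancy M0 produces
```

With these loadings, imposing ρ = 1 changes the between-block correlations from .608 to .64, a discrepancy of only .032 in every affected element.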

The above two examples should not be taken to imply that relevant misspecifications occur only in path analysis models, while irrelevant misspecifications occur only in factor analysis models. The mentioned problems can occur in both types of models and, of course, in combinations of the two.
In this paper we want to show, first of all, that the standard procedures for the
evaluation of models do not work as required. After that we want to suggest an alternative
approach for the evaluation of structural equation models.
The structure of the paper is as follows. Section 2 reviews the standard procedure of using goodness-of-fit testing and goodness-of-fit indices for the evaluation of models in the SEM tradition. Section 3 illustrates the consequences of not controlling Type I and Type II errors: even in the case of very simple but fundamental models, a bad model is typically accepted whilst a model that is good for all practical purposes is typically rejected. Section 4 describes the reason for these problems. Section 5 suggests an alternative approach based on the detection of model misspecifications. Section 6 illustrates the proposed procedures with empirical data. Section 7 concludes with a discussion.
2. Traditional goodness-of-fit testing in SEM
In SEM, the goodness-of-fit of a specified model M is typically tested using a chi-square goodness-of-fit test statistic T (the so-called chi-square test), defined as n (the sample size) times the value of a discrepancy function that evaluates the differences between the observed covariance matrix in the sample and the fitted covariance matrix based on the parameter estimates and the specified model. Different discrepancy functions that take care of different distributional assumptions can be used (see Bollen, 1989). Under standard assumptions and when the model holds, T is asymptotically χ² distributed with degrees of freedom (df) equal to the number of over-identifying restrictions implied by the specified model. So, in the standard approach, M is rejected when

T > cα        (4)

where cα is the critical value of the test, that is, the value for which Pr(χ²(df) > cα) = α, with α the chosen significance level of the test. Typically, researchers choose α = .05, so that the probability of rejecting an exactly correct model (the Type I error) is .05.
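For illustration, decision rule (4) can be written out as follows (a sketch in Python; the sample size, degrees of freedom and discrepancy value are hypothetical):

```python
from scipy.stats import chi2

# Hypothetical values: sample size, degrees of freedom, and the minimized
# discrepancy-function value F obtained when fitting model M.
n, df, F = 400, 5, 0.03

T = n * F                          # chi-square test statistic, T = n * F
c_alpha = chi2.ppf(1 - 0.05, df)   # critical value for alpha = .05

print("reject M" if T > c_alpha else "do not reject M")  # T = 12 > 11.07
```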

References

Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238–246.

Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588–606.

Bollen, K. A. (1989). Structural Equations with Latent Variables. New York: Wiley.

Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing Structural Equation Models (pp. 136–162). Newbury Park, CA: Sage.

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.
Frequently Asked Questions (6)
Q1. What are the contributions in this paper?

However, in this paper the authors question the validity of approaches for model evaluation based on overall goodness of fit indices. The argument against the use of fit indices is that they do not provide an adequate indication of the “ size ” of the model ’ s misspecification. As an alternative method of model evaluation, the authors suggest using the Expected Parameter Change ( EPC ) in combination with the Modification Index ( MI ) and the power of the MI test. 

The authors do not wish to claim any higher precision than this. Further research should show if greater precision is possible and thus whether there is scope for making these cutting points more precise. 

The standard practice of concluding that a model is a good model if the fit is acceptable, or no significant MIs are found, is unjustified because non-significance may just be due to lack of power. 

Researchers typically choose α = 0.05, so the probability α of rejecting the model when the model is exactly correct (Type I error) is 0.05.

In fact, by using a fixed cut-off value for FI in the way described, the FI acts as a statistic for hypothesis testing: that is, if the critical value is exceeded, the model is rejected; if not, the model is accepted. 

If γ₂₂ is 0.5 or larger, the probability of obtaining a value for ψ₂₁ close to zero becomes smaller, which is an indication that there is a misspecification in the model.