Testing Structural Equation Models or Detection of Misspecifications?¹
Abstract
Assessing the correctness of a structural equation model is essential to avoid drawing wrong conclusions from empirical research. In the past, the chi-square test was recommended for assessing the correctness of the model, but this test has been criticized because of its sensitivity to sample size. As a reaction, an abundance of fit indices has been developed. The result of these developments is that SEM packages now produce a large list of fit measures. One would think that this progression has led to a clear understanding of how to evaluate models with respect to model misspecifications.
However, in this paper we question the validity of approaches to model evaluation based on overall goodness-of-fit indices. The argument against the use of fit indices is that they do not provide an adequate indication of the “size” of the model’s misspecification. That is, they vary dramatically with the values of incidental parameters that are unrelated to the misspecification in the model. This is illustrated using simple but fundamental models. As an alternative method of model evaluation, we suggest using the Expected Parameter Change (EPC) in combination with the Modification Index (MI) and the power of the MI test.
Keywords: Structural Equation Models (SEM), Likelihood Ratio Test (LRT),
Chi-square Goodness-of-fit test, Power, Sensitivity Analysis, Goodness-of-fit Indices,
Expected Parameter Changes (EPC)
¹ Acknowledgements: We appreciate the very useful comments of two anonymous reviewers on an earlier version of this paper.

1. Introduction
In an influential paper, MacCallum, Browne and Sugawara (1996: 131) write: “if the model is truly a good model in terms of its fit in the population, we wish to avoid concluding that the model is a bad one. Alternatively, if the model is truly a bad one, we wish to avoid concluding that it is a good one.” These two types of wrong conclusions correspond to what in statistics are known as Type I and Type II errors, whose probabilities of occurrence are denoted α and β, respectively. Although everybody would agree that α and β should be as small as possible, in the practice of SEM these probabilities are seldom controlled. In this paper we will show the consequences of not controlling the probabilities α and β.
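As a minimal numeric sketch of these two probabilities (in Python with scipy; the degrees of freedom and the noncentrality parameter below are hypothetical illustration values, not taken from the paper):

```python
from scipy.stats import chi2, ncx2

df = 5          # degrees of freedom of the model test (hypothetical)
alpha = 0.05    # chosen Type I error probability
c_alpha = chi2.ppf(1 - alpha, df)   # critical value of the chi-square test

# Under a misspecified model, T is asymptotically noncentral chi-square;
# the noncentrality parameter below is purely illustrative.
ncp = 10.0
beta = ncx2.cdf(c_alpha, df, ncp)   # Type II error: accepting a bad model

print(f"critical value = {c_alpha:.2f}")   # about 11.07
print(f"beta = {beta:.2f}, power = {1 - beta:.2f}")
```

By construction α is fixed at .05, but β depends on the (usually unknown) noncentrality, which is why it is seldom controlled in practice.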
To discuss this issue, we first have to define what good and bad models are in terms of misspecifications. MacCallum et al. (1996) do not give a definition of good and bad models. We suggest that, in this context, good and bad are defined by the absence (good) or presence (bad) of misspecifications in the model, as done by Hu and Bentler (1998: 427), who state that: “a model is said to be misspecified when (a) one or more parameters are estimated whose population values are zeros (i.e. an over-parameterized misspecified model) (b) one or more parameters are fixed to zeros whose population values are non-zeros (i.e. an under-parameterized misspecified model) or both.” In line with Hu and Bentler (1998), we believe that (b) is the type of misspecification with the more serious consequences, so in this paper we discuss only that type of misspecification. In the case of just one parameter of a model being misspecified, the size of the misspecification is the absolute difference between the true value of the parameter and the value specified in the analysis. If more than one parameter is misspecified, the size of the misspecification of the model is likewise determined by the differences between the restricted values in the specified model and the true population values of the parameters under the correct model. This definition of the size of the misspecifications deviates from that of other scholars such as Fan and Sivo (2006), who define the size of the misspecification on the basis of the non-centrality parameter or the power of the test.
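In symbols (our notation for this summary, not taken verbatim from the paper), for a single misspecified parameter this definition reads:

$$\text{size of misspecification} = \lvert \theta^{*} - \theta_{0} \rvert$$

where $\theta^{*}$ is the true population value of the parameter and $\theta_{0}$ is the value at which it is fixed in the specified model (typically 0 in the under-parameterized case).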
Some authors, e.g. Browne and Cudeck (1993) and MacCallum et al. (1996), have argued that models are always simplifications of reality and are therefore always misspecified. Although there is truth in this argument, this is not a good reason to completely change the approach to model testing. What is needed is for (a) models with substantively relevant misspecifications to be rejected and (b) models with substantively irrelevant misspecifications to be accepted.
To make our discussion more concrete, we now provide examples, using population data, of what we mean by substantively relevant and substantively irrelevant misspecifications.
A substantively relevant misspecification
As an example of a substantively relevant misspecification, consider the following fundamental causal model M₁:

y₁ = γ₁₁x₁ + ζ₁        (1)

y₂ = β₂₁y₁ + γ₂₂x₁ + ζ₂        (2)

where E(xᵢ) = E(yᵢ) = E(ζᵢ) = 0, E(xᵢζⱼ) = 0 and E(ζ₁ζ₂) = ψ₂₁, and where all variables are standardized except for the disturbance terms.
The purpose of many studies is to determine whether there is an effect of one variable, y₁, on another, y₂. To test this hypothesis, it is essential to ensure that all variables causing spurious relationships between the two variables have been introduced. If that is not the case, the covariance between the disturbance terms (ψ₂₁) will not be zero.
If ψ₂₁ is other than zero and the researcher specifies a model M₀ in which ψ₂₁ = 0, the effect β₂₁ will be over- or under-estimated² and wrong conclusions about this parameter may be drawn. Depending on the size of the misspecification of the model M₀ (the absolute value of the deviation of ψ₂₁ from zero), a substantial misspecification can be attained, in which case the model should be rejected.
To make the example more complete, consider the following (true) population parameter values for model M₁ (see figure 1): γ₁₁ = .4, β₂₁ = .0, γ₂₂ = .1 and ψ₂₁ = .2. According to Hu and Bentler’s definition of misspecification, the model M₀ (see figure 2) is misspecified since it imposes the incorrect restriction ψ₂₁ = 0.

Figure 1 and 2 about here

² The effect (β₂₁) could also be underestimated if the correlation between the disturbance terms is negative.

The size of the misspecification is .2, i.e. the difference between the value of ψ₂₁ under M₀ and its value under the correct model M₁. Note that the size of the misspecification would always be .2 regardless of the size of the other parameters in the model.
The consequence of the misspecification is that the effect β₂₁ will be overestimated when fitting M₀ instead of the true model M₁. The expected value would be .2 and not .0, and so the wrong conclusion will be drawn that the variable y₁ has an effect on y₂. This example illustrates a case where a misspecification yields wrong conclusions, so this is a case of a “bad model” that should be rejected.
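A small numeric check of this example (a sketch in Python, using only the population values stated above) shows why fitting M₀ leads to a nonzero estimate of β₂₁: the omitted disturbance covariance induces an association between y₁ and y₂ even though β₂₁ = 0.

```python
# Population values of M1 from the text; all observed variables standardized.
g11, b21, g22, psi21 = 0.4, 0.0, 0.1, 0.2

# Model-implied moments (Var(x1) = Var(y1) = 1 by standardization):
cov_x1_y1 = g11                                    # 0.40
cov_x1_y2 = b21 * cov_x1_y1 + g22                  # 0.10
# Cov(y1, y2) = b21*Var(y1) + g22*Cov(y1, x1) + Cov(zeta1, zeta2)
cov_y1_y2 = b21 * 1.0 + g22 * cov_x1_y1 + psi21    # 0.24

print(cov_y1_y2)  # 0.24: a spurious association, since beta21 = 0
```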
A substantively irrelevant misspecification
As an example of a model with an irrelevant misspecification, we use a simple but important example from factor analysis. Consider the following 2-factor model M₁:

x₁ = b₁₁f₁ + e₁

x₂ = b₂₁f₁ + e₂

x₃ = b₃₁f₂ + e₃        (3)

x₄ = b₄₁f₂ + e₄

where E(fᵢ) = 0, E(fᵢ²) = 1, E(fᵢeⱼ) = 0 and E(eᵢeⱼ) = 0, while E(f₁f₂) = ρ.
Suppose that our interest lies in assessing whether this is a one-factor model, i.e. whether the correlation ρ is equal to 1. Let M₀ denote the model that imposes ρ = 1. Suppose that the population values of the loadings all equal .8 and the correlation coefficient (ρ) equals .95. In that case, substantive researchers would agree that the two factors are the same, that is, that there is only one factor and not two. According to the definition stated previously, in this case the size of the misspecification of M₀ is .05, regardless of the size of the other parameters in the model. The size of this misspecification is substantively irrelevant, and therefore one would not like to reject the model M₀, since the model is adequate for all practical purposes even though it is not exactly correct. This illustrates the situation of a model with substantively irrelevant misspecifications which should be accepted. Figures 3 and 4 show the corresponding path diagrams of the true and the approximate models.
Figure 3 and 4 about here
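The irrelevance of this misspecification can also be checked numerically. The sketch below (Python/numpy, assuming all four loadings equal .8 as stated above) compares the correlation matrices implied by the true model (ρ = .95) and by M₀ (ρ = 1):

```python
import numpy as np

def implied_corr(rho: float, loading: float = 0.8) -> np.ndarray:
    """Correlation matrix implied by the 2-factor model in equation (3),
    with equal loadings and E(f1*f2) = rho."""
    b = np.array([[loading, 0.0],
                  [loading, 0.0],
                  [0.0, loading],
                  [0.0, loading]])
    phi = np.array([[1.0, rho],
                    [rho, 1.0]])
    sigma = b @ phi @ b.T
    np.fill_diagonal(sigma, 1.0)   # unique variances bring diagonals to 1
    return sigma

resid = implied_corr(1.0) - implied_corr(0.95)
print(np.abs(resid).max())   # 0.032: the largest discrepancy M0 produces
```

With these loadings, imposing ρ = 1 changes the between-block correlations from .608 to .64, a discrepancy of only .032 in every affected element.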

The above two examples should not be taken to imply that relevant misspecifications occur only in path analysis models, while irrelevant misspecifications occur only in factor analysis models. The mentioned problems can occur in both types of models and, of course, in combinations of the two.
In this paper we want to show, first of all, that the standard procedures for the
evaluation of models do not work as required. After that we want to suggest an alternative
approach for the evaluation of structural equation models.
The structure of the paper is as follows. Section 2 reviews the standard procedure of using goodness-of-fit testing and goodness-of-fit indices for the evaluation of models in the SEM tradition. Section 3 illustrates the consequences of not controlling Type I and Type II errors: even in the case of very simple but fundamental models, a bad model is typically accepted whilst a model that is good for all practical purposes is typically rejected. Section 4 describes the reason for these problems. Section 5 suggests an alternative approach based on the detection of model misspecifications. Section 6 illustrates the proposed procedures with empirical data. Section 7 concludes with a discussion.
2. Traditional goodness-of-fit testing in SEM
In SEM, the goodness-of-fit of a specified model M is typically tested using a chi-square goodness-of-fit test statistic T (the so-called chi-square test), defined as n (the sample size) times the value of a discrepancy function that evaluates the differences between the observed covariance matrix in the sample and the fitted covariance matrix based on the parameter estimates and the specified model. Different discrepancy functions that take care of different distributional assumptions can be used (see Bollen, 1989). Under standard assumptions and when the model holds, T is asymptotically χ² distributed with degrees of freedom (df) equal to the number of over-identifying restrictions implied by the specified model. So, in the standard approach, M is rejected when

T > cα        (4)

where cα is the critical value of the test, that is, the value for which Pr(χ²(df) > cα) = α, with α the chosen significance level of the test. Typically, researchers choose α = .05, so that the probability of rejecting an exactly correct model (the Type I error) is .05.
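For illustration, decision rule (4) can be written out as follows (a sketch in Python; the sample size, degrees of freedom and discrepancy value are hypothetical):

```python
from scipy.stats import chi2

# Hypothetical values: sample size, degrees of freedom, and the minimized
# discrepancy-function value F obtained when fitting model M.
n, df, F = 400, 5, 0.03

T = n * F                          # chi-square test statistic, T = n * F
c_alpha = chi2.ppf(1 - 0.05, df)   # critical value for alpha = .05

print("reject M" if T > c_alpha else "do not reject M")  # T = 12 > 11.07
```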

References

Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238–246.

Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588–606.

Bollen, K. A. (1989). Structural Equations with Latent Variables. New York: Wiley.

Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing Structural Equation Models (pp. 136–162). Newbury Park, CA: Sage.

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.
Frequently Asked Questions (6)
Q1. What are the contributions in this paper?

However, in this paper the authors question the validity of approaches for model evaluation based on overall goodness of fit indices. The argument against the use of fit indices is that they do not provide an adequate indication of the “ size ” of the model ’ s misspecification. As an alternative method of model evaluation, the authors suggest using the Expected Parameter Change ( EPC ) in combination with the Modification Index ( MI ) and the power of the MI test. 

The authors do not wish to claim any higher precision than this. Further research should show if greater precision is possible and thus whether there is scope for making these cutting points more precise. 

The standard practice of concluding that a model is a good model if the fit is acceptable, or no significant MIs are found, is unjustified because non-significance may just be due to lack of power. 

Researchers typically choose α = 0.05, so the probability α of rejecting the model when the model is exactly correct (Type I error) is 0.05.

In fact, by using a fixed cut-off value for FI in the way described, the FI acts as a statistic for hypothesis testing: that is, if the critical value is exceeded, the model is rejected; if not, the model is accepted. 

If γ₂₂ is 0.5 or larger, the probability of obtaining a value for ψ₂₁ close to zero becomes smaller, which is an indication that there is a misspecification in the model.