
Article
Applied Psychological Measurement
2017, Vol. 41(3), 178–194
© The Author(s) 2016
Reprints and permissions: sagepub.com/journalsPermissions.nav
DOI: 10.1177/0146621616677520
journals.sagepub.com/home/apm
Critical Values for Yen's Q3: Identification of Local Dependence in the Rasch Model Using Residual Correlations

Karl Bang Christensen (1), Guido Makransky (2), and Mike Horton (3)
Abstract
The assumption of local independence is central to all item response theory (IRT) models. Violations can lead to inflated estimates of reliability and problems with construct validity. For the most widely used fit statistic, Q3, there are currently no well-documented suggestions of the critical values which should be used to indicate local dependence (LD), and for this reason, a variety of arbitrary rules of thumb are used. In this study, an empirical data example and Monte Carlo simulation were used to investigate the different factors that can influence the null distribution of residual correlations, with the objective of proposing guidelines that researchers and practitioners can follow when making decisions about LD during scale development and validation. A parametric bootstrapping procedure should be implemented in each separate situation to obtain the critical value of LD applicable to the data set; example critical values are provided for a number of data structure situations. The results show that for the Q3 fit statistic, no single critical value is appropriate for all situations, as the percentiles in the empirical null distribution are influenced by the number of items, the sample size, and the number of response categories. Furthermore, the results show that LD should be considered relative to the average observed residual correlation, rather than to a uniform value, as this results in more stable percentiles for the null distribution of an adjusted fit statistic.
Keywords
local dependence, Rasch model, Yen's Q3, residual correlations, Monte Carlo simulation
(1) University of Copenhagen, Denmark
(2) University of Southern Denmark, Odense, Denmark
(3) University of Leeds, UK

Corresponding Author:
Karl Bang Christensen, Section of Biostatistics, Department of Public Health, University of Copenhagen, P.O. Box 2099, Copenhagen DK-1014, Denmark.
Email: KACH@sund.ku.dk

Introduction
Statistical independence of two variables implies that knowledge about one variable does not change the expectations about another variable. Thus, test items X_1, ..., X_I are not independent, because a student giving a correct answer to one test item would change the expectation of his or her probability of also giving a correct answer to another item in the same test. A fundamental assumption in the Rasch (1960) model and in other item response theory (IRT) models is that item responses are conditionally independent given the latent variable:

P(X_1 = x_1, \ldots, X_I = x_I \mid \theta) = \prod_{i=1}^{I} P(X_i = x_i \mid \theta). \quad (1)
The items should only be correlated through the latent trait that the test is measuring (Lord &
Novick, 1968). This is generally referred to as local independence (Lazarsfeld & Henry, 1968).
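Equation 1 also gives a direct recipe for simulating locally independent data: given θ, draw each item response independently. A minimal sketch for dichotomous Rasch items (the sample size, difficulty range, and seed below are illustrative choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_rasch(theta, b, rng):
    """Simulate dichotomous responses under local independence (Equation 1):
    given theta, each item is drawn independently with
    P(X_i = 1 | theta) = exp(theta - b_i) / (1 + exp(theta - b_i))."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))  # N x I success probabilities
    return (rng.random(p.shape) < p).astype(int)

theta = rng.normal(0.0, 1.0, size=500)   # person parameters
b = np.linspace(-2.0, 2.0, 10)           # evenly spaced item difficulties
X = simulate_rasch(theta, b, rng)        # 500 x 10 response matrix
```

Because the rows are drawn independently given θ, any correlation between item columns is attributable to the latent variable alone.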
The assumption of local independence can be violated through response dependency and multidimensionality, and these violations are often referred to under the umbrella term of "local dependence" (LD). Both of these situations yield interitem correlations beyond what can be attributed to the latent variable, but for very different reasons. Response dependency occurs when items are linked in some way, such that the response on one item governs the response on another because of similarities in, for example, item content or response format. A typical example is where several walking items are included in the same scale. If a person can walk several miles without difficulty, then that person must be able to walk 1 mile, or any lesser distance, without difficulty (Tennant & Conaghan, 2007). This is a structural dependency which is inherent within the items, because there is no other logical way in which a person may validly respond. Another form of LD could be caused by a redundancy dependency, where the degree of overlap within the content of items is such that the items are not independent (i.e., where the same question is essentially asked twice, using slightly different language or synonymous descriptive words). Yen (1993) offered an in-depth discussion of ways that the format and presentation of items can cause LD.
Violation of the local independence assumption through multidimensionality is typically seen for instruments composed of bundles of items that measure different aspects of the latent variable or different domains of a broader latent construct. In this case, the higher order latent variable alone might not account for correlation between items in the same bundle.
Violations of local independence in a unidimensional scale will influence estimation of person parameters and can lead to inflated estimates of reliability and problems with construct validity. Consequences of LD have been described in detail elsewhere (Lucke, 2005; Marais, 2009; Marais & Andrich, 2008a; Scott & Ip, 2002; Yen, 1993). Ignoring LD in a unidimensional scale thus leads to reporting of inflated reliability, giving a false impression of the accuracy and precision of estimates (Marais, 2013). For a discussion of the effect of multidimensionality on estimates of reliability, see Marais and Andrich (2008b).
Detecting LD
One of the earliest methods for detecting LD in the Rasch model is the fit measure Q2 (van den Wollenberg, 1982), which was derived from contingency tables and used the sufficiency properties of the Rasch model. Kelderman (1984) expressed the Rasch model as a log-linear model in which LD can be shown to correspond to interactions between items. Log-linear Rasch models have also been considered by Haberman (2007) and by Kreiner and Christensen (2004, 2007), who proposed to test for LD by evaluating partial correlations using an approach similar to the Mantel–Haenszel analysis of differential item functioning (DIF; Holland & Thayer, 1988).
The latter approach is readily implemented in standard software such as SAS or SPSS. Notably, Kreiner and Christensen (2007) argued that the log-linear Rasch models proposed by Kelderman that incorporate LD still provide essentially valid and objective measurement, and described the measurement properties of such models. Furthermore, a way of quantifying LD has been proposed by Andrich and Kreiner (2010) for two dichotomous items. It is based on splitting a dependent item into two new ones, according to the responses to the other item within the dependent pair. LD is then easily quantified by estimating the difference d between the item locations of the two new items. However, Andrich and Kreiner do not go on to investigate whether d is statistically significant. For the partial credit model (Masters, 1982) and the rating scale model (Andrich, 1978), a generalized version of this methodology exists (Andrich, Humphry, & Marais, 2012).
Beyond the Rasch model, Yen (1984) proposed the Q3 statistic for detecting LD in the three-parameter logistic (3PL) model. This fit statistic is based on the item residuals,

d_i = X_i - E(X_i \mid \hat{\theta}), \quad (2)

and computed as the Pearson correlation (taken over examinees),

Q_{3,ij} = r_{d_i d_j}, \quad (3)

where d_i and d_j are item residuals for items i and j, respectively. This method is often used for the Rasch model, the partial credit model, and the rating scale model.
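The residuals of Equation 2 and the correlations of Equation 3 are straightforward to compute once person estimates are available. A sketch for the dichotomous Rasch case (the function name is ours, and θ̂ and the item difficulties are assumed to come from a prior estimation step):

```python
import numpy as np

def q3_matrix(X, theta_hat, b):
    """Yen's Q3: Pearson correlations (taken over examinees) of the item
    residuals d_i = X_i - E(X_i | theta_hat), for a dichotomous Rasch model.
    X: N x I matrix of 0/1 responses; theta_hat: N person estimates;
    b: I item difficulties."""
    # E(X_i | theta) under the Rasch model is logistic in (theta - b_i)
    expected = 1.0 / (1.0 + np.exp(-(theta_hat[:, None] - b[None, :])))
    d = X - expected                      # N x I matrix of residuals (Equation 2)
    return np.corrcoef(d, rowvar=False)   # I x I matrix with entries Q3_ij (Equation 3)
```

The off-diagonal entries of the returned matrix are the Q3 values for each item pair.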
Chen and Thissen (1997) discussed X^2 and G^2 LD statistics that, although not more powerful than the Q3, have null distributions very similar to the chi-square distribution with one degree of freedom. Other methods for detecting LD are standardized bivariate residuals for dichotomous or multinomial IRT models (Maydeu-Olivares & Liu, 2015), the use of conditional covariances (Douglas, Kim, Habing, & Gao, 1998), or the use of Mantel–Haenszel type tests (Ip, 2001). Tests based on parametric models are also a possibility: Glas and Suarez-Falcon (2003) proposed Lagrange multiplier (LM) tests based on a threshold shift model, but bifactor models (Liu & Thissen, 2012, 2014), specification of other models that incorporate LD (Hoskens & De Boeck, 1997; Ip, 2002), or limited information goodness-of-fit tests (Liu & Maydeu-Olivares, 2013) are also possible.
The Use of the Q3 Fit Statistic
Yen's Q3 is probably the most often reported index in published Rasch analyses due to its inclusion (in the form of the residual correlation matrix) in widely used software such as RUMM (Andrich, Sheridan, & Luo, 2010). Yen (1984) argued that if the IRT model is correct, then the distribution of the Q3 is known, and proposed that p values could be based on the Fisher (1915) z-transform. Chen and Thissen (1997) stated, "In using Q3 to screen items for local dependence, it is more common to use a uniform critical value of an absolute value of 0.2 for the Q3 statistic itself" (pp. 284-285). They went on to present results showing that, although the sampling distribution under the Rasch model is bell shaped, it is not well approximated by the standard normal distribution, especially in the tails (Chen & Thissen, 1997, Figure 3).
In practical applications of the Q3 test statistic, researchers will often compute the complete correlation matrix of residuals and look at the maximum value:

Q_{3,\max} = \max_{i > j} Q_{3,ij}. \quad (4)
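Yen's suggestion of basing p values on the Fisher (1915) z-transform can be sketched for a single correlation as follows (the correlation value and sample size are illustrative; as noted above, this normal approximation is poor in the tails under the Rasch model):

```python
import math

def fisher_z_pvalue(r, n):
    """Two-sided p value for a correlation r computed over n examinees, using
    the Fisher (1915) z-transform: atanh(r) is approximately normal with
    standard deviation 1 / sqrt(n - 3) when the true correlation is zero."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # standard normal survival function expressed via the complementary error function
    p_one_sided = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return 2 * p_one_sided

print(round(fisher_z_pvalue(0.1, 500), 4))
```

For example, a residual correlation of 0.1 over 500 examinees gives a two-sided p value of about 0.025 under this approximation.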

Critical Values of Residual Correlations
When investigating LD based on Yen's Q3, residuals for any pair of items should be uncorrelated, and generally close to zero. Residual correlations that are high indicate a violation of the local independence assumption, and this suggests that the pair of items have something more in common than the rest of the item set have in common with each other (Marais, 2013).
As noted by Yen (1984), a negative bias is built into Q3. This problem is due to the fact that measures of association will be biased away from zero even though the assumption of local independence applies, due to the conditioning on a proxy variable instead of the latent variable (Rosenbaum, 1984). A second problem is that the way the residuals are computed induces a bias (Kreiner & Christensen, 2011). Marais (2013) recognized that the sampling properties among residuals are unknown; therefore, these statistics cannot be used for formal tests of LD. A third, and perhaps the most important, problem in applications is that there are currently no well-documented suggestions of the critical values which should be used to indicate LD, and for this reason, arbitrary rules of thumb are used when evaluating whether an observed correlation is such that it can be reasonably supposed to have arisen from random sampling.
Standards often reported in the literature include looking at residual correlations over the critical value of 0.2, as proposed by Chen and Thissen (1997). For examples of this, see Reeve et al. (2007); Hissbach, Klusmann, and Hampe (2011); Makransky and Bilenberg (2014); and Makransky, Rogers, and Creed (2014). However, other critical values are also used, and there seems to be a wide variation in what is seen as indicative of dependence. Marais and Andrich (2008a) investigated dependence at a critical value of 0.1, but a value of 0.3 has also often been used (see, for example, das Nair, Moreton, & Lincoln, 2011; La Porta et al., 2011; Ramp, Khan, Misajon, & Pallant, 2009; Røe, Damsgård, Fors, & Anke, 2014), and critical values of 0.5 (Davidson, Keating, & Eyres, 2004; Ten Klooster, Taal, & van de Laar, 2008) and even 0.7 (González-de Paz et al., 2015) can be found in use.
There are two fundamental problems with this use of standard critical values: (a) there is limited evidence of their validity, and often no reference for where the values come from, and (b) they are not sensitive to specific characteristics of the data.
Marais (2013) not only identified that the residual correlations are difficult to interpret confidently when there are fewer than 20 items in the item set but also stated that the correlations should always be considered relative to the overall set of correlations. This is because the magnitude of a residual correlation value which indicates LD will vary depending on the number of items in a data set. Instead of an absolute critical value, Marais (2013) suggested that residual correlation values should be compared with the average item residual correlation of the complete data set to give a truer picture of the LD within a data set. It was concluded that when diagnosing response dependence, item residual correlations should be considered relative to each other and in light of the number of items, although there is no indication of a relative critical value (above the average residual correlation) that could indicate LD.
Thus, under the null hypothesis, the average correlation of residuals is negative (cf. Marais, 2013) and, ideally, observed correlations between residuals in a data set should be evaluated with reference to this average value. Marais proposes to evaluate them with reference to the average of the observed correlations rather than the average under the null hypothesis. Thus, following Marais, the average value of the observed correlations could be considered:

\bar{Q}_3 = \binom{I}{2}^{-1} \sum_{i > j} Q_{3,ij}, \quad (5)

where \binom{I}{2} is the number of item pairs, and defines the test statistic:

Q_{3,*} = Q_{3,\max} - \bar{Q}_3, \quad (6)

that compares the largest observed correlation with the average of the observed correlations.
The problem with the currently used critical values is that they are neither theoretically nor empirically based. Researchers and practitioners faced with making scale validation and development decisions need to know what level of LD could be expected, given the properties of their items and data.
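Given a residual-correlation matrix, the adjusted statistic of Equations 5 and 6 reduces to a few lines (the matrix below is a toy illustration, not data from the study):

```python
import numpy as np

def q3_star(Q3):
    """Adjusted statistic: the largest residual correlation minus the average
    over all I*(I-1)/2 item pairs (Equations 5 and 6)."""
    iu = np.triu_indices_from(Q3, k=1)  # one index pair per item pair
    pair_corrs = Q3[iu]
    q3_bar = pair_corrs.mean()          # Equation 5: average over item pairs
    return pair_corrs.max() - q3_bar    # Equation 6: Q3_max - Q3_bar

# toy residual-correlation matrix (illustrative values only)
Q3 = np.array([[ 1.0, -0.1,  0.3],
               [-0.1,  1.0, -0.2],
               [ 0.3, -0.2,  1.0]])
print(round(q3_star(Q3), 3))  # prints 0.3
```

Here the three pair correlations average to 0.0, so the statistic equals the maximum correlation, 0.3.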
A possible solution would be to use a parametric bootstrap approach and simulate the residual correlation matrix several times under the assumption of fit to the Rasch model. This would provide information about the level of residual correlation that could be expected for the particular case, given that the Rasch model fits. To the authors' knowledge, there is no existing research that describes how important characteristics such as the number of items, number of response categories, number of respondents, the distribution of items and persons, and the targeting of the items affect the residual correlations expected, given fit to the Rasch model. In the current study, the possibility of identifying critical values of LD is investigated by examining the distribution of Q3 under the null hypothesis, where the data fit the model. This is done using an empirical example along with a simulation study.
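The parametric bootstrap idea can be sketched as follows: repeatedly simulate data from the fitted model, compute Q3,max for each replicate, and use an upper percentile of the resulting null distribution as the critical value. For brevity, this sketch treats the person parameters as known normal draws rather than re-estimating them in each replication, which a full implementation would do:

```python
import numpy as np

def simulate(theta, b, rng):
    """Draw locally independent dichotomous Rasch responses given theta."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(int)

def q3_max_stat(X, theta, b):
    """Largest off-diagonal residual correlation (Equation 4)."""
    expected = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    Q3 = np.corrcoef(X - expected, rowvar=False)
    iu = np.triu_indices_from(Q3, k=1)
    return Q3[iu].max()

def bootstrap_critical_value(b, n_persons=500, n_boot=200, level=0.95, seed=1):
    """Approximate the null distribution of Q3_max under the Rasch model by
    parametric bootstrap and return its `level` percentile as a critical value."""
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        theta = rng.normal(size=n_persons)   # simplification: theta treated as known
        X = simulate(theta, b, rng)
        stats.append(q3_max_stat(X, theta, b))
    return float(np.quantile(stats, level))

crit = bootstrap_critical_value(b=np.linspace(-2.0, 2.0, 10))
```

An observed Q3,max above `crit` would then indicate more residual correlation than expected under fit to the model, for this particular number of items, sample size, and targeting.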
Given the existence of the wide range of fit statistics with known sampling distributions outlined above, it is surprising that Rasch model applications abound with reporting of Q3 using arbitrary cut-points without theoretical or empirical justification. The reason for this is that the theoretically sound LD indices are not included in the software packages used by practitioners. For this reason, this article presents extensive simulation studies that will (a) illustrate that Q3 should be interpreted with caution and (b) allow researchers to know what level of LD could be expected, given properties of their items and data. Furthermore, these simulation studies will be used to study whether the maximum correlation, or the difference between the maximum correlation and the average correlation, as suggested by Marais (2013), is the most informative.
Thus, the objectives of this article are (a) to provide an overview of the influence of different factors upon the null distribution of residual correlations and (b) to propose guidelines that researchers and practitioners can follow when making decisions about LD during scale development and validation. Two different situations are addressed: first, the situation where the test statistic is computed for all item pairs and only the strongest evidence (the largest correlation) is considered, and second, the less common case where only a single a priori defined item pair is considered.
Simulation Study
Methods
The simulated data sets used are as follows: (a) I dichotomous items simulated from

P(X_i = x \mid \theta) = \frac{\exp(x(\theta - b_i))}{1 + \exp(\theta - b_i)} \quad (i = 1, \ldots, I), \quad (7)

with evenly spaced item difficulties b_i ranging from -2 to 2,

b_i = 2\left(\frac{2(i-1)}{I-1} - 1\right) \quad (i = 1, \ldots, I), \quad (8)

or (b) I polytomous items simulated from

References

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561–573.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.