
Showing papers in "Psychometrika in 2019"


Journal ArticleDOI
TL;DR: In this paper, the authors give the sufficient and necessary condition for identifiability of the basic DINA model, which not only addresses the open problem in Xu and Zhang (Psychometrika 81:625-649, 2016), but also sheds light on the study of more general CDMs, which often cover DINA as a submodel.
Abstract: Cognitive diagnosis models (CDMs) are useful statistical tools in cognitive diagnosis assessment. However, like many other latent variable models, CDMs often suffer from non-identifiability. This work gives the sufficient and necessary condition for identifiability of the basic DINA model, which not only addresses the open problem in Xu and Zhang (Psychometrika 81:625–649, 2016) on the minimal requirement for identifiability, but also sheds light on the study of more general CDMs, which often cover DINA as a submodel. Moreover, we show that the identifiability condition ensures consistent estimation of the model parameters. From a practical perspective, the identifiability condition depends only on the Q-matrix structure and is easy to verify, providing a guideline for designing statistically valid and estimable cognitive diagnosis tests.
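
For reference, the DINA item response function at issue takes the standard form

$$P(X_{ij}=1 \mid \boldsymbol{\alpha}_i) = (1-s_j)^{\eta_{ij}}\, g_j^{\,1-\eta_{ij}}, \qquad \eta_{ij} = \prod_{k=1}^{K} \alpha_{ik}^{\,q_{jk}},$$

where $\boldsymbol{\alpha}_i$ is examinee $i$'s binary attribute profile, $q_{jk}$ are the Q-matrix entries, and $s_j$ and $g_j$ are the slipping and guessing parameters; the identifiability condition is a requirement on the matrix $(q_{jk})$ alone.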

50 citations


Journal ArticleDOI
TL;DR: A notion of statistical consistency is established for a constrained JML estimator, under an asymptotic setting in which the numbers of both items and people grow to infinity and many responses may be missing.
Abstract: Joint maximum likelihood (JML) estimation is one of the earliest approaches to fitting item response theory (IRT) models. This procedure treats both the item and person parameters as unknown but fixed model parameters and estimates them simultaneously by solving an optimization problem. However, the JML estimator is known to be asymptotically inconsistent for many IRT models when the sample size goes to infinity and the number of items is held fixed. Consequently, in the psychometrics literature, this estimator has been less preferred than the marginal maximum likelihood (MML) estimator. In this paper, we re-investigate the JML estimator for high-dimensional exploratory item factor analysis, from both statistical and computational perspectives. In particular, we establish a notion of statistical consistency for a constrained JML estimator, under an asymptotic setting in which the numbers of both items and people grow to infinity and many responses may be missing. A parallel computing algorithm is proposed for this estimator that can scale to very large datasets. Via simulation studies, we show that when the dimensionality is high, the proposed estimator yields similar or even better results than those from the MML estimator, but can be obtained computationally much more efficiently. An illustrative real data example is provided based on the revised version of Eysenck’s Personality Questionnaire (EPQ-R).
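
As a rough illustration of the constrained JML idea (not the authors' exact parallel algorithm), the sketch below alternates projected gradient steps for person and item parameters of an M2PL-type model; the model form, step size, and constraint radius are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cjml(X, k, C=5.0, lr=0.01, n_iter=500, seed=0):
    """Constrained joint maximum likelihood for an exploratory
    M2PL-type model, P(X_ij = 1) = sigmoid(theta_i . a_j).
    Missing responses are encoded as np.nan and excluded."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = ~np.isnan(X)               # indicator of observed entries
    Xf = np.nan_to_num(X)
    theta = rng.normal(scale=0.1, size=(n, k))
    a = rng.normal(scale=0.1, size=(p, k))
    for _ in range(n_iter):
        P = sigmoid(theta @ a.T)
        R = obs * (Xf - P)           # likelihood gradient uses observed cells only
        theta += lr * (R @ a)        # ascent step in person parameters
        a += lr * (R.T @ theta)      # ascent step in item parameters
        for M in (theta, a):         # projection onto the C-ball constraint
            norms = np.linalg.norm(M, axis=1, keepdims=True)
            M *= np.minimum(1.0, C / np.maximum(norms, 1e-12))
    return theta, a
```

The projection step is what distinguishes the constrained estimator from classical JML, and the per-iteration updates factor over persons and items, which is what makes a parallel, large-scale implementation possible.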

44 citations


Journal ArticleDOI
TL;DR: Bayesian estimation of Q using a prior based upon expert knowledge is considered within a fully Bayesian formulation of a general diagnostic model; the method can be used to validate which of the underlying attributes are predicted by experts and to identify residual attributes that remain unexplained by expert knowledge.
Abstract: Cognitive diagnosis models (CDMs) are an important psychometric framework for classifying students in terms of attribute and/or skill mastery. The $\mathbf{Q}$ matrix, which specifies the required attributes for each item, is central to implementing CDMs. The general unavailability of $\mathbf{Q}$ for most content areas and datasets poses a barrier to widespread applications of CDMs, and recent research accordingly developed fully exploratory methods to estimate $\mathbf{Q}$. However, current methods do not always offer clear interpretations of the uncovered skills, and existing exploratory methods do not use expert knowledge to estimate $\mathbf{Q}$. We consider Bayesian estimation of $\mathbf{Q}$ using a prior based upon expert knowledge within a fully Bayesian formulation of a general diagnostic model. The developed method can be used to validate which of the underlying attributes are predicted by experts and to identify residual attributes that remain unexplained by expert knowledge. We report Monte Carlo evidence about the accuracy of selecting active expert-predictors and present an application using Tatsuoka’s fraction-subtraction dataset.

36 citations


Journal ArticleDOI
TL;DR: This case study shows how Warp-III bridge sampling can be used to compute the marginal likelihood for hierarchical MPTs, illustrates the procedure with two published data sets, and demonstrates how Warp-III facilitates Bayesian model averaging.
Abstract: Multinomial processing trees (MPTs) are a popular class of cognitive models for categorical data. Typically, researchers compare several MPTs, each equipped with many parameters, especially when the models are implemented in a hierarchical framework. A Bayesian solution is to compute posterior model probabilities and Bayes factors. Both quantities, however, rely on the marginal likelihood, a high-dimensional integral that cannot be evaluated analytically. In this case study, we show how Warp-III bridge sampling can be used to compute the marginal likelihood for hierarchical MPTs. We illustrate the procedure with two published data sets and demonstrate how Warp-III facilitates Bayesian model averaging.
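
To make the mechanics concrete, here is a minimal sketch of the iterative bridge-sampling estimator on a toy conjugate model where the marginal likelihood is known exactly; the Warp-III step (standardizing and symmetrizing the posterior draws before bridging) is omitted, and all names and settings are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Toy model where the marginal likelihood is known analytically:
# y_i ~ N(theta, 1), prior theta ~ N(0, 1).
n = 20
y = rng.normal(0.4, 1.0, size=n)

def log_q(theta):
    """Unnormalized log posterior: log p(y | theta) + log p(theta)."""
    return (stats.norm.logpdf(y[:, None], theta, 1.0).sum(axis=0)
            + stats.norm.logpdf(theta, 0.0, 1.0))

# Exact posterior (conjugate normal) used here in place of MCMC draws.
post_var = 1.0 / (n + 1.0)
post_mean = y.sum() * post_var
N1 = N2 = 4000
post = rng.normal(post_mean, np.sqrt(post_var), N1)

# Moment-matched normal proposal g.
g_mean, g_sd = post.mean(), post.std(ddof=1)
prop = rng.normal(g_mean, g_sd, N2)

def log_g(theta):
    return stats.norm.logpdf(theta, g_mean, g_sd)

# Meng & Wong (1996) iterative bridge-sampling update for Z = p(y).
s1, s2 = N1 / (N1 + N2), N2 / (N1 + N2)
q_prop, g_prop = np.exp(log_q(prop)), np.exp(log_g(prop))
q_post, g_post = np.exp(log_q(post)), np.exp(log_g(post))
Z = 1e-3
for _ in range(100):
    num = np.mean(q_prop / (s1 * q_prop + s2 * Z * g_prop))
    den = np.mean(g_post / (s1 * q_post + s2 * Z * g_post))
    Z = num / den

exact = stats.multivariate_normal.logpdf(
    y, mean=np.zeros(n), cov=np.eye(n) + np.ones((n, n)))
print(np.log(Z), exact)   # the two values should nearly coincide
```

Warp-III improves on this plain bridge by transforming the posterior sample to better overlap the proposal, which matters for the skewed, high-dimensional posteriors of hierarchical MPTs.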

28 citations


Journal ArticleDOI
TL;DR: Fundamental results for statistical analysis based on diagnostic classification models are established: identifiability results are given for various modeling parameters, notably item response probabilities, the attribute distribution, and the Q-matrix-induced partial information structure.
Abstract: This paper establishes fundamental results for statistical analysis based on diagnostic classification models (DCMs). The results are developed at a high level of generality and are applicable to essentially all diagnostic classification models. In particular, we establish identifiability results for various modeling parameters, notably item response probabilities, attribute distribution, and Q-matrix-induced partial information structure. These results are stated under a general setting of latent class models. Through a nonparametric Bayes approach, we construct an estimator that can be shown to be consistent when the identifiability conditions are satisfied. Simulation results show that these estimators perform well under various model settings. We also apply the proposed method to a dataset from the National Epidemiological Survey on Alcohol and Related Conditions (NESARC).

28 citations


Journal ArticleDOI
TL;DR: An exploratory DM for ordinal data is presented, which uses a cumulative probit link along with Bayesian variable selection techniques to uncover the latent structure underlying the teacher and parent ratings.
Abstract: Diagnostic models (DMs) provide researchers and practitioners with tools to classify respondents into substantively relevant classes. DMs are widely applied to binary response data; however, binary response models are not applicable to the wealth of ordinal data collected by educational, psychological, and behavioral researchers. Prior research developed confirmatory ordinal DMs that require expert knowledge to specify the underlying structure. This paper introduces an exploratory DM for ordinal data. In particular, we present an exploratory ordinal DM, which uses a cumulative probit link along with Bayesian variable selection techniques to uncover the latent structure. Furthermore, we discuss new identifiability conditions for structured multinomial mixture models with binary attributes. We provide evidence of accurate parameter recovery in a Monte Carlo simulation study across moderate to large sample sizes. We apply the model to twelve items from the public-use Early Childhood Longitudinal Study, Kindergarten Class of 1998–1999 (ECLS-K) approaches-to-learning and self-description questionnaires and report evidence to support a three-attribute solution with eight classes to describe the latent structure underlying the teacher and parent ratings. In short, the developed methodology contributes to the development of ordinal DMs and broadens their applicability to address theoretical and substantive issues more generally across the social sciences.

26 citations


Journal ArticleDOI
TL;DR: It is shown that the SA model can recover parameters in the presence of missing values due to time limits and that the response time model, using item-level timing information rather than a count of not-reached items, results in person parameter estimates that differ from missing data IRT models applied to not-reached items.
Abstract: Missing values at the end of a test typically are the result of test takers running out of time and can as such be understood by studying test takers' working speed. As testing moves to computer-based assessment, response times become available, allowing speed and ability to be modeled simultaneously. Integrating research on response time modeling with research on modeling missing responses, we propose using response times to model missing values due to time limits. We identify similarities between approaches used to account for not-reached items (Rose et al. in ETS Res Rep Ser 2010:i-53, 2010) and the speed-accuracy (SA) model for joint modeling of effective speed and effective ability proposed by van der Linden (Psychometrika 72(3):287-308, 2007). In a simulation, we show (a) that the SA model can recover parameters in the presence of missing values due to time limits and (b) that the response time model, using item-level timing information rather than a count of not-reached items, results in person parameter estimates that differ from those of missing data IRT models applied to not-reached items. We propose using the SA model to model the missing data process and using both ability and speed to describe the performance of test takers. We illustrate the application of the model in an empirical analysis.
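
The response-time component of van der Linden's model referenced here is the lognormal model

$$\log T_{ij} = \beta_j - \tau_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N\!\big(0, \alpha_j^{-2}\big),$$

where $\tau_i$ is person $i$'s speed, $\beta_j$ is item $j$'s time intensity, and $\alpha_j$ acts as a time-discrimination parameter. Pairing this with an IRT model for accuracy, and letting ability and speed correlate at a higher level, yields the joint SA framework that treats not-reached items as a consequence of low speed rather than low ability.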

24 citations


Journal ArticleDOI
TL;DR: A new type of analytical approach for item response data that does not require standard local independence assumptions is proposed by adapting a latent space joint modeling approach, which estimates pairwise distances to represent the item and person dependence structures.
Abstract: Item response theory (IRT) is one of the most widely utilized tools for item response analysis; however, local item and person independence, which is a critical assumption for IRT, is often violated in real testing situations. In this article, we propose a new type of analytical approach for item response data that does not require standard local independence assumptions. By adapting a latent space joint modeling approach, our proposed model can estimate pairwise distances to represent the item and person dependence structures, from which item and person clusters in latent spaces can be identified. We provide an empirical data analysis to illustrate an application of the proposed method. A simulation study is provided to evaluate the performance of the proposed method in comparison with existing methods.
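
A generic form of such a latent space item response model (notation mine, as a sketch of the idea) is

$$\operatorname{logit} P(X_{pi}=1) = \theta_p + \beta_i - \gamma\, \lVert \mathbf{z}_p - \mathbf{w}_i \rVert,$$

where $\mathbf{z}_p$ and $\mathbf{w}_i$ are person and item positions in a low-dimensional latent space: residual dependence that a standard IRT model would leave in the data is absorbed by the distance term, and clusters of nearby persons or items can be read off the estimated positions.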

22 citations


Journal ArticleDOI
TL;DR: In this article, the authors compare conditional and marginal Deviance Information Criteria (DICs) and Watanabe-Akaike Information Criteria (WAICs), and show that marginal WAIC corresponds to leave-one-cluster-out cross-validation, whereas conditional WAIC corresponds to leave-one-unit-out.
Abstract: Typical Bayesian methods for models with latent variables (or random effects) involve directly sampling the latent variables along with the model parameters. In high-level software code for model definitions (using, e.g., BUGS, JAGS, Stan), the likelihood is therefore specified as conditional on the latent variables. This can lead researchers to perform model comparisons via conditional likelihoods, where the latent variables are considered model parameters. In other settings, however, typical model comparisons involve marginal likelihoods where the latent variables are integrated out. This distinction is often overlooked despite the fact that it can have a large impact on the comparisons of interest. In this paper, we clarify and illustrate these issues, focusing on the comparison of conditional and marginal Deviance Information Criteria (DICs) and Watanabe-Akaike Information Criteria (WAICs) in psychometric modeling. The conditional/marginal distinction corresponds to whether the model should be predictive for the clusters that are in the data or for new clusters (where "clusters" typically correspond to higher-level units like people or schools). Correspondingly, we show that marginal WAIC corresponds to leave-one-cluster-out cross-validation, whereas conditional WAIC corresponds to leave-one-unit-out. These results lead to recommendations on the general application of the criteria to models with latent variables.
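
The distinction is easy to state in code. In the sketch below (all names and the random-intercept normal model are illustrative, and fabricated draws stand in for real MCMC output), conditional WAIC scores the pointwise likelihood given the sampled latent intercepts, while marginal WAIC scores each cluster's likelihood with the intercept integrated out, which is available in closed form for this normal model:

```python
import numpy as np
from scipy import stats

def waic(log_lik):
    """WAIC from an (S draws x N units) matrix of log-likelihoods."""
    lppd = np.sum(np.log(np.mean(np.exp(log_lik), axis=0)))
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2 * (lppd - p_waic)

# Fake data for a random-intercept model:
# y_cj = mu + eta_c + e_cj,  eta_c ~ N(0, tau^2),  e_cj ~ N(0, sigma^2).
rng = np.random.default_rng(0)
C, J, S = 30, 5, 2000
eta_true = rng.normal(0, 1.0, C)
y = 0.5 + eta_true[:, None] + rng.normal(0, 0.8, (C, J))

# Stand-in "posterior draws" (in practice: MCMC output).
mu = rng.normal(0.5, 0.05, S)
tau = np.abs(rng.normal(1.0, 0.05, S))
sigma = np.abs(rng.normal(0.8, 0.03, S))
eta = rng.normal(eta_true, 0.2, (S, C))   # draws of the latent intercepts

# Conditional log-likelihood: one column per unit (c, j), eta as parameter.
cond = stats.norm.logpdf(y[None, :, :],
                         mu[:, None, None] + eta[:, :, None],
                         sigma[:, None, None]).reshape(S, C * J)

# Marginal log-likelihood: one column per cluster, eta integrated out
# (closed form here because everything is normal: cov = sigma^2 I + tau^2 J).
marg = np.empty((S, C))
for s in range(S):
    cov = sigma[s] ** 2 * np.eye(J) + tau[s] ** 2
    marg[s] = stats.multivariate_normal(np.full(J, mu[s]), cov).logpdf(y)

print("conditional WAIC:", waic(cond))   # predictive for the observed clusters
print("marginal WAIC:  ", waic(marg))    # predictive for new clusters
```

With real posterior draws, the conditional criterion answers "how well would the model predict new responses from these same clusters?", while the marginal criterion targets prediction for new clusters, which is usually the comparison psychometricians intend.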

22 citations


Journal ArticleDOI
TL;DR: In computerized adaptive testing (CAT), a variable-length stopping rule ends item administration once a pre-specified measurement precision standard has been satisfied; this paper presents analytic results connecting such rules and argues that the "absolute change in theta" (CT) rule is best used as a secondary rule for monitoring the point of diminished returns.
Abstract: In computerized adaptive testing (CAT), a variable-length stopping rule refers to ending item administration after a pre-specified measurement precision standard has been satisfied. The goal is to provide equal measurement precision for all examinees regardless of their true latent trait level. Several stopping rules have been proposed in unidimensional CAT, such as the minimum information rule or the maximum standard error rule. These rules have also been extended to multidimensional CAT and cognitive diagnostic CAT, and they all share the same idea of monitoring measurement error. Recently, Babcock and Weiss (J Comput Adapt Test 2012. https://doi.org/10.7333/1212-0101001) proposed an “absolute change in theta” (CT) rule, which is useful when an item bank is exhaustive of good items for one or more ranges of the trait continuum. Choi, Grady and Dodd (Educ Psychol Meas 70:1–17, 2010) also argued that a CAT should stop when the standard error does not change, implying that the item bank is likely exhausted. Although these stopping rules have been evaluated and compared in different simulation studies, the relationships among the various rules remain unclear, and a clear guideline regarding when to use which rule has been lacking. This paper presents analytic results to show the connections among various stopping rules within both unidimensional and multidimensional CAT. In particular, it is argued that the CT-rule alone can be unstable and can end the test prematurely. However, the CT-rule can be a useful secondary rule to monitor the point of diminished returns. To further provide empirical evidence, three simulation studies are reported using both the 2PL model and the multidimensional graded response model.
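
The following toy simulation (hypothetical 2PL item bank, illustrative thresholds, EAP scoring on a grid) shows how a standard-error rule and a change-in-theta rule can be combined as primary and secondary stopping criteria:

```python
import numpy as np

rng = np.random.default_rng(7)
grid = np.linspace(-4, 4, 161)                 # quadrature grid for EAP
prior = np.exp(-0.5 * grid**2); prior /= prior.sum()

# Hypothetical 2PL bank: a = discrimination, b = difficulty.
n_items = 300
a = rng.lognormal(0.0, 0.3, n_items)
b = rng.normal(0.0, 1.0, n_items)

def p2pl(theta, j):
    return 1.0 / (1.0 + np.exp(-a[j] * (theta - b[j])))

def eap(responses, items):
    """Posterior mean and SD of theta given the responses so far."""
    like = prior.copy()
    for x, j in zip(responses, items):
        p = p2pl(grid, j)
        like *= p if x else (1.0 - p)
    like /= like.sum()
    m = np.sum(grid * like)
    return m, np.sqrt(np.sum((grid - m) ** 2 * like))

def run_cat(theta_true, se_stop=0.3, ct_stop=0.01, max_len=60):
    used, resp, th = [], [], 0.0
    while len(used) < max_len:
        p = p2pl(th, np.arange(n_items))
        info = a**2 * p * (1 - p)              # 2PL Fisher information at th
        info[used] = -np.inf                   # never reuse an item
        j = int(np.argmax(info))               # maximum-information selection
        used.append(j)
        resp.append(rng.random() < p2pl(theta_true, j))
        new_th, se = eap(resp, used)
        if se < se_stop:                       # primary: standard-error rule
            return new_th, se, len(used), "SE rule"
        if len(used) > 1 and abs(new_th - th) < ct_stop:
            return new_th, se, len(used), "CT rule"  # secondary: diminished returns
        th = new_th
    return th, se, len(used), "max length"

print(run_cat(theta_true=1.2))
```

Consistent with the paper's argument, the CT check here serves only as a fallback that ends the test when successive estimates stop moving; used alone, it could fire on an early chance agreement and stop the test prematurely.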

16 citations


Journal ArticleDOI
TL;DR: This paper proposes a Bayesian estimation method by adopting a new data-augmentation strategy in uni- and multidimensional IRT models based on the Pólya–Gamma family of distributions which provides a closed-form posterior distribution for logistic-based models.
Abstract: Fully Bayesian estimation of item response theory models with logistic link functions suffers from low computational efficiency due to posterior density functions that do not have known forms. To improve computational efficiency, this paper proposes a Bayesian estimation method that adopts a new data-augmentation strategy in uni- and multidimensional IRT models. The strategy is based on the Pólya-Gamma family of distributions, which provides closed-form posterior distributions for logistic-based models. An overview of Pólya-Gamma distributions is given within a logistic regression framework. In addition, we provide details on deriving the conditional distributions of the IRT models, incorporating Pólya-Gamma distributions into these conditional distributions to construct Bayesian samplers, and drawing from the samplers so that faster convergence can be achieved. Simulation studies and applications to real datasets demonstrate the efficiency and utility of the proposed method.
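
A minimal sketch of the augmentation in the logistic regression setting the paper starts from; the PG(1, c) draw below uses a truncated version of the infinite gamma-sum representation (dedicated samplers are faster and exact), and all variable names and prior settings are illustrative, not the paper's IRT implementation:

```python
import numpy as np

rng = np.random.default_rng(3)

def pg_draw(c, trunc=200):
    """Approximate PG(1, c) draw via the truncated infinite-sum
    representation of Polson, Scott & Windle (2013)."""
    k = np.arange(1, trunc + 1)
    c = np.atleast_1d(c)
    g = rng.gamma(1.0, 1.0, size=(len(c), trunc))
    denom = (k - 0.5) ** 2 + (c[:, None] / (2 * np.pi)) ** 2
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)

# Simulated logistic-regression data (stand-in for an IRT likelihood;
# the paper's samplers apply the same augmentation item by item).
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([-0.5, 1.0, 0.8])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

B0_inv = np.eye(p) / 10.0            # N(0, 10 I) prior on beta
kappa = y - 0.5
beta = np.zeros(p)
draws = []
for it in range(1000):
    omega = pg_draw(X @ beta)                    # 1) augmentation step
    V = np.linalg.inv(X.T @ (X * omega[:, None]) + B0_inv)
    m = V @ (X.T @ kappa)                        # 2) Gaussian update for beta
    beta = rng.multivariate_normal(m, V)
    if it >= 300:
        draws.append(beta)
print(np.mean(draws, axis=0), beta_true)         # posterior mean vs truth
```

Conditional on the augmented variables omega, the logistic likelihood becomes Gaussian in beta, which is what yields the closed-form conditional distributions the abstract refers to.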

Journal ArticleDOI
TL;DR: It is proven that parceling, i.e., using composites of observed variables as indicators for a common factor, strengthens loadings but reduces the number of indicators, and cannot reduce factor indeterminacy.
Abstract: Parceling—using composites of observed variables as indicators for a common factor—strengthens loadings, but reduces the number of indicators. Factor indeterminacy is reduced when there are many observed variables per factor, and when loadings and factor correlations are strong. It is proven that parceling cannot reduce factor indeterminacy. In special cases where the ratio of loading to residual variance is the same for all items included in each parcel, factor indeterminacy is unaffected by parceling. Otherwise, parceling worsens factor indeterminacy. While factor indeterminacy does not affect the parameter estimates, standard errors, or fit indices associated with a factor model, it does create uncertainty, which endangers valid inference.

Journal ArticleDOI
TL;DR: It is demonstrated that the filtered monotonic polynomial (FMP) item response model can be used to specify item response models on metrics other than the $\theta$ metric.
Abstract: The $\theta$ metric in item response theory is often not the most useful metric for score reporting or interpretation. In this paper, I demonstrate that the filtered monotonic polynomial (FMP) item response model, a recently proposed nonparametric item response model (Liang & Browne in J Educ Behav Stat 40:5–34, 2015), can be used to specify item response models on metrics other than the $\theta$ metric. Specifically, I demonstrate that any item response function (IRF) defined within the FMP framework can be re-expressed as another FMP IRF by taking monotonic transformations of the latent trait. I derive the item parameter transformations that correspond to both linear and nonlinear transformations of the latent trait metric. These item parameter transformations can be used to define an item response model based on any monotonic transformation of the $\theta$ metric, so long as the metric transformation is approximated by a monotonic polynomial. I demonstrate this result by defining an item response model directly on the approximate true score metric and discuss the implications of metric transformations for applied testing situations.
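
In the FMP framework, each item response function is a logistic function of a monotonic polynomial,

$$P(X_j = 1 \mid \theta) = \frac{1}{1+\exp\{-m_j(\theta)\}}, \qquad m_j(\theta) = b_{0j} + b_{1j}\theta + \cdots + b_{2k+1,j}\,\theta^{2k+1},$$

with $m_j$ constrained to be increasing. Since the composition of increasing polynomials is again an increasing polynomial, re-expressing $\theta$ through a monotonic polynomial transformation keeps every IRF inside the FMP family; this closure property is what the derived item parameter transformations exploit.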

Journal ArticleDOI
TL;DR: This work proposes a new perspective for studying robustness toward distributional misspecification in ordinal models using a class of non-normal ordinal covariance models, shows how to simulate data from such models, and reports simulation results indicating that standard methodology is sensitive to violation of normality.
Abstract: A standard approach for handling ordinal data in covariance analysis such as structural equation modeling is to assume that the data were produced by discretizing a multivariate normal vector. Recently, concern has been raised that this approach may be less robust to violation of the normality assumption than previously reported. We propose a new perspective for studying the robustness toward distributional misspecification in ordinal models using a class of non-normal ordinal covariance models. We show how to simulate data from such models, and our simulation results indicate that standard methodology is sensitive to violation of normality. This emphasizes the importance of testing distributional assumptions in empirical studies. We include simulation results on the performance of such tests.

Journal ArticleDOI
TL;DR: A modified approach, penalized best linear prediction, is proposed that weights both the mean square error of prediction and a quadratic measure of subgroup biases; the methodology is applied to three high-stakes writing assessments.
Abstract: In best linear prediction (BLP), a true test score is predicted by observed item scores and by ancillary test data. If the use of BLP rather than a more direct estimate of a true score has disparate impact for different demographic groups, then a fairness issue arises. To improve population invariance but to preserve much of the efficiency of BLP, a modified approach, penalized best linear prediction, is proposed that weights both mean square error of prediction and a quadratic measure of subgroup biases. The proposed methodology is applied to three high-stakes writing assessments.
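
One way to write such a penalized objective (notation mine; the paper's exact weighting may differ) is

$$\min_{\mathbf{w}}\; \mathbb{E}\big[(T-\mathbf{w}^{\top}\mathbf{z})^{2}\big] + \lambda \sum_{g} \pi_{g}\, \big(\mathbb{E}_{g}\big[T-\mathbf{w}^{\top}\mathbf{z}\big]\big)^{2},$$

where $T$ is the true score, $\mathbf{z}$ collects item scores and ancillary data, $\mathbb{E}_g$ denotes expectation within demographic group $g$ with weight $\pi_g$, and $\lambda$ trades prediction efficiency against subgroup bias; $\lambda = 0$ recovers ordinary BLP.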

Journal ArticleDOI
TL;DR: A two-stage least squares (2SLS) estimator is used to jointly assess measurement invariance and prediction invariance in high-stakes testing; evidence is found that group differences in SAT-M measurement intercepts may partly explain the well-known finding of observed differences in prediction intercepts.
Abstract: The existence of differences in prediction systems involving test scores across demographic groups continues to be a thorny and unresolved scientific, professional, and societal concern. Our case study uses a two-stage least squares (2SLS) estimator to jointly assess measurement invariance and prediction invariance in high-stakes testing. Accordingly, we examined differences across groups based on latent, as opposed to observed, scores, using data from The College Board for 176 colleges and universities. Results showed that measurement invariance was rejected for the SAT mathematics (SAT-M) subtest at the 0.01 level for 74.5% and 29.9% of cohorts for Black versus White and Hispanic versus White comparisons, respectively. Also, on average, Black students with the same standing on a common factor had observed SAT-M scores that were nearly a third of a standard deviation lower than for comparable Whites. We also found evidence that group differences in SAT-M measurement intercepts may partly explain the well-known finding of observed differences in prediction intercepts. Additionally, results provided evidence that nearly a quarter of the statistically significant observed intercept differences were not statistically significant at the 0.05 level once predictor measurement error was accounted for using the 2SLS procedure. Our joint measurement and prediction invariance approach based on latent scores opens the door to a new high-stakes testing research agenda whose goal is not simply to assess whether observed group-based differences exist, and their size and direction. Rather, the goal of this research agenda is to assess the causal chain starting with underlying theoretical mechanisms (e.g., contextual factors, differences in latent predictor scores) that affect the size and direction of any observed differences.

Journal ArticleDOI
TL;DR: This work provides full statistical theory for RR of IRT models under the framework of pseudo-maximum likelihood estimation, and describes the standard error calculation for the focal parameters, the assessment of overall goodness-of-fit (GOF) of the model, and the identification of misfitting items.
Abstract: In item response theory (IRT), it is often necessary to perform restricted recalibration (RR) of the model: A set of (focal) parameters is estimated holding a set of (nuisance) parameters fixed. Typical applications of RR include expanding an existing item bank, linking multiple test forms, and associating constructs measured by separately calibrated tests. In the current work, we provide full statistical theory for RR of IRT models under the framework of pseudo-maximum likelihood estimation. We describe the standard error calculation for the focal parameters, the assessment of overall goodness-of-fit (GOF) of the model, and the identification of misfitting items. We report a simulation study to evaluate the performance of these methods in the scenario of adding a new item to an existing test. Parameter recovery for the focal parameters as well as Type I error and power of the proposed tests are examined. An empirical example is also included, in which we validate the pediatric fatigue short-form scale in the Patient-Reported Outcome Measurement Information System (PROMIS), compute global and local GOF statistics, and update parameters for the misfitting items.

Journal ArticleDOI
TL;DR: The theoretical arguments in favor of optimal scoring are supplemented with the results from simulation experiments, and the analysis of test data suggests that sum-scored tests would need to be longer than an optimally scored test in order to attain the same level of accuracy.
Abstract: The aim of this paper is to discuss nonparametric item response theory scores in terms of optimal scores as an alternative to parametric item response theory scores and sum scores. Optimal scores take advantage of the interaction between performance and item impact that is evident in most testing data. The theoretical arguments in favor of optimal scoring are supplemented with the results from simulation experiments, and the analysis of test data suggests that sum-scored tests would need to be longer than an optimally scored test in order to attain the same level of accuracy. Because optimal scoring is built on a nonparametric procedure, it also offers a flexible alternative for estimating item characteristic curves that can fit items that do not show good fit to item response theory models.

Journal ArticleDOI
TL;DR: It is shown that discretized data stemming from the VM method with a prescribed target covariance matrix are usually numerically equal to data stemming from discretizing a multivariate normal vector whose covariance matrix differs from the target.
Abstract: Previous influential simulation studies investigate the effect of underlying non-normality in ordinal data using the Vale–Maurelli (VM) simulation method. We show that discretized data stemming from the VM method with a prescribed target covariance matrix are usually numerically equal to data stemming from discretizing a multivariate normal vector. This normal vector has, however, a different covariance matrix than the target. It follows that these simulation studies have in fact studied discretized normal data with a possibly misspecified covariance structure. This observation affects the interpretation of previous simulation studies.
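
The mechanism is easy to demonstrate numerically. In the sketch below (illustrative cubic coefficients, not calibrated to a particular skewness and kurtosis), the VM marginal transform is a strictly increasing cubic, so thresholding the transformed variable is identical to thresholding the underlying normal at back-transformed cut points:

```python
import numpy as np

rng = np.random.default_rng(5)

# Underlying bivariate normal with an "intermediate" correlation,
# as used inside the Vale-Maurelli procedure.
R = np.array([[1.0, 0.6], [0.6, 1.0]])
Z = rng.multivariate_normal([0, 0], R, size=10_000)

# Example Fleishman-style cubic, chosen so it is strictly increasing
# (b + 2cz + 3dz^2 > 0 for all z); coefficients are illustrative.
a_c, b_c, c_c, d_c = 0.0, 0.9, 0.2, 0.05
Y = a_c + b_c * Z + c_c * Z**2 + d_c * Z**3      # non-normal continuous data

tau = np.array([-0.5, 0.8])                      # ordinal cut points on Y
ordinal_vm = (Y[..., None] > tau).sum(axis=-1)   # discretized VM data

# Because the cubic is monotone, the same ordinal data arise from
# thresholding the *normal* Z at back-transformed cut points.
def inv_cubic(t):
    roots = np.roots([d_c, c_c, b_c, a_c - t])   # unique real root
    return roots[np.argmin(np.abs(roots.imag))].real

tau_z = np.array([inv_cubic(t) for t in tau])
ordinal_norm = (Z[..., None] > tau_z).sum(axis=-1)

print(np.array_equal(ordinal_vm, ordinal_norm))  # True: identical data
```

The ordinal data are therefore a discretization of the normal vector Z, whose correlation matrix is the intermediate matrix R rather than the non-normal target, which is the paper's point.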

Journal ArticleDOI
TL;DR: A model averaging technique within the frequentist statistical framework is proposed in which, instead of selecting an optimal model, the contributions of all candidate models are acknowledged; it is an interesting compromise between model selection and the full model.
Abstract: Model selection from a set of candidate models plays an important role in many structural equation modelling applications. However, traditional model selection methods introduce extra randomness that is not accounted for by post-model selection inference. In the current study, we propose a model averaging technique within the frequentist statistical framework. Instead of selecting an optimal model, the contributions of all candidate models are acknowledged. Valid confidence intervals and a [Formula: see text] test statistic are proposed. A simulation study shows that the proposed method is able to produce a robust mean-squared error, a better coverage probability, and a better goodness-of-fit test compared to model selection. It is an interesting compromise between model selection and the full model.

Journal ArticleDOI
TL;DR: A class of conjugate priors is proposed for the random-effect variance parameters in the BCSM framework; these priors support testing the presence of random effects, reduce boundary effects by allowing non-positive (co)variance parameters, and enable accurate estimation even for very small true variance parameters.
Abstract: A multivariate generalization of the log-normal model for response times is proposed within an innovative Bayesian modeling framework. A novel Bayesian Covariance Structure Model (BCSM) is proposed, in which the inclusion of random-effect variables is avoided, while their implied dependencies are modeled directly through an additive covariance structure. This makes it possible to jointly model complex dependencies due to, for instance, the test format (e.g., testlets, complex constructs), time limits, or features of digitally based assessments. A class of conjugate priors is proposed for the random-effect variance parameters in the BCSM framework. They give support to testing the presence of random effects, reduce boundary effects by allowing non-positive (co)variance parameters, and support accurate estimation even for very small true variance parameters. The conjugate priors under the BCSM lead to efficient posterior computation. Bayes factors and the Bayesian Information Criterion are discussed for the purpose of model selection in the new framework. In two simulation studies, a satisfying performance of the MCMC algorithm and of the Bayes factor is shown. In comparison with parameter expansion through a half-Cauchy prior, estimates of variance parameters close to zero show no bias, and undercoverage of credible intervals is avoided. An empirical example showcases the utility of the BCSM for response times to test the influence of item presentation formats on the test performance of students in a Latin square experimental design.
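
The core idea can be written compactly. For the log response times of a block of items (say, a testlet), a random block effect with variance $\delta$ is never sampled; its implied dependence is instead added directly to the covariance, giving, in simplified notation,

$$\operatorname{Cov}(\log \mathbf{T}) = \sigma^{2}\mathbf{I} + \delta\, \mathbf{1}\mathbf{1}^{\top},$$

so that testing for the block effect reduces to testing $\delta = 0$, and $\delta$ is even allowed to be slightly negative, which is what removes the boundary problems of the usual random-effects parameterization.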

Journal ArticleDOI
TL;DR: In this article, three methods that take into account heteroscedastic measurement errors of the dependent variable in stage II analysis are introduced; they are the closed-form marginal MLE, the expectation maximization algorithm, and the moment estimation method.
Abstract: When latent variables are used as outcomes in regression analysis, a common approach to the otherwise ignored measurement error issue is to take a multilevel perspective on item response theory (IRT) modeling. Although recent computational advances allow efficient and accurate estimation of multilevel IRT models, we argue that a two-stage divide-and-conquer strategy still has its unique advantages. Within the two-stage framework, three methods that take into account heteroscedastic measurement errors of the dependent variable in the stage II analysis are introduced: the closed-form marginal MLE, the expectation maximization algorithm, and the moment estimation method. They are compared to the naive two-stage estimation and the one-stage MCMC estimation. A simulation study is conducted to compare the five methods in terms of model parameter recovery and standard error estimation. The pros and cons of each method are also discussed to provide guidelines for practitioners. Finally, a real data example is given to illustrate the applications of the various methods using the National Education Longitudinal Study data (NELS:88).
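
A minimal sketch of the marginal-likelihood idea behind the first method (solved numerically here rather than in closed form; all names and the simulated stage-I standard errors are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def marginal_mle(theta_hat, se, X):
    """Stage-II regression of IRT ability estimates on covariates,
    acknowledging heteroscedastic measurement error:
        theta_hat_i ~ N(x_i' beta, sigma^2 + se_i^2)."""
    def nll(par):
        beta, log_s2 = par[:-1], par[-1]
        var = np.exp(log_s2) + se**2          # total variance per person
        resid = theta_hat - X @ beta
        return 0.5 * np.sum(np.log(2 * np.pi * var) + resid**2 / var)
    p = X.shape[1]
    start = np.append(np.linalg.lstsq(X, theta_hat, rcond=None)[0], 0.0)
    fit = minimize(nll, start, method="BFGS")
    return fit.x[:p], np.exp(fit.x[p])        # beta-hat, sigma^2-hat

# Synthetic example with known truth.
rng = np.random.default_rng(11)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta = X @ np.array([0.2, 0.5]) + rng.normal(0, 0.4, n)
se = rng.uniform(0.2, 0.6, n)                 # stage-I standard errors
theta_hat = theta + rng.normal(0, se)         # noisy ability estimates
print(marginal_mle(theta_hat, se, X))         # sigma^2 close to 0.4^2
```

Naive two-stage estimation ignores the stage-I standard errors and therefore inflates the residual variance and misstates uncertainty; weighting each person's contribution by the total variance restores a consistent picture.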

Journal ArticleDOI
TL;DR: The modified signed likelihood ratio test and the Lugannani–Rice approximation, both of which are based on higher-order asymptotics, are shown to provide some improvement over the traditional frequentist approaches in three simulations.
Abstract: In educational and psychological measurement, researchers and/or practitioners are often interested in examining whether the ability of an examinee is the same over two sets of items. Such problems can arise in measurement of change, detection of cheating on unproctored tests, erasure analysis, detection of item preknowledge, etc. Traditional frequentist approaches that are used in such problems include the Wald test, the likelihood ratio test, and the score test (e.g., Fischer, Appl Psychol Meas 27:3–26, 2003; Finkelman, Weiss, & Kim-Kang, Appl Psychol Meas 34:238–254, 2010; Glas & Dagohoy, Psychometrika 72:159–180, 2007; Guo & Drasgow, Int J Sel Assess 18:351–364, 2010; Klauer & Rettig, Br J Math Stat Psychol 43:193–206, 1990; Sinharay, J Educ Behav Stat 42:46–68, 2017). This paper shows that approaches based on higher-order asymptotics (e.g., Barndorff-Nielsen & Cox, Inference and asymptotics. Springer, London, 1994; Ghosh, Higher order asymptotics. Institute of Mathematical Statistics, Hayward, 1994) can also be used to test for the equality of the examinee ability over two sets of items. The modified signed likelihood ratio test (e.g., Barndorff-Nielsen, Biometrika 73:307–322, 1986) and the Lugannani–Rice approximation (Lugannani & Rice, Adv Appl Prob 12:475–490, 1980), both of which are based on higher-order asymptotics, are shown to provide some improvement over the traditional frequentist approaches in three simulations. Two real data examples are also provided.
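
In outline, with $\ell$ the log-likelihood for the ability $\theta$ and suppressing the exact form of the data-dependent adjustment $u$, the two higher-order quantities are

$$r(\theta_0)=\operatorname{sign}(\hat{\theta}-\theta_0)\sqrt{2\big\{\ell(\hat{\theta})-\ell(\theta_0)\big\}},\qquad r^{*}(\theta_0)= r(\theta_0)+\frac{1}{r(\theta_0)}\log\frac{u(\theta_0)}{r(\theta_0)},$$

and p-values computed from $\Phi(r^{*})$, or from the closely related Lugannani–Rice tail-area approximation, achieve higher-order accuracy than the first-order Wald, score, and plain likelihood ratio statistics.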

Journal ArticleDOI
TL;DR: The paper extends existing research on adaptivity through discontinue rules in intelligence tests in two ways: first, an analytical study of the distributional properties of discontinue-rule-scored items is presented; second, a simulation is reported that includes additional scoring rules and ability estimators that may reduce bias for discontinue-rule-scored intelligence tests.
Abstract: This paper provides results on a form of adaptive testing that is used frequently in intelligence testing. In these tests, items are presented in order of increasing difficulty. The presentation of items is adaptive in the sense that a session is discontinued once a test taker produces a certain number of incorrect responses in sequence, with subsequent (not observed) responses commonly scored as wrong. The Stanford-Binet Intelligence Scales (SB5; Riverside Publishing Company, 2003), the Kaufman Assessment Battery for Children (KABC-II; Kaufman and Kaufman, 2004), the Kaufman Adolescent and Adult Intelligence Test (Kaufman and Kaufman, 2014), and the Universal Nonverbal Intelligence Test (2nd ed.; Bracken and McCallum, 2015) are some of the many examples using this rule. He and Wolfe (Educ Psychol Meas 72(5):808-826, 2012. https://doi.org/10.1177/0013164412441937 ) compared different ability estimation methods in a simulation study for this discontinue-rule adaptation of test length. However, to our knowledge, there has been no study, based on analytic arguments drawing on probability theory, of the underlying distributional properties of what these authors call stochastic censoring of responses. The study results obtained by He and Wolfe (2012) agree with results presented by DeAyala et al. (J Educ Meas 38:213-234, 2001) as well as Rose et al. (Modeling non-ignorable missing data with item response theory (IRT; ETS RR-10-11), Educational Testing Service, Princeton, 2010) and Rose et al. (Psychometrika 82:795-819, 2017. https://doi.org/10.1007/s11336-016-9544-7 ) in that ability estimates are biased most when scoring the not-observed responses as wrong. This scoring is used operationally, so more research is needed in order to improve practice in this field. The paper extends existing research on adaptivity by discontinue rules in intelligence tests in multiple ways: First, an analytical study of the distributional properties of discontinue-rule-scored items is presented. Second, a simulation is presented that includes additional scoring rules and uses ability estimators that may be suitable to reduce bias for discontinue-rule-scored intelligence tests.

Journal ArticleDOI
TL;DR: It is found that yea-saying tendencies depend on whether items are presented as part of a scale that contains affirmative and/or polar-opposite items, and that the contextual information provided by an item scale can serve as a determinant of differential item functioning.
Abstract: This paper presents a systematic investigation of how affirmative and polar-opposite items presented either jointly or separately affect yea-saying tendencies. We measure these yea-saying tendencies with item response models that estimate a respondent’s tendency to give a “yea”-response that may be unrelated to the target trait. In a re-analysis of the Zhang et al. (PLoS ONE, 11:1–15, 2016) data, we find that yea-saying tendencies depend on whether items are presented as part of a scale that contains affirmative and/or polar-opposite items. Yea-saying tendencies are stronger for affirmative than for polar-opposite items. Moreover, presenting polar-opposite items together with affirmative items creates lower yea-saying tendencies for polar-opposite items than when presented in isolation. IRT models that do not account for these yea-saying effects arrive at a two-dimensional representation of the target trait. These findings demonstrate that the contextual information provided by an item scale can serve as a determinant of differential item functioning.

Journal ArticleDOI
TL;DR: The aim of this paper is to generalize the usual characterization of local independence without introducing new parameters, to merge the information provided by the IRT and KST perspectives, and to contribute to the literature that bridges continuous and discrete theories of assessment.
Abstract: Knowledge space theory (KST) structures are introduced within item response theory (IRT) as a possible way to model local dependence between items. The aim of this paper is threefold: firstly, to generalize the usual characterization of local independence without introducing new parameters; secondly, to merge the information provided by the IRT and KST perspectives; and thirdly, to contribute to the literature that bridges continuous and discrete theories of assessment. In detail, connections are established between the KST simple learning model (SLM) and the IRT General Graded Response Model, and between the KST Basic Local Independence Model and IRT models in general. As a consequence, local independence is generalized to account for the existence of prerequisite relations between the items, IRT models become a subset of KST models, IRT likelihood functions can be generalized to broader families, and the issues of local dependence and dimensionality are partially disentangled. Models are discussed for both dichotomous and polytomous items and conclusions are drawn on their interpretation. Considerations on possible consequences in terms of model identifiability and estimation procedures are also provided.
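
The Basic Local Independence Model (BLIM) that anchors the KST side assigns to each response pattern $R$ the probability

$$P(R) = \sum_{K \in \mathcal{K}} \pi_K \prod_{q \in K \setminus R} \beta_q \prod_{q \in K \cap R} (1-\beta_q) \prod_{q \in R \setminus K} \eta_q \prod_{q \notin K \cup R} (1-\eta_q),$$

where $\mathcal{K}$ is the knowledge structure, $\pi_K$ is the probability of knowledge state $K$, and $\beta_q$ and $\eta_q$ are the careless-error and lucky-guess parameters of item $q$; the paper shows how IRT models can be recovered as special cases of this scheme once prerequisite relations between items are built into the states.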

Journal ArticleDOI
TL;DR: This work proposes to apply the generalized method of moments (GMoM), using more statistics than parameters, to analyse dynamic network data, and describes the stochastic algorithm developed to approximate the GMoM solution.
Abstract: Stochastic actor-oriented models (SAOMs) can be used to analyse dynamic network data, collected by observing a network and a behaviour in a panel design. The parameters of SAOMs are usually estimated by the method of moments (MoM) implemented by a stochastic approximation algorithm, where statistics defining the moment conditions correspond in a natural way to the parameters. Here, we propose to apply the generalized method of moments (GMoM), using more statistics than parameters. We concentrate on statistics depending jointly on the network and the behaviour, because of the importance of their interdependence, and propose to add contemporaneous statistics to the usual cross-lagged statistics. We describe the stochastic algorithm developed to approximate the GMoM solution. A small simulation study supports the greater statistical efficiency of the GMoM estimator compared to the MoM.
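
In generic GMM notation, the estimator uses a vector of statistics $s(\cdot)$ longer than the parameter vector and solves

$$\hat{\theta}_{\mathrm{GMoM}} = \arg\min_{\theta}\; g(\theta)^{\top} \mathbf{W}\, g(\theta), \qquad g(\theta) = \mathbb{E}_{\theta}\big[s(Y)\big] - s(y_{\mathrm{obs}}),$$

where the expectation is approximated by simulating the network-behaviour co-evolution process from the SAOM, and $\mathbf{W}$ is a positive-definite weight matrix, optimally the inverse of the covariance matrix of the statistics. The ordinary MoM is the special case with exactly as many statistics as parameters, so that $g(\hat{\theta}) = 0$ holds exactly.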

Journal ArticleDOI
TL;DR: The academic genealogy of presidents of the Psychometric Society is presented as a genealogical tree, in which Ph.D. students are encoded as descendants of their advisors; most of the presidents belong to five distinct lineages.
Abstract: In this paper, we present the academic genealogy of presidents of the Psychometric Society by constructing a genealogical tree, in which Ph.D. students are encoded as descendants of their advisors. Results show that most of the presidents belong to five distinct lineages that can be traced to Wilhelm Wundt, James Angell, William James, Albert Michotte or Carl Friedrich Gauss. Important psychometricians Lee Cronbach and Charles Spearman play only a marginal role. The genealogy systematizes important historical knowledge that can be used to inform studies on the history of psychometrics and exposes the rich and multidisciplinary background of the Psychometric Society.

Journal ArticleDOI
TL;DR: Consistency theory for the general nonparametric classification (GNPC) proficiency-class estimator is developed, and its statistical consistency is proven.
Abstract: Parametric likelihood estimation is the prevailing method for fitting cognitive diagnosis models—also called diagnostic classification models (DCMs). Nonparametric concepts and methods that do not rely on a parametric statistical model have been proposed for cognitive diagnosis. These methods are particularly useful when sample sizes are small. The general nonparametric classification (GNPC) method for assigning examinees to proficiency classes can accommodate assessment data conforming to any diagnostic classification model that describes the probability of a correct item response as an increasing function of the number of required attributes mastered by an examinee (known as the “monotonicity assumption”). Hence, the GNPC method can be used with any model that can be represented as a general DCM. However, the statistical properties of the estimator of examinees’ proficiency class are currently unknown. In this article, the consistency theory of the GNPC proficiency-class estimator is developed and its statistical consistency is proven.
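
For intuition, here is a sketch of the simple conjunctive nonparametric classifier that the GNPC method generalizes (the GNPC replaces the 0/1 ideal responses below with weighted ideal responses blending conjunctive and disjunctive rules); the tiny Q-matrix and response patterns are made up:

```python
import numpy as np
from itertools import product

def npc_classify(X, Q):
    """Nonparametric classification (conjunctive variant): assign each
    examinee to the attribute profile whose ideal response pattern is
    closest in Hamming distance to the observed responses."""
    n_items, K = Q.shape
    profiles = np.array(list(product([0, 1], repeat=K)))
    # Conjunctive (DINA-type) ideal response: 1 iff all attributes
    # required by the item are mastered.
    eta = np.all(profiles[:, None, :] >= Q[None, :, :], axis=2).astype(int)
    dist = np.abs(X[:, None, :] - eta[None, :, :]).sum(axis=2)
    return profiles[np.argmin(dist, axis=1)]

# Tiny illustration with 3 attributes and 5 items.
Q = np.array([[1,0,0],[0,1,0],[0,0,1],[1,1,0],[0,1,1]])
X = np.array([[1,1,0,1,0],   # consistent with mastering attributes 1 and 2
              [0,0,1,0,0]])  # consistent with mastering attribute 3 only
print(npc_classify(X, Q))
```

Because classification requires no parametric item model, only the monotonicity assumption, such estimators remain usable in the small samples where likelihood-based DCM estimation struggles.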

Journal ArticleDOI
TL;DR: The method for characterizing the manifest probability distributions is related to the Dutch identity; the manifest distributions of the three latent trait models are found to share several important features, such as the dependency between accuracy and response time, but also to differ in what function of response time is being modeled.
Abstract: In this paper we study the statistical relations between three latent trait models for accuracies and response times: the hierarchical model (HM) of van der Linden (Psychometrika 72(3):287–308, 2007), the signed residual time model (SM) proposed by Maris and van der Maas (Psychometrika 77(4):615–633, 2012), and the drift diffusion model (DM) as proposed by Tuerlinckx and De Boeck (Psychometrika 70(4):629–650, 2005). One important distinction between these models is that the HM and the DM either assume or imply that accuracies and response times are independent given the latent trait variables, while the SM does not. In this paper we investigate the impact of this conditional independence property—or a lack thereof—on the manifest probability distribution for accuracies and response times. We will find that the manifest distributions of the latent trait models share several important features, such as the dependency between accuracy and response time, but we also find important differences, such as in what function of response time is being modeled. Our method for characterizing the manifest probability distributions is related to the Dutch identity (Holland in Psychometrika 55(6):5–18, 1990).
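
The SM referenced here is built on the signed residual time scoring rule

$$S_{pi} = (2X_{pi} - 1)\,(d - T_{pi}),$$

where $X_{pi} \in \{0,1\}$ is accuracy, $T_{pi}$ the response time, and $d$ the time limit: fast correct answers earn the most and fast errors cost the most. Because time enters the score itself, accuracy and response time remain dependent given the latent trait, which is precisely the conditional independence property that separates the SM from the HM and DM.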