
Showing papers in "Psychometrika in 2015"


Journal ArticleDOI
TL;DR: It is proposed that CAT technology can be very useful for supporting individualized instruction on a mass scale, and that even paper-and-pencil tests can be made adaptive to support classroom teaching.
Abstract: The paper provides a survey of 18 years of progress that my colleagues, students (both former and current), and I have made in a prominent research area in Psychometrics: Computerized Adaptive Testing (CAT). We start with a historical review of the establishment of a large-sample foundation for CAT. It is worth noting that the asymptotic results were derived within the framework of martingale theory, a highly theoretical branch of probability theory that may seem unrelated to educational and psychological testing. In addition, we address a number of issues that emerged from large-scale implementation and show how theoretical work can help solve these problems. Finally, we propose that CAT technology can be very useful for supporting individualized instruction on a mass scale. We show that even paper-and-pencil tests can be made adaptive to support classroom teaching.
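To make the adaptive mechanism concrete, here is a minimal sketch (not the authors' operational algorithm) of the most common item-selection rule in CAT: under a two-parameter logistic (2PL) model, administer the unused item with maximum Fisher information at the current ability estimate. The item bank and ability value are illustrative assumptions.

```python
import numpy as np

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def select_next_item(theta_hat, a, b, administered):
    """Return the index of the unused item with maximum information at theta_hat."""
    info = item_information(theta_hat, a, b)
    info[list(administered)] = -np.inf          # exclude items already given
    return int(np.argmax(info))

# toy item bank and a single selection step
rng = np.random.default_rng(0)
a = rng.uniform(0.8, 2.0, size=50)              # discriminations
b = rng.normal(0.0, 1.0, size=50)               # difficulties
print(select_next_item(theta_hat=0.3, a=a, b=b, administered={4, 17}))
```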

123 citations


Journal ArticleDOI
TL;DR: With this approach it is possible to detect groups of subjects exhibiting DIF that are not pre-specified but result from combinations of observed covariates; these groups are directly interpretable and can thus help generate hypotheses about the psychological sources of DIF.
Abstract: A variety of statistical methods have been suggested for detecting differential item functioning (DIF) in the Rasch model. Most of these methods are designed for the comparison of pre-specified focal and reference groups, such as males and females. Latent class approaches, on the other hand, allow the detection of previously unknown groups exhibiting DIF. However, this approach provides no straightforward interpretation of the groups with respect to person characteristics. Here, we propose a new method for DIF detection based on model-based recursive partitioning that can be considered a compromise between those two extremes. With this approach it is possible to detect groups of subjects exhibiting DIF that are not pre-specified but result from combinations of observed covariates. These groups are directly interpretable and can thus help generate hypotheses about the psychological sources of DIF. The statistical background and construction of the new method are introduced by means of an instructive example, and extensive simulation studies are presented to support and illustrate the statistical properties of the method, which is then applied to empirical data from a general knowledge quiz. A software implementation of the method is freely available in the R system for statistical computing.

96 citations


Journal ArticleDOI
TL;DR: This work focuses on a crossed-random effects extension of the Bayesian latent-trait pair-clustering MPT model that assumes that participant and item effects combine additively on the probit scale and postulates (multivariate) normal distributions for the random effects.
Abstract: Multinomial processing tree (MPT) models are theoretically motivated stochastic models for the analysis of categorical data. Here we focus on a crossed-random effects extension of the Bayesian latent-trait pair-clustering MPT model. Our approach assumes that participant and item effects combine additively on the probit scale and postulates (multivariate) normal distributions for the random effects. We provide a WinBUGS implementation of the crossed-random effects pair-clustering model and an application to novel experimental data. The present approach may be adapted to handle other MPT models.
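A minimal simulation sketch of the additive probit structure described above (not the full pair-clustering model, and with assumed variance components): person and item effects are drawn from normal distributions, summed on the probit scale, and converted to response probabilities.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_persons, n_items = 30, 20

# crossed random effects on the probit scale (standard deviations are assumptions)
person_effect = rng.normal(0.0, 1.0, size=n_persons)
item_effect = rng.normal(0.0, 0.5, size=n_items)
mu = 0.5                                         # grand mean on the probit scale

# probability that person p succeeds on item i: Phi(mu + person + item)
eta = mu + person_effect[:, None] + item_effect[None, :]
prob = norm.cdf(eta)

# simulate binary outcomes for every person-item combination
y = rng.binomial(1, prob)
print(prob.shape, y.mean())
```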

87 citations


Journal ArticleDOI
TL;DR: An explicit model for differential item functioning is proposed that includes a set of variables, containing metric as well as categorical components, as potential candidates for inducing DIF; the method is shown to be able to detect items with DIF.
Abstract: A new diagnostic tool for the identification of differential item functioning (DIF) is proposed. Classical approaches to DIF allow one to consider only a few subpopulations, such as ethnic groups, when investigating whether the solution of items depends on membership in a subpopulation. We propose an explicit model for differential item functioning that includes a set of variables, containing metric as well as categorical components, as potential candidates for inducing DIF. The ability to include a set of covariates entails that the model contains a large number of parameters. Regularized estimators, in particular penalized maximum likelihood estimators, are used to solve the estimation problem and to identify the items that induce DIF. It is shown that the method is able to detect items with DIF. Simulations and two applications demonstrate the applicability of the method.

73 citations


Journal ArticleDOI
TL;DR: The general methodology is illustrated with several item response data sets, and it is shown that there is a substantial improvement on existing models both conceptually and in fit to data.
Abstract: Factor or conditional independence models based on copulas are proposed for multivariate discrete data such as item responses. The factor copula models have interpretations of latent maxima/minima (in comparison with latent means) and can lead to more probability in the joint upper or lower tail compared with factor models based on the discretized multivariate normal distribution (or multidimensional normal ogive model). Details on maximum likelihood estimation of parameters for the factor copula model are given, as well as analysis of the behavior of the log-likelihood. Our general methodology is illustrated with several item response data sets, and it is shown that there is a substantial improvement on existing models both conceptually and in fit to data.

59 citations


Journal ArticleDOI
TL;DR: A Bayesian hierarchical analysis is provided of a cognitive process model of response choice and response time data that has excellent psychometric properties and may be used in a wide variety of contexts.
Abstract: We present a cognitive process model of response choice and response time performance data that has excellent psychometric properties and may be used in a wide variety of contexts. In the model there is an accumulator associated with each response option. These accumulators have bounds, and the first accumulator to reach its bound determines the response time and response choice. The time at which each accumulator reaches its bound is assumed to be lognormally distributed; hence the model is a race, or minima, process among lognormal variables. A key property of the model is that it is relatively straightforward to place a wide variety of models on the logarithm of these finishing times, including linear models, structural equation models, autoregressive models, growth-curve models, etc. Consequently, the model has excellent statistical and psychometric properties and can be used in a wide range of contexts, from laboratory experiments to high-stakes testing, to assess performance. We provide a Bayesian hierarchical analysis of the model, and illustrate its flexibility with an application in testing and one in lexical decision making, a reading skill.
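The race mechanism is easy to simulate. The sketch below, with illustrative parameter values, draws a lognormal finishing time for each response option and lets the fastest accumulator determine both the choice and the response time.

```python
import numpy as np

def simulate_lognormal_race(mu, sigma, n_trials, rng):
    """Simulate a race among accumulators with lognormal finishing times.

    mu, sigma: arrays of log-scale means and SDs, one entry per response option.
    Returns the chosen option and the response time for each trial.
    """
    finish = rng.lognormal(mean=np.tile(mu, (n_trials, 1)),
                           sigma=np.tile(sigma, (n_trials, 1)))
    choice = finish.argmin(axis=1)        # first accumulator to hit its bound
    rt = finish.min(axis=1)               # its finishing time is the response time
    return choice, rt

rng = np.random.default_rng(2)
# two response options; option 0 is faster on average (assumed values)
choice, rt = simulate_lognormal_race(mu=np.array([-0.3, 0.1]),
                                     sigma=np.array([0.4, 0.4]),
                                     n_trials=5000, rng=rng)
print("P(choose option 0) =", (choice == 0).mean(), " mean RT =", rt.mean())
```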

52 citations


Journal ArticleDOI
TL;DR: Existing tree algorithms designed specifically for censored responses as well as recently developed survival ensemble methods are reviewed, and available computer software is introduced.
Abstract: Classification and Regression Trees (CART) and their successors, bagging and random forests, are statistical learning tools that are receiving increasing attention. However, due to characteristics of censored data collection, standard CART algorithms are not immediately transferable to the context of survival analysis. Questions about the occurrence and timing of events arise throughout the psychological and behavioral sciences, especially in longitudinal studies. The prediction power and other key features of tree-based methods are promising in studies where an event occurrence is the outcome of interest. This article reviews existing tree algorithms designed specifically for censored responses as well as recently developed survival ensemble methods, and introduces available computer software. Through simulations and a practical example, merits and limitations of these methods are discussed. Suggestions are provided for practical use.
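The core operation inside a survival tree can be sketched as follows (a simplification, not the reviewed packages' implementations): for each candidate cut-point of a covariate, split the sample in two and compute the two-sample log-rank statistic, then keep the cut-point with the largest statistic. The data-generating setup below is an assumption for illustration.

```python
import numpy as np

def logrank_statistic(time, event, group):
    """Two-sample log-rank chi-square statistic.

    time: observed times; event: boolean (True = event observed);
    group: boolean mask defining the first of the two groups.
    """
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(time[event]):                  # distinct event times
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        d = (event & (time == t)).sum()               # events at time t (all groups)
        d1 = (event & (time == t) & group).sum()      # events at time t in group 1
        if n > 1:
            obs_minus_exp += d1 - d * n1 / n
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var if var > 0 else 0.0

def best_split(x, time, event):
    """Exhaustively search cut-points of a single covariate x."""
    best = (None, -np.inf)
    for c in np.unique(x)[:-1]:
        stat = logrank_statistic(time, event, group=(x <= c))
        if stat > best[1]:
            best = (c, stat)
    return best

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 200)                                    # a single covariate
true_time = rng.exponential(1.0 / (0.5 + 2.0 * (x > 0.5)))    # higher hazard when x > 0.5
cens_time = rng.exponential(2.0, 200)                         # independent censoring
time = np.minimum(true_time, cens_time)
event = true_time <= cens_time
print(best_split(x, time, event))                             # cut-point should be near 0.5
```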

48 citations


Journal ArticleDOI
TL;DR: An IRT-based statistical test for differential item functioning (DIF) is presented that is closely related to Lord’s (Applications of item response theory to practical testing problems) test for item DIF, albeit with a different and more correct interpretation.
Abstract: This paper presents an IRT-based statistical test for differential item functioning (DIF). The test is developed for items conforming to the Rasch (Probabilistic models for some intelligence and attainment tests, The Danish Institute of Educational Research, Copenhagen, 1960) model but we will outline its extension to more complex IRT models. Its difference from the existing procedures is that DIF is defined in terms of the relative difficulties of pairs of items and not in terms of the difficulties of individual items. The argument is that the difficulty of an item is not identified from the observations, whereas the relative difficulties are. This leads to a test that is closely related to Lord’s (Applications of item response theory to practical testing problems, Erlbaum, Hillsdale, 1980) test for item DIF albeit with a different and more correct interpretation. Illustrations with real and simulated data are provided.

47 citations


Journal ArticleDOI
TL;DR: It is proved that in fact the Multiple Strategy DINA (Deterministic Input Noisy AND-gate) model and the CBLIM, a competence-based extension of the basic local independence model (BLIM), are equivalent.
Abstract: The present work explores the connections between cognitive diagnostic models (CDM) and knowledge space theory (KST) and shows that these two quite distinct approaches overlap. It is proved that in fact the Multiple Strategy DINA (Deterministic Input Noisy AND-gate) model and the CBLIM, a competence-based extension of the basic local independence model (BLIM), are equivalent. To demonstrate the benefits that arise from integrating the two theoretical perspectives, it is shown that a fairly complete picture on the identifiability of these models emerges by combining results from both camps. The impact of the results is illustrated by an empirical example, and topics for further research are pointed out.

47 citations


Journal ArticleDOI
TL;DR: Results show that empirically corrected statistics follow the nominal chi-square distribution much more closely than previously proposed corrections to T_ML, and they control type I errors reasonably well whenever N ≥ max(50, 2p).
Abstract: Survey data typically contain many variables. Structural equation modeling (SEM) is commonly used in analyzing such data. The most widely used statistic for evaluating the adequacy of a SEM model is T_ML, a slight modification to the likelihood ratio statistic. Under the normality assumption, T_ML approximately follows a chi-square distribution when the number of observations (N) is large and the number of items or variables (p) is small. However, in practice, p can be rather large while N is always limited due to not having enough participants. Even with a relatively large N, empirical results show that T_ML rejects the correct model too often when p is not too small. Various corrections to T_ML have been proposed, but they are mostly heuristic. Following the principle of the Bartlett correction, this paper proposes an empirical approach to correct T_ML so that the mean of the resulting statistic approximately equals the degrees of freedom of the nominal chi-square distribution. Results show that empirically corrected statistics follow the nominal chi-square distribution much more closely than previously proposed corrections to T_ML, and they control type I errors reasonably well whenever N ≥ max(50, 2p). The formulations of the empirically corrected statistics are further used to predict type I errors of T_ML as reported in the literature, and they perform well.

43 citations


Journal ArticleDOI
TL;DR: For the 3PL model, a theorem is provided proving that the item parameters are not identified, do not have an empirical interpretation, and cannot be estimated consistently and without bias.
Abstract: The paper offers a general review of the basic concepts of both statistical model and parameter identification, and revisits the conceptual relationships between parameter identification and both parameter interpretability and properties of parameter estimates. All these issues are then exemplified for the 1PL, 2PL, and 1PL-G fixed-effects models. For the 3PL model, however, we provide a theorem proving that the item parameters are not identified, do not have an empirical interpretation and that it is not possible to obtain consistent and unbiased estimates of them.

Journal ArticleDOI
TL;DR: This new method, denoted multivariate weighted MLE (MWLE), is proposed to reduce the bias of the MLE even for short tests, and is shown in comparison with alternative estimators to be more accurate in terms of bias than MLE while maintaining a similar variance.
Abstract: Making inferences from IRT-based test scores requires accurate and reliable methods of person parameter estimation. Given an already calibrated set of item parameters, the latent trait could be estimated either via maximum likelihood estimation (MLE) or using Bayesian methods such as maximum a posteriori (MAP) estimation or expected a posteriori (EAP) estimation. In addition, Warm’s (Psychometrika 54:427–450, 1989) weighted likelihood estimation method was proposed to reduce the bias of the latent trait estimate in unidimensional models. In this paper, we extend the weighted MLE method to multidimensional models. This new method, denoted as multivariate weighted MLE (MWLE), is proposed to reduce the bias of the MLE even for short tests. MWLE is compared to alternative estimators (i.e., MLE, MAP and EAP) and shown, both analytically and through simulation studies, to be more accurate in terms of bias than MLE while maintaining a similar variance. In contrast, Bayesian estimators (i.e., MAP and EAP) result in biased estimates with smaller variability.
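For context, the unidimensional estimator being generalized can be sketched in a few lines: for 2PL items, Warm's weighted likelihood estimate is equivalent to maximizing the log-likelihood plus half the log of the test information. This is a minimal illustration with assumed item parameters, not the multivariate MWLE proposed in the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def neg_weighted_loglik(theta, y, a, b):
    """Negative of log L(theta) + 0.5 * log I(theta); for 2PL items,
    maximizing this quantity is equivalent to Warm's WLE."""
    p = p_2pl(theta, a, b)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    info = np.sum(a ** 2 * p * (1 - p))           # test information at theta
    return -(loglik + 0.5 * np.log(info))

# illustrative item parameters and one response pattern (assumed values)
a = np.array([1.2, 0.9, 1.5, 1.1, 0.8])
b = np.array([-1.0, -0.3, 0.0, 0.6, 1.2])
y = np.array([1, 1, 1, 0, 0])

res = minimize_scalar(neg_weighted_loglik, bounds=(-4, 4), args=(y, a, b),
                      method="bounded")
print("WLE of theta:", res.x)
```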

Journal ArticleDOI
TL;DR: This work investigates the implications of penalizing incorrect answers to multiple-choice tests, from the perspective of both test-takers and test-makers, using a model that combines a well-known item response theory model with prospect theory.
Abstract: We investigate the implications of penalizing incorrect answers to multiple-choice tests, from the perspective of both test-takers and test-makers. To do so, we use a model that combines a well-known item response theory model with prospect theory (Kahneman and Tversky, Prospect theory: An analysis of decision under risk, Econometrica 47:263–91, 1979). Our results reveal that when test-takers are fully informed of the scoring rule, the use of any penalty has detrimental effects for both test-takers (they are always penalized in excess, particularly those who are risk averse and loss averse) and test-makers (the bias of the estimated scores, as well as the variance and skewness of their distribution, increase as a function of the severity of the penalty).
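A minimal sketch of the trade-off being modeled (the value-function form and parameters below are common illustrative choices from the prospect-theory literature, not the paper's calibration, and probability weighting is omitted): a test-taker compares the prospect value of guessing on a k-option item, scored +1 for a correct answer and minus the penalty for an incorrect one, with the value 0 of omitting.

```python
def value(x, alpha=0.88, lam=2.25):
    """Prospect-theory value function (illustrative parameters from the literature)."""
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

def prospect_of_guessing(k, penalty, p_correct=None, alpha=0.88, lam=2.25):
    """Prospect value of guessing on a k-option item scored +1 / -penalty.

    p_correct defaults to 1/k (blind guessing)."""
    p = 1.0 / k if p_correct is None else p_correct
    return p * value(1.0, alpha, lam) + (1 - p) * value(-penalty, alpha, lam)

# with the common "formula score" penalty 1/(k-1), a loss-averse blind guesser
# prefers to omit (the prospect value is below 0, the value of omitting)
print(prospect_of_guessing(k=4, penalty=1 / 3))
```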

Journal ArticleDOI
TL;DR: This work reviews the popular Wald-type and lesser-known likelihood-based methods in linear SEM, emphasizing profile likelihood-based confidence intervals (CIs), and illustrates the use of these CIs and CRs with two empirical examples.
Abstract: Structural equation models (SEM) are widely used for modeling complex multivariate relationships among measured and latent variables. Although several analytical approaches to interval estimation in SEM have been developed, there lacks a comprehensive review of these methods. We review the popular Wald-type and lesser known likelihood-based methods in linear SEM, emphasizing profile likelihood-based confidence intervals (CIs). Existing algorithms for computing profile likelihood-based CIs are described, including two newer algorithms which are extended to construct profile likelihood-based confidence regions (CRs). Finally, we illustrate the use of these CIs and CRs with two empirical examples, and provide practical recommendations on when to use Wald-type CIs and CRs versus profile likelihood-based CIs and CRs. OpenMx example code is provided in an Online Appendix for constructing profile likelihood-based CIs and CRs for SEM.
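The general profile-likelihood idea can be sketched with a toy two-parameter model rather than a SEM (and scipy rather than OpenMx): the CI endpoints are the values of the focal parameter at which twice the drop from the maximized log-likelihood, with the remaining parameters profiled out, reaches the chi-square critical value.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

rng = np.random.default_rng(4)
x = rng.normal(loc=1.0, scale=2.0, size=50)
n = len(x)

def profile_loglik(mu):
    """Log-likelihood for a normal mean, with the variance profiled out."""
    s2 = np.mean((x - mu) ** 2)                 # MLE of the variance given mu
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1)

mu_hat = x.mean()
crit = chi2.ppf(0.95, df=1)                     # about 3.84 for a 95% interval

def boundary(mu):
    """Zero where 2 * (max loglik - profile loglik) hits the critical value."""
    return 2 * (profile_loglik(mu_hat) - profile_loglik(mu)) - crit

lower = brentq(boundary, mu_hat - 10, mu_hat)   # search below the MLE
upper = brentq(boundary, mu_hat, mu_hat + 10)   # search above the MLE
print(f"95% profile-likelihood CI for the mean: ({lower:.3f}, {upper:.3f})")
```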

Journal ArticleDOI
TL;DR: A Cultural Consensus Theory approach for ordinal data is developed, leading to a new model for ordered polytomous data that introduces a novel way of measuring response biases and also measures consensus item values, a consensus response scale, item difficulty, and informant knowledge.
Abstract: A Cultural Consensus Theory approach for ordinal data is developed, leading to a new model for ordered polytomous data. The model introduces a novel way of measuring response biases and also measures consensus item values, a consensus response scale, item difficulty, and informant knowledge. The model is extended as a finite mixture model to fit both simulated and real multicultural data, in which subgroups of informants have different sets of consensus item values. The extension is thus a form of model-based clustering for ordinal data. The hierarchical Bayesian framework is utilized for inference, and two posterior predictive checks are developed to verify the central assumptions of the model.

Journal ArticleDOI
TL;DR: Methods are developed to enhance the application of Cultural Consensus Theory models by providing the appropriate specifications for hierarchical Bayesian inference.
Abstract: Cultural Consensus Theory (CCT) models have been applied extensively across research domains in the social and behavioral sciences in order to explore shared knowledge and beliefs. CCT models operate on response data, in which the answer key is latent. The current paper develops methods to enhance the application of these models by developing the appropriate specifications for hierarchical Bayesian inference. A primary contribution is the methodology for integrating the use of covariates into CCT models. More specifically, both person- and item-related parameters are introduced as random effects that can respectively account for patterns of inter-individual and inter-item variability.

Journal ArticleDOI
TL;DR: In this paper, a dimension reduction method that is specific to the Lord-Wingersky recursions is developed, taking advantage of the restrictions implied by hierarchical item factor models, e.g., the bifactor model, the testlet model, or the two-tier model, such that a version of the Lord and Wingersky recursive algorithm can operate on a dramatically reduced set of quadrature points.
Abstract: Lord and Wingersky’s (Appl Psychol Meas 8:453–461, 1984) recursive algorithm for creating summed score based likelihoods and posteriors has a proven track record in unidimensional item response theory (IRT) applications. Extending the recursive algorithm to handle multidimensionality is relatively simple, especially with fixed quadrature because the recursions can be defined on a grid formed by direct products of quadrature points. However, the increase in computational burden remains exponential in the number of dimensions, making the implementation of the recursive algorithm cumbersome for truly high-dimensional models. In this paper, a dimension reduction method that is specific to the Lord–Wingersky recursions is developed. This method can take advantage of the restrictions implied by hierarchical item factor models, e.g., the bifactor model, the testlet model, or the two-tier model, such that a version of the Lord–Wingersky recursive algorithm can operate on a dramatically reduced set of quadrature points. For instance, in a bifactor model, the dimension of integration is always equal to 2, regardless of the number of factors. The new algorithm not only provides an effective mechanism to produce summed score to IRT scaled score translation tables properly adjusted for residual dependence, but leads to new applications in test scoring, linking, and model fit checking as well. Simulated and empirical examples are used to illustrate the new applications.
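For reference, the original unidimensional Lord–Wingersky recursion at a single quadrature point can be sketched in a few lines (this is the classic algorithm, not the dimension-reduced version developed in the paper): the summed-score distribution is built up one dichotomous item at a time.

```python
import numpy as np

def lord_wingersky(p):
    """Summed-score distribution for independent dichotomous items.

    p: array of correct-response probabilities at one quadrature point (theta).
    Returns an array of length len(p) + 1 with P(summed score = s | theta).
    """
    dist = np.array([1.0])                       # score 0 with no items administered
    for p_i in p:
        new = np.zeros(len(dist) + 1)
        new[:-1] += dist * (1 - p_i)             # item answered incorrectly
        new[1:] += dist * p_i                    # item answered correctly
        dist = new
    return dist

# illustrative probabilities for five items at some quadrature point
probs = np.array([0.8, 0.6, 0.7, 0.4, 0.5])
dist = lord_wingersky(probs)
print(dist, dist.sum())                          # the probabilities sum to 1
```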

Journal ArticleDOI
TL;DR: Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration, and a comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality.
Abstract: An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers’ ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
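A much-simplified sketch of a D-optimality comparison for a single 2PL field-test item (not the full MCMC design of the paper; the accumulated information, posterior draws, and parameter values are assumptions): the item's Fisher information matrix is averaged over posterior draws of an examinee's ability, and the examinee yielding the larger gain in log-determinant would be the preferred assignment.

```python
import numpy as np

def item_info_matrix(theta, a, b):
    """Fisher information for the (a, b) parameters of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    g = np.array([theta - b, -a])                # gradient of the logit w.r.t. (a, b)
    return p * (1 - p) * np.outer(g, g)

def expected_info(theta_draws, a, b):
    """Information averaged over posterior draws of an examinee's ability."""
    return np.mean([item_info_matrix(t, a, b) for t in theta_draws], axis=0)

# information already accumulated for the field-test item (assumed, from earlier examinees)
accumulated = np.array([[0.8, 0.1], [0.1, 0.6]])

# posterior ability draws for two candidate examinees (assumed)
rng = np.random.default_rng(5)
examinees = {"low": rng.normal(-1.0, 0.3, 500), "high": rng.normal(1.0, 0.3, 500)}

a_cur, b_cur = 1.2, 0.8                          # current estimates of the item's parameters
for label, draws in examinees.items():
    gain = (np.linalg.slogdet(accumulated + expected_info(draws, a_cur, b_cur))[1]
            - np.linalg.slogdet(accumulated)[1])
    print(label, "log-det gain:", round(gain, 4))
```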

Journal ArticleDOI
TL;DR: Theorems are presented that show the consistency of this approach when the true model is one of several common latent class models for cognitive diagnosis; consistency is independent of sample size because no model parameters need to be estimated.
Abstract: Latent class models for cognitive diagnosis have been developed to classify examinees into one of the 2^K attribute profiles arising from a K-dimensional vector of binary skill indicators. These models recognize that response patterns tend to deviate from the ideal responses that would arise if skills and items generated item responses through a purely deterministic conjunctive process. An alternative to employing these latent class models is to minimize the distance between observed item response patterns and ideal response patterns, in a nonparametric fashion that utilizes no stochastic terms for these deviations. Theorems are presented that show the consistency of this approach, when the true model is one of several common latent class models for cognitive diagnosis. Consistency of classification is independent of sample size, because no model parameters need to be estimated. Simultaneous consistency for a large group of subjects can also be shown given some conditions on how sample size and test length grow with one another.
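The nonparametric classification rule is straightforward to sketch under a conjunctive (DINA-type) ideal-response rule: enumerate all 2^K attribute profiles, compute each profile's ideal response pattern from the Q-matrix, and assign the profile whose ideal pattern is closest in Hamming distance to the observed responses. The Q-matrix and response pattern below are illustrative.

```python
import numpy as np
from itertools import product

def ideal_responses(q_matrix, alpha):
    """Conjunctive ideal response: item j is answered correctly only if the
    profile alpha possesses every attribute required by row j of the Q-matrix."""
    return np.all(alpha >= q_matrix, axis=1).astype(int)

def classify(y, q_matrix):
    """Assign the attribute profile whose ideal pattern is nearest in Hamming distance."""
    k = q_matrix.shape[1]
    best_profile, best_dist = None, np.inf
    for alpha in product([0, 1], repeat=k):      # all 2^K profiles
        eta = ideal_responses(q_matrix, np.array(alpha))
        dist = np.sum(y != eta)
        if dist < best_dist:
            best_profile, best_dist = alpha, dist
    return best_profile, best_dist

# illustrative Q-matrix (5 items, 3 attributes) and one observed response pattern
Q = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])
y = np.array([1, 1, 0, 1, 0])
print(classify(y, Q))                            # expected: profile (1, 1, 0)
```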

Journal ArticleDOI
TL;DR: A heteroscedastic latent trait model is presented for dichotomous data, studied in a simulation study, and applied to data pertaining to alcohol use and cognitive ability.
Abstract: Effort has been devoted to accounting for heteroscedasticity with respect to observed or latent moderator variables in item or test scores. For instance, in the multi-group generalized linear latent trait model, it could be tested whether the observed (polychoric) covariance matrix differs across the levels of an observed moderator variable. In the case that heteroscedasticity arises across the latent trait itself, existing models commonly distinguish between heteroscedastic residuals and a skewed trait distribution. These models have valuable applications in intelligence, personality, and psychopathology research. However, existing approaches are limited to continuous and polytomous data, while dichotomous data are common in intelligence and psychopathology research. Therefore, in the present paper, a heteroscedastic latent trait model is presented for dichotomous data. The model is studied in a simulation study and applied to data pertaining to alcohol use and cognitive ability.

Journal ArticleDOI
TL;DR: This work deduces the distribution and the copula for a vector generated by a generalized VM transformation, and shows that it is fundamentally linked to the underlying Gaussian distribution and copula.
Abstract: The Vale–Maurelli (VM) approach to generating non-normal multivariate data involves the use of Fleishman polynomials applied to an underlying Gaussian random vector. This method has been extensively used in Monte Carlo studies during the last three decades to investigate the finite-sample performance of estimators under non-Gaussian conditions. The validity of conclusions drawn from these studies clearly depends on the range of distributions obtainable with the VM method. We deduce the distribution and the copula for a vector generated by a generalized VM transformation, and show that it is fundamentally linked to the underlying Gaussian distribution and copula. In the process we derive the distribution of the Fleishman polynomial in full generality. While data generated with the VM approach appear to be highly non-normal, their truly multivariate properties are close to the Gaussian case. A Monte Carlo study illustrates that generating data with a different copula than that implied by the VM approach severely weakens the performance of normal-theory based ML estimates.
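A minimal sketch of the VM generation mechanics (the Fleishman coefficients are taken as given inputs here; solving the Fleishman moment equations for target skewness and kurtosis and adjusting the intermediate correlations are additional steps in a full implementation): correlated standard normals are generated and each margin is transformed by a cubic polynomial.

```python
import numpy as np

def vale_maurelli(n, corr, coeffs, rng):
    """Generate non-normal data by applying Fleishman cubics to correlated normals.

    corr:   intermediate correlation matrix for the underlying Gaussian vector
            (a full implementation adjusts this so the transformed variables hit
            the target correlations).
    coeffs: one (a, b, c, d) tuple per variable; Y = a + b*Z + c*Z**2 + d*Z**3.
    """
    z = rng.multivariate_normal(mean=np.zeros(corr.shape[0]), cov=corr, size=n)
    out = np.empty_like(z)
    for j, (a, b, c, d) in enumerate(coeffs):
        out[:, j] = a + b * z[:, j] + c * z[:, j] ** 2 + d * z[:, j] ** 3
    return out

rng = np.random.default_rng(6)
corr = np.array([[1.0, 0.5], [0.5, 1.0]])
# illustrative (assumed) coefficient sets; (0, 1, 0, 0) leaves a margin normal
coeffs = [(0.0, 1.0, 0.0, 0.0), (-0.2, 0.9, 0.2, 0.02)]
x = vale_maurelli(5000, corr, coeffs, rng)
print(np.corrcoef(x, rowvar=False))
```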

Journal ArticleDOI
TL;DR: The Bayesian hierarchical IRT model using the MCMC algorithms developed in the current study has the potential to be widely implemented for IDA studies or multi-site studies, and can be further refined to meet more complicated needs in applied research.
Abstract: The present paper proposes a hierarchical, multi-unidimensional two-parameter logistic item response theory (2PL-MUIRT) model extended for a large number of groups. The proposed model was motivated by a large-scale integrative data analysis (IDA) study which combined data (N = 24,336) from 24 independent alcohol intervention studies. IDA projects face unique challenges that are different from those encountered in individual studies, such as the need to establish a common scoring metric across studies and to handle missingness in the pooled data. To address these challenges, we developed a Markov chain Monte Carlo (MCMC) algorithm for a hierarchical 2PL-MUIRT model for multiple groups in which not only were the item parameters and latent traits estimated, but the means and covariance structures for multiple dimensions were also estimated across different groups. Compared to a few existing MCMC algorithms for multidimensional IRT models that constrain the item parameters to facilitate estimation of the covariance matrix, we adapted an MCMC algorithm so that we could directly estimate the correlation matrix for the anchor group without any constraints on the item parameters. The feasibility of the MCMC algorithm and the validity of the basic calibration procedure were examined using a simulation study. Results showed that model parameters could be adequately recovered, and estimated latent trait scores closely approximated true latent trait scores. The algorithm was then applied to analyze real data (69 items across 20 studies for 22,608 participants). The posterior predictive model check showed that the model fit all items well, and the correlations between the MCMC scores and original scores were overall quite high. An additional simulation study demonstrated robustness of the MCMC procedures in the context of the high proportion of missingness in data. The Bayesian hierarchical IRT model using the MCMC algorithms developed in the current study has the potential to be widely implemented for IDA studies or multi-site studies, and can be further refined to meet more complicated needs in applied research.

Journal ArticleDOI
TL;DR: Using the fact that each curve corresponds to a natural univariate measure of diagnostic accuracy, it is shown how covariate adjusted mixtures lead to a meta-regression on SROC curves.
Abstract: Many screening tests dichotomize a measurement to classify subjects. Typically a cut-off value is chosen in a way that allows identification of an acceptable number of cases relative to a reference procedure, but does not produce too many false positives at the same time. Thus for the same sample many pairs of sensitivities and false positive rates result as the cut-off is varied. The curve of these points is called the receiver operating characteristic (ROC) curve. One goal of diagnostic meta-analysis is to integrate ROC curves and arrive at a summary ROC (SROC) curve. Holling, Bohning, and Bohning (Psychometrika 77:106–126, 2012a) demonstrated that finite semiparametric mixtures can describe the heterogeneity in a sample of Lehmann ROC curves well; this approach leads to clusters of SROC curves of a particular shape. We extend this work with the help of the $$t_{\alpha }$$ transformation, a flexible family of transformations for proportions. A collection of SROC curves is constructed that approximately contains the Lehmann family but in addition allows the modeling of shapes beyond the Lehmann ROC curves. We introduce two rationales for determining the shape from the data. Using the fact that each curve corresponds to a natural univariate measure of diagnostic accuracy, we show how covariate adjusted mixtures lead to a meta-regression on SROC curves. Three worked examples illustrate the method.

Journal ArticleDOI
TL;DR: An approach to quantifying errors in covariance structures is presented in which adventitious error is explicitly modeled as a random effect with a distribution, and the estimated dispersion parameter of this distribution gives a measure of misspecification.
Abstract: We present an approach to quantifying errors in covariance structures in which adventitious error, identified as the process underlying the discrepancy between the population and the structured model, is explicitly modeled as a random effect with a distribution, and the dispersion parameter of this distribution to be estimated gives a measure of misspecification. Analytical properties of the resultant procedure are investigated and the measure of misspecification is found to be related to the root mean square error of approximation. An algorithm is developed for numerical implementation of the procedure. The consistency and asymptotic sampling distributions of the estimators are established under a new asymptotic paradigm and an assumption weaker than the standard Pitman drift assumption. Simulations validate the asymptotic sampling distributions and demonstrate the importance of accounting for the variations in the parameter estimates due to adventitious error. Two examples are also given as illustrations.

Journal ArticleDOI
TL;DR: Using the odds avoids the arbitrary choice between statistical tests of answer copying that do and do not condition on the responses the test taker is suspected to have copied and allows the testing agency to account for existing circumstantial evidence of cheating through the specification of prior odds.
Abstract: Posterior odds of cheating on achievement tests are presented as an alternative to $$p$$ values reported for statistical hypothesis testing for several of the probabilistic models in the literature on the detection of cheating. It is shown how to calculate their combinatorial expressions with the help of a reformulation of the simple recursive algorithm for the calculation of number-correct score distributions used throughout the testing industry. Using the odds avoids the arbitrary choice between statistical tests of answer copying that do and do not condition on the responses the test taker is suspected to have copied and allows the testing agency to account for existing circumstantial evidence of cheating through the specification of prior odds.

Journal ArticleDOI
TL;DR: This work reconsiders the standard institutionally generated PV methodology, finds that it applies with greater generality than shown previously, and offers an alternative approach that avoids biases, based on the mixed effects structural equations model of Schofield.
Abstract: Plausible values (PVs) are a standard multiple imputation tool for analysis of large education survey data, which measure latent proficiency variables. When latent proficiency is the dependent variable, we reconsider the standard institutionally generated PV methodology and find it applies with greater generality than shown previously. When latent proficiency is an independent variable, we show that the standard institutional PV methodology produces biased inference because the institutional conditioning model places restrictions on the form of the secondary analysts’ model. We offer an alternative approach that avoids these biases based on the mixed effects structural equations model of Schofield (Modeling measurement error when using cognitive test scores in social science research. Doctoral dissertation. Department of Statistics and Heinz College of Public Policy. Pittsburgh, PA: Carnegie Mellon University, 2008).
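For context, the standard way secondary analysts combine an analysis across plausible values follows the usual multiple-imputation combining rules (this sketch illustrates those rules, not the alternative estimator proposed in the paper): the point estimate is the mean over PVs, and its variance adds within-PV and between-PV components.

```python
import numpy as np

def combine_plausible_values(estimates, variances):
    """Combine an analysis run once per plausible value (Rubin's rules).

    estimates: per-PV point estimates of the target quantity.
    variances: per-PV squared standard errors.
    """
    m = len(estimates)
    q_bar = np.mean(estimates)                       # combined point estimate
    u_bar = np.mean(variances)                       # average within-PV variance
    b = np.var(estimates, ddof=1)                    # between-PV variance
    total_var = u_bar + (1 + 1 / m) * b
    return q_bar, np.sqrt(total_var)

# illustrative results of the same regression run on five plausible values
est = np.array([0.42, 0.45, 0.40, 0.44, 0.43])
var = np.array([0.0040, 0.0042, 0.0039, 0.0041, 0.0040])
print(combine_plausible_values(est, var))
```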

Journal ArticleDOI
TL;DR: This work proposes a new alternative approach to component loadings interpretation, which produces sparse loadings in an optimal way and is illustrated on two well-known data sets and compared to the existing rotation methods.
Abstract: The component loadings are interpreted by considering their magnitudes, which indicate how strongly each of the original variables relates to the corresponding principal component. The usual ad hoc practice in the interpretation process is to ignore the variables with small absolute loadings or to set to zero loadings smaller than some threshold value. This, in fact, makes the component loadings sparse in an artificial and subjective way. We propose a new alternative approach, which produces sparse loadings in an optimal way. The introduced approach is illustrated on two well-known data sets and compared to the existing rotation methods.

Journal ArticleDOI
TL;DR: It is shown that the asymptotic distribution-free (ADF) method for computing the covariance matrix of standardized regression coefficients works well with nonnormal data in moderate-to-large samples using both simulated and real-data examples.
Abstract: Yuan and Chan (Psychometrika, 76, 670–690, 2011) recently showed how to compute the covariance matrix of standardized regression coefficients from covariances. In this paper, we describe a method for computing this covariance matrix from correlations. Next, we describe an asymptotic distribution-free (ADF; Browne in British Journal of Mathematical and Statistical Psychology, 37, 62–83, 1984) method for computing the covariance matrix of standardized regression coefficients. We show that the ADF method works well with nonnormal data in moderate-to-large samples using both simulated and real-data examples. R code (R Development Core Team, 2012) is available from the authors or through the Psychometrika online repository for supplementary materials.
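A minimal sketch of the first step (standardized coefficients computed directly from a correlation matrix), paired with a nonparametric bootstrap standing in for the paper's analytic ADF covariance computation, which requires fourth-order moments; the data-generating setup is illustrative.

```python
import numpy as np

def standardized_betas(data):
    """Standardized regression coefficients of the last column on the others,
    computed from the correlation matrix: beta_std = R_xx^{-1} r_xy."""
    r = np.corrcoef(data, rowvar=False)
    r_xx, r_xy = r[:-1, :-1], r[:-1, -1]
    return np.linalg.solve(r_xx, r_xy)

def bootstrap_cov(data, n_boot=2000, seed=0):
    """Bootstrap covariance matrix of the standardized coefficients."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    betas = np.array([standardized_betas(data[rng.integers(0, n, n)])
                      for _ in range(n_boot)])
    return np.cov(betas, rowvar=False)

# illustrative nonnormal data: two predictors and an outcome in the last column
rng = np.random.default_rng(7)
x = rng.exponential(1.0, size=(300, 2))
y = 0.5 * x[:, 0] + 0.3 * x[:, 1] + rng.exponential(1.0, 300)
data = np.column_stack([x, y])
print(standardized_betas(data))
print(np.sqrt(np.diag(bootstrap_cov(data))))     # bootstrap standard errors
```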

Journal ArticleDOI
TL;DR: This paper considers CAT for diagnostic classification models, for which attribute estimation corresponds to a classification problem, and proposes an alternative criterion based on the asymptotic decay rate of the misclassification probabilities.
Abstract: Computerized adaptive testing (CAT) is a sequential experiment design scheme that tailors the selection of experiments to each subject. Such a scheme measures subjects’ attributes (unknown parameters) more accurately than the regular prefixed design. In this paper, we consider CAT for diagnostic classification models, for which attribute estimation corresponds to a classification problem. After a review of existing methods, we propose an alternative criterion based on the asymptotic decay rate of the misclassification probabilities. The new criterion is then developed into new CAT algorithms, which are shown to achieve the asymptotically optimal misclassification rate. Simulation studies are conducted to compare the new approach with existing methods, demonstrating its effectiveness, even for moderate length tests.

Journal ArticleDOI
TL;DR: In this article, the authors introduce quantile lower bound coefficients λ4(Q) that refer to cumulative proportions of potential locally optimal “split-half” coefficients that are below a particular point Q in the distribution of split-halves based on different partitions of variables into two sets.
Abstract: Extending the theory of lower bounds to reliability based on splits given by Guttman (in Psychometrika 53, 63–70, 1945), this paper introduces quantile lower bound coefficients λ4(Q) that refer to cumulative proportions of potential locally optimal “split-half” coefficients that are below a particular point Q in the distribution of split-halves based on different partitions of variables into two sets. Interesting quantile values are Q = 0.05, 0.50, 0.95, 1.00, with λ4(0.05) ≤ λ4(0.50) ≤ λ4(0.95) ≤ λ4(1.0). Only the global optimum λ4(1.0), Guttman’s maximal λ4, has previously been considered to be interesting, but in small samples it substantially overestimates population reliability ρ. The three coefficients λ4(0.05), λ4(0.50), and λ4(0.95) provide new lower bounds to reliability. The smallest, λ4(0.05), provides the most protection against capitalizing on chance associations, and thus against overestimation; λ4(0.50) is the median of these coefficients, while λ4(0.95) tends to overestimate reliability but exhibits less bias than previous estimators. Computational theory, an algorithm, and publicly available code based in R are provided to compute these coefficients. Simulation studies evaluate the performance of these coefficients and compare them to coefficient alpha and the greatest lower bound under several population reliability structures.
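A minimal sketch of the quantile split-half idea, using random partitions rather than the locally optimal splits searched by the published algorithm (and Python rather than the authors' R code): λ4 is computed for many random splits of the items into two halves, and quantiles of the resulting distribution are reported.

```python
import numpy as np

def lambda4(scores, half_mask):
    """Guttman's split-half lambda4 for one partition of the items.

    scores:    persons x items matrix of item scores.
    half_mask: boolean vector marking which items form the first half.
    """
    a = scores[:, half_mask].sum(axis=1)
    b = scores[:, ~half_mask].sum(axis=1)
    total_var = np.var(a + b, ddof=1)
    return 2.0 * (1.0 - (np.var(a, ddof=1) + np.var(b, ddof=1)) / total_var)

def quantile_lambda4(scores, q=(0.05, 0.50, 0.95), n_splits=2000, seed=0):
    """Quantiles of lambda4 over random (rather than locally optimal) splits."""
    rng = np.random.default_rng(seed)
    n_items = scores.shape[1]
    values = []
    for _ in range(n_splits):
        mask = np.zeros(n_items, dtype=bool)
        mask[rng.choice(n_items, n_items // 2, replace=False)] = True
        values.append(lambda4(scores, mask))
    return np.quantile(values, q)

# illustrative data: 200 persons by 10 congeneric items
rng = np.random.default_rng(8)
theta = rng.normal(size=(200, 1))
scores = 0.7 * theta + rng.normal(scale=0.7, size=(200, 10))
print(quantile_lambda4(scores))
```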