
Showing papers in "Psychometrika in 2012"


Journal ArticleDOI
TL;DR: This paper considers oblique bi-factor rotation, compares it to orthogonal rotation, and finds a surprising result when oblique bi-factor rotation methods are applied to ideal data.
Abstract: Bi-factor analysis is a form of confirmatory factor analysis originally introduced by Holzinger and Swineford (Psychometrika 2:41–54, 1937). The bi-factor model has a general factor, a number of group factors, and an explicit bi-factor structure. Jennrich and Bentler (Psychometrika 76:537–549, 2011) introduced an exploratory form of bi-factor analysis that does not require one to provide an explicit bi-factor structure a priori. They use exploratory factor analysis and a bi-factor rotation criterion designed to produce a rotated loading matrix that has an approximate bi-factor structure. Among other things, this can be used as an aid in finding an explicit bi-factor structure for use in a confirmatory bi-factor analysis. They considered only orthogonal rotation. The purpose of this paper is to consider oblique rotation and to compare it to orthogonal rotation. Because there are many more oblique rotations of an initial loading matrix than orthogonal rotations, one expects the oblique results to approximate a bi-factor structure better than orthogonal rotations, and this is indeed the case. A surprising result arises when oblique bi-factor rotation methods are applied to ideal data.
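For orientation, here is a minimal sketch of the kind of loading pattern a bi-factor rotation criterion aims to approximate: six items, one general factor, and two group factors. The grouping shown is purely illustrative and not taken from the paper.

\[
\Lambda =
\begin{pmatrix}
\lambda_{11} & \lambda_{12} & 0\\
\lambda_{21} & \lambda_{22} & 0\\
\lambda_{31} & \lambda_{32} & 0\\
\lambda_{41} & 0 & \lambda_{43}\\
\lambda_{51} & 0 & \lambda_{53}\\
\lambda_{61} & 0 & \lambda_{63}
\end{pmatrix},
\]

where the first column holds the general-factor loadings, each remaining column holds the loadings of one group factor, and the remaining entries are exact zeros.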

123 citations


Journal ArticleDOI
TL;DR: In this article, a response model is derived from an explicit scoring rule for time-limit tasks that incorporates both response time and accuracy and implies a definite trade-off between speed and accuracy; the model belongs to the exponential family.
Abstract: Starting from an explicit scoring rule for time limit tasks incorporating both response time and accuracy, and a definite trade-off between speed and accuracy, a response model is derived. Since the scoring rule is interpreted as a sufficient statistic, the model belongs to the exponential family. The various marginal and conditional distributions for response accuracy and response time are derived, and it is shown how the model parameters can be estimated. The model for response accuracy is found to be the two-parameter logistic model. It is found that the time limit determines the item discrimination, and this effect is illustrated with the Amsterdam Chess Test II.
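For reference, the two-parameter logistic model that the abstract identifies as the marginal model for response accuracy has the standard form below (the paper's contribution is to derive it from the scoring rule, with the discrimination tied to the time limit; the notation here is generic):

\[
P(X_{ij}=1 \mid \theta_i) \;=\; \frac{\exp\{a_j(\theta_i - b_j)\}}{1+\exp\{a_j(\theta_i - b_j)\}},
\]

with person ability \(\theta_i\), item difficulty \(b_j\), and item discrimination \(a_j\).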

104 citations


Journal ArticleDOI
TL;DR: Three simulation studies and one case study are presented to elaborate the proposed two-step Bayesian propensity score approach and reveal that greater precision in the propensity score equation yields better recovery of the frequentist-based treatment effect.
Abstract: A two-step Bayesian propensity score approach is introduced that incorporates prior information in the propensity score equation and outcome equation without the problems associated with simultaneous Bayesian propensity score approaches. The corresponding variance estimators are also provided. The two-step Bayesian propensity score is provided for three methods of implementation: propensity score stratification, weighting, and optimal full matching. Three simulation studies and one case study are presented to elaborate the proposed two-step Bayesian propensity score approach. Results of the simulation studies reveal that greater precision in the propensity score equation yields better recovery of the frequentist-based treatment effect. A slight advantage is shown for the Bayesian approach in small samples. Results also reveal that greater precision around the wrong treatment effect can lead to seriously distorted results. However, greater precision around the correct treatment effect parameter yields quite good results, with slight improvement seen with greater precision in the propensity score equation. A comparison of coverage rates for the conventional frequentist approach and proposed Bayesian approach is also provided. The case study reveals that credible intervals are wider than frequentist confidence intervals when priors are non-informative.
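As background for the weighting variant mentioned above, here is a minimal frequentist inverse-probability-weighting sketch in Python. It is illustrative only: the paper's two-step Bayesian approach additionally places priors on the propensity score and outcome equations and supplies the corresponding variance estimators, and the function below is a hypothetical name, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_treatment_effect(X, treat, y):
    """Illustrative average treatment effect via inverse-probability weighting.

    X     : (n, p) covariates entering the propensity score equation
    treat : (n,)   0/1 treatment indicators
    y     : (n,)   outcomes
    """
    # Step 1: propensity score equation (logistic regression of treatment on X).
    ps = LogisticRegression(max_iter=1000).fit(X, treat).predict_proba(X)[:, 1]
    # Step 2: weight treated units by 1/ps and control units by 1/(1 - ps).
    w1, w0 = treat / ps, (1 - treat) / (1 - ps)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
```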

60 citations


Journal ArticleDOI
TL;DR: In this paper, a two-stage robust procedure for structural equation modeling (SEM) and an R package rsem were developed to facilitate the use of the procedure by applied researchers.
Abstract: The paper develops a two-stage robust procedure for structural equation modeling (SEM) and an R package rsem to facilitate the use of the procedure by applied researchers. In the first stage, M-estimates of the saturated mean vector and covariance matrix of all variables are obtained. Those corresponding to the substantive variables are then fitted to the structural model in the second stage. A sandwich-type covariance matrix is used to obtain consistent standard errors (SE) of the structural parameter estimates. Rescaled, adjusted, corrected, and F-statistics are proposed for overall model evaluation. Using R and EQS, the R package rsem combines the two stages and generates all the test statistics and consistent SEs. Following the robust analysis, multiple model fit indices and standardized solutions are provided in the corresponding output of EQS. An example with open/closed book examination data illustrates the proper use of the package. The method is further applied to the analysis of a data set from the National Longitudinal Survey of Youth 1997 cohort, and results show that the developed procedure not only gives a better endorsement of the substantive models but also yields estimates with uniformly smaller standard errors than normal-distribution-based maximum likelihood.
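The sandwich-type covariance matrix mentioned in the abstract has the generic form below (a sketch only; the paper derives the specific "bread" and "meat" matrices appropriate to the two-stage robust M-estimates):

\[
\widehat{\operatorname{Cov}}(\hat{\gamma}) \;=\; \frac{1}{n}\, A^{-1} B \,(A^{-1})^{\top},
\]

where \(A\) is the derivative matrix of the estimating equations and \(B\) the covariance matrix of the estimating functions, both evaluated at the estimates.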

59 citations


Journal ArticleDOI
TL;DR: This research is a starting point for introducing online calibration in CD-CAT, and further studies are proposed to investigate factors such as different sample sizes, cognitive diagnostic models, and attribute-hierarchical structures.
Abstract: Item replenishing is essential for item bank maintenance in cognitive diagnostic computerized adaptive testing (CD-CAT). In regular CAT, online calibration is commonly used to calibrate the new items continuously. However, until now no reference has publicly become available about online calibration for CD-CAT. Thus, this study investigates the possibility of extending some current strategies used in CAT to CD-CAT. Three representative online calibration methods were investigated: Method A (Stocking, Scale drift in on-line calibration, Research Rep. 88-28, 1988), marginal maximum likelihood estimation with one EM cycle (OEM) (Wainer & Mislevy in H. Wainer (Ed.), Computerized adaptive testing: A primer, pp. 65–102, 1990), and marginal maximum likelihood estimation with multiple EM cycles (MEM) (Ban, Hanson, Wang, Yi, & Harris in J. Educ. Meas. 38:191–212, 2001). The objective of the current paper is to generalize these methods to the CD-CAT context under certain theoretical justifications, and the new methods are denoted as CD-Method A, CD-OEM, and CD-MEM, respectively. Simulation studies are conducted to compare the performance of the three methods in terms of item-parameter recovery, and the results show that all three methods are able to recover item parameters accurately and that CD-Method A performs best when the items have smaller slipping and guessing parameters. This research is a starting point for introducing online calibration in CD-CAT, and further studies are proposed to investigate factors such as different sample sizes, cognitive diagnostic models, and attribute-hierarchical structures.

45 citations


Journal ArticleDOI
TL;DR: In this article, it is shown that under certain base-rate conditions the value of the quadratically weighted kappa does not depend on the center cell of the agreement table, even though the center cell reflects the exact agreement of the two raters on the middle category.
Abstract: The quadratically weighted kappa is the most commonly used weighted kappa statistic for summarizing interrater agreement on an ordinal scale. The paper presents several properties of the quadratically weighted kappa that are paradoxical. For agreement tables with an odd number of categories n it is shown that if one of the raters uses the same base rates for categories 1 and n, categories 2 and n−1, and so on, then the value of quadratically weighted kappa does not depend on the value of the center cell of the agreement table. Since the center cell reflects the exact agreement of the two raters on the middle category, this result questions the applicability of the quadratically weighted kappa to agreement studies. If one wants to report a single index of agreement for an ordinal scale, it is recommended that the linearly weighted kappa instead of the quadratically weighted kappa is used.
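For reference, the weighted kappa under discussion can be written with agreement weights as

\[
\kappa_w \;=\; \frac{\sum_{i,j} w_{ij}\,p_{ij} \;-\; \sum_{i,j} w_{ij}\,p_{i\cdot}\,p_{\cdot j}}{1 \;-\; \sum_{i,j} w_{ij}\,p_{i\cdot}\,p_{\cdot j}},
\]

where \(p_{ij}\) are the relative frequencies of the agreement table and \(p_{i\cdot}, p_{\cdot j}\) the raters' base rates; the quadratic weights are \(w_{ij} = 1 - (i-j)^2/(n-1)^2\) and the linear weights are \(w_{ij} = 1 - |i-j|/(n-1)\).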

44 citations


Journal ArticleDOI
TL;DR: In this paper, a psychometric model was proposed to separate intuitive and deliberate response tendencies in reasoning vignettes, which facilitates the analysis of dual-process item responses and the assessment of individual-difference factors, as well as conditions that favor one response tendency over another one.
Abstract: In a number of psychological studies, answers to reasoning vignettes have been shown to result from both intuitive and deliberate response processes. This paper utilizes a psychometric model to separate these two response tendencies. An experimental application shows that the proposed model facilitates the analysis of dual-process item responses and the assessment of individual-difference factors, as well as conditions that favor one response tendency over another one.

43 citations


Journal ArticleDOI
Klaas Sijtsma
TL;DR: In this paper, the author discusses the influence of test length on decision quality in personnel selection, the quality of difference scores in therapy assessment, and theory development in test construction and validity research.
Abstract: I address two issues that were inspired by my work on the Dutch Committee on Tests and Testing (COTAN). The first issue is the understanding of problems test constructors and researchers using tests have of psychometric knowledge. I argue that this understanding is important for a field, like psychometrics, for which the dissemination of psychometric knowledge among test constructors and researchers in general is highly important. The second issue concerns the identification of psychometric research topics that are relevant for test constructors and test users but in my view do not receive enough attention in psychometrics. I discuss the influence of test length on decision quality in personnel selection and quality of difference scores in therapy assessment, and theory development in test construction and validity research. I also briefly mention the issue of whether particular attributes are continuous or discrete.

42 citations


Journal ArticleDOI
TL;DR: This paper presents the Heteroscedastic GRM with Skewed Latent Trait, which extends the traditional GRM by incorporating heteroscedastic error variances and a skew-normal latent trait, and investigates the viability of the model and the specificity of the effects.
Abstract: The Graded Response Model (GRM; Samejima, Estimation of ability using a response pattern of graded scores, Psychometric Monograph No. 17, Richmond, VA: The Psychometric Society, 1969) can be derived by assuming a linear regression of a continuous variable, Z, on the trait, θ, to underlie the ordinal item scores (Takane & de Leeuw in Psychometrika, 52:393–408, 1987). Traditionally, a normal distribution is specified for Z implying homoscedastic error variances and a normally distributed θ. In this paper, we present the Heteroscedastic GRM with Skewed Latent Trait, which extends the traditional GRM by incorporation of heteroscedastic error variances and a skew-normal latent trait. An appealing property of the extended GRM is that it includes the traditional GRM as a special case. This enables specific tests on the normality assumption of Z. We show how violations of normality in Z can lead to asymmetrical category response functions. The ability to test this normality assumption is beneficial from both a statistical and substantive perspective. In a simulation study, we show the viability of the model and investigate the specificity of the effects. We apply the model to a dataset on affect and a dataset on alexithymia.
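A minimal statement of the underlying-variable formulation the extension builds on (standard notation; the exact parameterization of the extensions is as given in the paper):

\[
Z_{ij} = \lambda_j\,\theta_i + \varepsilon_{ij}, \qquad
X_{ij} = c \;\iff\; \tau_{j,c} < Z_{ij} \le \tau_{j,c+1},
\]

where in the traditional GRM \(\varepsilon_{ij}\sim N(0,\sigma_j^2)\) (homoscedastic) and \(\theta_i\) is normal; the extended model allows heteroscedastic error variances and a skew-normal distribution for \(\theta_i\), with the traditional GRM recovered as a special case.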

39 citations


Journal ArticleDOI
TL;DR: Five item selection procedures in the MCAT framework are compared by varying the structure of item pools, the population distribution of the simulees, the number of items selected, and the content area to find an item selection procedure that yields higher precisions for both the domain and composite abilities and a higher percentage of selected items from the item pool.
Abstract: Multidimensional computer adaptive testing (MCAT) can provide higher precision and reliability or reduce test length when compared with unidimensional CAT or with the paper-and-pencil test. This study compared five item selection procedures in the MCAT framework for both domain scores and overall scores through simulation by varying the structure of item pools, the population distribution of the simulees, the number of items selected, and the content area. The existing procedures such as Volume (Segall in Psychometrika, 61:331–354, 1996), Kullback–Leibler information (Veldkamp & van der Linden in Psychometrika 67:575–588, 2002), Minimize the error variance of the linear combination (van der Linden in J. Educ. Behav. Stat. 24:398–412, 1999), and Minimum Angle (Reckase in Multidimensional item response theory, Springer, New York, 2009) are compared to a new procedure, Minimize the error variance of the composite score with the optimized weight, proposed for the first time in this study. The intent is to find an item selection procedure that yields higher precisions for both the domain and composite abilities and a higher percentage of selected items from the item pool. The comparison is performed by examining the absolute bias, correlation, test reliability, time used, and item usage. Three sets of item pools are used with the item parameters estimated from real live CAT data. Results show that Volume and Minimum Angle performed similarly, balancing information for all content areas, while the other three procedures performed similarly, with a high precision for both domain and overall scores when selecting items with the required number of items for each domain. The new item selection procedure has the highest percentage of item usage. Moreover, for the overall score, it produces similar or even better results compared to those from the method that selects items favoring the general dimension using the general model (Segall in Psychometrika 66:79–97, 2001); the general dimension method has low precision for the domain scores. In addition to the simulation study, the mathematical theories for certain procedures are derived. The theories are confirmed by the simulation applications.
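As a pointer to what the Volume criterion optimizes, item selection there is roughly D-optimal: among candidate items, choose the one that maximizes the determinant of the accumulated information matrix at the current ability estimate (a sketch; Segall's formulation works with the Bayesian posterior information matrix, which adds the prior precision):

\[
j^{*} \;=\; \arg\max_{j}\; \det\!\Big( \sum_{k \in S} I_k(\hat{\boldsymbol{\theta}}) \;+\; I_j(\hat{\boldsymbol{\theta}}) \Big),
\]

where \(S\) is the set of items already administered and \(I_k(\boldsymbol{\theta})\) is the item information matrix of item \(k\).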

37 citations


Journal ArticleDOI
TL;DR: A model is proposed that, by resorting to discrete time, unifies two popular approaches to response time modeling: proportional hazards models and the accelerated failure time model with log-normally distributed response times.
Abstract: Latent trait models for response times in tests have become popular recently. One challenge for response time modeling is the fact that the distribution of response times can differ considerably even in similar tests. In order to reduce the need for tailor-made models, a model is proposed that unifies two popular approaches to response time modeling: proportional hazards models and the accelerated failure time model with log-normally distributed response times. This is accomplished by resorting to discrete time. The categorization of response time allows the formulation of a response time model within the framework of generalized linear models by using a flexible link function. Item parameters of the proposed model can be estimated with marginal maximum likelihood estimation. Applicability of the proposed approach is demonstrated with a simulation study and an empirical application. Additionally, means for the evaluation of model fit are suggested.
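One way to see the discrete-time construction (a sketch; the complementary log-log link used below is only one member of the flexible link family the abstract refers to): with response times grouped into intervals \(t = 1, 2, \dots\), the discrete-time hazard is modelled within the GLM framework as

\[
P(T_{ij} = t \mid T_{ij} \ge t) \;=\; g^{-1}(\eta_{ijt}),
\]

where \(\eta_{ijt}\) collects person and item terms. Choosing the complementary log-log link, \(g^{-1}(\eta) = 1 - \exp\{-\exp(\eta)\}\), recovers a grouped proportional hazards model, which illustrates how continuous-time response time models can be absorbed into this generalized linear modeling framework.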

Journal ArticleDOI
TL;DR: A composite likelihood estimation approach is proposed that uses bivariate instead of multivariate marginal probabilities for ordinal longitudinal responses, based on a latent variable model with time-dependent latent variables and item-specific random effects that account for the interdependencies among the multivariate ordinal items.
Abstract: The paper proposes a composite likelihood estimation approach that uses bivariate instead of multivariate marginal probabilities for ordinal longitudinal responses using a latent variable model. The model considers time-dependent latent variables and item-specific random effects to account for the interdependencies of the multivariate ordinal items. Time-dependent latent variables are linked with an autoregressive model. Simulation results have shown composite likelihood estimators to have a small amount of bias and mean square error and, as such, they are feasible alternatives to full maximum likelihood. Model selection criteria developed for composite likelihood estimation are used in the applications. Furthermore, lower-order residuals are used as measures of fit for the selected models.
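The bivariate (pairwise) composite likelihood referred to above replaces the full multivariate likelihood by a sum of log-likelihood contributions of pairs of responses (generic form; the exact pairing over items and time points, and any weights, are as specified in the paper):

\[
c\ell(\vartheta) \;=\; \sum_{i=1}^{n} \;\sum_{(j,k):\, j<k} \log f\!\left(y_{ij}, y_{ik}; \vartheta\right),
\]

so only two-dimensional ordinal probabilities, rather than the full joint probability of all items at all time points, need to be evaluated.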

Journal ArticleDOI
TL;DR: In this article, the authors show that multidimensional response models are compensatory in their ability parameters if and only if they are monotone, and a minimal set of assumptions is presented under which the MLEs of the ability parameters are also compensatory.
Abstract: The issue of compensation in multidimensional response modeling is addressed. We show that multidimensional response models are compensatory in their ability parameters if and only if they are monotone. In addition, a minimal set of assumptions is presented under which the MLEs of the ability parameters are also compensatory. In a recent series of articles, beginning with Hooker, Finkelman, and Schwartzman (2009) in this journal, the second type of compensation was presented as a paradoxical result for certain multidimensional response models, leading to occasional unfairness in maximum-likelihood test scoring. First, it is indicated that the compensation is not unique and holds generally for any multiparameter likelihood with monotone score functions. Second, we analyze why, in spite of its generality, the compensation may give the impression of a paradox or unfairness.
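A concrete instance of the first kind of compensation is the standard linearly compensatory two-dimensional model (purely illustrative):

\[
P(X=1\mid \theta_1,\theta_2) \;=\; F\!\left(a_1\theta_1 + a_2\theta_2 - b\right), \qquad a_1, a_2 > 0,
\]

with \(F\) a monotone link; a decrease of \(\delta\) in \(\theta_1\) is exactly offset by an increase of \((a_1/a_2)\,\delta\) in \(\theta_2\), leaving the response probability unchanged.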

Journal ArticleDOI
TL;DR: In this paper, the reliability coefficient for item response theory (IRT) ability estimates is defined for a population of examinees in two different ways: as (a) the product-moment correlation between ability estimates on two parallel forms of a test and (b) the squared correlation between the true abilities and estimates.
Abstract: Assuming item parameters on a test are known constants, the reliability coefficient for item response theory (IRT) ability estimates is defined for a population of examinees in two different ways: as (a) the product-moment correlation between ability estimates on two parallel forms of a test and (b) the squared correlation between the true abilities and estimates. Due to the bias of IRT ability estimates, the parallel-forms reliability coefficient is not generally equal to the squared-correlation reliability coefficient. It is shown algebraically that the parallel-forms reliability coefficient is expected to be greater than the squared-correlation reliability coefficient, but the difference would be negligible in a practical sense.
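In symbols, the two definitions being compared are (notation added here for clarity):

\[
\rho_{\text{pf}} \;=\; \operatorname{Corr}\!\big(\hat\theta^{(1)}, \hat\theta^{(2)}\big)
\qquad\text{versus}\qquad
\rho_{\text{sq}} \;=\; \operatorname{Corr}^2\!\big(\theta, \hat\theta\big),
\]

where \(\hat\theta^{(1)}\) and \(\hat\theta^{(2)}\) are ability estimates from two parallel forms and \(\theta\) is the true ability; the algebraic result is that \(\rho_{\text{pf}}\) is expected to exceed \(\rho_{\text{sq}}\), though by a practically negligible amount.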

Journal ArticleDOI
TL;DR: This paper modifies Kim's algorithm so it can handle missing data and performs a simulation study to investigate its performance in (relatively) short time series with different kinds of missing data as well as with complete data.
Abstract: Many psychological processes are characterized by recurrent shifts between distinct regimes or states. Examples that are considered in this paper are the switches between different states associated with premenstrual syndrome, hourly fluctuations in affect during a major depressive episode, and shifts between a “hot hand” and a “cold hand” in a top athlete. We model these processes with the regime switching state-space model proposed by Kim (J. Econom. 60:1–22, 1994), which results in both maximum likelihood estimates for the model parameters and estimates of the latent variables and the discrete states of the process. However, the current algorithm cannot handle missing data, which limits its applicability to psychological data. Moreover, the performance of standard errors for the purpose of making inferences about the parameter estimates is yet unknown. In this paper we modify Kim’s algorithm so it can handle missing data and we perform a simulation study to investigate its performance in (relatively) short time series in cases of different kinds of missing data and in case of complete data. Finally, we apply the regime switching state-space model to the three empirical data sets described above.
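A sketch of a regime switching state-space model in common notation (the paper follows Kim's formulation; the exact parameterization there may differ from this generic statement):

\[
\begin{aligned}
\mathbf{y}_t &= \boldsymbol{\mu}_{S_t} + \Lambda_{S_t}\,\mathbf{x}_t + \boldsymbol{\varepsilon}_t, & \boldsymbol{\varepsilon}_t &\sim N(\mathbf{0}, R_{S_t}),\\
\mathbf{x}_t &= A_{S_t}\,\mathbf{x}_{t-1} + \boldsymbol{\zeta}_t, & \boldsymbol{\zeta}_t &\sim N(\mathbf{0}, Q_{S_t}),
\end{aligned}
\]

where \(\mathbf{x}_t\) are latent variables, \(S_t \in \{1,\dots,K\}\) is a discrete regime following a Markov chain with transition probabilities \(p_{k\ell} = P(S_t=\ell \mid S_{t-1}=k)\), and the system matrices may switch with the regime.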

Journal ArticleDOI
TL;DR: The authors propose an improved regression calibration approach, a general pseudo maximum likelihood estimation method based on a conveniently decomposed form of the likelihood, which is both consistent and computationally efficient and produces point estimates and estimated standard errors that are practically identical to those obtained by maximum likelihood.
Abstract: The likelihood for generalized linear models with covariate measurement error cannot in general be expressed in closed form, which makes maximum likelihood estimation taxing. A popular alternative is regression calibration which is computationally efficient at the cost of inconsistent estimation. We propose an improved regression calibration approach, a general pseudo maximum likelihood estimation method based on a conveniently decomposed form of the likelihood. It is both consistent and computationally efficient, and produces point estimates and estimated standard errors which are practically identical to those obtained by maximum likelihood. Simulations suggest that improved regression calibration, which is easy to implement in standard software, works well in a range of situations.
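For orientation, standard regression calibration fits the outcome model with the unobserved covariate replaced by its conditional expectation given the error-prone measurement, an approximation that is only exact in special cases, hence the inconsistency noted above (sketch only; the improved version proposed in the paper repairs this through pseudo maximum likelihood):

\[
g\{E(y \mid x, \mathbf{z})\} = \beta_0 + \beta_x x + \boldsymbol{\beta}_z^{\top}\mathbf{z}
\quad\longrightarrow\quad
g\{E(y \mid w, \mathbf{z})\} \approx \beta_0 + \beta_x\,E(x \mid w, \mathbf{z}) + \boldsymbol{\beta}_z^{\top}\mathbf{z},
\]

where \(w\) is the error-prone measurement of the true covariate \(x\) and \(\mathbf{z}\) are error-free covariates.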

Journal ArticleDOI
TL;DR: A new method for predicting class scores is proposed that, in contrast to posterior probability-based methods, yields consistent estimators of the parameters in the third step and in simulation studies the new methodology exhibited only a minor loss of efficiency.
Abstract: Latent class regression models relate covariates and latent constructs such as psychiatric disorders. Though full maximum likelihood estimation is available, estimation is often in three steps: (i) a latent class model is fitted without covariates; (ii) latent class scores are predicted; and (iii) the scores are regressed on covariates. We propose a new method for predicting class scores that, in contrast to posterior probability-based methods, yields consistent estimators of the parameters in the third step. Additionally, in simulation studies the new methodology exhibited only a minor loss of efficiency. Finally, the new and the posterior probability-based methods are compared in an analysis of mobility/exercise.

Journal ArticleDOI
TL;DR: The proposed method extends the original GSCA by incorporating a multivariate autoregressive model to account for the dynamic nature of data taken over time, and incorporates direct and modulating effects of input variables on specific latent variables and on connections between latent variables, respectively.
Abstract: We propose a new method of structural equation modeling (SEM) for longitudinal and time series data, named Dynamic GSCA (Generalized Structured Component Analysis). The proposed method extends the original GSCA by incorporating a multivariate autoregressive model to account for the dynamic nature of data taken over time. Dynamic GSCA also incorporates direct and modulating effects of input variables on specific latent variables and on connections between latent variables, respectively. An alternating least square (ALS) algorithm is developed for parameter estimation. An improved bootstrap method called a modified moving block bootstrap method is used to assess reliability of parameter estimates, which deals with time dependence between consecutive observations effectively. We analyze synthetic and real data to illustrate the feasibility of the proposed method.
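The moving block bootstrap that the modified resampling scheme builds on can be sketched as follows (plain moving block bootstrap only, in Python; the "modified" variant in the paper adds further adjustments, and the function name is hypothetical):

```python
import numpy as np

def moving_block_bootstrap(series, block_len, seed=None):
    """Resample a (T, k) multivariate time series by concatenating randomly
    chosen overlapping blocks of length block_len, preserving short-range
    time dependence within blocks."""
    rng = np.random.default_rng(seed)
    T = series.shape[0]
    n_blocks = int(np.ceil(T / block_len))
    starts = rng.integers(0, T - block_len + 1, size=n_blocks)
    resampled = np.concatenate([series[s:s + block_len] for s in starts], axis=0)
    return resampled[:T]  # trim to the original length
```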

Journal ArticleDOI
TL;DR: In this paper, a functional multiple-set canonical correlation analysis for exploring associations among multiple sets of functions is proposed; it includes functional canonical correlation analysis as a special case when only two sets of functions are considered.
Abstract: We propose functional multiple-set canonical correlation analysis for exploring associations among multiple sets of functions. The proposed method includes functional canonical correlation analysis as a special case when only two sets of functions are considered. As in classical multiple-set canonical correlation analysis, computationally, the method solves a matrix eigen-analysis problem through the adoption of a basis expansion approach to approximating data and weight functions. We apply the proposed method to functional magnetic resonance imaging (fMRI) data to identify networks of neural activity that are commonly activated across subjects while carrying out a working memory task.

Journal ArticleDOI
TL;DR: In this article, the power of the goodness-of-fit test of a model with no interactions is compared with the importance of the third-order moments in assessing interaction terms of the model.
Abstract: Starting with Kenny and Judd (Psychol. Bull. 96:201–210, 1984) several methods have been introduced for analyzing models with interaction terms. In all these methods more information from the data than just means and covariances is required. In this paper we also use more than just first- and second-order moments; however, we aim to add just a selection of the third-order moments. The key issue in this paper is to develop theoretical results that will allow practitioners to evaluate the strength of different third-order moments in assessing interaction terms of the model. To select the third-order moments, we propose to be guided by the power of the goodness-of-fit test of a model with no interactions, which varies with each selection of third-order moments. A theorem is presented that relates the power of the usual goodness-of-fit test of the model to the power of a moment test for the significance of third-order moments; the latter has the advantage that it can be computed without fitting a model. The main conclusion is that the selection of third-order moments can be based on the power of a moment test; thus, assessing the relevance of different sets of third-order moments in the analysis can be computationally simple. The paper gives an illustration of the method and argues for refraining from adding an excess of higher-order moments into the analysis.

Journal ArticleDOI
TL;DR: The SIMCLAS model, a Hierarchical Classes model for the simultaneous analysis of coupled binary two-way matrices, is presented, and it is shown that the SIMCLAS technique recovers the underlying structure of coupled data to a very large extent and outperforms a Hierarchical Classes technique in which all entries contribute equally to the analysis.
Abstract: In many research domains different pieces of information are collected regarding the same set of objects. Each piece of information constitutes a data block, and all these (coupled) blocks have the object mode in common. When analyzing such data, an important aim is to obtain an overall picture of the structure underlying the whole set of coupled data blocks. A further challenge consists of accounting for the differences in information value that exist between and within (i.e., between the objects of a single block) data blocks. To tackle these issues, analysis techniques may be useful in which all available pieces of information are integrated and in which at the same time noise heterogeneity is taken into account. For the case of binary coupled data, however, only methods exist that go for a simultaneous analysis of all data blocks but that do not account for noise heterogeneity. Therefore, in this paper, the SIMCLAS model, being a Hierarchical Classes model for the simultaneous analysis of coupled binary two-way matrices, is presented. In this model, noise heterogeneity between and within the data blocks is accounted for by downweighting entries from noisy blocks/objects within a block. In a simulation study it is shown that (1) the SIMCLAS technique recovers the underlying structure of coupled data to a very large extent, and (2) the SIMCLAS technique outperforms a Hierarchical Classes technique in which all entries contribute equally to the analysis (i.e., noise homogeneity within and between blocks). The latter is also demonstrated in an application of both techniques to empirical data on categorization of semantic concepts.

Journal ArticleDOI
TL;DR: In this article, the authors extend the results of Hooker et al. by considering a generalized class of IRT models and by giving a weaker sufficient condition for the occurrence of the paradox that relates it to an important concept of statistical association.
Abstract: Maximum likelihood and Bayesian ability estimation in multidimensional item response models can lead to paradoxical results as proven by Hooker, Finkelman, and Schwartzman (Psychometrika 74(3): 419–442, 2009): Changing a correct response on one item into an incorrect response may produce a higher ability estimate in one dimension. Furthermore, the conditions under which this paradox arises are very general, and may in fact be fulfilled by many of the multidimensional scales currently in use. This paper tries to emphasize and extend the generality of the results of Hooker et al. by (1) considering the paradox in a generalized class of IRT models, (2) giving a weaker sufficient condition for the occurrence of the paradox with relations to an important concept of statistical association, and by (3) providing some additional specific results for linearly compensatory models with special emphasis on the factor analysis model.

Journal ArticleDOI
TL;DR: In this paper, the authors focus on finding sub-groups or components in the data representing different diagnostic accuracies: each study is represented by a single parameter of the Lehmann family for SROC curves, and a potential heterogeneous or cluster structure in the distribution of these parameter estimates is modelled by a mixture of specifically parameterised normal densities.
Abstract: Meta-analyses of diagnostic studies experience the common problem that different studies might not be comparable, since they may have used different cut-off values for the continuous or ordered categorical diagnostic test value, defining different regions for which the diagnostic test is considered positive. Hence specificities and sensitivities arising from different studies might vary just because the underlying cut-off value had been different. To cope with the cut-off value problem, interest is usually directed towards the receiver operating characteristic (ROC) curve, which consists of pairs of sensitivities and false-positive rates (1 − specificity). In the context of meta-analysis, one pair represents one study, and the associated diagram is called an SROC curve, where the S stands for "summary". In meta-analysis of diagnostic studies, emphasis has traditionally been placed on modelling this SROC curve with the intention of providing a summary measure of the diagnostic accuracy by means of an estimate of the summary ROC curve. Here, we focus instead on finding sub-groups or components in the data representing different diagnostic accuracies. The paper will consider modelling SROC curves with the Lehmann family, which is characterised by one parameter only. Each single study can be represented by a specific value of that parameter. Hence we focus on the distribution of these parameter estimates and suggest modelling a potential heterogeneous or cluster structure by a mixture of specifically parameterised normal densities. We point out that this mixture is completely nonparametric and the associated mixture likelihood is well-defined and globally bounded. We use the theory and algorithms of nonparametric mixture likelihood estimation to identify a potential cluster structure in the diagnostic accuracies of the collection of studies to be analysed. Several meta-analytic applications on diagnostic studies, including AUDIT and AUDIT-C for detection of unhealthy alcohol use, the mini-mental state examination for cognitive disorders, as well as diagnostic accuracy inspection data on metal fatigue of aircraft spare parts, are discussed to illustrate the methodology.
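The one-parameter Lehmann family used for the SROC curves links sensitivity and false-positive rate through a power relation (standard form; the study-specific estimates of the exponent are what enter the nonparametric mixture):

\[
\text{sensitivity} \;=\; \big(1-\text{specificity}\big)^{\theta}, \qquad 0 < \theta < 1,
\]

so smaller values of \(\theta\) correspond to SROC curves lying further above the chance diagonal, that is, to higher diagnostic accuracy.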

Journal ArticleDOI
TL;DR: A multidimensional extension of a nested logit item response model is illustrated that can be used to evaluate whether distractor selection reflects a different trait/ability than the correct response, and that also defines a new framework for incorporating collateral information from distractor selection when such differences exist.
Abstract: Nested logit models have been presented as an alternative to multinomial logistic models for multiple-choice test items (Suh and Bolt in Psychometrika 75:454–473, 2010) and possess a mathematical structure that naturally lends itself to evaluating the incremental information provided by attending to distractor selection in scoring. One potential concern in attending to distractors is the possibility that distractor selection reflects a different trait/ability than that underlying the correct response. This paper illustrates a multidimensional extension of a nested logit item response model that can be used to evaluate such distinctions and also defines a new framework for incorporating collateral information from distractor selection when differences exist. The approach is demonstrated in application to questions faced by a university testing center over whether to incorporate distractor selection into the scoring of its multiple-choice tests. Several empirical examples are presented.

Journal ArticleDOI
TL;DR: In this paper, a three-way formulation of the p-median problem explicitly considers heterogeneity by identifying groups of individual respondents that perceive similar category structures, and three proposed heuristics for the heterogeneous pmedian (HPM) are developed and then illustrated in a consumer psychology context using a sample of undergraduate students who performed a sorting task of major U.S. retailers, as well as a through Monte Carlo analysis.
Abstract: The p-median offers an alternative to centroid-based clustering algorithms for identifying unobserved categories. However, existing p-median formulations typically require data aggregation into a single proximity matrix, resulting in masked respondent heterogeneity. A proposed three-way formulation of the p-median problem explicitly considers heterogeneity by identifying groups of individual respondents that perceive similar category structures. Three proposed heuristics for the heterogeneous p-median (HPM) are developed and then illustrated in a consumer psychology context using a sample of undergraduate students who performed a sorting task of major U.S. retailers, as well as through a Monte Carlo analysis.
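For reference, the underlying p-median objective selects p exemplars (medians) so as to minimize total dissimilarity (standard formulation; the three-way heterogeneous extension fits such a solution within each latent group of respondents):

\[
\min_{M \subseteq \{1,\dots,N\},\; |M| = p}\; \sum_{i=1}^{N} \min_{m \in M} d_{im},
\]

where \(d_{im}\) is the dissimilarity between object \(i\) and candidate median \(m\).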

Journal ArticleDOI
TL;DR: In this article, a multinormal partial credit model for factor analysis of polytomously scored items with ordered response categories is derived using an extension of the Dutch identity, where latent variables are assumed to have a multivariate normal distribution conditional on unweighted sums of item scores.
Abstract: A multinormal partial credit model for factor analysis of polytomously scored items with ordered response categories is derived using an extension of the Dutch Identity (Holland in Psychometrika 55:5–18, 1990). In the model, latent variables are assumed to have a multivariate normal distribution conditional on unweighted sums of item scores, which are sufficient statistics. Attention is paid to maximum likelihood estimation of item parameters, multivariate moments of latent variables, and person parameters. It is shown that the maximum likelihood estimates can be found without the use of numerical integration techniques. More general models are discussed which can be used for testing the model, and it is shown how models with different numbers of latent variables can be tested against each other. In addition, multi-group extensions are proposed, which can be used for testing both measurement invariance and latent population differences. Models and procedures discussed are demonstrated in an empirical data example.

Journal ArticleDOI
TL;DR: In this paper, the conditional power of three non-parametric tests, the randomization t test, the Wilcoxon-Mann-Whitney (WMW) test, and the parametric t test were compared.
Abstract: Randomization tests are often recommended when parametric assumptions may be violated because they require no distributional or random sampling assumptions in order to be valid. In addition to being exact, a randomization test may also be more powerful than its parametric counterpart. This was demonstrated in a simulation study which examined the conditional power of three nondirectional tests: the randomization t test, the Wilcoxon–Mann–Whitney (WMW) test, and the parametric t test. When the treatment effect was skewed, with degree of skewness correlated with the size of the effect, the randomization t test was systematically more powerful than the parametric t test. The relative power of the WMW test under the skewed treatment effect condition depended on the sample size ratio.
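A minimal sketch of the randomization t test for a two-group comparison (illustrative Python, not the authors' simulation code; the statistic and the two-sided counting rule follow the usual construction):

```python
import numpy as np

def randomization_t_test(x, y, n_perm=9999, seed=None):
    """Two-sided randomization test using the two-sample t statistic."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    n_x = len(x)

    def t_stat(a, b):
        se = np.sqrt(np.var(a, ddof=1) / len(a) + np.var(b, ddof=1) / len(b))
        return (np.mean(a) - np.mean(b)) / se

    observed = abs(t_stat(x, y))
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)          # re-randomize group labels
        if abs(t_stat(perm[:n_x], perm[n_x:])) >= observed:
            exceed += 1
    # count the observed assignment itself to keep the p-value exact-style
    return (exceed + 1) / (n_perm + 1)
```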

Journal ArticleDOI
TL;DR: In this article, it is shown that the conditions for the oblique factor correlation structure need to be amended for global rotational uniqueness, and hence that the condition sets are not equivalent in terms of unicity of the solution.
Abstract: In an addendum to his seminal 1969 article, Jöreskog stated two sets of conditions for rotational identification of the oblique factor solution under utilization of fixed zero elements in the factor loadings matrix (Jöreskog in Advances in factor analysis and structural equation models, pp. 40–43, 1979). These condition sets, formulated under factor correlation and factor covariance metrics, respectively, were claimed to be equivalent and to lead to global rotational uniqueness of the factor solution. It is shown here that the conditions for the oblique factor correlation structure need to be amended for global rotational uniqueness, and, hence, that the condition sets are not equivalent in terms of unicity of the solution.

Journal ArticleDOI
TL;DR: This work investigates two relevant issues: dimensionality of the latent structure and discriminating power of the items composing the questionnaire, based on a multidimensional item response theory model, which assumes a two-parameter logistic parameterization for the response probabilities.
Abstract: With reference to a questionnaire aimed at assessing the performance of Italian nursing homes on the basis of the health conditions of their patients, we investigate two relevant issues: dimensionality of the latent structure and discriminating power of the items composing the questionnaire. The approach is based on a multidimensional item response theory model, which assumes a two-parameter logistic parameterization for the response probabilities. This model represents the health status of a patient by latent variables having a discrete distribution and, therefore, it may be seen as a constrained version of the latent class model. On the basis of the adopted model, we implement a hierarchical clustering algorithm aimed at assessing the actual number of dimensions measured by the questionnaire. These dimensions correspond to disjoint groups of items. Once the number of dimensions is selected, we also study the discriminating power of every item, so that it is possible to select the subset of these items which is able to provide an amount of information close to that of the full set. We illustrate the proposed approach on the basis of the data collected on 1,051 elderly people hosted in a sample of Italian nursing homes.

Journal ArticleDOI
TL;DR: OLS is not capable of testing hypotheses about group differences in latent intercepts and slopes, and a theorem is presented which shows that researchers should not employ hierarchical regression to assess intercept differences with selected samples.
Abstract: The study of prediction bias is important and the last five decades include research studies that examined whether test scores differentially predict academic or employment performance. Previous studies used ordinary least squares (OLS) to assess whether groups differ in intercepts and slopes. This study shows that OLS yields inaccurate inferences for prediction bias hypotheses. This paper builds upon the criterion-predictor factor model by demonstrating the effect of selection, measurement error, and measurement bias on prediction bias studies that use OLS. The range restricted, criterion-predictor factor model is used to compute Type I error and power rates associated with using regression to assess prediction bias hypotheses. In short, OLS is not capable of testing hypotheses about group differences in latent intercepts and slopes. Additionally, a theorem is presented which shows that researchers should not employ hierarchical regression to assess intercept differences with selected samples.