
Showing papers in "Educational and Psychological Measurement" (2019)


Journal ArticleDOI
TL;DR: The results showed that the effect of p on the population CFI and TLI depended on the type of specification error, whereas a higher p was associated with lower values of the population RMSEA regardless of the type of model misspecification.
Abstract: This study investigated the effect the number of observed variables (p) has on three structural equation modeling indices: the comparative fit index (CFI), the Tucker-Lewis index (TLI), and the root mean square error of approximation (RMSEA). The behaviors of the population fit indices and their sample estimates were compared under various conditions created by manipulating the number of observed variables, the types of model misspecification, the sample size, and the magnitude of factor loadings. The results showed that the effect of p on the population CFI and TLI depended on the type of specification error, whereas a higher p was associated with lower values of the population RMSEA regardless of the type of model misspecification. In finite samples, all three fit indices tended to yield estimates that suggested a worse fit than their population counterparts, which was more pronounced with a smaller sample size, higher p, and lower factor loading.

323 citations
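
As a companion to this abstract, here is a minimal sketch of how such fit indices are typically obtained in practice, using the lavaan R package and its built-in HolzingerSwineford1939 data (both are illustrative choices, not the study's own simulation setup):

```r
# Fit a three-factor CFA and extract the three indices discussed above.
library(lavaan)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)

# Sample estimates of CFI, TLI, and RMSEA; the study compares such
# estimates against their population counterparts as p, N, and loadings vary.
fitMeasures(fit, c("cfi", "tli", "rmsea"))
```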


Journal ArticleDOI
TL;DR: A procedure that can be used to evaluate the variance inflation factors and tolerance indices in linear regression models is discussed, which allows more informed evaluation of these quantities when addressing multicollinearity-related issues in empirical research using regression models.
Abstract: A procedure that can be used to evaluate the variance inflation factors and tolerance indices in linear regression models is discussed. The method permits both point and interval estimation of these factors and indices associated with explanatory variables considered for inclusion in a regression model. The approach makes use of popular latent variable modeling software to obtain these point and interval estimates. The procedure allows more informed evaluation of these quantities when addressing multicollinearity-related issues in empirical research using regression models. The method is illustrated on an empirical example using the popular software Mplus. Results of a simulation study investigating the capabilities of the procedure are also presented.

124 citations
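
For readers who want the point estimates without latent variable software, a minimal base R sketch of the underlying definitions (VIF_j = 1/(1 - R_j^2) and tolerance = 1 - R_j^2; the simulated predictors are hypothetical, and the paper's interval estimation via Mplus is not reproduced here):

```r
set.seed(1)
n  <- 200
x2 <- rnorm(n)
x1 <- 0.6 * x2 + rnorm(n)   # x1 deliberately correlated with x2
x3 <- rnorm(n)

# Regress the predictor of interest on the remaining predictors
r2_x1  <- summary(lm(x1 ~ x2 + x3))$r.squared
vif_x1 <- 1 / (1 - r2_x1)   # variance inflation factor for x1
tol_x1 <- 1 - r2_x1         # tolerance index for x1
c(VIF = vif_x1, tolerance = tol_x1)
```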


Journal ArticleDOI
TL;DR: The article highlights the fact that, as an index aimed at informing about the reliability of a multiple-component measuring instrument, coefficient alpha is a dependable reliability estimator under certain empirical conditions, and it should remain in service when these conditions are fulfilled rather than be abandoned.
Abstract: This note discusses the merits of coefficient alpha and the conditions under which they hold, in light of recent critical publications that miss out on significant research findings from the past several decades. That earlier research demonstrated the empirical relevance and utility of coefficient alpha under certain empirical circumstances. The article highlights the fact that, under these conditions, coefficient alpha is a dependable index of multiple-component measuring instrument reliability. Therefore, alpha should remain in service when these conditions are fulfilled and not be abandoned.

111 citations
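
A minimal base R sketch of the coefficient under discussion, computed from its textbook definition on simulated item scores (the data are hypothetical):

```r
# alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
set.seed(2)
items <- replicate(5, rnorm(100)) + rnorm(100)  # five items sharing a common component
k     <- ncol(items)
alpha <- (k / (k - 1)) *
  (1 - sum(apply(items, 2, var)) / var(rowSums(items)))
alpha
```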


Journal ArticleDOI
TL;DR: Three variants of Cohen's kappa that can handle missing data are presented, and it is recommended to use the kappa coefficient based on listwise deletion of units with missing ratings if it can be assumed that missingness is completely at random or not at random.
Abstract: Cohen's kappa coefficient is commonly used for assessing agreement between classifications of two raters on a nominal scale. Three variants of Cohen's kappa that can handle missing data are presented. Data are considered missing if one or both ratings of a unit are missing. We study how well the variants estimate the kappa value for complete data under two missing data mechanisms, namely missingness completely at random and a form of missingness not at random. The kappa coefficient considered in Gwet (Handbook of Inter-rater Reliability, 4th ed.) and the kappa coefficient based on listwise deletion of units with missing ratings were found to have virtually no bias and mean squared error if missingness is completely at random, and small bias and mean squared error if missingness is not at random. Furthermore, the kappa coefficient that treats missing ratings as a regular category appears to be rather heavily biased and has a substantial mean squared error in many of the simulations. Because it performs well and is easy to compute, we recommend using the kappa coefficient based on listwise deletion of units with missing ratings if it can be assumed that missingness is completely at random or not at random.

53 citations
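
A minimal base R sketch of the recommended variant, Cohen's kappa after listwise deletion of units with a missing rating (the two rating vectors are hypothetical):

```r
kappa_listwise <- function(r1, r2) {
  keep <- complete.cases(r1, r2)        # drop units with any missing rating
  tab  <- table(r1[keep], r2[keep])
  po   <- sum(diag(tab)) / sum(tab)     # observed agreement
  pe   <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # chance agreement
  (po - pe) / (1 - pe)
}

r1 <- factor(c("a", "b", "a", NA,  "b", "a"), levels = c("a", "b"))
r2 <- factor(c("a", "b", "b", "a", NA,  "a"), levels = c("a", "b"))
kappa_listwise(r1, r2)
```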


Journal ArticleDOI
TL;DR: This research provides the necessary equations and shows how skewness can increase the precision with which locations of distributions can be estimated, and contrasts with a typical argument in favor of performing transformations to normalize skewed data for the sake of performing more efficient significance tests.
Abstract: Two recent publications in Educational and Psychological Measurement advocated that researchers consider using the a priori procedure. According to this procedure, the researcher specifies, prior to data collection, how close she wishes her sample mean(s) to be to the corresponding population mean(s), and the desired probability of being that close. A priori equations provide the necessary sample size to meet specifications under the normal distribution. Or, if sample size is taken as given, a priori equations provide the precision with which estimates of distribution means can be made. However, there is currently no way to perform these calculations under the more general family of skew-normal distributions. The present research provides the necessary equations. In addition, we show how skewness can increase the precision with which locations of distributions can be estimated. This conclusion, based on the perspective of improving sampling precision, contrasts with a typical argument in favor of performing transformations to normalize skewed data for the sake of performing more efficient significance tests.

42 citations
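
The paper derives closed-form equations; the following is only a simulation sketch of the core claim, using the sn R package (an assumed tool) with an arbitrary slant parameter: holding the scale parameter fixed, skewness shrinks the variance of the skew-normal distribution, so sample means cluster more tightly around their expectation.

```r
library(sn)
set.seed(3)
m_norm <- replicate(5000, mean(rnorm(25)))                      # normal, sd = 1
m_skew <- replicate(5000, mean(rsn(25, xi = 0, omega = 1, alpha = 5)))
c(sd_normal = sd(m_norm), sd_skewnormal = sd(m_skew))           # skew-normal means vary less
```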


Journal ArticleDOI
TL;DR: It is demonstrated that trait scores based on only equally keyed blocks can be improved substantially by measuring a sizable number of traits, and it is concluded that, in high-stakes situations where persons are motivated to give fake answers, Thurstonian IRT models should only be applied to tests measuring a sizable number of traits.
Abstract: Forced-choice questionnaires have been proposed to avoid common response biases typically associated with rating scale questionnaires. To overcome ipsativity issues of trait scores obtained from classical scoring approaches of forced-choice items, advanced methods from item response theory (IRT), such as the Thurstonian IRT model, have been proposed. For convenient model specification, we introduce the thurstonianIRT R package, which uses Mplus, lavaan, and Stan for model estimation. Based on practical considerations, we establish that items within one block need to be equally keyed to achieve similar social desirability, which is essential for creating forced-choice questionnaires that have the potential to resist faking intentions. According to extensive simulations, measuring up to five traits using blocks of only equally keyed items does not yield sufficiently accurate trait scores and inter-trait correlation estimates, whether frequentist or Bayesian estimation methods are used. As a result, persons' trait scores remain partially ipsative and, thus, do not allow for valid comparisons between persons. However, we demonstrate that trait scores based on only equally keyed blocks can be improved substantially by measuring a sizable number of traits. More specifically, in our simulations of 30 traits, scores based on only equally keyed blocks were non-ipsative and highly accurate. We conclude that in high-stakes situations where persons are motivated to give fake answers, Thurstonian IRT models should only be applied to tests measuring a sizable number of traits.

38 citations
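
A minimal base R sketch of the ipsativity problem that motivates the model (hypothetical ranks): under classical scoring of equally keyed forced-choice blocks, each person's trait scores sum to the same constant, so the scores only support within-person comparisons.

```r
set.seed(4)
n_blocks <- 12; n_traits <- 3
# one person: in each block, ranks 1..3 are distributed across the three traits
ranks  <- t(replicate(n_blocks, sample(1:n_traits)))
scores <- colSums(ranks)   # classical trait scores for this person
scores
sum(scores)                # = n_blocks * 6 for every person, i.e., fully ipsative
```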


Journal ArticleDOI
TL;DR: The utility of XGBoost in detecting examinees with potential item preknowledge is investigated using a real data set that includes examinees who engaged in fraudulent testing behavior, such as illegally obtaining live test content before the exam.
Abstract: Researchers frequently use machine-learning methods in many fields. In the area of detecting fraud in testing, there have been relatively few studies that have used these methods to identify potential item preknowledge.

27 citations
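
A minimal sketch of the kind of classifier the study investigates, using the xgboost R package on simulated placeholder features and labels (not the paper's operational data or feature set):

```r
library(xgboost)
set.seed(5)
X <- matrix(rnorm(500 * 4), ncol = 4)   # e.g., response-time/accuracy features
y <- rbinom(500, 1, plogis(X[, 1]))     # 1 = examinee with item preknowledge
dtrain <- xgb.DMatrix(data = X, label = y)
fit <- xgb.train(params = list(objective = "binary:logistic"),
                 data = dtrain, nrounds = 50)
head(predict(fit, X))                   # predicted preknowledge probabilities
```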


Journal ArticleDOI
TL;DR: The results suggest that when class separation is low, very large sample sizes may be needed to obtain stable results and it may often be necessary to consider a preponderance of evidence in latent class enumeration.
Abstract: Regression mixture models are a statistical approach used for estimating heterogeneity in effects. This study investigates the impact of sample size on regression mixture's ability to produce "stable" results.

25 citations
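
A minimal sketch of a regression mixture, fit with the flexmix R package (an assumed tool; the study does not prescribe software): the slope of x differs across two latent classes, and the mixture recovers class-specific coefficients.

```r
library(flexmix)
set.seed(6)
n <- 500
x <- rnorm(n)
class <- rbinom(n, 1, 0.5)                       # unobserved class membership
y <- ifelse(class == 1, 0.8 * x, -0.2 * x) + rnorm(n)
d <- data.frame(x, y)
fit <- flexmix(y ~ x, data = d, k = 2)
parameters(fit)                                  # class-specific intercepts and slopes
```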


Journal ArticleDOI
TL;DR: Visual fixation, an essential eye-tracking indicator, is modeled to reflect the degree of test engagement when a test taker solves a set of test questions.
Abstract: With the development of technology-enhanced learning platforms, eye-tracking biometric indicators can be recorded simultaneously with students' item responses. In the current study, visual fixation, an essential eye-tracking indicator, is modeled to reflect the degree of test engagement when a test taker solves a set of test questions. Three negative binomial regression models are proposed for modeling the visual fixation counts of test takers solving a set of items. These models follow a structure similar to the lognormal response time model and the two-parameter logistic item response model. The proposed modeling structures include individualized latent person parameters reflecting the level of engagement of each test taker and two item parameters indicating the visual attention intensity and discriminating power of each test item. A Markov chain Monte Carlo estimation method is implemented for parameter estimation. Real data are fitted to the three proposed models, and the results are discussed.

21 citations
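
A minimal sketch of a negative binomial regression for fixation counts, the model family used above, with simulated data and MASS as an assumed tool (the paper's models add latent person and item parameters estimated via MCMC, which are not reproduced here):

```r
library(MASS)
set.seed(7)
engagement <- rnorm(300)                                   # stand-in person covariate
counts <- rnegbin(300, mu = exp(1 + 0.5 * engagement), theta = 2)
fit <- glm.nb(counts ~ engagement)
coef(fit)                                                  # approximately (1, 0.5)
```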


Journal ArticleDOI
TL;DR: Fit indices for FSR are proposed that can be used to inspect model fit and to draw inferences about the estimators of the regression coefficients, and a model comparison test based on one of these newly proposed fit indices is introduced.
Abstract: Factor score regression (FSR) is a popular alternative for structural equation modeling. Naively applying FSR induces bias for the estimators of the regression coefficients. Croon proposed a method to correct for this bias.

20 citations
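
A minimal lavaan sketch of the naive FSR baseline whose bias motivates the corrections above (simulated data with a population slope of 0.5): factor scores are estimated and then regressed as if they were observed variables, which generally biases the structural coefficient.

```r
library(lavaan)
set.seed(8)
pop <- '
  fx =~ 0.7*x1 + 0.7*x2 + 0.7*x3
  fy =~ 0.7*y1 + 0.7*y2 + 0.7*y3
  fy ~ 0.5*fx
'
d   <- simulateData(pop, sample.nobs = 500)
fit <- cfa('fx =~ x1 + x2 + x3
            fy =~ y1 + y2 + y3', data = d)
fs  <- as.data.frame(lavPredict(fit))   # factor score estimates
coef(lm(fy ~ fx, data = fs))            # naive FSR slope, biased relative to 0.5
```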


Journal ArticleDOI
TL;DR: Two types of IRTree models, descriptive and explanatory, were introduced and conceived under a larger modeling framework, called explanatory item response models, proposed by De Boeck and Wilson; the results suggested the presence of two distinct extreme response styles and an acquiescence response style in the scale.
Abstract: Item response tree (IRTree) models have recently been introduced as an approach to modeling response data from Likert-type rating scales. IRTree models are particularly useful for capturing a variety of individual response styles.

Journal ArticleDOI
TL;DR: The applicability of quantile regression to empirical work to estimate intervention effects is demonstrated using education data from a large-scale experiment, and the estimation of quantile treatment effects at various quantiles in the presence of dropouts is discussed.
Abstract: This study discusses quantile regression methodology and its usefulness in education and social science research. First, quantile regression is defined and its advantages vis-à-vis ordinary least squares regression are illustrated. Second, specific comparisons are made between ordinary least squares and quantile regression methods. Third, the applicability of quantile regression to empirical work to estimate intervention effects is demonstrated using education data from a large-scale experiment. The estimation of quantile treatment effects at various quantiles in the presence of dropouts is also discussed. Quantile regression is especially suitable for examining predictor effects at various locations of the outcome distribution (e.g., the lower and upper tails).
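
A minimal sketch of the approach with the quantreg R package (an assumed tool): a treatment whose effect differs across the outcome distribution is estimated at the lower tail, the median, and the upper tail.

```r
library(quantreg)
set.seed(9)
n <- 400
treat <- rbinom(n, 1, 0.5)
y <- 50 + 3 * treat + rnorm(n, sd = 10 + 5 * treat)  # effect varies over the distribution
fit <- rq(y ~ treat, tau = c(0.1, 0.5, 0.9))
coef(fit)   # treatment effects at the 10th, 50th, and 90th percentiles
```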

Journal ArticleDOI
TL;DR: The counterintuitive way in which the best prediction of a test taker’s latent ability depends on the factor loadings is highlighted, which means practitioners need to shift their focus to an interpretation which incorporates the structure of the model-based latent ability estimate.
Abstract: Factor loadings and item discrimination parameters play a key role in scale construction. A multitude of heuristics regarding their interpretation are hardwired into practice—for example, neglectin...

Journal ArticleDOI
TL;DR: This study aims to elucidate and illustrate an alternative response format and analytic technique, Thurstonian item response theory (IRT), for analyzing data from surveys using an alternate response format, the forced-choice format.
Abstract: One of the most cited methodological issues is with the response format, which is traditionally a single-response Likert response format. Therefore, our study aims to elucidate and illustrate an alternative response format and analytic technique, Thurstonian item response theory (IRT), for analyzing data from surveys using an alternate response format, the forced-choice format. Specifically, we strove to give a thorough introduction to Thurstonian IRT at a more elementary level than previous publications in order to widen the possible audience. This article presents analyses and a comparison of two versions of a self-report scale, one version using a single-response format and the other using a forced-choice format. Drawing from lessons learned from our study and the literature, we present a number of recommendations for conducting research using the forced-choice format and Thurstonian IRT, as well as suggested avenues for future research.

Journal ArticleDOI
TL;DR: It is suggested that it is possible to use common numeric and graphical indicators of DRF and rater misfit when raters exhibit both these effects, but that these effects may be difficult to distinguish using only numeric indicators.
Abstract: Rater effects, or raters’ tendencies to assign ratings to performances that are different from the ratings that the performances warranted, are well documented in rater-mediated assessments across ...

Journal ArticleDOI
TL;DR: When evaluating goodness-of-fit for ordinal CFA with many observed indicators, researchers should be cautious in interpreting the root mean square error of approximation, as this value appeared overly optimistic under misspecified conditions.
Abstract: A simulation study was conducted to investigate the model size effect when confirmatory factor analysis (CFA) models include many ordinal items. CFA models including between 15 and 120 ordinal items were analyzed with mean- and variance-adjusted weighted least squares to determine how varying sample size, number of ordered categories, and misspecification affect parameter estimates, standard errors of parameter estimates, and selected fit indices. As the number of items increased, the number of admissible solutions and accuracy of parameter estimates improved, even when models were misspecified. Also, standard errors of parameter estimates were closer to empirical standard deviation values as the number of items increased. When evaluating goodness-of-fit for ordinal CFA with many observed indicators, researchers should be cautious in interpreting the root mean square error of approximation, as this value appeared overly optimistic under misspecified conditions.
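
A minimal lavaan sketch of the analysis type studied (simulated five-category items, far fewer than the 15 to 120 items in the study): ordinal indicators are declared as ordered and fit with the WLSMV estimator, and the scaled fit indices are then inspected.

```r
library(lavaan)
set.seed(10)
latent <- rnorm(500)
# six 5-category items obtained by thresholding noisy copies of the factor
u <- sapply(1:6, function(j)
  as.integer(cut(0.7 * latent + rnorm(500),
                 breaks = c(-Inf, -1, 0, 0.8, 1.5, Inf))))
d <- as.data.frame(u); names(d) <- paste0("u", 1:6)
fit <- cfa('f =~ u1 + u2 + u3 + u4 + u5 + u6', data = d,
           ordered = names(d), estimator = "WLSMV")
fitMeasures(fit, c("rmsea.scaled", "cfi.scaled"))
```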

Journal ArticleDOI
TL;DR: It is argued that Chalmers’ critique of ordinal α, proposed in Zumbo et al. as a measure of test reliability in certain research settings, is unfounded.
Abstract: Chalmers recently published a critique of the use of ordinal α, proposed in Zumbo et al., as a measure of test reliability in certain research settings. In this response, we take up the task of refuting this critique.

Journal ArticleDOI
TL;DR: This article includes two simulation studies that test this empirical Q-matrix validation method under a wider range of conditions, with the aim of improving its generalizability and of empirically determining the most suitable EPS for the data conditions at hand.
Abstract: Cognitive diagnosis models (CDMs) are latent class multidimensional statistical models that help classify people accurately by using a set of discrete latent variables, commonly referred to as attributes.

Journal ArticleDOI
TL;DR: The procedures proposed are an FA extension of the “added-value” procedures initially proposed for subscale scores in educational testing, and the basic principle is that the multiple FA solution is defensible when the factor score estimates of the primary factors are better measures of these factors than score estimates derived from a unidimensional or second-order solution.
Abstract: Measures initially designed to be single-trait often yield data that are compatible with both an essentially unidimensional factor-analysis (FA) solution and a correlated-factors solution. For these measures, deciding which of these structures is the most appropriate and useful is of considerable importance.

Journal ArticleDOI
TL;DR: Three simulation studies were conducted to find out whether the effect of a time limit for testing impairs model fit in investigations of structural validity, whether the representation of the assumed source of the effect prevents impairment of model fit and whether it is possible to identify and discriminate this method effect from another method effect.
Abstract: The article reports three simulation studies conducted to find out whether the effect of a time limit for testing impairs model fit in investigations of structural validity, whether the representation of the assumed source of the effect prevents impairment of model fit, and whether it is possible to identify and discriminate this method effect from another method effect.

Journal ArticleDOI
TL;DR: The results suggest that the most accurate estimates can be obtained from the application of multiple group models for nonignorable missing values when the amounts of missing data and the missing data mechanisms change over time.
Abstract: Mechanisms causing item nonresponses in large-scale assessments are often said to be nonignorable. Parameter estimates can be biased if nonignorable missing data mechanisms are not adequately modeled. In trend analyses, it is plausible for the missing data mechanism and the percentage of missing values to change over time. In this article, we investigated (a) the extent to which the missing data mechanism and the percentage of missing values changed over time in real large-scale assessment data, (b) how different approaches for dealing with missing data performed under such conditions, and (c) the practical implications for trend estimates. These issues are highly relevant because the conclusions hold for all kinds of group mean differences in large-scale assessments. In a reanalysis of PISA (Programme for International Student Assessment) data from 35 OECD countries, we found that missing data mechanisms and numbers of missing values varied considerably across time points, countries, and domains. In a simulation study, we generated data in which we allowed the missing data mechanism and the amount of missing data to change over time. We showed that the trend estimates were biased if differences in the missing-data mechanisms were not taken into account, in our case, when omissions were scored as wrong, when omissions were ignored, or when model-based approaches assuming a constant missing data mechanism over time were used. The results suggest that the most accurate estimates can be obtained from the application of multiple group models for nonignorable missing values when the amounts of missing data and the missing data mechanisms changed over time. In an empirical example, we furthermore showed that the large decline in PISA reading literacy in Ireland in 2009 was reduced when we estimated trends using missing data treatments that accounted for changes in missing data mechanisms.

Journal ArticleDOI
TL;DR: This article proposes an external auxiliary procedure in which primary factor scores and general factor scores are related to relevant external variables and is assessed by means of a simulation study and its usefulness is illustrated with a real-data example in the personality domain.
Abstract: Many psychometric measures yield data that are compatible with (a) an essentially unidimensional factor analysis solution and (b) a correlated-factor solution. Deciding which of these structures is the most appropriate and useful is of considerable importance, and various procedures have been proposed to help in this decision. The only fully developed procedures available to date, however, are internal, and they use only the information contained in the item scores. In contrast, this article proposes an external auxiliary procedure in which primary factor scores and general factor scores are related to relevant external variables. Our proposal consists of two groups of procedures. The procedures in the first group (differential validity procedures) assess the extent to which the primary factor scores relate differentially to the external variables. Procedures in the second group (incremental validity procedures) assess the extent to which the primary factor scores yield predictive validity increments with respect to the single general factor scores. Both groups of procedures are based on a second-order structural model with latent variables from which new methodological results are obtained. The functioning of the proposal is assessed by means of a simulation study, and its usefulness is illustrated with a real-data example in the personality domain.

Journal ArticleDOI
TL;DR: The Bayes estimator appears to be a promising method for estimating categorical omega; its performance was investigated under a variety of conditions by manipulating the scale length, the number of response categories, the distributions of the categorical variables, the heterogeneity of thresholds across items, and the prior distributions for model parameters.
Abstract: When item scores are ordered categorical, categorical omega can be computed based on the parameter estimates from a factor analysis model using frequentist estimators such as diagonally weighted least squares. When the sample size is relatively small and thresholds are different across items, using diagonally weighted least squares can yield a substantially biased estimate of categorical omega. In this study, we applied Bayesian estimation methods for computing categorical omega. The simulation study investigated the performance of categorical omega under a variety of conditions through manipulating the scale length, number of response categories, distributions of the categorical variable, heterogeneities of thresholds across items, and prior distributions for model parameters. The Bayes estimator appears to be a promising method for estimating categorical omega. Mplus and SAS codes for computing categorical omega were provided.
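
For orientation, a minimal base R sketch of the generic omega formula that the estimators above target, with made-up standardized loadings (categorical omega proper is computed from the ordinal factor model, i.e., loadings and thresholds estimated from polychoric correlations, which this sketch omits):

```r
lambda <- c(0.6, 0.7, 0.8, 0.5)   # hypothetical standardized loadings
theta  <- 1 - lambda^2            # residual variances under standardization
omega  <- sum(lambda)^2 / (sum(lambda)^2 + sum(theta))
omega
```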

Journal ArticleDOI
TL;DR: An item response modeling procedure is discussed that can be used for point and interval estimation of the individual true score on any item in a measuring instrument or item set following the popular and widely applicable graded response model.
Abstract: This note highlights and illustrates the links between item response theory and classical test theory in the context of polytomous items. An item response modeling procedure is discussed that can be used for point and interval estimation of the individual true score on any item in a measuring instrument or item set following the popular and widely applicable graded response model. The method contributes to the body of research on the relationships between classical test theory and item response theory and is illustrated on empirical data.

Journal ArticleDOI
TL;DR: Three new findings are presented that suggest the original assumption of expectation-independence among predictors can be expanded to encompass many other joint distributions and that for many jointly distributed random variables, even some that enjoy considerable symmetry, the correlation between the centered main effects and their respective interaction can increase when compared with the correlation of the uncentered effects.
Abstract: Within the context of moderated multiple regression, mean centering is recommended both to simplify the interpretation of the coefficients and to reduce the problem of multicollinearity. For almost 30 years, theoreticians and applied researchers have advocated for centering as an effective way to reduce the correlation between variables and thus produce more stable estimates of regression coefficients. By reviewing the theory on which this recommendation is based, this article presents three new findings. First, that the original assumption of expectation-independence among predictors on which this recommendation is based can be expanded to encompass many other joint distributions. Second, that for many jointly distributed random variables, even some that enjoy considerable symmetry, the correlation between the centered main effects and their respective interaction can increase when compared with the correlation of the uncentered effects. Third, that the higher order moments of the joint distribution play as much of a role as lower order moments such that the symmetry of lower dimensional marginals is a necessary but not sufficient condition for a decrease in correlation between centered main effects and their interaction. Theoretical and simulation results are presented to help conceptualize the issues.
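
A minimal base R sketch of the classical motivation the article re-examines (independent skewed predictors, a favorable case): centering sharply lowers the correlation between a main effect and its interaction term. The article's point is that, for other joint distributions, this reduction is not guaranteed, and the centered correlation can even be larger.

```r
set.seed(11)
x <- rexp(10000); z <- rexp(10000)   # independent, positively skewed predictors
xc <- x - mean(x); zc <- z - mean(z)
c(uncentered = cor(x, x * z),        # substantial correlation
  centered   = cor(xc, xc * zc))     # near zero in this favorable case
```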

Journal ArticleDOI
TL;DR: This research first proposed an attribute-balanced item selection criterion, namely, the standardized weighted deviation global discrimination index (SWDGDI), and subsequently formulated the constrained progressive index (CP_SWDGDI) by casting the SWDGDI in a progressive algorithm.
Abstract: For item selection in cognitive diagnostic computerized adaptive testing (CD-CAT), ideally, a single item selection index should be created to simultaneously regulate precision, exposure status, and attribute balancing. For this purpose, in this study, we first proposed an attribute-balanced item selection criterion, namely, the standardized weighted deviation global discrimination index (SWDGDI), and subsequently formulated the constrained progressive index (CP_SWDGDI) by casting the SWDGDI in a progressive algorithm. A simulation study revealed that the SWDGDI method was effective in balancing attribute coverage and the CP_SWDGDI method was able to simultaneously balance attribute coverage and item pool usage while maintaining acceptable estimation precision. This research also demonstrates the advantage of a relatively low number of attributes in CD-CAT applications.

Journal ArticleDOI
TL;DR: This study has implications for researchers looking to apply recommended latent class analysis mixture modeling approaches in that nonnormality, which has not been fully considered in previous studies, was taken into account to address the distributional form of distal outcomes.
Abstract: The present study aims to compare the robustness under various conditions of latent class analysis mixture modeling approaches that deal with auxiliary distal outcomes. Monte Carlo simulations were employed to test the performance of four approaches recommended by previous simulation studies: maximum likelihood (ML) assuming homoskedasticity (ML_E), ML assuming heteroskedasticity (ML_U), BCH, and LTB. For all investigated simulation conditions, the BCH approach yielded the least biased estimates of class-specific distal outcome means. This study has implications for researchers looking to apply recommended latent class analysis mixture modeling approaches in that nonnormality, which has not been fully considered in previous studies, was taken into account to address the distributional form of distal outcomes.

Journal ArticleDOI
TL;DR: Results indicate that 20 is the minimum number of plausible values required to obtain point estimates of the IRT ability parameter that are comparable to marginal maximum likelihood estimation (MMLE)/expected a posteriori (EAP) estimates.
Abstract: Plausible values can be used to either estimate population-level statistics or compute point estimates of latent variables. While it is well known that five plausible values are usually sufficient for estimating population-level statistics, this study finds that 20 is the minimum number required to obtain point estimates of the IRT ability parameter comparable to marginal maximum likelihood/expected a posteriori estimates.
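
A deliberately simplified base R sketch of the point-estimate use at issue (plausible values are stylized here as posterior draws around the true ability, which ignores shrinkage and measurement model details): averaging more draws per examinee stabilizes the person-level point estimate.

```r
set.seed(12)
theta_true <- rnorm(1000)
draw_pv <- function(m)                    # mean of m plausible values per person
  rowMeans(replicate(m, theta_true + rnorm(1000, sd = 0.5)))
c(rmse_5  = sqrt(mean((draw_pv(5)  - theta_true)^2)),
  rmse_20 = sqrt(mean((draw_pv(20) - theta_true)^2)))
```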

Journal ArticleDOI
TL;DR: This article assessed the psychometric qualities of three PCV statistics that can be used in conjunction with principal axis factor analysis: the standard PCV statistic and two modifications of it. It concluded that practitioners can gain additional information from the modified statistic π̂_(SMC:k′+Λ̂) and make more nuanced decisions about the number of factors when R-PA fails to retain the correct number of factors.
Abstract: Past research suggests revised parallel analysis (R-PA) tends to yield relatively accurate results in determining the number of factors in exploratory factor analysis. R-PA can be interpreted as a ...
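
For context, a minimal sketch of ordinary parallel analysis with psych::fa.parallel (an assumed tool; R-PA and the PCV statistics above are refinements of this basic routine, not what fa.parallel implements):

```r
library(psych)
set.seed(13)
m <- matrix(rnorm(300 * 6), ncol = 6)
m[, 1:3] <- m[, 1:3] + rnorm(300)   # induce one common factor in items 1-3
fa.parallel(m, fa = "fa")           # suggests the number of factors to retain
```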

Journal ArticleDOI
TL;DR: The results suggest that unfolding models offer a useful way to evaluate rater-mediated assessments in order to initially explore the judgmental processes underlying the ratings.
Abstract: The purpose of this study is to explore the use of unfolding models for evaluating the quality of ratings obtained in rater-mediated assessments. Two different judgmental processes can be used to conceptualize ratings: impersonal judgments and personal preferences. Impersonal judgments are typically expected in rater-mediated assessments, and these ratings reflect a cumulative response process. However, raters may also be influenced by their personal preferences in providing ratings, and these ratings may reflect a noncumulative or unfolding response process. The goal of rater training in rater-mediated assessments is to stress impersonal judgments represented by scoring rubrics and to minimize the personal preferences that may represent construct-irrelevant variance in the assessment system. In this study, we explore the use of unfolding models as a framework for evaluating the quality of ratings in rater-mediated assessments. Data from a large-scale assessment of writing in the United States are used to illustrate our approach. The results suggest that unfolding models offer a useful way to evaluate rater-mediated assessments in order to initially explore the judgmental processes underlying the ratings. The data also indicate that there are significant relationships between some essay features (e.g., word count, syntactic simplicity, word concreteness, and verb cohesion) and essay orderings based on the personal preferences of raters. The implications of unfolding models for theory and practice in rater-mediated assessments are discussed.