
Showing papers in "Educational and Psychological Measurement in 2021"


Journal ArticleDOI
TL;DR: In this article, the authors examined the accuracy of p values obtained using the asymptotic mean and variance (MV) correction to the distribution of the sample standardized root mean squared residual (SRMR) proposed by Maydeu-Olivares to assess the exact fit of SEM models.
Abstract: We examine the accuracy of p values obtained using the asymptotic mean and variance (MV) correction to the distribution of the sample standardized root mean squared residual (SRMR) proposed by Maydeu-Olivares to assess the exact fit of SEM models. In a simulation study, we found that under normality, the MV-corrected SRMR statistic provides reasonably accurate Type I errors even in small samples and for large models, clearly outperforming the current standard, that is, the likelihood ratio (LR) test. When data show excess kurtosis, MV-corrected SRMR p values are only accurate in small models (p = 10), or in medium-sized models (p = 30) if no skewness is present and sample sizes are at least 500. Overall, when data are not normal, the MV-corrected LR test seems to outperform the MV-corrected SRMR. We elaborate on these findings by showing that the asymptotic approximation to the mean of the SRMR sampling distribution is quite accurate, while the asymptotic approximation to the standard deviation is not.
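For readers unfamiliar with the statistic, the sample SRMR is the root mean square of the residuals between the standardized sample and model-implied covariance matrices. A minimal sketch of the plain statistic (not the authors' MV-corrected version; the function name is ours):

```python
import numpy as np

def srmr(sample_cov, implied_cov):
    """Standardized root mean squared residual between a sample
    covariance matrix and a model-implied covariance matrix.

    Standardizes each matrix by its own standard deviations and
    averages the squared residuals over the p(p+1)/2 unique elements.
    """
    s = np.asarray(sample_cov, dtype=float)
    sig = np.asarray(implied_cov, dtype=float)
    d_s = np.sqrt(np.diag(s))
    d_sig = np.sqrt(np.diag(sig))
    r_s = s / np.outer(d_s, d_s)          # sample correlations
    r_sig = sig / np.outer(d_sig, d_sig)  # implied correlations
    idx = np.tril_indices_from(s)         # unique elements incl. diagonal
    resid = r_s[idx] - r_sig[idx]
    return float(np.sqrt(np.mean(resid ** 2)))
```

A perfectly fitting model yields an SRMR of exactly 0; the MV correction studied in the article adjusts the sampling distribution of this quantity, not its point value.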

43 citations


Journal ArticleDOI
TL;DR: All fit indices, except SRMR, are overly sensitive to correlated residuals and nonspecific error, resulting in overfactored solutions; in general, the authors do not recommend using model fit indices to select the number of factors in a scale evaluation framework.
Abstract: Model fit indices are being increasingly recommended and used to select the number of factors in an exploratory factor analysis. Growing evidence suggests that the recommended cutoff values for common model fit indices are not appropriate for use in an exploratory factor analysis context. A particularly prominent problem in scale evaluation is the ubiquity of correlated residuals and imperfect model specification. Our research focuses on a scale evaluation context and the performance of four standard model fit indices: root mean square error of approximation (RMSEA), standardized root mean square residual (SRMR), comparative fit index (CFI), and Tucker-Lewis index (TLI), and two equivalence test-based model fit indices: RMSEAt and CFIt. We use Monte Carlo simulation to generate and analyze data based on a substantive example using the Positive and Negative Affect Schedule (N = 1,000). We systematically vary the number and magnitude of correlated residuals as well as nonspecific misspecification, to evaluate the impact on model fit indices in fitting a two-factor exploratory factor analysis. Our results show that all fit indices, except SRMR, are overly sensitive to correlated residuals and nonspecific error, resulting in solutions that are overfactored. SRMR performed well, consistently selecting the correct number of factors; however, previous research suggests it does not perform well with categorical data. In general, we do not recommend using model fit indices to select the number of factors in a scale evaluation framework.
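The chi-square-based indices named above can be computed from the model and baseline test statistics. A minimal sketch using the standard formulas (not the equivalence-test variants RMSEAt and CFIt):

```python
import math

def rmsea(chisq, df, n):
    """Root mean square error of approximation; 0 when chi-square <= df."""
    return math.sqrt(max(chisq - df, 0.0) / (df * (n - 1)))

def cfi(chisq_m, df_m, chisq_b, df_b):
    """Comparative fit index: noncentrality of the fitted model
    relative to a baseline (independence) model."""
    d_m = max(chisq_m - df_m, 0.0)
    d_b = max(chisq_b - df_b, d_m)
    return 1.0 - d_m / d_b if d_b > 0 else 1.0

def tli(chisq_m, df_m, chisq_b, df_b):
    """Tucker-Lewis index, based on chi-square/df ratios."""
    ratio_b = chisq_b / df_b
    ratio_m = chisq_m / df_m
    return (ratio_b - ratio_m) / (ratio_b - 1.0)
```

For example, a model with chi-square 120 on 100 degrees of freedom at N = 500 gives an RMSEA of about 0.02, conventionally read as close fit.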

36 citations


Journal ArticleDOI
TL;DR: The results not only present practical and general guidelines for substantive researchers to determine minimum required sample sizes but also improve understanding of which factors are related to sample size requirements in mediation models.
Abstract: Mediation models have been widely used in many disciplines to better understand the underlying processes between independent and dependent variables. Despite their popularity and importance, the ap...

33 citations


Journal ArticleDOI
TL;DR: This study demonstrates the application of a new method for multiple-group analysis that concurrently models item responses, response times, and visual fixation counts collected from an eye-tracker.
Abstract: Many approaches have been proposed to jointly analyze item responses and response times to understand behavioral differences between normally and aberrantly behaved test-takers. Biometric informati...

19 citations


Journal ArticleDOI
TL;DR: Results revealed that the respondents preferred a paper-based VAS item with a horizontal, 8-cm long, 3 DTP (“desktop publishing point”) wide, black line, with flat line endpoints, and the ascending numerical anchors “0” and “10”, both for women and men.
Abstract: Paper-based visual analogue scale (VAS) items were developed 100 years ago. Although they gained great popularity in clinical and medical research for assessing pain, they have been scarcely applied in other areas of psychological research for several decades. However, since the beginning of digitization, VAS have attracted growing interest among researchers for carrying out computerized and paper-based data assessments. In the present study, we investigated the research question "Which different design characteristics of paper-based VAS items are preferred by women and men?" Based on a sample of 115 participants (68 female), our results revealed that the respondents preferred a paper-based VAS item with a horizontal, 8-cm long, 3 DTP ("desktop publishing point") wide, black line, with flat line endpoints, and the ascending numerical anchors "0" and "10", both for women and men. Although we did not identify any gender difference in these characteristics, our findings uncovered clear preferences on how to design paper-based VAS items.

16 citations


Journal ArticleDOI
TL;DR: Simulation results suggest that the EM-IRT model provides superior item and equal mean ability parameter estimates in the presence of model violations under realistic conditions when compared with the 2PL model.
Abstract: As low-stakes testing contexts increase, low test-taking effort may serve as a serious validity threat. One common solution to this problem is to identify noneffortful responses and treat them as m...

15 citations


Journal ArticleDOI
TL;DR: Careless responding is a bias in survey responses that disregards the actual item content, constituting a threat to the factor structure, reliability, and validity of psychological measurements as discussed by the authors.
Abstract: Careless responding is a bias in survey responses that disregards the actual item content, constituting a threat to the factor structure, reliability, and validity of psychological measurements. Di...

15 citations


Journal ArticleDOI
TL;DR: The authors note that disengaged item responses pose a threat to the validity of the results provided by large-scale assessments and examine several procedures for identifying disengaged responses on the basis of observed responses.
Abstract: Disengaged item responses pose a threat to the validity of the results provided by large-scale assessments. Several procedures for identifying disengaged responses on the basis of observed response...

15 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compared semiparametric continuous norming (SPCN) with conventional norming methods by simulating psychometric test results.
Abstract: The interpretation of psychometric test results is usually based on norm scores. We compared semiparametric continuous norming (SPCN) with conventional norming methods by simulating results for tes...

14 citations


Journal ArticleDOI
TL;DR: This paper explored the theory of psychological inoculation: if people are preemptively exposed to a weakened version of a misleading argument, they can build resistance against future misinformation.
Abstract: Online misinformation is a pervasive global problem. In response, psychologists have recently explored the theory of psychological inoculation: If people are preemptively exposed to a weakened vers...

13 citations


Journal ArticleDOI
TL;DR: The presence of rapid guessing (RG) presents a challenge to practitioners in obtaining accurate estimates of measurement properties and examinee ability; in response to this concern, researchers have proposed methods to deal with this problem.
Abstract: The presence of rapid guessing (RG) presents a challenge to practitioners in obtaining accurate estimates of measurement properties and examinee ability. In response to this concern, researchers ha...

Journal ArticleDOI
TL;DR: It is suggested that test users should evaluate and document potential differential NER prior to both conducting measurement quality analyses and reporting disaggregated subgroup mean performance.
Abstract: Low test-taking effort as a validity threat is common when examinees perceive an assessment context to have minimal personal value. Prior research has shown that in such contexts, subgroups may dif...

Journal ArticleDOI
TL;DR: A new approach to the analysis of how students answer tests and how they allocate resources in terms of time on task and revisiting previously answered questions is presented, revealing that examinees’ tendency to revisit items was strongly related to their speed and subgroups of examinees displayed different test-taking behaviors.
Abstract: This article presents a new approach to the analysis of how students answer tests and how they allocate resources in terms of time on task and revisiting previously answered questions. Previous res...

Journal ArticleDOI
TL;DR: It is argued that methodology applied to investigate response styles should attend to the inherent uncertainty of response style influence due to the likely influence of both response styles and the content trait on the selection of extreme response categories.
Abstract: This paper presents a mixture item response tree (IRTree) model for extreme response style. Unlike traditional applications of single IRTree models, a mixture approach provides a way of representin...

Journal ArticleDOI
TL;DR: A follow-up regression comparing alpha and omega revealed alpha to be more sensitive to the degree of violation of tau equivalence, whereas omega was more affected by sample size and number of items, especially when population reliability was low.
Abstract: The accuracy of certain internal consistency estimators has been questioned in recent years. The present study tests the accuracy of six reliability estimators (Cronbach’s alpha, omega, omega hier...
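As a point of reference for the estimators being compared, Cronbach's alpha can be computed directly from an item score matrix. A minimal sketch (omega additionally requires factor loadings from a fitted measurement model and is omitted here):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha from an (n_persons, k_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / total score variance).
    """
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = x.sum(axis=1).var(ddof=1)     # variance of sum scores
    return k / (k - 1) * (1 - item_vars / total_var)
```

With perfectly parallel items (identical columns) alpha equals 1; violations of tau equivalence, the condition the abstract discusses, make alpha a lower bound on reliability.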

Journal ArticleDOI
TL;DR: Forced-choice questionnaires can prevent faking and other response biases typically associated with rating scales as mentioned in this paper. However, the derived trait scores are often unreliable and ipsative.
Abstract: Forced-choice questionnaires can prevent faking and other response biases typically associated with rating scales. However, the derived trait scores are often unreliable and ipsative, making interi...

Journal ArticleDOI
TL;DR: The impact of excluding and misspecifying covariate effects on measurement invariance testing and class enumeration was investigated via Monte Carlo simulations and the utility of a model comparison approach in searching for the correct specification of covariates was evidenced.
Abstract: Factor mixture modeling (FMM) has been increasingly used to investigate unobserved population heterogeneity. This study examined the issue of covariate effects with FMM in the context of measurement invariance testing. Specifically, the impact of excluding and misspecifying covariate effects on measurement invariance testing and class enumeration was investigated via Monte Carlo simulations. Data were generated based on FMM models with (1) a zero covariate effect, (2) a covariate effect on the latent class variable, and (3) covariate effects on both the latent class variable and the factor. For each population model, different analysis models that excluded or misspecified covariate effects were fitted. Results highlighted the importance of including proper covariates in measurement invariance testing and evidenced the utility of a model comparison approach in searching for the correct specification of covariate effects and the level of measurement invariance. This approach was demonstrated using an empirical data set. Implications for methodological and applied research are discussed.
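The model comparison approach described above is commonly operationalized by fitting the candidate covariate specifications and comparing an information criterion such as the BIC. A sketch of the bookkeeping involved; the log-likelihoods and parameter counts below are invented purely for illustration:

```python
import math

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 lnL + k ln N (smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical fit results for three covariate specifications of an FMM:
candidates = {
    "no_covariate":       bic(-5120.4, 13, 1000),
    "covariate_on_class": bic(-5098.7, 14, 1000),
    "covariate_on_both":  bic(-5095.2, 15, 1000),
}
best = min(candidates, key=candidates.get)  # specification with lowest BIC
```

The same comparison can be repeated across levels of measurement invariance, retaining the specification with the lowest BIC.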

Journal ArticleDOI
TL;DR: This simulation study examined the robustness of LPA in terms of class enumeration and parameter recovery when the noninvariance was unmodeled by using composite or factor scores as profile indicators.
Abstract: Latent profile analysis (LPA) identifies heterogeneous subgroups based on continuous indicators that represent different dimensions. It is a common practice to measure each dimension using items, c...

Journal ArticleDOI
TL;DR: Simulations show performance exceeding that of Cronbach's alpha in terms of root mean square error when the formula matching the correct exponential family is used, and a discussion of Jensen’s inequality suggests explanations for peculiarities of the bias and standard error of the simulations across the different exponential families.
Abstract: This article presents some equivalent forms of the common Kuder-Richardson Formula 21 and 20 estimators for nondichotomous data belonging to certain other exponential families, such as Poisson count data, exponential data, or geometric counts of trials until failure. Using the generalized framework of Foster (2020), an equation for the reliability of a subset of the natural exponential family having quadratic variance functions is derived for known population parameters, and both formulas are shown to be different plug-in estimators of this quantity. The equivalent Kuder-Richardson Formulas 20 and 21 are given for six different natural exponential families, and these match earlier derivations in the case of binomial and Poisson data. Simulations show performance exceeding that of Cronbach's alpha in terms of root mean square error when the formula matching the correct exponential family is used, and a discussion of Jensen's inequality suggests explanations for peculiarities of the bias and standard error of the simulations across the different exponential families.
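For the classical dichotomous case, KR-20 and KR-21 are simple plug-in estimators; a minimal sketch of that binomial special case only, not the generalized exponential-family formulas derived in the article:

```python
import numpy as np

def kr20(items):
    """Kuder-Richardson Formula 20 for 0/1 item scores
    (shape: n_persons x k_items); uses per-item p(1-p) variances."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    p = x.mean(axis=0)                      # item difficulties
    total_var = x.sum(axis=1).var(ddof=1)   # variance of total scores
    return k / (k - 1) * (1 - np.sum(p * (1 - p)) / total_var)

def kr21(items):
    """KR-21: like KR-20 but assumes all items share one difficulty,
    so only the mean and variance of the total score are needed."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    m = x.sum(axis=1).mean()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - m * (k - m) / (k * total_var))
```

KR-21 never exceeds KR-20, with equality when all item difficulties are identical; the article's contribution is extending both formulas beyond the binomial family.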

Journal ArticleDOI
TL;DR: The purpose of this article is to show that the large-sample variance of Fleiss’ generalized kappa is systematically being misused, is invalid as a precision measure for kappa, and cannot be used for constructing confidence intervals.
Abstract: Cohen's kappa coefficient was originally proposed for two raters only, and it later extended to an arbitrarily large number of raters to become what is known as Fleiss' generalized kappa. Fleiss' generalized kappa and its large-sample variance are still widely used by researchers and were implemented in several software packages, including, among others, SPSS and the R package "rel." The purpose of this article is to show that the large-sample variance of Fleiss' generalized kappa is systematically being misused, is invalid as a precision measure for kappa, and cannot be used for constructing confidence intervals. A general-purpose variance expression is proposed, which can be used in any statistical inference procedure. A Monte-Carlo experiment is presented, showing the validity of the new variance estimation procedure.
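The point estimate of Fleiss' generalized kappa itself (as opposed to its disputed large-sample variance) is straightforward to compute from a table of rating counts; a minimal sketch:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' generalized kappa from an (n_subjects, q_categories)
    matrix of rating counts; every subject is rated by the same
    number of raters r."""
    m = np.asarray(counts, dtype=float)
    n = m.shape[0]
    r = m[0].sum()                            # raters per subject
    p_j = m.sum(axis=0) / (n * r)             # overall category proportions
    p_i = (np.sum(m ** 2, axis=1) - r) / (r * (r - 1))  # per-subject agreement
    p_bar, p_e = p_i.mean(), np.sum(p_j ** 2)
    return float((p_bar - p_e) / (1 - p_e))
```

Perfect agreement gives kappa = 1; the article's point is that attaching the commonly used variance formula to this estimate for confidence intervals is invalid.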

Journal ArticleDOI
TL;DR: A generalized latent variable model is presented that, when combined with strong parametric assumptions based on mathematical cognitive models, permits the use of adaptive testing without large samples or the need to precalibrate item parameters.
Abstract: The adaptation of experimental cognitive tasks into measures that can be used to quantify neurocognitive outcomes in translational studies and clinical trials has become a key component of the stra...

Journal ArticleDOI
TL;DR: This study proposes a polytomous scoring approach for handling not-reached items and compares its performance with those of the traditional scoring approaches; the results indicate that the polytomous scoring approaches outperformed the traditional approaches.
Abstract: In low-stakes assessments, some students may not reach the end of the test and leave some items unanswered due to various reasons (e.g., lack of test-taking motivation, poor time management, and test speededness). Not-reached items are often treated as incorrect or not-administered in the scoring process. However, when the proportion of not-reached items is high, these traditional approaches may yield biased scores and thereby threaten the validity of test results. In this study, we propose a polytomous scoring approach for handling not-reached items and compare its performance with those of the traditional scoring approaches. Real data from a low-stakes math assessment administered to second and third graders were used. The assessment consisted of 40 short-answer items focusing on addition and subtraction. The students were instructed to answer as many items as possible within 5 minutes. Using the traditional scoring approaches, students' responses for not-reached items were treated as either not-administered or incorrect in the scoring process. With the proposed scoring approach, students' nonmissing responses were scored polytomously based on how accurately and rapidly they responded to the items to reduce the impact of not-reached items on ability estimation. The traditional and polytomous scoring approaches were compared based on several evaluation criteria, such as model fit indices, test information function, and bias. The results indicated that the polytomous scoring approaches outperformed the traditional approaches. The complete case simulation corroborated our empirical findings that the scoring approach in which nonmissing items were scored polytomously and not-reached items were considered not-administered performed the best. Implications of the polytomous scoring approach for low-stakes assessments were discussed.
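The scoring idea can be illustrated with a deliberately simplified rule: nonmissing responses are scored polytomously from accuracy and speed, and not-reached items are treated as not administered. The 0/1/2 rule and the rt_threshold parameter below are our own simplifications for illustration, not the authors' exact specification:

```python
def polytomous_score(correct, rt, rt_threshold, reached):
    """Score one item response under a simplified polytomous rule.

    correct      -- bool, whether the answer is correct
    rt           -- response time in seconds
    rt_threshold -- cutoff separating fast from slow responses (assumed)
    reached      -- whether the student reached the item
    """
    if not reached:
        return None            # not administered: excluded from estimation
    if not correct:
        return 0               # incorrect response
    return 2 if rt <= rt_threshold else 1  # correct: fast earns more credit
```

Scores of None would simply be omitted from the likelihood when estimating ability, which is the best-performing configuration reported in the abstract.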

Journal ArticleDOI
TL;DR: In data collected from virtual learning environments (VLEs), item response theory (IRT) models can be used to guide the ongoing measurement of student ability.
Abstract: In data collected from virtual learning environments (VLEs), item response theory (IRT) models can be used to guide the ongoing measurement of student ability. However, such applications of IRT rel...

Journal ArticleDOI
TL;DR: In the majority of conditions and for all factor retention criteria except the comparison data approach, the missing data mechanism had little impact on accuracy, and pairwise deletion performed comparably to the more sophisticated imputation methods.
Abstract: Determining the number of factors in exploratory factor analysis is arguably the most crucial decision a researcher faces when conducting the analysis. While several simulation studies exist that c...
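Pairwise deletion, one of the missing-data strategies compared above, estimates each correlation from only the cases observed on that particular pair of variables; a minimal sketch:

```python
import numpy as np

def pairwise_corr(x):
    """Correlation matrix under pairwise deletion: each r_ij uses
    only the rows where both variable i and variable j are observed
    (x is an n x p array with np.nan marking missing entries)."""
    x = np.asarray(x, dtype=float)
    p = x.shape[1]
    r = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            ok = ~np.isnan(x[:, i]) & ~np.isnan(x[:, j])
            r[i, j] = r[j, i] = np.corrcoef(x[ok, i], x[ok, j])[0, 1]
    return r
```

The resulting matrix feeds directly into factor retention criteria, though because each element can rest on a different subsample it is not guaranteed to be positive definite.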

Journal ArticleDOI
TL;DR: This research presents a novel and scalable approach called “Smart Scorecard™”, which automates the labor-intensive, time-consuming, and expensive process of manually cataloging and rating individual performances.
Abstract: Practical constraints in rater-mediated assessments limit the availability of complete data. Instead, most scoring procedures include one or two ratings for each performance, with overlapping perfo...

Journal ArticleDOI
TL;DR: The regression method generally performs the best in terms of coefficient and standard error bias, accuracy, and empirical Type I error rates, and the correlation-preserving method mostly outperforms the sum score methods.
Abstract: Factor score regression has recently received growing interest as an alternative for structural equation modeling. However, many applications are left without guidance because of the focus on norma...

Journal ArticleDOI
TL;DR: This study found that if the class constraint algorithm was used a priori, it should be combined with a post hoc algorithm for accurate classification and is most effective under two-class models when class separation is high.
Abstract: Simulation studies involving mixture models inevitably aggregate parameter estimates and other output across numerous replications. A primary issue that arises in these methodological investigation...

Journal ArticleDOI
TL;DR: This paper presents a meta-analysis of eight randomized control trials (RCTs) conducted in the Netherlands over the course of a 12-month period and found that three out of four trials showed statistically significant improvements in the quality of the control groups.
Abstract: Considerable thought is often put into designing randomized control trials (RCTs). From power analyses and complex sampling designs implemented preintervention to nuanced quasi-experimental models ...

Journal ArticleDOI
TL;DR: A multilevel modeling approach is recommended to effectively analyze data obtained within a nested structure (e.g., students within schools).
Abstract: Oftentimes in many fields of the social and natural sciences, data are obtained within a nested structure (e.g., students within schools). To effectively analyze data with such a structure, multile...
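A standard first step in multilevel analysis of nested data is the intraclass correlation, which quantifies how much of the outcome variance lies between groups. A minimal sketch via one-way ANOVA, assuming equal group sizes for simplicity:

```python
import numpy as np

def icc1(groups):
    """Intraclass correlation ICC(1) from a list of equal-sized groups
    (e.g., students' scores grouped by school), via one-way ANOVA:
    (MSB - MSW) / (MSB + (n - 1) * MSW)."""
    g = [np.asarray(v, dtype=float) for v in groups]
    k, n = len(g), len(g[0])                  # number of groups, group size
    grand = np.mean(np.concatenate(g))
    msb = n * sum((v.mean() - grand) ** 2 for v in g) / (k - 1)
    msw = sum(((v - v.mean()) ** 2).sum() for v in g) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)
```

A nontrivial ICC signals that observations within a group are not independent, which is exactly why single-level analyses of nested data understate standard errors and a multilevel model is preferred.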

Journal ArticleDOI
TL;DR: Results revealed that the highest alternate forms reliability coefficients were obtained when the second test was administered at least 2 to 3 weeks after the first test; however, there is a potential tradeoff in waiting longer to retest, as student ability tended to grow with time.
Abstract: An essential question when computing test–retest and alternate forms reliability coefficients is how many days there should be between tests. This article uses data from reading and math computeriz...