A comparison of reliability and precision of subscore reporting methods for a state English language proficiency assessment
Citations
8 citations
Cites background or result from "A comparison of reliability and pre..."
...However, the language testing literature on fine-grained feedback largely lacks an explicit discussion about whether and when such information is psychometrically justified, with notable exceptions of Longabach and Peyton (2018), Papageorgiou and Choi (2018), and Sawaki and Sinharay (2013)....
[...]
...Longabach and Peyton (2018) examined subscores from a K–12 English language proficiency test, and found that, compared to other augmentation methods, augmentation using MIRT (Yao & Boughton, 2007; Haberman & Sinharay, 2010) improved the subscore reliability the most....
[...]
...Both Longabach and Peyton (2018) and Papageorgiou and Choi (2018) focused on subscores for individual test takers and did not evaluate group-level subscores....
[...]
5 citations
Cites background or methods from "A comparison of reliability and pre..."
...According to Longabach and Peyton (2017), it is of great importance to assess the sub-score reliability and total score reliability when assessing the English proficiency of ELLs because the ultimate goal of assessment is to improve the education for ELLs, which relies on the accuracy of the…...
[...]
...Longabach and Peyton (2017) used Cronbach’s alpha and standard error of measurement respectively to estimate the reliability and precision of the four methods....
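For readers unfamiliar with the two statistics named in this excerpt, the following is a minimal sketch of the standard textbook formulas for Cronbach’s alpha and the standard error of measurement. It is not the code used in the cited study (which reports using the R package mirt). The function names and the small `demo_scores` matrix are hypothetical and exist only for illustration.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for an examinee-by-item score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # sample variance of total scores
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)


def standard_error_of_measurement(scores):
    """Classical SEM: SD of total scores times sqrt(1 - reliability)."""
    scores = np.asarray(scores, dtype=float)
    total_sd = scores.sum(axis=1).std(ddof=1)
    # Clip at zero so floating-point round-off cannot produce sqrt of a negative.
    unreliability = max(0.0, 1.0 - cronbach_alpha(scores))
    return total_sd * np.sqrt(unreliability)


# Hypothetical data: 5 examinees (rows) by 4 dichotomously scored items (columns).
demo_scores = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
])

print("alpha:", cronbach_alpha(demo_scores))
print("SEM:", standard_error_of_measurement(demo_scores))
```

Higher alpha indicates more reliable scores, and a smaller SEM indicates more precise ones, which is how the two statistics map onto "reliability" and "precision" in the excerpt.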
[...]
...According to Longabach and Peyton (2017), MIRT was found to be the most reliable one among the four methods to score the sub-domains for all grade levels....
[...]
References
2,731 citations
"A comparison of reliability and pre..." refers background in this paper
...Although a number of studies criticize language assessment tasks for all domains on various grounds related to the inappropriate representation of the construct in question, some authors (e.g., Brindley & Slatyer, 2002; Brown, 2004; Khoii & Paydarnia, 2001; Shin, 2007) specifically draw attention to the fact that listening skills, while critical to ELD, are not very well understood and hard to assess....
[...]
1,420 citations
"A comparison of reliability and pre..." refers methods in this paper
...The present research was carried out using only the R package mirt (Chalmers, 2012; code available upon request), which not only simplified calculations and parameter estimations, but also made them more consistent from one method of subscore reporting to another....
[...]
868 citations
"A comparison of reliability and pre..." refers methods in this paper
...Item parameters are calculated from the factor loadings of each item on each subscale and the estimated covariance between subscores’ matrices, and then the person parameter is calculated based on these values (Reckase, 2009)....
[...]
682 citations
"A comparison of reliability and pre..." refers background in this paper
...IRT, on the other hand, is based on the premise that the probability of a correct response to an item is a function of person’s trait (such as ELD) and item parameters (difficulty, discrimination, and guessing) (Hambleton & Jones, 1993)....
[...]
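The premise in the excerpt above, that the probability of a correct response depends on a person trait and on item difficulty, discrimination, and guessing, can be sketched with the three-parameter logistic (3PL) model. This is an illustration under assumptions: the excerpt does not state the exact functional form used in the study, the logistic metric without the 1.7 scaling constant is assumed here, and the function name and parameter values are hypothetical.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: person trait level
    a:     item discrimination
    b:     item difficulty
    c:     pseudo-guessing parameter (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: moderate discrimination, average difficulty, some guessing.
for theta in (-2.0, 0.0, 2.0):
    print(theta, p_correct_3pl(theta, a=1.2, b=0.0, c=0.2))
```

When the trait level equals the item difficulty, the probability sits halfway between the guessing floor `c` and 1, and it rises toward 1 as the trait level increases.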