scispace - formally typeset
Author

Sandip Sinharay

Bio: Sandip Sinharay is an academic researcher from Princeton University. The author has contributed to research in the topics of item response theory and equating, has an h-index of 29, and has co-authored 196 publications receiving 3,513 citations. Previous affiliations of Sandip Sinharay include Educational Testing Service & CTB/McGraw Hill.


Papers
Journal ArticleDOI
TL;DR: Introduces the idea behind MI, the advantages of MI over existing techniques for addressing missing data, how to do MI for real problems, and the software available to implement MI, and reports a simulation study on how assumptions about the imputation model affect the parameter estimates MI provides.
Abstract: This article provides a comprehensive review of multiple imputation (MI), a technique for analyzing data sets with missing values. Formally, MI is the process of replacing each missing data point with a set of m > 1 plausible values to generate m complete data sets. These complete data sets are then analyzed by standard statistical software, and the results combined, to give parameter estimates and standard errors that take into account the uncertainty due to the missing data values. This article introduces the idea behind MI, discusses the advantages of MI over existing techniques for addressing missing data, describes how to do MI for real problems, reviews the software available to implement MI, and discusses the results of a simulation study aimed at finding out how assumptions regarding the imputation model affect the parameter estimates provided by MI.
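The combining step described in the abstract can be sketched concretely. The snippet below is only an illustration of Rubin's rules for pooling the m complete-data analyses, not code from the article; the function name `pool_mi` and the numbers are hypothetical.

```python
import math

def pool_mi(estimates, variances):
    """Combine m complete-data results (Rubin's rules): the pooled
    estimate is the mean of the m estimates, and the total variance
    adds the within-imputation variance to an inflated
    between-imputation variance."""
    m = len(estimates)
    qbar = sum(estimates) / m                              # pooled point estimate
    w = sum(variances) / m                                 # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    t = w + (1 + 1 / m) * b                                # total variance
    return qbar, math.sqrt(t)                              # estimate and standard error

# Hypothetical results from m = 5 imputed data sets:
est, se = pool_mi([2.1, 1.9, 2.3, 2.0, 2.2], [0.04, 0.05, 0.04, 0.06, 0.05])
```

The (1 + 1/m) factor is what makes the pooled standard error reflect the extra uncertainty due to the missing values, as the abstract describes.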

504 citations

Journal ArticleDOI
TL;DR: This article examines the performance of a number of discrepancy measures for assessing different aspects of fit of the common IRT models and makes specific recommendations about what measures are most useful in assessing model fit.
Abstract: Model checking in item response theory (IRT) is an underdeveloped area. There is no universally accepted tool for checking IRT models. The posterior predictive model-checking method is a popular Bayesian model-checking tool because it has intuitive appeal, is simple to apply, has a strong theoretical basis, and can provide graphical or numerical evidence about model misfit. An important issue with the application of the posterior predictive model-checking method is the choice of a discrepancy measure (which plays a role like that of a test statistic in traditional hypothesis tests). This article examines the performance of a number of discrepancy measures for assessing different aspects of fit of the common IRT models and makes specific recommendations about what measures are most useful in assessing model fit. Graphical summaries of model-checking results are demonstrated to provide useful insights about model fit.
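The logic of a posterior predictive check can be sketched in a few lines. This is a generic illustration under a toy beta-binomial model, not the article's implementation: for each posterior draw, a replicated data set is simulated and its discrepancy is compared with the observed data's discrepancy; the proportion of replications at least as extreme is the posterior predictive p-value, with values near 0 or 1 signalling misfit.

```python
import random
random.seed(0)

def ppp_value(observed, posterior_draws, simulate, discrepancy):
    """Posterior predictive p-value for a chosen discrepancy measure."""
    extreme = 0
    for theta in posterior_draws:
        replicated = simulate(theta)
        if discrepancy(replicated, theta) >= discrepancy(observed, theta):
            extreme += 1
    return extreme / len(posterior_draws)

# Toy illustration: coin-flip data with a Beta(1, 1) prior, so the
# posterior for the success probability is Beta(1 + successes, 1 + failures).
data = [1, 1, 0, 1, 1, 0, 1, 1]
draws = [random.betavariate(1 + sum(data), 1 + len(data) - sum(data))
         for _ in range(2000)]
sim = lambda p: [1 if random.random() < p else 0 for _ in range(len(data))]
# This simple measure ignores the parameter, so it acts like a classical
# test statistic; a true discrepancy measure may depend on theta as well.
disc = lambda d, theta: sum(d) / len(d)
p = ppp_value(data, draws, sim, disc)  # moderate value here: the model fits
```

In an IRT application the simulated data would be replicated item responses and the discrepancy would be one of the measures the article compares.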

196 citations

Journal ArticleDOI
TL;DR: The posterior predictive model checking (PPMC) method is applied to a number of real applications of unidimensional IRT models, demonstrating how to exploit the flexibility of the posterior predictive checks to meet the need of the researcher.
Abstract: Even though Bayesian estimation has recently become quite popular in item response theory (IRT), there is a lack of works on model checking from a Bayesian perspective. This paper applies the posterior predictive model checking (PPMC) method (Guttman, 1967; Rubin, 1984), a popular Bayesian model checking tool, to a number of real applications of unidimensional IRT models. The applications demonstrate how to exploit the flexibility of the posterior predictive checks to meet the need of the researcher. This paper also examines practical consequences of misfit, an area often ignored in educational measurement literature while assessing model fit.

121 citations

Journal ArticleDOI
TL;DR: In this paper, a detailed simulation study was conducted to examine what properties subscores should possess in order to have added value, and the results indicated that subscores have to satisfy strict standards of reliability and correlation to achieve added value.
Abstract: Recently, there has been an increasing level of interest in subscores for their potential diagnostic value. Haberman suggested a method based on classical test theory to determine whether subscores have added value over total scores. In this article I first provide a rich collection of results regarding when subscores were found to have added value for several operational data sets. Following that I provide results from a detailed simulation study that examines what properties subscores should possess in order to have added value. The results indicate that subscores have to satisfy strict standards of reliability and correlation to have added value. A weighted average of the subscore and the total score was found to have added value more often.
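Haberman's criterion, as summarized above, compares how well the observed subscore and the observed total score predict the true subscore, via the proportional reduction in mean squared error (PRMSE). A stylized version is sketched below; the function name is mine and the operational estimators in the article are more involved.

```python
def has_added_value(rel_s, rel_x, corr_sx):
    """Stylized added-value check: the subscore has added value when the
    observed subscore predicts the true subscore better than the observed
    total score does.
    rel_s, rel_x: reliabilities of subscore and total score;
    corr_sx: observed correlation between subscore and total score."""
    prmse_subscore = rel_s                            # PRMSE of the observed subscore
    true_corr_sq = corr_sx ** 2 / (rel_s * rel_x)     # squared disattenuated correlation
    prmse_total = true_corr_sq * rel_x                # PRMSE of the observed total score
    return prmse_subscore > prmse_total

# Hypothetical inputs: a reliable, distinct subscore adds value ...
has_added_value(0.85, 0.95, 0.70)   # True
# ... but a less reliable subscore with the same correlation does not.
has_added_value(0.60, 0.95, 0.70)   # False
```

This is where the abstract's "strict standards of reliability and correlation" come from: a subscore must be reliable, and not too strongly correlated with the total score, to beat the total score as a predictor of its own true score.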

119 citations

Journal ArticleDOI
TL;DR: In this article, a multidimensional item response theory (MIRT) model is fitted using a stabilized Newton-Raphson algorithm, and a new statistical approach is proposed to assess when subscores using the MIRT model have any added value over the total score or the subscores based on classical test theory.
Abstract: Recently, there has been increasing interest in reporting subscores. This paper examines reporting of subscores using multidimensional item response theory (MIRT) models (e.g., Reckase in Appl. Psychol. Meas. 21:25–36, 1997; C.R. Rao and S. Sinharay (Eds), Handbook of Statistics, vol. 26, pp. 607–642, North-Holland, Amsterdam, 2007; Beguin & Glas in Psychometrika, 66:471–488, 2001). A MIRT model is fitted using a stabilized Newton–Raphson algorithm (Haberman in The Analysis of Frequency Data, University of Chicago Press, Chicago, 1974; Sociol. Methodol. 18:193–211, 1988) with adaptive Gauss–Hermite quadrature (Haberman, von Davier, & Lee in ETS Research Rep. No. RR-08-45, ETS, Princeton, 2008). A new statistical approach is proposed to assess when subscores using the MIRT model have any added value over (i) the total score or (ii) subscores based on classical test theory (Haberman in J. Educ. Behav. Stat. 33:204–229, 2008; Haberman, Sinharay, & Puhan in Br. J. Math. Stat. Psychol. 62:79–95, 2008). The MIRT-based methods are applied to several operational data sets. The results show that the subscores based on MIRT are slightly more accurate than subscore estimates derived by classical test theory.

92 citations


Cited by
Journal ArticleDOI
TL;DR: Presents, in both technical and practical language, two highly recommended general approaches, maximum likelihood (ML) and Bayesian multiple imputation (MI), and discusses newer developments that may eventually extend the ML and MI methods that currently represent the state of the art.
Abstract: Statistical procedures for missing data have vastly improved, yet misconception and unsound practice still abound. The authors frame the missing-data problem, review methods, offer advice, and raise issues that remain unresolved. They clear up common misunderstandings regarding the missing at random (MAR) concept. They summarize the evidence against older procedures and, with few exceptions, discourage their use. They present, in both technical and practical language, 2 general approaches that come highly recommended: maximum likelihood (ML) and Bayesian multiple imputation (MI). Newer developments are discussed, including some for dealing with missing data that are not MAR. Although not yet in the mainstream, these procedures may eventually extend the ML and MI methods that currently represent the state of the art.

10,568 citations

Posted Content
TL;DR: E elegant connections between the concepts of Informedness, Markedness, Correlation and Significance as well as their intuitive relationships with Recall and Precision are demonstrated.
Abstract: Commonly used evaluation measures including Recall, Precision, F-Measure and Rand Accuracy are biased and should not be used without a clear understanding of the biases and a corresponding identification of chance or base-case levels of the statistic. Under these measures, a system that performs worse in the objective sense of Informedness can appear to perform better. We discuss several concepts and measures that reflect the probability that prediction is informed versus chance (Informedness), and introduce Markedness as a dual measure for the probability that prediction is marked versus chance. Finally, we demonstrate elegant connections between the concepts of Informedness, Markedness, Correlation and Significance, as well as their intuitive relationships with Recall and Precision, and outline the extension from the dichotomous case to the general multi-class case.
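The dichotomous-case definitions in the abstract can be written down directly. The function name and confusion-matrix counts below are illustrative, not from the paper.

```python
def informedness_markedness(tp, fp, fn, tn):
    """Informedness = Recall + Inverse Recall - 1 (chance-corrected recall);
    Markedness = Precision + Inverse Precision - 1 (chance-corrected precision)."""
    recall = tp / (tp + fn)               # true positive rate
    inverse_recall = tn / (tn + fp)       # true negative rate
    precision = tp / (tp + fp)
    inverse_precision = tn / (tn + fn)
    return recall + inverse_recall - 1, precision + inverse_precision - 1

# Hypothetical confusion matrix: 40 TP, 10 FP, 20 FN, 30 TN.
inf, mark = informedness_markedness(40, 10, 20, 30)
```

One of the "elegant connections" the abstract mentions: in the dichotomous case, the Matthews correlation coefficient is the geometric mean of Informedness and Markedness (up to sign), so `(inf * mark) ** 0.5` reproduces it here.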

5,092 citations

01 Jan 2006
TL;DR: The Standards provide a framework that points to the effectiveness of high-quality instruments in those situations where their use is supported by validation data.
Abstract: Educational and psychological testing and assessment are among the most important contributions of behavioral science to our society, offering fundamental and significant improvements over earlier practices. Although it cannot be claimed that all tests are sufficiently refined, or that all testing is prudent and useful, a large body of evidence points to the effectiveness of high-quality instruments in situations where their use is supported by validation data. Proper use of tests can lead to better decisions about individuals and programs than would be made without them, and can also point the way toward broader and fairer access to education and employment. Poor use of tests, however, can cause considerable harm to test takers and to other participants in decisions based on test data. The aim of the Standards is to promote the sound and ethical use of tests and to provide a basis for evaluating the quality of testing practices. The purpose of publishing the Standards is to establish criteria for evaluating tests, testing practice, and the consequences of test use. Although judging the appropriateness of a test or its application should rest primarily on professional judgment, the Standards provide a framework that ensures all relevant issues are covered. It is desirable that all developers, sponsors, publishers, and users of professional tests adopt the Standards and encourage others to do so.

3,905 citations
