
Showing papers on "Item response theory published in 2008"


Journal ArticleDOI
TL;DR: The Japanese versions of the K6 and K10 demonstrated screening performances essentially equivalent to those of the original English versions, and Stratum‐specific likelihood ratios (SSLRs) were strikingly similar between the Japanese and the original versions.
Abstract: Two new screening scales for psychological distress, the K6 and K10, have been developed using item response theory and shown to outperform existing screeners in English. We developed their Japanese versions using the standard back-translation method and included them in the World Mental Health Survey Japan (WMH-J), which is a psychiatric epidemiologic study conducted in seven communities across Japan with 2436 participants. The WMH-J used the WMH Survey Initiative version of the Composite International Diagnostic Interview (CIDI) to assess 30-day Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) disorders. Performance of the two screening scales in detecting DSM-IV mood and anxiety disorders, as assessed by the areas under receiver operating characteristic curves (AUCs), was excellent, with values as high as 0.94 (95% confidence interval = 0.88 to 0.99) for K6 and 0.94 (0.88 to 0.995) for K10. Stratum-specific likelihood ratios (SSLRs), which express screening test characteristics and can be used to produce individual-level predicted probabilities of being a case from screening scale scores and pretest probabilities in other samples, were strikingly similar between the Japanese and the original versions. The Japanese versions of the K6 and K10 thus demonstrated screening performances essentially equivalent to those of the original English versions.
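The SSLR machinery described above is ordinary Bayes updating on the odds scale. A minimal sketch, with illustrative SSLR values and pretest prevalence (not the paper's estimates), of how a score stratum and a pretest probability combine into an individual-level predicted probability:

```python
def posterior_probability(pretest_p, sslr):
    """Combine a pretest probability with a stratum-specific likelihood
    ratio via Bayes' theorem on the odds scale."""
    pretest_odds = pretest_p / (1.0 - pretest_p)
    posterior_odds = pretest_odds * sslr
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical SSLRs for three K6 score strata (illustrative values only).
sslr_by_stratum = {"0-4": 0.2, "5-12": 2.0, "13-24": 15.0}
pretest_p = 0.10  # assumed population prevalence, not from the paper

predicted = {s: posterior_probability(pretest_p, lr)
             for s, lr in sslr_by_stratum.items()}
```

An SSLR of 1 leaves the pretest probability unchanged; ratios above 1 raise it and ratios below 1 lower it, which is why the same published SSLRs can be reused in samples with different prevalences.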

970 citations


Journal ArticleDOI
TL;DR: A review of efforts to assess the invariance of measurement instruments across different respondent groups using confirmatory factor analysis (CFA) is provided for the years since the review by Vandenberg and Lance.

537 citations


Journal ArticleDOI
TL;DR: The methods of the PROMIS project are likely to substantially improve measures of physical function and to increase the efficiency of their administration using CAT.

394 citations


Journal ArticleDOI
TL;DR: This paper presents a general diagnostic model (GDM) that can be estimated with standard ML techniques and applies to polytomous response variables as well as to skills with two or more proficiency levels.
Abstract: Probabilistic models with one or more latent variables are designed to report on a corresponding number of skills or cognitive attributes. Multidimensional skill profiles offer additional information beyond what a single test score can provide, if the reported skills can be identified and distinguished reliably. Many recent approaches to skill profile models are limited to dichotomous data and have made use of computationally intensive estimation methods such as Markov chain Monte Carlo, since standard maximum likelihood (ML) estimation techniques were deemed infeasible. This paper presents a general diagnostic model (GDM) that can be estimated with standard ML techniques and applies to polytomous response variables as well as to skills with two or more proficiency levels. The paper uses one member of a larger class of diagnostic models, a compensatory diagnostic model for dichotomous and partial credit data. Many well-known models, such as univariate and multivariate versions of the Rasch model and the two-parameter logistic item response theory model, the generalized partial credit model, as well as a variety of skill profile models, are special cases of this GDM. In addition to an introduction to this model, the paper presents a parameter recovery study using simulated data and an application to real data from the field test for TOEFL Internet-based testing.
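The compensatory diagnostic model described above gives each item a logistic response function whose logit is an item intercept plus a Q-matrix-weighted sum of skill effects. A minimal sketch for the dichotomous case; all parameter values are illustrative, not estimates from the TOEFL field test:

```python
import math

def p_correct(beta_i, gammas, q_row, skills):
    """P(X=1) under a compensatory diagnostic model with dichotomous items.
    beta_i: item intercept; gammas: per-skill slopes for this item;
    q_row: 0/1 Q-matrix entries; skills: examinee skill levels (0/1/2/...)."""
    logit = beta_i + sum(q * g * a for q, g, a in zip(q_row, gammas, skills))
    return 1.0 / (1.0 + math.exp(-logit))

# One item requiring skills 1 and 3 out of three postulated skills.
beta, gammas, q = -0.5, [1.2, 0.8, 1.0], [1, 0, 1]
p_master = p_correct(beta, gammas, q, [1, 0, 1])     # masters both skills
p_nonmaster = p_correct(beta, gammas, q, [0, 0, 0])  # masters neither
```

Setting all slopes for an item equal recovers a Rasch-type special case, which is the sense in which the Rasch and 2PL models are nested in this GDM.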

380 citations


Journal ArticleDOI
TL;DR: In this paper, a definitional boundary of the space of diagnostic classification models (DCM) is developed, core DCM within this space are reviewed, and their defining features are compared and contrasted with those of other latent variable models.
Abstract: Diagnostic classification models (DCM) are frequently promoted by psychometricians as important modelling alternatives for analyzing response data in situations where multivariate classifications of respondents are made on the basis of multiple postulated latent skills. In this review paper, a definitional boundary of the space of DCM is developed, core DCM within this space are reviewed, and their defining features are compared and contrasted with those of other latent variable models. The models to which DCM are compared include unrestricted latent class models, multidimensional factor analysis models, and multidimensional item response theory models. Attention is paid to both statistical considerations of model structure, as well as substantive considerations of model use.

273 citations


Journal ArticleDOI
TL;DR: This model integrates an advanced item response theory measurement model with a structural hierarchical model for studying antecedents of ERS and improves on existing procedures by allowing different items to be differentially useful for measuring ERS.
Abstract: Extreme response style (ERS) is an important threat to the validity of survey-based marketing research. In this article, the authors present a new item response theory–based model for measuring ERS. This model contributes to the ERS literature in two ways. First, the method improves on existing procedures by allowing different items to be differentially useful for measuring ERS and by accommodating the possibility that an item's usefulness differs across groups (e.g., countries). Second, the model integrates an advanced item response theory measurement model with a structural hierarchical model for studying antecedents of ERS. The authors simultaneously estimate a person's ERS score and individual- and group-level (country) drivers of ERS. Through simulations, they show that the new method improves on traditional procedures. They further apply the model to a large data set consisting of 12,506 consumers from 26 countries on four continents. The findings show that the model extensions are necessary.
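A common first step in this literature is to recode rating-scale responses into extreme-response indicators, which then serve as the input to an IRT model for ERS. A sketch of that recoding for a 5-point scale (the recoding convention is an assumption here, not taken from this article):

```python
def ers_indicators(responses, scale_min=1, scale_max=5):
    """Flag each Likert response as extreme (1) if it uses an endpoint
    category, else non-extreme (0)."""
    return [1 if r in (scale_min, scale_max) else 0 for r in responses]

# One respondent's answers to six 5-point items.
flags = ers_indicators([1, 3, 5, 4, 2, 5])
```

The authors' model then lets the "usefulness" (discrimination) of each item for measuring ERS vary by item and by country, rather than weighting all indicators equally.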

260 citations


Journal ArticleDOI
TL;DR: It is shown in the present article that random item parameters make sense theoretically, and that in practice the random item approach is promising for handling several issues, such as the measurement of persons, the explanation of item difficulties, and troubleshooting with respect to DIF.
Abstract: It is common practice in IRT to consider items as fixed and persons as random. Both continuous and categorical person parameters are most often random variables, whereas for items only continuous parameters are used, and these are commonly of the fixed type, although exceptions occur. It is shown in the present article that random item parameters make sense theoretically, and that in practice the random item approach is promising for handling several issues, such as the measurement of persons, the explanation of item difficulties, and troubleshooting with respect to DIF. In correspondence with these issues, three parts are included. All three rely on the Rasch model as the simplest model to study, and the same data set is used for all applications. First, it is shown that the Rasch model with fixed persons and random items is an interesting measurement model, both in theory and in terms of its goodness of fit. Second, the linear logistic test model with an error term is introduced, so that the explanation of the item difficulties based on the item properties does not need to be perfect. Finally, two more models are presented: the random item profile model (RIP) and the random item mixture model (RIM). In the RIP, DIF is not considered a discrete phenomenon, and when a robust regression approach based on the RIP difficulties is applied, quite good DIF identification results are obtained. In the RIM, no prior anchor sets are defined; instead, a latent DIF class of items is used, so that posterior anchoring is realized (anchoring based on the item mixture). It is shown that both approaches are promising for the identification of DIF.
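The "random items" idea can be made concrete by simulation: under a Rasch model, draw item difficulties from a population distribution rather than treating them as fixed constants, then generate responses. A minimal sketch with illustrative distribution parameters:

```python
import math
import random

random.seed(1)

def rasch_p(theta, b):
    """Rasch probability of a correct response for ability theta
    and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

n_items = 20
# Random items: difficulties drawn from an assumed N(0, 1) population.
difficulties = [random.gauss(0.0, 1.0) for _ in range(n_items)]

theta = 0.5  # one fixed person, mirroring the fixed-persons/random-items case
responses = [1 if random.random() < rasch_p(theta, b) else 0
             for b in difficulties]
```

Estimating the mean and variance of the difficulty distribution, instead of 20 separate fixed difficulties, is what makes the random-item Rasch model parsimonious.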

205 citations


Journal ArticleDOI
TL;DR: In this paper, the relations among several alternative parameterizations of the binary factor analysis model and the 2-parameter item response theory model are discussed, and general formulas are provided.
Abstract: The relations among several alternative parameterizations of the binary factor analysis model and the 2-parameter item response theory model are discussed. It is pointed out that different parameterizations of factor analysis model parameters can be transformed into item response theory model parameters, and general formulas are provided. An illustrative data analysis is provided to demonstrate the transformations.
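One commonly cited instance of the transformations this paper discusses converts a binary-factor-analysis loading (lambda) and threshold (tau) into 2PL discrimination and difficulty. A sketch of that standard parameterization (the paper covers several; D = 1.702 is the usual rescaling from the normal-ogive to the logistic metric):

```python
import math

def fa_to_irt(loading, threshold, D=1.702):
    """Convert a binary FA loading and threshold into 2PL (a, b).
    a = D * lambda / sqrt(1 - lambda^2), b = tau / lambda, the standard
    normal-ogive-to-logistic transformation."""
    a = D * loading / math.sqrt(1.0 - loading ** 2)
    b = threshold / loading
    return a, b

# Illustrative FA estimates for one item.
a, b = fa_to_irt(loading=0.7, threshold=0.35)
```

The inverse mapping exists as well, so fitting either model and reporting in the other parameterization is a matter of algebra, which is the practical point of the paper.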

197 citations


Journal ArticleDOI
TL;DR: In this article, the development of a theoretically based, empirically tested instrument designed to measure the mathematical knowledge and skills of children from three to seven years of age, emphasising its submission to the Rasch model, was described.
Abstract: There are only a few instruments to assess mathematics knowledge and skills in children as young as three to four years of age, and these instruments are limited in scope of content. We describe the development of a theoretically based, empirically tested instrument designed to measure the mathematical knowledge and skills of children from three to seven years of age, emphasising its submission to the Rasch model. After we used the data to refine the instrument, the data fit the model well, with high reliability. These data also provided empirical support for the developmental progressions for most topics. We conclude with a description of the research’s contribution to theory and empirical research regarding young children’s development of specific mathematical competencies.

183 citations


Journal ArticleDOI
TL;DR: Instead of using small fixed-length tests, clinicians can create item banks with a large item pool, and a small set of the items most relevant for a given individual can be administered with no loss of information, yielding a dramatic reduction in administration time and patient and clinician burden.
Abstract: Objective: This study investigated the combination of item response theory and computerized adaptive testing (CAT) for psychiatric measurement as a means of reducing the burden of research and clinical assessments. Methods: Data were from 800 participants in outpatient treatment for a mood or anxiety disorder; they completed 616 items of the 626-item Mood and Anxiety Spectrum Scales (MASS) at two times. The first administration was used to design and evaluate a CAT version of the MASS by using post hoc simulation. The second confirmed the functioning of CAT in live testing. Results: Tests of competing models based on item response theory supported the scale’s bifactor structure, consisting of a primary dimension and four group factors (mood, panic-agoraphobia, obsessive-compulsive, and social phobia). Both simulated and live CAT showed a 95% average reduction (585 items) in items administered (24 and 30 items, respectively) compared with administration of the full MASS. The correlation between scores on the full MASS and the CAT version was .93. For the mood disorder subscale, differences in scores between two groups of depressed patients—one with bipolar disorder and one without—on the full scale and on the CAT showed effect sizes of .63 (p<.003) and 1.19 (p<.001) standard deviation units, respectively, indicating better discriminant validity for CAT. Conclusions: Instead of using small fixed-length tests, clinicians can create item banks with a large item pool, and a small set of the items most relevant for a given individual can be administered with no loss of information, yielding a dramatic reduction in administration time and patient and clinician burden. (Psychiatric Services 59:361–368, 2008)
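The dramatic item savings reported here come from adaptive item selection: at each step, administer the unused item with maximum Fisher information at the current ability estimate. A minimal sketch of that selection rule under a 2PL model, with a hypothetical four-item bank (not the MASS items):

```python
import math

def p2pl(theta, a, b):
    """2PL probability of endorsing/answering an item correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Illustrative (discrimination a, difficulty b) pairs.
item_bank = [(1.5, -1.0), (1.0, 0.0), (2.0, 0.5), (0.8, 1.5)]

def select_item(theta, used):
    """Index of the most informative unused item at theta."""
    candidates = [i for i in range(len(item_bank)) if i not in used]
    return max(candidates, key=lambda i: info(theta, *item_bank[i]))

first = select_item(0.5, used=set())  # picks the high-a item near theta
```

After each response the ability estimate is updated and the rule is applied again, which is why a well-targeted handful of items can match the precision of the full fixed-length scale.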

161 citations


Journal ArticleDOI
TL;DR: Results show that it is possible to discriminate between latent class models and factor models even if responses are categorical, and testing for class invariance of parameters is important in the context of measurement invariance and when using mixture models to approximate nonnormal distributions.
Abstract: Factor mixture models (FMM's) are latent variable models with categorical and continuous latent variables which can be used as a model-based approach to clustering. A previous paper covered the results of a simulation study showing that in the absence of model violations, it is usually possible to choose the correct model when fitting a series of models with different numbers of classes and factors within class. The response format in the first study was limited to normally distributed outcomes. The current paper has two main goals, firstly, to replicate parts of the first study with 5-point Likert scale and binary outcomes, and secondly, to address the issue of testing class invariance of thresholds and loadings. Testing for class invariance of parameters is important in the context of measurement invariance and when using mixture models to approximate non-normal distributions. Results show that it is possible to discriminate between latent class models and factor models even if responses are categorical. Comparing models with and without class-specific parameters can lead to incorrectly accepting parameter invariance if the compared models differ substantially with respect to the number of estimated parameters. The simulation study is complemented with an illustration of a factor mixture analysis of ten binary depression items obtained from a female subsample of the Virginia Twin Registry.

Journal ArticleDOI
TL;DR: This paper studies three models for cognitive diagnosis, each illustrated with an application to fraction subtraction data, and employs Markov chain Monte Carlo algorithms to fit the models and presents simulation results to examine the performance of these algorithms.
Abstract: This paper studies three models for cognitive diagnosis, each illustrated with an application to fraction subtraction data. The objective of each of these models is to classify examinees according to their mastery of skills assumed to be required for fraction subtraction. We consider the DINA model, the NIDA model, and a new model that extends the DINA model to allow for multiple strategies of problem solving. For each of these models the joint distribution of the indicators of skill mastery is modeled using a single continuous higher-order latent trait, to explain the dependence in the mastery of distinct skills. This approach stems from viewing the skills as the specific states of knowledge required for exam performance, and viewing these skills as arising from a broadly defined latent trait resembling the θ of item response models. We discuss several techniques for comparing models and assessing goodness of fit. We then implement these methods using the fraction subtraction data with the aim of selecting the best of the three models for this application. We employ Markov chain Monte Carlo algorithms to fit the models, and we present simulation results to examine the performance of these algorithms.
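The DINA model mentioned above is conjunctive: an examinee is "ready" for an item only when mastering every skill its Q-matrix row requires, and slip and guess parameters then give the response probability. A minimal sketch with illustrative parameter values:

```python
def dina_p(alpha, q_row, slip, guess):
    """P(X=1) under the DINA model.
    alpha: 0/1 skill-mastery profile; q_row: 0/1 required skills;
    slip: P(incorrect | all required skills mastered);
    guess: P(correct | some required skill not mastered)."""
    eta = all(a >= q for a, q in zip(alpha, q_row))  # conjunctive mastery
    return (1.0 - slip) if eta else guess

# Item requiring skills 1 and 2 of three.
p_master = dina_p(alpha=[1, 1, 0], q_row=[1, 1, 0], slip=0.1, guess=0.2)
p_lacking = dina_p(alpha=[1, 0, 0], q_row=[1, 1, 0], slip=0.1, guess=0.2)
```

The NIDA model relaxes this all-or-nothing structure by letting each required skill contribute its own slip/guess factor, and the paper's higher-order extension puts a continuous latent trait behind the skill-mastery indicators themselves.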

Journal ArticleDOI
TL;DR: This paper describes classical latent variable models such as factor analysis, item response theory, latent class models and structural equation models and their usefulness in medical research is demonstrated using real data.
Abstract: Latent variable models are commonly used in medical statistics, although often not referred to under this name. In this paper we describe classical latent variable models such as factor analysis, item response theory, latent class models and structural equation models. Their usefulness in medical research is demonstrated using real data. Examples include measurement of forced expiratory flow, measurement of physical disability, diagnosis of myocardial infarction and modelling the determinants of clients' satisfaction with counsellors' interviews.

Journal ArticleDOI
TL;DR: The authors present findings from the analysis of repeated measures of internalizing symptomatology that were pooled from three existing developmental studies and describe and demonstrate each step in the analysis and conclude with a discussion of potential limitations and directions for future research.
Abstract: There are a number of significant challenges researchers encounter when studying development over an extended period of time, including subject attrition, the changing of measurement structures across groups and developmental periods, and the need to invest substantial time and money. Integrative data analysis is an emerging set of methodologies that allows researchers to overcome many of the challenges of single-sample designs through the pooling of data drawn from multiple existing developmental studies. This approach is characterized by a host of advantages, but this also introduces several new complexities that must be addressed prior to broad adoption by developmental researchers. In this article, the authors focus on methods for fitting measurement models and creating scale scores using data drawn from multiple longitudinal studies. The authors present findings from the analysis of repeated measures of internalizing symptomatology that were pooled from three existing developmental studies. The authors describe and demonstrate each step in the analysis and conclude with a discussion of potential limitations and directions for future research.

Journal ArticleDOI
TL;DR: In this article, the dimensionality of the German version of Rosenberg's Self-Esteem scale (RSES) was analyzed in a nationally representative population sample of 4,988 subjects (46.4% males; aged 14-92 years).
Abstract: This study analyzed the dimensionality of the German version of Rosenberg’s Self-Esteem scale (RSES) in a nationally representative population sample of 4,988 subjects (46.4% males; aged 14–92 years). Using confirmatory factor analysis, one- and two-dimensional models were tested. Results suggest that the RSES is a two-dimensional scale comprising the highly correlated components positive and negative self-evaluation, which constitute a unitary construct of global self-esteem at the second-order level. In order to obtain a more conclusive solution, an item response theory (IRT) analysis (partial credit model) was conducted. Results lend support to a one-dimensional view of the RSES. Furthermore, psychometric properties and norm values based on the representative sample are reported. Analyses revealed extremely high response probabilities for all items, as a consequence of which self-esteem cannot be differentiated at the upper end of the range.
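The partial credit model used in the IRT analysis assigns category probabilities from cumulative step parameters. A minimal sketch with illustrative thresholds (not the RSES estimates); the "extremely high response probabilities" finding corresponds to most of the probability mass sitting in the top categories for typical theta values:

```python
import math

def pcm_probs(theta, deltas):
    """Partial credit model: probabilities of categories 0..m for an item
    with step parameters deltas (length m)."""
    cum = [0.0]
    for d in deltas:
        cum.append(cum[-1] + (theta - d))  # cumulative sum of (theta - delta_j)
    expcum = [math.exp(c) for c in cum]
    total = sum(expcum)
    return [e / total for e in expcum]

# A 4-category item with illustrative step parameters.
probs = pcm_probs(theta=1.0, deltas=[-0.5, 0.4, 1.2])
```

When all step parameters sit well below the bulk of the theta distribution, upper categories dominate for nearly everyone, and the scale stops discriminating at the high end, as the study reports for self-esteem.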

Journal ArticleDOI
TL;DR: In this paper, the utility of S-X2 to polytomous IRT models, including the generalized partial credit model, partial credit models, and rating scale model, was investigated in terms of empirical Type I error rates and power.
Abstract: Orlando and Thissen's S-X2 item fit index has performed better than traditional item fit statistics such as Yen's Q1 and McKinley and Mills' G2 for dichotomous item response theory (IRT) models. This study extends the utility of S-X2 to polytomous IRT models, including the generalized partial credit model, partial credit model, and rating scale model. The performance of the generalized S-X2 in assessing item model fit was studied in terms of empirical Type I error rates and power and compared to G2. The results suggest that the generalized S-X2 is promising for polytomous items in educational and psychological testing programs.
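The expected proportions that S-X2 compares against observed ones come from the summed-score distribution, computed with the Lord-Wingersky recursion. A dichotomous-case sketch with illustrative 2PL parameters and a crude quadrature prior (the generalized statistic in this paper extends the same idea to polytomous items):

```python
import math

def p2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def score_dist(ps):
    """Lord-Wingersky recursion: distribution of the summed score
    given per-item correct probabilities ps."""
    dist = [1.0]
    for p in ps:
        new = [0.0] * (len(dist) + 1)
        for s, w in enumerate(dist):
            new[s] += w * (1.0 - p)
            new[s + 1] += w * p
        dist = new
    return dist

# Illustrative item bank and a crude normal quadrature over theta.
items = [(1.2, -0.5), (0.9, 0.0), (1.5, 0.8)]
grid = [-2.0 + 0.5 * k for k in range(9)]
wts = [math.exp(-0.5 * t * t) for t in grid]
total = sum(wts)
wts = [w / total for w in wts]

def expected_correct(i, s):
    """Model-expected P(item i correct | summed score s): the quantity
    S-X2 compares with the observed proportion in the score-s group."""
    num = den = 0.0
    for t, w in zip(grid, wts):
        ps = [p2pl(t, a, b) for a, b in items]
        rest = ps[:i] + ps[i + 1:]          # score distribution without item i
        num += w * ps[i] * score_dist(rest)[s - 1]
        den += w * score_dist(ps)[s]
    return num / den

e_item0_score2 = expected_correct(0, 2)
```

S-X2 itself is then a Pearson chi-square over score groups, contrasting these expectations with the observed proportions correct.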

Journal ArticleDOI
TL;DR: Several topics relevant to the measurement of cognitive abilities across groups from diverse ancestral origins are considered, including fairness and bias, equivalence, diagnostic validity, item response theory, and differential item functioning.
Abstract: The measurement of cognitive abilities across diverse cultural, racial, and ethnic groups has a contentious history, with broad political, legal, economic, and ethical repercussions. Advances in psychometric methods and converging scientific ideas about genetic variation afford new tools and theoretical contexts to move beyond the reflective analysis of between-group test score discrepancies. Neuropsychology is poised to benefit from these advances to cultivate a richer understanding of the factors that underlie cognitive test score disparities. To this end, the present article considers several topics relevant to the measurement of cognitive abilities across groups from diverse ancestral origins, including fairness and bias, equivalence, diagnostic validity, item response theory, and differential item functioning.

Journal ArticleDOI
TL;DR: Cocalibration allows direct comparison of cognitive functioning in studies using any of these four tests, and standard scoring appears to be a poor choice for analysis of longitudinal cognitive testing data.

Journal ArticleDOI
TL;DR: This research used a hierarchical modeling framework with separate first-level models for the responses and response times and a second-level model for the distribution of the ability and speed parameters in the population of test takers to retrofit an empirical prior distribution for the ability parameter on each occurrence of a new response time.
Abstract: Response times on items can be used to improve item selection in adaptive testing provided that a probabilistic model for their distribution is available. In this research, the author used a hierarchical modeling framework with separate first-level models for the responses and response times and a second-level model for the distribution of the ability and speed parameters in the population of test takers. The framework allows the author to retrofit an empirical prior distribution for the ability parameter on each occurrence of a new response time. In an example with an adaptive version of the Law School Admission Test (LSAT), the author shows how this additional update of the posterior distribution of the ability leads to a substantial improvement of the ability estimator. Two ways of applying the procedure in real-world adaptive testing are discussed.
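A standard first-level choice for response times in this literature is a lognormal model, log T = beta_item - tau_person + noise; because person speed tau and ability theta correlate at the second level, an observed response time carries information about ability. A minimal sketch of that update, with all parameter values illustrative (not LSAT estimates):

```python
import math

def loglik_tau(tau, log_t, beta, alpha):
    """Lognormal response-time log-likelihood (up to a constant) for
    person speed tau: log T ~ Normal(beta - tau, 1 / alpha**2)."""
    resid = log_t - (beta - tau)
    return math.log(alpha) - 0.5 * (alpha * resid) ** 2

# Illustrative item time intensity (beta), time discrimination (alpha),
# and one observed log response time.
beta, alpha = 4.0, 1.5
log_t = 3.2
tau_hat = beta - log_t  # likelihood mode for the person's speed

# Second level: assumed population correlation rho between speed and
# ability shifts the ability prior mean toward faster/slower test takers.
rho, mu_theta, sd_theta, mu_tau, sd_tau = 0.4, 0.0, 1.0, 0.0, 0.8
prior_mean_theta = mu_theta + rho * (sd_theta / sd_tau) * (tau_hat - mu_tau)
```

Recomputing this empirical prior after every item, as the paper does, is what sharpens the ability estimator beyond what the responses alone provide.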

Journal ArticleDOI
TL;DR: Exploring the benefits of examiner training and employing “true” scores generated using Item Response Theory analyses prior to making pass/fail decisions are recommended.
Abstract: Physician-patient communication is a clinical skill that can be learned and has a positive impact on patient satisfaction and health outcomes. A concerted effort at all medical schools is now directed at teaching and evaluating this core skill. Student communication skills are often assessed by an Objective Structured Clinical Examination (OSCE). However, it is unknown what sources of error variance are introduced into examinee communication scores by various OSCE components. This study primarily examined the effect different examiners had on the evaluation of students' communication skills assessed at the end of a family medicine clerkship rotation. The communication performance of clinical clerks from the Classes of 2005 and 2006 was assessed using six OSCE stations. Performance was rated at each station using the 28-item Calgary-Cambridge guide. Item Response Theory analysis using a Multifaceted Rasch model was used to partition the various sources of error variance and generate a "true" communication score where the effects of examiner, case, and items are removed. Variance and reliability of scores were as follows: communication scores (.20 and .87), examiner stringency/leniency (.86 and .91), case (.03 and .96), and item (.86 and .99), respectively. All facet scores were reliable (.87-.99). Examiner variance (.86) was more than four times the examinee variance (.20). About 11% of the clerks' outcome status shifted using "true" rather than observed/raw scores. There was large variability in examinee scores due to variation in examiner stringency/leniency behaviors that may impact pass-fail decisions. Exploring the benefits of examiner training and employing "true" scores generated using Item Response Theory analyses prior to making pass/fail decisions are recommended.
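The Multifaceted Rasch decomposition behind the "true" scores can be sketched for a simplified dichotomous rating: the log-odds of a positive rating is examinee ability minus item difficulty minus examiner severity, so examiner stringency can be estimated and removed. The study's actual guide uses polytomous items; this 0/1 version and all parameter values are illustrative:

```python
import math

def facets_p(theta, item_d, examiner_sev):
    """Many-facet Rasch (dichotomous sketch): P(positive rating) with
    examiner severity entering the logit alongside ability and difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - item_d - examiner_sev)))

# Same examinee and item, rated by a lenient vs a severe examiner.
lenient = facets_p(theta=0.5, item_d=0.0, examiner_sev=-0.8)
severe = facets_p(theta=0.5, item_d=0.0, examiner_sev=0.8)
```

Because severity is a separate parameter, two examinees rated by examiners of different stringency can still be placed on a common ability scale, which is what shifted about 11% of pass/fail outcomes here.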

Journal ArticleDOI
TL;DR: The supplemented EM (SEM) algorithm is applied to address two goodness-of-fit testing problems in psychometrics and provides a convenient computational procedure that leads to an asymptotically chi-squared goodness- of-fit statistic for the 'two-stage EM' procedure of fitting covariance structure models in the presence of missing data.
Abstract: The supplemented EM (SEM) algorithm is applied to address two goodness-of-fit testing problems in psychometrics. The first problem involves computing the information matrix for item parameters in item response theory models. This matrix is important for limited-information goodness-of-fit testing and it is also used to compute standard errors for the item parameter estimates. For the second problem, it is shown that the SEM algorithm provides a convenient computational procedure that leads to an asymptotically chi-squared goodness-of-fit statistic for the ‘two-stage EM’ procedure of fitting covariance structure models in the presence of missing data. Both simulated and real data are used to illustrate the proposed procedures.

Journal ArticleDOI
TL;DR: Combined information from person-fit indexes and from observation, interviews, and self-concept theory showed that similar score profiles may have different interpretations; the authors recommend investigating the scalability of score patterns when using self-report inventories to help researchers interpret respondents' behavior correctly.
Abstract: We illustrate the usefulness of person-fit methodology for personality assessment. For this purpose, we use person-fit methods from item response theory. First, we give a nontechnical introduction to existing person-fit statistics. Second, we analyze data from Harter's (1985)Self-Perception Profile for Children in a sample of children ranging from 8 to 12 years of age (N = 611) and argue that for some children, the scale scores should be interpreted with care and caution. Combined information from person-fit indexes and from observation, interviews, and self-concept theory showed that similar score profiles may have a different interpretation. For some children in the sample, item scores did not adequately reflect their trait level. Based on teacher interviews, this was found to be due most likely to a less developed self-concept and/or problems understanding the meaning of the questions. We recommend investigating the scalability of score patterns when using self-report inventories to help the researcher interpret respondents' behavior correctly.
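One widely used person-fit statistic of the kind introduced nontechnically above is the standardized log-likelihood lz: the log-likelihood of a response pattern, standardized by its model-implied mean and variance, with large negative values flagging aberrant patterns. A sketch with illustrative item probabilities:

```python
import math

def lz(responses, probs):
    """Standardized log-likelihood person-fit statistic for a 0/1
    response pattern given model probabilities of a positive response."""
    l0 = sum(u * math.log(p) + (1 - u) * math.log(1 - p)
             for u, p in zip(responses, probs))
    mean = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    var = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - mean) / math.sqrt(var)

probs = [0.9, 0.8, 0.7, 0.4, 0.2]   # model P(endorse) per item
typical = [1, 1, 1, 0, 0]           # pattern consistent with the model
aberrant = [0, 0, 0, 1, 1]          # misses "easy" items, hits "hard" ones
```

A child who misunderstands the questions can produce exactly the aberrant pattern above while ending up with the same total score as a well-fitting child, which is the point of the article: the scale score alone does not distinguish the two.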

Journal ArticleDOI
TL;DR: In this paper, a number of data imputation methods have been developed outside of the Item Response Theory (IRT) framework and have been shown to be effective tools for dealing with missing data.
Abstract: Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same time, a number of data imputation methods have been developed outside of the IRT framework and been shown to be effective tools for dealing with missing data. The current study takes several of these methods that have been found to be useful in other contexts and investigates their performance with IRT data that contain missing values. Through a simulation study, it is shown that these methods exhibit varying degrees of effectiveness in terms of imputing data that in turn produce accurate sample estimates of item difficulty and discrimination parameters. Psychometricians and other measurement professionals are familiar with the phenomenon of missing item responses for both cognitive and affective assessments. For example, examinees may leave one or more items unanswered either inadvertently or because they do not know the answer and are afraid to guess. Respondents to a questionnaire might feel inhibited in answering items dealing with a sensitive topic, leading to missing data. Much research has been conducted regarding the imputation of such missing responses.
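As a concrete baseline of the kind such comparisons typically include, here is a person-mean imputation sketch for binary item responses: replace a missing entry with the rounded mean of the respondent's answered items. This is a deliberately simple method chosen for illustration; the study evaluates more sophisticated alternatives:

```python
def person_mean_impute(row):
    """Impute missing 0/1 item responses (None) with the rounded mean
    of the respondent's observed responses."""
    observed = [u for u in row if u is not None]
    fill = round(sum(observed) / len(observed)) if observed else 0
    return [fill if u is None else u for u in row]

# A respondent who answered 3 of 5 items.
completed = person_mean_impute([1, None, 1, 0, None])
```

Any imputation of this sort distorts the response variance to some degree, which is why the study measures each method by the accuracy of the resulting item difficulty and discrimination estimates rather than by the imputed values themselves.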

Journal ArticleDOI
TL;DR: This paper applied mixture item response theory (IRT) models to personality tests and found that a three-class mixture version of the nominal response model was the best fitting model for the Extroversion and Neuroticism scales of the Amsterdam Biographical Questionnaire.
Abstract: Mixture item response theory (IRT) models aid the interpretation of response behavior on personality tests and may provide possibilities for improving prediction. Heterogeneity in the population is modeled by identifying homogeneous subgroups that conform to different measurement models. In this study, mixture IRT models were applied to the Extroversion and Neuroticism scales of the Amsterdam Biographical Questionnaire, and a three-class mixture version of the nominal response model was identified as the best fitting model. The latent classes differed with respect to social desirability and ethnic background. Within latent classes, response tendencies demonstrated a differential use of the "?" category. An important issue is whether applying mixture IRT models results in a better prediction of relevant external criteria compared to a one-class model. For the Neuroticism scale the prediction improved, but not for the Extroversion scale. The results demonstrate the possible advantage of applying mixture IRT models.

Journal ArticleDOI
TL;DR: The 30-item Fugl-Meyer assessment shows a longitudinally stable item difficulty order and is valid for measuring volitional arm motor ability over time and had no practical consequences on the longitudinal measurement of person ability.

Journal ArticleDOI
TL;DR: In this article, the authors analyse the psychometric properties of the PSP-scale by means of the Rasch model, with a focus on the operating characteristics of the items, and show that the scale adequately meets measurement criteria of invariance and proper categorisation of items.
Abstract: The PsychoSomatic Problems (PSP) scale is built upon eight items intended to tap information about psychosomatic problems among schoolchildren and adolescents in general populations. The purpose of the study is to analyse the psychometric properties of the PSP scale by means of the Rasch model, with a focus on the operating characteristics of the items. Cross-sectional adolescent data collected in Sweden at six points in time between 1988 and 2005 are used for the analysis. In all, more than 15,000 students aged 15–16 are included in the analysis. Data were examined with respect to invariance across the latent trait, Differential Item Functioning (DIF), item categorisation and unidimensionality. The results show that the PSP scale adequately meets measurement criteria of invariance and proper categorisation of the items. The targeting is also good and the reliability is high. Since the scale works invariantly across years of investigation, it is appropriate for recurrent monitoring of psychosomatic health complaints in general populations of adolescents. Taking DIF into account through principles of equating provides a scale that shows no statistically significant signs of gender DIF, enabling invariant comparisons between boys and girls as well.


Journal ArticleDOI
TL;DR: The results from simulation studies as well as actual data suggest that IRT-based models with continuous latent traits can be developed and that, compared with the unidimensional IRT model, the proposed models better describe the actual data.
Abstract: As item response models gain increased popularity in large-scale educational and measurement testing situations, many studies have been conducted on the development and applications of unidimensional and multidimensional models. Recently, attention has been paid to IRT-based models with an overall ability dimension underlying several ability dimensions specific for individual test items, where the focus is mainly on models with dichotomous latent traits. The purpose of this study is to propose such models with continuous latent traits under the Bayesian framework. The proposed models are further compared with the conventional IRT models using Bayesian model choice techniques. The results from simulation studies as well as actual data suggest that (a) such models can be developed; (b) compared with the unidimensional IRT model, the proposed models better describe the actual data; and (c) the use of the proposed IRT models and the multiunidimensional model should be based on different beliefs about the underlying ability structure.

Journal ArticleDOI
TL;DR: In this paper, a combination of two item response theory (IRT) models is used for the observed response data and one for the missing data indicator, which is modeled using a sequential model with linear restrictions on the item parameters.
Abstract: In tests with time limits, items at the end are often not reached. Usually, the pattern of missing responses depends on the ability level of the respondents; therefore, missing data are not ignorable in statistical inference. This study models data using a combination of two item response theory (IRT) models: one for the observed response data and one for the missing data indicator. The missing data indicator is modeled using a sequential model with linear restrictions on the item parameters. The models are connected by the assumption that the respondents' latent proficiency parameters have a joint multivariate normal distribution. Model parameters are estimated by maximum marginal likelihood. Simulations show that treating missing data as ignorable can lead to considerable bias in parameter estimates. Including an IRT model for the missing data indicator removes this bias. The method is illustrated with data from an intelligence test with a time limit.

Journal ArticleDOI
TL;DR: This study uses Item Response Theory (IRT) methods to evaluate the range of the latent trait assessed with a normal personality measure and a measure of psychopathy as one example of an abnormal personality construct, and finds that the measures overlapped substantially in terms of the regions of the latent trait for which they provide information.
Abstract: Correlational and factor-analytic methods indicate that abnormal and normal personality constructs may be tapping the same underlying latent trait. However, they do not systematically demonstrate that measures of abnormal personality capture more extreme ranges of the latent trait than measures of normal range personality. Item Response Theory (IRT) methods, in contrast, do provide this information. In the present study, we use IRT methods to evaluate the range of the latent trait assessed with a normal personality measure and a measure of psychopathy as one example of an abnormal personality construct. Contrary to the expectation that the measure of psychopathy would be more extreme than the measure of normal personality traits, the measures overlapped substantially in terms of the regions of the latent trait for which they provide information. Moreover, both types of inventories were limited in terms of measurement bandwidth, such that they did not provide information across the entire latent trait continuum. Implications and future directions are discussed.