
Showing papers on "Item response theory" published in 2006


Journal Article
TL;DR: The R package ltm, as discussed by the authors, was developed for the analysis of multivariate dichotomous and polytomous data using latent variable models under the Item Response Theory approach.
Abstract: The R package ltm has been developed for the analysis of multivariate dichotomous and polytomous data using latent variable models, under the Item Response Theory approach. For dichotomous data the Rasch, the Two-Parameter Logistic, and Birnbaum’s Three-Parameter models have been implemented, whereas for polytomous data Samejima’s Graded Response model is available. Parameter estimates are obtained under marginal maximum likelihood using the Gauss-Hermite quadrature rule. The capabilities and features of the package are illustrated using two real data examples.

835 citations
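
For readers unfamiliar with the models named in this entry, their standard forms are as follows (a brief recap in conventional notation, not reproduced from the paper):

```latex
% Three-Parameter Logistic (3PL) model for dichotomous item i and
% person j; the 2PL sets c_i = 0, and the Rasch model additionally
% fixes all a_i to a common value (often 1):
P(x_{ij} = 1 \mid \theta_j) = c_i + (1 - c_i)\,
  \frac{\exp\{a_i(\theta_j - b_i)\}}{1 + \exp\{a_i(\theta_j - b_i)\}}

% Samejima's Graded Response Model for ordinal scores k = 1, ..., K_i,
% defined through cumulative category probabilities:
P(x_{ij} \ge k \mid \theta_j) =
  \frac{1}{1 + \exp\{-a_i(\theta_j - b_{ik})\}}
```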


Journal ArticleDOI
TL;DR: Recommendations are provided for physical therapists interpreting changes in the context of clinical practice, case reports, and intervention research, including greater application of indexes that help convey the meaning of clinically significant change to clinical, research, consumer, and payer communities.
Abstract: Over the past decade, the methods and science used to describe changes in outcomes of physical therapy services have become more refined. Recently, emphasis has been placed not only on changes beyond expected measurement error, but also on the identification of changes that make a real difference in the lives of patients and families. This article will highlight a case example of how to determine and interpret "clinically significant change" from both of these perspectives. The authors also examine how to use item maps within an item response theory model to enhance the interpretation of change at a content level. Recommendations are provided for physical therapists who are interpreting changes in the context of clinical practice, case reports, and intervention research. These recommendations include a greater application of indexes that help interpret the meaning of clinically significant change to multiple clinical, research, consumer, and payer communities.

766 citations


Journal ArticleDOI
TL;DR: The authors exemplify strategies for testing measurement invariance that are uncommonly applied and reported in the extant literature, including use of the means and covariance structures (MACS) approach to test for an invariant higher-order factor structure and for latent mean differences relative to both levels of that structure.
Abstract: The overarching intent of this article is to exemplify strategies associated with tests for measurement invariance that are uncommonly applied and reported in the extant literature. Designed within a pedagogical framework, the primary purposes are 3-fold and illustrate (a) tests for measurement invariance based on the analysis of means and covariance structures (MACS), (b) use of the MACS approach in testing for an invariant higher order factor structure, and (c) tests for latent mean differences relative to both levels of a higher order factor structure. Addressing additional application limitations, the secondary purposes are 2-fold and illustrate (a) determination of invariance based on two substantially different sets of criteria and (b) interpretation of noninvariant measurement items within the context of an item response theory perspective. We are hopeful that readers will find the didactic approach to be helpful in gaining a better understanding of the invariance-testing process.

416 citations
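
A compact recap of the MACS invariance hierarchy this article walks through (standard notation, ours rather than the authors'):

```latex
% MACS measurement model in group g, with intercepts tau, loadings
% Lambda, and latent factors eta:
y^{(g)} = \tau^{(g)} + \Lambda^{(g)} \eta^{(g)} + \varepsilon^{(g)}

% Nested invariance hypotheses, tested in sequence:
%   configural: same pattern of free/fixed loadings in each group
%   metric:     \Lambda^{(1)} = \Lambda^{(2)}
%   scalar:     \Lambda^{(1)} = \Lambda^{(2)} \text{ and } \tau^{(1)} = \tau^{(2)}
% Latent mean differences \kappa^{(2)} - \kappa^{(1)} are interpretable
% only once (partial) scalar invariance holds.
```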


Journal ArticleDOI
TL;DR: The authors develop a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures (MACS) method and item response theory; results indicated that this strategy was considerably more effective than an alternative approach involving a constrained-baseline model.
Abstract: In this article, the authors developed a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures method (MACS) and item response theory (IRT). They proposed examining the loadings (discrimination) and the intercept (location) parameters simultaneously using the likelihood ratio test with a free-baseline model and Bonferroni corrected critical p values. They compared the relative efficacy of this approach with alternative implementations for various types and amounts of DIF, sample sizes, numbers of response categories, and amounts of impact (latent mean differences). Results indicated that the proposed strategy was considerably more effective than an alternative approach involving a constrained-baseline model. Both MACS and IRT performed similarly well in the majority of experimental conditions. As expected, MACS performed slightly worse in dichotomous conditions but better than IRT in polytomous cases where sample sizes were small. Also, contrary to popular belief, MACS performed well in conditions where DIF was simulated on item thresholds (item means), and its accuracy was not affected by impact.

358 citations
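
The free-baseline likelihood ratio test proposed here has the familiar form below (a hedged recap in generic notation, not taken from the article):

```latex
% Compare a free-baseline model (only anchor items constrained equal
% across groups) against a model that additionally constrains the
% studied item's loading/discrimination and intercept/location:
G^2 = -2\left[\ln L_{\mathrm{constrained}} - \ln L_{\mathrm{free}}\right]
      \;\sim\; \chi^2_{\Delta\mathrm{df}}

% Bonferroni-corrected critical level when m items are tested:
\alpha^{*} = \alpha / m
```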


Journal ArticleDOI
TL;DR: In this article, the reliability of responses to the items, as well as the item parameters of three GSE measures using item response theory, were examined, and the results indicate that the New General Self-Efficacy Scale has a slight advantage over the other measures examined in this study in terms of the item discrimination, item information, and relative efficiency of the test information function.
Abstract: General self-efficacy (GSE), individuals’ belief in their ability to perform well in a variety of situations, has been the subject of increasing research attention. However, the psychometric properties (e.g., reliability, validity) associated with the scores on GSE measures have been criticized, which has hindered efforts to further establish the construct of GSE. This study examines the reliability of responses to the items, as well as the item parameters, of three GSE measures using item response theory. Contrary to the criticisms, the responses to the items on all three measures of GSE demonstrate acceptable psychometric properties, especially at lower levels of GSE. The results indicate that the New General Self-Efficacy Scale has a slight advantage over the other measures examined in this study in terms of item discrimination, item information, and relative efficiency of the test information function. Implications for GSE research are discussed.

336 citations


Journal ArticleDOI
TL;DR: The results of this study found that the HOS ADL and sports subscales were unidimensional, had adequate internal consistency, were potentially responsive across the spectrum of ability, and contributed information across the range of ability.
Abstract: Purpose: The purpose of this study was to offer evidence of validity for the Hip Outcome Score (HOS) based on internal structure, test content, and relation to other variables. Methods: The study population consisted of 507 subjects with a labral tear. Internal structure was evaluated by use of factor analysis and coefficient α. Test content was evaluated by use of item response theory. Pearson correlation coefficients were used to assess relations between the Short Form 36 and the HOS. Results: The mean subject age was 38 years (range, 13 to 66 years), with 232 male and 273 female subjects. Of the subjects, 263 (52%) underwent arthroscopic surgery. Factor analysis found that 17 of 19 items on the activities-of-daily-living (ADL) subscale loaded on 1 factor. The 2 items that did not fit the 1-factor model were omitted from further testing. All 9 items on the sports subscale loaded on 1 factor. The coefficient α values were .96 and .95 for the ADL and sports subscales, respectively. The errors associated with a single measure were ±4.6 and ±3.8 points for the ADL and sports subscales, respectively. Item response theory found that all items contributed to their test information curves and were potentially responsive. The correlations between the HOS and Short Form 36 measures of physical function were significantly different from their correlations with measures of mental functioning. Conclusions: The results of this study provide evidence of validity to support the use of the HOS ADL and sports subscales for individuals with labral tears. This includes individuals who underwent arthroscopic surgery, as well as those who did not. Specifically, the results of this study found that the HOS ADL and sports subscales were unidimensional, had adequate internal consistency, were potentially responsive across the spectrum of ability, and contributed information across the spectrum of ability. In addition, scores obtained by the HOS related to measures of function and did not relate to measures of mental health. Level of Evidence: Level III, development of diagnostic criteria with nonconsecutive patients.

311 citations


Journal ArticleDOI
TL;DR: A lognormal model for the response times of a person on a set of test items is investigated; it showed an excellent fit to the data, whereas a normal model was unable to accommodate the characteristic skewness of the response-time distributions.
Abstract: A lognormal model for the response times of a person on a set of test items is investigated. The model has a parameter structure analogous to the two-parameter logistic response models in item response theory, with a parameter for the speed of each person as well as parameters for the time intensity and discriminating power of each item. It is shown how these parameters can be estimated by a Markov chain Monte Carlo method (Gibbs sampler). The method was used to analyze response times for the adaptive version of a test from the Armed Services Vocational Aptitude Battery. The same data set was used to test the validity of the model against a normal model using posterior predictive checks on the response times. The lognormal model showed an excellent fit to the data, whereas the normal model seemed unable to allow for a characteristic skewness of the response time distributions. The addition of an equality constraint on the discrimination parameters led only to a slight loss of fit. The potential use of the model for improving the daily practice of testing is indicated.

294 citations
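
The lognormal response-time model described here can be written as follows, with τ_j the speed of person j and β_i, α_i the time intensity and discriminating power of item i (our transcription of the standard form):

```latex
% Density of response time t_ij of person j on item i:
f(t_{ij};\, \tau_j, \alpha_i, \beta_i) =
  \frac{\alpha_i}{t_{ij}\sqrt{2\pi}}
  \exp\!\left\{-\tfrac{1}{2}\bigl[\alpha_i(\ln t_{ij} - (\beta_i - \tau_j))\bigr]^2\right\}

% Equivalently: \ln t_{ij} = \beta_i - \tau_j + \varepsilon_{ij},
% with \varepsilon_{ij} \sim N(0, \alpha_i^{-2}).
```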


Journal ArticleDOI
TL;DR: This work provides the first combined IRT and CDFA analysis of a clinical measure (the SMFQ) in a community sample of 7- through 11-year-old children, and confirms its scaling properties as a potential dimensional measure of symptom severity of childhood depression in community samples.
Abstract: Item response theory (IRT) and categorical data factor analysis (CDFA) are complementary methods for the analysis of the psychometric properties of psychiatric measures that purport to measure latent constructs. These methods have been applied to relatively few child and adolescent measures. We provide the first combined IRT and CDFA analysis of a clinical measure (the Short Mood and Feelings Questionnaire—SMFQ) in a community sample of 7- through 11-year-old children. Both latent variable models supported the internal construct validity of a single underlying continuum of severity of depressive symptoms. SMFQ items discriminated well at the more severe end of the depressive latent trait. Item performance was not affected by age, although age correlated significantly with latent SMFQ scores, suggesting that symptom severity increased within the age period of 7–11. These results extend existing psychometric studies of the SMFQ and confirm its scaling properties as a potential dimensional measure of symptom severity of childhood depression in community samples.

263 citations


Journal ArticleDOI
TL;DR: The ordinal logistic regression approach, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection.
Abstract: Introduction: We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression ...

261 citations
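
The abstract is cut off before the three nested models are listed, but the standard ordinal logistic regression DIF framework it refers to is as follows (a hedged reconstruction, with θ̂ an IRT ability estimate and g a group indicator):

```latex
% Proportional-odds models for item response Y with categories k:
\mathrm{Model\ 1\ (no\ DIF):}\quad
  \operatorname{logit} P(Y \le k) = \alpha_k - \beta_1\hat\theta
\mathrm{Model\ 2\ (uniform\ DIF):}\quad
  \operatorname{logit} P(Y \le k) = \alpha_k - \beta_1\hat\theta - \beta_2 g
\mathrm{Model\ 3\ (nonuniform\ DIF):}\quad
  \operatorname{logit} P(Y \le k) = \alpha_k - \beta_1\hat\theta - \beta_2 g - \beta_3\,\hat\theta g

% Likelihood ratio tests: Model 3 vs. 2 flags nonuniform DIF;
% Model 2 vs. 1 flags uniform DIF.
```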


Journal ArticleDOI
TL;DR: In this article, the authors describe the development, analysis, and interpretation of a novel item format called Ordered Multiple-Choice (OMC), which is linked to a model of student cognitive development for the construct being measured.
Abstract: In this article we describe the development, analysis, and interpretation of a novel item format we call Ordered Multiple-Choice (OMC). A unique feature of OMC items is that they are linked to a model of student cognitive development for the construct being measured. Each of the possible answer choices in an OMC item is linked to developmental levels of student understanding, facilitating the diagnostic interpretation of student item responses. OMC items seek to provide greater diagnostic utility than typical multiple-choice items, while retaining their efficiency advantages. On the one hand, sets of OMC items provide information about the developmental understanding of students that is not available with traditional multiple-choice items; on the other hand, this information can be provided to schools, teachers, and students quickly and reliably, unlike traditional open-ended test items.

258 citations


Journal ArticleDOI
TL;DR: Through this work, it has become clear that differences in raw scores of different groups cannot be used to infer group differences in theoretical attributes unless the test scores accord with a particular set of model invariance restrictions.
Abstract: The question whether observed differences in psychometric test scores can be attributed to differences in the properties that such tests measure is relevant in many research domains; examples include the proper interpretation of differences in intelligence test scores across different generations of people,1 gender differences in affectivity,2 and cross-cultural differences in personality.3 This question also has generated some of the most conspicuous controversies in the social and life sciences, where the highest temperature in the many heated discussions around the topic has, without a doubt, been reached in the debate on IQ-score differences between ethnic groups in the United States.4,5 Such debates are often unproductive because of a lack of unambiguous characterizations of concepts like "biased," "incomparable," and "culture-fair." Terms are easily coined, as is illustrated by Johnson's6 count of no less than 55 types of measurement equivalence; however, it is often less easy to spell out their meaning in terms of their empirical consequences. Without at least some degree of precision in one's conception of a term like "equivalence," it is difficult to have a scientifically productive debate, or even to agree on what aspects of empirical data are relevant for answering the questions involved. It is for this reason that the establishment of concepts like measurement invariance and bias in an unambiguous, formal framework with testable consequences7-9 represents a theoretical development of great importance.

Through this work, it has become clear that differences in raw scores (e.g., IQ scores) of different groups (e.g., blacks and whites) cannot be used to infer group differences in theoretical attributes (e.g., general intelligence) unless the test scores accord with a particular set of model invariance restrictions. Namely, the same attribute must relate to the same set of observations in the same way in each group. Statistically, this means that the mathematical function that relates latent variables to the observations must be the same in each of the groups involved in the comparison.7,8 This idea has become known as the requirement of measurement invariance.

The theoretical definitions of measurement invariance and bias are very general and apply to different models, such as item response theory (IRT) and factor models, in roughly the same way.10,11 This does not hold for the empirical methods available for testing measurement invariance. In the past decades, psychometricians working on measurement invariance have produced many different statistical techniques to assess differential item functioning (DIF). These techniques usually employ different statistical assumptions, for instance regarding the form of the relation between latent and observed variables and the shape of the population distribution on the latent variable, and employ different modeling strategies as well as selection criteria for flagging items as biased. For this reason, it is difficult to assess the consequences of choosing a particular technique; moreover, it is not always clear to what extent the choice of technique makes a difference with respect to the diagnosis of measurement invariance and bias in applied situations. For this reason, the articles on DIF collected here (by Crane et al;12 Dorans and Kulick;13 Jones;14 Morales, Flowers, Gutierrez, Kleinman, and Teresi;15 Edelen Orlando et al16) represent a useful project in the application of bias detection methods. Each set of authors analyzes the Mini-Mental State Examination (MMSE) for measurement invariance using the same data, albeit with different methods. Together, the articles provide a ...

Journal ArticleDOI
TL;DR: This paper provided renewed converging empirical evidence for the hypothesis that asking test-takers to respond to text passages with multiple-choice questions induces response processes that are strikingly different from those that respondents would draw on when reading in non-testing contexts.
Abstract: This article provides renewed converging empirical evidence for the hypothesis that asking test-takers to respond to text passages with multiple-choice questions induces response processes that are strikingly different from those that respondents would draw on when reading in non-testing contexts. Moreover, the article shows that the construct of reading comprehension is assessment specific and is fundamentally determined through item design and text selection. The data come from qualitative analyses of 10 cognitive interviews conducted with non-native adult English readers who were given three passages with several multiple-choice questions from the CanTEST, a large-scale language test used for admission and placement purposes in Canada, in a partially counter-balanced design. The analyses show that:
• There exist multiple different representations of the construct of ‘reading comprehension’ that are revealed through the characteristics of the items.
• Learners view responding to multiple-choice questions ...

Journal ArticleDOI
TL;DR: The authors introduced an effort-moderated IRT model, which incorporates item response time into proficiency estimation and item parameter estimation, and found that the effort-moderated model showed better model fit, yielded more accurate item parameter estimates, more accurately estimated test information, and yielded proficiency estimates with higher convergent validity.
Abstract: The validity of inferences based on achievement test scores is dependent on the amount of effort that examinees put forth while taking the test. With low-stakes tests, for which this problem is particularly prevalent, there is a consequent need for psychometric models that can take into account differing levels of examinee effort. This article introduces the effort-moderated IRT model, which incorporates item response time into proficiency estimation and item parameter estimation. In two studies of the effort-moderated model when rapid guessing (i.e., reflecting low examinee effort) was present, one based on real data and the other on simulated data, the effort-moderated model performed better than the standard 3PL model. Specifically, it was found that the effort-moderated model (a) showed better model fit, (b) yielded more accurate item parameter estimates, (c) more accurately estimated test information, and (d) yielded proficiency estimates with higher convergent validity.
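
A hedged sketch of the model's core structure (our notation; the exact response-time thresholding rule is the authors'): each response is classified as solution behavior or a rapid guess, and rapid guesses are modeled as uninformative random responses.

```latex
% SB_ij = 1 if person j's response time on item i meets the item's
% threshold (solution behavior), 0 otherwise (rapid guess):
P(x_{ij} = 1 \mid \theta_j) =
  SB_{ij}\, P_{\mathrm{3PL}}(\theta_j;\, a_i, b_i, c_i)
  + (1 - SB_{ij})\,\frac{1}{m_i}

% m_i = number of response options of item i; rapid guesses thus
% succeed at the chance rate and carry no information about theta.
```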

Journal ArticleDOI
TL;DR: This article examines the performance of a number of discrepancy measures for assessing different aspects of fit of the common IRT models and makes specific recommendations about what measures are most useful in assessing model fit.
Abstract: Model checking in item response theory (IRT) is an underdeveloped area. There is no universally accepted tool for checking IRT models. The posterior predictive model-checking method is a popular Bayesian model-checking tool because it has intuitive appeal, is simple to apply, has a strong theoretical basis, and can provide graphical or numerical evidence about model misfit. An important issue with the application of the posterior predictive model-checking method is the choice of a discrepancy measure (which plays a role like that of a test statistic in traditional hypothesis tests). This article examines the performance of a number of discrepancy measures for assessing different aspects of fit of the common IRT models and makes specific recommendations about what measures are most useful in assessing model fit. Graphical summaries of model-checking results are demonstrated to provide useful insights about model fit.
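
A minimal Python sketch of the posterior predictive check described here, assuming a fitted 2PL and an external source of posterior draws (the draw format and the residual-based discrepancy are illustrative assumptions, not the article's choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def irf_2pl(theta, a, b):
    """2PL response probabilities; theta: (J,), a, b: (I,) -> (J, I)."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

def discrepancy(x, p):
    """Example discrepancy D(y, omega): sum of squared standardized
    residuals under the model's probabilities p."""
    return np.sum((x - p) ** 2 / (p * (1.0 - p)))

def ppp_value(x_obs, draws):
    """Posterior predictive p-value for a (J, I) 0/1 response matrix.

    draws: iterable of (theta, a, b) posterior samples, e.g. from an
    MCMC run (hypothetical input format).
    """
    exceed = []
    for theta, a, b in draws:
        p = irf_2pl(theta, a, b)
        x_rep = rng.binomial(1, p)        # replicated data set
        exceed.append(discrepancy(x_rep, p) >= discrepancy(x_obs, p))
    return float(np.mean(exceed))         # near 0 or 1 => misfit
```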

Journal ArticleDOI
TL;DR: The authors' results indicate that ideal point models can provide as good or better fit to personality items than do dominance models because they can fit monotonically increasing item response functions but do not require this property.
Abstract: The present study investigated whether the assumptions of an ideal point response process, similar in spirit to Thurstone's work in the context of attitude measurement, can provide viable alternatives to the traditionally used dominance assumptions for personality item calibration and scoring. Item response theory methods were used to compare the fit of 2 ideal point and 2 dominance models with data from the 5th edition of the Sixteen Personality Factor Questionnaire (S. Conn & M. L. Rieke, 1994). The authors' results indicate that ideal point models can provide as good or better fit to personality items than do dominance models because they can fit monotonically increasing item response functions but do not require this property. Several implications of these findings for personality measurement and personnel selection are described.

Journal ArticleDOI
TL;DR: The new SAHLSA-50, a health literacy test for Spanish-speaking adults, has good reliability and validity and could be used in the clinical or community setting to screen for low health literacy among Spanish speakers.
Abstract: Objective. The study was intended to develop and validate a health literacy test, termed the Short Assessment of Health Literacy for Spanish-speaking Adults (SAHLSA), for the Spanish-speaking population. Study Design. The design of SAHLSA was based on the Rapid Estimate of Adult Literacy in Medicine (REALM), known as the most easily administered tool for assessing health literacy in English. In addition to the word recognition test in REALM, SAHLSA incorporates a comprehension test using multiple-choice questions designed by an expert panel. Data Collection. Validation of SAHLSA involved testing and comparing the tool with other health literacy instruments in a sample of 201 Spanish-speaking and 202 English-speaking subjects recruited from the Ambulatory Care Center at UNC Health Care. Principal Findings. With only the word recognition test, REALM could not differentiate the level of health literacy in Spanish. The SAHLSA significantly improved the differentiation. Item response theory analysis was performed to calibrate the SAHLSA and reduce the instrument to 50 items. The resulting instrument, SAHLSA-50, was correlated with the Test of Functional Health Literacy in Adults, another health literacy instrument, at r = 0.65. The SAHLSA-50 score was significantly and positively associated with the physical health status of Spanish-speaking subjects (p < .05), holding constant age and years of education. The instrument displayed good internal reliability (Cronbach’s α = 0.92) and test–retest reliability (Pearson’s r = 0.86). Conclusions. The new instrument, SAHLSA-50, has good reliability and validity. It could be used in the clinical or community setting to screen for low health literacy among Spanish speakers.
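
For reference, the internal-reliability figure quoted above is Cronbach's coefficient, computed in the standard way (a general formula, not specific to this study):

```latex
% k items, sigma_i^2 the variance of item i, sigma_X^2 the variance
% of the total score X = sum over the k items:
\alpha = \frac{k}{k - 1}
  \left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right)
```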

Journal ArticleDOI
TL;DR: The results showed the Preschool Word and Print Awareness to be suitable for measuring preschoolers' PCK and to be sensitive to differences among children as a function of risk status.
Abstract: Purpose This research determined the psychometric quality of a criterion-referenced measure that was thought to measure preschoolers' print-concept knowledge (PCK). Method This measure, titled the ...

Journal ArticleDOI
TL;DR: There is a developmental trend during middle childhood for grammatical abilities and vocabulary abilities to become differentiated, but standardized measures do not provide differential information concerning receptive and expressive abilities.
Abstract: Purpose This study asked if children’s performance on language tests reflects different dimensions of language and if this dimensionality changes with development. Method Children were given standardized language batteries at kindergarten and at second, fourth, and eighth grades. A revised modified parallel analysis was used to determine the dimensionality of these items at each grade level. A confirmatory factor analysis was also performed on the subtest scores to evaluate alternate models of dimensionality. Results The revised modified parallel analysis revealed a single dimension across items with evidence of either test-specific or language-area-specific minor dimensions at different ages. The confirmatory factor analysis tested models involving modality (receptive or expressive) and domain (vocabulary or sentence use) against a single-dimension model. The 2-dimensional model involving domains of vocabulary and sentence use fit the data better than the single-dimensional model; however, the single-dimensional ...

Journal ArticleDOI
TL;DR: Overall, this study provides support for the excellent properties of the SIAS's straightforwardly worded items, although questions remain regarding its reverse-scored items.
Abstract: The widely used Social Interaction Anxiety Scale (SIAS; R. P. Mattick & J. C. Clarke, 1998) possesses favorable psychometric properties, but questions remain concerning its factor structure and item properties. Analyses included 445 people with social anxiety disorder and 1,689 undergraduates. Simple unifactorial models fit poorly, and models that accounted for differences due to item wording (i.e., reverse scoring) provided superior fit. It was further found that clients and undergraduates approached some items differently, and the SIAS may be somewhat overly conservative in selecting analogue participants from an undergraduate sample. Overall, this study provides support for the excellent properties of the SIAS's straightforwardly worded items, although questions remain regarding its reverse-scored items.

Journal ArticleDOI
TL;DR: This paper examined measurement equivalence of the Satisfaction with Life Scale between American and Chinese samples using multigroup Structural Equation Modeling (SEM), Multiple Indicator Multiple Cause Model (MIMIC), and Item Response Theory (IRT).

Journal ArticleDOI
TL;DR: In this article, the authors investigated the use of response time to assess the amount of examinee effort received by individual test items and found that the strongest predictors of the effort required by items were item length (i.e., how much reading or scanning was required).
Abstract: In low-stakes testing, the motivation levels of examinees are often a matter of concern to test givers because a lack of examinee effort represents a direct threat to the validity of the test data. This study investigated the use of response time to assess the amount of examinee effort received by individual test items. In 2 studies, it was found that the strongest predictors of the effort received by items were item length (i.e., how much reading or scanning was required) and item position. In addition, it was found that by treating item responses resulting from rapid guesses as missing, item means and item-total correlations were differentially affected and test score reliability decreased, whereas validity increased. Several implications of these results for low-stakes testing are discussed.
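
A minimal sketch of the cleansing step this study motivates: treat responses faster than an item's rapid-guess threshold as missing, then recompute item statistics. The array layout and per-item thresholds are illustrative assumptions.

```python
import numpy as np

def cleanse(responses, times, thresholds):
    """Mask rapid guesses as missing.

    responses: (J, I) 0/1 matrix; times: (J, I) response times in
    seconds; thresholds: (I,) per-item rapid-guess cutoffs (assumed
    to be supplied, e.g. from response-time distributions).
    """
    x = responses.astype(float)
    x[times < thresholds] = np.nan      # rapid guess -> missing
    return x

def item_stats(x):
    """Item means and item-total correlations, ignoring missing cells."""
    means = np.nanmean(x, axis=0)
    total = np.nansum(x, axis=1)
    itc = []
    for i in range(x.shape[1]):
        keep = ~np.isnan(x[:, i])
        itc.append(np.corrcoef(x[keep, i], total[keep])[0, 1])
    return means, np.array(itc)
```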

Journal ArticleDOI
TL;DR: The results illustrate limitations of DSM-IV criteria for alcohol and cannabis use disorders when applied to adolescents; the development process for the fifth edition (DSM-V) should be informed by statistical models such as those used in this study.
Abstract: Item response theory (IRT) has advantages over classical test theory in evaluating diagnostic criteria. In this study, the authors used IRT to characterize the psychometric properties of Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV; American Psychiatric Association, 1994) alcohol and cannabis use disorder symptoms among 472 clinical adolescents. For both substances, DSM-IV symptoms fit a model specifying a unidimensional latent trait of problem severity. Threshold (severity) parameters did not distinguish abuse and dependence symptoms. Abuse symptoms of legal problems and hazardous use, and dependence symptoms of tolerance, unsuccessful attempts to quit, and physical-psychological problems, showed relatively poor discrimination of problem severity. There were gender differences in thresholds for hazardous use, legal problems, and physical-psychological problems. The results illustrate limitations of DSM-IV criteria for alcohol and cannabis use disorders when applied to adolescents. The development process for the fifth edition (DSM-V) should be informed by statistical models such as those used in this study.

Journal ArticleDOI
TL;DR: In this article, a testlet-based item response theory (IRT) model is proposed to deal with the local dependence present among items within a common testlet when tests are made up of testlets.
Abstract: When tests are made up of testlets, standard item response theory (IRT) models are often not appropriate due to the local dependence present among items within a common testlet. A testlet-based IRT...

Journal ArticleDOI
TL;DR: The authors show that the size of the effect can be expressed by a presentation of the values of the parameter estimates derived from the fitted model, and develop a case study of the description of effect size for research reporting in the context of item response theory.
Abstract: The psychological literature currently emphasizes reporting the "effect size" of research findings in addition to the outcome of any tests of significance. However, some confusion may result from the fact that there are three distinct uses of effect sizes in the psychological literature, namely, power analysis, research synthesis, and research reporting. The authors review these uses of effect sizes and develop a case study of the description of effect size for research reporting in the context of item response theory. For many parametric models, hypotheses are tested by comparing the values of directly interpretable parameters. The authors show that the size of the effect can be expressed by a presentation of the values of the parameter estimates derived from the fitted model. Studies that use item response theory to detect differential item functioning provide illustrations.

Journal ArticleDOI
TL;DR: A reduced 18-item measure demonstrating strong clinical utility is proposed, with scores of 8 or greater implying greater need for treatment.
Abstract: The Rutgers Alcohol Problem Index (RAPI; H. R. White & E. W. Labouvie, 1989) is a frequently used measure of alcohol-related consequences in adolescents and college students, but psychometric evaluations of the RAPI are limited and it has not been validated with college students. This study used item response theory (IRT) to examine the RAPI on students (N = 895; 65% female, 35% male) assessed in both high school and college. A series of 2-parameter IRT models were computed, examining differential item functioning across gender and time points. A reduced 18-item measure demonstrating strong clinical utility is proposed, with scores of 8 or greater implying greater need for treatment.

Journal ArticleDOI
TL;DR: IRT and the likelihood-based model comparison approach comprise a powerful tool for DIF detection that can aid in the development, refinement, and evaluation of measures for use in ethnically diverse populations.
Abstract: Background An important part of examining the adequacy of measures for use in ethnically diverse populations is the evaluation of differential item functioning (DIF) among subpopulations such as those administered the measure in different languages. A number of methods exist for this purpose. Objective The objective of this study was to introduce and demonstrate the identification of DIF using item response theory (IRT) and the likelihood-based model comparison approach. Methods Data come from a sample of community-residing elderly who were part of a dementia case registry. A total of 1578 participants were administered either an English (n = 913) or Spanish (n = 665) version of the 21-item Mini-Mental State Examination. IRT was used to identify language DIF in these items with the likelihood-based model comparison approach. Results Fourteen of the 21 items exhibited significant DIF according to language of administration. However, because the direction of the identified DIF was not consistent for one language version over the other, the impact at the scale level was negligible. Conclusions IRT and the likelihood-based model comparison approach comprise a powerful tool for DIF detection that can aid in the development, refinement, and evaluation of measures for use in ethnically diverse populations.

Journal ArticleDOI
TL;DR: In this article, the authors compared four item response theory (IRT) models using data from tests where multiple items were grouped into testlets focused on a common stimulus, and found that when items were not independent within testlets, the independent-items model yielded greater root mean square error (RMSE) for item difficulty and underestimated the item slopes.
Abstract: Four item response theory (IRT) models were compared using data from tests where multiple items were grouped into testlets focused on a common stimulus. In the bi-factor model each item was treated as a function of a primary trait plus a nuisance trait due to the testlet; in the testlet-effects model the slopes in the direction of the testlet traits were constrained within each testlet to be proportional to the slope in the direction of the primary trait; in the polytomous model the item scores were summed into a single score for each testlet; and in the independent-items model the testlet structure was ignored. Using the simulated data, reliability was overestimated somewhat by the independent-items model when the items were not independent within testlets. Under these nonindependent conditions, the independent-items model also yielded greater root mean square error (RMSE) for item difficulty and underestimated the item slopes. When the items within testlets were instead generated to be independent, the bi-factor model yielded somewhat higher RMSE in difficulty and slope. Similar differences between the models were illustrated with real data.
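
In conventional notation (consistent with the abstract, though the symbols are ours), the first two models being compared are:

```latex
% Bi-factor model: a primary trait theta_0 plus an orthogonal nuisance
% trait theta_d for the testlet d(i) containing item i:
\operatorname{logit} P(x_{ij} = 1) =
  a_{i0}\,\theta_{0j} + a_{id}\,\theta_{d(i)j} - b_i

% Testlet-effects model: the same structure, with the testlet slopes
% constrained proportional to the primary slope within each testlet:
a_{id} = \lambda_{d(i)}\, a_{i0}
```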

Journal ArticleDOI
TL;DR: In this article, the authors compare two commonly used methods of rotation, Varimax and Promax, in terms of their ability to correctly link items to factors and to identify the presence of simple structure.
Abstract: Nonlinear factor analysis is a tool commonly used by measurement specialists to identify both the presence and nature of multidimensionality in a set of test items, an important issue given that standard Item Response Theory models assume a unidimensional latent structure. Results from most factor-analytic algorithms include loading matrices, which are used to link items with factors. Interpretation of the loadings typically occurs after they have been rotated in order to amplify the presence of simple structure. The purpose of this simulation study is to compare the ability of two commonly used methods of rotation, Varimax and Promax, in terms of their ability to correctly link items to factors and to identify the presence of simple structure. Results suggest that the two approaches are equally able to recover the underlying factor structure, regardless of the correlations among the factors, though the oblique method is better able to identify the presence of a “simple structure.” These results suggest that for identifying which items are associated with which factors, either approach is effective, but that for identifying simple structure when it is present, the oblique method is preferable.
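
For concreteness, the sketch below implements the classic varimax rotation by iterated SVD; it is a textbook algorithm written for this summary, not the code used in the study (Promax would additionally raise these loadings to a power and fit an oblique target):

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (p items x k factors) matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)                       # accumulated rotation
    crit = 0.0
    for _ in range(max_iter):
        A = L @ R
        # Gradient of the varimax criterion, expressed in L's frame
        u, s, vt = np.linalg.svd(
            L.T @ (A ** 3 - (gamma / p) * A @ np.diag((A ** 2).sum(axis=0)))
        )
        R = u @ vt
        new_crit = s.sum()
        if new_crit - crit < tol:       # criterion stopped improving
            break
        crit = new_crit
    return L @ R                        # rotated loading matrix
```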

Journal ArticleDOI
TL;DR: Skindex-29 responses from 454 Italian dermatological patients were subjected to Rasch analysis to investigate threshold order, differential item functioning (DIF), and item and overall fit to the model.

Journal ArticleDOI
TL;DR: An integrated overview of the quantitative methods used in this special issue to examine measurement equivalence is provided; factor analytic and DIF detection methods yield unique information and can be viewed as complementary in informing about measurement equivalence.
Abstract: Background: Reviewed in this article are issues relating to the study of invariance and differential item functioning (DIF). The aim of factor analyses and DIF, in the context of invariance testing, is the examination of group differences in item response conditional on an estimate of disability. ...