
Showing papers on "Differential item functioning published in 2011"


Journal ArticleDOI
TL;DR: The current investigation advances the technique by developing a computational platform integrating both statistical and IRT procedures into a single program, and a Monte Carlo simulation approach was incorporated to derive empirical criteria for various DIF statistics and effect size measures.
Abstract: Logistic regression provides a flexible framework for detecting various types of differential item functioning (DIF). Previous efforts extended the framework by using item response theory (IRT) based trait scores, and by employing an iterative process using group-specific item parameters to account for DIF in the trait scores, analogous to purification approaches used in other DIF detection frameworks. The current investigation advances the technique by developing a computational platform integrating both statistical and IRT procedures into a single program. Furthermore, a Monte Carlo simulation approach was incorporated to derive empirical criteria for various DIF statistics and effect size measures. For purposes of illustration, the procedure was applied to data from a questionnaire of anxiety symptoms for detecting DIF associated with age from the Patient-Reported Outcomes Measurement Information System.
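For readers who want a concrete picture of the logistic regression DIF framework described above, the following is a minimal sketch (not the authors' program): it fits the usual nested logistic models (item ~ matching score, + group, + group-by-score interaction) and forms likelihood-ratio statistics for uniform and nonuniform DIF. The simulated data, the variable names, and the use of an observed matching score in place of an IRT trait score are assumptions made for illustration.

```python
# Minimal logistic-regression DIF sketch (illustrative only; not the paper's software).
# Uniform DIF: a group effect on the item after conditioning on the matching score.
# Nonuniform DIF: a group-by-score interaction.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)                       # 0 = reference, 1 = focal (hypothetical groups)
theta = rng.normal(0, 1, n)                         # latent trait
# Simulate one studied item with uniform DIF against the focal group
item = rng.binomial(1, 1 / (1 + np.exp(-(theta - 0.5 * group))))
score = theta + rng.normal(0, 0.5, n)               # observed matching score (stand-in for an IRT trait score)

def fit(design):
    return sm.Logit(item, sm.add_constant(design)).fit(disp=0)

m1 = fit(np.column_stack([score]))                        # baseline: matching score only
m2 = fit(np.column_stack([score, group]))                 # + group (uniform DIF)
m3 = fit(np.column_stack([score, group, score * group]))  # + interaction (nonuniform DIF)

lr_uniform = 2 * (m2.llf - m1.llf)      # chi-square with 1 df
lr_nonuniform = 2 * (m3.llf - m2.llf)   # chi-square with 1 df
print(f"uniform DIF chi2 = {lr_uniform:.2f}, nonuniform DIF chi2 = {lr_nonuniform:.2f}")
```

In the purified procedure the abstract describes, the matching score would be an IRT trait estimate recomputed with group-specific parameters for flagged items, and the critical values for these statistics and the accompanying effect size measures would be derived empirically by Monte Carlo simulation under no-DIF conditions.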

512 citations


Journal ArticleDOI
TL;DR: This example shows that CAT and short forms derived from the PROMIS FIB can reliably estimate fatigue reported by the U.S. general population.

269 citations


Journal ArticleDOI
TL;DR: In this article, the potential of the lmer function from the lme4 package in R for item response theory (IRT) modeling is discussed, and three broad categories of models are described: item covariate models, person covariate models, and person-by-item covariate models.
Abstract: In this paper we elaborate on the potential of the lmer function from the lme4 package in R for item response theory (IRT) modeling. In line with the package, an IRT framework is described based on generalized linear mixed modeling. The aspects of the framework refer to (a) the kind of covariates: their mode (person, item, person-by-item) and whether they are external vs. internal to responses; and (b) the kind of effects the covariates have: fixed vs. random, and if random, the mode across which the effects are random (persons, items). Based on this framework, three broad categories of models are described: item covariate models, person covariate models, and person-by-item covariate models, and within each category three types of more specific models are discussed. The models in question are explained and the associated lmer code is given. Examples of models are the linear logistic test model with an error term, differential item functioning models, and local item dependency models. Because the lme4 package is for univariate generalized linear mixed models, neither the two-parameter and three-parameter models nor the item response models for polytomous response data can be estimated with the lmer function.
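As a rough companion to the GLMM formulation described above (the paper's own examples use lmer formulas in R, roughly of the form resp ~ 0 + item + (1 | person) with a binomial link; the exact calls are given in the paper and not reproduced here), the sketch below fits the same "items fixed, persons random" Rasch model in Python by marginal maximum likelihood with Gauss-Hermite quadrature. The simulated data and variable names are assumptions made for illustration.

```python
# Minimal Rasch-as-GLMM sketch: item easiness as fixed effects, person ability as a
# random effect integrated out by Gauss-Hermite quadrature (illustrative only).
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic function

rng = np.random.default_rng(0)
n_persons, n_items = 500, 10
true_beta = np.linspace(-1.5, 1.5, n_items)           # item easiness (fixed effects)
theta = rng.normal(0.0, 1.0, n_persons)               # person effects (random, N(0, 1))
X = (rng.uniform(size=(n_persons, n_items)) < expit(theta[:, None] + true_beta)).astype(float)

# Probabilists' Gauss-Hermite nodes/weights, normalized to integrate against a standard normal
nodes, weights = np.polynomial.hermite_e.hermegauss(21)
weights = weights / weights.sum()

def neg_marginal_loglik(beta):
    # response probabilities at each quadrature node: shape (n_nodes, n_items)
    p = expit(nodes[:, None] + beta[None, :])
    # log-likelihood of each person's response pattern at each node: (n_persons, n_nodes)
    logf = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T
    # integrate the random person effect out (person SD fixed at 1 here; lme4 would estimate it)
    lik = np.exp(logf) @ weights
    return -np.log(lik).sum()

fit = minimize(neg_marginal_loglik, x0=np.zeros(n_items), method="BFGS")
print(np.round(fit.x, 2))   # estimated item easiness parameters, close to true_beta
```

DIF models in this framework simply add group and group-by-item covariates to the fixed part of the formula, which is the route the paper takes with lmer.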

237 citations


Journal ArticleDOI
TL;DR: The authors used exploratory structural equation modeling and exploratory factor analysis to identify well-differentiated dimensions of bullying and victimization that meet standards of good measurement: goodness of fit, measurement invariance, lack of differential item functioning, and well differentiated factors that are not so highly correlated as to detract from their discriminant validity.
Abstract: Existing research posits multiple dimensions of bullying and victimization but has not identified well-differentiated facets of these constructs that meet standards of good measurement: goodness of fit, measurement invariance, lack of differential item functioning, and well-differentiated factors that are not so highly correlated as to detract from their discriminant validity and substantive usefulness in school settings. Here we demonstrate exploratory structural equation modeling, an integration of confirmatory factor analysis and exploratory factor analysis. On the basis of responses to the 6-factor Adolescent Peer Relations Instrument (verbal, social, physical facets of bullying and victimization), we tested invariance of factor loadings, factor variances and covariances, item uniquenesses, item intercepts (a lack of differential item functioning), and latent means across gender, year in school, and time. Using a combination of relations with student characteristics and a multitrait-multimethod analysis, we showed that the 6 bully/victim factors have discriminant validity over time and in relation to gender, year in school, and relevant psychosocial correlates (e.g., depression, 11 components of academic and nonacademic self-concept, locus of control, attitudes toward bullies and victims). However, bullies and victims are similar in many ways, and longitudinal panel models of the positive correlations between bully and victim factors suggest reciprocal effects such that each is a cause and an effect of the other.

205 citations


Journal ArticleDOI
TL;DR: An efficient full-information maximum marginal likelihood estimator is derived by extending Gibbons and Hedeker's bifactor dimension reduction method so that the optimization of the marginal log-likelihood requires only 2-dimensional integration regardless of the dimensionality of the latent variables.
Abstract: Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of multidimensional item response theory models for an arbitrary mixing of dichotomous, ordinal, and nominal items. The extended item bifactor model also enables the estimation of latent variable means and variances when data from more than one group are present. Generalized user-defined parameter restrictions are permitted within or across groups. We derive an efficient full-information maximum marginal likelihood estimator. Our estimation method achieves substantial computational savings by extending Gibbons and Hedeker’s (1992) bifactor dimension reduction method so that the optimization of the marginal log-likelihood only requires two-dimensional integration regardless of the dimensionality of the latent variables. We use simulation studies to demonstrate the flexibility and accuracy of the proposed methods. We apply the model to study cross-country differences, including differential item functioning, using data from a large international education survey on mathematics literacy.
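The computational payoff mentioned in the abstract comes from the bifactor structure: conditional on the general factor, items belonging to different specific-factor clusters are independent, so the high-dimensional marginal likelihood integral factors into nested one-dimensional integrals. A sketch of that identity (notation introduced here for illustration; see Gibbons & Hedeker, 1992, for the original result):

```latex
P(\mathbf{X} = \mathbf{x})
  = \int_{\theta_0} g(\theta_0)
    \prod_{s=1}^{S} \left[ \int_{\theta_s} g(\theta_s)
      \prod_{i \in I_s} P_i\!\left(x_i \mid \theta_0, \theta_s\right) d\theta_s \right] d\theta_0 ,
```

where θ0 is the general factor, θ1, ..., θS are the specific factors, and I_s is the set of items loading on θ_s. Each evaluation of the marginal likelihood therefore requires only two-dimensional quadrature (over θ0 and, within each cluster, over θ_s), no matter how many specific factors the model contains, which is the property the multiple-group extension preserves.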

198 citations


Journal Article
TL;DR: In this article, the authors reviewed the current psychometric properties of the State Trait Anxiety Inventory (STAI) and compared the original and current values with a t test.
Abstract: Psychometric revision and differential item functioning in the State Trait Anxiety Inventory (STAI). One of the psychological problems with the highest prevalence is anxiety. The State Trait Anxiety Inventory is one of the instruments used to measure it. This questionnaire assesses Trait Anxiety (understood as a personality factor that predisposes one to suffer from anxiety) and State Anxiety (which refers to environmental factors that protect from or generate anxiety). The questionnaire was adapted in Spain in 1982. Therefore, the goal of the study was to review the current psychometric properties of the STAI. A total of 1036 adults took part in the study. Cronbach's alpha reliability was .90 for Trait Anxiety and .94 for State Anxiety. Factor analysis showed similar results compared with the original data. Moreover, a differential item functioning (DIF) analysis was carried out to explore sex bias. Only one of the 40 items showed DIF problems. Lastly, a t test was run comparing the original and current values; whereas Trait Anxiety varied by 1 point, State Anxiety differed by up to 6 points. In general, these results show that the STAI has maintained adequate psychometric properties and has also been sensitive to increased environmental stimuli that produce stress.

190 citations


Journal ArticleDOI
TL;DR: In this article, two major approaches to testing measurement invariance for ordinal measures were investigated: multiple-group categorical confirmatory factor analysis (MCCFA) and item response theory (IRT).
Abstract: This study investigated two major approaches in testing measurement invariance for ordinal measures: multiple-group categorical confirmatory factor analysis (MCCFA) and item response theory (IRT). Unlike the ordinary linear factor analysis, MCCFA can appropriately model the ordered-categorical measures with a threshold structure. A simulation study under various conditions was conducted for the comparison of MCCFA and IRT with respect to the power to detect the lack of invariance across groups. Both MCCFA and IRT showed reasonable power to identify the noninvariant item when differential item functioning (DIF) was large. The false positive rates were relatively high in both methods, however. The adjustment of critical values improved the performance of MCCFA by reducing false positive rates substantially and yet yielding adequate power. Alternative model fit indexes of MCCFA were also examined and they were found to be reliable to detect DIF, in general.

176 citations


Journal ArticleDOI
TL;DR: This article proposes exploratory structural equation modeling (ESEM), an integration of the best aspects of CFA and traditional exploratory factor analyses (EFA), and shows that ESEM fits the data much better and yields substantially more differentiated (less correlated) factors than corresponding CFA models.
Abstract: The most popular measures of multidimensional constructs typically fail to meet standards of good measurement: goodness of fit, measurement invariance, lack of differential item functioning, and well-differentiated factors that are not so highly correlated as to detract from their discriminant validity. Part of the problem, the authors argue, is undue reliance on overly restrictive independent cluster models of confirmatory factor analysis (ICM-CFA) in which each item loads on one, and only one, factor. Here the authors demonstrate exploratory structural equation modeling (ESEM), an integration of the best aspects of CFA and traditional exploratory factor analyses (EFA). On the basis of responses to the 11-factor Motivation and Engagement Scale (n = 7,420, Mage = 14.22), we demonstrate that ESEM fits the data much better and results in substantially more differentiated (less correlated) factors than corresponding CFA models. Guided by a 13-model taxonomy of ESEM full-measurement (mean structure) invarianc...

169 citations


Journal ArticleDOI
TL;DR: A single physical function dimension accounts for most of the item variance in the PPFIB, suggesting that the items predominantly measure a single construct.

161 citations


Journal ArticleDOI
TL;DR: In this article, the authors developed two PF item pools that comprised 32 mobility and 38 upper extremity items and evaluated the scale dimensionality and sources of local dependence (LD) with factor analysis.

152 citations


Journal ArticleDOI
TL;DR: The EQ is an appropriate measure of the construct of empathy; empathy can be measured along a single dimension, and the results suggest that a hierarchical factor of empathy underlies these sub-factors.

Journal ArticleDOI
TL;DR: An effect size index is proposed for confirmatory factor analytic studies of measurement equivalence to address limitations of commonly recommended criteria for evaluating results from these analyses.
Abstract: Because of the practical, theoretical, and legal implications of differential item functioning (DIF) for organizational assessments, studies of measurement equivalence are a necessary first step before scores can be compared across individuals from different groups. However, commonly recommended criteria for evaluating results from these analyses have several important limitations. The present study proposes an effect size index for confirmatory factor analytic (CFA) studies of measurement equivalence to address 1 of these limitations. The application of this index is illustrated with personality data from American English, Greek, and Chinese samples. Results showed a range of nonequivalence across these samples, and these differences were linked to the observed effects of DIF on the outcomes of the assessment (i.e., group-level mean differences and adverse impact). Practitioners and organizational researchers confront a vast number of questions that involve comparing scores on assessment instruments across groups. Are workers more satisfied in organizations with empowerment programs? Are successful salespersons more extraverted? Are employees in a multinational organization more satisfied in one country than employees in another? Moreover, because of the legal and practical implications of using selection assessments that advantage one group over another, group comparisons may be particularly salient during the hiring process. For all of these comparisons to be meaningful, it is essential that the tests and scales provide equivalent measurement across groups. Equivalent measurement is obtained when individuals with the same standing on the trait assessed by the test or scale, but sampled from different groups, have equal expected observed scores (Drasgow, 1984). As such, measurement invariance can be examined by a differential item functioning (DIF) analysis using item-response theory (IRT) or with confirmatory factor analytic (CFA) mean and covariance structure (MACS) analysis. The latter method is the focus of this article. Although several articles have proposed various decision rules for determining if measurement nonequivalence exists with MACS analysis (Cheung & Rensvold, 2002; Hu & Bentler, 1999; Meade, Johnson, & Braddy, 2008), these rules generally involve empirically derived cutoffs or statistical significance tests. As such, the analysis does not address the practical importance of observed differences between groups and does not provide users with information about the effects of nonequivalence on the organizational outcomes of an assessment. In the broader psychological literature, effect size statistics have been proposed to overcome this limitation (Cohen, 1990, 1994; Kirk, 2006; Schmidt, 1996). However, effect size indices for CFA evaluations of measurement equivalence have not yet been developed. In the present study, we propose such an index and examine its application to real-world data. To illustrate its practical importance, we also demonstrate the effects of measurement nonequivalence on the observed outcomes (e.g., means, adverse impact) of group-level comparisons. This information will enable researchers and practitioners to further evaluate the theoretical and practical importance of observed differences.

Journal ArticleDOI
TL;DR: In patients with stroke, the FSS-7 showed better psychometric properties and better potential to detect changes in fatigue over time than the FSS-9 version, suggesting satisfactory grounds for removing items #1 and #2 for its application.

Journal ArticleDOI
TL;DR: The National Institutes of Health's Patient-Reported Outcomes Measurement Information System (PROMIS) Roadmap initiative is a cooperative research program designed to develop, evaluate, and standardize item banks to measure patient-reported outcomes (PROs) across different medical conditions as well as the US population.
Abstract: The National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS®) Roadmap initiative (www.nihpromis.org) is a cooperative research program designed to develop, evaluate, and standardize item banks to measure patient-reported outcomes (PROs) across different medical conditions as well as the US population (1). The goal of PROMIS is to develop reliable and valid item banks using item response theory (IRT) that can be administered in a variety of formats including short forms and computerized adaptive tests (CAT)(1-3). IRT is often referred to as “modern psychometric theory,” in contrast to “classic test theory,” or CTT. The basic idea behind both IRT and CTT is that there is some latent construct, or “trait,” underlying an illness experience. This construct cannot be directly measured, but can be indirectly measured by creating items that are scaled and scored. For example, “fatigue,” “pain,” “disability,” or even “happiness” are latent constructs, i.e. subjective feelings – we cannot take a picture, snap an X-Ray to view them, or run a blood test to check for them. However, we know they exist. People can experience more or less of these constructs, thus it is helpful to try to translate that experience into several levels represented by scores. IRT models the associations between items and the latent construct. Specifically, IRT models describe relationships between a respondent's underlying level on a construct and the probability of particular item responses. Tests developed with CTT (such as the Health Assessment Questionnaire-Disability Index(4), the Scleroderma Gastrointestinal Tract instrument(5)) require administering all items, even though only some are appropriate for the persons' trait level. Some items are too high for those with low trait levels (e.g., “can you walk 100 yards” to a patient in a wheelchair) or too low for those with high trait levels (e.g., “can you get up from the chair?” to a runner). In contrast, IRT methods make it possible to estimate person trait levels with any subset of items appropriate for the persons' trait levels in an item pool. As such, any set of items from the pool could be administered as a fixed form or, for greatest efficiency, administered as a CAT. CAT is an approach to administering the subset of items in an item bank that are most informative for measuring the health construct in order to achieve a target standard error of measurement. A good item bank will have items that represent a range of content and difficulty, provide high level of information, and have items that perform equivalently in different subgroups of the target population. How does CAT work? Without prior information, the first item administered in a CAT is typically one of medium trait level. For example, “In the past 7 days I was grouchy” with multi-level response from “never” to “always.” After each response, the person's trait level and associated standard error are estimated. The next item administered to someone not endorsing the first item, is an “easier” item. If the person endorses the first item, the next item administered is a “harder” item. CAT is terminated when the standard error falls below an acceptable value. This provides an estimate of one's score with the minimal number of questions and no loss of measurement precision. In addition, scores from different studies using different items can be compared using a common scale. IRT models estimate the underlying scale score (theta) from the items. 
All items are calibrated on the same metric and independently and collectively provide an estimate of theta. Hence, it is possible to estimate the score using any subset of items and to estimate the standard error of the estimated score. This allows assessment of health outcomes across patients with differing medical conditions (for example, comparing scores of someone with arthritis with those of someone with heart disease) at various degrees of physical and other impairments, both at the lowest and highest ends of trait levels.
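The "How does CAT work?" passage above maps directly onto a short simulation loop: pick the most informative unadministered item at the current trait estimate, update the estimate after each response, and stop once the standard error falls below a target. The sketch below is a generic illustration under a 2PL model with EAP scoring, not PROMIS software; the item parameters, the target SE of 0.3, and the 12-item cap are made-up assumptions.

```python
# Minimal CAT loop: maximum-information item selection, EAP scoring on a grid,
# stop when the standard error reaches a target (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
a = rng.uniform(1.0, 2.5, 50)              # hypothetical item discriminations
b = rng.normal(0.0, 1.0, 50)               # hypothetical item difficulties
true_theta = 1.2                            # simulated respondent's trait level

grid = np.linspace(-4, 4, 81)               # quadrature grid for the posterior
prior = np.exp(-0.5 * grid**2); prior /= prior.sum()

def p(theta, i):                             # 2PL probability of endorsing item i
    return 1.0 / (1.0 + np.exp(-a[i] * (theta - b[i])))

administered, posterior = [], prior.copy()
theta_hat, se = 0.0, np.inf
while se > 0.3 and len(administered) < 12:
    probs = p(theta_hat, np.arange(len(a)))
    info = a**2 * probs * (1 - probs)        # Fisher information at the current estimate
    info[administered] = -np.inf             # never readminister an item
    item = int(np.argmax(info))              # maximum-information selection
    x = int(rng.uniform() < p(true_theta, item))     # simulated response
    administered.append(item)
    like = p(grid, item) if x == 1 else 1 - p(grid, item)
    posterior = posterior * like; posterior /= posterior.sum()
    theta_hat = float(np.sum(grid * posterior))                        # EAP estimate
    se = float(np.sqrt(np.sum((grid - theta_hat)**2 * posterior)))     # posterior SD
print(len(administered), round(theta_hat, 2), round(se, 2))
```

With maximum-information selection, endorsing an item pushes the estimate up so the next item selected is "harder," and denying it does the opposite, which matches the behavior described in the abstract.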

Journal ArticleDOI
TL;DR: The PGQ had acceptably high reliability and validity in people with PGP both during pregnancy and postpartum, it is simple to administer, and it is feasible for use in clinical practice.
Abstract: Background: No appropriate measures have been specifically developed for pelvic girdle pain (PGP). There is a need for suitable outcome measures that are reliable and valid for people with PGP for use in research and clinical practice. Objective: The objective of this study was to develop a condition-specific measure, the Pelvic Girdle Questionnaire (PGQ), for use during pregnancy and postpartum. Design: This was a methodology study. Methods: Items were developed from a literature review and information from a focus group of people who consulted physical therapists for PGP. Face validity and content validity were assessed by classifying the items according to the World Health Organization's International Classification of Functioning, Disability and Health. After a pilot study, the PGQ was administered to participants with clinically verified PGP by means of a postal questionnaire in 2 surveys. The first survey included 94 participants (52 pregnant), and the second survey included 87 participants (43 pregnant). Rasch analysis was used for item reduction, and the PGQ was assessed for unidimensionality, item fit, redundancy, and differential item functioning. Test-retest reliability was assessed with a random sample of 42 participants. Results: The analysis resulted in a questionnaire consisting of 20 activity items and 5 symptom items on a 4-point response scale. The items in both subscales showed a good fit to the Rasch model, with acceptable internal consistency, satisfactory fit residuals, and no disordered threshold. Test-retest reliability showed high intraclass correlation coefficient estimates: .93 (95% confidence interval = 0.86–0.96) for the PGQ activity subscale and .91 (95% confidence interval = 0.84–0.95) for the PGQ symptom subscale. Limitations: The PGQ should be compared with low back pain questionnaires as part of a concurrent evaluation of measurement properties, including validity and responsiveness to change. Conclusions: The PGQ is the first condition-specific measure developed for people with PGP. The PGQ had acceptably high reliability and validity in people with PGP both during pregnancy and postpartum, it is simple to administer, and it is feasible for use in clinical practice.

Journal ArticleDOI
TL;DR: Multigroup confirmatory factor analysis was used to detect differential item functioning (DIF) in factor loadings and intercepts for the Revised NEO Personality Inventory and the results indicate that considerable caution is warranted in cross-cultural comparisons of personality profiles.
Abstract: Measurement invariance is a prerequisite for confident cross-cultural comparisons of personality profiles. Multigroup confirmatory factor analysis was used to detect differential item functioning (DIF) in factor loadings and intercepts for the Revised NEO Personality Inventory (P. T. Costa, Jr., & R. R. McCrae, 1992) in comparisons of college students in the United States (N = 261), Philippines (N = 268), and Mexico (N = 775). About 40%-50% of the items exhibited some form of DIF and item-level noninvariance often carried forward to the facet level at which scores are compared. After excluding DIF items, some facet scales were too short or unreliable for cross-cultural comparisons, and for some other facets, cultural mean differences were reduced or eliminated. The results indicate that considerable caution is warranted in cross-cultural comparisons of personality profiles.

Journal ArticleDOI
TL;DR: A modified HADS-A and HADS-D are unidimensional, free of DIF and have good fit to the Rasch model in this population of patients with MND, suggesting they are suitable for use in MND clinics or research.
Abstract: Background: The Hospital Anxiety and Depression Scale (HADS) is commonly used to assess symptoms of anxiety and depression in motor neurone disease (MND). The measure has never been specifically validated for use within this population, despite questions raised about the scale’s validity. This study seeks to analyse the construct validity of the HADS in MND by fitting its data to the Rasch model. Methods: The scale was administered to 298 patients with MND. Scale assessment included model fit, differential item functioning (DIF), unidimensionality, local dependency and category threshold analysis. Results: Rasch analyses were carried out on the HADS total score as well as depression and anxiety subscales (HADS-T, D and A respectively). After removing one item from both of the seven item scales, it was possible to produce modified HADS-A and HADS-D scales which fit the Rasch model. An 11-item higher-order HADS-T total scale was found to fit the Rasch model following the removal of one further item. Conclusion: Our results suggest that a modified HADS-A and HADS-D are unidimensional, free of DIF and have good fit to the Rasch model in this population. As such they are suitable for use in MND clinics or research. The use of the modified HADS-T as a higher-order measure of psychological distress was supported by our data. Revised cut-off points are given for the modified HADS-A and HADS-D subscales.

Journal ArticleDOI
TL;DR: In this paper, a latent variable interaction is added to the MIMIC model to test for non-uniform DIF, and the approach is tested in simulations with small focal-group N and illustrated with an empirical example using a scale about agoraphobic cognitions.
Abstract: In extant literature, multiple indicator multiple cause (MIMIC) models have been presented for identifying items that display uniform differential item functioning (DIF) only, not nonuniform DIF. This article addresses, for apparently the first time, the use of MIMIC models for testing both uniform and nonuniform DIF with categorical indicators. A latent variable interaction is added to the MIMIC model to test for nonuniform DIF. The approach is tested in simulations with small focal-group N and illustrated with an empirical example using a scale about agoraphobic cognitions. MIMIC-interaction models are compared with MIMIC models without the interaction as well as likelihood ratio DIF testing using item response theory (IRT-LR-DIF). The most important finding is that when the latent moderated structural equations approach is used to estimate the interaction, the Type I error in MIMIC-interaction DIF models is severely inflated.
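A compact way to see the distinction the abstract draws: in a MIMIC DIF model for a categorical item j with latent response y*_j, latent trait η, and grouping covariate z, uniform DIF is a direct effect of z on the item, while nonuniform DIF is carried by the latent interaction term. A schematic of the model (notation introduced here for illustration, following the general MIMIC-interaction idea rather than the article's exact parameterization):

```latex
y^{*}_{j} = \lambda_{j}\,\eta + \beta_{j}\,z + \omega_{j}\,(\eta \times z) + \varepsilon_{j},
\qquad \eta = \gamma\,z + \zeta ,
```

where β_j ≠ 0 indicates uniform DIF, ω_j ≠ 0 indicates nonuniform DIF (a group difference in the item's slope on η), γ captures the group difference on the trait itself, and η × z is the latent variable interaction that the latent moderated structural equations approach is used to estimate.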

Journal ArticleDOI
TL;DR: In this paper, the authors present 15-item short forms of the Wisconsin Schizotypy scales, which are based on psychometric analyses using item response theory, and the items are listed in an Appendix A. Based on data from a sample of young adults (n = 1144).

Journal ArticleDOI
TL;DR: Rasch analysis supports the interpretation that a student's APP score is an indication of their underlying level of professional competence in workplace practice.

Journal ArticleDOI
TL;DR: In this article, the authors discuss the importance of conducting differential item functioning (DIF) analyses using a priori hypotheses whenever possible, and demonstrate how to test for DIF using logistic regression and DIFPACK.
Abstract: The purpose of this manuscript was to help researchers better understand the causes and implications of differential item functioning (DIF), as well as the importance of testing for DIF in the process of test development and validation. The underlying theoretical reason for the presence of DIF is explicated, followed by a discussion of how to test for the presence of DIF using logistic regression and DIFPACK, which includes SIBTEST, PSIBTEST and Crossing SIBTEST. This manuscript stresses the importance of conducting DIF analyses using a priori hypotheses whenever possible. However, the example that is provided, to show researchers and practitioners how to conduct a DIF analysis, utilizes an exploratory DIF analyses paradigm which may often be needed in practical DIF applications. This example uses PSIBTEST to test for DIF, using data from an international assessment that includes a mixture of polytomous and dichotomous items. In addition to demonstrating how to test for DIF, this manuscript demonstrates h...

Journal ArticleDOI
TL;DR: The results suggest that, at the same level of syndrome severity, the severity of psychotic symptoms, including the negative ones, observed in MA psychotic and schizophrenic patients is almost the same.
Abstract: The concept of negative symptoms in methamphetamine (MA) psychosis (e.g., poverty of speech, flattened affect, and loss of drive) is still uncertain. This study aimed to use differential item functioning (DIF) statistical techniques to differentiate the severity of psychotic symptoms between MA psychotic and schizophrenic patients. Data of MA psychotic and schizophrenic patients were those of the participants in the WHO Multi-Site Project on Methamphetamine-Induced Psychosis (or WHO-MAIP study) and the Risperidone Long-Acting Injection in Thai Schizophrenic Patients (or RLAI-Thai study), respectively. To confirm the unidimensionality of psychotic syndromes, we applied exploratory and confirmatory factor analyses (EFA and CFA) to the eight items of the Manchester scale. We conducted the DIF analysis of psychotic symptoms observed in both groups by using nonparametric kernel-smoothing techniques of item response theory. A DIF composite index of 0.30 or greater indicated a difference in symptom severity. The analyses included the data of 168 MA psychotic participants and the baseline data of 169 schizophrenic patients. For both data sets, the EFA and CFA suggested a three-factor model of the psychotic symptoms, including a negative syndrome (poverty of speech, psychomotor retardation and flattened/incongruous affect), a positive syndrome (delusions, hallucinations and incoherent speech) and an anxiety/depression syndrome (anxiety and depression). The DIF composite indexes comparing the severity differences of all eight psychotic symptoms were lower than 0.3. The results suggest that, at the same level of syndrome severity (i.e., negative, positive, and anxiety/depression syndromes), the severity of psychotic symptoms, including the negative ones, observed in MA psychotic and schizophrenic patients is almost the same.

Journal ArticleDOI
TL;DR: Findings suggest that DIF based on items’ scoring direction is not problematic when the Five Facet Mindfulness Questionnaire is used to compare demographically similar meditators and nonmeditators.
Abstract: A recent study of the Five Facet Mindfulness Questionnaire reported high levels of differential item functioning (DIF) for 18 of its 39 items in meditating and nonmeditating samples that were not demographically matched. In particular, meditators were more likely to endorse positively worded items whereas nonmeditators were more likely to deny negatively worded (reverse-scored) items. The present study replicated these analyses in demographically matched samples of meditators and nonmeditators (n = 115 each) and found that evidence for DIF was minimal. There was little or no evidence for differential relationships between positively and negatively worded items for meditators and nonmeditators. Findings suggest that DIF based on items’ scoring direction is not problematic when the Five Facet Mindfulness Questionnaire is used to compare demographically similar meditators and nonmeditators.

Journal ArticleDOI
TL;DR: The results do not support the hypothesis that cumulative DIF for PHQ-9 items spuriously inflates the number of persons with TBI screened as potentially having major depressive disorder; all symptoms can be counted toward the diagnosis of major depressive disorder without special concern about overdiagnosis or unnecessary treatment.

Journal ArticleDOI
TL;DR: In this paper, the authors applied modern statistical approaches in the adaptation and assessment of the psychometric properties of the Peabody Picture Vocabulary Test-Revised (PPVT-R) Greek.
Abstract: Assessment of lexical/semantic knowledge is performed with a variety of tests varying in response requirements. The present study exemplifies the application of modern statistical approaches in the adaptation and assessment of the psychometric properties of the Peabody Picture Vocabulary Test‐Revised (PPVT-R) Greek. Confirmatory factor analyses applied to data from a large sample of elementary school students (N = 585) indicated the existence of a single vocabulary dimension and differential item functioning procedures pointed to minimal bias due to gender or ethnic group. Rasch model‐derived indices of item difficulty and discrimination were used to develop a short form of the test, which was administered to a second sample of 900 students. Convergent and discriminant validity were assessed through comparisons with the Wechsler Intelligence Scales for Children‐III Vocabulary and Block design subtests. Short- and long-term stability of individual scores over a 6-month period were very high, and the utility of the test as part of routine educational assessment is attested by its strong longitudinal predictive value with reading comprehension measures. It is concluded that the Greek version of the PPVT-R constitutes a reliable and valid assessment of vocabulary for Greek students and immigrants who speak Greek.

Journal ArticleDOI
TL;DR: In this article, three distinctive methods of assessing measurement equivalence of ordinal items, namely, confirmatory factor analysis, differential item functioning using item response theory, and latent class factor analysis make different modeling assumptions and adopt different procedures.
Abstract: Three distinctive methods of assessing measurement equivalence of ordinal items, namely, confirmatory factor analysis, differential item functioning using item response theory, and latent class factor analysis, make different modeling assumptions and adopt different procedures. Simulation data are used to compare the performance of these three approaches in detecting the sources of measurement inequivalence. For this purpose, the authors simulated Likert-type data using two nonlinear models, one with categorical and one with continuous latent variables. Inequivalence was set up in the slope parameters (loadings) as well as in the item intercept parameters in a form resembling agreement and extreme response styles. Results indicate that the item response theory and latent class factor models can relatively accurately detect and locate inequivalence in the intercept and slope parameters both at the scale and the item levels. Confirmatory factor analysis performs well when inequivalence is located in the slo...

Journal ArticleDOI
TL;DR: As discussed in this paper, differential item functioning (DIF) analysis is a way of determining whether test items function differently across subgroups of test takers after controlling for ability level, and its results are used to evaluate tests' validity arguments.
Abstract: Differential item functioning (DIF) analysis is a way of determining whether test items function differently across subgroups of test takers after controlling for ability level. DIF results are used to evaluate tests' validity arguments. This study uses Rasch measurement to examine the Michigan English Language Assessment Battery listening test for DIF across gender subgroups. After establishing the unidimensionality and local independence of the data, the authors used two methods to test for DIF: (a) a t-test uniform DIF analysis, which showed that two test items displayed substantive DIF, and favored different gender subgroups; and (b) nonuniform DIF analysis, which revealed several test items with significant DIF, many of which favored low-ability male test takers. A possible explanation for gender-ability DIF is that lower ability male test takers are more likely to attempt lucky guesses, particularly on multiple-choice items with unattractive distracters, and that having only two distracters makes th...

Journal ArticleDOI
TL;DR: This study tested for the presence of differential item functioning (DIF) in DSM-IV Pathological Gambling Disorder (PGD) criteria based on gender, race/ethnicity and age using a nationally representative sample of adults from the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC).
Abstract: This study tested for the presence of differential item functioning (DIF) in DSM-IV Pathological Gambling Disorder (PGD) criteria based on gender, race/ethnicity and age. Using a nationally representative sample of adults from the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC), indicating current gambling (n = 10,899), Multiple Indicator-Multiple Cause (MIMIC) models tested for DIF, controlling for income, education, and marital status. Compared to the reference groups (i.e., Male, Caucasian, and ages 25–59 years), women (OR = 0.62; P < .001) and Asian Americans (OR = 0.33; P < .001) were less likely to endorse preoccupation (Criterion 1). Women were more likely to endorse gambling to escape (Criterion 5) (OR = 2.22; P < .001) but young adults (OR = 0.62; P < .05) were less likely to endorse it. African Americans (OR = 2.50; P < .001) and Hispanics were more likely to endorse trying to cut back (Criterion 3) (OR = 2.01; P < .01). African Americans were more likely to endorse the suffering losses (OR = 2.27; P < .01) criterion. Young adults were more likely to endorse chasing losses (Criterion 9) (OR = 1.81; P < .01) while older adults were less likely to endorse this criterion (OR = 0.76; P < .05). Further research is needed to identify factors contributing to DIF, address criteria level bias, and examine differential test functioning.

Journal ArticleDOI
TL;DR: The results indicate that the revised-EDS is unidimensional, with minimal differential item functioning, and retains predictive validity consistent with the original scale.
Abstract: The Everyday Discrimination Scale (EDS), a widely used measure of daily perceived discrimination, is purported to be unidimensional, to function well among African Americans, and to have adequate construct validity. Two separate studies and data sources were used to examine and cross-validate the psychometric properties of the EDS. In Study 1, an exploratory factor analysis was conducted on a sample of African American law students (N = 589), providing strong evidence of local dependence, or nuisance multidimensionality within the EDS. In Study 2, a separate nationally representative community sample (N = 3,527) was used to model the identified local dependence in an item factor analysis (i.e., bifactor model). Next, item response theory (IRT) calibrations were conducted to obtain item parameters. A five-item, revised-EDS was then tested for gender differential item functioning (in an IRT framework). Based on these analyses, a summed score to IRT-scaled score translation table is provided for the revised-EDS. Our results indicate that the revised-EDS is unidimensional, with minimal differential item functioning, and retains predictive validity consistent with the original scale.