Showing papers on "Differential item functioning published in 2006"


Journal ArticleDOI
TL;DR: The analyses indicate that in African American, Chinese American, Latino, and non-Hispanic white patient groups the PHQ-9 measures a common concept of depression and can be effective for the detection and monitoring of depression in these diverse populations.
Abstract: OBJECTIVE: The Patient Health Questionnaire depression scale (PHQ-9) is a well-validated, Diagnostic and Statistical Manual of Mental Disorders—Fourth Edition (DSM-IV) criterion-based measure for diagnosing depression, assessing severity and monitoring treatment response. The performance of most depression scales including the PHQ-9, however, has not been rigorously evaluated in different racial/ethnic populations. Therefore, we compared the factor structure of the PHQ-9 between different racial/ethnic groups as well as the rates of endorsement and differential item functioning (DIF) of the 9 items of the PHQ-9. The presence of DIF would indicate that responses to an individual item differ significantly between groups, controlling for the level of depression.
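
The item-level DIF check the abstract describes can be sketched in code. A minimal illustration, assuming a binarized item response, the questionnaire total as the depression-level anchor, and a binary group indicator (all data and variable names below are simulated stand-ins, not the study's own analysis):

```python
# Minimal DIF screen for one (hypothetical, simulated) PHQ-9-style item.
# 'total' anchors depression level; 'group' is a binary group indicator.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({"total": rng.integers(0, 28, n), "group": rng.integers(0, 2, n)})
# Simulated binarized item response driven by depression level only (no DIF).
p = 1 / (1 + np.exp(-(0.3 * df["total"] - 4)))
df["item"] = rng.binomial(1, p)

# Null model: response depends on depression level only.
m0 = sm.Logit(df["item"], sm.add_constant(df[["total"]])).fit(disp=0)
# DIF model: add group (uniform DIF) and group x total (nonuniform DIF).
X = df[["total", "group"]].assign(gxt=df["group"] * df["total"])
m1 = sm.Logit(df["item"], sm.add_constant(X)).fit(disp=0)

lr = 2 * (m1.llf - m0.llf)                            # likelihood-ratio statistic
print(f"LR = {lr:.2f}, p = {chi2.sf(lr, df=2):.3f}")  # df = 2 added parameters
```

A significant likelihood-ratio statistic flags the item as responding differently by group at a given depression level, which is exactly the DIF definition the abstract gives.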

619 citations


Journal ArticleDOI
TL;DR: The purpose of this essay is to review the definitions and assumptions associated with factorial invariance, placing this formulation in the context of bias, fairness, and equity.
Abstract: Background: Analysis of subgroups such as different ethnic, language, or education groups selected from among a parent population is common in health disparities research. One goal of such analyses is to examine measurement equivalence, which includes both qualitative review of the meaning of items...

572 citations


Journal ArticleDOI
TL;DR: The short EUROHIS-QOL 8-item index showed good cross-cultural field study performance and a satisfactory convergent and discriminant validity, and can therefore be recommended for use in public health research.
Abstract: Background: Survey research including multiple health indicators requires brief indices for use in cross-cultural studies, which have, however, rarely been tested in terms of their psychometric quality. Recently, the EUROHIS-QOL 8-item index was developed as an adaptation of the WHOQOL-100 and the WHOQOL-BREF. The aim of the current study was to test the psychometric properties of the EUROHIS-QOL 8-item index. Methods: In a survey of 4849 European adults, the EUROHIS-QOL 8-item index was assessed across 10 countries, with equal samples adjusted for selected sociodemographic data. Participants were also investigated with a chronic condition checklist and measures of general health perception, mental health, health-care utilization and social support. Results: Findings indicated good internal consistencies across a range of countries, showing acceptable convergent validity with physical and mental health measures, and the measure discriminated well between individuals who reported having a longstanding condition and healthy individuals across all countries. Differential item functioning was less frequently observed in those countries that were geographically and culturally closer to the UK, but was acceptable across all countries. A universal one-factor structure with a good fit in structural equation modelling (SEM) analyses was identified, with, however, limitations in model fit for specific countries. Conclusions: The short EUROHIS-QOL 8-item index showed good cross-cultural field study performance and satisfactory convergent and discriminant validity, and can therefore be recommended for use in public health research. In future studies the measure should also be tested in multinational clinical studies, particularly in order to test its sensitivity.

414 citations


Journal ArticleDOI
TL;DR: The authors propose a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures method (MACS) and item response theory (IRT); results indicated that this strategy was considerably more effective than an alternative approach involving a constrained-baseline model.
Abstract: In this article, the authors developed a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures method (MACS) and item response theory (IRT). They proposed examining the loadings (discrimination) and the intercept (location) parameters simultaneously using the likelihood ratio test with a free-baseline model and Bonferroni corrected critical p values. They compared the relative efficacy of this approach with alternative implementations for various types and amounts of DIF, sample sizes, numbers of response categories, and amounts of impact (latent mean differences). Results indicated that the proposed strategy was considerably more effective than an alternative approach involving a constrained-baseline model. Both MACS and IRT performed similarly well in the majority of experimental conditions. As expected, MACS performed slightly worse in dichotomous conditions but better than IRT in polytomous cases where sample sizes were small. Also, contrary to popular belief, MACS performed well in conditions where DIF was simulated on item thresholds (item means), and its accuracy was not affected by impact.
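
The free-baseline likelihood-ratio procedure with a Bonferroni correction can be sketched generically. The per-item log-likelihoods would come from refitting an IRT or MACS model with the studied item's loading and intercept constrained equal across groups; the numbers below are invented for illustration:

```python
# Free-baseline likelihood-ratio DIF test with Bonferroni-corrected alpha.
# Log-likelihoods would come from an IRT/MACS fitting package; these are made up.
from scipy.stats import chi2

n_items = 20
alpha = 0.05 / n_items                      # Bonferroni correction across items

# item -> (loglik of free-baseline model,
#          loglik with the item's loading and intercept constrained equal)
loglik = {"item07": (-10412.3, -10419.8), "item13": (-10412.3, -10413.1)}

for item, (ll_free, ll_constrained) in loglik.items():
    lr = 2 * (ll_free - ll_constrained)     # constrained model can't fit better
    p = chi2.sf(lr, df=2)                   # loading + intercept = 2 constraints
    print(f"{item}: LR = {lr:.1f}, p = {p:.4g}, flagged = {p < alpha}")
```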

358 citations


Journal ArticleDOI
TL;DR: The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection.
Abstract: Introduction: We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic reg...
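
The three nested models can be sketched with statsmodels' OrderedModel; everything below (data, cut points, column names) is simulated, with an IRT ability estimate standing in for the MMSE theta:

```python
# Three nested ordinal logistic DIF models on simulated data:
#   M1: item ~ ability;  M2: + group (uniform DIF);  M3: + group x ability.
import numpy as np
import pandas as pd
from scipy.stats import chi2
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(1)
n = 800
theta = rng.normal(size=n)                   # stands in for IRT ability estimates
group = rng.integers(0, 2, n)
latent = 1.2 * theta + rng.logistic(size=n)  # item driven by ability only here
y = np.digitize(latent, [-1.0, 1.0])         # three ordered response categories

df = pd.DataFrame({"y": y, "theta": theta, "group": group})
df["gxt"] = df["group"] * df["theta"]

def fit(cols):
    return OrderedModel(df["y"], df[cols], distr="logit").fit(method="bfgs", disp=False)

m1, m2, m3 = fit(["theta"]), fit(["theta", "group"]), fit(["theta", "group", "gxt"])
print("uniform DIF    p =", chi2.sf(2 * (m2.llf - m1.llf), df=1))
print("nonuniform DIF p =", chi2.sf(2 * (m3.llf - m2.llf), df=1))
```

Comparing model 2 to model 1 tests uniform DIF (a group main effect), and model 3 to model 2 tests nonuniform DIF (a group-by-ability interaction).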

261 citations


Journal ArticleDOI
TL;DR: Through this work, it has become clear that differences in raw scores of different groups cannot be used to infer group differences in theoretical attributes unless the test scores accord with a particular set of model invariance restrictions.
Abstract: The question whether observed differences in psychometric test scores can be attributed to differences in the properties that such tests measure is relevant in many research domains; examples include the proper interpretation of differences in intelligence test scores across different generations of people [1], gender differences in affectivity [2], and cross-cultural differences in personality. This question has also generated some of the most conspicuous controversies in the social and life sciences, where the highest temperature in the many heated discussions around the topic has, without a doubt, been reached in the debate on IQ-score differences between ethnic groups in the United States [4,5]. Such debates are often unproductive because of a lack of unambiguous characterizations of concepts like "biased," "incomparable," and "culture-fair." Terms are easily coined, as is illustrated by Johnson's [6] count of no less than 55 types of measurement equivalence; however, it is often less easy to spell out their meaning in terms of their empirical consequences. Without at least some degree of precision in one's conception of a term like "equivalence," it is difficult to have a scientifically productive debate, or even to agree on what aspects of empirical data are relevant for answering the questions involved. It is for this reason that the establishment of concepts like measurement invariance and bias in an unambiguous, formal framework with testable consequences [7-9] represents a theoretical development of great importance. Through this work, it has become clear that differences in raw scores (e.g., IQ scores) of different groups (e.g., blacks and whites) cannot be used to infer group differences in theoretical attributes (e.g., general intelligence) unless the test scores accord with a particular set of model invariance restrictions. Namely, the same attribute must relate to the same set of observations in the same way in each group. Statistically, this means that the mathematical function that relates latent variables to the observations must be the same in each of the groups involved in the comparison [7,8]. This idea has become known as the requirement of measurement invariance. The theoretical definitions of measurement invariance and bias are very general, and apply to different models, such as item response theory (IRT) and factor models, in roughly the same way [10,11]. This does not hold for the empirical methods available for testing measurement invariance. In the past decades, psychometricians working on measurement invariance have produced many different statistical techniques to assess differential item functioning (DIF). These techniques usually employ different statistical assumptions, for instance, regarding the form of the relation between latent and observed variables and the shape of the population distribution on the latent variable, and employ different modeling strategies as well as selection criteria for flagging items as biased. For this reason, it is difficult to assess the consequences of choosing a particular technique; moreover, it is not always clear to what extent the choice of technique makes a difference with respect to the diagnosis of measurement invariance and bias in applied situations. For this reason, the articles on DIF collected here (by Crane et al [12]; Dorans and Kulick [13]; Jones [14]; Morales, Flowers, Gutierrez, Kleinman, and Teresi [15]; and Edelen Orlando et al [16]) represent a useful project in the application of bias detection methods.
Each set of authors analyzes the Mini-Mental State Examination (MMSE) for measurement invariance using the same data, albeit with different methods. Together, the articles provide...
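
The invariance requirement sketched in this passage has a compact standard formalization (Mellenbergh-style notation, not the editorial's own):

```latex
% An item (or test) is measurement invariant / unbiased with respect to
% group membership g when the observed-response distribution given the
% latent attribute \theta does not depend on g:
f(X \mid \theta, g) \;=\; f(X \mid \theta) \qquad \text{for all groups } g .
```

DIF is exactly the item-level failure of this equality: two respondents with the same \theta but different g have different response distributions on the item.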

249 citations


Book ChapterDOI
TL;DR: This chapter presents a description of many of the commonly employed methods in the detection of item bias, and outlines a set of six steps that practitioners can use in conducting DIF analyses.
Abstract: Publisher Summary This chapter presents a description of many of the commonly employed methods in the detection of item bias. Because much of the statistical detection of item bias makes use of differential item functioning (DIF) procedures, the majority of this chapter focuses on the description of statistical methods for the analysis of DIF. DIF detection procedures for dichotomous and polytomous items are presented in the chapter, along with methods for the categorization of DIF effect in dichotomous items. It also presents several recent innovations in DIF detection, including Bayesian applications, the detection of differential test functioning, and studies examining sources or explanations of DIF. While much of this chapter focuses on the statistical approaches to measuring DIF, conducting a comprehensive DIF analysis requires a series of steps aimed at measuring DIF and ensuring that the obtained DIF statistics are interpreted appropriately. The chapter also outlines a set of six steps that practitioners can use in conducting DIF analyses. The steps are demonstrated using a real dataset.
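
Among the dichotomous-item procedures such chapters conventionally cover is the Mantel-Haenszel statistic with the ETS delta metric and A/B/C effect-size categories. A simplified sketch (the full ETS rules also involve significance tests, omitted here; all counts are invented):

```python
# Mantel-Haenszel DIF for one dichotomous item, with the ETS delta metric and
# simplified A/B/C size categories. Per score stratum k, counts are
# (ref_correct, ref_wrong, focal_correct, focal_wrong).
import math

strata = [(40, 10, 35, 15), (30, 20, 25, 25), (15, 35, 10, 40)]

num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
alpha_mh = num / den                     # MH common odds ratio
d_dif = -2.35 * math.log(alpha_mh)       # ETS delta metric (MH D-DIF)

category = "A" if abs(d_dif) < 1.0 else "B" if abs(d_dif) < 1.5 else "C"
print(f"alpha_MH = {alpha_mh:.3f}, MH D-DIF = {d_dif:.2f}, ETS category {category}")
```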

204 citations


Journal ArticleDOI
TL;DR: An integrated approach to the examination of measurement equivalence, invariance, and DIF is necessary for measurement in an increasingly multi-ethnic society.
Abstract: Background and objectives Reviewed in this article are topics related to the study of invariance and differential item functioning (DIF) that have received relatively little attention in the literature. Several factors influence DIF detection; these include (1) model fit, (2) model assumptions, (3) disability distributions, (4) purification, (5) cutoff values for magnitude measures, and (6) sample and scale size. Methods Approaches to DIF detection are discussed in terms of model assumptions, purification, magnitude and impact, and possible advantages and disadvantages of each method. Conclusions An integrated approach to the examination of measurement equivalence, invariance, and DIF is necessary for measurement in an increasingly multi-ethnic society. Ideally, qualitative analyses should be performed in an iterative fashion to inform about findings of DIF. However, if an already-developed measure is being evaluated, then the steps might be to focus first on dimensional invariance using factor analytic methods, followed by DIF analyses examining both significance and magnitude of DIF, accompanied by formal tests of the impact of DIF. The DIF analytic method selected in the second step might be determined based on the findings summarized in the table presented within this paper.

173 citations


Journal ArticleDOI
TL;DR: This paper examined measurement equivalence of the Satisfaction with Life Scale between American and Chinese samples using multigroup Structural Equation Modeling (SEM), Multiple Indicator Multiple Cause Model (MIMIC), and Item Response Theory (IRT).

153 citations


Journal ArticleDOI
TL;DR: In this article, the authors investigated the use of response time to assess the amount of examinee effort received by individual test items and found that the strongest predictors of the effort received by items were item length (i.e., how much reading or scanning was required) and item position.
Abstract: In low-stakes testing, the motivation levels of examinees are often a matter of concern to test givers because a lack of examinee effort represents a direct threat to the validity of the test data. This study investigated the use of response time to assess the amount of examinee effort received by individual test items. In 2 studies, it was found that the strongest predictors of the effort received by items were item length (i.e., how much reading or scanning was required) and item position. In addition, it was found that by treating item responses resulting from rapid guesses as missing, item means and item-total correlations were differentially affected and test score reliability decreased, whereas validity increased. Several implications of these results for low-stakes testing are discussed.
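
The missing-data treatment the abstract describes can be sketched directly. Assuming matrices of response times and scored responses, and a crude fixed time threshold (the study's findings imply realistic thresholds should vary with item length; all names and data below are invented):

```python
# Treat rapid guesses (responses faster than a threshold) as missing, then
# recompute item statistics. Data and the 3-second threshold are invented.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n, k = 300, 5
rt = rng.gamma(shape=2.0, scale=6.0, size=(n, k))   # response times in seconds
resp = rng.binomial(1, 0.7, size=(n, k)).astype(float)

threshold = 3.0                          # real thresholds vary with item length
filtered = np.where(rt < threshold, np.nan, resp)   # rapid guesses -> missing

items = pd.DataFrame(filtered, columns=[f"item{i+1}" for i in range(k)])
total = items.sum(axis=1)                            # NaNs skipped by default
print("item means after filtering:", items.mean().round(2).to_dict())
print("item-total correlations:   ", items.corrwith(total).round(2).to_dict())
```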

151 citations


Journal ArticleDOI
01 Jan 2006-Brain
TL;DR: The 88-item Multiple Sclerosis Spasticity Scale (MSSS-88) is a reliable and valid, patient-based, interval-level measure of the impact of spasticity in multiple sclerosis that has the potential to advance outcomes measurement in clinical trials and clinical practice, and provides a new perspective in the clinical evaluation ofSpasticity.
Abstract: Spasticity is most commonly defined as an inappropriate, velocity-dependent increase in muscle tonic stretch reflexes, due to the amplified reactivity of motor segments to sensory input. It forms one component of the upper motor neuron syndrome and often leads to muscle stiffness and disability. Spasticity can, therefore, be measured through electrophysiological, biomechanical and clinical evaluation, the last most commonly using the Ashworth scale. None of these techniques incorporates the patient experience of spasticity, nor how it affects people's daily lives. Consequently, we set out to construct a rating scale to quantify patients' perspectives on the impact of spasticity in multiple sclerosis. Qualitative methods (in-depth patient interviews and focus groups, expert opinion and literature review) were used to develop a conceptual framework of spasticity impact, and to generate a pool of items with the potential to convert this framework into a rating scale with multiple dimensions. This item pool was administered, in the form of a questionnaire, to a sample of people with multiple sclerosis and spasticity. Guided by Rasch analysis, we constructed and validated a rating scale for each component of the conceptual framework. Decisions regarding item selection were based on the integration and assimilation of seven specific analyses including clinical meaning, ordering of thresholds, fit statistics and differential item functioning. The qualitative phase (17 patient interviews, 3 focus groups) generated 144 potential scale items and a conceptual model with eight components addressing symptoms (muscle stiffness, pain and discomfort, and muscle spasms), physical impact (activities of daily living, walking and body movements) and psychosocial impact (emotional health, social functioning). The first postal survey was sent to 272 people with multiple sclerosis and had a response rate of 88%. Findings supported the development of scales for each component but demonstrated that five response options per item were too many. The 144-item questionnaire, reformatted with four response options per item, was administered with four validating instruments to an independent sample of 259 people with multiple sclerosis (response rate 78%). From the responses, an 88-item instrument with eight subscales was developed that satisfied criteria for reliable and valid measurement. Correlations with other measures were consistent with predictions. The 88-item Multiple Sclerosis Spasticity Scale (MSSS-88) is a reliable and valid, patient-based, interval-level measure of the impact of spasticity in multiple sclerosis. It has the potential to advance outcomes measurement in clinical trials and clinical practice, and provides a new perspective in the clinical evaluation of spasticity.

Journal ArticleDOI
TL;DR: The authors show that the size of the effect can be expressed by a presentation of the values of the parameter estimates derived from the fitted model, and develop a case study of the description of effect size for research reporting in the context of item response theory.
Abstract: The psychological literature currently emphasizes reporting the "effect size" of research findings in addition to the outcome of any tests of significance. However, some confusion may result from the fact that there are three distinct uses of effect sizes in the psychological literature, namely, power analysis, research synthesis, and research reporting. The authors review these uses of effect sizes and develop a case study of the description of effect size for research reporting in the context of item response theory. For many parametric models, hypotheses are tested by comparing the values of directly interpretable parameters. The authors show that the size of the effect can be expressed by a presentation of the values of the parameter estimates derived from the fitted model. Studies that use item response theory to detect differential item functioning provide illustrations.

Journal ArticleDOI
TL;DR: The articles addressing differential item functioning (DIF) and factorial invariance in this special issue of Medical Care [1-9] are uniformly excellent, and readers will find that each article makes an important contribution to the measurement literature.
Abstract: The articles addressing differential item functioning (DIF) and factorial invariance in this special issue of Medical Care [1-9] are uniformly excellent and readers will find that each article makes an important contribution to the measurement literature. The suggestion to have researchers apply various...

Journal ArticleDOI
TL;DR: A reduced 18-item measure demonstrating strong clinical utility is proposed, with scores of 8 or greater implying greater need for treatment.
Abstract: The Rutgers Alcohol Problem Index (RAPI; H. R. White & E. W. Labouvie, 1989) is a frequently used measure of alcohol-related consequences in adolescents and college students, but psychometric evaluations of the RAPI are limited and it has not been validated with college students. This study used item response theory (IRT) to examine the RAPI on students (N = 895; 65% female, 35% male) assessed in both high school and college. A series of 2-parameter IRT models were computed, examining differential item functioning across gender and time points. A reduced 18-item measure demonstrating strong clinical utility is proposed, with scores of 8 or greater implying greater need for treatment.

Journal ArticleDOI
TL;DR: This article used differential item functioning to evaluate the comparability of translated items at two different points in time, after the initial translation and 4 years later after the translations were revisited using a more rigorous translation model.
Abstract: Guidelines for translating educational and psychological assessments for use across different languages and cultures have been developed by the International Test Commission and the Joint Committee on Standards for Educational and Psychological Testing. Common themes in these guidelines and standards are that, when translating items, both judgmental and statistical techniques should be used to ensure item comparability across languages, and that rigorous quality-control steps should be included in the translation process. In this study, the authors use differential item functioning methodology to evaluate the comparability of translated items at two different points in time—after the initial translation and 4 years later, after the translations were revisited using a more rigorous translation model. The results indicated that the revised translations led to improvements in some but not all items. Improvements in the process of translating survey items, even when based on accepted professional standards, should be stat...

Journal ArticleDOI
TL;DR: IRT and the likelihood-based model comparison approach comprise a powerful tool for DIF detection that can aid in the development, refinement, and evaluation of measures for use in ethnically diverse populations.
Abstract: Background: An important part of examining the adequacy of measures for use in ethnically diverse populations is the evaluation of differential item functioning (DIF) among subpopulations such as those administered the measure in different languages. A number of methods exist for this purpose. Objective: The objective of this study was to introduce and demonstrate the identification of DIF using item response theory (IRT) and the likelihood-based model comparison approach. Methods: Data come from a sample of community-residing elderly who were part of a dementia case registry. A total of 1578 participants were administered either an English (n = 913) or Spanish (n = 665) version of the 21-item Mini-Mental State Examination. IRT was used to identify language DIF in these items with the likelihood-based model comparison approach. Results: Fourteen of the 21 items exhibited significant DIF according to language of administration. However, because the direction of the identified DIF was not consistent for one language version over the other, the impact at the scale level was negligible. Conclusions: IRT and the likelihood-based model comparison approach comprise a powerful tool for DIF detection that can aid in the development, refinement, and evaluation of measures for use in ethnically diverse populations.
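
The scale-level conclusion (DIF in both directions canceling in the aggregate) can be illustrated by comparing expected total scores, i.e., test characteristic curves, under the two languages' item parameters. The 2PL parameters below are invented and merely mimic offsetting difficulty shifts:

```python
# Compare expected total scores (test characteristic curves) under the two
# languages' 2PL item parameters; offsetting DIF yields a near-zero difference.
import numpy as np

theta = np.linspace(-3, 3, 61)

def tcc(a, b):
    # expected number-correct score at each theta under a 2PL model
    return (1 / (1 + np.exp(-a * (theta[:, None] - b)))).sum(axis=1)

a = np.array([1.0, 1.2, 0.8, 1.5, 1.1])             # shared discriminations
b_english = np.array([-1.0, 0.0, 0.5, 1.0, -0.5])
b_spanish = b_english + np.array([0.4, -0.3, 0.3, -0.4, 0.0])  # offsetting DIF

diff = np.abs(tcc(a, b_english) - tcc(a, b_spanish))
print("max |TCC difference| across theta:", float(diff.max().round(3)))
```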

Journal ArticleDOI
TL;DR: Skindex-29 data from 454 Italian dermatological patients were subjected to Rasch analysis to investigate threshold ordering, differential item functioning (DIF), and item and overall fit to the model.

Journal ArticleDOI
TL;DR: In this article, the strengths of the Rasch model as a psychometric tool and analysis technique are discussed, referring to person-item maps, anchoring, differential item functioning, and person-item fit.
Abstract: Recent international studies note that countries whose students perform well on international science assessments report the need to change science education. Some countries use assessments for diagnostic purposes to assist teachers in addressing their students' needs. However, in the United States, standards-based reform has focused the national discussion on documenting students' attainment of high educational standards. Students' science achievement is one of those standards, and in many states, “high-stakes” tests determine the resultant achievement measures. Policymakers and administrators use those tests to rank school performance, to prevent students' graduation, and to evaluate teachers. With science test measures used in different ways, statistical confidence in the measures' validity and reliability is essential. Using a science achievement test from one state's systemic reform project as an example, this paper discusses the strengths of the Rasch model as a psychometric tool and analysis technique, referring to person-item maps, anchoring, differential item functioning, and person-item fit. Furthermore, the paper proposes that science educators should carefully inspect the tools they use to measure and document changes in educational systems.

Journal ArticleDOI
TL;DR: An integrated approach to the quantitative methods used in this special issue to examine measurement equivalence is provided; factor analytic and DIF detection methods provide unique information and can be viewed as complementary in informing about measurement equivalence.
Abstract: Background: Reviewed in this article are issues relating to the study of invariance and differential item functioning (DIF). The aim of factor analyses and DIF, in the context of invariance testing, is the examination of group differences in item response conditional on an estimate of disability...

Journal ArticleDOI
TL;DR: Failing to account for measurement differences may lead to spurious inferences regarding language group differences in the underlying level of cognitive functioning, and the MIMIC model can be used to detect and adjust for such measurement differences in substantive research.
Abstract: Background: Knowledge of the extent to which measurement of adult cognitive functioning differs between Spanish and English language administrations of the Mini-Mental State Examination (MMSE) is critical for inclusive, representative, and valid research of older adults in the United States. Objectives: We sought to demonstrate the use of an item response theory (IRT) based structural equation model, that is, the MIMIC model (multiple indicators, multiple causes), to evaluate MMSE responses for evidence of differential item functioning (DIF) attributable to language of administration. Subjects: We studied participants in a dementia case registry study (n = 1546), 42% of whom were examined with the Spanish language MMSE. Results: Twelve of 21 items were identified as having significant uniform DIF. The 4 most discrepant included orientation to season, orientation to state, repeat phrase, and follow command. DIF accounted for two-thirds of the observed difference in underlying level of cognitive functioning between Spanish- and English-language administration groups. Conclusions: Failing to account for measurement differences may lead to spurious inferences regarding language group differences in the underlying level of cognitive functioning. The MIMIC model can be used to detect and adjust for such measurement differences in substantive research.
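
In the usual MIMIC formulation (standard notation, not necessarily the article's), uniform DIF appears as a direct effect of the grouping covariate on an item, over and above the covariate's effect on the latent variable:

```latex
% MIMIC model: item x_i loads on latent cognitive level \eta, and group g
% (e.g., language of administration) may affect x_i directly.
x_i = \lambda_i \eta + \beta_i g + \varepsilon_i, \qquad
\eta = \gamma g + \zeta
```

A nonzero \beta_i flags uniform DIF in item i, while \gamma carries the adjusted group difference in the latent cognitive level; omitting the \beta_i paths forces any item-level DIF into \gamma, which is the spurious inference the Conclusions warn about.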

Journal ArticleDOI
TL;DR: The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good discriminating ability.

Journal ArticleDOI
TL;DR: To further evaluate these models and characterize item and test functioning, multidimensional representations of statistics such as information, difficulty, and discrimination for the M-3PL and M-2PPC are presented.
Abstract: Multidimensional item response theory (IRT) models have been proposed for better understanding the dimensional structure of data or to define diagnostic profiles of student learning. A compensatory multidimensional two-parameter partial credit model (M-2PPC) for constructed-response items is presented that is a generalization of those proposed to date, along with a compensatory multidimensional three-parameter logistic model for multiple-choice data (M-3PL). Estimation of these models using Markov chain Monte Carlo methods is discussed. To further evaluate these models and characterize item and test functioning, multidimensional representations of statistics such as information, difficulty, and discrimination for the M-3PL and M-2PPC are presented. Many assessment programs have a mixture of item types in which multiple choice and constructed response are administered together. An example is presented in which the dimensional structure of a test containing mixed item types is examined. Goodness-of-fit testing under various model formulations and derived statistics are discussed. Index terms: item response theory, partial credit model, MIRT, item information, item statistics, MCMC
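
Reckase-style multidimensional analogues of discrimination and difficulty give a flavor of such statistics; whether these match the article's exact definitions is an assumption, and the parameters below are invented:

```python
# Reckase-style multidimensional item statistics for invented 2-dimensional
# compensatory items: slopes a (item x dimension) and intercepts d.
import numpy as np

a = np.array([[1.2, 0.3], [0.4, 1.1], [0.9, 0.9]])
d = np.array([-0.5, 0.8, 0.0])

mdisc = np.sqrt((a ** 2).sum(axis=1))     # overall multidimensional discrimination
mdiff = -d / mdisc                        # signed distance from the origin
angles = np.degrees(np.arccos(a / mdisc[:, None]))  # direction of best measurement

print("MDISC:", mdisc.round(2))
print("MDIFF:", mdiff.round(2))
print("angles with axes (deg):\n", angles.round(1))
```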

Journal ArticleDOI
TL;DR: A simulation study shows that the new procedure is feasible in practice, and that when the latent distribution is not well approximated as normal, two-parameter logistic (2PL) item parameter estimates and expected a posteriori scores (EAPs) can be improved over what they would be with the normal model.
Abstract: The purpose of this paper is to introduce a new method for fitting item response theory models with the latent population distribution estimated from the data using splines. A spline-based density estimation system provides a flexible alternative to existing procedures that use a normal distribution, or a different functional form, for the population distribution. A simulation study shows that the new procedure is feasible in practice, and that when the latent distribution is not well approximated as normal, two-parameter logistic (2PL) item parameter estimates and expected a posteriori scores (EAPs) can be improved over what they would be with the normal model. An example with real data compares the new method and the extant empirical histogram approach.
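
The role the estimated latent density plays in EAP scoring can be sketched with quadrature. Here an arbitrary bimodal grid density stands in for the article's spline-based estimate, and the 2PL item parameters and response pattern are invented:

```python
# EAP scoring with a flexible latent density: quadrature weights come from the
# density estimate. A bimodal grid density stands in for the spline estimate.
import numpy as np

nodes = np.linspace(-4, 4, 81)
dens = (np.exp(-0.5 * ((nodes - 0.5) / 0.8) ** 2)
        + 0.4 * np.exp(-0.5 * ((nodes + 1.5) / 0.6) ** 2))
weights = dens / dens.sum()              # discretized latent density

a = np.array([1.0, 1.4, 0.7])            # invented 2PL discriminations
b = np.array([-0.5, 0.3, 1.0])           # invented 2PL difficulties
x = np.array([1, 1, 0])                  # one observed response pattern

p = 1 / (1 + np.exp(-a * (nodes[:, None] - b)))     # P(correct | theta node)
like = (p ** x * (1 - p) ** (1 - x)).prod(axis=1)   # pattern likelihood
post = like * weights                                # unnormalized posterior
print("EAP estimate:", round(float((nodes * post).sum() / post.sum()), 3))
```

Replacing `weights` with normal-density weights recovers the conventional EAP; the paper's point is that when the true latent distribution is nonnormal, the flexible density yields better item parameter estimates and EAPs.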

Journal ArticleDOI
TL;DR: These results support the measurement validity of both questionnaires in PD, although the FACIT-F displayed better measurement precision and modest psychometric advantages over the FSS.

Journal ArticleDOI
TL;DR: This paper compares a qualitative and a quantitative method of item assessment for developing the content of a food insecurity scale for Bangladesh and lends added confidence to the use of either scale for identifying food-insecure households in different regions of Bangladesh.
Abstract: This paper compares a qualitative and a quantitative (Rasch) method of item assessment for developing the content of a food insecurity scale for Bangladesh. Data are derived from the Bangladesh Food Insecurity Measurement and Validation Study, in which researchers collected 2 rounds of ethnographic information and 3 rounds of conventional household survey data between 2001 and 2003. The qualitative method of scale development relied on content experts and respondents themselves to evaluate household food insecurity items generated through ethnographic research. The quantitative method applied the Rasch model to assess the fit of the same items using representative survey data. The Rasch model was then used to test for differential item functioning (DIF) across diverse demographic and geographic subgroups. The qualitative assessment flagged and discarded 10 items, leaving 13. The Rasch assessment of infit and outfit flagged 3 items, and the Rasch DIF test discarded another 10 items, leaving a total of 10 items in the Rasch-derived scale. The 2 scales contained 8 of the same items. The qualitatively and quantitatively derived scales were highly correlated (r = 0.96, P < 0.01), and the 2 methods located 90% of households in the same food insecurity tercile. This convergence lends added confidence to the use of either scale for identifying food-insecure households in different regions of Bangladesh. Multiple methods should continue to be applied in a systematic and transparent way to lend additional credence to the results when they converge and to pinpoint directions for further clarification where they do not.
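
The infit and outfit mean-square statistics used in the Rasch assessment can be computed directly from model residuals. A sketch on simulated dichotomous responses (illustrative only; flagging thresholds vary by application):

```python
# Rasch infit/outfit mean squares for dichotomous items on simulated data.
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 6
theta = rng.normal(size=n)
b = np.linspace(-1.5, 1.5, k)
p = 1 / (1 + np.exp(-(theta[:, None] - b)))   # Rasch P(x = 1)
x = rng.binomial(1, p)

w = p * (1 - p)                               # model variance of each response
z2 = (x - p) ** 2 / w                         # squared standardized residuals
outfit = z2.mean(axis=0)                      # unweighted (outlier-sensitive)
infit = ((x - p) ** 2).sum(axis=0) / w.sum(axis=0)  # information-weighted

print("outfit:", outfit.round(2))
print("infit: ", infit.round(2))
# A common screening heuristic flags mean squares far outside roughly 0.7-1.3.
```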

Journal ArticleDOI
TL;DR: The 19-item WHOQOL-BREF measures more succinct latent traits than the original design, and yields not only more accurate estimates of the correlation between domains but also substantially higher reliabilities than the standard unidimensional approach.
Abstract: Objective: This study examined the construct validity, and improved the test reliability and the estimation accuracy for the correlation between domains of the WHOQOL-BREF using multidimensional Rasch analysis. Method: A total of 13,083 adults were administered the 28-item WHOQOL-BREF Taiwan version, which consists of 4 subscales (domains). The multidimensional form of the partial credit model was used to examine the fit of the 4 subscales. For comparison, each subscale individually was also fitted to the unidimensional partial credit model. Standard item fit statistics and analysis of differential item functioning (DIF) were used to check model-data fit. Results: After excluding 2 overall items and deleting 7 DIF items, the remaining items of each subscale in the WHOQOL-BREF constituted a single construct. The test reliabilities and correlations between domains obtained from the multidimensional approach, (0.82–0.86) and (0.79–0.89), respectively, were much higher than those obtained from the unidimensional approach, (0.67–0.75) and (0.53–0.65), respectively. Conclusion: The 19-item WHOQOL-BREF measures more succinct latent traits than the original design. The multidimensional approach yields not only more accurate estimates for the correlation between domains but also substantially higher reliabilities, than the standard unidimensional approach.

Journal ArticleDOI
TL;DR: The use of different methods to examine DIF in relation to English and Spanish language administration of the Mini-Mental State Examination has practical and theoretical implications in the context of health disparities.
Abstract: Background: Various forms of differential item functioning (DIF) in the Mini-Mental State Examination (MMSE) have been identified. Items have been found to perform differently for individuals of different educational levels, racial/ethnic groups, and/or groups whose first language is not English...

Journal ArticleDOI
TL;DR: The results of this study suggest that the EPDS, in its original 10-item form, is not a viable scale for the unidimensional measurement of depression; Rasch analysis suggests that a revised eight-item version (EPDS-8) would provide a more psychometrically robust scale.
Abstract: Background: The Edinburgh Postnatal Depression Scale (EPDS) is a 10-item self-rating post-natal depression scale which has seen widespread use in epidemiological and clinical studies. Concern has been raised over the validity of the EPDS as a single summed scale, with suggestions that it measures two separate aspects, one of depressive feelings, the other of anxiety. Methods: As part of a larger cross-sectional study conducted in Melbourne, Australia, a community sample (324 women, ranging in age from 18 to 44 years: mean = 32 yrs, SD = 4.6) was obtained by inviting primiparous women to participate voluntarily in this study. Data from the EPDS were fitted to the Rasch measurement model and tested for appropriate category ordering, for item bias through differential item functioning (DIF) analysis, and for unidimensionality through tests of the assumption of local independence. Results: Rasch analysis of the data from the ten-item scale initially demonstrated a lack of fit to the model, with a significant item-trait interaction total chi-square (chi-square = 82.8, df = 40; p < .001). Removal of two items (items 7 and 8) resulted in a non-significant item-trait interaction total chi-square, with a residual mean value for items of -0.467 and a standard deviation of 0.850, showing fit to the model. No DIF existed in the final 8-item scale (EPDS-8) and all items showed fit to model expectations. Principal Components Analysis of the residuals supported the local independence assumption and unidimensionality of the revised EPDS-8 scale. Revised cut points were identified for EPDS-8 to maintain the case identification of the original scale. Conclusion: The results of this study suggest that the EPDS, in its original 10-item form, is not a viable scale for the unidimensional measurement of depression. Rasch analysis suggests that a revised eight-item version (EPDS-8) would provide a more psychometrically robust scale. The revised cut points of 7/8 and 9/10 for the EPDS-8 show high levels of agreement with the original case identification for the EPDS-10.

Journal ArticleDOI
TL;DR: The VA LV VFQ-48 is a sensitive measure of changes that occur in visual ability as a result of vision rehabilitation and can be used to compare programs that offer different levels of intervention and serve patients across the continuum of vision loss.
Abstract: PURPOSE. To evaluate the sensitivity to change, in patients who undergo vision rehabilitation, of the Veterans Affairs (VA) Low Vision Visual Functioning Questionnaire (LV VFQ-48), which was designed to measure the difficulty visually impaired persons have in performing daily activities and to evaluate vision rehabilitation outcomes. METHODS. Before and after rehabilitation, the VA LV VFQ-48 was administered by telephone interview to subjects from five sites in the VA and private sector. Visual acuity of these subjects ranged from near normal to total blindness. RESULTS. The VA LV VFQ-48 exhibited significant differential item functioning (DIF) for 7 of 48 items (two mobility tasks, four reading tasks, and one distance-vision task). However, the DIF was small relative to baseline changes in item difficulty for all items. Therefore, the data were reanalyzed with the constraint that item difficulties do not change with rehabilitation, which assigns all changes to the person measure. Subjects in the inpatient Blind Rehabilitation Center (BRC) program showed the largest changes in person measures after vision rehabilitation (effect size = 1.9; t-test P < 0.0001). The subjects in the outpatient programs exhibited smaller changes in person measures after rehabilitation (effect size = 0.29; t-test P < 0.01). There was no significant change in person measures for the control group (test-retest before rehabilitation). CONCLUSIONS. In addition to being a valid and reliable measure of visual ability, the VA LV VFQ-48 is a sensitive measure of changes that occur in visual ability as a result of vision rehabilitation. Patients' self-reports of the difficulty they experience performing daily activities, measured with this instrument, can be used to compute a single number, the person measure, which can serve as an outcome measure in clinical studies. The VA LV VFQ-48 can be used to compare programs that offer different levels of intervention and serve patients across the continuum of vision loss.

Journal ArticleDOI
TL;DR: Extending the noniterative estimators of Camilli and Penfield (1997), the authors propose two estimators of the DIF effect variance for tests containing dichotomous and polytomous items, and a small simulation study assesses their statistical properties.
Abstract: One approach to measuring unsigned differential test functioning is to estimate the variance of the differential item functioning (DIF) effect across the items of the test. This article proposes two estimators of the DIF effect variance for tests containing dichotomous and polytomous items. The proposed estimators are direct extensions of the noniterative estimators developed by Camilli and Penfield (1997) for tests composed of dichotomous items. A small simulation study is reported in which the statistical properties of the generalized variance estimators are assessed, and guidelines are proposed for interpreting values of DIF effect variance estimators.
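
The underlying idea admits a very simple unweighted method-of-moments sketch: the variance of the per-item DIF estimates minus their average sampling variance. The article's generalized estimators are weighted refinements of this logic, so the formula below is illustrative, not the paper's; all numbers are invented:

```python
# Unweighted method-of-moments estimate of DIF effect variance: the observed
# variance of per-item DIF estimates minus their mean sampling variance.
import numpy as np

d = np.array([0.12, -0.35, 0.08, 0.51, -0.22, 0.02])  # per-item DIF estimates
se = np.array([0.15, 0.18, 0.14, 0.20, 0.16, 0.15])   # their standard errors

tau2 = d.var(ddof=1) - (se ** 2).mean()   # truncate at zero: a variance can't be < 0
print("estimated DIF effect variance:", round(max(float(tau2), 0.0), 4))
```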