
Showing papers on "Differential item functioning" published in 2021


Journal ArticleDOI
21 Mar 2021
TL;DR: In this article, the authors used multigroup confirmatory factor analysis (CFA) and Rasch differential item functioning (DIF) to examine the measurement invariance of the FCV-19S across country, gender and age (children aged below 18 years, young to middle-aged adults aged between 18 and 60 years, and older people aged above 60 years).
Abstract: AIM: The threats of novel coronavirus disease 2019 (COVID-19) have caused fears worldwide. The Fear of COVID-19 Scale (FCV-19S) was recently developed to assess the fear of COVID-19. Although many studies found that the FCV-19S is psychometrically sound, it is unclear whether the FCV-19S is invariant across countries. The present study aimed to examine the measurement invariance of the FCV-19S across eleven countries. DESIGN: Cross-sectional study. METHODS: Using data collected from prior research on Bangladesh (N = 8,550), United Kingdom (N = 344), Brazil (N = 1,843), Taiwan (N = 539), Italy (N = 249), New Zealand (N = 317), Iran (N = 717), Cuba (N = 772), Pakistan (N = 937), Japan (N = 1,079) and France (N = 316), comprising a total of 15,663 participants, the present study used multigroup confirmatory factor analysis (CFA) and Rasch differential item functioning (DIF) to examine the measurement invariance of the FCV-19S across country, gender and age (children aged below 18 years, young to middle-aged adults aged between 18 and 60 years, and older people aged above 60 years). RESULTS: The unidimensional structure of the FCV-19S was confirmed. Multigroup CFA showed that FCV-19S was partially invariant across country and fully invariant across gender and age. DIF findings were consistent with the findings from multigroup CFA. Many DIF items were displayed for country, few DIF items were displayed for age, and no DIF items were displayed for gender. CONCLUSION: Based on the results of the present study, the FCV-19S is a good psychometric instrument to assess fear of COVID-19 during the pandemic period. Moreover, the use of FCV-19S is supported in at least ten countries with satisfactory psychometric properties.

90 citations


Journal ArticleDOI
TL;DR: The findings suggest that the C-EDE-QS could be a useful tool to assess key attitudes and behavioral features of eating disorder psychopathologies in the Chinese context.
Abstract: The 12-item Short Form of the Eating Disorder Examination Questionnaire (EDE-QS) was developed using Rasch modeling to address certain weaknesses of the EDE-Q, and it has been demonstrated to be a psychometrically sound measure. Thus, the current study aimed to obtain a Chinese version of the EDE-QS and validate its psychometric properties in the Chinese context. According to standard procedures, the Chinese version of the EDE-QS (C-EDE-QS) was obtained. A total of 1068 Chinese college students finished the survey. The psychometric properties of the C-EDE-QS were examined under the frameworks of both classical test theory and Rasch modeling. The one-factor structure of the C-EDE-QS was confirmed in confirmatory factor analysis; the C-EDE-QS showed good reliability with a Cronbach’s α of 0.89; and the total scores of the C-EDE-QS were significantly correlated with eating disturbances and psychological distress in expected magnitudes and directions. Rasch analysis supported the unidimensional construct of the C-EDE-QS and the four-point rating scale structure. However, results revealed differential item functioning (DIF) across gender groups. The findings suggest that the C-EDE-QS could be a useful tool to assess key attitudes and behavioral features of eating disorder psychopathologies in the Chinese context. Level of evidence: V, descriptive (cross-sectional) study.

57 citations


Journal ArticleDOI
TL;DR: In this article, the authors introduce a procedure to analyze and control for common method variance in one's data, based on a series of factor analysis models with a random intercept, which yields constructs and factor scores free of method effects.
Abstract: Measurement invariance is necessary before any substantive cross-national comparisons can be made. The statistical workhorse for conducting measurement invariance analyses is the multigroup confirmatory factor analysis model. This model works well if a few items exhibit clear differential item functioning, but it is not able to capture, model, and control for measurement bias that affects all items, i.e., this model cannot account for common method variance. The presence of common method variance in cross-national data leads to poorly fitting models, which in turn often yield biased, if not incorrect, results. We introduce a procedure to analyze and control for common method variance in one’s data, based on a series of factor analysis models with a random intercept. The modeling framework yields constructs and factor scores free of method effects. We use marker variables to support the validity of the interpretation of the random intercept as a method factor. An empirical application dealing with material values in Spain, the UK, and Brazil is provided. We compare results with those obtained for the standard multigroup confirmatory factor analysis model.

33 citations


Journal ArticleDOI
TL;DR: This article demonstrates how the current three-step approach to latent class modeling should be modified to account for MNI, and proposes a model-building strategy that makes the new methodology practically applicable also when it is unknown which of the external variables cause DIF.
Abstract: The practice of latent class (LC) modeling using a bias-adjusted three-step approach has become widely popular. However, the current three-step approach has one important drawback – its key assumpt...

30 citations


Journal ArticleDOI
TL;DR: A novel measurement scale of healthy ageing using worldwide cohorts, due to its reliability and global representativeness, has the potential to contribute to worldwide research on healthy ageing.
Abstract: Background Research efforts to measure the concept of healthy ageing have been diverse and limited to specific populations. This diversity limits the potential to compare healthy ageing across countries and/or populations. In this study, we developed a novel measurement scale of healthy ageing using worldwide cohorts. Methods In the Ageing Trajectories of Health-Longitudinal Opportunities and Synergies (ATHLOS) project, data from 16 international cohorts were harmonized. Using ATHLOS data, an item response theory (IRT) model was used to develop a scale with 41 items related to health and functioning. Measurement heterogeneity due to intra-dataset specificities was detected, applying differential item functioning via a logistic regression framework. The model accounted for specificities in model parameters by introducing cohort-specific parameters that rescaled scores to the main scale, using an equating procedure. Final scores were estimated for all individuals and converted to T-scores with a mean of 50 and a standard deviation of 10. Results A common scale was created for 343 915 individuals above 18 years of age from 16 studies. The scale showed solid evidence of concurrent validity regarding various sociodemographic, life and health factors, and convergent validity with healthy life expectancy (r = 0.81) and gross domestic product (r = 0.58). Survival curves showed that the scale could also be predictive of mortality. Conclusions The ATHLOS scale, due to its reliability and global representativeness, has the potential to contribute to worldwide research on healthy ageing.
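The T-score conversion mentioned in this abstract (mean 50, standard deviation 10) is a linear rescaling of the estimated latent scores. A minimal Python sketch follows, assuming the reference mean and SD are taken from the scored sample itself; the function name `to_t_scores` and the example values are hypothetical and are not taken from the ATHLOS project.

```python
from statistics import mean, pstdev


def to_t_scores(theta, ref_mean=None, ref_sd=None):
    """Linearly rescale latent scores to T-scores (mean 50, SD 10).

    If no reference mean/SD is supplied, the sample's own mean and
    population SD are used (an assumption made for this illustration).
    """
    m = mean(theta) if ref_mean is None else ref_mean
    s = pstdev(theta) if ref_sd is None else ref_sd
    if s == 0:
        raise ValueError("standard deviation is zero; cannot standardise")
    return [50.0 + 10.0 * (t - m) / s for t in theta]


# Made-up latent score estimates for five respondents.
theta_hat = [-1.2, -0.4, 0.0, 0.3, 1.3]
t_scores = to_t_scores(theta_hat)
```

Passing `ref_mean` and `ref_sd` explicitly would anchor the scale to an external reference population instead of the sample at hand.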

26 citations


Journal ArticleDOI
TL;DR: In this paper, a survey containing the 13-item Patient Activation Measure was completed by 942 patients with CKD, not treated with dialysis, and data quality was assessed by mean, item response, missing values, floor and ceiling effects, internal consistency (Cronbach's alpha and average interitem correlation), and item-rest correlations.
Abstract: BACKGROUND AND OBJECTIVES Despite the increasing prioritization of the promotion of patient activation in nephrology, its applicability to people with CKD is not well established. Before the Patient Activation Measure is universally adopted for use in CKD, it is important to critically evaluate this measure. The aim of this study was to describe the psychometric properties of the Patient Activation Measure in CKD. DESIGN, SETTING, PARTICIPANTS, & MEASUREMENTS A survey containing the 13-item Patient Activation Measure was completed by 942 patients with CKD, not treated with dialysis. Data quality was assessed by mean, item response, missing values, floor and ceiling effects, internal consistency (Cronbach's alpha and average interitem correlation), and item-rest correlations. Rasch modeling was used to assess item performance and scaling (item statistics, person and item reliability, rating scale diagnostics, factorial test of residuals, and differential item functioning). RESULTS The item response was high, with a small number of missing values (<1%). Floor effect was small (range 1%-5%), but the ceiling effect was above 15% for nine items (range 15%-38%). The Patient Activation Measure demonstrated good internal consistency overall (Cronbach α=0.925, and average interitem correlation 0.502). The difficulty of the Patient Activation Measure items ranged from -0.90 to 0.86. Differential item functioning was found for disease type (item 3) and age (item 12). The person separation index was 9.48 and item separation index was 3.21. CONCLUSIONS The 13-item Patient Activation Measure appears to be a suitably reliable and valid instrument for assessing patient activation in CKD. In the absence of a kidney-specific instrument, our results support the 13-item Patient Activation Measure as a promising measure to assess activation in those with CKD, although consideration for several items is warranted. 
The high ceiling effect may be a problem when using the 13-item Patient Activation Measure to measure changes over time.
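The data-quality statistics named in this abstract (Cronbach's alpha, item-rest correlations, ceiling effects) can be computed directly from a respondents-by-items score matrix. The sketch below uses only the Python standard library; the function names and the tiny data set are illustrative assumptions, not the study's data or software.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation; assumes neither variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_rest_correlations(rows):
    """Correlate each item with the sum of the remaining items."""
    k = len(rows[0])
    return [
        pearson([r[j] for r in rows], [sum(r) - r[j] for r in rows])
        for j in range(k)
    ]


def ceiling_rate(rows, j, max_score):
    """Proportion of respondents choosing the highest category on item j."""
    return sum(1 for r in rows if r[j] == max_score) / len(rows)


# Made-up responses: 5 respondents x 3 items, scored 1-4.
demo = [[1, 2, 2], [2, 2, 3], [3, 4, 4], [4, 4, 4], [2, 3, 3]]
alpha = cronbach_alpha(demo)
rest_r = item_rest_correlations(demo)
ceiling_item3 = ceiling_rate(demo, 2, 4)
```

A floor rate can be obtained from `ceiling_rate` by passing the lowest category as `max_score`.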

23 citations


Journal ArticleDOI
TL;DR: The theoretical background for PROM translation, adaptation, and cross‐cultural validation is presented, and how PROMs used in sports medicine research have been translated and adapted are assessed.
Abstract: Translating patient-reported outcome measures (PROMs) can alter the meaning of items and undermine the PROM's psychometric properties (quantified as cross-cultural differential item functioning [DIF]). The aim of this paper was to present the theoretical background for PROM translation, adaptation, and cross-cultural validation, and assess how PROMs used in sports medicine research have been translated and adapted. We also assessed DIF for the Knee Injury and Osteoarthritis Outcome Score (KOOS) across Danish, Norwegian, and Swedish versions. We conducted a search in PubMed and Scopus to identify the method of translation, adaptation, and validation of PROMs relevant to musculoskeletal research. Additionally, 150 preoperative KOOS questionnaires were obtained from the Scandinavian knee ligament reconstruction registries, and cross-cultural DIF was evaluated using confirmatory factor analysis and Rasch analysis. There were 392 studies identified, describing the translation of 61 PROMs. Ninety-four percent were performed with forward-backward technique. Forty-nine percent used cognitive interviews to ensure appropriate wording, understandability, and adaptation to the target culture. Only two percent were validated according to modern test theory. No study assessed cross-cultural DIF. One KOOS subscale showed no cross-cultural DIF, two had DIF with respect to some (but not all) items, and thus conversion tables could be constructed, and two KOOS subscales could not be pooled. Most PROM translations are of undocumented quality, despite the common conclusion that they are valid and reliable. Scores from three of five KOOS subscales can be pooled across the Danish, Norwegian, and Swedish versions, but two of these must be adjusted for DIF.

21 citations


Journal ArticleDOI
TL;DR: The authors introduce a framework for discussing differences between countries in student achievement in international large-scale assessments in education, one of whose primary goals is the comparison of country means.
Abstract: One of the primary goals of international large-scale assessments in education is the comparison of country means in student achievement. This article introduces a framework for discussing differen...

18 citations


Journal ArticleDOI
TL;DR: An instrument that measures in-service teachers’ self-efficacy and outcome expectancy beliefs for teaching CT was developed and validated and did not reflect bias with gender, race, or teaching experience.
Abstract: Despite a growing recognition that K-12 teachers should be prepared to teach students computational thinking (CT) skills across disciplines, there is a lack of valid instrumentation that measures teachers’ efficacy beliefs to do so. This study addresses this problem by developing and validating an instrument that measures in-service teachers’ self-efficacy beliefs for teaching CT. In parallel, we conducted a regression analysis to predict teachers’ self-efficacy and outcome expectancy beliefs for teaching CT based on demographic traits of the respondents. We surveyed a total of 330 K-12 in-service teachers. A combination of classical test theory and item response theory Rasch was used to validate the instrument. Our results yielded a valid and reliable tool measuring teaching efficacy beliefs for CT. Based on the differential item functioning analysis, the instrument did not reflect bias with gender, race, or teaching experience. Additionally, a regression analysis did not reveal significant predictors using teachers’ demographic characteristics. This suggests a need for looking at other factors that may significantly predict K-12 teachers’ teaching efficacy beliefs for CT to inform theory and practice around successful CT teaching and learning. Furthermore, we provide implications for the instrument we have developed.

16 citations


Journal ArticleDOI
TL;DR: In this article, the authors investigated the psychometric properties of the Italian version of the Bergen Facebook Addiction Scale (BFAS) among adolescents and young adults, and found that the scale assesses medium and high levels of the trait, and that it is useful in order to discriminate different levels of Problematic Facebook use within this range of trait, in which the scale is sufficiently informative.
Abstract: The Bergen Facebook Addiction Scale (BFAS) is widely used, but psychometric evidence based on Item Response Theory (IRT) is lacking. Considering the advantages of this psychometric approach, the aim of the study was to investigate the psychometric properties of the Italian version of the BFAS among adolescents and young adults. Participants were 1134 (50% males, Mean age = 20.7, SD = 3.5, range = 14-33 years) Italian high school students and undergraduates. The unidimensionality of the scale was confirmed (χ2/df = 2.8, CFI = 0.99, TLI = 0.98, and RMSEA = 0.04 [C.I. = 0.02-0.06]) and IRT analysis showed that the scale assesses medium and high levels of the trait, and that it is useful for discriminating different levels of Problematic Facebook use (PFU) within this range of the trait, in which the scale is sufficiently informative. The relationships of BFAS θ scores with theoretically related constructs supported the validity of the scale. In accordance with previous studies, BFAS scores were positively correlated with Problematic Internet use and problematic Social Network use, negatively correlated with self-esteem, and positively related to loneliness. The Differential Item Functioning (DIF) analysis showed that BFAS is invariant across gender, and only one item had uniform and small-in-size DIF. Additionally, we tested age invariance. Since only 17% of the BFAS items were non-invariant, we determined that the BFAS exhibited minor non-invariance as a whole. An analysis of the adequacy of the polythetic and monothetic criteria to define the range of the trait indicative of problematic use was also conducted. Overall, this study offers evidence that BFAS is a valuable and useful scale for measuring high levels of PFU among Italian adolescents and young adults.

15 citations


Journal ArticleDOI
01 Jun 2021-BMJ Open
TL;DR: In this article, the authors developed a psychometrically reliable instrument, the COVID-19 Psychological Distress Scale (CPDS), to assess psychological distress during the COVID-19 pandemic across Arab countries.
Abstract: Objective To develop a psychometrically reliable instrument to assess psychological distress during the COVID-19 pandemic across Arab countries. Design The new instrument was developed through the review of relevant literature. We adapted multiple items from the following tools: The Fear of COVID-19 Scale, Social Phobia Inventory, Health Anxiety Inventory, Swine Influenza Anxiety Scale and the Arabic Scale of Death Anxiety to design our new assessment tool which is called COVID-19 Psychological Distress Scale (CPDS). For psychometric analyses and validation, we conducted a cross-sectional study that solicited data through a web-based survey using the newly developed CPDS. Setting and participants This validation study was conducted in four Arab countries, including Algeria, Kuwait, Saudi Arabia and Yemen. A total of 1337 participants from these countries have voluntarily responded to our survey questionnaire that included the newly developed scale. Results The final version of the CPDS comprised 12 items. Participants from Algeria (n=447), Kuwait (n=437), Saudi Arabia (n=160) and Yemen (n=293) have completed the 12-item CPDS. Exploratory factor analysis (used on the Algerian sample) suggested a two-factor structure of the CPDS. The two-factor structure was then supported by the confirmatory factor analysis with an independent sample. Additionally, Rasch analyses showed that all the items fit well in their embedded construct; only one item showed somewhat substantial differential item functioning across gender and country. Conclusion The 12-item CPDS was found to be measurement invariant across country and gender. The CPDS, with its promising psychometric properties, might help healthcare professionals to identify people with COVID-19-induced psychological distress.

Journal ArticleDOI
TL;DR: In this paper, the authors applied the Graded Response Model (GRM) within the Item Response Theory (IRT) framework, and analyzed the psychometric properties of the Creative Self-Efficacy scale.
Abstract: Applying the Graded Response Model (GRM) within the Item Response Theory (IRT) framework, the present study analyzes the psychometric properties of the Creative Self-Efficacy scale (Karwowski, 2012, 2014; Karwowski, Lebuda, & Wiśniewska, 2018). With an ethnically diverse sample of U.S. college students, the results suggested that the 6 items of the CSE scale were well fitted to a latent unidimensional structure. The scale also had adequate measurement precision or reliability, high levels of item discrimination, and an appropriate range of item difficulty. Gender-based differential item functioning (DIF) analyses confirmed that there were no differences in the measurement results of the scale concerning gender. Additionally, Openness to Experience was found to be positively related to the CSE scale scores, providing some support for the scale’s convergent validity. Collectively, these results confirmed the psychometric soundness of the CSE scale for measuring creative self-efficacy and also identified avenues for future research.
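For readers unfamiliar with the Graded Response Model named in this abstract, the sketch below computes category probabilities for one polytomous item in the standard Samejima form: the probability of responding in category k or higher is a logistic function of a(θ - b_k), and each category probability is the difference between adjacent cumulative probabilities. The parameter values are invented for illustration and are not estimates from the study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the Graded Response Model.

    theta      -- latent trait value
    a          -- item discrimination
    thresholds -- strictly increasing threshold parameters b_1 < ... < b_{K-1}
    Returns a list of K probabilities, one per ordered response category.
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(X >= k); by definition P(X >= 0) = 1
    # and P(X >= K) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical 4-category item with symmetric thresholds.
probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 0.0, 1.0])
```

A larger discrimination `a` concentrates probability in fewer categories at a given θ, which is what "high levels of item discrimination" refers to.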

Journal ArticleDOI
TL;DR: In this paper, a non-technical summary of classical test theory and modern test theory (MTT) is provided, and three MTT methods were used to validate the two subscales (Symptoms and quality of life) from the Knee Injury and Osteoarthritis Outcome Score (KOOS).
Abstract: The aim was to provide an overview of the different statistical methods for validation of patient-reported outcome measures, ranging from simple statistical methods available in all software packages to advanced statistical models that require specialized software. A non-technical summary of classical test theory (CTT) and modern test theory (MTT) is provided. Specifically, confirmatory factor analysis, item response theory, and Rasch analysis are outlined. One CTT and three MTT methods were used to validate the two subscales (Symptoms and Quality of Life) from the Knee Injury and Osteoarthritis Outcome Score (KOOS). For each methodology, two analyses were considered: (i) a uni-dimensional analysis ignoring the pre-specified dimensionality, and (ii) a two-dimensional analysis using the pre-specified dimensionality. While CTT did not adequately address central issues regarding the validity of the KOOS subscales, the three MTT methods yielded very similar results. In conclusion, MTT methods offer analysis of all relevant properties related to the validity of patient-reported outcome measures, while this is not the case for CTT. Claims about sufficient validity based on CTT methods are inadequate and should not be trusted.

Journal ArticleDOI
TL;DR: In this article, the authors evaluated the psychometric properties of refractive error-specific quality of life (QoL) item banks and assessed their performance using computerised adaptive testing (CAT) simulations.

Journal ArticleDOI
TL;DR: In this article, the authors examined the measurement properties of the autism spectrum quality of life form (ASQoL), a new measure of QoL designed specifically for autistic people, using data from 700 autistic adults recruited from the Simons Foundation's SPARK cohort.
Abstract: Although many interventions and services for autistic people have the ultimate goal of improving quality of life (QoL), there is relatively little research on how best to assess this construct in the autistic population, and existing scales designed for non-autistic individuals may not assess all meaningful facets of QoL in the autistic population. To address this need, the autism spectrum QoL form (ASQoL) was recently developed as a measure of the autism-relevant quality of life. However, the psychometrics of the ASQoL have not been examined beyond the authors' initial validation study, and important properties such as measurement invariance/differential item functioning (DIF) have not yet been tested. Using data from 700 autistic adults recruited from the Simons Foundation's SPARK cohort, the current study sought to perform a comprehensive independent psychometric evaluation of the ASQoL using item response theory, comparing its performance to a newly-proposed brief measure of general QoL (the WHOQOL-4). Our models revealed substantial DIF by sex and gender in the ASQoL, which caused ASQoL scores to grossly underestimate the self-reported QoL of autistic women. Based on a comparison of latent variable means, we demonstrated that observed sex/gender differences in manifest ASQoL scores were the result of statistical artifacts, a claim that was further supported by the lack of significant group differences on the sex/gender-invariant WHOQOL-4. Our findings indicate that the ASQoL composite score is psychometrically problematic in its current form, and substantial revisions may be necessary before valid and meaningful inferences can be made regarding autism-relevant aspects of QoL. LAY SUMMARY: Quality of life (QoL) is an extremely important outcome for autistic people, but many of the tools that are used to measure it do not take into account how QoL may be different for autistic people. Using data from 700 autistic adults, we examined the measurement properties of the autism spectrum quality of life form (ASQoL), a new measure of QoL designed specifically for autistic people. Our results indicate that the ASQoL shows a pronounced sex/gender bias, which causes it to underestimate QoL in autistic women. This bias needs to be eliminated before the ASQoL can be successfully used to measure QoL in the autistic population.

Journal ArticleDOI
TL;DR: The authors conducted an in-depth psychometric analysis of the TAS-20 in a large sample of 743 cognitively able autistic adults recruited from the Simons Foundation SPARK participant pool and 721 general population controls enrolled in a large international psychological study.
Abstract: Background Alexithymia, a personality trait characterized by difficulties interpreting emotional states, is commonly elevated in autistic adults, and a growing body of literature suggests that this trait underlies several cognitive and emotional differences previously attributed to autism. Although questionnaires such as the 20-item Toronto Alexithymia Scale (TAS-20) are frequently used to measure alexithymia in the autistic population, few studies have investigated the psychometric properties of these questionnaires in autistic adults, including whether differential item functioning (I-DIF) exists between autistic and general population adults. Methods This study is a revised version of a previous article that was retracted due to copyright concerns (Williams and Gotham in Mol Autism 12:1-40). We conducted an in-depth psychometric analysis of the TAS-20 in a large sample of 743 cognitively able autistic adults recruited from the Simons Foundation SPARK participant pool and 721 general population controls enrolled in a large international psychological study. The factor structure of the TAS-20 was examined using confirmatory factor analysis, and item response theory was used to generate a subset of the items that were strong indicators of a "general alexithymia" factor. Correlations between alexithymia and other clinical outcomes were used to assess the nomological validity of the new alexithymia score in the SPARK sample. Results The TAS-20 did not exhibit adequate model fit in either the autistic or general population samples. Empirically driven item reduction was undertaken, resulting in an 8-item general alexithymia factor score (GAFS-8, with "TAS" no longer referenced due to copyright) with sound psychometric properties and practically ignorable I-DIF between diagnostic groups. 
Correlational analyses indicated that GAFS-8 scores, as derived from the TAS-20, meaningfully predict autistic trait levels, repetitive behaviors, and depression symptoms, even after controlling for trait neuroticism. The GAFS-8 also presented no meaningful decrement in nomological validity over the full TAS-20 in autistic participants. Limitations Limitations of the current study include a sample of autistic adults that was majority female, later diagnosed, and well educated; clinical and control groups drawn from different studies with variable measures; only 16 of the TAS-20 items being administered to the non-autistic sample; and an inability to test several other important psychometric characteristics of the GAFS-8, including sensitivity to change and I-DIF across multiple administrations. Conclusions These results indicate the potential of the GAFS-8 to robustly measure alexithymia in both autistic and non-autistic adults. A free online score calculator has been created to facilitate the use of norm-referenced GAFS-8 latent trait scores in research applications (available at https://asdmeasures.shinyapps.io/alexithymia ).

Journal ArticleDOI
TL;DR: The newly developed ultra-brief (SSOSH-3) and revised (SSOSH-7) versions of the Self-Stigma of Seeking Help scale were highly correlated with the original SSOSH across samples and demonstrated significant correlations with help-seeking constructs, similar in magnitude to those of the original SSOSH.
Abstract: The current research developed ultra-brief (SSOSH-3) and revised (SSOSH-7) versions of the Self-Stigma of Seeking Help scale. Item response theory was used to examine the amount of information each item provided across the latent variable scale and test whether items functioned differently across women and men. In a sample of 857 community adults, results supported removal of three reverse-scored items to create the SSOSH-7. The three most informative items were retained to create the SSOSH-3. Differential item functioning testing supported the use of both versions across women and men. Results replicated in an undergraduate student sample (n = 661). In both samples, the SSOSH-3 (αs = .82-.87) and SSOSH-7 (αs = .87-.89) demonstrated evidence of internal consistency. The SSOSH-3 (rs ≥ .89) and SSOSH-7 (rs ≥ .97) were highly correlated with the original SSOSH across samples and demonstrated significant correlations with help-seeking constructs and in similar magnitude to the original SSOSH.

Journal ArticleDOI
TL;DR: In this article, construction-based figural matrices tasks are of particular interest when it comes to high-stakes testing, due to their high item difficulties and excellent psychometric properties.
Abstract: Due to their high item difficulties and excellent psychometric properties, construction-based figural matrices tasks are of particular interest when it comes to high-stakes testing. An im...

Journal ArticleDOI
TL;DR: This research sought to assess the psychometric properties of the French versions of the Body Checking Questionnaire and the Body Checking Cognitions Scale among community samples, and supported the criterion-related validity of ratings on both measures with measures of global self-esteem, physical appearance, social physique anxiety, fear of negative appearance evaluation, and disturbed eating attitudes and behaviors.
Abstract: This research sought to assess the psychometric properties of the French versions of the Body Checking Questionnaire and the Body Checking Cognitions Scale among community samples. A total sample of 922 adolescents and adults was involved in a series of two studies. The results from the first study supported factor validity and reliability of responses obtained on these two measures, and showed that both measures were best represented by a bifactor-exploratory structural equation modeling representation of the data. The results from the second study replicated these conclusions, while also supporting the measurement invariance of the bifactor-exploratory structural equation modeling solution and the equivalence of the correlations among the two measures (i.e., convergent validity) across samples. This second study also supported the criterion-related validity of ratings on both measures with measures of global self-esteem, physical appearance, social physique anxiety, fear of negative appearance evaluation, and disturbed eating attitudes and behaviors. Finally, the results of this last study also supported the measurement invariance and lack of differential item functioning of both measures in relation to sex, age, diagnosis of eating disorders, and body mass index.

Journal ArticleDOI
TL;DR: In this paper, the authors compared mean-mean linking, log-mean-mean linking, invariance alignment, Haberman linking, asymmetric and symmetric Haebara linking, different recalibration linking methods, anchored item parameters, and concurrent calibration.
Abstract: This article investigates the comparison of two groups based on the two-parameter logistic item response model. It is assumed that there is random differential item functioning in item difficulties and item discriminations. The group difference is estimated using separate calibration with subsequent linking, as well as concurrent calibration. The following linking methods are compared: mean-mean linking, log-mean-mean linking, invariance alignment, Haberman linking, asymmetric and symmetric Haebara linking, different recalibration linking methods, anchored item parameters, and concurrent calibration. It is analytically shown that log-mean-mean linking and mean-mean linking provide consistent estimates if random DIF effects have zero means. The performance of the linking methods was evaluated through a simulation study. It turned out that (log-)mean-mean and Haberman linking performed best, followed by symmetric Haebara linking and a newly proposed recalibration linking method. Interestingly, linking methods frequently found in applications (i.e., asymmetric Haebara linking, recalibration linking used in a variant in current large-scale assessment studies, anchored item parameters, concurrent calibration) perform worse in the presence of random differential item functioning. In line with the previous literature, differences between linking methods turned out to be negligible in the absence of random differential item functioning. The different linking methods were also applied in an empirical example that performed a linking of PISA 2006 to PISA 2009 for Austrian students. This application showed that estimated trends in the means and standard deviations depended on the chosen linking method and the employed item response model.
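The two simplest linking methods named in this abstract can be sketched in a few lines of Python. The sketch assumes a common textbook setup that the abstract does not spell out: both groups are calibrated separately with a standard normal ability distribution, the 2PL response function is logistic in a(θ - b), and group 1 is the reference, so that without DIF the group 2 estimates satisfy a2 = σ·a1 and b2 = (b1 - μ)/σ, where μ and σ are the mean and SD of group 2 on the reference scale. Function names and item parameters are hypothetical.

```python
import math
from statistics import mean


def mean_mean_linking(a1, b1, a2, b2):
    """Estimate group 2's mean and SD from arithmetic means of item parameters."""
    sigma = mean(a2) / mean(a1)
    mu = mean(b1) - sigma * mean(b2)
    return mu, sigma


def log_mean_mean_linking(a1, b1, a2, b2):
    """Same idea, but the SD is based on mean log discriminations,
    i.e. the ratio of geometric means."""
    log_sigma = mean([math.log(a) for a in a2]) - mean([math.log(a) for a in a1])
    sigma = math.exp(log_sigma)
    mu = mean(b1) - sigma * mean(b2)
    return mu, sigma


# Made-up reference-group item parameters and a DIF-free second group
# with true mean 0.3 and SD 1.2.
a_ref = [0.8, 1.0, 1.5]
b_ref = [-1.0, 0.0, 0.5]
true_mu, true_sigma = 0.3, 1.2
a_grp2 = [a * true_sigma for a in a_ref]
b_grp2 = [(b - true_mu) / true_sigma for b in b_ref]

mm = mean_mean_linking(a_ref, b_ref, a_grp2, b_grp2)
lmm = log_mean_mean_linking(a_ref, b_ref, a_grp2, b_grp2)
```

In this DIF-free example both functions recover the generating values exactly; the two estimators differ only once item-specific DIF effects perturb the discriminations.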

Journal ArticleDOI
TL;DR: In this article, the reliability and validity of the Norwegian version of the Patient-Reported Outcome Measurement System®—Profile 57 (PROMIS-57) questionnaire in a general population sample, n = 408, were explored.
Abstract: The aims of this cross-sectional study were to explore reliability and validity of the Norwegian version of the Patient-Reported Outcome Measurement System®—Profile 57 (PROMIS-57) questionnaire in a general population sample, n = 408, and to examine Item Response properties and factor structure. Reliability measures were obtained from factor analysis and item response theory (IRT) methods. Correlations between PROMIS-57 and RAND-36-item health survey (RAND36) were examined for concurrent and discriminant validity. Factor structure and IRT assumptions were examined with factor analysis methods. IRT Item and model fit and graphic plots were inspected, and differential item functioning (DIF) for language, age, gender, and education level were examined. PROMIS-57 demonstrated excellent reliability and satisfactory concurrent and discriminant validity. Factor structure of seven domains was supported. IRT assumptions were met for unidimensionality, local independence, monotonicity, and invariance with no DIF of consequence for language or age groups. Estimated common variance (ECV) per domain and confirmatory factor analysis (CFA) model fit supported unidimensionality for all seven domains. The GRM IRT Model demonstrates acceptable model fit. The psychometric properties and factor structure of Norwegian PROMIS-57 were satisfactory. Hence, the 57-item questionnaire along with PROMIS-29, and the corresponding 8 and 4 item short forms for physical function, anxiety, depression, fatigue, sleep disturbance, social participation ability and pain interference, are considered suitable for use in research and clinical care in Norwegian populations. Further studies on longitudinal reliability and sensitivity in patient populations and for Norwegian item calibration and/or reference scores are needed.

Journal ArticleDOI
TL;DR: The Persian YFAS-C is a valid instrument that assists healthcare providers in assessing food addiction among Iranian adolescents and significant and moderate correlations were found between it and other psychometric scales assessing eating symptomatology and general psychopathology.
Abstract: To examine whether the child/adolescent version of the Yale Food Addiction Scale (YFAS-C) is valid to assess the Iranian adolescents who are overweight. After using an internationally standardized method to translate the YFAS-C into Persian, 1186 overweight/obese adolescents aged between 13 and 18 years participated in the present study [666 males; mean age = 15.5 (SD = 1.9) years; zBMI = 2.5 (1.0) kg/m2]. All the participants completed the Persian YFAS-C alongside Persian versions of the following scales: Eating Disorder Examination Questionnaire (EDEQ), Clinical Impairment Assessment (CIA), Binge Eating Scale (BES), Eating Attitudes Test (EAT-26), and Depression, Anxiety, Stress Scale (DASS-21). At the scale level, confirmatory factor analysis verified the single-factor structure of the Persian YFAS-C. Additionally, the Persian YFAS-C had promising properties regarding internal consistency (KR20 = 0.81), test–retest reliability (intraclass correlation coefficient = 0.83), separation reliability (person separation reliability = 0.77; item separation reliability = 0.98), and separation index (person separation index = 2.04; item separation index = 8.01). At the item level, all items had satisfactory properties in factor loadings, corrected item-total correlation, test–retest reliability, and infit and outfit mean square. Moreover, no substantial differential item functioning (DIF) was found concerning gender (male vs. female) or weight status (overweight vs. obesity). Significant and moderate correlations were found between the Persian YFAS-C and other psychometric scales assessing eating symptomatology and general psychopathology (r = 0.352 to 0.484). The Persian YFAS-C is a valid instrument that assists healthcare providers in assessing food addiction among Iranian adolescents. Level V, cross-sectional descriptive study.

Journal ArticleDOI
TL;DR: In this paper, a cross-sectional study was conducted with 327 randomly selected farmers and pastoralists in five districts in three regions in Ethiopia, where the structured questionnaire consisted of 48 items to evaluate knowledge (24), attitude (9), and prevention practices (15) related to zoonotic disease risks from livestock birth products.

Journal ArticleDOI
TL;DR: This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes, or use the same attributes in different manners, to solve problems. Likelihood ratio (LR) and Wald tests combined with a forward anchor item search algorithm produced better calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality.
Abstract: This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes or use the same attributes but in different manners (e.g., conjunctive, disjunctive, and compensatory) to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure was also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A set of real data were also analyzed to illustrate the use of these DIF detection procedures.

Journal ArticleDOI
TL;DR: The culture, comprehension, and translation bias (CCT) procedure is based on a between-subjects design comparing samples from two different cultures who complete a measure in either the same or a different language version. A simulation study and an empirical example with actual cross-cultural data show how performing multiple pairwise comparisons across (a) groups differing in culture but not in language, (b) groups differing in language but not in culture, and (c) groups differing in both culture and language makes it possible to pinpoint the source of item noninvariance with high specificity.
Abstract: Comparability of measurement across different cultural groups is an essential prerequisite for any cross-cultural assessment. However, cross-cultural measurement invariance is rarely achieved and detecting the source of noninvariance is often challenging. In particular, when different language versions of a measure are administered to different cultural groups, noninvariance on certain items may originate either from translation inconsistencies (translation bias) or from actual differences between cultural groups (culture bias). If, on the other hand, a measure is administered in a common language version (e.g., English), item noninvariance may also result from comprehension issues of nonnative speakers (comprehension bias). Here, we outline a procedure suitable for dissociating these sources of item noninvariance, termed the culture, comprehension, and translation bias (CCT) procedure. The CCT procedure is based on a between-subjects design comparing samples from two different cultures who complete a measure in either the same or a different language version. We demonstrate in a simulation study and illustrate in an empirical example with actual cross-cultural data how performing multiple pairwise comparisons across (a) groups differing in culture but not in language, (b) groups differing in language but not in culture, and (c) groups differing in both culture and language makes it possible to pinpoint the source of item noninvariance with high specificity. The CCT procedure thus provides a valuable tool for improving cross-cultural assessment through directing the process of item translation and cultural adaptation.

Journal ArticleDOI
TL;DR: In this article, the Experiences of Sex Work Stigma (ESWS) scale was developed to assess multiple domains of sex work stigma and can be used to examine the effects of intersectional stigma on HIV-related outcomes across settings.
Abstract: While HIV stigma has received significant attention, limited work has been conducted on the measurement of intersecting stigmas. We developed the Experiences of Sex Work Stigma (ESWS) scale in the Dominican Republic (DR) and Tanzania. We conducted in-depth interviews with 20 female sex workers (FSW) per country to identify scale domains followed by cognitive debriefing interviews to assess content validity. Items were administered in a survey to FSW in DR (n = 211) and Tanzania (n = 205). Factor analysis established four sex work stigma domains including: shame (internalized), dignity (resisted), silence (anticipated) and treatment (enacted). Reliability across domains ranged from 0.81 to 0.93. Using item response theory (IRT) we created context-specific domain scores accounting for differential item functioning between countries. ESWS domains were associated with internalized HIV stigma, depression, anxiety, sexual partner violence and social cohesion across contexts. The ESWS is the first reliable and valid scale to assess multiple domains of sex work stigma and can be used to examine the effects of this form of intersectional stigma on HIV-related outcomes across settings.

Journal ArticleDOI
TL;DR: The authors investigated the potential for a shared first-language (shared-L1) effect on second language (L2) listening test scores using differential item functioning (DIF) analyses.
Abstract: In this study we investigated the potential for a shared-first-language (shared-L1) effect on second language (L2) listening test scores using differential item functioning (DIF) analyses. We did t...

Journal ArticleDOI
TL;DR: A detailed psychometric analysis of the Biographical Inventory of Creative Behaviors (BICB), a 34-item yes/no checklist of common creative activities that has become one of the most popular self-report measures of everyday creative behaviors is presented in this article.

Journal ArticleDOI
05 Mar 2021-Medicine
TL;DR: In this article, the reliability and validity of the modified Barthel Index as a tool for evaluating activities of daily living in ischemic stroke patients were examined using Rasch analysis.

Journal ArticleDOI
Wei Chen, Yuxin Liang, Xingyu Yin, Xingrong Zhou, Rongfen Gao
TL;DR: The Fear of COVID-19 Scale (FCV-19S) is a new one-dimensional scale used to measure an individual's fear of COVID-19. This study translated the scale and examined its applicability in Chinese students using Rasch analysis.
Abstract: The Fear of COVID-19 Scale (FCV-19S) is a new one-dimensional scale used to measure an individual's fear of COVID-19. Given the seriousness of the COVID-19 situation in China when our study was taking place, our aim was to translate and examine the applicability of the FCV-19S in Chinese students. The sample used for validation comprised 2,445 Chinese students. The psychometric characteristics of the Chinese FCV-19S (FCV-19S-C) were tested using Rasch analysis. Principal component analysis (PCA) supported the unidimensional structure of the model. Both infit and outfit mean square (MNSQ) values (0.69-1.31) and point-measure correlations (0.82-0.86) indicated a good model fit. Person-item separation and reliability values indicated good reliability of the scale. The person-item map revealed an acceptable level of match between the persons and the items. Differential item functioning of the FCV-19S-C showed no differences with respect to age or gender. FCV-19S-C scores were significantly associated with anxiety, stress, depression, ego-resilience, and general health. The FCV-19S-C was shown to be effective in measuring Chinese students' fear of COVID-19.