
Showing papers on "Item response theory published in 2008"


Journal ArticleDOI
TL;DR: The Japanese versions of the K6 and K10 demonstrated screening performances essentially equivalent to those of the original English versions, and Stratum‐specific likelihood ratios (SSLRs) were strikingly similar between the Japanese and the original versions.
Abstract: Two new screening scales for psychological distress, the K6 and K10, have been developed using item response theory and shown to outperform existing screeners in English. We developed their Japanese versions using the standard back-translation method and included them in the World Mental Health Survey Japan (WMH-J), which is a psychiatric epidemiologic study conducted in seven communities across Japan with 2436 participants. The WMH-J used the WMH Survey Initiative version of the Composite International Diagnostic Interview (CIDI) to assess 30-day Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) disorders. Performance of the two screening scales in detecting DSM-IV mood and anxiety disorders, as assessed by the areas under receiver operating characteristic curves (AUCs), was excellent, with values as high as 0.94 (95% confidence interval = 0.88 to 0.99) for K6 and 0.94 (0.88 to 0.995) for K10. Stratum-specific likelihood ratios (SSLRs), which express screening test characteristics and can be used to produce individual-level predicted probabilities of being a case from screening scale scores and pretest probabilities in other samples, were strikingly similar between the Japanese and the original versions. The Japanese versions of the K6 and K10 thus demonstrated screening performances essentially equivalent to those of the original English versions.
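The SSLR machinery described above is ordinary Bayes updating on the odds scale. A minimal sketch, with illustrative SSLR values and pretest prevalence (not the paper's estimates), of how a score stratum and a pretest probability combine into an individual-level predicted probability:

```python
def posterior_probability(pretest_p, sslr):
    """Combine a pretest probability with a stratum-specific likelihood
    ratio via Bayes' theorem on the odds scale."""
    pretest_odds = pretest_p / (1.0 - pretest_p)
    posterior_odds = pretest_odds * sslr
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical SSLRs for three K6 score strata (illustrative values only).
sslr_by_stratum = {"0-4": 0.2, "5-12": 2.0, "13-24": 15.0}
pretest_p = 0.10  # assumed population prevalence, not from the paper

predicted = {s: posterior_probability(pretest_p, lr)
             for s, lr in sslr_by_stratum.items()}
```

An SSLR of 1 leaves the pretest probability unchanged; ratios above 1 raise it and ratios below 1 lower it, which is why the same published SSLRs can be reused in samples with different prevalences.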

970 citations


Journal ArticleDOI
TL;DR: A review of efforts to assess the invariance of measurement instruments across different respondent groups using confirmatory factor analysis (CFA) is provided for the years since the review by Vandenberg and Lance.

537 citations


Journal ArticleDOI
TL;DR: The methods of the PROMIS project are likely to substantially improve measures of physical function and to increase the efficiency of their administration using CAT.

394 citations


Journal ArticleDOI
TL;DR: This paper presents a general diagnostic model (GDM) that can be estimated with standard ML techniques and applies to polytomous response variables as well as to skills with two or more proficiency levels.
Abstract: Probabilistic models with one or more latent variables are designed to report on a corresponding number of skills or cognitive attributes. Multidimensional skill profiles offer additional information beyond what a single test score can provide, if the reported skills can be identified and distinguished reliably. Many recent approaches to skill profile models are limited to dichotomous data and have made use of computationally intensive estimation methods such as Markov chain Monte Carlo, since standard maximum likelihood (ML) estimation techniques were deemed infeasible. This paper presents a general diagnostic model (GDM) that can be estimated with standard ML techniques and applies to polytomous response variables as well as to skills with two or more proficiency levels. The paper uses one member of a larger class of diagnostic models, a compensatory diagnostic model for dichotomous and partial credit data. Many well-known models, such as univariate and multivariate versions of the Rasch model and the two-parameter logistic item response theory model, the generalized partial credit model, as well as a variety of skill profile models, are special cases of this GDM. In addition to an introduction to this model, the paper presents a parameter recovery study using simulated data and an application to real data from the field test for TOEFL Internet-based testing.
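The compensatory diagnostic model described above gives each item a logistic response function whose logit is an item intercept plus a Q-matrix-weighted sum of skill effects. A minimal sketch for the dichotomous case; all parameter values are illustrative, not estimates from the TOEFL field test:

```python
import math

def p_correct(beta_i, gammas, q_row, skills):
    """P(X=1) under a compensatory diagnostic model with dichotomous items.
    beta_i: item intercept; gammas: per-skill slopes for this item;
    q_row: 0/1 Q-matrix entries; skills: examinee skill levels (0/1/2/...)."""
    logit = beta_i + sum(q * g * a for q, g, a in zip(q_row, gammas, skills))
    return 1.0 / (1.0 + math.exp(-logit))

# One item requiring skills 1 and 3 out of three postulated skills.
beta, gammas, q = -0.5, [1.2, 0.8, 1.0], [1, 0, 1]
p_master = p_correct(beta, gammas, q, [1, 0, 1])     # masters both skills
p_nonmaster = p_correct(beta, gammas, q, [0, 0, 0])  # masters neither
```

Setting all slopes for an item equal recovers a Rasch-type special case, which is the sense in which the Rasch and 2PL models are nested in this GDM.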

380 citations


Journal ArticleDOI
TL;DR: In this paper, a definitional boundary of the space of diagnostic classification models (DCM) is developed, core DCM within this space are reviewed, and their defining features are compared and contrasted with those of other latent variable models.
Abstract: Diagnostic classification models (DCM) are frequently promoted by psychometricians as important modelling alternatives for analyzing response data in situations where multivariate classifications of respondents are made on the basis of multiple postulated latent skills. In this review paper, a definitional boundary of the space of DCM is developed, core DCM within this space are reviewed, and their defining features are compared and contrasted with those of other latent variable models. The models to which DCM are compared include unrestricted latent class models, multidimensional factor analysis models, and multidimensional item response theory models. Attention is paid to both statistical considerations of model structure, as well as substantive considerations of model use.

273 citations


Journal ArticleDOI
TL;DR: This model integrates an advanced item response theory measurement model with a structural hierarchical model for studying antecedents of ERS and improves on existing procedures by allowing different items to be differentially useful for measuring ERS.
Abstract: Extreme response style (ERS) is an important threat to the validity of survey-based marketing research. In this article, the authors present a new item response theory–based model for measuring ERS. This model contributes to the ERS literature in two ways. First, the method improves on existing procedures by allowing different items to be differentially useful for measuring ERS and by accommodating the possibility that an item's usefulness differs across groups (e.g., countries). Second, the model integrates an advanced item response theory measurement model with a structural hierarchical model for studying antecedents of ERS. The authors simultaneously estimate a person's ERS score and individual- and group-level (country) drivers of ERS. Through simulations, they show that the new method improves on traditional procedures. They further apply the model to a large data set consisting of 12,506 consumers from 26 countries on four continents. The findings show that the model extensions are necessary.
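A common first step in this literature is to recode rating-scale responses into extreme-response indicators, which then serve as the input to an IRT model for ERS. A sketch of that recoding for a 5-point scale (the recoding convention is an assumption here, not taken from this article):

```python
def ers_indicators(responses, scale_min=1, scale_max=5):
    """Flag each Likert response as extreme (1) if it uses an endpoint
    category, else non-extreme (0)."""
    return [1 if r in (scale_min, scale_max) else 0 for r in responses]

# One respondent's answers to six 5-point items.
flags = ers_indicators([1, 3, 5, 4, 2, 5])
```

The authors' model then lets the "usefulness" (discrimination) of each item for measuring ERS vary by item and by country, rather than weighting all indicators equally.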

260 citations


Journal ArticleDOI
TL;DR: It is shown in the present article that random item parameters make sense theoretically, and that in practice the random item approach is promising for handling several issues, such as the measurement of persons, the explanation of item difficulties, and troubleshooting with respect to DIF.
Abstract: It is common practice in IRT to consider items as fixed and persons as random. Both continuous and categorical person parameters are most often random variables, whereas for items only continuous parameters are used, and these are commonly of the fixed type, although exceptions occur. It is shown in the present article that random item parameters make sense theoretically, and that in practice the random item approach is promising for handling several issues, such as the measurement of persons, the explanation of item difficulties, and troubleshooting with respect to DIF. In correspondence with these issues, three parts are included. All three rely on the Rasch model as the simplest model to study, and the same data set is used for all applications. First, it is shown that the Rasch model with fixed persons and random items is an interesting measurement model, both in theory and in terms of its goodness of fit. Second, the linear logistic test model with an error term is introduced, so that the explanation of the item difficulties based on the item properties does not need to be perfect. Finally, two more models are presented: the random item profile model (RIP) and the random item mixture model (RIM). In the RIP, DIF is not considered a discrete phenomenon, and when a robust regression approach based on the RIP difficulties is applied, quite good DIF identification results are obtained. In the RIM, no prior anchor sets are defined; instead, a latent DIF class of items is used, so that posterior anchoring is realized (anchoring based on the item mixture). It is shown that both approaches are promising for the identification of DIF.
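The "random items" idea can be made concrete by simulation: under a Rasch model, draw item difficulties from a population distribution rather than treating them as fixed constants, then generate responses. A minimal sketch with illustrative distribution parameters:

```python
import math
import random

random.seed(1)

def rasch_p(theta, b):
    """Rasch probability of a correct response for ability theta
    and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

n_items = 20
# Random items: difficulties drawn from an assumed N(0, 1) population.
difficulties = [random.gauss(0.0, 1.0) for _ in range(n_items)]

theta = 0.5  # one fixed person, mirroring the fixed-persons/random-items case
responses = [1 if random.random() < rasch_p(theta, b) else 0
             for b in difficulties]
```

Estimating the mean and variance of the difficulty distribution, instead of 20 separate fixed difficulties, is what makes the random-item Rasch model parsimonious.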

205 citations


Journal ArticleDOI
TL;DR: In this paper, the relations among several alternative parameterizations of the binary factor analysis model and the 2-parameter item response theory model are discussed, and general formulas are provided.
Abstract: The relations among several alternative parameterizations of the binary factor analysis model and the 2-parameter item response theory model are discussed. It is pointed out that different parameterizations of factor analysis model parameters can be transformed into item response theory model parameters, and general formulas are provided. An illustrative data analysis is provided to demonstrate the transformations.
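One commonly cited instance of the transformations this paper discusses converts a binary-factor-analysis loading (lambda) and threshold (tau) into 2PL discrimination and difficulty. A sketch of that standard parameterization (the paper covers several; D = 1.702 is the usual rescaling from the normal-ogive to the logistic metric):

```python
import math

def fa_to_irt(loading, threshold, D=1.702):
    """Convert a binary FA loading and threshold into 2PL (a, b).
    a = D * lambda / sqrt(1 - lambda^2), b = tau / lambda, the standard
    normal-ogive-to-logistic transformation."""
    a = D * loading / math.sqrt(1.0 - loading ** 2)
    b = threshold / loading
    return a, b

# Illustrative FA estimates for one item.
a, b = fa_to_irt(loading=0.7, threshold=0.35)
```

The inverse mapping exists as well, so fitting either model and reporting in the other parameterization is a matter of algebra, which is the practical point of the paper.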

197 citations


Journal ArticleDOI
TL;DR: In this article, the development of a theoretically based, empirically tested instrument designed to measure the mathematical knowledge and skills of children from three to seven years of age, emphasising its submission to the Rasch model, was described.
Abstract: There are only a few instruments to assess mathematics knowledge and skills in children as young as three to four years of age, and these instruments are limited in scope of content. We describe the development of a theoretically based, empirically tested instrument designed to measure the mathematical knowledge and skills of children from three to seven years of age, emphasising its submission to the Rasch model. After we used the data to refine the instrument, the data fit the model well, with high reliability. These data also provided empirical support for the developmental progressions for most topics. We conclude with a description of the research’s contribution to theory and empirical research regarding young children’s development of specific mathematical competencies.

183 citations


Journal ArticleDOI
TL;DR: Instead of using small fixed-length tests, clinicians can create item banks with a large item pool, and a small set of the items most relevant for a given individual can be administered with no loss of information, yielding a dramatic reduction in administration time and patient and clinician burden.
Abstract: Objective: This study investigated the combination of item response theory and computerized adaptive testing (CAT) for psychiatric measurement as a means of reducing the burden of research and clinical assessments. Methods: Data were from 800 participants in outpatient treatment for a mood or anxiety disorder; they completed 616 items of the 626-item Mood and Anxiety Spectrum Scales (MASS) at two times. The first administration was used to design and evaluate a CAT version of the MASS by using post hoc simulation. The second confirmed the functioning of CAT in live testing. Results: Tests of competing models based on item response theory supported the scale’s bifactor structure, consisting of a primary dimension and four group factors (mood, panic-agoraphobia, obsessive-compulsive, and social phobia). Both simulated and live CAT showed a 95% average reduction (585 items) in items administered (24 and 30 items, respectively) compared with administration of the full MASS. The correlation between scores on the full MASS and the CAT version was .93. For the mood disorder subscale, differences in scores between two groups of depressed patients—one with bipolar disorder and one without—on the full scale and on the CAT showed effect sizes of .63 (p<.003) and 1.19 (p<.001) standard deviation units, respectively, indicating better discriminant validity for CAT. Conclusions: Instead of using small fixed-length tests, clinicians can create item banks with a large item pool, and a small set of the items most relevant for a given individual can be administered with no loss of information, yielding a dramatic reduction in administration time and patient and clinician burden. (Psychiatric Services 59:361–368, 2008)
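The dramatic item savings reported here come from adaptive item selection: at each step, administer the unused item with maximum Fisher information at the current ability estimate. A minimal sketch of that selection rule under a 2PL model, with a hypothetical four-item bank (not the MASS items):

```python
import math

def p2pl(theta, a, b):
    """2PL probability of endorsing/answering an item correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Illustrative (discrimination a, difficulty b) pairs.
item_bank = [(1.5, -1.0), (1.0, 0.0), (2.0, 0.5), (0.8, 1.5)]

def select_item(theta, used):
    """Index of the most informative unused item at theta."""
    candidates = [i for i in range(len(item_bank)) if i not in used]
    return max(candidates, key=lambda i: info(theta, *item_bank[i]))

first = select_item(0.5, used=set())  # picks the high-a item near theta
```

After each response the ability estimate is updated and the rule is applied again, which is why a well-targeted handful of items can match the precision of the full fixed-length scale.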

161 citations


Journal ArticleDOI
TL;DR: Results show that it is possible to discriminate between latent class models and factor models even if responses are categorical, and testing for class invariance of parameters is important in the context of measurement invariance and when using mixture models to approximate nonnormal distributions.
Abstract: Factor mixture models (FMM's) are latent variable models with categorical and continuous latent variables which can be used as a model-based approach to clustering. A previous paper covered the results of a simulation study showing that in the absence of model violations, it is usually possible to choose the correct model when fitting a series of models with different numbers of classes and factors within class. The response format in the first study was limited to normally distributed outcomes. The current paper has two main goals, firstly, to replicate parts of the first study with 5-point Likert scale and binary outcomes, and secondly, to address the issue of testing class invariance of thresholds and loadings. Testing for class invariance of parameters is important in the context of measurement invariance and when using mixture models to approximate non-normal distributions. Results show that it is possible to discriminate between latent class models and factor models even if responses are categorical. Comparing models with and without class-specific parameters can lead to incorrectly accepting parameter invariance if the compared models differ substantially with respect to the number of estimated parameters. The simulation study is complemented with an illustration of a factor mixture analysis of ten binary depression items obtained from a female subsample of the Virginia Twin Registry.

Journal ArticleDOI
TL;DR: This paper studies three models for cognitive diagnosis, each illustrated with an application to fraction subtraction data, and employs Markov chain Monte Carlo algorithms to fit the models and presents simulation results to examine the performance of these algorithms.
Abstract: This paper studies three models for cognitive diagnosis, each illustrated with an application to fraction subtraction data. The objective of each of these models is to classify examinees according to their mastery of skills assumed to be required for fraction subtraction. We consider the DINA model, the NIDA model, and a new model that extends the DINA model to allow for multiple strategies of problem solving. For each of these models the joint distribution of the indicators of skill mastery is modeled using a single continuous higher-order latent trait, to explain the dependence in the mastery of distinct skills. This approach stems from viewing the skills as the specific states of knowledge required for exam performance, and viewing these skills as arising from a broadly defined latent trait resembling the θ of item response models. We discuss several techniques for comparing models and assessing goodness of fit. We then implement these methods using the fraction subtraction data with the aim of selecting the best of the three models for this application. We employ Markov chain Monte Carlo algorithms to fit the models, and we present simulation results to examine the performance of these algorithms.
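The DINA model mentioned above is conjunctive: an examinee is "ready" for an item only when mastering every skill its Q-matrix row requires, and slip and guess parameters then give the response probability. A minimal sketch with illustrative parameter values:

```python
def dina_p(alpha, q_row, slip, guess):
    """P(X=1) under the DINA model.
    alpha: 0/1 skill-mastery profile; q_row: 0/1 required skills;
    slip: P(incorrect | all required skills mastered);
    guess: P(correct | some required skill not mastered)."""
    eta = all(a >= q for a, q in zip(alpha, q_row))  # conjunctive mastery
    return (1.0 - slip) if eta else guess

# Item requiring skills 1 and 2 of three.
p_master = dina_p(alpha=[1, 1, 0], q_row=[1, 1, 0], slip=0.1, guess=0.2)
p_lacking = dina_p(alpha=[1, 0, 0], q_row=[1, 1, 0], slip=0.1, guess=0.2)
```

The NIDA model relaxes this all-or-nothing structure by letting each required skill contribute its own slip/guess factor, and the paper's higher-order extension puts a continuous latent trait behind the skill-mastery indicators themselves.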

Journal ArticleDOI
TL;DR: This paper describes classical latent variable models such as factor analysis, item response theory, latent class models and structural equation models and their usefulness in medical research is demonstrated using real data.
Abstract: Latent variable models are commonly used in medical statistics, although often not referred to under this name. In this paper we describe classical latent variable models such as factor analysis, item response theory, latent class models and structural equation models. Their usefulness in medical research is demonstrated using real data. Examples include measurement of forced expiratory flow, measurement of physical disability, diagnosis of myocardial infarction and modelling the determinants of clients' satisfaction with counsellors' interviews.

Journal ArticleDOI
TL;DR: The authors present findings from the analysis of repeated measures of internalizing symptomatology that were pooled from three existing developmental studies and describe and demonstrate each step in the analysis and conclude with a discussion of potential limitations and directions for future research.
Abstract: There are a number of significant challenges researchers encounter when studying development over an extended period of time, including subject attrition, the changing of measurement structures across groups and developmental periods, and the need to invest substantial time and money. Integrative data analysis is an emerging set of methodologies that allows researchers to overcome many of the challenges of single-sample designs through the pooling of data drawn from multiple existing developmental studies. This approach is characterized by a host of advantages, but this also introduces several new complexities that must be addressed prior to broad adoption by developmental researchers. In this article, the authors focus on methods for fitting measurement models and creating scale scores using data drawn from multiple longitudinal studies. The authors present findings from the analysis of repeated measures of internalizing symptomatology that were pooled from three existing developmental studies. The authors describe and demonstrate each step in the analysis and conclude with a discussion of potential limitations and directions for future research.

Journal ArticleDOI
TL;DR: In this article, the dimensionality of the German version of Rosenberg's Self-Esteem scale (RSES) was analyzed in a nationally representative population sample of 4,988 subjects (46.4% males; aged 14-92 years).
Abstract: This study analyzed the dimensionality of the German version of Rosenberg’s Self-Esteem scale (RSES) in a nationally representative population sample of 4,988 subjects (46.4% males; aged 14–92 years). Using confirmatory factor analysis, one- and two-dimensional models were tested. Results suggest that the RSES is a two-dimensional scale comprising the highly correlated components positive and negative self-evaluation, which constitute a unitary construct of global self-esteem at the second-order level. In order to obtain a more conclusive solution, an item response theory (IRT) analysis (partial credit model) was conducted. Results lend support to a one-dimensional view of the RSES. Furthermore, psychometric properties and norm values based on the representative sample are reported. Analyses revealed extremely high response probabilities for all items, as a consequence of which self-esteem cannot be differentiated at the upper end of the range.
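The partial credit model used in the IRT analysis assigns category probabilities from cumulative step parameters. A minimal sketch with illustrative thresholds (not the RSES estimates); the "extremely high response probabilities" finding corresponds to most of the probability mass sitting in the top categories for typical theta values:

```python
import math

def pcm_probs(theta, deltas):
    """Partial credit model: probabilities of categories 0..m for an item
    with step parameters deltas (length m)."""
    cum = [0.0]
    for d in deltas:
        cum.append(cum[-1] + (theta - d))  # cumulative sum of (theta - delta_j)
    expcum = [math.exp(c) for c in cum]
    total = sum(expcum)
    return [e / total for e in expcum]

# A 4-category item with illustrative step parameters.
probs = pcm_probs(theta=1.0, deltas=[-0.5, 0.4, 1.2])
```

When all step parameters sit well below the bulk of the theta distribution, upper categories dominate for nearly everyone, and the scale stops discriminating at the high end, as the study reports for self-esteem.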

Journal ArticleDOI
TL;DR: In this paper, the utility of S-X2 to polytomous IRT models, including the generalized partial credit model, partial credit models, and rating scale model, was investigated in terms of empirical Type I error rates and power.
Abstract: Orlando and Thissen's S-X2 item fit index has performed better than traditional item fit statistics such as Yen's Q1 and McKinley and Mills' G2 for dichotomous item response theory (IRT) models. This study extends the utility of S-X2 to polytomous IRT models, including the generalized partial credit model, partial credit model, and rating scale model. The performance of the generalized S-X2 in assessing item model fit was studied in terms of empirical Type I error rates and power and compared to G2. The results suggest that the generalized S-X2 is promising for polytomous items in educational and psychological testing programs.
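The expected proportions that S-X2 compares against observed ones come from the summed-score distribution, computed with the Lord-Wingersky recursion. A dichotomous-case sketch with illustrative 2PL parameters and a crude quadrature prior (the generalized statistic in this paper extends the same idea to polytomous items):

```python
import math

def p2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def score_dist(ps):
    """Lord-Wingersky recursion: distribution of the summed score
    given per-item correct probabilities ps."""
    dist = [1.0]
    for p in ps:
        new = [0.0] * (len(dist) + 1)
        for s, w in enumerate(dist):
            new[s] += w * (1.0 - p)
            new[s + 1] += w * p
        dist = new
    return dist

# Illustrative item bank and a crude normal quadrature over theta.
items = [(1.2, -0.5), (0.9, 0.0), (1.5, 0.8)]
grid = [-2.0 + 0.5 * k for k in range(9)]
wts = [math.exp(-0.5 * t * t) for t in grid]
total = sum(wts)
wts = [w / total for w in wts]

def expected_correct(i, s):
    """Model-expected P(item i correct | summed score s): the quantity
    S-X2 compares with the observed proportion in the score-s group."""
    num = den = 0.0
    for t, w in zip(grid, wts):
        ps = [p2pl(t, a, b) for a, b in items]
        rest = ps[:i] + ps[i + 1:]          # score distribution without item i
        num += w * ps[i] * score_dist(rest)[s - 1]
        den += w * score_dist(ps)[s]
    return num / den

e_item0_score2 = expected_correct(0, 2)
```

S-X2 itself is then a Pearson chi-square over score groups, contrasting these expectations with the observed proportions correct.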

Journal ArticleDOI
TL;DR: Several topics relevant to the measurement of cognitive abilities across groups from diverse ancestral origins are considered, including fairness and bias, equivalence, diagnostic validity, item response theory, and differential item functioning.
Abstract: The measurement of cognitive abilities across diverse cultural, racial, and ethnic groups has a contentious history, with broad political, legal, economic, and ethical repercussions. Advances in psychometric methods and converging scientific ideas about genetic variation afford new tools and theoretical contexts to move beyond the reflective analysis of between-group test score discrepancies. Neuropsychology is poised to benefit from these advances to cultivate a richer understanding of the factors that underlie cognitive test score disparities. To this end, the present article considers several topics relevant to the measurement of cognitive abilities across groups from diverse ancestral origins, including fairness and bias, equivalence, diagnostic validity, item response theory, and differential item functioning.

Journal ArticleDOI
TL;DR: Cocalibration allows direct comparison of cognitive functioning in studies using any of these four tests, and standard scoring appears to be a poor choice for analysis of longitudinal cognitive testing data.

Journal ArticleDOI
TL;DR: This research used a hierarchical modeling framework with separate first-level models for the responses and response times and a second-level model for the distribution of the ability and speed parameters in the population of test takers to retrofit an empirical prior distribution for the ability parameter on each occurrence of a new response time.
Abstract: Response times on items can be used to improve item selection in adaptive testing provided that a probabilistic model for their distribution is available. In this research, the author used a hierarchical modeling framework with separate first-level models for the responses and response times and a second-level model for the distribution of the ability and speed parameters in the population of test takers. The framework allows the author to retrofit an empirical prior distribution for the ability parameter on each occurrence of a new response time. In an example with an adaptive version of the Law School Admission Test (LSAT), the author shows how this additional update of the posterior distribution of the ability leads to a substantial improvement of the ability estimator. Two ways of applying the procedure in real-world adaptive testing are discussed.
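A standard first-level choice for response times in this literature is a lognormal model, log T = beta_item - tau_person + noise; because person speed tau and ability theta correlate at the second level, an observed response time carries information about ability. A minimal sketch of that update, with all parameter values illustrative (not LSAT estimates):

```python
import math

def loglik_tau(tau, log_t, beta, alpha):
    """Lognormal response-time log-likelihood (up to a constant) for
    person speed tau: log T ~ Normal(beta - tau, 1 / alpha**2)."""
    resid = log_t - (beta - tau)
    return math.log(alpha) - 0.5 * (alpha * resid) ** 2

# Illustrative item time intensity (beta), time discrimination (alpha),
# and one observed log response time.
beta, alpha = 4.0, 1.5
log_t = 3.2
tau_hat = beta - log_t  # likelihood mode for the person's speed

# Second level: assumed population correlation rho between speed and
# ability shifts the ability prior mean toward faster/slower test takers.
rho, mu_theta, sd_theta, mu_tau, sd_tau = 0.4, 0.0, 1.0, 0.0, 0.8
prior_mean_theta = mu_theta + rho * (sd_theta / sd_tau) * (tau_hat - mu_tau)
```

Recomputing this empirical prior after every item, as the paper does, is what sharpens the ability estimator beyond what the responses alone provide.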

Journal ArticleDOI
TL;DR: Exploring the benefits of examiner training and employing “true” scores generated using Item Response Theory analyses prior to making pass/fail decisions are recommended.
Abstract: Physician-patient communication is a clinical skill that can be learned and has a positive impact on patient satisfaction and health outcomes. A concerted effort at all medical schools is now directed at teaching and evaluating this core skill. Student communication skills are often assessed by an Objective Structured Clinical Examination (OSCE). However, it is unknown what sources of error variance are introduced into examinee communication scores by various OSCE components. This study primarily examined the effect different examiners had on the evaluation of students' communication skills assessed at the end of a family medicine clerkship rotation. The communication performance of clinical clerks from the Classes of 2005 and 2006 was assessed using six OSCE stations. Performance was rated at each station using the 28-item Calgary-Cambridge guide. Item Response Theory analysis using a Multifaceted Rasch model was used to partition the various sources of error variance and generate a "true" communication score where the effects of examiner, case, and items are removed. Variance and reliability of scores were as follows: communication scores (.20 and .87), examiner stringency/leniency (.86 and .91), case (.03 and .96), and item (.86 and .99), respectively. All facet scores were reliable (.87-.99). Examiner variance (.86) was more than four times the examinee variance (.20). About 11% of the clerks' outcome status shifted using "true" rather than observed/raw scores. There was large variability in examinee scores due to variation in examiner stringency/leniency behaviors that may impact pass-fail decisions. Exploring the benefits of examiner training and employing "true" scores generated using Item Response Theory analyses prior to making pass/fail decisions are recommended.
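The Multifaceted Rasch decomposition behind the "true" scores can be sketched for a simplified dichotomous rating: the log-odds of a positive rating is examinee ability minus item difficulty minus examiner severity, so examiner stringency can be estimated and removed. The study's actual guide uses polytomous items; this 0/1 version and all parameter values are illustrative:

```python
import math

def facets_p(theta, item_d, examiner_sev):
    """Many-facet Rasch (dichotomous sketch): P(positive rating) with
    examiner severity entering the logit alongside ability and difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - item_d - examiner_sev)))

# Same examinee and item, rated by a lenient vs a severe examiner.
lenient = facets_p(theta=0.5, item_d=0.0, examiner_sev=-0.8)
severe = facets_p(theta=0.5, item_d=0.0, examiner_sev=0.8)
```

Because severity is a separate parameter, two examinees rated by examiners of different stringency can still be placed on a common ability scale, which is what shifted about 11% of pass/fail outcomes here.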

Journal ArticleDOI
TL;DR: The supplemented EM (SEM) algorithm is applied to address two goodness-of-fit testing problems in psychometrics and provides a convenient computational procedure that leads to an asymptotically chi-squared goodness- of-fit statistic for the 'two-stage EM' procedure of fitting covariance structure models in the presence of missing data.
Abstract: The supplemented EM (SEM) algorithm is applied to address two goodness-of-fit testing problems in psychometrics. The first problem involves computing the information matrix for item parameters in item response theory models. This matrix is important for limited-information goodness-of-fit testing and it is also used to compute standard errors for the item parameter estimates. For the second problem, it is shown that the SEM algorithm provides a convenient computational procedure that leads to an asymptotically chi-squared goodness-of-fit statistic for the ‘two-stage EM’ procedure of fitting covariance structure models in the presence of missing data. Both simulated and real data are used to illustrate the proposed procedures.

Journal ArticleDOI
TL;DR: Combined information from person-fit indexes and from observation, interviews, and self-concept theory showed that similar score profiles may have different interpretations; the authors recommend investigating the scalability of score patterns when using self-report inventories to help researchers interpret respondents' behavior correctly.
Abstract: We illustrate the usefulness of person-fit methodology for personality assessment. For this purpose, we use person-fit methods from item response theory. First, we give a nontechnical introduction to existing person-fit statistics. Second, we analyze data from Harter's (1985)Self-Perception Profile for Children in a sample of children ranging from 8 to 12 years of age (N = 611) and argue that for some children, the scale scores should be interpreted with care and caution. Combined information from person-fit indexes and from observation, interviews, and self-concept theory showed that similar score profiles may have a different interpretation. For some children in the sample, item scores did not adequately reflect their trait level. Based on teacher interviews, this was found to be due most likely to a less developed self-concept and/or problems understanding the meaning of the questions. We recommend investigating the scalability of score patterns when using self-report inventories to help the researcher interpret respondents' behavior correctly.
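One widely used person-fit statistic of the kind introduced nontechnically above is the standardized log-likelihood lz: the log-likelihood of a response pattern, standardized by its model-implied mean and variance, with large negative values flagging aberrant patterns. A sketch with illustrative item probabilities:

```python
import math

def lz(responses, probs):
    """Standardized log-likelihood person-fit statistic for a 0/1
    response pattern given model probabilities of a positive response."""
    l0 = sum(u * math.log(p) + (1 - u) * math.log(1 - p)
             for u, p in zip(responses, probs))
    mean = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    var = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - mean) / math.sqrt(var)

probs = [0.9, 0.8, 0.7, 0.4, 0.2]   # model P(endorse) per item
typical = [1, 1, 1, 0, 0]           # pattern consistent with the model
aberrant = [0, 0, 0, 1, 1]          # misses "easy" items, hits "hard" ones
```

A child who misunderstands the questions can produce exactly the aberrant pattern above while ending up with the same total score as a well-fitting child, which is the point of the article: the scale score alone does not distinguish the two.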

Journal ArticleDOI
TL;DR: In this paper, a number of data imputation methods have been developed outside of the Item Response Theory (IRT) framework and have been shown to be effective tools for dealing with missing data.
Abstract: Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same time, a number of data imputation methods have been developed outside of the IRT framework and been shown to be effective tools for dealing with missing data. The current study takes several of these methods that have been found to be useful in other contexts and investigates their performance with IRT data that contain missing values. Through a simulation study, it is shown that these methods exhibit varying degrees of effectiveness in terms of imputing data that in turn produce accurate sample estimates of item difficulty and discrimination parameters. Psychometricians and other measurement professionals are familiar with the phenomenon of missing item responses for both cognitive and affective assessments. For example, examinees may leave one or more items unanswered either inadvertently or because they do not know the answer and are afraid to guess. Respondents to a questionnaire might feel inhibited in answering items dealing with a sensitive topic, leading to missing data. Much research has been conducted regarding the imputation of such missing responses.
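As a concrete baseline of the kind such comparisons typically include, here is a person-mean imputation sketch for binary item responses: replace a missing entry with the rounded mean of the respondent's answered items. This is a deliberately simple method chosen for illustration; the study evaluates more sophisticated alternatives:

```python
def person_mean_impute(row):
    """Impute missing 0/1 item responses (None) with the rounded mean
    of the respondent's observed responses."""
    observed = [u for u in row if u is not None]
    fill = round(sum(observed) / len(observed)) if observed else 0
    return [fill if u is None else u for u in row]

# A respondent who answered 3 of 5 items.
completed = person_mean_impute([1, None, 1, 0, None])
```

Any imputation of this sort distorts the response variance to some degree, which is why the study measures each method by the accuracy of the resulting item difficulty and discrimination estimates rather than by the imputed values themselves.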

Journal ArticleDOI
TL;DR: This paper applied mixture item response theory (IRT) models to personality tests and found that a three-class mixture version of the nominal response model was the best fitting model for the Extroversion and Neuroticism scales of the Amsterdam Biographical Questionnaire.
Abstract: Mixture item response theory (IRT) models aid the interpretation of response behavior on personality tests and may provide possibilities for improving prediction. Heterogeneity in the population is modeled by identifying homogeneous subgroups that conform to different measurement models. In this study, mixture IRT models were applied to the Extroversion and Neuroticism scales of the Amsterdam Biographical Questionnaire, and a three-class mixture version of the nominal response model was identified as the best fitting model. The latent classes differed with respect to social desirability and ethnic background. Within latent classes, response tendencies demonstrated a differential use of the "?" category. An important issue is whether applying mixture IRT models results in a better prediction of relevant external criteria compared to a one-class model. For the Neuroticism scale the prediction improved, but not for the Extroversion scale. The results demonstrate the possible advantage of applying mixture IRT models.

Journal ArticleDOI
TL;DR: The 30-item Fugl-Meyer assessment shows a longitudinally stable item difficulty order and is valid for measuring volitional arm motor ability over time and had no practical consequences on the longitudinal measurement of person ability.

Journal ArticleDOI
TL;DR: In this article, the authors analyse the psychometric properties of the PSP-scale by means of the Rasch model, with a focus on the operating characteristics of the items, and show that the scale adequately meets measurement criteria of invariance and proper categorisation of items.
Abstract: The PsychoSomatic Problems (PSP) scale is built upon eight items intended to tap information about psychosomatic problems among schoolchildren and adolescents in general populations. The purpose of the study is to analyse the psychometric properties of the PSP scale by means of the Rasch model, with a focus on the operating characteristics of the items. Cross-sectional adolescent data collected in Sweden at six points in time between 1988 and 2005 are used for the analysis. In all, more than 15,000 students aged 15–16 are included in the analysis. Data were examined with respect to invariance across the latent trait, Differential Item Functioning (DIF), item categorisation and unidimensionality. The results show that the PSP scale adequately meets measurement criteria of invariance and proper categorisation of the items. The targeting is also good and the reliability is high. Since the scale works invariantly across years of investigation, it is appropriate for recurrent monitoring of psychosomatic health complaints in general populations of adolescents. Taking DIF into account through principles of equating provides a scale that shows no statistically significant signs of gender DIF, enabling invariant comparisons between boys and girls as well.


Journal ArticleDOI
TL;DR: The results from simulation studies as well as actual data suggest that IRT-based models with continuous latent traits can be developed and that, compared with the unidimensional IRT model, the proposed models better describe the actual data.
Abstract: As item response models gain increased popularity in large-scale educational and measurement testing situations, many studies have been conducted on the development and applications of unidimensional and multidimensional models. Recently, attention has been paid to IRT-based models with an overall ability dimension underlying several ability dimensions specific for individual test items, where the focus is mainly on models with dichotomous latent traits. The purpose of this study is to propose such models with continuous latent traits under the Bayesian framework. The proposed models are further compared with the conventional IRT models using Bayesian model choice techniques. The results from simulation studies as well as actual data suggest that (a) such models can be developed; (b) compared with the unidimensional IRT model, the proposed models better describe the actual data; and (c) the use of the proposed IRT models and the multiunidimensional model should be based on different beliefs about the underlying ability structure.

Journal ArticleDOI
TL;DR: In this paper, a combination of two item response theory (IRT) models is used for the observed response data and one for the missing data indicator, which is modeled using a sequential model with linear restrictions on the item parameters.
Abstract: In tests with time limits, items at the end are often not reached. Usually, the pattern of missing responses depends on the ability level of the respondents; therefore, missing data are not ignorable in statistical inference. This study models data using a combination of two item response theory (IRT) models: one for the observed response data and one for the missing data indicator. The missing data indicator is modeled using a sequential model with linear restrictions on the item parameters. The models are connected by the assumption that the respondents' latent proficiency parameters have a joint multivariate normal distribution. Model parameters are estimated by maximum marginal likelihood. Simulations show that treating missing data as ignorable can lead to considerable bias in parameter estimates. Including an IRT model for the missing data indicator removes this bias. The method is illustrated with data from an intelligence test with a time limit.

Journal ArticleDOI
TL;DR: This study uses Item Response Theory (IRT) methods to evaluate the range of the latent trait assessed with a normal personality measure and a measure of psychopathy as one example of an abnormal personality construct, and finds that the measures overlapped substantially in terms of the regions of the latent trait for which they provide information.
Abstract: Correlational and factor-analytic methods indicate that abnormal and normal personality constructs may be tapping the same underlying latent trait. However, they do not systematically demonstrate that measures of abnormal personality capture more extreme ranges of the latent trait than measures of normal range personality. Item Response Theory (IRT) methods, in contrast, do provide this information. In the present study, we use IRT methods to evaluate the range of the latent trait assessed with a normal personality measure and a measure of psychopathy as one example of an abnormal personality construct. Contrary to the expectation that the measure of psychopathy would be more extreme than the measure of normal personality traits, the measures overlapped substantially in terms of the regions of the latent trait for which they provide information. Moreover, both types of inventories were limited in terms of measurement bandwidth, such that they did not provide information across the entire latent trait continuum. Implications and future directions are discussed.