
Showing papers in "Psychometrika in 2010"


Journal ArticleDOI
TL;DR: In this paper, the authors use the implicit function theorem to develop an improved scaling correction leading to a new scaled difference test statistic that avoids negative chi-square values.
Abstract: A scaled difference test statistic \(\tilde{T}_{d}\) that can be computed from standard software of structural equation models (SEM) by hand calculations was proposed in Satorra and Bentler (Psychometrika 66:507–514, 2001). The statistic \(\tilde{T}_{d}\) is asymptotically equivalent to the scaled difference test statistic \(\bar{T}_{d}\) introduced in Satorra (Innovations in Multivariate Statistical Analysis: A Festschrift for Heinz Neudecker, pp. 233–247, 2000), which requires more involved computations beyond the standard output of SEM software. The test statistic \(\tilde{T}_{d}\) has been widely used in practice, but in some applications it is negative due to negativity of its associated scaling correction. Using the implicit function theorem, this note develops an improved scaling correction leading to a new scaled difference statistic \(\bar{T}_{d}\) that avoids negative chi-square values.

1,281 citations
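
For context, the hand calculation the abstract refers to combines each model's uncorrected chi-square, degrees of freedom, and scaling correction. Below is a minimal sketch of that 2001 computation with hypothetical numbers; the note's improved correction is not reproduced here.

```python
# Sketch of the Satorra-Bentler (2001) scaled difference test, computable "by
# hand" from standard SEM output. All numeric inputs are hypothetical.

def scaled_difference_2001(T0, d0, c0, T1, d1, c1):
    """T0, T1: uncorrected chi-squares of the nested (H0) and comparison (H1)
    models; d0, d1: their degrees of freedom; c0, c1: their scaling corrections."""
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)  # scaling correction for the difference
    return (T0 - T1) / cd, d0 - d1, cd

Td, df, cd = scaled_difference_2001(T0=95.2, d0=24, c0=1.30, T1=61.5, d1=20, c1=1.45)
# Nothing constrains cd to be positive, so Td can come out negative --
# the defect the improved correction removes.
print(Td, df, cd)
```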


Journal ArticleDOI
TL;DR: It is shown that when the dimensionality is high, MH-RM has advantages over existing methods such as the numerical quadrature-based EM algorithm.
Abstract: A Metropolis–Hastings Robbins–Monro (MH-RM) algorithm for high-dimensional maximum marginal likelihood exploratory item factor analysis is proposed. The sequence of estimates from the MH-RM algorithm converges with probability one to the maximum likelihood solution. Details on the computer implementation of this algorithm are provided. The accuracy of the proposed algorithm is demonstrated with simulations. As an illustration, the proposed algorithm is applied to explore the factor structure underlying a new quality of life scale for children. It is shown that when the dimensionality is high, MH-RM has advantages over existing methods such as the numerical quadrature-based EM algorithm. Extensions of the algorithm to other modeling frameworks are discussed.

241 citations
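
The Robbins–Monro half of MH-RM is a stochastic approximation recursion. The toy code below is not the authors' algorithm; it only illustrates the core idea of a decreasing-gain update driven by one noisy draw per iteration, with an i.i.d. draw standing in for the Metropolis–Hastings imputation step.

```python
import numpy as np

# Toy Robbins-Monro recursion: solve E[X] - theta = 0 from one noisy draw per
# iteration. In MH-RM the draw would come from a Metropolis-Hastings sampler of
# the missing data, and the update would move the item parameters instead.
rng = np.random.default_rng(0)
theta = 0.0
for k in range(1, 5001):
    gain = 1.0 / k               # gains sum to infinity; squared gains do not
    x = rng.normal(3.0, 1.0)     # stand-in for an MH draw
    theta += gain * (x - theta)  # stochastic approximation update
print(theta)                     # close to the root, 3.0
```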


Journal ArticleDOI
TL;DR: A two-tier item factor analysis model is developed that reduces the dimensionality of the latent variable space and consequently yields significant computational savings; an EM algorithm for full-information maximum marginal likelihood estimation is also developed.
Abstract: Motivated by Gibbons et al.’s (Appl. Psychol. Meas. 31:4–19, 2007) full-information maximum marginal likelihood item bifactor analysis for polytomous data, and Rijmen, Vansteelandt, and De Boeck’s (Psychometrika 73:167–182, 2008) work on constructing computationally efficient estimation algorithms for latent variable models, a two-tier item factor analysis model is developed in this research. The modeling framework subsumes standard multidimensional IRT models, bifactor IRT models, and testlet response theory models as special cases. Features of the model lead to a reduction in the dimensionality of the latent variable space, and consequently significant computational savings. An EM algorithm for full-information maximum marginal likelihood estimation is developed. Simulations and real data demonstrations confirm the accuracy and efficiency of the proposed methods. Three real data sets from a large-scale educational assessment, a longitudinal public health survey, and a scale development study measuring patient reported quality of life outcomes are analyzed as illustrations of the model’s broad range of applicability.

226 citations
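
The dimension reduction can be written schematically. Assuming each item loads on the G primary factors plus at most one of S mutually independent specific factors (our notation, a sketch rather than the paper's exact formulation), the marginal likelihood factors as

\[
P(\mathbf{y}) = \int_{\mathbb{R}^{G}} \left[ \prod_{s=1}^{S} \int_{\mathbb{R}} \prod_{j \in \mathcal{J}_{s}} P_{j}(y_{j} \mid \boldsymbol{\eta}, \xi_{s})\, \phi(\xi_{s})\, d\xi_{s} \right] \phi_{G}(\boldsymbol{\eta})\, d\boldsymbol{\eta},
\]

so a (G + S)-dimensional quadrature problem collapses to one G-dimensional integral plus S one-dimensional ones.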


Journal ArticleDOI
TL;DR: A modification of Hawkins’ (1981) normal-theory test for complete data is proposed to improve its performance, and its application is extended to testing homoscedasticity and MCAR when data are multivariate normal and incomplete.
Abstract: Test of homogeneity of covariances (or homoscedasticity) among several groups has many applications in statistical analysis. In the context of incomplete data analysis, tests of homoscedasticity among groups of cases with identical missing data patterns have been proposed to test whether data are missing completely at random (MCAR). These tests of MCAR require large sample sizes n and/or large group sample sizes n(i), and they usually fail when applied to non-normal data. Hawkins (1981) proposed a test of multivariate normality and homoscedasticity that is an exact test for complete data when n(i) are small. This paper proposes a modification of this test for complete data to improve its performance, and extends its application to test of homoscedasticity and MCAR when data are multivariate normal and incomplete. Moreover, it is shown that the statistic used in the Hawkins test in conjunction with a nonparametric k-sample test can be used to obtain a nonparametric test of homoscedasticity that works well for both normal and non-normal data. It is explained how a combination of the proposed normal-theory Hawkins test and the nonparametric test can be employed to test for homoscedasticity, MCAR, and multivariate normality. Simulation studies show that the newly proposed tests generally outperform their existing competitors in terms of Type I error rejection rates. Also, a power study of the proposed tests indicates good power. The proposed methods use appropriate missing data imputations to impute missing data. Methods of multiple imputation are described and one of the methods is employed to confirm the result of our single imputation methods. Examples are provided where multiple imputation enables one to identify a group or groups whose covariance matrices differ from the majority of other groups.

133 citations


Journal ArticleDOI
TL;DR: In this article, a multidimensional item response theory (MIRT) model is fitted using a stabilized Newton-Raphson algorithm, and a new statistical approach is proposed to assess when subscores using the MIRT model have any added value over the total score or the subscores based on classical test theory.
Abstract: Recently, there has been increasing interest in reporting subscores. This paper examines reporting of subscores using multidimensional item response theory (MIRT) models (e.g., Reckase in Appl. Psychol. Meas. 21:25–36, 1997; C.R. Rao and S. Sinharay (Eds), Handbook of Statistics, vol. 26, pp. 607–642, North-Holland, Amsterdam, 2007; Beguin & Glas in Psychometrika, 66:471–488, 2001). A MIRT model is fitted using a stabilized Newton–Raphson algorithm (Haberman in The Analysis of Frequency Data, University of Chicago Press, Chicago, 1974; Sociol. Methodol. 18:193–211, 1988) with adaptive Gauss–Hermite quadrature (Haberman, von Davier, & Lee in ETS Research Rep. No. RR-08-45, ETS, Princeton, 2008). A new statistical approach is proposed to assess when subscores using the MIRT model have any added value over (i) the total score or (ii) subscores based on classical test theory (Haberman in J. Educ. Behav. Stat. 33:204–229, 2008; Haberman, Sinharay, & Puhan in Br. J. Math. Stat. Psychol. 62:79–95, 2008). The MIRT-based methods are applied to several operational data sets. The results show that the subscores based on MIRT are slightly more accurate than subscore estimates derived by classical test theory.

92 citations


Journal ArticleDOI
TL;DR: In this paper, three plausible assumptions of conditional independence in a hierarchical model for responses and response times on test items are identified, and a Lagrange multiplier test of the null hypothesis against a parametric alternative is derived.
Abstract: Three plausible assumptions of conditional independence in a hierarchical model for responses and response times on test items are identified. For each of the assumptions, a Lagrange multiplier test of the null hypothesis of conditional independence against a parametric alternative is derived. The tests have closed-form statistics that are easy to calculate from the standard estimates of the person parameters in the model. In addition, simple closed-form estimators of the parameters under the alternatives of conditional dependence are presented, which can be used to explore model modification. The tests were applied to a data set from a large-scale computerized exam and showed excellent power to detect even minor violations of conditional independence.

83 citations
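
A common hierarchical framework of this kind (e.g., van der Linden, 2007) pairs an IRT model for the responses with a lognormal model for the response times. As a sketch of the response-time level in that notation,

\[
f(t_{ij}) = \frac{\alpha_{j}}{t_{ij}\sqrt{2\pi}} \exp\!\left( -\tfrac{1}{2} \left[ \alpha_{j} \left( \ln t_{ij} - (\beta_{j} - \tau_{i}) \right) \right]^{2} \right),
\]

where \(\tau_{i}\) is the person's speed, \(\beta_{j}\) the item's time intensity, and \(\alpha_{j}\) its discrimination; the conditional independence assumptions concern responses and times given the person parameters.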


Journal ArticleDOI
TL;DR: A broad class of semiparametric Bayesian SEMs is proposed that allows mixed categorical and continuous manifest variables while also allowing the latent variables to have unknown distributions, based on centered Dirichlet process (CDP) and CDP mixture models.
Abstract: Structural equation models (SEMs) with latent variables are widely useful for sparse covariance structure modeling and for inferring relationships among latent variables. Bayesian SEMs are appealing in allowing for the incorporation of prior information and in providing exact posterior distributions of unknowns, including the latent variables. In this article, we propose a broad class of semiparametric Bayesian SEMs, which allow mixed categorical and continuous manifest variables while also allowing the latent variables to have unknown distributions. In order to include typical identifiability restrictions on the latent variable distributions, we rely on centered Dirichlet process (CDP) and CDP mixture (CDPM) models. The CDP will induce a latent class model with an unknown number of classes, while the CDPM will induce a latent trait model with unknown densities for the latent traits. A simple and efficient Markov chain Monte Carlo algorithm is developed for posterior computation, and the methods are illustrated using simulated examples and several applications.

82 citations
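
As a hedged sketch of the nonparametric ingredient: a Dirichlet process prior can be drawn (up to truncation) by stick-breaking, and centering recenters the random distribution so that identifiability constraints on the latent variables hold. The code is illustrative only, not the authors' construction.

```python
import numpy as np

# Truncated stick-breaking draw from a DP(alpha, N(0, 1)) prior, then recentered
# so the latent-variable distribution has mean zero (the "centering" idea, sketched).
rng = np.random.default_rng(1)
alpha, K = 1.0, 50
v = rng.beta(1.0, alpha, size=K)
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # stick-breaking weights
atoms = rng.normal(0.0, 1.0, size=K)                       # atom locations
atoms -= np.sum(w * atoms) / np.sum(w)                     # enforce mean zero
```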


Journal ArticleDOI
TL;DR: A novel combination of Markov chain Monte Carlo estimation routines is presented for estimating the parameters of a wide variety of confirmatory item factor analysis models, and it is demonstrated that accurate parameter estimates can be obtained using MCMC in a relatively user-friendly package.
Abstract: Item factor analysis has a rich tradition in both the structural equation modeling and item response theory frameworks. The goal of this paper is to demonstrate a novel combination of various Markov chain Monte Carlo (MCMC) estimation routines to estimate parameters of a wide variety of confirmatory item factor analysis models. Further, I show that these methods can be implemented in a flexible way which requires minimal technical sophistication on the part of the end user. After providing an overview of item factor analysis and MCMC, results from several examples (simulated and real) will be discussed. The bulk of these examples focus on models that are problematic for current “gold-standard” estimators. The results demonstrate that it is possible to obtain accurate parameter estimates using MCMC in a relatively user-friendly package.

77 citations


Journal ArticleDOI
TL;DR: It is shown how quadratic-form statistics can be constructed that are more powerful than X² and yet have an approximate chi-square null distribution in finite samples with large models.
Abstract: Maydeu-Olivares and Joe (J. Am. Stat. Assoc. 100:1009–1020, 2005; Psychometrika 71:713–732, 2006) introduced classes of chi-square tests for (sparse) multidimensional multinomial data based on low-order marginal proportions. Our extension provides general conditions under which quadratic forms in linear functions of cell residuals are asymptotically chi-square. The new statistics need not be based on margins, and can be used for one-dimensional multinomials. We also provide theory that explains why limited information statistics have good power, regardless of sparseness. We show how quadratic-form statistics can be constructed that are more powerful than X² and yet have an approximate chi-square null distribution in finite samples with large models. Examples with models for truncated count data and binary item response data are used to illustrate the theory.

58 citations
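
The general shape of these statistics, following the cited Maydeu-Olivares and Joe papers (sketched from memory, so treat the details as indicative): with \(\hat{\mathbf{e}}\) a vector of cell (or marginal) residuals, \(\hat{\boldsymbol{\Xi}}\) its estimated asymptotic covariance matrix, and \(\hat{\boldsymbol{\Delta}}\) the Jacobian of the expected proportions with respect to the model parameters, one choice of quadratic form is

\[
T = N\, \hat{\mathbf{e}}' \hat{\mathbf{C}} \hat{\mathbf{e}}, \qquad \hat{\mathbf{C}} = \hat{\boldsymbol{\Xi}}^{-1} - \hat{\boldsymbol{\Xi}}^{-1} \hat{\boldsymbol{\Delta}} \big( \hat{\boldsymbol{\Delta}}' \hat{\boldsymbol{\Xi}}^{-1} \hat{\boldsymbol{\Delta}} \big)^{-1} \hat{\boldsymbol{\Delta}}' \hat{\boldsymbol{\Xi}}^{-1},
\]

whose weight matrix yields an asymptotic chi-square null distribution; the paper's contribution is the general conditions under which such quadratic forms remain chi-square.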


Journal ArticleDOI
TL;DR: In this article, nested logit item response models for multiple-choice data are proposed in which the application of a solution strategy precedes consideration of response options. The models also accommodate collapsibility across all distractor categories, making it easier to allow decisions about including distractor information to occur on an item-by-item or application-by-application basis.
Abstract: Nested logit item response models for multiple-choice data are presented. Relative to previous models, the new models are suggested to provide a better approximation to multiple-choice items where the application of a solution strategy precedes consideration of response options. In practice, the models also accommodate collapsibility across all distractor categories, making it easier to allow decisions about including distractor information to occur on an item-by-item or application-by-application basis without altering the statistical form of the correct response curves. Marginal maximum likelihood estimation algorithms for the models are presented along with simulation and real data analyses.

52 citations
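
One concrete instance consistent with this description (a sketch; the paper's exact parameterization may differ) is a two-stage model: a 2PL curve for producing the correct response, and a multinomial logit over distractors conditional on not producing it,

\[
P(Y_{ij} = \mathrm{correct}) = \frac{\exp[a_{j}(\theta_{i} - b_{j})]}{1 + \exp[a_{j}(\theta_{i} - b_{j})]}, \qquad P(Y_{ij} = k \mid \mathrm{incorrect}) = \frac{\exp(\zeta_{jk} + \lambda_{jk}\theta_{i})}{\sum_{k'} \exp(\zeta_{jk'} + \lambda_{jk'}\theta_{i})}.
\]

Because the distractor branch sums to one, collapsing all distractors leaves the correct-response curve unchanged, which is the collapsibility property mentioned above.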


Journal ArticleDOI
TL;DR: A general theory on the use of correlation weights in linear prediction has yet to be proposed. In this paper, the authors describe the conditions under which correlation weights perform well in population regression models, using OLS weights as a comparison, and define cases in which the two weighting systems yield maximally correlated composites and cases in which they yield minimally similar weights.
Abstract: A general theory on the use of correlation weights in linear prediction has yet to be proposed. In this paper we take initial steps in developing such a theory by describing the conditions under which correlation weights perform well in population regression models. Using OLS weights as a comparison, we define cases in which the two weighting systems yield maximally correlated composites and when they yield minimally similar weights. We then derive the least squares weights (for any set of predictors) that yield the largest drop in R² (the coefficient of determination) when switching to correlation weights. Our findings suggest that two characteristics of a model/data combination are especially important in determining the effectiveness of correlation weights: (1) the condition number of the predictor correlation matrix, R_xx, and (2) the orientation of the correlation weights to the latent vectors of R_xx.
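
For standardized predictors the two weighting systems are easy to compare directly: with predictor correlation matrix R_xx and validities r_xy, the OLS composite attains R² = r_xy′ R_xx⁻¹ r_xy, while a composite with weights w attains (w′r_xy)² / (w′R_xx w). A small sketch with hypothetical correlations:

```python
import numpy as np

# Compare OLS weights with correlation weights on a hypothetical 3-predictor model.
Rxx = np.array([[1.0, 0.5, 0.3],
                [0.5, 1.0, 0.4],
                [0.3, 0.4, 1.0]])    # predictor intercorrelations (hypothetical)
rxy = np.array([0.4, 0.5, 0.3])      # predictor-criterion correlations (hypothetical)

b = np.linalg.solve(Rxx, rxy)        # standardized OLS weights
R2_ols = rxy @ b                     # = rxy' Rxx^{-1} rxy

w = rxy                              # correlation weights
R2_corr = (w @ rxy) ** 2 / (w @ Rxx @ w)  # squared composite-criterion correlation

print(R2_ols, R2_corr)  # the drop depends on cond(Rxx) and w's orientation
```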

Journal ArticleDOI
TL;DR: In this paper, a conditional, pairwise, pseudo-likelihood is derived that yields estimates of the parameters of any number of persons which are independent of all item parameters and of the maximum scores of all items.
Abstract: Rasch models are characterised by sufficient statistics for all parameters. In the Rasch unidimensional model for two ordered categories, the parameterisation of the person and item is symmetrical and it is readily established that the total scores of a person and item are sufficient statistics for their respective parameters. In contrast, in the unidimensional polytomous Rasch model for more than two ordered categories, the parameterisation is not symmetrical. Specifically, each item has a vector of item parameters, one for each category, and each person only one person parameter. In addition, different items can have different numbers of categories and, therefore, different numbers of parameters. The sufficient statistic for the parameters of an item is itself a vector. In estimating the person parameters in presently available software, these sufficient statistics are not used to condition out the item parameters. This paper derives a conditional, pairwise, pseudo-likelihood and constructs estimates of the parameters of any number of persons which are independent of all item parameters and of the maximum scores of all items. It also shows that these estimates are consistent. Although Rasch’s original work began with equating tests using test scores, and not with items of a test, the polytomous Rasch model has not been applied in this way. Operationally, this is because the current approaches, in which item parameters are estimated first, cannot handle test data where there may be many scores with zero frequencies. A small simulation study shows that, when using the estimation equations derived in this paper, such a property of the data is no impediment to the application of the model at the level of tests. This opens up the possibility of using the polytomous Rasch model directly in equating test scores.
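
The flavor of the pairwise conditioning is easiest to see in the dichotomous special case: for two persons answering the same item, conditioning on a pair total of 1 eliminates the item parameter entirely,

\[
P(X_{1i} = 1, X_{2i} = 0 \mid X_{1i} + X_{2i} = 1) = \frac{e^{\theta_{1}}}{e^{\theta_{1}} + e^{\theta_{2}}},
\]

which involves only the two person parameters. The paper constructs the analogous pairwise pseudo-likelihood for the polytomous model.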

Journal ArticleDOI
TL;DR: In this paper, it is shown that Bennett et al.’s S is an upper bound of Cohen’s κ if the k×k table is weakly marginal symmetric, weak marginal symmetry being a condition on the table’s marginal probabilities.
Abstract: The paper presents inequalities between four descriptive statistics that can be expressed in the form [P−E(P)]/[1−E(P)], where P is the observed proportion of agreement of a k×k table with identical categories, and E(P) is a function of the marginal probabilities. Scott’s π is an upper bound of Goodman and Kruskal’s λ and a lower bound of both Bennett et al. S and Cohen’s κ. We introduce a concept for the marginal probabilities of the k×k table called weak marginal symmetry. Using the rearrangement inequality, it is shown that Bennett et al. S is an upper bound of Cohen’s κ if the k×k table is weakly marginal symmetric.
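
Because all four statistics share the form [P−E(P)]/[1−E(P)], they differ only in E(P) and are simple to compute side by side. A sketch with a hypothetical table (Goodman and Kruskal’s λ is omitted here):

```python
import numpy as np

# Chance-corrected agreement statistics of the form (P - E) / (1 - E)
# for a k x k table with identical row and column categories.
T = np.array([[40., 5., 5.],
              [6., 30., 4.],
              [4., 6., 20.]])        # hypothetical counts
p = T / T.sum()
row, col = p.sum(axis=1), p.sum(axis=0)
P = np.trace(p)                      # observed proportion of agreement
k = len(T)

def chance_corrected(P, E):
    return (P - E) / (1 - E)

kappa = chance_corrected(P, row @ col)                       # Cohen: E = sum p_i. * p_.i
pi    = chance_corrected(P, (((row + col) / 2) ** 2).sum())  # Scott
S     = chance_corrected(P, 1.0 / k)                         # Bennett et al.
# Per the paper: pi <= kappa and pi <= S; S >= kappa when the table
# is weakly marginal symmetric.
print(kappa, pi, S)
```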

Journal ArticleDOI
TL;DR: In this work, an extension of GSCA is proposed to effectively deal with various types of interactions among latent variables, and it can easily accommodate both exogenous and endogenous latent interactions.
Abstract: Generalized structured component analysis (GSCA) is a component-based approach to structural equation modeling. In practice, researchers may often be interested in examining the interaction effects of latent variables. However, GSCA has been geared only for the specification and testing of the main effects of variables. Thus, an extension of GSCA is proposed to effectively deal with various types of interactions among latent variables. In the proposed method, a latent interaction is defined as a product of interacting latent variables. As a result, this method does not require the construction of additional indicators for latent interactions. Moreover, it can easily accommodate both exogenous and endogenous latent interactions. An alternating least-squares algorithm is developed to minimize a single optimization criterion for parameter estimation. A Monte Carlo simulation study is conducted to investigate the parameter recovery capability of the proposed method. An application is also presented to demonstrate the empirical usefulness of the proposed method.

Journal ArticleDOI
TL;DR: Optimal design theory was applied and different error structures were used within a general linear model for the analysis of fMRI data, and the maximin criterion was applied to find designs which are robust against misspecification of model parameters.
Abstract: Blocked designs in functional magnetic resonance imaging (fMRI) are useful to localize functional brain areas. A blocked design consists of different blocks of trials of the same stimulus type and is characterized by three factors: the length of blocks, i.e., the number of trials per block, the ordering of task and rest blocks, and the time between trials within one block. Optimal design theory was applied to find the optimal combination of these three design factors. Furthermore, different error structures were used within a general linear model for the analysis of fMRI data, and the maximin criterion was applied to find designs which are robust against misspecification of model parameters.

Journal ArticleDOI
TL;DR: It is found that time delay embedding, i.e., structuring data prior to analysis by constructing a data matrix of overlapping samples, increases the precision of parameter estimates and in turn statistical power compared to standard independent rows of panel data.
Abstract: This paper investigates the precision of parameters estimated from local samples of time dependent functions. We find that time delay embedding, i.e., structuring data prior to analysis by constructing a data matrix of overlapping samples, increases the precision of parameter estimates and in turn statistical power compared to standard independent rows of panel data. We show that the reason for this effect is that the sign of estimation bias depends on the position of a misplaced data point if there is no a priori knowledge about initial conditions of the time dependent function. Hence, we reason that the advantage of time delayed embedding is likely to hold true for a wide variety of functions. We support these conclusions both by mathematical analysis and two simulations.
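
Time delay embedding itself is a simple restructuring of the series; a minimal sketch (window length and lag are arbitrary illustrative choices):

```python
import numpy as np

def delay_embed(x, dim, lag=1):
    """Return the data matrix whose rows are overlapping delayed windows of x."""
    n = len(x) - (dim - 1) * lag
    return np.array([x[i : i + dim * lag : lag] for i in range(n)])

x = np.sin(np.linspace(0, 10, 50))   # any time-dependent signal
X = delay_embed(x, dim=5)            # 46 x 5 matrix of overlapping samples
```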

Journal ArticleDOI
TL;DR: The simulation study shows that both types of SE estimates are very good when θ is in the middle range of the latent trait distribution, but the upward-corrected SEs are more accurate than the traditional ones when θ takes more extreme values.
Abstract: In this paper we propose an upward correction to the standard error (SE) estimation of θ(ML), the maximum likelihood (ML) estimate of the latent trait in item response theory (IRT). More specifically, the upward correction is provided for the SE of θ(ML) when item parameter estimates obtained from an independent pretest sample are used in IRT scoring. When item parameter estimates are employed, the resulting latent trait estimate is called pseudo maximum likelihood (PML) estimate. Traditionally the SE of θ(ML) is obtained on the basis of test information only, as if the item parameters are known. The upward correction takes into account the error that is carried over from the estimation of item parameters, in addition to the error in latent trait recovery itself. Our simulation study shows that both types of SE estimates are very good when θ is in the middle range of the latent trait distribution, but the upward-corrected SEs are more accurate than the traditional ones when θ takes more extreme values.
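
The traditional SE mentioned here is the inverse square root of the test information at the trait estimate. A sketch under a hypothetical 2PL test (the paper's upward correction, which additionally propagates item-parameter estimation error, is not reproduced):

```python
import numpy as np

# Information-based SE of the ML trait estimate under a 2PL model.
a = np.array([1.2, 0.8, 1.5, 1.0])    # discriminations (hypothetical)
b = np.array([-0.5, 0.0, 0.5, 1.0])   # difficulties (hypothetical)

def se_ml(theta):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    info = np.sum(a ** 2 * p * (1 - p))   # 2PL test information
    return 1.0 / np.sqrt(info)

print(se_ml(0.0), se_ml(3.0))  # SEs grow as theta moves to the extremes
```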

Journal ArticleDOI
TL;DR: In this paper, a class of finite mixture multilevel multidimensional ordinal IRT models for large-scale cross-cultural research is proposed for confirmatory research settings; the prior for item parameters is a mixture distribution to accommodate situations where different groups of countries have different measurement operations, while countries within these groups are still allowed to be heterogeneous.
Abstract: We present a class of finite mixture multilevel multidimensional ordinal IRT models for large scale cross-cultural research. Our model is proposed for confirmatory research settings. Our prior for item parameters is a mixture distribution to accommodate situations where different groups of countries have different measurement operations, while countries within these groups are still allowed to be heterogeneous. A simulation study is conducted that shows that all parameters can be recovered. We also apply the model to real data on the two components of affective subjective well-being: positive affect and negative affect. The psychometric behavior of these two scales is studied in 28 countries across four continents.

Journal ArticleDOI
TL;DR: In this article, the authors show that a broad class of polytomous IRT models has a weaker form of SOL, denoted weak SOL, and argue that weak SOL justifies ordering respondents on the latent trait using the total test score and, therefore, the use of nonparametric polytomous IRT models for ordinal measurement.
Abstract: In contrast to dichotomous item response theory (IRT) models, most well-known polytomous IRT models do not imply stochastic ordering of the latent trait by the total test score (SOL). This has been thought to make the ordering of respondents on the latent trait using the total test score questionable and throws doubt on the justifiability of using nonparametric polytomous IRT models for ordinal measurement. We show that a broad class of polytomous IRT models has a weaker form of SOL, denoted weak SOL, and argue that weak SOL justifies ordering respondents on the latent trait using the total test score and, therefore, the use of nonparametric polytomous IRT models for ordinal measurement.

Journal ArticleDOI
TL;DR: This paper analyzes the general circumstances under which “improper” linear models perform well by recasting a class of them as “proper” statistical models with a single predictor, derives the upper bound on the mean squared error of this estimator, and demonstrates that it has less variance than ordinary least squares estimates.
Abstract: “Improper linear models” (see Dawes, Am. Psychol. 34:571–582, 1979), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of “improper” linear models as “proper” statistical models with a single predictor. We derive the upper bound on the mean squared error of this estimator and demonstrate that it has less variance than ordinary least squares estimates. We examine common choices of the weighting vector used in the literature, e.g., single variable heuristics and equal weighting, and illustrate their performance in various test cases.
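
A quick way to see the phenomenon is a simulation in the spirit of Dawes (the setup below is hypothetical, not the paper's test cases): with few observations, a sign-based unit-weight composite rescaled by one free slope can out-predict fully estimated OLS weights.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, reps = 20, 5, 2000
beta = np.array([0.5, 0.4, 0.3, 0.4, 0.5])     # true coefficients (hypothetical)
err_ols = err_unit = 0.0
for _ in range(reps):
    X = rng.normal(size=(n, p));  y = X @ beta + rng.normal(size=n)
    Xt = rng.normal(size=(1000, p));  yt = Xt @ beta + rng.normal(size=1000)
    bhat = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS weights
    w = np.sign(np.corrcoef(X.T, y)[:p, p])          # unit weights with sign of r
    c = X @ w;  k = (c @ y) / (c @ c)                # one rescaling slope
    err_ols  += np.mean((yt - Xt @ bhat) ** 2)
    err_unit += np.mean((yt - (Xt @ w) * k) ** 2)
print(err_ols / reps, err_unit / reps)  # unit weights are competitive at small n
```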

Journal ArticleDOI
TL;DR: This paper integrates the methods of adaptive design, sequential estimation, and measurement error models to solve online item calibration problems; numerical results show that the proposed method is very promising in terms of both estimation accuracy and efficiency.
Abstract: Item calibration is an essential issue in modern item response theory based psychological or educational testing. Due to the popularity of computerized adaptive testing, methods to efficiently calibrate new items have become more important than in the days when paper-and-pencil test administration was the norm. Many calibration processes have been proposed and discussed from both theoretical and practical perspectives. Among them, online calibration may be one of the most cost-effective processes. In this paper, under a variable length computerized adaptive testing scenario, we integrate the methods of adaptive design, sequential estimation, and measurement error models to solve online item calibration problems. The proposed sequential estimate of item parameters is shown to be strongly consistent and asymptotically normally distributed with a prechosen accuracy. Numerical results show that the proposed method is very promising in terms of both estimation accuracy and efficiency. The results of using calibrated items to estimate the latent trait levels are also reported.

Journal ArticleDOI
TL;DR: It is demonstrated how simple Bayesian hierarchical models can be built for several RT sequences, differentiating between subject-specific and condition-specific effects.
Abstract: Human response time (RT) data are widely used in experimental psychology to evaluate theories of mental processing. Typically, the data constitute the times taken by a subject to react to a succession of stimuli under varying experimental conditions. Because of the sequential nature of the experiments there are trends (due to learning, fatigue, fluctuations in attentional state, etc.) and serial dependencies in the data. The data also exhibit extreme observations that can be attributed to lapses, intrusions from outside the experiment, and errors occurring during the experiment. Any adequate analysis should account for these features and quantify them accurately. Recognizing that Bayesian hierarchical models are an excellent modeling tool, we focus on the elaboration of a realistic likelihood for the data and on a careful assessment of the quality of fit that it provides. We judge quality of fit in terms of the predictive performance of the model. We demonstrate how simple Bayesian hierarchical models can be built for several RT sequences, differentiating between subject-specific and condition-specific effects.

Journal ArticleDOI
TL;DR: This work used regime-switching models to probabilistically classify each individual’s time series into latent “regimes” characterized by similar error variance and dynamic patterns and found that the association between EMG signals and self-reported affect ratings did in fact vary over time.
Abstract: Facial electromyography (EMG) is a useful physiological measure for detecting subtle affective changes in real time. A time series of EMG data contains bursts of electrical activity that increase in magnitude when the pertinent facial muscles are activated. Whereas previous methods for detecting EMG activation are often based on deterministic or externally imposed thresholds, we used regime-switching models to probabilistically classify each individual’s time series into latent “regimes” characterized by similar error variance and dynamic patterns. We also allowed the association between EMG signals and self-reported affect ratings to vary between regimes and found that the relationship between these two markers did in fact vary over time. The potential utility of using regime-switching models to detect activation patterns in EMG data and to summarize the temporal characteristics of EMG activities is discussed.
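
A toy generator conveys the regime idea (all values hypothetical; the paper's models additionally allow regime-specific dynamics and tie the regimes to affect ratings):

```python
import numpy as np

# Two-regime Markov switching series: quiet baseline vs. high-variance bursts.
rng = np.random.default_rng(3)
P = np.array([[0.95, 0.05],    # transition probabilities from "rest"
              [0.10, 0.90]])   # transition probabilities from "active"
sd = (0.1, 1.0)                # low noise at rest, bursts when activated
state, series = 0, []
for t in range(500):
    state = rng.choice(2, p=P[state])
    series.append(rng.normal(0.0, sd[state]))
series = np.array(series)      # resembles an EMG trace with activation bursts
```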

Journal ArticleDOI
TL;DR: In this article, an approach to determining a practically meaningful extent of model deviation is proposed, and the approximate distribution of the Wald test is derived under the extent of the model deviation of interest.
Abstract: This paper is concerned with supplementing statistical tests for the Rasch model so that additionally to the probability of the error of the first kind (Type I probability) the probability of the error of the second kind (Type II probability) can be controlled at a predetermined level by basing the test on the appropriate number of observations. An approach to determining a practically meaningful extent of model deviation is proposed, and the approximate distribution of the Wald test is derived under the extent of model deviation of interest.

Journal ArticleDOI
TL;DR: In this paper, the authors investigate the asymptotic accumulative standard error of equating (ASEE) for linear equating methods, including chained linear, Tucker, and Levine, under the nonequivalent groups with anchor test (NEAT) design.
Abstract: After many equatings have been conducted in a testing program, equating errors can accumulate to a degree that is not negligible compared to the standard error of measurement. In this paper, the author investigates the asymptotic accumulative standard error of equating (ASEE) for linear equating methods, including chained linear, Tucker, and Levine, under the nonequivalent groups with anchor test (NEAT) design. A recursive formula for the ASEE is provided for a series of equatings that makes use of only historical summary statistics. This formula can serve as a new tool to measure the magnitude of equating errors that have accumulated over a series of equatings, and to help monitor and design testing programs.

Journal ArticleDOI
TL;DR: In this paper, a new class of parametric models that generalize the multivariate probit model and the errors-in-variables model is developed to model and analyze ordinal data.
Abstract: A new class of parametric models that generalize the multivariate probit model and the errors-in-variables model is developed to model and analyze ordinal data. A general model structure is assumed to accommodate the information that is obtained via surrogate variables. A hybrid Gibbs sampler is developed to estimate the model parameters. To obtain a rapidly converged algorithm, the parameter expansion technique is applied to the correlation structure of the multivariate probit models. The proposed model and method of analysis are demonstrated with real data examples and simulation studies.

Journal ArticleDOI
Giles Hooker
TL;DR: In this article, the impact of prior structure on paradoxical results in multidimensional item response theory was studied and a computationally feasible means to check whether they can occur in any given test, and demonstrate a class of prior covariance matrices that can avoid them.
Abstract: This paper presents a study of the impact of prior structure on paradoxical results in multidimensional item response theory. Paradoxical results refer to the possibility that an incorrect response could be beneficial to an examinee. We demonstrate that when three or more ability dimensions are being used, paradoxical results can be induced by using priors in which all abilities are positively correlated where they would not occur if the abilities were modeled as being independent. In the case of separable tests, we demonstrate the mathematical causes of paradoxical results, develop a computationally feasible means to check whether they can occur in any given test, and demonstrate a class of prior covariance matrices that can be guaranteed to avoid them.

Journal ArticleDOI
TL;DR: In this paper, the authors consider the existence of paradoxical results in tests composed of item bundles when compensatory models are used and demonstrate that paradoxical results can occur when bundle effects are modeled as nuisance parameters for each subject.
Abstract: Hooker, Finkelman, and Schwartzman (Psychometrika, 2009, in press) defined a paradoxical result as the attainment of a higher test score by changing answers from correct to incorrect and demonstrated that such results are unavoidable for maximum likelihood estimates in multidimensional item response theory. The potential for these results to occur leads to the undesirable possibility of a subject's best answer being detrimental to them. This paper considers the existence of paradoxical results in tests composed of item bundles when compensatory models are used. We demonstrate that paradoxical results can occur when bundle effects are modeled as nuisance parameters for each subject. However, when these nuisance parameters are modeled as random effects, or used in a Bayesian analysis, it is possible to design tests comprised of many short bundles that avoid paradoxical results and we provide an algorithm for doing so. We also examine alternative models for handling dependence between item bundles and show that using fixed dependency effects is always guaranteed to avoid paradoxical results.


Journal ArticleDOI
TL;DR: A novel approach to tackle the complexities involved in addressing missing data and other related issues for performing CCC analysis within a longitudinal data setting is developed.
Abstract: Measures of agreement are used in a wide range of behavioral, biomedical, psychosocial, and health-care related research to assess the reliability of diagnostic tests, the psychometric properties of instruments, the fidelity of psychosocial interventions, and the accuracy of proxy outcomes. The concordance correlation coefficient (CCC) is a popular measure of agreement for continuous outcomes. In modern-day applications, data are often clustered, making inference difficult to perform using existing methods. In addition, as longitudinal study designs become increasingly popular, missing data have become a serious issue, and the lack of methods to systematically address this problem has hampered the progress of research in the aforementioned fields. In this paper, we develop a novel approach to tackle the complexities involved in addressing missing data and other related issues for performing CCC analysis within a longitudinal data setting. The approach is illustrated with both real and simulated data.
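
For reference, with complete and unclustered data the CCC is Lin's coefficient; a minimal sketch (the paper's contribution, handling clustering and missing data, is not shown):

```python
import numpy as np

# Lin's concordance correlation coefficient for two continuous measurements.
def ccc(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.cov(x, y, bias=True)[0, 1]
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

print(ccc([1, 2, 3, 4], [1.1, 2.0, 2.9, 4.2]))  # near 1 indicates strong agreement
```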