
Showing papers in "Psychometrika in 2013"


Journal ArticleDOI
TL;DR: This work suggests a default log-gamma(2,λ) penalty with λ→0, which ensures that the maximum penalized likelihood estimate is approximately one standard error from zero when the maximum likelihood estimate is zero, thus remaining consistent with the data while being nondegenerate.
Abstract: Group-level variance estimates of zero often arise when fitting multilevel or hierarchical linear models, especially when the number of groups is small. For situations where zero variances are implausible a priori, we propose a maximum penalized likelihood approach to avoid such boundary estimates. This approach is equivalent to estimating variance parameters by their posterior mode, given a weakly informative prior distribution. By choosing the penalty from the log-gamma family with shape parameter greater than 1, we ensure that the estimated variance will be positive. We suggest a default log-gamma(2,λ) penalty with λ → 0, which ensures that the maximum penalized likelihood estimate is approximately one standard error from zero when the maximum likelihood estimate is zero, thus remaining consistent with the data while being nondegenerate. We also show that the maximum penalized likelihood estimator with this default penalty is a good approximation to the posterior median obtained under a noninformative prior. Our default method provides better estimates of model parameters and standard errors than the maximum likelihood or the restricted maximum likelihood estimators. The log-gamma family can also be used to convey substantive prior information. In either case (pure penalization or prior information), our recommended procedure gives nondegenerate estimates and in the limit coincides with maximum likelihood as the number of groups increases.

334 citations
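The default penalty is just the log-density, up to a constant, of a gamma(2, λ) prior on the group-level standard deviation σ: log σ − λσ, which tends to −∞ as σ → 0 and therefore keeps the estimate off the boundary. A minimal sketch of the idea for a one-way random-effects model, with illustrative data and variable names (this is not the authors' implementation):

```python
# Minimal sketch: maximum penalized likelihood for the group-level standard
# deviation sigma_b in y_ij = mu + b_j + e_ij, with the log-gamma(2, lam)
# penalty log(sigma_b) - lam*sigma_b, lam -> 0. All names are illustrative.
import numpy as np
from scipy.optimize import minimize

def neg_penalized_loglik(params, y, groups, lam=1e-8):
    mu, log_sb, log_se = params
    sb, se = np.exp(log_sb), np.exp(log_se)
    ll = 0.0
    for g in np.unique(groups):
        r = y[groups == g] - mu
        n = len(r)
        cov = se**2 * np.eye(n) + sb**2 * np.ones((n, n))  # marginal covariance
        _, logdet = np.linalg.slogdet(cov)
        ll -= 0.5 * (logdet + r @ np.linalg.solve(cov, r))
    penalty = np.log(sb) - lam * sb  # log-gamma(2, lam) density, up to a constant
    return -(ll + penalty)

rng = np.random.default_rng(0)
groups = np.repeat(np.arange(5), 10)
y = rng.normal(size=50)  # true sigma_b = 0, where the MLE often hits the boundary
fit = minimize(neg_penalized_loglik, x0=[0.0, -1.0, 0.0], args=(y, groups))
print("penalized estimate of sigma_b:", np.exp(fit.x[1]))
```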


Journal ArticleDOI
TL;DR: This work introduces and compares four approaches to dealing with missing data in mediation analysis: listwise deletion, pairwise deletion, multiple imputation (MI), and a two-stage maximum likelihood (TS-ML) method.
Abstract: Despite wide applications of both mediation models and missing data techniques, formal discussion of mediation analysis with missing data is still rare. We introduce and compare four approaches to dealing with missing data in mediation analysis including listwise deletion, pairwise deletion, multiple imputation (MI), and a two-stage maximum likelihood (TS-ML) method. An R package bmem is developed to implement the four methods for mediation analysis with missing data in the structural equation modeling framework, and two real examples are used to illustrate the application of the four methods. The four methods are evaluated and compared under MCAR, MAR, and MNAR missing data mechanisms through simulation studies. Both MI and TS-ML perform well for MCAR and MAR data regardless of the inclusion of auxiliary variables and for AV-MNAR data with auxiliary variables. Although listwise deletion and pairwise deletion have low power and large parameter estimation bias in many studied conditions, they may provide useful information for exploring missing mechanisms.

68 citations
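For contrast with the MI and TS-ML methods the paper recommends, here is a minimal sketch of the baseline approach it warns about: listwise deletion followed by the usual two-regression estimate of the indirect effect a*b. The paper's own software is the R package bmem; everything below is illustrative:

```python
# Minimal sketch: listwise deletion, then the indirect effect a*b from
# M = i1 + a*X + e1 and Y = i2 + b*M + c'*X + e2. Names are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=n)
M = 0.5 * X + rng.normal(size=n)
Y = 0.4 * M + 0.2 * X + rng.normal(size=n)
M[rng.random(n) < 0.2] = np.nan  # impose 20% missingness on the mediator

keep = ~np.isnan(M)  # listwise deletion: drop incomplete cases
Xc, Mc, Yc = X[keep], M[keep], Y[keep]
ones = np.ones(keep.sum())

a = np.linalg.lstsq(np.column_stack([ones, Xc]), Mc, rcond=None)[0][1]
b = np.linalg.lstsq(np.column_stack([ones, Mc, Xc]), Yc, rcond=None)[0][1]
print("indirect effect a*b under listwise deletion:", a * b)
```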


Journal ArticleDOI
TL;DR: This paper describes how fMRI data are analyzed at each of these levels of study (single time point, longitudinal, cross-population, and multisite) and the noise sources introduced at each level.
Abstract: Functional magnetic resonance imaging (fMRI) is a noninvasive method for measuring brain function by correlating temporal changes in local cerebral blood oxygenation with behavioral measures. fMRI is used to study individuals at single time points, across multiple time points (with or without intervention), as well as to examine the variation of brain function across normal and ill populations. fMRI may be collected at multiple sites and then pooled into a single analysis. This paper describes how fMRI data is analyzed at each of these levels and describes the noise sources introduced at each level.

56 citations


Journal ArticleDOI
TL;DR: The Hawkes process is represented both as a conditional intensity function, which treats the probability of an action in continuous time via non-stationary distributions with arbitrarily long historical dependency, and as a cluster Poisson process, which is conducive to maximum likelihood estimation using the EM algorithm.
Abstract: We apply the Hawkes process to the analysis of dyadic interaction. The Hawkes process is applicable to excitatory interactions, wherein the actions of each individual increase the probability of further actions in the near future. We consider the representation of the Hawkes process both as a conditional intensity function and as a cluster Poisson process. The former treats the probability of an action in continuous time via non-stationary distributions with arbitrarily long historical dependency, while the latter is conducive to maximum likelihood estimation using the EM algorithm. We first outline the interpretation of the Hawkes process in the dyadic context, and then illustrate its application with an example concerning email transactions in the work place.

50 citations
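A minimal sketch, not from the paper, of the exponential-kernel special case of the conditional-intensity representation, λ(t) = μ + Σ_{t_i<t} α exp(−β(t − t_i)), with its log-likelihood on [0, T]; the timestamps and parameter values are illustrative:

```python
# Minimal sketch: exponential-kernel Hawkes conditional intensity and its
# log-likelihood on [0, T]. Timestamps and parameters are illustrative.
import numpy as np

def hawkes_loglik(times, T, mu, alpha, beta):
    ll = 0.0
    for i, t in enumerate(times):
        # intensity just before event i: baseline plus decayed excitation
        lam = mu + np.sum(alpha * np.exp(-beta * (t - times[:i])))
        ll += np.log(lam)
    # subtract the compensator, the integral of the intensity over [0, T]
    ll -= mu * T + np.sum((alpha / beta) * (1.0 - np.exp(-beta * (T - times))))
    return ll

times = np.array([0.7, 1.1, 1.3, 2.9, 3.0, 3.05, 5.4])  # e.g., email timestamps
print(hawkes_loglik(times, T=6.0, mu=0.5, alpha=0.8, beta=1.5))
```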


Journal ArticleDOI
TL;DR: Tests of measurement invariance based on stochastic processes of casewise derivatives of the likelihood function can be viewed as generalizations of the Lagrange multiplier test, and are especially useful for identifying subgroups of individuals that violate measurement invariance along a continuous auxiliary variable without prespecified thresholds.
Abstract: The issue of measurement invariance commonly arises in factor-analytic contexts, with methods for assessment including likelihood ratio tests, Lagrange multiplier tests, and Wald tests. These tests all require advance definition of the number of groups, group membership, and offending model parameters. In this paper, we study tests of measurement invariance based on stochastic processes of casewise derivatives of the likelihood function. These tests can be viewed as generalizations of the Lagrange multiplier test, and they are especially useful for: (i) identifying subgroups of individuals that violate measurement invariance along a continuous auxiliary variable without prespecified thresholds, and (ii) identifying specific parameters impacted by measurement invariance violations. The tests are presented and illustrated in detail, including an application to a study of stereotype threat and simulations examining the tests’ abilities in controlled conditions.

48 citations
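A minimal sketch of the main ingredient of such tests: a cumulative sum of casewise score contributions ordered by a continuous auxiliary variable, shown here for the mean of a simple normal model rather than a factor model. All names and data are illustrative:

```python
# Minimal sketch: cumulative sum of casewise score contributions, ordered by
# a continuous auxiliary variable (here, age), for the mean of a normal model.
import numpy as np

rng = np.random.default_rng(5)
age = np.sort(rng.uniform(20, 60, 300))
y = np.where(age > 45, 0.5, 0.0) + rng.normal(size=300)  # shift violates invariance

mu_hat = y.mean()                 # ML estimate under the invariance null
scores = y - mu_hat               # casewise derivatives of the log-likelihood
info = scores.var()
process = np.cumsum(scores) / np.sqrt(len(y) * info)  # empirical score process
# Under the null this behaves like a Brownian bridge; a large max |process|
# near some age suggests a parameter change around that value.
print("max |process|:", np.abs(process).max())
```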


Journal ArticleDOI
TL;DR: A generalized semiparametric SEM is developed that is able to handle mixed data types and to simultaneously model different functional relationships among latent variables using a Bayesian model-comparison statistic called the complete deviance information criterion (DIC).
Abstract: In behavioral, biomedical, and psychological studies, structural equation models (SEMs) have been widely used for assessing relationships between latent variables. Regression-type structural models based on parametric functions are often used for such purposes. In many applications, however, parametric SEMs are not adequate to capture subtle patterns in the functions over the entire range of the predictor variable. A different but equally important limitation of traditional parametric SEMs is that they are not designed to handle mixed data types—continuous, count, ordered, and unordered categorical. This paper develops a generalized semiparametric SEM that is able to handle mixed data types and to simultaneously model different functional relationships among latent variables. A structural equation of the proposed SEM is formulated using a series of unspecified smooth functions. The Bayesian P-splines approach and Markov chain Monte Carlo methods are developed to estimate the smooth functions and the unknown parameters. Moreover, we examine the relative benefits of semiparametric modeling over parametric modeling using a Bayesian model-comparison statistic, called the complete deviance information criterion (DIC). The performance of the developed methodology is evaluated using a simulation study. To illustrate the method, we used a data set derived from the National Longitudinal Survey of Youth.

45 citations
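A minimal sketch of the P-spline building block the abstract relies on: a B-spline basis with a second-order difference penalty on its coefficients, fit here by penalized least squares rather than the paper's Bayesian MCMC. The basis construction and smoothing parameter are illustrative:

```python
# Minimal sketch: B-spline basis plus second-order difference penalty,
# fit by penalized least squares. Basis size and lambda are illustrative.
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, n_basis=10, degree=3):
    inner = np.linspace(x.min(), x.max(), n_basis - degree + 1)
    knots = np.concatenate([[inner[0]] * degree, inner, [inner[-1]] * degree])
    cols = [BSpline.basis_element(knots[i:i + degree + 2], extrapolate=False)(x)
            for i in range(n_basis)]
    return np.nan_to_num(np.column_stack(cols))

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-2, 2, 150))
y = np.sin(2 * x) + rng.normal(0, 0.3, 150)

B = bspline_basis(x)
D = np.diff(np.eye(B.shape[1]), n=2, axis=0)   # second-order difference matrix
lam = 1.0                                      # smoothing parameter
coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
fitted = B @ coef                              # estimated smooth function
```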


Journal ArticleDOI
TL;DR: This paper presents a new approach for score monitoring and assessment of scale drift that involves quality control charts, model-based approaches, and time series techniques to accommodate the following needs of monitoring scale scores: continuous monitoring, adjustment of customary variations, identification of abrupt shifts, and assessment of autocorrelation.
Abstract: Maintaining a stable score scale over time is critical for all standardized educational assessments. Traditional quality control tools and approaches for assessing scale drift either require special equating designs, or may be too time-consuming to be considered on a regular basis with an operational test that has a short time window between an administration and its score reporting. Thus, the traditional methods are not sufficient to catch unusual testing outcomes in a timely manner. This paper presents a new approach for score monitoring and assessment of scale drift. It involves quality control charts, model-based approaches, and time series techniques to accommodate the following needs of monitoring scale scores: continuous monitoring, adjustment of customary variations, identification of abrupt shifts, and assessment of autocorrelation. Performance of the methodologies is evaluated using manipulated data based on real responses from 71 administrations of a large-scale high-stakes language assessment.

43 citations
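A minimal sketch of the simplest tool in the proposed toolbox, a Shewhart-style control chart that flags administrations whose mean scaled score falls outside k standard deviations of a baseline level (all numbers illustrative):

```python
# Minimal sketch: flag administrations whose mean scaled score falls outside
# baseline_mean +/- k * baseline_sd. All numbers are illustrative.
import numpy as np

def control_chart_flags(means, baseline_mean, baseline_sd, k=3.0):
    lower, upper = baseline_mean - k * baseline_sd, baseline_mean + k * baseline_sd
    return [(i, m) for i, m in enumerate(means) if not lower <= m <= upper]

admin_means = np.array([500.1, 499.4, 501.0, 498.8, 507.9, 500.3])
print(control_chart_flags(admin_means, baseline_mean=500.0, baseline_sd=2.0))
```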


Journal ArticleDOI
TL;DR: Feature-based ICA appears to be a valid tool for extracting intrinsic networks and is likely to become a useful and important approach in the study of the macro-connectome, particularly in the context of data fusion.
Abstract: There is increasing use of functional imaging data to understand the macro-connectome of the human brain. Of particular interest is the structure and function of intrinsic networks (regions exhibiting temporally coherent activity both at rest and while a task is being performed), which account for a significant portion of the variance in functional MRI data. While networks are typically estimated based on the temporal similarity between regions (based on temporal correlation, clustering methods, or independent component analysis [ICA]), some recent work has suggested that these intrinsic networks can be extracted from the inter-subject covariation among highly distilled features, such as amplitude maps reflecting regions modulated by a task or even coordinates extracted from large meta-analytic studies. In this paper, our goal was to explicitly compare the networks obtained from a first-level ICA (ICA on the spatio-temporal functional magnetic resonance imaging (fMRI) data) to those from a second-level ICA (i.e., ICA on computed features rather than on the first-level fMRI data). Convergent results from simulations, task-fMRI data, and rest-fMRI data show that the second-level analysis is slightly noisier than the first-level analysis but yields strikingly similar patterns of intrinsic networks (spatial correlations as high as 0.85 for task data and 0.65 for rest data, well above the empirical null) and also preserves the relationship of these networks with other variables such as age (for example, default mode network regions tended to show decreased low frequency power for first-level analyses and decreased loading parameters for second-level analyses). In addition, the best-estimated second-level results are those which are the most strongly reflected in the input feature. In summary, the use of feature-based ICA appears to be a valid tool for extracting intrinsic networks. We believe it will become a useful and important approach in the study of the macro-connectome, particularly in the context of data fusion.

40 citations


Journal ArticleDOI
TL;DR: An advantage of MLTM-D for diagnosis is that it may be more applicable to large-scale assessments with more heterogeneous items than are latent class models.
Abstract: This paper presents a noncompensatory latent trait model, the multicomponent latent trait model for diagnosis (MLTM-D), for cognitive diagnosis. In MLTM-D, a hierarchical relationship between components and attributes is specified to permit diagnosis at two levels. MLTM-D is a generalization of the multicomponent latent trait model (MLTM; Whitely in Psychometrika, 45:479-494, 1980; Embretson in Psychometrika, 49:175-186, 1984) to measures of broad traits, such as achievement tests, in which component structure varies between items. Conditions for model identification are described and marginal maximum likelihood estimators are presented, along with simulation data to demonstrate parameter recovery. To illustrate how MLTM-D can be used for diagnosis, an application to a large-scale test of mathematics achievement is presented. An advantage of MLTM-D for diagnosis is that it may be more applicable to large-scale assessments with more heterogeneous items than are latent class models.

38 citations
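For orientation, the noncompensatory MLTM structure that MLTM-D generalizes is usually written as a product of component response probabilities (notation illustrative, not the article's):

$$P(X_{ij}=1 \mid \boldsymbol{\theta}_j) = \prod_k \left( \frac{\exp(\theta_{jk} - b_{ik})}{1 + \exp(\theta_{jk} - b_{ik})} \right)^{c_{ik}},$$

where c_ik indicates whether component k is involved in item i. Success on an item requires success on every involved component, which is what makes the model noncompensatory.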


Journal ArticleDOI
TL;DR: The large-sample distribution of the residual is proved to be standardized normal when the IRT model fits the data, and the residuals appear useful for assessing item fit in unidimensional IRT models.
Abstract: Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.

36 citations
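A minimal sketch of the generic recipe: compare a model-implied item characteristic curve with an observed proportion-correct estimate within ability groups and standardize the difference. The 2PL curve, binning, and simulated data below are illustrative, not the authors' exact residual:

```python
# Minimal sketch: compare a fitted ICC with observed proportions correct in
# ability groups and standardize the difference. All choices are illustrative.
import numpy as np

def icc_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

rng = np.random.default_rng(3)
theta = rng.normal(size=2000)
x = rng.random(2000) < icc_2pl(theta, a=1.2, b=0.3)  # simulated item responses

bins = np.quantile(theta, np.linspace(0, 1, 11))
for lo, hi in zip(bins[:-1], bins[1:]):
    idx = (theta >= lo) & (theta < hi)
    n, p_obs = idx.sum(), x[idx].mean()
    p_fit = icc_2pl(theta[idx], a=1.2, b=0.3).mean()
    z = (p_obs - p_fit) / np.sqrt(p_fit * (1 - p_fit) / n)  # standardized residual
    print(f"[{lo:5.2f}, {hi:5.2f})  n={n:4d}  z={z:+.2f}")
```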


Journal ArticleDOI
TL;DR: The Deterministic, Gated Item Response Theory Model is proposed to identify cheaters who obtain significant score gain on tests due to item exposure/compromise by conditioning on the item status (exposed or unexposed items).
Abstract: The Deterministic, Gated Item Response Theory Model (DGM, Shu, Unpublished Dissertation, The University of North Carolina at Greensboro, 2010) is proposed to identify cheaters who obtain significant score gain on tests due to item exposure/compromise by conditioning on the item status (exposed or unexposed items). A “gated” function is introduced to decompose the observed examinees’ performance into two distributions (the true ability distribution determined by examinees’ true ability and the cheating distribution determined by examinees’ cheating ability). Test cheaters who have score gain due to item exposure are identified through the comparison of the two distributions. Hierarchical Markov chain Monte Carlo is used as the model’s estimation framework. Finally, the model is applied to a real data set to illustrate how it can be used to identify examinees with pre-knowledge of the exposed items.

Journal ArticleDOI
TL;DR: The utility of nonlinear RSSS models is illustrated by fitting a nonlinear dynamic factor analysis model with regime-specific cross-regression parameters to a set of experience sampling affect data.
Abstract: Nonlinear dynamic factor analysis models extend standard linear dynamic factor analysis models by allowing time series processes to be nonlinear at the latent level (e.g., involving interaction between two latent processes). In practice, it is often of interest to identify the phases—namely, latent “regimes” or classes—during which a system is characterized by distinctly different dynamics. We propose a new class of models, termed nonlinear regime-switching state-space (RSSS) models, which subsumes regime-switching nonlinear dynamic factor analysis models as a special case. In nonlinear RSSS models, the change processes within regimes, represented using a state-space model, are allowed to be nonlinear. An estimation procedure obtained by combining the extended Kalman filter and the Kim filter is proposed as a way to estimate nonlinear RSSS models. We illustrate the utility of nonlinear RSSS models by fitting a nonlinear dynamic factor analysis model with regime-specific cross-regression parameters to a set of experience sampling affect data. The parallels between nonlinear RSSS models and other well-known discrete change models in the literature are discussed briefly.

Journal ArticleDOI
TL;DR: The author reviews some aspects of psychometric projects he has been involved in, emphasizing the nature of the work of the psychometricians involved, especially the balance between the statistical and scientific elements of that work.
Abstract: In this paper, I will review some aspects of psychometric projects that I have been involved in, emphasizing the nature of the work of the psychometricians involved, especially the balance between the statistical and scientific elements of that work. The intent is to seek to understand where psychometrics, as a discipline, has been and where it might be headed, in part at least, by considering one particular journey (my own). In contemplating this, I also look to psychometrics journals to see how psychometricians represent themselves to themselves, and in a complementary way, look to substantive journals to see how psychometrics is represented there (or perhaps, not represented, as the case may be). I present a series of questions in order to consider the issue of what are the appropriate foci of the psychometric discipline. As an example, I present one recent project at the end, where the roles of the psychometricians and the substantive researchers have had to become intertwined in order to make satisfactory progress. In the conclusion I discuss the consequences of such a view for the future of psychometrics.

Journal ArticleDOI
TL;DR: An overview of the observed-score equating process is provided from the perspective of a unifying equating framework and issues related to the test, common items, and sampling designs and their relationship to measurement and equating are discussed.
Abstract: In this paper, an overview of the observed-score equating (OSE) process is provided from the perspective of a unifying equating framework (von Davier in von Davier (Ed.), Statistical models for test equating, scaling, and linking, Springer, New York, pp. 1–17, 2011b). The framework includes all OSE approaches. Issues related to the test, common items, and sampling designs and their relationship to measurement and equating are discussed. Challenges to the equating process, model assumptions, and approaches to equating evaluation are also presented. The equating process is illustrated step-by-step with a real data example from a licensure test.
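A minimal sketch of the core OSE idea, equipercentile equating: a score on form X is mapped to the form-Y score with the same percentile rank (toy data, not the licensure example):

```python
# Minimal sketch: map a form-X score to the form-Y score with the same
# percentile rank. Score samples are illustrative.
import numpy as np

def equipercentile(x_scores, y_scores, x):
    p = np.mean(np.asarray(x_scores) <= x)   # percentile rank on form X
    return np.quantile(y_scores, p)          # inverse CDF on form Y

x_scores = [10, 12, 13, 15, 15, 17, 19, 20, 22, 25]
y_scores = [11, 13, 14, 14, 16, 18, 20, 21, 24, 26]
print("form-Y equivalent of 17:", equipercentile(x_scores, y_scores, 17))
```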

Journal ArticleDOI
TL;DR: This paper studies the identification of a particular case of the 3PL model, namely when the discrimination parameters are all constant and equal to 1, and shows that, after introducing two identification restrictions, the distribution G and the item parameters are identified provided an infinite quantity of items is available.
Abstract: In this paper, we study the identification of a particular case of the 3PL model, namely when the discrimination parameters are all constant and equal to 1. We term this the 1PL-G model. The identification analysis is performed under three different specifications. The first specification considers the abilities as unknown parameters. It is proved that the item parameters and the abilities are identified if a difficulty parameter and a guessing parameter are fixed at zero. The second specification assumes that the abilities are mutually independent and identically distributed according to a distribution known up to the scale parameter. It is shown that the item parameters and the scale parameter are identified if a guessing parameter is fixed at zero. The third specification corresponds to a semi-parametric 1PL-G model, where the distribution G generating the abilities is a parameter of interest. It is not only shown that, after fixing a difficulty parameter and a guessing parameter at zero, the item parameters are identified, but also that under those restrictions the distribution G is not identified. It is finally shown that, after introducing two identification restrictions, either on the distribution G or on the item parameters, the distribution G and the item parameters are identified provided an infinite quantity of items is available.
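For reference, the 1PL-G model studied here is the 3PL response function with all discriminations fixed at 1 (notation illustrative):

$$P(X_{ij}=1 \mid \theta_i) = c_j + (1-c_j)\,\frac{\exp(\theta_i - b_j)}{1+\exp(\theta_i - b_j)},$$

with difficulty b_j and guessing parameter c_j. The identification results fix one b_j and one c_j at zero and, in the semiparametric case, add two further restrictions on G or on the item parameters.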

Journal ArticleDOI
TL;DR: Harmonic regression, a seasonal-adjustment method, can be useful in monitoring scale stability when the number of years available is limited and when the observations are unevenly spaced.
Abstract: Monitoring a very frequently administered educational test with a relatively short history of stable operation imposes a number of challenges. Test scores usually vary by season, and the frequency of administration of such educational tests is also seasonal. Although it is important to react to unreasonable changes in the distributions of test scores in a timely fashion, it is not a simple matter to ascertain what sort of distribution is really unusual. Many commonly used approaches for seasonal adjustment are designed for time series with evenly spaced observations that span many years and, therefore, are inappropriate for data from such educational tests. Harmonic regression, a seasonal-adjustment method, can be useful in monitoring scale stability when the number of years available is limited and when the observations are unevenly spaced. Additional forms of adjustments can be included to account for variability in test scores due to different sources of population variations. To illustrate, real data are considered from an international language assessment.
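A minimal sketch of harmonic regression on unevenly spaced administration dates: regress mean scores on sine and cosine terms with a one-year period, then remove the fitted seasonal part. The dates, scores, and number of harmonics are illustrative:

```python
# Minimal sketch: harmonic regression on unevenly spaced administration
# times (in years). Dates, scores, and number of harmonics are illustrative.
import numpy as np

t = np.array([0.05, 0.21, 0.24, 0.46, 0.63, 0.71, 0.88, 1.02, 1.19, 1.55])
y = np.array([500.2, 502.1, 502.4, 499.0, 497.8, 498.5, 500.9, 500.4, 502.0, 498.3])

K = 2  # number of harmonics with a one-year period
cols = [np.ones_like(t)]
for k in range(1, K + 1):
    cols += [np.cos(2 * np.pi * k * t), np.sin(2 * np.pi * k * t)]
X = np.column_stack(cols)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
deseasonalized = y - (X @ beta - beta[0])  # remove the fitted seasonal component
print(deseasonalized)
```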

Journal ArticleDOI
TL;DR: Latent state-trait analyses revealed that variances in scores of the RGSE can be decomposed into six components: stable self-esteem (40 %), ephemeral (or temporal-state) variance (36 %), stable negative method variance (9 %), stable positive method variance (4 %), specific variance (1 %), and random error variance (10 %).
Abstract: The present research evaluates the stability of self-esteem as assessed by a daily version of the Rosenberg (Society and the adolescent self-image, Princeton University Press, Princeton, 1965) general self-esteem scale (RGSE). The scale was administered to 391 undergraduates for five consecutive days. The longitudinal data were analyzed using the integrated LC-LSTM framework that allowed us to evaluate: (1) the measurement invariance of the RGSE, (2) its stability and change across the 5-day assessment period, (3) the amount of variance attributable to stable and transitory latent factors, and (4) the criterion-related validity of these factors. Results provided evidence for measurement invariance, mean-level stability, and rank-order stability of daily self-esteem. Latent state-trait analyses revealed that variances in scores of the RGSE can be decomposed into six components: stable self-esteem (40 %), ephemeral (or temporal-state) variance (36 %), stable negative method variance (9 %), stable positive method variance (4 %), specific variance (1 %) and random error variance (10 %). Moreover, latent factors associated with daily self-esteem were associated with measures of depression, implicit self-esteem, and grade point average.

Journal ArticleDOI
TL;DR: Clusterwise simultaneous component analysis (SCA) simultaneously clusters blocks with a similar structure and performs an SCA per cluster; this paper removes the earlier restriction that the number of components be the same across clusters, which is often unrealistic.
Abstract: Given multivariate multiblock data (e.g., subjects nested in groups are measured on multiple variables), one may be interested in the nature and number of dimensions that underlie the variables, and in differences in dimensional structure across data blocks. To this end, clusterwise simultaneous component analysis (SCA) was proposed which simultaneously clusters blocks with a similar structure and performs an SCA per cluster. However, the number of components was restricted to be the same across clusters, which is often unrealistic. In this paper, this restriction is removed. The resulting challenges with respect to model estimation and selection are resolved.

Journal ArticleDOI
TL;DR: A multiobjective tabu search procedure is proposed for estimating the set of Pareto efficient blockmodels and is used in three examples that demonstrate possible applications of the multiobjective blockmodeling paradigm.
Abstract: To date, most methods for direct blockmodeling of social network data have focused on the optimization of a single objective function. However, there are a variety of social network applications where it is advantageous to consider two or more objectives simultaneously. These applications can broadly be placed into two categories: (1) simultaneous optimization of multiple criteria for fitting a blockmodel based on a single network matrix and (2) simultaneous optimization of multiple criteria for fitting a blockmodel based on two or more network matrices, where the matrices being fit can take the form of multiple indicators for an underlying relationship, or multiple matrices for a set of objects measured at two or more different points in time. A multiobjective tabu search procedure is proposed for estimating the set of Pareto efficient blockmodels. This procedure is used in three examples that demonstrate possible applications of the multiobjective blockmodeling paradigm.

Journal ArticleDOI
TL;DR: The derivation shows that the asymptotic sampling distribution of the test statistic for testing a single bivariate component in an ACE or ADE model is a mixture of χ2 distributions with degrees of freedom (dfs) ranging from 0 to 3, and that for testing both the A and C (or D) components it is a mixture with dfs ranging from 0 to 6.
Abstract: The ACE and ADE models have been heavily exploited in twin studies to identify the genetic and environmental components in phenotypes. However, the validity of the likelihood ratio test (LRT) of the existence of a variance component, a key step in the use of such models, has been doubted because the true values of the parameters lie on the boundary of the parameter space of the alternative model for such tests, violating a regularity condition required for a LRT (e.g., Carey in Behav. Genet. 35:653–665, 2005; Visscher in Twin Res. Hum. Genet. 9:490–495, 2006). Dominicus, Skrondal, Gjessing, Pedersen, and Palmgren (Behav. Genet. 36:331–340, 2006) solve the problem of testing univariate components in ACDE models. Our current work as presented in this paper resolves the issue of LRTs in bivariate ACDE models by exploiting the theoretical frameworks of inequality constrained LRTs based on cone approximations. Our derivation shows that the asymptotic sampling distribution of the test statistic for testing a single bivariate component in an ACE or ADE model is a mixture of χ2 distributions of degrees of freedom (dfs) ranging from 0 to 3, and that for testing both the A and C (or D) components is one of dfs ranging from 0 to 6. These correct distributions are stochastically smaller than the χ2 distributions in traditional LRTs and therefore LRTs based on these distributions are more powerful than those used naively. Formulas for calculating the weights are derived and the sampling distributions are confirmed by simulation studies. Several invariance properties for normal data (at most) missing by person are also proved. Potential generalizations of this work are also discussed.
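A minimal sketch of how a p-value is computed under such a chi-bar-square distribution, a weighted mixture of χ2 distributions; the weights below are illustrative placeholders, not the values derived in the paper:

```python
# Minimal sketch: p-value under a chi-bar-square mixture. The chi-square with
# 0 df is a point mass at zero, so it contributes nothing for t_obs > 0.
# The weights below are illustrative placeholders.
from scipy.stats import chi2

def chibar_pvalue(t_obs, weights):
    # weights[d] is the mixture weight on the chi-square with d degrees of freedom
    return sum(w * chi2.sf(t_obs, df=d) for d, w in enumerate(weights) if d > 0)

weights = [0.125, 0.375, 0.375, 0.125]  # hypothetical weights summing to 1
print(chibar_pvalue(6.2, weights))
```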

Journal ArticleDOI
TL;DR: There is no clear advantage in using goodness-of-fit statistics specifically designed for Rasch-type models to test these models when marginal ML estimation is used; all three statistics examined were more powerful than Pearson's X2 against two- and three-parameter logistic alternatives and against multidimensional 1PL models.
Abstract: We investigate the performance of three statistics, R1, R2 (Glas in Psychometrika 53:525–546, 1988), and M2 (Maydeu-Olivares & Joe in J. Am. Stat. Assoc. 100:1009–1020, 2005, Psychometrika 71:713–732, 2006), to assess the overall fit of a one-parameter logistic model (1PL) estimated by (marginal) maximum likelihood (ML). R1 and R2 were specifically designed to target specific assumptions of Rasch models, whereas M2 is a general-purpose test statistic. We report asymptotic power rates under some interesting violations of model assumptions (different item discrimination, presence of guessing, and multidimensionality) as well as empirical rejection rates for correctly specified models and some misspecified models. All three statistics were found to be more powerful than Pearson's X2 against two- and three-parameter logistic alternatives (2PL and 3PL), and against multidimensional 1PL models. The results suggest that there is no clear advantage in using goodness-of-fit statistics specifically designed for Rasch-type models to test these models when marginal ML estimation is used.

Journal ArticleDOI
TL;DR: The potential relevance of the response times for psychological assessment is explored for the model of van der Linden (Psychometrika 72:287–308, 2007) that seems to have become the standard approach to response time modeling in educational testing.
Abstract: Findings suggest that in psychological tests not only the responses but also the times needed to give the responses are related to characteristics of the test taker. This observation has stimulated the development of latent trait models for the joint distribution of the responses and the response times. Such models are motivated by the hope to improve the estimation of the latent traits by additionally considering response time. In this article, the potential relevance of the response times for psychological assessment is explored for the model of van der Linden (Psychometrika 72:287–308, 2007) that seems to have become the standard approach to response time modeling in educational testing. It can be shown that the consideration of response times increases the information of the test. However, one also can prove that the contribution of the response times to the test information is bounded and has a simple limit.
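For orientation, the response-time part of van der Linden's hierarchical model is commonly written as a lognormal model (notation follows common usage and may differ from the article's):

$$\ln T_{ij} = \beta_j - \tau_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \alpha_j^{-2}),$$

with person speed τ_i, item time intensity β_j, and time discrimination α_j; the responses follow a standard IRT model, and the person parameters (θ_i, τ_i) are linked at a second level. The article's boundedness result concerns the extra test information contributed by the T_ij, which has a simple limit.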

Journal ArticleDOI
TL;DR: This article studies item response theory equating methods for complex linkage plans under the common-item nonequivalent group design, presenting an efficient way to average equating coefficients that link the same two forms through different paths.
Abstract: Linkage plans can be rather complex, including many forms, several links, and the connection of forms through different paths. This article studies item response theory equating methods for complex linkage plans when the common-item nonequivalent group design is used. An efficient way to average equating coefficients that link the same two forms through different paths is presented, and the asymptotic standard errors of indirect and average equating coefficients are derived. The methodology is illustrated using simulation studies and a real data example.
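A minimal sketch of the two operations involved: composing linear equating coefficients along a path, then averaging the coefficients from two paths that link the same pair of forms. The links and the equal weights are illustrative, not the paper's efficient weighting:

```python
# Minimal sketch: compose linear equating links y = A*x + B along a path,
# then average two paths linking the same forms. Links and equal weights
# are illustrative; the paper derives an efficient weighting.
def compose(path):
    A, B = 1.0, 0.0
    for a, b in path:          # apply each link in sequence
        A, B = a * A, a * B + b
    return A, B

path1 = [(1.02, -0.5), (0.98, 0.7)]   # X -> Y -> Z
path2 = [(1.05, -0.2), (0.96, 0.4)]   # X -> W -> Z
(A1, B1), (A2, B2) = compose(path1), compose(path2)
w = 0.5                                # equal weights, for illustration only
print("averaged coefficients:", w * A1 + (1 - w) * A2, w * B1 + (1 - w) * B2)
```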

Journal ArticleDOI
TL;DR: This paper introduces some of the normalization and calibration methods that have been proposed for making the BOLD signal a more accurate reflection of underlying brain activity for human fMRI studies.
Abstract: In functional magnetic resonance imaging (fMRI), the blood oxygenation level dependent (BOLD) signal is often interpreted as a measure of neural activity. However, because the BOLD signal reflects the complex interplay of neural, vascular, and metabolic processes, such an interpretation is not always valid. There is growing evidence that changes in the baseline neurovascular state can result in significant modulations of the BOLD signal that are independent of changes in neural activity. This paper introduces some of the normalization and calibration methods that have been proposed for making the BOLD signal a more accurate reflection of underlying brain activity for human fMRI studies.

Journal ArticleDOI
TL;DR: The results show that the invariance assumption might be violated by the empirical data even when the model’s fit is very good, and the proposed method may prove to be a promising tool to detect invariance violations of the BLIM.
Abstract: In knowledge space theory, the knowledge state of a student is the set of all problems he is capable of solving in a specific knowledge domain and a knowledge structure is the collection of knowledge states. The basic local independence model (BLIM) is a probabilistic model for knowledge structures. The BLIM assumes a probability distribution on the knowledge states and a lucky guess and a careless error probability for each problem. A key assumption of the BLIM is that the lucky guess and careless error probabilities do not depend on knowledge states (invariance assumption). This article proposes a method for testing the violations of this specific assumption. The proposed method was assessed in a simulation study and in an empirical application. The results show that (1) the invariance assumption might be violated by the empirical data even when the model’s fit is very good, and (2) the proposed method may prove to be a promising tool to detect invariance violations of the BLIM.
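A minimal sketch of the BLIM itself: the probability of a response pattern R as a mixture over knowledge states, with per-problem careless-error rates β and lucky-guess rates η (the structure and rates below are illustrative):

```python
# Minimal sketch: probability of a response pattern under the BLIM, mixing
# over knowledge states with careless-error rates beta and lucky-guess
# rates eta. The structure and rates are illustrative.
from itertools import product

structure = [set(), {0}, {0, 1}, {0, 1, 2}]  # knowledge states on 3 problems
p_state = [0.2, 0.3, 0.3, 0.2]               # distribution over states
beta = [0.1, 0.1, 0.1]                       # careless-error probabilities
eta = [0.15, 0.15, 0.15]                     # lucky-guess probabilities

def pattern_prob(R):
    total = 0.0
    for K, pK in zip(structure, p_state):
        prob = pK
        for q, r in enumerate(R):
            if q in K:   # mastered: correct unless a careless error occurs
                prob *= (1 - beta[q]) if r else beta[q]
            else:        # not mastered: correct only by a lucky guess
                prob *= eta[q] if r else (1 - eta[q])
        total += prob
    return total

print(sum(pattern_prob(R) for R in product([0, 1], repeat=3)))  # sums to 1.0
```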

Journal ArticleDOI
TL;DR: This study examined the dimensionality of a person’s word knowledge, termed lexical representation, and how aspects of morphological knowledge contributed to lexical representations for different persons, items, and item groups.
Abstract: This paper presents an explanatory multidimensional multilevel random item response model and its application to reading data with multilevel item structure. The model includes multilevel random item parameters that allow consideration of variability in item parameters at both item and item group levels. Item-level random item parameters were included to model unexplained variance remaining when item related covariates were used to explain variation in item difficulties. Item group-level random item parameters were included to model dependency in item responses among items having the same item stem. Using the model, this study examined the dimensionality of a person’s word knowledge, termed lexical representation, and how aspects of morphological knowledge contributed to lexical representations for different persons, items, and item groups.

Journal ArticleDOI
Kohei Adachi
TL;DR: This paper proves the following fact: the EM algorithm always gives a proper solution with positive unique variances and factor correlations with absolute values that do not exceed one, when the covariance matrix to be analyzed and the initial matrices including unique variances and inter-factor correlations are positive definite.
Abstract: Rubin and Thayer (Psychometrika, 47:69-76, 1982) proposed the EM algorithm for exploratory and confirmatory maximum likelihood factor analysis. In this paper, we prove the following fact: the EM algorithm always gives a proper solution with positive unique variances and factor correlations with absolute values that do not exceed one, when the covariance matrix to be analyzed and the initial matrices including unique variances and inter-factor correlations are positive definite. We further numerically demonstrate that the EM algorithm yields proper solutions for the data which lead the prevailing gradient algorithms for factor analysis to produce improper solutions. The numerical studies also show that, in real computations with limited numerical precision, Rubin and Thayer's (Psychometrika, 47:69-76, 1982) original formulas for confirmatory factor analysis can make factor correlation matrices asymmetric, so that the EM algorithm fails to converge. However, this problem can be overcome by using an EM algorithm in which the original formulas are replaced by those guaranteeing the symmetry of factor correlation matrices, or by formulas used to prove the above fact.
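A minimal sketch of the standard EM updates for exploratory maximum likelihood factor analysis, started from positive-definite inputs as the proof requires; this follows the textbook form of the algorithm, not Rubin and Thayer's exact formulas:

```python
# Minimal sketch: textbook EM for exploratory ML factor analysis of a sample
# covariance matrix S with k factors, from positive-definite starting values.
import numpy as np

def fa_em(S, k, n_iter=500):
    L = np.linalg.cholesky(S)[:, :k]   # loading start from a PD decomposition
    Psi = np.diag(S).copy()            # unique variances start positive
    for _ in range(n_iter):
        Sigma = L @ L.T + np.diag(Psi)
        beta = L.T @ np.linalg.inv(Sigma)                  # E-step regression
        Gamma = np.eye(k) - beta @ L + beta @ S @ beta.T   # aggregated E[zz']
        L = S @ beta.T @ np.linalg.inv(Gamma)              # M-step loadings
        Psi = np.diag(S - L @ beta @ S).copy()             # M-step uniquenesses
    return L, Psi

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))
S = np.cov(X, rowvar=False)
L, Psi = fa_em(S, k=2)
print("all unique variances positive:", bool(np.all(Psi > 0)))
```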

Journal ArticleDOI
TL;DR: The results of a synthetic example and a consumer psychology study involving categories of restaurant brands illustrate how applying the proposed methodology to the new sorting task can account for a variety of categorization phenomena, including multiple category memberships, and for heterogeneity through individual differences in the saliency of latent category structures.
Abstract: We introduce a new statistical procedure for the identification of unobserved categories that vary between individuals and in which objects may span multiple categories. This procedure can be used to analyze data from a proposed sorting task in which individuals may simultaneously assign objects to multiple piles. The results of a synthetic example and a consumer psychology study involving categories of restaurant brands illustrate how the application of the proposed methodology to the new sorting task can account for a variety of categorization phenomena including multiple category memberships and for heterogeneity through individual differences in the saliency of latent category structures.

Journal ArticleDOI
TL;DR: Some variants of the Candecomp/Parafac (CP) model are considered in which the orthogonality constraints are relaxed, either by constraining only a pair or a subset of components, or by stimulating the CP solution to be possibly orthogonal.
Abstract: The Candecomp/Parafac (CP) model is a well-known tool for summarizing a three-way array by extracting a limited number of components. Unfortunately, in some cases, the model suffers from the so-called degeneracy, that is a solution with diverging and uninterpretable components. To avoid degeneracy, orthogonality constraints are usually applied to one of the component matrices. This solves the problem only from a technical point of view because the existence of orthogonal components underlying the data is not guaranteed. For this purpose, we consider some variants of the CP model where the orthogonality constraints are relaxed either by constraining only a pair, or a subset, of components or by stimulating the CP solution to be possibly orthogonal. We theoretically clarify that only the latter approach, based on the least absolute shrinkage and selection operator and named the CP-Lasso, is helpful in solving the degeneracy problem. The results of the application of CP-Lasso on simulated and real life data show its effectiveness.

Journal ArticleDOI
TL;DR: In this paper, a multilevel latent transition analysis (LTA) with a mixture IRT measurement model (MixIRTM) is described for investigating the effectiveness of an intervention.
Abstract: A multilevel latent transition analysis (LTA) with a mixture IRT measurement model (MixIRTM) is described for investigating the effectiveness of an intervention. The addition of a MixIRTM to the multilevel LTA permits consideration of both potential heterogeneity in students’ response to instructional intervention as well as a methodology for assessing stage-sequential change over time at both student and teacher levels. Results from an LTA–MixIRTM and multilevel LTA–MixIRTM were compared in the context of an educational intervention study. Both models were able to describe homogeneities in problem solving and transition patterns. However, ignoring the multilevel structure in the LTA–MixIRTM led to different group membership assignments in the empirical results. Results for the multilevel LTA–MixIRTM indicated that there were distinct individual differences in the different transition patterns. The students receiving the intervention treatment outscored their business-as-usual (i.e., control group) counterparts on the curriculum-based Fractions Computation test. In addition, 27.4 % of the students in the sample moved from the low ability student-level latent class to the high ability student-level latent class. Students were characterized differently depending on the teacher-level latent class.