
Showing papers in "Psychometrika in 2018"


Journal ArticleDOI
TL;DR: A Bayesian framework for estimating the DINA Q matrix is proposed and applied to Tatsuoka’s fraction-subtraction dataset to support the accuracy of parameter recovery.
Abstract: Cognitive diagnosis models are partially ordered latent class models and are used to classify students into skill mastery profiles. The deterministic inputs, noisy “and” gate model (DINA) is a popular psychometric model for cognitive diagnosis. Application of the DINA model requires content expert knowledge of a Q matrix, which maps the attributes or skills needed to master a collection of items. Misspecification of Q has been shown to yield biased diagnostic classifications. We propose a Bayesian framework for estimating the DINA Q matrix. The developed algorithm builds upon prior research (Chen, Liu, Xu, & Ying, in J Am Stat Assoc 110(510):850–866, 2015) and ensures the estimated Q matrix is identified. Monte Carlo evidence is presented to support the accuracy of parameter recovery. The developed methodology is applied to Tatsuoka’s fraction-subtraction dataset.

71 citations
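The DINA item response function at the core of this work is compact enough to sketch. In this illustrative Python snippet (not the paper's Bayesian Q-matrix estimator), an examinee answers correctly with probability 1 - s_j when they master every skill the Q-matrix row requires, and with probability g_j otherwise; all parameter values below are hypothetical:

```python
import numpy as np

def dina_prob(alpha, q, slip, guess):
    """P(correct response) under the DINA model for one examinee-item pair.

    alpha: binary skill-mastery profile, shape (K,)
    q:     binary Q-matrix row for the item, shape (K,)
    slip, guess: item slip and guessing parameters
    """
    # eta = 1 iff the examinee masters every skill the item requires
    eta = bool(np.all(alpha >= q))
    return 1.0 - slip if eta else guess

# Hypothetical two-skill item requiring both skills
q = np.array([1, 1])
p_master = dina_prob(np.array([1, 1]), q, slip=0.1, guess=0.2)     # 1 - s = 0.9
p_nonmaster = dina_prob(np.array([1, 0]), q, slip=0.1, guess=0.2)  # g = 0.2
```

Misspecifying a single entry of q changes which profiles count as masters, which is why a wrong Q matrix biases classification.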


Journal ArticleDOI
TL;DR: Estimated standard errors are derived for the two-step estimates of the structural model which account for the uncertainty from both steps of the estimation, and how the method can be implemented in existing software for latent variable modelling is shown.
Abstract: We consider models which combine latent class measurement models for categorical latent variables with structural regression models for the relationships between the latent classes and observed explanatory and response variables. We propose a two-step method of estimating such models. In its first step, the measurement model is estimated alone, and in the second step the parameters of this measurement model are held fixed when the structural model is estimated. Simulation studies and applied examples suggest that the two-step method is an attractive alternative to existing one-step and three-step methods. We derive estimated standard errors for the two-step estimates of the structural model which account for the uncertainty from both steps of the estimation, and show how the method can be implemented in existing software for latent variable modelling.

64 citations


Journal ArticleDOI
TL;DR: A two-stage method that yields accurate item and person parameter estimates, as well as high true detection rate and low false detection rate, under different manipulated conditions mimicking NAEP parameters is proposed.
Abstract: Statistical methods for identifying aberrances on psychological and educational tests are pivotal to detect flaws in the design of a test or irregular behavior of test takers. Two approaches have been taken in the past to address the challenge of aberrant behavior detection: (1) modeling aberrant behavior via mixture modeling methods, and (2) flagging aberrant behavior via residual-based outlier detection methods. In this paper, we propose a two-stage method conceived of as a combination of both approaches. In the first stage, a mixture hierarchical model is fitted to the response and response time data to distinguish normal and aberrant behaviors using a Markov chain Monte Carlo (MCMC) algorithm. In the second stage, a further distinction between rapid guessing and cheating behavior is made at the person level using a Bayesian residual index. Simulation results show that the two-stage method yields accurate item and person parameter estimates, as well as a high true detection rate and a low false detection rate, under different manipulated conditions mimicking NAEP parameters. A real data example is given at the end to illustrate the potential application of the proposed method.

48 citations


Journal ArticleDOI
TL;DR: The Transition Diagnostic Classification Model is introduced, which combines latent transition analysis with the log-linear cognitive diagnosis model to provide methodology for analyzing growth in a general DCM framework; results indicate that the proposed model is flexible, provides accurate and reliable classifications, and is quite robust to violations of measurement invariance over time.
Abstract: A common assessment research design is the single-group pre-test/post-test design in which examinees are administered an assessment before instruction and then another assessment after instruction. In this type of study, the primary objective is to measure growth in examinees, individually and collectively. In an item response theory (IRT) framework, longitudinal IRT models can be used to assess growth in examinee ability over time. In a diagnostic classification model (DCM) framework, assessing growth translates to measuring changes in attribute mastery status over time, thereby providing a categorical, criterion-referenced interpretation of growth. This study introduces the Transition Diagnostic Classification Model (TDCM), which combines latent transition analysis with the log-linear cognitive diagnosis model to provide methodology for analyzing growth in a general DCM framework. Simulation study results indicate that the proposed model is flexible, provides accurate and reliable classifications, and is quite robust to violations of measurement invariance over time. The TDCM is used to analyze pre-test/post-test data from a diagnostic mathematics assessment.

36 citations


Journal ArticleDOI
TL;DR: An autoregressive GLMM with crossed random effects that accounts for variability in lag effects across persons and items is presented and is shown to be applicable to intensive binary time series eye-tracking data when researchers are interested in detecting experimental condition effects while controlling for previous responses.
Abstract: As a method to ascertain person and item effects in psycholinguistics, a generalized linear mixed effect model (GLMM) with crossed random effects has met limitations in handing serial dependence across persons and items. This paper presents an autoregressive GLMM with crossed random effects that accounts for variability in lag effects across persons and items. The model is shown to be applicable to intensive binary time series eye-tracking data when researchers are interested in detecting experimental condition effects while controlling for previous responses. In addition, a simulation study shows that ignoring lag effects can lead to biased estimates and underestimated standard errors for the experimental condition effects.

32 citations


Journal ArticleDOI
TL;DR: A general nonparametric classification method that assigns examinees to the correct proficiency classes at a high rate when sample sizes are at the classroom level is proposed as an extension of the nonparametric classification (NPC) method.
Abstract: The focus of cognitive diagnosis (CD) is on evaluating an examinee’s strengths and weaknesses in terms of cognitive skills learned and skills that need study. Current methods for fitting CD models (CDMs) work well for large-scale assessments, where the data of hundreds or thousands of examinees are available. However, the development of CD-based assessment tools that can be used in small-scale test settings, say, for monitoring the instruction and learning process at the classroom level, has not kept up with the rapid pace at which research and development proceeded for large-scale assessments. The main reason is that the sample sizes of small-scale test settings are simply too small to guarantee the reliable estimation of item parameters and examinees’ proficiency class membership. In this article, a general nonparametric classification (GNPC) method that allows for assigning examinees to the correct proficiency classes at a high rate when sample sizes are at the classroom level is proposed as an extension of the nonparametric classification (NPC) method (Chiu and Douglas in J Classif 30:225–250, 2013). The proposed method remedies the shortcomings of the NPC method and can accommodate any CDM. The theoretical justification and the empirical studies are presented based on the saturated general CDMs, supporting the legitimacy of using the GNPC method with any CDM. The results from the simulation studies and real data analysis show that the GNPC method outperforms the general CDMs when samples are small.

28 citations
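The NPC method that the GNPC extends can be sketched in a few lines: assign each examinee to the attribute profile whose ideal response pattern lies closest in Hamming distance to the observed responses. The sketch below uses DINA-type ideal responses and a hypothetical Q matrix for illustration only:

```python
import numpy as np
from itertools import product

def npc_classify(responses, Q):
    """Assign an examinee to the attribute profile whose ideal (DINA-type)
    response pattern has minimal Hamming distance to the observed responses."""
    J, K = Q.shape
    best_profile, best_dist = None, None
    for alpha in product([0, 1], repeat=K):
        alpha = np.array(alpha)
        # Ideal response to item j: 1 iff all attributes required by row j are mastered
        ideal = np.all(alpha >= Q, axis=1).astype(int)
        dist = int(np.sum(np.abs(responses - ideal)))
        if best_dist is None or dist < best_dist:
            best_profile, best_dist = alpha, dist
    return best_profile

# Hypothetical Q matrix: items measuring attribute 1, attribute 2, and both
Q = np.array([[1, 0], [0, 1], [1, 1]])
# A response vector consistent with mastering only attribute 1
profile = npc_classify(np.array([1, 0, 0]), Q)
```

Because no item parameters are estimated, the procedure remains usable at classroom-level sample sizes, which is the setting the GNPC generalizes to arbitrary CDMs.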


Journal ArticleDOI
TL;DR: The method of response mixture modeling is presented to account for within-subject heterogeneity in the item characteristics across response times and is shown to be viable in terms of parameter recovery.
Abstract: In item response theory modeling of responses and response times, it is commonly assumed that the item responses have the same characteristics across the response times. However, heterogeneity might arise in the data if subjects resort to different response processes when solving the test items. These differences may be within-subject effects, that is, a subject might use a certain process on some of the items and a different process with different item characteristics on the other items. If the probability of using one process over the other process depends on the subject's response time, within-subject heterogeneity of the item characteristics across the response times arises. In this paper, the method of response mixture modeling is presented to account for such heterogeneity. Contrary to traditional mixture modeling where the full response vectors are classified, response mixture modeling involves classification of the individual elements in the response vector. In a simulation study, the response mixture model is shown to be viable in terms of parameter recovery. In addition, the response mixture model is applied to a real dataset to illustrate its use in investigating within-subject heterogeneity in the item characteristics across response times.

26 citations


Journal ArticleDOI
TL;DR: It is shown that choice overload varies substantially as a function of the six dependent measures and four moderators examined in the domain and that there are potentially interesting and theoretically important interactions among them.
Abstract: We introduce multilevel multivariate meta-analysis methodology designed to account for the complexity of contemporary psychological research data. Our methodology directly models the observations from a set of studies in a manner that accounts for the variation and covariation induced by the facts that observations differ in their dependent measures and moderators and are nested within, for example, papers, studies, groups of subjects, and study conditions. Our methodology is motivated by data from papers and studies of the choice overload hypothesis. It more fully accounts for the complexity of choice overload data relative to two prior meta-analyses and thus provides richer insight. In particular, it shows that choice overload varies substantially as a function of the six dependent measures and four moderators examined in the domain and that there are potentially interesting and theoretically important interactions among them. It also shows that the various dependent measures have differing levels of variation and that levels up to and including the highest (i.e., the fifth, or paper, level) are necessary to capture the variation and covariation induced by the nesting structure. Our results have substantial implications for future studies of choice overload.

25 citations


Journal ArticleDOI
TL;DR: The potential of using a cognitive model for decision making, the Markov decision process, to provide a mapping between within-task actions and latent traits of interest is explored and estimates from the model are found to correlate more strongly with posttest results than a partial-credit IRT model based on outcome data alone.
Abstract: Within-task actions can provide additional information on student competencies but are challenging to model. This paper explores the potential of using a cognitive model for decision making, the Markov decision process, to provide a mapping between within-task actions and latent traits of interest. Psychometric properties of the model are explored, and simulation studies report on parameter recovery within the context of a simple strategy game. The model is then applied to empirical data from an educational game. Estimates from the model are found to correlate more strongly with posttest results than a partial-credit IRT model based on outcome data alone.

23 citations


Journal ArticleDOI
TL;DR: A generalization of the speed–accuracy response model (SARM) introduced by Maris and van der Maas showed improved fit compared to the one-parameter SARM in both data sets, and an expectation–maximization (EM) algorithm for estimating model parameters and standard errors was developed.
Abstract: We propose a generalization of the speed–accuracy response model (SARM) introduced by Maris and van der Maas (Psychometrika 77:615–633, 2012). In these models, the scores that result from a scoring rule that incorporates both the speed and accuracy of item responses are modeled. Our generalization is similar to that of the one-parameter logistic (or Rasch) model to the two-parameter logistic (or Birnbaum) model in item response theory. An expectation–maximization (EM) algorithm for estimating model parameters and standard errors was developed. Furthermore, methods to assess model fit are provided in the form of generalized residuals for item score functions and saddlepoint approximations to the density of the sum score. The presented methods were evaluated in a small simulation study, the results of which indicated good parameter recovery and reasonable type I error rates for the residuals. Finally, the methods were applied to two real data sets. It was found that the two-parameter SARM showed improved fit compared to the one-parameter SARM in both data sets.

22 citations


Journal ArticleDOI
TL;DR: The utility of a novel regime-switching differential equation model in representing children’s tendency to exhibit shifts between the goal of staying close to their mothers and intermittent interest in moving away from their mothers to explore the room during the SSP is illustrated.
Abstract: A growing number of social scientists have turned to differential equations as a tool for capturing the dynamic interdependence among a system of variables. Current tools for fitting differential equation models do not provide a straightforward mechanism for diagnosing evidence for qualitative shifts in dynamics, nor do they provide ways of identifying the timing and possible determinants of such shifts. In this paper, we discuss regime-switching differential equation models, a novel modeling framework for representing abrupt changes in a system of differential equation models. Estimation was performed by combining the Kim filter (Kim and Nelson State-space models with regime switching: classical and Gibbs-sampling approaches with applications, MIT Press, Cambridge, 1999) and a numerical differential equation solver that can handle both ordinary and stochastic differential equations. The proposed approach was motivated by the need to represent discrete shifts in the movement dynamics of [Formula: see text] mother-infant dyads during the Strange Situation Procedure (SSP), a behavioral assessment where the infant is separated from and reunited with the mother twice. We illustrate the utility of a novel regime-switching differential equation model in representing children's tendency to exhibit shifts between the goal of staying close to their mothers and intermittent interest in moving away from their mothers to explore the room during the SSP. Results from empirical model fitting were supplemented with a Monte Carlo simulation study to evaluate the use of information criterion measures to diagnose sudden shifts in dynamics.

Journal ArticleDOI
TL;DR: The proposed statistics offer an indication of which actors are most distinctive in the network structure, in terms of not abiding by the structural norms present across other actors.
Abstract: We discuss measuring and detecting influential observations and outliers in the context of exponential family random graph (ERG) models for social networks. We focus on the level of the nodes of the network and consider those nodes whose removal would result in changes to the model as extreme or "central" with respect to the structural features that "matter". We construe removal in terms of two case-deletion strategies: the tie-variables of an actor are assumed to be unobserved, or the node is removed resulting in the induced subgraph. We define the difference in inferred model resulting from case deletion from the perspective of information theory and difference in estimates, in both the natural and mean-value parameterisation, representing varying degrees of approximation. We arrive at several measures of influence and propose the use of two that do not require refitting of the model and lend themselves to routine application in the ERGM fitting procedure. MCMC p values are obtained for testing how extreme each node is with respect to the network structure. The influence measures are applied to two well-known data sets to illustrate the information they provide. From a network perspective, the proposed statistics offer an indication of which actors are most distinctive in the network structure, in terms of not abiding by the structural norms present across other actors.

Journal ArticleDOI
TL;DR: This article demonstrates that the S–L loadings matrix is necessarily rank deficient, and shows how this feature of the S—L transformation can be used to obtain a direct S-L solution from an unrotated first-level factor structure.
Abstract: The Schmid-Leiman (S-L; Psychometrika 22: 53-61, 1957) transformation is a popular method for conducting exploratory bifactor analysis that has been used in hundreds of studies of individual differences variables. To perform a two-level S-L transformation, it is generally believed that two separate factor analyses are required: a first-level analysis in which k obliquely rotated factors are extracted from an observed-variable correlation matrix, and a second-level analysis in which a general factor is extracted from the correlations of the first-level factors. In this article, I demonstrate that the S-L loadings matrix is necessarily rank deficient. I then show how this feature of the S-L transformation can be used to obtain a direct S-L solution from an unrotated first-level factor structure. Next, I reanalyze two examples from Mansolf and Reise (Multivar Behav Res 51: 698-717, 2016) to illustrate the utility of 'best-fitting' S-L rotations when gauging the ability of hierarchical factor models to recover known bifactor structures. Finally, I show how to compute direct bifactor solutions for non-hierarchical bifactor structures. An online supplement includes R code to reproduce all of the analyses that are reported in the article.
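The two-level S-L computation itself is easy to illustrate: with first-order loadings Λ and second-order (general-factor) loadings γ, the general column of the S-L pattern is Λγ and the group columns are Λ diag(√(1 − γ²)), so all k + 1 columns are linear combinations of the k first-order columns. The loadings below are hypothetical; the snippet only demonstrates the rank deficiency the article exploits:

```python
import numpy as np

# Hypothetical first-order loadings: 6 variables, 2 oblique factors
L1 = np.array([[.7, 0], [.6, 0], [.5, 0],
               [0, .7], [0, .6], [0, .5]])
gamma = np.array([.8, .6])  # second-order loadings of the 2 factors on g

general = L1 @ gamma                  # loadings on the general factor
group = L1 * np.sqrt(1 - gamma**2)    # residualized group-factor loadings
SL = np.column_stack([general, group])  # 6 x 3 Schmid-Leiman pattern

# The k+1 = 3 columns are linear functions of the k = 2 first-order
# columns, so the S-L pattern has rank k, not k+1:
rank = np.linalg.matrix_rank(SL)
```

This rank-k structure is what makes a direct S-L solution from the unrotated first-level structure possible.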

Journal ArticleDOI
TL;DR: Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models, 2nd edition.
Abstract: Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models, 2nd edition.

Journal ArticleDOI
TL;DR: The potential for asymmetric IRT models to inform empirically about underlying item complexity, and thus the potential value of asymmetric models as tools for item validation, is studied.
Abstract: While item complexity is often considered as an item feature in test development, it is much less frequently attended to in the psychometric modeling of test items. Prior work suggests that item complexity may manifest through asymmetry in item characteristics curves (ICCs; Samejima in Psychometrika 65:319-335, 2000). In the current paper, we study the potential for asymmetric IRT models to inform empirically about underlying item complexity, and thus the potential value of asymmetric models as tools for item validation. Both simulation and real data studies are presented. Some psychometric consequences of ignoring asymmetry, as well as potential strategies for more effective estimation of asymmetry, are considered in discussion.

Journal ArticleDOI
TL;DR: Most of the ethnic differences in anxiety or depression between Hispanic and non-Hispanic white cancer survivors were explained by Hispanic patients' younger age at diagnosis, lower education level, lower employment rate, lower likelihood of being born in the USA, less insurance coverage, and less social support.
Abstract: Mediation analysis allows the examination of effects of a third variable (mediator/confounder) in the causal pathway between an exposure and an outcome. The general multiple mediation analysis method (MMA), proposed by Yu et al., improves traditional methods (e.g., estimation of natural and controlled direct effects) to enable consideration of multiple mediators/confounders simultaneously and the use of linear and nonlinear predictive models for estimating mediation/confounding effects. Previous studies find that compared with non-Hispanic cancer survivors, Hispanic survivors are more likely to endure anxiety and depression after cancer diagnoses. In this paper, we applied MMA to the MY-Health study to identify mediators/confounders and quantify the indirect effect of each identified mediator/confounder in explaining ethnic disparities in anxiety and depression among cancer survivors who enrolled in the study. We considered a number of socio-demographic variables, tumor characteristics, and treatment factors as potential mediators/confounders and found that most of the ethnic differences in anxiety or depression between Hispanic and non-Hispanic white cancer survivors were explained by younger age at diagnosis, lower education level, lower employment rate, lower likelihood of being born in the USA, less insurance coverage, and less social support among Hispanic patients.

Journal ArticleDOI
TL;DR: The proposed approach uses deep learning to implement probabilistic language models, not unlike those Google Brain and Amazon Alexa use for language processing and generation, to address the problem of automated item generation.
Abstract: Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource, and testing agencies incur high costs in the process of continuous renewal of item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language-free intelligence tests (e.g., Raven's Progressive Matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike those Google Brain and Amazon Alexa use for language processing and generation.

Journal ArticleDOI
TL;DR: Simulation results with an operational item pool indicate that, compared to the analysis of responses alone, utilizing response times can afford marked improvements in detection power with fewer false positives.
Abstract: Item compromise persists in undermining the integrity of testing, even secure administrations of computerized adaptive testing (CAT) with sophisticated item exposure controls. In ongoing efforts to tackle this perennial security issue in CAT, a couple of recent studies investigated sequential procedures for detecting compromised items, in which a significant increase in the proportion of correct responses for each item in the pool is monitored in real time using moving averages. In addition to actual responses, response times are valuable information with tremendous potential to reveal items that may have been leaked. Specifically, examinees that have preknowledge of an item would likely respond more quickly to it than those who do not. Therefore, the current study proposes several augmented methods for the detection of compromised items, all involving simultaneous monitoring of changes in both the proportion correct and average response time for every item using various moving average strategies. Simulation results with an operational item pool indicate that, compared to the analysis of responses alone, utilizing response times can afford marked improvements in detection power with fewer false positives.
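A simplified sketch of the underlying idea (window size and thresholds below are illustrative choices, not the paper's sequential procedure): maintain per-item moving averages of correctness and response time, and flag the item when the proportion correct rises while the mean response time falls relative to its baseline:

```python
from collections import deque

def make_monitor(window=30, p_delta=0.15, rt_delta=0.20):
    """Return an update function that flags a possibly compromised item when,
    relative to baseline, the windowed proportion correct rises by at least
    p_delta AND the windowed mean response time drops by at least the
    fraction rt_delta (both thresholds are illustrative)."""
    correct = deque(maxlen=window)
    times = deque(maxlen=window)

    def update(is_correct, rt, base_p, base_rt):
        correct.append(is_correct)
        times.append(rt)
        if len(correct) < window:
            return False  # wait until the moving window fills
        p_hat = sum(correct) / window
        rt_hat = sum(times) / window
        return (p_hat - base_p >= p_delta
                and base_rt - rt_hat >= base_rt * rt_delta)

    return update

# Stream of suspiciously fast, uniformly correct responses
m = make_monitor(window=30)
flags = [m(1, 30.0, base_p=0.5, base_rt=60.0) for _ in range(30)]
```

Monitoring both statistics jointly is what reduces false positives: genuine ability drift can raise the proportion correct, but preknowledge also shortens response times.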

Journal ArticleDOI
TL;DR: This paper focuses on a family of recently proposed tests based on stochastic processes of casewise derivatives of the likelihood function, which have been previously applied in factor-analytic, continuous data contexts as well as in models of the Rasch family and aim to extend these tests to two-parameter item response models, with strong emphasis on pairwise maximum likelihood.
Abstract: Measurement invariance is a fundamental assumption in item response theory models, where the relationship between a latent construct (ability) and observed item responses is of interest. Violation of this assumption would render the scale misinterpreted or cause systematic bias against certain groups of persons. While a number of methods have been proposed to detect measurement invariance violations, they typically require advance definition of problematic item parameters and respondent grouping information. However, these pieces of information are typically unknown in practice. As an alternative, this paper focuses on a family of recently proposed tests based on stochastic processes of casewise derivatives of the likelihood function (i.e., scores). These score-based tests only require estimation of the null model (when measurement invariance is assumed to hold), and they have been previously applied in factor-analytic, continuous data contexts as well as in models of the Rasch family. In this paper, we aim to extend these tests to two-parameter item response models, with strong emphasis on pairwise maximum likelihood. The tests' theoretical background and implementation are detailed, and the tests' abilities to identify problematic item parameters are studied via simulation. An empirical example illustrating the tests' use in practice is also provided.

Journal ArticleDOI
TL;DR: In this article, an integrated approach by adding a graphical component to a multidimensional IRT model that can offset the effect of unknown local dependence was proposed to provide more robust measurements.
Abstract: Item response theory (IRT) plays an important role in psychological and educational measurement. Unlike the classical testing theory, IRT models aggregate the item level information, yielding more accurate measurements. Most IRT models assume local independence, an assumption not likely to be satisfied in practice, especially when the number of items is large. Results in the literature and simulation studies in this paper reveal that misspecifying the local independence assumption may result in inaccurate measurements and differential item functioning. To provide more robust measurements, we propose an integrated approach by adding a graphical component to a multidimensional IRT model that can offset the effect of unknown local dependence. The new model contains a confirmatory latent variable component, which measures the targeted latent traits, and a graphical component, which captures the local dependence. An efficient proximal algorithm is proposed for the parameter estimation and structure learning of the local dependence. This approach can substantially improve the measurement, given no prior information on the local dependence structure. The model can be applied to measure both a unidimensional latent trait and multidimensional latent traits.

Journal ArticleDOI
TL;DR: It is argued that psychometric properties of change scores, such as reliability and measurement precision, should be treated at suitable levels within a multilevel framework and shown that, if examined at the suitable levels with such a framework, the negative beliefs about change scores can be renounced convincingly.
Abstract: Change scores obtained in pretest-posttest designs are important for evaluating treatment effectiveness and for assessing change of individual test scores in psychological research. However, over the years the use of change scores has raised much controversy. In this article, from a multilevel perspective, we provide a structured treatise on several persistent negative beliefs about change scores and show that these beliefs originated from the confounding of the effects of within-person change on change-score reliability and between-person change differences. We argue that psychometric properties of change scores, such as reliability and measurement precision, should be treated at suitable levels within a multilevel framework. We show that, if examined at the suitable levels with such a framework, the negative beliefs about change scores can be renounced convincingly. Finally, we summarize the conclusions about change scores to dispel the myths and to promote the potential and practical usefulness of change scores.
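For context, the classical single-level reliability of the difference score $D = Y - X$, around which the negative beliefs revolve, is (standard notation; this is not the article's multilevel treatment):

```latex
\rho_{DD'} \;=\; \frac{\sigma_X^2\,\rho_{XX'} \;+\; \sigma_Y^2\,\rho_{YY'} \;-\; 2\,\sigma_X \sigma_Y \rho_{XY}}
{\sigma_X^2 \;+\; \sigma_Y^2 \;-\; 2\,\sigma_X \sigma_Y \rho_{XY}}
```

When pretest and posttest correlate highly ($\rho_{XY}$ large) and true change is small, this quantity can be low even for reliable tests, which is the single-level observation the article reexamines at the appropriate levels of a multilevel framework.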

Journal ArticleDOI
TL;DR: This article proposes a resampling-based method, namely bootstrap calibration (BC), to reduce the impact of the carryover sampling error on the interval estimates of LV scores, and investigates the finite-sample performance of BC via Monte Carlo simulations and applies it to two empirical data examples.
Abstract: In most item response theory applications, model parameters need to be first calibrated from sample data. Latent variable (LV) scores calculated using estimated parameters are thus subject to sampling error inherited from the calibration stage. In this article, we propose a resampling-based method, namely bootstrap calibration (BC), to reduce the impact of the carryover sampling error on the interval estimates of LV scores. BC modifies the quantile of the plug-in posterior, i.e., the posterior distribution of the LV evaluated at the estimated model parameters, to better match the corresponding quantile of the true posterior, i.e., the posterior distribution evaluated at the true model parameters, over repeated sampling of calibration data. Furthermore, to achieve better coverage of the fixed true LV score, we explore the use of BC in conjunction with Jeffreys' prior. We investigate the finite-sample performance of BC via Monte Carlo simulations and apply it to two empirical data examples.

Journal ArticleDOI
TL;DR: Generalized processing tree (GPT) models are introduced; identifiability, parameter estimation, model testing, a modeling syntax, and the improved precision of GPT estimates are discussed, and a GPT version of the feature comparison model of semantic categorization is applied to computer-mouse trajectories.
Abstract: Multinomial processing tree models assume that discrete cognitive states determine observed response frequencies. Generalized processing tree (GPT) models extend this conceptual framework to continuous variables such as response times, process-tracing measures, or neurophysiological variables. GPT models assume finite-mixture distributions, with weights determined by a processing tree structure, and continuous components modeled by parameterized distributions such as Gaussians with separate or shared parameters across states. We discuss identifiability, parameter estimation, model testing, a modeling syntax, and the improved precision of GPT estimates. Finally, a GPT version of the feature comparison model of semantic categorization is applied to computer-mouse trajectories.

Journal ArticleDOI
TL;DR: In this article, the problem of penalized maximum likelihood (PML) for an exploratory factor analysis (EFA) model is studied and an approximation to PML is proposed.
Abstract: The problem of penalized maximum likelihood (PML) for an exploratory factor analysis (EFA) model is studied in this paper. An EFA model is typically estimated using maximum likelihood and then the estimated loading matrix is rotated to obtain a sparse representation. Penalized maximum likelihood simultaneously fits the EFA model and produces a sparse loading matrix. To overcome some of the computational drawbacks of PML, an approximation to PML is proposed in this paper. It is further applied to an empirical dataset for illustration. A simulation study shows that the approximation naturally produces a sparse loading matrix and more accurately estimates the factor loadings and the covariance matrix, in the sense of having a lower mean squared error than factor rotations, under various conditions.
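A minimal sketch of the penalized objective (illustrative loadings and penalty weight, not the paper's approximation algorithm) also shows why the lasso term removes the need for a rotation step: a sparse loading matrix and an orthogonal rotation of it fit the covariance identically, but only the sparse one is cheap under the penalty.

```python
import numpy as np

def pml_objective(Lambda, psi, S, gamma):
    """ML discrepancy log|Sigma| + tr(S Sigma^{-1}) for the factor model
    Sigma = Lambda Lambda' + diag(psi), plus a lasso penalty on loadings."""
    Sigma = Lambda @ Lambda.T + np.diag(psi)
    _, logdet = np.linalg.slogdet(Sigma)
    fit = logdet + np.trace(S @ np.linalg.inv(Sigma))
    return fit + gamma * np.abs(Lambda).sum()

# Sparse truth vs. a 45-degree rotation of it: same Sigma, different penalty.
L = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]])
psi = np.full(4, 0.4)
S = L @ L.T + np.diag(psi)          # population covariance: fit term is exact
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
R = np.array([[c, -s], [s, c]])     # orthogonal rotation matrix
f_sparse = pml_objective(L, psi, S, gamma=0.1)
f_rotated = pml_objective(L @ R, psi, S, gamma=0.1)
```

Minimizing this objective therefore selects the sparse representative of the rotation orbit directly, which is the sense in which PML "simultaneously fits the EFA model and produces a sparse loading matrix."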

Journal ArticleDOI
TL;DR: In this paper, a modified version of the asymptotically distribution-free (ADF) test statistic is proposed to deal with the possible ill-conditioning of the involved large-scale covariance matrices.
Abstract: Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62–83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.
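To see the ill-conditioning problem concretely: with p observed variables the ADF weight matrix involves fourth-order moments of dimension p(p+1)/2, so at realistic sample sizes its sample version can be near-singular. The sketch below mimics that with a rank-deficient moment matrix and applies a generic ridge-style stabilization; this is illustrative only and not the authors' exact modified statistic.

```python
import numpy as np

rng = np.random.default_rng(17)

# Fewer "observations" than dimensions => a singular sample moment matrix,
# as happens for the ADF weight matrix when n is small relative to p(p+1)/2.
n, p = 15, 20
A = rng.normal(size=(n, p))
Gamma_hat = A.T @ A / n              # rank <= 15 < 20, hence not invertible

# Generic stabilization: shrink toward a scaled identity before inverting.
c = 0.1 * np.trace(Gamma_hat) / p
Gamma_reg = Gamma_hat + c * np.eye(p)

cond_raw = np.linalg.cond(Gamma_hat)
cond_reg = np.linalg.cond(Gamma_reg)
```

The regularized matrix has a bounded condition number, so the quadratic form in the test statistic can be computed stably; the paper's contribution is a principled version of such a correction with the right asymptotic behavior.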

Journal ArticleDOI
TL;DR: Results indicate that the new statistics are more accurate effect size estimates of marginal response bias than the SIBTEST family, are competitive with a selection of likelihood-based methods when studying item-level bias, and perform best when studying differential bundle and test bias.
Abstract: This paper proposes a model-based family of detection and quantification statistics to evaluate response bias in item bundles of any size. Compensatory (CDRF) and non-compensatory (NCDRF) response bias measures are proposed, along with their sample realizations and large-sample variability when models are fitted using multiple-group estimation. Based on the underlying connection to item response theory estimation methodology, it is argued that these new statistics provide a powerful and flexible approach to studying response bias for categorical response data over and above methods that have previously appeared in the literature. To evaluate their practical utility, CDRF and NCDRF are compared to the closely related SIBTEST family of statistics and likelihood-based detection methods through a series of Monte Carlo simulations. Results indicate that the new statistics are more accurate effect size estimates of marginal response bias than the SIBTEST family, are competitive with a selection of likelihood-based methods when studying item-level bias, and perform best when studying differential bundle and test bias.

Journal ArticleDOI
TL;DR: A Bayesian method for testing the axioms of additive conjoint measurement is proposed, based on an importance sampling algorithm that performs likelihood-free, approximate Bayesian inference using a synthetic likelihood to overcome the analytical intractability of this testing problem.
Abstract: This article introduces a Bayesian method for testing the axioms of additive conjoint measurement. The method is based on an importance sampling algorithm that performs likelihood-free, approximate Bayesian inference using a synthetic likelihood to overcome the analytical intractability of this testing problem. This new method improves upon previous methods because it provides an omnibus test of the entire hierarchy of cancellation axioms, beyond double cancellation. It does so while accounting for the posterior uncertainty that is inherent in the empirical orderings that are implied by these axioms, together. The new method is illustrated through a test of the cancellation axioms on a classic survey data set, and through the analysis of simulated data.
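The synthetic-likelihood device itself is simple to sketch: simulate the summary statistic many times under a candidate parameter, fit a normal distribution to the simulated summaries, and score the observed summary under that normal. The toy model below (a normal mean with a sample-mean summary) is a stand-in for the paper's axiom-ordering summaries; all names are hypothetical.

```python
import numpy as np

def synthetic_loglik(theta, s_obs, simulate, rng, n_sims=200):
    """Gaussian synthetic log-likelihood: simulate the summary under theta,
    fit a normal to the simulated summaries, evaluate the observed one."""
    sims = np.array([simulate(theta, rng) for _ in range(n_sims)])
    mu, sd = sims.mean(), sims.std(ddof=1)
    return -0.5 * np.log(2 * np.pi * sd ** 2) - 0.5 * ((s_obs - mu) / sd) ** 2

rng = np.random.default_rng(3)
simulate = lambda theta, rng: rng.normal(theta, 1.0, size=50).mean()
s_obs = simulate(1.0, rng)                        # "observed" summary at theta = 1
ll_near = synthetic_loglik(1.0, s_obs, simulate, rng)  # candidate near truth
ll_far = synthetic_loglik(3.0, s_obs, simulate, rng)   # candidate far from truth
```

Embedding such evaluations in an importance sampler yields the likelihood-free posterior inference the abstract describes, without ever writing down the intractable exact likelihood.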

Journal ArticleDOI
TL;DR: The user-friendly package BayesianPGMM for R is developed to facilitate the adoption of this methodology in practice, and an application to mouse-tracking data for a visual recognition task is described.
Abstract: Piecewise growth mixture models (PGMMs) are a flexible and useful class of methods for analyzing segmented trends in individual growth trajectories over time, where the individuals come from a mixture of two or more latent classes. These models allow each segment of the overall developmental process within each class to have a different functional form; examples include two linear phases of growth, or a quadratic phase followed by a linear phase. The changepoint (knot) is the time of transition from one developmental phase (segment) to another. Inferring the location of the changepoint(s) is often of practical interest, along with inference for other model parameters. A random changepoint allows for individual differences in the transition time within each class. The primary objectives of our study are as follows: (1) to develop a PGMM using a Bayesian inference approach that allows the estimation of multiple random changepoints within each class; (2) to develop a procedure to empirically detect the number of random changepoints within each class; and (3) to empirically investigate the bias and precision of the estimation of the model parameters, including the random changepoints, via a simulation study. We have developed the user-friendly package BayesianPGMM for R to facilitate the adoption of this methodology in practice, which is available at https://github.com/lockEF/BayesianPGMM . We describe an application to mouse-tracking data for a visual recognition task.
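The building block of such models is a segmented trajectory that stays continuous at the knot. A minimal sketch (made-up coefficients, one hypothetical latent class with individual variation around a mean knot, i.e., the "random changepoint"):

```python
import numpy as np

def piecewise(t, b0, b1, b2, tau):
    """Linear-linear growth: intercept b0, slope b1 before the knot tau,
    slope b1 + b2 after it; continuous at the changepoint."""
    return b0 + b1 * t + b2 * np.maximum(t - tau, 0.0)

rng = np.random.default_rng(5)
t = np.linspace(0.0, 10.0, 11)
# Hypothetical class: mean knot at time 4 with person-specific variation.
trajectories = np.array([
    piecewise(t, b0=1.0, b1=0.5, b2=-0.3, tau=rng.normal(4.0, 0.5))
    for _ in range(5)])
```

A full PGMM would add class membership probabilities, residual noise, and priors over all parameters; the Bayesian machinery then infers the knot locations alongside the growth coefficients.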

Journal ArticleDOI
TL;DR: A flexible quantile-based imputation model is proposed for distributions defined over singly or doubly bounded intervals; it handles skewness, bimodality, and heteroscedasticity and has superior properties compared to competing approaches, such as log-normal imputation and predictive mean matching.
Abstract: Missing data are a common issue in statistical analyses. Multiple imputation is a technique that has been applied in countless research studies and has a strong theoretical basis. Most of the statistical literature on multiple imputation has focused on unbounded continuous variables, with mostly ad hoc remedies for variables with bounded support. These approaches can be unsatisfactory when applied to bounded variables as they can produce misleading inferences. In this paper, we propose a flexible quantile-based imputation model suitable for distributions defined over singly or doubly bounded intervals. Proper support of the imputed values is ensured by applying a family of transformations with singly or doubly bounded range. Simulation studies demonstrate that our method is able to deal with skewness, bimodality, and heteroscedasticity and has superior properties as compared to competing approaches, such as log-normal imputation and predictive mean matching. We demonstrate the application of the proposed imputation procedure by analysing data on mathematical development scores in children from the Millennium Cohort Study, UK. We also show a specific advantage of our methods using a small psychiatric dataset. Our methods are relevant in a number of fields, including education and psychology.
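The transformation trick at the heart of the approach can be sketched in a few lines: map the bounded variable to the real line, impute there, and map back, so imputed values automatically respect the support. A plain normal draw stands in below for the paper's quantile-regression imputation model; data and names are illustrative.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(11)
y = rng.beta(2.0, 5.0, size=100)      # doubly bounded outcome on (0, 1)
miss = rng.random(100) < 0.2          # missing-at-random indicator

# Impute on the transformed (unbounded) scale, then back-transform.
z_obs = logit(y[~miss])
draws = rng.normal(z_obs.mean(), z_obs.std(ddof=1), size=int(miss.sum()))
y_imp = y.copy()
y_imp[miss] = inv_logit(draws)
```

Ad hoc alternatives such as truncating out-of-range normal imputations distort the tails; working on the transformed scale avoids that, which is the property the proposed quantile-based model builds on.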

Journal ArticleDOI
TL;DR: Three automatic Bayes factors for testing equality- and inequality-constrained hypotheses on variances are presented, including a Bayes factor with equal priors on all variances in which the priors are specified automatically using a small share of the information in the sample data.
Abstract: In comparing characteristics of independent populations, researchers frequently expect a certain structure of the population variances. These expectations can be formulated as hypotheses with equality and/or inequality constraints on the variances. In this article, we consider the Bayes factor for testing such (in)equality-constrained hypotheses on variances. Application of Bayes factors requires specification of a prior under every hypothesis to be tested. However, specifying subjective priors for variances based on prior information is a difficult task. We therefore consider so-called automatic or default Bayes factors. These methods avoid the need for the user to specify priors by using information from the sample data. We present three automatic Bayes factors for testing variances. The first is a Bayes factor with equal priors on all variances, where the priors are specified automatically using a small share of the information in the sample data. The second is the fractional Bayes factor, where a fraction of the likelihood is used for automatic prior specification. The third is an adjustment of the fractional Bayes factor such that the parsimony of inequality-constrained hypotheses is properly taken into account. The Bayes factors are evaluated by investigating different properties such as information consistency and large sample consistency. Based on this evaluation, it is concluded that the adjusted fractional Bayes factor can generally be recommended for testing equality- and inequality-constrained hypotheses on variances.
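The basic ingredient for weighing an inequality-constrained hypothesis can be sketched with posterior simulation (a simplified stand-in for the paper's fractional Bayes factor machinery; data, sample sizes, and the reference prior are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(13)
x1 = rng.normal(0.0, 1.0, size=50)    # group 1: population SD 1 (hypothetical)
x2 = rng.normal(0.0, 2.0, size=50)    # group 2: population SD 2 (hypothetical)

def posterior_var_draws(x, n_draws=5000):
    """Posterior draws of sigma^2 under the reference prior 1/sigma^2: a
    scaled inverse-chi-square with n - 1 df and scale sum((x - xbar)^2)."""
    ss = ((x - x.mean()) ** 2).sum()
    return ss / rng.chisquare(x.size - 1, size=n_draws)

# Monte Carlo estimate of P(sigma1^2 < sigma2^2 | data). Posterior (and prior)
# proportions like this are what encompassing-prior Bayes factors combine to
# score an inequality-constrained hypothesis such as H: sigma1^2 < sigma2^2.
p_ineq = (posterior_var_draws(x1) < posterior_var_draws(x2)).mean()
```

The adjusted fractional Bayes factor the abstract recommends refines this idea so that the parsimony of the inequality constraint is rewarded appropriately.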