
Showing papers in "Psychometrika in 2006"



Journal ArticleDOI
TL;DR: This paper analyzes the theoretical, pragmatic, and substantive factors that have hampered the integration between psychology and psychometrics, and discusses a number of promising recent developments.
Abstract: This paper analyzes the theoretical, pragmatic, and substantive factors that have hampered the integration between psychology and psychometrics. Theoretical factors include the operationalist mode of thinking which is common throughout psychology, the dominance of classical test theory, and the use of “construct validity” as a catch-all category for a range of challenging psychometric problems. Pragmatic factors include the lack of interest in mathematically precise thinking in psychology, inadequate representation of psychometric modeling in major statistics programs, and insufficient mathematical training in the psychological curriculum. Substantive factors relate to the absence of psychological theories that are sufficiently strong to motivate the structure of psychometric models. Following the identification of these problems, a number of promising recent developments are discussed, and suggestions are made to further the integration of psychology and psychometrics.

424 citations


Journal ArticleDOI
TL;DR: In this article, the authors introduce a family of goodness-of-fit statistics for testing composite null hypotheses in multidimensional contingency tables, which are quadratic forms in marginal residuals up to order r.
Abstract: We introduce a family of goodness-of-fit statistics for testing composite null hypotheses in multidimensional contingency tables. These statistics are quadratic forms in marginal residuals up to order r. They are asymptotically chi-square under the null hypothesis when parameters are estimated using any asymptotically normal consistent estimator. For a widely used item response model, when r is small and multidimensional tables are sparse, the proposed statistics have accurate empirical Type I errors, unlike Pearson’s X2. For this model in nonsparse situations, the proposed statistics are also more powerful than X2. In addition, the proposed statistics are asymptotically chi-square when applied to subtables, and can be used for a piecewise goodness-of-fit assessment to determine the source of misfit in poorly fitting models.

239 citations
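
To make the construction concrete, here is a minimal numerical sketch (not the authors' specific statistic or weight matrix): a quadratic form in low-order marginal residuals, with the sample size, margins, and weight matrix all invented for illustration.

```python
import numpy as np

def marginal_residual_statistic(n, p_obs, p_model, weight):
    """Quadratic-form fit statistic T = n * e' W e, where e collects the
    residuals of low-order (e.g., univariate and bivariate) margins."""
    e = np.asarray(p_obs, dtype=float) - np.asarray(p_model, dtype=float)
    return float(n * e @ weight @ e)

# Toy usage with made-up margins and an identity weight matrix; the paper's
# statistics use a particular weight matrix so that the quadratic form is
# asymptotically chi-square under the null hypothesis.
n = 500
p_obs   = np.array([0.52, 0.47, 0.31])   # e.g., two univariate margins and one bivariate margin
p_model = np.array([0.50, 0.45, 0.28])   # model-implied counterparts
print(marginal_residual_statistic(n, p_obs, p_model, np.eye(3)))
```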


Journal ArticleDOI
TL;DR: In this paper, a hierarchical extension of the model class is proposed, using a multivariate normal distribution of person-level parameters with the mean and covariance matrix to be estimated from the data.
Abstract: Multinomial processing tree models are widely used in many areas of psychology. A hierarchical extension of the model class is proposed, using a multivariate normal distribution of person-level parameters with the mean and covariance matrix to be estimated from the data. The hierarchical model allows one to take variability between persons into account and to assess parameter correlations. The model is estimated using Bayesian methods with weakly informative hyperprior distribution and a Gibbs sampler based on two steps of data augmentation. Estimation, model checks, and hypotheses tests are discussed. The new method is illustrated using a real data set, and its performance is evaluated in a simulation study.

212 citations
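
To illustrate the hierarchical structure (not the paper's estimation machinery), here is a small simulation sketch under assumptions of my own: a toy two-parameter processing tree with three response categories and person-level parameters obtained by a probit transform of multivariate normal effects.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Population-level mean and covariance of the person effects (probit scale).
mu = np.array([0.5, -0.2])
Sigma = np.array([[0.30, 0.10],
                  [0.10, 0.20]])

n_persons, n_trials = 50, 60
eta = rng.multivariate_normal(mu, Sigma, size=n_persons)   # person-level effects
theta = norm.cdf(eta)                                       # tree parameters in (0, 1)

# Toy two-parameter tree: category probabilities
#   p1 = theta1*theta2, p2 = theta1*(1 - theta2), p3 = 1 - theta1.
p = np.column_stack([theta[:, 0] * theta[:, 1],
                     theta[:, 0] * (1 - theta[:, 1]),
                     1 - theta[:, 0]])

counts = np.array([rng.multinomial(n_trials, pi) for pi in p])
print(counts[:5])   # per-person category counts, reflecting between-person variability
```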


Journal ArticleDOI
TL;DR: A simulation study shows that the new procedure is feasible in practice, and that when the latent distribution is not well approximated as normal, two-parameter logistic (2PL) item parameter estimates and expected a posteriori scores (EAPs) can be improved over what they would be with the normal model.
Abstract: The purpose of this paper is to introduce a new method for fitting item response theory models with the latent population distribution estimated from the data using splines. A spline-based density estimation system provides a flexible alternative to existing procedures that use a normal distribution, or a different functional form, for the population distribution. A simulation study shows that the new procedure is feasible in practice, and that when the latent distribution is not well approximated as normal, two-parameter logistic (2PL) item parameter estimates and expected a posteriori scores (EAPs) can be improved over what they would be with the normal model. An example with real data compares the new method and the extant empirical histogram approach.

111 citations
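
A sketch of the central computation under a discretized latent density of arbitrary shape; the grid, item parameters, and density below are invented for illustration, and the spline-based density estimation itself is not implemented here.

```python
import numpy as np

def twopl_prob(theta, a, b):
    # Two-parameter logistic item response function.
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Quadrature grid and an arbitrary (non-normal) latent density on it.
grid = np.linspace(-4, 4, 81)
density = np.exp(-0.5 * ((grid - 0.8) / 0.7) ** 2) \
          + 0.6 * np.exp(-0.5 * ((grid + 1.5) / 0.5) ** 2)
density /= density.sum()                      # discrete weights summing to 1

a = np.array([1.2, 0.8, 1.5])                 # discriminations (hypothetical)
b = np.array([-0.5, 0.3, 1.0])                # difficulties (hypothetical)
pattern = np.array([1, 0, 1])                 # one response pattern

# Marginal likelihood of the pattern: P(pattern | theta) averaged over the
# latent density on the grid.
P = twopl_prob(grid[:, None], a, b)           # grid x items
lik_given_theta = np.prod(np.where(pattern == 1, P, 1 - P), axis=1)
marginal = np.sum(lik_given_theta * density)

# EAP score for this pattern under the chosen latent density.
posterior = lik_given_theta * density / marginal
eap = np.sum(grid * posterior)
print(marginal, eap)
```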


Journal ArticleDOI
TL;DR: It is shown that for real-valued CP decompositions of p × p × 2 arrays of rank p + 1 or higher, the objective function has no minimum but only an infimum, and that any sequence of CP approximations whose objective value approaches the infimum becomes degenerate.
Abstract: The Candecomp/Parafac (CP) model decomposes a three-way array into a prespecified number R of rank-1 arrays and a residual array, in which the sum of squares of the residual array is minimized. The practical use of CP is sometimes complicated by the occurrence of so-called degenerate solutions, in which some components are highly correlated in all three modes and the elements of these components become arbitrarily large. We consider the real-valued CP model in which p × p × 2 arrays of rank p + 1 or higher are decomposed into p rank-1 arrays and a residual array. It is shown that the CP objective function does not have a minimum in these cases, but an infimum. Moreover, any sequence of CP approximations, of which the objective value approaches the infimum, will become degenerate. This result extends Ten Berge, Kiers, & De Leeuw (1988), who consider a particular 2 × 2 × 2 array of rank 3.

77 citations
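
A rough numerical illustration under a construction of my own (not the authors' analysis): a 2 × 2 × 2 array whose slices are the identity and a 90-degree rotation is known to have rank 3 over the reals, so a two-component CP fit by alternating least squares tends toward a degenerate solution; the sketch monitors the congruence between the two components.

```python
import numpy as np

def khatri_rao(U, V):
    # Column-wise Kronecker product of U (m x R) and V (n x R) -> (m*n) x R.
    m, R = U.shape
    n, _ = V.shape
    return (U[:, None, :] * V[None, :, :]).reshape(m * n, R)

def cp_als(X, R, n_iter=2000, seed=0):
    # Plain CP alternating least squares for a 3-way array X of shape (I, J, K).
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((d, R)) for d in (I, J, K))
    X1 = X.reshape(I, J * K)
    X2 = np.transpose(X, (1, 0, 2)).reshape(J, I * K)
    X3 = np.transpose(X, (2, 0, 1)).reshape(K, I * J)
    for _ in range(n_iter):
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Slices: identity and a 90-degree rotation; this 2 x 2 x 2 array has rank 3
# over the reals, so the best rank-2 CP approximation is not attained.
X = np.stack([np.eye(2), np.array([[0.0, 1.0], [-1.0, 0.0]])], axis=2)
A, B, C = cp_als(X, R=2)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Degeneracy diagnostic: the product of mode-wise cosines between the two
# components approaches -1 while the component norms keep growing.
triple = cosine(A[:, 0], A[:, 1]) * cosine(B[:, 0], B[:, 1]) * cosine(C[:, 0], C[:, 1])
print(triple, np.linalg.norm(A, axis=0))
```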


Journal ArticleDOI
TL;DR: This work examines the uniqueness of the Candecomp/Parafac and Indscal decompositions, considering the case where two component matrices are randomly sampled from a continuous distribution, and the third component matrix has full column rank.
Abstract: A key feature of the analysis of three-way arrays by Candecomp/Parafac is the essential uniqueness of the trilinear decomposition. We examine the uniqueness of the Candecomp/Parafac and Indscal decompositions. In the latter, the array to be decomposed has symmetric slices. We consider the case where two component matrices are randomly sampled from a continuous distribution, and the third component matrix has full column rank. In this context, we obtain almost sure sufficient uniqueness conditions for the Candecomp/Parafac and Indscal models separately, involving only the order of the three-way array and the number of components in the decomposition. Both uniqueness conditions are closer to necessity than the classical uniqueness condition by Kruskal.

76 citations
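
For reference, the classical condition by Kruskal mentioned at the end of the abstract can be written as follows, with k_A, k_B, k_C the k-ranks of the three component matrices and R the number of components:

```latex
% Kruskal's sufficient condition for essential uniqueness of an R-component
% Candecomp/Parafac decomposition with component matrices A, B, C:
k_A + k_B + k_C \ge 2R + 2
```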



Journal ArticleDOI
TL;DR: In this article, the shape of the component loss function (CLF) affects the performance of the criterion it defines, and it is shown that monotone concave CLFs give criteria that are minimized by loadings with perfect simple structure when such loadings exist.
Abstract: Component loss functions (CLFs) similar to those used in orthogonal rotation are introduced to define criteria for oblique rotation in factor analysis. It is shown how the shape of the CLF affects the performance of the criterion it defines. For example, it is shown that monotone concave CLFs give criteria that are minimized by loadings with perfect simple structure when such loadings exist. Moreover, if the CLFs are strictly concave, minimizing must produce perfect simple structure whenever it exists. Examples show that methods defined by concave CLFs perform well much more generally. While it appears important to use a concave CLF, the specific CLF used is less important. For example, the very simple linear CLF gives a rotation method that can easily outperform the most popular oblique rotation methods promax and quartimin and is competitive with the more complex simplimax and geomin methods.

75 citations
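
A worked form of the criterion under discussion, in notation assumed here rather than taken from the paper: the rotation criterion sums a component loss h over the absolute rotated loadings, and for the linear CLF this reduces to the sum of absolute loadings.

```latex
% Component-loss rotation criterion for a p x m rotated loading matrix \Lambda;
% h is the component loss function, and the linear CLF is h(x) = x.
Q(\Lambda) = \sum_{i=1}^{p}\sum_{j=1}^{m} h\bigl(|\lambda_{ij}|\bigr),
\qquad
Q_{\mathrm{linear}}(\Lambda) = \sum_{i,j}|\lambda_{ij}| .
```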


Journal ArticleDOI
TL;DR: This review discusses and compares two books on the general topic of complex statistical models for behavioral science data: an edited volume by Paul de Boeck and Mark Wilson (EIRM) and a book by Anders Skrondal and Sophia Rabe-Hesketh (GLVM).
Abstract: 2004 saw the publication of two interesting and useful books on the general topic of complex statistical models for behavioral science data. This review discusses both, comparing and contrasting them. The first book is an edited volume by Paul de Boeck (KU-Leuven) and Mark Wilson (UC-Berkeley), titled Explanatory item response modeling: A generalized linear and nonlinear approach (henceforth EIRM). The second is by Anders Skrondal (Norwegian Institute of Public Health) and Sophia Rabe-Hesketh (UC-Berkeley), titled Generalized latent variable modeling: Multilevel, longitudinal and structural equation models (henceforth GLVM). The general focus of both books is to provide an integrative framework for the disparate set of models existing in psychometrics, econometrics, biometrics, and statistics. As the authors of GLVM note in the Introduction, there is a substantial degree of balkanization of these disciplines, even though the needs of practitioners in them are often similar. This is unfortunate (if, perhaps, unavoidable) because it leads to frequent reinvention of the wheel. For instance, econometricians have developed a number of interesting statistical models for modeling discrete choice behavior— many of which are built explicitly on choice models in psychology—that are in turn very similar to models from educational measurement, signal detection, or bioassay. Technologies developed by the different groups of researchers often prove useful in addressing problems faced in all literatures. However, because developments exist in largely parallel bodies of work, this fact often goes unrecognized and so different groups are left to reinvent developments that may well be old somewhere else, possibly even superseded by better techniques. Similar things could be said about the literatures on multilevel models or, indeed, any number of other areas. GLVM has as its explicit focus the unification of these literatures. EIRM is less ambitious but still has as its goal the synthesis of the truly vast number of models falling under the banner of item response theory in terms of the generalized linear and nonlinear mixed models.

73 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a methodology for handling omitted variables in a multilevel modeling framework, where a battery of statistical tools is developed to test various forms of model misspecification as well as to obtain estimators that are robust to the presence of omitted variables.
Abstract: Statistical methodology for handling omitted variables is presented in a multilevel modeling framework. In many nonexperimental studies, the analyst may not have access to all requisite variables, and this omission may lead to biased estimates of model parameters. By exploiting the hierarchical nature of multilevel data, a battery of statistical tools is developed to test various forms of model misspecification as well as to obtain estimators that are robust to the presence of omitted variables. The methodology allows for tests of omitted effects at single and multiple levels. The paper also introduces intermediate-level tests; these are tests for omitted effects at a single level, regardless of the presence of omitted effects at a higher level. A simulation study shows, not surprisingly, that the omission of variables yields bias in both regression coefficients and variance components; it also suggests that omitted effects at lower levels may cause more severe bias than at higher levels. Important factors resulting in bias were found to be the level of an omitted variable, its effect size, and sample size. A real data study illustrates that an omitted variable at one level may yield biased estimators at any level and, in this study, one cannot obtain reliable estimates for school-level variables when omitted child effects exist. However, robust estimators may provide unbiased estimates for effects of interest even when the efficient estimators fail, and the one-degree-of-freedom test helps one to understand where the problem is located. It is argued that multilevel data typically contain rich information to deal with omitted variables, offering yet another appealing reason for the use of multilevel models in the social sciences.

Journal ArticleDOI
TL;DR: This work discusses new modeling avenues that can account for such seemingly inconsistent choice behavior and concludes by emphasizing the interdisciplinary frontiers in the study of choice behavior and the resulting challenges for psychometricians.
Abstract: Current psychometric models of choice behavior are strongly influenced by Thurstone’s (1927, 1931) experimental and statistical work on measuring and scaling preferences. Aided by advances in computational techniques, choice models can now accommodate a wide range of different data types and sources of preference variability among respondents induced by such diverse factors as person-specific choice sets or different functional forms for the underlying utility representations. At the same time, these models are increasingly challenged by behavioral work demonstrating the prevalence of choice behavior that is not consistent with the underlying assumptions of these models. I discuss new modeling avenues that can account for such seemingly inconsistent choice behavior and conclude by emphasizing the interdisciplinary frontiers in the study of choice behavior and the resulting challenges for psychometricians.

Journal ArticleDOI
TL;DR: In this article, an extension of multiple correspondence analysis is proposed that takes into account cluster-level heterogeneity in respondents' preferences/choices, which is used for uncovering a low-dimensional space of multivariate categorical variables and identifying relatively homogeneous clusters of respondents.
Abstract: An extension of multiple correspondence analysis is proposed that takes into account cluster-level heterogeneity in respondents’ preferences/choices. The method involves combining multiple correspondence analysis and k-means in a unified framework. The former is used for uncovering a low-dimensional space of multivariate categorical variables while the latter is used for identifying relatively homogeneous clusters of respondents. The proposed method offers an integrated graphical display that provides information on cluster-based structures inherent in multivariate categorical data as well as the interdependencies among the data. An empirical application is presented which demonstrates the usefulness of the proposed method and how it compares to several extant approaches.
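
For contrast, here is a rough sketch of the sequential (tandem) baseline that a joint method of this kind improves upon: dimension reduction of the indicator matrix followed by k-means on the object scores, using generic scikit-learn pieces and invented categorical data rather than the unified criterion developed in the paper.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

# Toy multivariate categorical data (hypothetical survey responses).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "brand":  rng.choice(["A", "B", "C"], size=200),
    "size":   rng.choice(["small", "large"], size=200),
    "region": rng.choice(["north", "south", "west"], size=200),
})

# Indicator (dummy) coding of the categorical variables.
Z = pd.get_dummies(df).to_numpy(dtype=float)

# Low-dimensional representation (a crude stand-in for multiple
# correspondence analysis), followed by k-means on the object scores.
scores = TruncatedSVD(n_components=2, random_state=0).fit_transform(Z - Z.mean(axis=0))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
print(np.bincount(labels))
```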

Journal ArticleDOI
TL;DR: For the confirmatory factor model, a series of inequalities is given with respect to the mean square error (MSE) of the three main factor score predictors; a necessary and sufficient condition for mean square convergence of the predictors is divergence of the smallest eigenvalue of $\Gamma_p$ or, equivalently, divergence of the signal-to-noise ratio.
Abstract: For the confirmatory factor model a series of inequalities is given with respect to the mean square error (MSE) of three main factor score predictors. The eigenvalues of these MSE matrices are a monotonic function of the eigenvalues of the matrix $\Gamma_p = \Phi^{1/2}\Lambda_p'\Psi_p^{-1}\Lambda_p\Phi^{1/2}$. This matrix increases with the number of observable variables p. A necessary and sufficient condition for mean square convergence of predictors is divergence of the smallest eigenvalue of $\Gamma_p$ or, equivalently, divergence of signal-to-noise (Schneeweiss & Mathes, 1995). The same condition is necessary and sufficient for convergence to zero of the positive definite MSE differences of factor predictors, convergence to zero of the distance between factor predictors, and convergence to the unit value of the relative efficiencies of predictors. Various illustrations and examples of the convergence are given as well as explicit recommendations on the problem of choosing between the three main factor score predictors.
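
For orientation, the three factor score predictors usually compared in this setting are the regression, Bartlett, and least squares predictors; the notation below is assumed here and may differ from the paper's.

```latex
% Factor model x = \Lambda f + e with Cov(f) = \Phi, Cov(e) = \Psi,
% and \Sigma = \Lambda\Phi\Lambda' + \Psi.  Three common predictors of f:
\hat f_{\mathrm{reg}}      = \Phi\Lambda'\Sigma^{-1}x, \qquad
\hat f_{\mathrm{Bartlett}} = (\Lambda'\Psi^{-1}\Lambda)^{-1}\Lambda'\Psi^{-1}x, \qquad
\hat f_{\mathrm{LS}}       = (\Lambda'\Lambda)^{-1}\Lambda'x .
```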

Journal ArticleDOI
TL;DR: In this paper, a Bayesian approach is developed for analyzing nonlinear structural equation models with nonignorable missing data, which is specified by a logistic regression model, and results obtained with respect to different missing data models, and different prior inputs are compared via simulation studies.
Abstract: A Bayesian approach is developed for analyzing nonlinear structural equation models with nonignorable missing data. The nonignorable missingness mechanism is specified by a logistic regression model. A hybrid algorithm that combines the Gibbs sampler and the Metropolis–Hastings algorithm is used to produce the joint Bayesian estimates of structural parameters, latent variables, parameters in the nonignorable missing model, as well as their standard errors estimates. A goodness-of-fit statistic for assessing the plausibility of the posited nonlinear structural equation model is introduced, and a procedure for computing the Bayes factor for model comparison is developed via path sampling. Results obtained with respect to different missing data models, and different prior inputs are compared via simulation studies. In particular, it is shown that in the presence of nonignorable missing data, results obtained by the proposed method with a nonignorable missing data model are significantly better than those that are obtained under the missing at random assumption. A real example is presented to illustrate the newly developed Bayesian methodologies.
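
A generic form of the kind of missingness mechanism described; the symbols below (missingness indicator r_ij, data vector y_i, covariates z_i, coefficients φ) are assumptions for illustration, not the paper's exact specification.

```latex
% A nonignorable (logistic) missingness mechanism: the probability that
% entry y_{ij} is missing may depend on y_{ij} itself and on covariates z_i.
\operatorname{logit}\,\Pr(r_{ij} = 1 \mid y_i, z_i, \varphi)
  = \varphi_0 + \varphi_1\, y_{ij} + \varphi_2' z_i
```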

Journal ArticleDOI
TL;DR: A new branch-and-bound algorithm for minimizing WCSS is presented that provides optimal solutions for problems with up to 240 objects and eight well-separated clusters and was successfully applied to three empirical data sets from the classification literature.
Abstract: Minimization of the within-cluster sums of squares (WCSS) is one of the most important optimization criteria in cluster analysis. Although cluster analysis modules in commercial software packages typically use heuristic methods for this criterion, optimal approaches can be computationally feasible for problems of modest size. This paper presents a new branch-and-bound algorithm for minimizing WCSS. Algorithmic enhancements include an effective reordering of objects and a repetitive solution approach that precludes the need for splitting the data set, while maintaining strong bounds throughout the solution process. The new algorithm provided optimal solutions for problems with up to 240 objects and eight well-separated clusters. Poorly separated problems with no inherent cluster structure were optimally solved for up to 60 objects and six clusters. The repetitive branch-and-bound algorithm was also successfully applied to three empirical data sets from the classification literature.
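
A small sketch of the objective itself and of the exhaustive search that a branch-and-bound algorithm prunes; the data are invented, and the paper's reordering and repetitive-bounding enhancements are not reproduced.

```python
import numpy as np
from itertools import product

def wcss(X, labels, k):
    # Within-cluster sum of squares for a given assignment.
    total = 0.0
    for c in range(k):
        pts = X[labels == c]
        if len(pts):
            total += ((pts - pts.mean(axis=0)) ** 2).sum()
    return total

def exhaustive_min_wcss(X, k):
    # Exact minimum WCSS by enumerating all assignments (tiny n only);
    # branch and bound prunes most of this search space via lower bounds.
    n = len(X)
    best, best_labels = np.inf, None
    for assign in product(range(k), repeat=n):
        labels = np.array(assign)
        val = wcss(X, labels, k)
        if val < best:
            best, best_labels = val, labels
    return best, best_labels

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (4, 2)), rng.normal(3, 0.3, (4, 2))])
print(exhaustive_min_wcss(X, k=2))
```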

Journal ArticleDOI
TL;DR: Two types of component models for LR1 and LR2 fuzzy data are proposed and the estimation of the parameters of these models is based on a Least Squares approach, exploiting an appropriately introduced distance measure for fuzzy data.
Abstract: The fuzzy perspective in statistical analysis is first illustrated with reference to the “Informational Paradigm,” allowing us to deal with different types of uncertainties related to the various informational ingredients (data, model, assumptions). The fuzzy empirical data are then introduced, referring to J LR fuzzy variables as observed on I observation units. Each observation is characterized by its center and its left and right spreads (LR1 fuzzy number) or by its left and right “centers” and its left and right spreads (LR2 fuzzy number). Two types of component models for LR1 and LR2 fuzzy data are proposed. The estimation of the parameters of these models is based on a Least Squares approach, exploiting an appropriately introduced distance measure for fuzzy data. A simulation study is carried out in order to assess the efficacy of the suggested models as compared with traditional Principal Component Analysis on the centers and with existing methods for fuzzy and interval valued data. An application to real fuzzy data is finally performed.

Journal ArticleDOI
TL;DR: It is shown that, for multistage estimators of discretized multivariate normal structural models, the overall discrepancy between the contingency table and the model can be decomposed into a distributional discrepancy and a structural discrepancy, and that relatively small samples are needed for parameter estimates, standard errors, and structural tests.
Abstract: Discretized multivariate normal structural models are often estimated using multistage estimation procedures. The asymptotic properties of parameter estimates, standard errors, and tests of structural restrictions on thresholds and polychoric correlations are well known. It was not clear how to assess the overall discrepancy between the contingency table and the model for these estimators. It is shown that the overall discrepancy can be decomposed into a distributional discrepancy and a structural discrepancy. A test of the overall model specification is proposed, as well as a test of the distributional specification (i.e., discretized multivariate normality). Also, the small sample performance of overall, distributional, and structural tests, as well as of parameter estimates and standard errors is investigated under conditions of correct model specification and also under mild structural and/or distributional misspecification. It is found that relatively small samples are needed for parameter estimates, standard errors, and structural tests. Larger samples are needed for the distributional and overall tests. Furthermore, parameter estimates, standard errors, and structural tests are surprisingly robust to distributional misspecification.

Journal ArticleDOI
TL;DR: In this paper, the authors developed a maximum likelihood approach that is robust to outliers and symmetric heavy-tailed distributions for analyzing nonlinear structural equation models with ignorable missing data, where the analytic strategy is to incorporate a general class of distributions into the latent variables and the error measurements in the measurement and structural equations.
Abstract: By means of more than a dozen user friendly packages, structural equation models (SEMs) are widely used in behavioral, education, social, and psychological research. As the underlying theory and methods in these packages are vulnerable to outliers and distributions with longer-than-normal tails, a fundamental problem in the field is the development of robust methods to reduce the influence of outliers and the distributional deviation in the analysis. In this paper we develop a maximum likelihood (ML) approach that is robust to outliers and symmetrically heavy-tailed distributions for analyzing nonlinear SEMs with ignorable missing data. The analytic strategy is to incorporate a general class of distributions into the latent variables and the error measurements in the measurement and structural equations. A Monte Carlo EM (MCEM) algorithm is constructed to obtain the ML estimates, and a path sampling procedure is implemented to compute the observed-data log-likelihood and then the Bayesian information criterion for model comparison. The proposed methodologies are illustrated with simulation studies and an example.

Journal ArticleDOI
TL;DR: Taxicab correspondence analysis is based on the taxicab singular value decomposition of a contingency table, and it shares some similar properties with correspondence analysis.
Abstract: Taxicab correspondence analysis is based on the taxicab singular value decomposition of a contingency table, and it shares some similar properties with correspondence analysis. It is more robust than the ordinary correspondence analysis, because it gives uniform weights to all the points. The visual map constructed by taxicab correspondence analysis has a larger sweep and clearer perspective than the map obtained by correspondence analysis. Two examples are provided.
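
A crude sketch of the first taxicab axis for a toy table: maximize the L1 criterion ||A v||_1 over sign vectors v, applied here to the residuals from independence. The table is invented, and the centering and weighting details of the full method are not reproduced.

```python
import numpy as np
from itertools import product

# Toy two-way contingency table.
N = np.array([[20,  5,  5],
              [ 4, 18,  8],
              [ 6,  7, 22]], dtype=float)

P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
A = P - np.outer(r, c)            # residuals with respect to independence

# First taxicab "singular value": maximize ||A v||_1 over sign vectors v.
best_val, best_v = -np.inf, None
for signs in product([-1.0, 1.0], repeat=A.shape[1]):
    v = np.array(signs)
    val = np.abs(A @ v).sum()
    if val > best_val:
        best_val, best_v = val, v

u = np.sign(A @ best_v)           # corresponding row sign vector
print(best_val, best_v, u)
```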

Journal ArticleDOI
TL;DR: This paper compares the power of the likelihood ratio and Wald statistics for latent variable mean comparisons with that of the Hotelling T2 statistic for manifest variable mean comparisons, and finds that the noncentrality parameter corresponding to the T2 statistic can be much greater than those corresponding to the likelihood ratio and Wald statistics, a finding that differs from results provided in the literature.
Abstract: Mean comparisons are of great importance in the application of statistics. Procedures for mean comparison with manifest variables have been well studied. However, few rigorous studies have been conducted on mean comparisons with latent variables, although the methodology has been widely used and documented. This paper studies the commonly used statistics in latent variable mean modeling and compares them with parallel manifest variable statistics. Our results indicate that, under certain conditions, the likelihood ratio and Wald statistics used for latent mean comparisons do not always have greater power than the Hotelling T2 statistics used for manifest mean comparisons. The noncentrality parameter corresponding to the T2 statistic can be much greater than those corresponding to the likelihood ratio and Wald statistics, which we find to be different from those provided in the literature. Under a fixed alternative hypothesis, our results also indicate that the likelihood ratio statistic can be stochastically much greater than the corresponding Wald statistic. The robustness property of each statistic is also explored when the model is misspecified or when data are nonnormally distributed. Recommendations and advice are provided for the use of each statistic.
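
For reference, the manifest variable statistic being compared against is the two-sample Hotelling T2 (notation assumed here: sample sizes n1 and n2, sample mean vectors, and pooled covariance matrix S_p):

```latex
% Two-sample Hotelling T^2 for comparing two manifest mean vectors:
T^2 = \frac{n_1 n_2}{n_1 + n_2}\,
      (\bar{x}_1 - \bar{x}_2)'\, S_p^{-1}\, (\bar{x}_1 - \bar{x}_2)
```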

Journal ArticleDOI
TL;DR: It is argued that the KL constant is more statistically appropriate for use in IRT, and yields an approximation that is an improvement in fit of the tails of the distribution as compared to the minimax constant.
Abstract: A rationale is proposed for approximating the normal distribution with a logistic distribution using a scaling constant based on minimizing the Kullback-Leibler (KL) information, that is, the expected amount of information available in a sample to distinguish between two competing distributions using a likelihood ratio (LR) test, assuming one of them is true. The new constant 1.749, computed assuming the normal distribution is true, yields an approximation that is an improvement in fit of the tails of the distribution as compared to the minimax constant of 1.702, widely used in item response theory (IRT). The minimax constant is by definition marginally better in its overall maximal error. It is argued that the KL constant is more statistically appropriate for use in IRT.
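
A quick numerical check of the comparison, under assumptions of my own about its setup: the standard normal density against a logistic whose cdf is evaluated at D·x, with the Kullback-Leibler divergence computed by quadrature.

```python
import numpy as np
from scipy.stats import norm, logistic
from scipy.integrate import quad

def kl_normal_vs_scaled_logistic(D):
    # KL divergence from N(0,1) to a logistic whose cdf is F(D*x),
    # i.e., a logistic distribution with scale 1/D.
    integrand = lambda x: norm.pdf(x) * (norm.logpdf(x) - logistic.logpdf(x, scale=1.0 / D))
    val, _ = quad(integrand, -np.inf, np.inf)
    return val

for D in (1.702, 1.749):
    print(D, kl_normal_vs_scaled_logistic(D))
# Per the abstract, the KL-based constant (about 1.749) gives the smaller
# divergence, while 1.702 instead minimizes the maximum absolute cdf difference.
```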

Journal ArticleDOI
TL;DR: It is found that the proposed method outperforms the GENCOM algorithm both with respect to model fit and recovery of the true structure.
Abstract: A method is presented for generalized canonical correlation analysis of two or more matrices with missing rows. The method is a combination of Carroll's (1968) method and the missing data approach of the OVERALS technique (Van der Burg, 1988). In a simulation study we assess the performance of the method and compare it to an existing procedure called GENCOM, proposed by Green and Carroll (1988). We find that the proposed method outperforms the GENCOM algorithm both with respect to model fit and recovery of the true structure.

Journal ArticleDOI
TL;DR: A principled way of imposing a metric representing dissimilarities on any discrete set of stimuli, given the probabilities with which they are discriminated from each other by a perceiving system, such as an organism, person, group of experts, neuronal structure, technical device, or even an abstract computational algorithm is described.
Abstract: We describe a principled way of imposing a metric representing dissimilarities on any discrete set of stimuli (symbols, handwritings, consumer products, X-ray films, etc.), given the probabilities with which they are discriminated from each other by a perceiving system, such as an organism, person, group of experts, neuronal structure, technical device, or even an abstract computational algorithm. In this procedure one does not have to assume that discrimination probabilities are monotonically related to distances, or that the distances belong to a predefined class of metrics, such as Minkowski. Discrimination probabilities do not have to be symmetric, the probability of discriminating an object from itself need not be a constant, and discrimination probabilities are allowed to be 0's and 1's. The only requirement that has to be satisfied is Regular Minimality, a principle we consider the defining property of discrimination: for ordered stimulus pairs (a,b), b is least frequently discriminated from a if and only if a is least frequently discriminated from b. Regular Minimality generalizes one of the weak consequences of the assumption that discrimination probabilities are monotonically related to distances: the probability of discriminating a from a should be less than that of discriminating a from any other object. This special form of Regular Minimality also underlies such traditional analyses of discrimination probabilities as Multidimensional Scaling and Cluster Analysis.
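
In symbols, with ψ(a, b) denoting the probability of discriminating b from a for the ordered pair (a, b) (a notation assumed here), the Regular Minimality property described above reads:

```latex
% Regular Minimality: b is least frequently discriminated from a
% if and only if a is least frequently discriminated from b.
\psi(a,b) = \min_{y}\,\psi(a,y)
\quad\Longleftrightarrow\quad
\psi(a,b) = \min_{x}\,\psi(x,b)
```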

Journal ArticleDOI
TL;DR: The study shows that the procedures based on Shannon entropy and Kullback–Leibler information perform similarly in terms of root mean square error and perform much better than random item selection; it also shows that item exposure rates need to be addressed for these methods to be practical.
Abstract: Nonparametric item response models have been developed as alternatives to the relatively inflexible parametric item response models. An open question is whether it is possible and practical to administer computerized adaptive testing with nonparametric models. This paper explores the possibility of computerized adaptive testing when using nonparametric item response models. A central issue is that the derivatives of item characteristic curves may not be estimated well, which eliminates the availability of the standard maximum Fisher information criterion. As alternatives, procedures based on Shannon entropy and Kullback–Leibler information are proposed. For a long test, these procedures, which do not require the derivatives of the item characteristic curves, become equivalent to the maximum Fisher information criterion. A simulation study is conducted to study the behavior of these two procedures, compared with random item selection. The study shows that the procedures based on Shannon entropy and Kullback–Leibler information perform similarly in terms of root mean square error, and perform much better than random item selection. The study also shows that item exposure rates need to be addressed for these methods to be practical.
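
A minimal sketch of posterior-based item selection on a θ grid with tabulated item characteristic curves; the expected-posterior-entropy rule below is one way to implement a Shannon entropy criterion, and the grid, curves, and exact formulas are assumptions of this sketch rather than the paper's.

```python
import numpy as np

def entropy(w):
    w = w[w > 0]
    return float(-(w * np.log(w)).sum())

def expected_posterior_entropy(posterior, icc):
    """Expected Shannon entropy of the updated posterior if this item is
    administered.  posterior: weights over the theta grid; icc: tabulated
    P(correct | theta) on the same grid."""
    p_correct = float((posterior * icc).sum())
    post1 = posterior * icc
    post0 = posterior * (1.0 - icc)
    post1 /= post1.sum()
    post0 /= post0.sum()
    return p_correct * entropy(post1) + (1.0 - p_correct) * entropy(post0)

grid = np.linspace(-3, 3, 61)
posterior = np.exp(-0.5 * grid**2)
posterior /= posterior.sum()                     # current posterior over theta

# Tabulated ICCs for three candidate items (hypothetical values).
iccs = [1 / (1 + np.exp(-1.5 * (grid - b))) for b in (-1.0, 0.0, 1.0)]

scores = [expected_posterior_entropy(posterior, icc) for icc in iccs]
print(scores, int(np.argmin(scores)))            # pick the item minimizing expected entropy
```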

Journal ArticleDOI
TL;DR: A propensity score weighted M estimator (PWME) is proposed that can compare latent variable means with covariates adjusted using propensity scores, which was not feasible with previous methods.
Abstract: In the behavioral and social sciences, quasi-experimental and observational studies are used due to the difficulty of achieving random assignment. However, the estimation of differences between groups in observational studies frequently suffers from bias due to differences in the distributions of covariates. To estimate average treatment effects when the treatment variable is binary, Rosenbaum and Rubin (1983a) proposed adjustment methods for pretreatment variables using the propensity score. However, these studies were interested only in estimating the average causal effect and/or marginal means. In the behavioral and social sciences, a general estimation method is required to estimate parameters in multiple group structural equation modeling where the differences of covariates are adjusted. We show that a Horvitz-Thompson-type estimator, the propensity score weighted M estimator (PWME), is consistent even when estimated propensity scores are used, and that its asymptotic variance is less than that obtained with the true propensity scores. Furthermore, we show that the asymptotic distribution of the propensity score weighted statistic under a null hypothesis is a weighted sum of independent $\chi^2_1$ variables. We show the method can compare latent variable means with covariates adjusted using propensity scores, which was not feasible with previous methods. We also apply the proposed method to correlated longitudinal binary responses with informative dropout, using data from the Longitudinal Study of Aging (LSOA). The results of a simulation study indicate that the proposed estimation method is more robust than the maximum likelihood (ML) estimation method, in that the PWME does not require knowledge of the relationships among dependent variables and covariates.
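
A bare-bones illustration of inverse propensity weighting for a simple group difference, using scikit-learn logistic regression on simulated data; the paper's PWME extends this weighting idea to multiple group structural equation models, which this sketch does not attempt.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 2000
x = rng.normal(size=(n, 2))                       # covariates
p_treat = 1 / (1 + np.exp(-(0.8 * x[:, 0] - 0.5 * x[:, 1])))
z = rng.binomial(1, p_treat)                      # nonrandom group assignment
y = 1.0 * z + 0.7 * x[:, 0] + rng.normal(size=n)  # outcome; true group effect = 1.0

# Estimated propensity scores from a logistic regression of z on x.
e_hat = LogisticRegression().fit(x, z).predict_proba(x)[:, 1]

# Inverse-probability-weighted group means (Horvitz-Thompson style).
w1, w0 = z / e_hat, (1 - z) / (1 - e_hat)
effect_ipw = np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
effect_naive = y[z == 1].mean() - y[z == 0].mean()
print(effect_naive, effect_ipw)                   # weighting reduces the covariate-induced bias
```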

Journal ArticleDOI
TL;DR: In this article, a semiparametric Bayesian Thurstonian model is proposed for analyzing repeated choice decisions involving multinomial, multivariate binary or multivariate ordinal data.
Abstract: We develop semiparametric Bayesian Thurstonian models for analyzing repeated choice decisions involving multinomial, multivariate binary or multivariate ordinal data. Our modeling framework has multiple components that together yield considerable flexibility in modeling preference utilities, cross-sectional heterogeneity and parameter-driven dynamics. Each component of our model is specified semiparametrically using Dirichlet process (DP) priors. The utility (latent variable) component of our model allows the alternative-specific utility errors to semiparametrically deviate from a normal distribution. This generates a robust alternative to popular Thurstonian specifications that are based on underlying normally distributed latent variables. Our second component focuses on flexibly modeling cross-sectional heterogeneity. The semiparametric specification allows the heterogeneity distribution to mimic either a finite mixture distribution or a continuous distribution such as the normal, whichever is supported by the data. Thus, special features such as multimodality can be readily incorporated without the need to overtly search for the best heterogeneity specification across a series of models. Finally, we allow for parameter-driven dynamics using a semiparametric state-space approach. This specification adds to the literature on robust Kalman filters. The resulting framework is very general and integrates divergent strands of the literatures on flexible choice models, Bayesian nonparametrics and robust time series specifications. Given this generality, we show how several existing Thurstonian models can be obtained as special forms of our model. We describe Markov chain Monte Carlo methods for the inference of model parameters, report results from two simulation studies and apply the model to consumer choice data from a frequently purchased product category. The results from our simulations and application highlight the benefits of using our semiparametric approach.

Journal ArticleDOI
TL;DR: In this article, a unified theorem is derived for the regression model with normally distributed explanatory variables and the general results are employed to provide useful expressions for the distributions of simple, multiple, and partial-multiple correlation coefficients.
Abstract: This paper considers the problem of analysis of correlation coefficients from a multivariate normal population. A unified theorem is derived for the regression model with normally distributed explanatory variables and the general results are employed to provide useful expressions for the distributions of simple, multiple, and partial-multiple correlation coefficients. The inversion principle and monotonicity property of the proposed formulations are used to describe alternative approaches to the exact interval estimation, power calculation, and sample size determination for correlation coefficients.