
Showing papers on "Latent variable model" published in 2002



Journal ArticleDOI
TL;DR: Different applications of latent variables are discussed in a unifying framework that brings together in one general model such different analysis types as factor models, growth curve models, multilevel models, latent class models and discrete-time survival models.
Abstract: This article gives an overview of statistical analysis with latent variables. Using traditional structural equation modeling as a starting point, it shows how the idea of latent variables captures a wide variety of statistical concepts, including random effects, missing data, sources of variation in hierarchical data, finite mixtures, latent classes, and clusters. These latent variable applications go beyond the traditional latent variable usage in psychometrics with its focus on measurement error and hypothetical constructs measured by multiple indicators. The article argues for the value of integrating statistical and psychometric modeling ideas. Different applications are discussed in a unifying framework that brings together in one general model such different analysis types as factor models, growth curve models, multilevel models, latent class models and discrete-time survival models. Several possible combinations and extensions of these models are made clear due to the unifying framework.

978 citations


Journal ArticleDOI
TL;DR: The role of latent variables in multiple regression, probit and logistic regression, factor analysis, latent curve models, item response theory, latent class analysis, and structural equation models is reviewed.
Abstract: The paper discusses the use of latent variables in psychology and social science research. Local independence, expected value true scores, and nondeterministic functions of observed variables are three types of definitions for latent variables. These definitions are reviewed and an alternative “sample realizations” definition is presented. Another section briefly describes identification, latent variable indeterminacy, and other properties common to models with latent variables. The paper then reviews the role of latent variables in multiple regression, probit and logistic regression, factor analysis, latent curve models, item response theory, latent class analysis, and structural equation models. Though these application areas are diverse, the paper highlights the similarities as well as the differences in the manner in which the latent variables are defined and used. It concludes with an evaluation of the different definitions of latent variables and their properties.

883 citations


01 Jan 2002
TL;DR: The authors compare latent class (LC) analysis with K-means clustering using data simulated from a setting where true group membership is known; the results indicate that LC substantially outperforms the K-means technique.
Abstract: Recent developments in latent class (LC) analysis and associated software to include continuous variables offer a model-based alternative to more traditional clustering approaches such as K-means. In this paper, the authors compare these two approaches using data simulated from a setting where true group membership is known. The authors choose a setting favourable to K-means by simulating data according to the assumptions made in both discriminant analysis (DISC) and K-means clustering. Since the information on true group membership is used in DISC but not in clustering approaches in general, the authors use the results obtained from DISC as a gold standard in determining an upper bound on the best possible outcome that might be expected from a clustering technique. The results indicate that LC substantially outperforms the K-means technique. A truly surprising result is that the LC performance is so good that it is virtually indistinguishable from the performance of DISC.

700 citations


Book ChapterDOI
01 Jan 2002
TL;DR: This chapter considers building structural equation models with latent (unobservable) variables with measurement errors.
Abstract: In this chapter, we consider building structural equation models with latent (unobservable) variables with measurement errors.

231 citations


Proceedings ArticleDOI
28 Jul 2002
TL;DR: In this paper, hierarchical latent class models are proposed as a framework for addressing local dependence, and a search-based algorithm for learning them from data is developed and evaluated using both synthetic and real-world data.
Abstract: Latent class models are used for cluster analysis of categorical data. Underlying such a model is the assumption that the observed variables are mutually independent given the class variable. A serious problem with the use of latent class models, known as local dependence, is that this assumption is often untrue. In this paper we propose hierarchical latent class models as a framework where the local dependence problem can be addressed in a principled manner. We develop a search-based algorithm for learning hierarchical latent class models from data. The algorithm is evaluated using both synthetic and real-world data.
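As a minimal sketch of the local independence assumption described in this abstract (not the authors' hierarchical algorithm), the Python below computes the probability of a binary response pattern under a plain latent class model, P(x) = sum over c of P(c) * product over j of P(x_j | c), together with the posterior class membership. The class weights and item probabilities are made-up illustrative values.

```python
# Hypothetical two-class model with three binary items (illustrative values only).
CLASS_WEIGHTS = [0.6, 0.4]        # P(class = c)
ITEM_PROBS = [[0.9, 0.8, 0.7],    # P(item j = 1 | class 0)
              [0.2, 0.3, 0.1]]    # P(item j = 1 | class 1)


def joint_by_class(pattern):
    """Return P(class = c, pattern) for each class, assuming local independence."""
    joints = []
    for weight, probs in zip(CLASS_WEIGHTS, ITEM_PROBS):
        likelihood = 1.0
        for x, p in zip(pattern, probs):
            # Items are multiplied because they are independent given the class.
            likelihood *= p if x == 1 else 1.0 - p
        joints.append(weight * likelihood)
    return joints


def pattern_probability(pattern):
    """Marginal probability of a response pattern, summed over latent classes."""
    return sum(joint_by_class(pattern))


def posterior(pattern):
    """Posterior class membership probabilities given the observed pattern."""
    joints = joint_by_class(pattern)
    total = sum(joints)
    return [j / total for j in joints]
```

When items remain correlated within a class, this product form is misspecified, which is the local dependence problem the abstract refers to.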

227 citations



Proceedings Article
01 Jan 2002
TL;DR: This paper proposes an alternative view in which the variability of cortical sensory neurons is related to the uncertainty, about world parameters, which is inherent in the sensory stimulus, and provides simulations suggesting how some aspects of response variability might be understood in this framework.
Abstract: The responses of cortical sensory neurons are notoriously variable, with the number of spikes evoked by identical stimuli varying significantly from trial to trial. This variability is most often interpreted as 'noise', purely detrimental to the sensory system. In this paper, we propose an alternative view in which the variability is related to the uncertainty, about world parameters, which is inherent in the sensory stimulus. Specifically, the responses of a population of neurons are interpreted as stochastic samples from the posterior distribution in a latent variable model. In addition to giving theoretical arguments supporting such a representational scheme, we provide simulations suggesting how some aspects of response variability might be understood in this framework.

168 citations



Journal ArticleDOI
Tong Li
TL;DR: In this paper, a two-stage procedure is proposed: the first stage nonparametrically estimates the conditional density of the latent variables given the measurements using the identification results, and the second stage applies a semiparametric nonlinear least-squares estimator.

129 citations


Journal ArticleDOI
TL;DR: This article used a latent variable model to estimate the effect of disability status on the labor force participation of older men in Canada, with data from the National Population Health Survey (NPHS).

Journal ArticleDOI
TL;DR: In this paper, latent variable interaction modeling with continuous observed variables is presented using two different approaches: one based on a LISREL 8.30 program and one based on PRELIS2 and SIMPLIS programs.
Abstract: Latent variable interaction modeling with continuous observed variables is presented using 2 different approaches. The 1st approach analyzes data using a LISREL 8.30 program where the latent interaction variable is defined by multiplying pairs of observed variables. The 2nd approach analyzes data using PRELIS2 and SIMPLIS programs where the latent interaction variable is defined by multiplying the latent variable scores of the exogenous latent independent variables. The programs used to create the multivariate normal observed variables and conduct the analyses for the 2 different approaches are given in the appendixes. The product indicant and latent variable score approach produced similar gamma coefficients in their hypothesized models but differed in their standard errors for the gamma coefficients. The latent variable score approach holds the promise of being easier to implement and can be applied to more complex latent variable interaction models.

Journal ArticleDOI
TL;DR: In this article, the authors analyzed changes in treatment practices in outpatient methadone treatment units from a national panel study and found that a substantial percentage of units did not respond during the follow-up.
Abstract: This article analyzes changes in treatment practices in outpatient methadone treatment units from a national panel study. The analysis of this dataset is challenging due to several difficulties, including multiple longitudinal outcomes, nonignorable nonresponses, and missing covariates. Specifically, the data included several variables that measure the effectiveness of methadone treatment practices for each unit. A substantial percentage of units (33%) did not respond during the follow-up. These dropout units tended to be units with less effective treatment practices; the dropout mechanism thus may be nonignorable. Finally, the time-varying covariates for the units that dropped out were missing at the time of dropout. A valid analysis hence needs to address these three issues simultaneously. Our approach assumes that the observed outcomes measure a latent variable (e.g., treatment practice effectiveness) with error. We model the relationship between this latent variable and covariates using a linear mixe...

Journal ArticleDOI
TL;DR: Results indicated a larger amount of individual variation for average ability level than rate of change for all three traits: Block Design, Thurstone's Picture Memory, and Symbol Digit, which supports theories of increasing environmental influences with age.
Abstract: Latent growth models were used to examine influences on individual differences in ability level versus rate of change for measures of fluid ability, memory, and perceptual speed in a sample of twins from the Swedish Adoption/Twin Study of Aging. Results indicated a larger amount of individual variation for average ability level (i.e., intercept) than rate of change (i.e., slope) for all three traits: Block Design, Thurstone's Picture Memory, and Symbol Digit. Generally, genetic influences were of greater importance to individual variation in ability level whereas variation for rate of change exhibited a larger environmental component. These findings support theories of increasing environmental influences with age. When genetic and environmental sources of covariation between educational attainment and pulmonary function with latent growth parameters were considered, the sources of covariation between the latent cognitive growth model parameters (i.e., intercept and slope) and both covariates were primarily genetic for ability level (intercepts) but environmental for rate of change (slopes). Such findings suggest that the forces important to timing or entry into cognitive decline may reflect stochastic processes or external environmental factors, primarily nonshared, that may differentially hasten cognitive decline in twins. These same forces may overlap with those that influence higher or lower educational attainment or those leading to better or worse pulmonary functioning.

MonographDOI
01 Jan 2002
TL;DR: This edited volume collects chapters on latent variable modelling, covering topics such as nonparametric IRT models, semiparametric latent trait estimation, group comparison and missing data in SEM, robust factor analysis, and multilevel factor analysis.
Abstract: Contents: Preface. D.J. Bartholomew, Old and New Approaches to Latent Variable Modelling. I. Moustaki, C. O'Muircheartaigh, Locating "Don't Know," "No Answer" and Middle Alternatives on an Attitude Scale: A Latent Variable Approach. L.A. van der Ark, B.T. Hemker, K. Sijtsma, Hierarchically Related Nonparametric IRT Models, and Practical Data Analysis Methods. P. Tzamourani, M. Knott, Fully Semiparametric Estimation of the Two-Parameter Latent Trait Model for Binary Data. P. Rivera, A. Satorra, Analyzing Group Differences: A Comparison of SEM Approaches. R.D. Wiggins, A. Sacker, Strategies for Handling Missing Data in SEM: A User's Perspective. T. Raykov, S. Penev, Exploring Structural Equation Model Misspecifications Via Latent Individual Residuals. J-Q. Shi, S-Y. Lee, B-C. Wei, On Confidence Regions of SEM Models. P. Filzmoser, Robust Factor Analysis: Methods and Applications. M. Croon, Using Predicted Latent Scores in General Latent Structure Models. H. Goldstein, W. Browne, Multilevel Factor Analysis Modelling Using Markov Chain Monte Carlo Estimation. J-P. Fox, C.A.W. Glas, Modelling Measurement Error in Structural Multilevel Models.

Posted Content
TL;DR: In this paper, the authors present a new way of estimating latent total consumption in a household that may improve the accuracy of studies into permanent income and consumption inequality, giving accurate indicators more weight and aligning weights to minimize variance.
Abstract: This article presents a new way of estimating latent total consumption in a household that may improve the accuracy of studies into permanent income and consumption inequality. While the frequently used total purchase expenditure in a household is an unbiased estimator of latent total household consumption, it is suboptimal since total purchase expenditure is an unweighted sum of expenditures that contain measurement errors. We derive a competing estimator, unbiased and variance minimizing, based on a latent variable model. From estimates of error term variance among consumption indicators, we give accurate indicators more weight, and align weights to minimize variance. An advantage of the suggested estimator is that it allows both expenditure and non-expenditure indicators of latent total consumption. We demonstrate empirically how the minimum-variance estimator reduces variance, and find that on Norwegian expenditure data from 1993 the reduction is 44 per cent.
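To illustrate why weighting accurate indicators more heavily reduces variance, here is a sketch under a simplifying assumption that is not taken from the paper: each indicator is an unbiased measurement of the same latent quantity, with independent errors of known variance. In that special case the variance-minimizing unbiased linear combination uses weights proportional to the inverse error variances. The function names and numbers are illustrative only.

```python
def inverse_variance_weights(error_variances):
    """Weights proportional to 1/variance, normalized to sum to one."""
    precisions = [1.0 / v for v in error_variances]
    total = sum(precisions)
    return [p / total for p in precisions]


def combined_variance(weights, error_variances):
    """Variance of a weighted sum of indicators with independent errors."""
    return sum(w * w * v for w, v in zip(weights, error_variances))


# Hypothetical error variances for three indicators (assumed, not from the paper).
variances = [1.0, 4.0, 0.25]
optimal = inverse_variance_weights(variances)
equal = [1.0 / len(variances)] * len(variances)

var_optimal = combined_variance(optimal, variances)  # weighted estimator
var_equal = combined_variance(equal, variances)      # unweighted average
```

Under these assumptions the weighted estimator's variance equals the reciprocal of the summed precisions, so it can never exceed the variance of the best single indicator or of the unweighted average.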

Journal ArticleDOI
TL;DR: In this article, a structural modeling approach based on latent growth curve model specifications is proposed for use with dyadic data, which allows researchers to test more sophisticated causal models, incorporate latent variables, and estimate more complex error structures than is currently possible using hierarchical linear modeling or multilevel structural equation models.
Abstract: Dyadic data involving couples, twins, or parent-child pairs are common in the social sciences, but available statistical approaches are limited in the types of hypotheses that can be tested with dyadic data. A novel structural modeling approach, based on latent growth curve model specifications, is proposed for use with dyadic data. The approach allows researchers to test more sophisticated causal models, incorporate latent variables, and estimate more complex error structures than is currently possible using hierarchical linear modeling or multilevel structural equation models. A brief overview of multilevel regression and latent growth curve models is given, and the equivalence of the statistical model for nested and longitudinal data is explained. Possible expansion of the strategy for application with small groups and with unbalanced data is briefly explored.

Journal ArticleDOI
TL;DR: In this paper, a general model that incorporates spatial correlation and potential lagged or shifted dependencies is proposed to represent subject matter theory or serve as a practical exploratory model for modeling underlying structure and representing interrelationships in terms of a smaller number of variables.
Abstract: Multivariate spatial or geo-referenced data arise naturally in such disciplines as ecology, agriculture, geology, and atmospheric sciences. In practice, interest often lies in modeling underlying structure and representing interrelationships in terms of a smaller number of variables. For such situations, statistical analysis using a latent variable model is proposed. We present a general model that incorporates spatial correlation and potential lagged or shifted dependencies and that can represent subject matter theory or serve as a practical exploratory model. Procedures for model fitting, parameter estimation, inferences, and latent variable prediction are developed without restrictive assumptions on distribution and covariance function forms. The properties and usefulness of the proposed approaches are assessed by asymptotic theory and an extensive simulation study. An example from precision agriculture is also presented.

Book ChapterDOI
TL;DR: A new probability model, the 'asymmetric Gaussian (AG),' which can capture spatially asymmetric distributions, is proposed and extended to a mixture of AGs, and it is shown that the AGs outperform Gaussian models.
Abstract: In this paper, we propose a new probability model, the 'asymmetric Gaussian (AG),' which can capture spatially asymmetric distributions. It is also extended to a mixture of AGs. The values of its parameters can be determined by the Expectation-Conditional Maximization algorithm. We apply the AGs to a pattern classification problem and show that the AGs outperform Gaussian models.
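The abstract does not state the AG density. One common way to write an asymmetric bell curve in one dimension is the two-piece normal, which uses a different standard deviation on each side of the mode. Whether this matches the paper's exact parameterization is an assumption; the sketch below only shows that such a density is properly normalized.

```python
import math


def two_piece_normal_pdf(x, mode, sigma_left, sigma_right):
    """Density with Gaussian shape on each side of the mode but different spreads."""
    # A single normalizing constant keeps the density continuous at the mode.
    norm = 2.0 / (math.sqrt(2.0 * math.pi) * (sigma_left + sigma_right))
    sigma = sigma_left if x < mode else sigma_right
    return norm * math.exp(-((x - mode) ** 2) / (2.0 * sigma ** 2))


def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule integral, accurate enough for a sanity check."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width
```

With `sigma_left=1.0` and `sigma_right=3.0`, the mass to the left of the mode is `sigma_left / (sigma_left + sigma_right)`, so the right tail carries three quarters of the probability.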

Journal ArticleDOI
TL;DR: In this article, two ways of testing the assumption that items measure the same unidimensional latent construct against specific multidimensional alternatives are discussed, and a study on occupational health is used to motivate and illustrate the methods.
Abstract: A fundamental assumption of most IRT models is that items measure the same unidimensional latent construct. For the polytomous Rasch model, two ways of testing this assumption against specific multidimensional alternatives are discussed: first, a marginal approach assuming a multidimensional parametric latent variable distribution, and second, a conditional approach with no distributional assumptions about the latent variable. The second approach generalizes the Martin-Löf test for the dichotomous Rasch model in two ways: to polytomous items and to a test against an alternative that may have more than two dimensions. A study on occupational health is used to motivate and illustrate the methods.

Journal ArticleDOI
TL;DR: This article illustrates the use of the latent class model to identify classes of individuals and to assess the psychometric reliability of categorical items; methods for estimating the reliability of individual items as well as sets of items are presented.

Journal ArticleDOI
TL;DR: Utilizing simulated data, it is demonstrated that even though a model may be considered to have a “good fit” based on conventional criteria, data interpretation may be misleading or erroneous if precautions are not taken when specifying residual covariances.
Abstract: Latent variable models assess the common variance across multiple indicators of a specific construct and are often used when measurement error may bias parameter estimates. However, care must be taken when interpreting the meaning of the latent construct when using item indicators that come from different measurement domains (e.g., self-report and biochemical indicators of smoking). Utilizing simulated data, we demonstrate that even though a model may be considered to have a “good fit” based on conventional criteria, data interpretation may be misleading or erroneous if precautions are not taken when specifying residual covariances. These findings have important implications for health-related research. Whenever different kinds of data are used to define latent variables in a health domain, exactly what items are used, and what biases may be present can affect, sometimes dramatically, (a) the definition of the latent variables and (b) the effects of the latent variables on other variables of interest.

01 Jan 2002
TL;DR: According to Kaufman and Rousseeuw (1990), cluster analysis is "the classification of similar objects into groups, where the number of groups, as well as their forms are unknown".
Abstract: According to Kaufman and Rousseeuw (1990), cluster analysis is "the classification of similar objects into groups, where the number of groups, as well as their forms are unknown". This same definition could be used for exploratory Latent Class (LC) analysis where a K-class latent variable is used to explain the associations among a set of observed variables. Each latent class, like each cluster, groups together similar cases.

Journal ArticleDOI
TL;DR: A class of locally dependent latent trait models is proposed, based on a family of conditional distributions that describes joint multiple item responses as a function of the student latent trait without assuming conditional independence.
Abstract: In this paper, we propose a class of locally dependent latent trait models for responses to psychological and educational tests. Typically, item response models treat an individual's multiple responses to stimuli as conditionally independent given the individual's latent trait. In this paper, the focus is instead on models based on a family of conditional distributions, or kernel, that describes joint multiple item responses as a function of student latent trait, not assuming conditional independence. Specifically, we examine a hybrid kernel which comprises a component for one-way item response functions and a component for conditional associations between items given latent traits. The class of models allows the extension of item response theory to cover some new and innovative applications in psychological and educational research. An EM algorithm for marginal maximum likelihood of the hybrid kernel model is proposed. Furthermore, we delineate the relationship of the class of locally dependent models and the log-linear model by revisiting the Dutch identity (Holland, 1990).

Book
01 Jan 2002
TL;DR: This book covers statistical group comparison in linear models, nonparametric methods, rates, generalized linear models, structural equation modeling, categorical latent variable models, and multilevel analysis.
Abstract: Preface. 1. Introduction. 1.1 Rationale for Statistical Comparison. 1.2 Comparative Research in the Social Sciences. 1.3 Focus of the Book. 1.4 Outline of the Book. 2. Statistical Foundation for Comparison. 2.1 A System for Statistical Comparison. 2.2 Test Statistics. 2.3 What to Compare? 3. Comparison in Linear Models. 3.1 Introduction. 3.2 An Example. 3.3 Some Preliminary Considerations. 3.4 The Linear Model. 3.5 Comparing Two Means. 3.6 ANOVA. 3.7 Multiple Comparison Methods. 3.8 ANCOVA. 3.9 Multiple Linear Regression. 3.10 Regression Decomposition. 3.11 Which Linear Method to Use? 4. Nonparametric Comparison. 4.1 Nonparametric Tests. 4.2 Resampling Methods. 4.3 Relative Distribution Methods. 5. Comparison of Rates. 5.1 The Data. 5.2 Standardization. 5.3 Decomposition. 6. Comparison in Generalized Linear Models. 6.1 Introduction. 6.2 Comparing Generalized Linear Models. 6.3 A Logit Model Example. 6.4 A Hazard Rate Model Example. 6.A Data Used in Section 6.4. 7. Additional Topics of Comparison in Generalized Linear Models. 7.1 Introduction. 7.2 GLM for Matched Case-Control Studies. 7.3 Dispersion Heterogeneity. 7.4 Bayesian Generalized Linear Models. 7.A The Data for the n : m Design. 8. Comparison in Structural Equation Modeling. 8.1 Introduction. 8.2 Statistical Background. 8.3 Mean and Covariance Structures. 8.4 Group Comparison in SEM. 8.5 An Example. 8.A Examples of Computer Program Listings. 9. Comparison with Categorical Latent Variables. 9.1 Introduction. 9.2 Latent Class Models. 9.3 Latent Trait Models. 9.4 Latent Variable Models for Continuous Indicators. 9.5 Causal Models with Categorical Latent Variables. 9.6 Comparison with Categorical Latent Variables. 9.7 Examples. 9.A Software for Categorical Latent Variables. 9.B Computer Program Listings for the Examples. 10. Comparison in Multilevel Analysis. 10.1 Introduction. 10.2 An Introduction to Multilevel Analysis. 10.3 The Basics of the Linear Multilevel Model. 10.4 The Basics of the Generalized Linear Multilevel Model. 10.5 Group as an External Variable in Multilevel Analysis. 10.6 The Relation between Multilevel Analysis and Group Comparison. 10.7 Multiple Membership Models. 10.8 Summary. 10.A Software for Multilevel Analysis. 10.B SAS Program Listings for GLMM Examples. References. Index.

Journal ArticleDOI
TL;DR: It is shown that both the simple form of the Rasch model for binary data and a generalisation are essentially equivalent to special dichotomised Gaussian models, and the implications for scoring of the binary variables are discussed.
Abstract: It is shown that both the simple form of the Rasch model for binary data and a generalisation are essentially equivalent to special dichotomised Gaussian models. In these the underlying Gaussian structure is of single factor form; that is, the correlations between the binary variables arise via a single underlying variable, called in psychometrics a latent trait. The implications for scoring of the binary variables are discussed, in particular regarding the scoring system as in effect estimating the latent trait. In particular, the role of the simple sum score, in effect the total number of 'successes', is examined. Relations with the principal component analysis of binary data are outlined and some connections with the quadratic exponential binary model are sketched.
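The role of the sum score can be checked numerically. The sketch below uses the standard Rasch item response function, P(X = 1 | theta, b) = 1 / (1 + exp(-(theta - b))), with made-up item difficulties; it does not reproduce the dichotomised Gaussian construction from the paper. For two response patterns with the same number of successes, the likelihood ratio does not depend on the latent trait theta, which is the usual argument for treating the sum score as carrying the information about the trait.

```python
import math


def rasch_prob(theta, difficulty):
    """P(correct response) under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def pattern_likelihood(theta, pattern, difficulties):
    """Likelihood of a binary response pattern, assuming local independence."""
    likelihood = 1.0
    for x, b in zip(pattern, difficulties):
        p = rasch_prob(theta, b)
        likelihood *= p if x == 1 else 1.0 - p
    return likelihood


def likelihood_ratio(theta, pattern_a, pattern_b, difficulties):
    """Ratio of the likelihoods of two patterns at a given trait level."""
    return (pattern_likelihood(theta, pattern_a, difficulties)
            / pattern_likelihood(theta, pattern_b, difficulties))


# Hypothetical item difficulties; both patterns have a sum score of 2.
DIFFICULTIES = [-1.0, 0.0, 1.5]
ratio_low = likelihood_ratio(-1.0, (1, 1, 0), (0, 1, 1), DIFFICULTIES)
ratio_high = likelihood_ratio(2.0, (1, 1, 0), (0, 1, 1), DIFFICULTIES)
```

Here `ratio_low` and `ratio_high` agree up to floating-point error, whereas comparing patterns with different sum scores gives a ratio that changes with theta.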

Journal ArticleDOI
TL;DR: A fully Bayesian approach to non-life risk premium rating, based on hierarchical models with latent variables for both claim frequency and claim size, is proposed and it is shown that interaction among latent variables can improve predictions significantly.
Abstract: We propose a fully Bayesian approach to non-life risk premium rating, based on hierarchical models with latent variables for both claim frequency and claim size. Inference is based on the joint posterior distribution and is performed by Markov Chain Monte Carlo. Rather than plug-in point estimates of all unknown parameters, we take into account all sources of uncertainty simultaneously when the model is used to predict claims and estimate risk premiums. Several models are fitted to both a simulated dataset and a small portfolio regarding theft from cars. We show that interaction among latent variables can improve predictions significantly. We also investigate when interaction is not necessary. We compare our results with those obtained under a standard generalized linear model and show through numerical simulation that geographically located and spatially interacting latent variables can successfully compensate for missing covariates. However, when applied to the real portfolio data, the proposed models a...

Journal ArticleDOI
TL;DR: Bayesian latent-variable regression (BLVR) permits extraction and incorporation of knowledge about the statistical behavior of measurements in developing linear process models and can handle noise in inputs and outputs, collinear variables, and incorporate prior knowledge about regression parameters and measured variables.
Abstract: Large quantities of measured data are being routinely collected in various industries and used for extracting linear models for tasks such as process control, fault diagnosis, and process monitoring. Existing linear modeling methods, however, do not fully utilize all the information contained in the measurements. A new approach for linear process modeling makes maximum use of available process data and process knowledge. Bayesian latent-variable regression (BLVR) permits extraction and incorporation of knowledge about the statistical behavior of measurements in developing linear process models. Furthermore, BLVR can handle noise in inputs and outputs, collinear variables, and incorporate prior knowledge about regression parameters and measured variables. The model is usually more accurate than those of existing methods, including OLS, PCR, and PLS. BLVR considers a univariate output and assumes the underlying variables and noise to be Gaussian, but it can be used for multivariate outputs and other distributions. An empirical Bayes approach is developed to extract the prior information from historical data or maximum-likelihood solution of available data. Examples of steady-state, dynamic and inferential modeling demonstrate the superior accuracy of BLVR over existing methods even when the assumptions of Gaussian distributions are violated. The relationship between BLVR and existing methods and opportunities for future work based on this framework are also discussed.


Journal ArticleDOI
TL;DR: A latent-class model of rater agreement is presented for which one of the model parameters can be interpreted as the proportion of systematic agreement.
Abstract: A latent-class model of rater agreement is presented for which one of the model parameters can be interpreted as the proportion of systematic agreement. The latent classes of the model emerge from the factorial combination of the "true" category in which a target belongs and the ease with which raters are able to classify targets into the true category. Several constrained cases of the model are described, and the relations to other well-known agreement models and kappa-type summary coefficients are explained. The differential quality of the rating categories can be assessed on the basis of the model fit. The model is illustrated using data from diagnoses of psychiatric disorders and classifications of individuals in a persuasive communication study.