Topic
Latent class model
About: Latent class model is a research topic. Over the lifetime, 5715 publications have been published within this topic receiving 263726 citations. The topic is also known as: latent class analysis.
Papers published on a yearly basis
Papers
More filters
•
03 Jan 2001TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hof-mann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hof-mann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.
25,546 citations
•
28 Apr 1989
TL;DR: The General Model, Part I: Latent Variable and Measurement Models Combined, Part II: Extensions, Part III: Extensions and Part IV: Confirmatory Factor Analysis as discussed by the authors.
Abstract: Model Notation, Covariances, and Path Analysis. Causality and Causal Models. Structural Equation Models with Observed Variables. The Consequences of Measurement Error. Measurement Models: The Relation Between Latent and Observed Variables. Confirmatory Factor Analysis. The General Model, Part I: Latent Variable and Measurement Models Combined. The General Model, Part II: Extensions. Appendices. Distribution Theory. References. Index.
19,019 citations
••
TL;DR: Whereas the Bayesian Information Criterion performed the best of the ICs, the bootstrap likelihood ratio test proved to be a very consistent indicator of classes across all of the models considered.
Abstract: Mixture modeling is a widely applied data analysis technique used to identify unobserved heterogeneity in a population. Despite mixture models' usefulness in practice, one unresolved issue in the application of mixture models is that there is not one commonly accepted statistical indicator for deciding on the number of classes in a study population. This article presents the results of a simulation study that examines the performance of likelihood-based tests and the traditionally used Information Criterion (ICs) used for determining the number of classes in mixture modeling. We look at the performance of these tests and indexes for 3 types of mixture models: latent class analysis (LCA), a factor mixture model (FMA), and a growth mixture models (GMM). We evaluate the ability of the tests and indexes to correctly identify the number of classes at three different sample sizes (n = 200, 500, 1,000). Whereas the Bayesian Information Criterion performed the best of the ICs, the bootstrap likelihood ratio test ...
7,716 citations
••
01 Aug 1999TL;DR: Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data.
Abstract: Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain{specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.
4,577 citations
••
TL;DR: In this paper, exact Bayesian methods for modeling categorical response data are developed using the idea of data augmentation, which can be summarized as follows: the probit regression model for binary outcomes is seen to have an underlying normal regression structure on latent continuous data, and values of the latent data can be simulated from suitable truncated normal distributions.
Abstract: A vast literature in statistics, biometrics, and econometrics is concerned with the analysis of binary and polychotomous response data. The classical approach fits a categorical response regression model using maximum likelihood, and inferences about the model are based on the associated asymptotic theory. The accuracy of classical confidence statements is questionable for small sample sizes. In this article, exact Bayesian methods for modeling categorical response data are developed using the idea of data augmentation. The general approach can be summarized as follows. The probit regression model for binary outcomes is seen to have an underlying normal regression structure on latent continuous data. Values of the latent data can be simulated from suitable truncated normal distributions. If the latent data are known, then the posterior distribution of the parameters can be computed using standard results for normal linear models. Draws from this posterior are used to sample new latent data, and t...
3,272 citations