scispace - formally typeset
Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2,884 publications have been published within this topic, receiving 198,341 citations. The topic is also known as: PLSA.


Papers
Journal ArticleDOI
TL;DR: A model for the analysis of time-budgets is discussed that exploits the property that the rows of the data matrix add up to one; the model is compared with logcontrast principal component analysis.
Abstract: Time-budgets summarize how the time of objects is distributed over a number of categories. Usually they are collected in object-by-category matrices with the property that the rows of the data matrix add up to one. In this paper we discuss a model for the analysis of time-budgets that uses this property. The model approximates the observed time-budgets by weighted sums of a number of latent time-budgets. These latent time-budgets determine the behavior of all objects. Special attention is given to the identification of the model. The model is compared with logcontrast principal component analysis.

29 citations
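The decomposition described above — each observed row-stochastic time-budget approximated as a mixture of a few latent budgets, with mixing weights and latent budgets both summing to one — can be fit with EM-style multiplicative updates. The sketch below is a generic illustration of that idea under equal object weights, not the estimator from the paper; the function name and all parameters are assumptions for illustration.

```python
import numpy as np

def latent_budgets(P, k, n_iter=300, seed=0):
    """Approximate a row-stochastic matrix P (N objects x C categories)
    as A @ B, where A (N x k) holds mixing weights and B (k x C) holds
    latent budgets, both with rows summing to one.

    Illustrative EM-style sketch (equal object weights), not the
    paper's estimator.
    """
    rng = np.random.default_rng(seed)
    N, C = P.shape
    A = rng.random((N, k)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((k, C)); B /= B.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Q = A @ B                        # current approximation of P
        R = P / np.maximum(Q, 1e-12)     # elementwise ratio P / Q
        # Reweight mixing weights by how well each latent budget
        # explains each object's observed budget, then renormalize.
        A *= R @ B.T
        A /= A.sum(axis=1, keepdims=True)
        # Same reweighting for the latent budgets themselves.
        B *= A.T @ R
        B /= B.sum(axis=1, keepdims=True)
    return A, B
```

With an exactly low-rank input (e.g. data generated from two latent budgets), the reconstruction A @ B typically converges close to P, while both factors stay on the simplex throughout.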

Posted Content
TL;DR: This paper proposes a piecewise constant distribution that can represent an exponential number of modes of a latent target distribution while remaining mathematically tractable, and shows that incorporating this latent distribution into different models yields substantial improvements in NLP tasks such as document modeling and natural language generation for dialogue.
Abstract: Advances in neural variational inference have facilitated the learning of powerful directed graphical models with continuous latent variables, such as variational autoencoders. The hope is that such models will learn to represent rich, multi-modal latent factors in real-world data, such as natural language text. However, current models often assume simplistic priors on the latent variables - such as the uni-modal Gaussian distribution - which are incapable of representing complex latent factors efficiently. To overcome this restriction, we propose the simple, but highly flexible, piecewise constant distribution. This distribution has the capacity to represent an exponential number of modes of a latent target distribution, while remaining mathematically tractable. Our results demonstrate that incorporating this new latent distribution into different models yields substantial improvements in natural language processing tasks such as document modeling and natural language generation for dialogue.

29 citations
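To make the multi-modality argument concrete: a piecewise constant density on [0, 1) with K equal-width bins can place probability mass on any subset of bins, so it can express many separated modes that a single Gaussian cannot. The sampler below is a minimal generic sketch of such a distribution (bin probabilities via a softmax, uniform within each bin); it is not the paper's model, and the function name is an assumption.

```python
import numpy as np

def sample_piecewise_constant(logits, n, rng):
    """Draw n samples from a piecewise constant density on [0, 1):
    K equal-width bins with probabilities softmax(logits), and a
    uniform draw within the chosen bin. Illustrative sketch only.
    """
    k = len(logits)
    # Numerically stable softmax over bin logits.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Pick a bin per sample, then place the sample uniformly in it.
    bins = rng.choice(k, size=n, p=probs)
    return (bins + rng.random(n)) / k
```

Setting high logits on non-adjacent bins yields clearly separated modes, which is the representational capacity the abstract contrasts with a uni-modal Gaussian prior.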

Journal ArticleDOI
TL;DR: My commentary focuses on the relationship between the two perspectives and aims to qualify the presumed contrast between interpretations in terms of networks and latent variables.
Abstract: Cramer et al. present an original and interesting network perspective on comorbidity and contrast this perspective with a more traditional interpretation of comorbidity in terms of latent variable theory. My commentary focuses on the relationship between the two perspectives; that is, it aims to qualify the presumed contrast between interpretations in terms of networks and latent variables.

28 citations

Journal ArticleDOI
Thorsten Brants
TL;DR: A new test-data likelihood substitute is derived for PLSA and an empirical evaluation shows that the new likelihood substitute produces the best predictions about accuracies in two different IR tasks and is therefore best suited to determine the number of EM steps when training PLSA models.
Abstract: Probabilistic Latent Semantic Analysis (PLSA) is a statistical latent class model that has recently received considerable attention. In its usual formulation it cannot assign likelihoods to unseen documents. Furthermore, it assigns a probability of zero to unseen documents during training. We point out that one of the two existing alternative formulations of the Expectation-Maximization algorithm for PLSA does not require this assumption. However, even that formulation does not allow calculation of the actual likelihood values. We therefore derive a new test-data likelihood substitute for PLSA and compare it to three existing likelihood substitutes. An empirical evaluation shows that our new likelihood substitute produces the best predictions about accuracies in two different IR tasks and is therefore best suited to determine the number of EM steps when training PLSA models. The new likelihood measure and its evaluation also suggest that PLSA is not very sensitive to overfitting for the two tasks considered. This renders additions like tempered EM, which especially address overfitting, unnecessary.

28 citations
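The EM training that the abstract refers to fits the standard symmetric PLSA model P(d, w) = sum_z P(z) P(d|z) P(w|z) on a document-word count matrix. The sketch below implements that textbook EM loop (E-step responsibilities, M-step count-weighted re-estimation); it is a minimal generic illustration, not code from the paper, and it omits the likelihood-substitute machinery the paper is actually about.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit symmetric PLSA by EM on a (D x W) document-word count matrix.

    Returns P(z), P(d|z), P(w|z). Minimal sketch; no stopping criterion
    or tempering.
    """
    rng = np.random.default_rng(seed)
    D, W = counts.shape
    pz = np.full(n_topics, 1.0 / n_topics)
    pdz = rng.random((n_topics, D)); pdz /= pdz.sum(axis=1, keepdims=True)
    pwz = rng.random((n_topics, W)); pwz /= pwz.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z | d, w), shape (Z, D, W).
        joint = pz[:, None, None] * pdz[:, :, None] * pwz[:, None, :]
        post = joint / joint.sum(axis=0, keepdims=True)
        # M-step: expected counts per topic, then renormalize.
        nz = counts[None, :, :] * post
        pdz = nz.sum(axis=2); pdz /= pdz.sum(axis=1, keepdims=True)
        pwz = nz.sum(axis=1); pwz /= pwz.sum(axis=1, keepdims=True)
        pz = nz.sum(axis=(1, 2)); pz /= pz.sum()
    return pz, pdz, pwz
```

Note that every quantity is conditioned on the training documents d, which is exactly why the plain formulation cannot score unseen documents and why a test-data likelihood substitute is needed to decide when to stop EM.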


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Support vector machine: 73.6K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Performance Metrics
Number of papers in the topic in previous years:

Year    Papers
2023    19
2022    77
2021    14
2020    36
2019    27
2018    58