
Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2,884 publications have been published on this topic, receiving 198,341 citations. The topic is also known as PLSA.


Papers
Journal ArticleDOI
TL;DR: This paper proposes a joint factor analysis and latent clustering framework that learns cluster-aware low-dimensional representations of matrix and tensor data, leveraging matrix and tensor factorization models that produce essentially unique latent representations of the data.
Abstract: Dimensionality reduction techniques play an essential role in data analytics, signal processing, and machine learning. Dimensionality reduction is usually performed in a preprocessing stage that is separate from subsequent data analysis, such as clustering or classification. Finding reduced-dimension representations that are well-suited for the intended task is more appealing. This paper proposes a joint factor analysis and latent clustering framework, which aims at learning cluster-aware low-dimensional representations of matrix and tensor data. The proposed approach leverages matrix and tensor factorization models that produce essentially unique latent representations of the data to unravel latent cluster structure—which is otherwise obscured because of the freedom to apply an oblique transformation in latent space. At the same time, latent cluster structure is used as prior information to enhance the performance of factorization. Specific contributions include several custom-built problem formulations, corresponding algorithms, and discussion of associated convergence properties. Besides extensive simulations, real-world datasets such as Reuters document data and MNIST image data are also employed to showcase the effectiveness of the proposed approaches.

42 citations

Proceedings Article
26 Jun 2012
TL;DR: In this paper, a nonparametric Bayesian model that posits a mixture of factor analyzers structure on the tasks is proposed to learn the "right" task structure in a data-driven manner.
Abstract: Multitask learning algorithms are typically designed assuming some fixed, a priori known latent structure shared by all the tasks. However, it is usually unclear what type of latent task structure is the most appropriate for a given multitask learning problem. Ideally, the "right" latent task structure should be learned in a data-driven manner. We present a flexible, nonparametric Bayesian model that posits a mixture of factor analyzers structure on the tasks. The nonparametric aspect makes the model expressive enough to subsume many existing models of latent task structures (e.g., mean-regularized tasks, clustered tasks, low-rank or linear/non-linear subspace assumptions on tasks, etc.). Moreover, it can also learn more general task structures, addressing the shortcomings of such models. We present a variational inference algorithm for our model. Experimental results on synthetic and real-world datasets, on both regression and classification problems, demonstrate the effectiveness of the proposed method.

42 citations

Journal ArticleDOI
TL;DR: This work proposes a new approach, structural pLSA (SpLSA), that explicitly models word order by introducing latent variables, and develops an action categorization approach that learns action representations as distributions of latent topics in an unsupervised way.

41 citations

Proceedings ArticleDOI
28 Jun 2009
TL;DR: In this paper, GPBF-Learn is proposed, a framework for training GP-BayesFilters without ground truth states. GP-BayesFilters are a general framework for integrating Gaussian process prediction and observation models into Bayesian filtering techniques, including particle filters and extended and unscented Kalman filters.
Abstract: GP-BayesFilters are a general framework for integrating Gaussian process prediction and observation models into Bayesian filtering techniques, including particle filters and extended and unscented Kalman filters. GP-BayesFilters have been shown to be extremely well suited for systems for which accurate parametric models are difficult to obtain. GP-BayesFilters learn non-parametric models from training data containing sequences of control inputs, observations, and ground truth states. The need for ground truth states limits the applicability of GP-BayesFilters to systems for which the ground truth can be estimated without significant overhead. In this paper we introduce GPBF-Learn, a framework for training GP-BayesFilters without ground truth states. Our approach extends Gaussian Process Latent Variable Models to the setting of dynamical robotics systems. We show how weak labels for the ground truth states can be incorporated into the GPBF-Learn framework. The approach is evaluated using a difficult tracking task, namely tracking a slotcar based on inertial measurement unit (IMU) observations only. We also show some special features enabled by this framework, including time alignment and control replay for both the slotcar and a robotic arm.

41 citations

Journal ArticleDOI
TL;DR: The fuzzy p-value provides an exact test using two sets of simulations of the latent variables under the null hypothesis, one unconditional and the other conditional on the observed data. It provides not only an expression of the strength of the evidence against the null hypothesis but also an expression of the uncertainty in that expression owing to lack of knowledge of the latent variables.
Abstract: We consider the problem of testing a statistical hypothesis where the scientifically meaningful test statistic is a function of latent variables. In particular, we consider detection of genetic linkage, where the latent variables are patterns of inheritance at specific genome locations. Introduced by Geyer & Meeden (2005), fuzzy p-values are random variables, described by their probability distributions, that are interpreted as p-values. For latent variable problems, we introduce the notion of a fuzzy p-value as having the conditional distribution of the latent p-value given the observed data, where the latent p-value is the random variable that would be the p-value if the latent variables were observed. The fuzzy p-value provides an exact test using two sets of simulations of the latent variables under the null hypothesis, one unconditional and the other conditional on the observed data. It provides not only an expression of the strength of the evidence against the null hypothesis but also an expression of the uncertainty in that expression owing to lack of knowledge of the latent variables. We illustrate these features with an example of simulated data mimicking a real example of the detection of genetic linkage.
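The two-simulation construction described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption chosen for illustration and is not taken from the paper: a toy null model with latent Z ~ N(0, 1), an observation Y = Z + noise with known noise variance, the test statistic T(Z) = Z, and the hypothetical function name `fuzzy_p_value`. The unconditional draws give the null distribution of the latent statistic; the conditional draws (from Z given Y under the null) each yield one latent p-value, and the collection of those values approximates the fuzzy p-value's distribution.

```python
import random
from bisect import bisect_left


def fuzzy_p_value(y, noise_var=1.0, n_null=2000, n_cond=500, seed=0):
    """Return a sorted Monte Carlo sample of latent p-values given y.

    Toy model (an assumption, not the paper's genetic-linkage setting):
    Z ~ N(0, 1) under the null, Y = Z + e with e ~ N(0, noise_var),
    and the latent test statistic is T(Z) = Z (large values are extreme).
    """
    rng = random.Random(seed)

    # Simulation set 1: unconditional draws of the latent statistic under the null.
    null_stats = sorted(rng.gauss(0.0, 1.0) for _ in range(n_null))

    # Simulation set 2: draws of Z conditional on Y = y under the null.
    # For this Gaussian toy model, Z | Y = y ~ N(y / (1 + v), v / (1 + v)).
    post_mean = y / (1.0 + noise_var)
    post_sd = (noise_var / (1.0 + noise_var)) ** 0.5

    latent_p = []
    for _ in range(n_cond):
        z = rng.gauss(post_mean, post_sd)
        # Latent p-value: fraction of unconditional draws at least as large as z.
        n_at_least = n_null - bisect_left(null_stats, z)
        latent_p.append(n_at_least / n_null)
    return sorted(latent_p)


if __name__ == "__main__":
    sample = fuzzy_p_value(3.0)
    # Report a few quantiles of the fuzzy p-value's distribution.
    for q in (0.1, 0.5, 0.9):
        print(q, sample[int(q * (len(sample) - 1))])
```

The spread of the returned sample is what conveys the uncertainty due to the unobserved latent variables: a tight sample near zero suggests strong evidence against the null, while a wide sample indicates that the observed data say little about the latent statistic.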

41 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Support vector machine: 73.6K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    19
2022    77
2021    14
2020    36
2019    27
2018    58