scispace - formally typeset

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2,884 publications have appeared within this topic, receiving 198,341 citations in total. The topic is also known as: PLSA.


Papers
Proceedings ArticleDOI
26 Oct 2010
TL;DR: This study proposes a generative statistical model, named Collaborative Dual-PLSA, to simultaneously capture both the domain distinction and commonality among multiple domains and demonstrates the superiority of the proposed model over existing state-of-the-art methods of supervised and transfer learning.
Abstract: The distribution difference among multiple data domains has been considered for the cross-domain text classification problem. In this study, we show two new observations along this line. First, the data distribution difference may come from the fact that different domains use different key words to express the same concept. Second, the association between this conceptual feature and the document class may be stable across domains. These two issues are actually the distinction and commonality across data domains. Inspired by the above observations, we propose a generative statistical model, named Collaborative Dual-PLSA (CD-PLSA), to simultaneously capture both the domain distinction and commonality among multiple domains. Different from Probabilistic Latent Semantic Analysis (PLSA) with only one latent variable, the proposed model has two latent factors y and z, corresponding to word concept and document class respectively. The shared commonality intertwines with the distinctions over multiple domains, and is also used as the bridge for knowledge transformation. We exploit an Expectation Maximization (EM) algorithm to learn this model, and also propose its distributed version to handle the situation where the data domains are geographically separated from each other. Finally, we conduct extensive experiments over hundreds of classification tasks with multiple source domains and multiple target domains to validate the superiority of the proposed CD-PLSA model over existing state-of-the-art methods of supervised and transfer learning. In particular, we show that CD-PLSA is more tolerant of distribution differences.

46 citations

Journal ArticleDOI
TL;DR: A violation of the naive Bayes model is interpreted as an indication of the presence of latent variables, and it is shown how latent variables can be detected.

46 citations

Journal ArticleDOI
TL;DR: An effective person re-identification method with latent variables is proposed, which represents a pedestrian as the mixture of a holistic model and a number of flexible models, and develops a latent metric learning method for learning the effective metric matrix.
Abstract: In this paper, we propose an effective person re-identification method with latent variables, which represents a pedestrian as the mixture of a holistic model and a number of flexible models. Three types of latent variables are introduced to model uncertain factors in the re-identification problem, including vertical misalignments, horizontal misalignments and leg posture variations. The distance between two pedestrians can be determined by minimizing a given distance function with respect to latent variables, and then be used to conduct the re-identification task. In addition, we develop a latent metric learning method for learning the effective metric matrix, which can be solved in an iterative manner: once the latent information is specified, the metric matrix can be obtained using standard metric learning methods; with the metric matrix computed, the latent variables can be determined by exhaustively searching the state space. Finally, extensive experiments are conducted on seven databases to evaluate the proposed method. The experimental results demonstrate that our method achieves better performance than other competing algorithms.

46 citations
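The alternating scheme described above (latents fixed, fit the metric; metric fixed, search the latent state space exhaustively) can be sketched with a single latent vertical shift and an inverse-regularized-covariance update standing in for a full metric learner. All function names, the stripe-feature representation, and the simplified metric update are assumptions for illustration, not the paper's actual formulation:

```python
import numpy as np

def best_align(a, b, M, shifts=(-1, 0, 1)):
    """Mahalanobis-style distance between two pedestrians, each an (S, d)
    array of horizontal-stripe features, minimized over a latent vertical
    shift by exhaustive search. Returns (distance, aligned residuals)."""
    S = a.shape[0]
    best, best_diff = np.inf, None
    for s in shifts:
        xa = a[max(s, 0):S + min(s, 0)]
        xb = b[max(-s, 0):S + min(-s, 0)]
        diff = xa - xb
        dist = np.einsum('id,de,ie->', diff, M, diff) / len(diff)
        if dist < best:
            best, best_diff = dist, diff
    return best, best_diff

def learn_latent_metric(same_id_pairs, n_iter=3, reg=1e-3):
    """Alternate between (1) re-searching latent shifts under the current
    metric and (2) refitting M from the aligned same-identity residuals
    (inverse regularized covariance, a stand-in for a metric learner)."""
    d = same_id_pairs[0][0].shape[1]
    M = np.eye(d)
    for _ in range(n_iter):
        diffs = np.vstack([best_align(a, b, M)[1] for a, b in same_id_pairs])
        cov = diffs.T @ diffs / len(diffs) + reg * np.eye(d)
        M = np.linalg.inv(cov)
    return M
```

With a same-identity gallery image shifted vertically by one stripe, the exhaustive search recovers the shift and the learned metric separates same-identity from cross-identity pairs.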

Book ChapterDOI
01 Jan 1983
TL;DR: This chapter discusses maximum likelihood estimation in latent variable problems and argues that it is a viable approach to a broad class of such problems.
Abstract: Publisher Summary This chapter discusses the maximum likelihood estimation in a latent variable problem. Latent variates are random variables, which cannot be measured directly, but play essential roles in the description of observable quantities. They occur in a broad range of statistical problems. In the case that the dependent variate y is discrete, latent structure models play an important role, arising in connection with ability tests. Computing uniform residuals is an effective general means to proceed in latent variable problems. In some cases, the subject's ability can be eliminated by conditioning on an appropriate statistic. Maximum likelihood estimation is a viable approach to a broad class of latent variable problems. Generalized linear interactive modeling (GLIM) is an effective tool for carrying out the needed computations. GLIM also contains a high-level syntax for handling variables with factorial structure, vectors, and nonfull rank models. Its powerful directives shorten the length of the program considerably and allow simple simulation of the whole situation for checking programs and logic.

46 citations
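A concrete instance of the latent variable MLE discussed here is the latent class model for binary test items: the latent variate is an unobserved class, and item responses are conditionally independent Bernoulli variables given that class. A minimal EM sketch (illustrative only; the chapter's actual computations use GLIM, and all names here are assumptions):

```python
import numpy as np

def latent_class_em(Y, n_classes, n_iter=200, seed=0):
    """Maximum likelihood for a latent class model of binary items via EM.

    Y: (n_subjects, n_items) 0/1 response matrix. The latent variate is an
    unobserved class; items are independent Bernoulli given that class.
    Returns (pi, p): class weights and per-class item probabilities.
    """
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    pi = np.full(n_classes, 1.0 / n_classes)
    p = rng.uniform(0.25, 0.75, size=(n_classes, Y.shape[1]))
    for _ in range(n_iter):
        # E-step: responsibilities P(class | responses), computed in log space
        log_joint = Y @ np.log(p).T + (1 - Y) @ np.log(1 - p).T + np.log(pi)
        log_joint -= log_joint.max(axis=1, keepdims=True)
        r = np.exp(log_joint)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted relative frequencies
        nc = r.sum(axis=0)
        pi = nc / n
        p = np.clip((r.T @ Y) / nc[:, None], 1e-6, 1 - 1e-6)
    return pi, p
```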

Proceedings Article
12 Dec 2011
TL;DR: This work presents a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables that can recover the latent tree structures with provable guarantees and perform local-minimum free parameter learning and efficient inference.
Abstract: Latent tree graphical models are natural tools for expressing long range and hierarchical dependencies among many variables which are common in computer vision, bioinformatics and natural language processing problems. However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. Our method can recover the latent tree structures with provable guarantees and perform local-minimum free parameter learning and efficient inference. Experiments on simulated and real data show the advantage of our proposed approach.

46 citations
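The structure-recovery step in methods like this ultimately builds a tree from a (pseudo-)distance matrix over the observed variables; in the paper these distances come from singular values of kernel cross-covariance operators. As a generic illustration of that final step only (not the paper's kernel-embedding distances), here is a minimal neighbor-joining sketch on an additive distance matrix:

```python
import numpy as np

def neighbor_joining(D, names):
    """Recover an unrooted tree topology from an additive distance matrix
    by neighbor joining; returns the sequence of merges performed."""
    D = np.asarray(D, dtype=float).copy()
    nodes = list(names)
    merges = []
    while len(nodes) > 2:
        n = len(nodes)
        row = D.sum(axis=1)
        # Q-criterion: the minimizing pair is a cherry in the true tree
        Q = (n - 2) * D - row[:, None] - row[None, :]
        np.fill_diagonal(Q, np.inf)
        i, j = np.unravel_index(np.argmin(Q), Q.shape)
        if i > j:
            i, j = j, i
        merges.append((nodes[i], nodes[j]))
        # distance from the new internal node to every remaining node
        d_new = 0.5 * (D[i] + D[j] - D[i, j])
        keep = [k for k in range(n) if k not in (i, j)]
        newD = np.zeros((len(keep) + 1, len(keep) + 1))
        newD[:-1, :-1] = D[np.ix_(keep, keep)]
        newD[-1, :-1] = newD[:-1, -1] = d_new[keep]
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]},{nodes[j]})"]
        D = newD
    merges.append((nodes[0], nodes[1]))
    return merges
```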


Network Information

Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations (84% related)
Feature (computer vision): 128.2K papers, 1.7M citations (84% related)
Support vector machine: 73.6K papers, 1.7M citations (84% related)
Deep learning: 79.8K papers, 2.1M citations (83% related)
Object detection: 46.1K papers, 1.3M citations (82% related)
Performance

No. of papers in the topic in previous years:

Year    Papers
2023    19
2022    77
2021    14
2020    36
2019    27
2018    58