Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over the lifetime, 2884 publications have been published within this topic receiving 198341 citations. The topic is also known as: PLSA.
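To make the topic concrete, the core of pLSA can be sketched as an EM loop over a document-word count matrix. This is a generic illustration of Hofmann's standard formulation, not code from any paper listed below; the function name `plsa` and its parameters are illustrative.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a document-word count matrix.

    counts: (n_docs, n_words) nonnegative matrix n(d, w).
    Returns P(z|d) of shape (n_docs, n_topics) and
            P(w|z) of shape (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialisation, normalised into valid distributions.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z | d, w) is proportional to P(z|d) * P(w|z).
        post = p_z_d[:, :, None] * p_w_z[None, :, :]      # (d, z, w)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate from expected counts n(d, w) * P(z|d,w).
        expected = counts[:, None, :] * post              # (d, z, w)
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z
```

On a toy corpus with two clearly separated word groups, the learnt P(z|d) rows concentrate on one topic per document.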


Papers
Proceedings ArticleDOI
25 Oct 2008
TL;DR: This paper argues that, to better represent the contents of hypertexts, it is more essential to assume that the hyperlinks are fixed and to define the topic model as one that generates words only.
Abstract: Previously, topic models such as PLSI (Probabilistic Latent Semantic Indexing) and LDA (Latent Dirichlet Allocation) were developed for modeling the contents of plain texts. Recently, topic models for processing hypertexts such as web pages were also proposed. The proposed hypertext models are generative models giving rise to both words and hyperlinks. This paper points out that to better represent the contents of hypertexts it is more essential to assume that the hyperlinks are fixed and to define the topic model as that of generating words only. The paper then proposes a new topic model for hypertext processing, referred to as Hypertext Topic Model (HTM). HTM defines the distribution of words in a document (i.e., the content of the document) as a mixture over latent topics in the document itself and latent topics in the documents which the document cites. The topics are further characterized as distributions of words, as in the conventional topic models. This paper further proposes a method for learning the HTM model. Experimental results show that HTM outperforms the baselines on topic discovery and document classification in three datasets.

21 citations
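The key modelling idea of HTM, a document's word distribution as a mixture over its own topics and the topics of the documents it cites, can be sketched as follows. The function name, the `links` structure, and the fixed mixing weight `alpha` are all hypothetical simplifications; the paper learns its mixing weights rather than fixing them.

```python
import numpy as np

def htm_word_distribution(doc_id, links, p_z_d, p_w_z, alpha=0.7):
    """P(w | d) in HTM style: mix the document's own topic
    proportions with those of the documents it cites, treating
    the hyperlink structure as fixed.

    links: dict mapping a doc id to the list of doc ids it cites.
    alpha: weight on the document's own topics (hypothetical;
           HTM learns the mixing weights).
    """
    own = p_z_d[doc_id]
    cited = links.get(doc_id, [])
    if cited:
        cited_mix = np.mean([p_z_d[c] for c in cited], axis=0)
        mix = alpha * own + (1 - alpha) * cited_mix
    else:
        mix = own
    # Topics are distributions over words, as in conventional models.
    return mix @ p_w_z
```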

Proceedings Article
Xin Li1, Yuhong Guo1
21 Jun 2014
TL;DR: A novel patch-based latent variable model is proposed to integrate latent contextual representation learning and classification model training in one joint optimization framework, providing discriminative explanations for the semantic output labels, while being predictable from the low-level input features.
Abstract: The performance of machine learning methods is heavily dependent on the choice of data representation. In real world applications such as scene recognition problems, the widely used low-level input features can fail to explain the high-level semantic label concepts. In this work, we address this problem by proposing a novel patch-based latent variable model to integrate latent contextual representation learning and classification model training in one joint optimization framework. Within this framework, the latent layer of variables bridge the gap between inputs and outputs by providing discriminative explanations for the semantic output labels, while being predictable from the low-level input features. Experiments conducted on standard scene recognition tasks demonstrate the efficacy of the proposed approach, comparing to the state-of-the-art scene recognition methods.

21 citations

Journal ArticleDOI
TL;DR: This work proposes the extension of probabilistic latent semantic analysis to higher order, so as to become applicable for more than two observable variables, and learns a space of latent topics that incorporates the semantics of both visual and tag information.

21 citations

Journal ArticleDOI
TL;DR: This work derives the training and inference rules for the smallest possible non-degenerated mm-pLSA model: a model with two leaf-pLSAs and a single top-level pLSA node merging the two leaf-pLSAs.
Abstract: In this work, we extend the standard single-layer probabilistic Latent Semantic Analysis (pLSA) (Hofmann in Mach Learn 42(1–2):177–196, 2001) to multiple layers. As multiple layers should naturally handle multiple modalities and a hierarchy of abstractions, we denote this new approach multilayer multimodal probabilistic Latent Semantic Analysis (mm-pLSA). We derive the training and inference rules for the smallest possible non-degenerated mm-pLSA model: a model with two leaf-pLSAs and a single top-level pLSA node merging the two leaf-pLSAs. We evaluate this approach on two pairs of different modalities: SIFT features and image annotations (tags) as well as the combination of SIFT and HOG features. We also propose a fast and strictly stepwise forward procedure to initialize the bottom–up mm-pLSA model, which in turn can then be post-optimized by the general mm-pLSA learning algorithm. The proposed approach is evaluated in a query-by-example retrieval task where various variants of our mm-pLSA system are compared to systems relying on a single modality and other ad-hoc combinations of feature histograms. We further describe possible pitfalls of the mm-pLSA training and analyze the resulting model yielding an intuitive explanation of its behaviour.

21 citations
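The smallest mm-pLSA configuration described above, two modality-specific leaf pLSAs feeding a single top node, can be sketched by forming the top node's input from the concatenated leaf topic activations. This is a simplified illustration of the hierarchy's wiring, not the paper's actual training rules; the function name is hypothetical.

```python
import numpy as np

def mm_plsa_top_input(p_z_d_visual, p_z_d_tags):
    """Build the input for the top-level pLSA node from two leaf
    pLSAs, one per modality (e.g. SIFT topics and tag topics).

    Each leaf contributes its per-document topic activations; the
    concatenation acts as a pseudo co-occurrence table over the
    combined leaf-topic "vocabulary" that the top node models.
    """
    joint = np.hstack([p_z_d_visual, p_z_d_tags])
    # Normalise each document row into a distribution.
    return joint / joint.sum(axis=1, keepdims=True)
```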

Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper proposes novel spectral methods for learning latent semantics from a large vocabulary of abundant mid-level features (i.e. visual keywords), which can help to bridge the semantic gap in the challenging task of action recognition.
Abstract: This paper proposes novel spectral methods for learning latent semantics (i.e. high-level features) from a large vocabulary of abundant mid-level features (i.e. visual keywords), which can help to bridge the semantic gap in the challenging task of action recognition. To discover the manifold structure hidden among mid-level features, we develop spectral embedding approaches based on graphs and hypergraphs, without the need to tune any parameter for graph construction which is a key step of manifold learning. In particular, the traditional graphs are constructed by linear reconstruction with sparse coding. In the new embedding space, we learn high-level latent semantics automatically from abundant mid-level features through spectral clustering. The learnt latent semantics can be readily used for action recognition with SVM by defining a histogram intersection kernel. Different from the traditional latent semantic analysis based on topic models, our two spectral methods for semantic learning can discover the manifold structure hidden among mid-level features, which results in compact but discriminative high-level features. The experimental results on two standard action datasets have shown the superior performance of our spectral methods.

21 citations
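The spectral-embedding step that this paper builds on can be sketched with a generic normalised-Laplacian embedding of mid-level feature vectors. Note the paper constructs its graphs by linear reconstruction with sparse coding, parameter-free; the Gaussian-kernel affinity below is a common stand-in, and `sigma` is an assumption of this sketch.

```python
import numpy as np

def spectral_embed(features, n_dims, sigma=1.0):
    """Embed feature vectors via the symmetric normalised graph
    Laplacian; clustering in this space yields the high-level
    latent semantics described above.
    """
    # Pairwise Gaussian affinities between feature vectors.
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    # L = I - D^{-1/2} W D^{-1/2} (symmetric, so eigh applies).
    d_inv_sqrt = 1.0 / np.sqrt(deg + 1e-12)
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Eigenvectors of the smallest eigenvalues give the embedding.
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, :n_dims]
```

The embedded points would then be grouped by k-means (spectral clustering) to form the high-level features, and a histogram intersection kernel SVM is trained on the resulting histograms.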


Network Information
Related Topics (5)

Topic                        Papers    Citations    Relatedness
Feature extraction           111.8K    2.1M         84%
Feature (computer vision)    128.2K    1.7M         84%
Support vector machine       73.6K     1.7M         84%
Deep learning                79.8K     2.1M         83%
Object detection             46.1K     1.3M         82%
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    19
2022    77
2021    14
2020    36
2019    27
2018    58