Topic
Probabilistic latent semantic analysis
About: Probabilistic latent semantic analysis (PLSA) is a research topic. Over its lifetime, 2,884 publications on this topic have received 198,341 citations.
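PLSA models the document-word co-occurrence probability as a mixture over latent topics, P(w|d) = Σ_z P(z|d) P(w|z), and is typically fit with expectation-maximization. A minimal NumPy sketch under those standard definitions (the function name, iteration count, and smoothing constant are illustrative assumptions, not any specific paper's implementation):

```python
import numpy as np

def plsa(X, K, iters=100, seed=0):
    """Fit plain pLSA to a document-word count matrix X (D x W) with K topics via EM.

    Returns P(z|d) of shape (D, K) and P(w|z) of shape (K, W).
    A minimal sketch; practical implementations add tempered EM, restarts, etc.
    """
    rng = np.random.default_rng(seed)
    D, W = X.shape
    p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(1, keepdims=True)  # P(z|d)
    p_w_z = rng.random((K, W)); p_w_z /= p_w_z.sum(1, keepdims=True)  # P(w|z)
    for _ in range(iters):
        # E-step: P(z|d,w) proportional to P(z|d) * P(w|z), shape (D, K, W)
        post = p_z_d[:, :, None] * p_w_z[None, :, :]
        post /= post.sum(1, keepdims=True) + 1e-12
        # M-step: reweight posteriors by the observed counts n(d,w)
        nz = X[:, None, :] * post
        p_z_d = nz.sum(2); p_z_d /= p_z_d.sum(1, keepdims=True) + 1e-12
        p_w_z = nz.sum(0); p_w_z /= p_w_z.sum(1, keepdims=True) + 1e-12
    return p_z_d, p_w_z
```

On a toy block-structured corpus, the two topics recover the two word blocks and documents separate cleanly into them.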
Papers
25 Oct 2008
TL;DR: It is pointed out that, to better represent the contents of hypertexts, it is more essential to assume the hyperlinks are fixed and to define the topic model as one that generates words only.
Abstract: Previously, topic models such as PLSI (Probabilistic Latent Semantic Indexing) and LDA (Latent Dirichlet Allocation) were developed for modeling the contents of plain texts. Recently, topic models for processing hypertexts such as web pages were also proposed. These hypertext models are generative models giving rise to both words and hyperlinks. This paper points out that, to better represent the contents of hypertexts, it is more essential to assume that the hyperlinks are fixed and to define the topic model as one that generates words only. The paper then proposes a new topic model for hypertext processing, referred to as the Hypertext Topic Model (HTM). HTM defines the distribution of words in a document (i.e., the content of the document) as a mixture over latent topics in the document itself and latent topics in the documents that the document cites. The topics are further characterized as distributions over words, as in conventional topic models. The paper also proposes a method for learning the HTM model. Experimental results show that HTM outperforms the baselines on topic discovery and document classification on three datasets.
21 citations
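The HTM word distribution described above, a mixture over a document's own latent topics and the topics of the documents it cites (with hyperlinks held fixed), can be sketched as follows. The `lam` mixing weight and the uniform average over cited documents are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def htm_word_dist(d, links, p_z_d, p_w_z, lam=0.7):
    """HTM-style word distribution for document d: a convex blend of the
    document's own topic mixture and the topic mixtures of cited documents.

    links maps a document id to the list of document ids it cites.
    lam is an assumed weight on the document's own topics.
    """
    own = p_z_d[d] @ p_w_z                    # sum_z P(z|d) P(w|z)
    cited = links.get(d, [])
    if not cited:
        return own
    cited_mix = np.mean([p_z_d[c] @ p_w_z for c in cited], axis=0)
    return lam * own + (1.0 - lam) * cited_mix
```

Because both components are proper distributions, the blend remains a valid distribution over words.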
21 Jun 2014
TL;DR: A novel patch-based latent variable model is proposed to integrate latent contextual representation learning and classification model training in one joint optimization framework, providing discriminative explanations for the semantic output labels while remaining predictable from the low-level input features.
Abstract: The performance of machine learning methods is heavily dependent on the choice of data representation. In real-world applications such as scene recognition, the widely used low-level input features can fail to explain the high-level semantic label concepts. In this work, we address this problem by proposing a novel patch-based latent variable model that integrates latent contextual representation learning and classification model training in one joint optimization framework. Within this framework, the latent layer of variables bridges the gap between inputs and outputs by providing discriminative explanations for the semantic output labels while being predictable from the low-level input features. Experiments conducted on standard scene recognition tasks demonstrate the efficacy of the proposed approach compared with state-of-the-art scene recognition methods.
21 citations
TL;DR: This work extends probabilistic latent semantic analysis to higher orders, making it applicable to more than two observable variables, and learns a space of latent topics that incorporates the semantics of both visual and tag information.
21 citations
TL;DR: This work derives the training and inference rules for the smallest possible non-degenerate mm-pLSA model: a model with two leaf-pLSAs and a single top-level pLSA node merging the two leaf-pLSAs.
Abstract: In this work, we extend the standard single-layer probabilistic Latent Semantic Analysis (pLSA) (Hofmann in Mach Learn 42(1–2):177–196, 2001) to multiple layers. As multiple layers should naturally handle multiple modalities and a hierarchy of abstractions, we denote this new approach multilayer multimodal probabilistic Latent Semantic Analysis (mm-pLSA). We derive the training and inference rules for the smallest possible non-degenerate mm-pLSA model: a model with two leaf-pLSAs and a single top-level pLSA node merging the two leaf-pLSAs. We evaluate this approach on two pairs of different modalities: SIFT features and image annotations (tags), as well as the combination of SIFT and HOG features. We also propose a fast and strictly stepwise forward procedure to initialize the bottom-up mm-pLSA model, which in turn can be post-optimized by the general mm-pLSA learning algorithm. The proposed approach is evaluated in a query-by-example retrieval task where various variants of our mm-pLSA system are compared to systems relying on a single modality and other ad-hoc combinations of feature histograms. We further describe possible pitfalls of mm-pLSA training and analyze the resulting model, yielding an intuitive explanation of its behaviour.
21 citations
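The smallest mm-pLSA hierarchy described above can be sketched as a chain of stochastic matrix products: top-level topics emit leaf-topic activations for each modality, and each leaf emits that modality's observations. All shapes and names below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def mm_plsa_emit(p_ztop_d, p_zvis_ztop, p_ztag_ztop, p_w_zvis, p_t_ztag):
    """Minimal mm-pLSA forward pass: one top node over two leaf-pLSAs.

    p_ztop_d:     (D, Ktop)   P(z_top | d)
    p_zvis_ztop:  (Ktop, Kv)  P(z_vis | z_top), visual leaf
    p_ztag_ztop:  (Ktop, Kt)  P(z_tag | z_top), tag leaf
    p_w_zvis:     (Kv, W)     P(visual word | z_vis)
    p_t_ztag:     (Kt, T)     P(tag | z_tag)
    Returns per-document distributions over visual words and tags.
    """
    p_zvis_d = p_ztop_d @ p_zvis_ztop   # marginalize the top-level topics
    p_ztag_d = p_ztop_d @ p_ztag_ztop
    return p_zvis_d @ p_w_zvis, p_ztag_d @ p_t_ztag
```

Since every factor is row-stochastic, the resulting per-document outputs are valid distributions in both modalities.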
06 Nov 2011
TL;DR: This paper proposes novel spectral methods for learning latent semantics from a large vocabulary of abundant mid-level features (i.e. visual keywords), which can help to bridge the semantic gap in the challenging task of action recognition.
Abstract: This paper proposes novel spectral methods for learning latent semantics (i.e. high-level features) from a large vocabulary of abundant mid-level features (i.e. visual keywords), which can help to bridge the semantic gap in the challenging task of action recognition. To discover the manifold structure hidden among mid-level features, we develop spectral embedding approaches based on graphs and hypergraphs, without the need to tune any parameter for graph construction, which is a key step of manifold learning. In particular, the traditional graphs are constructed by linear reconstruction with sparse coding. In the new embedding space, we learn high-level latent semantics automatically from abundant mid-level features through spectral clustering. The learnt latent semantics can be readily used for action recognition with an SVM by defining a histogram intersection kernel. Different from traditional latent semantic analysis based on topic models, our two spectral methods for semantic learning can discover the manifold structure hidden among mid-level features, which results in compact but discriminative high-level features. The experimental results on two standard action datasets show the superior performance of our spectral methods.
21 citations
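The histogram intersection kernel mentioned in the abstract above is the standard K(a, b) = Σ_i min(a_i, b_i) over histogram bins; a short sketch of computing the full kernel matrix between two sets of histograms (illustrative only, not the paper's pipeline):

```python
import numpy as np

def hist_intersection_kernel(A, B):
    """Histogram intersection kernel matrix between row-histograms.

    A: (m, bins), B: (n, bins); returns (m, n) with
    K[i, j] = sum over bins of min(A[i], B[j]).
    """
    return np.minimum(A[:, None, :], B[None, :, :]).sum(-1)
```

For L1-normalized histograms the kernel value lies in [0, 1], with K(a, a) = 1; the resulting Gram matrix can be passed to an SVM with a precomputed kernel.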