Topic
Probabilistic latent semantic analysis
About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2,884 publications have appeared on this topic, receiving 198,341 citations. The topic is also known as: PLSA.
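The topic itself can be summarized with a short sketch: pLSA factorizes the document-word co-occurrence distribution as P(d, w) = Σ_z P(z) P(d|z) P(w|z) and fits the factors by EM. Below is a minimal NumPy sketch of that procedure (function and variable names are illustrative, not taken from any listed paper):

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a (docs x words) count matrix.

    Decomposes P(d, w) = sum_z P(z) P(d|z) P(w|z).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random normalized initialization of the three factors
    p_z = np.full(n_topics, 1.0 / n_topics)                      # P(z)
    p_d_z = rng.random((n_topics, n_docs))                       # P(d|z)
    p_d_z /= p_d_z.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))                      # P(w|z)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(z|d,w), shape (z, d, w)
        joint = p_z[:, None, None] * p_d_z[:, :, None] * p_w_z[:, None, :]
        joint /= joint.sum(axis=0, keepdims=True) + 1e-12
        # M-step: reweight the posterior by observed counts n(d,w)
        weighted = joint * counts[None, :, :]
        p_w_z = weighted.sum(axis=1)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_d_z = weighted.sum(axis=2)
        p_d_z /= p_d_z.sum(axis=1, keepdims=True) + 1e-12
        p_z = weighted.sum(axis=(1, 2))
        p_z /= p_z.sum()
    return p_z, p_d_z, p_w_z
```

Unlike LDA, pLSA places no Dirichlet prior on the topic mixtures, which is the main modeling difference between the two approaches discussed in the papers below.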
Papers published on a yearly basis
Papers
TL;DR: A semantically enhanced document retrieval system that describes each retrieved document with an ontological, multi-grained network of extracted concepts, together with a SKOS-based ontology created ad hoc for the document corpus, enabling exploration of the concepts at different granularity levels.
19 citations
03 Apr 2017
TL;DR: The experimental results indicate that probabilistic distance measures outperform vector-based distance measures, including Euclidean distance, when clustering a set of documents in the topic space.
Abstract: This paper evaluates, through an empirical study, eight different distance measures used with the LDA + K-means model. We performed our analysis on two commonly used miscellaneous datasets. Our experimental results indicate that probabilistic distance measures outperform vector-based distance measures, including Euclidean distance, when clustering a set of documents in the topic space. Moreover, we investigate the effect of the number of topics and show that K-means combined with the results of the Latent Dirichlet Allocation model yields better results than LDA + Naive and the Vector Space Model.
19 citations
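The distinction the paper draws can be sketched concretely: a probabilistic measure such as the Jensen-Shannon distance treats documents as topic distributions, while Euclidean distance treats them as plain vectors. The helper below and the example distributions are our own illustration, not taken from the paper:

```python
import numpy as np

def jensen_shannon(p, q, eps=1e-12):
    """Jensen-Shannon distance (square root of the JS divergence):
    a symmetric, probabilistic distance between two topic distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))   # Kullback-Leibler divergence
    return np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

# Hypothetical document-topic distributions from an LDA model
doc_a = [0.70, 0.20, 0.10]
doc_b = [0.55, 0.35, 0.10]
print(jensen_shannon(doc_a, doc_b))                        # probabilistic distance
print(np.linalg.norm(np.array(doc_a) - np.array(doc_b)))   # Euclidean distance
```

Because topic vectors live on the probability simplex, divergence-based measures like this respect their geometry in a way that raw Euclidean distance does not, which is consistent with the paper's finding.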
TL;DR: A flexible multi-task learning framework that identifies latent grouping structures under agnostic settings, where the prior of the latent subspace is unknown to the learner, with proofs of theoretical guarantees on learning performance.
19 citations
TL;DR: This paper proposes a novel transfer learning method, referred to as Multi-Bridge Transfer Learning (MBTL), that learns the distributions in different latent spaces together, and presents an iterative algorithm with a convergence guarantee to solve MBTL.
Abstract: Highlights:
- MBTL constructs multiple latent spaces to exploit more common latent factors.
- MBTL reduces the discrepancies of the distributions in different latent spaces.
- To solve MBTL, we present an iterative algorithm with a convergence guarantee.
- MBTL outperforms state-of-the-art learning methods on several datasets.

Transfer learning, which aims to exploit knowledge from the source domains to promote the learning tasks in the target domains, has attracted extensive research interest recently. The general idea of previous approaches is to model the shared structure in one latent space as the bridge across domains by reducing the distribution divergences. However, latent factors in other latent spaces can also be utilized to draw the corresponding distributions closer and establish additional bridges. In this paper, we propose a novel transfer learning method, referred to as Multi-Bridge Transfer Learning (MBTL), to learn the distributions in the different latent spaces together, so that more shared latent factors can be utilized to transfer knowledge. Additionally, an iterative algorithm with a convergence guarantee, based on non-negative matrix tri-factorization techniques, is proposed to solve the optimization problem. Comprehensive experiments demonstrate that MBTL can significantly outperform state-of-the-art learning methods on topic and sentiment classification tasks.
19 citations
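The building block MBTL relies on, non-negative matrix tri-factorization, can be illustrated with the standard multiplicative updates for minimizing ||X - F S Gᵀ||² under non-negativity. This is a generic sketch of the technique only, without the paper's multi-bridge objective or any additional constraints:

```python
import numpy as np

def tri_factorize(X, k1, k2, n_iter=300, seed=0, eps=1e-9):
    """Non-negative matrix tri-factorization X ~ F @ S @ G.T
    via multiplicative updates (plain least-squares objective)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k1))    # row (e.g. document) cluster indicators
    S = rng.random((k1, k2))   # association between row and column clusters
    G = rng.random((m, k2))    # column (e.g. word) cluster indicators
    for _ in range(n_iter):
        # Each update multiplies by (positive gradient part) / (negative part),
        # which keeps all factors non-negative and decreases the objective.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
    return F, S, G
```

Tri-factorization is a common choice for transfer learning because F and G can cluster rows and columns separately, letting methods share some factors across domains while keeping others domain-specific.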
TL;DR: It is demonstrated that Sharma–Mittal entropy is a convenient tool for selecting both the number of topics and the values of hyper-parameters, simultaneously controlling for semantic stability, which none of the existing metrics can do.
Abstract: Topic modeling is a popular approach for clustering text documents. However, current tools have a number of unsolved problems, such as instability and a lack of criteria for selecting the values of model parameters. In this work, we propose a method to partially solve the problems of optimizing model parameters while simultaneously accounting for semantic stability. Our method is inspired by concepts from statistical physics and is based on Sharma-Mittal entropy. We test our approach on two models, probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA) with Gibbs sampling, and on two datasets in different languages. We compare our approach against a number of standard metrics, each of which is able to account for just one of the parameters of interest. We demonstrate that Sharma-Mittal entropy is a convenient tool for selecting both the number of topics and the values of hyper-parameters, simultaneously controlling for semantic stability, which none of the existing metrics can do. Furthermore, we show that concepts from statistical physics can contribute to theory construction for machine learning, a rapidly developing field that currently lacks a consistent theoretical grounding.
19 citations
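For reference, the Sharma-Mittal entropy the paper builds on is a two-parameter family, S_{q,r}(p) = [(Σᵢ pᵢ^q)^((1-r)/(1-q)) - 1] / (1-r), which recovers Rényi entropy as r → 1, Tsallis entropy as r → q, and Shannon entropy as both tend to 1. A minimal sketch of the formula itself (our own helper, not the authors' code):

```python
import numpy as np

def sharma_mittal(p, q, r, eps=1e-12):
    """Sharma-Mittal entropy of a discrete distribution p (requires q, r != 1).

    Limits: r -> 1 gives Renyi entropy, r -> q gives Tsallis entropy,
    and q, r -> 1 gives Shannon entropy.
    """
    p = np.asarray(p, dtype=float) + eps   # guard against zero probabilities
    p /= p.sum()
    return ((p ** q).sum() ** ((1.0 - r) / (1.0 - q)) - 1.0) / (1.0 - r)
```

In the paper's setting, p would be a distribution derived from a fitted pLSA or LDA model, and the entropy is tracked while varying the number of topics and the hyper-parameters.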