Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic, also known as PLSA. It models each document as a mixture of latent topics with topic-specific word distributions, typically fit by expectation-maximization. Over its lifetime, 2,884 publications have been published within this topic, receiving 198,341 citations.
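
For context on the model behind this topic: pLSA explains each document-word count through a mixture of latent topics, P(w|d) = sum_z P(w|z) P(z|d), fit by expectation-maximization. Below is a minimal NumPy sketch of the two EM steps; it assumes a small dense count matrix purely for illustration, whereas real implementations work on sparse counts.

import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """EM for pLSA on a document-term count matrix of shape (docs, words)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialization of P(z|d) and P(w|z), normalized to distributions.
    p_z_d = rng.random((n_docs, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words)); p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w), shape (docs, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        joint /= joint.sum(axis=1, keepdims=True) + 1e-12
        # M-step: reweight responsibilities by the observed counts n(d,w).
        weighted = counts[:, None, :] * joint
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_z_d, p_w_z

# Toy usage: 3 documents over a 4-word vocabulary, 2 topics.
counts = np.array([[2, 1, 0, 0], [0, 0, 3, 1], [1, 0, 2, 0]])
p_z_d, p_w_z = plsa(counts, n_topics=2)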


Papers
Journal ArticleDOI
TL;DR: A semantically enhanced document retrieval system that describes each retrieved document with an ontological, multi-grained network of extracted concepts, together with a SKOS-based ontology created ad hoc for the document corpus, which enables exploration of the concepts at different granularity levels.

19 citations

Book ChapterDOI
03 Apr 2017
TL;DR: The experimental results indicate that probabilistic distance measures outperform vector-based distance measures, including Euclidean, when clustering a set of documents in the topic space.
Abstract: This paper empirically evaluates eight different distance measures used with the LDA + K-means model. We performed our analysis on two commonly used, heterogeneous datasets. Our experimental results indicate that probabilistic distance measures outperform vector-based distance measures, including Euclidean, when clustering a set of documents in the topic space. Moreover, we investigate the effect of the number of topics and show that K-means combined with the output of the Latent Dirichlet Allocation model achieves better results than the LDA + Naive and Vector Space Model baselines.

19 citations
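
The pairing described above can be sketched as follows: obtain per-document topic proportions from LDA, then run a Lloyd-style clustering loop that assigns documents by Jensen-Shannon distance, one probabilistic measure of the kind the paper favors. The toy pipeline below (scikit-learn's LatentDirichletAllocation, SciPy's jensenshannon) is an illustrative assumption, not the authors' code, and the tiny corpus is made up.

import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def js_kmeans(theta, k, n_iter=50, seed=0):
    """K-means-style clustering of topic distributions under JS distance."""
    rng = np.random.default_rng(seed)
    centers = theta[rng.choice(len(theta), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.array([[jensenshannon(x, c) for c in centers] for x in theta])
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                # Means of points on the simplex stay on the simplex.
                centers[j] = theta[labels == j].mean(axis=0)
    return labels

docs = ["the cat sat on the mat", "dogs and cats play",
        "stock markets fell", "markets rallied today"]
counts = CountVectorizer().fit_transform(docs)
theta = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
print(js_kmeans(theta, k=2))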

Journal ArticleDOI
TL;DR: A flexible multi-task learning framework that identifies latent grouping structures under agnostic settings, where the prior of the latent subspace is unknown to the learner, together with proofs of theoretical guarantees on learning performance.

19 citations

Journal ArticleDOI
TL;DR: This paper proposes a novel transfer learning method, referred to as Multi-Bridge Transfer Learning (MBTL), to learn the distributions in the different latent spaces together, and presents an iterative algorithm with a convergence guarantee to solve MBTL.
Highlights:
- MBTL constructs multiple latent spaces to exploit more common latent factors.
- MBTL reduces the discrepancies of the distributions in different latent spaces.
- To solve MBTL, we present an iterative algorithm with a convergence guarantee.
- MBTL outperforms state-of-the-art learning methods on several datasets.
Abstract: Transfer learning, which aims to exploit the knowledge in the source domains to promote the learning tasks in the target domains, has recently attracted extensive research interest. The general idea of previous approaches is to model the shared structure in one latent space as the bridge across domains by reducing the distribution divergences. However, latent factors in the other latent spaces can also be utilized to draw the corresponding distributions closer and establish bridges. In this paper, we propose a novel transfer learning method, referred to as Multi-Bridge Transfer Learning (MBTL), to learn the distributions in the different latent spaces together, so that more shared latent factors can be utilized to transfer knowledge. Additionally, an iterative algorithm with a convergence guarantee, based on non-negative matrix tri-factorization techniques, is proposed to solve the optimization problem. Comprehensive experiments demonstrate that MBTL significantly outperforms state-of-the-art learning methods on topic and sentiment classification tasks.

19 citations
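
The non-negative matrix tri-factorization underlying the proposed algorithm decomposes a non-negative matrix as X ≈ F S Gᵀ. A minimal sketch using the standard multiplicative updates for the Frobenius objective is given below; MBTL's multi-space coupling and cross-domain terms are omitted, so treat this as background on the building block rather than the paper's method.

import numpy as np

def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Tri-factorize X (n x m) into F (n x k1), S (k1 x k2), G (m x k2)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F, S, G = rng.random((n, k1)), rng.random((k1, k2)), rng.random((m, k2))
    for _ in range(n_iter):
        # Multiplicative updates minimizing ||X - F S G^T||_F^2 with F, S, G >= 0.
        F *= (X @ G @ S.T) / (F @ (S @ (G.T @ G) @ S.T) + eps)
        S *= (F.T @ X @ G) / ((F.T @ F) @ S @ (G.T @ G) + eps)
        G *= (X.T @ F @ S) / (G @ (S.T @ (F.T @ F) @ S) + eps)
    return F, S, G

# Toy usage on a random non-negative matrix.
X = np.abs(np.random.default_rng(1).random((20, 12)))
F, S, G = nmtf(X, k1=4, k2=3)
print(np.linalg.norm(X - F @ S @ G.T))  # reconstruction error after fitting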

Journal ArticleDOI
05 Jul 2019-Entropy
TL;DR: It is demonstrated that Sharma–Mittal entropy is a convenient tool for selecting both the number of topics and the values of hyper-parameters, simultaneously controlling for semantic stability, which none of the existing metrics can do.
Abstract: Topic modeling is a popular approach to clustering text documents. However, current tools suffer from a number of unsolved problems, such as instability and the lack of criteria for selecting the values of model parameters. In this work, we propose a method that partially solves the problem of optimizing model parameters while simultaneously accounting for semantic stability. Our method is inspired by concepts from statistical physics and is based on Sharma–Mittal entropy. We test our approach on two models, probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA) with Gibbs sampling, and on two datasets in different languages. We compare our approach against a number of standard metrics, each of which can account for just one of the parameters of interest. We demonstrate that Sharma–Mittal entropy is a convenient tool for selecting both the number of topics and the values of hyper-parameters, simultaneously controlling for semantic stability, which none of the existing metrics can do. Furthermore, we show that concepts from statistical physics can contribute to theory construction for machine learning, a rapidly developing field that currently lacks a consistent theoretical grounding.

19 citations
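
For reference, the Sharma–Mittal entropy of a distribution p with parameters q and r is S_{q,r}(p) = ((sum_i p_i^q)^((1-r)/(1-q)) - 1) / (1 - r); it recovers Rényi entropy as r→1, Tsallis as r→q, and Shannon as both tend to 1. The sketch below computes only this quantity; the paper's selection procedure additionally scans the number of topics and hyper-parameters while tracking the entropy, which is not reproduced here.

import numpy as np

def sharma_mittal(p, q, r, eps=1e-12):
    """Sharma-Mittal entropy; assumes q != 1 and r != 1 (take limits otherwise)."""
    p = np.asarray(p, dtype=float)
    p = p[p > eps]                 # drop zero-probability entries
    z = np.sum(p ** q)             # generalized partition sum
    return (z ** ((1.0 - r) / (1.0 - q)) - 1.0) / (1.0 - r)

# e.g. entropy of one topic's word distribution for a single (q, r) setting
print(sharma_mittal([0.5, 0.3, 0.15, 0.05], q=2.0, r=0.5))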


Network Information
Related Topics (5)

Topic                        Papers    Citations    Relatedness
Feature extraction           111.8K    2.1M         84%
Feature (computer vision)    128.2K    1.7M         84%
Support vector machine       73.6K     1.7M         84%
Deep learning                79.8K     2.1M         83%
Object detection             46.1K     1.3M         82%
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    19
2022    77
2021    14
2020    36
2019    27
2018    58