Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2884 publications on this topic have received 198341 citations. The topic is also known as: PLSA.

Papers

Journal Article

Using Google latent semantic distance to extract the most relevant information

P. C. Chen¹, Shi-Jen Lin¹, Ya-Chi Chu¹

National Central University¹

01 Jun 2011-Expert Systems With Applications

TL;DR: A new algorithm called "Google latent semantic distance" (GLSD) is described and used to extract the most important sequence of keywords to provide the most relevant search results to the user.

Abstract: Research highlights: (1) We adapted the Google similarity distance algorithm into a more efficient new algorithm. (2) We used PLSA to enhance the original 2-gram NGD into a 3-gram algorithm. (3) The algorithm extracts the most important sequence of keywords to provide the most relevant search results to the user. There have been many studies about how to help users enter more keywords into a search engine to find the most relevant documents or search results. Methods previously reported in the literature require a database to save the user profile, and construct a well-trained model to provide the potential "next keyword" to the user. Because the predictive models are based on the training data, they can only be used in a single knowledge domain. In this paper, we describe a new algorithm called "Google latent semantic distance" (GLSD) and use it to extract the most important sequence of keywords to provide the most relevant search results to the user. Our method uses on-line, real-time processing and needs no training data. Thus, it can be used in different knowledge domains. Our experiments show that GLSD can achieve high accuracy, and that the most relevant information appears in the top search results in most cases. We believe that this new system can increase users' effectiveness in both reading and writing articles.

16 citations
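
The abstract above builds on the Google similarity distance (NGD). As a rough illustration only, the sketch below computes the standard two-term NGD from page-hit counts. It is not the GLSD algorithm itself, whose 3-gram PLSA extension is not specified in this listing. The function name, parameter names, and example counts are hypothetical.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Two-term NGD from hit counts (hypothetical helper, not the GLSD method).

    fx, fy: number of pages containing each term
    fxy:    number of pages containing both terms
    n:      total number of indexed pages (assumed larger than fx and fy)
    """
    # Terms that never occur (or never co-occur) are treated as infinitely far apart.
    if min(fx, fy, fxy) <= 0:
        return math.inf
    log_x, log_y, log_xy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n) - min(log_x, log_y)
    if denom <= 0:
        raise ValueError("n must exceed the smaller of fx and fy")
    return (max(log_x, log_y) - log_xy) / denom


# Made-up counts: two terms that always co-occur versus two that rarely do.
print(normalized_google_distance(1000, 1000, 1000, 10**6))
print(normalized_google_distance(1000, 1000, 10, 10**6))
```

Smaller values indicate terms that co-occur more often relative to their individual frequencies. In practice the counts would come from a search engine or a local index, which this sketch leaves out.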

Proceedings Article

Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching

John P. McCrae¹, Philipp Cimiano¹, Roman Klinger¹

Bielefeld University¹

01 Oct 2013

TL;DR: It is shown that on a crosslingual mate retrieval task, the model significantly outperforms LDA, LSI, and ESA, as well as a baseline that translates every word in a document into the target language.

Abstract: Cross-lingual topic modelling has applications in machine translation, word sense disambiguation and terminology alignment. Multilingual extensions of approaches based on latent (LSI), generative (LDA, PLSI) as well as explicit (ESA) topic modelling can induce an interlingual topic space allowing documents in different languages to be mapped into the same space and thus to be compared across languages. In this paper, we present a novel approach that combines latent and explicit topic modelling approaches in the sense that it builds on a set of explicitly defined topics, but then computes latent relations between these. Thus, the method combines the benefits of both explicit and latent topic modelling approaches. We show that on a crosslingual mate retrieval task, our model significantly outperforms LDA, LSI, and ESA, as well as a baseline that translates every word in a document into the target language.

16 citations

Journal Article

Customer Behavior Prediction System by Large Scale Data Fusion in a Retail Service

Tsukasa Ishigaki, Takeshi Takenaka¹, Yoichi Motomura

National Institute of Advanced Industrial Science and Technology¹

01 Jan 2011-Transactions of The Japanese Society for Artificial Intelligence

TL;DR: It is shown that the performance of the proposed model is superior to that of k-means and PLSI in terms of category mining, and that the method is applicable to marketing support, service modeling, and decision making in various business fields, including retail services.

Abstract: This paper describes computational customer behavior modeling using a Bayesian network with appropriate categories. The categories are generated by heterogeneous data fusion using ID-POS data and customers' questionnaire responses about their lifestyles. We propose a latent class model that is an extension of the PLSI model. In the proposed model, customers and items are classified probabilistically into latent lifestyle categories and latent item categories. We show that the performance of the proposed model is superior to that of k-means and PLSI in terms of category mining. We produce a Bayesian network model that includes the customer and item categories and the situations and conditions of purchases. Based on that network structure, we can systematically identify useful knowledge for use in sustainable services. In retail services, knowledge management with point-of-sale data mining is integral to maintaining and improving productivity. This method provides useful knowledge based on the ID-POS data for efficient customer relationship management and is applicable to other service industries. It is also applicable to marketing support, service modeling, and decision making in various business fields, including retail services.

16 citations

Journal Article

Hybrid Topic Cluster Models for Social Healthcare Data

K. Rajendra Prasad, Moulana Mohammed, R. M. Noorullah

01 Jan 2019-International Journal of Advanced Computer Science and Applications

TL;DR: The experimental section evaluates and compares the hybrid topic models to demonstrate their efficiency with different distance measures, including Euclidean distance, cosine distance, and multi-viewpoint cosine similarity.

Abstract: Social media, and microblogs in particular, are becoming an important data source for disease surveillance, behavioral medicine, and public healthcare. Topic models are widely used in microblog analytics for analyzing and integrating the textual data within a corpus. This paper uses health tweets as microblogs and attempts health data clustering by topic models. Traditional topic models, such as Latent Semantic Indexing (LSI), Probabilistic Latent Semantic Indexing (PLSI), Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), and integer Joint NMF (intJNMF), are used for health data clustering; however, assessing the number of health topic clusters with them is intractable. Proper visualizations are essential for extracting information from the data and identifying trends in it, as the data may include thousands of documents and millions of words. For visualization of topic clouds and health tendency in the document collection, we present hybrid topic models that integrate traditional topic models with VAT. The proposed hybrid topic models, namely Visual Non-negative Matrix Factorization (VNMF), Visual Latent Dirichlet Allocation (VLDA), Visual Probabilistic Latent Schematic Indexing (VPLSI) and Visual Latent Schematic Indexing (VLSI), are promising methods for assessing the health tendency and visualizing topic clusters from benchmarked and Twitter datasets. The experimental section evaluates and compares the hybrid topic models to demonstrate their efficiency with different distance measures, including Euclidean distance, cosine distance, and multi-viewpoint cosine similarity.

16 citations

Book Chapter

Constructing visual models with a latent space approach

Florent Monay¹, Pedro Quelhas¹, Daniel Gatica-Perez¹, Jean-Marc Odobez¹

Idiap Research Institute¹

23 Feb 2005-Lecture Notes in Computer Science

TL;DR: The paper investigates whether latent space models make it possible to learn patterns of visual co-occurrence and whether the learned visual models improve performance when less labeled data are available, and presents results that support these hypotheses.

Abstract: We propose the use of latent space models applied to local invariant features for object classification. We investigate whether latent space models make it possible to learn patterns of visual co-occurrence and whether the learned visual models improve performance when less labeled data are available. We present and discuss results that support these hypotheses. Probabilistic Latent Semantic Analysis (PLSA) automatically identifies aspects with semantic meaning from the data, producing an unsupervised soft clustering. The resulting compact representation retains sufficient discriminative information for accurate object classification, and improves the classification accuracy through the use of unlabeled data when less labeled training data are available. We perform experiments on a 7-class object database containing 1776 images.

16 citations
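
The abstract above describes PLSA as producing an unsupervised soft clustering of the data into aspects. A minimal sketch of the standard EM updates for PLSA on a toy document-word count matrix might look like the following. The function name `plsa`, its parameters, and the toy counts are assumptions for illustration; this is not the authors' implementation, which operates on local invariant image features.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-word count matrix (list of lists).

    Returns (p_z_d, p_w_z): P(z|d) as docs x topics, P(w|z) as topics x words.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        # Scale a non-negative row to sum to 1; fall back to uniform if it is all zeros.
        total = sum(row)
        if total > 0:
            return [v / total for v in row]
        return [1.0 / len(row)] * len(row)

    # Random initial distributions.
    p_z_d = [normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: posterior P(z | d, w), proportional to P(z|d) * P(w|z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_z_d[d][z] += n_dw * post[z]
                    new_w_z[z][w] += n_dw * post[z]
        # M-step: renormalize expected counts into probabilities.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


# Toy counts: documents 0-1 use words 0-1, documents 2-3 use words 2-3.
toy_counts = [[5, 5, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 0, 6, 4]]
doc_topics, topic_words = plsa(toy_counts, n_topics=2)
for row in doc_topics:
    print([round(p, 3) for p in row])
```

Each row of `doc_topics` is a soft assignment of one document to the aspects, which is the sense in which PLSA clusters without labels. EM only finds a local optimum, so results can vary with the seed.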

Performance

Metrics

Papers: 2,984
Citations: 212,744

No. of papers in the topic in previous years
Year	Papers
2023	19
2022	77
2021	14
2020	36
2019	27
2018	58
