
Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2,884 publications have appeared on this topic, receiving 198,341 citations. The topic is also known as PLSA.


Papers
Journal Article
TL;DR: A new algorithm called "Google latent semantic distance" (GLSD) is described and used to extract the most important sequence of keywords to provide the most relevant search results to the user.
Abstract: Research highlights: (1) We adapted the Google similarity distance algorithm into a more efficient new algorithm. (2) We used PLSA to enhance the original 2-gram NGD into a 3-gram algorithm. (3) The algorithm extracts the most important sequence of keywords to provide the most relevant search results to the user. There have been many studies on how to help users enter more keywords into a search engine to find the most relevant documents or search results. Methods previously reported in the literature require a database to store the user profile and a well-trained model to suggest the potential "next keyword" to the user. Because these predictive models are based on training data, they can only be used in a single knowledge domain. In this paper, we describe a new algorithm called "Google latent semantic distance" (GLSD) and use it to extract the most important sequence of keywords to provide the most relevant search results to the user. Our method uses on-line, real-time processing and needs no training data, so it can be used in different knowledge domains. Our experiments show that GLSD achieves high accuracy and that, in most cases, the most relevant information appears in the top search results. We believe that this new system can increase users' effectiveness in both reading and writing articles.

16 citations
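
For orientation, the "2-gram NGD" that the abstract above starts from is the normalized Google distance, which compares two terms using search hit counts. The sketch below shows only that standard baseline measure, not the GLSD algorithm itself, whose details are not given here. The function name, its parameters, and the hit counts in the usage lines are illustrative assumptions.

```python
import math


def normalized_google_distance(f_x, f_y, f_xy, n_pages):
    """Normalized Google distance between two terms (illustrative helper).

    f_x, f_y : number of pages containing each term on its own
    f_xy     : number of pages containing both terms
    n_pages  : total number of indexed pages (assumed larger than f_x and f_y)
    """
    if f_x <= 0 or f_y <= 0 or f_xy <= 0:
        # Terms that never occur, or never co-occur, are treated as maximally distant.
        return math.inf
    if n_pages <= min(f_x, f_y):
        raise ValueError("n_pages must exceed the smaller single-term count")
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(n_pages) - min(log_x, log_y))


# Made-up counts: terms that always co-occur are at distance 0.
same = normalized_google_distance(100, 100, 100, 10_000)
# Made-up counts: terms that rarely co-occur are farther apart.
apart = normalized_google_distance(1_000, 100, 10, 1_000_000)
```

Smaller values indicate terms that tend to appear on the same pages; the measure needs only hit counts, which is consistent with the abstract's point that no training data is required.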

Proceedings Article
01 Oct 2013
TL;DR: It is shown that on a crosslingual mate retrieval task, the model significantly outperforms LDA, LSI, and ESA, as well as a baseline that translates every word in a document into the target language.
Abstract: Cross-lingual topic modelling has applications in machine translation, word sense disambiguation and terminology alignment. Multilingual extensions of approaches based on latent (LSI), generative (LDA, PLSI) as well as explicit (ESA) topic modelling can induce an interlingual topic space, allowing documents in different languages to be mapped into the same space and thus compared across languages. In this paper, we present a novel approach that combines latent and explicit topic modelling in the sense that it builds on a set of explicitly defined topics, but then computes latent relations between them. Thus, the method combines the benefits of both explicit and latent topic modelling approaches. We show that on a crosslingual mate retrieval task, our model significantly outperforms LDA, LSI, and ESA, as well as a baseline that translates every word in a document into the target language.

16 citations

Journal Article
TL;DR: It is shown that the performance of the proposed model is superior to that of k-means and PLSI in terms of category mining, and that the method is applicable to marketing support, service modeling, and decision making in various business fields, including retail services.
Abstract: This paper describes computational modeling of customer behavior using a Bayesian network with appropriate categories. Categories are generated by heterogeneous data fusion of ID-POS data and customers' questionnaire responses about their lifestyle. We propose a latent class model that extends the PLSI model. In the proposed model, customers and items are classified probabilistically into latent lifestyle categories and latent item categories. We show that the performance of the proposed model is superior to that of k-means and PLSI in terms of category mining. We produce a Bayesian network model that includes the customer and item categories and the situations and conditions of purchases. Based on that network structure, we can systematically identify useful knowledge for use in sustainable services. In retail services, knowledge management with point-of-sale data mining is integral to maintaining and improving productivity. This method provides useful knowledge based on ID-POS data for efficient customer relationship management and is applicable to other service industries. It is also applicable to marketing support, service modeling, and decision making in various business fields, including retail services.

16 citations

Journal Article
TL;DR: Hybrid topic models are evaluated and compared in the experimental section to demonstrate their efficiency with different distance measures, including Euclidean distance, cosine distance, and multi-viewpoint cosine similarity.
Abstract: Social media, and microblogs in particular, are becoming an important data source for disease surveillance, behavioral medicine, and public healthcare. Topic models are widely used in microblog analytics for analyzing and integrating the textual data within a corpus. This paper treats health tweets as microblogs and attempts health data clustering with topic models. Traditional topic models, such as Latent Semantic Indexing (LSI), Probabilistic Latent Semantic Indexing (PLSI), Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), and integer Joint NMF (intJNMF), are used for health data clustering; however, assessing the number of health topic clusters with them is intractable. Proper visualizations are essential for extracting information from the data and identifying trends in it, as a collection may include thousands of documents and millions of words. To visualize topic clouds and health tendency in the document collection, we present hybrid topic models that integrate traditional topic models with VAT. The proposed hybrid topic models, namely Visual Non-negative Matrix Factorization (VNMF), Visual Latent Dirichlet Allocation (VLDA), Visual Probabilistic Latent Semantic Indexing (VPLSI) and Visual Latent Semantic Indexing (VLSI), are promising methods for assessing health tendency and visualizing topic clusters in benchmark and Twitter datasets. The hybrid topic models are evaluated and compared in the experimental section to demonstrate their efficiency with different distance measures, including Euclidean distance, cosine distance, and multi-viewpoint cosine similarity.

16 citations

Book Chapter
TL;DR: The paper investigates whether latent space models make it possible to learn patterns of visual co-occurrence and whether the learned visual models improve performance when less labeled data are available, and presents results that support these hypotheses.
Abstract: We propose the use of latent space models applied to local invariant features for object classification. We investigate whether latent space models make it possible to learn patterns of visual co-occurrence and whether the learned visual models improve performance when less labeled data are available. We present and discuss results that support these hypotheses. Probabilistic Latent Semantic Analysis (PLSA) automatically identifies aspects with semantic meaning from the data, producing an unsupervised soft clustering. The resulting compact representation retains sufficient discriminative information for accurate object classification, and improves classification accuracy through the use of unlabeled data when less labeled training data are available. We perform experiments on a 7-class object database containing 1776 images.

16 citations
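
The soft clustering that PLSA produces, as described in the abstract above, can be illustrated with a minimal EM sketch on a small count matrix. This is a generic textbook-style PLSA fit, not the pipeline of any paper listed here. The function name `plsa`, its parameters, and the toy counts are assumptions made for illustration, with rows standing in for documents (or images) and columns for words (or visual features).

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-by-word count matrix (list of lists).

    Returns (p_z_d, p_w_z): P(topic | document) and P(word | topic).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row, fallback):
        total = sum(row)
        # Keep the previous distribution if a row received no mass.
        return [v / total for v in row] if total > 0 else fallback

    def random_dist(size):
        row = [rng.random() + 1e-3 for _ in range(size)]
        return normalize(row, row)

    p_z_d = [random_dist(n_topics) for _ in range(n_docs)]
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(post)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    weight = n * post[z] / total
                    new_z_d[d][z] += weight
                    new_w_z[z][w] += weight
        # M-step: renormalize the expected counts into distributions.
        p_z_d = [normalize(row, old) for row, old in zip(new_z_d, p_z_d)]
        p_w_z = [normalize(row, old) for row, old in zip(new_w_z, p_w_z)]
    return p_z_d, p_w_z


# Toy data: two documents use words 0-1, two use words 2-3.
toy_counts = [[4, 3, 0, 0], [3, 5, 0, 0], [0, 0, 4, 2], [0, 0, 3, 6]]
doc_topics, topic_words = plsa(toy_counts, n_topics=2)
```

Each row of `doc_topics` is a probability distribution over topics, which is the compact "soft cluster" representation that a downstream classifier could consume. EM only reaches a local optimum, so results depend on the random seed.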


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Support vector machine: 73.6K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  19
2022  77
2021  14
2020  36
2019  27
2018  58