scispace - formally typeset
Search or ask a question
Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over the lifetime, 2884 publications have been published within this topic receiving 198341 citations. The topic is also known as: PLSA.


Papers
More filters
Patent
19 Dec 2011
TL;DR: In this paper, a computer-implemented method and system for ranking documents is presented, which includes identifying a number of query-document pairs based on clickthrough data for a many of documents.
Abstract: There is provided a computer-implemented method and system for ranking documents. The method includes identifying a number of query-document pairs based on clickthrough data for a number of documents. The method also includes building a latent semantic model based on the query-document pairs and ranking the documents for a search based on the latent semantic model.

21 citations

Journal ArticleDOI
09 Jan 2013
TL;DR: An algorithm called coherence hidden Markov model (HMM) to extract coherence features and rank content and an intelligent assisted blog writing system based on the coherence-HMM ranking model is proposed.
Abstract: In this paper, we propose an algorithm called coherence hidden Markov model (HMM) to extract coherence features and rank content. Coherence HMM is a variant of HMM and is used to model the stochastic process of essay writing and identify topics as hidden states, given sequenced clauses as observations. This study uses probabilistic latent semantic analysis for parameter estimation of coherence HMM. In coherence-feature extraction, support vector regression (SVR) with surface features and coherence features is used for essay grading. The experimental results indicate that SVR can benefit from coherence features. The adjacent agreement rate and the exact agreement rate are 95.24% and 59.80%, respectively. Moreover, this study submits high-scoring essays to the same experiment and finds that the adjacent agreement rate and exact agreement rate are 98.33% and 64.50%, respectively. In content ranking, we design and implement an intelligent assisted blog writing system based on the coherence-HMM ranking model. Several corpora are employed to help users efficiently compose blog articles. When users finish composing a clause or sentence, the system provides candidate texts for their reference based on current clause or sentence content. The experimental results demonstrate that all participants can benefit from the system and save considerable time on writing articles.

21 citations

Proceedings ArticleDOI
20 Apr 2009
TL;DR: This paper proposes a novel ranking model that provides more accurate semantic search results compared to existing ranking models and considers the number of meaningful semantic relationships between a resource and keywords, the coverage of keywords, and the distinguishability of keywords.
Abstract: In this paper, we present a semantic search technique considering the type of desired Web resources and the semantic relationships between the resources and the query keywords in the ontology. In order to effectively retrieve the most relevant top-k resources, we propose a novel ranking model. To do this, we devise a measure to determine the weight of the semantic relationship. In addition, we consider the number of meaningful semantic relationships between a resource and keywords, the coverage of keywords, and the distinguishability of keywords. Through experiments using real datasets, we observe that our ranking model provides more accurate semantic search results compared to existing ranking models.

20 citations

Proceedings ArticleDOI
30 Oct 2008
TL;DR: A probabilistic approach for unsupervised modeling and recognition of object categories which combines two types of complementary visual evidence, visual contents and inter-connected links between the images is proposed.
Abstract: This paper proposes a probabilistic approach for unsupervised modeling and recognition of object categories which combines two types of complementary visual evidence, visual contents and inter-connected links between the images. By doing so, our approach not only increases modeling and recognition performance but also provides possible solutions to several problems including modeling of geometric information, computational complexity, and the inherent ambiguity of visual words. Our approach can be incorporated in any generative models, but here we consider two popular models, pLSA and LDA. Experimental results show that the topic models updated by adding link analysis terms significantly improve the standard pLSA and LDA models. Furthermore, we presented competitive performances on unsupervised modeling, ranking of training images, classification of unseen images, and localization tasks with MSRC and PASCAL2005 datasets.

20 citations

Book ChapterDOI
30 Aug 2009
TL;DR: This paper proposes a novel and theoretically sound document similarity, which avoids the problem of "folding in" unknown documents in PLSI, and experimental results are provided on several information retrieval evaluation sets.
Abstract: The Probabilistic Latent Semantic Indexing model, introduced by T. Hofmann (1999), has engendered applications in numerous fields, notably document classification and information retrieval. In this context, the Fisher kernel was found to be an appropriate document similarity measure. However, the kernels published so far contain unjustified features, some of which hinder their performances. Furthermore, PLSI is not generative for unknown documents, a shortcoming usually remedied by "folding them in" the PLSI parameter space. This paper contributes on both points by (1) introducing a new, rigorous development of the Fisher kernel for PLSI, addressing the role of the Fisher Information Matrix, and uncovering its relation to the kernels proposed so far; and (2) proposing a novel and theoretically sound document similarity, which avoids the problem of "folding in" unknown documents. For both aspects, experimental results are provided on several information retrieval evaluation sets.

20 citations


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
84% related
Feature (computer vision)
128.2K papers, 1.7M citations
84% related
Support vector machine
73.6K papers, 1.7M citations
84% related
Deep learning
79.8K papers, 2.1M citations
83% related
Object detection
46.1K papers, 1.3M citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202319
202277
202114
202036
201927
201858