scispace - formally typeset

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2,884 publications have been published within this topic, receiving 198,341 citations. The topic is also known as PLSA.


Papers
Proceedings ArticleDOI
14 May 2006
TL;DR: The use of probabilistic latent topical information for extractive summarization of spoken documents is proposed and the summarization capabilities were verified by comparison with the conventional vector space model and latent semantic indexing model, as well as the HMM model.
Abstract: The purpose of extractive summarization is to automatically select a number of indicative sentences, passages, or paragraphs from the original document according to a target summarization ratio and then sequence them to form a concise summary. In this paper, we propose the use of probabilistic latent topical information for extractive summarization of spoken documents. Various kinds of modeling structures and learning approaches were extensively investigated. In addition, the summarization capabilities were verified by comparison with the conventional vector space model and latent semantic indexing model, as well as the HMM model. The experiments were performed on Chinese broadcast news collected in Taiwan. Noticeable performance gains were obtained.
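The probabilistic latent topical information described above comes from a PLSA-style model, which represents each document's word distribution as a mixture P(w|d) = Σ_z P(w|z)P(z|d) fitted by expectation-maximization. Below is a minimal, hedged sketch of that fitting step on a toy term-document count matrix (the paper's sentence-ranking step and spoken-document specifics are not reproduced; all data here is illustrative):

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit P(w|z) and P(z|d) by EM so counts[w, d] ~ sum_z P(w|z) P(z|d)."""
    rng = np.random.default_rng(seed)
    n_words, n_docs = counts.shape
    p_w_z = rng.random((n_words, n_topics))
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)
    p_z_d = rng.random((n_topics, n_docs))
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z | w, d), shape (words, topics, docs)
        joint = p_w_z[:, :, None] * p_z_d[None, :, :]
        post = joint / joint.sum(axis=1, keepdims=True).clip(min=1e-12)
        # M-step: re-estimate both distributions from expected counts
        expected = counts[:, None, :] * post
        p_w_z = expected.sum(axis=2)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True).clip(min=1e-12)
        p_z_d = expected.sum(axis=0)
        p_z_d /= p_z_d.sum(axis=0, keepdims=True).clip(min=1e-12)
    return p_w_z, p_z_d

# Toy corpus: 4 terms x 3 documents
counts = np.array([[3., 0., 1.],
                   [2., 0., 0.],
                   [0., 4., 1.],
                   [0., 3., 2.]])
p_w_z, p_z_d = plsa(counts, n_topics=2)
```

For summarization, sentences would then be scored by their likelihood under the learned topic mixtures, with the highest-scoring ones selected up to the target ratio.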

27 citations

Patent
01 Apr 2014
TL;DR: In this work, a deep learning model, such as a convolutional latent semantic model, is designed to capture both the local and global linguistic contexts of the linguistic items, and the similarity measure expresses the closeness between the first and second linguistic items in a high-level semantic space.
Abstract: Functionality is described herein for transforming first and second symbolic linguistic items into respective first and second continuous-valued concept vectors, using a deep learning model, such as a convolutional latent semantic model. The model is designed to capture both the local and global linguistic contexts of the linguistic items. The functionality then compares the first concept vector with the second concept vector to produce a similarity measure. More specifically, the similarity measure expresses the closeness between the first and second linguistic items in a high-level semantic space. In one case, the first linguistic item corresponds to a query, and the second linguistic item may correspond to a phrase, or a document, or a keyword, or an ad, etc. In one implementation, the convolutional latent semantic model is produced in a training phase based on click-through data.
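The comparison step described in this patent — map two text items to continuous concept vectors, then score them by cosine similarity — can be sketched as follows. Note the heavy hedging: the projection below is a random, untrained stand-in for the convolutional latent semantic model (which the patent trains on click-through data), and the `letter_trigram_vector` word-hashing helper is an illustrative assumption, not the patent's exact featurization:

```python
import numpy as np

def letter_trigram_vector(text, dims=512):
    """Word hashing: bucket letter trigrams into a fixed-size count vector."""
    text = "#" + text.lower() + "#"
    vec = np.zeros(dims)
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % dims] += 1.0
    return vec

def concept_vector(text, weights):
    """Project a hashed text vector into the low-dimensional semantic space."""
    return np.tanh(weights @ letter_trigram_vector(text))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 512)) / np.sqrt(512)  # stand-in for trained weights
sim = cosine(concept_vector("latent semantic model", W),
             concept_vector("semantic latent models", W))
```

In the trained model, a query's concept vector would be compared this way against vectors for candidate documents, phrases, keywords, or ads.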

27 citations

Proceedings ArticleDOI
Hans Laurberg
26 Aug 2007
TL;DR: A strong uniqueness theorem on non-negative matrix factorizations (NMF) is introduced and it is described how the theorem can be applied to two of the common application areas of NMF, namely music analysis and probabilistic latent semantic analysis.
Abstract: In this paper, two new properties of stochastic vectors are introduced and a strong uniqueness theorem on non-negative matrix factorizations (NMF) is introduced. It is described how the theorem can be applied to two of the common application areas of NMF, namely music analysis and probabilistic latent semantic analysis. Additionally, the theorem can be used for selecting the model order and the sparsity parameter in sparse NMFs.
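The NMF connection is natural: PLSA is closely related to non-negative matrix factorization of the term-document count matrix. As an illustration of the factorization the theorem concerns, here is a minimal sketch using the standard Lee-Seung multiplicative updates under the Frobenius norm (a simplification — the paper's uniqueness analysis and the KL-divergence variant that corresponds to PLSA are not reproduced):

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0):
    """Factor V ~ W @ H with W, H >= 0 via multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    eps = 1e-12  # guard against division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.array([[1., 0., 2.],
              [0., 3., 1.],
              [2., 1., 0.],
              [1., 2., 1.]])
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H)
```

The paper's point is about when such a factorization is essentially unique (up to scaling and permutation), which matters because non-unique factorizations make the recovered "topics" or "sources" arbitrary.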

27 citations

Proceedings ArticleDOI
01 Dec 2010
TL;DR: This work has developed an AEG system using Generalized Latent Semantic Analysis (GLSA), which builds an n-gram by document matrix instead of a word by document matrix and outperforms the existing system.
Abstract: Automated Essay Grading (AEG) is a very important research area in educational technology. Latent Semantic Analysis (LSA) is an information retrieval technique used for automated essay grading. LSA forms a word by document matrix, and the matrix is then decomposed using the Singular Value Decomposition (SVD) technique. Existing AEG systems based on LSA cannot achieve a high enough level of performance to replace a human grader. We have developed an AEG system using Generalized Latent Semantic Analysis (GLSA), which builds an n-gram by document matrix instead of a word by document matrix. We have evaluated this system in detail and reported its performance. Experimental results show that our system outperforms the existing system.
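The pipeline the abstract describes — build an n-gram by document matrix, decompose it with SVD, then compare documents in the reduced space — can be sketched as below. This is a hedged illustration, not the paper's system: the bigram choice, toy "essays", and rank k=2 are all assumptions, and a real grader would compare a student essay against pre-graded reference essays:

```python
import numpy as np

def bigram_doc_matrix(docs):
    """Build the n-gram (here: bigram) by document count matrix used by GLSA."""
    vocab, cols = {}, []
    for doc in docs:
        toks = doc.lower().split()
        grams = [" ".join(toks[i:i + 2]) for i in range(len(toks) - 1)]
        cols.append(grams)
        for g in grams:
            vocab.setdefault(g, len(vocab))
    A = np.zeros((len(vocab), len(docs)))
    for j, grams in enumerate(cols):
        for g in grams:
            A[vocab[g], j] += 1.0
    return A

def lsa_similarity(A, k, i, j):
    """Cosine similarity of documents i and j in the rank-k SVD space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    di, dj = s[:k] * Vt[:k, i], s[:k] * Vt[:k, j]
    return float(di @ dj / (np.linalg.norm(di) * np.linalg.norm(dj) + 1e-12))

essays = ["the cat sat on the mat",
          "the cat sat on a rug",
          "stock markets fell sharply today"]
A = bigram_doc_matrix(essays)
sim_similar = lsa_similarity(A, 2, 0, 1)    # overlapping bigrams
sim_unrelated = lsa_similarity(A, 2, 0, 2)  # no shared bigrams
```

Using n-grams rather than single words lets the matrix capture local word order, which plain LSA discards entirely.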

27 citations

Book ChapterDOI
22 May 2007
TL;DR: This investigation examines a novel automatic query expansion method using a probabilistic latent semantic thesaurus, which is based on probabilistic latent semantic analysis, and shows how to construct the thesaurus by mining text documents for probabilistic term relationships.
Abstract: Many queries on collections of text documents are too short to produce informative results. Automatic query expansion is a method of adding terms to the query, without interaction from the user, in order to obtain more refined results. In this investigation, we examine our novel automatic query expansion method using the probabilistic latent semantic thesaurus, which is based on probabilistic latent semantic analysis. We show how to construct the thesaurus by mining text documents for probabilistic term relationships, and we show that by using the latent semantic thesaurus, we can overcome many of the previously identified problems associated with applying latent semantic analysis to large document sets. Experiments using TREC document sets show that our term expansion method outperforms the popular probabilistic pseudo-relevance feedback method by 7.3%.
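The expansion mechanics can be sketched as follows. The paper derives term-term probabilities P(t2|t1) from a PLSA model; in this hedged illustration a plain document co-occurrence estimate stands in for the latent semantic thesaurus, and the corpus is a toy assumption:

```python
from collections import Counter, defaultdict

def build_thesaurus(docs):
    """Estimate P(t2 | t1) from document-level co-occurrence counts."""
    cooc = defaultdict(Counter)
    freq = Counter()
    for doc in docs:
        terms = set(doc.lower().split())
        for t1 in terms:
            freq[t1] += 1
            for t2 in terms:
                if t2 != t1:
                    cooc[t1][t2] += 1
    return {t1: {t2: c / freq[t1] for t2, c in cs.items()}
            for t1, cs in cooc.items()}

def expand_query(query, thesaurus, top_k=2):
    """Append the top_k most probable related terms for each query term."""
    expanded = list(query)
    for t in query:
        related = sorted(thesaurus.get(t, {}).items(),
                         key=lambda kv: -kv[1])[:top_k]
        expanded += [t2 for t2, _ in related if t2 not in expanded]
    return expanded

docs = ["latent semantic analysis of text",
        "probabilistic latent topic models",
        "semantic indexing of text documents"]
thesaurus = build_thesaurus(docs)
expanded = expand_query(["semantic"], thesaurus)
```

Note that without stop-word filtering, high-frequency function words like "of" dominate the co-occurrence estimates; a latent-semantic thesaurus built from PLSA topics, as in the paper, is much less vulnerable to this.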

27 citations


Network Information
Related Topics (5)
- Feature extraction: 111.8K papers, 2.1M citations, 84% related
- Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
- Support vector machine: 73.6K papers, 1.7M citations, 84% related
- Deep learning: 79.8K papers, 2.1M citations, 83% related
- Object detection: 46.1K papers, 1.3M citations, 82% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  19
2022  77
2021  14
2020  36
2019  27
2018  58