Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2,884 publications have been published within this topic, receiving 198,341 citations. The topic is also known as: PLSA.
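As background for the papers below: PLSA models each document as a mixture of latent topics and is typically fit by expectation-maximization. The following is a minimal illustrative sketch, not taken from any paper listed here; the toy count matrix, the number of topics, and the iteration count are all assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy term-document count matrix: rows = documents, columns = words.
N = np.array([[4, 2, 0, 0],
              [3, 3, 1, 0],
              [0, 1, 4, 3],
              [0, 0, 2, 5]], dtype=float)
D, W = N.shape
K = 2  # number of latent topics z (illustrative choice)

# Random initialisation of P(z|d) and P(w|z), each row normalised.
p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(1, keepdims=True)
p_w_z = rng.random((K, W)); p_w_z /= p_w_z.sum(1, keepdims=True)

for _ in range(50):
    # E-step: P(z|d,w) proportional to P(z|d) * P(w|z), normalised over z.
    joint = p_z_d[:, :, None] * p_w_z[None, :, :]   # shape (D, K, W)
    p_z_dw = joint / joint.sum(1, keepdims=True)
    # M-step: re-estimate parameters from expected counts n(d,w) * P(z|d,w).
    counts = N[:, None, :] * p_z_dw                 # shape (D, K, W)
    p_w_z = counts.sum(0); p_w_z /= p_w_z.sum(1, keepdims=True)
    p_z_d = counts.sum(2); p_z_d /= p_z_d.sum(1, keepdims=True)
```

After fitting, `p_z_d` gives each document's topic mixture and `p_w_z` gives each topic's word distribution; on this toy matrix the two word groups {0, 1} and {2, 3} separate into distinct topics.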


Papers
Proceedings ArticleDOI
24 Jul 2011
TL;DR: This paper introduces the Interdependent Latent Dirichlet Allocation (ILDA) model, a probabilistic graphical model that extracts aspects and corresponding ratings of products from online reviews, and evaluates it on a real-life dataset from Epinions.com.
Abstract: Today, more and more product reviews become available on the Internet, e.g., product review forums, discussion groups, and blogs. However, it is almost impossible for a customer to read all of the different and possibly even contradictory opinions and make an informed decision. Therefore, mining online reviews (opinion mining) has emerged as an interesting new research direction. Extracting aspects and the corresponding ratings is an important challenge in opinion mining. An aspect is an attribute or component of a product, e.g., 'screen' for a digital camera. It is common that reviewers use different words to describe an aspect (e.g., 'LCD', 'display', 'screen'). A rating is an intended interpretation of the user satisfaction in terms of numerical values. Reviewers usually express the rating of an aspect by a set of sentiments, e.g., 'blurry screen'. In this paper, we present three probabilistic graphical models which aim to extract aspects and corresponding ratings of products from online reviews. The first two models extend standard PLSI and LDA to generate a rated aspect summary of product reviews. As our main contribution, we introduce the Interdependent Latent Dirichlet Allocation (ILDA) model. This model is more natural for our task since the underlying probabilistic assumptions (interdependency between aspects and ratings) are appropriate for our problem domain. We conduct experiments on a real-life dataset from Epinions.com, demonstrating the improved effectiveness of the ILDA model in terms of the likelihood of a held-out test set and the accuracy of aspects and aspect ratings.
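The ILDA model itself is the paper's contribution and is not reproduced here. As background for the standard LDA that the paper's first two models extend, the following is a minimal collapsed Gibbs sampler for plain LDA on a toy corpus; the corpus, hyperparameters, and topic count are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: each document is a list of word ids from a vocabulary of size V.
docs = [[0, 0, 1, 2], [1, 1, 2, 0], [3, 4, 4, 5], [4, 5, 5, 3]]
V, K = 6, 2                # vocabulary size, number of topics
alpha, beta = 0.5, 0.1     # symmetric Dirichlet hyperparameters (assumed)

# Count tables and random initial topic assignments.
ndk = np.zeros((len(docs), K))   # document-topic counts
nkw = np.zeros((K, V))           # topic-word counts
nk = np.zeros(K)                 # per-topic totals
z = [[int(rng.integers(K)) for _ in d] for d in docs]
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        t = z[d][i]
        ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1

for _ in range(200):             # collapsed Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]          # remove the current assignment
            ndk[d, t] -= 1; nkw[t, w] -= 1; nk[t] -= 1
            # p(z=k | rest) ~ (ndk + alpha) * (nkw + beta) / (nk + V*beta)
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            t = int(rng.choice(K, p=p / p.sum()))
            z[d][i] = t          # resample and restore counts
            ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1

theta = (ndk + alpha) / (ndk + alpha).sum(1, keepdims=True)  # P(topic|doc)
phi = (nkw + beta) / (nkw + beta).sum(1, keepdims=True)      # P(word|topic)
```

`theta` and `phi` are the smoothed posterior point estimates; the ILDA extension additionally couples aspect and rating variables, which this plain-LDA sketch does not model.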

198 citations

Proceedings ArticleDOI
01 Aug 1994
TL;DR: This paper applies LSI to the routing task, in which a sample of relevant and non-relevant documents is available for constructing the query, and finds that while LSI alone yields only a slight improvement, using LSI in conjunction with statistical classification produces a dramatic improvement in performance.
Abstract: Latent Semantic Indexing (LSI) is a novel approach to information retrieval that attempts to model the underlying structure of term associations by transforming the traditional representation of documents as vectors of weighted term frequencies to a new coordinate space where both documents and terms are represented as linear combinations of underlying semantic factors. In previous research, LSI has produced a small improvement in retrieval performance. In this paper, we apply LSI to the routing task, which operates under the assumption that a sample of relevant and non-relevant documents is available to use in constructing the query. Once again, LSI slightly improves performance. However, when LSI is used in conjunction with statistical classification, there is a dramatic improvement in performance.
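The coordinate transformation the abstract describes is a truncated singular value decomposition of the term-document matrix. A minimal sketch with a made-up matrix and query follows; nothing here comes from the paper's routing experiments, and the fold-in formula is the standard LSI one.

```python
import numpy as np

# Toy term-document matrix A (terms x documents) of weighted term frequencies.
A = np.array([[2, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 2],
              [0, 1, 2, 1]], dtype=float)

# Truncated SVD: A is approximated by U_k @ diag(s_k) @ Vt_k with k factors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Documents as k-dimensional vectors in the latent semantic space.
doc_vecs = (np.diag(s_k) @ Vt_k).T                    # shape (n_docs, k)

# Fold a query (raw term vector) into the same space: q_hat = q U_k s_k^-1.
q = np.array([1, 1, 0, 0], dtype=float)
q_hat = q @ U_k @ np.diag(1.0 / s_k)

# Rank documents by cosine similarity to the folded-in query.
sims = doc_vecs @ q_hat / (np.linalg.norm(doc_vecs, axis=1)
                           * np.linalg.norm(q_hat))
ranking = np.argsort(-sims)
```

Because terms and documents share the factor space, the same fold-in works for new documents in the routing setting, where the "query" is built from known relevant and non-relevant samples.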

197 citations

Proceedings Article
01 Jan 1997
TL;DR: It is shown that, using semantic information, mixture LMs perform better than a conventional single LM with only a slight increase in computational cost; manual and automatic clustering are compared, building on previous work in the field of information retrieval.
Abstract: In this paper, an approach for constructing mixture language models (LMs) based on some notion of semantics is discussed. To this end, a technique known as latent semantic analysis (LSA) is used. The approach encapsulates corpus-derived semantic information and is able to model the varying style of the text. Using such information, the corpus texts are clustered in an unsupervised manner and mixture LMs are automatically created. This work builds on previous work in the field of information retrieval which was recently applied by Bellegarda et al. to the problem of clustering words by semantic categories. The principal contribution of this work is to characterize the document space resulting from the LSA modeling and to demonstrate the approach for mixture LM application. Comparison is made between manual and automatic clustering in order to elucidate how the semantic information is expressed in the space. It is shown that, using semantic information, mixture LMs perform better than a conventional single LM with only a slight increase in computational cost.
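The pipeline the abstract describes (LSA projection, unsupervised clustering, one LM per cluster, then a mixture) can be sketched as follows. The random corpus, cluster count, plain k-means, unigram LMs, and add-one smoothing are all illustrative assumptions, not the paper's actual choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy document-term count matrix; in practice this comes from a large corpus.
X = rng.poisson(1.0, size=(20, 30)).astype(float)

# 1) LSA: project documents onto the top-k singular directions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
doc_vecs = U[:, :k] * s[:k]              # (n_docs, k) LSA representation

# 2) Unsupervised clustering (plain k-means) of documents in LSA space.
C = 4
centers = doc_vecs[rng.choice(len(doc_vecs), C, replace=False)]
for _ in range(25):
    dists = ((doc_vecs[:, None] - centers[None]) ** 2).sum(-1)
    labels = np.argmin(dists, axis=1)
    for c in range(C):
        if (labels == c).any():
            centers[c] = doc_vecs[labels == c].mean(0)

# 3) One add-one-smoothed unigram LM per cluster; the mixture interpolates
#    them with weights proportional to cluster size.
lms = np.stack([(X[labels == c].sum(0) + 1)
                / (X[labels == c].sum() + X.shape[1]) for c in range(C)])
weights = np.bincount(labels, minlength=C) / len(labels)
mixture = weights @ lms                  # mixture unigram probabilities
```

Each per-cluster LM and the final mixture are proper distributions; in the paper the component LMs are full n-gram models and the mixture weights can be adapted to the current text rather than fixed by cluster size.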

192 citations

01 Jan 1985
TL;DR: A semantic data model describes the concepts that are important to an organization, along with their meanings, their relationships to other important concepts, and how the data relate to the real world.

191 citations

Book
27 Apr 2013
TL;DR: This book covers latent trait theory and latent class theory, comparative views of the two, and application studies, including a latent class covariate model with applications to criterion-referenced testing.
Abstract: and Overview.
I. Latent Trait Theory: 1. Measurement Models for Ordered Response Categories. 2. Testing a Latent Trait Model. 3. Latent Trait Models with Indicators of Mixed Measurement Level.
II. Latent Class Theory: 4. New Developments in Latent Class Theory. 5. Log-Linear Modeling, Latent Class Analysis, or Correspondence Analysis: Which Method Should Be Used for the Analysis of Categorical Data? 6. A Latent Class Covariate Model with Applications to Criterion-Referenced Testing.
III. Comparative Views of Latent Traits and Latent Classes: 7. Test Theory with Qualitative and Quantitative Latent Variables. 8. Latent Class Models for Measuring. 9. Comparison of Latent Structure Models.
IV. Application Studies: 10. Latent Variable Techniques for Measuring Development. 11. Item Bias and Test Multidimensionality. 12. On a Rasch-Model-Based Test for Noncomputerized Adaptive Testing. 13. Systematizing the Item Content in Test Design.

190 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Support vector machine: 73.6K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    19
2022    77
2021    14
2020    36
2019    27
2018    58