
Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis (PLSA) is a research topic. Over its lifetime, 2,884 publications have appeared within this topic, receiving 198,341 citations.


Papers
Proceedings ArticleDOI
01 May 1999
TL;DR: Latent Semantic Analysis (LSA) was used to compute semantic similarity between task descriptions and labels in an applications menu system; when the labels in the menu system were semantically similar to the task descriptions, subjects performed the tasks faster.
Abstract: Models of learning and performing by exploration assume that the semantic similarity between task descriptions and labels on display objects (e.g., menus, tool bars) controls in part the users' search strategies. Nevertheless, none of the models has an objective way to compute semantic similarity. In this study, Latent Semantic Analysis (LSA) was used to compute semantic similarity between task descriptions and labels in an applications menu system. Participants performed twelve tasks by exploration and were tested for recall after a 1-week delay. When the labels in the menu system were semantically similar to the task descriptions, subjects performed the tasks faster. LSA could be incorporated into any of the current models, and it could be used to automate the evaluation of computer applications for ease of learning and performing by exploration.

26 citations

Journal ArticleDOI
TL;DR: A model-based clustering approach with mixed binary and continuous variables where each binary attribute is generated by a latent continuous variable that is dichotomized with a suitable threshold value, and where the scores of the latent variables are estimated from the binary data is proposed.
Abstract: For clustering objects, we often collect not only continuous variables, but binary attributes as well. This paper proposes a model-based clustering approach with mixed binary and continuous variables where each binary attribute is generated by a latent continuous variable that is dichotomized with a suitable threshold value, and where the scores of the latent variables are estimated from the binary data. In economics, such variables are called utility functions and the assumption is that the binary attributes (the presence or the absence of a public service or utility) are determined by low and high values of these functions. In genetics, the latent response is interpreted as the 'liability' to develop a qualitative trait or phenotype. The estimated scores of the latent variables, together with the observed continuous ones, allow the use of a multivariate Gaussian mixture model for clustering, instead of a mixture of discrete and continuous distributions. After describing the method, this paper presents the results on both simulated and real-case data and compares the performance of the multivariate Gaussian mixture model with that of a mixture of joint multivariate and multinomial distributions. Results show that the former model outperforms the mixture model for variables with different scales, both in terms of classification error rate and reproduction of the cluster means.

26 citations
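
The latent-threshold idea in the abstract above can be illustrated with a small sketch. It only shows the generative direction (a continuous latent score is dichotomized into a binary attribute); it does not reproduce the paper's estimation of latent scores from binary data. The function names, the unit-variance Gaussian latent variable, and the threshold of 0.0 are all assumptions made for illustration.

```python
import random


def dichotomize(latent_scores, threshold=0.0):
    """Turn continuous latent scores into binary attributes (1 if above threshold)."""
    return [1 if score > threshold else 0 for score in latent_scores]


def simulate_binary_attribute(n, mean, threshold=0.0, seed=0):
    """Draw n latent scores from N(mean, 1) and dichotomize them.

    The Gaussian latent variable and its unit variance are illustrative
    assumptions, not details taken from the paper.
    """
    rng = random.Random(seed)
    latent = [rng.gauss(mean, 1.0) for _ in range(n)]
    return latent, dichotomize(latent, threshold)


# Two hypothetical clusters whose latent means lie on either side of the threshold.
latent_low, binary_low = simulate_binary_attribute(200, mean=-1.5, seed=1)
latent_high, binary_high = simulate_binary_attribute(200, mean=1.5, seed=2)
print("share of 1s, low-mean cluster: ", sum(binary_low) / len(binary_low))
print("share of 1s, high-mean cluster:", sum(binary_high) / len(binary_high))
```

In this toy setup the binary attribute carries cluster information only through the latent score, which is why working with estimated latent scores lets a single Gaussian mixture handle both variable types.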

Proceedings Article
23 Aug 2010
TL;DR: A significant performance improvement over contextual and semantic features was observed after adding word emotion components as a feature, and the results demonstrate the effectiveness of semantic features for word emotion recognition.
Abstract: Emotion words have been widely used as the most obvious choice of feature in the tasks of textual emotion recognition and automatic emotion lexicon construction. In this work, we explore features for recognizing word emotion. Based on Ren-CECps (an annotated emotion corpus) and a MaxEnt (maximum entropy) model, several contextual features and their combinations were tested. Then PLSA (probabilistic latent semantic analysis) is used to obtain semantic features by clustering words and sentences. The experimental results demonstrate the effectiveness of using semantic features for word emotion recognition. After that, "word emotion components" are proposed to describe the combined basic emotions in a word. A significant performance improvement over contextual and semantic features was observed after adding word emotion components as a feature.

26 citations

Proceedings ArticleDOI
30 Mar 2008
TL;DR: The proposed approach iteratively labels the unlabeled documents and estimates the probabilities of its labeling errors, which are then taken into account in the estimation of the new model parameters before the next round.
Abstract: This paper investigates a new extension of the Probabilistic Latent Semantic Analysis (PLSA) model [6] for text classification where the training set is partially labeled. The proposed approach iteratively labels the unlabeled documents and estimates the probabilities of its labeling errors. These probabilities are then taken into account in the estimation of the new model parameters before the next round. Our approach outperforms an earlier semi-supervised extension of PLSA introduced by [9] which is based on the use of fake labels. However, it maintains its simplicity and ability to solve multiclass problems. In addition, it gives valuable information about the most uncertain and difficult classes to label. We perform experiments over the 20Newsgroups, WebKB and Reuters document collections and show the effectiveness of our approach over two other semi-supervised algorithms applied to these text classification problems.

26 citations

Journal ArticleDOI
TL;DR: A Bayesian model is presented for the estimation of latent nonlinear effects when the latent predictor variables are nonnormally distributed and the nonnormal predictor distribution is approximated by a finite mixture distribution.
Abstract: Structural equation models with interaction and quadratic effects have become a standard tool for testing nonlinear hypotheses in the social sciences. Most of the current approaches assume normally distributed latent predictor variables. In this article, we present a Bayesian model for the estimation of latent nonlinear effects when the latent predictor variables are nonnormally distributed. The nonnormal predictor distribution is approximated by a finite mixture distribution. We conduct a simulation study that demonstrates the advantages of the proposed Bayesian model over contemporary approaches (Latent Moderated Structural Equations [LMS], Quasi-Maximum-Likelihood [QML], and the extended unconstrained approach) when the latent predictor variables follow a nonnormal distribution. The conventional approaches show biased estimates of the nonlinear effects; the proposed Bayesian model provides unbiased estimates. We present an empirical example from work and stress research and provide syntax for substanti...

26 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Support vector machine: 73.6K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    19
2022    77
2021    14
2020    36
2019    27
2018    58