scispace - formally typeset
Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over the lifetime, 2884 publications have been published within this topic receiving 198341 citations. The topic is also known as: PLSA.


Papers
Journal ArticleDOI
Xianghua Fu, Xudong Sun, Haiying Wu, Laizhong Cui, Joshua Zhexue Huang
TL;DR: A novel topic sentiment joint model called the weakly supervised topic sentiment joint model with word embeddings (WS-TSWE), which incorporates word embeddings and the HowNet lexicon simultaneously to improve topic identification and sentiment recognition.
Abstract: Topic sentiment joint models aim to deal with the mixture of topics and sentiment simultaneously in online reviews. Most existing topic sentiment modeling algorithms are based on the state-of-the-art latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA), which infer sentiment and topic distributions from the co-occurrence of words. These methods have been proposed and successfully used for topic and sentiment analysis. However, when the training corpus is small or the documents are short, the textual features become sparse, so the inferred sentiment and topic distributions may not be satisfactory. In this paper, we propose a novel topic sentiment joint model called the weakly supervised topic sentiment joint model with word embeddings (WS-TSWE), which incorporates word embeddings and the HowNet lexicon simultaneously to improve topic identification and sentiment recognition. The main contributions of WS-TSWE include the following two aspects. (1) Existing models generate words only from the sentiment-topic-to-word Dirichlet multinomial component, whereas the WS-TSWE model replaces it with a mixture of two components: a Dirichlet multinomial component and a word embeddings component. Since the word embeddings are trained on very large corpora and can extend the semantic information of the words, they help mitigate the problem of textual sparsity. (2) Most previous models incorporate sentiment knowledge in the β priors, which are usually set from a dictionary and rely entirely on prior domain knowledge to identify positive and negative words. In contrast, the WS-TSWE model calculates the sentiment orientation of each word with the HowNet lexicon and automatically infers sentiment-based β priors for sentiment analysis and opinion mining. Furthermore, we implement WS-TSWE with Gibbs sampling algorithms.
The experimental results on Chinese and English data sets show that WS-TSWE performs well in the task of detecting sentiment and topics simultaneously.

29 citations
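The two-component word distribution described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name, the mixing weight `lam`, the `topic_vec` parameter, and the softmax form of the embedding component are all assumptions made here for concreteness.

```python
import numpy as np

def word_distribution(phi, embeddings, topic_vec, lam):
    """Mix a Dirichlet-multinomial word distribution phi with a
    distribution induced by word embeddings (softmax of dot products).
    lam is the mixing weight of the multinomial component."""
    scores = embeddings @ topic_vec
    e = np.exp(scores - scores.max())      # subtract max for numerical stability
    emb_dist = e / e.sum()
    return lam * phi + (1.0 - lam) * emb_dist

rng = np.random.default_rng(0)
V, d = 6, 4                                # toy vocabulary size, embedding dim
phi = rng.dirichlet(np.ones(V))            # Dirichlet-multinomial component
emb = rng.normal(size=(V, d))              # stand-in for pretrained embeddings
topic = rng.normal(size=d)                 # topic/sentiment vector
p = word_distribution(phi, emb, topic, lam=0.5)
```

Because each component is itself a probability distribution over the vocabulary, any convex combination of the two remains a valid distribution, which is what lets the embedding component compensate for sparse co-occurrence counts.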

Journal ArticleDOI
TL;DR: A latent process model for time series of attributed random graphs for characterizing multiple modes of association among a collection of actors over time and it is demonstrated that the analysis through approximation can provide valuable information regarding inference properties.
Abstract: We introduce a latent process model for time series of attributed random graphs for characterizing multiple modes of association among a collection of actors over time. Two mathematically tractable approximations are derived. We examine the performance of a class of test statistics for an illustrative change-point detection problem and demonstrate that analysis through approximation can provide valuable information regarding inference properties.

29 citations

Proceedings ArticleDOI
08 Jul 2009
TL;DR: A novel method is proposed to select representative photographs for regions on a worldwide scale, helping detect cultural differences across the world for word concepts with high geo-location entropy.
Abstract: In this paper, we describe two methods for analyzing the relationship between word concepts and geographical locations using a large number of geotagged images on photo sharing Web sites such as Flickr. First, we propose using both image region entropy and geo-location entropy to analyze relations between location and visual features; in the experiment we found that concepts with low image entropy tend to have high geo-location entropy and vice versa. Second, we propose a novel method to select representative photographs for regions on a worldwide scale, which helps detect cultural differences across the world for word concepts with high geo-location entropy. In the proposed method, we first extract the most relevant images by clustering and evaluation on the visual features. Then, based on the geographic information of the images, representative regions are automatically detected. Finally, we select and generate a set of representative images for the representative regions by employing Probabilistic Latent Semantic Analysis (PLSA) modelling. The results show the ability of our approach to mine regional representative photographs and cultural differences across the world.

29 citations
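The geo-location entropy mentioned above can be illustrated with a short sketch: bin the geotags of photos carrying a given tag into grid cells and take the Shannon entropy of the resulting distribution. The 10-degree grid granularity and the function name are assumptions for illustration, not details from the paper.

```python
import math
from collections import Counter

def geolocation_entropy(coords, cell=10.0):
    """Shannon entropy (bits) of a tag's photo locations over lat/lon cells."""
    cells = Counter((int(lat // cell), int(lon // cell)) for lat, lon in coords)
    n = sum(cells.values())
    return -sum((c / n) * math.log2(c / n) for c in cells.values())

# A tag spread across the world scores high; one tied to a single place scores 0.
worldwide = [(48.8, 2.3), (35.7, 139.7), (-33.9, 151.2), (40.7, -74.0)]
local = [(48.8, 2.3), (48.85, 2.35), (48.86, 2.29)]
```

Under this measure, a concept like a common object photographed everywhere would have high geo-location entropy, while a landmark-specific tag concentrates in one cell and scores near zero, matching the inverse relationship with image entropy reported in the abstract.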

Journal ArticleDOI
TL;DR: An extension of the latent class model for the analysis of capture-recapture data which allows us to take into account the effect of a capture on the behavior of a subject with respect to future captures is proposed.
Abstract: We propose an extension of the latent class model for the analysis of capture-recapture data which allows us to take into account the effect of a capture on the behavior of a subject with respect to future captures. The approach is based on the assumption that the variable indexing the latent class of a subject follows a Markov chain with transition probabilities depending on the previous capture history. Several constraints are allowed on these transition probabilities and on the parameters of the conditional distribution of the capture configuration given the latent process. We also allow for the presence of discrete explanatory variables, which may affect the parameters of the latent process. To estimate the resulting models, we rely on the conditional maximum likelihood approach, and to this end we outline an EM algorithm. We also give some simple rules for point and interval estimation of the population size. The approach is illustrated by applying it to two data sets concerning small mammal populations.

29 citations
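The core idea above, a latent Markov chain whose transition probabilities depend on the previous capture, can be sketched with a small forward-algorithm likelihood. All parameter values, names, and the two-class setup are illustrative assumptions, not the paper's model specification.

```python
import numpy as np

def capture_likelihood(history, pi, P_free, P_caught, p_capture):
    """Forward-algorithm likelihood of a 0/1 capture history, where the
    latent-class transition matrix switches on the previous capture."""
    alpha = pi * np.where(history[0], p_capture, 1 - p_capture)
    for prev, obs in zip(history, history[1:]):
        P = P_caught if prev else P_free     # behavioral effect of a capture
        alpha = (alpha @ P) * np.where(obs, p_capture, 1 - p_capture)
    return alpha.sum()

pi = np.array([0.6, 0.4])                        # initial latent-class distribution
P_free = np.array([[0.9, 0.1], [0.2, 0.8]])      # transitions if not captured before
P_caught = np.array([[0.7, 0.3], [0.4, 0.6]])    # transitions after a capture
p_capture = np.array([0.3, 0.7])                 # class-specific capture probability
lik = capture_likelihood([1, 0, 1], pi, P_free, P_caught, p_capture)
```

In an EM fit, likelihoods of this form would be maximized over the transition and capture parameters; summing over all possible capture histories gives 1, confirming the process is a proper probability model.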

Journal ArticleDOI
TL;DR: The experimental results demonstrate the superior accuracy of the oPLSA over well-known PLSA updating methods, such as PLSA folding-in (PLSA fold.), PLSA rerun from the breakpoint, quasi-Bayes PLSA, and incremental PLSA.
Abstract: A novel method is proposed for updating an already trained asymmetric or symmetric probabilistic latent semantic analysis (PLSA) model within the context of a varying document stream. The proposed method is coined online PLSA (oPLSA). The oPLSA employs a fixed-size moving window over a document stream to incorporate new documents and at the same time discard old ones (i.e., documents that fall outside the scope of the window). In addition, the oPLSA assimilates new words that had not been previously seen (out-of-vocabulary words) and discards words that appear exclusively in the documents to be thrown away. To handle the new words, Good-Turing estimates for the probabilities of unseen words are exploited. The experimental results demonstrate the superior accuracy of the oPLSA over well-known PLSA updating methods, such as PLSA folding-in (PLSA fold.), PLSA rerun from the breakpoint, quasi-Bayes PLSA, and incremental PLSA. A comparison of CPU run time reveals that the oPLSA is the second fastest method after PLSA fold.; however, the better accuracy of the oPLSA justifies the longer computation time. The oPLSA and the other PLSA updating methods, together with online LDA, are tested for document clustering, and F1 scores are also reported.

29 citations
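The Good-Turing idea the oPLSA relies on for out-of-vocabulary words can be sketched briefly: the total probability mass reserved for unseen words is estimated as N1/N, where N1 is the number of word types seen exactly once and N is the total token count. The toy corpus and function name below are assumptions for illustration.

```python
from collections import Counter

def good_turing_unseen_mass(tokens):
    """Good-Turing estimate of the total probability of unseen words:
    (number of singleton word types) / (total number of tokens)."""
    counts = Counter(tokens)
    n1 = sum(1 for c in counts.values() if c == 1)   # types seen exactly once
    return n1 / len(tokens)

corpus = "the plsa model the topic model a topic".split()
mass = good_turing_unseen_mass(corpus)               # 2 singletons / 8 tokens
```

This reserved mass is then redistributed over words that enter the window's vocabulary, which is what lets an updating scheme assign nonzero probability to words it has never observed.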


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
84% related
Feature (computer vision)
128.2K papers, 1.7M citations
84% related
Support vector machine
73.6K papers, 1.7M citations
84% related
Deep learning
79.8K papers, 2.1M citations
83% related
Object detection
46.1K papers, 1.3M citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    19
2022    77
2021    14
2020    36
2019    27
2018    58