scispace - formally typeset
Search or ask a question
Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over the lifetime, 5351 publications have been published within this topic receiving 212555 citations. The topic is also known as: LDA.


Papers
More filters
Journal ArticleDOI
Xianghua Fu1, Xudong Sun1, Haiying Wu1, Laizhong Cui1, Joshua Zhexue Huang1 
TL;DR: A novel topic sentiment Joint model called weakly supervised topic sentiment joint model with word embeddings (WS-TSWE), which incorporates word embedDings and HowNet lexicon simultaneously to improve the topic identification and sentiment recognition.
Abstract: Topic sentiment joint model aims to deal with the problem about the mixture of topics and sentiment simultaneously from online reviews. Most of existing topic sentiment modeling algorithms are mainly based on the state-of-art latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA), which infer sentiment and topic distributions from the co-occurrence of words. These methods have been proposed and successfully used for topic and sentiment analysis. However, when the training corpus is small or when the documents are short, the textual features become sparse, so that the results of the sentiment and topic distributions might be not very satisfied. In this paper, we propose a novel topic sentiment joint model called weakly supervised topic sentiment joint model with word embeddings (WS-TSWE), which incorporates word embeddings and HowNet lexicon simultaneously to improve the topic identification and sentiment recognition. The main contributions of WS-TSWE include the following two aspects. (1) Existing models generate the words only from the sentiment-topic-to-word Dirichlet multinomial component, but the WS-TSWE model replaces it with a mixture of two components, a Dirichlet multinomial component and a word embeddings component. Since the word embeddings are trained on a very large corpora and can be used to extend the semantic information of the words, they can provide a certain solution for the problem of the textual sparse. (2) Most of previous models incorporate sentiment knowledge in the β priors. And the priors are usually set from a dictionary and completely rely on previous domain knowledge to identify positive and negative words. In contrast, the WS-TSWE model calculates the sentiment orientation of each word with the HowNet lexicon and automatically infers sentiment-based β priors for sentiment analysis and opinion mining. Furthermore, we implement WS-TSWE with Gibbs sampling algorithms. The experimental results on Chinese and English data sets show that WS-TSWE achieved significant performance in the task of detecting sentiment and topics simultaneously.

29 citations

Posted Content
TL;DR: In this article, a fully discriminative learning approach for supervised Latent Dirichlet Allocation (LDA) model using Back Propagation (i.e., BP-sLDA), which maximizes the posterior probability of the prediction variable given the input document, was developed.
Abstract: We develop a fully discriminative learning approach for supervised Latent Dirichlet Allocation (LDA) model using Back Propagation (i.e., BP-sLDA), which maximizes the posterior probability of the prediction variable given the input document. Different from traditional variational learning or Gibbs sampling approaches, the proposed learning method applies (i) the mirror descent algorithm for maximum a posterior inference and (ii) back propagation over a deep architecture together with stochastic gradient/mirror descent for model parameter estimation, leading to scalable and end-to-end discriminative learning of the model. As a byproduct, we also apply this technique to develop a new learning method for the traditional unsupervised LDA model (i.e., BP-LDA). Experimental results on three real-world regression and classification tasks show that the proposed methods significantly outperform the previous supervised topic models, neural networks, and is on par with deep neural networks.

29 citations

Journal ArticleDOI
TL;DR: This paper investigates the adoption of a probabilistic approach based on the Latent Dirichlet Allocation (LDA) as Sentiment grabber for e-learning, and shows how this approach contains a set of weighted word pairs, which are discriminative for sentiment classification.
Abstract: The spread of social networks allows sharing opinions on different aspects of life and daily millions of messages appear on the web. This textual information can be a rich source of data for opinion mining and sentiment analysis: the computational study of opinions, sentiments and emotions expressed in a text. Its main aim is the identification of the agreement or disagreement statements that deal with positive or negative feelings in comments or reviews. In this paper, we investigate the adoption, in the field of the e-learning, of a probabilistic approach based on the Latent Dirichlet Allocation (LDA) as Sentiment grabber. By this approach, for a set of documents belonging to a same knowledge domain, a graph, the Mixed Graph of Terms, can be automatically extracted. The paper shows how this graph contains a set of weighted word pairs, which are discriminative for sentiment classification. In this way, the system can detect the feeling of students on some topics and teacher can better tune his/her teaching approach. In fact, the proposed method has been tested on datasets coming from e-learning platforms. A preliminary experimental campaign shows how the proposed approach is effective and satisfactory.

29 citations

Journal ArticleDOI
TL;DR: Quantitative evaluations showed that employing LDA-based topics as predictors within time series models reduces forecast error compared with naive, autoregressive integrated moving average, and exponential smoothing baselines.
Abstract: The Internet’s increasing use as a means of communication has led to the formation of cyber communities, which have become appealing to violent extremist (VE) groups. This paper presents research on forecasting the daily level of cyber-recruitment activity of VE groups. We used a previously developed support vector machine model to identify recruitment posts within a Western jihadist discussion forum. We analyzed the textual content of this data set with latent Dirichlet allocation (LDA), and we fed these analyses into a variety of time series models to forecast cyber-recruitment activity within the forum. Quantitative evaluations showed that employing LDA-based topics as predictors within time series models reduces forecast error compared with naive (random-walk), autoregressive integrated moving average, and exponential smoothing baselines. To the best of our knowledge, this is the first result reported on this forecasting task. This research could ultimately help assist with efficient allocation of intelligence analysts in response to predicted levels of cyber-recruitment activity.

29 citations

Journal ArticleDOI
TL;DR: This article proposes a novel method based on topic modelling to determine the latent aspects of online review documents called Concept-LDA, which achieves better topic representations than an LDA model alone, as measured by topic coherence and F-measure.
Abstract: Latent Dirichlet allocation (LDA) is one of the probabilistic topic models; it discovers the latent topic structure in a document collection. The basic assumption under LDA is that documents are vi...

29 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
86% related
Support vector machine
73.6K papers, 1.7M citations
86% related
Deep learning
79.8K papers, 2.1M citations
85% related
Feature extraction
111.8K papers, 2.1M citations
84% related
Convolutional neural network
74.7K papers, 2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023323
2022842
2021418
2020429
2019473
2018446