
Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.
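To make the topic concrete, here is a minimal, self-contained sketch of fitting an LDA model with scikit-learn on a toy corpus; the corpus and the choice of 2 topics are illustrative assumptions, not taken from any of the papers below.

```python
# Minimal LDA sketch: fit topics on a toy corpus and inspect
# per-document topic proportions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors sold shares amid market fears",
]

# Bag-of-words document-term matrix
X = CountVectorizer(stop_words="english").fit_transform(docs)

# Fit LDA with 2 latent topics
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)  # rows: documents, cols: topic proportions

# Each row of theta sums to 1: a distribution over the 2 latent topics.
print(theta.shape)
```

Each document is thus summarized as a probability distribution over latent topics, which is the representation the papers below build on.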


Papers
Proceedings Article
23 Jun 2011
TL;DR: This paper proposes ways of employing Latent Dirichlet Allocation in authorship attribution and shows that this approach yields state-of-the-art performance for both a few and many candidate authors, in cases where these authors wrote enough texts to be modelled effectively.
Abstract: The problem of authorship attribution -- attributing texts to their original authors -- has been an active research area since the end of the 19th century, attracting increased interest in the last decade. Most of the work on authorship attribution focuses on scenarios with only a few candidate authors, but recently considered cases with tens to thousands of candidate authors were found to be much more challenging. In this paper, we propose ways of employing Latent Dirichlet Allocation in authorship attribution. We show that our approach yields state-of-the-art performance for both a few and many candidate authors, in cases where these authors wrote enough texts to be modelled effectively.

64 citations
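The general idea of LDA-based authorship attribution can be sketched as follows; this is a hedged simplification (the paper's exact pipeline may differ), where each author is represented by the mean topic distribution of their known texts and a new text is attributed to the author with the most similar profile. The corpus, author names, and the `attribute` helper are illustrative assumptions.

```python
# Hedged sketch: authorship attribution via LDA topic distributions.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

train = {
    "author_a": ["the ship sailed the stormy sea", "waves crashed on the deck"],
    "author_b": ["the court ruled on the appeal", "the judge issued a verdict"],
}
all_docs = [d for docs in train.values() for d in docs]

vec = CountVectorizer()
X = vec.fit_transform(all_docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# One topic-distribution profile per author (mean over their texts)
profiles = {}
for author, docs in train.items():
    theta = lda.transform(vec.transform(docs))
    profiles[author] = theta.mean(axis=0)

def attribute(text):
    """Attribute a text to the author whose profile is most similar."""
    q = lda.transform(vec.transform([text]))[0]
    return max(profiles, key=lambda a: np.dot(q, profiles[a]) /
               (np.linalg.norm(q) * np.linalg.norm(profiles[a])))

print(attribute("the sea was rough near the deck"))
```

With realistically sized corpora per author, these topic profiles become stable enough to discriminate among many candidates, which matches the paper's finding that authors must have written enough texts to be modelled effectively.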

Book ChapterDOI
15 Jul 2010
TL;DR: A probabilistic framework models answerers' interests with a mixture of the Language Model and the Latent Dirichlet Allocation model; experimental results show the proposed method can effectively push new questions to the best answerers.
Abstract: Community question answering (CQA) has become a very popular web service to provide a platform for people to share knowledge. In current CQA services, askers post their questions to the system and wait for answerers to answer them passively. This procedure leads to several drawbacks. Since new questions are presented to all users in the system, the askers can not expect some experts to answer their questions. Meanwhile, answerers have to visit many questions and then pick out only a small part of them to answer. To overcome those drawbacks, a probabilistic framework is proposed to predict best answerers for new questions. By tracking answerers' answering history, interests of answerers are modeled with the mixture of the Language Model and the Latent Dirichlet Allocation model. User activity and authority information is also taken into consideration. Experimental results show the proposed method can effectively push new questions to the best answerers.

64 citations
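The routing idea above can be sketched in a few lines; this is a hedged simplification where answerer interests are LDA topic distributions built from answering history, and answerers are ranked by similarity to a new question. The paper additionally mixes in a Language Model and user activity/authority signals, which are omitted here; the histories and `rank_answerers` helper are illustrative assumptions.

```python
# Hedged sketch: route a new question to the answerers whose
# topic profiles best match it.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Each answerer's answering history, concatenated into one document
history = {
    "user_1": "python list comprehension decorators generators",
    "user_2": "mortgage interest rates refinancing loans",
}

vec = CountVectorizer()
X = vec.fit_transform(list(history.values()))
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
profiles = lda.transform(X)  # one topic vector per answerer

def rank_answerers(question):
    """Rank answerers by cosine similarity to the question's topics."""
    q = lda.transform(vec.transform([question]))[0]
    sims = profiles @ q / (np.linalg.norm(profiles, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)
    return [list(history)[i] for i in order]

print(rank_answerers("how do decorators work in python"))
```

Pushing a question to the top-ranked answerers replaces the passive wait-for-answers procedure the abstract criticizes.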

Proceedings ArticleDOI
03 Nov 2014
TL;DR: This paper proposes an LDA-based opinion model named Twitter Opinion Topic Model (TOTM) for opinion mining and sentiment analysis, which leverages hashtags, mentions, emoticons and strong sentiment words that are present in tweets in its discovery process.
Abstract: Aspect-based opinion mining is widely applied to review data to aggregate or summarize opinions of a product, and the current state-of-the-art is achieved with Latent Dirichlet Allocation (LDA)-based model. Although social media data like tweets are laden with opinions, their "dirty" nature (as natural language) has discouraged researchers from applying LDA-based opinion model for product review mining. Tweets are often informal, unstructured and lacking labeled data such as categories and ratings, making it challenging for product opinion mining. In this paper, we propose an LDA-based opinion model named Twitter Opinion Topic Model (TOTM) for opinion mining and sentiment analysis. TOTM leverages hashtags, mentions, emoticons and strong sentiment words that are present in tweets in its discovery process. It improves opinion prediction by modeling the target-opinion interaction directly, thus discovering target specific opinion words, neglected in existing approaches. Moreover, we propose a new formulation of incorporating sentiment prior information into a topic model, by utilizing an existing public sentiment lexicon. This is novel in that it learns and updates with the data. We conduct experiments on 9 million tweets on electronic products, and demonstrate the improved performance of TOTM in both quantitative evaluations and qualitative analysis. We show that aspect-based opinion analysis on massive volume of tweets provides useful opinions on products.

64 citations

Journal ArticleDOI
TL;DR: An unsupervised dependency analysis-based approach is presented to extract Appraisal Expression Patterns (AEPs) from reviews, which represent the manner in which people express opinions regarding products or services and can be regarded as a condensed representation of the syntactic relationship between aspect and sentiment words.
Abstract: With the considerable growth of user-generated content, online reviews are becoming extremely valuable sources for mining customers' opinions on products and services. However, most of the traditional opinion mining methods are coarse-grained and cannot understand natural languages. Thus, aspect-based opinion mining and summarization are of great interest in academic and industrial research. In this paper, we study an approach to extract product and service aspect words, as well as sentiment words, automatically from reviews. An unsupervised dependency analysis-based approach is presented to extract Appraisal Expression Patterns (AEPs) from reviews, which represent the manner in which people express opinions regarding products or services and can be regarded as a condensed representation of the syntactic relationship between aspect and sentiment words. AEPs are high-level, domain-independent types of information, and have excellent domain adaptability. An AEP-based Latent Dirichlet Allocation (AEP-LDA) model is also proposed. This is a sentence-level, probabilistic generative model which assumes that all words in a sentence are drawn from one topic – a generally true assumption, based on our observation. The model also assumes that every review corpus is composed of several mutually corresponding aspect and sentiment topics, as well as a background word topic. The AEP information is incorporated into the AEP-LDA model for mining aspect and sentiment words simultaneously. The experimental results on reviews of restaurants, hotels, MP3 players, and cameras show that the AEP-LDA model outperforms other approaches in identifying aspect and sentiment words.

63 citations

Journal ArticleDOI
TL;DR: This paper presents a fully Bayesian approach for generalized Dirichlet mixtures estimation and selection, based on the Monte Carlo simulation technique of Gibbs sampling mixed with a Metropolis-Hastings step, and obtains a posterior distribution which is conjugate to a generalizedDirichlet likelihood.
Abstract: In this paper, we present a fully Bayesian approach for generalized Dirichlet mixtures estimation and selection. The estimation of the parameters is based on the Monte Carlo simulation technique of Gibbs sampling mixed with a Metropolis-Hastings step. Also, we obtain a posterior distribution which is conjugate to a generalized Dirichlet likelihood. For the selection of the number of clusters, we used the integrated likelihood. The performance of our Bayesian algorithm is tested and compared with the maximum likelihood approach by the classification of several synthetic and real data sets. The generalized Dirichlet mixture is also applied to the problems of IR eye modeling and introduced as a probabilistic kernel for Support Vector Machines.

63 citations
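The conjugacy that such Gibbs samplers rely on can be shown in its simplest (standard Dirichlet) form: with a Dirichlet prior on multinomial parameters, the posterior is again Dirichlet with the observed counts added to the prior. The paper works with the generalized Dirichlet, which enjoys a similar closed-form update; the prior and counts below are illustrative numbers.

```python
# Dirichlet-multinomial conjugacy: posterior = Dirichlet(alpha + counts).
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 1.0, 1.0])   # Dirichlet prior parameters
counts = np.array([10, 2, 5])       # observed multinomial counts
posterior = alpha + counts          # conjugate closed-form update

# Gibbs-style draws of the category probabilities from the posterior
samples = rng.dirichlet(posterior, size=1000)

# The sample mean approaches the posterior mean posterior / posterior.sum()
print(samples.mean(axis=0))
```

This closed-form step is what makes each sweep of the Gibbs sampler cheap; the Metropolis-Hastings step mentioned in the abstract handles the parameters that lack such a conjugate update.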


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446