Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over the lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.
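As a point of reference for the papers below, LDA models each document as a mixture over latent topics and each topic as a distribution over words. A minimal sketch using scikit-learn's `LatentDirichletAllocation` (one common implementation; the papers listed here use their own variants and inference schemes), with a toy corpus and topic count chosen purely for illustration:

```python
# Toy LDA run: documents -> word counts -> per-document topic mixtures.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "topic models discover latent themes in text",
    "word counts feed the topic model",
    "reviews and sentiments form another corpus",
    "sentiment analysis of product reviews",
]
counts = CountVectorizer().fit_transform(docs)      # document-term count matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)              # rows are topic distributions
print(doc_topics.shape)  # (4, 2): one topic mixture per document
```

Each row of `doc_topics` sums to 1 and can be read as "how much of each topic this document contains" — the representation that several of the papers below reuse as a feature vector.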


Papers
Proceedings ArticleDOI
01 Sep 2015
TL;DR: This work proposes a factor graph framework, Sparse Constrained LDA (SC-LDA), for efficiently incorporating prior knowledge into LDA, and evaluates its ability to incorporate word correlation knowledge and document label knowledge on three benchmark datasets.
Abstract: Latent Dirichlet allocation (LDA) is a popular topic modeling technique for exploring hidden topics in text corpora. Increasingly, topic modeling needs to scale to larger topic spaces and use richer forms of prior knowledge, such as word correlations or document labels. However, inference is cumbersome for LDA models with prior knowledge. As a result, LDA models that use prior knowledge only work in small-scale scenarios. In this work, we propose a factor graph framework, Sparse Constrained LDA (SC-LDA), for efficiently incorporating prior knowledge into LDA. We evaluate SC-LDA’s ability to incorporate word correlation knowledge and document label knowledge on three benchmark datasets. Compared to several baseline methods, SC-LDA achieves comparable performance but is significantly faster.

60 citations

Proceedings ArticleDOI
23 Jun 2007
TL;DR: A topic feature is constructed, targeted to capture global context information, using the latent Dirichlet allocation (LDA) algorithm with an unlabeled corpus, and a modified naive Bayes classifier is constructed to incorporate all the features.
Abstract: We participated in SemEval-1 English coarse-grained all-words task (task 7), English fine-grained all-words task (task 17, subtask 3) and English coarse-grained lexical sample task (task 17, subtask 1). The same method with different labeled data is used for the tasks; SemCor is the labeled corpus used to train our system for the all-words tasks, while the labeled corpus that is provided is used for the lexical sample task. The knowledge sources include part-of-speech of neighboring words, single words in the surrounding context, local collocations, and syntactic patterns. In addition, we constructed a topic feature, targeted to capture global context information, using the latent Dirichlet allocation (LDA) algorithm with an unlabeled corpus. A modified naive Bayes classifier is constructed to incorporate all the features. We achieved 81.6%, 57.6%, and 88.7% for the coarse-grained all-words task, fine-grained all-words task, and coarse-grained lexical sample task respectively.

59 citations

Proceedings ArticleDOI
24 Aug 2008
TL;DR: A nonparametric Bayesian model which provides a generalization of the multi-modal latent Dirichlet allocation model (MoM-LDA) used for similar problems in the past, and performs as well as or better than the MoM-LDA model (regardless of the choice of the number of clusters) for predicting labels of objects in images containing multiple objects.
Abstract: Many applications call for learning to label individual objects in an image where the only information available to the learner is a dataset of images with their associated captions, i.e., words that describe the image content without specifically labeling the individual objects. We address this problem using a multi-modal hierarchical Dirichlet process model (MoM-HDP) - a nonparametric Bayesian model which provides a generalization of the multi-modal latent Dirichlet allocation model (MoM-LDA) used for similar problems in the past. We apply this model to predicting labels of objects in images containing multiple objects. During training, the model has access to an un-segmented image and its caption, but not the labels for each object in the image. The trained model is used to predict the label for each region of interest in a segmented image. MoM-HDP generalizes a multi-modal latent Dirichlet allocation model in that it allows the number of components of the mixture model to adapt to the data. The model parameters are efficiently estimated using variational inference. Our experiments show that MoM-HDP performs as well as or better than the MoM-LDA model (regardless of the choice of the number of clusters in the MoM-LDA model).

59 citations

Journal ArticleDOI
TL;DR: A new measure of innovation is developed using the text of analyst reports of S&P 500 firms to give a useful description of innovation by firms with and without patenting and R&...
Abstract: We develop a new measure of innovation using the text of analyst reports of S&P 500 firms. Our text-based measure gives a useful description of innovation by firms with and without patenting and R&...

59 citations

Journal ArticleDOI
TL;DR: The proposed unsupervised topic-sentiment joint probabilistic model (UTSJ), based on the Latent Dirichlet Allocation (LDA) model, is good at dealing with real-life unbalanced big data, which makes it well suited to e-commerce environments.
Abstract: In electronic commerce, online reviews play a very important role in customers’ purchasing decisions. Unfortunately, malicious sellers often hire buyers to fabricate fake reviews to improve their reputation. In order to detect deceptive reviews and mine the topics and sentiments from the reviews, in this paper we propose an unsupervised topic-sentiment joint probabilistic model (UTSJ) based on the Latent Dirichlet Allocation (LDA) model. This model first employs a Gibbs sampling algorithm to approximate the parameters of the maximum likelihood function offline and obtain a topic-sentiment joint probabilistic distribution vector for each review. Secondly, a Random Forest classifier and an SVM (Support Vector Machine) classifier are trained offline, respectively. Experimental results on real-life datasets show that our proposed model is better than baseline models such as n-grams, character n-grams in token, POS (part-of-speech), LDA, and JST (Joint Sentiment/Topic). Moreover, our UTSJ model outperforms or performs similarly to benchmark models in detecting deceptive reviews over balanced and unbalanced datasets in different domains. In particular, our UTSJ model is good at dealing with real-life unbalanced big data, which makes it well suited to e-commerce environments.
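The general pipeline this abstract describes — per-review topic vectors fed to a downstream classifier — can be sketched as follows. This is a hedged illustration only: scikit-learn's variational LDA stands in for the paper's Gibbs-sampled topic-sentiment vectors, and the reviews and deceptive/genuine labels are synthetic placeholders, not the paper's data.

```python
# Sketch: review texts -> LDA topic distributions -> Random Forest classifier.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "great product fast shipping highly recommend",
    "amazing quality best purchase ever buy now",
    "the battery lasts two days and charges quickly",
    "screen is sharp but the speaker is a bit quiet",
]
labels = [1, 1, 0, 0]  # toy labels: 1 = deceptive, 0 = genuine

counts = CountVectorizer().fit_transform(reviews)
# Topic mixtures serve as the feature vector for each review.
features = LatentDirichletAllocation(n_components=3, random_state=0).fit_transform(counts)
clf = RandomForestClassifier(random_state=0).fit(features, labels)
preds = clf.predict(features)
print(preds)
```

The paper trains the topic model and classifiers offline for the same reason this split is natural here: the expensive inference step produces fixed-length vectors once, after which classification is cheap.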

59 citations


Network Information
Related Topics (5)
- Cluster analysis: 146.5K papers, 2.9M citations (86% related)
- Support vector machine: 73.6K papers, 1.7M citations (86% related)
- Deep learning: 79.8K papers, 2.1M citations (85% related)
- Feature extraction: 111.8K papers, 2.1M citations (84% related)
- Convolutional neural network: 74.7K papers, 2M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  323
2022  842
2021  418
2020  429
2019  473
2018  446