Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5351 publications have been published within this topic, receiving 212555 citations. The topic is also known as: LDA.

Papers

Journal Article•DOI•

Improving Topic Models with Latent Feature Word Representations

Dat Quoc Nguyen¹, Richard Billingsley¹, Lan Du¹, Mark Johnson¹•Institutions (1)

Macquarie University¹

02 Jun 2015-Transactions of the Association for Computational Linguistics

TL;DR: This article extended two Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus.

Abstract: Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.

276 citations

Journal Article•DOI•

Semantic Annotation of Satellite Images Using Latent Dirichlet Allocation

M. Lienou¹, H. Maitre¹, Mihai Datcu•Institutions (1)

Télécom ParisTech¹

01 Jan 2010-IEEE Geoscience and Remote Sensing Letters

TL;DR: This annotation task combines a step of supervised classification of patches of the large image and the integration of the spatial information between these patches, using the maximum-likelihood method.

Abstract: In this letter, we are interested in the annotation of large satellite images, using semantic concepts defined by the user. This annotation task combines a step of supervised classification of patches of the large image and the integration of the spatial information between these patches. Given a training set of images for each concept, learning is based on the latent Dirichlet allocation (LDA) model. This hierarchical model represents each item of a collection as a random mixture of latent topics, where each topic is characterized by a distribution over words. The LDA-based image representation is obtained using simple features extracted from image words. We then exploit the capability of the LDA model to assign probabilities to unseen images, in order to classify the patches of the large image into the semantic concepts, using the maximum-likelihood method. We conduct experiments on panchromatic QuickBird images with 60-cm resolution. Taking into account the spatial information between the patches is shown to improve the annotation performance.

274 citations
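
The LDA description in the abstract above (each item is a random mixture of latent topics, and each topic is a distribution over words) corresponds to a simple generative process. The following is a minimal sketch of that process, not the authors' implementation: the two topics, the four-word vocabulary, the prior values, and all function names are hypothetical and chosen only for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw an index according to the given probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(topic_word, alpha, length, rng):
    """Generate one document under the LDA generative process.

    topic_word: list of K word distributions (each sums to 1 over the vocabulary)
    alpha: Dirichlet prior over the K topic proportions
    """
    theta = sample_dirichlet(alpha, rng)            # per-document topic mixture
    words = []
    for _ in range(length):
        z = sample_categorical(theta, rng)          # latent topic for this position
        w = sample_categorical(topic_word[z], rng)  # word index drawn from that topic
        words.append(w)
    return theta, words

rng = random.Random(0)
# Two hypothetical topics over a 4-word vocabulary (illustrative values only).
topic_word = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
theta, words = generate_document(topic_word, alpha=[0.5, 0.5], length=10, rng=rng)
```

Here `theta` is the document's topic mixture and `words` is a list of vocabulary indices. In the image setting described above, the "words" would be visual words extracted from patches rather than text tokens.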

Proceedings Article•DOI•

How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms

Annibale Panichella¹, Bogdan Dit², Rocco Oliveto³, Massimiliano Di Penta⁴, Denys Poshyvanyk², Andrea De Lucia¹•Institutions (4)

University of Salerno¹, College of William & Mary², University of Molise³, University of Sannio⁴

18 May 2013

TL;DR: A novel solution to adapt, configure and effectively use a topic modeling technique, namely Latent Dirichlet Allocation (LDA), to achieve better (acceptable) performance across various SE tasks is proposed.

Abstract: Information Retrieval (IR) methods, and in particular topic models, have recently been used to support essential software engineering (SE) tasks, by enabling software textual retrieval and analysis. In all these approaches, topic models have been used on software artifacts in a similar manner as they were used on natural language documents (e.g., using the same settings and parameters) because the underlying assumption was that source code and natural language documents are similar. However, applying topic models on software data using the same settings as for natural language text did not always produce the expected results. Recent research investigated this assumption and showed that source code is much more repetitive and predictable as compared to the natural language text. Our paper builds on this new fundamental finding and proposes a novel solution to adapt, configure and effectively use a topic modeling technique, namely Latent Dirichlet Allocation (LDA), to achieve better (acceptable) performance across various SE tasks. Our paper introduces a novel solution called LDA-GA, which uses Genetic Algorithms (GA) to determine a near-optimal configuration for LDA in the context of three different SE tasks: (1) traceability link recovery, (2) feature location, and (3) software artifact labeling. The results of our empirical studies demonstrate that LDA-GA is able to identify robust LDA configurations, which lead to a higher accuracy on all the datasets for these SE tasks as compared to previously published results, heuristics, and the results of a combinatorial search.

272 citations
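
The idea of using a genetic algorithm to search for an LDA configuration, as described in the abstract above, can be sketched generically. This is an illustrative sketch only and not the LDA-GA procedure itself: the search ranges, the operators (truncation selection, uniform crossover, resampling mutation, elitism), and especially `toy_fitness` are assumptions. A real run would replace `toy_fitness` with a function that trains LDA under the candidate configuration and scores the resulting model.

```python
import random

# Hypothetical search ranges; the ranges used in the paper are not given here.
BOUNDS = {"num_topics": (2, 50), "alpha": (0.01, 1.0), "beta": (0.01, 1.0)}

def random_config(rng):
    """Sample one candidate LDA configuration uniformly from BOUNDS."""
    return {
        "num_topics": rng.randint(*BOUNDS["num_topics"]),
        "alpha": rng.uniform(*BOUNDS["alpha"]),
        "beta": rng.uniform(*BOUNDS["beta"]),
    }

def toy_fitness(cfg):
    """Stand-in objective peaking at num_topics=20, alpha=0.1, beta=0.1.

    A real run would train LDA with cfg and score the resulting model instead.
    """
    return -(((cfg["num_topics"] - 20) / 48) ** 2
             + (cfg["alpha"] - 0.1) ** 2
             + (cfg["beta"] - 0.1) ** 2)

def crossover(a, b, rng):
    """Uniform crossover: each gene comes from one of the two parents."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in a}

def mutate(cfg, rng, rate=0.2):
    """Resample each gene from its range with probability `rate`."""
    fresh = random_config(rng)
    return {k: (fresh[k] if rng.random() < rate else v) for k, v in cfg.items()}

def genetic_search(fitness, rng, pop_size=20, generations=30, elite=2):
    """Evolve a population of configurations and return the fittest one."""
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = ranked[:elite]           # elitism keeps the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=fitness)

best = genetic_search(toy_fitness, random.Random(42))
```

Because the best individuals are copied unchanged into each new generation, the best fitness found never decreases from one generation to the next.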

Journal Article•DOI•

A Spectral Algorithm for Latent Dirichlet Allocation

Animashree Anandkumar¹, Yi-Kai Liu², Daniel Hsu³, Dean P. Foster⁴, Sham M. Kakade⁵•Institutions (5)

University of California, Irvine¹, National Institute of Standards and Technology², Columbia University³, Yahoo!⁴, Microsoft⁵

03 Dec 2012

TL;DR: This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA).

Abstract: Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. The increased representational power comes at the cost of a more challenging unsupervised learning problem for estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method is based on an efficiently computable orthogonal tensor decomposition of low-order moments.

271 citations

Journal Article•DOI•

Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec

Donghwa Kim¹, Deokseong Seo¹, Suhyoun Cho¹, Pilsung Kang¹•Institutions (1)

Korea University¹

01 Mar 2019-Information Sciences

TL;DR: This paper transforms a document using three document representation methods: term frequency–inverse document frequency (TF–IDF) based on the bag-of-words scheme, topic distribution based on latent Dirichlet allocation (LDA), and neural-network-based document embedding known as document to vector (Doc2Vec).

270 citations
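
The first of the three representations named in the summary above, TF–IDF over a bag of words, can be computed in a few lines. The sketch below assumes one common weighting (term frequency normalized by document length, and idf = log(N / df)); the paper may use a different variant, and the three sample documents are invented for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses term frequency normalized by document length and
    idf = log(N / df); other weighting variants are common.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Invented sample documents, already tokenized by whitespace.
docs = [
    "topic models find latent topics".split(),
    "word vectors capture latent features".split(),
    "topic models use word counts".split(),
]
vectors = tf_idf(docs)
```

A term that occurs in fewer documents, such as "find", receives a higher weight than one shared across documents, such as "latent".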

Network Information

Performance

Metrics

Papers: 6,513

Citations: 245,225

No. of papers in the topic in previous years
Year	Papers
2023	323
2022	842
2021	418
2020	429
2019	473
2018	446
