Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation (LDA) is a generative probabilistic topic model and an active research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.
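The model's mechanics are easy to try out. Below is a minimal sketch of fitting LDA, using scikit-learn as one possible toolkit (this page does not prescribe any implementation); the toy corpus, topic count, and random seed are illustrative.

```python
# Minimal LDA fit with scikit-learn; corpus and hyperparameters are toy values.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "topic models find latent structure in text",
    "dirichlet priors govern topic and word distributions",
    "documents are mixtures of topics",
]

# LDA operates on raw term counts, not tf-idf weights.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # rows: documents, cols: topic proportions
```

The fit_transform output gives each document's topic mixture, the lower-dimensional representation that most of the papers listed below build on.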


Papers
Journal ArticleDOI
TL;DR: Contributes a method for validating Latent Dirichlet Allocation algorithms against human perceptions of similarity, especially applicable to contexts in which the algorithm is intended to support navigability between similar documents via dynamically generated hyperlinks.
Abstract: Several intelligent technologies designed to improve navigability in and digestibility of text corpora use topic modeling such as the state-of-the-art Latent Dirichlet Allocation (LDA). This model and variants on it provide lower-dimensional document representations used in visualizations and in computing similarity between documents. This article contributes a method for validating such algorithms against human perceptions of similarity, especially applicable to contexts in which the algorithm is intended to support navigability between similar documents via dynamically generated hyperlinks. Such validation enables researchers to ground their methods in context of intended use instead of relying on assumptions of fit. In addition to the methodology, this article presents the results of an evaluation using a corpus of short documents and the LDA algorithm. We also present some analysis of potential causes of differences between cases in which this model matches human perceptions of similarity more or less well.
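As a concrete illustration of the similarity scores such a validation would compare against human judgments, here is a hedged sketch that measures closeness between two LDA doc-topic mixtures with the Jensen-Shannon distance; the paper does not specify this metric or library, and the vectors are invented.

```python
# Comparing documents by their LDA topic mixtures, the kind of score a
# dynamic-hyperlink system might rank. Assumes mixtures are already inferred.
import numpy as np
from scipy.spatial.distance import jensenshannon

def topic_similarity(theta_a, theta_b):
    """Similarity in [0, 1] between two doc-topic distributions (base-2 JS)."""
    return 1.0 - jensenshannon(theta_a, theta_b, base=2)

theta_a = np.array([0.7, 0.2, 0.1])  # illustrative doc-topic mixtures
theta_b = np.array([0.6, 0.3, 0.1])
print(topic_similarity(theta_a, theta_b))
```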

30 citations

Journal ArticleDOI
Mujun Zang, Dunwei Wen, Ke Wang, Tong Liu, Weiwei Song
TL;DR: The results show that the approach classifies the scene classes with higher accuracy than the other topic models and pooling methods without using spatial information, and that the performance improvement is due to the proposed feature and algorithm rather than other factors such as additional low-level image features or stronger preprocessing.

30 citations

Journal ArticleDOI
TL;DR: The SLR study revealed that while ML algorithms have phenomenal capabilities for identifying software requirements on SO, they are still confronted with various open problems/issues that will eventually limit their practical application and performance.
Abstract: Context. The improvements made in the last couple of decades in requirements engineering (RE) processes and methods have witnessed a rapid rise in the effective use of diverse machine learning (ML) techniques to resolve several multifaceted RE issues. One such challenging issue is the effective identification and classification of software requirements on Stack Overflow (SO) for building quality systems. ML-based techniques applied to this issue have produced quite substantial results, much more effective than those produced by the usual natural language processing (NLP) techniques. Nonetheless, a complete, systematic, and detailed comprehension of these ML-based techniques is considerably scarce. Objective. To identify and classify the kinds of ML algorithms used for software requirements identification, primarily on SO. Method. This paper reports a systematic literature review (SLR) collecting empirical evidence published up to May 2020. Results. This SLR study found 2,484 published papers related to RE and SO. The data extraction process of the SLR showed that (1) Latent Dirichlet Allocation (LDA) topic modeling is among the most widely used ML algorithms in the selected studies and (2) precision and recall are among the most commonly used methods for measuring the performance of these ML algorithms. Conclusion. Our SLR study revealed that while ML algorithms have phenomenal capabilities for identifying software requirements on SO, they are still confronted with various open problems/issues that will eventually limit their practical application and performance. Our SLR study calls for close collaboration between the RE and ML communities to handle the open issues confronted in the development of real-world ML-based quality systems.
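For reference, the precision and recall that the SLR identifies as the dominant evaluation metrics are one-liners to compute. A minimal sketch, assuming a binary "is this SO post a software requirement?" labeling; the gold labels and predictions are invented for the example.

```python
# Precision/recall for a hypothetical requirement-vs-not classifier on SO posts.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1]  # gold labels: 1 = requirement, 0 = not
y_pred = [1, 0, 1, 0, 0, 1]  # classifier output (made up)
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
```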

30 citations

Proceedings ArticleDOI
25 Aug 2013
TL;DR: This work empirically evaluates several approaches to model expressive and robust audio codebooks for the task of MED while ensuring compactness, and applies text-based techniques like Latent Dirichlet Allocation to learn acoustic topics as a means of providing compact representation while maintaining performance.
Abstract: In this paper we present our audio based system for detecting “events” within consumer videos (e.g. YouTube) and report our experiments on the TRECVID Multimedia Event Detection (MED) task and development data. Codebook or bag-of-words models have been widely used in text, visual and audio domains and form the state-of-the-art in MED tasks. The overall effectiveness of these models on such datasets depends critically on the choice of low-level features, clustering approach, sampling method, codebook size, weighting schemes and choice of classifier. In this work we empirically evaluate several approaches to model expressive and robust audio codebooks for the task of MED while ensuring compactness. First, we introduce the Large Scale Pooling Features (LSPF) and Stacked Cepstral Features for encoding local temporal information in audio codebooks. Second, we discuss several design decisions for generating and representing expressive audio codebooks and show how they scale to large datasets. Third, we apply text-based techniques like Latent Dirichlet Allocation (LDA) to learn acoustic topics as a means of providing compact representation while maintaining performance. By aggregating these decisions into our model, we obtained 11% relative improvement over our baseline audio systems.
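A hedged sketch of the codebook idea described above, under stated assumptions: local descriptors (stand-ins for MFCC frames, synthesized randomly here) are quantized with k-means into "acoustic words", and LDA then learns acoustic topics over the per-clip word histograms. This illustrates the general bag-of-audio-words pipeline, not the paper's exact LSPF/stacked-cepstral system.

```python
# Bag-of-audio-words + LDA acoustic topics; all data here is synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 13))  # stand-in for MFCC frames
codebook = KMeans(n_clusters=32, n_init=10, random_state=0).fit(frames)

# One clip = one histogram over the 32 acoustic words.
clips = [rng.normal(size=(100, 13)) for _ in range(10)]
hists = np.array([
    np.bincount(codebook.predict(clip), minlength=32) for clip in clips
])

acoustic_topics = LatentDirichletAllocation(n_components=4, random_state=0)
clip_topics = acoustic_topics.fit_transform(hists)  # per-clip topic mixtures
```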

30 citations

Proceedings Article
09 Oct 2010
TL;DR: This study endeavors to understand the effect that character-level noise can have on unsupervised topic modeling in noisy optical character recognition (OCR) text output, and shows the effects both with document-level topic analysis (document clustering) and with word-level topic analysis (LDA) on both synthetic and real-world OCR data.
Abstract: Models of latent document semantics such as the mixture of multinomials model and Latent Dirichlet Allocation have received substantial attention for their ability to discover topical semantics in large collections of text. In an effort to apply such models to noisy optical character recognition (OCR) text output, we endeavor to understand the effect that character-level noise can have on unsupervised topic modeling. We show the effects both with document-level topic analysis (document clustering) and with word-level topic analysis (LDA) on both synthetic and real-world OCR data. As expected, experimental results show that performance declines as word error rates increase. Common techniques for alleviating these problems, such as filtering low-frequency words, are successful in enhancing model quality, but exhibit failure trends similar to models trained on unprocessed OCR output in the case of LDA. To our knowledge, this study is the first of its kind.
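The low-frequency filtering the paper evaluates as a remedy is simple to express: OCR character errors mostly yield rare garbage tokens, so a document-frequency floor strips much of the noise before topic modeling. A sketch with an illustrative min_df threshold (not the paper's setting) and fabricated OCR-style typos:

```python
# Dropping rare tokens (a common OCR-noise mitigation) before fitting LDA.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

ocr_docs = [
    "topic rnodels of latent docurnent semantics",  # simulated OCR errors
    "topic models of latent document semantics",
    "ocr noise corrupts document text",
]

vec = CountVectorizer(min_df=2)  # keep only tokens seen in >= 2 documents
counts = vec.fit_transform(ocr_docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
```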

30 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446