scispace - formally typeset
Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as LDA.


Papers
Journal Article
06 Aug 2012
TL;DR: A new algorithm for classifying Arabic texts, based on LDA (Latent Dirichlet Allocation) and the Support Vector Machine (SVM), is presented; it achieves high effectiveness on the Arabic text classification task.
Abstract: In this paper, we present a new algorithm based on LDA (Latent Dirichlet Allocation) and the Support Vector Machine (SVM) for the classification of Arabic texts. Current research usually adopts the Vector Space Model to represent documents in text classification applications. In this approach, a document is coded as a vector of words or n-grams. These features cannot capture semantic or textual content, and they result in a huge feature space and semantic loss. The proposed model instead adopts “topics” sampled by an LDA model as text features, which effectively avoids these problems. We extracted the significant themes (topics) of all texts, each theme being described by a particular distribution of descriptors, and then represented each text as a vector over these topics. Experiments are conducted using an in-house corpus of Arabic texts. Precision, recall, and F-measure are used to quantify categorization effectiveness. The results show that the proposed LDA-SVM algorithm achieves high effectiveness for the Arabic text classification task (macro-averaged F1 88.1% and micro-averaged F1 91.4%).
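A minimal sketch of the "topics as text features" step, assuming a plain LDA model fitted by collapsed Gibbs sampling (the abstract says the topics are "sampled" but does not specify the inference procedure or the hyperparameters). Each document becomes a vector of K topic proportions, the low-dimensional representation that would then be passed to the SVM. The function name lda_topic_features and the default alpha, beta, and iteration count are assumptions chosen for illustration.

```python
import random


def lda_topic_features(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling and return per-document topic proportions.

    docs: list of token lists. alpha/beta are symmetric Dirichlet priors; the
    defaults are illustrative assumptions, not values reported in the paper.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    v_size, k = len(vocab), num_topics
    n_dk = [[0] * k for _ in docs]           # tokens in document d assigned to topic t
    n_kw = [[0] * v_size for _ in range(k)]  # times word w is assigned to topic t
    n_k = [0] * k                            # total tokens assigned to topic t
    z = []                                   # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            n_dk[d][t] += 1
            n_kw[t][vocab[w]] += 1
            n_k[t] += 1
    topics = list(range(k))
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, t = vocab[w], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][t] -= 1
                n_kw[t][wid] -= 1
                n_k[t] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [(n_dk[d][s] + alpha) * (n_kw[s][wid] + beta) / (n_k[s] + v_size * beta)
                           for s in topics]
                t = rng.choices(topics, weights=weights)[0]
                z[d][i] = t
                n_dk[d][t] += 1
                n_kw[t][wid] += 1
                n_k[t] += 1
    # Posterior-mean estimate of theta_d: (n_dt + alpha) / (N_d + K * alpha).
    return [[(n_dk[d][t] + alpha) / (len(doc) + k * alpha) for t in topics]
            for d, doc in enumerate(docs)]
```

In a full pipeline, the returned vectors would serve as the training matrix for an SVM classifier such as scikit-learn's sklearn.svm.SVC; that step is omitted here to keep the sketch dependency-free.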

50 citations
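The abstract above reports both macro- and micro-averaged F1. A small pure-Python sketch of how the two averages differ; the function names and the toy labels in the usage note are illustrative, not taken from the paper.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2PR / (P + R) = 2tp / (2tp + fp + fn); 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but wrongly
            fn[t] += 1  # the true class t was missed
    # Macro: average the per-class F1 scores, so every class counts equally.
    macro = sum(f1_from_counts(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes first, so frequent classes dominate.
    micro = f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro
```

For example, with y_true = ["sport"] * 4 + ["politics", "economy"] and a classifier that labels the politics text as sport but gets everything else right, macro-F1 is 17/27 (about 0.630) while micro-F1 is 5/6 (about 0.833). For single-label multi-class data, micro-F1 always equals accuracy, so it rewards getting large classes right, whereas macro-F1 is pulled down by small classes that are missed.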

Book Chapter
10 Apr 2014
TL;DR: Additive Regularization of Topic Models (ARTM) is a non-Bayesian approach that is free of redundant probabilistic assumptions and provides simple inference for many combined and multi-objective topic models.
Abstract: Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. In this tutorial we introduce a novel non-Bayesian approach called Additive Regularization of Topic Models (ARTM). ARTM is free of redundant probabilistic assumptions and provides simple inference for many combined and multi-objective topic models.
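A minimal sketch of the regularized EM iteration, assuming the standard ARTM formulation: maximize the log-likelihood sum_d sum_w n_dw ln sum_t phi_wt theta_td plus a weighted sum of regularizers sum_i tau_i R_i(Phi, Theta). The M-step then sets phi_wt proportional to max(n_wt + phi_wt * dR/dphi_wt, 0), and likewise for theta_td. Only the simplest regularizer is shown, R = beta * sum ln phi_wt + alpha * sum ln theta_td, for which the correction term reduces to adding beta or alpha to the expected counts (positive values smooth, negative values sparsify); with beta = alpha = 0 the procedure is plain PLSA. The function name artm_em and its parameters are hypothetical and are not the API of any ARTM library.

```python
import random


def _normalize(v):
    """Scale a non-negative vector to sum to 1 (an all-zero vector stays zero)."""
    s = sum(v)
    return [x / s for x in v] if s > 0 else [0.0] * len(v)


def _normalize_columns(phi, k):
    """Normalize phi so that, for every topic t, the sum over words of phi[w][t] is 1."""
    totals = [sum(row[t] for row in phi.values()) for t in range(k)]
    return {w: [row[t] / totals[t] if totals[t] > 0 else 0.0 for t in range(k)]
            for w, row in phi.items()}


def artm_em(counts, num_topics, beta=0.0, alpha=0.0, iters=50, seed=0):
    """Hypothetical ARTM-style regularized EM with smoothing/sparsing regularizers.

    counts: list of {word: n_dw} dicts, one per document.
    beta, alpha: coefficients for the Phi and Theta regularizers
    (> 0 smooths, < 0 sparsifies; both 0 gives plain PLSA).
    Returns (phi, theta) with phi[w][t] ~ p(w|t) and theta[d][t] ~ p(t|d).
    """
    rng = random.Random(seed)
    words = sorted({w for doc in counts for w in doc})
    k = num_topics
    # Random positive Phi (to break symmetry) and uniform Theta.
    phi = _normalize_columns({w: [rng.random() + 1e-3 for _ in range(k)] for w in words}, k)
    theta = [[1.0 / k] * k for _ in counts]
    for _ in range(iters):
        n_wt = {w: [0.0] * k for w in words}
        n_td = [[0.0] * k for _ in counts]
        # E-step: split each count n_dw across topics in proportion to phi_wt * theta_td.
        for d, doc in enumerate(counts):
            for w, n in doc.items():
                p = [phi[w][t] * theta[d][t] for t in range(k)]
                z = sum(p)
                if z == 0:
                    continue
                for t in range(k):
                    n_wt[w][t] += n * p[t] / z
                    n_td[d][t] += n * p[t] / z
        # M-step: for this R, phi * dR/dphi = beta and theta * dR/dtheta = alpha,
        # so add them to the expected counts, clip at zero, and renormalize.
        phi = _normalize_columns(
            {w: [max(n_wt[w][t] + beta, 0.0) for t in range(k)] for w in words}, k)
        theta = [_normalize([max(x + alpha, 0.0) for x in row]) for row in n_td]
    return phi, theta
```

Calling artm_em with a negative beta illustrates the sparsing effect: any expected count n_wt smaller than |beta| is clipped to an exact zero in Phi.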

50 citations

Journal Article
TL;DR: The results indicate that the proposed method provides an efficient and economic performance summary of a university and its competitors, and could help its leaders in recruitment and retention efforts.

50 citations

Proceedings Article
08 Jul 2012
TL;DR: A model that combines LDA and AT by representing authors and documents over two disjoint topic sets is defined, and it is shown that this model outperforms LDA, AT and support vector machines on datasets with many authors.
Abstract: Authorship attribution deals with identifying the authors of anonymous texts. Building on our earlier finding that the Latent Dirichlet Allocation (LDA) topic model can be used to improve authorship attribution accuracy, we show that employing a previously-suggested Author-Topic (AT) model outperforms LDA when applied to scenarios with many authors. In addition, we define a model that combines LDA and AT by representing authors and documents over two disjoint topic sets, and show that our model outperforms LDA, AT and support vector machines on datasets with many authors.
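A minimal sketch of the general idea of attributing a text through its topic distribution, assuming each candidate author is summarized by the average topic proportions of their known documents and an anonymous text is assigned to the nearest author under Hellinger distance. This is a simplified illustration, not the combined LDA/AT model described in the abstract; the function names and the choice of distance are assumptions.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length (0 to 1)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def author_profile(doc_thetas):
    """Average the topic proportions of an author's known documents."""
    k = len(doc_thetas[0])
    return [sum(theta[t] for theta in doc_thetas) / len(doc_thetas) for t in range(k)]


def attribute(anon_theta, profiles):
    """Return the author whose topic profile is closest to the anonymous text."""
    return min(profiles, key=lambda author: hellinger(anon_theta, profiles[author]))
```

With many candidate authors, a single shared topic space must separate all of them, which is one intuition for why the abstract's author-aware models help in that setting.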

50 citations

Journal Article
TL;DR: An approach based on latent Dirichlet allocation (LDA), a probabilistic topic modeling method, is proposed, together with a practical system that provides municipal administrators of B-city with satisfactory search results and longitudinal curves showing how related topics change over time.
Abstract: The explosion of online user-generated content (UGC) and the development of big data analysis provide a new opportunity and challenge to understand and respond to public opinions in the G2C e-government context. To better understand semantic searching of public comments on an online platform for citizens' opinions about urban affairs issues, this paper proposes an approach based on latent Dirichlet allocation (LDA), a probabilistic topic modeling method, and designs a practical system that provides its users (municipal administrators of B-city) with satisfactory search results and longitudinal curves showing how related topics change over time. The system was developed in response to actual demand from B-city's local government, and the results of a user evaluation experiment show that a system based on the LDA method could provide information that is more helpful to relevant staff members. Municipal administrators could better understand citizens' online comments based on the proposed semantic search approach and could improve their decision-making process by considering public opinions.
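A minimal sketch of the two outputs the abstract describes, assuming every comment already has LDA topic proportions and a period label (for example "YYYY-MM", which sorts chronologically) and that a query has been mapped into the same topic space. Ranking by cosine similarity and averaging topic shares per period are assumptions for illustration; the abstract does not state the exact scoring used.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two topic vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def semantic_search(query_theta, comments, top_n=3):
    """comments: list of (comment_id, period, theta). Return ids ranked by topic similarity."""
    ranked = sorted(comments, key=lambda c: cosine(query_theta, c[2]), reverse=True)
    return [cid for cid, _, _ in ranked[:top_n]]


def topic_curve(comments, topic):
    """Mean proportion of one topic per period: a simple longitudinal curve."""
    sums, counts = defaultdict(float), defaultdict(int)
    for _, period, theta in comments:
        sums[period] += theta[topic]
        counts[period] += 1
    return [(period, sums[period] / counts[period]) for period in sorted(sums)]
```

Because matching happens in topic space rather than on surface words, a comment can rank highly for a query even when the two share no terms, which is the motivation for topic-based semantic search.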

50 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
86% related
Support vector machine
73.6K papers, 1.7M citations
86% related
Deep learning
79.8K papers, 2.1M citations
85% related
Feature extraction
111.8K papers, 2.1M citations
84% related
Convolutional neural network
74.7K papers, 2M citations
83% related
Performance Metrics
Number of papers in this topic in previous years

Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446