Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as LDA.
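For orientation, the following is a minimal sketch of fitting an LDA model with scikit-learn; the toy corpus and parameter choices are illustrative assumptions, not taken from any paper listed below.

```python
# Minimal illustrative LDA fit with scikit-learn; corpus and parameters
# are toy assumptions, not drawn from any paper on this page.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "topic models learn latent themes from text",
    "dirichlet priors control topic sparsity",
    "support vector machines classify documents",
]

counts = CountVectorizer().fit_transform(docs)      # document-term matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)              # per-document topic mixtures
print(doc_topics.shape)                             # (3, 2)
```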


Papers
Journal ArticleDOI
Juan Cao, Xia Tian, Jintao Li, Yongdong Zhang, Sheng Tang
TL;DR: A method for adaptively selecting the best LDA model based on density is proposed, and experiments show that it can match the best LDA performance without manually tuning the number of topics.

497 citations
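A minimal sketch of the density idea summarized above, under the common reading that each candidate model is scored by the mean pairwise cosine similarity of its topic-word distributions (lower similarity suggests better-separated topics); the function name and scoring details are assumptions, not the paper's exact formulation.

```python
# Hedged sketch: score a fitted model by the mean pairwise cosine
# similarity of its topic-word rows; lower values suggest better-
# separated topics. Details are assumptions, not the paper's method.
import numpy as np

def avg_topic_similarity(topic_word):
    # topic_word: (n_topics, vocab_size), e.g. lda.components_ in scikit-learn
    t = topic_word / np.linalg.norm(topic_word, axis=1, keepdims=True)
    sim = t @ t.T                                       # cosine similarity matrix
    k = len(t)
    return (sim.sum() - np.trace(sim)) / (k * (k - 1))  # mean off-diagonal entry

# Fit models over a range of topic counts and pick the one minimizing this score.
```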

Book ChapterDOI
21 Jun 2010
TL;DR: A measure to identify the correct number of topics in mechanisms like Latent Dirichlet Allocation is proposed, with empirical evidence in its favor in terms of classification accuracy and the number of topics naturally present in the corpus.
Abstract: It is important to identify the “correct” number of topics in mechanisms like Latent Dirichlet Allocation (LDA), as it determines the quality of the features presented to classifiers like SVM. In this work we propose a measure to identify the correct number of topics and offer empirical evidence in its favor in terms of classification accuracy and the number of topics naturally present in the corpus. We show the merit of the measure by applying it to real-world as well as synthetic data sets (both text and images). In proposing this measure, we view LDA as a matrix factorization mechanism, wherein a given corpus C is split into two matrix factors M1 and M2, as given by C_{d×w} = M1_{d×t} × M2_{t×w}, where d is the number of documents in the corpus and w is the size of the vocabulary. The quality of the split depends on t, the right number of topics chosen. The measure is computed in terms of the symmetric KL-divergence of salient distributions derived from these matrix factors. We observe that the divergence values are higher for non-optimal numbers of topics, which shows up as a 'dip' at the right value of t.

494 citations
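A minimal sketch of the symmetric KL measure described in the abstract above, assuming M1 is the document-topic factor and M2 the topic-word factor; the normalization and sorting details follow the commonly cited formulation and should be treated as assumptions.

```python
# Hedged sketch of the symmetric KL measure: compare singular values of
# the topic-word factor with the document-length-weighted topic mass from
# the document-topic factor. Details are assumptions.
import numpy as np
from scipy.stats import entropy

def symmetric_kl_measure(m1, m2, doc_lengths):
    # m1: document-topic matrix (d x t); m2: topic-word matrix (t x w)
    cm1 = np.linalg.svd(m2, compute_uv=False)     # singular values of M2
    cm2 = doc_lengths @ m1                        # length-weighted topic mass
    cm1 = np.sort(cm1 / cm1.sum())[::-1]          # normalize to distributions
    cm2 = np.sort(cm2 / cm2.sum())[::-1]
    return entropy(cm1, cm2) + entropy(cm2, cm1)  # KL(p||q) + KL(q||p)

# Sweep t, evaluate the measure, and look for the 'dip' at the natural t.
```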

Proceedings ArticleDOI
01 Apr 2014
TL;DR: This work explores the two tasks of automatic evaluation of single topics and automatic evaluation of whole topic models, and provides recommendations on the best strategies for performing them, in addition to providing an open-source toolkit for topic and topic model evaluation.
Abstract: Topic models based on latent Dirichlet allocation and related methods are used in a range of user-focused tasks including document navigation and trend analysis, but evaluation of the intrinsic quality of the topic model and topics remains an open research area. In this work, we explore the two tasks of automatic evaluation of single topics and automatic evaluation of whole topic models, and provide recommendations on the best strategy for performing the two tasks, in addition to providing an open-source toolkit for topic and topic model evaluation.

493 citations
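For concreteness, below is an illustrative implementation of NPMI coherence for a single topic, one of the widely used measures in this line of work; it counts co-occurrence at the document level, a simplification relative to the sliding-window counting such toolkits typically use, and all names are assumptions.

```python
# Hedged sketch of NPMI coherence for one topic's top words, using
# document-level co-occurrence in a reference corpus (a simplification;
# toolkits typically use sliding-window counts over a large corpus).
import math
from itertools import combinations

def npmi_coherence(top_words, documents, eps=1e-12):
    n = len(documents)
    def p(*words):  # fraction of documents containing every given word
        return sum(all(w in d for w in words) for d in documents) / n
    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = p(w1), p(w2), p(w1, w2)
        if 0.0 < p12 < 1.0:                       # skip degenerate pairs
            pmi = math.log(p12 / (p1 * p2 + eps))
            scores.append(pmi / -math.log(p12))   # normalize into [-1, 1]
    return sum(scores) / max(len(scores), 1)

docs = [set("apple banana fruit".split()), set("banana fruit market".split())]
print(npmi_coherence(["apple", "banana", "fruit"], docs))
```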

Journal ArticleDOI
TL;DR: The Author-Recipient-Topic (ART) model for social network analysis, which learns topic distributions based on the direction-sensitive messages sent between entities, is presented; the results provide evidence not only that clearly relevant topics are discovered, but that the ART model better predicts people's roles and gives lower perplexity on previously unseen messages.
Abstract: Previous work in social network analysis (SNA) has modeled the existence of links from one entity to another, but not attributes such as the language content or topics on those links. We present the Author-Recipient-Topic (ART) model for social network analysis, which learns topic distributions based on the direction-sensitive messages sent between entities. The model builds on Latent Dirichlet Allocation (LDA) and the Author-Topic (AT) model, adding the key attribute that the distribution over topics is conditioned distinctly on both the sender and the recipient, steering the discovery of topics according to the relationships between people. We give results on both the Enron email corpus and a researcher's email archive, providing evidence not only that clearly relevant topics are discovered, but that the ART model better predicts people's roles and gives lower perplexity on previously unseen messages. We also present the Role-Author-Recipient-Topic (RART) model, an extension to ART that explicitly represents people's roles.

484 citations
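A hedged sketch of the ART generative story as described in the abstract: each word's topic is drawn from a distribution conditioned on the (author, recipient) pair. All names, shapes, and the uniform recipient choice are assumptions, and inference (Gibbs sampling in the paper) is omitted.

```python
# Hedged sketch of the ART generative story; names and shapes are
# assumptions, and only generation (not inference) is shown.
import numpy as np

rng = np.random.default_rng(0)

def generate_message(author, recipients, theta, phi, length):
    # theta[(a, r)]: topic distribution for an author-recipient pair
    # phi: (n_topics, vocab_size) array of per-topic word distributions
    word_ids = []
    for _ in range(length):
        r = recipients[rng.integers(len(recipients))]        # pick one recipient
        z = rng.choice(len(phi), p=theta[(author, r)])       # topic | (author, r)
        word_ids.append(rng.choice(phi.shape[1], p=phi[z]))  # word id | topic
    return word_ids
```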

Journal ArticleDOI
TL;DR: The key to the algorithm detailed in this article, which also keeps the random distribution functions, is the introduction of a latent variable which allows a finite number of objects to be sampled within each iteration of a Gibbs sampler.
Abstract: We provide a new approach to sampling from the well-known mixture of Dirichlet process model. Recent attention has focused on retaining the random distribution function in the model, but sampling algorithms have then suffered from the countably infinite representation of these distributions. The key to the algorithm detailed in this article, which also retains the random distribution functions, is the introduction of a latent variable that allows a finite, and known, number of objects to be sampled within each iteration of a Gibbs sampler.

482 citations
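The latent-variable trick can be illustrated with stick-breaking weights: drawing u_i uniformly below the weight of the current component leaves only the finite set of components whose weights exceed u_i as candidates. This is a sketch of the truncation idea under assumed names, not the paper's full Gibbs sampler.

```python
# Hedged illustration of the 'slice' trick: u_i ~ Uniform(0, w[z_i])
# makes {k : w[k] > u_i} a finite candidate set for resampling z_i.
# Truncation at 50 atoms is purely for illustration.
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(alpha, n_atoms):
    betas = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.cumprod(np.concatenate(([1.0], 1.0 - betas[:-1])))
    return betas * remaining          # w_k = beta_k * prod_{j<k}(1 - beta_j)

w = stick_breaking(alpha=1.0, n_atoms=50)
z_i = 3                               # current component of item i
u_i = rng.uniform(0.0, w[z_i])        # the latent slice variable
candidates = np.flatnonzero(w > u_i)  # finite, known set for resampling z_i
print(len(candidates), "components to consider when resampling z_i")
```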


Network Information
Related Topics (5)

Topic                          Papers    Citations   Related
Cluster analysis               146.5K    2.9M        86%
Support vector machine         73.6K     1.7M        86%
Deep learning                  79.8K     2.1M        85%
Feature extraction             111.8K    2.1M        84%
Convolutional neural network   74.7K     2M          83%
Performance Metrics
No. of papers in the topic in previous years

Year   Papers
2023   323
2022   842
2021   418
2020   429
2019   473
2018   446