Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5351 publications have been published within this topic, receiving 212555 citations. The topic is also known as LDA.

Papers published on a yearly basis (chart not reproduced; years shown range from 1965 to 2023)

Papers

Proceedings Article•DOI•

Measuring LDA Topic Stability from Clusters of Replicated Runs

Mika V. Mäntylä¹, Maëlick Claes¹, Umar Farooq¹•Institutions (1)

University of Oulu¹

24 Aug 2018-arXiv: Computation and Language

TL;DR: In this paper, the authors propose a method that relies on replicated LDA runs, clustering, and providing a stability metric for the topics, which makes LDA stability transparent and is also complementary rather than alternative to many prior works that focus on LDA parameter tuning.

Abstract: Background: Unstructured and textual data is increasing rapidly, and Latent Dirichlet Allocation (LDA) topic modeling is a popular data analysis method for it. Past work suggests that instability of LDA topics may lead to systematic errors. Aim: We propose a method that relies on replicated LDA runs, clustering, and providing a stability metric for the topics. Method: We generate k LDA topics and replicate this process n times, resulting in n*k topics. Then we use K-medoids to cluster the n*k topics into k clusters. The k clusters now represent the original LDA topics, and we present them like normal LDA topics, showing the ten most probable words. For the clusters, we try multiple stability metrics, out of which we recommend Rank-Biased Overlap, showing the stability of the topics inside the clusters. Results: We provide an initial validation where our method is used for 270,000 Mozilla Firefox commit messages with k=20 and n=20. We show how our topic stability metrics are related to the contents of the topics. Conclusions: Advances in text mining enable us to analyze large masses of text in software engineering, but non-deterministic algorithms, such as LDA, may lead to unreplicable conclusions. Our approach makes LDA stability transparent and is also complementary rather than alternative to many prior works that focus on LDA parameter tuning.
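The abstract above recommends Rank-Biased Overlap (RBO) for comparing the topics inside a cluster. As a rough illustration only, the following sketch computes a truncated, non-extrapolated RBO between two ranked word lists. The function name `rbo_truncated`, the persistence value `p=0.9`, and the example word lists are assumptions made for this sketch; they are not taken from the paper, and the authors may use a different RBO variant.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9, depth=10):
    """Truncated Rank-Biased Overlap of two ranked lists (no extrapolation).

    Computes (1 - p) * sum_{k=1..depth} p**(k-1) * A_k, where A_k is the
    fraction of items shared by the top-k prefixes of both lists.
    Assumes each list contains no duplicate items.
    The default p=0.9 is an illustrative assumption, not a value from the paper.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    # Never look deeper than the shorter list.
    depth = min(depth, len(ranking_a), len(ranking_b))
    total = 0.0
    for k in range(1, depth + 1):
        shared = len(set(ranking_a[:k]) & set(ranking_b[:k]))
        agreement = shared / k          # A_k: overlap of the two top-k prefixes
        total += p ** (k - 1) * agreement
    return (1 - p) * total


# Hypothetical top words of two topics from different LDA runs.
topic_run_1 = ["bug", "fix", "test", "crash", "patch"]
topic_run_2 = ["fix", "bug", "test", "patch", "build"]
score = rbo_truncated(topic_run_1, topic_run_2, p=0.9, depth=5)
```

Because the sum is truncated at `depth`, two identical lists score `1 - p**depth` rather than exactly 1; differences near the top of the lists lower the score more than differences further down.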

29 citations

Journal Article•DOI•

Learning and Updating of Uncertainty in Dirichlet Models

Enrique Castillo¹, Ali S. Hadi², Cristina Solares¹•Institutions (2)

University of Cantabria¹, Cornell University²

01 Jan 1997-Machine Learning

TL;DR: This paper obtains the most general family of prior-posterior distributions that is conjugate to a Dirichlet likelihood, identifies those hyperparameters that are influenced by data values, and describes some methods to assess the prior hyperparameters.

Abstract: In this paper we analyze the problem of learning and updating of uncertainty in Dirichlet models, where updating refers to determining the conditional distribution of a single variable when some evidence is known. We first obtain the most general family of prior-posterior distributions which is conjugate to a Dirichlet likelihood and we identify those hyperparameters that are influenced by data values. Next, we describe some methods to assess the prior hyperparameters and we give a numerical method to estimate the Dirichlet parameters in a Bayesian context, based on the posterior mode. We also give formulas for updating uncertainty by determining the conditional probabilities of single variables when the values of other variables are known. A time series approach is presented for dealing with the cases in which samples are not identically distributed, that is, the Dirichlet parameters change from sample to sample. This typically occurs when the population is observed at different times. Finally, two examples are given that illustrate the learning and updating processes and the time series approach.

29 citations

Journal Article•DOI•

Bayesian analysis of multistate event history data: beta-Dirichlet process prior

Yongdai Kim¹, Lancelot F. James², Rafael Weissbach³•Institutions (3)

Seoul National University¹, Hong Kong University of Science and Technology², University of Rostock³

01 Mar 2012-Biometrika

TL;DR: In this paper, a new prior process, called a beta-Dirichlet process, is introduced for the cumulative intensity functions, is proved to be conjugate, and is applied to a Bayesian semiparametric regression model.

Abstract: Bayesian analysis of a finite state Markov process, which is popularly used to model multistate event history data, is considered. A new prior process, called a beta-Dirichlet process, is introduced for the cumulative intensity functions and is proved to be conjugate. In addition, the beta-Dirichlet prior is applied to a Bayesian semiparametric regression model. To illustrate the application of the proposed model, we analyse a dataset of credit histories. Copyright 2012, Oxford University Press.

29 citations

Journal Article•DOI•

On mining latent topics from healthcare chat logs

Tingting Wang¹, Zhengxing Huang¹, Chenxi Gan¹•Institutions (1)

Zhejiang University¹

01 Jun 2016-Journal of Biomedical Informatics

TL;DR: A new probabilistic model is presented that exploits healthcare chat logs to find hidden topics and changes in these topics over time and shows that the performance of the proposed model exceeds that of the benchmark models.

29 citations

Journal Article•DOI•

Bayesian semiparametric analysis of structural equation models with mixed continuous and unordered categorical variables.

Xinyuan Song¹, Ye-Mao Xia¹, Sik-Yum Lee¹•Institutions (1)

The Chinese University of Hong Kong¹

30 Jul 2009-Statistics in Medicine

TL;DR: This article considers a Bayesian semiparametric SEM with covariates, and mixed continuous and unordered categorical variables, in which the explanatory latent variables in the structural equation are modeled via an appropriate truncated Dirichlet process with a stick-breaking procedure.

Abstract: Recently, structural equation models (SEMs) have been applied for analyzing interrelationships among observed and latent variables in biological and medical research. Latent variables in these models are typically assumed to have a normal distribution. This article considers a Bayesian semiparametric SEM with covariates, and mixed continuous and unordered categorical variables, in which the explanatory latent variables in the structural equation are modeled via an appropriate truncated Dirichlet process with a stick-breaking procedure. Results obtained from a simulation study and an analysis of a real medical data set are presented to illustrate the methodology.
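The abstract above mentions a truncated Dirichlet process built with a stick-breaking procedure. The sketch below shows the commonly used construction of truncated stick-breaking weights, in which each break fraction is drawn from Beta(1, alpha) and the last fraction is set to 1 so that the weights sum to one. The function name `truncated_stick_breaking`, the concentration value, and the truncation level are assumptions for illustration; the abstract does not state the authors' exact truncation scheme or settings.

```python
import random


def truncated_stick_breaking(alpha, num_atoms, rng=None):
    """Draw mixture weights from a truncated stick-breaking construction.

    alpha     : concentration parameter (> 0), assumed for illustration.
    num_atoms : truncation level, i.e. number of weights returned (>= 1).
    rng       : optional random.Random instance for reproducibility.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if num_atoms < 1:
        raise ValueError("num_atoms must be at least 1")
    rng = rng or random.Random()
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for k in range(num_atoms):
        # The final fraction is fixed at 1 so the truncated weights sum to 1.
        fraction = 1.0 if k == num_atoms - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights


# Example: 5 weights with a hypothetical concentration of 2.0.
weights = truncated_stick_breaking(alpha=2.0, num_atoms=5, rng=random.Random(0))
```

Smaller values of `alpha` tend to put most of the mass on the first few weights, while larger values spread it over more atoms.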

29 citations

Performance metrics

Papers: 6,513
Citations: 245,225

No. of papers in the topic in previous years
Year	Papers
2023	323
2022	842
2021	418
2020	429
2019	473
2018	446
