
Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.
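For orientation, here is a minimal sketch of fitting LDA with scikit-learn; the toy corpus and hyperparameter choices are illustrative placeholders, not drawn from any paper listed on this page.

```python
# Minimal LDA sketch with scikit-learn; the corpus and the number of
# topics are arbitrary placeholders for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "topic models infer latent themes from text",
    "gibbs sampling and variational inference for lda",
    "convolutional networks classify images",
    "deep learning for image recognition",
]

# LDA operates on bag-of-words counts, not raw strings.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # per-document topic proportions
print(doc_topics.round(2))
```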


Papers
Proceedings ArticleDOI
23 Aug 2004
TL;DR: A new finite mixture model based on a generalization of the Dirichlet distribution is presented, and the performance of Gaussian and generalized Dirichlet mixtures is compared on several pattern-recognition data sets.
Abstract: This paper presents a new finite mixture model based on a generalization of the Dirichlet distribution. For the estimation of the parameters of this mixture we use a GEM (generalized expectation maximization) algorithm based on a Newton-Raphson step. The experimental results involve the comparison of the performance of Gaussian and generalized Dirichlet mixtures in the classification of several pattern-recognition data sets.

51 citations
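The abstract above describes the GEM pattern: a full E-step, followed by an M-step that merely improves, rather than maximizes, the expected log-likelihood via a single Newton-Raphson update. Below is a minimal sketch of that loop, using a mixture of exponential densities as a simple stand-in for the generalized Dirichlet mixture, whose gradients are considerably more involved.

```python
# Schematic GEM (generalized EM) loop: E-step computes responsibilities;
# the M-step takes one Newton-Raphson step instead of solving exactly.
# Exponential components stand in for the generalized Dirichlet mixture.
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.exponential(1.0, 300), rng.exponential(5.0, 300)])

K = 2
pi = np.full(K, 1.0 / K)       # mixing weights
lam = np.array([0.5, 2.0])     # exponential rate per component

for _ in range(50):
    # E-step: responsibilities r[n, k] ∝ pi_k * lam_k * exp(-lam_k * x_n)
    dens = lam * np.exp(-np.outer(x, lam))        # shape (N, K)
    r = pi * dens
    r /= r.sum(axis=1, keepdims=True)

    # Partial M-step: closed form for pi, one Newton step for lam.
    Nk = r.sum(axis=0)
    Sk = r.T @ x
    pi = Nk / len(x)
    grad = Nk / lam - Sk               # gradient of expected log-likelihood
    hess = -Nk / lam**2                # its second derivative (negative)
    lam = np.clip(lam - grad / hess, 1e-6, None)  # single Newton update

print("rates:", lam, "weights:", pi)
```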

Proceedings ArticleDOI
02 Oct 2016
TL;DR: This paper proposes a novel neural language model, Topic-based Skip-gram, to learn topic-based word embeddings for biomedical literature indexing with CNNs, and describes two multimodal CNN architectures that can employ different kinds of word embeddings at the same time for text classification.
Abstract: Recently, distributed word embeddings trained by neural language models are commonly used for text classification with Convolutional Neural Networks (CNNs). In this paper, we propose a novel neural language model, Topic-based Skip-gram, to learn topic-based word embeddings for biomedical literature indexing with CNNs. Topic-based Skip-gram leverages textual content with topic models, e.g., Latent Dirichlet Allocation (LDA), to capture precise topic-based word relationships and then integrates them into distributed word embedding learning. We then describe two multimodal CNN architectures, which are able to employ different kinds of word embeddings at the same time for text classification. Through extensive experiments conducted on several real-world datasets, we demonstrate that the combination of our Topic-based Skip-gram and multimodal CNN architectures outperforms state-of-the-art methods in biomedical literature indexing, clinical note annotation, and general textual benchmark dataset classification.

51 citations
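The paper's exact training objective is not reproduced here; as a rough sketch of folding LDA topic assignments into Skip-gram training, one can tag each token with its dominant topic before training a standard gensim Word2Vec model. All corpus contents below are invented placeholders.

```python
# Rough approximation of topic-based word embeddings: tag each token with
# its dominant LDA topic, then train an ordinary Skip-gram model over the
# tagged tokens. This is NOT the paper's objective, only a sketch of how
# LDA assignments can be folded into embedding training.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Word2Vec

docs = [
    ["protein", "binding", "gene", "expression"],
    ["gene", "expression", "rna", "sequencing"],
    ["neural", "network", "text", "classification"],
    ["text", "classification", "word", "embedding"],
]

dictionary = Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(bow, num_topics=2, id2word=dictionary, random_state=0)

def dominant_topic(word):
    # Most probable topic for a word, treated as a one-word document.
    topics = lda.get_document_topics(dictionary.doc2bow([word]))
    return max(topics, key=lambda t: t[1])[0]

tagged = [[f"{w}#t{dominant_topic(w)}" for w in d] for d in docs]
w2v = Word2Vec(tagged, vector_size=16, window=2, min_count=1, sg=1, seed=0)
print(w2v.wv.most_similar(tagged[0][0]))
```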

01 Jan 2014
TL;DR: A course recommendation system based on students' historical college grades is proposed, able to recommend courses available on sites such as Coursera, Udacity, and Edx.
Abstract: In this paper we propose a course recommendation system based on students' historical grades in college. Our model is able to recommend courses available on sites such as Coursera, Udacity, and Edx. To do so, probabilistic topic models are used as follows. On one hand, a Latent Dirichlet Allocation (LDA) topic model infers topics from the content of a college course syllabus. On the other hand, topics are also extracted from a massive open online course (MOOC) syllabus. These two sets of topics and the grading information are matched using a content-based recommendation system so as to recommend relevant online courses to students. Preliminary results show the suitability of our approach.

51 citations
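A minimal sketch of the matching step described above: fit one LDA model over all syllabi, infer per-document topic proportions, and rank MOOCs by cosine similarity to the college course. The syllabus texts and course names are invented placeholders, and the grade-matching component is omitted.

```python
# Hedged sketch of syllabus matching with LDA topic proportions.
# Texts and course names are placeholders, not from the paper.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

college_syllabus = "linear algebra matrices eigenvalues vector spaces"
moocs = {
    "mooc_linear_algebra": "matrices determinants eigenvalues linear maps",
    "mooc_web_dev": "html css javascript frontend frameworks",
}

texts = [college_syllabus] + list(moocs.values())
counts = CountVectorizer().fit_transform(texts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)        # per-document topic proportions

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Rank MOOCs by topic similarity to the college syllabus (row 0).
scores = {name: cosine(theta[0], theta[i + 1]) for i, name in enumerate(moocs)}
print(sorted(scores.items(), key=lambda kv: -kv[1]))  # best match first
```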

Posted Content
TL;DR: Weibull hybrid autoencoding inference (WHAI), as presented in this paper, infers posterior samples for deep latent Dirichlet allocation via a hybrid of stochastic-gradient MCMC and autoencoding variational Bayes.
Abstract: To train an inference network jointly with a deep generative topic model, making it both scalable to big corpora and fast in out-of-sample prediction, we develop Weibull hybrid autoencoding inference (WHAI) for deep latent Dirichlet allocation, which infers posterior samples via a hybrid of stochastic-gradient MCMC and autoencoding variational Bayes. The generative network of WHAI has a hierarchy of gamma distributions, while the inference network of WHAI is a Weibull upward-downward variational autoencoder, which integrates a deterministic-upward deep neural network and a stochastic-downward deep generative model based on a hierarchy of Weibull distributions. The Weibull distribution can closely approximate a gamma distribution, has an analytic Kullback-Leibler divergence from it, and admits a simple reparameterization via uniform noise, which together make it efficient to compute the gradients of the evidence lower bound with respect to the parameters of the inference network. The effectiveness and efficiency of WHAI are illustrated with experiments on big corpora.

51 citations
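The two Weibull facts the abstract relies on can be written down directly: a Weibull draw is a differentiable transform of uniform noise (inverse-CDF reparameterization), and KL(Weibull(k, λ) || Gamma(α, β)) has a closed form. A sketch with arbitrary parameter values:

```python
# (1) Weibull reparameterization: x = lam * (-log(1 - u))**(1/k), u ~ U(0,1),
#     which is differentiable in the parameters k and lam.
# (2) Closed-form KL(Weibull(k, lam) || Gamma(a, b)).
# Parameter values below are arbitrary illustrations.
import numpy as np
from scipy.special import gamma as Gamma, gammaln

EULER = np.euler_gamma  # Euler-Mascheroni constant

def weibull_rsample(k, lam, size):
    # Inverse-CDF sampling: deterministic transform of uniform noise.
    u = np.random.default_rng(0).uniform(size=size)
    return lam * (-np.log1p(-u)) ** (1.0 / k)

def kl_weibull_gamma(k, lam, a, b):
    # Analytic KL divergence between Weibull(k, lam) and Gamma(a, b).
    return (EULER * a / k - a * np.log(lam) + np.log(k)
            + b * lam * Gamma(1.0 + 1.0 / k)
            - EULER - 1.0 - a * np.log(b) + gammaln(a))

samples = weibull_rsample(k=2.0, lam=1.5, size=100_000)
print("mean:", samples.mean())            # ≈ lam * Gamma(1 + 1/k) ≈ 1.33
print("KL:", kl_weibull_gamma(2.0, 1.5, a=3.0, b=2.0))
```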

Proceedings Article
11 Mar 2007
TL;DR: Experimental results show that with these techniques it is possible to apply DP mixture models to very large data sets and that search algorithms provide a practical alternative to expensive MCMC and variational techniques.
Abstract: Dirichlet process (DP) mixture models provide a flexible Bayesian framework for density estimation. Unfortunately, their flexibility comes at a cost: inference in DP mixture models is computationally expensive, even when conjugate distributions are used. In the common case when one seeks only a maximum a posteriori assignment of data points to clusters, we show that search algorithms provide a practical alternative to expensive MCMC and variational techniques. When a true posterior sample is desired, the solution found by search can serve as a good initializer for MCMC. Experimental results show that with these techniques it is possible to apply DP mixture models to very large data sets.

51 citations
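As a toy illustration of search-based MAP inference in a DP mixture (not the paper's algorithm), the sketch below makes a single greedy pass over 1-D data, assigning each point to whichever existing or new cluster maximizes the CRP prior times the conjugate posterior predictive.

```python
# Greedy one-pass MAP-style search for a 1-D conjugate DP Gaussian mixture.
# A toy stand-in for the paper's search algorithms, not their implementation.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-4, 0.5, 50), rng.normal(4, 0.5, 50)])
rng.shuffle(x)

ALPHA, SIGMA2, TAU2 = 1.0, 0.25, 25.0   # concentration, obs. var, prior var

clusters = []   # each cluster stored as [count, sum of its points]

def predictive_logpdf(xi, n, s):
    # Posterior predictive under a Normal likelihood with Normal(0, TAU2)
    # prior on the mean; n = 0 gives the new-cluster marginal.
    post_var = TAU2 * SIGMA2 / (n * TAU2 + SIGMA2) if n else TAU2
    post_mean = TAU2 * s / (n * TAU2 + SIGMA2) if n else 0.0
    return norm.logpdf(xi, post_mean, np.sqrt(post_var + SIGMA2))

for xi in x:
    # CRP prior (proportional to cluster size, or ALPHA for a new cluster)
    # times the predictive likelihood, all in log space.
    scores = [np.log(n) + predictive_logpdf(xi, n, s) for n, s in clusters]
    scores.append(np.log(ALPHA) + predictive_logpdf(xi, 0, 0.0))
    best = int(np.argmax(scores))
    if best == len(clusters):
        clusters.append([1, xi])
    else:
        clusters[best][0] += 1
        clusters[best][1] += xi

print([(n, s / n) for n, s in clusters])   # cluster sizes and means
```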


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446