
Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as LDA.
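For readers new to the area, the sketch below shows the basic LDA workflow: fit the model on bag-of-words counts and read off per-document topic proportions. It is a minimal illustration using scikit-learn; the toy corpus and hyperparameters are placeholders, not drawn from any paper listed here.

```python
# Minimal sketch of fitting LDA with scikit-learn; corpus and
# hyperparameters are illustrative only.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "topic models learn latent themes from text",
    "neural networks learn representations from data",
    "dirichlet priors control topic sparsity",
]

# Bag-of-words counts: LDA models word counts, not tf-idf weights.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# n_components is the number of latent topics K.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)   # per-document topic proportions
print(doc_topics.shape)             # (n_docs, K)
```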


Papers
Journal ArticleDOI
TL;DR: Experimental results show that topic-enhanced word embeddings are highly effective for Twitter sentiment classification.

121 citations
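The TL;DR gives no model details, but one common way to "topic-enhance" embeddings is to concatenate a tweet's LDA topic proportions with its averaged word vectors before classification. The sketch below illustrates that pattern; the random embedding matrix is a stand-in for pretrained vectors, and the paper's actual construction may differ.

```python
# Hedged sketch: combine LDA topic features with averaged word
# embeddings for sentiment classification. The embedding matrix is a
# random placeholder for pretrained vectors (e.g. word2vec/GloVe).
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

tweets = ["great movie loved it", "terrible plot awful acting",
          "loved the acting", "awful movie"]
labels = [1, 0, 1, 0]

vec = CountVectorizer()
X = vec.fit_transform(tweets)

# Per-tweet topic proportions from LDA.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_feats = lda.fit_transform(X)

# Placeholder embeddings; one mean word vector per tweet.
rng = np.random.default_rng(0)
emb = rng.normal(size=(len(vec.vocabulary_), 8))
counts = np.maximum(np.asarray(X.sum(axis=1)), 1)
avg_emb = np.asarray(X @ emb) / counts

# Topic-enhanced representation: concatenate both feature views.
features = np.hstack([avg_emb, topic_feats])
clf = LogisticRegression().fit(features, labels)
print(clf.score(features, labels))
```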

Posted Content
TL;DR: The authors proposed TopicRNN, a recurrent neural network (RNN)-based language model designed to directly capture the global semantic meaning relating words in a document via latent topics.
Abstract: In this paper, we propose TopicRNN, a recurrent neural network (RNN)-based language model designed to directly capture the global semantic meaning relating words in a document via latent topics. Because of their sequential nature, RNNs are good at capturing the local structure of a word sequence - both semantic and syntactic - but might face difficulty remembering long-range dependencies. Intuitively, these long-range dependencies are of semantic nature. In contrast, latent topic models are able to capture the global underlying semantic structure of a document but do not account for word ordering. The proposed TopicRNN model integrates the merits of RNNs and latent topic models: it captures local (syntactic) dependencies using an RNN and global (semantic) dependencies using latent topics. Unlike previous work on contextual RNN language modeling, our model is learned end-to-end. Empirical results on word prediction show that TopicRNN outperforms existing contextual RNN baselines. In addition, TopicRNN can be used as an unsupervised feature extractor for documents. We do this for sentiment analysis on the IMDB movie review dataset and report an error rate of $6.28\%$. This is comparable to the state-of-the-art $5.91\%$ resulting from a semi-supervised approach. Finally, TopicRNN also yields sensible topics, making it a useful alternative to document models such as latent Dirichlet allocation.

120 citations
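Since the abstract spells out the split between local RNN dynamics and a global topic vector, a toy sketch may help. The PyTorch module below is an illustrative simplification: the dimensions, class names, and the hard stop-word gate are assumptions, not the authors' code, and the variational inference that learns the topic vector is omitted.

```python
# Simplified sketch of the TopicRNN idea: RNN logits capture local
# syntax; a document topic vector adds a global semantic bias that is
# switched off for stop words (the paper's indicator variable l_t).
import torch
import torch.nn as nn

class TopicRNNSketch(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128, n_topics=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)      # local (syntactic) logits
        self.topic_out = nn.Linear(n_topics, vocab_size)  # global (semantic) bias

    def forward(self, tokens, theta, stop_mask):
        # tokens: (batch, seq); theta: (batch, n_topics) topic vector;
        # stop_mask: (batch, seq), 1 for stop words, 0 otherwise.
        h, _ = self.rnn(self.embed(tokens))
        logits = self.out(h)
        # Stop words carry little semantic content, so the topic bias
        # is gated out for them.
        gate = (1 - stop_mask).unsqueeze(-1).float()
        return logits + gate * self.topic_out(theta).unsqueeze(1)

model = TopicRNNSketch(vocab_size=1000)
tokens = torch.randint(0, 1000, (2, 10))
theta = torch.softmax(torch.randn(2, 50), dim=-1)
stop_mask = torch.zeros(2, 10)
print(model(tokens, theta, stop_mask).shape)  # (2, 10, 1000)
```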

Proceedings Article
09 Jul 2012
TL;DR: This work presents TopicTiling, a text segmentation algorithm based on the well-known TextTiling algorithm that segments documents using the Latent Dirichlet Allocation topic model; it is computationally less expensive than other LDA-based segmentation methods.
Abstract: This work presents a Text Segmentation algorithm called TopicTiling. This algorithm is based on the well-known TextTiling algorithm, and segments documents using the Latent Dirichlet Allocation (LDA) topic model. We show that using the mode topic ID assigned during the inference method of LDA, used to annotate unseen documents, improves performance by stabilizing the obtained topics. We show significant improvements over state of the art segmentation algorithms on two standard datasets. As an additional benefit, TopicTiling performs the segmentation in linear time and thus is computationally less expensive than other LDA-based segmentation methods.

119 citations
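The abstract describes the core loop concretely enough to sketch: represent each sentence by its words' topic IDs (in the paper, the mode assignment over repeated LDA inference runs), score each sentence gap by the cosine similarity of the adjacent windows, and cut at local minima. The illustration below simplifies the window size and boundary rule relative to the paper.

```python
# Hedged sketch of the TopicTiling scoring step, assuming per-word
# topic IDs are already inferred by LDA.
import numpy as np

def topictiling(sent_topic_ids, n_topics, window=2):
    # Each sentence becomes a topic-count vector.
    vecs = np.zeros((len(sent_topic_ids), n_topics))
    for i, ids in enumerate(sent_topic_ids):
        for t in ids:
            vecs[i, t] += 1

    def cos(a, b):
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return a @ b / (na * nb) if na and nb else 0.0

    # Coherence at each gap: similarity of the windows before and after.
    sims = [cos(vecs[max(0, g - window):g].sum(0),
                vecs[g:g + window].sum(0))
            for g in range(1, len(vecs))]

    # Place boundaries at local minima of the similarity curve.
    return [g + 1 for g in range(1, len(sims) - 1)
            if sims[g] < sims[g - 1] and sims[g] < sims[g + 1]]

# Toy document: per-sentence topic IDs from a hypothetical 3-topic LDA.
doc = [[0, 0, 1], [0, 0], [0, 1], [2, 2], [2, 2, 2], [2, 1]]
print(topictiling(doc, n_topics=3))   # -> [3], the topic-shift boundary
```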

Journal ArticleDOI
TL;DR: A soft clustering method that uses a latent mixed-class membership approach to segment online customers based on their purchasing data across categories; it yields more promising results than hard clustering and greater within-segment clustering quality than a finite mixture model.

119 citations
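The TL;DR maps naturally onto LDA-style mixed membership: treat each customer's purchase counts across categories like a document's word counts and read off fractional segment memberships. The sketch below is a hypothetical illustration of that soft-assignment idea, not the paper's model.

```python
# Hedged sketch of soft (mixed-membership) customer segmentation via
# LDA: rows are customers, columns are purchase counts per category.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

purchases = np.array([
    [9, 1, 0, 0],   # mostly category 0
    [8, 2, 1, 0],
    [0, 1, 9, 2],   # mostly category 2
    [1, 0, 7, 3],
    [4, 1, 5, 1],   # genuinely mixed customer
])

lda = LatentDirichletAllocation(n_components=2, random_state=0)
membership = lda.fit_transform(purchases)   # rows sum to 1

# Unlike hard clustering, the mixed customer keeps weight in both
# segments instead of being forced into a single one.
print(membership.round(2))
```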

Posted Content
TL;DR: This article develops stochastic variational inference, a scalable algorithm for approximating posterior distributions in a large class of probabilistic models, demonstrated on latent Dirichlet allocation and the hierarchical Dirichlet process topic model.
Abstract: We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets.

119 citations
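As a rough illustration of the subsample-and-update recipe the abstract describes, scikit-learn's online LDA exposes the same knobs: mini-batch updates with a step size that decays as (tau + t)^(-kappa). This is a related implementation of online variational Bayes, not the paper's own code.

```python
# Hedged sketch of stochastic/online variational inference for LDA:
# stream mini-batches and take noisy steps on the global topic
# parameters with a decaying learning rate.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["stochastic inference scales topic models",
        "variational methods approximate posteriors",
        "mini batches give noisy gradient estimates",
        "decaying step sizes ensure convergence"] * 50

vec = CountVectorizer()
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(
    n_components=5,
    learning_method="online",   # stochastic updates rather than full-batch
    learning_offset=10.0,       # tau: down-weights early, noisy batches
    learning_decay=0.7,         # kappa: step size ~ (tau + t)^(-kappa)
    total_samples=X.shape[0],   # scales mini-batch sufficient statistics
    random_state=0,
)

# Each partial_fit call makes one noisy update to the global parameters.
batch = 32
for start in range(0, X.shape[0], batch):
    lda.partial_fit(X[start:start + batch])

print(lda.components_.shape)    # (n_topics, vocab_size) topic-word params
```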


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations (86% related)
Support vector machine: 73.6K papers, 1.7M citations (86% related)
Deep learning: 79.8K papers, 2.1M citations (85% related)
Feature extraction: 111.8K papers, 2.1M citations (84% related)
Convolutional neural network: 74.7K papers, 2M citations (83% related)
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446