Topic
Latent Dirichlet allocation
About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications on this topic have received 212,555 citations. The topic is also known as: LDA.
Papers published on a yearly basis
Papers
03 Dec 2007
TL;DR: This paper starts with the PLSA framework and uses an entropic prior in a maximum a posteriori formulation to enforce sparsity and shows that this allows the extraction of overcomplete sets of latent components which better characterize the data.
Abstract: An important problem in many fields is the analysis of counts data to extract meaningful latent components. Methods like Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) have been proposed for this purpose. However, they are limited in the number of components they can extract and lack an explicit provision to control the "expressiveness" of the extracted components. In this paper, we present a learning formulation to address these limitations by employing the notion of sparsity. We start with the PLSA framework and use an entropic prior in a maximum a posteriori formulation to enforce sparsity. We show that this allows the extraction of overcomplete sets of latent components which better characterize the data. We present experimental evidence of the utility of such representations.
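The entropic-prior idea in the abstract above can be illustrated on a single multinomial: adding a prior that favors low entropy, p(theta) proportional to exp(-alpha*H(theta)), yields a sparser MAP estimate than plain maximum likelihood. Below is a minimal numpy sketch using brute-force grid search over a three-outcome simplex; the counts, the value of alpha, and the grid-search approach are illustrative assumptions, not the paper's actual experiments or optimization method.

```python
import numpy as np

def entropy(theta):
    # Shannon entropy in nats; theta is assumed strictly positive here
    return float(-np.sum(theta * np.log(theta)))

def map_estimate(counts, alpha, step=0.005):
    # Maximize  sum_i counts_i*log(theta_i) + alpha*sum_i theta_i*log(theta_i)
    # over the 3-outcome probability simplex by brute-force grid search.
    # alpha > 0 implements an entropic prior p(theta) ~ exp(-alpha*H(theta)),
    # which biases the MAP estimate toward low-entropy (sparse) solutions.
    best, best_val = None, -np.inf
    for t1 in np.arange(step, 1.0, step):
        for t2 in np.arange(step, 1.0 - t1, step):
            t3 = 1.0 - t1 - t2
            if t3 < step:
                continue
            th = np.array([t1, t2, t3])
            val = counts @ np.log(th) + alpha * np.sum(th * np.log(th))
            if val > best_val:
                best_val, best = val, th
    return best

counts = np.array([8.0, 1.0, 1.0])           # toy observation counts
theta_ml = map_estimate(counts, alpha=0.0)   # maximum likelihood: ~[0.8, 0.1, 0.1]
theta_map = map_estimate(counts, alpha=3.0)  # entropic MAP: sparser, lower entropy
```

With the entropic prior switched on, the estimate concentrates more mass on the dominant outcome, which is the sparsity effect the paper exploits at the scale of full PLSA component matrices.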
84 citations
IBM
TL;DR: The proposed sentiment model outperforms the top system in the task of Sentiment Analysis in Twitter in SemEval-2013 in terms of averaged F scores.
Abstract: In this paper, we present multiple approaches to improve sentiment analysis on Twitter data. We first establish a state-of-the-art baseline with a rich feature set. Then we build a topic-based sentiment mixture model with topic-specific data in a semi-supervised training framework. The topic information is generated through topic modeling based on an efficient implementation of Latent Dirichlet Allocation (LDA). The proposed sentiment model outperforms the top system in the task of Sentiment Analysis in Twitter in SemEval-2013 in terms of averaged F scores.
83 citations
01 Dec 2004
TL;DR: A new method, parametric embedding (PE), that embeds objects with the class structure into a low-dimensional visualization space, providing insight into the classifier's behavior in supervised, semisupervised, and unsupervised settings is proposed.
Abstract: We propose a new method, parametric embedding (PE), that embeds objects with the class structure into a low-dimensional visualization space. PE takes as input a set of class conditional probabilities for given data points and tries to preserve the structure in an embedding space by minimizing a sum of Kullback-Leibler divergences, under the assumption that samples are generated by a Gaussian mixture with equal covariances in the embedding space. PE has many potential uses depending on the source of the input data, providing insight into the classifier's behavior in supervised, semisupervised, and unsupervised settings. The PE algorithm has a computational advantage over conventional embedding methods based on pairwise object relations since its complexity scales with the product of the number of objects and the number of classes. We demonstrate PE by visualizing supervised categorization of Web pages, semisupervised categorization of digits, and the relations of words and latent topics found by an unsupervised algorithm, latent Dirichlet allocation.
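The PE objective described above (match the given class posteriors with those induced by an equal-covariance Gaussian mixture in the embedding, minimizing a sum of KL divergences) can be sketched with plain gradient descent; the gradients take an SNE-like (p - q)(y - phi) form. Everything below, including the toy class posteriors, learning rate, and iteration count, is an illustrative assumption rather than the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input: class posteriors p(c|x_n) for 6 objects and 2 classes (toy data)
P = np.array([[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 3)
N, C = P.shape

Y = rng.normal(scale=0.1, size=(N, 2))    # object coordinates in the embedding
Phi = rng.normal(scale=0.1, size=(C, 2))  # class-center coordinates

def q_probs(Y, Phi):
    # q(c|x_n) induced by a unit-covariance Gaussian mixture in the embedding
    d = ((Y[:, None, :] - Phi[None, :, :]) ** 2).sum(-1) / 2.0
    w = np.exp(-d)
    return w / w.sum(axis=1, keepdims=True)

def objective(Y, Phi):
    # Sum over objects of KL(p(.|x_n) || q(.|x_n))
    return float(np.sum(P * np.log(P / q_probs(Y, Phi))))

lr = 0.05
kl_start = objective(Y, Phi)
for _ in range(1000):
    D = P - q_probs(Y, Phi)                  # (N, C) residuals
    diff = Y[:, None, :] - Phi[None, :, :]   # (N, C, 2) pairwise offsets
    Y -= lr * np.einsum('nc,ncd->nd', D, diff)    # dL/dy_n = sum_c (p-q)(y_n - phi_c)
    Phi -= lr * np.einsum('nc,ncd->cd', D, -diff)  # dL/dphi_c = sum_n (p-q)(phi_c - y_n)
kl_end = objective(Y, Phi)
```

Each iteration touches an N-by-C array, matching the abstract's claim that the cost scales with the product of the number of objects and the number of classes rather than with pairwise object relations.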
83 citations
TL;DR: A topic detection method based on paragraph vectors is proposed to accelerate citation screening in clinical and public health reviews, a task in which expert reviewers must otherwise manually screen thousands of citations to biomedical journal articles to identify all articles relevant to the review.
83 citations
22 Feb 2012
TL;DR: Three approaches to automating bug report categorization are investigated: an approach similar to previous ones based on an SVM classifier and Term Frequency-Inverse Document Frequency (svm-tf-idf), an approach using Latent Dirichlet Allocation (LDA) with SVM (svm-lda), and an approach using LDA and Kullback-Leibler divergence (lda-kl).
Abstract: Software developers, particularly in open-source projects, rely on bug repositories to organize their work. On a bug report, the component field is used to indicate to which team of developers a bug should be routed. Researchers have shown that incorrect categorization of newly received bug reports to components can cause potential delays in the resolution of bug reports. Approaches have been developed that consider the use of machine learning approaches, specifically Support Vector Machines (svm), to automatically categorize bug reports into the appropriate component to help streamline the process of solving a bug. One drawback of an SVM-based approach is that the results of categorization can be uneven across various components in the system if some components receive fewer reports than others. In this paper, we consider broadening the consistency of the recommendations produced by an automatic approach by investigating three approaches to automating bug report categorization: an approach similar to previous ones based on an SVM classifier and Term Frequency-Inverse Document Frequency (svm-tf-idf), an approach using Latent Dirichlet Allocation (LDA) with SVM (svm-lda), and an approach using LDA and Kullback-Leibler divergence (lda-kl). We found that lda-kl produced recalls similar to those found previously but with better consistency across all components for which bugs must be categorized.
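The lda-kl route described above can be sketched as follows, assuming the LDA topic distributions for each bug report have already been inferred: build one mean topic profile per component from labelled reports, then route a new report to the component whose profile is closest in KL divergence. The component names, topic vectors, and the use of centroid profiles are hypothetical illustrations, not details taken from the paper.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    # KL divergence between two discrete topic distributions;
    # eps guards against zeros in either distribution
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

# Hypothetical LDA topic distributions for labelled training bug reports,
# grouped by the component they were routed to.
train = {
    "ui":   np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]),
    "core": np.array([[0.1, 0.2, 0.7], [0.2, 0.1, 0.7]]),
}

# One topic profile per component: the mean of its reports' distributions.
centroids = {c: v.mean(axis=0) for c, v in train.items()}

def categorize(report_topics):
    # Route the report to the component whose profile is nearest in KL divergence.
    return min(centroids, key=lambda c: kl(report_topics, centroids[c]))

new_report = np.array([0.65, 0.25, 0.10])  # topic mix of an incoming bug report
component = categorize(new_report)         # -> "ui"
```

Because every report is compared against a fixed per-component profile rather than a learned decision boundary, components with few training reports are not systematically disadvantaged, which is one plausible reading of the consistency advantage the abstract reports for lda-kl.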
83 citations