scispace - formally typeset
Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5351 publications have been published within this topic, receiving 212555 citations. The topic is also known as: LDA.
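As background for the papers below: LDA represents each document as a mixture of latent topics and each topic as a distribution over words. A minimal sketch with scikit-learn's `LatentDirichletAllocation` (the tiny corpus and the 2-topic setting are illustrative assumptions, not taken from any paper listed here):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors traded shares on the market",
]

# Bag-of-words counts: the input representation LDA expects.
counts = CountVectorizer().fit_transform(docs)

# Fit a 2-topic model; fit_transform returns a (n_docs, n_topics)
# matrix where each row is a distribution over topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(counts)

print(doc_topic.shape)  # (4, 2)
```

Each row of `doc_topic` sums to 1, so a document's dominant topic is simply its row argmax.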


Papers
Proceedings ArticleDOI
Bin Liu, Hui Xiong
01 Jan 2013
TL;DR: A Topic and Location-aware probabilistic matrix factorization (TL-PMF) method is proposed for POI recommendation, considering both the extent to which a user's interest matches a POI's topic distribution and the word-of-mouth opinions of the POIs.
Abstract: The widespread use of location-based social networks (LBSNs) has enabled opportunities for better location-based services through Point-of-Interest (POI) recommendation. Indeed, the problem of POI recommendation is to provide personalized recommendations of places of interest. Unlike traditional recommendation tasks, POI recommendation is personalized, location-aware, and context-dependent. In light of this difference, this paper proposes a topic- and location-aware POI recommender system by exploiting associated textual and context information. Specifically, we first exploit an aggregated latent Dirichlet allocation (LDA) model to learn the interest topics of users and to infer the interest POIs by mining textual information associated with POIs. Then, a Topic and Location-aware probabilistic matrix factorization (TL-PMF) method is proposed for POI recommendation. A unique perspective of TL-PMF is to consider both the extent to which a user's interest matches the POI in terms of topic distribution and the word-of-mouth opinions of the POIs. Finally, experiments on real-world LBSN data show that the proposed recommendation method outperforms state-of-the-art probabilistic latent factor models by a significant margin. We have also studied the impact of personalized interest topics and word-of-mouth opinions on POI recommendations.
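The paper's TL-PMF model itself is not reproduced here; as a hedged illustration of the latent-factor machinery it extends, the sketch below fits a plain probabilistic matrix factorization by SGD. The check-in matrix, rank, and hyperparameters are invented for the example, and the topic-match and word-of-mouth terms that distinguish TL-PMF are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_pois, k = 4, 5, 2

# Toy user-POI preference matrix; 0 marks a missing entry.
R = np.array([
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 0],
    [1, 1, 0, 5, 0],
    [0, 1, 5, 4, 0],
], dtype=float)

# Latent user (U) and POI (V) factors, small random init.
U = 0.1 * rng.standard_normal((n_users, k))
V = 0.1 * rng.standard_normal((n_pois, k))
lr, reg = 0.02, 0.05

# SGD over the observed entries only.
for _ in range(500):
    for i, j in zip(*R.nonzero()):
        err = R[i, j] - U[i] @ V[j]
        U[i] += lr * (err * V[j] - reg * U[i])
        V[j] += lr * (err * U[i] - reg * V[j])

# Reconstruction error on the observed entries.
rmse = np.sqrt(np.mean([(R[i, j] - U[i] @ V[j]) ** 2
                        for i, j in zip(*R.nonzero())]))
print(round(rmse, 3))
```

TL-PMF, per the abstract, additionally weights predictions by the user-POI topic match and POI reputation; the factorization core above is the part shared with standard PMF.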

194 citations

Proceedings ArticleDOI
30 Jun 2010
TL;DR: An empirical study to statistically analyze the equivalence of several traceability recovery methods based on Information Retrieval techniques shows that while JS, VSM, and LSI are almost equivalent, LDA is able to capture a dimension unique to the set of techniques which the authors considered.
Abstract: We present an empirical study to statistically analyze the equivalence of several traceability recovery methods based on Information Retrieval (IR) techniques. The analysis is based on Principal Component Analysis and on the analysis of the overlap of the set of candidate links provided by each method. The studied techniques are the Jensen-Shannon (JS) method, Vector Space Model (VSM), Latent Semantic Indexing (LSI), and Latent Dirichlet Allocation (LDA). The results show that while JS, VSM, and LSI are almost equivalent, LDA is able to capture a dimension unique to the set of techniques which we considered.
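The VSM-versus-LSI side of such a comparison can be sketched as ranking candidate trace links in the raw tf-idf space versus a low-rank SVD space, then measuring the overlap of the top-k candidate sets. The corpus, rank, and k below are illustrative assumptions, not the study's setup:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

requirements = ["user login with password", "export report as pdf"]
code_docs = [
    "authenticate user password login session",
    "render pdf export report generator",
    "database connection pool settings",
]

tfidf = TfidfVectorizer().fit(requirements + code_docs)
R, C = tfidf.transform(requirements), tfidf.transform(code_docs)

# VSM: cosine similarity in the raw tf-idf space.
vsm_scores = cosine_similarity(R, C)

# LSI: cosine similarity after projecting onto a rank-2 SVD space.
svd = TruncatedSVD(n_components=2, random_state=0)
svd.fit(tfidf.transform(requirements + code_docs))
lsi_scores = cosine_similarity(svd.transform(R), svd.transform(C))

# Overlap of top-k candidate links per requirement, k = 1 here.
k = 1
overlap = [set(vsm_scores[i].argsort()[-k:]) &
           set(lsi_scores[i].argsort()[-k:])
           for i in range(len(requirements))]
print(overlap)
```

The study's actual analysis (Principal Component Analysis over the four techniques' candidate-link sets) operates on the same kind of ranked outputs this sketch produces.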

192 citations

Proceedings ArticleDOI
16 Apr 2012
TL;DR: It is shown that for a dataset constructed from the Stackoverflow website, these topic models outperform other methods in retrieving a candidate set of best experts for a question and that the Segmented Topic Model gives consistently better performance compared to the Latent Dirichlet Allocation Model.
Abstract: Community Question Answering (CQA) websites provide a rapidly growing source of information in many areas. This rapid growth, while offering new opportunities, puts forward new challenges. In most CQA implementations there is little effort in directing new questions to the right group of experts. This means that experts are not provided with questions matching their expertise, and therefore new matching questions may be missed and not receive a proper answer. We focus on finding experts for a newly posted question. We investigate the suitability of two statistical topic models for solving this issue and compare these methods against more traditional Information Retrieval approaches. We show that for a dataset constructed from the Stackoverflow website, these topic models outperform other methods in retrieving a candidate set of best experts for a question. We also show that the Segmented Topic Model gives consistently better performance compared to the Latent Dirichlet Allocation Model.

192 citations

Journal ArticleDOI
TL;DR: Data placement, pipeline processing, word bundling, and priority-based scheduling are proposed to improve the scalability of LDA; experiments show they significantly reduce the unparallelizable communication bottleneck and achieve good load balancing.
Abstract: Previous methods of distributed Gibbs sampling for LDA run into either memory or communication bottlenecks. To improve scalability, we propose four strategies: data placement, pipeline processing, word bundling, and priority-based scheduling. Experiments show that our strategies significantly reduce the unparallelizable communication bottleneck and achieve good load balancing, and hence improve scalability of LDA.

190 citations

Proceedings ArticleDOI
19 Feb 2008
TL;DR: A human-assisted approach based on LDA for extracting domain topics from source code is proposed; preliminary results indicate that LDA identifies some of the domain topics and is a satisfactory starting point for further manual refinement.
Abstract: One of the difficulties in maintaining a large software system is the absence of documented business domain topics and of a correlation between these domain topics and source code. Without such a correlation, people without prior application knowledge find it hard to comprehend the functionality of the system. Latent Dirichlet Allocation (LDA), a statistical model, has emerged as a popular technique for discovering topics in large text corpora. But its applicability to extracting business domain topics from source code has not been explored so far. This paper investigates LDA in the context of comprehending large software systems and proposes a human-assisted approach based on LDA for extracting domain topics from source code. This method has been applied to a number of open-source and proprietary systems. Preliminary results indicate that LDA is able to identify some of the domain topics and is a satisfactory starting point for further manual refinement of topics.
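One preprocessing step that applying LDA to source code implies is splitting identifiers into natural-language words before topic modeling. A small sketch of such a splitter (an assumption for illustration, not the authors' implementation):

```python
import re

def split_identifier(name: str) -> list[str]:
    """Split camelCase and snake_case identifiers into lowercase words,
    e.g. for building the token stream fed to a topic model."""
    parts = re.split(r"_|(?<=[a-z])(?=[A-Z])", name)
    return [p.lower() for p in parts if p]

print(split_identifier("parseHttpRequest_header"))
# ['parse', 'http', 'request', 'header']
```

The resulting word lists per source file can then be treated as documents for a standard LDA fit.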

188 citations


Network Information
Related Topics (5)
- Cluster analysis: 146.5K papers, 2.9M citations, 86% related
- Support vector machine: 73.6K papers, 1.7M citations, 86% related
- Deep learning: 79.8K papers, 2.1M citations, 85% related
- Feature extraction: 111.8K papers, 2.1M citations, 84% related
- Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years

Year  Papers
2023  323
2022  842
2021  418
2020  429
2019  473
2018  446