
Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.


Papers
Proceedings ArticleDOI
25 Oct 2010
TL;DR: It is shown that accounting for cross-modal correlations and semantic abstraction both improve retrieval accuracy, and that the cross-modal model outperforms state-of-the-art image retrieval systems on a unimodal retrieval task.
Abstract: The problem of joint modeling the text and image components of multimedia documents is studied. The text component is represented as a sample from a hidden topic model, learned with latent Dirichlet allocation, and images are represented as bags of visual (SIFT) features. Two hypotheses are investigated: that 1) there is a benefit to explicitly modeling correlations between the two components, and 2) this modeling is more effective in feature spaces with higher levels of abstraction. Correlations between the two components are learned with canonical correlation analysis. Abstraction is achieved by representing text and images at a more general, semantic level. The two hypotheses are studied in the context of the task of cross-modal document retrieval. This includes retrieving the text that most closely matches a query image, or retrieving the images that most closely match a query text. It is shown that accounting for cross-modal correlations and semantic abstraction both improve retrieval accuracy. The cross-modal model is also shown to outperform state-of-the-art image retrieval systems on a unimodal retrieval task.

1,284 citations
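
The pipeline this abstract describes reduces to a short sketch: learn a shared subspace between text features (LDA topic proportions in the paper) and image features (bag-of-SIFT histograms) with CCA, then answer a query from one modality by ranking items of the other modality in that subspace. The random feature matrices and scikit-learn CCA below are illustrative stand-ins, not the authors' implementation.

```python
# Cross-modal retrieval via CCA, sketched with placeholder features.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_docs, n_topics, n_visual = 200, 20, 128
text_feats = rng.random((n_docs, n_topics))    # stand-in for LDA topic proportions
image_feats = rng.random((n_docs, n_visual))   # stand-in for bag-of-SIFT histograms

# Learn projections under which the two views are maximally correlated.
cca = CCA(n_components=10)
T, V = cca.fit_transform(text_feats, image_feats)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# Image-to-text retrieval: rank texts by similarity to a projected query image.
query = V[0]
ranking = np.argsort([cosine(query, t) for t in T])[::-1]
print(ranking[:5])
```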

Proceedings ArticleDOI
01 May 1998
TL;DR: It is proved that under certain conditions LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance.
Abstract: Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that under certain conditions LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We also propose the technique of random projection as a way of speeding up LSI. We complement our theorems with encouraging experimental results. We also argue that our results may be viewed in a more general framework as a theoretical basis for the use of spectral methods in a wider class of applications, such as collaborative filtering.

1,235 citations
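
A minimal sketch of the two ingredients in this abstract: LSI as a truncated spectral decomposition of the term-document matrix, and random projection as a preprocessing step to speed it up. The toy corpus, the scikit-learn components, and the chosen ranks are all illustrative assumptions.

```python
# LSI as truncated SVD of the term-document matrix, with an optional
# random-projection step beforehand (toy-sized throughout).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.random_projection import GaussianRandomProjection

docs = [
    "latent semantic indexing uses the svd",
    "svd reveals latent structure in the term document matrix",
    "random projection speeds up spectral methods",
    "collaborative filtering is another spectral application",
]
X = CountVectorizer().fit_transform(docs)   # documents x terms counts

# Plain LSI: keep the top-k singular directions.
doc_vecs = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# Random projection first, then LSI on the reduced matrix: the speed-up
# the abstract proposes, since SVD cost drops with the dimension.
X_rp = GaussianRandomProjection(n_components=3, random_state=0).fit_transform(X)
doc_vecs_fast = TruncatedSVD(n_components=2, random_state=0).fit_transform(X_rp)
```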

Proceedings ArticleDOI
28 Jul 2003
TL;DR: Three hierarchical probabilistic mixture models are described for annotated data with multiple types, culminating in correspondence latent Dirichlet allocation, a latent variable model that effectively captures both the joint distribution of the two types and the conditional distribution of the annotation given the primary type.
Abstract: We consider the problem of modeling annotated data---data with multiple types where the instance of one type (such as a caption) serves as a description of the other type (such as an image). We describe three hierarchical probabilistic mixture models which aim to describe such data, culminating in correspondence latent Dirichlet allocation, a latent variable model that is effective at modeling the joint distribution of both types and the conditional distribution of the annotation given the primary type. We conduct experiments on the Corel database of images and captions, assessing performance in terms of held-out likelihood, automatic annotation, and text-based image retrieval.

1,199 citations
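
Correspondence LDA is easiest to grasp from its generative story: each image draws topic proportions, each region draws a topic and a feature vector, and each caption word picks one region and emits a word from that region's topic, which is what ties the annotation to the primary type. The forward-sampling sketch below uses toy sizes, a symmetric Dirichlet, and a spherical Gaussian region model purely for illustration.

```python
# Forward sampling from a correspondence-LDA-style generative process.
import numpy as np

rng = np.random.default_rng(0)
K, V, D = 4, 50, 2                        # topics, caption vocabulary, feature dim
alpha = np.full(K, 0.5)                   # prior on per-image topic proportions
beta = rng.dirichlet(np.ones(V), size=K)  # per-topic caption-word distributions
mu = rng.normal(size=(K, D))              # per-topic region-feature means

def generate(n_regions=5, n_words=4):
    theta = rng.dirichlet(alpha)                   # image-level topic proportions
    z = rng.choice(K, size=n_regions, p=theta)     # topic for each image region
    regions = mu[z] + rng.normal(scale=0.1, size=(n_regions, D))
    words = []
    for _ in range(n_words):
        y = rng.integers(n_regions)                # each caption word picks a region...
        words.append(rng.choice(V, p=beta[z[y]]))  # ...and emits from that region's topic
    return regions, words

regions, words = generate()
```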

Proceedings ArticleDOI
06 Aug 2006
TL;DR: This paper proposes an LDA-based document model within the language modeling framework, evaluates it on several TREC collections, and shows that improvements over cluster-based retrieval models can be obtained with reasonable efficiency.
Abstract: Search algorithms incorporating some form of topic model have a long history in information retrieval. For example, cluster-based retrieval has been studied since the 60s and has recently produced good results in the language model framework. An approach to building topic models based on a formal generative model of documents, Latent Dirichlet Allocation (LDA), is heavily cited in the machine learning literature, but its feasibility and effectiveness in information retrieval is mostly unknown. In this paper, we study how to efficiently use LDA to improve ad-hoc retrieval. We propose an LDA-based document model within the language modeling framework, and evaluate it on several TREC collections. Gibbs sampling is employed to conduct approximate inference in LDA and the computational complexity is analyzed. We show that improvements over retrieval using cluster-based models can be obtained with reasonable efficiency.

1,148 citations
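
The document model this abstract proposes can be written in one line: interpolate a Dirichlet-smoothed document language model with the LDA marginal p(w|d) = sum_k p(w|z=k) p(z=k|d). The scorer below assumes LDA has already been fit (phi and theta_d are its outputs) and uses illustrative values for the smoothing parameter mu and mixing weight lam; it is a sketch of the approach, not the paper's exact estimator.

```python
import numpy as np

def query_log_likelihood(query_ids, doc_counts, doc_len, coll_probs,
                         phi, theta_d, mu=1000.0, lam=0.7):
    """Score a query against one document under the combined model.

    doc_counts : dict mapping word id -> count in the document
    coll_probs : collection unigram probabilities p(w|C), shape (V,)
    phi        : topic-word distributions p(w|z), shape (K, V)
    theta_d    : document-topic proportions p(z|d), shape (K,)
    """
    log_lik = 0.0
    for w in query_ids:
        # Dirichlet-smoothed document language model.
        p_lm = (doc_counts.get(w, 0) + mu * coll_probs[w]) / (doc_len + mu)
        # LDA document model: marginalize over topic assignments.
        p_lda = float(theta_d @ phi[:, w])
        log_lik += np.log(lam * p_lm + (1.0 - lam) * p_lda)
    return log_lik
```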

Proceedings ArticleDOI
25 Jun 2006
TL;DR: A hierarchical generative probabilistic model is explored that incorporates both n-gram statistics and latent topic variables, extending a unigram topic model to include properties of a hierarchical Dirichlet bigram language model.
Abstract: Some models of textual corpora employ text generation methods involving n-gram statistics, while others use latent topic variables inferred using the "bag-of-words" assumption, in which word order is ignored. Previously, these methods have not been combined. In this work, I explore a hierarchical generative probabilistic model that incorporates both n-gram statistics and latent topic variables by extending a unigram topic model to include properties of a hierarchical Dirichlet bigram language model. The model hyperparameters are inferred using a Gibbs EM algorithm. On two data sets, each of 150 documents, the new model exhibits better predictive accuracy than either a hierarchical Dirichlet bigram language model or a unigram topic model. Additionally, the inferred topics are less dominated by function words than are topics discovered using unigram statistics, potentially making them more meaningful.

1,133 citations
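
The core construction is easiest to see in the generative direction: every word is drawn conditioned on both its latent topic and the preceding word. The sketch below samples from such a bigram topic model with toy sizes; for brevity it omits the hierarchical Dirichlet prior that the paper uses to tie the per-context word distributions together.

```python
# Forward sampling from a bigram topic model: word ~ p(w | topic, previous word).
import numpy as np

rng = np.random.default_rng(1)
K, V = 3, 30
alpha = np.full(K, 0.5)
# One word distribution per (topic, previous word) pair.
sigma = rng.dirichlet(np.ones(V), size=(K, V))

def generate_doc(length=10, start_word=0):
    theta = rng.dirichlet(alpha)             # document topic proportions
    words, prev = [], start_word
    for _ in range(length):
        z = rng.choice(K, p=theta)           # topic for this position
        w = rng.choice(V, p=sigma[z, prev])  # word given topic and previous word
        words.append(w)
        prev = w
    return words

print(generate_doc())
```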


Network Information

Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance Metrics

No. of papers in the topic in previous years

Year  Papers
2023  323
2022  842
2021  418
2020  429
2019  473
2018  446