A new approach to cross-modal multimedia retrieval

Accounting for cross-modal correlations and for semantic abstraction are both shown to improve retrieval accuracy, and the cross-modal model is shown to outperform state-of-the-art image retrieval systems on a unimodal retrieval task.

Abstract:

The problem of jointly modeling the text and image components of multimedia documents is studied. The text component is represented as a sample from a hidden topic model, learned with latent Dirichlet allocation, and images are represented as bags of visual (SIFT) features. Two hypotheses are investigated: that 1) there is a benefit to explicitly modeling correlations between the two components, and 2) this modeling is more effective in feature spaces with higher levels of abstraction. Correlations between the two components are learned with canonical correlation analysis. Abstraction is achieved by representing text and images at a more general, semantic level. The two hypotheses are studied in the context of cross-modal document retrieval. This includes retrieving the text that most closely matches a query image, or retrieving the images that most closely match a query text. It is shown that accounting for cross-modal correlations and semantic abstraction both improve retrieval accuracy. The cross-modal model is also shown to outperform state-of-the-art image retrieval systems on a unimodal retrieval task.
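The correlation-matching idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the data are synthetic Gaussian stand-ins for LDA topic features and bag-of-SIFT histograms, and the function names (`fit_cca`, `rank_by_cosine`), the regularizer `reg`, and the choice of cosine similarity for ranking are all assumptions made for illustration.

```python
import numpy as np

def fit_cca(X, Y, k, reg=1e-3):
    """Regularized linear CCA via whitening and SVD (illustrative sketch)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Covariances; `reg` is an assumed ridge term that keeps them invertible.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :k]
    Wy = Ky @ Vt.T[:, :k]
    return mx, my, Wx, Wy, s[:k]

def rank_by_cosine(query, candidates):
    """Return candidate indices sorted from most to least similar to the query."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return np.argsort(-(c @ q))

# Synthetic paired documents: both views are driven by a shared latent factor.
rng = np.random.default_rng(0)
n, latent_dim = 200, 3
z = rng.normal(size=(n, latent_dim))
# Stand-in for text features (hypothetical 10-dimensional topic representation).
text = z @ rng.normal(size=(latent_dim, 10)) + 0.05 * rng.normal(size=(n, 10))
# Stand-in for image features (hypothetical 20-dimensional visual-word histogram).
image = z @ rng.normal(size=(latent_dim, 20)) + 0.05 * rng.normal(size=(n, 20))

mt, mi, Wt, Wi, corrs = fit_cca(text, image, k=latent_dim)

# Project both modalities into the shared, maximally correlated subspace.
txt_proj = (text - mt) @ Wt
img_proj = (image - mi) @ Wi

# Cross-modal retrieval: for each query image, find the rank of its paired text.
ranks = np.array([
    int(np.where(rank_by_cosine(img_proj[i], txt_proj) == i)[0][0])
    for i in range(n)
])
print("canonical correlations:", np.round(corrs, 3))
print("median rank of the true text for an image query:", np.median(ranks))
```

Swapping the roles of `img_proj` and `txt_proj` in the ranking loop gives the text-to-image direction. The semantic-abstraction step described in the abstract is not shown here; this sketch covers only the correlation-matching part.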

Citations

Journal Article

Multimodal Machine Learning: A Survey and Taxonomy

Tadas Baltrusaitis, Chaitanya Ahuja, Louis-Philippe Morency (Microsoft, Carnegie Mellon University)

01 Feb 2019

IEEE Transactions on Pattern Analysis an...

TL;DR: This paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy to enable researchers to better understand the state of the field and identify directions for future research.

Journal Article

Framing image description as a ranking task: data, models and evaluation metrics

Micah Hodosh, Peter Young, Julia Hockenmaier (University of Illinois at Urbana–Champaign)

01 May 2013

Journal of Artificial Intelligence Resea...

TL;DR: This paper proposes to frame sentence-based image annotation as the task of ranking a given pool of captions, and emphasizes the importance of training on multiple captions per image and of capturing syntactic (word order-based) and semantic features of these captions.

Journal Article

Visual Domain Adaptation: A survey of recent advances

Vishal M. Patel, Raghuraman Gopalan, Ruonan Li, Rama Chellappa (University of Maryland, College Park; AT&T Labs; Harvard University)

02 Apr 2015

IEEE Signal Processing Magazine

TL;DR: A survey of domain adaptation methods for visual recognition discusses the merits and drawbacks of existing domain adaptation approaches and identifies promising avenues for research in this rapidly evolving field.

Proceedings Article

Generalized Multiview Analysis: A discriminative latent space

Abhishek Sharma, Abhishek Kumar, Hal Daumé, David W. Jacobs (University of Maryland, College Park)

TL;DR: GMA solves a joint, relaxed QCQP over different feature spaces to obtain a single (non)linear subspace and is a supervised extension of Canonical Correlation Analysis (CCA), which is useful for cross-view classification and retrieval.

Proceedings Article

Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics (Extended Abstract)

Micah Hodosh, Peter Young, Julia Hockenmaier (University of Illinois at Urbana–Champaign)

TL;DR: This work proposes to frame sentence-based image annotation as the task of ranking a given pool of captions, and introduces a new benchmark collection, consisting of 8,000 images that are each paired with five different captions which provide clear descriptions of the salient entities and events.

References

Journal Article

Distinctive Image Features from Scale-Invariant Keypoints

David G. Lowe (University of British Columbia)

01 Nov 2004

International Journal of Computer Vision

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

Book

Applied Logistic Regression

David W. Hosmer, Stanley Lemeshow

TL;DR: Hosmer and Lemeshow provide an accessible introduction to the logistic regression model while incorporating advances of the last decade, including a variety of software packages for the analysis of data sets.

Journal Article

Latent Dirichlet allocation

David M. Blei, Andrew Y. Ng, Michael I. Jordan (University of California, Berkeley; Stanford University)

01 Mar 2003

Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

Journal Article

Applied Logistic Regression.

A. J. Scott, David W. Hosmer, Stanley Lemeshow

01 Dec 1991

Biometrics

TL;DR: Applied Logistic Regression, Third Edition provides an easily accessible introduction to the logistic regression model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables.

Proceedings Article

Latent Dirichlet Allocation

David M. Blei, Andrew Y. Ng, Michael I. Jordan (University of California, Berkeley)

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).