Proceedings ArticleDOI

Adversarial Cross-Modal Retrieval

TLDR
Comprehensive experimental results show that the proposed ACMR method is superior in learning effective subspace representation and that it significantly outperforms the state-of-the-art cross-modal retrieval methods.
Abstract
Cross-modal retrieval aims to enable a flexible retrieval experience across different modalities (e.g., texts vs. images). The core of cross-modal retrieval research is to learn a common subspace where items of different modalities can be directly compared to each other. In this paper, we present a novel Adversarial Cross-Modal Retrieval (ACMR) method, which seeks an effective common subspace based on adversarial learning. Adversarial learning is implemented as an interplay between two processes. The first process, a feature projector, tries to generate a modality-invariant representation in the common subspace and to confuse the other process, a modality classifier, which tries to discriminate between different modalities based on the generated representation. We further impose triplet constraints on the feature projector in order to minimize the gap among the representations of all items from different modalities with the same semantic labels, while maximizing the distances among semantically different images and texts. Through the joint exploitation of the above, the underlying cross-modal semantic structure of multimedia data is better preserved when this data is projected into the common subspace. Comprehensive experimental results on four widely used benchmark datasets show that the proposed ACMR method is superior in learning effective subspace representations and that it significantly outperforms state-of-the-art cross-modal retrieval methods.
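The adversarial interplay described in the abstract can be sketched in a few lines. This is a minimal illustrative toy, not the paper's implementation: the projectors and the modality classifier are plain linear maps with made-up dimensions, and all names (`W_img`, `modality_prob`, `triplet_loss`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only; real ACMR uses deep networks)
D_IMG, D_TXT, D_COMMON = 8, 6, 4
W_img = rng.normal(size=(D_IMG, D_COMMON)) * 0.1   # image feature projector
W_txt = rng.normal(size=(D_TXT, D_COMMON)) * 0.1   # text feature projector
w_cls = rng.normal(size=D_COMMON) * 0.1            # modality classifier weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def modality_prob(v):
    """Classifier's probability that v came from the image modality."""
    return sigmoid(v @ w_cls)

def classifier_loss(v_img, v_txt):
    """Cross-entropy the modality classifier minimises; the projector
    maximises it to make the two modalities indistinguishable."""
    return -np.log(modality_prob(v_img)) - np.log(1.0 - modality_prob(v_txt))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Pull same-label cross-modal pairs together, push different labels apart."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# One image/text pair sharing a label, plus a semantically different text
img = rng.normal(size=D_IMG)
txt_same = rng.normal(size=D_TXT)
txt_diff = rng.normal(size=D_TXT)

v_img, v_same, v_diff = img @ W_img, txt_same @ W_txt, txt_diff @ W_txt
# The projector's objective combines the triplet constraint with the
# adversarial term (it wants the classifier's loss to be large)
projector_loss = triplet_loss(v_img, v_same, v_diff) - classifier_loss(v_img, v_same)
```

In the adversarial game, the classifier takes gradient steps to decrease `classifier_loss` while the projectors take steps to decrease `projector_loss`, which includes the classifier loss with its sign flipped.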



Citations
Proceedings ArticleDOI

Deep Supervised Cross-Modal Retrieval

TL;DR: Deep Supervised Cross-Modal Retrieval (DSCMR) aims to find a common representation space in which samples from different modalities can be compared directly, and minimises the discrimination loss in both the label space and the common representation space to supervise the model in learning discriminative features.
Proceedings ArticleDOI

Cross-Modality Person Re-Identification with Generative Adversarial Training.

TL;DR: This paper proposes a novel cross-modality generative adversarial network (termed cmGAN) that integrates both an identification loss and a cross-modality triplet loss, which minimise inter-class ambiguity while maximising cross-modality similarity among instances.
Journal ArticleDOI

Deep Multimodal Representation Learning: A Survey

TL;DR: Highlights the key issues of newly developed technologies, such as the encoder-decoder model, generative adversarial networks, and the attention mechanism, from a multimodal representation learning perspective, which, to the best of the authors' knowledge, have never been reviewed previously.
Proceedings ArticleDOI

Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval

TL;DR: Proposes a self-supervised adversarial hashing (SSAH) approach, which leverages two adversarial networks to maximize the semantic correlation and consistency of the representations between different modalities.
Journal ArticleDOI

Empowering Things With Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things

TL;DR: In this article, the authors present a comprehensive survey on AIoT to show how AI can empower the IoT to make it faster, smarter, greener, and safer, and highlight the challenges facing AIoT along with some potential research opportunities.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
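The update rule summarised in this TL;DR can be sketched as follows. This is an illustrative standalone function, not a library API; the hyperparameter defaults mirror the paper's commonly cited recommendations (lr = 0.001, β₁ = 0.9, β₂ = 0.999, ε = 1e-8).

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for parameters theta given gradient grad at step t >= 1."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment (uncentred variance) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for zero initialisation
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

On a consistently signed gradient the early steps are close to `lr` in magnitude, since the bias-corrected ratio `m_hat / sqrt(v_hat)` is near ±1; this bounded, scale-invariant step size is what makes Adam robust to gradient rescaling.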
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Journal ArticleDOI

Generative Adversarial Nets

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are trained simultaneously: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than from G.
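The two-player objective summarised in this TL;DR can be written down directly. The function names here are illustrative; `d_real` stands for the discriminator's outputs D(x) on real data and `d_fake` for D(G(z)) on generated samples.

```python
import numpy as np

def d_loss(d_real, d_fake):
    # The discriminator maximises E[log D(x)] + E[log(1 - D(G(z)))],
    # i.e. minimises the negative of that sum.
    return -(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def g_loss(d_fake):
    # The generator minimises E[log(1 - D(G(z)))]; the paper also notes
    # the non-saturating alternative of maximising log D(G(z)) instead.
    return np.mean(np.log(1.0 - d_fake))
```

At the game's equilibrium the discriminator outputs 1/2 everywhere, giving a discriminator loss of 2·log 2.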
Book ChapterDOI

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.