Proceedings ArticleDOI

Adversarial Cross-Modal Retrieval

TLDR
Comprehensive experimental results show that the proposed ACMR method is superior in learning effective subspace representation and that it significantly outperforms the state-of-the-art cross-modal retrieval methods.
Abstract
Cross-modal retrieval aims to enable a flexible retrieval experience across different modalities (e.g., texts vs. images). The core of cross-modal retrieval research is to learn a common subspace where items of different modalities can be directly compared to each other. In this paper, we present a novel Adversarial Cross-Modal Retrieval (ACMR) method, which seeks an effective common subspace based on adversarial learning. Adversarial learning is implemented as an interplay between two processes. The first process, a feature projector, tries to generate a modality-invariant representation in the common subspace and to confuse the other process, a modality classifier, which tries to discriminate between different modalities based on the generated representation. We further impose triplet constraints on the feature projector in order to minimize the gap among the representations of all items from different modalities with the same semantic labels, while maximizing the distances among semantically different images and texts. Through the joint exploitation of the above, the underlying cross-modal semantic structure of multimedia data is better preserved when this data is projected into the common subspace. Comprehensive experimental results on four widely used benchmark datasets show that the proposed ACMR method is superior in learning effective subspace representations and that it significantly outperforms state-of-the-art cross-modal retrieval methods.
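The adversarial interplay described in the abstract can be sketched in a few lines. This is a minimal illustrative toy, not the paper's implementation: the projectors and the modality classifier are plain linear maps with made-up dimensions, and all names (`W_img`, `modality_prob`, `triplet_loss`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only; real ACMR uses deep networks)
D_IMG, D_TXT, D_COMMON = 8, 6, 4
W_img = rng.normal(size=(D_IMG, D_COMMON)) * 0.1   # image feature projector
W_txt = rng.normal(size=(D_TXT, D_COMMON)) * 0.1   # text feature projector
w_cls = rng.normal(size=D_COMMON) * 0.1            # modality classifier weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def modality_prob(v):
    """Classifier's probability that v came from the image modality."""
    return sigmoid(v @ w_cls)

def classifier_loss(v_img, v_txt):
    """Cross-entropy the modality classifier minimises; the projector
    maximises it to make the two modalities indistinguishable."""
    return -np.log(modality_prob(v_img)) - np.log(1.0 - modality_prob(v_txt))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Pull same-label cross-modal pairs together, push different labels apart."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# One image/text pair sharing a label, plus a semantically different text
img = rng.normal(size=D_IMG)
txt_same = rng.normal(size=D_TXT)
txt_diff = rng.normal(size=D_TXT)

v_img, v_same, v_diff = img @ W_img, txt_same @ W_txt, txt_diff @ W_txt
# The projector's objective combines the triplet constraint with the
# adversarial term (it wants the classifier's loss to be large)
projector_loss = triplet_loss(v_img, v_same, v_diff) - classifier_loss(v_img, v_same)
```

In the adversarial game, the classifier takes gradient steps to decrease `classifier_loss` while the projectors take steps to decrease `projector_loss`, which includes the classifier loss with its sign flipped.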



Citations
Proceedings ArticleDOI

Deep Supervised Cross-Modal Retrieval

TL;DR: Deep Supervised Cross-Modal Retrieval (DSCMR) aims to find a common representation space in which samples from different modalities can be compared directly, and minimises the discrimination loss in both the label space and the common representation space to supervise the model in learning discriminative features.
Proceedings ArticleDOI

Cross-Modality Person Re-Identification with Generative Adversarial Training.

TL;DR: This paper proposes a novel cross-modality generative adversarial network (termed cmGAN) that integrates both an identification loss and a cross-modality triplet loss, which minimise inter-class ambiguity while maximising cross-modality similarity among instances.
Journal ArticleDOI

Deep Multimodal Representation Learning: A Survey

TL;DR: Highlights the key issues of newly developed technologies, such as the encoder-decoder model, generative adversarial networks, and the attention mechanism, from a multimodal representation learning perspective, which, to the best of the authors' knowledge, have never been reviewed previously.
Proceedings ArticleDOI

Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval

TL;DR: Proposes a self-supervised adversarial hashing (SSAH) approach, which leverages two adversarial networks to maximize the semantic correlation and consistency of the representations between different modalities.
Journal ArticleDOI

Empowering Things With Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things

TL;DR: In this article, the authors present a comprehensive survey on AIoT to show how AI can empower the IoT to make it faster, smarter, greener, and safer, and highlight the challenges facing AIoT along with some potential research opportunities.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
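The update rule summarised in this TL;DR can be sketched as follows. This is an illustrative standalone function, not a library API; the hyperparameter defaults mirror the paper's commonly cited recommendations (lr = 0.001, β₁ = 0.9, β₂ = 0.999, ε = 1e-8).

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for parameters theta given gradient grad at step t >= 1."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment (uncentred variance) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for zero initialisation
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

On a consistently signed gradient the early steps are close to `lr` in magnitude, since the bias-corrected ratio `m_hat / sqrt(v_hat)` is near ±1; this bounded, scale-invariant step size is what makes Adam robust to gradient rescaling.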
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Journal ArticleDOI

Generative Adversarial Nets

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are trained simultaneously: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than from G.
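The two-player objective summarised in this TL;DR can be written down directly. The function names here are illustrative; `d_real` stands for the discriminator's outputs D(x) on real data and `d_fake` for D(G(z)) on generated samples.

```python
import numpy as np

def d_loss(d_real, d_fake):
    # The discriminator maximises E[log D(x)] + E[log(1 - D(G(z)))],
    # i.e. minimises the negative of that sum.
    return -(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def g_loss(d_fake):
    # The generator minimises E[log(1 - D(G(z)))]; the paper also notes
    # the non-saturating alternative of maximising log D(G(z)) instead.
    return np.mean(np.log(1.0 - d_fake))
```

At the game's equilibrium the discriminator outputs 1/2 everywhere, giving a discriminator loss of 2·log 2.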
Book ChapterDOI

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.