Journal ArticleDOI

Ternary Adversarial Networks With Self-Supervision for Zero-Shot Cross-Modal Retrieval

TLDR
A novel model called ternary adversarial networks with self-supervision (TANSS), inspired by zero-shot learning, is proposed to overcome the limitations of existing methods on the challenging task of zero-shot cross-modal retrieval.
Abstract
Given a query instance from one modality (e.g., an image), cross-modal retrieval aims to find semantically similar instances from another modality (e.g., text). To perform cross-modal retrieval, existing approaches typically learn a common semantic space from a labeled source set and directly produce common representations in the learned space for the instances in a target set. These methods commonly require that the instances of both sets share the same classes. Intuitively, they may not generalize well to a more practical scenario, zero-shot cross-modal retrieval, in which the instances of the target set contain unseen classes whose semantics are inconsistent with the seen classes in the source set. Inspired by zero-shot learning, this paper proposes a novel model called ternary adversarial networks with self-supervision (TANSS) to overcome this limitation of existing methods. The TANSS approach consists of three parallel subnetworks: 1) two semantic feature learning subnetworks that capture the intrinsic data structures of different modalities and preserve the modality relationships via semantic features in the common semantic space; 2) a self-supervised semantic subnetwork that leverages the word vectors of both seen and unseen labels as guidance to supervise semantic feature learning and to enhance knowledge transfer to unseen labels; and 3) an adversarial learning scheme that maximizes the consistency and correlation of the semantic features between different modalities. The three subnetworks are integrated into an end-to-end network architecture that enables efficient iterative parameter optimization. Comprehensive experiments on three cross-modal datasets show the effectiveness of TANSS compared with state-of-the-art methods for zero-shot cross-modal retrieval.
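The adversarial part of the architecture described above can be illustrated with a minimal numpy sketch. All names, dimensions, and weights below are illustrative assumptions, not the paper's actual implementation: two modality-specific encoders map image and text features into a shared semantic space, and a modality discriminator tries to tell which modality a semantic feature came from. The encoders are trained to fool this discriminator, pushing the two modalities toward indistinguishable semantic features.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    """One-layer encoder into the common semantic space (tanh activation)."""
    return np.tanh(x @ W)

def discriminator_prob(s, w):
    """Probability that semantic feature s came from the image modality."""
    return 1.0 / (1.0 + np.exp(-(s @ w)))

# Hypothetical dimensions: image feature 8-d, text feature 6-d, shared space 4-d.
d_img, d_txt, d_sem = 8, 6, 4
W_img = rng.normal(size=(d_img, d_sem))   # image encoder weights
W_txt = rng.normal(size=(d_txt, d_sem))   # text encoder weights
w_disc = rng.normal(size=d_sem)           # modality discriminator weights

x_img = rng.normal(size=d_img)            # toy image feature
x_txt = rng.normal(size=d_txt)            # toy text feature

s_img = encoder(x_img, W_img)             # semantic feature for the image
s_txt = encoder(x_txt, W_txt)             # semantic feature for the text

# Adversarial objective: the discriminator minimizes this cross-entropy
# (image labeled 1, text labeled 0), while the encoders are trained to
# maximize it, i.e., to confuse the discriminator.
p_img = discriminator_prob(s_img, w_disc)
p_txt = discriminator_prob(s_txt, w_disc)
adv_loss = -(np.log(p_img) + np.log(1.0 - p_txt))
```

In the full model this adversarial loss would be combined with the semantic-feature and self-supervision losses and optimized iteratively; the sketch only shows the modality-confusion game itself.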


Citations
Journal ArticleDOI

Deep Fuzzy Hashing Network for Efficient Image Retrieval

TL;DR: The proposed deep fuzzy hashing network (DFHN) combines fuzzy logic with deep neural networks to learn more effective binary codes, leveraging fuzzy rules to model the uncertainties underlying the data.
Journal ArticleDOI

Cross-Modal Attention With Semantic Consistence for Image–Text Matching

TL;DR: The proposed CASC is a joint framework that performs cross-modal attention for local alignment and multilabel prediction for global semantic consistence. It directly extracts semantic labels from an available sentence corpus without additional labor cost, providing a global similarity constraint for the aggregated region-word similarity obtained by local alignment.
Journal ArticleDOI

Exploiting Subspace Relation in Semantic Labels for Cross-Modal Hashing

TL;DR: A novel supervised cross-modal hashing method dubbed Subspace Relation Learning for Cross-Modal Hashing (SRLCH) is proposed, which exploits the relation information of labels in the semantic space to make similar data from different modalities closer in a low-dimensional Hamming subspace.
Proceedings ArticleDOI

Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking

TL;DR: This work proposes a novel Multi-modal Tensor Fusion Network (MTFN) to explicitly learn an accurate image-text similarity function with rank-based tensor fusion rather than seeking a common embedding space for each image-text instance.
References
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
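The depth argument in the TL;DR above rests on a property of stacked small convolutions that is easy to check numerically: two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution but use fewer parameters. The channel count below is an arbitrary assumption for the sketch.

```python
def params_3x3_stack(n_layers, C):
    """Weight count of n stacked 3x3 conv layers with C input/output channels."""
    return n_layers * (3 * 3 * C * C)

def receptive_field(n_layers, k=3):
    """Receptive field of n stacked kxk convs: each layer adds k - 1."""
    return 1 + n_layers * (k - 1)

C = 64                                 # assumed channel count
two_3x3 = params_3x3_stack(2, C)       # two stacked 3x3 layers
one_5x5 = 5 * 5 * C * C                # one 5x5 layer, same receptive field
rf = receptive_field(2)                # receptive field of the 3x3 stack
```

With C = 64, the 3x3 stack needs 73,728 weights against 102,400 for the single 5x5 layer, while covering the same 5x5 region, which is one reason pushing depth with small filters pays off.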
Journal ArticleDOI

Generative Adversarial Nets

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
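The two-player game in this TL;DR can be sketched by evaluating the GAN value function V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))] for toy, fixed models. D here is a logistic discriminator and G a linear map; all weights and data are arbitrary assumptions for illustration, since the real framework trains both players by gradient updates.

```python
import numpy as np

rng = np.random.default_rng(1)

def D(x, w):
    """Logistic discriminator: probability that x is real."""
    return 1.0 / (1.0 + np.exp(-x @ w))

def G(z, A):
    """Toy linear 'generator' mapping noise z to sample space."""
    return z @ A

w = rng.normal(size=3)             # discriminator weights (assumed)
A = rng.normal(size=(2, 3))        # generator weights (assumed)

x_real = rng.normal(size=(5, 3))   # samples standing in for the data distribution
z = rng.normal(size=(5, 2))        # noise inputs to the generator
x_fake = G(z, A)

# V(D, G): D performs gradient ascent on this value while G performs
# descent on it -- the minimax game described in the TL;DR.
V = np.mean(np.log(D(x_real, w))) + np.mean(np.log(1.0 - D(x_fake, w)))
```

Because both log terms are logs of probabilities, V is always negative; training D pushes it toward 0 on real data while training G drags it back down by making fakes indistinguishable.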
Journal Article

Visualizing Data using t-SNE

TL;DR: A new technique called t-SNE visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. It is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
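The core of the technique in this TL;DR is converting high-dimensional distances into neighbor probabilities. A minimal numpy sketch of those conditional affinities p(j|i) follows; the fixed Gaussian bandwidth is a simplifying assumption, whereas the real algorithm tunes it per point to match a target perplexity.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 5))        # 6 toy points in 5 dimensions
sigma = 1.0                        # fixed bandwidth (assumption; t-SNE adapts it)

# Pairwise squared Euclidean distances between all points.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

# Gaussian affinities: nearby points get high conditional probability.
P = np.exp(-sq_dists / (2.0 * sigma ** 2))
np.fill_diagonal(P, 0.0)           # p(i|i) = 0 by definition
P /= P.sum(axis=1, keepdims=True)  # normalize rows: each row is p(.|i)
```

t-SNE then lays points out in 2-D or 3-D so that a heavy-tailed (Student-t) affinity in the map matches these probabilities, which is what relieves the central crowding mentioned above.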
Posted Content

Efficient Estimation of Word Representations in Vector Space

TL;DR: Two novel model architectures are proposed for computing continuous vector representations of words from very large data sets; the quality of these representations is measured on a word-similarity task, and the results are compared with the previously best-performing techniques based on different types of neural networks.
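The word-similarity evaluation mentioned in this TL;DR reduces to comparing vectors: once words are continuous vectors, similarity is typically measured as cosine similarity. The tiny vectors below are made up for illustration and are not trained embeddings.

```python
import numpy as np

# Hypothetical 3-d "word vectors" (real embeddings have hundreds of dimensions).
vectors = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.75, 0.20]),
    "apple": np.array([0.10, 0.05, 0.90]),
}

def cosine(u, v):
    """Cosine similarity: 1 for parallel vectors, 0 for orthogonal ones."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_kq = cosine(vectors["king"], vectors["queen"])  # semantically close pair
sim_ka = cosine(vectors["king"], vectors["apple"])  # unrelated pair
```

A good embedding model scores related word pairs (king/queen) well above unrelated ones (king/apple), which is exactly what the word-similarity benchmarks measure.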
Journal ArticleDOI

A Survey on Transfer Learning

TL;DR: The relationship between transfer learning and related machine learning techniques, such as domain adaptation, multitask learning, sample selection bias, and covariate shift, is discussed.