Deep Binaries: Encoding Semantic-Rich Cues for Efficient Textual-Visual Cross Retrieval

doi:10.1109/ICCV.2017.441

Open Access, Proceedings Article


Yuming Shen, +3 more

pp. 4117-4126


TLDR

A novel integrated deep architecture, Textual-Visual Deep Binaries (TVDB), is developed to encode the detailed semantics of informative images and long descriptive sentences: region-based convolutional networks with long short-term memory units explore image regional details, while a text convolutional network models the semantic cues of sentences.

Abstract:

Cross-modal hashing is usually regarded as an effective technique for large-scale textual-visual cross retrieval, where data from different modalities are mapped into a shared Hamming space for matching. Most traditional textual-visual binary encoding methods consider only holistic image representations and fail to model descriptive sentences. This makes existing methods ill-suited to handling the rich semantics of informative cross-modal data in quality textual-visual search tasks. To address the problem of hashing cross-modal data with semantic-rich cues, this paper develops a novel integrated deep architecture, named Textual-Visual Deep Binaries (TVDB), to effectively encode the detailed semantics of informative images and long descriptive sentences. In particular, region-based convolutional networks with long short-term memory units are introduced to fully explore image regional details, while the semantic cues of sentences are modeled by a text convolutional network. Additionally, we propose a stochastic batch-wise training routine in which high-quality binary codes and deep encoding functions are efficiently optimized in an alternating manner. Experiments are conducted on three multimedia datasets, namely Microsoft COCO, IAPR TC-12, and INRIA Web Queries, where the proposed TVDB model significantly outperforms state-of-the-art binary coding methods in the task of cross-modal retrieval.
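To make the idea of matching in a shared Hamming space concrete, the following is a minimal sketch of the retrieval step only. It assumes that some text encoder and some image encoder have already produced real-valued embeddings of equal length; the toy vectors, the sign-threshold binarization, and the function names `binarize`, `hamming_distance`, and `rank_by_hamming` are illustrative assumptions, not TVDB's learned encoding functions or its training routine.

```python
from typing import List, Sequence


def binarize(embedding: Sequence[float]) -> int:
    """Pack the sign pattern of a real-valued embedding into an integer code.

    Assumption: a positive component maps to bit 1, anything else to bit 0.
    """
    code = 0
    for value in embedding:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, database_codes: Sequence[int]) -> List[int]:
    """Return database indices sorted from nearest to farthest in Hamming space."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )


# Toy embeddings (made up for illustration): one sentence query, three images.
text_embedding = [0.9, -0.2, 0.4, -0.7]
image_embeddings = [
    [-0.5, 0.3, -0.1, 0.8],
    [0.7, -0.6, 0.2, -0.1],
    [0.1, 0.5, 0.3, -0.9],
]

query_code = binarize(text_embedding)
image_codes = [binarize(e) for e in image_embeddings]
ranking = rank_by_hamming(query_code, image_codes)
print(ranking)  # → [1, 2, 0]
```

Because both modalities end up as short bit strings, comparing a query against a database entry reduces to an XOR and a bit count, which is what makes hashing attractive for large-scale retrieval.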



