scispace - formally typeset
Open AccessProceedings ArticleDOI

Deep Binaries: Encoding Semantic-Rich Cues for Efficient Textual-Visual Cross Retrieval

Reads0
Chats0
TLDR
A novel integrated deep architecture is developed to effectively encode the detailed semantics of informative images and long descriptive sentences, named as Textual-Visual Deep Binaries (TVDB), where region-based convolutional networks with long short-term memory units are introduced to fully explore image regional details while semantic cues of sentences are modeled by a text Convolutional network.
Abstract
Cross-modal hashing is usually regarded as an effective technique for large-scale textual-visual cross retrieval, where data from different modalities are mapped into a shared Hamming space for matching. Most of the traditional textual-visual binary encoding methods only consider holistic image representations and fail to model descriptive sentences. This renders existing methods inappropriate to handle the rich semantics of informative cross-modal data for quality textual-visual search tasks. To address the problem of hashing cross-modal data with semantic-rich cues, in this paper, a novel integrated deep architecture is developed to effectively encode the detailed semantics of informative images and long descriptive sentences, named as Textual-Visual Deep Binaries (TVDB). In particular, region-based convolutional networks with long short-term memory units are introduced to fully explore image regional details while semantic cues of sentences are modeled by a text convolutional network. Additionally, we propose a stochastic batch-wise training routine, where high-quality binary codes and deep encoding functions are efficiently optimized in an alternating manner. Experiments are conducted on three multimedia datasets, i.e. Microsoft COCO, IAPR TC-12, and INRIA Web Queries, where the proposed TVDB model significantly outperforms state-of-the-art binary coding methods in the task of cross-modal retrieval.

read more

Citations
More filters
Journal ArticleDOI

Encyclopedia of Statistics in Behavioral Science

Martin Guha
TL;DR: An apposite and eminently readable reference for all behavioral science research and development.
Journal ArticleDOI

Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval

TL;DR: This paper proposes a novel deep generative approach to cross-modal retrieval to learn hash functions in the absence of paired training samples through the cycle consistency loss, and employs adversarial training scheme to learn a couple of hash functions enabling translation between modalities while assuming the underlying semantic relationship.
Proceedings ArticleDOI

Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval

TL;DR: The unsupervised deep cross-modal hash coding is studied and the proposed Deep Joint-Semantics Reconstructing Hashing (DJSRH) is proposed, which has the following two main advantages: first, to learn binary codes that preserve the neighborhood structure of the original data, DJSRH constructs a novel joint-semantics affinity matrix which elaborately integrates the original neighborhood information from different modalities.
Journal ArticleDOI

Collective Reconstructive Embeddings for Cross-Modal Hashing

TL;DR: This paper unify the projections of text and image to the Hamming space into a common reconstructive embedding through rigid mathematical reformulation, which not only reduces the optimization complexity significantly but also facilitates the inter-modal similarity preservation among different modalities.
References
More filters
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Journal ArticleDOI

Long short-term memory

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Proceedings ArticleDOI

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Book ChapterDOI

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.
Related Papers (5)