Journal ArticleDOI
Collective Reconstructive Embeddings for Cross-Modal Hashing
TL;DR: This paper unifies the projections of text and image to the Hamming space into a common reconstructive embedding through a rigid mathematical reformulation, which not only reduces the optimization complexity significantly but also facilitates inter-modal similarity preservation among different modalities.
Abstract:
In this paper, we study the problem of cross-modal retrieval by hashing-based approximate nearest neighbor search techniques. Most existing cross-modal hashing works mainly address the issue of multi-modal integration complexity by using the same mapping and similarity calculation for data from different media types. Nonetheless, this may cause information loss during the mapping process because it overlooks the specifics of each individual modality. In this paper, we propose a simple yet effective cross-modal hashing approach, termed collective reconstructive embeddings (CRE), which can simultaneously address the heterogeneity and integration complexity of multi-modal data. To address the heterogeneity challenge, we propose to process heterogeneous types of data using different modality-specific models. Specifically, we model textual data with a cosine similarity-based reconstructive embedding to alleviate data sparsity to the greatest extent, while for image data we utilize the Euclidean distance to characterize the relationships of the projected hash codes. Meanwhile, we unify the projections of text and image to the Hamming space into a common reconstructive embedding through a rigid mathematical reformulation, which not only reduces the optimization complexity significantly but also facilitates inter-modal similarity preservation among different modalities. We further incorporate the code balance and uncorrelation criteria into the problem and devise an efficient iterative algorithm for optimization. Comprehensive experiments on four widely used multimodal benchmarks show that the proposed CRE achieves superior performance compared with the state of the art on several challenging cross-modal tasks.
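The pipeline the abstract describes — modality-specific embeddings (cosine-based for text, Euclidean for images) quantized into a shared Hamming space, a code-balance criterion, and Hamming-distance retrieval — can be sketched as follows. This is a minimal illustrative sketch, not the authors' CRE implementation: the random projections stand in for the learned ones, and all dimensions and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 100 paired samples, text (50-d) and image (512-d).
n, d_txt, d_img, k = 100, 50, 512, 16   # k = hash code length
X_txt = rng.random((n, d_txt))
X_img = rng.random((n, d_img))

# Modality-specific projections into a shared k-bit Hamming space
# (random projections standing in for the learned ones).
W_txt = rng.standard_normal((d_txt, k))
W_img = rng.standard_normal((d_img, k))

def cosine_embed(X, W):
    """Text side: L2-normalize rows so inner products behave like cosine
    similarities, mitigating the effects of data sparsity."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ W

def euclidean_embed(X, W):
    """Image side: plain linear projection; Euclidean distances between
    projections characterize relations among the projected codes."""
    return X @ W

B_txt = np.sign(cosine_embed(X_txt, W_txt))    # binary codes in {-1, +1}
B_img = np.sign(euclidean_embed(X_img, W_img))

# Code-balance criterion: each bit should split ~evenly (per-bit mean near 0).
balance = np.abs(B_txt.mean(axis=0))

# Cross-modal retrieval: rank images by Hamming distance to a text query.
q = B_txt[0]
ham = (k - B_img @ q) / 2    # Hamming distance from a {-1,+1} inner product
ranking = np.argsort(ham)
```

The `(k - inner_product) / 2` identity is why sign codes in `{-1, +1}` are convenient: Hamming distance reduces to a dot product, so retrieval is a single matrix-vector multiply.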
Citations
Journal ArticleDOI
Deep Multi-View Enhancement Hashing for Image Retrieval
TL;DR: Proposes a supervised multi-view hash model that enhances multi-view information through neural networks; an effective view-stability evaluation method actively explores the relationships among views, which guides the optimization direction of the entire network.
Journal ArticleDOI
Ternary Adversarial Networks With Self-Supervision for Zero-Shot Cross-Modal Retrieval
TL;DR: A novel model called ternary adversarial networks with self-supervision (TANSS), inspired by zero-shot learning, is proposed to overcome the limitations of existing methods on the challenging task of zero-shot cross-modal retrieval.
Posted Content
Deep Multi-View Enhancement Hashing for Image Retrieval
TL;DR: This paper proposes a supervised multi-view hash model which can enhance the multi-view information through neural networks, and significantly outperforms the state-of-the-art single-view and multi-view hashing methods.
Journal ArticleDOI
Scalable Deep Hashing for Large-Scale Social Image Retrieval
TL;DR: This paper proposes a unified scalable deep hash learning framework that explores the weak but free supervision of discriminative user tags commonly accompanying social images, together with a discrete hash optimization method based on the Augmented Lagrangian Multiplier that directly solves for the hash codes and avoids binary quantization information loss.
Proceedings ArticleDOI
Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking
TL;DR: This work proposes a novel Multi-modal Tensor Fusion Network (MTFN) to explicitly learn an accurate image-text similarity function with rank-based tensor fusion rather than seeking a common embedding space for each image- text instance.
References
Journal ArticleDOI
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope
Aude Oliva, Antonio Torralba, et al.
TL;DR: The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
Journal ArticleDOI
Content-based image retrieval at the end of the early years
TL;DR: The working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap are discussed, as well as aspects of system engineering: databases, system architecture, and evaluation.
Journal ArticleDOI
Canonical Correlation Analysis: An Overview with Application to Learning Methods
TL;DR: Presents a general method using kernel canonical correlation analysis to learn a semantic representation of web images and their associated text, and compares orthogonalization approaches against a standard cross-representation retrieval technique known as the generalized vector space model.
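As a companion to this overview's core idea, the kernel-free special case — linear CCA — can be sketched via whitening and an SVD of the cross-covariance. The paired views, noise level, and all names below are hypothetical; this is a numerical-illustration sketch, not the paper's kernel CCA method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical paired views (e.g. image and text features) sharing a latent signal.
n = 200
z = rng.standard_normal((n, 2))                                  # shared latent
X = z @ rng.standard_normal((2, 6)) + 0.1 * rng.standard_normal((n, 6))
Y = z @ rng.standard_normal((2, 4)) + 0.1 * rng.standard_normal((n, 4))

def cca(X, Y, n_components=2, reg=1e-6):
    """Linear CCA via whitening + SVD of the cross-covariance.
    `reg` is a small ridge term for numerical stability."""
    m = X.shape[0]
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Cxx = X.T @ X / m + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / m + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / m

    def inv_sqrt(C):
        # Inverse square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    K = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(K)
    Wx = inv_sqrt(Cxx) @ U[:, :n_components]     # projection for view X
    Wy = inv_sqrt(Cyy) @ Vt[:n_components].T     # projection for view Y
    return Wx, Wy, s[:n_components]              # s = canonical correlations

Wx, Wy, corrs = cca(X, Y)
```

Because both views are driven by the same two-dimensional latent signal with small noise, the leading canonical correlations come out close to one, which is exactly the cross-modal alignment property that makes CCA a common baseline for image-text retrieval.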
Proceedings ArticleDOI
NUS-WIDE: a real-world web image database from National University of Singapore
TL;DR: The benchmark results indicate that it is possible to learn effective models from sufficiently large image dataset to facilitate general image retrieval and four research issues on web image annotation and retrieval are identified.
Proceedings Article
Spectral Hashing
TL;DR: The problem of finding the best code for a given dataset is closely related to the problem of graph partitioning and can be shown to be NP-hard; relaxing it yields a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian.
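The relaxation this summary describes — thresholding eigenvectors of the graph Laplacian to obtain binary codes — can be sketched on toy data. The dataset, Gaussian-kernel bandwidth, and code length below are illustrative assumptions, not the paper's full algorithm (which also handles out-of-sample extension under a distributional assumption).

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy dataset: two well-separated clusters in 2-D.
X = np.vstack([rng.standard_normal((50, 2)) - 3,
               rng.standard_normal((50, 2)) + 3])
n, k = X.shape[0], 4                      # k-bit hash codes

# Gaussian affinity matrix and unnormalized graph Laplacian L = D - W.
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
W = np.exp(-d2 / (2 * d2.mean()))         # bandwidth: an illustrative choice
L = np.diag(W.sum(axis=1)) - W

# Eigenvectors with the smallest eigenvalues give the relaxed solution of
# the graph-partitioning problem; skip the trivial constant eigenvector.
w, V = np.linalg.eigh(L)                  # eigenvalues in ascending order
codes = (V[:, 1:k + 1] > 0).astype(int)   # threshold at zero -> k-bit codes
```

Thresholding the Fiedler vector (the second eigenvector) corresponds to the relaxed minimum cut, so on well-separated clusters the first bit tends to encode cluster membership.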