Topic

Feature hashing

About: Feature hashing is a research topic. Over the lifetime, 993 publications have been published within this topic receiving 51462 citations.

...read moreread less

Papers published on a yearly basis

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Deep Semantic Hashing with Generative Adversarial Networks

[...]

Zhaofan Qiu¹, Yingwei Pan¹, Ting Yao², Tao Mei²•Institutions (2)

University of Science and Technology of China¹, Microsoft²

07 Aug 2017

TL;DR: This paper studies the exploration of generating synthetic data through semi-supervised generative adversarial networks (GANs), which leverages largely unlabeled and limited labeled training data to produce highly compelling data with intrinsic invariance and global coherence, for better understanding statistical structures of natural data.

...read moreread less

Abstract: Hashing has been a widely-adopted technique for nearest neighbor search in large-scale image retrieval tasks. Recent research has shown that leveraging supervised information can lead to high quality hashing. However, the cost of annotating data is often an obstacle when applying supervised hashing to a new domain. Moreover, the results can suffer from the robustness problem as the data at training and test stage may come from different distributions. This paper studies the exploration of generating synthetic data through semi-supervised generative adversarial networks (GANs), which leverages largely unlabeled and limited labeled training data to produce highly compelling data with intrinsic invariance and global coherence, for better understanding statistical structures of natural data. We demonstrate that the above two limitations can be well mitigated by applying the synthetic data for hashing. Specifically, a novel deep semantic hashing with GANs (DSH-GANs) is presented, which mainly consists of four components: a deep convolution neural networks (CNN) for learning image representations, an adversary stream to distinguish synthetic images from real ones, a hash stream for encoding image representations to hash codes and a classification stream. The whole architecture is trained end-to-end by jointly optimizing three losses, i.e., adversarial loss to correct label of synthetic or real for each sample, triplet ranking loss to preserve the relative similarity ordering in the input real-synthetic triplets and classification loss to classify each sample accurately. Extensive experiments conducted on both CIFAR-10 and NUS-WIDE image benchmarks validate the capability of exploiting synthetic images for hashing. Our framework also achieves superior results when compared to state-of-the-art deep hash models.

...read moreread less

97 citations

Proceedings Article•DOI•

Principles of hash-based text retrieval

[...]

Benno Stein¹•Institutions (1)

Bauhaus University, Weimar¹

23 Jul 2007

TL;DR: The design principles behind hash-based search methods are revealed and it is shown how optimum hash functions for similarity search can be derived and the rationale of their effectiveness is explained.

...read moreread less

Abstract: Hash-based similarity search reduces a continuous similarity relation to the binary concept "similar or not similar": two feature vectors are considered as similar if they are mapped on the same hash key. From its runtime performance this principle is unequaled--while being unaffected by dimensionality concerns at the same time. Similarity hashing is applied with great success for near similarity search in large document collections, and it is considered as a key technology for near-duplicate detection and plagiarism analysis. This papers reveals the design principles behind hash-based search methods and presents them in a unified way. We introduce new stress statistics that are suited to analyze the performance of hash-based search methods, and we explain the rationale of their effectiveness. Based on these insights, we show how optimum hash functions for similarity search can be derived. We also present new results of a comparative study between different hash-based search methods.

...read moreread less

97 citations

Journal Article•DOI•

Instance-Aware Hashing for Multi-Label Image Retrieval

[...]

Hanjiang Lai¹, Pan Yan¹, Xiangbo Shu, Yunchao Wei², Shuicheng Yan³ - Show less +1 more•Institutions (3)

Sun Yat-sen University¹, Beijing Jiaotong University², National University of Singapore³

01 Jun 2016-IEEE Transactions on Image Processing

TL;DR: Zhang et al. as discussed by the authors proposed a deep architecture that learns instance-aware image representations for multi-label image data, which are organized in multiple groups, with each group containing the features for one category.

...read moreread less

Abstract: Similarity-preserving hashing is a commonly used method for nearest neighbor search in large-scale image retrieval. For image retrieval, deep-network-based hashing methods are appealing, since they can simultaneously learn effective image representations and compact hash codes. This paper focuses on deep-network-based hashing for multi-label images, each of which may contain objects of multiple categories. In most existing hashing methods, each image is represented by one piece of hash code, which is referred to as semantic hashing. This setting may be suboptimal for multi-label image retrieval. To solve this problem, we propose a deep architecture that learns instance-aware image representations for multi-label image data, which are organized in multiple groups, with each group containing the features for one category. The instance-aware representations not only bring advantages to semantic hashing but also can be used in category-aware hashing, in which an image is represented by multiple pieces of hash codes and each piece of code corresponds to a category. Extensive evaluations conducted on several benchmark data sets demonstrate that for both the semantic hashing and the category-aware hashing, the proposed method shows substantial improvement over the state-of-the-art supervised and unsupervised hashing methods.

...read moreread less

96 citations

Journal Article•DOI•

Robust and Flexible Discrete Hashing for Cross-Modal Similarity Search

[...]

Di Wang¹, Quan Wang¹, Xinbo Gao¹•Institutions (1)

Xidian University¹

01 Oct 2018-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: A novel hashing model is proposed to efficiently learn robust discrete binary codes, which is referred as Robust and Flexible Discrete Hashing (RFDH), which is directly learned based on discrete matrix decomposition so that the large quantization error caused by relaxation is avoided.

...read moreread less

Abstract: Multimodal hashing approaches have gained great success on large-scale cross-modal similarity search applications, due to their appealing computation and storage efficiency. However, it is still a challenge work to design binary codes to represent the original features with good performance in an unsupervised manner. We argue that there are some limitations that need to be further considered for unsupervised multimodal hashing: 1) most existing methods drop the discrete constraints to simplify the optimization, which will cause large quantization error; 2) many methods are sensitive to outliers and noises since they use $\ell _{2}$ -norm in their objective functions which can amplify the errors; and 3) the weight of each modality, which greatly influences the retrieval performance, is manually or empirically determined and may not fully fit the specific training set. The above limitations may significantly degrade the retrieval accuracy of unsupervised multimodal hashing methods. To address these problems, in this paper, a novel hashing model is proposed to efficiently learn robust discrete binary codes, which is referred as Robust and Flexible Discrete Hashing (RFDH). In the proposed RFDH model, binary codes are directly learned based on discrete matrix decomposition, so that the large quantization error caused by relaxation is avoided. Moreover, the $\ell _{2,1}$ -norm is used in the objective function to improve the robustness, such that the learned model is not sensitive to data outliers and noises. In addition, the weight of each modality is adaptively adjusted according to training data. Hence the important modality will get large weights during the hash learning procedure. Owing to above merits of RFDH, it can generate more effective hash codes. Besides, we introduce two kinds of hash function learning methods to project unseen instances into hash codes. Extensive experiments on several well-known large databases demonstrate superior performance of the proposed hash model over most state-of-the-art unsupervised multimodal hashing methods.

...read moreread less

94 citations

Posted Content•

Learning to Hash for Indexing Big Data - A Survey

[...]

Jun Wang¹, Wei Liu², Sanjiv Kumar³, Shih-Fu Chang⁴•Institutions (4)

East China Normal University¹, IBM², Google³, Columbia University⁴

17 Sep 2015-arXiv: Learning

TL;DR: A comprehensive survey of the learning-to-hash framework and representative techniques of various types, including unsupervised, semisupervised, and supervised, is provided and recent hashing approaches utilizing the deep learning models are summarized.

...read moreread less

Abstract: The explosive growth in big data has attracted much attention in designing efficient indexing and search methods recently. In many critical applications such as large-scale search and pattern matching, finding the nearest neighbors to a query is a fundamental research problem. However, the straightforward solution using exhaustive comparison is infeasible due to the prohibitive computational complexity and memory requirement. In response, Approximate Nearest Neighbor (ANN) search based on hashing techniques has become popular due to its promising performance in both efficiency and accuracy. Prior randomized hashing methods, e.g., Locality-Sensitive Hashing (LSH), explore data-independent hash functions with random projections or permutations. Although having elegant theoretic guarantees on the search quality in certain metric spaces, performance of randomized hashing has been shown insufficient in many real-world applications. As a remedy, new approaches incorporating data-driven learning methods in development of advanced hash functions have emerged. Such learning to hash methods exploit information such as data distributions or class labels when optimizing the hash codes or functions. Importantly, the learned hash codes are able to preserve the proximity of neighboring data in the original feature spaces in the hash code spaces. The goal of this paper is to provide readers with systematic understanding of insights, pros and cons of the emerging techniques. We provide a comprehensive survey of the learning to hash framework and representative techniques of various types, including unsupervised, semi-supervised, and supervised. In addition, we also summarize recent hashing approaches utilizing the deep learning models. Finally, we discuss the future direction and trends of research in this area.

...read moreread less

93 citations

Collapse

Network Information

Performance

Metrics

1,120

Papers

57,460

Citations

No. of papers in the topic in previous years
Year	Papers
2023	33
2022	89
2021	11
2020	16
2019	16
2018	38

Feature hashing

Papers published on a yearly basis

Papers

Trending Questions (2)

Network Information

Related Topics (5)

Performance

Metrics