scispace - formally typeset
Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have been published on this topic, receiving 69,362 citations.


Papers
Journal Article
TL;DR: This paper proposes a spline regression hashing method, in which both the local and global data similarity structures are exploited, and outperforms the state-of-the-art techniques on generating hash codes.
Abstract: Techniques for fast image retrieval over large databases have attracted considerable attention due to the rapid growth of web images. One promising way to accelerate image search is to use hashing technologies, which represent images by compact binary codewords. In this way, the similarity between images can be efficiently measured in terms of the Hamming distance between their corresponding binary codes. Although many methods for generating hash codes have been proposed in recent years, two key points still need to be improved: 1) how to precisely preserve the similarity structure of the original data and 2) how to obtain the hash codes of previously unseen data. In this paper, we propose our spline regression hashing method, in which both the local and global data similarity structures are exploited. To better capture the local manifold structure, we introduce splines developed in Sobolev space to find the local data mapping function. Furthermore, our framework simultaneously learns the hash codes of the training data and the hash function for unseen data, which solves the out-of-sample problem. Extensive experiments conducted on real image datasets consisting of over one million images show that our proposed method outperforms the state-of-the-art techniques.

29 citations
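The abstract relies on the fact that, once images are mapped to compact binary codes, similarity reduces to Hamming distance. The following minimal sketch illustrates only that retrieval step, not spline regression hashing itself. It assumes codes are packed into Python integers; the function names and the 8-bit toy codes are hypothetical.

```python
# Minimal sketch (assumption: codes are packed into Python ints, one bit per hash bit).
# Illustrates Hamming-distance ranking only; it does not learn any hash function.


def hamming_distance(a: int, b: int) -> int:
    """Count the differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: dict) -> list:
    """Return (item_id, distance) pairs sorted from nearest to farthest (ties by id)."""
    scored = [(item, hamming_distance(query, code)) for item, code in database.items()]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical 8-bit codes for three database images.
    db = {"img_a": 0b10110010, "img_b": 0b10110011, "img_c": 0b01001101}
    print(rank_by_hamming(0b10110000, db))  # [('img_a', 1), ('img_b', 2), ('img_c', 7)]
```

XOR followed by a popcount is what makes binary codes cheap to compare. A production system would typically use fixed-width words and hardware popcount instead of string counting.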

Journal Article
TL;DR: This paper proposes a generic method to speed up the process of joining two large datasets using LSH by identifying a set of representative points to reduce the number of LSH lookups and demonstrates the generality of the method by showing that the same principle can be applied to LSH algorithms for three different metrics.
Abstract: Locality-sensitive hashing (LSH) is an efficient method for solving the problem of approximate similarity search in high-dimensional spaces. Through LSH, a high-dimensional similarity join can be processed in the same way as a hash join, making the cost of joining two large datasets linear. By judiciously analyzing the properties of multiple LSH algorithms, we propose a generic method to speed up the process of joining two large datasets using LSH. The crux of our method lies in the way in which we identify a set of representative points to reduce the number of LSH lookups. Theoretical analyses show that our proposed method can greatly reduce the number of lookup operations while retaining the same result accuracy as executing LSH lookups for every query point. Furthermore, we demonstrate the generality of our method by showing that the same principle can be applied to LSH algorithms for three different metrics: the Euclidean distance (QALSH), the Jaccard similarity measure (MinHash), and the Hamming distance (sequence hashing). Results from experimental studies using real datasets confirm our error analyses and show significant improvements of our method over the state-of-the-art LSH method: to achieve over 0.95 recall, we only need to perform LSH lookups for at most 15 percent of the query points.

29 citations
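The abstract notes that LSH lets a high-dimensional similarity join run like a hash join. The following minimal sketch covers the Jaccard case. It builds MinHash signatures, splits them into bands (the standard banding scheme), indexes one dataset in a hash table, probes it with every set from the other dataset, and verifies candidates with exact Jaccard similarity. This is the per-query-point baseline the paper improves on, not its representative-point method. All names and default parameters (`num_hashes=32`, `bands=8`) are illustrative assumptions.

```python
import random
import zlib
from collections import defaultdict

PRIME = 2_147_483_647  # 2**31 - 1, modulus for the universal hash family


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(num_hashes)]


def minhash(tokens, params):
    """MinHash signature of a non-empty token set (crc32 gives stable token ids)."""
    ids = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    return tuple(min((a * x + b) % PRIME for x in ids) for a, b in params)


def band_keys(signature, bands):
    """Split a signature into `bands` chunks; each (band index, chunk) is a bucket key."""
    rows = len(signature) // bands
    return [(i, signature[i * rows:(i + 1) * rows]) for i in range(bands)]


def jaccard(s, t):
    return len(s & t) / len(s | t)


def lsh_join(left, right, threshold, num_hashes=32, bands=8, seed=0):
    """Sorted (left_id, right_id) pairs with exact Jaccard >= threshold, restricted
    to pairs that share at least one LSH bucket (so some true pairs may be missed)."""
    params = make_hash_params(num_hashes, seed)
    # Build phase: hash every left set into one bucket per band.
    table = defaultdict(set)
    for rid, tokens in left.items():
        for key in band_keys(minhash(tokens, params), bands):
            table[key].add(rid)
    # Probe phase: one LSH lookup per right set, then exact verification.
    results = set()
    for sid, tokens in right.items():
        candidates = set()
        for key in band_keys(minhash(tokens, params), bands):
            candidates |= table.get(key, set())
        for rid in candidates:
            if jaccard(left[rid], tokens) >= threshold:
                results.add((rid, sid))
    return sorted(results)


if __name__ == "__main__":
    left = {"r1": {"lsh", "hash", "join"}, "r2": {"kd", "tree"}}
    right = {"s1": {"lsh", "hash", "join", "minhash"}, "s2": {"svm"}}
    print(lsh_join(left, right, threshold=0.5))
```

More bands with fewer rows each raise recall at the cost of more candidate verifications. The exact Jaccard check guarantees that no pair below the threshold is reported.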

Proceedings Article
06 Jun 2017
TL;DR: This paper proposes a novel hashing method, Discrete Multi-view Hashing (DMVH), which works directly on multi-view data and makes full use of the rich information it contains. It also introduces a novel approach to constructing a similarity matrix that preserves both the local similarity structure and the semantic similarity between data points.
Abstract: Recently, hashing techniques have grown in popularity due to their low storage cost and high query speed for large-scale data retrieval tasks, e.g., image retrieval. Many methods have been proposed; however, most existing hashing techniques focus on single-view data. In many scenarios, data samples have multiple views, so methods that work on a single view cannot make full use of the rich information contained in multi-view data. Although some methods have been proposed for multi-view data, they usually relax the binary constraints or separate the learning of hash functions and binary codes into two independent stages to bypass the obstacle of handling discrete constraints on binary codes during optimization, which may generate large quantization error. To address these problems, in this paper we propose a novel hashing method, Discrete Multi-view Hashing (DMVH), which can work on multi-view data directly and make full use of the rich information in multi-view data. Moreover, in DMVH, we optimize discrete codes directly instead of relaxing the binary constraints, so that we can obtain high-quality hash codes. Simultaneously, we present a novel approach to constructing a similarity matrix, which not only preserves local similarity structure but also keeps semantic similarity between data points. To solve the optimization problem in DMVH, we further propose an alternating algorithm. We test the proposed model on three large-scale datasets. Experimental results show that it outperforms or is comparable to several state-of-the-art methods.

29 citations
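The DMVH abstract argues that relaxing binary constraints and thresholding afterwards can introduce large quantization error. The following minimal sketch measures that error as the squared Frobenius distance between relaxed real-valued codes V and their sign-thresholded codes B. It illustrates only the problem DMVH is designed to avoid; it does not implement DMVH. The function names and toy values are assumptions.

```python
def sign_binarize(relaxed):
    """Threshold relaxed real-valued codes to {-1, +1} (0 maps to +1 by convention)."""
    return [[1 if v >= 0 else -1 for v in row] for row in relaxed]


def quantization_error(relaxed):
    """Squared Frobenius norm ||B - V||_F^2 between B = sign(V) and the relaxed codes V."""
    codes = sign_binarize(relaxed)
    return sum(
        (b - v) ** 2
        for code_row, relaxed_row in zip(codes, relaxed)
        for b, v in zip(code_row, relaxed_row)
    )


if __name__ == "__main__":
    # Made-up relaxed codes: two samples, two bits each.
    V = [[0.9, -0.1], [0.05, -1.0]]
    print(sign_binarize(V))       # [[1, -1], [1, -1]]
    print(quantization_error(V))  # roughly 1.7225
```

Most of the error comes from entries near zero (-0.1 and 0.05). Optimizing directly over discrete codes, as the abstract describes, sidesteps this post-hoc rounding step.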

Journal Article
TL;DR: A novel key-dependent robust speech hashing scheme based on a speech production model is proposed in this letter; it is highly robust to content-preserving operations and localizes tampering with high accuracy.
Abstract: Robust hashing for multimedia authentication is an emerging research area. A novel key-dependent robust speech hashing scheme based on a speech production model is proposed in this letter. The robust hash is calculated from line spectral frequencies (LSFs), which model the vocal tract. The correlation between LSFs is decoupled by the discrete cosine transform (DCT). A randomization scheme controlled by a secret key is applied in hash generation for random feature selection. The hash function is key-dependent and collision resistant. Meanwhile, it is highly robust to content-preserving operations and achieves high accuracy in tampering localization.

28 citations
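The abstract describes a pipeline of LSFs, then DCT decorrelation, then key-controlled random feature selection. The following hypothetical sketch strings those steps together for a single frame. The binarization rule (keep the sign of each selected AC coefficient) and all names are assumptions, since the abstract does not specify how bits are formed. Python's `random.Random` is not cryptographically secure; a real keyed scheme would need a proper keyed primitive.

```python
import math
import random


def dct_ii(x):
    """Unnormalized DCT-II: X_k = sum_n x_n * cos(pi / N * (n + 0.5) * k)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n)) for k in range(n)]


def keyed_hash_bits(lsf_frame, key, num_bits):
    """Hypothetical per-frame hash: DCT-decorrelate the LSFs, choose coefficients with
    a key-seeded PRNG, and keep the sign of each chosen coefficient as one bit."""
    coeffs = dct_ii(lsf_frame)
    rng = random.Random(key)  # illustration only: NOT a cryptographically secure keyed PRNG
    chosen = rng.sample(range(1, len(coeffs)), num_bits)  # skip the DC term (assumption)
    return [1 if coeffs[k] >= 0 else 0 for k in chosen]


if __name__ == "__main__":
    # Made-up 10th-order LSF frame in radians (ascending values in (0, pi)).
    frame = [0.3, 0.5, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 2.9]
    print(keyed_hash_bits(frame, key=1234, num_bits=6))
```

The same key always selects the same coefficients, so a verifier holding the key can recompute the hash. Without the key, an attacker does not know which features the bits depend on.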

Proceedings Article
01 Sep 2009
TL;DR: This work investigates and benchmark the scalability properties of the state-of-the-art object recognition techniques: the forest of k-d trees, the locality sensitive hashing (LSH) method, and the approximate clustering procedure with the tf-idf inverted index.
Abstract: Scaling from hundreds to millions of objects is the next challenge in visual recognition. We investigate and benchmark the scalability properties (memory requirements, runtime, recognition performance) of the state-of-the-art object recognition techniques: the forest of k-d trees, the locality sensitive hashing (LSH) method, and the approximate clustering procedure with the tf-idf inverted index. Images were characterized with SIFT features. We conduct experiments on two new datasets of more than 100,000 images each, and quantify the performance using artificial and natural deformations. We analyze the results and point out the pitfalls of each of the compared methodologies, suggesting potential new research avenues for the field.

28 citations
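One of the benchmarked techniques is LSH over SIFT descriptors, which are compared by Euclidean distance. The following minimal sketch uses the p-stable (Gaussian) scheme h(v) = floor((a·v + b) / w). That choice of LSH variant is an assumption, since the abstract does not say which one was used. Each table concatenates several hash functions into one bucket key; a query unions the colliding buckets across tables and re-ranks candidates by exact distance. The class name and parameter defaults are hypothetical.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """p-stable (Gaussian) LSH for Euclidean distance: h(v) = floor((a.v + b) / w)."""

    def __init__(self, dim, num_tables=4, hashes_per_table=3, width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.points = []
        self.tables = []
        for _ in range(num_tables):
            # Each table concatenates several hash functions into one bucket key.
            funcs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
                for _ in range(hashes_per_table)
            ]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, v):
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.width)
            for a, b in funcs
        )

    def insert(self, v):
        idx = len(self.points)
        self.points.append(v)
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, v)].append(idx)
        return idx

    def query(self, q, k=1):
        """Union the colliding buckets, then rank candidates by exact Euclidean distance."""
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, q), []))
        return sorted(candidates, key=lambda i: math.dist(self.points[i], q))[:k]


if __name__ == "__main__":
    index = EuclideanLSH(dim=3, seed=1)
    for p in [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0), (10.0, 0.0, -3.0)]:
        index.insert(p)
    print(index.query((5.2, 4.9, 5.1), k=1))
```

For SIFT, `dim` would be 128. A query can return an empty list when no bucket collides; more tables raise recall at the cost of memory, which is the kind of trade-off the benchmark measures.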


Network Information
Related Topics (5)
Deep learning
79.8K papers, 2.1M citations
84% related
Feature extraction
111.8K papers, 2.1M citations
83% related
Convolutional neural network
74.7K papers, 2M citations
83% related
Feature (computer vision)
128.2K papers, 1.7M citations
82% related
Support vector machine
73.6K papers, 1.7M citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139