scispace - formally typeset
Search or ask a question
Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over the lifetime, 1894 publications have been published within this topic receiving 69362 citations.


Papers
More filters
Posted Content
TL;DR: This survey detailedly investigates current deep hashing algorithms including deep supervised hashing and deep unsupervised hashing, and categorizes deep supervised hash methods into pairwise methods, ranking-based methods, pointwise methods as well as quantization according to how measuring the similarities of the learned hash codes.
Abstract: Nearest neighbor search is to find the data points in the database such that the distances from them to the query are the smallest, which is a fundamental problem in various domains, such as computer vision, recommendation systems and machine learning. Hashing is one of the most widely used methods for its computational and storage efficiency. With the development of deep learning, deep hashing methods show more advantages than traditional methods. In this paper, we present a comprehensive survey of the deep hashing algorithms. Specifically, we categorize deep supervised hashing methods into pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, classification-oriented preserving as well as quantization according to the manners of preserving the similarities. In addition, we also introduce some other topics such as deep unsupervised hashing and multi-modal deep hashing methods. Meanwhile, we also present some commonly used public datasets and the scheme to measure the performance of deep hashing algorithms. Finally, we discussed some potential research directions in conclusion.

59 citations

Proceedings ArticleDOI
21 Oct 2013
TL;DR: Topology Preserving Hashing is proposed, a novel hashing method that is distinct from prior works by preserving the neighborhood rankings of data points in Hamming space and is formulated as a generalized eigendecomposition problem with closed form solutions.
Abstract: Binary hashing has been widely used for efficient similarity search. Learning efficient codes has become a research focus and it is still a challenge. In many cases, the real-world data often lies on a low-dimensional manifold, which should be taken into account to capture meaningful neighbors with hashing. The importance of a manifold is its topology, which represents the neighborhood relationships between its subregions and the relative proximities between the neighbors of each subregion, e.g. the relative ranking of neighbors of each subregion. Most existing hashing methods try to preserve the neighborhood relationships by mapping similar points to close codes, while ignoring the neighborhood rankings. Moreover, most hashing methods lack in providing a good ranking for query results since they use Hamming distance as the similarity metric, and in practice, there are often a lot of results sharing the same distance to a query. In this paper, we propose a novel hashing method to solve these two issues jointly. The proposed method is referred to as Topology Preserving Hashing (TPH). TPH is distinct from prior works by preserving the neighborhood rankings of data points in Hamming space. The learning stage of TPH is formulated as a generalized eigendecomposition problem with closed form solutions. Experimental comparisons with other state-of-the-art methods on three noted image benchmarks demonstrate the efficacy of the proposed method.

58 citations

Journal ArticleDOI
TL;DR: This paper proposes LSH-DDP, an approximate algorithm that exploits Locality Sensitive Hashing for partitioning data, performs local computation, and aggregates local results to approximate the final results, and presents formal analysis of this algorithm.
Abstract: Density Peaks (DP) is a recently proposed clustering algorithm that has distinctive advantages over existing clustering algorithms. It has already been used in a wide range of applications. However, DP requires computing the distance between every pair of input points, therefore incurring quadratic computation overhead, which is prohibitive for large data sets. In this paper, we study efficient distributed algorithms for DP. We first show that a naive MapReduce solution (Basic-DDP) has high communication and computation overhead. Then, we propose LSH-DDP, an approximate algorithm that exploits Locality Sensitive Hashing for partitioning data, performs local computation, and aggregates local results to approximate the final results. We address several challenges in employing LSH for DP. We leverage the characteristics of DP to deal with the fact that some of the result values cannot be directly approximated in local partitions. We present formal analysis of LSH-DDP, and show that the approximation quality and the runtime can be controlled by tuning the parameters of LSH-DDP. Experimental results on both a local cluster and EC2 show that LSH-DDP achieves a factor of 1.7–70x speedup over the naive Basic-DDP and 2x speedup over the state-of-the-art EDDPC approach, while returning comparable cluster results. Compared to the popular K-means clustering, LSH-DDP also has comparable or better performance. Furthermore, LSH-DDP could achieve even higher efficiency with a lower accuracy requirement.

58 citations

Proceedings ArticleDOI
10 Apr 2016
TL;DR: A new searchable encryption scheme which can efficiently and securely enable nearest neighbor search over encrypted data on untrusted clouds is proposed, which can be used for secure k-nearest neighbor search, and it is compatible with other similar tree structures.
Abstract: Nearest neighbor search (or k-nearest neighbor search in general) is one of the most fundamental queries on massive datasets, and it has extensive applications such as pattern recognition, statistical classification, graph algorithms, Location-Based Services and online recommendations. With the raising trend of outsourcing massive sensitive datasets to public clouds, it is urgent for companies and organizations to demand fast and secure nearest neighbor search solutions over their outsourced data, but without revealing privacy to untrusted clouds. However, existing solutions for secure nearest neighbor search still face significant limitations, which make them far from practice. In this paper, we propose a new searchable encryption scheme, which can efficiently and securely enable nearest neighbor search over encrypted data on untrusted clouds. Specifically, we modify the search algorithm of nearest neighbors with tree structures (e.g., R-trees), where the modified algorithm adapts to lightweight cryptographic primitives (e.g., Order-Preserving Encryption) without affecting the original faster-than-linear search complexity. As a result, we address all the limitations in the previous works while still maintaining correctness and security. Moreover, our design is general, which can be used for secure k-nearest neighbor search, and it is compatible with other similar tree structures. Our experimental results on Amazon EC2 show that our scheme is extremely practical over massive datasets.

58 citations

Journal ArticleDOI
Shumeet Baluja1, Michele Covell1
TL;DR: A method to learn a similarity function from only weakly labeled positive examples is described, used as the basis of a hash function to severely constrain the number of points considered for each lookup in a large corpus of high-dimensional data points.
Abstract: The problem of efficiently finding similar items in a large corpus of high-dimensional data points arises in many real-world tasks, such as music, image, and video retrieval. Beyond the scaling difficulties that arise with lookups in large data sets, the complexity in these domains is exacerbated by an imprecise definition of similarity. In this paper, we describe a method to learn a similarity function from only weakly labeled positive examples. Once learned, this similarity function is used as the basis of a hash function to severely constrain the number of points considered for each lookup. Tested on a large real-world audio dataset, only a tiny fraction of the points (~0.27%) are ever considered for each lookup. To increase efficiency, no comparisons in the original high-dimensional space of points are required. The performance far surpasses, in terms of both efficiency and accuracy, a state-of-the-art Locality-Sensitive-Hashing-based (LSH) technique for the same problem and data set.

57 citations


Network Information
Related Topics (5)
Deep learning
79.8K papers, 2.1M citations
84% related
Feature extraction
111.8K papers, 2.1M citations
83% related
Convolutional neural network
74.7K papers, 2M citations
83% related
Feature (computer vision)
128.2K papers, 1.7M citations
82% related
Support vector machine
73.6K papers, 1.7M citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202343
2022108
202188
2020110
2019104
2018139