Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have been published within this topic, receiving 69,362 citations.


Papers
Proceedings Article
21 Jun 2014
TL;DR: The heart of the proposed hash function is a "rotation" scheme that densifies the sparse sketches of one permutation hashing in an unbiased fashion, thereby maintaining the LSH property and making the obtained sketches suitable for hash table construction.
Abstract: The query complexity of locality sensitive hashing (LSH) based similarity search is dominated by the number of hash evaluations, and this number grows with the data size (Indyk & Motwani, 1998). In industrial applications such as search, where the data are often high-dimensional and binary (e.g., text n-grams), minwise hashing is widely adopted, which requires applying a large number of permutations on the data. This is costly in computation and energy consumption. In this paper, we propose a hashing technique which generates all the necessary hash evaluations needed for similarity search using a single permutation. The heart of the proposed hash function is a "rotation" scheme which densifies the sparse sketches of one permutation hashing (Li et al., 2012) in an unbiased fashion, thereby maintaining the LSH property. This makes the obtained sketches suitable for hash table construction. The idea of rotation presented in this paper could be of independent interest for densifying other types of sparse sketches. Using our proposed hashing method, the query time of a (K,L)-parameterized LSH is reduced from the typical O(dKL) complexity to merely O(KL + dL), where d is the number of nonzeros of the data vector, K is the number of hashes in each hash table, and L is the number of hash tables. Our experimental evaluation on real data confirms that the proposed scheme significantly reduces the query processing time over minwise hashing without loss in retrieval accuracy.

120 citations
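To make the rotation scheme above concrete, here is a minimal Python sketch of one permutation hashing with rotation-based densification, in the spirit of the abstract. All names are illustrative; it assumes a binary vector given by its set of nonzero indices, that k divides the dimension D, and that the vector has at least one nonzero. It is a sketch of the idea, not the authors' implementation.

```python
import random

def densified_one_perm_hash(nonzeros, D, k, seed=0):
    """Sketch of one permutation hashing plus 'rotation' densification.
    `nonzeros`: set of nonzero indices of a binary vector of dimension D;
    `k`: number of bins, i.e., hash values produced.
    Assumes k divides D and `nonzeros` is non-empty."""
    rng = random.Random(seed)
    perm = list(range(D))
    rng.shuffle(perm)          # the single permutation
    bin_size = D // k
    C = bin_size + 1           # offset constant used during densification

    # One permutation hashing: minimum permuted location inside each bin.
    bins = [None] * k
    for idx in nonzeros:
        b, v = divmod(perm[idx], bin_size)
        if bins[b] is None or v < bins[b]:
            bins[b] = v

    # Rotation: an empty bin borrows the value of the nearest non-empty
    # bin to its right (circularly), offset by C per bin skipped, which
    # keeps the collision probability unbiased.
    sketch = []
    for b in range(k):
        t = 0
        while bins[(b + t) % k] is None:
            t += 1
        sketch.append(bins[(b + t) % k] + t * C)
    return sketch
```

The k sketch values can then be chunked into L groups of K to build the hash tables, which is where the O(KL + dL) query cost in the abstract comes from: one pass over the d nonzeros per table plus cheap lookups.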

Journal ArticleDOI
TL;DR: An automatic method for measuring content-based music similarity, enhancing the current generation of music search engines and recommender systems, and compatible with locality-sensitive hashing, allowing implementation with retrieval times several orders of magnitude faster than those using exhaustive distance computations.
Abstract: We propose an automatic method for measuring content-based music similarity, enhancing the current generation of music search engines and recommender systems. Many previous approaches to track similarity require brute-force, pair-wise processing between all audio features in a database and therefore are not practical for large collections. However, in an Internet-connected world, where users have access to millions of musical tracks, efficiency is crucial. Our approach uses features extracted from unlabeled audio data and near-neighbor retrieval with a distance threshold, determined by analysis, to solve a range of retrieval tasks. The tasks require temporal features, analogous to the technique of shingling used for text retrieval. To measure similarity, we count pairs of audio shingles, between a query and target track, that are below a distance threshold. The distribution of between-shingle distances is different for each database; therefore, we present an analysis of the distribution of minimum distances between shingles and a method for estimating a distance threshold for optimal retrieval performance. The method is compatible with locality-sensitive hashing (LSH), allowing implementation with retrieval times several orders of magnitude faster than those using exhaustive distance computations. We evaluate the performance of our proposed method on three contrasting music similarity tasks: retrieval of mis-attributed recordings (fingerprint), retrieval of the same work performed by different artists (cover songs), and retrieval of edited and sampled versions of a query track by remix artists (remixes). Our method achieves near-perfect performance in the first two tasks and 75% precision at 70% recall in the third task. Each task was performed on a test database comprising 4.5 million audio shingles.

118 citations
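The core matching step described above, counting query/target shingle pairs that fall below a distance threshold, can be sketched as follows. This is a brute-force illustration with assumed array shapes; the paper's point is that LSH replaces exactly this exhaustive computation.

```python
import numpy as np

def matched_shingle_count(query, target, threshold):
    """Count pairs of audio shingles, one from the query and one from the
    target track, whose Euclidean distance is below `threshold`.
    `query` and `target` are (n_shingles, dim) arrays of concatenated
    frame features. Brute-force for clarity."""
    # Squared pairwise distances via the (a-b)^2 = a^2 - 2ab + b^2 expansion.
    d2 = (np.sum(query**2, axis=1)[:, None]
          - 2.0 * query @ target.T
          + np.sum(target**2, axis=1)[None, :])
    # Compare squared distances against the squared threshold.
    return int(np.count_nonzero(d2 < threshold**2))
```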

Journal ArticleDOI
TL;DR: A deep hashing method is proposed to extensively exploit both spatial details and semantic information, in which hierarchical convolutional features are leveraged to construct an image pyramid representation, and a new loss function is proposed that maintains the semantic similarity and balance property of hash codes.
Abstract: Hashing has been an important and effective technology in image retrieval due to its computational efficiency and fast search speed. Traditional hashing methods usually learn hash functions to obtain binary codes by exploiting hand-crafted features, which cannot optimally represent the information in the samples. Recently, deep learning methods have achieved better performance, since deep architectures can learn more effective image representations. However, these methods use only semantic features to generate hash codes by shallow projection and ignore texture details. In this paper, we propose a novel hashing method, namely hierarchical recurrent neural hashing (HRNH), which exploits a hierarchical recurrent neural network to generate effective hash codes. This paper makes three contributions. First, a deep hashing method is proposed to extensively exploit both spatial details and semantic information, in which we leverage hierarchical convolutional features to construct an image pyramid representation. Second, our proposed deep network can directly take convolutional feature maps as input, preserving their spatial structure. Finally, we propose a new loss function that accounts for the quantization error of binarizing the continuous embeddings into discrete binary codes while simultaneously maintaining the semantic similarity and balance property of the hash codes. Experimental results on four widely used data sets demonstrate that the proposed HRNH can achieve superior performance over other state-of-the-art hashing methods.

118 citations
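The three ingredients of the loss described above, pairwise semantic similarity, quantization error, and bit balance, can be illustrated with a short PyTorch sketch. This is a generic combination under assumed weightings, not the exact HRNH objective; `margin`, `q_weight`, and `b_weight` are hypothetical parameters.

```python
import torch

def hash_loss(embeddings, sim, margin=2.0, q_weight=0.1, b_weight=0.1):
    """Illustrative hashing objective.
    `embeddings`: (n, bits) real-valued codes from the network;
    `sim`: (n, n) matrix with 1.0 for semantically similar pairs, 0.0 otherwise."""
    d = torch.cdist(embeddings, embeddings)                   # pairwise distances
    pos = sim * d.pow(2)                                      # pull similar pairs together
    neg = (1 - sim) * torch.clamp(margin - d, min=0).pow(2)   # push dissimilar pairs apart
    sim_loss = (pos + neg).mean()
    quant_loss = (embeddings.abs() - 1).pow(2).mean()         # drive codes toward {-1, +1}
    balance_loss = embeddings.mean(dim=0).pow(2).mean()       # each bit roughly 50/50
    return sim_loss + q_weight * quant_loss + b_weight * balance_loss
```

At test time the continuous embeddings are binarized (e.g., by sign), so the quantization term keeps that rounding step from destroying the learned similarity structure.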

Proceedings ArticleDOI
01 May 1999
TL;DR: This work investigates the exact nearest neighbor search problem and the related problem of exact partial match within the asymmetric communication model first used by Miltersen to study data structure problems, and derives non-trivial asymptotic lower bounds for the exact problem that stand in contrast to known algorithms for approximate nearest neighbor search.
Abstract: In spite of extensive and continuing research, for various geometric search problems (such as nearest neighbor search), the best algorithms known have performance that degrades exponentially in the dimension. This phenomenon is sometimes called the curse of dimensionality. Recent results [37, 38, 40] show that in some sense it is possible to avoid the curse of dimensionality for the approximate nearest neighbor search problem. But must the exact nearest neighbor search problem suffer this curse? We provide some evidence in support of the curse. Specifically we investigate the exact nearest neighbor search problem and the related problem of exact partial match within the asymmetric communication model first used by Miltersen [43] to study data structure problems. We derive non-trivial asymptotic lower bounds for the exact problem that stand in contrast to known algorithms for approximate nearest neighbor search.

118 citations

Journal ArticleDOI
TL;DR: This paper proposes a novel ranking-based hashing framework that maps data from different modalities into a common Hamming space, where cross-modal similarity can be measured using Hamming distance, and shows that the ranking-based hash function has a natural probabilistic approximation which transforms the original highly discontinuous optimization problem into one that can be efficiently solved using simple gradient descent algorithms.
Abstract: Hashing has attracted a great deal of research in recent years due to its effectiveness for the retrieval and indexing of large-scale high-dimensional multimedia data. In this paper, we propose a novel ranking-based hashing framework that maps data from different modalities into a common Hamming space where the cross-modal similarity can be measured using Hamming distance. Unlike existing cross-modal hashing algorithms, where the learned hash functions are binary space partitioning functions such as the sign and threshold functions, the proposed hashing scheme takes advantage of a new class of hash functions closely related to rank correlation measures, which are known to be scale-invariant, numerically stable, and highly nonlinear. Specifically, we jointly learn two groups of linear subspaces, one for each modality, so that features' ranking orders in different linear subspaces maximally preserve the cross-modal similarities. We show that the ranking-based hash function has a natural probabilistic approximation which transforms the original highly discontinuous optimization problem into one that can be efficiently solved using simple gradient descent algorithms. The proposed hashing framework is also flexible in the sense that the optimization procedure is not tied to any specific form of loss function, as is typical for existing cross-modal hashing methods; rather, we can flexibly accommodate different loss functions with minimal changes to the learning steps. We demonstrate through extensive experiments on four widely used real-world multimodal datasets that the proposed cross-modal hashing method can achieve competitive performance against several state-of-the-art methods with only moderate training and testing time.

117 citations
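A toy version of a ranking-based hash conveys the idea behind the paper above: project a feature vector into a handful of linear subspaces and keep only the ranking (here, the argmax) of the projections, which is scale-invariant by construction. The random subspaces below are stand-ins for the jointly learned, per-modality subspaces in the paper; the whole snippet is illustrative.

```python
import numpy as np

def ranking_hash(x, subspaces):
    """Rank-correlation-style hash: for each (dim, m) projection matrix W,
    record the index of the largest projection of x. Multiplying x by any
    positive scalar leaves the code unchanged (scale invariance)."""
    return [int(np.argmax(x @ W)) for W in subspaces]

# Illustrative usage with random subspaces (the paper learns them jointly
# across modalities so that ranking orders preserve cross-modal similarity).
rng = np.random.default_rng(0)
W_list = [rng.standard_normal((64, 4)) for _ in range(8)]
code = ranking_hash(rng.standard_normal(64), W_list)  # 8 symbols, each in {0..3}
```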


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations (84% related)
Feature extraction: 111.8K papers, 2.1M citations (83% related)
Convolutional neural network: 74.7K papers, 2M citations (83% related)
Feature (computer vision): 128.2K papers, 1.7M citations (82% related)
Support vector machine: 73.6K papers, 1.7M citations (82% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139