Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. To date, 1894 publications have appeared within this topic, receiving 69362 citations in total.


Papers
Journal ArticleDOI
TL;DR: Experiments carried out on an archive of aerial images show that the presented hashing methods are much faster than the exact nearest neighbor search typically used in RS, while keeping a similar (or even higher) retrieval accuracy.
Abstract: Large-scale remote sensing (RS) image search and retrieval has recently attracted great attention due to the rapid evolution of satellite systems, which has led to sharp growth in image archives. An exhaustive search of such archives through linear scan is time-consuming and not scalable in operational applications. To overcome this problem, this paper introduces hashing-based approximate nearest neighbor search for fast and accurate image search and retrieval in large RS data archives. Hashing aims at mapping high-dimensional image feature vectors into compact binary hash codes, which are indexed into a hash table that enables real-time search and accurate retrieval. Such binary hash codes can also significantly reduce the amount of memory required for storing the RS images in the auxiliary archives. In particular, in this paper, we introduce two kernel-based nonlinear hashing methods to RS. The first hashing method defines hash functions in the kernel space by using only unlabeled images, while the second method leverages the semantic similarity extracted from annotated images to define more distinctive hash functions in the kernel space. The effectiveness of the considered hashing methods is analyzed in terms of RS image retrieval accuracy and retrieval time. Experiments carried out on an archive of aerial images show that the presented hashing methods are much faster than the exact nearest neighbor search typically used in RS, while keeping a similar (or even higher) retrieval accuracy.

131 citations
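
The core idea in the abstract above, mapping feature vectors to compact binary codes that index a hash table, can be illustrated with a minimal sketch. Note that this is not the paper's kernel-based method: it swaps in plain random-hyperplane (sign) hashing as a simpler stand-in, and the names `make_hasher`, `build_table`, and `search` are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hasher(dim, n_bits, seed=0):
    """Return a function mapping a vector to an n_bits-long binary code.

    Assumption: random-hyperplane hashing, not the kernel-based hashing
    described in the abstract.
    """
    rng = random.Random(seed)
    # One random hyperplane (Gaussian normal vector) per output bit.
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def hash_code(x):
        # A bit is 1 when x lies on the non-negative side of the hyperplane.
        return tuple(
            int(sum(p_i * x_i for p_i, x_i in zip(plane, x)) >= 0.0)
            for plane in planes
        )

    return hash_code


def build_table(vectors, hash_code):
    """Index every vector's position under its binary code."""
    table = defaultdict(list)
    for idx, vec in enumerate(vectors):
        table[hash_code(vec)].append(idx)
    return table


def search(table, hash_code, vectors, query, k=1):
    """Look up the query's bucket, then rank only those candidates exactly."""
    candidates = table.get(hash_code(query), [])
    ranked = sorted(candidates, key=lambda i: math.dist(vectors[i], query))
    return ranked[:k]
```

Only the vectors that share the query's bucket are compared exactly, which is where the speed-up over a linear scan would come from. A single table can miss true neighbors that fall into a different bucket, so practical systems usually probe several tables or nearby codes.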

Journal ArticleDOI
TL;DR: It is shown that hashing on the basis of t-distributed stochastic neighbor embedding outperforms state-of-the-art hashing methods on large-scale benchmark data sets, and is very effective for image classification with very short code lengths, and the proposed framework can be further improved.
Abstract: Learning-based hashing methods have attracted considerable attention due to their ability to greatly increase the scale at which existing algorithms may operate. Most of these methods are designed to generate binary codes preserving the Euclidean similarity in the original space. Manifold learning techniques, in contrast, are better able to model the intrinsic structure embedded in the original high-dimensional data. The complexities of these models, and the problems with out-of-sample data, have previously rendered them unsuitable for application to large-scale embedding, however. In this paper, how to learn compact binary embeddings on their intrinsic manifolds is considered. In order to address the above-mentioned difficulties, an efficient, inductive solution to the out-of-sample data problem, and a process by which nonparametric manifold learning may be used as the basis of a hashing method are proposed. The proposed approach thus allows the development of a range of new hashing techniques exploiting the flexibility of the wide variety of manifold learning approaches available. It is particularly shown that hashing on the basis of t-distributed stochastic neighbor embedding outperforms state-of-the-art hashing methods on large-scale benchmark data sets, and is very effective for image classification with very short code lengths. It is shown that the proposed framework can be further improved, for example, by minimizing the quantization error with learned orthogonal rotations without much computation overhead. In addition, a supervised inductive manifold hashing framework is developed by incorporating the label information, which is shown to greatly advance the semantic retrieval performance.

131 citations

Journal ArticleDOI
01 Sep 2015
TL;DR: A novel concept of query-aware bucket partition is introduced, which uses a given query as the "anchor" for bucket partition and removes the random shift required by traditional query-oblivious LSH functions.
Abstract: Locality-Sensitive Hashing (LSH) and its variants are well-known indexing schemes for the c-Approximate Nearest Neighbor (c-ANN) search problem in high-dimensional Euclidean space. Traditionally, LSH functions are constructed in a query-oblivious manner in the sense that buckets are partitioned before any query arrives. However, objects closer to a query may be partitioned into different buckets, which is undesirable. Due to the use of query-oblivious bucket partition, the state-of-the-art LSH schemes for external memory, namely C2LSH and LSB-Forest, only work with an approximation ratio of integer c ≥ 2. In this paper, we introduce a novel concept of query-aware bucket partition which uses a given query as the "anchor" for bucket partition. Accordingly, a query-aware LSH function is a random projection coupled with query-aware bucket partition, which removes the random shift required by traditional query-oblivious LSH functions. Notably, query-aware bucket partition can be easily implemented so that query performance is guaranteed. We propose a novel query-aware LSH scheme named QALSH for c-ANN search over external memory. Our theoretical studies show that QALSH enjoys a guarantee on query quality. The use of query-aware LSH function enables QALSH to work with any approximation ratio c > 1. Extensive experiments show that QALSH outperforms C2LSH and LSB-Forest, especially in high-dimensional space. Specifically, by using a ratio c

130 citations
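
The contrast drawn in the abstract above, between buckets fixed before any query arrives and a bucket anchored on the query, can be sketched as follows. This is an illustration under stated assumptions rather than the QALSH implementation: the bucket width `w`, the interval of width `w` centered on the query's projection, and the function names are all choices made for this sketch.

```python
import math
import random


def random_direction(dim, seed=0):
    """Draw a Gaussian random projection vector (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def project(a, x):
    """Project x onto direction a."""
    return sum(a_i * x_i for a_i, x_i in zip(a, x))


def oblivious_bucket(a, b, w, x):
    """Query-oblivious partition: grid cells of width w with random shift b."""
    return math.floor((project(a, x) + b) / w)


def query_aware_collides(a, w, q, x):
    """Query-aware partition: x collides with q when its projection falls
    inside the width-w interval centered on the projection of q.
    No random shift b is needed because the query itself is the anchor."""
    return abs(project(a, x) - project(a, q)) <= w / 2.0
```

With `a = [1.0, 0.0]`, `b = 0.0`, and `w = 4.0`, the points `[3.9, 0.0]` and `[4.1, 0.0]` land in different query-oblivious buckets even though they are only 0.2 apart, because a fixed cell boundary sits between them. Anchoring the interval on the query avoids this boundary effect.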

Proceedings ArticleDOI
01 Nov 2011
TL;DR: An efficient GPU-based parallel LSH algorithm for approximate k-nearest neighbor computation in high-dimensional spaces is presented, with results demonstrated on large image datasets of 200,000 images represented as 512-dimensional vectors.
Abstract: We present an efficient GPU-based parallel LSH algorithm to perform approximate k-nearest neighbor computation in high-dimensional spaces. We use the Bi-level LSH algorithm, which can compute k-nearest neighbors with higher accuracy and is amenable to parallelization. During the first level, we use the parallel RP-tree algorithm to partition datasets into several groups so that items similar to each other are clustered together. The second level involves computing the Bi-Level LSH code for each item and constructing a hierarchical hash table. The hash table is based on parallel cuckoo hashing and Morton curves. In the query step, we use GPU-based work queues to accelerate short-list search, which is one of the main bottlenecks in LSH-based algorithms. We demonstrate the results on large image datasets with 200,000 images, which are represented as 512-dimensional vectors. In practice, our GPU implementation can obtain more than 40X acceleration over a single-core CPU-based LSH implementation.

130 citations
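
The abstract above mentions Morton curves without saying how they enter the hash table. As background only, the sketch below shows the standard Morton (Z-order) encoding, which interleaves the bits of integer coordinates into one key so that nearby grid cells tend to receive nearby keys. It is a sequential CPU illustration with hypothetical function names, not the paper's GPU implementation.

```python
def morton_encode(coords, bits):
    """Interleave the low `bits` bits of each non-negative integer coordinate.

    Bit `bit` of dimension `d` is placed at position bit * len(coords) + d.
    """
    dims = len(coords)
    code = 0
    for bit in range(bits):
        for d, c in enumerate(coords):
            code |= ((c >> bit) & 1) << (bit * dims + d)
    return code


def morton_decode(code, dims, bits):
    """Invert morton_encode for the same `dims` and `bits`."""
    coords = [0] * dims
    for bit in range(bits):
        for d in range(dims):
            coords[d] |= ((code >> (bit * dims + d)) & 1) << bit
    return tuple(coords)
```

Sorting items by such a key gives a one-dimensional ordering of multi-dimensional cells, which is one common reason to pair Z-order keys with hash-table or sorted-array layouts.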

Journal ArticleDOI
TL;DR: This work tackles the challenge of supporting large-scale similarity search over encrypted feature-rich multimedia data by treating the search criteria as a high-dimensional feature vector instead of a keyword. The solutions are built on carefully designed fuzzy Bloom filters, which use locality sensitive hashing to encode an index associating file identifiers with feature vectors.
Abstract: Storage services allow data owners to store large amounts of potentially sensitive data, such as audio, images, and videos, on remote cloud servers in encrypted form. To enable retrieval of encrypted files of interest, searchable symmetric encryption (SSE) schemes have been proposed. However, many schemes construct indexes based on keyword-file pairs and focus on boolean expressions of exact keyword matches. Moreover, most dynamic SSE schemes cannot achieve forward privacy and reveal unnecessary information when updating the encrypted databases. We tackle the challenge of supporting large-scale similarity search over encrypted feature-rich multimedia data, by considering the search criteria as a high-dimensional feature vector instead of a keyword. Our solutions are built on carefully designed fuzzy Bloom filters which utilize locality sensitive hashing (LSH) to encode an index associating the file identifiers and feature vectors. Our schemes are proven to be secure against adaptively chosen query attack and forward private in the standard model. We have evaluated the performance of our scheme on real-world high-dimensional datasets, and achieved a search quality of 99 percent recall with only a small number of hash tables for LSH. This shows that our index is compact and searching is not only efficient but also accurate.

129 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 84% related
Feature extraction: 111.8K papers, 2.1M citations, 83% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Support vector machine: 73.6K papers, 1.7M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139