Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have been published on this topic, receiving 69,362 citations.

Papers

Journal Article

Hashing-Based Scalable Remote Sensing Image Search and Retrieval in Large Archives

Begum Demir¹, Lorenzo Bruzzone¹

University of Trento¹

01 Feb 2016-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: Experiments carried out on an archive of aerial images show that the presented hashing methods are much faster than the exact nearest neighbor search methods typically used in RS, while keeping similar (or even higher) retrieval accuracy.

Abstract: Large-scale remote sensing (RS) image search and retrieval have recently attracted great attention due to the rapid evolution of satellite systems, which has resulted in a sharp growth of image archives. An exhaustive search through a linear scan of such archives is time-consuming and not scalable in operational applications. To overcome this problem, this paper introduces hashing-based approximate nearest neighbor search for fast and accurate image search and retrieval in large RS data archives. Hashing maps high-dimensional image feature vectors into compact binary hash codes, which are indexed into a hash table that enables real-time search and accurate retrieval. Such binary hash codes can also significantly reduce the amount of memory required for storing the RS images in the auxiliary archives. In particular, this paper introduces two kernel-based nonlinear hashing methods to RS. The first method defines hash functions in the kernel space by using only unlabeled images, while the second leverages the semantic similarity extracted from annotated images to define more distinctive hash functions in the kernel space. The effectiveness of the considered hashing methods is analyzed in terms of RS image retrieval accuracy and retrieval time. Experiments carried out on an archive of aerial images show that the presented hashing methods are much faster than the exact nearest neighbor search methods typically used in RS, while keeping similar (or even higher) retrieval accuracy.
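
The hashing pipeline described above (binary codes, hash-table lookup, then a short exact re-ranking) can be sketched as follows. This is a minimal sketch assuming plain random-hyperplane (sign-of-projection) LSH in the input feature space; it is not the kernel-based hashing methods of the paper, and the function names, code length, toy data, and fallback rule are hypothetical choices made for illustration.

```python
import numpy as np
from collections import defaultdict

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw one random Gaussian hyperplane per hash bit."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, dim))

def binary_codes(X, planes):
    """Bit j of a code is 1 if the vector lies on the positive side of hyperplane j."""
    return (X @ planes.T > 0).astype(np.uint8)

def build_table(codes):
    """Hash table: one bucket per distinct binary code, holding image indices."""
    table = defaultdict(list)
    for i, code in enumerate(codes):
        table[code.tobytes()].append(i)
    return table

def search(query, X, codes, planes, table, k=5):
    """Approximate k-NN: bucket lookup, Hamming fallback, exact re-ranking."""
    q = binary_codes(query[None, :], planes)[0]
    cand = table.get(q.tobytes(), [])
    if len(cand) < k:
        # Too few exact code matches: take the 10*k codes closest in Hamming distance.
        hamming = np.count_nonzero(codes != q, axis=1)
        cand = np.argsort(hamming, kind="stable")[: 10 * k]
    cand = np.asarray(cand, dtype=int)
    # Re-rank the short list by Euclidean distance on the original features.
    dists = np.linalg.norm(X[cand] - query, axis=1)
    return cand[np.argsort(dists, kind="stable")[:k]].tolist()

# Toy "archive" of 1,000 random 64-dimensional feature vectors (placeholder data).
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 64))
planes = make_hyperplanes(64, n_bits=16)
codes = binary_codes(X, planes)
table = build_table(codes)
result = search(X[42], X, codes, planes, table, k=3)
print(result)  # image 42 itself comes first
```

Packed into bits (for example with `np.packbits`), a 16-bit code takes 2 bytes, whereas a 64-dimensional float32 feature vector takes 256 bytes; this is the kind of memory saving the abstract refers to. Longer codes give smaller buckets and usually higher precision at the cost of more fallback scans.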

131 citations

Journal Article

Hashing on Nonlinear Manifolds

Fumin Shen¹, Chunhua Shen², Qinfeng Shi², Anton van den Hengel², Zhenmin Tang³, Heng Tao Shen⁴

University of Electronic Science and Technology of China¹, University of Adelaide², Nanjing University of Science and Technology³, University of Queensland⁴

24 Feb 2015-IEEE Transactions on Image Processing

TL;DR: It is shown that hashing on the basis of t-distributed stochastic neighbor embedding outperforms state-of-the-art hashing methods on large-scale benchmark data sets, and is very effective for image classification with very short code lengths, and the proposed framework can be further improved.

Abstract: Learning-based hashing methods have attracted considerable attention due to their ability to greatly increase the scale at which existing algorithms may operate. Most of these methods are designed to generate binary codes preserving the Euclidean similarity in the original space. Manifold learning techniques, in contrast, are better able to model the intrinsic structure embedded in the original high-dimensional data. However, the complexity of these models and the problems with out-of-sample data have previously rendered them unsuitable for large-scale embedding. This paper considers how to learn compact binary embeddings on their intrinsic manifolds. To address the above-mentioned difficulties, it proposes an efficient, inductive solution to the out-of-sample data problem and a process by which nonparametric manifold learning may be used as the basis of a hashing method. The proposed approach thus allows the development of a range of new hashing techniques exploiting the flexibility of the wide variety of manifold learning approaches available. In particular, it is shown that hashing on the basis of t-distributed stochastic neighbor embedding outperforms state-of-the-art hashing methods on large-scale benchmark data sets and is very effective for image classification with very short code lengths. It is also shown that the proposed framework can be further improved, for example, by minimizing the quantization error with learned orthogonal rotations without much computation overhead. In addition, a supervised inductive manifold hashing framework is developed by incorporating the label information, which is shown to greatly advance the semantic retrieval performance.
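
The abstract mentions reducing quantization error with learned orthogonal rotations. A common way to do this is the iterative quantization (ITQ) scheme, which alternates between binarizing the rotated embedding and solving an orthogonal Procrustes problem for the rotation. The sketch below assumes that scheme, which may differ in detail from the paper's procedure, and uses a random zero-centered matrix `V` as a hypothetical placeholder for the low-dimensional manifold embedding.

```python
import numpy as np

def learn_rotation(V, n_iter=50, seed=0):
    """ITQ-style rotation: alternately set B = sign(VR) and update R by
    orthogonal Procrustes to reduce the quantization error ||B - VR||_F^2."""
    rng = np.random.default_rng(seed)
    c = V.shape[1]
    # Random orthogonal initialisation from the QR factorisation of a Gaussian matrix.
    R, _ = np.linalg.qr(rng.standard_normal((c, c)))
    errors = []
    for _ in range(n_iter):
        B = np.where(V @ R >= 0, 1.0, -1.0)  # fix R: nearest binary codes
        errors.append(float(np.linalg.norm(B - V @ R) ** 2))
        # Fix B: R = W U^T maximises tr(B^T V R), where B^T V = U S W^T.
        U, _, Wt = np.linalg.svd(B.T @ V)
        R = Wt.T @ U.T
    B = np.where(V @ R >= 0, 1.0, -1.0)
    errors.append(float(np.linalg.norm(B - V @ R) ** 2))
    return R, B, errors

# Placeholder embedding: 500 items in 8 dimensions, zero-centred.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 8)) @ rng.standard_normal((8, 8))
V = X - X.mean(axis=0)
R, B, errors = learn_rotation(V, n_iter=30)
bits = (B > 0).astype(np.uint8)  # one 8-bit code per item
print(round(errors[0], 2), round(errors[-1], 2))
```

Each half-step can only lower the objective (the sign step is the best binary code for a fixed rotation, and the Procrustes step is the best rotation for fixed codes), so the recorded error is non-increasing; since the rotation is a single small matrix product at encoding time, it adds little computation.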

131 citations

Journal Article

Query-aware locality-sensitive hashing for approximate nearest neighbor search

Qiang Huang¹, Jianlin Feng¹, Yikai Zhang¹, Qiong Fang², Wilfred Ng³

Sun Yat-sen University¹, South China University of Technology², Hong Kong University of Science and Technology³

01 Sep 2015

TL;DR: A novel concept of query-aware bucket partition is introduced, which uses a given query as the "anchor" for bucket partition and removes the random shift required by traditional query-oblivious LSH functions.

Abstract: Locality-Sensitive Hashing (LSH) and its variants are well-known indexing schemes for the c-Approximate Nearest Neighbor (c-ANN) search problem in high-dimensional Euclidean space. Traditionally, LSH functions are constructed in a query-oblivious manner, in the sense that buckets are partitioned before any query arrives. However, objects closer to a query may be partitioned into different buckets, which is undesirable. Due to the use of query-oblivious bucket partition, the state-of-the-art LSH schemes for external memory, namely C2LSH and LSB-Forest, only work with an integer approximation ratio c ≥ 2. In this paper, we introduce a novel concept of query-aware bucket partition which uses a given query as the "anchor" for bucket partition. Accordingly, a query-aware LSH function is a random projection coupled with query-aware bucket partition, which removes the random shift required by traditional query-oblivious LSH functions. Notably, query-aware bucket partition can be easily implemented so that query performance is guaranteed. We propose a novel query-aware LSH scheme named QALSH for c-ANN search over external memory. Our theoretical studies show that QALSH enjoys a guarantee on query quality. The use of query-aware LSH functions enables QALSH to work with any approximation ratio c > 1. Extensive experiments show that QALSH outperforms C2LSH and LSB-Forest, especially in high-dimensional space. Specifically, by using a ratio c
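
The difference between the two bucket-partition styles can be sketched as follows. A traditional query-oblivious function uses a random shift b, h(o) = floor((a·o + b) / w), so nearby objects may land on opposite sides of a fixed bucket boundary; the query-aware version keeps only the projection a·o and centers a bucket of width w on the query's projection. The sketch assumes an in-memory, collision-counting variant; the external-memory index, virtual rehashing, and theoretically derived collision threshold of QALSH are omitted, and the parameters `m`, `w`, and `threshold` are hypothetical.

```python
import numpy as np

def oblivious_bucket(o, a, b, w):
    """Traditional query-oblivious E2LSH bucket id with random shift b in [0, w)."""
    return int(np.floor((a @ o + b) / w))

def build_index(data, m, seed=0):
    """Draw m Gaussian projection vectors and precompute all projections a . o."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, data.shape[1]))
    return A, data @ A.T  # proj has shape (n, m)

def query_aware_search(data, A, proj, query, w, threshold, k=5):
    """Count, per object, the projections whose query-anchored bucket
    [a.q - w/2, a.q + w/2] contains a.o; verify frequent colliders exactly."""
    proj_q = A @ query
    counts = (np.abs(proj - proj_q) <= w / 2).sum(axis=1)
    cand = np.flatnonzero(counts >= threshold)
    dists = np.linalg.norm(data[cand] - query, axis=1)
    return cand[np.argsort(dists, kind="stable")[:k]].tolist()

rng = np.random.default_rng(5)
data = rng.standard_normal((2000, 32))
query = data[7] + 0.01 * rng.standard_normal(32)  # slightly perturbed copy of object 7
A, proj = build_index(data, m=40)
result = query_aware_search(data, A, proj, query, w=1.0, threshold=20, k=5)
print(result)

# With a fixed random shift, object 7 and the query may still straddle a boundary.
b = rng.uniform(0, 1.0)
print(oblivious_bucket(data[7], A[0], b, 1.0), oblivious_bucket(query, A[0], b, 1.0))
```

A wider `w` or a lower `threshold` admits more candidates, trading more exact-distance checks for higher recall.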

130 citations

Proceedings Article

Fast GPU-based locality sensitive hashing for k-nearest neighbor computation

Jia Pan¹, Dinesh Manocha¹

University of North Carolina at Chapel Hill¹

01 Nov 2011

TL;DR: An efficient GPU-based parallel LSH algorithm is presented for approximate k-nearest neighbor computation in high-dimensional spaces, with results demonstrated on large image datasets of 200,000 images represented as 512-dimensional vectors.

Abstract: We present an efficient GPU-based parallel LSH algorithm to perform approximate k-nearest neighbor computation in high-dimensional spaces. We use the Bi-level LSH algorithm, which can compute k-nearest neighbors with higher accuracy and is amenable to parallelization. In the first level, we use the parallel RP-tree algorithm to partition datasets into several groups so that items similar to each other are clustered together. The second level involves computing the Bi-level LSH code for each item and constructing a hierarchical hash table. The hash table is based on parallel cuckoo hashing and Morton curves. In the query step, we use GPU-based work queues to accelerate short-list search, which is one of the main bottlenecks in LSH-based algorithms. We demonstrate the results on large image datasets with 200,000 images represented as 512-dimensional vectors. In practice, our GPU implementation can obtain more than 40× acceleration over a single-core CPU-based LSH implementation.
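
The two-level structure described above can be sketched on the CPU as follows: level one routes every item down a random-projection tree (median splits along random directions), and level two stores each leaf's items in a hash table keyed by E2LSH codes. This is a sequential sketch under stated assumptions, not the authors' GPU implementation: parallel RP-tree construction, cuckoo hashing, Morton-curve ordering, and GPU work queues are all omitted, and the class name and parameters (`depth`, `n_proj`, `w`) are hypothetical.

```python
import numpy as np
from collections import defaultdict

class BiLevelLSH:
    """Level 1: random-projection tree; level 2: one E2LSH hash table per leaf."""

    def __init__(self, data, depth=3, n_proj=8, w=4.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.data = data
        self.depth = depth
        self.splits = {}  # internal node id -> (direction, threshold)
        self.A = self.rng.standard_normal((n_proj, data.shape[1]))
        self.b = self.rng.uniform(0, w, n_proj)
        self.w = w
        self.tables = defaultdict(lambda: defaultdict(list))
        for leaf, idx in self._build(np.arange(len(data)), node=1, level=0).items():
            for i in idx:
                self.tables[leaf][self._code(data[i])].append(int(i))

    def _project(self, v, direction):
        return float(v @ direction)

    def _build(self, idx, node, level):
        """Split idx at the median of its projections onto a random direction."""
        if level == self.depth or len(idx) < 2:
            return {node: idx}
        direction = self.rng.standard_normal(self.data.shape[1])
        proj = np.array([self._project(self.data[i], direction) for i in idx])
        thr = float(np.median(proj))
        self.splits[node] = (direction, thr)
        left = self._build(idx[proj <= thr], 2 * node, level + 1)
        right = self._build(idx[proj > thr], 2 * node + 1, level + 1)
        return {**left, **right}

    def _leaf(self, v):
        """Route a vector down the tree with the same rule used at build time."""
        node = 1
        while node in self.splits:
            direction, thr = self.splits[node]
            node = 2 * node if self._project(v, direction) <= thr else 2 * node + 1
        return node

    def _code(self, v):
        """E2LSH code: floor((a . v + b) / w) for each projection."""
        return tuple(np.floor((self.A @ v + self.b) / self.w).astype(int).tolist())

    def query(self, q, k=5):
        """Search only the query's bucket in the query's leaf, then rank exactly."""
        cand = np.asarray(self.tables[self._leaf(q)].get(self._code(q), []), dtype=int)
        dists = np.linalg.norm(self.data[cand] - q, axis=1)
        return cand[np.argsort(dists, kind="stable")[:k]].tolist()

data = np.random.default_rng(3).standard_normal((4000, 16))
index = BiLevelLSH(data)
result = index.query(data[10], k=3)
print(result)  # item 10 itself comes first
```

Because a query is compared only against one bucket of one leaf, neighbors that fall across a tree split or a bucket boundary are missed; probing neighboring buckets or leaves would be the usual remedy, which this sketch does not model.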

130 citations

Journal Article

Searchable Encryption over Feature-Rich Data

Qian Wang¹, Meiqi He², Minxin Du¹, Sherman S. M. Chow³, Russell W. F. Lai³, Qin Zou¹

Wuhan University¹, University of Hong Kong², The Chinese University of Hong Kong³

01 May 2018-IEEE Transactions on Dependable and Secure Computing

TL;DR: This work tackles the challenge of supporting large-scale similarity search over encrypted feature-rich multimedia data by considering the search criteria as a high-dimensional feature vector instead of a keyword; the solutions are built on carefully designed fuzzy Bloom filters that use locality-sensitive hashing to encode an index associating the file identifiers and feature vectors.

Abstract: Storage services allow data owners to store huge amounts of potentially sensitive data, such as audio, images, and videos, on remote cloud servers in encrypted form. To enable retrieval of encrypted files of interest, searchable symmetric encryption (SSE) schemes have been proposed. However, many schemes construct indexes based on keyword-file pairs and focus on boolean expressions of exact keyword matches. Moreover, most dynamic SSE schemes cannot achieve forward privacy and reveal unnecessary information when updating the encrypted databases. We tackle the challenge of supporting large-scale similarity search over encrypted feature-rich multimedia data by considering the search criteria as a high-dimensional feature vector instead of a keyword. Our solutions are built on carefully designed fuzzy Bloom filters that use locality-sensitive hashing (LSH) to encode an index associating the file identifiers and feature vectors. Our schemes are proven to be secure against adaptively chosen query attacks and forward private in the standard model. We have evaluated the performance of our schemes on real-world high-dimensional datasets and achieved a search quality of 99 percent recall with only a small number of hash tables for LSH. This shows that our index is compact and that searching is not only efficient but also accurate.
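
The idea of encoding feature vectors into Bloom filters through LSH can be illustrated without the encryption layer. In the sketch below, each of L E2LSH-style functions is applied to a feature vector, and each (function index, hash value) pair is mapped to a filter position with a keyed hash (HMAC-SHA256), so similar vectors tend to set the same bits and similarity can be scored by counting shared bits. This is an assumption-laden illustration, not the authors' construction: the key, filter length, and LSH parameters are hypothetical, and the filters are left unencrypted, so it provides none of the paper's security guarantees.

```python
import hashlib
import hmac
import numpy as np

def lsh_values(v, A, b, w):
    """E2LSH-style values floor((a . v + b) / w), one per hash function."""
    return np.floor((A @ v + b) / w).astype(int)

def fuzzy_bloom_filter(v, A, b, w, key, m_bits):
    """Set one filter bit per (function index, LSH value) pair, chosen by HMAC-SHA256."""
    bits = np.zeros(m_bits, dtype=bool)
    for j, h in enumerate(lsh_values(v, A, b, w)):
        digest = hmac.new(key, f"{j}:{h}".encode(), hashlib.sha256).digest()
        bits[int.from_bytes(digest[:8], "big") % m_bits] = True
    return bits

def match_score(index_bits, query_bits):
    """Number of query bits that are also set in the index filter."""
    return int(np.count_nonzero(index_bits & query_bits))

rng = np.random.default_rng(7)
L, d, w, m_bits = 24, 32, 4.0, 1024
A = rng.standard_normal((L, d))
b = rng.uniform(0, w, L)
key = b"hypothetical-secret-key"

v = rng.standard_normal(d)                 # indexed file's feature vector
near = v + 0.05 * rng.standard_normal(d)   # similar query
far = rng.standard_normal(d)               # unrelated query

index_bits = fuzzy_bloom_filter(v, A, b, w, key, m_bits)
score_same = match_score(index_bits, fuzzy_bloom_filter(v, A, b, w, key, m_bits))
score_near = match_score(index_bits, fuzzy_bloom_filter(near, A, b, w, key, m_bits))
score_far = match_score(index_bits, fuzzy_bloom_filter(far, A, b, w, key, m_bits))
print(score_same, score_near, score_far)
```

The keyed hash means that positions cannot be recomputed without the key; in an actual SSE scheme the filters themselves would also have to be protected before being outsourced.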

129 citations

Metrics

Papers: 2,048

Citations: 77,891

No. of papers in the topic in previous years
Year	Papers
2023	43
2022	108
2021	88
2020	110
2019	104
2018	139
