Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over the lifetime, 1894 publications have been published within this topic receiving 69362 citations.


Papers
Proceedings ArticleDOI
14 May 1975
TL;DR: A study of the performance measures obtained during tests of "Distribution-dependent" hashing functions indicates that in certain cases, distribution-dependent methods perform better than the division method.
Abstract: In this paper procedures are studied for storing, accessing, updating, and reorganizing data in large files whose organization is direct, an organization used when a fast response time is required. "Distribution-dependent" hashing functions and the division method are compared as methods of indirect addressing. "Distribution-dependent" hashing functions are characterized. These hashing functions generate addresses from a set of keys by using knowledge of the distribution of that key set within the key space or range of keys. A study of the performance measures obtained during tests of these functions on several key sets indicates that in certain cases, distribution-dependent methods perform better than the division method. This result is extended by a demonstration that distribution-dependent hashing functions can accommodate a change in the distribution of keys without being redefined. A number of insertions to and deletions from the key set can be made before a distribution-dependent hashing function gives poorer performance than the division method under identical circumstances. If many additions are made to a set of keys, it becomes necessary to reorganize, in a larger storage area, the direct file of records identified by that key set. Although processor time must be sacrificed in order to redefine a distribution-dependent hashing function, the division method requires substantially greater access time in a reorganizational situation.

9 citations
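The contrast in the abstract above can be sketched in a few lines. The division method is standard (key modulo table size); the "distribution-dependent" variant below is only an illustrative toy that assigns addresses from a key's empirical rank in a sampled key set, so addresses follow the key distribution. Function names and the rank-based scheme are assumptions for illustration, not the paper's exact constructions.

```python
import bisect

def division_hash(key: int, table_size: int = 101) -> int:
    """Division method: the address is the key modulo the table size
    (commonly a prime)."""
    return key % table_size

def make_distribution_dependent_hash(sample_keys, table_size: int = 101):
    """Toy distribution-dependent scheme: build a hash from knowledge of
    the key set's distribution, here via ranks in a sorted sample."""
    boundaries = sorted(sample_keys)
    n = max(len(boundaries), 1)

    def h(key: int) -> int:
        # Rank of the key within the sample, scaled to the address space.
        rank = bisect.bisect_left(boundaries, key)
        return min(rank * table_size // n, table_size - 1)

    return h
```

Because the rank-based function spreads addresses according to where keys actually fall, it can tolerate some drift in the key distribution before needing redefinition, which mirrors the paper's observation.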

Proceedings ArticleDOI
13 Jul 2014
TL;DR: Experiments carried out on an archive of aerial images show that the presented hashing methods are one hundred times faster than those that exploit an exact nearest neighbor search while keeping a high retrieval accuracy.
Abstract: This paper presents hashing based approximate nearest neighbor search algorithms that allow fast and accurate image retrieval in huge remote sensing data archives. Hashing methods aim at mapping high-dimensional image feature vectors into short binary codes based on hashing functions. Then, the image retrieval is accomplished according to Hamming distances of image hash codes. In particular, in this paper two hashing methods are adopted for RS image retrieval problems. The former aims at defining hash functions in the kernel space by using only unlabeled images. The latter leverages the semantic similarity given in terms of annotated images to define more distinctive hash functions in the kernel space. The effectiveness of both methods is analyzed in terms of RS image retrieval accuracy as well as retrieval time. Experiments carried out on an archive of aerial images show that the presented hashing methods are one hundred times faster than those that exploit an exact nearest neighbor search while keeping a high retrieval accuracy.

9 citations
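The retrieval pipeline in the abstract above (feature vectors → short binary codes → ranking by Hamming distance) can be sketched as follows. This is a minimal sketch using plain random hyperplanes; the paper's methods are kernel-based and possibly supervised, which this does not reproduce, and all array shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature vectors for an "archive" of images plus one query.
dim, n_bits = 64, 32
archive = rng.standard_normal((1000, dim))
query = rng.standard_normal(dim)

# Hash functions: random hyperplanes; one bit per hyperplane side.
planes = rng.standard_normal((dim, n_bits))

def to_code(x):
    """Map feature vector(s) to short binary codes."""
    return (x @ planes > 0).astype(np.uint8)

codes = to_code(archive)      # precomputed archive codes
q_code = to_code(query)

# Retrieval: rank archive items by Hamming distance between hash codes.
hamming = np.count_nonzero(codes != q_code, axis=1)
top10 = np.argsort(hamming)[:10]
```

The speedup the paper reports comes from this last step: comparing 32-bit codes with XOR/popcount-style operations is far cheaper than exact nearest-neighbor distance computations in the original feature space.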

Journal ArticleDOI
Deng Cai1
TL;DR: In this article, a simple but effective novel hash index search approach was proposed and a thorough comparison of eleven popular hashing algorithms was made; random-projection-based LSH ranked first, contradicting the claims in the other ten hashing articles.
Abstract: Approximate Nearest Neighbor Search (ANNS) is a fundamental problem in many areas of machine learning and data mining. During the past decade, numerous hashing algorithms have been proposed to solve this problem. Every proposed algorithm claims to outperform Locality Sensitive Hashing (LSH), which is the most popular hashing method. However, the evaluation in these hashing articles was not thorough enough, and the claims should be re-examined. If implemented correctly, almost all the hashing methods will have their performance improved as the code length increases. However, many existing hashing articles only report the performance with code lengths shorter than 128. In this article, we carefully revisit the problem of search-with-a-hash-index and analyze the pros and cons of two popular hash index search procedures. Then we propose a simple but effective novel hash index search approach and make a thorough comparison of eleven popular hashing algorithms. Surprisingly, the random-projection-based Locality Sensitive Hashing ranked first, which is in contradiction to the claims in all the other ten hashing articles. Despite the extreme simplicity of random-projection-based LSH, our results show that the capability of this algorithm has been far underestimated. For the sake of reproducibility, all the code used in the article is released on GitHub, which can be used as a testing platform for a fair comparison between various hashing algorithms.

9 citations
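Random-projection-based LSH, the method the article above ranks first, is indeed simple: the sign pattern of a few random projections becomes a bucket key, and a query probes its own bucket in the index. The sketch below is a single-table, single-probe version under assumed toy dimensions; real implementations use multiple tables and/or multi-probing.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(42)

dim, n_bits = 32, 12
data = rng.standard_normal((5000, dim))
planes = rng.standard_normal((dim, n_bits))

def code(x) -> int:
    """Pack the sign pattern of random projections into an integer bucket key."""
    bits = (x @ planes > 0).astype(int)
    return int(bits @ (2 ** np.arange(n_bits)))

# Build the hash index: bucket key -> list of point ids.
index = defaultdict(list)
for i, x in enumerate(data):
    index[code(x)].append(i)

def query(q):
    """Return candidate ids from the query's bucket (single-probe search)."""
    return index.get(code(q), [])
```

The search-procedure question the article studies is exactly what `query` glosses over: how many neighboring buckets to probe, and in what order, dominates the accuracy/speed trade-off at a fixed code length.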

Proceedings ArticleDOI
04 Apr 2016
TL;DR: This paper focuses on reducing the computation time of a state-of-the-art duplicate detection algorithm, and constructs uniform vector representations for the products and applies the Multi-component Similarity Method (MSM).
Abstract: The number of online shops is growing daily, and many Web shops focus on the same product types, like consumer electronics. Since Web shops use different product representations, it is hard to compare products among different Web shops. Duplicate detection methods aim to solve this problem by identifying the same products in different Web shops. In this paper, we focus on reducing the computation time of a state-of-the-art duplicate detection algorithm. First, we construct uniform vector representations for the products. We use these vectors as input for a Locality Sensitive Hashing (LSH) algorithm, which pre-selects potential duplicates. Finally, duplicate products are found by applying the Multi-component Similarity Method (MSM). Compared to the original MSM, the number of needed computations can be reduced by 95% with only a minor 9% decrease in the F1-measure.

9 citations
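The pre-selection step described above can be sketched with MinHash banding, a common LSH choice for set-valued product representations; the paper's exact scheme and vector construction may differ, and the product data, salts, and band sizes below are all illustrative. Only pairs colliding in at least one band become candidates for the expensive similarity computation (MSM itself is not shown).

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

random.seed(0)

# Toy token-set representations of products across Web shops.
products = {
    "shopA/tv1": {"samsung", "55", "uhd", "tv"},
    "shopB/tv9": {"samsung", "55", "uhd", "television"},
    "shopA/cam": {"canon", "eos", "camera"},
}

n_hashes, bands = 12, 4          # 4 bands of 3 rows each
rows = n_hashes // bands
salts = [random.getrandbits(32) for _ in range(n_hashes)]

def hval(salt: int, token: str) -> int:
    # crc32 as a cheap deterministic hash (Python's hash() is salted per run).
    return zlib.crc32(f"{salt}:{token}".encode())

def signature(tokens):
    """MinHash signature: per hash function, the minimum token hash."""
    return [min(hval(s, t) for t in tokens) for s in salts]

# Band each signature; products sharing a band bucket are candidate pairs.
buckets = defaultdict(set)
for pid, tokens in products.items():
    sig = signature(tokens)
    for b in range(bands):
        buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(pid)

candidates = set()
for ids in buckets.values():
    candidates.update(combinations(sorted(ids), 2))
```

This is where the 95% reduction in computations comes from: MSM only needs to run on `candidates`, not on all product pairs, at the cost of occasionally missing a true duplicate whose bands never collide.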

Posted Content
TL;DR: This paper proposes Secure Locality Sensitive Indexing (SLSI), built on a secure binary embedding scheme generated from a novel probabilistic transformation over a locality sensitive hashing family; it has sub-linear query time and handles honest-but-curious parties.
Abstract: In Near-Neighbor Search (NNS), a client queries a database (held by a server) for the most similar data (near-neighbors) given a certain similarity metric. The Privacy-Preserving variant (PP-NNS) requires that neither the server nor the client shall learn information about the other party's data except what can be inferred from the outcome of NNS. The overwhelming growth in the size of current datasets and the lack of a truly secure server in the online world render the existing solutions impractical, either due to their high computational requirements or non-realistic assumptions which potentially compromise privacy. PP-NNS having query time {\it sub-linear} in the size of the database has been suggested as an open research direction by Li et al. (CCSW'15). In this paper, we provide the first such algorithm, called Secure Locality Sensitive Indexing (SLSI), which has a sub-linear query time and the ability to handle honest-but-curious parties. At the heart of our proposal lies a secure binary embedding scheme generated from a novel probabilistic transformation over a locality sensitive hashing family. We provide information-theoretic bounds for the privacy guarantees and support our theoretical claims with substantial empirical evidence on real-world datasets.

9 citations


Network Information
Related Topics (5)
Deep learning
79.8K papers, 2.1M citations
84% related
Feature extraction
111.8K papers, 2.1M citations
83% related
Convolutional neural network
74.7K papers, 2M citations
83% related
Feature (computer vision)
128.2K papers, 1.7M citations
82% related
Support vector machine
73.6K papers, 1.7M citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139