Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have been published on this topic, receiving 69,362 citations.


Papers
Journal ArticleDOI
TL;DR: In this article, the authors classify deep supervised hashing methods into pairwise, ranking-based, pointwise, and quantization methods according to how the similarities of the learned hash codes are measured.
Abstract: Nearest neighbor search aims at obtaining the samples in the database with the smallest distances to the queries, which is a basic task in a range of fields, including computer vision and data mining. Hashing is one of the most widely used methods for its computational and storage efficiency. With the development of deep learning, deep hashing methods show more advantages than traditional methods. In this survey, we investigate current deep hashing algorithms in detail, including deep supervised hashing and deep unsupervised hashing. Specifically, we categorize deep supervised hashing methods into pairwise methods, ranking-based methods, pointwise methods, and quantization, according to how the similarities of the learned hash codes are measured. Moreover, deep unsupervised hashing is categorized into similarity reconstruction-based methods, pseudo-label-based methods, and prediction-free self-supervised learning-based methods based on their semantic learning manners. We also introduce three related important topics: semi-supervised deep hashing, domain adaptation deep hashing, and multi-modal deep hashing. Meanwhile, we present some commonly used public datasets and the schemes used to measure the performance of deep hashing algorithms. Finally, we discuss some potential research directions.

9 citations
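
The survey above concerns learned hash codes, but the mechanics of code-based nearest neighbor search are easiest to see with the classical, non-learned baseline: random-hyperplane LSH, which produces binary codes whose Hamming distance approximates angular similarity. A minimal sketch; the dimensions, code length, and function names are illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_hyperplanes(dim, n_bits):
    """Sample random hyperplanes; each one contributes one bit of the code."""
    return rng.standard_normal((n_bits, dim))

def hash_codes(X, planes):
    """Sign of the projection onto each hyperplane -> one binary code per row."""
    return (X @ planes.T > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes, k=5):
    """Rank database items by Hamming distance to the query code."""
    dists = (db_codes != query_code).sum(axis=1)
    return np.argsort(dists)[:k]

# Toy usage: 1000 database vectors, 64-bit codes.
X = rng.standard_normal((1000, 128))
planes = fit_hyperplanes(128, 64)
db_codes = hash_codes(X, planes)
q_code = hash_codes(X[:1], planes)[0]
print(hamming_rank(q_code, db_codes))  # nearest items by code distance
```

Deep hashing methods replace the random projections with a learned network, but the storage and ranking machinery stays the same.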

Proceedings Article
01 Dec 2015
TL;DR: In this article, the authors propose data-dependent dispatching that takes advantage of the structure of similar data points to improve the performance of distributed machine learning, and demonstrate that their technique scales strongly with the available computing power.
Abstract: In distributed machine learning, data is dispatched to multiple machines for processing. Motivated by the fact that similar data points often belong to the same or similar classes, and more generally, that classification rules of high accuracy tend to be "locally simple but globally complex" (Vapnik & Bottou 1993), we propose data-dependent dispatching that takes advantage of such structure. We present an in-depth analysis of this model, providing new algorithms with provable worst-case guarantees, analysis proving that existing scalable heuristics perform well in natural non-worst-case conditions, and techniques for extending a dispatching rule from a small sample to the entire distribution. We overcome novel technical challenges to satisfy important conditions for accurate distributed learning, including fault tolerance and balancedness. We empirically compare our approach with baselines based on random partitioning, balanced partition trees, and locality-sensitive hashing, showing that we achieve significantly higher accuracy on both synthetic and real-world image and advertising datasets. We also demonstrate that our technique scales strongly with the available computing power.

9 citations
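
The LSH-based partitioning this paper compares against can be sketched in a few lines: hash each point with sign random projections and map each bucket to a machine, so similar points tend to co-locate. This is only the baseline idea, not the paper's dispatching algorithm, and all parameters below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

def lsh_dispatch(X, n_machines, n_bits=8):
    """Dispatch rows of X to machines by bucketing sign-random-projection codes.

    Similar points share codes with high probability, so they tend to land
    on the same machine. Note the per-machine load can be unbalanced, which
    is one of the issues the paper's method addresses directly.
    """
    planes = rng.standard_normal((n_bits, X.shape[1]))
    codes = (X @ planes.T > 0) @ (1 << np.arange(n_bits))  # pack bits into an int
    return codes % n_machines  # bucket id -> machine id

X = rng.standard_normal((10_000, 32))
assignment = lsh_dispatch(X, n_machines=8)
print(np.bincount(assignment, minlength=8))  # per-machine load
```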

Journal ArticleDOI
TL;DR: A non-expansive hashing scheme in which similar inputs are stored in nearby memory locations, any set from a large universe may be stored compactly, and retrieval is efficient.
Abstract: In a non-expansive hashing scheme, similar inputs are stored in memory locations which are close. We develop a non-expansive hashing scheme wherein any set drawn from a large universe may be stored compactly and retrieved efficiently. We explain how to use non-expansive hashing schemes for efficient storage and retrieval of noisy data. A dynamic version of this hashing scheme is presented as well.

9 citations
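
The defining property here, non-expansiveness (similar inputs land in nearby memory locations), can be illustrated with a toy table that addresses items by a quantized random projection and probes adjacent slots at retrieval. This sketch only demonstrates the property; the paper's actual construction and its size and retrieval bounds are more involved:

```python
import numpy as np

rng = np.random.default_rng(2)

class LocalityPreservingTable:
    """Toy table in which similar keys land in nearby slots."""

    def __init__(self, dim, n_slots, cell_width=0.5):
        self.a = rng.standard_normal(dim)  # random projection direction
        self.n_slots = n_slots
        self.w = cell_width
        self.slots = [[] for _ in range(n_slots)]

    def _addr(self, x):
        # Quantized 1-D projection: close inputs differ by few slots.
        return int(x @ self.a / self.w) % self.n_slots

    def store(self, x, value):
        self.slots[self._addr(x)].append((x, value))

    def retrieve(self, x, probe=1):
        """Look up x, also probing neighboring slots to tolerate noise."""
        addr = self._addr(x)
        hits = []
        for d in range(-probe, probe + 1):
            hits.extend(self.slots[(addr + d) % self.n_slots])
        return hits

table = LocalityPreservingTable(dim=16, n_slots=97)
key = rng.standard_normal(16)
table.store(key, "payload")
noisy = key + 0.01 * rng.standard_normal(16)  # perturbed copy of the key
print(table.retrieve(noisy))  # usually recovers the stored item
```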

01 Jan 2008
TL;DR: SpotSigs, presented in this paper, is a new algorithm for extracting and matching signatures for near-duplicate detection in large Web crawls, designed to favor natural-language portions of Web pages over advertisements and navigational bars.
Abstract: Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching signatures for near-duplicate detection in large Web crawls. Our spot signatures are designed to favor natural-language portions of Web pages over advertisements and navigational bars. The contributions of SpotSigs are twofold: 1) by combining stopword antecedents with short chains of adjacent content terms, we create robust document signatures with a natural ability to filter out noisy components of Web pages that would otherwise distract pure n-gram-based approaches such as Shingling; 2) we provide an exact and efficient, self-tuning matching algorithm that exploits a novel combination of collection partitioning and inverted index pruning for high-dimensional similarity search. Experiments confirm an increase in combined precision and recall of more than 24 percent over state-of-the-art approaches such as Shingling or I-Match, and up to a factor of 3 faster execution times than Locality Sensitive Hashing (LSH), over a demonstrative "Gold Set" of manually assessed near-duplicate news articles as well as the TREC WT10g Web collection.

9 citations
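
A simplified reading of the signature scheme described above: each stopword acts as an anchor that contributes the short chain of content words following it, and documents are compared by Jaccard similarity over these signature sets. The stopword list and chain length below are illustrative choices, not the paper's tuned values:

```python
# Minimal spot-signature sketch (assumed simplification of SpotSigs).
STOPWORDS = {"the", "a", "an", "is", "was", "there", "of", "to", "and"}

def spot_signatures(text, chain_len=2):
    """Each stopword anchor + the chain of content words that follows it."""
    tokens = text.lower().split()
    sigs = set()
    for i, tok in enumerate(tokens):
        if tok in STOPWORDS:
            chain = [t for t in tokens[i + 1:] if t not in STOPWORDS][:chain_len]
            if len(chain) == chain_len:
                sigs.add((tok, *chain))
    return sigs

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

doc1 = "The quick brown fox jumps over the lazy dog near a quiet river"
doc2 = "The quick brown fox leaped over the lazy dog near a quiet river"
print(jaccard(spot_signatures(doc1), spot_signatures(doc2)))
```

Because signatures are anchored at stopwords, which are frequent in natural language but rare in boilerplate like navigation bars, the comparison naturally concentrates on the article text.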

Proceedings ArticleDOI
13 Apr 2015
TL;DR: This paper proposes a set-based summarization method that aggregates sets of similar nodes in each iteration, thus providing scalability, and presents scalable solutions for lossless summarization of both attributed and non-attributed graphs.
Abstract: Graph summarization is a valuable approach for in-memory processing of a big graph. A summary graph is compact, yet it maintains the overall characteristics of the underlying graph, making it suitable for querying and visualization. To summarize a big graph, the idea is to compress the similar nodes in dense regions of the graph. The existing approaches find these similar nodes either by node ordering or by pair-wise similarity computations. The former approaches are scalable but cannot simultaneously consider attribute and neighborhood similarity among the nodes. In contrast, the pair-wise summarization methods can consider both similarity aspects but are impractical for a big graph. In this paper, we propose a set-based summarization method that aggregates sets of similar nodes in each iteration, which provides scalability. To find each set, we approximate the candidate similar nodes without node ordering or explicit similarity computations by using Locality Sensitive Hashing (LSH). In conjunction with an information-theoretic approach, we present scalable solutions for lossless summarization of both attributed and non-attributed graphs.

9 citations
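
One plausible way to approximate candidate similar nodes with LSH, as the abstract describes, is MinHash over each node's neighbor set: nodes with highly overlapping neighborhoods collide in the same bucket without any pair-wise comparison. The abstract does not specify the exact LSH configuration, so the sketch below is an assumed MinHash variant:

```python
import random

random.seed(3)

# MinHash signatures over neighbor sets: nodes whose adjacency lists have
# high Jaccard similarity tend to produce the same signature and thus land
# in the same bucket, avoiding explicit pair-wise similarity computations.
N_HASHES = 16
MASKS = [random.getrandbits(32) for _ in range(N_HASHES)]

def minhash(neighbors):
    """One min value per hash function over the node's neighbor set."""
    return tuple(min(hash(v) ^ m for v in neighbors) for m in MASKS)

def candidate_groups(adj):
    """Bucket nodes by signature; nodes sharing a bucket are merge candidates."""
    buckets = {}
    for node, neighbors in adj.items():
        if neighbors:
            buckets.setdefault(minhash(neighbors), []).append(node)
    return [group for group in buckets.values() if len(group) > 1]

adj = {
    "a": {"x", "y", "z"},
    "b": {"x", "y", "z"},  # same neighborhood as "a" -> same bucket
    "c": {"x", "q"},
}
print(candidate_groups(adj))  # [['a', 'b']]
```

Using the full signature as the bucket key trades recall for precision; a banded scheme, as in standard MinHash LSH, would loosen that trade-off.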


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations (84% related)
Feature extraction: 111.8K papers, 2.1M citations (83% related)
Convolutional neural network: 74.7K papers, 2M citations (83% related)
Feature (computer vision): 128.2K papers, 1.7M citations (82% related)
Support vector machine: 73.6K papers, 1.7M citations (82% related)
Performance Metrics
No. of papers in the topic in previous years:
Year  Papers
2023  43
2022  108
2021  88
2020  110
2019  104
2018  139