Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have been published within this topic, receiving 69,362 citations.

Papers

Proceedings Article

Random projection with filtering for nearly duplicate search

Yue Lin¹, Rong Jin², Deng Cai¹, Xiaofei He¹

Zhejiang University¹, Michigan State University²

22 Jul 2012

TL;DR: The key idea is to introduce a filtering procedure within the search algorithm, based on the compressed sensing theory, that effectively removes the false positive answers.

Abstract: High dimensional nearest neighbor search is a fundamental problem and has found applications in many domains. Although many hashing based approaches have been proposed for approximate nearest neighbor search in high dimensional space, one main drawback is that they often return many false positives that need to be filtered out by a post procedure. We propose a novel method to address this limitation in this paper. The key idea is to introduce a filtering procedure within the search algorithm, based on the compressed sensing theory, that effectively removes the false positive answers. We first obtain a sparse representation for each data point by the landmark based approach, after which we solve the nearly duplicate search problem, in which the difference between the query and its nearest neighbors forms a sparse vector living in a small ℓp ball, where p ≤ 1. Our empirical study on real-world datasets demonstrates the effectiveness of the proposed approach compared to the state-of-the-art hashing methods.
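As a rough illustration of the false-positive problem the abstract describes, the sketch below hashes vectors with random hyperplanes and then filters the bucket candidates by exact distance. The function names, the ℓ1 distance, and the radius parameter are assumptions made for this example; the plain distance check stands in for the paper's compressed-sensing filter, which is not reproduced here.

```python
import random
from collections import defaultdict

def make_hyperplanes(dim, n_bits, seed=0):
    # Hypothetical helper: one random Gaussian hyperplane per signature bit.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    # One bit per hyperplane: which side of the plane the vector falls on.
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0.0)
                 for plane in planes)

def build_index(points, planes):
    # Map each signature to the list of point indices that share it.
    index = defaultdict(list)
    for i, pt in enumerate(points):
        index[signature(pt, planes)].append(i)
    return index

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def query(q, points, index, planes, radius):
    # Candidates share the query's bucket; some may be false positives.
    candidates = index.get(signature(q, planes), [])
    # Filtering step (assumed stand-in): keep candidates within `radius`.
    return [i for i in candidates if l1_distance(points[i], q) <= radius]

points = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
planes = make_hyperplanes(dim=2, n_bits=4)
index = build_index(points, planes)
result = query([1.0, 0.0], points, index, planes, radius=0.1)
```

In this toy setup the filter runs after bucket lookup; the abstract's point is that the filtering is built into the search itself.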

10 citations

Proceedings Article

Evolutionary techniques applied to hashing: an efficient data retrieval method

Daniar Hussain, Steven G. Malliaris

10 Jul 2000

TL;DR: An evolutionary algorithm to locate efficient hashing functions for specific data sets by sampling and evolving from the set of polynomials is presented, showing consistently better performance than other common hashing methods.

Abstract: Hashing is an efficient method for storage and retrieval of large amounts of data. Presented here is an evolutionary algorithm to locate efficient hashing functions for specific data sets by sampling and evolving from the set of polynomials. Functions derived in this way show consistently better performance than other common hashing methods, and indicate the power of evolutionary algorithms in search and retrieval.
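The abstract does not give the encoding, fitness measure, or genetic operators, so the following is only a minimal sketch of the general idea: candidates are polynomial coefficient tuples, fitness is the number of collisions on a fixed key set, and each generation keeps the better half and mutates one coefficient per child. All names and parameters here are hypothetical.

```python
import random

def poly_hash(coeffs, key, table_size):
    # Horner evaluation of the polynomial at `key`, reduced modulo the table size.
    acc = 0
    for c in coeffs:
        acc = (acc * key + c) % table_size
    return acc

def collisions(coeffs, keys, table_size):
    # Assumes `keys` are distinct: every key beyond the first in a slot collides.
    used = {poly_hash(coeffs, k, table_size) for k in keys}
    return len(keys) - len(used)

def evolve(keys, table_size, degree=3, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [tuple(rng.randrange(table_size) for _ in range(degree + 1))
           for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: rank by collision count and keep the better half.
        pop.sort(key=lambda c: collisions(c, keys, table_size))
        survivors = pop[:pop_size // 2]
        # Mutation: each survivor yields a child with one coefficient replaced.
        children = []
        for parent in survivors:
            child = list(parent)
            child[rng.randrange(len(child))] = rng.randrange(table_size)
            children.append(tuple(child))
        pop = survivors + children
    return min(pop, key=lambda c: collisions(c, keys, table_size))

keys = [3, 17, 42, 99, 256, 1024, 4097]
best = evolve(keys, table_size=11)
```

Because survivors are carried over unchanged, the best collision count never gets worse from one generation to the next.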

10 citations

Proceedings Article

Locality-Sensitive Hashing for Massive String-Based Ontology Matching

Michael Cochez¹

Information Technology University¹

11 Aug 2014

TL;DR: Initial research results related to the use of locality-sensitive hashing (LSH) for string-based matching of big ontologies show that using LSH for ontology matching could lead to a very fast matching process.

Abstract: This paper reports initial research results related to the use of locality-sensitive hashing (LSH) for string-based matching of big ontologies. Two ways of transforming the matching problem into an LSH problem are proposed and experimental results are reported. The performed experiments show that using LSH for ontology matching could lead to a very fast matching process. The quality of the alignment achieved in these experiments is comparable to that of state-of-the-art matchers, while the matching itself is much faster. Further research is needed to find out whether the use of different metrics or specific hardware would improve the results.
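The abstract does not spell out its two transformations, so the sketch below shows only one common way to apply LSH to string matching: MinHash signatures over character trigrams, split into bands so that similar labels from two ontologies tend to land in a shared bucket. The function names, the trigram size, and the band and row counts are assumptions made for this example.

```python
import hashlib
from collections import defaultdict

def shingles(text, n=3):
    # Character n-grams of the lowercased label.
    text = text.lower()
    return {text[i:i + n] for i in range(max(1, len(text) - n + 1))}

def _h(seed, token):
    # Seeded 64-bit hash built from BLAKE2b (one hash function per seed).
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash(text, num_hashes=32):
    grams = shingles(text)
    return [min(_h(seed, g) for g in grams) for seed in range(num_hashes)]

def candidate_pairs(labels_a, labels_b, bands=16, rows=2):
    # Each bucket holds labels from ontology A (index 0) and B (index 1).
    buckets = defaultdict(lambda: (set(), set()))
    for side, labels in enumerate((labels_a, labels_b)):
        for label in labels:
            sig = minhash(label, bands * rows)
            for b in range(bands):
                key = (b, tuple(sig[b * rows:(b + 1) * rows]))
                buckets[key][side].add(label)
    # Any A/B pair sharing at least one band bucket becomes a candidate match.
    pairs = set()
    for left, right in buckets.values():
        pairs.update((a, b) for a in left for b in right)
    return pairs

pairs = candidate_pairs(["Heart Valve", "Liver"], ["heart valve", "Kidney"])
```

Candidate pairs would still need a final similarity check; the banding only narrows down which pairs are worth comparing.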

10 citations

Posted Content

Privacy-Preserving Read Mapping Using Locality Sensitive Hashing and Secure Kmer Voting

Popic¹, Serafim Batzoglou¹

Stanford University¹

03 Apr 2016-bioRxiv

TL;DR: BALAUR is a privacy-preserving read mapping algorithm based on locality sensitive hashing and secure kmer voting. It outsources a significant portion of the computation to the public cloud by formulating the alignment task as a voting scheme between encrypted read and reference kmers.

Abstract: The recent explosion in the amount of available genome sequencing data imposes high computational demands on the tools designed to analyze it. Low-cost cloud computing has the potential to alleviate this burden. However, moving personal genome data analysis to the cloud raises serious privacy concerns. Read alignment is a critical and computationally intensive first step of most genomic data analysis pipelines. While significant effort has been dedicated to optimizing the sensitivity and runtime efficiency of this step, few approaches have addressed outsourcing this computation securely to an untrusted party. The few secure solutions that have been proposed either do not scale to whole genome sequencing datasets or are not competitive with the state of the art in read mapping. In this paper, we present BALAUR, a privacy-preserving read mapping algorithm based on locality sensitive hashing and secure kmer voting. BALAUR securely outsources a significant portion of the computation to the public cloud by formulating the alignment task as a voting scheme between encrypted read and reference kmers. Our approach can easily handle typical genome-scale datasets and is highly competitive with non-cryptographic state-of-the-art read aligners in both accuracy and runtime performance on simulated and real read data. Moreover, our approach is significantly faster than state-of-the-art read aligners in long read mapping.
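To make the idea of kmer voting concrete, here is a plaintext sketch in which every kmer of a read votes for the reference start position it implies, and the position with the most votes wins. It deliberately omits the encryption and the LSH-based candidate selection described in the abstract, and the function names and toy sequences are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def index_reference(reference, k):
    # Map every kmer in the reference to the positions where it occurs.
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def map_read(read, index, k):
    # Each matching kmer votes for the read start position it implies.
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if start >= 0:
                votes[start] += 1
    if not votes:
        return None
    # Returns (best start position, number of votes it received).
    return votes.most_common(1)[0]

reference = "ACGTTGCAAGCTTAGGCTA"
index = index_reference(reference, k=4)
hit = map_read("GCAAGCTT", index, k=4)
miss = map_read("TTTTTTTT", index, k=4)
```

Here the read is an exact substring starting at position 5, so all five of its 4-mers vote for the same start.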

10 citations

Journal Article

Large-scale multilabel propagation based on efficient sparse graph construction

Xiangyu Chen¹, Yadong Mu², Hairong Liu³, Shuicheng Yan⁴, Yong Rui⁵, Tat-Seng Chua⁴

Institute for Infocomm Research Singapore¹, Columbia University², Purdue University³, National University of Singapore⁴, Microsoft⁵

27 Dec 2013-ACM Transactions on Multimedia Computing, Communications, and Applications

TL;DR: A novel sparse graph based multilabel propagation (SGMP) scheme for super large scale datasets, which encodes the label information of an image as a unit label confidence vector and naturally imposes inter-label constraints and manipulates labels interactively.

Abstract: With the popularity of photo-sharing websites, the number of web images has grown to an unprecedented magnitude. Annotating such large-scale data would cost a huge amount of human resources and is thus unaffordable. Motivated by this challenging problem, we propose a novel sparse graph based multilabel propagation (SGMP) scheme for super large scale datasets. Both the efficacy and accuracy of the image annotation are further investigated under different graph construction strategies, where Gaussian noise and non-Gaussian sparse noise are simultaneously considered in the formulations of these strategies. Our proposed approach outperforms the state-of-the-art algorithms by focusing on: (1) For large-scale graph construction, a simple yet efficient LSH (Locality Sensitive Hashing)-based sparse graph construction scheme is proposed to speed up the construction. We perform the multilabel propagation on this hashing-based graph construction, which is derived with the LSH approach followed by sparse graph construction within the individual hashing buckets; (2) To further improve the accuracy, we propose a novel sparsity induced scalable graph construction scheme, which is based on a general sparse optimization framework. Sparsity essentially implies a very strong prior: for large scale optimization, the values of most variables shall be zeros when the solution reaches the optimum. By utilizing this prior, the solutions of large-scale sparse optimization problems can be derived by solving a series of much smaller scale subproblems; (3) For multilabel propagation, different from the traditional algorithms that propagate over each label independently, our proposed propagation first encodes the label information of an image as a unit label confidence vector and naturally imposes inter-label constraints and manipulates labels interactively. Then, the entire propagation problem is formulated on the concept of Kullback-Leibler divergence defined on probabilistic distributions, which guides the propagation of the supervision information. Extensive experiments on the benchmark dataset NUS-WIDE with 270k images and its lite version NUS-WIDE-LITE with 56k images well demonstrate the effectiveness and scalability of the proposed multi-label propagation scheme.
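Point (1) of the abstract, building the graph only within LSH buckets, can be sketched as follows. This example assumes random-hyperplane hashing and uses a plain k-nearest-neighbor rule inside each bucket as a stand-in for the paper's sparse graph construction, which the abstract does not detail; all function names and parameters are hypothetical.

```python
import random
from collections import defaultdict

def lsh_buckets(points, n_bits=4, seed=0):
    # Group point indices by their random-hyperplane sign pattern.
    rng = random.Random(seed)
    dim = len(points[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    buckets = defaultdict(list)
    for i, pt in enumerate(points):
        key = tuple(sum(p * v for p, v in zip(plane, pt)) >= 0.0
                    for plane in planes)
        buckets[key].append(i)
    return buckets

def bucket_knn_graph(points, k=2, n_bits=4, seed=0):
    # Neighbors are searched only inside each bucket, never across buckets,
    # which avoids the all-pairs distance computation.
    edges = {}
    for members in lsh_buckets(points, n_bits, seed).values():
        for i in members:
            others = [j for j in members if j != i]
            others.sort(key=lambda j: sum((a - b) ** 2
                                          for a, b in zip(points[i], points[j])))
            edges[i] = others[:k]
    return edges

points = [[1.0, 0.0], [1.0, 0.01], [-1.0, 0.0], [-1.0, -0.01]]
graph = bucket_knn_graph(points, k=1)
# With n_bits=0 every point shares one bucket, giving an ordinary kNN graph.
full = bucket_knn_graph(points, k=1, n_bits=0)
```

The trade-off in this sketch is that true neighbors split across buckets are missed, in exchange for comparing each point only against its bucket mates.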

10 citations

Metrics

Papers: 2,048

Citations: 77,891

No. of papers in the topic in previous years
Year	Papers
2023	43
2022	108
2021	88
2020	110
2019	104
2018	139