Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have been published within this topic, receiving 69,362 citations.

Papers

Posted Content

Query-adaptive Image Retrieval by Deep Weighted Hashing

Jian Zhang¹, Yuxin Peng¹

Peking University¹

08 Dec 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: The query-adaptive deep weighted hashing approach is proposed, which can perform fine-grained ranking for different queries by weighted Hamming distance and outperforms eight state-of-the-art hashing methods.

Abstract: Hashing methods have attracted much attention for large scale image retrieval. Some deep hashing methods have achieved promising results by taking advantage of the strong representation power of deep networks recently. However, existing deep hashing methods treat all hash bits equally. On one hand, a large number of images share the same distance to a query image due to the discrete Hamming distance, which raises a critical issue of image retrieval where fine-grained rankings are very important. On the other hand, different hash bits actually contribute to the image retrieval differently, and treating them equally greatly affects the retrieval accuracy of image. To address the above two problems, we propose the query-adaptive deep weighted hashing (QaDWH) approach, which can perform fine-grained ranking for different queries by weighted Hamming distance. First, a novel deep hashing network is proposed to learn the hash codes and corresponding class-wise weights jointly, so that the learned weights can reflect the importance of different hash bits for different image classes. Second, a query-adaptive image retrieval method is proposed, which rapidly generates hash bit weights for different query images by fusing its semantic probability and the learned class-wise weights. Fine-grained image retrieval is then performed by the weighted Hamming distance, which can provide more accurate ranking than the traditional Hamming distance. Experiments on four widely used datasets show that the proposed approach outperforms eight state-of-the-art hashing methods.
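The difference between plain and weighted Hamming ranking described in this abstract can be illustrated with a minimal sketch. The fusion rule in `query_weights` (a probability-weighted average of class-wise weight vectors) and all data values are assumptions made for illustration; the abstract does not specify the actual fusion formula or network.

```python
def hamming(a, b):
    """Plain Hamming distance: number of differing bit positions."""
    return sum(x != y for x, y in zip(a, b))


def weighted_hamming(a, b, weights):
    """Weighted Hamming distance: sum of the weights at differing positions."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def query_weights(class_probs, class_weights):
    """Hypothetical fusion rule (an assumption, not the paper's formula):
    average the class-wise bit weights, weighted by class probability."""
    n_bits = len(class_weights[0])
    return [
        sum(p * cw[i] for p, cw in zip(class_probs, class_weights))
        for i in range(n_bits)
    ]


# Toy data (made up): 4-bit codes, two classes.
query = [1, 0, 1, 1]
db = {"img_a": [1, 0, 0, 1], "img_b": [0, 0, 1, 1]}
class_weights = [[0.9, 0.1, 0.2, 0.4], [0.1, 0.5, 0.8, 0.3]]
class_probs = [0.75, 0.25]  # predicted class probabilities for the query

w = query_weights(class_probs, class_weights)

# Both candidates are at plain Hamming distance 1 from the query, so the
# plain distance cannot order them; the weighted distance can.
ranking = sorted(db, key=lambda k: weighted_hamming(query, db[k], w))
```

Here the two database codes tie under the plain distance, while the per-bit weights break the tie, which is the fine-grained ranking effect the abstract refers to.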

37 citations

Journal Article

CellAtlasSearch: a scalable search engine for single cells.

Divyanshu Srivastava¹, Arvind Iyer¹, Vibhor Kumar¹, Debarka Sengupta¹

Indraprastha Institute of Information Technology¹

02 Jul 2018-Nucleic Acids Research

TL;DR: CellAtlasSearch is proposed, a novel search architecture for high dimensional expression data, which is massively parallel as well as light-weight, thus infinitely scalable, and aims to assist researchers and clinicians in characterizing unannotated single cells.

Abstract: Owing to the advent of high throughput single cell transcriptomics, the past few years have seen exponential growth in the production of gene expression data. Recently, efforts have been made by various research groups to homogenize and store single cell expression from a large number of studies. The true value of this ever-increasing data deluge can be unlocked by making it searchable. To this end, we propose CellAtlasSearch, a novel search architecture for high dimensional expression data, which is massively parallel as well as light-weight, thus infinitely scalable. In CellAtlasSearch, we use a Graphical Processing Unit (GPU) friendly version of Locality Sensitive Hashing (LSH) for unmatched speedup in data processing and query. Currently, CellAtlasSearch features over 300 000 reference expression profiles including both bulk and single-cell data. It enables the user to query individual single cell transcriptomes and find matching samples from the database along with the necessary meta information. CellAtlasSearch aims to assist researchers and clinicians in characterizing unannotated single cells. It also facilitates noise free, low dimensional representation of single-cell expression profiles by projecting them on a wide variety of reference samples. The web-server is accessible at: http://www.cellatlassearch.com.

37 citations

Journal Article

Spherical LSH for Approximate Nearest Neighbor Search on Unit Hypersphere

Yuzuru Tanaka, Kengo Terasawa

01 Aug 2007-Lecture Notes in Computer Science

37 citations

Journal Article

Locality-sensitive hashing for earthquake detection: a case study of scaling data-driven science

Kexin Rong¹, C. E. Yoon¹, Karianne J. Bergen¹, Hashem Elezabi¹, Peter Bailis¹, Philip Levis¹, Gregory C. Beroza¹

Stanford University¹

01 Jul 2018

TL;DR: This article reports a novel application of Locality Sensitive Hashing (LSH) to seismic data at scale. However, a straightforward implementation of this LSH-enabled application has difficulty scaling beyond 3 months of continuous time series data measured at a single seismic station.

Abstract: In this work, we report on a novel application of Locality Sensitive Hashing (LSH) to seismic data at scale. Based on the high waveform similarity between reoccurring earthquakes, our application identifies potential earthquakes by searching for similar time series segments via LSH. However, a straightforward implementation of this LSH-enabled application has difficulty scaling beyond 3 months of continuous time series data measured at a single seismic station. As a case study of a data-driven science workflow, we illustrate how domain knowledge can be incorporated into the workload to improve both the efficiency and result quality. We describe several end-to-end optimizations of the analysis pipeline from pre-processing to post-processing, which allow the application to scale to time series data measured at multiple seismic stations. Our optimizations enable an over 100× speedup in the end-to-end analysis pipeline. This improved scalability enabled seismologists to perform seismic analysis on more than ten years of continuous time series data from over ten seismic stations, and has directly enabled the discovery of 597 new earthquakes near the Diablo Canyon nuclear power plant in California and 6123 new earthquakes in New Zealand.
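The idea of finding similar time series segments through hash buckets, rather than by comparing every pair of segments, can be sketched as follows. This sketch uses sign random projections (random hyperplane LSH) over raw sliding windows, which is a stand-in chosen for brevity and is not claimed to be the hashing scheme or feature extraction used in the paper; the function names, window width, and toy signal are all hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplanes of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(window, hyperplanes):
    """One bit per hyperplane: which side of the plane the window falls on."""
    return tuple(
        int(sum(h * x for h, x in zip(plane, window)) >= 0.0)
        for plane in hyperplanes
    )


def similar_segment_pairs(series, width, hyperplanes):
    """Bucket every sliding window by signature and return the pairs of
    window start indices that share a bucket (candidate similar segments)."""
    buckets = defaultdict(list)
    for start in range(len(series) - width + 1):
        window = series[start:start + width]
        buckets[signature(window, hyperplanes)].append(start)
    pairs = []
    for starts in buckets.values():
        pairs += [(a, b) for i, a in enumerate(starts) for b in starts[i + 1:]]
    return sorted(pairs)


# Toy signal (made up): a pulse, a short gap, then the same pulse at twice
# the amplitude. The sign of a projection does not change under positive
# scaling, so both occurrences receive the same signature.
pulse = [0.0, 1.0, -2.0, 1.5]
series = pulse + [0.3, -0.1] + [2.0 * v for v in pulse]
planes = make_hyperplanes(n_bits=8, dim=len(pulse))
pairs = similar_segment_pairs(series, len(pulse), planes)
```

Only windows that land in the same bucket are paired, so the cost depends on bucket sizes rather than on the square of the number of windows. Unrelated windows can still collide by chance, so candidate pairs would normally be verified afterwards.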

37 citations

Journal Article

An incremental clustering scheme for data de-duplication

Gianni Costa¹, Giuseppe Manco¹, Riccardo Ortale¹

ICAR-CNR (Institute for High Performance Computing and Networking, National Research Council of Italy)¹

01 Jan 2010-Data Mining and Knowledge Discovery

TL;DR: An incremental technique for discovering duplicates in large databases of textual sequences, i.e., syntactically different tuples that refer to the same real-world entity. The neighbors of a query tuple are efficiently identified by retrieving the tuples that appear in the same buckets as the query tuple itself, without completely scanning the original database.

Abstract: We propose an incremental technique for discovering duplicates in large databases of textual sequences, i.e., syntactically different tuples, that refer to the same real-world entity. The problem is approached from a clustering perspective: given a set of tuples, the objective is to partition them into groups of duplicate tuples. Each newly arrived tuple is assigned to an appropriate cluster via nearest-neighbor classification. This is achieved by means of a suitable hash-based index, that maps any tuple to a set of indexing keys and assigns tuples with high syntactic similarity to the same buckets. Hence, the neighbors of a query tuple can be efficiently identified by simply retrieving those tuples that appear in the same buckets associated to the query tuple itself, without completely scanning the original database. Two alternative schemes for computing indexing keys are discussed and compared. An extensive experimental evaluation on both synthetic and real data shows the effectiveness of our approach.
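The bucket lookup described in this abstract, where each tuple is mapped to a set of indexing keys and neighbors are found by reading only the buckets of the query, can be sketched as below. The key scheme here (MinHash over character 3-grams, split into bands) is a common LSH construction used as an assumption for illustration; the abstract does not say which two key schemes the paper compares, and `BucketIndex`, `index_keys`, and the parameter values are hypothetical.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Set of character k-grams of the lowercased string."""
    t = text.lower()
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}


def _h(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def index_keys(text, bands=8, rows=2):
    """MinHash signature split into bands; each band acts as one indexing key."""
    sh = shingles(text)
    sig = [min(_h(seed, s) for s in sh) for seed in range(bands * rows)]
    return [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)]


class BucketIndex:
    """Hash-based index: tuples sharing an indexing key share a bucket."""

    def __init__(self):
        self.buckets = defaultdict(set)

    def add(self, tuple_id, text):
        """Incrementally insert a newly arrived tuple."""
        for key in index_keys(text):
            self.buckets[key].add(tuple_id)

    def candidates(self, text):
        """Ids that share at least one bucket with the query; no full scan."""
        found = set()
        for key in index_keys(text):
            found |= self.buckets.get(key, set())
        return found


index = BucketIndex()
index.add("t1", "John Smith, 12 Main Street")
exact = index.candidates("John Smith, 12 Main Street")
recased = index.candidates("JOHN SMITH, 12 MAIN STREET")
```

An identical or case-variant string always lands in the same buckets. A near-duplicate with small edits collides with a probability that grows with the overlap of the shingle sets, and the `bands` and `rows` values trade recall against the number of false candidates.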

36 citations

Performance

Metrics

Papers: 2,048
Citations: 77,891

No. of papers in the topic in previous years
Year	Papers
2023	43
2022	108
2021	88
2020	110
2019	104
2018	139