Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications on this topic have received 69,362 citations.


Papers
Proceedings ArticleDOI
13 Oct 2015
TL;DR: A deep self-taught hashing algorithm (DSTH) generates a set of pseudo labels by analyzing the data itself, then learns hash functions for novel data using discriminative deep models, and generalizes to support both supervised and unsupervised cases by adaptively incorporating label information.
Abstract: Hashing algorithms have been widely used to speed up image retrieval due to their compact binary codes and fast distance calculation. Combining hashing with deep learning boosts performance by learning accurate representations and complicated hashing functions. So far, the most striking successes in deep hashing have mostly involved discriminative models, which require labels. To apply deep hashing to datasets without labels, we propose a deep self-taught hashing algorithm (DSTH), which generates a set of pseudo labels by analyzing the data itself and then learns the hash functions for novel data using discriminative deep models. Furthermore, we generalize DSTH to support both supervised and unsupervised cases by adaptively incorporating label information. We use two different deep learning frameworks to train the hash functions to deal with the out-of-sample problem and reduce time complexity without loss of accuracy. We have conducted extensive experiments to investigate different settings of DSTH and compared it with state-of-the-art counterparts on six publicly available datasets. The experimental results show that DSTH outperforms the others on all datasets.

55 citations
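The self-taught recipe above lends itself to a compact illustration. Below is a minimal sketch (not the authors' implementation) in which the pseudo binary labels come from signs of PCA projections of the data itself, and per-bit logistic regressions stand in for the discriminative deep models so novel points can still be hashed out-of-sample; all data, dimensions, and bit counts are hypothetical.

```python
# A minimal sketch of the self-taught idea: pseudo binary labels are derived
# from the data itself (here, signs of PCA projections), and a discriminative
# model is then trained to predict those bits so that novel (out-of-sample)
# points can be hashed.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 64))   # hypothetical feature vectors
X_query = rng.normal(size=(5, 64))

n_bits = 16
# Step 1: self-taught pseudo labels -- signs of the top PCA projections.
pseudo_bits = (PCA(n_components=n_bits).fit_transform(X_train) > 0).astype(int)

# Step 2: learn one discriminative hash function per bit.
hash_fns = [LogisticRegression(max_iter=1000).fit(X_train, pseudo_bits[:, b])
            for b in range(n_bits)]

def hash_codes(X):
    """Binary codes for novel data via the learned hash functions."""
    return np.stack([f.predict(X) for f in hash_fns], axis=1)

print(hash_codes(X_query))
```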

Book ChapterDOI
24 Sep 2012
TL;DR: A very simple and effective strategy for sub-linear time near neighbor search is developed by creating hash tables directly from the bits generated by b-bit minwise hashing.
Abstract: Numerous applications in search, databases, machine learning, and computer vision, can benefit from efficient algorithms for near neighbor search. This paper proposes a simple framework for fast near neighbor search in high-dimensional binary data, which are common in practice (e.g., text). We develop a very simple and effective strategy for sub-linear time near neighbor search, by creating hash tables directly using the bits generated by b-bit minwise hashing. The advantages of our method are demonstrated through thorough comparisons with two strong baselines: spectral hashing and sign (1-bit) random projections.

54 citations
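As a concrete illustration of the strategy, here is a minimal sketch (not the paper's code) that builds a hash table directly from the bits of minwise hashes: each binary data point is treated as a set of indices, k minwise hashes are computed, only the lowest b bits of each are kept, and their concatenation indexes a bucket. The hash family and toy data are assumptions.

```python
# Building a hash table from b-bit minwise hashes: concatenating the lowest
# b bits of k minwise hashes yields a bucket key, so lookups touch only a
# small candidate set instead of the whole database.
import random
from collections import defaultdict

random.seed(0)
B, K = 2, 8                      # b bits kept per hash, k hashes per key
PRIME = (1 << 61) - 1
params = [(random.randrange(1, PRIME), random.randrange(PRIME))
          for _ in range(K)]

def bucket_key(feature_set):
    key = 0
    for a, c in params:
        m = min((a * x + c) % PRIME for x in feature_set)   # minwise hash
        key = (key << B) | (m & ((1 << B) - 1))             # keep lowest b bits
    return key

table = defaultdict(list)
data = [{1, 5, 9, 42}, {1, 5, 9, 40}, {7, 8, 100}]   # toy binary data as sets
for i, s in enumerate(data):
    table[bucket_key(s)].append(i)

query = {1, 5, 9, 42}
print(table[bucket_key(query)])   # candidate near neighbors of the query
```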

Journal ArticleDOI
TL;DR: The paper provides an assessment methodology and a sample implementation called FRASH, a framework to test similarity hashing algorithms, and applies it to the well-known similarity hashing approaches ssdeep and sdhash to show their strengths and weaknesses.

54 citations
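A FRASH-style property test is easy to sketch. The example below checks fragment detection, i.e., whether a similarity digest still matches when only part of the original input survives; it assumes the python-ssdeep bindings (and the underlying libfuzzy) are installed, and is only an illustration of the kind of test FRASH automates.

```python
# Fragment-detection test in the spirit of FRASH: a good similarity hash
# should report a non-trivial match between a file and a fragment of it.
import os
import ssdeep  # assumption: python-ssdeep bindings are available

original = os.urandom(64 * 1024)           # a 64 KiB random test input
fragment = original[: len(original) // 2]  # keep only the first half

h_full = ssdeep.hash(original)
h_frag = ssdeep.hash(fragment)

# compare() returns a 0-100 match score; FRASH aggregates such scores over
# many inputs and fragment sizes to rate an algorithm's strengths.
print(ssdeep.compare(h_full, h_frag))
```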

Proceedings Article
30 Apr 2020
TL;DR: A new framework for building space partitions is developed, reducing the problem to balanced graph partitioning followed by supervised classification; the partitions obtained by Neural LSH consistently outperform partitions found by quantization-based and tree-based methods as well as classic, data-oblivious LSH.
Abstract: Space partitions of $\mathbb{R}^d$ underlie a vast and important class of fast nearest neighbor search (NNS) algorithms. Inspired by recent theoretical work on NNS for general metric spaces (Andoni et al. 2018b,c), we develop a new framework for building space partitions reducing the problem to balanced graph partitioning followed by supervised classification. We instantiate this general approach with the KaHIP graph partitioner (Sanders and Schulz 2013) and neural networks, respectively, to obtain a new partitioning procedure called Neural Locality-Sensitive Hashing (Neural LSH). On several standard benchmarks for NNS (Aumuller et al. 2017), our experiments show that the partitions obtained by Neural LSH consistently outperform partitions found by quantization-based and tree-based methods as well as classic, data-oblivious LSH.

53 citations
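The two-stage recipe is simple to sketch. In the snippet below, KMeans stands in for the balanced KaHIP graph partitioning and a small scikit-learn MLP stands in for the neural network, so this is a hedged illustration of the framework rather than the authors' pipeline; the data, sizes, and multi-probe depth are all hypothetical.

```python
# Neural LSH-style recipe: (1) partition the indexed points, (2) train a
# classifier on the partition labels, (3) route a query to its few most
# probable bins and scan only those candidates.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 32))           # indexed points

# Stage 1: unsupervised space partition (KMeans as a stand-in for KaHIP).
parts = KMeans(n_clusters=16, n_init=5, random_state=0).fit_predict(X)

# Stage 2: supervised classification of the partition labels.
router = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                       random_state=0).fit(X, parts)

query = rng.normal(size=(1, 32))
probs = router.predict_proba(query)[0]
# Probe the most probable bins (multi-probe), then scan only those points.
top_bins = np.argsort(probs)[::-1][:3]
candidates = np.flatnonzero(np.isin(parts, top_bins))
print(len(candidates), "candidates out of", len(X))
```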

Journal ArticleDOI
TL;DR: This article shows how to avoid one-against-all comparisons of a query spectrum against the very large number of peptides generated by in silico digestion of protein sequences in a database, and that the method can also be used effectively for other mass spectra mining applications such as finding clusters of spectra efficiently and accurately.
Abstract: Motivation: Due to recent advances in mass spectrometry technology, there has been an exponential increase in the amount of data generated in the past few years. Database searches have not been able to keep up with this data explosion. Thus, speeding up data searches becomes increasingly important in mass-spectrometry-based applications. Traditional database search methods use one-against-all comparisons of a query spectrum against a very large number of peptides generated from in silico digestion of protein sequences in a database to filter potential candidates, followed by detailed scoring and ranking of the filtered candidates. Results: In this article, we show that we can avoid the one-against-all comparisons. The basic idea is to design a set of hash functions to pre-process the peptides in the database such that, for each query spectrum, we can use the hash functions to find only a small subset of peptide sequences that are most likely to match the spectrum. The construction of each hash function is based on a random spectrum, and the hash value of a peptide is the normalized shared peak counts score (cosine) between the random spectrum and the hypothetical spectrum of the peptide. To implement this idea, we first embed each peptide into a unit vector in a high-dimensional metric space. The random spectrum is represented by a random vector, and we use random vectors to construct a set of hash functions called locality sensitive hashing (LSH) for preprocessing. We demonstrate that our mapping is accurate. We show that our method can filter out >95.65% of the spectra without missing any correct sequences, or gain a 111-fold speedup by filtering out 99.64% of spectra while missing at most 0.19% (2 out of 1014) of the correct sequences. In addition, we show that our method can be effectively used for other mass spectra mining applications such as finding clusters of spectra efficiently and accurately. Contact: tingchen@usc.edu Supplementary information: Supplementary data are available at Bioinformatics online.

53 citations
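Reading the construction as sign random projections (SimHash) over unit-normalized spectra, a minimal sketch looks as follows; the binned spectrum dimension, bit count, and toy database are hypothetical, and this is not the paper's implementation.

```python
# Random-spectrum LSH as sign random projections: each hash bit is the sign
# of the cosine between a unit-normalized spectrum and a random "spectrum"
# vector, so similar spectra tend to land in the same bucket.
import numpy as np

rng = np.random.default_rng(0)
DIM, N_BITS = 200, 12                 # binned m/z dimension (hypothetical)

def embed(spectrum):
    """L2-normalize a binned spectrum into a unit vector."""
    v = np.asarray(spectrum, dtype=float)
    return v / np.linalg.norm(v)

random_spectra = rng.normal(size=(N_BITS, DIM))   # one random vector per bit

def lsh_key(spectrum):
    # Sign of the cosine with each random spectrum gives one hash bit.
    bits = random_spectra @ embed(spectrum) > 0
    return bits.astype(int).tobytes()

# Only peptides landing in the query's bucket are scored in detail,
# avoiding the one-against-all comparison.
db = rng.random(size=(1000, DIM))
buckets = {}
for i, s in enumerate(db):
    buckets.setdefault(lsh_key(s), []).append(i)

query = db[0] + 0.01 * rng.normal(size=DIM)       # a noisy copy of entry 0
print(buckets.get(lsh_key(query), []))
```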


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 84% related
Feature extraction: 111.8K papers, 2.1M citations, 83% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Support vector machine: 73.6K papers, 1.7M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139