
Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over the lifetime, 1894 publications have been published within this topic receiving 69362 citations.


Papers
Proceedings ArticleDOI
11 Dec 2011
TL;DR: This paper bases its semi-supervised hashing on linear discriminant analysis: hash functions are learned so that labeled data maximize the separability between binary codes associated with different classes, while unlabeled data are used for regularization as well as for the balancing condition and pairwise decorrelation of bits.
Abstract: Hashing refers to methods for embedding high-dimensional data into a similarity-preserving low-dimensional Hamming space such that similar objects are indexed by binary codes whose Hamming distances are small. Learning hash functions from data has recently been recognized as a promising approach to approximate nearest neighbor search for high-dimensional data. Most 'learning to hash' methods resort to either unsupervised or supervised learning to determine hash functions. Recently, a semi-supervised learning approach was introduced to hashing, in which pairwise constraints (must-link and cannot-link) from labeled data are leveraged while unlabeled data are used for regularization to avoid over-fitting. In this paper we base our semi-supervised hashing on linear discriminant analysis, where hash functions are learned such that labeled data are used to maximize the separability between binary codes associated with different classes, while unlabeled data are used for regularization as well as for the balancing condition and pairwise decorrelation of bits. The resulting method is referred to as semi-supervised discriminant hashing (SSDH). Numerical experiments on the MNIST and CIFAR-10 datasets demonstrate that our method outperforms existing methods, especially in the case of short binary codes.
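The core idea in the abstract — mapping vectors to binary codes whose Hamming distances track similarity — can be illustrated with a generic random-hyperplane (SimHash-style) sketch. This is a baseline illustration, not the paper's SSDH method; all names and data below are invented for the example:

```python
import random

def make_hyperplanes(n_bits, dim, seed=0):
    """Draw random Gaussian hyperplanes; each contributes one bit of the code."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def binary_code(v, planes):
    """The sign of the projection onto each hyperplane gives one bit."""
    return tuple(1 if sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0 else 0
                 for p in planes)

def hamming(a, b):
    """Number of positions where two binary codes disagree."""
    return sum(x != y for x, y in zip(a, b))

planes = make_hyperplanes(16, 3)
x = [1.0, 0.9, 1.1]
x_near = [1.05, 0.92, 1.0]   # nearly the same direction as x
x_far = [-1.0, 0.5, -2.0]    # points in a very different direction

d_near = hamming(binary_code(x, planes), binary_code(x_near, planes))
d_far = hamming(binary_code(x, planes), binary_code(x_far, planes))
# similar vectors land on codes that differ in far fewer bits
```

The probability that a single bit differs is proportional to the angle between the two vectors, which is what makes the codes similarity-preserving.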

27 citations

Book ChapterDOI
09 Jan 2009
TL;DR: This work proposes Principal Component Hashing (PCH), which exploits the distribution of the stored data; experiments confirm that PCH is faster than ANN and LSH at the same accuracy.
Abstract: Nearest Neighbor (NN) search is a basic algorithm for data mining and machine learning applications. However, accelerating it in high-dimensional space is a difficult problem, so approximate NN search algorithms have been investigated. LSH in particular has attracted attention recently because it has a clear relationship between the relative error ratio and the computational complexity. However, p-stable LSH computes hash values independently of the data distribution, and hence the search sometimes fails or takes considerably long. To address this problem, we propose Principal Component Hashing (PCH), which exploits the distribution of the stored data. Through experiments, we confirmed that PCH is faster than ANN and LSH at the same accuracy.
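The p-stable LSH scheme that the abstract contrasts PCH against is the standard E2LSH-style hash h(v) = floor((a·v + b) / w), where a has Gaussian (2-stable) entries and b is a random offset. A minimal sketch, with invented dimensions and parameters, not the paper's PCH:

```python
import random

def pstable_hash(v, a, b, w):
    """E2LSH-style p-stable hash: floor((a . v + b) / w)."""
    return int((sum(ai * vi for ai, vi in zip(a, v)) + b) // w)

rng = random.Random(42)
dim, w = 4, 4.0
a = [rng.gauss(0, 1) for _ in range(dim)]  # the Gaussian is 2-stable
b = rng.uniform(0, w)                      # random offset in [0, w)

p = [0.10, 0.20, 0.30, 0.40]
q = [0.12, 0.19, 0.31, 0.41]  # close to p in Euclidean distance
# The projections of p and q differ by much less than the bucket width w,
# so their bucket indices differ by at most 1 (usually they are equal).
h_p, h_q = pstable_hash(p, a, b, w), pstable_hash(q, a, b, w)
```

Note that the hyperplane directions in `a` are drawn without looking at the data — exactly the property the paper criticizes and that PCH replaces with data-dependent (principal component) directions.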

27 citations

Proceedings ArticleDOI
18 Mar 2005
TL;DR: In this paper, a new scheme for fast video retrieval is proposed in which a video is represented by a set of feature vectors computed using the robust alpha-trimmed average color histogram, and locality-sensitive hashing is applied to retrieve videos efficiently.
Abstract: In this paper, a new scheme for fast video retrieval is proposed. In the scheme, a video is represented by a set of feature vectors which are computed using the robust alpha-trimmed average color histogram. To efficiently retrieve videos, the locality sensitive hashing technique, which involves a uniform distance shrinking projection, is applied. Such a technique does not suffer from the notorious "curse of dimensionality" problem in handling high-dimensional data point sets and guarantees that geometrically close vectors are hashed to the same bucket with high probability. In addition, unlike the conventional techniques, the involved similarity measure incorporates the temporal order of video sequences. The experimental results demonstrate that the proposed scheme outperforms the conventional approaches in accuracy and efficiency.
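The "alpha-trimmed average color histogram" mentioned above can be sketched as a per-bin trimmed mean over a video's frame histograms; trimming discards extreme values so outlier frames (e.g., a flash) do not distort the signature. This is a hedged reconstruction from the name alone — the paper's exact definition may differ, and the data below are invented:

```python
def alpha_trimmed_mean(values, alpha=0.1):
    """Drop the lowest and highest alpha fraction of values, average the rest."""
    s = sorted(values)
    k = int(len(s) * alpha)
    kept = s[k:len(s) - k] if len(s) - 2 * k > 0 else s
    return sum(kept) / len(kept)

def video_signature(frame_histograms, alpha=0.1):
    """Per-bin alpha-trimmed mean over a video's frame color histograms."""
    n_bins = len(frame_histograms[0])
    return [alpha_trimmed_mean([h[b] for h in frame_histograms], alpha)
            for b in range(n_bins)]

frames = [[0.20, 0.50, 0.30],
          [0.25, 0.45, 0.30],
          [0.90, 0.05, 0.05],   # outlier frame (e.g., a camera flash)
          [0.22, 0.48, 0.30]]
sig = video_signature(frames, alpha=0.25)
# the outlier frame's extreme bin values are trimmed away
```

The resulting signature vectors are what would then be indexed with LSH, so that geometrically close signatures hash to the same bucket with high probability.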

27 citations

Journal ArticleDOI
01 Jun 1988
TL;DR: This paper considers the problem of achieving analytical performance of hashing techniques in practice with reference to successful search lengths, unsuccessful search lengths and the expected worst case performance (expected length of the longest probe sequence).
Abstract: Much of the literature on hashing deals with overflow-handling (collision resolution) techniques and their analysis. What do all these analytical results mean in practice, and how can they be achieved with practical files? This paper considers the problem of achieving the analytical performance of hashing techniques in practice with reference to successful search lengths, unsuccessful search lengths, and the expected worst-case performance (the expected length of the longest probe sequence). There has been no previous attempt to explicitly link the analytical results to the performance of real-life files, and previously reported experimental results deal mostly with successful search lengths. We show why the well-known division method performs “well” under a specific model of selecting the test file. We formulate and justify a hypothesis that, by choosing functions from a particular class of hashing functions, the analytical performance can be obtained in practice on real-life files. The experimental results presented strongly support this hypothesis. Several interesting open problems are mentioned in the conclusion.
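The quantities the abstract studies — probe sequence lengths under the division method — can be reproduced with a small sketch (division hashing plus linear probing; the table size, keys, and helper names are invented for illustration):

```python
def div_hash(key, m):
    """Division method: h(k) = k mod m, with m usually chosen prime."""
    return key % m

def insert(table, key):
    """Insert with linear probing; returns the number of probes used."""
    m = len(table)
    i = div_hash(key, m)
    probes = 1
    while table[i] is not None:
        i = (i + 1) % m   # step to the next slot, wrapping around
        probes += 1
    table[i] = key
    return probes

m = 11  # a prime table size
table = [None] * m
# 15, 26, and 37 all hash to slot 4 (mod 11), forming a probe chain
probe_counts = [insert(table, k) for k in [15, 26, 37, 7, 20]]
avg_successful = sum(probe_counts) / len(probe_counts)
longest = max(probe_counts)  # length of the longest probe sequence
```

Here `probe_counts` corresponds to successful search lengths (a key is found after the same number of probes used to insert it), and `longest` to the worst-case measure the paper analyzes.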

27 citations

Journal ArticleDOI
Zhenyu Wu1, Ming Zou1
TL;DR: Tag assignments stream clustering (TASC), an incremental, scalable community-detection method based on locality-sensitive hashing, is proposed; results indicate that TASC detects communities more efficiently and effectively.

27 citations


Network Information
Related Topics (5)
Deep learning
79.8K papers, 2.1M citations
84% related
Feature extraction
111.8K papers, 2.1M citations
83% related
Convolutional neural network
74.7K papers, 2M citations
83% related
Feature (computer vision)
128.2K papers, 1.7M citations
82% related
Support vector machine
73.6K papers, 1.7M citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139