Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have been published within this topic, receiving 69,362 citations.


Papers
Journal ArticleDOI
TL;DR: In implementing the method described in [1], several misprints were discovered in Algorithms R and T which will cause them to fail to perform their intended function.
Abstract: In addition to the above, possible difficulties in implementing Algorithm T may be avoided by the following comments. The symbol \ in step (a) denotes set difference. Also, although δt in step (c2) is an element of the set of kth differences, when used in step (e), δj is an element of the set of first differences as defined on page 842 of [1]. Finally, note that the value of d output in step (g) of Algorithm R corresponds only to the final rotation value discovered by Algorithm T, the others having already been output by step (j) of T. Now, if only one remainder-reduction phf is desired, it is more efficient to modify step (j) of T to determine d and exit; R then outputs the first rather than the last phf determined by T. When output of more than one phf is desired, the part of step (g) in R that outputs M, N, d, and q more properly belongs in step (j) of T, and R should not exit at step (g) until enough phfs have been found. Explicitly:
In Algorithm R: (g) [rotation reduction phf] apply Algorithm T to qI mod M; if sufficient rotation-reduction phfs have been found, exit, else continue.
In Algorithm T: (j) [find rotation value] output M, N, d ← s mod M, and q.
In this case, step (j) of T must not exit to step (g) of R to output the phf for each rotation reduction, because then only the first rotation value for each phf would be found.
In implementing the method described in [1], several misprints were discovered in Algorithms R and T which will cause them to fail to perform their intended function. The steps in error may be corrected as follows.
In Algorithm R: 1) (e) [loop on M] if M < N⌊n/a⌋, then go to step (c), else go to step (j); 2) (i) [loop on q] if q ≠ 1 and q ≠ M − 1 and q′ ≤ K, go to step (d).
In Algorithm T: 1) (c3) [update V] V ← V ∩ {i ∈ [1, n] | [i, i + n − k − 1] ⊆ D} (D has to contain (n − k) consecutive differences satisfying (2.2)); 2) (j) [find rotation value] output d ← s mod M.
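For concreteness, the sketch below mirrors only the corrected control flow described above: a stand-in for Algorithm T enumerates rotation values d (with d ← s mod M), and a stand-in for Algorithm R keeps looping rather than exiting at step (g) until enough phfs have been found. The remainder-reduction form h(k) = ((q·k + d) mod M) div N, the choice of N, and all identifiers are assumptions for illustration; [1] is not reproduced here, so this is not the authors' exact algorithm.

```python
# Hypothetical sketch of the corrected R/T control flow; the hash form
# h(k) = ((q*k + d) % M) // N is assumed from context, and the q' <= K
# bound from step (i) is omitted.

def rotation_values(keys, q, M, N):
    """Stand-in for Algorithm T: yield every d = s mod M for which
    k -> ((q*k + d) % M) // N is injective on `keys` (a perfect hash)."""
    for s in range(M):
        d = s % M
        slots = {((q * k + d) % M) // N for k in keys}
        if len(slots) == len(keys):        # no collisions: d is a rotation value
            yield d

def remainder_reduction_phfs(keys, wanted):
    """Stand-in for Algorithm R: as corrected, it does not exit at step (g)
    until `wanted` phfs have been found."""
    found = []
    n = len(keys)
    M = n
    while len(found) < wanted:
        M += 1
        N = max(M // n, 1)                 # illustrative choice of N
        for q in range(2, M - 1):          # per corrected step (i): q != 1, q != M-1
            for d in rotation_values(keys, q, M, N):
                found.append((M, N, d, q)) # corrected step (j) outputs M, N, d, q
                if len(found) >= wanted:
                    return found
    return found

print(remainder_reduction_phfs([3, 17, 24, 38], wanted=2))
```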

9 citations

Proceedings ArticleDOI
14 May 2012
TL;DR: A batch-mode active learning algorithm for efficient training of kNN classifiers that substantially reduces the amount of training required, and uses locality-sensitive hashing for fast retrieval of nearest neighbors during both active selection and classification, allowing real-time classification with large datasets.
Abstract: Fast image recognition and classification is extremely important in various robotics applications such as exploration, rescue, and localization. k-nearest neighbor (kNN) classifiers are popular tools for classification since they involve no explicit training phase and are simple to implement. However, they often require large amounts of training data to work well in practice. In this paper, we propose a batch-mode active learning algorithm for efficient training of kNN classifiers that substantially reduces the amount of training required. As opposed to much previous work on iterative single-sample active selection, the proposed system selects samples in batches. We propose a coverage formulation that enforces the selected samples to be distributed such that, given the training budget, every data point has a labeled sample within a bounded maximum distance, so that there are labeled neighbors in a small neighborhood of each point. Using submodular function optimization, the proposed algorithm presents a near-optimal selection strategy for an otherwise intractable problem. Further, we employ uncertainty sampling along with coverage to incorporate model information and improve classification. Finally, we use locality-sensitive hashing for fast retrieval of nearest neighbors during active selection as well as classification, which provides one to two orders of magnitude of speedup, allowing real-time classification with large datasets.
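The LSH component here is the standard trick of bucketing points so that a kNN query inspects only a few candidates instead of the whole training set. The sketch below uses random-hyperplane (cosine) LSH with multiple tables; the paper does not pin down this exact hash family, so the class name and parameters are illustrative.

```python
# Minimal multi-table random-hyperplane LSH index for approximate kNN.
import numpy as np

class CosineLSH:
    def __init__(self, dim, n_bits=16, n_tables=8, seed=0):
        rng = np.random.default_rng(seed)
        # One set of random hyperplanes per table; the sign pattern of the
        # projections is the bucket key.
        self.planes = rng.normal(size=(n_tables, n_bits, dim))
        self.tables = [dict() for _ in range(n_tables)]
        self.points = []

    def _keys(self, x):
        return [tuple((p @ x > 0).astype(int)) for p in self.planes]

    def add(self, x):
        idx = len(self.points)
        self.points.append(x)
        for table, key in zip(self.tables, self._keys(x)):
            table.setdefault(key, []).append(idx)

    def query(self, x, k=5):
        # Union of candidates from all tables, then exact re-ranking.
        cand = {i for t, key in zip(self.tables, self._keys(x))
                for i in t.get(key, [])}
        return sorted(cand, key=lambda i: np.linalg.norm(self.points[i] - x))[:k]

# Usage: index training samples once; each query then touches a few buckets.
X = np.random.default_rng(1).normal(size=(10000, 64))
index = CosineLSH(dim=64)
for x in X:
    index.add(x)
print(index.query(X[0], k=5))
```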

9 citations

Journal ArticleDOI
TL;DR: An adaptation of the ReliefF algorithm that simplifies the costliest of its steps by approximating the nearest-neighbor graph using locality-sensitive hashing (LSH), and which can process data sets that are too large for the original ReliefF.
Abstract: Feature selection algorithms, such as ReliefF, are very important for processing high-dimensionality data sets. However, widespread use of such popular and effective algorithms is limited by their computational cost. We describe an adaptation of the ReliefF algorithm that simplifies the costliest of its steps by approximating the nearest-neighbor graph using locality-sensitive hashing (LSH). The resulting ReliefF-LSH algorithm can process data sets that are too large for the original ReliefF, a capability further enhanced by a distributed implementation in Apache Spark. Furthermore, ReliefF-LSH obtains better results and is more generally applicable than currently available alternatives to the original ReliefF, as it can handle regression and multiclass data sets. Since it requires no additional hyperparameters with respect to ReliefF, it also avoids costly tuning. A set of experiments demonstrates the validity of this new approach and confirms its good scalability.
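The core idea is that ReliefF's per-instance search for nearest hits and misses, normally a scan over all n instances, can be restricted to the instance's LSH bucket. A rough sketch under that reading, with an assumed Euclidean (p-stable-style) bucketing and a simplified weight update; none of this is the paper's code:

```python
# Sketch: ReliefF with neighbors approximated from LSH buckets.
import numpy as np

def lsh_buckets(X, n_proj=8, width=1.0, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n_proj, X.shape[1]))
    b = rng.uniform(0, width, size=n_proj)
    keys = np.floor((X @ A.T + b) / width).astype(int)  # quantized projections
    buckets = {}
    for i, key in enumerate(map(tuple, keys)):
        buckets.setdefault(key, []).append(i)
    return keys, buckets

def relieff_lsh(X, y, k=5):
    n, d = X.shape
    keys, buckets = lsh_buckets(X)
    w = np.zeros(d)
    for i in range(n):
        # Candidate neighbors come only from i's bucket, not the full data set.
        cand = [j for j in buckets[tuple(keys[i])] if j != i]
        if not cand:
            continue
        cand = sorted(cand, key=lambda j: np.linalg.norm(X[j] - X[i]))
        hits = [j for j in cand if y[j] == y[i]][:k]    # nearest same-class
        misses = [j for j in cand if y[j] != y[i]][:k]  # nearest other-class
        # Simplified ReliefF update (class-prior weighting omitted):
        for j in hits:
            w -= np.abs(X[i] - X[j]) / (n * max(len(hits), 1))
        for j in misses:
            w += np.abs(X[i] - X[j]) / (n * max(len(misses), 1))
    return w  # higher weight = more relevant feature
```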

9 citations

Proceedings ArticleDOI
Ke Xia, Yuqing Ma, Xianglong Liu, Yadong Mu, Li Liu
19 Oct 2017
TL;DR: This work proposes an unsupervised temporal binary coding solution that simultaneously considers the intrinsic relations among the visual content and the temporal consistency among successive frames, and devises an alternating optimization algorithm that enjoys fast training and yields discriminative hash functions.
Abstract: Recent years have witnessed the success of emerging hash-based approximate nearest neighbor search techniques in large-scale image retrieval. However, for large-scale video search, most existing hashing methods focus mainly on the visual content of still frames, without considering their temporal relations. They therefore suffer greatly from an insufficient capability to capture intrinsic video similarities, in both the visual and the temporal aspects. To address this problem, we propose an unsupervised temporal binary coding solution that simultaneously considers the intrinsic relations among the visual content and the temporal consistency among successive frames. To capture the inherent data similarities among videos, we adopt sparse, nonnegative features to characterize the common local visual content and approximate the intrinsic similarities using a low-rank matrix. A standard graph-based loss is then adopted to guarantee that the learnt hash codes preserve these similarities well. Furthermore, we introduce a subspace rotation to model the small variation between successive frames, thus essentially preserving the temporal consistency in Hamming space. Finally, we formulate video hashing as a joint learning of the binary codes, the hash functions, and the temporal variation, and devise an alternating optimization algorithm that enjoys fast training and yields discriminative hash functions. Extensive experiments on three large video datasets demonstrate that the proposed method significantly outperforms a number of state-of-the-art hashing methods.
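Abstractly, such an alternating scheme cycles between binarizing projected features, smoothing codes over the similarity graph, fitting the inter-frame rotation, and refitting the hash projection. The numpy sketch below shows only this general shape; the update rules, the low-rank similarity approximation, and all names are assumptions rather than the paper's algorithm.

```python
# Loose sketch of alternating optimization for temporal binary coding.
import numpy as np

def temporal_binary_codes(X, S, n_bits=32, n_iter=20, lam=0.5, seed=0):
    """X: (n_frames, d) nonnegative sparse features, ordered in time.
    S: (n, n) frame-similarity graph. Returns projection W and codes B."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(size=(d, n_bits))
    R = np.eye(n_bits)  # subspace rotation modeling successive-frame variation
    for _ in range(n_iter):
        B = np.sign(X @ W)                 # fix W, R: binarize projections
        # Graph step: pull each code toward its similarity-weighted neighbors.
        B = np.sign(S @ B + 1e-9)
        # Temporal step: orthogonal Procrustes fit of a rotation aligning
        # codes of consecutive frames (SVD-based).
        U, _, Vt = np.linalg.svd(B[:-1].T @ B[1:])
        R = U @ Vt
        # Refit the projection W by least squares to rotation-smoothed codes.
        target = B + lam * np.vstack([B[:1], B[:-1] @ R])
        W, *_ = np.linalg.lstsq(X, target, rcond=None)
    return W, np.sign(X @ W)
```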

8 citations

Book ChapterDOI
21 Sep 2018
TL;DR: Experimental results on face recognition and video de-duplication demonstrate that the proposed method significantly outperforms state-of-the-art vector and subspace hashing methods, in terms of both accuracy and efficiency.
Abstract: Finding the nearest subspace is a fundamental problem in many tasks such as recognition, retrieval, and optimization. This hard problem has seldom been touched in the literature, except for a very few studies that address it using locality-sensitive hashing for subspaces. The existing solutions suffer severely from poor scaling, with expensive computational cost or unsatisfying accuracy, when the subspaces have arbitrary dimensions. To address these problems, this paper proposes a new and efficient family of locality-sensitive hash functions for linear subspaces of arbitrary dimension. It preserves the angular distances among subspaces by randomly projecting their orthonormal bases and further encoding them with binary codes. The proposed method enjoys fast computation while maintaining a strong collision probability, and it offers the flexibility to easily balance accuracy against efficiency. Experimental results on face recognition and video de-duplication demonstrate that the proposed method significantly outperforms state-of-the-art vector and subspace hashing methods, in terms of both accuracy and efficiency (up to 16× speedup).
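One plausible instantiation of "randomly projecting the orthonormal basis and encoding with binary codes" hashes each subspace through its projector U Uᵀ, which is invariant to the choice of basis and defined for any subspace dimension. The sketch below is a speculative reading of that recipe, not the paper's exact construction; all names are illustrative.

```python
# Sketch: binary codes for linear subspaces via random bilinear projections.
import numpy as np

def subspace_code(U, G, H):
    """U: (d, k) orthonormal basis of a k-dim subspace of R^d.
    G, H: (n_bits, d) fixed random Gaussian matrices shared by all subspaces.
    Bit b compares two random directions through the projector U U^T, which
    depends only on the subspace, not on the particular basis chosen."""
    P = U @ U.T                                   # projector onto the subspace
    return (np.einsum('bd,de,be->b', G, P, H) > 0).astype(np.uint8)

rng = np.random.default_rng(0)
d, n_bits = 64, 128
G = rng.normal(size=(n_bits, d))
H = rng.normal(size=(n_bits, d))

# Two nearby subspaces (small perturbation) vs. an unrelated one:
A = np.linalg.qr(rng.normal(size=(d, 3)))[0]
B = np.linalg.qr(A + 0.05 * rng.normal(size=(d, 3)))[0]
C = np.linalg.qr(rng.normal(size=(d, 5)))[0]      # different dimension is fine
print(np.mean(subspace_code(A, G, H) != subspace_code(B, G, H)))  # small
print(np.mean(subspace_code(A, G, H) != subspace_code(C, G, H)))  # roughly 0.5
```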

8 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 84% related
Feature extraction: 111.8K papers, 2.1M citations, 83% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Support vector machine: 73.6K papers, 1.7M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139