
Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have been published within this topic, receiving 69,362 citations.


Papers
Journal Article
TL;DR: A novel latent factor model called latent collaborative relations (LCR) is proposed, which transforms the recommendation problem into a nearest neighbor search problem by using the proposed scoring function, and provides an elegant way to incorporate locality sensitive hashing for fast recommendation while retaining recommendation accuracy and coverage.
Abstract: We devise a recommendation algorithm based on a latent factor model. It combines latent factors and the ℓ2 norm to formulate the recommendation problem as a k-nearest-neighbor problem, in which locality sensitive hashing (LSH) is further used to reduce search time complexity. Retrieval is sped up by 5x to 313x on the three data sets used in the experiments. One important property of collaborative filtering recommender systems is that popular items are recommended disproportionately often because they provide extensive usage data and, thus, can be recommended to more users. Niche products, however, can be as economically attractive as mainstream fare for online retailers. Online retailers can stock virtually everything, and the number of available niche products exceeds the hits by several orders of magnitude. This work addresses accuracy, coverage and prediction time issues by proposing a novel latent factor model called latent collaborative relations (LCR), which transforms the recommendation problem into a nearest neighbor search problem by using the proposed scoring function. We project users and items to the latent space and calculate their similarities based on the Euclidean metric. Additionally, the proposed model provides an elegant way to incorporate locality sensitive hashing (LSH) for fast recommendation while retaining recommendation accuracy and coverage. The experimental results indicate that the speedup is significant, especially when one is confronted with large-scale data sets. As for recommendation accuracy and coverage, the proposed method is competitive on three data sets.
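The idea of treating recommendation as Euclidean nearest neighbor search over latent vectors, accelerated by LSH, can be illustrated with a minimal sketch. This is not the LCR model or its scoring function; it assumes the latent vectors are already learned and uses the standard p-stable hash floor((a.v + b) / width). The class name `EuclideanLSH` and all parameter values are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, width, rng):
    """One p-stable LSH function for Euclidean distance: floor((a.v + b) / width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)
    return lambda v: math.floor((sum(x * y for x, y in zip(a, v)) + b) / width)


class EuclideanLSH:
    """Hypothetical index over item latent vectors (assumed to be given)."""

    def __init__(self, dim, num_tables=8, hashes_per_table=4, width=1.0, seed=0):
        rng = random.Random(seed)
        # Each table concatenates several hash values into one bucket key.
        self.funcs = [[make_hash(dim, width, rng) for _ in range(hashes_per_table)]
                      for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.vectors = {}

    def _keys(self, v):
        return [tuple(h(v) for h in hs) for hs in self.funcs]

    def add(self, item_id, v):
        self.vectors[item_id] = v
        for table, key in zip(self.tables, self._keys(v)):
            table[key].append(item_id)

    def query(self, v, k=5):
        # Collect items sharing a bucket with the query in any table,
        # then rank only those candidates by exact Euclidean distance.
        candidates = set()
        for table, key in zip(self.tables, self._keys(v)):
            candidates.update(table.get(key, ()))
        return sorted(candidates, key=lambda i: math.dist(v, self.vectors[i]))[:k]
```

A user's latent vector would be passed to `query` to retrieve nearby items; more tables raise recall at the cost of memory, and more hashes per table shrink the candidate set.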

18 citations

Journal Article
TL;DR: A scalable image retrieval framework that efficiently supports content similarity search and semantic search in a distributed environment is proposed, and it is shown that the approach yields a high recall rate with good load balance and requires only a small number of hops.
Abstract: The emergence of cloud datacenters enhances the capability of online data storage. Since massive amounts of data are stored in datacenters, it is necessary to effectively locate and access data of interest in such a distributed system. However, traditional search techniques only allow users to search images by exact-match keywords through a centralized index. These techniques cannot satisfy the requirements of content-based image retrieval (CBIR). In this paper, we propose a scalable image retrieval framework that efficiently supports content similarity search and semantic search in a distributed environment. Its key idea is to integrate image feature vectors into distributed hash tables (DHTs) by exploiting the properties of locality sensitive hashing (LSH). Thus, images with similar content are most likely gathered onto the same node without knowledge of any global information. For searching semantically close images, relevance feedback is adopted in our system to bridge the gap between low-level features and high-level semantics. We show that our approach yields a high recall rate with good load balance and requires only a small number of hops.
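As a rough illustration of how an LSH key can place similar feature vectors on the same node, the sketch below uses sign random projections and a simple modulo in place of a real DHT identifier space. It is an assumption-laden toy, not the framework described in the abstract; `make_projections`, `lsh_key` and `node_for` are hypothetical names.

```python
import random


def make_projections(dim, num_bits, seed=0):
    """Random Gaussian directions; every node must share the same seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_key(vector, projections):
    """Sign-random-projection key: one bit per direction."""
    key = 0
    for a in projections:
        dot = sum(x * y for x, y in zip(a, vector))
        key = (key << 1) | (1 if dot >= 0 else 0)
    return key


def node_for(vector, projections, num_nodes):
    """Stand-in for a DHT lookup: vectors with the same key land on the same node."""
    return lsh_key(vector, projections) % num_nodes
```

Because the key depends only on the vector and the shared projections, any node can compute where an image belongs without global information. A real DHT would map the key into its own identifier space rather than use a modulo.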

18 citations

Proceedings Article
01 Dec 2016
TL;DR: A framework of Sparse Ternary Codes (STC) is proposed, resulting in a sparse but robust representation and sub-linear search complexity, and is compared with Locality Sensitive Hashing and memory vectors on several large-scale synthetic and public image databases, showing its superiority.
Abstract: We consider the problem of fast content identification in high-dimensional feature spaces where a sub-linear search complexity is required. By formulating the problem as sparse approximation of projected coefficients, a closed-form solution can be found, which we approximate as a ternary representation. Hence, as opposed to dense binary codes, a framework of Sparse Ternary Codes (STC) is proposed, resulting in a sparse but robust representation and sub-linear search complexity. The proposed method is compared with Locality Sensitive Hashing (LSH) and memory vectors on several large-scale synthetic and public image databases, showing its superiority.
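To make the notion of a ternary representation concrete, the following sketch projects a vector and thresholds the coefficients to values in {-1, 0, +1}. The thresholding rule, the similarity measure and the function names are assumptions chosen for illustration; the abstract does not give the paper's closed-form solution.

```python
def project(vector, matrix):
    """Linear projection: one coefficient per row of the (assumed given) matrix."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def ternarize(coeffs, threshold):
    """Keep only the sign of large coefficients; small ones become 0 (sparsity)."""
    return [1 if c > threshold else -1 if c < -threshold else 0 for c in coeffs]


def code_similarity(code_a, code_b):
    """Inner product of two ternary codes; only shared non-zero positions count."""
    return sum(a * b for a, b in zip(code_a, code_b))
```

A larger threshold gives sparser codes, so fewer positions need to be compared at search time.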

18 citations

Proceedings Article
26 Aug 2015
TL;DR: This work proposes a novel LSH-based inverted index scheme and designs an efficient search algorithm, called H-c2kNN, which enables fast high-dimensional kNN search with excellent quality and low space cost, and implements this approach using MapReduce.
Abstract: Finding the k-Nearest Neighbors (kNN) of a query object in a given dataset S is a primitive operation in many application domains. kNN search is very costly, especially as many applications witness a rapid increase in the amount and dimensionality of the data to be processed. Locality sensitive hashing (LSH) has become a very popular method for this problem. However, most such methods cannot achieve good search quality, search efficiency and space cost at the same time; RankReduce, for example, gains search efficiency at the expense of search quality. Motivated by this, we propose a novel LSH-based inverted index scheme and design an efficient search algorithm, called H-c2kNN, which enables fast high-dimensional kNN search with excellent quality and low space cost. For efficiency and scalability, we implemented the proposed approach for kNN search in high-dimensional space using MapReduce, a well-known framework for data-intensive applications, and conducted extensive experiments on both synthetic and real datasets. The results show that the proposed approach outperforms baseline methods in high-dimensional space.
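One generic way to combine LSH with an inverted index is to keep a posting list per (hash function, bucket) pair and treat points that collide with the query under many hash functions as candidates. The single-machine sketch below illustrates that collision-counting idea only; it is not H-c2kNN or its MapReduce implementation, and every name and parameter value is hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def bucket(a, b, width, v):
    """p-stable LSH bucket for Euclidean distance."""
    return math.floor((sum(x * y for x, y in zip(a, v)) + b) / width)


def build_index(points, num_hashes=16, width=2.0, seed=0):
    """Inverted index: (hash function number, bucket) -> list of point ids."""
    rng = random.Random(seed)
    dim = len(next(iter(points.values())))
    funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
             for _ in range(num_hashes)]
    postings = defaultdict(list)
    for pid, v in points.items():
        for j, (a, b) in enumerate(funcs):
            postings[(j, bucket(a, b, width, v))].append(pid)
    return funcs, postings, width


def knn(query, points, index, k=3, min_collisions=8):
    """Count collisions per point, keep frequent colliders, rank by true distance."""
    funcs, postings, width = index
    counts = Counter()
    for j, (a, b) in enumerate(funcs):
        counts.update(postings.get((j, bucket(a, b, width, query)), ()))
    candidates = [pid for pid, c in counts.items() if c >= min_collisions]
    return sorted(candidates, key=lambda pid: math.dist(query, points[pid]))[:k]
```

Raising `min_collisions` shrinks the candidate set and speeds up the final ranking, at the risk of missing true neighbors.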

18 citations

Journal Article
TL;DR: This study proposes an approach called Randomized Distributed Hashing (RDH), which uses Locality Sensitive Hashing (LSH) in a distributed scheme and is promising for searching images in large datasets across multiple nodes.

18 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 84% related
Feature extraction: 111.8K papers, 2.1M citations, 83% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Support vector machine: 73.6K papers, 1.7M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139