Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over the lifetime, 1894 publications have been published within this topic receiving 69362 citations.


Papers
Proceedings ArticleDOI
21 Oct 2013
TL;DR: Support Vector Machine (SVM) based Data Redundancy Elimination for Data Aggregation in WSN (SDRE) is proposed; it minimizes data redundancy and eliminates false data to improve the performance of the WSN.
Abstract: Data aggregation is especially important in Wireless Sensor Networks (WSN) because of their resource constraints. Dense deployment causes a great deal of data redundancy in a WSN, so the redundancy must be minimized by suitable aggregation techniques. To resolve this problem, Support Vector Machine (SVM) based Data Redundancy Elimination for Data Aggregation in WSN (SDRE) is proposed in this work. First, an aggregation tree is built for the given size of the sensor network. Then, the SVM method is applied on the tree to eliminate redundant data. Locality Sensitive Hashing (LSH) is used to minimize data redundancy and to eliminate false data based on similarity. The LSH codes are sent to the aggregation supervisor node, which finds the sensor nodes that hold the same data and selects only one of them to send the actual data. The benefit of this approach is that it minimizes redundancy and eliminates false data, improving the performance of the WSN. The performance of the proposed approach is measured using network parameters such as delay, energy, packet drops and overheads. SDRE performs better in all scenarios, across different network sizes and varying data rates.

26 citations
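The paper's code is not reproduced here, but the grouping step its abstract describes is easy to illustrate. Below is a minimal sketch, assuming random-hyperplane (SimHash-style) LSH codes: the aggregation supervisor buckets sensors whose readings hash to the same code and polls only one node per bucket. All names, sizes and the simulated readings are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch (not the paper's code): random-hyperplane LSH codes let
# an aggregation supervisor group sensors that report near-identical data.
import numpy as np

rng = np.random.default_rng(0)

def lsh_code(reading: np.ndarray, planes: np.ndarray) -> str:
    """One bit per random hyperplane: 1 if the reading lies on its positive side."""
    return "".join("1" if d >= 0 else "0" for d in planes @ reading)

n_bits, dim = 8, 4                     # hypothetical code length / reading size
planes = rng.standard_normal((n_bits, dim))

# Simulated readings from five sensor nodes; nodes 0-2 see almost the same data.
readings = {f"node{i}": rng.standard_normal(dim) for i in range(5)}
readings["node1"] = readings["node0"] + 1e-3
readings["node2"] = readings["node0"] - 1e-3

# Supervisor: bucket nodes by LSH code, ask only one node per bucket for raw data.
buckets: dict[str, list[str]] = {}
for node, x in readings.items():
    buckets.setdefault(lsh_code(x, planes), []).append(node)

for code, nodes in buckets.items():
    print(code, "->", nodes[0], "sends data;", len(nodes) - 1, "suppressed")
```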

Proceedings ArticleDOI
01 Jun 2014
TL;DR: A two-stage unsupervised hashing framework is proposed which harmoniously integrates two state-of-the-art hashing algorithms, Locality Sensitive Hashing and Iterative Quantization; it capitalizes on both term and topic similarity among documents, leading to precise document retrieval.
Abstract: This work achieves sublinear-time Nearest Neighbor Search (NNS) in massive-scale document collections. The primary contribution is a two-stage unsupervised hashing framework which harmoniously integrates two state-of-the-art hashing algorithms: Locality Sensitive Hashing (LSH) and Iterative Quantization (ITQ). LSH accounts for neighbor candidate pruning, while ITQ provides an efficient and effective reranking over the neighbor pool captured by LSH. Furthermore, the proposed hashing framework capitalizes on both term and topic similarity among documents, leading to precise document retrieval. The experimental results convincingly show that our hashing-based document retrieval approach well approximates the conventional Information Retrieval (IR) method in terms of retrieving semantically similar documents, while achieving a speedup of over one order of magnitude in query time.

26 citations
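As a rough illustration of the two-stage idea (coarse LSH pruning of candidates, then a finer reranking pass), here is a hedged sketch in which exact cosine similarity stands in for the paper's ITQ reranking; the single-table LSH design and all dimensions are assumptions made for brevity, not the authors' configuration.

```python
# Sketch of the two-stage pipeline: LSH buckets prune candidates, then a finer
# scoring pass reranks them. Exact cosine similarity replaces ITQ here for brevity.
import numpy as np

rng = np.random.default_rng(1)
dim, n_docs, n_bits = 64, 10_000, 16

docs = rng.standard_normal((n_docs, dim))
planes = rng.standard_normal((n_bits, dim))

def code(v: np.ndarray) -> bytes:             # stage 1: coarse LSH signature
    return ((planes @ v) >= 0).tobytes()

index: dict[bytes, list[int]] = {}
for i, d in enumerate(docs):
    index.setdefault(code(d), []).append(i)

def search(q: np.ndarray, k: int = 5):
    cand = index.get(code(q), [])             # prune: only same-bucket docs
    if not cand:
        return []
    sims = docs[cand] @ q / (np.linalg.norm(docs[cand], axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)[:k]             # stage 2: rerank the survivors
    return [(cand[j], float(sims[j])) for j in order]

print(search(docs[42]))                       # doc 42 should rank itself first
```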

Proceedings ArticleDOI
Guoqiang Zhong, Hui Xu, Pan Yang, Sijiang Wang, Junyu Dong
24 Jul 2016
TL;DR: This paper proposes a supervised hashing method based on a well-designed deep convolutional neural network, which learns hash codes and compact representations of the data simultaneously.
Abstract: Hashing-based methods seek compact and efficient binary codes that preserve the similarity between data. In most existing hashing methods, an input (e.g., an image) is first encoded as a vector of hand-crafted visual features, followed by a hash projection and quantization step to obtain the compact binary vector. Because most hand-crafted features encode only low-level information of the input, they may not preserve the semantic similarity of pairwise inputs. Meanwhile, the hash function learning process is independent of the feature representation, so the features may not be optimal for the hash projection. In this paper, we propose a supervised hashing method based on a well-designed deep convolutional neural network, which learns hash codes and compact representations of the data simultaneously. In particular, the proposed model learns binary codes by adding a compact sigmoid layer before the classifier layer. Experiments on several image data sets show that the proposed model outperforms other state-of-the-art hashing approaches.

26 citations
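The abstract's key architectural idea, a compact sigmoid layer inserted between the feature extractor and the classifier so that binary-like codes are learned jointly with the classification objective, can be sketched in a few lines of PyTorch. The backbone, layer sizes and bit count below are illustrative assumptions, not the paper's configuration.

```python
# Minimal PyTorch sketch of the idea: a compact sigmoid "hash" layer is placed
# between the feature extractor and the classifier, so binary-like codes and
# the classification objective are learned jointly. All sizes are illustrative.
import torch
import torch.nn as nn

class DeepHashNet(nn.Module):
    def __init__(self, n_bits: int = 48, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(             # small stand-in CNN backbone
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        self.hash_layer = nn.Sequential(           # compact sigmoid layer
            nn.Linear(64 * 8 * 8, n_bits), nn.Sigmoid(),
        )
        self.classifier = nn.Linear(n_bits, n_classes)

    def forward(self, x):
        h = self.hash_layer(self.features(x))      # values in (0, 1)
        return self.classifier(h), h

model = DeepHashNet()
logits, h = model(torch.randn(4, 3, 32, 32))       # batch of 32x32 RGB images
codes = (h > 0.5).int()                            # threshold to binary codes
print(logits.shape, codes.shape)                   # -> (4, 10), (4, 48)
```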

Proceedings ArticleDOI
11 Jul 2021
TL;DR: Three benchmark datasets for explanation ranking (EXTRA) are presented, built with a method based on Locality Sensitive Hashing (LSH) that detects near-duplicate sentences in sub-linear time for a given query.
Abstract: Recently, research on explainable recommender systems has drawn much attention from both academia and industry, resulting in a variety of explainable models. As a consequence, their evaluation approaches vary from model to model, which makes it quite difficult to compare the explainability of different models. To achieve a standard way of evaluating recommendation explanations, we provide three benchmark datasets for EXplanaTion RAnking (denoted as EXTRA), on which explainability can be measured by ranking-oriented metrics. Constructing such datasets, however, poses great challenges. First, user-item-explanation triplet interactions are rare in existing recommender systems, so finding alternatives becomes a challenge. Our solution is to identify nearly identical sentences from user reviews. This idea leads to the second challenge, i.e., how to efficiently group the sentences in a dataset, since estimating the similarity between every pair of sentences has quadratic runtime complexity. To mitigate this issue, we provide a more efficient method based on Locality Sensitive Hashing (LSH) that can detect near-duplicates in sub-linear time for a given query. Moreover, we make our code publicly available to allow researchers in the community to create their own datasets.

26 citations
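To make the near-duplicate detection step concrete, here is a minimal MinHash-based LSH sketch over word shingles: sentences whose banded signatures collide in at least one band become candidate duplicates, with no all-pairs comparison. This is a generic construction assumed for illustration, not the EXTRA authors' implementation; the permutation count, band/row split and shingle size are all arbitrary choices.

```python
# Sketch of LSH near-duplicate detection over sentences (not the EXTRA code):
# MinHash signatures over word shingles, banded so that similar sentences
# collide in at least one band and are found without comparing every pair.
import hashlib
from collections import defaultdict

N_PERM, BANDS = 64, 32                     # 32 bands x 2 rows, illustrative
ROWS = N_PERM // BANDS

def shingles(s: str, k: int = 3) -> set[str]:
    w = s.lower().split()
    return {" ".join(w[i:i + k]) for i in range(max(1, len(w) - k + 1))}

def minhash(sh: set[str]) -> list[int]:
    sig = []
    for seed in range(N_PERM):             # one simulated permutation per seed
        sig.append(min(int(hashlib.md5(f"{seed}:{g}".encode()).hexdigest(), 16)
                       for g in sh))
    return sig

sentences = [
    "the battery life of this phone is great",
    "battery life of this phone is really great",
    "the camera quality is disappointing",
]

tables = [defaultdict(list) for _ in range(BANDS)]
for idx, s in enumerate(sentences):
    sig = minhash(shingles(s))
    for b in range(BANDS):
        band = tuple(sig[b * ROWS:(b + 1) * ROWS])
        tables[b][band].append(idx)        # same band value -> same bucket

pairs = {tuple(sorted((i, j)))
         for t in tables for bucket in t.values()
         for i in bucket for j in bucket if i != j}
print("candidate near-duplicates:", pairs)  # sentences 0 and 1 should collide
```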

Proceedings Article
12 Dec 2011
TL;DR: This paper considers a new framework that applies supervised learning to directly optimize a data structure that supports efficient large-scale search, and it significantly outperforms the state-of-the-art learning-to-hash methods as well as state-of-the-art high-dimensional search algorithms.
Abstract: High-dimensional similarity search in large-scale databases has become an important challenge with the advent of the Internet. For such applications, specialized data structures are required to achieve computational efficiency. Traditional approaches relied on algorithmic constructions that are often data-independent (such as Locality Sensitive Hashing) or weakly dependent (such as kd-trees and k-means trees). While supervised learning algorithms have been applied to related problems, those proposed in the literature mainly focused on learning hash codes optimized for compact embedding of the data rather than for search efficiency; such an embedding has to be used with a linear scan or another search algorithm, so learning to hash does not directly address search efficiency. This paper considers a new framework that applies supervised learning to directly optimize a data structure that supports efficient large-scale search. Our approach takes both search quality and computational cost into consideration. Specifically, we learn a boosted search forest that is optimized using pairwise similarity-labeled examples. The output of this search forest can be efficiently converted into an inverted-indexing data structure, which can leverage modern text-search infrastructure to achieve both scalability and efficiency. Experimental results show that our approach significantly outperforms the state-of-the-art learning-to-hash methods (such as spectral hashing), as well as state-of-the-art high-dimensional search algorithms (such as LSH and k-means trees).

26 citations
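The conversion the abstract describes, from a search forest to a text-style inverted index, can be sketched by treating each (tree, leaf) id as an index term. The sketch below substitutes unsupervised random-hyperplane trees for the paper's learned, boosted forest, so it shows only the indexing mechanics, not the learning procedure; every parameter is an illustrative assumption.

```python
# Sketch of the forest-to-inverted-index conversion: each tree routes a vector
# to a leaf, each (tree, leaf) id acts like a "term", and an inverted index
# maps terms to posting lists of vector ids. Random-hyperplane trees stand in
# for the paper's learned, boosted search forest.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
dim, depth, n_trees, n_vecs = 32, 6, 4, 5000
data = rng.standard_normal((n_vecs, dim))

# One random hyperplane per level per tree (a crude stand-in for learned splits).
forests = rng.standard_normal((n_trees, depth, dim))

def leaf_id(tree: int, v: np.ndarray) -> tuple[int, int]:
    bits = (forests[tree] @ v) >= 0            # route left/right at each level
    return tree, int(np.packbits(bits, bitorder="little")[0])

index: dict[tuple[int, int], list[int]] = defaultdict(list)
for i, v in enumerate(data):
    for t in range(n_trees):
        index[leaf_id(t, v)].append(i)         # posting list per (tree, leaf)

def candidates(q: np.ndarray) -> set[int]:
    return {i for t in range(n_trees) for i in index[leaf_id(t, q)]}

cand = candidates(data[7])
print(len(cand), "candidates instead of", n_vecs, "; contains query:", 7 in cand)
```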


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations (84% related)
Feature extraction: 111.8K papers, 2.1M citations (83% related)
Convolutional neural network: 74.7K papers, 2M citations (83% related)
Feature (computer vision): 128.2K papers, 1.7M citations (82% related)
Support vector machine: 73.6K papers, 1.7M citations (82% related)
Performance
Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139