Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have been published within this topic, receiving 69,362 citations.


Papers
Proceedings Article
03 Aug 2013
TL;DR: The experimental results show that it is not necessary to update all hash bits in order to adapt the model to new input data, and that the proposed methods obtain better or similar performance without sacrificing much accuracy compared with the batch-mode update.
Abstract: Recent years have witnessed the growing popularity of hash function learning for large-scale data search. Although most existing hashing-based methods have been proven to obtain high accuracy, they are regarded as passive hashing and assume that the labelled points are provided in advance. In this paper, we consider updating a hashing model upon gradually increased labelled data in a fast response to users, called smart hashing update (SHU). In order to get a fast response to users, SHU aims to select a small set of hash functions to relearn and only updates the corresponding hash bits of all data points. More specifically, we put forward two selection methods for performing efficient and effective update. In order to reduce the response time for acquiring a stable hashing algorithm, we also propose an accelerated method in order to further reduce interactions between users and the computer. We evaluate our proposals on two benchmark data sets. Our experimental results show it is not necessary to update all hash bits in order to adapt the model to new input data, and meanwhile we obtain better or similar performance without sacrificing much accuracy against the batch mode update.

12 citations

Proceedings ArticleDOI
22 Jun 2015
TL;DR: A novel Semantic-aware Hashing method (SaH) is proposed by discovering knowledge from these social media resources to implement approximate similarity search by exploiting heterogeneous information from the textual and visual domains.
Abstract: With the proliferation of large-scale social images, recent years have witnessed an increasing number of images with user-provided tags, which has led to considerable effort on hashing-based approximate nearest neighbor (ANN) search in huge databases. In this work, we propose a novel Semantic-aware Hashing method (SaH) by discovering knowledge from these social media resources to implement approximate similarity search. Different from previous work, the proposed method learns semantic hashing codes by exploiting heterogeneous information from the textual and visual domains. The semantic structure in the textual domain is well preserved to learn the binary codes. To handle the noisy, incomplete, or subjective user-provided tags, the visual structure is also leveraged. In addition, an information-theoretic regularization is exploited by using the maximum entropy principle, and a row-wise sparse model with an ℓ2,p (0 < p ≤ 1) mixed norm is introduced to filter certain noisy or redundant visual features. Experiments are conducted on a widely used social image dataset, and the comparison results demonstrate the superior performance of the proposed SaH method over state-of-the-art hashing techniques.

12 citations

Proceedings ArticleDOI
Zhou Yu1, Fei Wu1, Yin Zhang1, Siliang Tang1, Jian Shao1, Yueting Zhuang1 
03 Jul 2014
TL;DR: In this paper, the hashing problem is considered from the perspective of optimizing a list-wise learning-to-rank problem, and an approach called List-Wise supervised Hashing (LWH) is proposed, which obtains a significant improvement over state-of-the-art hashing approaches due to both its structural large margin and its pursuit of list-wise ranking in a supervised manner.
Abstract: Hashing techniques have been extensively investigated to boost similarity search for large-scale high-dimensional data. Most of the existing approaches formulate their objective as a pair-wise similarity-preserving problem. In this paper, we consider the hashing problem from the perspective of optimizing a list-wise learning-to-rank problem and propose an approach called List-Wise supervised Hashing (LWH). In LWH, the hash functions are optimized by employing structural SVM in order to explicitly minimize the ranking loss of the whole list-wise permutations instead of merely the point-wise or pair-wise supervision. We evaluate the performance of LWH on two real-world data sets. Experimental results demonstrate that our method obtains a significant improvement over state-of-the-art hashing approaches due to both its structural large margin and its pursuit of list-wise ranking in a supervised manner.

12 citations

Patent
01 Jul 2015
TL;DR: A device and a method for detecting distributed malicious codes on the basis of textures are presented. The device and the method have the advantage that unknown malicious codes, and the types of the unknown malicious codes, can be detected.
Abstract: The invention discloses a device and a method for detecting distributed malicious codes on the basis of textures. The device comprises a texture fingerprint extracting unit, a Bloom-Filter index structure building unit, a distributed LSH (locality sensitive hashing) index structure building unit and a distributed variant detecting unit. The texture fingerprint extracting unit is used for generating vector sets of texture fingerprints of the malicious codes according to PE (portable executable) files of the malicious codes and extracting vectors of texture fingerprints of to-be-detected samples; the Bloom-Filter index structure building unit is used for mapping the vector sets of the texture fingerprints of the malicious codes into Bloom-Filter index structures; the distributed LSH index structure building unit is used for building distributed LSH index structures; the distributed variant detecting unit is used for creating target query sets when a precision detecting unit is missed, computing locality sensitive hash values, machine identification and hash bucket identification of the target query sets, finding vectors of the texture fingerprints of the malicious codes in the distributed LSH index structures according to computation results and obtaining detection results by means of comparison. The device and the method have the advantage that unknown malicious codes and the types of the unknown malicious codes can be detected by the aid of the device and the method.

12 citations
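
The patent abstract above describes mapping known fingerprint vectors into a Bloom-Filter index for exact lookups before falling back to an LSH index. As a rough illustration of the Bloom-filter step only, here is a minimal sketch; it is not the patented device. The class name, bit-array size, number of hash functions, and the byte encoding of a fingerprint are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        # num_bits and num_hashes are illustrative defaults, not values from the patent.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: bytes):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: bytes):
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical texture fingerprint, encoded as raw bytes for the example.
known_fingerprint = bytes([12, 200, 7, 33])
known = BloomFilter()
known.add(known_fingerprint)
is_known = known_fingerprint in known
```

A hit in the filter means the fingerprint was probably seen before; a miss is definite, which is the case where an approximate index such as LSH would be consulted to look for near-duplicates.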

Proceedings ArticleDOI
01 Aug 2016
TL;DR: This work applies the sublinear time, scalable locality-sensitive hashing (LSH) and majority discrimination to the problem of predicting critical events based on physiological waveform time series and demonstrates that the deterioration of accuracy due to approximation at the retrieval step of LSH has a diminishing impact on the prediction accuracy as the speed up gain accelerates.
Abstract: We apply the sublinear time, scalable locality-sensitive hashing (LSH) and majority discrimination to the problem of predicting critical events based on physiological waveform time series. Compared to using the linear exhaustive k-nearest neighbor search, our proposed method speeds up prediction by up to 25 times while sacrificing only 1% of accuracy when demonstrated on an arterial blood pressure dataset extracted from the MIMIC2 database. We compare two widely used variants of LSH, the bit sampling based (L1LSH) and the random projection based (E2LSH) methods, to measure their direct impact on retrieval and prediction accuracy. We experimentally show that the more sophisticated E2LSH performs worse than L1LSH in terms of accuracy, correlation, and the ability to detect false negatives. We attribute this to E2LSH's simultaneous integration of all dimensions when hashing the data, which actually makes it less robust to common noise sources such as data misalignment. We also demonstrate that the deterioration of accuracy due to approximation at the retrieval step of LSH has a diminishing impact on the prediction accuracy as the speed-up gain accelerates.

12 citations
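
The entry above combines LSH candidate retrieval with a majority vote over the retrieved neighbors. The following is a minimal sketch of that general idea using the standard random-projection hash h(x) = floor((a·x + b) / w). It is not the authors' implementation: the class name `E2LSHIndex`, the number of tables, hashes per table, bucket width, and the toy labels are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter, defaultdict


class E2LSHIndex:
    """Random-projection LSH index with majority-vote prediction (illustrative)."""

    def __init__(self, dim, num_tables=8, hashes_per_table=4, width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.points, self.labels = [], []
        self.tables = []
        for _ in range(num_tables):
            # Each hash function is a Gaussian direction a and an offset b in [0, w).
            funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)],
                      rng.uniform(0.0, width))
                     for _ in range(hashes_per_table)]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, x):
        # Concatenate floor((a . x + b) / w) over the table's hash functions.
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.width)
            for a, b in funcs)

    def add(self, x, label):
        idx = len(self.points)
        self.points.append(x)
        self.labels.append(label)
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, x)].append(idx)

    def predict(self, x, k=5):
        # Collect candidates that share a bucket with x in at least one table.
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, x), ()))
        if not candidates:
            return None  # no colliding point; caller decides on a fallback
        # Rank only the candidates by exact distance, then take a majority vote.
        nearest = sorted(candidates, key=lambda i: math.dist(self.points[i], x))[:k]
        return Counter(self.labels[i] for i in nearest).most_common(1)[0][0]


index = E2LSHIndex(dim=2)
index.add([0.0, 0.0], "stable")      # toy labels, not from the paper
index.add([100.0, 100.0], "event")
prediction = index.predict([0.0, 0.0], k=1)
```

Exact distances are computed only for points that collide with the query in some table, which is where the speed-up over exhaustive k-nearest-neighbor search comes from; the price is that true neighbors falling in other buckets are missed.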


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 84% related
Feature extraction: 111.8K papers, 2.1M citations, 83% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Support vector machine: 73.6K papers, 1.7M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139