Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over the lifetime, 1894 publications have been published within this topic receiving 69362 citations.


Papers
Proceedings ArticleDOI
Zehua Zhao, Min Gao, Fengji Luo, Yi Zhang, Qingyu Xiong
19 Jul 2020
TL;DR: Empirical experiments demonstrate the effectiveness of LSHWE in cyberbullying detection, particularly on the "deliberately obfuscated words" problem; LSHWE is also highly efficient, representing tens of thousands of words in a few minutes on a typical single machine.
Abstract: Word embedding methods use low-dimensional vectors to represent the words in a corpus. Such vectors can capture lexical semantics and greatly improve cyberbullying detection performance. However, existing word embedding methods have a major limitation in the cyberbullying detection task: they cannot represent "deliberately obfuscated words" well, i.e., spellings users substitute for bullying words in order to evade detection. These obfuscated words are often treated as "rare words" with little contextual information and are removed during preprocessing. In this paper, we propose a word embedding method called LSHWE to address this limitation, based on the idea that deliberately obfuscated words have high context similarity with their corresponding bullying words. LSHWE has two steps: first, it generates the nearest-neighbor matrix from the co-occurrence matrix and the nearest-neighbor list obtained by Locality Sensitive Hashing (LSH); second, it uses an LSH-based autoencoder to learn word representations from these two matrices. In particular, the reconstructed nearest-neighbor matrix generated by the LSH-based autoencoder makes the representations of deliberately obfuscated words close to those of their corresponding bullying words. To improve efficiency, LSHWE uses LSH to generate both the nearest-neighbor list and the reconstructed nearest-neighbor list. Empirical experiments demonstrate the effectiveness of LSHWE in cyberbullying detection, particularly on the "deliberately obfuscated words" problem. Moreover, LSHWE is highly efficient: it can represent tens of thousands of words in a few minutes on a typical single machine.
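The LSH step the paper relies on can be illustrated with a minimal random-hyperplane sketch (my own toy example, not the paper's implementation): vectors with similar contexts receive identical bit signatures and land in the same bucket, so near neighbors can be found without comparing every pair.

```python
import numpy as np

def lsh_signatures(vectors, n_bits=16, seed=0):
    """Hash each row with random hyperplanes: bit = sign of dot product."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((vectors.shape[1], n_bits))
    return (vectors @ planes > 0).astype(np.uint8)

def buckets(signatures):
    """Group row indices by identical bit signature (one LSH table)."""
    table = {}
    for i, sig in enumerate(signatures):
        table.setdefault(sig.tobytes(), []).append(i)
    return table

# Two near-identical "context vectors" (e.g. an obfuscated word and its
# bullying counterpart) agree on far more hash bits than an unrelated one.
rng = np.random.default_rng(1)
base = rng.standard_normal(64)
vecs = np.stack([base,
                 base + 1e-3 * rng.standard_normal(64),  # near-duplicate
                 rng.standard_normal(64)])               # unrelated
sigs = lsh_signatures(vecs)
table = buckets(sigs)
```

For cosine similarity, the probability that two vectors agree on a given bit is 1 minus their angle over pi, which is what makes this hash "locality sensitive".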

7 citations

Proceedings ArticleDOI
15 Jul 2013
TL;DR: This paper proposes an alternative hardware-assisted search algorithm estimated to provide ≥0.95 recall on a 1-trillion feature vector database within 700μs at <150W, even when the hashing bit error rate (BER) is 20% with 1-bit quantization.
Abstract: Content-based search, such as audio/video fingerprinting, identifies a piece of query content by matching its perceptual features against those from a database of reference content. Such matching is challenging in both scalability and robustness, even with state-of-the-art methods like Locality Sensitive Hashing (LSH). Previously, Vote Count, a hardware-assisted algorithm, was proposed to provide such scalability and robustness. We have analyzed this algorithm and found that it would consume very high power, to the point of making cooling impractical. In this paper, we propose an alternative hardware-assisted search algorithm that is estimated to use very low power while providing scalability and robustness. It is estimated to provide ≥0.95 recall on a 1-trillion feature vector (~23M hours of video at a 12fps signature rate) database within 700μs at <150W, even when the hashing bit error rate (BER) is 20% with 1-bit quantization. This amounts to over 1000× power and energy savings compared with highly competitive configurations of LSH, at lower expected system cost and a saving of millions of dollars per year in electricity cost alone.
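The core matching problem the hardware accelerates can be sketched in a few lines (a toy illustration under my own assumptions, not the paper's algorithm): with 1-bit-quantized signatures, a query whose bits are corrupted at a 20% error rate is still far closer in Hamming distance to its true reference than to any unrelated entry.

```python
import random

def hamming(a, b):
    """Number of differing bits between two equal-length signatures."""
    return bin(a ^ b).count("1")

N_BITS = 128
rng = random.Random(42)

# A toy "reference database" of 1-bit-quantized feature signatures.
database = [rng.getrandbits(N_BITS) for _ in range(1000)]

# Corrupt one reference with a 20% bit error rate to simulate the query.
src = 123
query = database[src]
for i in range(N_BITS):
    if rng.random() < 0.20:
        query ^= 1 << i

# Exhaustive nearest-neighbour search by Hamming distance still recovers
# the source: ~20% of bits flip, while unrelated signatures differ in ~50%.
best = min(range(len(database)), key=lambda i: hamming(query, database[i]))
```

The paper's contribution is doing this kind of noisy match at trillion-entry scale within a microsecond-level budget; the brute-force loop above is only the functional specification of what must be computed.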

7 citations

Proceedings ArticleDOI
25 Oct 2010
TL;DR: Experimental results show that the method significantly reduces the training time of the best-learner search procedure, while its performance remains comparable with state-of-the-art methods.
Abstract: AdaBoost has proved to be a successful statistical learning method for concept detection, with strong discrimination and generalization performance. However, training a concept detector with boosting is computationally expensive, especially on large-scale datasets. The bottleneck of the training phase is selecting the best learner among a massive set of learners. Traditional approaches to selecting a weak classifier usually run in O(NT), with N examples and T learners. In this paper, we treat best-learner selection as a Nearest Neighbor Search problem in function space instead of feature space. With the help of the Locality Sensitive Hashing (LSH) algorithm, the best-learner search can be sped up to O(NL), where L is the number of buckets in LSH. In our experiments, L (~600) is much smaller than T (~500,000). In addition, by studying the distribution of weak learners and candidate query points, we present an efficient method that partitions the weak-learner points and the feasible region of query points as uniformly as possible, achieving significant improvements in both recall and precision over the random projection used in the traditional LSH algorithm. Experimental results show that our method significantly reduces training time while remaining comparable with state-of-the-art methods.
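The speed-up idea can be sketched as follows (a minimal illustration under my own assumptions; the learner "points" here are random vectors standing in for responses in function space): instead of scoring all T learners against the boosting target, hash everything into LSH buckets and score only the learners that share the target's bucket.

```python
import numpy as np

rng = np.random.default_rng(0)
N_LEARNERS, DIM, BITS = 5000, 32, 12

# Each weak learner is a point in "function space" (e.g. its vector of
# responses on the training examples; random here for illustration).
learners = rng.standard_normal((N_LEARNERS, DIM))
target = learners[17].copy()  # the ideal response we want to approximate

# One LSH table of random hyperplanes: only learners in the target's
# bucket are scored, instead of scanning all T of them.
planes = rng.standard_normal((DIM, BITS))
codes = learners @ planes > 0
target_code = target @ planes > 0

candidates = np.flatnonzero((codes == target_code).all(axis=1))
best = candidates[np.argmin(
    np.linalg.norm(learners[candidates] - target, axis=1))]
```

With 12 hash bits there are up to 4096 buckets, so the candidate list is orders of magnitude smaller than the full learner set, which is the O(NT) to O(NL) reduction the abstract describes.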

7 citations

Proceedings ArticleDOI
19 Jul 2010
TL;DR: Experimental results confirm that the proposed hashing method shows robustness against geometrical and topological attacks and provides a unique hash for each model and key.
Abstract: In this paper, a robust 3D mesh hashing method based on a key-dependent 3D surface feature is developed. The main objectives of the proposed hashing method are robustness against content-preserving attacks and blind detection without any attack-specific preprocessing. To achieve these objectives, the method projects all vertices to the shape coordinates of 3D SSD and curvedness, then segments the shape coordinates into rectangular blocks and computes the block shape intensity using a permutation key and a random key. A hash is generated by binarizing the block shape intensity. Experimental results confirm that the proposed hashing method is robust against geometrical and topological attacks and provides a unique hash for each model and key.

7 citations

Journal ArticleDOI
TL;DR: A modification of an existing incremental algorithm, probability-based incremental association rule discovery, that reduces both the number of scans over the original database and the number of candidate itemsets generated for frequent and expected frequent 2-itemsets, yielding faster execution than previous methods.
Abstract: Discovery of association rules is one of the most interesting areas of research in data mining, extracting co-occurrences of itemsets. In a dynamic database where new transactions are inserted, keeping patterns up to date and discovering new patterns are challenging problems of great practical importance: insertions may introduce new association rules and invalidate some existing ones. It is therefore important to study efficient algorithms for incremental update of association rules in large databases. In this paper, we modify an existing incremental algorithm, probability-based incremental association rule discovery, which uses the principle of Bernoulli trials to find frequent and expected frequent k-itemsets. The frequent and expected frequent k-itemsets are determined from candidate k-itemsets, and generating and testing this candidate set is a time-consuming step. To reduce the number of candidate 2-itemsets and avoid repeatedly scanning the database to check a large candidate set, we apply a hash technique to the generation of candidate 2-itemsets, especially the frequent and expected frequent 2-itemsets, to improve the performance of the probability-based algorithm. The algorithm thus reduces both the number of scans over the original database and the number of candidate itemsets generated for frequent and expected frequent 2-itemsets. As a result, it runs faster than previous methods. We also conduct simulation experiments, and the results show that the proposed algorithm performs well.
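The hash-based pruning of candidate 2-itemsets can be illustrated with a small PCY/DHP-style sketch (my own toy example in that spirit, not this paper's exact incremental algorithm): while counting single items on the first pass, every pair is also hashed into a small bucket array, and on the second pass a pair is a candidate only if both its items are frequent and its bucket count reaches the support threshold.

```python
from itertools import combinations
from collections import Counter

transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"},
    {"b", "c"}, {"a", "b", "c"}, {"d", "e"},
]
MIN_SUP = 3
N_BUCKETS = 7

# Pass 1: count single items AND hash every pair into a small bucket array.
item_count = Counter()
bucket_count = [0] * N_BUCKETS
for t in transactions:
    item_count.update(t)
    for pair in combinations(sorted(t), 2):
        bucket_count[hash(pair) % N_BUCKETS] += 1

frequent_items = {i for i, c in item_count.items() if c >= MIN_SUP}

# Pass 2: a pair is a candidate only if both items are frequent AND its
# bucket is frequent -- the hash table prunes pairs before any rescan.
candidates = {
    pair
    for t in transactions
    for pair in combinations(sorted(t & frequent_items), 2)
    if bucket_count[hash(pair) % N_BUCKETS] >= MIN_SUP
}
support = Counter(
    pair for t in transactions
    for pair in combinations(sorted(t), 2) if pair in candidates
)
frequent_pairs = {p for p, c in support.items() if c >= MIN_SUP}
```

Bucket counts only over-estimate pair support (collisions add counts, never remove them), so the pruning is safe: no truly frequent pair is ever discarded.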

7 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations (84% related)
Feature extraction: 111.8K papers, 2.1M citations (83% related)
Convolutional neural network: 74.7K papers, 2M citations (83% related)
Feature (computer vision): 128.2K papers, 1.7M citations (82% related)
Support vector machine: 73.6K papers, 1.7M citations (82% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139