
Locality-sensitive hashing

About: Locality-sensitive hashing (LSH) is a technique that hashes similar input items into the same buckets with high probability, making it a core building block for approximate nearest neighbor search. Over its lifetime, 1,894 publications have appeared within this topic, receiving 69,362 citations.


Papers
Journal ArticleDOI
03 Apr 2020
TL;DR: This paper proposes a framework named ourmodel, which factors in the stochasticity of LSH hash functions when learning real-valued user and item latent vectors, eventually improving the recommendation accuracy after LSH indexing.
Abstract: Locality Sensitive Hashing (LSH) has become one of the most commonly used approximate nearest neighbor search techniques to avoid the prohibitive cost of scanning through all data points. For recommender systems, LSH achieves efficient recommendation retrieval by encoding user and item vectors into binary hash codes, reducing the cost of exhaustively examining all the item vectors to identify the top-k items. However, conventional matrix factorization models may suffer from performance degeneration caused by randomly drawn LSH hash functions, directly affecting the ultimate quality of the recommendations. In this paper, we propose a framework named ourmodel, which factors in the stochasticity of LSH hash functions when learning real-valued user and item latent vectors, eventually improving the recommendation accuracy after LSH indexing. Experiments on publicly available datasets show that the proposed framework not only effectively learns users' preferences for prediction, but also achieves high compatibility with LSH stochasticity, producing superior post-LSH indexing performance compared to state-of-the-art baselines.
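As a rough illustration of the retrieval step described above, the sketch below encodes user and item vectors into binary codes with random-hyperplane (SimHash) LSH and ranks items by Hamming distance. The dimensions, bit width, and random vectors are illustrative stand-ins, not the paper's learned embeddings or training procedure.

```python
# Sketch: random-hyperplane (SimHash) LSH for top-k item retrieval.
# All vectors are random stand-ins for learned user/item latent vectors.
import numpy as np

rng = np.random.default_rng(0)
d, n_items, n_bits = 32, 10_000, 16

item_vecs = rng.normal(size=(n_items, d))  # stand-in for item latent vectors
user_vec = rng.normal(size=d)              # stand-in for a user latent vector

# Randomly drawn hyperplanes define the hash; this randomness is the
# "stochasticity" the paper accounts for during training.
planes = rng.normal(size=(n_bits, d))

def hash_code(v):
    """Encode a real-valued vector as an n_bits binary LSH code."""
    return (planes @ v > 0).astype(np.uint8)

item_codes = np.array([hash_code(v) for v in item_vecs])
user_code = hash_code(user_vec)

# Rank items by Hamming distance to the user's code instead of
# exhaustively scoring all real-valued item vectors.
hamming = (item_codes != user_code).sum(axis=1)
print(np.argsort(hamming)[:10])  # indices of the top-10 candidate items
```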

8 citations

Proceedings ArticleDOI
TL;DR: In this article, the authors present a data structure for spherical range reporting on a point set, i.e., reporting all points in the point set that lie within a given radius $r$ of a given query point.
Abstract: We present a data structure for *spherical range reporting* on a point set $S$, i.e., reporting all points in $S$ that lie within radius $r$ of a given query point $q$. Our solution builds upon the Locality-Sensitive Hashing (LSH) framework of Indyk and Motwani, which represents the asymptotically best solutions to near neighbor problems in high dimensions. While traditional LSH data structures have several parameters whose optimal values depend on the distance distribution from $q$ to the points of $S$, our data structure is parameter-free, except for the space usage, which is configurable by the user. Nevertheless, its expected query time basically matches that of an LSH data structure whose parameters have been *optimally chosen for the data and query* in question under the given space constraints. In particular, our data structure provides a smooth trade-off between hard queries (typically addressed by standard LSH) and easy queries such as those where the number of points to report is a constant fraction of $S$, or where almost all points in $S$ are far away from the query point. In contrast, known data structures fix LSH parameters based on certain parameters of the input alone. The algorithm has expected query time bounded by $O(t (n/t)^\rho)$, where $t$ is the number of points to report and $\rho\in (0,1)$ depends on the data distribution and the strength of the LSH family used. We further present a parameter-free way of using multi-probing, for LSH families that support it, and show that for many such families this approach allows us to get expected query time close to $O(n^\rho+t)$, which is the best we can hope to achieve using LSH. The previously best running time in high dimensions was $\Omega(t n^\rho)$. For many data distributions where the intrinsic dimensionality of the point set close to $q$ is low, we can give improved upper bounds on the expected query time.
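The toy sketch below illustrates the underlying LSH range-reporting primitive, not the paper's adaptive, parameter-free data structure: points are bucketed by a p-stable (Gaussian) projection hash, and a query probes nearby buckets and verifies each candidate against the radius $r$. The fixed bucket width and single hash table are assumptions for illustration; choosing such parameters per query is precisely what the paper automates.

```python
# Sketch: one-table LSH range reporting with a p-stable (Gaussian)
# projection hash. The bucket width w and the +/-1 bucket probing are
# fixed, illustrative choices; the paper adapts such parameters per query.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)
n, d, r = 5_000, 16, 4.0

points = rng.normal(size=(n, d))
a = rng.normal(size=d)      # random projection direction
w = 4.0                     # bucket width (assumed, not tuned)
b = rng.uniform(0, w)       # random offset

def bucket(p):
    return int(np.floor((p @ a + b) / w))

table = defaultdict(list)
for i, p in enumerate(points):
    table[bucket(p)].append(i)

def range_report(q):
    """Report all indices i with ||points[i] - q|| <= r, probing 3 buckets."""
    bq = bucket(q)
    out = []
    for bkt in (bq - 1, bq, bq + 1):
        for i in table.get(bkt, []):
            if np.linalg.norm(points[i] - q) <= r:  # verify each candidate
                out.append(i)
    return out

print(len(range_report(rng.normal(size=d))))
```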

8 citations

Proceedings ArticleDOI
03 Nov 2020
TL;DR: This work presents the trajectory-indexing succinct trit-array trie (tSTAT), a scalable method leveraging LSH for trajectory similarity search, and shows that tSTAT outperforms state-of-the-art similarity search methods.
Abstract: Massive datasets of spatial trajectories representing the mobility of a diversity of moving objects are ubiquitous in research and industry. Similarity search over a large collection of trajectories is indispensable for turning these datasets into knowledge. Locality sensitive hashing (LSH) is a powerful technique for fast similarity searches. Recent methods employ LSH in an attempt to realize efficient similarity search of trajectories; however, those methods are inefficient in terms of search time and memory when applied to massive datasets. To address this problem, we present the trajectory-indexing succinct trit-array trie (tSTAT), a scalable method leveraging LSH for trajectory similarity search. tSTAT performs the search quickly on a tree data structure called a trie. We also present two novel techniques that dramatically enhance the memory efficiency of tSTAT. One is a node reduction technique that omits redundant trie nodes while maintaining the time performance. The other is a space-efficient representation that leverages the idea behind succinct data structures (i.e., compressed data structures supporting fast data operations). We experimentally test tSTAT's ability to retrieve trajectories similar to a query from large collections and show that tSTAT outperforms state-of-the-art similarity search methods.
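To make the trie idea concrete, here is a minimal sketch in the same spirit: each trajectory is reduced to a short sequence of discrete symbols and stored in a trie, so similar trajectories share prefixes and can be found by a single root-to-leaf walk. The grid-snapping sketch() function is a hypothetical placeholder for the paper's trajectory LSH, and no node reduction or succinct trit-array representation is attempted here.

```python
# Sketch: storing discretized trajectory "sketches" in a trie so that
# similar trajectories share prefixes. sketch() is a placeholder for the
# paper's trajectory LSH.

def sketch(traj, cell=1.0, k=4):
    """Map a trajectory (list of (x, y) points) to k grid-cell symbols."""
    idx = [round(i * (len(traj) - 1) / (k - 1)) for i in range(k)]
    return tuple((int(traj[i][0] // cell), int(traj[i][1] // cell)) for i in idx)

class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, symbols, traj_id):
        node = self.root
        for s in symbols:                         # one trie level per symbol
            node = node.setdefault(s, {})
        node.setdefault("$", []).append(traj_id)  # terminal bucket

    def lookup(self, symbols):
        node = self.root
        for s in symbols:                         # single root-to-leaf walk
            if s not in node:
                return []
            node = node[s]
        return node.get("$", [])

trie = Trie()
trajs = {0: [(0, 0), (1, 1), (2, 2), (3, 3)],
         1: [(0, 0), (1, 0), (2, 0), (3, 0)]}
for tid, t in trajs.items():
    trie.insert(sketch(t), tid)

# A noisy version of trajectory 0 snaps to the same symbols and is found.
print(trie.lookup(sketch([(0.1, 0.1), (1.2, 1.1), (2.1, 2.0), (3.0, 3.1)])))
```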

8 citations

Journal ArticleDOI
TL;DR: This paper proposes a new online system that quickly detects malicious spam emails and, through daily updates, adapts to changes in email contents and in the Uniform Resource Locator (URL) links leading to malicious websites.
Abstract: In this paper, we propose a new online system that can quickly detect malicious spam emails and adapt to changes in email contents and the Uniform Resource Locator (URL) links leading to malicious websites by updating the system daily. We introduce an autonomous function for a server to generate training examples, in which double-bounce emails are automatically collected and their class labels are assigned by SPIKE, a crawler-type software tool that analyzes website maliciousness. In general, since spammers use botnets to spread numerous malicious emails within a short time, such distributed spam emails often have the same or similar contents; therefore, it is not necessary to learn from every spam email. To adapt to new malicious campaigns quickly, only new types of spam emails should be selected for learning, and this can be realized by introducing an active learning scheme into a classifier model. For this purpose, we adopt the Resource Allocating Network with Locality Sensitive Hashing (RAN-LSH) as a classifier model with a data selection function. In RAN-LSH, emails that are the same as or similar to already-learned ones are quickly looked up in a hash table using Locality Sensitive Hashing (LSH), and matched emails that fall in "well-learned" regions are discarded without being used as training data. To analyze email contents, we adopt the Bag of Words (BoW) approach and generate feature vectors whose attributes are transformed based on the normalized term frequency-inverse document frequency (TF-IDF). We use a dataset of double-bounce spam emails collected at the National Institute of Information and Communications Technology (NICT) in Japan from March 1, 2013 to May 10, 2013 to evaluate the performance of the proposed system. The results confirm that the proposed system detects malicious spam emails at a high detection rate.
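A minimal sketch of the data-selection step, assuming random-hyperplane LSH over TF-IDF vectors: an incoming email is skipped when a previously learned email hashes to the same bucket. This illustrates only the filtering idea, not the RAN-LSH classifier or its "well-learned" bookkeeping.

```python
# Sketch: skip near-duplicate emails before training, assuming
# random-hyperplane LSH over TF-IDF vectors. Collisions for similar
# emails are probabilistic, so this filter is approximate.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

emails = ["cheap pills buy now", "buy cheap pills today",
          "meeting notes attached", "quarterly report attached"]

X = TfidfVectorizer().fit_transform(emails).toarray()

rng = np.random.default_rng(2)
planes = rng.normal(size=(8, X.shape[1]))  # 8 random hyperplanes (assumed)

def key(x):
    """Hash a TF-IDF vector to an 8-bit LSH bucket key."""
    return tuple((planes @ x > 0).astype(int))

learned = set()
for i, x in enumerate(X):
    k = key(x)
    if k in learned:
        print(f"skip email {i}: a similar email was already learned")
    else:
        learned.add(k)
        print(f"learn from email {i}")
```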

7 citations

Proceedings ArticleDOI
01 Nov 2019
TL;DR: This work proposes an inductive transfer learning method that augments learning models by infusing similar instances from different learning tasks in the Natural Language Processing (NLP) domain, and shows that this cross-dataset procedure can achieve competitive or better performance than learning from a single dataset.
Abstract: Supervised learning models are typically trained on a single dataset, and their performance relies heavily on the size of that dataset, i.e., the amount of data available with ground truth. Learning algorithms try to generalize solely based on the data they are presented with during training. In this work, we propose an inductive transfer learning method that can augment learning models by infusing similar instances from different learning tasks in the Natural Language Processing (NLP) domain. We propose to use instance representations from a source dataset without inheriting anything else from the source learning model. Representations of the instances of the source and target datasets are learned, relevant source instances are retrieved using a soft-attention mechanism and locality sensitive hashing, and the retrieved instances are then infused into the model during training on the target dataset. Therefore, while learning from the training data, we simultaneously exploit and infuse relevant instance-level information from an external dataset. Using this approach, we show significant improvements over the baseline for three major news classification datasets. Experimental evaluations also show that the proposed approach reduces dependency on labeled data by a significant margin for comparable performance. With our proposed cross-dataset learning procedure, we show that one can achieve competitive or better performance than learning from a single dataset.
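A minimal sketch of the retrieval step, assuming random-hyperplane LSH over instance embeddings: source instances that hash to a target instance's bucket are fetched and re-ranked by cosine similarity. The embeddings here are random stand-ins; the paper learns the representations and combines retrieval with a soft-attention mechanism.

```python
# Sketch: retrieving similar source instances with random-hyperplane LSH,
# then re-ranking by cosine similarity. Embeddings are random stand-ins.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(3)
d, n_src, n_bits = 64, 1_000, 6

source_emb = rng.normal(size=(n_src, d))   # stand-in source representations
planes = rng.normal(size=(n_bits, d))

def key(x):
    return tuple((planes @ x > 0).astype(int))

index = defaultdict(list)                  # bucket key -> source instance ids
for i, e in enumerate(source_emb):
    index[key(e)].append(i)

def retrieve(target_emb, k=5):
    """Return up to k source instances from the target's LSH bucket."""
    cands = index.get(key(target_emb), [])
    sims = sorted(
        ((float(source_emb[i] @ target_emb) /
          (np.linalg.norm(source_emb[i]) * np.linalg.norm(target_emb)), i)
         for i in cands),
        reverse=True)
    return [i for _, i in sims[:k]]

print(retrieve(rng.normal(size=d)))        # ids to infuse during training
```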

7 citations


Network Information
Related Topics (5)

Topic                          Papers    Citations   Related
Deep learning                  79.8K     2.1M        84%
Feature extraction             111.8K    2.1M        83%
Convolutional neural network   74.7K     2M          83%
Feature (computer vision)      128.2K    1.7M        82%
Support vector machine         73.6K     1.7M        82%
Performance Metrics

No. of papers in the topic in previous years:

Year   Papers
2023   43
2022   108
2021   88
2020   110
2019   104
2018   139