Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have been published within this topic, receiving 69,362 citations.


Papers
Proceedings Article
Junsong Yuan, Wei Wang, Jingjing Meng, Ying Wu, Dongge Li
29 Sep 2007
TL;DR: A novel method which translates repetitive clip mining to the continuous path finding problem in a matching trellis, where sequence matching can be accelerated by taking advantage of the temporal redundancies in the videos.
Abstract: Automatically discovering repetitive clips from a large video database is a challenging problem due to the enormous computational cost involved in exploring the huge solution space. Without any a priori knowledge of the contents, lengths and total number of the repetitive clips, we need to discover all of them in the video database. To address the large computational cost, we propose a novel method which translates repetitive clip mining to the continuous path finding problem in a matching trellis, where sequence matching can be accelerated by taking advantage of the temporal redundancies in the videos. By applying locality sensitive hashing (LSH) for efficient similarity queries together with the proposed continuous path finding algorithm, our method has only quadratic complexity in the database size. Experiments conducted on a 10.5-hour TRECVID news dataset show the effectiveness of the method, which can discover repetitive clips of various lengths and contents in only 25 minutes, with features extracted off-line.

13 citations

Proceedings Article
20 Apr 2020
TL;DR: This work proposes a lightweight distributed indexing framework, called ChainLink, that supports approximate kNN queries over TB-scale time series data, and designs a novel hashing technique, called Single Pass Signature (SPS), that successfully tackles the above problem.
Abstract: Scalable subsequence matching is critical for supporting analytics on big time series, from mining and prediction to hypothesis testing. However, state-of-the-art subsequence matching techniques do not scale well to TB-scale datasets. Not only does index construction become prohibitively expensive, but the query response time also deteriorates quickly as the length of the query subsequence exceeds several hundred data points. Although Locality Sensitive Hashing (LSH) has emerged as a promising solution for indexing long time series, it relies on expensive hash functions that perform multiple passes over the data and thus is impractical for big time series. In this work, we propose a lightweight distributed indexing framework, called ChainLink, that supports approximate kNN queries over TB-scale time series data. As a foundation of ChainLink, we design a novel hashing technique, called Single Pass Signature (SPS), that successfully tackles the above problem. In particular, we prove theoretically and demonstrate experimentally that the similarity proximity of the indexed subsequences is preserved by our proposed single-pass SPS scheme. Leveraging this SPS innovation, ChainLink then adopts a three-step approach for scalable index building: (1) in-place data re-organization within each partition to enable efficient record-level random access to all subsequences, (2) parallel building of hash-based local indices on top of the re-organized data using our SPS scheme for efficient search within each partition, and (3) efficient aggregation of the local indices to construct a centralized yet highly compact global index for effective pruning of irrelevant partitions during query processing. ChainLink achieves the above three steps in one single map-reduce process. Our experimental evaluation shows that ChainLink indices are compact at less than 2% of dataset size while state-of-the-art index sizes tend to be almost the same size as the dataset. Better still, ChainLink is up to 2 orders of magnitude faster in its index construction time compared to state-of-the-art techniques, while improving both the final query response time by up to 10 fold and the result accuracy by 15%.

13 citations

Proceedings Article
06 Nov 2018
TL;DR: A topological signature that maps each trajectory to a relatively low dimensional Euclidean space, making trajectories amenable to standard analytic techniques; the signatures contain enough topological information to reconstruct non-self-intersecting trajectories up to homotopy type.
Abstract: Analytic methods can be difficult to build and costly to train for mobility data. We show that information about the topology of the space and how mobile objects navigate the obstacles can be used to extract insights about mobility at larger distance scales. The main contribution of this paper is a topological signature that maps each trajectory to a relatively low dimensional Euclidean space, so that trajectories become amenable to standard analytic techniques. Data mining tasks such as nearest neighbor search with locality sensitive hashing, clustering, and regression work more efficiently in this signature space. We define the problem of mobility prediction at different distance scales, and show that with the signatures, simple k nearest neighbor based regression performs accurate prediction. Experiments on multiple real datasets show that the framework using topological signatures is accurate on all tasks, and substantially more efficient than machine learning applied to raw data. Theoretical results show that the signatures contain enough topological information to reconstruct non-self-intersecting trajectories up to homotopy type. The construction of signatures is based on a differential form that can be generated in a distributed setting using local communication, and a signature can be locally and inexpensively updated and communicated by a mobile agent.

13 citations

Book Chapter
TL;DR: A survey of existing probabilistic state space exploration methods is given, covering bitstate hashing and its improvements, which were introduced in order to lower the probability of producing a wrong result while maintaining memory and runtime efficiency.
Abstract: Several methods have been developed to validate the correctness and performance of hardware and software systems. One way to do this is to model the system and carry out a state space exploration in order to detect all possible states. In this paper, a survey of existing probabilistic state space exploration methods is given. The paper starts with a thorough review and analysis of bitstate hashing, as introduced by Holzmann. The main idea of this initial approach is the mapping of each state onto a specific bit within an array by employing a hash function. Thus a state is represented by a single bit, rather than by a full descriptor. Bitstate hashing is efficient concerning memory and runtime, but it is hampered by the nondeterministic omission of states. The resulting positive probability of producing wrong results is due to the fact that the mapping of full state descriptors onto much smaller representatives is not injective. The rest of the paper is devoted to the presentation, analysis, and comparison of improvements of bitstate hashing, which were introduced in order to lower the probability of producing a wrong result while maintaining the memory and runtime efficiency. These improvements can be mainly grouped into two categories. The approaches of the first group, the so-called multiple hashing schemes, employ multiple hash functions on either a single array or multiple arrays. The approaches of the remaining category follow the idea of hash compaction: the diverse schemes of this category store a hash value for each detected state, rather than associating a single or multiple bit positions with it, leading to persuasive reductions of the probability of error compared to the original bitstate hashing scheme.

13 citations
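
The bitstate hashing idea summarized in the abstract above (one bit per state, set through a hash function, with possible omission of states on collisions) can be illustrated with a minimal Python sketch. The names `BitstateTable` and `explore`, and the use of SHA-256 as the hash function, are assumptions made for illustration only; they are not taken from the surveyed paper or from any model checker.

```python
import hashlib


class BitstateTable:
    """One bit per hash slot; a state is 'visited' if its bit is set."""

    def __init__(self, num_bits: int) -> None:
        self.num_bits = num_bits
        self.bits = bytearray((num_bits + 7) // 8)

    def _index(self, state: bytes) -> int:
        # Assumption: SHA-256 stands in for whatever hash a real tool uses.
        digest = hashlib.sha256(state).digest()
        return int.from_bytes(digest[:8], "big") % self.num_bits

    def visit(self, state: bytes) -> bool:
        """Set the state's bit; return True if the bit was already set."""
        i = self._index(state)
        byte, mask = i // 8, 1 << (i % 8)
        seen = bool(self.bits[byte] & mask)
        self.bits[byte] |= mask
        return seen


def explore(initial: bytes, successors, table: BitstateTable) -> int:
    """Depth-first exploration; returns how many states were treated as new.

    Two distinct states that hash to the same bit are indistinguishable,
    so the second one (and everything reachable only through it) is skipped.
    """
    table.visit(initial)
    stack = [initial]
    count = 1
    while stack:
        state = stack.pop()
        for nxt in successors(state):
            if not table.visit(nxt):
                count += 1
                stack.append(nxt)
    return count
```

With a table of a single bit, every state collides with the initial one and exploration stops immediately, which is the extreme case of the non-injective mapping the abstract describes. Larger tables make such omissions less likely but never impossible.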

Proceedings Article
28 Aug 2014
TL;DR: A new automated facial expression analysis system that integrates Locality Sensitive Hashing with Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) to improve execution efficiency of emotion classification and continuous identification of unidentified facial expressions is described.
Abstract: This paper describes a new automated facial expression analysis system that integrates Locality Sensitive Hashing (LSH) with Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) to improve execution efficiency of emotion classification and continuous identification of unidentified facial expressions. Images are classified using feature vectors on the two most significant segments of the face: the eye segments and the mouth segment. LSH uses a family of hashing functions to map similar images into a set of collision buckets. Taking a representative image from each cluster reduces the image space by pruning redundant similar images in the collision buckets. The application of PCA and LDA reduces the dimension of the data space. We describe the overall architecture and the implementation. The performance results show that the integration of LSH with PCA and LDA significantly improves computational efficiency, and improves the accuracy by reducing the frequency bias of similar images during the PCA and SVM stage. After classifying the images in the database, we tag the collision buckets with basic emotions, and apply LSH to new unidentified facial expressions to identify the emotions. This LSH-based identification is suitable for fast continuous recognition of unidentified facial expressions.

13 citations
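
The bucket-and-representative step described in the abstract above can be sketched in Python. The abstract does not say which hash family the authors use, so this sketch assumes random-hyperplane (sign) hashing over plain feature vectors; the function names `make_hyperplanes`, `signature`, `build_buckets`, and `representatives` are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim: int, num_bits: int, seed: int = 0) -> list:
    """Draw num_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, planes) -> tuple:
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_buckets(vectors, planes) -> dict:
    """Group vector indices by signature; similar vectors tend to collide."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        buckets[signature(vec, planes)].append(idx)
    return buckets


def representatives(buckets) -> list:
    """Keep the first member of each bucket, pruning the rest as redundant."""
    return sorted(members[0] for members in buckets.values())
```

Vectors that point in nearly the same direction fall on the same side of most hyperplanes and therefore tend to share a bucket. Using more bits makes buckets smaller and stricter, while fewer bits merge more vectors into each bucket.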


Network Information
Related Topics (5)
Deep learning
79.8K papers, 2.1M citations
84% related
Feature extraction
111.8K papers, 2.1M citations
83% related
Convolutional neural network
74.7K papers, 2M citations
83% related
Feature (computer vision)
128.2K papers, 1.7M citations
82% related
Support vector machine
73.6K papers, 1.7M citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    43
2022    108
2021    88
2020    110
2019    104
2018    139