Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1894 publications have been published within this topic, receiving 69362 citations.

Papers

Proceedings Article•DOI•

Mining repetitive clips through finding continuous paths

Junsong Yuan¹, Wei Wang², Jingjing Meng², Ying Wu¹, Dongge Li²

Northwestern University¹, Motorola²

29 Sep 2007

TL;DR: A novel method which translates repetitive clip mining to the continuous path finding problem in a matching trellis, where sequence matching can be accelerated by taking advantage of the temporal redundancies in the videos.

Abstract: Automatically discovering repetitive clips from a large video database is a challenging problem due to the enormous computational cost involved in exploring the huge solution space. Without any a priori knowledge of the contents, lengths and total number of the repetitive clips, we need to discover all of them in the video database. To address the large computational cost, we propose a novel method which translates repetitive clip mining to the continuous path finding problem in a matching trellis, where sequence matching can be accelerated by taking advantage of the temporal redundancies in the videos. By applying locality sensitive hashing (LSH) for efficient similarity queries together with the proposed continuous path finding algorithm, our method has only quadratic complexity in the database size. Experiments conducted on a 10.5-hour TRECVID news dataset have shown the effectiveness of the method, which can discover repetitive clips of various lengths and contents in only 25 minutes, with features extracted off-line.

13 citations
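
The trellis idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each frame has already been reduced to a hashable signature (a stand-in for an LSH bucket id), treats two frames as matching only when their signatures are equal, and accepts only strictly diagonal runs as "continuous paths". The function name `find_repeated_clips` and the `min_len` parameter are hypothetical.

```python
from collections import defaultdict

def find_repeated_clips(signatures, min_len=3):
    """Return (start_a, start_b, length) for diagonal runs of matching frames."""
    # Bucket frame indices by signature; the bucket lookup stands in for an LSH query.
    buckets = defaultdict(list)
    for idx, sig in enumerate(signatures):
        buckets[sig].append(idx)

    # Trellis cells (i, j) with i < j whose frames fall into the same bucket.
    matches = set()
    for indices in buckets.values():
        for a in range(len(indices)):
            for b in range(a + 1, len(indices)):
                matches.add((indices[a], indices[b]))

    clips = []
    for (i, j) in sorted(matches):
        # Start a path only at a cell with no matching predecessor on its diagonal.
        if (i - 1, j - 1) in matches:
            continue
        length = 1
        while (i + length, j + length) in matches:
            length += 1
        if length >= min_len:
            clips.append((i, j, length))
    return clips
```

For example, `find_repeated_clips([1, 2, 3, 4, 9, 8, 1, 2, 3, 4, 7])` reports one clip of four frames starting at positions 0 and 6. A realistic version would need to tolerate missed matches and small temporal misalignments along the path.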

Proceedings Article•DOI•

ChainLink: Indexing Big Time Series Data For Long Subsequence Matching

Noura Alghamdi¹, Liang Zhang¹, Huayi Zhang¹, Elke A. Rundensteiner¹, Mohamed Y. Eltabakh¹

Worcester Polytechnic Institute¹

20 Apr 2020

TL;DR: This work proposes a lightweight distributed indexing framework, called ChainLink, that supports approximate kNN queries over TB-scale time series data, and designs a novel hashing technique, called Single Pass Signature (SPS), that avoids the multiple passes over the data required by existing LSH hash functions.

Abstract: Scalable subsequence matching is critical for supporting analytics on big time series, from mining and prediction to hypothesis testing. However, state-of-the-art subsequence matching techniques do not scale well to TB-scale datasets. Not only does index construction become prohibitively expensive, but the query response time also deteriorates quickly as the length of the query subsequence exceeds several hundred data points. Although Locality Sensitive Hashing (LSH) has emerged as a promising solution for indexing long time series, it relies on expensive hash functions that perform multiple passes over the data and thus is impractical for big time series. In this work, we propose a lightweight distributed indexing framework, called ChainLink, that supports approximate kNN queries over TB-scale time series data. As a foundation of ChainLink, we design a novel hashing technique, called Single Pass Signature (SPS), that successfully tackles the above problem. In particular, we prove theoretically and demonstrate experimentally that the similarity proximity of the indexed subsequences is preserved by our proposed single-pass SPS scheme. Leveraging this SPS innovation, ChainLink then adopts a three-step approach for scalable index building: (1) in-place data re-organization within each partition to enable efficient record-level random access to all subsequences, (2) parallel building of hash-based local indices on top of the re-organized data using our SPS scheme for efficient search within each partition, and (3) efficient aggregation of the local indices to construct a centralized yet highly compact global index for effective pruning of irrelevant partitions during query processing. ChainLink achieves the above three steps in one single map-reduce process. Our experimental evaluation shows that ChainLink indices are compact at less than 2% of dataset size, while state-of-the-art index sizes tend to be almost the same size as the dataset. Better still, ChainLink is up to 2 orders of magnitude faster in its index construction time compared to state-of-the-art techniques, while improving both the final query response time by up to 10-fold and the result accuracy by 15%.

13 citations
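
The local-index and global-index steps described in the abstract above can be sketched in a few lines. This is only an illustration under strong assumptions: the abstract does not define SPS, so `toy_signature` is a hypothetical stand-in (one bit per segment, set when the segment mean exceeds the overall mean), and all function names are invented here. The sketch runs in a single process rather than as a map-reduce job.

```python
from collections import defaultdict

def toy_signature(subseq, segments=4):
    """Hypothetical stand-in for SPS. Assumes len(subseq) >= segments."""
    seg_len = len(subseq) // segments
    seg_sums = [sum(subseq[s * seg_len:(s + 1) * seg_len]) for s in range(segments)]
    overall_mean = sum(seg_sums) / (seg_len * segments)
    # One bit per segment: is the segment mean above the overall mean?
    return tuple(int(total / seg_len > overall_mean) for total in seg_sums)

def build_local_index(series, window):
    """Local index for one partition: signature -> start offsets of subsequences."""
    index = defaultdict(list)
    for start in range(len(series) - window + 1):
        index[toy_signature(series[start:start + window])].append(start)
    return index

def build_global_index(local_indices):
    """Global index: signature -> ids of partitions that contain that signature."""
    global_index = defaultdict(set)
    for pid, local in local_indices.items():
        for sig in local:
            global_index[sig].add(pid)
    return global_index

def candidate_subsequences(query, local_indices, global_index):
    """Prune partitions via the global index, then look up offsets locally."""
    sig = toy_signature(query)
    return {pid: local_indices[pid][sig] for pid in sorted(global_index.get(sig, ()))}
```

A query only touches partitions whose local index contains its signature; every other partition is skipped without being read, which is the pruning role the abstract assigns to the global index.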

Proceedings Article•DOI•

Topological signatures for fast mobility analysis

Abhirup Ghosh¹, Benedek Rozemberczki¹, Subramanian Ramamoorthy, Rik Sarkar¹

University of Edinburgh¹

06 Nov 2018

TL;DR: A topological signature that maps each trajectory to a relatively low-dimensional Euclidean space, so that trajectories become amenable to standard analytic techniques; the signatures contain enough topological information to reconstruct non-self-intersecting trajectories up to homotopy type.

Abstract: Analytic methods can be difficult to build and costly to train for mobility data. We show that information about the topology of the space and how mobile objects navigate the obstacles can be used to extract insights about mobility at larger distance scales. The main contribution of this paper is a topological signature that maps each trajectory to a relatively low-dimensional Euclidean space, so that trajectories become amenable to standard analytic techniques. Data mining tasks such as nearest neighbor search with locality sensitive hashing, clustering, and regression work more efficiently in this signature space. We define the problem of mobility prediction at different distance scales, and show that with the signatures, simple k-nearest-neighbor-based regression performs accurate prediction. Experiments on multiple real datasets show that the framework using topological signatures is accurate on all tasks, and substantially more efficient than machine learning applied to raw data. Theoretical results show that the signatures contain enough topological information to reconstruct non-self-intersecting trajectories up to homotopy type. The construction of signatures is based on a differential form that can be generated in a distributed setting using local communication, and a signature can be locally and inexpensively updated and communicated by a mobile agent.

13 citations
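
The k-nearest-neighbor regression mentioned in the abstract above is a standard technique and can be sketched generically. The sketch assumes trajectories have already been mapped to fixed-length Euclidean vectors; it does not compute the paper's topological signatures, and the name `knn_regress` and its parameters are hypothetical.

```python
import math

def knn_regress(train_points, train_targets, query, k=3):
    """Predict a target vector as the mean target of the k nearest training points."""
    # Rank training points by Euclidean distance to the query vector.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest = order[:k]
    dims = len(train_targets[0])
    # Average each target dimension over the selected neighbors.
    return [sum(train_targets[i][d] for i in nearest) / len(nearest)
            for d in range(dims)]
```

The linear scan here is the simplest possible neighbor search; in a low-dimensional signature space it could be replaced by an LSH index or a spatial tree without changing the prediction step.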

Book Chapter•DOI•

Probabilistic Methods in State Space Analysis

Matthias Kuntz¹, Kai Lampka¹

University of Erlangen-Nuremberg¹

01 Jan 2004 - Lecture Notes in Computer Science

TL;DR: A survey of existing probabilistic state space exploration methods is given, covering bitstate hashing and the improvements introduced to lower the probability of producing a wrong result while maintaining memory and runtime efficiency.

Abstract: Several methods have been developed to validate the correctness and performance of hardware and software systems. One way to do this is to model the system and carry out a state space exploration in order to detect all possible states. In this paper, a survey of existing probabilistic state space exploration methods is given. The paper starts with a thorough review and analysis of bitstate hashing, as introduced by Holzmann. The main idea of this initial approach is the mapping of each state onto a specific bit within an array by employing a hash function. Thus a state is represented by a single bit, rather than by a full descriptor. Bitstate hashing is efficient concerning memory and runtime, but it is hampered by the nondeterministic omission of states. The resulting positive probability of producing wrong results is due to the fact that the mapping of full state descriptors onto much smaller representatives is not injective. The rest of the paper is devoted to the presentation, analysis, and comparison of improvements of bitstate hashing, which were introduced in order to lower the probability of producing a wrong result while maintaining the memory and runtime efficiency. These improvements can be mainly grouped into two categories. The approaches of the first group, the so-called multiple hashing schemes, employ multiple hash functions on either a single or on multiple arrays. The approaches of the remaining category follow the idea of hash compaction: the schemes of this category store a hash value for each detected state, rather than associating a single or multiple bit positions with it, leading to considerable reductions of the probability of error compared to the original bitstate hashing scheme.

13 citations
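
The bitstate idea described in the abstract above (one bit per state, with states silently omitted when hashes collide) can be sketched as follows. This is a minimal illustration, not the surveyed implementations: the use of salted SHA-256 digests as hash functions, the breadth-first search order, and the names `bitstate_explore`, `table_bits` and `num_hashes` are all assumptions made here. Setting `num_hashes` above 1 gives a simple multiple-hashing variant on a single array; hash compaction is not shown.

```python
import hashlib
from collections import deque

def _bit_positions(state, table_bits, num_hashes):
    """Derive num_hashes bit positions for a state from salted SHA-256 digests."""
    positions = []
    for salt in range(num_hashes):
        digest = hashlib.sha256(f"{salt}:{state!r}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % table_bits)
    return positions

def bitstate_explore(initial, successors, table_bits=1 << 16, num_hashes=1):
    """Breadth-first exploration storing only a few bits per visited state."""
    table = bytearray((table_bits + 7) // 8)

    def seen_and_mark(state):
        positions = _bit_positions(state, table_bits, num_hashes)
        # A state counts as visited only if all of its bits are already set.
        seen = all(table[p >> 3] & (1 << (p & 7)) for p in positions)
        for p in positions:
            table[p >> 3] |= 1 << (p & 7)
        return seen

    seen_and_mark(initial)
    queue = deque([initial])
    visited_count = 1
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            # A hash collision makes a new state look visited, so it is omitted.
            if not seen_and_mark(nxt):
                visited_count += 1
                queue.append(nxt)
    return visited_count
```

The returned count can never exceed the true number of reachable states, but it can fall short of it. With `table_bits=1` every state maps to the same bit, so only the initial state is ever counted, which shows the omission effect in its most extreme form.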

Proceedings Article•DOI•

An integrated approach for efficient analysis of facial expressions

Mehdi Ghayoumi¹, Arvind K. Bansal¹

Kent State University¹

28 Aug 2014

TL;DR: A new automated facial expression analysis system that integrates Locality Sensitive Hashing with Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) to improve execution efficiency of emotion classification and continuous identification of unidentified facial expressions is described.

Abstract: This paper describes a new automated facial expression analysis system that integrates Locality Sensitive Hashing (LSH) with Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) to improve execution efficiency of emotion classification and continuous identification of unidentified facial expressions. Images are classified using feature vectors on the two most significant segments of the face: the eye segment and the mouth segment. LSH uses a family of hashing functions to map similar images into a set of collision buckets. Taking a representative image from each cluster reduces the image space by pruning redundant similar images in the collision buckets. The application of PCA and LDA reduces the dimension of the data space. We describe the overall architecture and the implementation. The performance results show that the integration of LSH with PCA and LDA significantly improves computational efficiency, and improves the accuracy by reducing the frequency bias of similar images during the PCA and SVM stage. After the classification of the images in the database, we tag the collision buckets with basic emotions, and apply LSH to new unidentified facial expressions to identify the emotions. This LSH-based identification is suitable for fast continuous recognition of unidentified facial expressions.

13 citations
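
The bucket-and-representative step described in the abstract above can be sketched with one common LSH family, random-hyperplane (sign) hashing. The abstract does not say which hash family the authors use, so this choice is an assumption, as are the function names and the rule of taking the first image in each bucket as its representative.

```python
import random
from collections import defaultdict

def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(int(sum(w * x for w, x in zip(plane, vector)) >= 0.0)
                 for plane in hyperplanes)

def build_buckets(vectors, hyperplanes):
    """Group vector indices into collision buckets keyed by their LSH key."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        buckets[lsh_key(vec, hyperplanes)].append(idx)
    return buckets

def representatives(buckets):
    """Keep one index per bucket (here simply the first one inserted)."""
    return sorted(indices[0] for indices in buckets.values())
```

Vectors pointing in similar directions tend to land in the same bucket, so keeping one representative per bucket prunes near-duplicates before a later stage such as PCA or LDA. Identifying a new feature vector then amounts to computing its key and reading the label attached to that bucket.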

Performance

Metrics

2,048

Papers

77,891

Citations

No. of papers in the topic in previous years
Year	Papers
2023	43
2022	108
2021	88
2020	110
2019	104
2018	139
