Showing papers on "Locality-sensitive hashing" published in 2004


Open Access

Proceedings Article•DOI•

Locality-sensitive hashing scheme based on p-stable distributions


Mayur Datar¹, Nicole Immorlica², Piotr Indyk², Vahab Mirrokni²•Institutions (2)

Stanford University¹, Massachusetts Institute of Technology²

08 Jun 2004

TL;DR: A novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under lp norm, based on p-stable distributions that improves the running time of the earlier algorithm and yields the first known provably efficient approximate NN algorithm for the case p<1.


Abstract: We present a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under lp norm, based on p-stable distributions. Our scheme improves the running time of the earlier algorithm for the case of the lp norm. It also yields the first known provably efficient approximate NN algorithm for the case p<1.
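
The abstract above does not spell out the hash function, so the following is only a minimal sketch of the commonly cited form of a p-stable hash, h(v) = floor((a·v + b) / w), for the l2 case. It assumes Gaussian entries for `a` (the Gaussian distribution is 2-stable) and a uniform offset `b`. The names `make_pstable_hash`, `collision_rate`, and the bucket width `w` are illustrative choices, not taken from the listing.

```python
import math
import random

def make_pstable_hash(dim, w, rng):
    """Build one hash h(v) = floor((a . v + b) / w).

    a has i.i.d. Gaussian entries (2-stable, so this targets the l2 norm);
    b is drawn uniformly from [0, w). Both choices are assumptions here.
    """
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(v):
        dot = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((dot + b) / w)

    return h

rng = random.Random(0)
hashes = [make_pstable_hash(dim=3, w=4.0, rng=rng) for _ in range(200)]

def collision_rate(u, v):
    """Fraction of the sampled hash functions that put u and v in one bucket."""
    return sum(h(u) == h(v) for h in hashes) / len(hashes)

# Nearby points should collide far more often than distant ones.
near = collision_rate([1.0, 2.0, 3.0], [1.1, 2.0, 3.0])
far = collision_rate([1.0, 2.0, 3.0], [9.0, -4.0, 7.0])
print(near, far)
```

Replacing the Gaussian draw with a Cauchy draw would give the analogous sketch for the l1 norm, since the Cauchy distribution is 1-stable.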


3,109 citations

Proceedings Article•

An Investigation of Practical Approximate Nearest Neighbor Algorithms


Ting Liu¹, Andrew W. Moore¹, Ke Yang¹, Alexander G. Gray¹•Institutions (1)

Carnegie Mellon University¹

01 Dec 2004

TL;DR: This paper asks whether earlier spatial data structure approaches to exact nearest neighbor, such as metric trees, can be altered to provide approximate answers to proximity queries and, if so, how; it introduces a new kind of metric tree that allows overlap.


Abstract: This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimensional perception areas such as computer vision, with dozens of publications in recent years. Much of this enthusiasm is due to a successful new approximate nearest neighbor approach called Locality Sensitive Hashing (LSH). In this paper we ask the question: can earlier spatial data structure approaches to exact nearest neighbor, such as metric trees, be altered to provide approximate answers to proximity queries and if so, how? We introduce a new kind of metric tree that allows overlap: certain datapoints may appear in both the children of a parent. We also introduce new approximate k-NN search algorithms on this structure. We show why these structures should be able to exploit the same random-projection-based approximations that LSH enjoys, but with a simpler algorithm and perhaps with greater efficiency. We then provide a detailed empirical evaluation on five large, high dimensional datasets which show up to 31-fold accelerations over LSH. This result holds true throughout the spectrum of approximation levels.


487 citations

Book Chapter•DOI•

Nearest neighbors in high-dimensional spaces


Piotr Indyk

13 Apr 2004

261 citations

Journal Article•DOI•

Locally nearest neighbor classifiers for pattern classification


Wenming Zheng¹, Li Zhao¹, Cairong Zou¹•Institutions (1)

Southeast University¹

01 Jun 2004-Pattern Recognition

TL;DR: Two novel classifiers based on locally nearest neighborhood rule, called nearest neighbor line and nearest neighbor plane, are presented for pattern classification, which take much lower computation cost and achieve competitive performance.


116 citations

Proceedings Article•DOI•

Robust mesh-based hashing for copy detection and tracing of images


Chun-Shien Lu, Chao-Yong Hsu, Shih-Wei Sun¹, Pao-Chi Chang¹•Institutions (1)

National Central University¹

30 Jun 2004

TL;DR: This paper proposes a geometry-invariant image hashing scheme, which can be employed for content copy detection and tracing; exhaustive experimental results obtained from benchmark attacks confirm the performance of the proposed method.


Abstract: Due to the desired non-invasive property, non-data hiding (called media hashing here) is considered to be an alternative to achieve many applications previously accomplished with watermarking. Recently, media hashing techniques for content identification have been gradually emerging. However, none of them are really resistant against geometrical attacks. In this paper, our aim is to propose a geometry-invariant image hashing scheme, which can be employed for content copy detection and tracing. Our system is mainly composed of three components: (i) robust mesh extraction; (ii) mesh-based robust hash extraction; and (iii) hash matching for similarity measurement. Exhaustive experimental results obtained from benchmark attacks have confirmed the performance of the proposed method.


62 citations

Proceedings Article•DOI•

Hierarchical, non-uniform locality sensitive hashing and its application to video identification


Zixiang Kang¹, Wei Tsang Ooi², Qibin Sun³•Institutions (3)

Institute for Infocomm Research Singapore¹, National University of Singapore², Agency for Science, Technology and Research³

27 Jun 2004

TL;DR: Two weaknesses of Locality sensitive hashing are addressed when applied to the video identification problem, and two enhancements to LSH are proposed that improve the performance of LSH significantly in terms of efficiency and accuracy.


Abstract: Searching for similar video clips in a large video database, or video identification, requires finding the nearest neighbor in a high-dimensional feature space. Locality sensitive hashing, or LSH, is a well-known indexing method that allows us to efficiently find approximate nearest neighbors in such a space. In this paper, we address two weaknesses of LSH when applied to the video identification problem. We propose two enhancements to LSH, and show that our enhancements improve the performance of LSH significantly in terms of efficiency and accuracy.


30 citations

Two Effective Functions on Hashing URL


LI Xiao-Ming¹, Feng Wang-Sen•Institutions (1)

Peking University¹

01 Jan 2004

TL;DR: The finding is that the well-known function for hashing sequences of symbols, ELFhash, is not very good in this regard, and the other two functions are better and thus recommended.


Abstract: Hashing a large collection of URLs is an inevitable problem in many Web research activities. Through a large-scale experiment, three hash functions are compared in this paper. Two metrics were developed for the comparison, which are related to Web structure analysis and Web crawling, respectively. The finding is that the well-known function for hashing sequences of symbols, ELFhash, is not very good in this regard, and the other two functions are better and thus recommended.
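
For reference, the ELFhash function mentioned in the abstract above is usually given as the short shift-and-fold loop sketched here, assuming the standard 32-bit formulation. The function name `elf_hash`, the example URL, and the table size of 1024 are illustrative. The abstract does not name the other two functions it compares, so they are not shown.

```python
def elf_hash(data: bytes) -> int:
    """Standard 32-bit ELF hash of a byte string (a sketch, not the paper's code)."""
    h = 0
    for byte in data:
        # Shift in the next byte, keeping the value within 32 bits.
        h = ((h << 4) + byte) & 0xFFFFFFFF
        # Fold the top four bits back into the low bits, then clear them.
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h

# Hypothetical usage: map a URL to one of 1024 buckets.
url = "http://example.com/index.html"
bucket = elf_hash(url.encode("utf-8")) % 1024
print(bucket)
```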


24 citations

Proceedings Article•DOI•

Geometric distortion-resilient image hashing system and its application scalability


Chao-Yong Hsu¹, Chun-Shien Lu¹•Institutions (1)

Academia Sinica¹

20 Sep 2004

TL;DR: A novel geometric distortion-invariant image hashing scheme, which can be employed to perform copy detection and content authentication of digital images, is proposed and exhaustive experimental results obtained from benchmark attacks confirm the excellent performance of the proposed method.


Abstract: Media hashing is an alternative approach to many applications previously accomplished with watermarking. The major disadvantage of the existing media hashing technologies is their poor resistance to geometric attacks. In this paper, a novel geometric distortion-invariant image hashing scheme, which can be employed to perform copy detection and content authentication of digital images, is proposed. Our major contributions are threefold: (i) mesh-based robust hashing function is proposed; (ii) sophisticated hash database for error-resilient and fast matching is constructed; and (iii) the application scalability of our scheme for content copy tracing and authentication is studied. In addition, we further investigate several media hashing issues, including robustness and discrimination, error analysis, and complexity, for the proposed image hashing system. Exhaustive experimental results obtained from benchmark attacks confirm the excellent performance of the proposed method.


22 citations

Proceedings Article•DOI•

Spreading the load using consistent hashing: a preliminary report


G. Swart¹•Institutions (1)

University College Cork¹

05 Jul 2004

TL;DR: This paper analyzes how well consistent hashing does at evenly distributing objects among the nodes in the system and extends current consistent hashing algorithms to allow for dynamic load balancing while retaining the good properties of consistent hashing.


Abstract: Consistent hashing can be used to assign objects to nodes in a distributed system. It has been used by several distributed systems including Chord, Pastry, and Tornado because of its efficient handling of node failure and repair. In this paper we analyze how well consistent hashing does at evenly distributing objects among the nodes in the system. We also extend current consistent hashing algorithms to allow for dynamic load balancing while retaining the good properties of consistent hashing. Finally we analyze our extensions using both probabilistic analysis and simulations. The algorithms derived appear to achieve much better load balancing.
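
As a rough illustration of the basic technique the abstract above builds on, here is a minimal consistent-hashing ring. It is only a sketch: it uses one ring position per node and SHA-256 for placement, both assumptions, and it does not implement the paper's dynamic load-balancing extensions. The names `Ring`, `point`, and the node labels are hypothetical.

```python
import bisect
import hashlib

def point(key):
    """Map a string to a position on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, nodes):
        # Each node owns one point on the ring; points are kept sorted.
        self._points = sorted((point(n), n) for n in nodes)

    def lookup(self, obj):
        """Return the first node at or after the object's position, wrapping around."""
        i = bisect.bisect_left(self._points, (point(obj), ""))
        return self._points[i % len(self._points)][1]

    def without(self, node):
        """Return a new ring with one node removed (simulating a failure)."""
        return Ring(n for _, n in self._points if n != node)

objects = [f"object-{i}" for i in range(1000)]
ring = Ring(["node-a", "node-b", "node-c", "node-d"])
before = {o: ring.lookup(o) for o in objects}

after_ring = ring.without("node-c")
after = {o: after_ring.lookup(o) for o in objects}

# Only objects that lived on the failed node are reassigned.
moved = [o for o in objects if before[o] != after[o]]
print(len(moved), "of", len(objects), "objects moved")
```

Counting how many objects land on each node in `before` gives a quick view of how evenly, or unevenly, a single-point ring spreads the load.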


18 citations

Proceedings Article•DOI•

A framework for soft hashing and its application to robust image hashing


E.P. McCarthy¹, Félix Balado¹, G.C.M. Silvestre¹, Neil Hurley¹•Institutions (1)

University College Dublin¹

24 Oct 2004

TL;DR: This work provides one possible approach to modelling robust soft hashing, detailing the basic problems involved, and shows how some prior schemes partly fit into this model.


Abstract: Soft hashing, also known as robust hashing or perceptual hashing, consists of summarising multimedia data, so as to obtain a concise representation called a hash value. There has been an increasing interest in the soft hashing problem recently. Techniques implementing soft hashing intend to mirror the behaviour of cryptographic hashing, when the information to be hashed can be subject to different kinds of distortion. Many heuristic techniques for undertaking soft hashing of images and other multimedia data have been devised. Except for some attempts, a framework giving solid guidelines to solve the problem is largely lacking. We provide one possible approach to undertake the modelling of robust soft hashing, detailing the basic problems involved. We show how some prior schemes partly fit into our model.


17 citations

Book Chapter•DOI•

Probabilistic Methods in State Space Analysis


Matthias Kuntz¹, Kai Lampka¹•Institutions (1)

University of Erlangen-Nuremberg¹

01 Jan 2004-Lecture Notes in Computer Science

TL;DR: A survey of existing probabilistic state space exploration methods is given, covering bitstate hashing and the improvements that were introduced to lower the probability of producing a wrong result while maintaining the memory and runtime efficiency.


Abstract: Several methods have been developed to validate the correctness and performance of hard- and software systems. One way to do this is to model the system and carry out a state space exploration in order to detect all possible states. In this paper, a survey of existing probabilistic state space exploration methods is given. The paper starts with a thorough review and analysis of bitstate hashing, as introduced by Holzmann. The main idea of this initial approach is the mapping of each state onto a specific bit within an array by employing a hash function. Thus a state is represented by a single bit, rather than by a full descriptor. Bitstate hashing is efficient concerning memory and runtime, but it is hampered by the non-deterministic omission of states. The resulting positive probability of producing wrong results is due to the fact that the mapping of full state descriptors onto much smaller representatives is not injective. The rest of the paper is devoted to the presentation, analysis, and comparison of improvements of bitstate hashing, which were introduced in order to lower the probability of producing a wrong result while maintaining the memory and runtime efficiency. These improvements can be mainly grouped into two categories. The approaches of the first group, the so-called multiple hashing schemes, employ multiple hash functions on either a single or on multiple arrays. The approaches of the remaining category follow the idea of hash compaction, i.e., the schemes of this category store a hash value for each detected state, rather than associating a single or multiple bit positions with it, leading to considerable reductions of the probability of error compared to the original bitstate hashing scheme.
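
The single-bit idea described in the abstract above can be sketched as follows. This is a toy illustration under stated assumptions, not Holzmann's implementation: the hash is SHA-256 of the state's `repr`, the class `BitstateSet` and function `explore` are hypothetical names, and the state space is an invented counter system.

```python
import hashlib

class BitstateSet:
    """Visited-state set that stores one bit per state instead of a full descriptor."""

    def __init__(self, num_bits):
        self.num_bits = num_bits
        self.bits = bytearray((num_bits + 7) // 8)

    def _index(self, state):
        digest = hashlib.sha256(repr(state).encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, state):
        """Set the state's bit; return True if it was already set (seen, or a collision)."""
        i = self._index(state)
        byte, mask = i // 8, 1 << (i % 8)
        seen = bool(self.bits[byte] & mask)
        self.bits[byte] |= mask
        return seen

def explore(initial, successors, visited):
    """Depth-first exploration that skips any state whose bit is already set."""
    stack, count = [initial], 0
    while stack:
        state = stack.pop()
        if visited.add(state):
            continue  # Either truly visited or wrongly omitted due to a collision.
        count += 1
        stack.extend(successors(state))
    return count

def successors(s):
    # Toy state space: counters 0..999 with two transitions each.
    return [(s + 1) % 1000, (s * 2) % 1000]

large = explore(0, successors, BitstateSet(1 << 20))
small = explore(0, successors, BitstateSet(64))
print(large, small)
```

Because the mapping from states to bits is not injective, a small bit array silently omits states: the run with 64 bits can never count more than 64 of the 1000 reachable states.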


Book Chapter•DOI•

Fast Hierarchical Clustering Algorithm Using Locality-Sensitive Hashing


Hisashi Koga¹, Tetsuo Ishibashi¹, Toshinori Watanabe¹•Institutions (1)

University of Electro-Communications¹

02 Oct 2004

TL;DR: A hierarchical clustering is a clustering method in which each point is regarded as a single cluster initially and then the clustering algorithm repeats connecting the nearest two clusters until only one cluster remains.


Abstract: A hierarchical clustering is a clustering method in which each point is regarded as a single cluster initially and then the clustering algorithm repeats connecting the nearest two clusters until only one cluster remains. Because the result is presented as a dendrogram, one can easily figure out the distance and the inclusion relation between clusters.


Journal Article•

Two Effective Functions on Hashing URL


Li Xiao

01 Jan 2004-Journal of Software

TL;DR: The finding is that the well-known function for hashing sequences of symbols, ELFhash, is not very good in this regard, and the other two functions are better and thus recommended.


Proceedings Article•

An analysis of average search cost of the external hashing with separate chain


Ningping Sun, Ryozo Nakamura¹, Hongbing Zhu², Akio Tada³, Wenling Sun⁴•Institutions (4)

Kumamoto University¹, Hiroshima Kokusai Gakuin University², Sojo University³, Central University, India⁴

21 Apr 2004

TL;DR: This work proposes a mathematical analysis to evaluate the performance of external hashing with separate chain for two cases and provides an approach to clarify the relationship between the insertion order of keys and the position at which each key is located.


Abstract: External hashing with separate chain is a well-known method for dealing with the collision problem when a hashing technique is employed. The performance of external hashing with separate chain depends on the data structure of the separate chain. We provide an approach to clarify the relationship between the insertion order of keys and the position at which each key is located. Introducing the probability distribution of the frequency of access to each individual key in the separate chain into the analysis of search cost, we propose a mathematical analysis to evaluate the performance of external hashing with separate chain for two cases. Some experimental results obtained from the proposed formulae are also presented.
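
To make the dependence on insertion order and access frequency concrete, here is a small sketch. It assumes chains are plain append-only lists and that search cost is the number of key comparisons in a successful search. The names `ChainedTable`, `probes`, and `average_cost`, and the access probabilities, are illustrative and do not reproduce the paper's formulae.

```python
class ChainedTable:
    """Hash table with separate chaining; new keys go to the end of their chain."""

    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.buckets[hash(key) % len(self.buckets)]
        if key not in chain:
            chain.append(key)

    def probes(self, key):
        """Number of key comparisons needed to find a key that is present."""
        chain = self.buckets[hash(key) % len(self.buckets)]
        return chain.index(key) + 1

def average_cost(table, access_prob):
    """Expected comparisons per successful search under the given access probabilities."""
    return sum(p * table.probes(k) for k, p in access_prob.items())

# Keys 0 and 4 collide in a 4-bucket table; key 4 is accessed far more often.
access_prob = {0: 0.1, 4: 0.9}

cold_first = ChainedTable(4)
for key in (0, 4):
    cold_first.insert(key)

hot_first = ChainedTable(4)
for key in (4, 0):
    hot_first.insert(key)

cost_cold_first = average_cost(cold_first, access_prob)  # 0.1*1 + 0.9*2
cost_hot_first = average_cost(hot_first, access_prob)    # 0.9*1 + 0.1*2
print(cost_cold_first, cost_hot_first)
```

Inserting the frequently accessed key first places it at the head of the chain, which lowers the expected search cost in this toy setting.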


Proceedings Article•DOI•

An adaptive constant time hashing scheme for dynamic key set


S. Neogy¹, S. Choudhury, N. Chaki•Institutions (1)

Jadavpur University¹

21 Nov 2004

TL;DR: An adaptive hashing scheme is proposed that works on dynamic key sets and still enables keys to be searched in constant time and, if the hash functions are carefully chosen, then the space requirement of the hash structure is O(n).


Abstract: Hashing is an important tool in randomized algorithms, with applications in diverse fields including information retrieval, data mining, cryptology and parallel algorithms. However, the worst-case behavior of regular hash-based searching is O(n). Perfect hashing is a solution to this problem that offers a worst-case performance of O(1), but only for a static key set. In this paper we propose an adaptive hashing scheme that works on dynamic key sets and still enables keys to be searched in constant time. It is further established that, if the hash functions are carefully chosen, the space requirement of the hash structure is O(n).
