
Showing papers on "Locality-sensitive hashing" published in 2006


Journal ArticleDOI
TL;DR: A novel algorithm for generating an image hash based on Fourier transform features and controlled randomization is developed and it is shown that the proposed hash function is resilient to content-preserving modifications, such as moderate geometric and filtering distortions.
Abstract: Image hash functions find extensive applications in content authentication, database search, and watermarking. This paper develops a novel algorithm for generating an image hash based on Fourier transform features and controlled randomization. We formulate the robustness of image hashing as a hypothesis testing problem and evaluate the performance under various image processing operations. We show that the proposed hash function is resilient to content-preserving modifications, such as moderate geometric and filtering distortions. We introduce a general framework to study and evaluate the security of image hashing systems. Under this new framework, we model the hash values as random variables and quantify their uncertainty in terms of differential entropy. Using this security framework, we analyze the security of the proposed schemes and several existing representative methods for image hashing. We then examine the security versus robustness tradeoff and show that the proposed hashing methods can provide excellent security and robustness.
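
For intuition, a minimal sketch of a keyed Fourier-feature hash in the spirit described above (not the authors' exact scheme; the band size, bit count, and sign-of-random-projection quantizer are illustrative assumptions):

```python
# Sketch: magnitudes of low-frequency FFT coefficients (robust to mild
# geometric/filtering changes) are weighted by key-dependent random
# projections ("controlled randomization") and quantized to bits.
import numpy as np

def fourier_image_hash(image, key=0, n_bits=64, band=16):
    """image: 2-D grayscale array; returns an n_bits binary hash."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    cy, cx = spectrum.shape[0] // 2, spectrum.shape[1] // 2
    # Low-frequency band around the DC component.
    feats = spectrum[cy - band:cy + band, cx - band:cx + band].ravel()
    feats = feats / (np.linalg.norm(feats) + 1e-12)
    rng = np.random.default_rng(key)             # key-controlled randomness
    proj = rng.standard_normal((n_bits, feats.size))
    return (proj @ feats > 0).astype(np.uint8)

img = np.random.rand(128, 128)                   # stand-in for a real image
print(fourier_image_hash(img, key=42)[:16])
```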

542 citations


Book ChapterDOI
11 Sep 2006
TL;DR: Only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability, leading to less computation and potentially less need for randomness in practice.
Abstract: A standard technique from the hashing literature is to use two hash functions h1(x) and h2(x) to simulate additional hash functions of the form g_i(x) = h1(x) + i·h2(x). We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for randomness in practice.
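
A minimal runnable sketch of the double-hashing trick, assuming the two base hashes are simulated by splitting one blake2b digest (forcing h2 odd is a common extra precaution, not from the paper):

```python
# k index functions g_i(x) = h1(x) + i*h2(x) (mod m) derived from two hashes.
import hashlib

class Bloom:
    def __init__(self, m_bits, k):
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)

    def _h1h2(self, item):
        d = hashlib.blake2b(item.encode(), digest_size=16).digest()
        # h2 forced odd so it is invertible mod a power-of-two table size.
        return int.from_bytes(d[:8], "big"), int.from_bytes(d[8:], "big") | 1

    def _indexes(self, item):
        h1, h2 = self._h1h2(item)
        return [(h1 + i * h2) % self.m for i in range(self.k)]  # g_i(x)

    def add(self, item):
        for j in self._indexes(item):
            self.bits[j >> 3] |= 1 << (j & 7)

    def __contains__(self, item):
        return all((self.bits[j >> 3] >> (j & 7)) & 1 for j in self._indexes(item))

bf = Bloom(m_bits=1 << 16, k=5)
bf.add("locality-sensitive hashing")
print("locality-sensitive hashing" in bf, "bloom filter" in bf)
```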

234 citations


Proceedings ArticleDOI
22 Jan 2006
TL;DR: The problem of finding the approximate nearest neighbor of a query point in a high-dimensional space is studied, focusing on the Euclidean space, and it is shown that the c-approximate nearest neighbor can be computed in O(n^ρ) time and near-linear space, where ρ ≈ 2.06/c as c becomes large.
Abstract: In this paper we study the problem of finding the approximate nearest neighbor of a query point in the high dimensional space, focusing on the Euclidean space. The earlier approaches use locality-preserving hash functions (that tend to map nearby points to the same value) to construct several hash tables to ensure that the query point hashes to the same bucket as its nearest neighbor in at least one table. Our approach is different - we use one (or a few) hash table and hash several randomly chosen points in the neighborhood of the query point, showing that at least one of them will hash to the bucket containing its nearest neighbor. We show that the number of randomly chosen points required depends on the entropy of the hash value h(p) of a random point p at the same distance from q as its nearest neighbor, given q and the locality-preserving hash function h chosen randomly from the hash family. Precisely, we show that if the entropy is I(h(p)|q, h) = M and g is a bound on the probability that two far-off points will hash to the same bucket, then we can find the approximate nearest neighbor in O(n^ρ) time and near-linear O(n) space, where ρ = M/log(1/g). Alternatively, we can build a data structure of size O(n^(1/(1-ρ))) to answer queries in O(d) time. By applying this analysis to the locality-preserving hash functions in [17, 21, 6] and adjusting the parameters, we show that the c-approximate nearest neighbor can be computed in time O(n^ρ) and near-linear space, where ρ ≈ 2.06/c as c becomes large.
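
A toy sketch of the query-perturbation idea, under illustrative assumptions (a single p-stable hash table; Gaussian perturbations of scale sigma); the paper's analysis, not this code, determines how many probes are actually needed:

```python
# One LSH table; instead of many tables, probe it with random points near q.
import numpy as np

rng = np.random.default_rng(0)
d, n, w = 16, 1000, 4.0
a, b = rng.standard_normal(d), rng.uniform(0, w)   # one p-stable hash
data = rng.standard_normal((n, d))

def h(v):                                          # scalar LSH bucket id
    return int(np.floor((v @ a + b) / w))

table = {}
for i, p in enumerate(data):
    table.setdefault(h(p), []).append(i)

def query(q, n_probes=20, sigma=0.5):
    cand = set(table.get(h(q), []))
    for _ in range(n_probes):                      # perturbed probes
        cand |= set(table.get(h(q + sigma * rng.standard_normal(d)), []))
    if not cand:
        return None
    ids = list(cand)
    return ids[np.argmin(np.linalg.norm(data[ids] - q, axis=1))]

q = data[7] + 0.1 * rng.standard_normal(d)
print("nearest candidate:", query(q))
```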

225 citations


Proceedings ArticleDOI
27 Jun 2006
TL;DR: This paper proposes the first approach for efficient RkNN search in arbitrary metric spaces where the value of k is specified at query time and uses the advantages of existing metric index structures but proposes to use conservative and progressive distance approximations in order to filter out true drops and true hits.
Abstract: The reverse k-nearest neighbor (RkNN) problem, i.e. finding all objects in a data set the k-nearest neighbors of which include a specified query object, is a generalization of the reverse 1-nearest neighbor problem which has received increasing attention recently. Many industrial and scientific applications call for solutions of the RkNN problem in arbitrary metric spaces where the data objects are not Euclidean and only a metric distance function is given for specifying object similarity. Usually, these applications need a solution for the generalized problem where the value of k is not known in advance and may change from query to query. However, existing approaches, except one, are designed for the specific R1NN problem. In addition - to the best of our knowledge - all previously proposed methods, especially the one for generalized RkNN search, are only applicable to Euclidean vector data but not for general metric objects. In this paper, we propose the first approach for efficient RkNN search in arbitrary metric spaces where the value of k is specified at query time. Our approach uses the advantages of existing metric index structures but proposes to use conservative and progressive distance approximations in order to filter out true drops and true hits. In particular, we approximate the k-nearest neighbor distance for each data object by upper and lower bounds using two functions of only two parameters each. Thus, our method does not generate any considerable storage overhead. We show in a broad experimental evaluation on real-world data the scalability and the usability of our novel approach.
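
A minimal sketch of the filter/refine logic, with the conservative and progressive bounds faked as ±10% of brute-force k-NN distances purely to illustrate the pruning; the paper instead stores compact two-parameter approximations:

```python
# If lower(o) <= kNN-dist(o) <= upper(o), then dist(q,o) <= lower(o) is a
# true hit, dist(q,o) > upper(o) a true drop; only the rest is refined.
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((200, 2))
k = 3

def knn_dist(i):
    d = np.linalg.norm(data - data[i], axis=1)
    return np.sort(d)[k]          # k-th neighbor (index 0 is the point itself)

exact = np.array([knn_dist(i) for i in range(len(data))])
lower, upper = 0.9 * exact, 1.1 * exact   # stand-ins for stored approximations

def rknn(q):
    dq = np.linalg.norm(data - q, axis=1)
    hits = {int(i) for i in np.where(dq <= lower)[0]}        # true hits
    for o in np.where((dq > lower) & (dq <= upper))[0]:      # refine the rest
        if dq[o] <= exact[o]:
            hits.add(int(o))
    return hits                   # objects with dq > upper are pruned

print(sorted(rknn(rng.random(2))))
```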

137 citations


Journal ArticleDOI
TL;DR: This paper proposes a polynomial-time heuristic clustering algorithm that automatically determines the final hash length needed to satisfy a specified distortion and proves that the decision version of the clustering problem is NP-complete.
Abstract: A perceptual image hash function maps an image to a short binary string based on an image's appearance to the human eye. Perceptual image hashing is useful in image databases, watermarking, and authentication. In this paper, we decouple image hashing into feature extraction (intermediate hash) followed by data clustering (final hash). For any perceptually significant feature extractor, we propose a polynomial-time heuristic clustering algorithm that automatically determines the final hash length needed to satisfy a specified distortion. We prove that the decision version of our clustering problem is NP-complete. Based on the proposed algorithm, we develop two variations to facilitate perceptual robustness versus fragility tradeoffs. We validate the perceptual significance of our hash by testing under Stirmark attacks. Finally, we develop randomized clustering algorithms for the purposes of secure image hashing.

123 citations


Proceedings ArticleDOI
18 Dec 2006
TL;DR: This work introduces the new problem of finding shape discords, the most unusual shapes in a collection, by using locality-sensitive hashing to estimate similarity between shapes which enables us to reorder the search more efficiently.
Abstract: Over the past three decades, there has been a great deal of research on shape analysis, focusing mostly on shape indexing, clustering, and classification. In this work, we introduce the new problem of finding shape discords, the most unusual shapes in a collection. We motivate the problem by considering the utility of shape discords in diverse domains including zoology, anthropology, and medicine. While the brute force search algorithm has quadratic time complexity, we avoid this by using locality-sensitive hashing to estimate similarity between shapes which enables us to reorder the search more efficiently. An extensive experimental evaluation demonstrates that our approach can speed up computation by three to four orders of magnitude.
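
A minimal sketch of discord search with early abandoning; a random candidate ordering stands in here for the paper's LSH-informed ordering, which is what yields the reported speedups:

```python
# The discord is the object farthest from its own nearest neighbor. A good
# outer-loop ordering lets the inner loop abandon a candidate as soon as a
# neighbor closer than the best-so-far discord distance is found.
import numpy as np

rng = np.random.default_rng(2)
shapes = rng.random((300, 32))              # stand-in for shape descriptors

def find_discord(X):
    best_dist, best_id = -1.0, -1
    order = rng.permutation(len(X))         # ideally an LSH-informed ordering
    for i in order:
        nn = np.inf
        for j in order:
            if i != j:
                nn = min(nn, float(np.linalg.norm(X[i] - X[j])))
                if nn < best_dist:          # early abandon: i cannot win
                    break
        if np.isfinite(nn) and nn > best_dist:
            best_dist, best_id = nn, int(i)
    return best_id, best_dist

print(find_discord(shapes))
```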

112 citations


Journal ArticleDOI
TL;DR: A new method for rapid 3D object indexing that combines feature-based methods with coarse alignment-based matching techniques is proposed, achieving sublinear complexity in the number of models while maintaining a high degree of performance for real 3D sensed data acquired in largely uncontrolled settings.
Abstract: We propose a new method for rapid 3D object indexing that combines feature-based methods with coarse alignment-based matching techniques. Our approach achieves a sublinear complexity in the number of models, while maintaining a high degree of performance for real 3D sensed data acquired in largely uncontrolled settings. The key component of our method is to first index surface descriptors computed at salient locations from the scene into the whole model database using locality-sensitive hashing (LSH), a probabilistic approximate nearest neighbor method. Progressively complex geometric constraints are subsequently enforced to further prune the initial candidates and eliminate false correspondences due to inaccuracies in the surface descriptors and the errors of the LSH algorithm. The indexed models are selected based on the MAP rule using the posterior probability of the models estimated in the joint 3D-signature space. Experiments with real 3D data, employing a large database of vehicles (most of them very similar in shape) containing 1,000,000 features from more than 365 models, demonstrate a high degree of performance in the presence of occlusion and obscuration, unmodeled vehicle interiors and part articulations, with an average processing time between 50 and 100 seconds per query.

103 citations


01 Jan 2006
TL;DR: This chapter contains sections titled: The Locality-Sensitive Hashing Scheme Based on p-Stable Distributions, Approximate Near Neighbor, Exact Near Neighbor, LSH in Practice: E2LSH, and Experimental Results.
Abstract: This chapter contains sections titled: The Locality-Sensitive Hashing Scheme Based on p-Stable Distributions, Approximate Near Neighbor, Exact Near Neighbor, LSH in Practice: E2LSH, and Experimental Results.
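
For reference, a minimal sketch of the Euclidean (2-stable) instance of this scheme, h_{a,b}(v) = floor((a·v + b) / w), with an illustrative bucket width w:

```python
# p-stable LSH: project onto a Gaussian direction, shift, and quantize.
# Nearby points fall into the same bucket with higher probability.
import numpy as np

rng = np.random.default_rng(0)

def make_hash(dim, w=4.0):
    a = rng.standard_normal(dim)            # 2-stable (Gaussian) projection
    b = rng.uniform(0.0, w)                 # random offset in [0, w)
    return lambda v: int(np.floor((v @ a + b) / w))

h = make_hash(dim=8)
p = rng.standard_normal(8)
print(h(p), h(p + 0.05), h(p + 10.0))       # the near point usually collides
```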

68 citations


Journal ArticleDOI
TL;DR: The proposed cam-weighted distance is orientation- and scale-adaptive, taking advantage of the relevant information in inter-prototype relationships so that better classification performance can be achieved.

63 citations


Proceedings Article
01 Jan 2006
TL;DR: To scale the search to large song databases, an algorithm based on locality-sensitive hashing (LSH) of sequences of audio features called audio shingles provides an efficient means to identify approximate nearest neighbors in a high-dimensional feature space.
Abstract: We present new methods for computing inter-song similarities using intersections between multiple audio pieces. The intersection contains portions that are similar, when one song is a derivative work of the other for example, in two different musical recordings. To scale our search to large song databases, we have developed an algorithm based on locality-sensitive hashing (LSH) of sequences of audio features called audio shingles. LSH provides an efficient means to identify approximate nearest neighbors in a high-dimensional feature space. We combine these nearest neighbor estimates, each a match from a very large database of audio to a small portion of the query song, to form a measure of the approximate similarity. We demonstrate the utility of our methods on a derivative works retrieval experiment using both exact and approximate (LSH) methods. The results show that LSH is at least an order of magnitude faster than the exact nearest neighbor method and that accuracy is not impacted by the approximate method.
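
A minimal sketch of shingling a feature sequence, assuming simulated 13-dimensional frames and an illustrative shingle width:

```python
# Concatenate w consecutive feature frames (e.g., MFCC vectors) into one
# high-dimensional "shingle"; the shingles are then indexed with LSH.
import numpy as np

def shingles(frames, w=10, hop=1):
    """frames: (n_frames, n_dims) array -> (n_shingles, w*n_dims) array."""
    n = (len(frames) - w) // hop + 1
    return np.stack([frames[i * hop:i * hop + w].ravel() for i in range(n)])

frames = np.random.rand(100, 13)            # stand-in for 13-dim MFCC frames
S = shingles(frames)
print(S.shape)                              # (91, 130): one shingle per hop
```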

62 citations


Proceedings ArticleDOI
05 Jun 2006
TL;DR: In this paper, it is shown that for X = l1 it is impossible to achieve ρ ≤ 1/(2c). This almost matches the construction of Indyk and Motwani, which achieves ρ ≤ 1/c.
Abstract: Given a metric space (X, dX), c ≥ 1, r > 0, and p, q ∈ [0,1], a distribution over mappings H : X → N is called a (r, cr, p, q)-sensitive hash family if any two points in X at distance at most r are mapped by H to the same value with probability at least p, and any two points at distance greater than cr are mapped by H to the same value with probability at most q. This notion was introduced by Indyk and Motwani in 1998 as the basis for an efficient approximate nearest neighbor search algorithm, and has since been used extensively for this purpose. The performance of these algorithms is governed by the parameter ρ = log(1/p)/log(1/q), and constructing hash families with small ρ automatically yields improved nearest neighbor algorithms. Here we show that for X = l1 it is impossible to achieve ρ ≤ 1/(2c). This almost matches the construction of Indyk and Motwani, which achieves ρ ≤ 1/c.
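
A worked check of the exponent for the classic bit-sampling family on the d-dimensional Hamming cube (h(x) = x_i for a random coordinate i), where p = 1 - r/d and q = 1 - cr/d, so ρ approaches 1/c — the upper bound that the lower-bound result above nearly matches from below:

```python
# rho = log(1/p)/log(1/q) for bit-sampling LSH; tends to 1/c for small r/d.
from math import log

def rho(d, r, c):
    p, q = 1 - r / d, 1 - c * r / d
    return log(1 / p) / log(1 / q)

for c in (2, 4, 8):
    print(f"c={c}: rho={rho(d=10_000, r=10, c=c):.4f}  vs 1/c={1 / c:.4f}")
```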

Proceedings ArticleDOI
06 Nov 2006
TL;DR: Efficient algorithms using the technique of Locality-Sensitive Hashing (LSH) to extract topics from a document collection based on the asymmetric relationships between terms in a collection are presented.
Abstract: Topic or feature extraction is often used as an important step in document classification and text mining. Topics are succinct representations of content in a document collection and hence are very effective when used as content identifiers in peer-to-peer systems and other large-scale distributed content management systems. Effective topic extraction depends on the accuracy of term clustering, which often has to deal with problems like synonymy and polysemy. Retrieval techniques based on spectral analysis, like Latent Semantic Indexing (LSI), are often used to effectively solve these problems. Most spectral retrieval schemes produce term similarity measures that are symmetric and often not an accurate characterization of term relationships. Another drawback of LSI is its running time, which is polynomial in the dimensions of the m × n matrix A; this can get prohibitively large for some IR applications. In this paper, we present efficient algorithms using the technique of Locality-Sensitive Hashing (LSH) to extract topics from a document collection based on the asymmetric relationships between terms in a collection. The relationship is characterized by term co-occurrences and other higher-order similarity measures. Our LSH-based scheme can be viewed as a simple alternative to LSI. We show the efficacy of our algorithms via experiments on a set of large documents. An interesting feature of our algorithms is that they produce a natural hierarchical decomposition of the topic space instead of a flat clustering.

Journal ArticleDOI
TL;DR: A data mining framework built on top of a recently developed fast approximate nearest-neighbor-finding algorithm called locality-sensitive hashing (LSH) that can be used to mine huge chemical spaces in a scalable fashion using very modest computational resources is introduced.
Abstract: Virtual screening (VS) has become a preferred tool to augment high-throughput screening and determine new leads in the drug discovery process. The core of a VS informatics pipeline includes several data mining algorithms that work on huge databases of chemical compounds containing millions of molecular structures and their associated data. Thus, scaling traditional applications such as classification, partitioning, and outlier detection for huge chemical data sets without a significant loss in accuracy is very important. In this paper, we introduce a data mining framework built on top of a recently developed fast approximate nearest-neighbor-finding algorithm called locality-sensitive hashing (LSH) that can be used to mine huge chemical spaces in a scalable fashion using very modest computational resources. The core LSH algorithm hashes chemical descriptors so that points close to each other in the descriptor space are also close to each other in the hashed space. Using this data structure, one can perf...

Proceedings ArticleDOI
26 Sep 2006
TL;DR: A wavelet-based robust database hashing scheme is analyzed with respect to its resilience against image modifications and hostile attacks and a method to construct a forgery is presented.
Abstract: A wavelet-based robust database hashing scheme is analyzed with respect to its resilience against image modifications and hostile attacks. A method to construct a forgery is presented and possible countermeasures are discussed.

Proceedings ArticleDOI
06 Nov 2006
TL;DR: This paper proposes an approach for efficient approximative RkNN search in arbitrary metric spaces where the value of k is specified at query time by using an approximation of the nearest-neighbor-distances in order to prune the search space.
Abstract: In this paper, we propose an approach for efficient approximative RkNN search in arbitrary metric spaces where the value of k is specified at query time. Our method uses an approximation of the nearest-neighbor-distances in order to prune the search space. In several experiments, our solution scales significantly better than existing non-approximative approaches while producing an approximation of the true query result with a high recall.

Proceedings ArticleDOI
20 Aug 2006
TL;DR: Novel adaptive nearest neighbor classifiers based on Hit-Distance are proposed to generalize the representational capacity of available prototypes, and it is shown that the proposed classifiers perform much better than the classical nearest neighbor classifier (NN), the nearest feature line method (NFL), the nearest feature plane method (NFP), the nearest neighbor line method (NNL), and the nearest neighbor plane method (NNP).
Abstract: In this paper, a novel notion of distance, Hit-Distance, is first introduced to generalize the representational capacity of available prototypes. Novel adaptive nearest neighbor classifiers based on Hit-Distance are then proposed. Experiments were performed on 8 benchmark datasets from the UCI Machine Learning Repository. The proposed classifiers performed much better than the classical nearest neighbor classifier (NN), the nearest feature line method (NFL), the nearest feature plane method (NFP), the nearest neighbor line method (NNL), and the nearest neighbor plane method (NNP).

Proceedings ArticleDOI
01 Sep 2006
TL;DR: The proposed technique clearly outperforms all known regular hashing algorithms by a significant margin and follows an ad hoc design that is critical for adapting in real time to a changing database with an irregular, non-uniform distribution.
Abstract: For applications that rely on large databases as the core data structure, a fast search process is essential. Hashing algorithms have widely been adopted as the search algorithm of choice for fast lookups. Hashing algorithms involve the creation of hash values from the target database entries. A hashing algorithm that transforms the database to hash values with a distribution as uniform as possible would lead to better search performance. When a database is already value-wise uniformly distributed, any regular hashing algorithm, such as bit-extraction, group-XOR, etc., will lead to a statistically perfect hashing result. In almost all practical applications, however, the target database rarely exhibits a uniform distribution, and the use of any known regular hashing algorithm can lead to performance far less than desirable. This paper aims at designing a hashing algorithm that can deliver better performance for all practical databases. An analytical preprocess is performed on the original database to extract critical information that significantly benefits the design of a better hashing algorithm. The process includes sorting database hash bits by priority to facilitate the decision of which bits, and how those bits, should be combined to generate better hash values. The algorithm follows an ad hoc design that is critical for adapting in real time to a changing database with an irregular, non-uniform distribution. The proposed technique clearly outperforms all known regular hashing algorithms by a significant margin.
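
One speculative way the described preprocess could look, on a synthetic skewed database: score each key bit by how balanced it is and build hash values from the most balanced bits (the paper's fuller analysis of bit combination is omitted):

```python
# Rank key bits by balance (closeness to a 50/50 split) and use the best
# ones as hash bits, so buckets stay as uniform as the skewed data allows.
import numpy as np

rng = np.random.default_rng(3)
# Skewed synthetic database: bits 8-15 of every key are always zero.
keys = rng.integers(0, 1 << 31, size=5000) & 0x7FFF00FF

bits = np.array([(keys >> b) & 1 for b in range(31)])   # bit matrix, 31 x n
balance = np.abs(bits.mean(axis=1) - 0.5)               # 0.0 = perfectly balanced
best = np.argsort(balance)[:12]                         # 12 most uniform bits
print("chosen bit positions:", sorted(int(b) for b in best))

def hash_key(k):                                        # 12-bit hash value
    return sum(((int(k) >> int(b)) & 1) << i for i, b in enumerate(best))

counts = np.bincount([hash_key(k) for k in keys], minlength=1 << 12)
print(f"bucket load: max={counts.max()}, mean={counts.mean():.2f}")
```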


Journal Article
TL;DR: An HVS-based image hashing method incorporating Watson's sensitivity matrix is proposed, which has better robustness against JPEG compression and low-pass filtering.
Abstract: An HVS-based image hashing method incorporating Watson's sensitivity matrix is proposed. The transform-domain matrix composed of 8-by-8 block DCT coefficients of the image is multiplied by N matrices that are pseudo-randomly generated with a key, and divided by the periodically extended Watson matrix. By quantization, an N-bit image hash is obtained. Compared to some other hashing methods, the HVS-based hash has better robustness against JPEG compression and low-pass filtering. Since a key is used in the algorithm, the hash is hard to forge.

Book ChapterDOI
24 Sep 2006
TL;DR: A locally adaptive nearest neighbor classification method based on supervised learning, which works well for multi-class problems, is proposed; ellipsoid clustering is applied to estimate an effective metric, which is then used in K-NN classification.
Abstract: The nearest neighbor classifier is a widely used and effective method for multi-class problems. However, it suffers from the curse of dimensionality in high-dimensional spaces. To address this problem, many adaptive nearest neighbor classifiers have been proposed. In this paper, a locally adaptive nearest neighbor classification method based on supervised learning, which works well for multi-class problems, is proposed. In this method, ellipsoid clustering is applied to estimate an effective metric. This metric is then used in K-NN classification. Finally, experimental results show that it is an efficient and robust approach for multi-class classification.
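
A minimal sketch of metric-adaptive k-NN, with one pooled covariance standing in for the per-cluster ellipsoid metrics the paper learns:

```python
# Classify with Mahalanobis rather than Euclidean distance: the learned
# inverse covariance rescales and decorrelates the feature space.
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 3)) * np.array([1.0, 5.0, 0.2])  # anisotropic
y = (X[:, 0] + 0.1 * X[:, 1] > 0).astype(int)

VI = np.linalg.inv(np.cov(X, rowvar=False))     # inverse covariance = metric

def knn_predict(q, k=5):
    diff = X - q
    d2 = np.einsum("ij,jk,ik->i", diff, VI, diff)   # squared Mahalanobis
    votes = y[np.argsort(d2)[:k]]
    return int(np.bincount(votes).argmax())

print(knn_predict(np.array([0.5, -2.0, 0.0])))
```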

Book
01 Jan 2006
TL;DR: This thesis studies algorithms for different kinds of search using hashing and sketching, and some fundamental limits of what can be realized using some of these approaches are studied.
Abstract: The Information Age has enabled the search for information in ways never imagined before. The simplest search function may be an exact search, where the input query is expected to exactly match the search object. But some search criteria are fuzzy (for instance image search, news search, and similar document search), making the search problem much harder. One common approach is to convert such a search object into a mathematical representation such as a point (vector) in a high-dimensional space. The search for a similar object then becomes a nearest neighbor search in a high-dimensional space. Hashing is a simple and effective method for exact search that uses a random hash function to map items into buckets, often viewed as throwing balls into bins. A variant of hashing called locality-sensitive hashing, which tends to map similar objects to the same hash bucket, can be used to perform nearest neighbor search. A related notion is sketching, which is used to transform a large complex object into a small sketch (often a tiny bitmap) so that similarity between the sketches can be used to estimate the similarity between the original objects. In this thesis we study algorithms for different kinds of search using hashing and sketching, and some fundamental limits of what can be realized using some of these approaches. For exact search, we will see how variants of balls-and-bins processes can be used to derive space-efficient methods for maintaining hash tables. For similarity search, we will see a variant of locality-sensitive hashing that uses linear space and how the underlying ideas can be used in the kd-tree data structure for improved performance. We will also probe the fundamental limits of some of these approaches by showing lower bounds on their performance.

08 Sep 2006
TL;DR: Two methods are proposed: one eliminates feature vectors that would require many distance calculations, and the other uses no distance calculation at all; the result is two to three times as efficient.
Abstract: The efficiency of object recognition methods using local descriptors such as SIFT and PCA-SIFT depends largely on the speed of matching between feature vectors, since images are described by a large number of feature vectors. Because the matching can be cast as a nearest neighbor (NN) search over feature vectors, the problem becomes one of making the NN search efficient. For object recognition, it is required that the number of incorrect matches does not exceed that of correct matches; in other words, a certain number of incorrect matches is acceptable. This observation allows us to make NN search more efficient using approximate NN search with reduced distance calculation. For this purpose, we propose two methods: one eliminates feature vectors that would require many distance calculations, and the other uses no distance calculation at all. Experimental results with 10,000 database images and 2,000 query images show that the proposed method is two to three times as efficient as a method using ANN and achieves a recognition rate of 98% at 8.3 ms/query.

Kai Li, Qin Lv
01 Jan 2006
TL;DR: A sketch construction algorithm is proposed, such that the weighted (and thresholded) l1 distance between two feature vectors can be estimated by the Hamming distance of their sketches, which can typically reduce the space requirement by an order of magnitude with minimal impact on similarity search quality.
Abstract: Content-based image similarity search is a difficult problem due to the high dimensionality and usually massive amount of image data. The main challenge is to achieve high-quality similarity search with high speed and low space usage. This thesis proposes several techniques to address the problem of building a similarity search system for large-scale image datasets. A prototype image search system, called CASS-Image (Content-Aware Search System for Images), has been implemented to demonstrate the effectiveness of these techniques. The first contribution of this thesis is a sketch construction algorithm that converts high-dimensional feature vectors into bit vectors (sketches), such that the weighted (and thresholded) l1 distance between two feature vectors can be estimated by the Hamming distance of their sketches. Experimental results show that using sketches can typically reduce the space requirement by an order of magnitude with minimal impact on similarity search quality. The second is a hash-perturbation based LSH (Locality Sensitive Hashing) technique for approximate nearest neighbor search in high dimensions. This technique probes multiple buckets in each hash table by perturbing the hashed value of the query object. Performance evaluations show that this method is both time and space efficient. It has a similar time efficiency as the basic LSH method while reducing the space requirement by a factor of five. Also, its time efficiency is twice that of the point-perturbation based LSH method. The third is a multi-feature filtering algorithm for region-based image similarity search. This method uses approximation algorithms to generate a candidate set, and then ranks the objects in the candidate set with a more sophisticated multi-feature distance measure. It works for both feature vectors and their sketches. It can also be combined with indexing techniques to further speed up the search process. Performance evaluations show that filtering is 4-13 times faster than the brute-force approach, while still maintaining good search quality. This thesis also proposes a new region-based image similarity measure, EMD* match, which uses square-root region weights and region distance thresholding. Experimental results show that EMD* match is 27%-91% more effective than previous image similarity search techniques.
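
A minimal sketch of one standard construction consistent with this description, assuming feature values in [0, 1]: each sketch bit samples a coordinate with probability proportional to its weight plus a uniform threshold, so the expected normalized Hamming distance equals the weighted l1 distance:

```python
# Each bit records which side of a random threshold the sampled coordinate
# falls on; bits differ exactly when the threshold lands between the values.
import numpy as np

rng = np.random.default_rng(5)
d, B = 8, 1024                                # dimensions, sketch bits
w = rng.random(d); w /= w.sum()               # feature weights
coords = rng.choice(d, size=B, p=w)           # weighted coordinate sampling
thresh = rng.random(B)

def sketch(x):
    return x[coords] > thresh                 # B-bit sketch

x, z = rng.random(d), rng.random(d)
est = np.mean(sketch(x) != sketch(z))         # normalized Hamming distance
true = np.dot(w, np.abs(x - z))               # weighted l1 distance
print(f"estimate={est:.3f}  true={true:.3f}")
```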

Book ChapterDOI
04 Sep 2006
TL;DR: In this article, a hypersphere indexer, named Hydex, is proposed to perform approximate nearest-neighbor search in high-dimensional data, where the data space is partitioned using concentric hyperspheres.
Abstract: Indexing high-dimensional data for efficient nearest-neighbor searches poses interesting research challenges. It is well known that when data dimension is high, the search time can exceed the time required for performing a linear scan on the entire dataset. To alleviate this dimensionality curse, indexing schemes such as locality sensitive hashing (LSH) and M-trees were proposed to perform approximate searches. In this paper, we propose a hypersphere indexer, named Hydex, to perform such searches. Hydex partitions the data space using concentric hyperspheres. By exploiting geometric properties, Hydex can perform effective pruning. Our empirical study shows that Hydex enjoys three advantages over competing schemes for achieving the same level of search accuracy. First, Hydex requires fewer seek operations. Second, Hydex can maintain sequential disk accesses most of the time. And third, it requires fewer distance computations.
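
A minimal sketch of the concentric-sphere idea: bucket points by distance to a center and, for a range query, visit only the rings that can intersect the query ball by the triangle inequality (the single center and fixed ring width are illustrative; Hydex's actual structure differs in detail):

```python
# Points in ring k satisfy k*ring_w <= d(p, center) < (k+1)*ring_w, so only
# rings overlapping [d(q,c) - r, d(q,c) + r] can contain answers.
import numpy as np

rng = np.random.default_rng(6)
data = rng.standard_normal((2000, 8))
center = data.mean(axis=0)
ring_w = 0.25
ring_of = lambda p: int(np.linalg.norm(p - center) / ring_w)

rings = {}
for i, p in enumerate(data):
    rings.setdefault(ring_of(p), []).append(i)

def range_query(q, r):
    dq = np.linalg.norm(q - center)
    lo, hi = int(max(dq - r, 0.0) / ring_w), int((dq + r) / ring_w)
    out = []
    for ring in range(lo, hi + 1):                 # pruned ring scan
        for i in rings.get(ring, []):
            if np.linalg.norm(data[i] - q) <= r:   # exact check inside ring
                out.append(i)
    return out

print(len(range_query(rng.standard_normal(8), r=1.0)))
```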

01 Jan 2006
TL;DR: The results show that the proposed method outperforms a popular approximate nearest neighbor library, "ANN", in search time and accuracy for higher dimensions and large numbers of prototypes.
Abstract: In this paper we propose a fast approximate nearest neighbor search algorithm for a high-dimensional spherical space, using an idea called "distributed coding", which represents a vector by a set of many vectors and encodes them efficiently. We implemented the algorithm and tested it with synthetic data. The results show that the proposed method outperforms a popular approximate nearest neighbor library, "ANN", in search time and accuracy for higher dimensions and large numbers of prototypes. Keywords: Approximate Nearest Neighbor, Distributed Coding, k-d tree, Locality Sensitive Hashing

Journal Article
TL;DR: It is shown that under some conditions, as dimensionality increases, the distances between the query point and the data points approach each other, so the "nearest neighbor" becomes meaningless.
Abstract: This paper explores the effect of dimensionality on the "nearest neighbor" problem. Based on statistics, it shows that under some conditions, as dimensionality increases, the distances between the query point and the data points approach each other, so the "nearest neighbor" becomes meaningless. A way to evaluate the dimensionality effect is presented. From two distributions of statistics about distance, the effect of dimensionality on the "nearest neighbor" problem is evaluated. Empirical results are presented to demonstrate the two distributions.
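
A small numeric illustration of this concentration effect (uniform data; the relative gap below shrinks as the dimension grows):

```python
# As d grows, the nearest and farthest neighbors of a random query end up
# at nearly the same distance, so "nearest" carries less information.
import numpy as np

rng = np.random.default_rng(7)
for d in (2, 10, 100, 1000):
    X, q = rng.random((1000, d)), rng.random(d)
    dist = np.linalg.norm(X - q, axis=1)
    print(f"d={d:5d}  (max-min)/min = {(dist.max() - dist.min()) / dist.min():.3f}")
```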

Book ChapterDOI
22 Jun 2006
TL;DR: It is shown that an SEA significantly improves on the equivalent simple EA configuration for higher-dimensional problems in an expeditious manner.
Abstract: Evolutionary Algorithms (EAs) are common optimization techniques based on the concept of Darwinian evolution. During the search for the global optimum of a search space, a traditional EA will often become trapped in a local optimum. The Scouting-Inspired Evolutionary Algorithms (SEAs) are a recently introduced family of EAs that use a cross-generational memory mechanism to overcome this problem and discover solutions of higher fitness. The merit of SEAs has been established in previous work with a number of two- and three-dimensional test cases and a variety of configurations. In this paper, we present two approaches to using SEAs to solve high-dimensional problems. The first involves the use of Locality Sensitive Hashing (LSH) for the repository of individuals, whereas the second entails the use of scouting-driven mutation at a certain rate, the Scouting Rate. We show that an SEA significantly improves on the equivalent simple EA configuration for higher-dimensional problems in an expeditious manner.

Journal Article
Dai Yafei
TL;DR: A hashing function for large-scale URL sets is proposed; it shows better uniformity and stability than two alternatives (HfIp and hf) in three large-scale experiments and is recommended for applications that need to hash URLs.
Abstract: URL hashing finds many applications in Web research. We propose a hashing function for large-scale URL sets and find that it has better uniformity and stability than two alternatives (HfIp and hf) in three large-scale experiments. It is a variation of the well-known ELFhash function and is recommended for applications that need to hash URLs. Moreover, it has low time cost and performance close to MD5 and SHA-1, so we consider it more practical than the alternatives. Finally, some future work is outlined.
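
For reference, the classic ELFhash that the paper varies, transcribed to Python; the proposed URL-specific variation itself is not reproduced here:

```python
# ELFhash: 4-bit rolling shift with overflow folding, kept to 32 bits.
def elf_hash(s: str) -> int:
    h = 0
    for ch in s.encode():
        h = ((h << 4) + ch) & 0xFFFFFFFF
        g = h & 0xF0000000          # top nibble about to overflow
        if g:
            h ^= g >> 24            # fold it back into lower bits
        h &= ~g & 0xFFFFFFFF        # and clear it
    return h

print(elf_hash("http://example.com/index.html") % 1024)  # bucket index
```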

Proceedings ArticleDOI
23 Oct 2006
TL;DR: A sub-object retrieval system based on a segmentation method that utilizes the segmentation results to capture the higher-level concept of images and gets a stable and accurate result.
Abstract: This paper describes a sub-object retrieval system based on a segmentation method. We also use dynamic partial function (DPF) and indexing by locality sensitive hashing (LSH) for improving system performance. Such a system is useful for finding a sub-object from a large image database. In order to obtain the sub-object from a sample image, we use a segmentation method to cut out the object. The system utilizes the segmentation results to capture the higher-level concept of images and gets a stable and accurate result. Experimental and comparison results, which are performed using a general purpose database containing 20,000 images, are encouraging.