
Showing papers on "Locality-sensitive hashing" published in 2006


Journal ArticleDOI
TL;DR: A novel algorithm for generating an image hash based on Fourier transform features and controlled randomization is developed and it is shown that the proposed hash function is resilient to content-preserving modifications, such as moderate geometric and filtering distortions.
Abstract: Image hash functions find extensive applications in content authentication, database search, and watermarking. This paper develops a novel algorithm for generating an image hash based on Fourier transform features and controlled randomization. We formulate the robustness of image hashing as a hypothesis testing problem and evaluate the performance under various image processing operations. We show that the proposed hash function is resilient to content-preserving modifications, such as moderate geometric and filtering distortions. We introduce a general framework to study and evaluate the security of image hashing systems. Under this new framework, we model the hash values as random variables and quantify their uncertainty in terms of differential entropy. Using this security framework, we analyze the security of the proposed schemes and several existing representative methods for image hashing. We then examine the security versus robustness tradeoff and show that the proposed hashing methods can provide excellent security and robustness.
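
For intuition, a minimal sketch of a keyed Fourier-feature hash in the spirit described above (not the authors' exact scheme; the band size, bit count, and sign-of-random-projection quantizer are illustrative assumptions):

```python
# Sketch: magnitudes of low-frequency FFT coefficients (robust to mild
# geometric/filtering changes) are weighted by key-dependent random
# projections ("controlled randomization") and quantized to bits.
import numpy as np

def fourier_image_hash(image, key=0, n_bits=64, band=16):
    """image: 2-D grayscale array; returns an n_bits binary hash."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    cy, cx = spectrum.shape[0] // 2, spectrum.shape[1] // 2
    # Low-frequency band around the DC component.
    feats = spectrum[cy - band:cy + band, cx - band:cx + band].ravel()
    feats = feats / (np.linalg.norm(feats) + 1e-12)
    rng = np.random.default_rng(key)             # key-controlled randomness
    proj = rng.standard_normal((n_bits, feats.size))
    return (proj @ feats > 0).astype(np.uint8)

img = np.random.rand(128, 128)                   # stand-in for a real image
print(fourier_image_hash(img, key=42)[:16])
```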

542 citations


Book ChapterDOI
11 Sep 2006
TL;DR: Only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability, leading to less computation and potentially less need for randomness in practice.
Abstract: A standard technique from the hashing literature is to use two hash functions h1(x) and h2(x) to simulate additional hash functions of the form g_i(x) = h1(x) + i·h2(x). We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for randomness in practice.
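
A minimal runnable sketch of the double-hashing trick, assuming the two base hashes are simulated by splitting one blake2b digest (forcing h2 odd is a common extra precaution, not from the paper):

```python
# k index functions g_i(x) = h1(x) + i*h2(x) (mod m) derived from two hashes.
import hashlib

class Bloom:
    def __init__(self, m_bits, k):
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)

    def _h1h2(self, item):
        d = hashlib.blake2b(item.encode(), digest_size=16).digest()
        # h2 forced odd so it is invertible mod a power-of-two table size.
        return int.from_bytes(d[:8], "big"), int.from_bytes(d[8:], "big") | 1

    def _indexes(self, item):
        h1, h2 = self._h1h2(item)
        return [(h1 + i * h2) % self.m for i in range(self.k)]  # g_i(x)

    def add(self, item):
        for j in self._indexes(item):
            self.bits[j >> 3] |= 1 << (j & 7)

    def __contains__(self, item):
        return all((self.bits[j >> 3] >> (j & 7)) & 1 for j in self._indexes(item))

bf = Bloom(m_bits=1 << 16, k=5)
bf.add("locality-sensitive hashing")
print("locality-sensitive hashing" in bf, "bloom filter" in bf)
```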

234 citations


Proceedings ArticleDOI
22 Jan 2006
TL;DR: The problem of finding the approximate nearest neighbor of a query point in a high-dimensional space is studied, focusing on the Euclidean space, and it is shown that the c-approximate nearest neighbor can be computed in O(n^ρ) time and near-linear space, where ρ ≈ 2.06/c as c becomes large.
Abstract: In this paper we study the problem of finding the approximate nearest neighbor of a query point in the high dimensional space, focusing on the Euclidean space. The earlier approaches use locality-preserving hash functions (that tend to map nearby points to the same value) to construct several hash tables to ensure that the query point hashes to the same bucket as its nearest neighbor in at least one table. Our approach is different - we use one (or a few) hash table and hash several randomly chosen points in the neighborhood of the query point, showing that at least one of them will hash to the bucket containing its nearest neighbor. We show that the number of randomly chosen points required depends on the entropy of the hash value h(p) of a random point p at the same distance from q as its nearest neighbor, given q and the locality-preserving hash function h chosen randomly from the hash family. Precisely, we show that if the entropy is I(h(p)|q, h) = M and g is a bound on the probability that two far-off points will hash to the same bucket, then we can find the approximate nearest neighbor in O(n^ρ) time and near-linear O(n) space, where ρ = M/log(1/g). Alternatively, we can build a data structure of size O(n^(1/(1-ρ))) to answer queries in O(d) time. By applying this analysis to the locality-preserving hash functions in [17, 21, 6] and adjusting the parameters, we show that the c-approximate nearest neighbor can be computed in time O(n^ρ) and near-linear space, where ρ ≈ 2.06/c as c becomes large.
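
A toy sketch of the query-perturbation idea, under illustrative assumptions (a single p-stable hash table; Gaussian perturbations of scale sigma); the paper's analysis, not this code, determines how many probes are actually needed:

```python
# One LSH table; instead of many tables, probe it with random points near q.
import numpy as np

rng = np.random.default_rng(0)
d, n, w = 16, 1000, 4.0
a, b = rng.standard_normal(d), rng.uniform(0, w)   # one p-stable hash
data = rng.standard_normal((n, d))

def h(v):                                          # scalar LSH bucket id
    return int(np.floor((v @ a + b) / w))

table = {}
for i, p in enumerate(data):
    table.setdefault(h(p), []).append(i)

def query(q, n_probes=20, sigma=0.5):
    cand = set(table.get(h(q), []))
    for _ in range(n_probes):                      # perturbed probes
        cand |= set(table.get(h(q + sigma * rng.standard_normal(d)), []))
    if not cand:
        return None
    ids = list(cand)
    return ids[np.argmin(np.linalg.norm(data[ids] - q, axis=1))]

q = data[7] + 0.1 * rng.standard_normal(d)
print("nearest candidate:", query(q))
```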

225 citations


Proceedings ArticleDOI
27 Jun 2006
TL;DR: This paper proposes the first approach for efficient RkNN search in arbitrary metric spaces where the value of k is specified at query time and uses the advantages of existing metric index structures but proposes to use conservative and progressive distance approximations in order to filter out true drops and true hits.
Abstract: The reverse k-nearest neighbor (RkNN) problem, i.e. finding all objects in a data set the k-nearest neighbors of which include a specified query object, is a generalization of the reverse 1-nearest neighbor problem which has received increasing attention recently. Many industrial and scientific applications call for solutions of the RkNN problem in arbitrary metric spaces where the data objects are not Euclidean and only a metric distance function is given for specifying object similarity. Usually, these applications need a solution for the generalized problem where the value of k is not known in advance and may change from query to query. However, existing approaches, except one, are designed for the specific R1NN problem. In addition - to the best of our knowledge - all previously proposed methods, especially the one for generalized RkNN search, are only applicable to Euclidean vector data but not for general metric objects. In this paper, we propose the first approach for efficient RkNN search in arbitrary metric spaces where the value of k is specified at query time. Our approach uses the advantages of existing metric index structures but proposes to use conservative and progressive distance approximations in order to filter out true drops and true hits. In particular, we approximate the k-nearest neighbor distance for each data object by upper and lower bounds using two functions of only two parameters each. Thus, our method does not generate any considerable storage overhead. We show in a broad experimental evaluation on real-world data the scalability and the usability of our novel approach.
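
A minimal sketch of the filter/refine logic, with the conservative and progressive bounds faked as ±10% of brute-force k-NN distances purely to illustrate the pruning; the paper instead stores compact two-parameter approximations:

```python
# If lower(o) <= kNN-dist(o) <= upper(o), then dist(q,o) <= lower(o) is a
# true hit, dist(q,o) > upper(o) a true drop; only the rest is refined.
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((200, 2))
k = 3

def knn_dist(i):
    d = np.linalg.norm(data - data[i], axis=1)
    return np.sort(d)[k]          # k-th neighbor (index 0 is the point itself)

exact = np.array([knn_dist(i) for i in range(len(data))])
lower, upper = 0.9 * exact, 1.1 * exact   # stand-ins for stored approximations

def rknn(q):
    dq = np.linalg.norm(data - q, axis=1)
    hits = {int(i) for i in np.where(dq <= lower)[0]}        # true hits
    for o in np.where((dq > lower) & (dq <= upper))[0]:      # refine the rest
        if dq[o] <= exact[o]:
            hits.add(int(o))
    return hits                   # objects with dq > upper are pruned

print(sorted(rknn(rng.random(2))))
```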

137 citations


Journal ArticleDOI
TL;DR: This paper proposes a polynomial-time heuristic clustering algorithm that automatically determines the final hash length needed to satisfy a specified distortion and proves that the decision version of the clustering problem is NP-complete.
Abstract: A perceptual image hash function maps an image to a short binary string based on an image's appearance to the human eye. Perceptual image hashing is useful in image databases, watermarking, and authentication. In this paper, we decouple image hashing into feature extraction (intermediate hash) followed by data clustering (final hash). For any perceptually significant feature extractor, we propose a polynomial-time heuristic clustering algorithm that automatically determines the final hash length needed to satisfy a specified distortion. We prove that the decision version of our clustering problem is NP-complete. Based on the proposed algorithm, we develop two variations to facilitate perceptual robustness versus fragility tradeoffs. We validate the perceptual significance of our hash by testing under Stirmark attacks. Finally, we develop randomized clustering algorithms for the purposes of secure image hashing.

123 citations


Proceedings ArticleDOI
18 Dec 2006
TL;DR: This work introduces the new problem of finding shape discords, the most unusual shapes in a collection, by using locality-sensitive hashing to estimate similarity between shapes which enables us to reorder the search more efficiently.
Abstract: Over the past three decades, there has been a great deal of research on shape analysis, focusing mostly on shape indexing, clustering, and classification. In this work, we introduce the new problem of finding shape discords, the most unusual shapes in a collection. We motivate the problem by considering the utility of shape discords in diverse domains including zoology, anthropology, and medicine. While the brute force search algorithm has quadratic time complexity, we avoid this by using locality-sensitive hashing to estimate similarity between shapes which enables us to reorder the search more efficiently. An extensive experimental evaluation demonstrates that our approach can speed up computation by three to four orders of magnitude.
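
A minimal sketch of discord search with early abandoning; a random candidate ordering stands in here for the paper's LSH-informed ordering, which is what yields the reported speedups:

```python
# The discord is the object farthest from its own nearest neighbor. A good
# outer-loop ordering lets the inner loop abandon a candidate as soon as a
# neighbor closer than the best-so-far discord distance is found.
import numpy as np

rng = np.random.default_rng(2)
shapes = rng.random((300, 32))              # stand-in for shape descriptors

def find_discord(X):
    best_dist, best_id = -1.0, -1
    order = rng.permutation(len(X))         # ideally an LSH-informed ordering
    for i in order:
        nn = np.inf
        for j in order:
            if i != j:
                nn = min(nn, float(np.linalg.norm(X[i] - X[j])))
                if nn < best_dist:          # early abandon: i cannot win
                    break
        if np.isfinite(nn) and nn > best_dist:
            best_dist, best_id = nn, int(i)
    return best_id, best_dist

print(find_discord(shapes))
```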

112 citations


Journal ArticleDOI
TL;DR: A new method for rapid 3D object indexing that combines feature-based methods with coarse alignment-based matching techniques is proposed, achieving sublinear complexity in the number of models while maintaining a high degree of performance for real 3D sensed data acquired in largely uncontrolled settings.
Abstract: We propose a new method for rapid 3D object indexing that combines feature-based methods with coarse alignment-based matching techniques. Our approach achieves a sublinear complexity in the number of models, while maintaining a high degree of performance for real 3D sensed data acquired in largely uncontrolled settings. The key component of our method is to first index surface descriptors computed at salient locations from the scene into the whole model database using locality-sensitive hashing (LSH), a probabilistic approximate nearest neighbor method. Progressively complex geometric constraints are subsequently enforced to further prune the initial candidates and eliminate false correspondences due to inaccuracies in the surface descriptors and the errors of the LSH algorithm. The indexed models are selected based on the MAP rule using the posterior probability of the models estimated in the joint 3D-signature space. Experiments with real 3D data, employing a large database of vehicles (most of them very similar in shape) containing 1,000,000 features from more than 365 models, demonstrate a high degree of performance in the presence of occlusion and obscuration, unmodeled vehicle interiors and part articulations, with an average processing time between 50 and 100 seconds per query.

103 citations


01 Jan 2006
TL;DR: This chapter contains sections titled: The Locality-Sensitive Hashing Scheme Based on p-Stable Distributions, Approximate Near Neighbor, Exact Near Neighbor, LSH in Practice: E2LSH, and Experimental Results.
Abstract: This chapter contains sections titled: The Locality-Sensitive Hashing Scheme Based on p-Stable Distributions, Approximate Near Neighbor, Exact Near Neighbor, LSH in Practice: E2LSH, and Experimental Results.
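
For reference, a minimal sketch of the Euclidean (2-stable) instance of this scheme, h_{a,b}(v) = floor((a·v + b) / w), with an illustrative bucket width w:

```python
# p-stable LSH: project onto a Gaussian direction, shift, and quantize.
# Nearby points fall into the same bucket with higher probability.
import numpy as np

rng = np.random.default_rng(0)

def make_hash(dim, w=4.0):
    a = rng.standard_normal(dim)            # 2-stable (Gaussian) projection
    b = rng.uniform(0.0, w)                 # random offset in [0, w)
    return lambda v: int(np.floor((v @ a + b) / w))

h = make_hash(dim=8)
p = rng.standard_normal(8)
print(h(p), h(p + 0.05), h(p + 10.0))       # the near point usually collides
```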

68 citations


Journal ArticleDOI
TL;DR: The proposed cam-weighted distance is orientation- and scale-adaptive, taking advantage of the relevant information in inter-prototype relationships so that better classification performance can be achieved.

63 citations


Proceedings Article
01 Jan 2006
TL;DR: To scale the search to large song databases, an algorithm based on locality-sensitive hashing (LSH) of sequences of audio features called audio shingles provides an efficient means to identify approximate nearest neighbors in a high-dimensional feature space.
Abstract: We present new methods for computing inter-song similarities using intersections between multiple audio pieces. The intersection contains portions that are similar, when one song is a derivative work of the other for example, in two different musical recordings. To scale our search to large song databases, we have developed an algorithm based on locality-sensitive hashing (LSH) of sequences of audio features called audio shingles. LSH provides an efficient means to identify approximate nearest neighbors in a high-dimensional feature space. We combine these nearest neighbor estimates, each a match from a very large database of audio to a small portion of the query song, to form a measure of the approximate similarity. We demonstrate the utility of our methods on a derivative works retrieval experiment using both exact and approximate (LSH) methods. The results show that LSH is at least an order of magnitude faster than the exact nearest neighbor method and that accuracy is not impacted by the approximate method.
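
A minimal sketch of shingling a feature sequence, assuming simulated 13-dimensional frames and an illustrative shingle width:

```python
# Concatenate w consecutive feature frames (e.g., MFCC vectors) into one
# high-dimensional "shingle"; the shingles are then indexed with LSH.
import numpy as np

def shingles(frames, w=10, hop=1):
    """frames: (n_frames, n_dims) array -> (n_shingles, w*n_dims) array."""
    n = (len(frames) - w) // hop + 1
    return np.stack([frames[i * hop:i * hop + w].ravel() for i in range(n)])

frames = np.random.rand(100, 13)            # stand-in for 13-dim MFCC frames
S = shingles(frames)
print(S.shape)                              # (91, 130): one shingle per hop
```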

62 citations


Proceedings ArticleDOI
05 Jun 2006
TL;DR: In this paper, it is shown that for X = l1 it is impossible to achieve ρ ≤ 1/(2c). This almost matches the construction of Indyk and Motwani, which achieves ρ ≤ 1/c.
Abstract: Given a metric space (X, dX), c ≥ 1, r > 0, and p, q ∈ [0,1], a distribution over mappings H : X → N is called a (r, cr, p, q)-sensitive hash family if any two points in X at distance at most r are mapped by H to the same value with probability at least p, and any two points at distance greater than cr are mapped by H to the same value with probability at most q. This notion was introduced by Indyk and Motwani in 1998 as the basis for an efficient approximate nearest neighbor search algorithm, and has since been used extensively for this purpose. The performance of these algorithms is governed by the parameter ρ = log(1/p)/log(1/q), and constructing hash families with small ρ automatically yields improved nearest neighbor algorithms. Here we show that for X = l1 it is impossible to achieve ρ ≤ 1/(2c). This almost matches the construction of Indyk and Motwani, which achieves ρ ≤ 1/c.
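
A worked check of the exponent for the classic bit-sampling family on the d-dimensional Hamming cube (h(x) = x_i for a random coordinate i), where p = 1 - r/d and q = 1 - cr/d, so ρ approaches 1/c — the upper bound that the lower-bound result above nearly matches from below:

```python
# rho = log(1/p)/log(1/q) for bit-sampling LSH; tends to 1/c for small r/d.
from math import log

def rho(d, r, c):
    p, q = 1 - r / d, 1 - c * r / d
    return log(1 / p) / log(1 / q)

for c in (2, 4, 8):
    print(f"c={c}: rho={rho(d=10_000, r=10, c=c):.4f}  vs 1/c={1 / c:.4f}")
```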

Proceedings ArticleDOI
06 Nov 2006
TL;DR: Efficient algorithms using the technique of Locality-Sensitive Hashing (LSH) to extract topics from a document collection based on the asymmetric relationships between terms in a collection are presented.
Abstract: Topic or feature extraction is often used as an important step in document classification and text mining. Topics are succinct representations of content in a document collection and hence are very effective when used as content identifiers in peer-to-peer systems and other large-scale distributed content management systems. Effective topic extraction depends on the accuracy of term clustering, which often has to deal with problems like synonymy and polysemy. Retrieval techniques based on spectral analysis, like Latent Semantic Indexing (LSI), are often used to effectively solve these problems. Most spectral retrieval schemes produce term similarity measures that are symmetric and often not an accurate characterization of term relationships. Another drawback of LSI is its running time, which is polynomial in the dimensions of the m × n matrix A; this can get prohibitively large for some IR applications. In this paper, we present efficient algorithms using the technique of Locality-Sensitive Hashing (LSH) to extract topics from a document collection based on the asymmetric relationships between terms in a collection. The relationship is characterized by term co-occurrences and other higher-order similarity measures. Our LSH-based scheme can be viewed as a simple alternative to LSI. We show the efficacy of our algorithms via experiments on a set of large documents. An interesting feature of our algorithms is that they produce a natural hierarchical decomposition of the topic space instead of a flat clustering.

Journal ArticleDOI
TL;DR: A data mining framework built on top of a recently developed fast approximate nearest-neighbor-finding algorithm called locality-sensitive hashing (LSH) that can be used to mine huge chemical spaces in a scalable fashion using very modest computational resources is introduced.
Abstract: Virtual screening (VS) has become a preferred tool to augment high-throughput screening and determine new leads in the drug discovery process. The core of a VS informatics pipeline includes several data mining algorithms that work on huge databases of chemical compounds containing millions of molecular structures and their associated data. Thus, scaling traditional applications such as classification, partitioning, and outlier detection for huge chemical data sets without a significant loss in accuracy is very important. In this paper, we introduce a data mining framework built on top of a recently developed fast approximate nearest-neighbor-finding algorithm called locality-sensitive hashing (LSH) that can be used to mine huge chemical spaces in a scalable fashion using very modest computational resources. The core LSH algorithm hashes chemical descriptors so that points close to each other in the descriptor space are also close to each other in the hashed space. Using this data structure, one can perf...

Proceedings ArticleDOI
26 Sep 2006
TL;DR: A wavelet-based robust database hashing scheme is analyzed with respect to its resilience against image modifications and hostile attacks and a method to construct a forgery is presented.
Abstract: A wavelet-based robust database hashing scheme is analyzed with respect to its resilience against image modifications and hostile attacks. A method to construct a forgery is presented and possible countermeasures are discussed.

Proceedings ArticleDOI
06 Nov 2006
TL;DR: This paper proposes an approach for efficient approximative RkNN search in arbitrary metric spaces where the value of k is specified at query time by using an approximation of the nearest-neighbor-distances in order to prune the search space.
Abstract: In this paper, we propose an approach for efficient approximative RkNN search in arbitrary metric spaces where the value of k is specified at query time. Our method uses an approximation of the nearest-neighbor-distances in order to prune the search space. In several experiments, our solution scales significantly better than existing non-approximative approaches while producing an approximation of the true query result with a high recall.

Proceedings ArticleDOI
20 Aug 2006
TL;DR: Novel adaptive nearest neighbor classifiers based on Hit-Distance are proposed to generalize the representational capacity of available prototypes, and it is shown that the proposed classifiers perform much better than the classical nearest neighbor classifier (NN), the nearest feature line method (NFL), the nearest feature plane method (NFP), the nearest neighbor line method (NNL), and the nearest neighbor plane method (NNP).
Abstract: In this paper, a novel notion of distance, Hit-Distance, is first introduced to generalize the representational capacity of available prototypes. Novel adaptive nearest neighbor classifiers based on Hit-Distance are then proposed. Experiments were performed on 8 benchmark datasets from the UCI Machine Learning Repository. The proposed classifiers performed much better than the classical nearest neighbor classifier (NN), the nearest feature line method (NFL), the nearest feature plane method (NFP), the nearest neighbor line method (NNL), and the nearest neighbor plane method (NNP).

Proceedings ArticleDOI
01 Sep 2006
TL;DR: The proposed technique clearly outperforms all known regular hashing algorithms by a significant margin and follows an ad hoc design that is critical for adapting in real time to a changing database with an irregular, non-uniform distribution.
Abstract: For applications that rely on large databases as the core data structure, a fast search process is essential. Hashing algorithms have widely been adopted as the search algorithm of choice for fast lookups. Hashing algorithms involve the creation of hash values from the target database entries. A hashing algorithm that transforms the database to hash values with a distribution as uniform as possible would lead to better search performance. When a database is already value-wise uniformly distributed, any regular hashing algorithm, such as bit-extraction, group-XOR, etc., will lead to a statistically perfect hashing result. In almost all practical applications, however, the target database rarely exhibits a uniform distribution, and the use of any known regular hashing algorithm can lead to performance far less than desirable. This paper aims at designing a hashing algorithm that can deliver better performance for all practical databases. An analytical preprocess is performed on the original database to extract critical information that significantly benefits the design of a better hashing algorithm. The process includes sorting database hash bits by priority to facilitate the decision of which bits, and how those bits, should be combined to generate better hash values. The algorithm follows an ad hoc design that is critical for adapting in real time to a changing database with an irregular, non-uniform distribution. The proposed technique clearly outperforms all known regular hashing algorithms by a significant margin.
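
One speculative way the described preprocess could look, on a synthetic skewed database: score each key bit by how balanced it is and build hash values from the most balanced bits (the paper's fuller analysis of bit combination is omitted):

```python
# Rank key bits by balance (closeness to a 50/50 split) and use the best
# ones as hash bits, so buckets stay as uniform as the skewed data allows.
import numpy as np

rng = np.random.default_rng(3)
# Skewed synthetic database: bits 8-15 of every key are always zero.
keys = rng.integers(0, 1 << 31, size=5000) & 0x7FFF00FF

bits = np.array([(keys >> b) & 1 for b in range(31)])   # bit matrix, 31 x n
balance = np.abs(bits.mean(axis=1) - 0.5)               # 0.0 = perfectly balanced
best = np.argsort(balance)[:12]                         # 12 most uniform bits
print("chosen bit positions:", sorted(int(b) for b in best))

def hash_key(k):                                        # 12-bit hash value
    return sum(((int(k) >> int(b)) & 1) << i for i, b in enumerate(best))

counts = np.bincount([hash_key(k) for k in keys], minlength=1 << 12)
print(f"bucket load: max={counts.max()}, mean={counts.mean():.2f}")
```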


Journal Article
TL;DR: An HVS-based image hashing method incorporating Watson's sensitivity matrix is proposed, which has better robustness against JPEG compression and low-pass filtering.
Abstract: An HVS-based image hashing method incorporating Watson's sensitivity matrix is proposed. The transform-domain matrix composed of 8-by-8 block DCT coefficients of the image is multiplied by N matrices that are pseudo-randomly generated with a key, and divided by the periodically extended Watson matrix. By quantization, an N-bit image hash is obtained. Compared to some other hashing methods, the HVS-based hash has better robustness against JPEG compression and low-pass filtering. Since a key is used in the algorithm, the hash is hard to forge.

Book ChapterDOI
24 Sep 2006
TL;DR: A locally adaptive nearest neighbor classification method based on supervised learning, which works well for multi-class problems, is proposed; ellipsoid clustering is applied to estimate an effective metric, which is then used in K-NN classification.
Abstract: The nearest neighbor classifier is a widely used and effective method for multi-class problems. However, it suffers from the curse of dimensionality in high-dimensional spaces. To address this problem, many adaptive nearest neighbor classifiers have been proposed. In this paper, a locally adaptive nearest neighbor classification method based on supervised learning, which works well for multi-class problems, is proposed. In this method, ellipsoid clustering is applied to estimate an effective metric. This metric is then used in K-NN classification. Finally, experimental results show that it is an efficient and robust approach for multi-class classification.
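
A minimal sketch of metric-adaptive k-NN, with one pooled covariance standing in for the per-cluster ellipsoid metrics the paper learns:

```python
# Classify with Mahalanobis rather than Euclidean distance: the learned
# inverse covariance rescales and decorrelates the feature space.
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 3)) * np.array([1.0, 5.0, 0.2])  # anisotropic
y = (X[:, 0] + 0.1 * X[:, 1] > 0).astype(int)

VI = np.linalg.inv(np.cov(X, rowvar=False))     # inverse covariance = metric

def knn_predict(q, k=5):
    diff = X - q
    d2 = np.einsum("ij,jk,ik->i", diff, VI, diff)   # squared Mahalanobis
    votes = y[np.argsort(d2)[:k]]
    return int(np.bincount(votes).argmax())

print(knn_predict(np.array([0.5, -2.0, 0.0])))
```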

Book
01 Jan 2006
TL;DR: This thesis studies algorithms for different kinds of search using hashing and sketching, and some fundamental limits of what can be realized using some of these approaches are studied.
Abstract: The Information Age has enabled the search for information in ways never imagined before. The simplest search function may be an exact search, where the input query is expected to exactly match the search object. But some search criteria are fuzzy (for instance image search, news search, and similar document search), making the search problem much harder. One common approach is to convert such a search object into a mathematical representation such as a point (vector) in a high-dimensional space. The search for a similar object then becomes a nearest neighbor search in a high-dimensional space. Hashing is a simple and effective method for exact search that uses a random hash function to map items into buckets, often viewed as throwing balls into bins. A variant of hashing called locality-sensitive hashing, which tends to map similar objects to the same hash bucket, can be used to perform nearest neighbor search. A related notion is sketching, which is used to transform a large complex object into a small sketch (often a tiny bitmap) so that similarity between the sketches can be used to estimate the similarity between the original objects. In this thesis we study algorithms for different kinds of search using hashing and sketching, and some fundamental limits of what can be realized using some of these approaches. For exact search, we will see how variants of balls-and-bins processes can be used to derive space-efficient methods for maintaining hash tables. For similarity search, we will see a variant of locality-sensitive hashing that uses linear space and how the underlying ideas can be used in the kd-tree data structure for improved performance. We will also probe the fundamental limits of some of these approaches by showing lower bounds on their performance.

08 Sep 2006
TL;DR: Two methods are proposed: one eliminates feature vectors that would require many distance calculations, and the other uses no distance calculation at all; the result is two to three times as efficient.
Abstract: The efficiency of object recognition methods using local descriptors such as SIFT and PCA-SIFT depends largely on the speed of matching between feature vectors, since images are described by a large number of feature vectors. Because the matching can be cast as a nearest neighbor (NN) search over feature vectors, the problem becomes one of making the NN search efficient. For object recognition, it is required that the number of incorrect matches does not exceed that of correct matches; in other words, a certain number of incorrect matches is acceptable. This observation allows us to make NN search more efficient using approximate NN search with reduced distance calculation. For this purpose, we propose two methods: one eliminates feature vectors that would require many distance calculations, and the other uses no distance calculation at all. Experimental results with 10,000 database images and 2,000 query images show that the proposed method is two to three times as efficient as a method using ANN and achieves a recognition rate of 98% at 8.3 ms/query.

Kai Li, Qin Lv
01 Jan 2006
TL;DR: A sketch construction algorithm is proposed, such that the weighted (and thresholded) l1 distance between two feature vectors can be estimated by the Hamming distance of their sketches, which can typically reduce the space requirement by an order of magnitude with minimal impact on similarity search quality.
Abstract: Content-based image similarity search is a difficult problem due to the high dimensionality and usually massive amount of image data. The main challenge is to achieve high-quality similarity search with high speed and low space usage. This thesis proposes several techniques to address the problem of building a similarity search system for large-scale image datasets. A prototype image search system, called CASS-Image (Content-Aware Search System for Images), has been implemented to demonstrate the effectiveness of these techniques. The first contribution of this thesis is a sketch construction algorithm that converts high-dimensional feature vectors into bit vectors (sketches), such that the weighted (and thresholded) l1 distance between two feature vectors can be estimated by the Hamming distance of their sketches. Experimental results show that using sketches can typically reduce the space requirement by an order of magnitude with minimal impact on similarity search quality. The second is a hash-perturbation based LSH (Locality Sensitive Hashing) technique for approximate nearest neighbor search in high dimensions. This technique probes multiple buckets in each hash table by perturbing the hashed value of the query object. Performance evaluations show that this method is both time and space efficient. It has a similar time efficiency as the basic LSH method while reducing the space requirement by a factor of five. Also, its time efficiency is twice that of the point-perturbation based LSH method. The third is a multi-feature filtering algorithm for region-based image similarity search. This method uses approximation algorithms to generate a candidate set, and then ranks the objects in the candidate set with a more sophisticated multi-feature distance measure. It works for both feature vectors and their sketches. It can also be combined with indexing techniques to further speed up the search process. Performance evaluations show that filtering is 4-13 times faster than the brute-force approach, while still maintaining good search quality. This thesis also proposes a new region-based image similarity measure, EMD* match, which uses square-root region weights and region distance thresholding. Experimental results show that EMD* match is 27%-91% more effective than previous image similarity search techniques.
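
A minimal sketch of one standard construction consistent with this description, assuming feature values in [0, 1]: each sketch bit samples a coordinate with probability proportional to its weight plus a uniform threshold, so the expected normalized Hamming distance equals the weighted l1 distance:

```python
# Each bit records which side of a random threshold the sampled coordinate
# falls on; bits differ exactly when the threshold lands between the values.
import numpy as np

rng = np.random.default_rng(5)
d, B = 8, 1024                                # dimensions, sketch bits
w = rng.random(d); w /= w.sum()               # feature weights
coords = rng.choice(d, size=B, p=w)           # weighted coordinate sampling
thresh = rng.random(B)

def sketch(x):
    return x[coords] > thresh                 # B-bit sketch

x, z = rng.random(d), rng.random(d)
est = np.mean(sketch(x) != sketch(z))         # normalized Hamming distance
true = np.dot(w, np.abs(x - z))               # weighted l1 distance
print(f"estimate={est:.3f}  true={true:.3f}")
```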

Book ChapterDOI
04 Sep 2006
TL;DR: In this article, a hypersphere indexer, named Hydex, is proposed to perform approximate nearest-neighbor search in high-dimensional data, where the data space is partitioned using concentric hyperspheres.
Abstract: Indexing high-dimensional data for efficient nearest-neighbor searches poses interesting research challenges. It is well known that when data dimension is high, the search time can exceed the time required for performing a linear scan on the entire dataset. To alleviate this dimensionality curse, indexing schemes such as locality sensitive hashing (LSH) and M-trees were proposed to perform approximate searches. In this paper, we propose a hypersphere indexer, named Hydex, to perform such searches. Hydex partitions the data space using concentric hyperspheres. By exploiting geometric properties, Hydex can perform effective pruning. Our empirical study shows that Hydex enjoys three advantages over competing schemes for achieving the same level of search accuracy. First, Hydex requires fewer seek operations. Second, Hydex can maintain sequential disk accesses most of the time. And third, it requires fewer distance computations.
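
A minimal sketch of the concentric-sphere idea: bucket points by distance to a center and, for a range query, visit only the rings that can intersect the query ball by the triangle inequality (the single center and fixed ring width are illustrative; Hydex's actual structure differs in detail):

```python
# Points in ring k satisfy k*ring_w <= d(p, center) < (k+1)*ring_w, so only
# rings overlapping [d(q,c) - r, d(q,c) + r] can contain answers.
import numpy as np

rng = np.random.default_rng(6)
data = rng.standard_normal((2000, 8))
center = data.mean(axis=0)
ring_w = 0.25
ring_of = lambda p: int(np.linalg.norm(p - center) / ring_w)

rings = {}
for i, p in enumerate(data):
    rings.setdefault(ring_of(p), []).append(i)

def range_query(q, r):
    dq = np.linalg.norm(q - center)
    lo, hi = int(max(dq - r, 0.0) / ring_w), int((dq + r) / ring_w)
    out = []
    for ring in range(lo, hi + 1):                 # pruned ring scan
        for i in rings.get(ring, []):
            if np.linalg.norm(data[i] - q) <= r:   # exact check inside ring
                out.append(i)
    return out

print(len(range_query(rng.standard_normal(8), r=1.0)))
```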

01 Jan 2006
TL;DR: The results show that the proposed method outperforms a popular approximate nearest neighbor library, "ANN", in search time and accuracy for higher dimensions and large numbers of prototypes.
Abstract: In this paper we propose a fast approximate nearest neighbor search algorithm for a high-dimensional spherical space, using an idea called "distributed coding", which represents a vector by a set of many vectors and encodes them efficiently. We implemented the algorithm and tested it with synthetic data. The results show that the proposed method outperforms a popular approximate nearest neighbor library, "ANN", in search time and accuracy for higher dimensions and large numbers of prototypes. Keywords: Approximate Nearest Neighbor, Distributed Coding, k-d tree, Locality Sensitive Hashing

Journal Article
TL;DR: It is shown that under some conditions, as dimensionality increases, the distances between the query point and the data points approach each other, so the "nearest neighbor" becomes meaningless.
Abstract: This paper explores the effect of dimensionality on the "nearest neighbor" problem. Based on statistics, it shows that under some conditions, as dimensionality increases, the distances between the query point and the data points approach each other, so the "nearest neighbor" becomes meaningless. A way to evaluate the dimensionality effect is presented. From two distributions of statistics about distance, the effect of dimensionality on the "nearest neighbor" problem is evaluated. Empirical results are presented to demonstrate the two distributions.
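
A small numeric illustration of this concentration effect (uniform data; the relative gap below shrinks as the dimension grows):

```python
# As d grows, the nearest and farthest neighbors of a random query end up
# at nearly the same distance, so "nearest" carries less information.
import numpy as np

rng = np.random.default_rng(7)
for d in (2, 10, 100, 1000):
    X, q = rng.random((1000, d)), rng.random(d)
    dist = np.linalg.norm(X - q, axis=1)
    print(f"d={d:5d}  (max-min)/min = {(dist.max() - dist.min()) / dist.min():.3f}")
```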

Book ChapterDOI
22 Jun 2006
TL;DR: It is shown that an SEA significantly improves on the equivalent simple EA configuration for higher-dimensional problems in an expeditious manner.
Abstract: Evolutionary Algorithms (EAs) are common optimization techniques based on the concept of Darwinian evolution. During the search for the global optimum of a search space, a traditional EA will often become trapped in a local optimum. The Scouting-Inspired Evolutionary Algorithms (SEAs) are a recently introduced family of EAs that use a cross-generational memory mechanism to overcome this problem and discover solutions of higher fitness. The merit of SEAs has been established in previous work with a number of two- and three-dimensional test cases and a variety of configurations. In this paper, we present two approaches to using SEAs to solve high-dimensional problems. The first involves the use of Locality Sensitive Hashing (LSH) for the repository of individuals, whereas the second entails the use of scouting-driven mutation at a certain rate, the Scouting Rate. We show that an SEA significantly improves on the equivalent simple EA configuration for higher-dimensional problems in an expeditious manner.

Journal Article
Dai Yafei
TL;DR: A hashing function for large-scale URL sets is proposed; it shows better uniformity and stability than two alternatives (HfIp and hf) in three large-scale experiments and is recommended for applications that need to hash URLs.
Abstract: URL hashing finds many applications in Web research. We propose a hashing function for large-scale URL sets and find that it has better uniformity and stability than two alternatives (HfIp and hf) in three large-scale experiments. It is a variation of the well-known ELFhash function and is recommended for applications that need to hash URLs. Moreover, it has low time cost and performance close to MD5 and SHA-1, so we consider it more practical than the alternatives. Finally, some future work is outlined.
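
For reference, the classic ELFhash that the paper varies, transcribed to Python; the proposed URL-specific variation itself is not reproduced here:

```python
# ELFhash: 4-bit rolling shift with overflow folding, kept to 32 bits.
def elf_hash(s: str) -> int:
    h = 0
    for ch in s.encode():
        h = ((h << 4) + ch) & 0xFFFFFFFF
        g = h & 0xF0000000          # top nibble about to overflow
        if g:
            h ^= g >> 24            # fold it back into lower bits
        h &= ~g & 0xFFFFFFFF        # and clear it
    return h

print(elf_hash("http://example.com/index.html") % 1024)  # bucket index
```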

Proceedings ArticleDOI
23 Oct 2006
TL;DR: A sub-object retrieval system based on a segmentation method that utilizes the segmentation results to capture the higher-level concept of images and gets a stable and accurate result.
Abstract: This paper describes a sub-object retrieval system based on a segmentation method. We also use dynamic partial function (DPF) and indexing by locality sensitive hashing (LSH) for improving system performance. Such a system is useful for finding a sub-object from a large image database. In order to obtain the sub-object from a sample image, we use a segmentation method to cut out the object. The system utilizes the segmentation results to capture the higher-level concept of images and gets a stable and accurate result. Experimental and comparison results, which are performed using a general purpose database containing 20,000 images, are encouraging.