
Showing papers on "Euclidean distance published in 2000"


Journal ArticleDOI
TL;DR: The Mahalanobis distance, in the original and principal component (PC) space, is examined and interpreted in relation to the Euclidean distance (ED).
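For concreteness, here is a minimal numeric sketch (ours, not the paper's) of the relationship being examined: the Mahalanobis distance equals a plain Euclidean distance computed in PC space after each component is whitened by its standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[4.0, 1.5], [1.5, 1.0]], size=500)

mean = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
cov_inv = np.linalg.inv(cov)

x = np.array([2.0, 1.0])
d = x - mean

euclidean = np.sqrt(d @ d)
mahalanobis = np.sqrt(d @ cov_inv @ d)

# In PC space: project onto the eigenvectors, divide by sqrt of the eigenvalues,
# then take the plain Euclidean norm -- this reproduces the Mahalanobis distance.
evals, evecs = np.linalg.eigh(cov)
pc_whitened = (d @ evecs) / np.sqrt(evals)
assert np.isclose(np.linalg.norm(pc_whitened), mahalanobis)

print(euclidean, mahalanobis)
```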

1,802 citations


Proceedings Article
10 Sep 2000
TL;DR: This paper presents a novel and fast indexing scheme for time sequences, when the distance function is any arbitrary Lp norm, including the popular Euclidean distance (L2 norm), and achieves significant speedups over the state of the art.
Abstract: Fast indexing in time sequence databases for similarity searching has attracted a lot of research recently. Most of the proposals, however, typically centered around the Euclidean distance and its derivatives. We examine the problem of multimodal similarity search, in which users can choose the best of multiple similarity models for their needs. In this paper, we present a novel and fast indexing scheme for time sequences, when the distance function is any arbitrary Lp norm (p = 1, 2, …, ∞). One feature of the proposed method is that only one index structure is needed for all Lp norms, including the popular Euclidean distance (L2 norm). Our scheme achieves significant speedups over the state of the art: extensive experiments on real and synthetic time sequences show that the proposed method is comparable to the best competitor for the L2 and L1 norms, but significantly (up to 10 times) faster for the L∞ norm.
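The paper's index structure is not reproduced here, but a small sketch (our notation, equal-length segments assumed) of the segmented-mean lower bound that lets one index serve every Lp norm: by Jensen's inequality, the Lp distance between segment means, scaled by segment length, never exceeds the true Lp distance, so filtering with it causes no false dismissals.

```python
import numpy as np

def lp_dist(x, y, p):
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def segment_means(x, n_segments):
    return x.reshape(n_segments, -1).mean(axis=1)

def lower_bound(x, y, p, n_segments):
    # seg_len * |mean(x_seg) - mean(y_seg)|^p <= sum over segment of |x_i - y_i|^p
    # (Jensen's inequality, |.|^p convex for p >= 1), so this never overestimates.
    seg_len = len(x) // n_segments
    mx, my = segment_means(x, n_segments), segment_means(y, n_segments)
    return (seg_len * np.sum(np.abs(mx - my) ** p)) ** (1.0 / p)

rng = np.random.default_rng(1)
x, y = rng.standard_normal(128), rng.standard_normal(128)
for p in (1, 2):
    assert lower_bound(x, y, p, 8) <= lp_dist(x, y, p)
```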

753 citations


Proceedings ArticleDOI
01 Aug 2000
TL;DR: This paper introduces a modification of DTW which operates on a higher level abstraction of the data, in particular, a Piecewise Aggregate Approximation (PAA) which allows us to outperform DTW by one to two orders of magnitude, with no loss of accuracy.
Abstract: There has been much recent interest in adapting data mining algorithms to time series databases. Most of these algorithms need to compare time series. Typically some variation of Euclidean distance is used. However, as we demonstrate in this paper, Euclidean distance can be an extremely brittle distance measure. Dynamic time warping (DTW) has been suggested as a technique to allow more robust distance calculations; however, it is computationally expensive. In this paper we introduce a modification of DTW which operates on a higher-level abstraction of the data, in particular, a Piecewise Aggregate Approximation (PAA). Our approach allows us to outperform DTW by one to two orders of magnitude, with no loss of accuracy.
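A minimal sketch of the PAA representation itself (frame count and equal-length frames are our assumptions; the paper runs DTW on this reduced series):

```python
import numpy as np

def paa(series, n_frames):
    """Piecewise Aggregate Approximation: the mean of each equal-length frame."""
    return series.reshape(n_frames, -1).mean(axis=1)

t = np.linspace(0, 4 * np.pi, 256)
signal = np.sin(t)
reduced = paa(signal, 16)   # 256 points -> 16 frame means
print(reduced.shape)        # (16,)
```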

667 citations


Journal ArticleDOI
TL;DR: Experimental results, with up to a 97 percent success rate in classification, show the feasibility of using this biometric system in medium/high security environments with full acceptance from all users.
Abstract: This paper presents the definition and implementation of a biometric system based on hand geometry identification. Hand features are extracted from a color photograph taken when the user places his hand on a platform designed for such a task. Different pattern recognition techniques, from Euclidean distance to neural networks, have been tested for classification and/or verification. Experimental results, with up to a 97 percent success rate in classification, show the feasibility of using this system in medium/high security environments with full acceptance from all users.

504 citations


Proceedings ArticleDOI
01 Feb 2000
TL;DR: The landmark model is presented, a model for time series that yields new techniques for similarity-based time series pattern querying and a generalized approach for removing noise from raw time series without smoothing out the peaks and bottoms.
Abstract: In this paper we present the landmark model, a model for time series that yields new techniques for similarity-based time series pattern querying. The landmark model does not follow traditional similarity models that rely on pointwise Euclidean distance. Instead, it leads to landmark similarity, a general model of similarity that is consistent with human intuition and episodic memory. By tracking different specific subsets of features of landmarks, we can efficiently compute different landmark similarity measures that are invariant under corresponding subsets of six transformations; namely, shifting, uniform amplitude scaling, uniform time scaling, uniform bi-scaling, time warping and non-uniform amplitude scaling. A method of identifying features that are invariant under these transformations is proposed. We also discuss a generalized approach for removing noise from raw time series without smoothing out the peaks and bottoms. Besides these new capabilities, our experiments show that landmark indexing is considerably fast.
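As a hedged illustration, one natural notion of landmark is a local extremum of the series; the paper's actual feature set and smoothing are richer than this sketch:

```python
import numpy as np

def landmarks(x):
    """Indices of strict local minima and maxima (plateaus are ignored)."""
    dx = np.diff(x)
    # A sign change in the first difference marks a peak or a bottom.
    return np.where(np.sign(dx[:-1]) * np.sign(dx[1:]) < 0)[0] + 1

x = np.array([0.0, 1.0, 3.0, 2.0, 2.5, 4.0, 1.0])
print(landmarks(x))  # [2 3 5]: the peak at 3.0, the bottom at 2.0, the peak at 4.0
```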

326 citations


Journal ArticleDOI
01 Sep 2000
TL;DR: The first lower bound on the peak-to-average power ratio (PAPR) of a constant energy code of a given length n, minimum Euclidean distance and rate is established and there exist asymptotically good codes whose PAPR is at most 8 log n.
Abstract: The first lower bound on the peak-to-average power ratio (PAPR) of a constant energy code of a given length n, minimum Euclidean distance and rate is established. Conversely, using a nonconstructive Varshamov-Gilbert style argument yields a lower bound on the achievable rate of a code of a given length, minimum Euclidean distance and maximum PAPR. The derivation of these bounds relies on a geometrical analysis of the PAPR of such a code. Further analysis shows that there exist asymptotically good codes whose PAPR is at most 8 log n. These bounds motivate the explicit construction of error-correcting codes with low PAPR. Bounds for exponential sums over Galois fields and rings are applied to obtain an upper bound of order (log n)² on the PAPRs of a constructive class of codes, the trace codes. This class includes the binary simplex code, duals of binary, primitive Bose-Chaudhuri-Hocquenghem (BCH) codes and a variety of their nonbinary analogs. Some open problems are identified.
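A hedged numeric illustration of the quantity being bounded, assuming the usual multicarrier model in which a codeword c modulates n subcarriers and PAPR is peak instantaneous power over average power (oversampling approximates the continuous-time peak):

```python
import numpy as np

def papr(codeword, oversample=8):
    """PAPR of s(t) = sum_k c_k exp(2*pi*i*k*t), with t on a dense grid."""
    n = len(codeword)
    t = np.arange(n * oversample) / (n * oversample)
    k = np.arange(n)
    s = np.exp(2j * np.pi * np.outer(t, k)) @ codeword
    power = np.abs(s) ** 2
    return power.max() / power.mean()

# All-ones BPSK word: every carrier aligns at t = 0, so the PAPR is about n.
print(papr(np.ones(16)))                        # ~16
rng = np.random.default_rng(2)
print(papr(rng.choice([-1.0, 1.0], size=16)))   # typically much smaller
```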

288 citations


Journal ArticleDOI
TL;DR: Regression results demonstrated that the most commonly used county-based measures of geographic access (e.g., MSA designation and providers per capita) explained 3%–10% of the variation in accessibility and 34%–70% of that in availability.
Abstract: Objective: This research compared alternative measures of geographic access to health care providers using different levels of spatial aggregation (county, zipcode and street) and different methods of calculating the cost of space (Euclidean distance, road distance and travel time).

181 citations


Proceedings ArticleDOI
28 Feb 2000
TL;DR: This work uses a disk-based suffix tree as an index structure and employs lower-bound distance functions to filter out dissimilar subsequences without false dismissals; the index structure is made compact to accelerate query processing.
Abstract: We propose an indexing technique for fast retrieval of similar subsequences using time warping distances. A time warping distance is a more suitable similarity measure than the Euclidean distance in many applications, where sequences may be of different lengths or different sampling rates. Our indexing technique uses a disk-based suffix tree as an index structure and employs lower-bound distance functions to filter out dissimilar subsequences without false dismissals. To make the index structure compact and thus accelerate the query processing, we convert sequences of continuous values to sequences of discrete values via a categorization method and store only a subset of suffixes whose first values are different from their preceding values. The experimental results reveal that our proposed technique can be a few orders of magnitude faster than sequential scanning.
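For reference, a minimal dynamic-programming sketch of the time warping distance itself; the paper's contributions (the disk-based suffix tree and the lower-bound filters) are not reproduced:

```python
import numpy as np

def dtw(x, y):
    """Classic O(len(x)*len(y)) dynamic time warping distance."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Sequences of different lengths -- exactly where Euclidean distance is undefined.
print(dtw([1, 2, 3, 4], [1, 1, 2, 3, 3, 4]))  # 0.0
```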

164 citations


Journal ArticleDOI
01 Aug 2000
TL;DR: A framework for learning based on information-theoretic criteria is discussed, and two approximations to the Kullback-Leibler divergence based on quadratic distances (Cauchy-Schwarz inequality and Euclidean distance) are proposed.
Abstract: This paper discusses a framework for learning based on information theoretic criteria. A novel algorithm based on Renyi's quadratic entropy is used to train, directly from a data set, linear or nonlinear mappers for entropy maximization or minimization. We provide an intriguing analogy between the computation and an information potential measuring the interactions among the data samples. We also propose two approximations to the Kullback-Leibler divergence based on quadratic distances (Cauchy-Schwarz inequality and Euclidean distance). These distances can still be computed using the information potential. We test the newly proposed distances in blind source separation (unsupervised learning) and in feature extraction for classification (supervised learning). In blind source separation our algorithm is capable of separating instantaneously mixed sources, and for classification the performance of our classifier is comparable to the support vector machines (SVMs).
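A sketch of the information-potential idea under the assumption of Gaussian Parzen windows (the names V and sigma are ours): pairwise kernel sums estimate Renyi's quadratic entropy, and the Euclidean distance between two densities expands into three such potentials.

```python
import numpy as np

def gauss(u, sigma):
    return np.exp(-u ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def potential(a, b, sigma):
    """Cross information potential: average pairwise kernel between samples.
    The width sigma*sqrt(2) comes from convolving two Parzen windows of width sigma."""
    return gauss(a[:, None] - b[None, :], np.sqrt(2) * sigma).mean()

rng = np.random.default_rng(3)
p, q = rng.normal(0, 1, 300), rng.normal(2, 1, 300)
sigma = 0.5

# Renyi's quadratic entropy estimate: H2(p) = -log V(p, p).
print(-np.log(potential(p, p, sigma)))
# Euclidean distance between densities: integral of (p - q)^2
#   = V(p, p) + V(q, q) - 2 V(p, q).
print(potential(p, p, sigma) + potential(q, q, sigma) - 2 * potential(p, q, sigma))
```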

146 citations


Journal ArticleDOI
TL;DR: It is shown that the Steiner ratio is 1/4, that is, the minimum spanning tree yields a polynomial-time approximation with performance ratio exactly 4, and that there exists a polynomial-time approximation scheme under certain conditions.
Abstract: Given n terminals in the Euclidean plane and a positive constant, find a Steiner tree interconnecting all terminals with the minimum number of Steiner points such that the Euclidean length of each edge is no more than the given positive constant. This problem is NP-hard with applications in VLSI design, WDM optical networks and wireless communications. In this paper, we show that (a) the Steiner ratio is 1/4, that is, the minimum spanning tree yields a polynomial-time approximation with performance ratio exactly 4, (b) there exists a polynomial-time approximation with performance ratio 3, and (c) there exists a polynomial-time approximation scheme under certain conditions.
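Result (a) is achieved by the steinerized minimum spanning tree; a hedged sketch (Prim's algorithm plus evenly spaced "beads" on each edge, our variable names) follows.

```python
import numpy as np

def steiner_points_on_mst(terminals, R):
    """Bead each MST edge with ceil(length/R) - 1 evenly spaced Steiner points,
    so every sub-edge has Euclidean length at most R."""
    n = len(terminals)
    # Prim's algorithm on the complete Euclidean graph.
    in_tree, edges = {0}, []
    dist = {i: (np.linalg.norm(terminals[i] - terminals[0]), 0) for i in range(n)}
    while len(in_tree) < n:
        v = min((i for i in range(n) if i not in in_tree), key=lambda i: dist[i][0])
        in_tree.add(v)
        edges.append((dist[v][1], v))
        for i in range(n):
            if i not in in_tree:
                d = np.linalg.norm(terminals[i] - terminals[v])
                if d < dist[i][0]:
                    dist[i] = (d, v)
    points = []
    for u, v in edges:
        a, b = terminals[u], terminals[v]
        k = int(np.ceil(np.linalg.norm(b - a) / R)) - 1
        points += [a + (b - a) * (j + 1) / (k + 1) for j in range(k)]
    return points

pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 2.5]])
print(len(steiner_points_on_mst(pts, 1.0)))  # 2 + 2 = 4 Steiner points
```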

143 citations


Journal ArticleDOI
TL;DR: The Euclidean operator norm of a random matrix A, obtained from a (non-random) matrix by randomizing the signs of its entries, is considered, and the best possible inequality (up to a multiplicative constant) relating it to the norms of the matrix's rows and columns is established.
Abstract: We compare the Euclidean operator norm of a random matrix with the Euclidean norms of its rows and columns. In the first part of this paper, we show that if A is a random matrix with i.i.d. zero-mean entries, then E‖A‖^h ≤ K^h (E max_i ‖a_i·‖^h + E max_j ‖a·j‖^h), where K is a constant which does not depend on the dimensions or distribution of A (h, however, does depend on the dimensions). In the second part we drop the assumption that the entries of A are i.i.d. We therefore consider the Euclidean operator norm of a random matrix A obtained from a (non-random) matrix by randomizing the signs of its entries. We show that in this case the best possible inequality (up to a multiplicative constant) is E‖A‖^h ≤ (c log^{1/4} min{m, n})^h (E max_i ‖a_i·‖^h + E max_j ‖a·j‖^h), where m, n are the dimensions of the matrix and c is a constant independent of m and n.
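A quick numeric illustration (not a proof) of the quantities being compared: the operator norm of a random sign matrix against the largest Euclidean row and column norms.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 200, 300
A = rng.choice([-1.0, 1.0], size=(m, n))   # random signs: i.i.d., zero mean

op_norm = np.linalg.norm(A, 2)             # largest singular value
max_row = np.linalg.norm(A, axis=1).max()  # max Euclidean row norm
max_col = np.linalg.norm(A, axis=0).max()  # max Euclidean column norm

# The theorem bounds E||A|| by a constant times the sum of the two maxima.
print(op_norm, max_row + max_col)
```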

Journal ArticleDOI
TL;DR: A cost model for index structures for point databases such as the R*-tree and the X-tree is developed that provides accurate estimates of the number of data page accesses for range queries and nearest-neighbor queries under a Euclidean metric and a maximum metric.
Abstract: During the last decade, multimedia databases have become increasingly important in many application areas such as medicine, CAD, geography, and molecular biology. An important research topic in multimedia databases is similarity search in large data sets. Most current approaches that address similarity search use the feature approach, which transforms important properties of the stored objects into points of a high-dimensional space (feature vectors). Thus, similarity search is transformed into a neighborhood search in feature space. Multidimensional index structures are usually applied when managing feature vectors. Query processing can be improved substantially with optimization techniques such as blocksize optimization, data space quantization, and dimension reduction. To determine optimal parameters, an accurate estimate of index-based query processing performance is crucial. In this paper we develop a cost model for index structures for point databases such as the R*-tree and the X-tree. It provides accurate estimates of the number of data page accesses for range queries and nearest-neighbor queries under a Euclidean metric and a maximum metric. The problems specific to high-dimensional data spaces, called boundary effects, are considered. The concept of the fractal dimension is used to take the effects of correlated data into account.

01 Jan 2000
TL;DR: In this paper, the closest point transform to a manifold on a rectilinear grid in low dimensional spaces is computed by solving the Eikonal equation |∇u| = 1 by the method of characteristics.
Abstract: This paper presents a new algorithm for computing the closest point transform to a manifold on a rectilinear grid in low dimensional spaces. The closest point transform finds the closest point on a manifold and the Euclidean distance to a manifold for all the points in a grid (or the grid points within a specified distance of the manifold). We consider manifolds composed of simple geometric shapes, such as a set of points, piecewise linear curves, or triangle meshes. The algorithm computes the closest point on and distance to the manifold by solving the Eikonal equation |∇u| = 1 by the method of characteristics. The method of characteristics is implemented efficiently with the aid of computational geometry and polygon/polyhedron scan conversion. The computed distance is accurate to within machine precision. The computational complexity of the algorithm is linear in both the number of grid points and the complexity of the manifold; thus it has optimal computational complexity. Examples are presented for piecewise linear curves in 2D and triangle meshes in 3D.

Let u(x), x ∈ Rⁿ, be the distance from the point x to a manifold S. If dim(S) = n − 1 (for example, curves in 2D or surfaces in 3D), then the distance is signed. The orientation of the manifold determines the sign of the distance. One can adopt the convention that the outward normals point in the direction of positive or negative distance. In order for the distance to be well-defined, the manifold must be orientable and have a consistent orientation. A Klein bottle in 3D, for example, is not orientable. Two concentric circles in 2D have consistent orientations only if the normals of the inner circle point "inward" and the normals of the outer circle point "outward", or vice versa. Otherwise the distance would be ill-defined in the region between the circles. For manifolds which are not closed, the distance is ill-defined in any neighborhood of the boundary. However, the distance is well-defined in neighborhoods of the manifold which do not contain the boundary. If dim(S) < n − 1 (for example, a set of points in 2D or a curve in 3D), the distance is unsigned (non-negative).
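A brute-force sketch of what the transform computes for a polyline in 2D (this just defines the output; the paper's characteristics/scan-conversion algorithm is far faster):

```python
import numpy as np

def closest_point_on_segment(p, a, b):
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return a + t * ab

def closest_point_transform(grid_pts, polyline):
    """For each grid point: the nearest point on the polyline and the distance to it."""
    closest = np.empty_like(grid_pts)
    dist = np.full(len(grid_pts), np.inf)
    for a, b in zip(polyline[:-1], polyline[1:]):
        for i, p in enumerate(grid_pts):
            c = closest_point_on_segment(p, a, b)
            d = np.linalg.norm(p - c)
            if d < dist[i]:
                dist[i], closest[i] = d, c
    return closest, dist

xs, ys = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5))
grid = np.column_stack([xs.ravel(), ys.ravel()])
curve = np.array([[0.1, 0.1], [0.9, 0.2], [0.8, 0.9]])
closest, dist = closest_point_transform(grid, curve)
print(dist.reshape(5, 5).round(2))
```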

Proceedings ArticleDOI
24 Jul 2000
TL;DR: A way of selecting bankruptcy predictors is shown, using a Euclidean distance-based criterion calculated within the SVM kernel.
Abstract: The conventional neural network approach has been found useful in predicting corporate distress from financial statements. We have adapted a support vector machine approach to the problem. A way of selecting bankruptcy predictors is shown, using a Euclidean distance-based criterion calculated within the SVM kernel. A comparative study is provided using three classical corporate distress models and an alternative model based on the SVM approach.

Proceedings ArticleDOI
13 Jun 2000
TL;DR: A new algorithm for computing subpixel skeletons which is robust and accurate, has low computational complexity and preserves topology is introduced.
Abstract: There has recently been significant interest in using representations based on abstractions of Blum's skeleton into a graph, for qualitative shape matching. The application of these techniques to large databases of shapes hinges on the availability of numerical algorithms for computing the medial axis. Unfortunately this computation can be extremely subtle. Approaches based on Voronoi techniques preserve topology, but heuristic pruning measures are introduced to remove unwanted edges. Methods based on Euclidean distance functions can localize skeletal points accurately, but often at the cost of altering the object's topology. In this paper we introduce a new algorithm for computing subpixel skeletons which is robust and accurate, has low computational complexity and preserves topology. The key idea is to measure the net outward flux of a vector field per unit area, and to detect locations where a conservation of energy principle is violated. This is done in conjunction with a thinning process applied in a rectangular lattice. We illustrate the approach with several examples of skeletal graphs for biological and man-made silhouettes.
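A rough sketch of the flux idea on a binary image, using SciPy's Euclidean distance transform: the divergence of the distance gradient is strongly negative where the field converges, marking medial-axis candidates. The threshold and the omission of the topology-preserving thinning step are our simplifications.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Binary shape: a filled rectangle.
img = np.zeros((64, 64), dtype=bool)
img[16:48, 8:56] = True

D = distance_transform_edt(img)          # Euclidean distance to the background
gy, gx = np.gradient(D)

# Divergence of the gradient field; strongly negative values are sinks of the
# field, i.e. medial-axis candidates (where the outward flux is most negative).
div = np.gradient(gy, axis=0) + np.gradient(gx, axis=1)
skeleton_candidates = img & (div < -0.4)  # the threshold is an arbitrary choice here
print(skeleton_candidates.sum())
```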

Book ChapterDOI
26 Jun 2000
TL;DR: The key idea is to measure the net outward flux of a vector field per unit volume, and to detect locations where a conservation of energy principle is violated.
Abstract: The medial surface of a volumetric object is of significant interest for shape analysis. However, its numerical computation can be subtle. Methods based on Voronoi techniques preserve the object's topology, but heuristic pruning measures are introduced to remove unwanted faces. Approaches based on Euclidean distance functions can localize medial surface points accurately, but often at the cost of altering the object's topology. In this paper we introduce a new algorithm for computing medial surfaces which addresses these concerns. The method is robust and accurate, has low computational complexity, and preserves topology. The key idea is to measure the net outward flux of a vector field per unit volume, and to detect locations where a conservation of energy principle is violated. This is done in conjunction with a thinning process applied in a cubic lattice. We illustrate the approach with examples of medial surfaces of synthetic objects and complex anatomical structures obtained from medical images.

Proceedings ArticleDOI
01 Jan 2000
TL;DR: In extensive experimental studies on the publicly available XM2VTS database, it is shown that the proposed metric is consistently superior to both the Euclidean distance and normalised correlation matching scores.
Abstract: We address the problem of face verification using linear discriminant analysis and investigate the issue of the matching score. We establish the reason behind the success of the normalised correlation. The improved understanding about the role of the metric then naturally leads to a novel way of measuring the distance between a probe image and a model. In extensive experimental studies on the publicly available XM2VTS database using the Lausanne protocol, we show that the proposed metric is consistently superior to both the Euclidean distance and normalised correlation matching scores. The effect of various photometric normalisations on the matching scores is also investigated.
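For reference, a small sketch of the two baseline scores the proposed metric is compared against (flattened image vectors assumed; the paper's own metric is not reproduced):

```python
import numpy as np

def euclidean_score(probe, model):
    return -np.linalg.norm(probe - model)   # higher = more similar

def normalised_correlation(probe, model):
    return probe @ model / (np.linalg.norm(probe) * np.linalg.norm(model))

rng = np.random.default_rng(5)
model = rng.random(256)
probe = 1.7 * model + 0.05 * rng.standard_normal(256)  # same face, brighter image

# Normalised correlation is invariant to the global gain; Euclidean distance is not.
print(euclidean_score(probe, model), normalised_correlation(probe, model))
```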

Journal ArticleDOI
TL;DR: If an approximation to the stretch factor is sufficient, it can be efficiently computed by making only O(n) approximate shortest path queries in the graph G; this result is applied to obtain efficient algorithms for approximating the stretch factor of Euclidean graphs such as paths, cycles, trees, planar graphs, and general graphs.
Abstract: There are several results available in the literature dealing with efficient construction of t-spanners for a given set S of n points in R^d. t-spanners are Euclidean graphs G in which distances between vertices are at most t times the Euclidean distances between them; in other words, distances in G are "stretched" by a factor of at most t. We consider the interesting dual problem: given a Euclidean graph G whose vertex set corresponds to the set S, compute the stretch factor of G, i.e., the maximum ratio between distances in G and the corresponding Euclidean distances. It can trivially be solved by solving the all-pairs-shortest-path problem. However, if an approximation to the stretch factor is sufficient, then we show it can be efficiently computed by making only O(n) approximate shortest path queries in the graph G. We apply this surprising result to obtain efficient algorithms for approximating the stretch factor of Euclidean graphs such as paths, cycles, trees, planar graphs, and general graphs. The main idea behind the algorithm is to use Callahan and Kosaraju's well-separated pair decomposition.
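The "trivial" exact computation the paper improves upon, sketched with dense Floyd-Warshall on an adjacency-matrix input (our representation):

```python
import numpy as np

def stretch_factor(points, adjacency):
    """Exact stretch factor via all-pairs shortest paths (Floyd-Warshall)."""
    n = len(points)
    euclid = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    G = np.where(adjacency, euclid, np.inf)
    np.fill_diagonal(G, 0.0)
    for k in range(n):  # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, k:k+1] + G[k:k+1, :])
    mask = ~np.eye(n, dtype=bool)
    return (G[mask] / euclid[mask]).max()

pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
path_adj = np.zeros((4, 4), dtype=bool)   # the path 0-1-2-3
for i in range(3):
    path_adj[i, i + 1] = path_adj[i + 1, i] = True
print(stretch_factor(pts, path_adj))      # 3.0: dist(0,3) in G is 3, Euclidean is 1
```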

01 Jan 2000
TL;DR: A method of local modeling for predicting time series generated by nonlinear dynamic systems is proposed that incorporates a weighted Euclidean metric and a novel ρ-steps-ahead cross-validation error to assess model accuracy.
Abstract: A method of local modeling for predicting time series generated by nonlinear dynamic systems is proposed that incorporates a weighted Euclidean metric and a novel ρ-steps-ahead cross-validation error to assess model accuracy. The tradeoff between the cost of computation and model accuracy is discussed in the context of optimizing model parameters. A fast nearest neighbor algorithm and a novel modification to find neighboring trajectory segments are described.

Book ChapterDOI
18 Dec 2000
TL;DR: There exists a routing algorithm for arbitrary triangulations that has no memory and uses no randomization, and there is no competitive online routing algorithm under the Euclidean distance metric in arbitrary triangulations.
Abstract: We consider online routing algorithms for finding paths between the vertices of plane graphs. We show (1) there exists a routing algorithm for arbitrary triangulations that has no memory and uses no randomization, (2) no equivalent result is possible for convex subdivisions, (3) there is no competitive online routing algorithm under the Euclidean distance metric in arbitrary triangulations, and (4) there is no competitive online routing algorithm under the link distance metric even when the input graph is restricted to be a Delaunay, greedy, or minimum-weight triangulation.

Proceedings ArticleDOI
15 Jun 2000
TL;DR: A probabilistic approach is presented and two likelihood-based similarity measures for image retrieval are described that perform significantly better than geometric approaches like the nearest neighbor rule with city-block or Euclidean distances.
Abstract: Similarity between images in image retrieval is measured by computing distances between feature vectors. This paper presents a probabilistic approach and describes two likelihood-based similarity measures for image retrieval. Popular distance measures like the Euclidean distance implicitly assign more weighting to features with large ranges than to those with small ranges. First, we discuss the effects of five feature normalization methods on retrieval performance. Then, we show that the probabilistic methods perform significantly better than geometric approaches like the nearest neighbor rule with city-block or Euclidean distances. They are also more robust to normalization effects, and using better models for the features improves the retrieval results compared to making only general assumptions. Experiments on a database of approximately 10000 images show that studying the feature distributions is important, and this information should be used in designing feature normalization methods and similarity measures.
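A sketch of the range effect described above: without normalization, one wide-range feature dominates the Euclidean distance. Z-score scaling, shown here, is just one of the normalization methods the paper compares.

```python
import numpy as np

rng = np.random.default_rng(6)
# Feature 0 in [0, 1000], feature 1 in [0, 1]: the raw Euclidean distance is
# driven almost entirely by feature 0.
X = np.column_stack([rng.random(1000) * 1000, rng.random(1000)])

a, b = X[0], X[1]
print(np.linalg.norm(a - b))               # dominated by feature 0

# Z-score normalization puts the two features on comparable scales.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.linalg.norm(Z[0] - Z[1]))
```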

Proceedings Article
01 Jan 2000
TL;DR: This paper investigates the performance of BICM over a multiple-antenna channel by evaluating its information-theoretic limits and provides some design guidelines, and checks the results via some simulations.
Abstract: Bit-interleaved coded modulation (BICM) is known to provide a robust solution to the coding prob- lem for wireless channels, as it provides at the same time a large Hamming distance and a large (albeit nonoptimum) Euclidean distance. As BICM splits the code design into the selection of the encoder and the selection of a modula- tion scheme, the design task is simpler than with "standard" space-time codes. In this paper we investigate the performance of BICM over a multiple-antenna channel by evaluating its information-theoretic limits. Further, we provide some design guidelines and we check our results via some simulations.

Journal ArticleDOI
TL;DR: In this paper an optimal piece-wise linear approximation of the Euclidean norm is presented which is applied to vector median filtering.
Abstract: For reducing impulsive noise without degrading image contours, median filtering is a powerful tool. In multiband images, such as color images or vector fields obtained by optic flow computation, a vector median filter can be used. Vector median filters are defined on the basis of a suitable distance, the best performing distance being the Euclidean. The Euclidean distance is evaluated using the Euclidean norm, which is computationally demanding given that a square root is required. In this paper an optimal piece-wise linear approximation of the Euclidean norm is presented which is applied to vector median filtering.
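A hedged sketch of vector median filtering with a square-root-free norm. The classic "alpha max plus beta min" piecewise-linear approximation stands in for the paper's own optimal approximation, which is not reproduced here:

```python
import numpy as np

def approx_norm2(v):
    """Piecewise-linear approximation of the 2D Euclidean norm (no square root),
    using one classic alpha/beta coefficient pair; error is within a few percent."""
    a, b = np.abs(v[..., 0]), np.abs(v[..., 1])
    return 0.9604 * np.maximum(a, b) + 0.3978 * np.minimum(a, b)

def vector_median(window):
    """Vector median: the sample minimizing the sum of distances to the others."""
    diffs = window[:, None, :] - window[None, :, :]
    costs = approx_norm2(diffs).sum(axis=1)
    return window[np.argmin(costs)]

vectors = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [9.0, -9.0]])  # one outlier
print(vector_median(vectors))   # one of the three similar vectors, never the outlier
```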

Journal ArticleDOI
TL;DR: Several fast encoding algorithms based on multiple triangle inequalities and wavelet transform to overcome the problem of expensive computation for searching the closest codevector to the input vector are presented.
Abstract: The encoding of vector quantization (VQ) needs expensive computation for searching the closest codevector to the input vector. This paper presents several fast encoding algorithms based on multiple triangle inequalities and wavelet transform to overcome this problem. The multiple triangle inequalities confine a search range using the intersection of search areas generated from several control vectors. A systematic way for designing the control vectors is also presented. The wavelet transform combined with the partial distance elimination is used to reduce the computational complexity of the distance calculation of vectors. The proposed algorithms provide the same coding quality as the full search method. The experimental results indicate that the new algorithms perform more efficiently than existing algorithms.
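A sketch of the partial distance elimination component alone (the triangle-inequality pruning with control vectors and the wavelet transform are omitted): the running squared distance to a codevector is abandoned as soon as it exceeds the best distance found so far, without changing the final answer.

```python
import numpy as np

def pde_search(codebook, x):
    """Full-search VQ with partial distance elimination (same result, less work)."""
    best_idx, best_dist = -1, np.inf
    for idx, c in enumerate(codebook):
        d = 0.0
        for xj, cj in zip(x, c):
            d += (xj - cj) ** 2
            if d >= best_dist:     # partial sum already too large: prune
                break
        else:
            best_idx, best_dist = idx, d
    return best_idx

rng = np.random.default_rng(7)
codebook = rng.standard_normal((256, 16))
x = rng.standard_normal(16)
assert pde_search(codebook, x) == np.argmin(((codebook - x) ** 2).sum(axis=1))
```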

Journal ArticleDOI
TL;DR: Using a peeling procedure to decompose the discrete, triangulated geometries along a one-dimensional path, a Euclidean space-time is explicitly associated with each Lorentzian space-time, which motivates a map between the parameter spaces of the two theories.

Journal ArticleDOI
TL;DR: The results show that the proposed method selects about one-fourth as many candidates with accuracy preserved compared to the conventional method that selects a fixed number of candidates.
Abstract: This paper proposes a precise candidate selection method for large character set recognition by confidence evaluation of distance-based classifiers. The proposed method is applicable to a wide variety of distance metrics and experiments on Euclidean distance and city block distance have achieved promising results. By confidence evaluation, the distribution of distances is analyzed to derive the probabilities of classes in two steps: output probability evaluation and input probability inference. Using the input probabilities as confidences, several selection rules have been tested and the rule that selects the classes with high confidence ratio to the first rank class produced best results. The experiments were implemented on the ETL9B database and the results show that the proposed method selects about one-fourth as many candidates with accuracy preserved compared to the conventional method that selects a fixed number of candidates.

Journal ArticleDOI
TL;DR: In both experiments, verification times, but not distance estimations, were affected by group membership, suggesting that perceptual, not memory processes were responsible for the formation of cognitive clusters.
Abstract: Two experiments were performed to investigate the organization of spatial information in perception and memory. Participants were confronted with map-like configurations of objects which were grouped by color (Experiment 1) or shape (Experiment 2) so as to induce cognitive clustering. Two tasks were administered: speeded verification of spatial relations between objects and unspeeded estimation of the Euclidean distance between object pairs. In both experiments, verification times, but not distance estimations, were affected by group membership. Spatial relations of objects belonging to the same color or shape group were verified faster than those of objects from different groups, even if the spatial distance was identical. These results did not depend on whether judgments were based on perceptually available or memorized information, suggesting that perceptual, not memory processes were responsible for the formation of cognitive clusters.

Journal ArticleDOI
TL;DR: It is shown that any weighted tree T with n vertices and L leaves can be embedded into d-dimensional Euclidean space with Õ(L^(1/(d-1))) distortion, and an embedding with almost the same distortion which can be computed efficiently is exhibited.
Abstract: We consider embedding metrics induced by trees into Euclidean spaces with a restricted number of dimensions. We show that any weighted tree T with n vertices and L leaves can be embedded into d-dimensional Euclidean space with O(L^(1/(d-1))) distortion. Furthermore, we exhibit an embedding with almost the same distortion which can be computed efficiently. This distortion substantially improves the previous best upper bound of Õ(n^(2/d)) and almost matches the best known lower bound of Ω(L^(1/d)).

Journal ArticleDOI
01 Feb 2000
TL;DR: The proposed OPATA8 has good noise resistance, perfectly 8-connected skeleton output, and a faster speed without serious erosion, and has been compared to both algorithm OPATA4 and Zhang and Suen's two-pass parallel thinning algorithm.
Abstract: This paper investigates the skeletonization problem using parallel thinning techniques and proposes a new one-pass parallel asymmetric thinning algorithm (OPATA8). Wu and Tsai presented a one-pass parallel asymmetric thinning algorithm (OPATA4) that implemented 4-distance, or city block distance, skeletonization. However, city block distance is not a good approximation of Euclidean distance. By applying 8-distance, or chessboard distance, this new algorithm improves not only the quality of the resulting skeletons but also the efficiency of the computation. This algorithm uses 18 patterns. The algorithm has been implemented, and has been compared to both algorithm OPATA4 and Zhang and Suen's two-pass parallel thinning algorithm. The results show that the proposed OPATA8 has good noise resistance, perfectly 8-connected skeleton output, and a faster speed without serious erosion.