
Showing papers on "Euclidean distance" published in 2006


Journal ArticleDOI
TL;DR: This note shows how to select the projection matrix in such a way that the Euclidean norm of the resulting perturbation is minimal, which is particularly useful if integral sliding-mode control is to be combined with other methods to further robustify against unmatched perturbations.
Abstract: The robustness properties of integral sliding-mode controllers are studied. This note shows how to select the projection matrix in such a way that the Euclidean norm of the resulting perturbation is minimal. It is also shown that when the minimum is attained, the resulting perturbation is not amplified. This selection is particularly useful if integral sliding-mode control is to be combined with other methods to further robustify against unmatched perturbations. H∞ is taken as a special case. Simulations support the general analysis and show the effectiveness of this particular combination.

535 citations


Proceedings ArticleDOI
17 Jun 2006
TL;DR: Experimental results show that the proposed algorithms for Discriminative Component Analysis and Kernel DCA are effective and promising in learning good quality distance metrics for image retrieval.
Abstract: Relevant Component Analysis (RCA) has been proposed for learning distance metrics with contextual constraints for image retrieval. However, RCA has two important disadvantages. One is the lack of exploiting negative constraints which can also be informative, and the other is its incapability of capturing complex nonlinear relationships between data instances with the contextual information. In this paper, we propose two algorithms to overcome these two disadvantages, i.e., Discriminative Component Analysis (DCA) and Kernel DCA. Compared with other complicated methods for distance metric learning, our algorithms are rather simple to understand and very easy to solve. We evaluate the performance of our algorithms on image retrieval in which experimental results show that our algorithms are effective and promising in learning good quality distance metrics for image retrieval.
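
For orientation, a hedged sketch of the form of metric such methods produce: RCA/DCA-style algorithms learn a positive semidefinite matrix A that parameterizes a Mahalanobis-type distance. The construction of A is the paper's contribution and is not reproduced here; the function below only shows how a learned A is applied.

import numpy as np

def learned_metric_distance(x, y, A):
    # Distance under a learned positive semidefinite matrix A:
    # d_A(x, y) = sqrt((x - y)^T A (x - y)); A = I recovers Euclidean.
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ A @ diff))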

330 citations


Journal ArticleDOI
TL;DR: In this paper, the authors studied configurations of points on the unit sphere that minimize potential energy for a broad class of potential functions (viewed as functions of the squared Euclidean distance between points).
Abstract: We study configurations of points on the unit sphere that minimize potential energy for a broad class of potential functions (viewed as functions of the squared Euclidean distance between points). Call a configuration sharp if there are m distances between distinct points in it and it is a spherical (2m-1)-design. We prove that every sharp configuration minimizes potential energy for all completely monotonic potential functions. Examples include the minimal vectors of the E_8 and Leech lattices. We also prove the same result for the vertices of the 600-cell, which do not form a sharp configuration. For most known cases, we prove that they are the unique global minima for energy, as long as the potential function is strictly completely monotonic. For certain potential functions, some of these configurations were previously analyzed by Yudin, Kolushov, and Andreev; we build on their techniques. We also generalize our results to other compact two-point homogeneous spaces, and we conclude with an extension to Euclidean space.
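
As a small illustration (not from the paper), the energy of a configuration under a potential f of squared Euclidean distance can be evaluated directly; f(t) = 1/t is one completely monotonic choice.

import numpy as np

def configuration_energy(points, f=lambda t: 1.0 / t):
    # Sum f(|x_i - x_j|^2) over distinct pairs of points on the sphere.
    E = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            E += f(float(np.sum((points[i] - points[j]) ** 2)))
    return E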

285 citations


Journal ArticleDOI
TL;DR: A large class of valid models that incorporate flow and stream distance is developed using spatial moving averages; by running the moving average function upstream from a location, the models incorporate flow and are, by construction, valid with respect to stream distance.
Abstract: We develop spatial statistical models for stream networks that can estimate relationships between a response variable and other covariates, make predictions at unsampled locations, and predict an average or total for a stream or a stream segment. There have been very few attempts to develop valid spatial covariance models that incorporate flow, stream distance, or both. The application of typical spatial autocovariance functions based on Euclidean distance, such as the spherical covariance model, are not valid when using stream distance. In this paper we develop a large class of valid models that incorporate flow and stream distance by using spatial moving averages. These methods integrate a moving average function, or kernel, against a white noise process. By running the moving average function upstream from a location, we develop models that use flow, and by construction they are valid models based on stream distance. We show that with proper weighting, many of the usual spatial models based on Euclidean distance have a counterpart for stream networks. Using sulfate concentrations from an example data set, the Maryland Biological Stream Survey (MBSS), we show that models using flow may be more appropriate than models that only use stream distance. For the MBSS data set, we use restricted maximum likelihood to fit a valid covariance matrix that uses flow and stream distance, and then we use this covariance matrix to estimate fixed effects and make kriging and block kriging predictions.
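
The moving-average device can be illustrated in one dimension, a simplification of the stream-network setting: integrating a kernel g against white noise yields a process with covariance C(h) = ∫ g(u) g(u − h) du, which is valid (positive semidefinite) for any square-integrable g. A numerical sketch, with the grid and the one-sided kernel as illustrative choices loosely analogous to the "upstream" kernels:

import numpy as np

def moving_average_covariance(kernel, h, grid):
    # C(h) = integral of g(u) * g(u - h) du, approximated on a grid.
    du = grid[1] - grid[0]
    return float(np.sum(kernel(grid) * kernel(grid - h)) * du)

g = lambda u: np.exp(-u) * (u >= 0)   # one-sided exponential kernel
grid = np.linspace(-10, 10, 4001)
print([moving_average_covariance(g, h, grid) for h in (0.0, 1.0, 2.0)])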

276 citations


Proceedings ArticleDOI
14 May 2006
TL;DR: This paper develops several algorithms for non-negative matrix factorization (NMF) in applications to blind (or semi-blind) source separation (BSS), where sources may be statistically dependent, provided additional constraints are imposed such as nonnegativity, sparsity, smoothness, lower complexity, or better predictability.
Abstract: In this paper we develop several algorithms for non-negative matrix factorization (NMF) in applications to blind (or semi-blind) source separation (BSS), where sources may be statistically dependent, provided additional constraints are imposed such as nonnegativity, sparsity, smoothness, lower complexity, or better predictability. We express the non-negativity constraints using a wide class of loss (cost) functions, which leads to an extended class of multiplicative algorithms with regularization. The proposed relaxed forms of the NMF algorithms have a higher convergence speed with the desired constraints. Moreover, the effects of various regularization and constraints are clearly shown. The scope of the results is vast since the discussed loss functions include quite a large number of useful cost functions such as weighted Euclidean distance, relative entropy, Kullback–Leibler divergence, and generalized Hellinger, Pearson's, Neyman's distances, etc.
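
As a reference point, a minimal sketch of the classical multiplicative updates for the plain (unregularized) Euclidean-distance cost, the simplest member of the loss family discussed; the paper's regularized and relaxed variants build on updates of this form.

import numpy as np

def nmf_euclidean(V, rank, n_iter=200, eps=1e-9, seed=0):
    # Multiplicative-update NMF minimizing ||V - W H||_F^2 (Lee-Seung).
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # H stays nonnegative
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # W stays nonnegative
    return W, H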

248 citations


Journal ArticleDOI
TL;DR: The problem of identifying similarities in time series data is studied from the perspective of the statistical discrimination and clustering literature, and the use of both hierarchical and non-hierarchical clustering algorithms is considered.

244 citations


Proceedings ArticleDOI
01 Sep 2006
TL;DR: This paper studies k-NN monitoring in road networks, where the distance between a query and a data object is determined by the length of the shortest path connecting them, and proposes two methods that can handle arbitrary object and query moving patterns, as well as fluctuations of edge weights.
Abstract: Recent research has focused on continuous monitoring of nearest neighbors (NN) in highly dynamic scenarios, where the queries and the data objects move frequently and arbitrarily. All existing methods, however, assume the Euclidean distance metric. In this paper we study k-NN monitoring in road networks, where the distance between a query and a data object is determined by the length of the shortest path connecting them. We propose two methods that can handle arbitrary object and query moving patterns, as well as fluctuations of edge weights. The first one maintains the query results by processing only updates that may invalidate the current NN sets. The second method follows the shared execution paradigm to reduce the processing time. In particular, it groups together the queries that fall in the path between two consecutive intersections in the network, and produces their results by monitoring the NN sets of these intersections. We experimentally verify the applicability of the proposed techniques to continuous monitoring of large data and query sets.
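
A static baseline for the problem (not the paper's incremental monitoring methods): expand Dijkstra from the query vertex until the k nearest objects by shortest-path length are found. The graph representation and names here are illustrative.

import heapq

def network_knn(graph, source, objects, k):
    # graph: {u: [(v, weight), ...]}; objects: set of vertices with data.
    dist = {source: 0.0}
    heap, result = [(0.0, source)], []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        if u in objects:
            result.append((u, d))
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return result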

230 citations


Proceedings ArticleDOI
01 Sep 2006
TL;DR: This work shows that current approaches can be made four orders of magnitude faster, without false dismissals, and the technique can be used with any of the dozens of existing shape representations and with all the most popular distance measures, including Euclidean distance, Dynamic Time Warping, and Longest Common Subsequence.
Abstract: The matching of two-dimensional shapes is an important problem with applications in domains as diverse as biometrics, industry, medicine and anthropology. The distance measure used must be invariant to many distortions, including scale, offset, noise, partial occlusion, etc. Most of these distortions are relatively easy to handle, either in the representation of the data or in the similarity measure used. However rotation invariance seems to be uniquely difficult. Current approaches typically try to achieve rotation invariance in the representation of the data, at the expense of discrimination ability, or in the distance measure, at the expense of efficiency. In this work we show that we can take the slow but accurate approaches and dramatically speed them up. On real world problems our technique can take current approaches and make them four orders of magnitude faster, without false dismissals. Moreover, our technique can be used with any of the dozens of existing shape representations and with all the most popular distance measures including Euclidean distance, Dynamic Time Warping and Longest Common Subsequence.
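
The baseline being accelerated can be stated in a few lines: for shapes converted to one-dimensional signatures (e.g. centroid-distance time series), rotation invariance is obtained by taking the minimum Euclidean distance over all circular shifts. This brute-force version is the slow but accurate computation the paper speeds up; the signature conversion is assumed done elsewhere.

import numpy as np

def rotation_invariant_distance(a, b):
    # Minimum Euclidean distance over all circular shifts of b.
    return min(np.linalg.norm(a - np.roll(b, s)) for s in range(len(b)))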

223 citations


Journal ArticleDOI
TL;DR: In this article, the closest tensors of higher symmetry classes are derived in explicit form for a given elasticity tensor of arbitrary symmetry; the mathematical problem is to minimize the elastic length or distance between the given tensor and the closest elasticity tensor of the specified symmetry.
Abstract: The closest tensors of higher symmetry classes are derived in explicit form for a given elasticity tensor of arbitrary symmetry. The mathematical problem is to minimize the elastic length or distance between the given tensor and the closest elasticity tensor of the specified symmetry. Solutions are presented for three distance functions, with particular attention to the Riemannian and log-Euclidean distances. These yield solutions that are invariant under inversion, i.e., the same whether elastic stiffness or compliance are considered. The Frobenius distance function, which corresponds to common notions of Euclidean length, is not invariant although it is simple to apply using projection operators. A complete description of the Euclidean projection method is presented. The three metrics are considered at a level of detail far greater than heretofore, as we develop the general framework to best fit a given set of moduli onto higher elastic symmetries. The procedures for finding the closest elasticity tensor are illustrated by application to a set of 21 moduli with no underlying symmetry.
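
To make the inversion-invariance point concrete, here is the log-Euclidean distance for symmetric positive-definite matrix representations of elastic moduli: since log(A⁻¹) = −log(A), the distance between two stiffnesses equals the distance between the corresponding compliances, unlike the Frobenius distance. A sketch only; the paper works with full elasticity tensors and also treats the Riemannian metric.

import numpy as np
from scipy.linalg import logm

def log_euclidean_distance(A, B):
    # ||log(A) - log(B)||_F for SPD matrices; invariant under inversion.
    return float(np.linalg.norm(logm(A) - logm(B), "fro"))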

204 citations


Journal ArticleDOI
01 Jan 2006
TL;DR: The experimental results demonstrate that the index can help speed up the computation of expensive similarity measures such as the LCSS and the DTW and can also be tailored to provide much faster response time at the expense of slightly reduced precision/recall.
Abstract: While most time series data mining research has concentrated on providing solutions for a single distance function, in this work we motivate the need for an index structure that can support multiple distance measures. Our specific area of interest is the efficient retrieval and analysis of similar trajectories. Trajectory datasets are very common in environmental applications, mobility experiments, and video surveillance and are especially important for the discovery of certain biological patterns. Our primary similarity measure is based on the longest common subsequence (LCSS) model that offers enhanced robustness, particularly for noisy data, which are encountered very often in real-world applications. However, our index is able to accommodate other distance measures as well, including the ubiquitous Euclidean distance and the increasingly popular dynamic time warping (DTW). While other researchers have advocated one or other of these similarity measures, a major contribution of our work is the ability to support all these measures without the need to restructure the index. Our framework guarantees no false dismissals and can also be tailored to provide much faster response time at the expense of slightly reduced precision/recall. The experimental results demonstrate that our index can help speed up the computation of expensive similarity measures such as the LCSS and the DTW.
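
For reference, a minimal LCSS similarity for one-dimensional trajectories, the primary measure supported by the index (the full LCSS model also uses a temporal matching window, omitted here for brevity):

def lcss_similarity(t1, t2, eps):
    # Two samples match when they differ by less than eps; the score is
    # the matched length normalized by the shorter trajectory.
    n, m = len(t1), len(t2)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(t1[i - 1] - t2[j - 1]) < eps:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m] / min(n, m)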

178 citations


Journal ArticleDOI
TL;DR: A simple nonparametric classifier based on local mean vectors is proposed and compared with the 1-NN, k-NN, Euclidean distance, Parzen, and artificial neural network (ANN) classifiers in terms of the error rate on unknown patterns.
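
A sketch of the rule as the TL;DR describes it (details such as tie handling are illustrative): for each class, average the k nearest training points to the query and assign the class whose local mean is closest in Euclidean distance.

import numpy as np

def local_mean_classify(x, X, y, k):
    # X: training points, y: labels; classify x by nearest local mean.
    best_label, best_dist = None, np.inf
    for c in np.unique(y):
        Xc = X[y == c]
        nearest = Xc[np.argsort(np.linalg.norm(Xc - x, axis=1))[:k]]
        d = np.linalg.norm(nearest.mean(axis=0) - x)
        if d < best_dist:
            best_label, best_dist = c, d
    return best_label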

Journal ArticleDOI
Guy Lebanon1
TL;DR: This work considers the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points, and discusses in detail learning a metric on the multinomial simplex where the metric candidates are pull-back metrics of the Fisher information under a Lie group of transformations.
Abstract: Many algorithms in machine learning rely on being given a good distance metric over the input space. Rather than using a default metric such as the Euclidean metric, it is desirable to obtain a metric based on the provided data. We consider the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points. Our approach to the problem involves choosing a metric from a parametric family that is based on maximizing the inverse volume of a given data set of points. From a statistical perspective, it is related to maximum likelihood under a model that assigns probabilities inversely proportional to the Riemannian volume element. We discuss in detail learning a metric on the multinomial simplex where the metric candidates are pull-back metrics of the Fisher information under a Lie group of transformations. When applied to text document classification, the resulting geodesic distances resemble, but outperform, the tfidf cosine similarity measure.
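
The base metric involved is standard: under the Fisher information metric, the geodesic distance between two points p, q on the multinomial simplex is d(p, q) = 2 arccos(Σ_i √(p_i q_i)). The paper learns pull-backs of this metric; the following computes only the base distance.

import numpy as np

def fisher_geodesic_distance(p, q):
    # d(p, q) = 2 * arccos(sum_i sqrt(p_i * q_i)) on the simplex.
    s = np.clip(np.sqrt(np.asarray(p) * np.asarray(q)).sum(), -1.0, 1.0)
    return 2.0 * np.arccos(s)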

Journal ArticleDOI
Hui Wang1
TL;DR: This paper considers one definition of neighborhood for multivariate data and derives a formula for such similarity, called the neighborhood counting measure or NCM, which has computational complexity of the same order as the standard Euclidean distance function and works for numerical and categorical data in a conceptually uniform way.
Abstract: Finding nearest neighbors is a general idea that underlies many artificial intelligence tasks, including machine learning, data mining, natural language understanding, and information retrieval. This idea is explicitly used in the k-nearest neighbors algorithm (kNN), a popular classification method. In this paper, this idea is adopted in the development of a general methodology, neighborhood counting, for devising similarity functions. We turn our focus from neighbors to neighborhoods, a region in the data space covering the data point in question. To measure the similarity between two data points, we consider all neighborhoods that cover both data points. We propose to use the number of such neighborhoods as a measure of similarity. Neighborhood can be defined for different types of data in different ways. Here, we consider one definition of neighborhood for multivariate data and derive a formula for such similarity, called the neighborhood counting measure or NCM. NCM was tested experimentally in the framework of kNN. Experiments show that NCM is generally comparable to VDM and its variants, the state-of-the-art distance functions for multivariate data, and, at the same time, is consistently better for relatively large k values. Additionally, NCM consistently outperforms HEOM (a mixture of Euclidean and Hamming distances), the "standard" and most widely used distance function for multivariate data. NCM has computational complexity of the same order as the standard Euclidean distance function, is task independent, and works for numerical and categorical data in a conceptually uniform way. The neighborhood counting methodology is experimentally proven sound for multivariate data. We hope it works for other types of data.
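
For context, HEOM, the widely used baseline that NCM is reported to outperform, is straightforward: range-normalized absolute difference for numeric attributes, 0/1 overlap for categorical ones, combined Euclidean-style. NCM's own counting formula is not reproduced here.

def heom_distance(x, y, numeric_ranges):
    # numeric_ranges: {attribute index: value range}; other attributes
    # are treated as categorical with 0/1 overlap distance.
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if a in numeric_ranges:
            total += (abs(xa - ya) / numeric_ranges[a]) ** 2
        else:
            total += 0.0 if xa == ya else 1.0
    return total ** 0.5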

Journal ArticleDOI
01 Nov 2006
TL;DR: A novel generalized approach using the well-known energy compaction power of Fourier-related transforms to hide sensitive data values and to approximately preserve Euclidean distances in centralized and distributed scenarios to a great degree of accuracy is proposed.
Abstract: Privacy preserving data mining has become increasingly popular because it allows sharing of privacy-sensitive data for analysis purposes. However, existing techniques such as random perturbation do not fare well for simple yet widely used and efficient Euclidean distance-based mining algorithms. Although original data distributions can be pretty accurately reconstructed from the perturbed data, distances between individual data points are not preserved, leading to poor accuracy for the distance-based mining methods. Besides, they do not generally focus on data reduction. Other studies on secure multi-party computation often concentrate on techniques useful to very specific mining algorithms and scenarios such that they require modification of the mining algorithms and are often difficult to generalize to other mining algorithms or scenarios. This paper proposes a novel generalized approach using the well-known energy compaction power of Fourier-related transforms to hide sensitive data values and to approximately preserve Euclidean distances in centralized and distributed scenarios to a great degree of accuracy. Three algorithms to select the most important transform coefficients are presented, one for a centralized database case, the second one for a horizontally partitioned, and the third one for a vertically partitioned database case. Experimental results demonstrate the effectiveness of the proposed approach.
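
The energy-compaction idea can be sketched in a few lines for the centralized case. An orthonormal Fourier transform preserves Euclidean distances exactly (Parseval), so distances computed on a few high-energy coefficients approximate the originals while hiding the raw values. The selection heuristic below, picking positions by average magnitude, is an illustrative stand-in for the paper's three selection algorithms.

import numpy as np

def select_positions(X, k):
    # Pick the k DFT positions with the largest average magnitude.
    F = np.fft.fft(X, axis=1, norm="ortho")
    return np.argsort(-np.abs(F).mean(axis=0))[:k]

def project(X, positions):
    return np.fft.fft(X, axis=1, norm="ortho")[:, positions]

X = np.random.default_rng(0).random((100, 64))
Z = project(X, select_positions(X, 16))
# Complex-vector norms approximate the original Euclidean distances.
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Z[0] - Z[1]))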

Journal ArticleDOI
TL;DR: A genetic algorithm (GA) for evolving centers in the k-means algorithm is presented that simultaneously identifies good partitions for a range of values around a specified k; it finds the global optimum for data sets with known optima and good solutions for large simulated data sets.
Abstract: The k-means algorithm is widely used for clustering because of its computational efficiency. Given n points in d-dimensional space and the number of desired clusters k, k-means seeks a set of k-cluster centers so as to minimize the sum of the squared Euclidean distance between each point and its nearest cluster center. However, the algorithm is very sensitive to the initial selection of centers and is likely to converge to partitions that are significantly inferior to the global optimum. We present a genetic algorithm (GA) for evolving centers in the k-means algorithm that simultaneously identifies good partitions for a range of values around a specified k. The set of centers is represented using a hyper-quadtree constructed on the data. This representation is exploited in our GA to generate an initial population of good centers and to support a novel crossover operation that selectively passes good subsets of neighboring centers from parents to offspring by swapping subtrees. Experimental results indicate that our GA finds the global optimum for data sets with known optima and finds good solutions for large simulated data sets.
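
For reference, the local-search procedure whose initialization sensitivity motivates the GA: plain Lloyd's k-means, minimizing the summed squared Euclidean distance to the nearest center.

import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean).
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        new_centers = np.array([
            X[labels == j].mean(0) if (labels == j).any() else centers[j]
            for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels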

Proceedings ArticleDOI
14 Mar 2006
TL;DR: An interactive algorithm to compute discretized 3D Euclidean distance fields using a set of piecewise linear geometric primitives that is more accurate and almost one order of magnitude faster as compared to previous distance computation algorithms that use graphics hardware.
Abstract: We present an interactive algorithm to compute discretized 3D Euclidean distance fields. Given a set of piecewise linear geometric primitives, our algorithm computes the distance field for each slice of a uniform spatial grid. We express the non-linear distance function of each primitive as a dot product of linear factors. The linear terms are efficiently computed using texture mapping hardware. We also improve the performance by using culling techniques that reduce the number of distance function evaluations using bounds on Voronoi regions of the primitives. Our algorithm involves no preprocessing and is able to handle complex deforming models at interactive rates. We have implemented our algorithm on a PC with NVIDIA GeForce 7800 GPU and applied it to models composed of thousands of triangles. We demonstrate its application to medial axis approximation and proximity computations between rigid and deformable models. In practice, our algorithm is more accurate and almost one order of magnitude faster as compared to previous distance computation algorithms that use graphics hardware.
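
A CPU reference for one slice clarifies what is being computed (point primitives only; the paper handles general piecewise-linear primitives and evaluates the distance function on the GPU with culling):

import numpy as np

def slice_distance_field(primitive_points, xs, ys, z):
    # Distance from every grid cell in the z-slice to the nearest point.
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    cells = np.stack([gx, gy, np.full_like(gx, z)], axis=-1)
    d = np.linalg.norm(cells[:, :, None, :] - primitive_points[None, None],
                       axis=-1)
    return d.min(axis=-1)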

Journal ArticleDOI
TL;DR: A new distance measure between n-gram profiles is reported, which shows superior performance compared to many other measures, including the commonly used Euclidean distance.

Journal ArticleDOI
TL;DR: A new distance measure is proposed that better quantifies the similarity evaluation between two orientation fields than the conventional Euclidean and Manhattan distance measures and is applicable to large databases.
Abstract: This paper presents a front-end filtering algorithm for fingerprint identification, which uses the orientation field and dominant ridge distance as retrieval features. We propose a new distance measure that better quantifies the similarity between two orientation fields than the conventional Euclidean and Manhattan distance measures. Furthermore, fingerprints in the database are clustered to facilitate a fast retrieval process that avoids exhaustive comparisons of an input fingerprint with all fingerprints in the database. This makes the proposed approach applicable to large databases. Experimental results on the National Institute of Standards and Technology Database-4 show consistently better retrieval performance of the proposed approach compared to other continuous and exclusive fingerprint classification methods as well as minutia-based indexing schemes.
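
A hedged illustration of the kind of comparison involved (not the paper's exact measure): ridge orientations are π-periodic, so a sensible angular difference folds into [0, π/2] before averaging over the field.

import numpy as np

def orientation_field_distance(theta1, theta2):
    # theta1, theta2: arrays of block orientations in radians.
    d = np.abs(theta1 - theta2) % np.pi
    return np.minimum(d, np.pi - d).mean()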

Book ChapterDOI
01 Jan 2006
TL;DR: This work describes a distributed or decomposed semidefinite programming (SDP) method for solving Euclidean metric localization problems that arise from ad hoc wireless sensor networks.
Abstract: We describe a distributed or decomposed semidefinite programming (SDP) method for solving Euclidean metric localization problems that arise from ad hoc wireless sensor networks. Using the distributed method, we can solve very large scale semidefinite programs which are intractable for the centralized methods. Our distributed or decomposed SDP scheme also seems to be applicable to solving other Euclidean geometry problems where points are locally connected.

Journal ArticleDOI
TL;DR: The deformable spanner succinctly encodes all proximity information in a deforming point cloud, giving us efficient kinetic algorithms for problems such as the closest pair, the near neighbors of all points, approximate nearest neighbor search, well-separated pair decompositions, and approximate k-centers.
Abstract: For a set S of points in R^d, an s-spanner is a subgraph of the complete graph with node set S such that any pair of points is connected via some path in the spanner whose total length is at most s times the Euclidean distance between the points. In this paper we propose a new sparse (1 + ε)-spanner with O(n/ε^d) edges, where ε is a specified parameter. The key property of this spanner is that it can be efficiently maintained under dynamic insertion or deletion of points, as well as under continuous motion of the points in both the kinetic data structures setting and in the more realistic blackbox displacement model we introduce. Our deformable spanner succinctly encodes all proximity information in a deforming point cloud, giving us efficient kinetic algorithms for problems such as the closest pair, the near neighbors of all points, approximate nearest neighbor search (aka approximate Voronoi diagram), well-separated pair decompositions, and approximate k-centers.
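
The defining property is easy to check against the classic static construction (the paper's contribution, maintenance under motion and updates, is not shown): the greedy algorithm scans pairs by increasing Euclidean distance and adds an edge only when the current spanner path is too long.

import numpy as np

def greedy_spanner(points, s):
    # Teaching version with dense all-pairs bookkeeping; returns the
    # edge list of an s-spanner over the given points.
    n = len(points)
    D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    sp = np.full((n, n), np.inf)
    np.fill_diagonal(sp, 0.0)
    edges = []
    for d, i, j in sorted((D[i, j], i, j)
                          for i in range(n) for j in range(i + 1, n)):
        if sp[i, j] > s * d:
            edges.append((i, j))
            # Relax all-pairs shortest paths through the new edge.
            for u in range(n):
                for v in range(n):
                    via = min(sp[u, i] + d + sp[j, v],
                              sp[u, j] + d + sp[i, v])
                    if via < sp[u, v]:
                        sp[u, v] = sp[v, u] = via
    return edges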

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a method of principal component analysis for Riemannian manifolds based on geodesics of the intrinsic metric, and provided a numerical implementation in the case of spheres.
Abstract: Classical principal component analysis on manifolds, for example on Kendall's shape spaces, is carried out in the tangent space of a Euclidean mean equipped with a Euclidean metric. We propose a method of principal component analysis for Riemannian manifolds based on geodesics of the intrinsic metric, and provide a numerical implementation in the case of spheres. This method allows us, for example, to compare principal component geodesics of different data samples. In order to determine principal component geodesics, we show that in general, owing to curvature, the principal component geodesics do not pass through the intrinsic mean. As a consequence, means other than the intrinsic mean are considered, allowing for several choices of definition of geodesic variance. In conclusion we apply our method to the space of planar triangular shapes and compare our findings with those of standard Euclidean principal component analysis.
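
The intrinsic mean the discussion revolves around can be computed on the sphere with a simple fixed-point iteration using the exponential and logarithm maps (a sketch for unit vectors; convergence details omitted):

import numpy as np

def sphere_log(p, q):
    # Tangent vector at p pointing along the geodesic toward q.
    c = np.clip(p @ q, -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-12:
        return np.zeros_like(p)
    return (q - c * p) * theta / np.sin(theta)

def sphere_exp(p, v):
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p
    return np.cos(nv) * p + np.sin(nv) * v / nv

def intrinsic_mean(X, n_iter=100):
    # Frechet mean of unit vectors X by averaging in the tangent space.
    mu = X[0]
    for _ in range(n_iter):
        mu = sphere_exp(mu, np.mean([sphere_log(mu, x) for x in X], axis=0))
    return mu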

Proceedings ArticleDOI
01 Sep 2006
TL;DR: A distributed algorithm is proposed that uses inter-vehicle distance estimates, made using a radio-based ranging technology, to localize a vehicle among its neighbours and accurately estimate its position within a cluster.
Abstract: We propose a distributed algorithm that uses inter-vehicle distance estimates, made using a radio-based ranging technology, to localize a vehicle among its neighbours. Given that the inter-vehicle distance estimates contain noise, our algorithm reduces the residuals of the Euclidean distance between the vehicles and their measured distances, allowing it to accurately estimate the position of a vehicle within a cluster. In this paper, we show that our proposed algorithm outperforms previously proposed algorithms and present its performance in a simulated vehicular environment.
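
A generic (non-distributed) version of the residual-minimization step, with scipy's solver standing in for the paper's scheme; the function names and the initial guess are illustrative.

import numpy as np
from scipy.optimize import least_squares

def estimate_position(neighbor_positions, measured_ranges, initial_guess):
    # Minimize residuals between Euclidean distances to neighbours and
    # the noisy measured inter-vehicle distances.
    def residuals(p):
        return np.linalg.norm(neighbor_positions - p, axis=1) - measured_ranges
    return least_squares(residuals, initial_guess).x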

Journal ArticleDOI
TL;DR: The multipath component distance (MCD) is used to calculate the distance between individual multipath components estimated by a channel parameter estimator, such as SAGE; using the MCD significantly improved clustering performance.
Abstract: The problem of identifying clusters from MIMO measurement data is addressed. Conventionally, visual inspection has been used for cluster identification, but this approach is impractical for a large amount of measurement data. For automatic clustering, the multipath component distance (MCD) is used to calculate the distance between individual multipath components estimated by a channel parameter estimator, such as SAGE. This distance is implemented in the well-known KMeans clustering algorithm. To demonstrate the effectiveness of the choice made, the performance of the MCD and the Euclidean distance were compared by clustering synthetic data generated by the 3GPP spatial channel model (SCM). Using the MCD significantly improved clustering performance.

Journal ArticleDOI
TL;DR: A multiple-criteria scatter search is proposed to deal with bound-constrained non-linear continuous vector optimization problems of high dimension, applying a MultiStart Tabu Search as the diversification generation method; each tabu search works with its own starting point, recency memory, and aspiration threshold.

Book ChapterDOI
TL;DR: This paper investigates the relation between non-Euclidean aspects of dissimilarity data and the classification performance of the direct NN rule and some classifiers trained in representation spaces, and concludes that statistical classifiers perform well and that the optimal values of the parameters characterize a non-Euclidean and somewhat non-metric measure.
Abstract: Statistical learning algorithms often rely on the Euclidean distance. In practice, non-Euclidean or non-metric dissimilarity measures may arise when contours, spectra or shapes are compared by edit distances or as a consequence of robust object matching [1,2]. It is an open issue whether such measures are advantageous for statistical learning or whether they should be constrained to obey the metric axioms. The k-nearest neighbor (NN) rule is widely applied to general dissimilarity data as the most natural approach. Alternative methods exist that embed such data into suitable representation spaces in which statistical classifiers are constructed [3]. In this paper, we investigate the relation between non-Euclidean aspects of dissimilarity data and the classification performance of the direct NN rule and some classifiers trained in representation spaces. This is evaluated on a parameterized family of edit distances, in which parameter values control the strength of non-Euclidean behavior. Our finding is that the discriminative power of this measure increases with increasing non-Euclidean and non-metric aspects until a certain optimum is reached. The conclusion is that statistical classifiers perform well and the optimal values of the parameters characterize a non-Euclidean and somewhat non-metric measure.

Book ChapterDOI
31 Dec 2006
TL;DR: In this paper, the ergodic theory of translation surfaces is discussed and a saddle connection is defined as a geodesic joining two of the singularities with no singularities in its interior.
Abstract: This chapter discusses the ergodic theory of translation surfaces. Since the Euclidean metric on the plane is preserved by translations, the notion of direction and parallel lines makes sense on the complement of the singularity set. Geodesics can change direction if they go through a singular point. A pair of straight lines through the singular point forms a geodesic if the angle between them is at least 2π. A saddle connection is a geodesic joining two of the singularities with no singularities in its interior. In each coordinate chart, it is a straight line in the Euclidean metric. An oriented saddle connection determines a vector called the holonomy vector of the saddle connection.

Journal ArticleDOI
TL;DR: An analytic approach is presented to capture the statistics on hop count for a given source-to-destination Euclidean distance in a greedy routing approach, and it is shown that, for a given hop count, the bounds on Euclidean distance can be computed from the distribution characteristics of per-hop progress.
Abstract: Wireless ad hoc networks are generally characterised by random node locations and multi-hop routes. A quantitative knowledge of the relation between hop count and Euclidean distance could provide a better understanding of important network parameters such as end-to-end delay, power consumption along the route, and node localisation. In this paper, we present an analytic approach to capture the statistics on hop count for a given source-to-destination Euclidean distance in a greedy routing approach. We also show that, for a given hop count, the bounds on Euclidean distance can be computed from the distribution characteristics of per-hop progress.
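
A small Monte Carlo sketch of greedy forwarding makes the hop-count/distance relation tangible; the node placement and radio range are illustrative inputs, and this is a simulation aid, not the paper's analytic derivation.

import numpy as np

def greedy_hop_count(nodes, src, dst, radio_range):
    # Each hop forwards to the in-range neighbour closest to dst;
    # returns None when greedy forwarding is stuck.
    cur, hops = src, 0
    while np.linalg.norm(cur - dst) > radio_range:
        d = np.linalg.norm(nodes - cur, axis=1)
        neighbours = nodes[(d > 0) & (d <= radio_range)]
        if len(neighbours) == 0:
            return None
        nxt = neighbours[np.linalg.norm(neighbours - dst, axis=1).argmin()]
        if np.linalg.norm(nxt - dst) >= np.linalg.norm(cur - dst):
            return None  # no forward progress
        cur, hops = nxt, hops + 1
    return hops + 1  # final hop reaches the destination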

Journal ArticleDOI
26 Jun 2006
TL;DR: It is demonstrated that the inaccuracy of Euclidean embedding is caused by a large degree of triangle inequality violation in the Internet distances, which leads to negative eigenvalues of large magnitude, and a new hybrid model is proposed for embedding the network nodes using only a 2-dimensional Euclidean coordinate system and small error adjustment terms.
Abstract: In this paper, we investigate the suitability of embedding Internet hosts into a Euclidean space given their pairwise distances (as measured by round-trip time). Using the classical scaling and matrix perturbation theories, we first establish the (sum of the) magnitude of negative eigenvalues of the (doubly-centered, squared) distance matrix as a measure of suitability of Euclidean embedding. We then show that the distance matrix among Internet hosts contains negative eigenvalues of large magnitude, implying that embedding the Internet hosts in a Euclidean space would incur relatively large errors. Motivated by earlier studies, we demonstrate that the inaccuracy of Euclidean embedding is caused by a large degree of triangle inequality violation (TIV) in the Internet distances, which leads to negative eigenvalues of large magnitude. Moreover, we show that the TIVs are likely to occur locally; hence, the distances among these close-by hosts cannot be estimated accurately using a global Euclidean embedding. In addition, increasing the dimension of embedding does not reduce the embedding errors. Based on these insights, we propose a new hybrid model for embedding the network nodes using only a 2-dimensional Euclidean coordinate system and small error adjustment terms. We show that the accuracy of the proposed embedding technique is as good as, if not better than, that of a 7-dimensional Euclidean embedding.
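
The suitability measure is easy to reproduce from the definition in the abstract: double-center the squared distance matrix as in classical MDS and look at how much eigenvalue mass is negative.

import numpy as np

def negative_eigenvalue_fraction(D):
    # D: symmetric matrix of pairwise RTT-based distances.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J      # doubly-centered, squared distances
    w = np.linalg.eigvalsh(B)
    return np.abs(w[w < 0]).sum() / np.abs(w).sum()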

Journal ArticleDOI
TL;DR: Three main types of relevance feedback algorithms are investigated, the Euclidean, the query point movement, and the correlation-based approaches, and a new objective criterion, called the average normalized similarity metric distance, is introduced, which exploits the difference between the actual and the ideal similarity measure over all best retrievals.
Abstract: Multimedia content modeling, i.e., identification of semantically meaningful entities, is an arduous task, mainly because (a) humans perceive the content using high-level concepts and (b) human perception is subjective, often interpreting the same content in a different way at different times. For this reason, an efficient content management system has to be adapted to the current user's information needs and preferences through an on-line learning strategy based on user interaction. One adaptive learning strategy is relevance feedback, originally developed in traditional text-based information retrieval systems. In this way, the user interacts with the system to provide information about the relevance of the content, which is then fed back to the system to update its performance. In this paper, we evaluate and investigate three main types of relevance feedback algorithms: the Euclidean, the query point movement, and the correlation-based approaches. In the first case, we examine heuristic and optimal techniques which are based either on the weighted or the generalized Euclidean distance. In the second case, we survey single and multipoint query movement schemes. As far as the third type is concerned, two different ways of parametrizing the normalized cross-correlation similarity metric are proposed. The first scales only the elements of the query feature vector and is called the query-scaling strategy, while the second scales both the query and the selected samples (the query-sample scaling strategy). All the examined algorithms are evaluated using both subjective and objective criteria. Subjective evaluation is performed by depicting the best retrieved images in response to a user's query. Objective evaluation relies on standard criteria, such as the precision–recall curve and the average normalized modified retrieval rank (ANMRR). Furthermore, a new objective criterion, called the average normalized similarity metric distance, is introduced, which exploits the difference between the actual and the ideal similarity measure over all best retrievals. Discussions and comparisons of all the aforementioned relevance feedback algorithms are presented.
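
Of the three families, query point movement is the simplest to sketch. Below is a Rocchio-style update; the alpha, beta, gamma weights are conventional defaults, not the paper's values.

import numpy as np

def query_point_movement(q, relevant, irrelevant,
                         alpha=1.0, beta=0.75, gamma=0.15):
    # Move the query feature vector toward the mean of relevant examples
    # and away from the mean of irrelevant ones.
    q_new = alpha * np.asarray(q, dtype=float)
    if len(relevant):
        q_new += beta * np.mean(relevant, axis=0)
    if len(irrelevant):
        q_new -= gamma * np.mean(irrelevant, axis=0)
    return q_new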

Book ChapterDOI
13 Dec 2006
TL;DR: This paper illustrates the point for partially synthetic microdata and shows that, in some cases, Mahalanobis DBRL can yield a very high re-identification percentage, far superior to the one offered by other record linkage methods.
Abstract: Distance-based record linkage (DBRL) is a common approach to empirically assessing the disclosure risk in SDC-protected microdata. Usually, the Euclidean distance is used. In this paper, we explore the potential advantages of using the Mahalanobis distance for DBRL. We illustrate our point for partially synthetic microdata and show that, in some cases, Mahalanobis DBRL can yield a very high re-identification percentage, far superior to the one offered by other record linkage methods.