
Showing papers on "Mahalanobis distance" published in 1990


Journal ArticleDOI
TL;DR: This work proposes to compute distances based on very robust estimates of location and covariance, better suited to expose the outliers in a multivariate point cloud, to avoid the masking effect.
Abstract: Detecting outliers in a multivariate point cloud is not trivial, especially when there are several outliers. The classical identification method does not always find them, because it is based on the sample mean and covariance matrix, which are themselves affected by the outliers. That is how the outliers get masked. To avoid the masking effect, we propose to compute distances based on very robust estimates of location and covariance. These robust distances are better suited to expose the outliers. In the case of regression data, the classical least squares approach masks outliers in a similar way. Also here, the outliers may be unmasked by using a highly robust regression method. Finally, a new display is proposed in which the robust regression residuals are plotted versus the robust distances. This plot classifies the data into regular observations, vertical outliers, good leverage points, and bad leverage points. Several examples are discussed.

1,419 citations
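The masking effect described in the entry above can be sketched in a few lines of Python. This is a minimal illustration, not the estimator used in the paper: the "robust" step (mean and covariance of the half-sample nearest the coordinate-wise median) is a crude hypothetical stand-in, and the data are made up. It relies on one known fact: with the sample mean and covariance, no squared distance can exceed (n - 1)^2 / n, so a cluster of outliers cannot stand out much.

```python
from statistics import median


def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance (n - 1 denominator)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance via the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def crude_robust_estimates(points):
    """Hypothetical stand-in for a robust estimator (an assumption, not
    the paper's method): mean and covariance of the half-sample that is
    nearest, in Euclidean terms, to the coordinate-wise median."""
    med = (median(p[0] for p in points), median(p[1] for p in points))
    h = len(points) // 2 + 1
    nearest = sorted(
        points, key=lambda p: (p[0] - med[0]) ** 2 + (p[1] - med[1]) ** 2
    )[:h]
    return mean_and_cov(nearest)


# Made-up data: seven regular points and a cluster of three outliers.
clean = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1), (1, 0), (0, 1)]
outliers = [(10, 10), (11, 10), (10, 11)]
data = clean + outliers

center_c, cov_c = mean_and_cov(data)
center_r, cov_r = crude_robust_estimates(data)
classical = [mahalanobis_sq(p, center_c, cov_c) for p in data]
robust = [mahalanobis_sq(p, center_r, cov_r) for p in data]

for p, dc, dr in zip(data, classical, robust):
    print(p, round(dc, 2), round(dr, 2))
```

In this toy data set the classical squared distances stay below the bound of 8.1, while the distances based on the crude robust fit separate the three outliers clearly.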


Journal ArticleDOI
TL;DR: This work describes the development and implementation of a line-segment-based token tracker that combines prediction and matching steps, illustrated in several experiments on noisy synthetic data and real scenes obtained from the INRIA mobile robot.

287 citations


Journal ArticleDOI
TL;DR: In this article, it was shown that the most surprising observation must lie at one of the vertices of the convex hull and that the observation with the maximum Mahalanobis distance from the sample mean must lie on the convex hull.
Abstract: The conditional predictive ordinate (CPO) is a Bayesian diagnostic which detects surprising observations. It has been used in a variety of situations such as univariate samples, the multivariate normal distribution and regression models. Results are presented about the most surprising observation which has minimum CPO. For the multivariate normal distribution it is shown that the most surprising observation must lie at one of the vertices of the convex hull. It is also shown that the observation with maximum Mahalanobis distance from the sample mean must lie on the convex hull. Results are given for the expected number of vertices on the convex hull when the sample is contaminated. An alternative, closely related diagnostic, the ratio ordinate measure, is presented. A numerical comparison of the two measures is given.

151 citations
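The convex-hull property stated in the entry above can be checked numerically. The sketch below is only an illustration on made-up 2-D points: it builds the hull with Andrew's monotone chain algorithm and confirms that the point with the largest Mahalanobis distance from the sample mean is a hull vertex. The CPO itself is not computed here.

```python
def cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def mahalanobis_sq_from_sample(points):
    """Squared Mahalanobis distance of each point from the sample mean,
    using the 2x2 sample covariance (n - 1 denominator)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    return [
        (syy * (x - mx) ** 2 - 2.0 * sxy * (x - mx) * (y - my)
         + sxx * (y - my) ** 2) / det
        for x, y in points
    ]


# Made-up sample, for illustration only.
sample = [(0, 0), (4, 0), (4, 3), (0, 3), (1, 1), (2, 2), (3, 1), (6, 5)]
hull = convex_hull(sample)
d2 = mahalanobis_sq_from_sample(sample)
farthest = sample[d2.index(max(d2))]
print("hull vertices:", hull)
print("farthest point:", farthest, "on hull:", farthest in hull)
```

The check succeeds because the squared Mahalanobis distance is a strictly convex function of the point, so its maximum over a finite sample is attained at a vertex of the hull.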


Book ChapterDOI
01 Apr 1990
TL;DR: A tracking approach that combines a prediction step and a matching step is presented and illustrated in several experiments carried out on noisy synthetic data and real scenes obtained from the INRIA mobile robot.
Abstract: This paper describes the development and implementation of a line-segment-based token tracker. Given a sequence of time-varying images, the goal is to track line segments corresponding to the edges extracted from the image being analyzed. We present a tracking approach that combines a prediction step and a matching step. The prediction step is a Kalman-filtering-based approach that provides reasonable estimates of the region in which the matching process should search for a possible match between tokens. Correspondence in the search area is established through a similarity function based on the Mahalanobis distance between carefully chosen attributes of the line segments. The efficiency of the proposed approach is illustrated in several experiments carried out on noisy synthetic data and real scenes obtained from the INRIA mobile robot.

131 citations
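The matching step in the entry above can be sketched as a Mahalanobis gate around the predicted token. This is a hedged illustration, not the tracker from the paper: the two attributes (orientation in radians, length in pixels), the covariance values, and the names `match_token` and `CHI2_95_2DOF` are all assumptions, and the Kalman prediction itself is taken as given.

```python
# 95% quantile of the chi-square distribution with 2 degrees of freedom,
# used here as an assumed gate on the squared Mahalanobis distance.
CHI2_95_2DOF = 5.991


def mahalanobis_sq(z, z_pred, cov):
    """Squared Mahalanobis distance for a 2-D attribute vector.
    cov = (sxx, sxy, syy) is the predicted innovation covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = z[0] - z_pred[0], z[1] - z_pred[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def match_token(z_pred, cov, candidates, gate=CHI2_95_2DOF):
    """Return (index, d2) of the nearest candidate inside the gate,
    or None when every candidate falls outside the search region."""
    best = None
    for i, z in enumerate(candidates):
        d2 = mahalanobis_sq(z, z_pred, cov)
        if d2 <= gate and (best is None or d2 < best[1]):
            best = (i, d2)
    return best


# Hypothetical prediction: orientation 0.50 rad, length 40 px.
predicted = (0.50, 40.0)
innovation_cov = (0.01, 0.0, 4.0)  # variances 0.01 rad^2 and 4 px^2
candidates = [(0.90, 41.0), (0.55, 43.0), (0.52, 52.0)]

result = match_token(predicted, innovation_cov, candidates)
print(result)
```

Scaling each attribute by its predicted uncertainty is what lets quantities with different units (radians and pixels here) share one gate.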


Journal ArticleDOI
TL;DR: In this paper, the authors identify influential observations in univariate autoregressive integrated moving average time series models and measure their effects on the estimated parameters of the model, and the sensitivity of the parameters to the presence of either additive or innovational outliers is analyzed.
Abstract: This article studies how to identify influential observations in univariate autoregressive integrated moving average time series models and how to measure their effects on the estimated parameters of the model. The sensitivity of the parameters to the presence of either additive or innovational outliers is analyzed, and influence statistics based on the Mahalanobis distance are presented. The statistic linked to additive outliers is shown to be very useful for indicating the robustness of the fitted model to the given data set. Its application is illustrated using a relevant set of historical data.

92 citations


Journal ArticleDOI
TL;DR: In this article, principal component analysis of near-infrared reflectance (NIR) spectra is used for the calculation of Mahalanobis distances and for the construction of soft independent modeling of class analogy (SIMCA) classification models.
Abstract: Principal component analysis of near-infrared reflectance (NIR) spectra is used for the calculation of Mahalanobis distances and for the construction of soft independent modeling of class analogy (SIMCA) classification models. The complementary behavior of these two classification methods is discussed and a new classification rule based on a combination of the two methods is described. The application of NIR spectroscopy and the pattern recognition technique for identifying and classifying raw materials used in the pharmaceutical industry is also discussed.

80 citations


Journal ArticleDOI
TL;DR: The concept of multivariate classification of geological objects can be combined with the concept of regionalized variables to yield a procedure for typification of geological objects, such as rock units, well records, or samples.
Abstract: The concept of multivariate classification of “geological objects” can be combined with the concept of regionalized variables to yield a procedure for typification of geological objects, such as rock units, well records, or samples. Numerical classification is followed by subdivision of the area of investigation, and culminates in a regionalization or mapping of the classification onto the plane. Regions are subdivisions of the map area which are spatially contiguous and relatively homogeneous in their geological properties. The probability of correct classification of each point within a region as being part of that region can be assessed in terms of Bayesian probability as a space-dependent function. The procedure is applied to subsurface data from western Kansas. The geologic properties used are quantitative variables, and relationships are expressed by Mahalanobis' distances. These functions could be replaced by other metrics if qualitative or binary data derived from geological descriptions or appraisals were included in the analysis.

57 citations


Proceedings ArticleDOI
04 Dec 1990
TL;DR: A feedforward approach which combines prediction and correction is presented, and the equations are given and discussed in the case of small rigid motions.
Abstract: A description is given of the development and implementation of a line-segment-based module for recovering self-motion and 3-D structure. Given a monocular sequence of time-varying images and 2-D line matches in this sequence, the goal is to recover the camera motion and the 3-D line structure. A feedforward approach which combines prediction and correction is presented, and the equations are given and discussed in the case of small rigid motions. Algorithms, based on the minimization of the Mahalanobis distance between two estimates, are given and their implementations carefully discussed. The efficiency of the proposed approach is illustrated through experimental tests carried out considering noisy synthetic data and a real indoor scene.

38 citations


01 Jan 1990
TL;DR: In this paper, the spectral reflectance of peach surface defects (bruise, cut, scar, scale, brown rot, wormhole) in the 350 to 1200 nm range was measured.
Abstract: The spectral reflectance of peach surface defects (bruise, cut, scar, scale, brown rot, wormhole) in the 350 to 1200 nm range was measured. Sorting criteria which were suitable for use with a machine vision system were developed and evaluated based on their potential for discriminating between defective and normal peach surface. Separability was calculated in terms of the Mahalanobis distance between the two classes. Sorting criteria which were based on ratios of wavelengths had good separability for some defects, but multivariate sorting criteria with at least one unnormalized component were required for good class discrimination for all peach defect types at a fixed set of wavelengths. A three-wavelength criterion with spectral sensitivities centered at 650, 720, and 815 nm had the highest Mahalanobis distances for the types of peach defects tested.

23 citations
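Class separability of the kind used in the entry above is commonly computed as the squared Mahalanobis distance between two class means under a pooled covariance. The sketch below assumes that definition and uses made-up two-feature values in arbitrary units; it does not reproduce the peach measurements, and the function names are hypothetical.

```python
def mean_and_scatter(points):
    """Class mean and scatter sums (sums of squares and cross-products)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    return (mx, my), (sxx, sxy, syy)


def between_class_distance_sq(class_a, class_b):
    """Squared Mahalanobis distance between two class means, using the
    pooled within-class covariance (assumed equal across classes)."""
    mean_a, scatter_a = mean_and_scatter(class_a)
    mean_b, scatter_b = mean_and_scatter(class_b)
    dof = len(class_a) + len(class_b) - 2
    sxx, sxy, syy = ((u + v) / dof for u, v in zip(scatter_a, scatter_b))
    det = sxx * syy - sxy * sxy
    dx, dy = mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# Made-up feature values (arbitrary units) for a normal class and two
# hypothetical defect classes that are shifted copies of it.
normal = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
defect_a = [(x + 2.0, y) for x, y in normal]
defect_b = [(x + 4.0, y + 2.0) for x, y in normal]

d2_a = between_class_distance_sq(normal, defect_a)
d2_b = between_class_distance_sq(normal, defect_b)
print(d2_a, d2_b)
```

A larger value means the candidate sorting criterion separates the defect class from normal surface more cleanly, which is how competing criteria can be ranked.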


Journal ArticleDOI
TL;DR: In this paper, the detection of multivariate outliers (anomalous observations) by Mahalanobis distance calculation was carried out on the surface-rock geochemical data from the sheeted-vein tin mineralization in the Emmaville district.

18 citations


Journal Article
TL;DR: Assessment of the population structure on the island of Hvar is based on the analysis of Malécot's isolation by distance model and the results are interpreted within the context of microevolutionary theory and the population's ethnohistorical background.
Abstract: Assessment of the population structure on the island of Hvar is based on the analysis of Malécot's isolation by distance model. We have tested the fit of the model by regression analysis of different measures of similarity (genetic kinship) and distance [Hamming's HSM distance for linguistic data; Mahalanobis's D2 distances for anthropometric head and body dimensions, radiogrammetric dimensions of metacarpal bones, physiological (cardiorespiratory) traits, and quantitative dermatoglyphic properties of the digitopalmar complex; and Edwards's E2 for frequencies of erythrocyte antigens]. Good fit of the model for linguistic and anthropometric data, which was demonstrated in previous studies on other eastern Adriatic populations, is confirmed. We compared parameters of the model with those already published for various populations in the eastern Adriatic and other parts of the world. We also evaluated the pattern of correlations between different measures of geographic, biological, and sociocultural distances through principal components analysis and interpreted the results within the context of microevolutionary theory and the population's ethnohistorical background.

PatentDOI
TL;DR: In this paper, a normalized cumulative distance is calculated by measuring a difference between input power and average power and the power difference is weighted by a coefficient (λ) between 0 and 1, and a Mahalanobis distance is then weighted by (1-λ) and added to the weighted power difference.
Abstract: Speech recognition is achieved using a normalized cumulative distance. A normalized Dynamic Programming (DP) value is calculated by dividing a cumulative path distance by an optimal integral path length. The path length is calculated iteratively by adding 2 if the warping path is diagonal or by adding 3 if the warping path is horizontal or vertical. Distance may be calculated by measuring a difference between input power and average power. The power difference is weighted by a coefficient (λ) between 0 and 1. A Mahalanobis distance is then weighted by (1-λ) and added to the weighted power difference.
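A rough sketch of the scheme in the abstract above follows. It is an illustration under stated assumptions, not the patented method: frames carry one scalar feature, so the Mahalanobis distance reduces to |x - mean| / sqrt(var); the power difference is taken as an absolute difference; the starting cell is given path length 2; and all names are hypothetical. The step increments (2 for diagonal, 3 for horizontal or vertical) are taken as stated in the abstract.

```python
def local_distance(frame, ref, lam):
    """lam * |power difference| + (1 - lam) * Mahalanobis distance."""
    power_term = abs(frame["power"] - ref["avg_power"])
    maha_term = abs(frame["feature"] - ref["mean"]) / ref["var"] ** 0.5
    return lam * power_term + (1.0 - lam) * maha_term


def normalized_dp_distance(frames, refs, lam=0.5):
    """Cumulative DP distance divided by the length of the chosen path."""
    n, m = len(frames), len(refs)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]    # best cumulative distance
    length = [[0] * m for _ in range(n)]    # path length of that best path
    cost[0][0] = local_distance(frames[0], refs[0], lam)
    length[0][0] = 2  # assumption: start cell counted like a diagonal step
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0 and j > 0:  # diagonal step adds 2 to the path length
                options.append((cost[i - 1][j - 1], length[i - 1][j - 1] + 2))
            if i > 0:            # vertical step adds 3
                options.append((cost[i - 1][j], length[i - 1][j] + 3))
            if j > 0:            # horizontal step adds 3
                options.append((cost[i][j - 1], length[i][j - 1] + 3))
            prev_cost, new_length = min(options)
            cost[i][j] = prev_cost + local_distance(frames[i], refs[j], lam)
            length[i][j] = new_length
    return cost[-1][-1] / length[-1][-1]


# Made-up reference template and two inputs.
refs = [
    {"avg_power": 1.0, "mean": 0.0, "var": 1.0},
    {"avg_power": 2.0, "mean": 1.0, "var": 1.0},
]
same = [{"power": 1.0, "feature": 0.0}, {"power": 2.0, "feature": 1.0}]
louder = [{"power": 3.0, "feature": 0.0}, {"power": 4.0, "feature": 1.0}]

score_same = normalized_dp_distance(same, refs)
score_louder = normalized_dp_distance(louder, refs)
print(score_same, score_louder)
```

Dividing by the path length keeps scores comparable between warping paths that take different numbers of steps.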

Journal ArticleDOI
Masayuki Honds
TL;DR: In this paper, the error rates in linear discriminant analysis were examined both in normal and in non-normal situations, under the assumption that two population distributions were characterized by a mixture of two multivariate normal distributions, and the bootstrap bias-corrected apparent error rate compares favorably to other available estimators for nonnormal populations with small Mahalanobis distance.
Abstract: The parametric and nonparametric methods for estimating the error rates in linear discriminant analysis are examined both in normal and in nonnormal situations. A Monte Carlo experiment was carried out under the assumption that two population distributions were characterized by a mixture of two multivariate normal distributions. The bootstrap bias-corrected apparent error rate compares favourably to other available estimators for nonnormal populations with small Mahalanobis distance. The methods for error estimation are also applied to a practical problem in medical diagnosis.

Journal ArticleDOI
TL;DR: In this article, a metric which is a linear combination of the Mahalanobis distance in the subspace of the first k components and the euclidean distance in its orthogonal complement is proposed.
Abstract: The use of principal components to reduce the number of dimensions is an optimum procedure for data representation, but may involve the loss of valuable information for discriminant analysis. In this paper a simple approach is proposed to improve the discrimination based on a given number (k) of principal components, without requiring the calculation of additional ones. This is achieved by introducing a metric which is a linear combination of the Mahalanobis distance in the subspace of the first k components and the euclidean distance in its orthogonal complement. This concept, applied to Fisher's linear discriminant function, yields an optimum combination of the two distances mentioned. The empirical performance of this procedure for estimated parameters is investigated by a simulation study. Some suggestions for further extensions of this method to nonlinear discrimination are briefly discussed.
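One way to read the metric in the abstract above is sketched below. The abstract does not give the weights or say whether the distances are combined in squared form, so both are assumptions here: the function adds the squared Mahalanobis distance in the first k components to `weight` times the squared Euclidean distance in the orthogonal complement. The complement term is obtained as the total squared distance minus the part explained by the k components, so no further components are needed. All names are hypothetical.

```python
def combined_sq_distance(x, y, loadings, eigenvalues, weight):
    """Squared Mahalanobis distance in the span of the first k principal
    components plus weight * squared Euclidean distance in the orthogonal
    complement.

    loadings    -- the first k principal axes (orthonormal vectors)
    eigenvalues -- the k corresponding variances
    weight      -- assumed relative weight of the residual term
    """
    diff = [a - b for a, b in zip(x, y)]
    # Scores of the difference vector on the first k components.
    scores = [sum(v_i * d_i for v_i, d_i in zip(v, diff)) for v in loadings]
    maha = sum(z * z / lam for z, lam in zip(scores, eigenvalues))
    # Residual squared length without computing the remaining components.
    residual = sum(d * d for d in diff) - sum(z * z for z in scores)
    return maha + weight * max(residual, 0.0)


# Made-up example: 3 variables, k = 2 axis-aligned components.
axes = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
variances = [4.0, 9.0]
d_full = combined_sq_distance((2.0, 3.0, 4.0), (0.0, 0.0, 0.0),
                              axes, variances, weight=0.5)
d_subspace_only = combined_sq_distance((2.0, 3.0, 4.0), (0.0, 0.0, 0.0),
                                       axes, variances, weight=0.0)
print(d_full, d_subspace_only)
```

With `weight=0.0` the function falls back to the usual distance in the retained components, which discards the difference of 4 units along the third variable.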


Journal ArticleDOI
TL;DR: The problem of constructing sequential tests for fault detection methods that are based on the concept of Mahalanobis distance is addressed in this paper, where the average sample number and an upper limit for the number of samples necessary that the test will terminate with a given probability are calculated.

Journal ArticleDOI
TL;DR: In this article, the empirical influence function for Mahalanobis distance and for misclassification rates is presented for discriminant analysis with two multivariate normal populations, following Campbell (1978).
Abstract: The empirical influence functions for Mahalanobis distance and for misclassification rates are presented for discriminant analysis with two multivariate normal populations, following Campbell (1978). Conclusions about the effects of outliers from the empirical influence function are contrasted with exact calculations for four simple cases. These cases demonstrate that the higher-order terms discarded in deriving the empirical influence function can be important in practical problems.

Journal ArticleDOI
TL;DR: An alternative similarity measure based on the metric tensor measure (MTM) is introduced and two standard clustering strategies are tested with the proposed similarity measure: hierarchical clustering and the K‐median method.
Abstract: The results of unsupervised pattern recognition methods are critically dependent on the measure of similarity used for clustering objects. There is little a priori information available on the relative utility of various similarity measures. We introduce here an alternative similarity measure based on the metric tensor measure (MTM). Two standard clustering strategies are tested with the proposed similarity measure: hierarchical clustering and the K-median method. As data we use the ARCH obsidian data, a data set on Hungarian coal, and trace element data on Hungarian paprika. Differences from the Mahalanobis distance measure are described for intraclass relations.

Journal ArticleDOI
TL;DR: In this article, the authors argue that the type of Mahalanobis distance where the covariance matrix is based on the scores of the pair of samples to be compared should also have been considered.
Abstract: After a methodological introduction (section 1) it is indicated why Cherry et al. (1982) were not fair in suggesting that Mahalanobis distance is less reliable than other simple characteristics of morphological distance. The argument is that these authors used the type of Mahalanobis distance where the covariance matrix is based on the scores of the pair of samples to be compared. They should also have considered the much less unreliable type, where the same covariance matrix is used for all pairwise comparisons. This, at least, would not have displayed violations of the triangle inequality (section 2). The methodological discussions in the rest of the paper give a sense of the various uncertainties the statistician and the physical anthropologist are confronted with when trying to find solutions for outstanding anthropological problems. As a whole, the discussion is more philosophical than solution-oriented. The physical anthropologist and the statistician need to cut their way through the forest of uncertainties when a concrete problem is tackled. To do this it may be useful if various arguments have been listed, even if they are somewhat contradictory.

01 Jan 1990
TL;DR: In this article, the authors identify influential observations in univariate autoregressive integrated moving average time series models and measure their effects on the estimated parameters of the model, and the sensitivity of the parameters to the presence of either additive or innovational outliers is analyzed.
Abstract: This article studies how to identify influential observations in univariate autoregressive integrated moving average time series models and how to measure their effects on the estimated parameters of the model. The sensitivity of the parameters to the presence of either additive or innovational outliers is analyzed, and influence statistics based on the Mahalanobis distance are presented. The statistic linked to additive outliers is shown to be very useful for indicating the robustness of the fitted model to the given data set. Its application is illustrated using a


Proceedings ArticleDOI
01 Aug 1990
TL;DR: It is demonstrated that the polynomial classifier and high-order neural network can be equated, thereby implying that the classification power of the multilayer perceptron can be achieved while retaining the ease-of-training advantages of the polynomial classifiers.
Abstract: In this study we consider a family of polynomial classifiers and compare the performance of these classifiers to the Mahalanobis Distance classifier and to two types of artificial neural networks: multilayer perceptrons and high-order neural networks. The well-known Mahalanobis Distance classifier is based on the assumption that the underlying probability distributions are Gaussian. The neural network classifiers and polynomial classifiers make no assumptions regarding underlying distributions. The decision boundaries of the polynomial classifier can be made to be arbitrarily nonlinear corresponding to the degree of the polynomial, hence comparable to those of the neural networks. Further we describe both iterative gradient descent and batch procedures by which the polynomial classifiers can be trained. These procedures provide much faster training than that achievable for multilayer perceptrons trained via backpropagation. We demonstrate that the polynomial classifier and high-order neural network can be equated, thereby implying that the classification power of the multilayer perceptron can be achieved while retaining the ease-of-training advantages of the polynomial classifiers. © 1990 SPIE, The International Society for Optical Engineering.

Book ChapterDOI
01 Jan 1990
TL;DR: In this article, the Bayes decision rule is applied to classify the samples using the characteristic features of the training data, and the classification results are presented as a single map showing the maximal probability of all models.
Abstract: The objective of this study is to detect and identify regional geochemical patterns resembling given models defined by local training areas. The data consist of ICP-analyses on till samples (fine fraction) collected for the Geochemical Atlas of Finland and the Nordkalott project. The samples are composited with the resultant sampling density being only 1 sample/300 km². Total and partial leaching are employed to yield two sets of 34 variables. Supervised learning with nonparametric class-conditional probability functions and the Bayes decision rule is applied to classify the samples using the characteristic features of the training data. The variables are selected according to analytical quality. Outliers are screened out to avoid spurious correlations. The total information of the two sets of variables is compressed into 10 new factors, by factor analysis, and 2 variables, Mahalanobis distances, indicating the rarity of each sample. The 25 training areas are located at known ore occurrences in Finland and the Nordkalott area of Norway and Sweden. The classification results are presented as a single map showing the maximal probability of all models. Some expected patterns are shown on the maps as well as zones not revealed on single element maps.

Book ChapterDOI
H. Mizuno
01 Jan 1990
TL;DR: In this article, a simple statistical approach is applied to the repeated EDM of 1,383 lines, the average time interval of which is about 8 years, to obtain knowledge about the accuracy of EDM and possible accumulation of strain.
Abstract: A simple statistical approach is applied to the repeated EDM of 1,383 lines, the average time interval of which is about 8 years, to obtain knowledge about the accuracy of EDM and possible accumulation of strain. The ratio d of a measured distance change to the distance is computed for each of these lines. This relative change in distance, d, shows a trend of decrease in magnitude with increasing line length. Then, the data of d are classified into three distance ranges, D ≤ 10 km, 10 km < D ≤ 20 km, and D > 20 km. The histograms of d illustrated for each of the distance ranges suggest that d has a normal distribution with non-zero mean. The χ² test applied to the data of d provides support to this view. The standard deviation of d for a shorter distance range is larger than that for a longer distance range. The expression of random error in EDM divided by distance D, (a + bD)/D, is applied to describe the dependence of the standard deviation of d on D. The constant a is found to be about 10 mm or more. This gives error in the phase difference determination of EDM. The quantity b is obtained as 1 mm/km or less. This should provide information about distance-dependent random error and possible accumulation of strain. But it can be fully understood only in terms of error in atmospheric correction. The rate of horizontal crustal deformation must be much smaller in general than that so far estimated, even though the average interval between repeated measurements is short.
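The error expression (a + bD)/D in the abstract above can be evaluated directly. The sketch below plugs in a = 10 mm and b = 1 mm/km purely as illustrative values taken from the bounds quoted in the abstract ("about 10 mm or more", "1 mm/km or less"); the function name and the chosen line lengths are assumptions.

```python
def relative_random_error(distance_km, a_mm=10.0, b_mm_per_km=1.0):
    """Random error divided by distance, (a + b*D) / D, as a
    dimensionless ratio. a is the distance-independent part in mm and
    b the distance-dependent part in mm per km."""
    error_mm = a_mm + b_mm_per_km * distance_km
    distance_mm = distance_km * 1.0e6  # 1 km = 1e6 mm
    return error_mm / distance_mm


for d_km in (5.0, 10.0, 20.0):
    print(d_km, relative_random_error(d_km))
```

Because the constant term a is divided by D, the relative error shrinks as lines get longer, in line with the abstract's finding that the standard deviation of d is larger for the shorter distance range.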