
Showing papers on "Mahalanobis distance published in 1992"


Journal ArticleDOI
Ali S. Hadi
TL;DR: In this article, the authors propose a procedure for the detection of multiple outliers in multivariate data: the data set is first ordered using an appropriately chosen robust measure of outlyingness and then divided into two initial subsets, a "basic" subset containing p + 1 "good" observations and a "non-basic" subset containing the remaining n - p - 1 observations.
Abstract: We propose a procedure for the detection of multiple outliers in multivariate data. Let X be an n x p data matrix representing n observations on p variates. We first order the n observations, using an appropriately chosen robust measure of outlyingness, then divide the data set into two initial subsets: a 'basic' subset which contains p + 1 'good' observations and a 'non-basic' subset which contains the remaining n - p - 1 observations. Second, we compute the relative distance from each point in the data set to the centre of the basic subset, relative to the (possibly singular) covariance matrix of the basic subset. Third, we rearrange the n observations in ascending order accordingly, then divide the data set into two subsets: a basic subset which contains the first p + 2 observations and a non-basic subset which contains the remaining n - p - 2 observations. This process is repeated until an appropriately chosen stopping criterion is met. The final non-basic subset of observations is declared an outlying subset. The procedure proposed is illustrated and compared with existing methods by using several data sets. The procedure is simple, computationally inexpensive, suitable for automation, computable with widely available software packages, effective in dealing with masking and swamping problems and, most importantly, successful in identifying multivariate outliers.
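The forward-search scheme described in this abstract can be sketched as follows. This is a simplified illustration, not Hadi's exact algorithm: the function name and the cutoff are ours, and for brevity the sketch starts the basic subset at a half sample ranked by distance from the coordinate-wise median, rather than at the p + 1 points used in the paper.

```python
import numpy as np

def forward_outlier_search(X, cutoff):
    """Simplified forward search in the spirit of the procedure above:
    keep a 'basic' subset of presumed-good observations, re-rank all
    points by squared Mahalanobis distance to that subset, and grow
    the subset until the nearest excluded point exceeds `cutoff`."""
    n, p = X.shape
    # Initial robust ranking: Euclidean distance from the coordinate-wise median.
    med = np.median(X, axis=0)
    order = np.argsort(np.linalg.norm(X - med, axis=1))
    size = (n + p + 1) // 2
    while size < n:
        basic = X[order[:size]]
        diff = X - basic.mean(axis=0)
        # Pseudo-inverse, since the basic-subset covariance may be singular.
        inv = np.linalg.pinv(np.cov(basic, rowvar=False))
        d2 = np.einsum('ij,jk,ik->i', diff, inv, diff)
        order = np.argsort(d2)
        if d2[order[size]] > cutoff:   # nearest excluded point is far:
            break                      # declare the remainder outlying
        size += 1
    return order[size:]                # indices of the outlying subset
```

With a suitable chi-square-based cutoff, planted outliers stay in the final non-basic subset while typical points are absorbed into the basic subset.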

792 citations


Journal ArticleDOI
TL;DR: An efficient algorithm for evaluating the (weighted bipartite graph of) associations between two sets of data with Gaussian error, e.g., between a set of measured state vectors and a set of estimated state vectors, is described.
Abstract: An efficient algorithm for evaluating the (weighted bipartite graph of) associations between two sets of data with Gaussian error, e.g., between a set of measured state vectors and a set of estimated state vectors, is described. A general method is developed for determining, from the covariance matrix, minimal d-dimensional error ellipsoids for the state vectors which always overlap when a gating criterion is satisfied. Circumscribing boxes, or d-ranges, for the data ellipsoids are then found and whenever they overlap the association probability is computed. For efficiently determining the intersections of the d-ranges, a multidimensional search tree method is used to reduce the overall scaling of the evaluation of associations. Very few associations that lie outside the predetermined error threshold or gate are evaluated. The search method developed is a fixed Mahalanobis distance search. Empirical tests for variously distributed data in both three and eight dimensions indicate that the scaling is significantly reduced. Computational loads for many large-scale data association tasks can therefore be significantly reduced by this or related methods.
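The gating test at the core of such association algorithms can be illustrated with a toy brute-force version (names and data are ours; the paper's contribution is precisely the search structure that avoids this O(n*m) loop):

```python
import numpy as np
from itertools import product

def gated_pairs(meas, est, cov, gate):
    """Brute-force Mahalanobis gating between two sets of state vectors
    sharing an error covariance `cov`: keep only pairs whose squared
    Mahalanobis distance falls inside the chi-square gate.  (The paper
    replaces this double loop with a multidimensional search tree over
    circumscribing boxes of the error ellipsoids.)"""
    inv = np.linalg.inv(cov)
    pairs = []
    for i, j in product(range(len(meas)), range(len(est))):
        d = meas[i] - est[j]
        if d @ inv @ d <= gate:
            pairs.append((i, j))
    return pairs
```

A gate of 9.21 corresponds to the 99% point of a chi-square distribution with 2 degrees of freedom, so in 2-D roughly 1% of true associations would be rejected.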

111 citations


Journal ArticleDOI
TL;DR: In this article, a generalization of Wilks's single-outlier test for detecting from 1 to k outliers in a multivariate data set is proposed and appropriate critical values determined.
Abstract: A generalization of Wilks's single-outlier test suitable for application to the many-outlier problem of detecting from 1 to k outliers in a multivariate data set is proposed and appropriate critical values determined. The method used follows that suggested by Rosner employing sequential application of the generalized extreme Studentized deviate to univariate samples of reducing size, in which the type I error is controlled both under the hypothesis of no outliers and under the alternative hypothesis of 1, 2, ..., k outliers. It is shown that critical values for the sequential application of Wilks's test to detect many outliers depend only on those for a single outlier test which may be approximated by percentage points from the F-distributions as tabulated by Wilks. Relationships between Wilks's test statistic, the Mahalanobis distance between the 'outlier' and the mean vector, and Hotelling's T2-test between the outlier and the rest of the data, are used to reduce the amount of computation involved in applying the sequential procedure. Simulations are used to show that the method behaves well in detecting multiple outliers in samples larger than about 25. Finally, an example with three dimensions is used to illustrate how the method is applied.
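The sequential deletion scheme can be sketched as below, using the textbook relation between Wilks's single-outlier statistic and the Mahalanobis distance, 1 - nD^2/(n-1)^2. This is our simplified illustration: the function name is hypothetical, and the comparison against Rosner-style critical values is not shown.

```python
import numpy as np

def sequential_wilks_statistics(X, k):
    """Sketch of the sequential procedure above: repeatedly find the
    observation with the largest squared Mahalanobis distance from the
    current sample mean, record Wilks's statistic 1 - n*D2/(n-1)^2 for
    it, and delete it, up to k times.  Comparing each recorded value
    against its critical value completes the outlier test."""
    X = X.copy()
    stats = []
    for _ in range(k):
        n = len(X)
        diff = X - X.mean(axis=0)
        inv = np.linalg.inv(np.cov(X, rowvar=False))
        d2 = np.einsum('ij,jk,ik->i', diff, inv, diff)
        i = int(np.argmax(d2))
        stats.append(1 - n * d2[i] / (n - 1) ** 2)
        X = np.delete(X, i, axis=0)  # shrink the sample and repeat
    return stats
```

Small values of the statistic (near 0) indicate a strong outlier; values near 1 indicate none.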

50 citations


Journal ArticleDOI
TL;DR: A decision model for the robot selection problem is proposed using both a robustified Mahalanobis distance analysis, i.e. a multivariate distance measure, and principal-components analysis; the model takes into consideration the fact that a robot's performance, as specified by the manufacturer, is often unobtainable in reality.
Abstract: Industrial robots are increasingly used by many manufacturing firms. The number of robot manufacturers has also increased, with many of these firms now offering a wide range of robots. A potential user is thus faced with many options in both performance and cost. Proposes a decision model for the robot selection problem using both a robustified Mahalanobis distance analysis, i.e. a multivariate distance measure, and principal-components analysis. Unlike most other models for robot selection, this model takes into consideration the fact that a robot's performance, as specified by the manufacturer, is often unobtainable in reality. The robots selected by the proposed model become candidates for factory testing to verify manufacturers' specifications. Tests the proposed model on a real data set and presents an example.

42 citations



Journal ArticleDOI
TL;DR: In this article, a combination of multivariate statistical and geostatistical techniques is used to assess the probability of occurrence of natural resources such as petroleum deposits.
Abstract: The probability of occurrence of natural resources, such as petroleum deposits, can be assessed by a combination of multivariate statistical and geostatistical techniques. The area of study is partitioned into regions that are as homogeneous as possible internally while simultaneously as distinct as possible. Fisher's discriminant criterion is used to select geological variables that best distinguish productive from nonproductive localities, based on a sample of previously drilled exploratory wells. On the basis of these geological variables, each wildcat well is assigned to the production class (dry or producer in the two-class case) for which the Mahalanobis' distance from the observation to the class centroid is a minimum. Universal kriging is used to interpolate values of the Mahalanobis' distances to all locations not yet drilled. The probability that an undrilled locality belongs to the productive class can be found, using the kriging estimation variances to assess the probability of misclassification. Finally, Bayes' relationship can be used to determine the probability that an undrilled location will be a discovery, regardless of the production class in which it is placed. The method is illustrated with a study of oil prospects in the Lansing/Kansas City interval of western Kansas, using geological variables derived from well logs.
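The classification step, assigning a locality to whichever production class has the nearer centroid in Mahalanobis distance, can be sketched as follows (hypothetical data and names; the kriging and Bayes steps of the paper are not shown):

```python
import numpy as np

def classify_min_mahalanobis(x, classes):
    """Assign x to the class whose centroid is nearest in Mahalanobis
    distance.  Each class uses its own covariance here; a pooled
    covariance is the other common choice."""
    best, best_d2 = None, np.inf
    for label, samples in classes.items():
        mu = samples.mean(axis=0)
        inv = np.linalg.inv(np.cov(samples, rowvar=False))
        d = x - mu
        d2 = d @ inv @ d
        if d2 < best_d2:
            best, best_d2 = label, d2
    return best
```

In the two-class case of the paper the labels would be "producer" and "dry", with the geological variables selected by Fisher's discriminant criterion as features.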

20 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examined the impact of different methods for replacing missing data in discriminant analyses conducted on randomly generated samples from multivariate normal and non-normal distributions; probabilities of correct classification were obtained before and after randomly deleting data, as well as after deleted data were replaced using (1) variable means, (2) principal component projections, and (3) the EM algorithm.
Abstract: We examined the impact of different methods for replacing missing data in discriminant analyses conducted on randomly generated samples from multivariate normal and non-normal distributions. The probabilities of correct classification were obtained for these discriminant analyses before and after randomly deleting data as well as after deleted data were replaced using: (1) variable means, (2) principal component projections, and (3) the EM algorithm. Populations compared were: (1) multivariate normal with covariance matrices ∑1=∑2, (2) multivariate normal with ∑1≠∑2 and (3) multivariate non-normal with ∑1=∑2. Differences in the probabilities of correct classification were most evident for populations with small Mahalanobis distances or high proportions of missing data. The three replacement methods performed similarly but all were better than non-replacement.
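The simplest of the three replacement methods, variable means, amounts to the following (a minimal sketch with a hypothetical function name; missing entries are represented as NaN):

```python
import numpy as np

def impute_variable_means(X):
    """Replacement method (1) above: fill each missing entry (NaN)
    with the observed mean of its variable (column)."""
    X = X.copy()
    means = np.nanmean(X, axis=0)          # per-column means over observed values
    rows, cols = np.where(np.isnan(X))     # locations of missing entries
    X[rows, cols] = np.take(means, cols)
    return X
```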

14 citations


Proceedings ArticleDOI
15 Jun 1992
TL;DR: Two problems pertinent to using implicit higher degree polynomials in real-world robust systems are dealt with: characterization and fitting algorithms for the subset of these algebraic curves and surfaces that is bounded and exists largely in the vicinity of the data.
Abstract: Two problems pertinent to using implicit higher degree polynomials in real-world robust systems are dealt with: (1) characterization and fitting algorithms for the subset of these algebraic curves and surfaces that is bounded and exists largely in the vicinity of the data; (2) a Mahalanobis distance for comparing the coefficients of two polynomials, to determine whether the curves or surfaces that they represent are close over a specified region. These tools make practical use of geometric invariants for determining whether one implicit polynomial curve or surface is a rotation, translation, or an affine transformation of another. The approach is ideally suited to smooth curves and smooth curved surfaces that do not have detectable features.

12 citations


Book ChapterDOI
01 Jan 1992
TL;DR: The “Regularized Nearest Cluster Method” is presented, an efficient and versatile technique of discrimination, well adapted to this kind of data.
Abstract: This paper contains three parts. The first part consists of a brief review of the discrimination techniques used when dealing with large arrays of sparse qualitative data. The second part presents the "Regularized Nearest Cluster Method", an efficient and versatile technique of discrimination, well adapted to this kind of data. This technique is compared to some other existing methods likely to be used in similar contexts. The third part briefly discusses the relevance of these methods to textual data analysis.

11 citations


Patent
20 May 1992
TL;DR: In this paper, a method for discerning whether an object to be inspected is acceptable or not is based on feature values with respect to a binary-coded image of the object.
Abstract: A method for discerning whether an object to be inspected is acceptable is based on feature values computed from a binary-coded image of the object. The method includes the steps of: coding image data of the object into binary digits to obtain the binary-coded image; calculating at least three feature values based on a predetermined sample group of acceptable objects and a predetermined sample group of unacceptable objects; obtaining a Mahalanobis generalized distance between the acceptable and unacceptable sample groups with respect to each of the calculated feature values; comparing each of the distances with a first predetermined value and selecting, as a first representative feature value, a feature whose distance is not smaller than the first predetermined value; obtaining a Mahalanobis generalized distance between the groups with respect to the remaining feature values taken together with the first representative feature value; and comparing each of these distances with a second predetermined value and selecting, as a second representative feature value, a feature whose distance is not smaller than the second predetermined value. The object is then discerned as acceptable or not based on the first, or the first and second, representative feature values.
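The first selection step, keeping features whose between-group distance reaches a threshold, can be sketched as below. This is our illustration of the idea, not the patent's exact procedure: for a single feature, the Mahalanobis-type distance between the two groups reduces to the squared mean difference over a pooled variance.

```python
import numpy as np

def select_features(good, bad, threshold):
    """For each feature (column), compute a univariate Mahalanobis-type
    distance between the acceptable and unacceptable sample groups, and
    keep features whose distance is not smaller than `threshold`."""
    kept = []
    for k in range(good.shape[1]):
        pooled = (good[:, k].var(ddof=1) + bad[:, k].var(ddof=1)) / 2
        d = (good[:, k].mean() - bad[:, k].mean()) ** 2 / pooled
        if d >= threshold:
            kept.append(k)
    return kept
```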

11 citations


Journal ArticleDOI
TL;DR: Two Mahalanobis distance-based criteria are proposed for feature evaluation and are expected to perform better than the direct use of the Mahalanobis distance in a multiclass pattern recognition problem.

Proceedings ArticleDOI
10 May 1992
TL;DR: An adaptive recognition system that is based on self-organization that estimates the cluster distribution of given data and recognizes an unknown input datum at the same time and the adaptability and the generalizability of the clustering and recognition are explored.
Abstract: An adaptive recognition system that is based on self-organization is proposed. The method estimates the cluster distribution of given data and recognizes an unknown input datum at the same time. The clustering/recognizing of a given characteristic vector is based on the Mahalanobis distance. By using adaptation, it is possible to reconstruct the cluster set suitable for the given characteristic data even if the distribution of these data changes with time. It is also shown that the total number of nodes can be minimized by using the rules of node merging. The adaptability and the generalizability of the clustering and recognition are explored.

Journal ArticleDOI
TL;DR: The neural net architecture is based on an improved version of Kohonen's learning vector quantization: learning vector quantization with training count, where the number of times a neuron is trained by input patterns of each class is stored in newly introduced training counters.

Book ChapterDOI
01 Jan 1992
TL;DR: The method attempts to address the shortcomings of traditional time alignment approaches, commonly based on dynamic programming algorithms, by employing the branch and bound search algorithm coupled with the Mahalanobis distance measure as the matching criterion.
Abstract: In this paper, a new method for dynamic time alignment of speech waveforms is introduced. The method attempts to address the shortcomings of traditional time alignment approaches, commonly based on dynamic programming algorithms. Such methods, usually called dynamic time warping (DTW) algorithms, make the assumption that the samples of the speech waveform under consideration are statistically independent. The proposed method makes no such assumption. Instead, the method is based on models of speech entities with Gaussian distributions and general covariance matrices. These ideas are implemented by employing the branch and bound search algorithm [1] coupled with the Mahalanobis distance measure as the matching criterion. Hence, the new method attempts to utilise more discriminatory information than is presently incorporated. Preliminary results on a spoken letter recognition problem are reported validating the approach.

Dissertation
01 Jan 1992
TL;DR: In this thesis, the author proposes an iterative technique for detecting and identifying outliers based on Mahalanobis distances computed from sub-samples of the observations, referred to as the Seemingly Unrelated Regressions/Constructed Variable (SURCON) analysis.
Abstract: The classical multivariate theory has been largely based on the multivariate normal distribution (MVN): the scarcity of alternative models for the meaningful and consistent analysis of multiresponse data is a well recognised problem. Further, the complexity of generalising many non-normal univariate distributions makes it undesirable or impossible to use their multivariate versions. Hence, it seems reasonable to inquire about ways of transforming the data so as to enable the use of more familiar statistical techniques that are based implicitly or explicitly on the normal distribution. Techniques for developing data-based transformations of univariate observations have been proposed by several authors. However, there is only one major technique in the multivariate (p-variable) case, by Andrews et al. [1971]. Their approach extended the power transformations proposed by Box & Cox [1964] to the problem of estimating power transformations of multiresponse data so as to enhance joint normality. The approach estimates the vector of transformation parameters by numerically maximising the log-likelihood function. However, since there are several parameters to be estimated, p(p+5)/2 for multivariate data without regression, the resulting maximisation is of high dimension, even with modest values of p and sample size n. The purpose of the thesis is to develop computationally simpler and more informative statistical procedures which are incorporated in a package. The thesis is in three main parts: - A proposed complementary procedure to the log-likelihood approach which attempts to reduce the size of the computational requirements for obtaining the estimates. Though computational simplicity is the main factor, the statistical qualities of the estimates are not compromised; indeed the estimated values are numerically identical to those of the log-likelihood.
Further, the procedure implicitly produces diagnostic statistics and some useful statistical quantities describing the structure of the data. The technique is a generalisation of the constructed variables method of obtaining quick estimates for transformation parameters [Atkinson 1985]. To take into account the multiresponse nature of the data and, hence, obtain joint estimates, a seemingly unrelated regression is carried out. The algorithm is iterative; however, there are considerable savings in the number of iterations required to converge to the maximum likelihood (ML) estimates compared to those using the log-likelihood function. The technique is referred to as the Seemingly Unrelated Regressions/Constructed Variable (SURCON) analysis, and the estimates obtained are the Surcon estimates. - The influence of individual observations on the need for transformations is quite crucial and, hence, it is necessary to investigate the data for any spurious or suspicious observations (outliers). The thesis also proposes an iterative technique for detecting and identifying outliers based on Mahalanobis distances computed from sub-samples of the observations. The results of the analysis are displayed in a graphical summary called the Stalactite Chart; hence, the analysis is referred to as the Stalactite Analysis. - The development of a user-friendly microcomputer-based statistical package which incorporates the above techniques. The package is written in the C programming language.

Dissertation
03 Jul 1992
TL;DR: In this paper, the authors proposed a distance-based regression and discrimination method based on the Euclidean representation of data, which can be interpreted in terms of the principal coordinate matrix.
Abstract: Distance Based (DB) Regression and Discrimination methods, proposed by Cuadras, give statistical predictions by exploiting geometrical properties of a Euclidean representation obtained from distances between observations. They are adequate to deal with mixed variables. Choice of a suitable distance function is a critical step. Some "standard" functions, however, fit a wide range of problems, and particularly the Absolute Value distance. This is explained by showing that for n equidistant points on the real line, elements in the j-th row of the principal coordinate matrix are values of a j-th degree polynomial function. For arbitrary one-dimensional sets of points a qualitatively analogous result holds. Using results from the theory of random processes, a sequence of random variables is obtained from a continuous uniform distribution on the (0, 1) interval. Their properties show that they deserve the name of "Principal Coordinates". The DB prediction scheme in this case provides a goodness-of-fit measuring technique. DB discriminant functions are evaluated from distances between observations. They have a simple geometrical interpretation in the Euclidean representation of data. For parametric models, distances can be derived from the Differential Geometry of the parametric manifold. Several DB discriminant functions are computed using this approach. In particular, for multinomial variables they coincide with the classic Pearson's Chi Square statistic, and for Normal variables, Fisher's linear discriminant function is obtained. A distance between populations generalizing Mahalanobis' is obtained as a Jensen difference from distances between observations. It can be interpreted in terms of the Euclidean representation. Using Multidimensional Scaling, it originates a Euclidean representation of populations which generalizes the classical Canonical Analysis.
Several issues concerning the implementation of DB algorithms are discussed, especially difficulties related to the huge dimension of the objects involved.
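The Euclidean representation that these methods exploit is obtained by classical multidimensional scaling (principal coordinates). A minimal sketch, assuming a matrix of pairwise Euclidean distances as input:

```python
import numpy as np

def principal_coordinates(D):
    """Classical MDS: from a matrix of pairwise Euclidean distances D,
    recover point coordinates (up to rotation and translation) by
    double-centring the squared distances and eigendecomposing."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    B = -0.5 * J @ (D ** 2) @ J               # Gram matrix of centred points
    w, V = np.linalg.eigh(B)
    keep = w > 1e-9                           # positive eigenvalues only
    return V[:, keep] * np.sqrt(w[keep])
```

The recovered coordinates reproduce the original inter-point distances exactly when D is Euclidean.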

Book ChapterDOI
01 Jan 1992
TL;DR: Respiratory sounds of pathological and healthy subjects were analyzed via autoregressive (AR) models with a view to constructing a diagnostic aid based on auscultation, and two reference libraries were built.

Abstract: Respiratory sounds of pathological and healthy subjects were analyzed via autoregressive (AR) models with a view to constructing a diagnostic aid based on auscultation. Using the AR vectors, two reference libraries, pathological and healthy, were built. Two classifiers were designed and compared: one using the Mahalanobis distance measure with a minimum-distance classification method, and one using the Itakura distance measure with a k-nearest neighbor (k-NN) classification method. Performances of the classifiers were tested for different model orders.

Proceedings ArticleDOI
01 Nov 1992
TL;DR: The discussion that follows details the algorithmic approach for the entire system including image acquisition, object segmentation, feature extraction, and pattern classification.
Abstract: A method for recognizing closed containers based on features extracted from their circular tops is presented. The approach developed consists of obtaining images from two spatially separated cameras that utilize both diffuse and specular light sources. The images thus obtained are used to segment target objects from the background and to extract representative features. The features utilized consist of container height as computed using stereopsis as well as the mean, variance, and second central moments of the intensities of the segmented caps. The recognition procedure is based on a minimum distance Mahalanobis classifier which takes feature covariance into account. The discussion that follows details the algorithmic approach for the entire system including image acquisition, object segmentation, feature extraction, and pattern classification. Results of test runs involving sets of several hundred training samples and untrained samples are presented. © (1992) SPIE--The International Society for Optical Engineering.

Book ChapterDOI
01 Jan 1992
TL;DR: In this paper, it was shown that multivariate analysis of variance (MANOVA) can be viewed as a special case of multivariate linear regression, where the explanatory variables represent design variables.
Abstract: The first part of this chapter extends Chapter 7 by specializing the multivariate linear regression model to the case where the explanatory variables represent design variables. In the same manner that ANOVA is a special case of multiple regression, we see here that multivariate analysis of variance (MANOVA) can be viewed as a special case of multivariate linear regression.

Book ChapterDOI
01 Jan 1992
TL;DR: This chapter presents a method based on the hypothesize-and-verify paradigm to register two sets of 3D line segments obtained from stereo and to Computerte the transformation (motion) between them.
Abstract: We present in this chapter a method based on the hypothesize-and-verify paradigm to register two sets of 3D line segments obtained from stereo and to compute the transformation (motion) between them. We assume that the environment is static and that it is only the stereo rig that has moved. The multiple-object-motions problem is dealt with in Chap. 9.

Proceedings ArticleDOI
26 Oct 1992
TL;DR: It is shown that a previously published algorithm for nonlinear equalization involving the measurement of the Mahalanobis distance makes the assumption that the clusters in the underlying observation space have a Gaussian distribution, and is capable of generating good approximations to the theoretical optimum decision boundary.
Abstract: It is shown that a previously published algorithm by C.F.N. Cowan (see Proc. 25th Asilomar Conf. on Signals, Systems and Computers, Pacific Grove, CA, USA, IEEE, 1991), for nonlinear equalization involving the measurement of the Mahalanobis distance makes the assumption that the clusters in the underlying observation space have a Gaussian distribution. If this assumption is violated, poor performance may be obtained. However, it is shown that the equalizer structure is capable of generating good approximations to the theoretical optimum decision boundary. It is the use of the Mahalanobis distance which is inappropriate in the non-Gaussian case. By using a more general concept of distance, it is demonstrated that it is possible to obtain significantly better results than those obtained using the Mahalanobis distance measure. The new method and previous algorithms are also extended to cover the case of multilevel transmitted signals.

Journal ArticleDOI
TL;DR: A method for 2D temporal tracking of line segments in a monocular sequence of images is presented; it starts with a temporal matching phase, which consists of forecasting the future system state, and ends with a spatial matching phase, which consists of finding the most probable match among the segments present in the search area.

01 Jan 1992
TL;DR: This paper shows that the equaliser structure is capable of generating good approximations to the theoretical optimum decision boundary and demonstrates that it is possible to obtain significantly better results than those obtained using the Mahalanobis distance measure.
Abstract: An algorithm for non-linear equalisation involving the measurement of Mahalanobis distance has previously been published [1]. In this paper, we show that this algorithm makes the assumption that the clusters in the underlying observation space have a Gaussian distribution. If this assumption is violated, poor performance may be obtained. However, we show that the equaliser structure is capable of generating good approximations to the theoretical optimum decision boundary. It is the use of Mahalanobis distance which is inappropriate in the non-Gaussian case. By using a more general concept of distance, we demonstrate that it is possible to obtain significantly better results than those obtained using the Mahalanobis distance measure. The new method and previous algorithms are also extended to cover the case of multilevel transmitted signals.