
Showing papers on "Mahalanobis distance published in 1996"


Journal ArticleDOI
01 Jan 1996
TL;DR: Experiments show that the HEC network leads to a significant improvement in the clustering results over the K-means algorithm with Euclidean distance, and indicates that hyperellipsoidal shaped clusters are often encountered in practice.
Abstract: We propose a self-organizing network for hyperellipsoidal clustering (HEC). It consists of two layers. The first employs a number of principal component analysis subnetworks to estimate the hyperellipsoidal shapes of currently formed clusters. The second performs competitive learning using the cluster shape information from the first. The network performs partitional clustering using the proposed regularized Mahalanobis distance, which is designed to deal with the problems of estimating the Mahalanobis distance when the number of patterns in a cluster is smaller than, or not considerably larger than, the dimensionality of the feature space during clustering. This distance also achieves a tradeoff between hyperspherical and hyperellipsoidal cluster shapes so as to prevent the HEC network from producing unusually large or small clusters. The significance level of the Kolmogorov-Smirnov test on the distribution of the Mahalanobis distances of patterns in a cluster to the cluster center, under the Gaussian cluster assumption, is used as a compactness measure. The HEC network has been tested on a number of artificial and real data sets. We also apply the HEC network to texture segmentation problems. Experiments show that the HEC network leads to a significant improvement in the clustering results over the K-means algorithm with Euclidean distance. Our results on real data sets also indicate that hyperellipsoidally shaped clusters are often encountered in practice.

287 citations
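The regularized distance described above can be illustrated with a small sketch: the cluster covariance is shrunk toward a scaled identity before inversion. The interpolation weight `lam` and the shrinkage target are illustrative assumptions, not the paper's exact regularization.

```python
import numpy as np

def regularized_mahalanobis(x, mean, cov, lam=0.5):
    """Distance interpolating between Euclidean (lam = 1) and full
    Mahalanobis (lam = 0). The shrinkage scheme is an illustrative
    choice, not necessarily the HEC paper's exact formula."""
    d = len(mean)
    # Shrink the cluster covariance toward a scaled identity so the
    # inverse stays stable when the cluster has few patterns.
    shrunk = (1 - lam) * cov + lam * (np.trace(cov) / d) * np.eye(d)
    diff = np.asarray(x) - np.asarray(mean)
    return float(np.sqrt(diff @ np.linalg.solve(shrunk, diff)))
```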


Journal ArticleDOI
TL;DR: In this paper, the authors show that the previously recommended critical values $\{p(n-1)/(n-p)\}F_{p,n-p}$ are unsuitable, and that $p(n-1)^2 F_{p,n-p-1}/\{n(n-p-1+pF_{p,n-p-1})\}$ are the correct critical values when searching for a single outlier.
Abstract: The Mahalanobis distance is a well-known criterion which may be used for detecting outliers in multivariate data. However, there are some discrepancies about which critical values are suitable for this purpose. Following a comparison with Wilks's method, this paper shows that the previously recommended critical values $\{p(n-1)/(n-p)\}F_{p,n-p}$ are unsuitable, and that $p(n-1)^2 F_{p,n-p-1}/\{n(n-p-1+pF_{p,n-p-1})\}$ are the correct critical values when searching for a single outlier. The importance of which critical values should be used is illustrated when searching for a single outlier in a clinical laboratory data set containing 10 patients and five variables. The jackknifed Mahalanobis distance is also discussed and the relevant critical values are given. Finally, upper bounds for the usual Mahalanobis distance and the jackknifed version are discussed.

192 citations
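A short sketch of how the corrected critical value might be applied to data. The Bonferroni adjustment alpha/n of the F quantile and the use of the squared distance are assumed conventions, not taken from the paper.

```python
import numpy as np
from scipy import stats

def single_outlier_critical_value(n, p, alpha=0.05):
    """Critical value for the largest squared Mahalanobis distance when
    testing for a single outlier, following the corrected formula quoted
    above. The Bonferroni adjustment alpha/n is an assumption."""
    f = stats.f.ppf(1 - alpha / n, p, n - p - 1)
    return p * (n - 1) ** 2 * f / (n * (n - p - 1 + p * f))

def max_squared_mahalanobis(X):
    """Largest squared Mahalanobis distance from the sample mean."""
    diff = X - X.mean(axis=0)
    inv = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum('ij,ij->i', diff @ inv, diff).max()

# Flag an outlier if max_squared_mahalanobis(X) exceeds
# single_outlier_critical_value(len(X), X.shape[1]).
```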


Journal ArticleDOI
TL;DR: A new framework for computing the Euclidean distance and weighted distance from the boundary of a given digitized shape is presented and an algorithm that calculates the geodesic distance transform on surfaces is presented.
Abstract: A new framework for computing the Euclidean distance and weighted distance from the boundary of a given digitized shape is presented. The distance is calculated with sub-pixel accuracy. The algorithm is based on an equal-distance contour evolution process. The moving contour is embedded as a level set in a time-varying function of higher dimension. This representation of the evolving contour makes possible the use of an accurate and stable numerical scheme, due to Osher and Sethian [22]. The relation between the classical shape-from-shading problem and the weighted distance transform is presented, as well as an algorithm that calculates the geodesic distance transform on surfaces.

131 citations
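For orientation, a pixel-accurate baseline is easy to compute with SciPy; this is only a point of comparison, not the paper's sub-pixel level-set scheme.

```python
import numpy as np
from scipy import ndimage

# A filled square as the digitized shape; True marks interior pixels.
shape = np.zeros((64, 64), dtype=bool)
shape[20:44, 20:44] = True

# Pixel-accurate Euclidean distance from each interior pixel to the
# nearest background pixel, i.e. (roughly) to the shape boundary.
dist_inside = ndimage.distance_transform_edt(shape)
print(dist_inside.max())  # depth of the most interior point
```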


Journal ArticleDOI
TL;DR: This paper treats the use of implicit polynomial curves and surfaces of degree higher than 2, which have many desirable properties for object recognition and position estimation, and attacks the instability problem arising in their use with partial and noisy data.
Abstract: We treat the use of polynomial curves and surfaces of degree higher than 2, which have many desirable properties for object recognition and position estimation, and attack the instability problem arising in their use with partial and noisy data. The scenario discussed in this paper is one where we have a set of objects that are modeled as implicit polynomial functions, or a set of representations of classes of objects with each object in a class modeled as an implicit polynomial function, stored in the database. Then, given partial data from one of the objects, we want to recognize the object (or the object class) or collect more data in order to get better parameter estimates for more reliable recognition. Two problems arising in this scenario are discussed: 1) the problem of recognizing these polynomials by comparing them in terms of their coefficients; and 2) the problem of where to collect data so as to improve the parameter estimates as quickly as possible. We use an asymptotic Bayesian approximation for solving the two problems. The intrinsic dimensionality of polynomials and the use of the Mahalanobis distance are discussed.

92 citations
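One way to read "comparing polynomials in terms of their coefficients" is a Mahalanobis-type distance over fitted coefficient vectors. The sketch below pools the two estimated coefficient covariances; this is a generic construction, simpler than the paper's asymptotic Bayesian measure.

```python
import numpy as np

def coefficient_mahalanobis(c1, c2, cov1, cov2):
    """Mahalanobis-type distance between two implicit-polynomial
    coefficient vectors c1, c2 with estimated covariances cov1, cov2.
    Pooling the covariances is an illustrative simplification."""
    diff = np.asarray(c1) - np.asarray(c2)
    return float(np.sqrt(diff @ np.linalg.solve(cov1 + cov2, diff)))
```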


Patent
01 Apr 1996
TL;DR: In this paper, a motor current signal is monitored, and estimated motor torque is used to transform the current signal into time-frequency spectra including a plurality of segments representative of good operating modes.
Abstract: During a learning stage a motor current signal is monitored, and estimated motor torque is used to transform the current signal into a time-frequency spectra including a plurality of segments representative of good operating modes. A representative parameter and a respective boundary of each segment is estimated. The current signal is monitored during a test stage to obtain test data, and the test data is compared with the representative parameter and the respective boundary of each respective segment to detect the presence of a fault in a motor. Frequencies at which broken bar faults are likely to occur in a motor can be estimated using the estimated motor torque, and a weighting function can highlight such frequencies during estimation of the parameter. The current signal can be further subdivided into the segments by monitoring sidebands of the frequency components of current spectrum strips of each segment. Estimating the parameter and the boundary of each segment can include calculating a segment mean (the representative parameter) and variance for each frequency component in each respective segment; calculating a modified Mahalanobis distance for each strip of each respective segment; and for each respective segment, using respective modified Mahalanobis distances to calculate a respective radius about a respective segment mean to define the respective boundary.

42 citations
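The per-segment statistics described in the patent can be sketched roughly as follows; the diagonal form of the "modified" Mahalanobis distance and the percentile used for the boundary radius are my assumptions.

```python
import numpy as np

def segment_boundary(strips, percentile=99.0):
    """Learning-stage statistics for one segment: per-frequency mean
    (the representative parameter) and variance across strips, a
    diagonal Mahalanobis distance per strip, and a radius about the
    segment mean. 'strips' has shape (num_strips, num_frequencies)."""
    mean = strips.mean(axis=0)
    var = strips.var(axis=0) + 1e-12  # guard against zero variance
    d = np.sqrt((((strips - mean) ** 2) / var).sum(axis=1))
    radius = np.percentile(d, percentile)  # assumed boundary choice
    return mean, var, radius

# Test stage: a strip whose distance exceeds 'radius' falls outside the
# learned boundary and suggests a fault.
```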



Journal ArticleDOI
TL;DR: In this paper, the authors proposed a robust procedure based upon a recently proposed influence function, called the PROP function, which is resistant to masking and swamping, and works effectively in correctly identifying multiple univariate and multivariate outliers.

24 citations


Book ChapterDOI
01 Jan 1996
TL;DR: An iterative and a non-iterative method to calculate the estimates in a method-performance study are presented, and a new method based on a score function makes it possible to characterise the performance of laboratories both as groups and individually.
Abstract: Interlaboratory analytical study is the general term for an experiment organised by a committee and involving several laboratories to achieve a common goal. Two important types of studies are method-performance studies and laboratory-performance studies. The purpose of a method-performance study is to determine the precision and bias characteristics of an analytical test method. A laboratory-performance study ascertains whether the laboratories conform to stated standards in their testing activities. An iterative and a non-iterative method to calculate the estimates in a method-performance study are presented, and a new method based on a score function makes it possible to characterise the performance of laboratories both as groups and individually. This score is a squared Mahalanobis distance with robust estimates of means and covariances. In determining the latter, the specific structure of the interlaboratory test data is taken into account. Instructive graphical displays support the classification of the laboratories.

22 citations
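A minimal sketch of such a robust score using scikit-learn's Minimum Covariance Determinant estimator. The MCD is a generic robust estimator; it does not exploit the interlaboratory data structure the way the paper's estimator does, and the data here are hypothetical.

```python
import numpy as np
from sklearn.covariance import MinCovDet

# Hypothetical data: rows = laboratories, columns = measured quantities.
rng = np.random.default_rng(0)
labs = rng.normal(size=(40, 3))
labs[0] += 5.0  # one discordant laboratory

# Robust location and scatter keep outlying labs from masking
# themselves; mahalanobis() returns squared robust distances.
mcd = MinCovDet(random_state=0).fit(labs)
scores = mcd.mahalanobis(labs)
print(scores.argmax())  # flags laboratory 0
```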


Book ChapterDOI
20 Aug 1996
TL;DR: A novel framework for performing relational graph matching using genetic search that gauges relational consistency at both the symbolic and attribute levels and is Bayesian in origin.
Abstract: This paper describes a novel framework for performing relational graph matching using genetic search. The fitness measure is Bayesian in origin. It gauges relational consistency at both the symbolic and attribute levels. The basic measure of symbolic consistency is Hamming distance, while attribute consistency is measured using Mahalanobis distance. We provide examples of the performance on synthetic graphs containing significant levels of clutter. We also demonstrate that the technique is capable of resolving multiple graphs with significant overlap. The performance advantages over deterministic hill climbing are also demonstrated.

17 citations
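A toy version of a fitness term combining the two consistency levels named above. Folding Hamming and Mahalanobis terms into an exponential product is an illustrative choice, not the paper's Bayesian measure.

```python
import numpy as np

def pair_fitness(sym_a, sym_b, attr_a, attr_b, inv_cov):
    """Fitness contribution of one matched node pair: Hamming distance
    over symbolic neighbourhood labels plus Mahalanobis distance over
    attribute vectors, folded into a single score."""
    hamming = sum(a != b for a, b in zip(sym_a, sym_b))
    diff = np.asarray(attr_a) - np.asarray(attr_b)
    mahal2 = float(diff @ inv_cov @ diff)
    return np.exp(-hamming) * np.exp(-0.5 * mahal2)
```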


Proceedings ArticleDOI
18 Nov 1996
TL;DR: The paper reports on a related supervised approach in which training sets are selected, then the fuzzy class memberships are determined by the reciprocal of the Mahalanobis distance from these training class means.
Abstract: Traditional 'hard' classification techniques are inappropriate for classifying remotely sensed imagery. Class 'boundaries' in the natural environment are not distinct and a single pixel may exhibit spectral characteristics related to a number of classes. Fuzzy set theory was introduced to address the issue of the 'vagueness' of class or set membership. An unsupervised approach to fuzzy classification uses the fuzzy c-means algorithm. The paper reports on a related supervised approach in which training sets are selected, then the fuzzy class memberships are determined by the reciprocal of the Mahalanobis distance from these training class means.

16 citations
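The supervised fuzzy memberships described above might look like the following sketch; normalizing the reciprocals so they sum to one is an assumption on my part.

```python
import numpy as np

def fuzzy_memberships(x, class_means, class_covs, eps=1e-12):
    """Membership of pixel vector x in each class, taken as the
    reciprocal of the Mahalanobis distance to the class mean and
    normalized across classes."""
    recip = []
    for m, c in zip(class_means, class_covs):
        d = x - m
        dist = np.sqrt(d @ np.linalg.solve(c, d))
        recip.append(1.0 / (dist + eps))  # eps avoids division by zero
    recip = np.array(recip)
    return recip / recip.sum()
```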


01 Dec 1996
TL;DR: A new method that uses the weighted Mahalanobis distance via the covariance matrix of the individual clusters as the basis for grouping is presented in this thesis, and provides better results than the competing methods.
Abstract: Cluster analysis is widely used in many applications, ranging from image and speech coding to pattern recognition. A new method that uses the weighted Mahalanobis distance (WMD) via the covariance matrix of the individual clusters as the basis for grouping is presented in this thesis. In this algorithm, the Mahalanobis distance is used as a measure of similarity between the samples in each cluster. This thesis discusses some difficulties associated with using the Mahalanobis distance in clustering, and the proposed method provides solutions to these problems. The new algorithm is an approximation to the well-known expectation maximization (EM) procedure used to find the maximum likelihood estimates in a Gaussian mixture model. Unlike the EM procedure, WMD eliminates the requirement of having initial parameters such as the cluster means and variances, as it starts from the raw data set. Properties of the new clustering method are presented by examining the clustering quality for codebooks designed with the proposed method and competing methods on a variety of data sets. The competing methods are the Linde-Buzo-Gray (LBG) algorithm and the fuzzy c-means (FCM) algorithm, both of which use the Euclidean distance. The neural network for hyperellipsoidal clustering (HEC), which uses the Mahalanobis distance, is also studied and compared to the WMD method and the other techniques. The new method provides better results than the competing methods and thus becomes another useful tool for use in clustering.
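A stripped-down clustering loop in the spirit of the thesis: assign samples by Mahalanobis distance, then refit each cluster's mean and covariance. The initialization and the guard on small clusters are simplifications, and the actual WMD weighting scheme is not reproduced here.

```python
import numpy as np

def mahalanobis_clustering(X, k, iters=20, seed=0):
    """Simplified Mahalanobis-distance clustering sketch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, k, replace=False)].astype(float)
    covs = np.array([np.cov(X, rowvar=False)] * k)
    for _ in range(iters):
        # Squared Mahalanobis distance of every sample to every cluster.
        d2 = np.stack([
            np.einsum('ij,ij->i', (X - m) @ np.linalg.inv(c), X - m)
            for m, c in zip(means, covs)
        ], axis=1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts) > d:  # enough samples for a stable covariance
                means[j] = pts.mean(axis=0)
                covs[j] = np.cov(pts, rowvar=False)
    return labels, means, covs
```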

Patent
23 Oct 1996
TL;DR: In this paper, a Mahalanobis space representing the normal state is formed from the data sets DS1-DSq obtained from the sensors S1-Sq during normal operation under different basic environment conditions.
Abstract: PROBLEM TO BE SOLVED: To provide an environment monitoring system and abnormality detection method that uses a Mahalanobis distance to detect abnormalities such as fire accurately, while reducing false alarms caused by subtle environmental changes, noise, and the like. SOLUTION: When forming the Mahalanobis space for the normal state, not only the q basic data sets DS1-DSq obtained from the sensors S1-Sq during normal operation under q kinds of basic environment conditions E1-Eq, but also the w on-site reference data sets DSq+1-DSq+w obtained from the sensors S1-Sn during normal operation under w kinds of on-site environment conditions Eq+1-Eq+w are considered. The total of q+w normal-state data sets DS1-DSq+w, comprising the q basic data sets and the w on-site reference data sets, is taken as the reference data set, and the reference Mahalanobis space, that is, a reference Mahalanobis distance Dj, is calculated.

Proceedings Article
01 Jan 1996
TL;DR: This study showed that the Weighted Euclidean distance measure performed best, and the Mahalanobis distance measure did not perform well.
Abstract: In this study, four distance measures were compared for text-independent speaker identification using the long-time statistical feature averaging method. The four methods were the City block, the Euclidean, the Weighted Euclidean, and the Mahalanobis distance measures. The identification decision was based on the minimum distance criterion. This study showed that the Weighted Euclidean distance measure performed best, and the Mahalanobis distance measure did not perform well. An explanation is advanced for this result.
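The four measures are easy to state side by side. In this sketch the Weighted Euclidean weights are the per-dimension standard deviations, which is one common choice and not necessarily the paper's.

```python
import numpy as np

def speaker_distances(x, mean, std, cov):
    """The four distance measures compared in the study, applied to a
    long-time average feature vector x against a speaker's stored
    statistics (mean, per-dimension std, covariance)."""
    diff = x - mean
    return {
        'city_block': np.abs(diff).sum(),
        'euclidean': np.sqrt((diff ** 2).sum()),
        'weighted_euclidean': np.sqrt(((diff / std) ** 2).sum()),
        'mahalanobis': np.sqrt(diff @ np.linalg.solve(cov, diff)),
    }

# Identification: compute these against every enrolled speaker and pick
# the speaker with the minimum distance.
```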

Journal ArticleDOI
Xitao Fan1
TL;DR: In this paper, a SAS program checks for multivariate normality of data using a graphical plot of Mahalanobis distances versus chi-square values, together with a normalized estimate of multivariate kurtosis that can be used to test the null hypothesis of multivariate normality.
Abstract: This SAS program checks for multivariate normality of data. The program can be used in any version of the SAS system. Program input and output are both highly automated, so the users need to know very little about the SAS system or the SAS language. The program produces a graphical plot of Mahalanobis distances versus chi-square values, the proportion of Mahalanobis distances exceeding the 50th percentile of the chi-square distribution, and the normalized estimate of multivariate kurtosis that can be used to test the null hypothesis of multivariate normality.
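The quantities behind such a check are straightforward to reproduce outside SAS; a Python sketch, assuming the usual chi-square plotting positions.

```python
import numpy as np
from scipy import stats

def normality_check_data(X):
    """Sorted squared Mahalanobis distances against chi-square
    quantiles (plot one against the other for the graphical check),
    plus the proportion of distances above the chi-square median,
    which should be near 0.5 under multivariate normality."""
    n, p = X.shape
    diff = X - X.mean(axis=0)
    inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.sort(np.einsum('ij,ij->i', diff @ inv, diff))
    quantiles = stats.chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)
    prop_above_median = (d2 > stats.chi2.ppf(0.5, df=p)).mean()
    return d2, quantiles, prop_above_median
```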

Journal ArticleDOI
TL;DR: This paper introduces an approach to Euclidean distance transformation that achieves a better accuracy than D4, D8, or octagonal distance transformation, and develops a fast method to compute the Euclidean distance transformation.
Abstract: In this paper we present a new thinning algorithm based on distance transformation. Because the choice of a distance measure will influence the result of skeletonization, we introduce an approach to Euclidean distance transformation that achieves a better accuracy than D4, D8, or octagonal distance transformation. We have developed a fast method to compute the Euclidean distance transformation. Using this technique, we can extract a reliable skeleton efficiently to represent a binary pattern. Our method works well on real images and compares favorably with other methods.

Proceedings ArticleDOI
01 Sep 1996
TL;DR: The robust performance of the equalizer is demonstrated for a hostile environment in the presence of CCI and nonlinearities, and it is compared against the performance of the MLSE and a symbol-by-symbol RBF equalizer.
Abstract: In this paper the equalization problem is treated as a classification task. No specific (linear or nonlinear) model is required for the channel or for the interference and the noise. Training is achieved via a supervised learning scheme. Adopting the Mahalanobis distance as an appropriate distance metric, decisions are made on the basis of the minimum-distance path. The proposed equalizer operates in a sequence mode and implements the Viterbi search algorithm. The robust performance of the equalizer is demonstrated for a hostile environment in the presence of CCI and nonlinearities, and it is compared against the performance of the MLSE and a symbol-by-symbol RBF equalizer. Suboptimal techniques with reduced complexity are discussed. The operation of the proposed equalizer in a blind mode is also considered.
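In a sequence equalizer of this kind, the quantity the Viterbi search accumulates along each path would be a Mahalanobis branch metric; a minimal sketch, assuming state centers and a noise covariance learned during training.

```python
import numpy as np

def branch_metric(received, state_center, inv_noise_cov):
    """Squared Mahalanobis distance between a received vector and the
    noiseless channel output expected for one trellis transition; the
    Viterbi search sums these metrics along candidate paths and keeps
    the minimum-distance path."""
    d = np.asarray(received) - np.asarray(state_center)
    return float(d @ inv_noise_cov @ d)
```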

Journal ArticleDOI
TL;DR: A neural net program for pattern classification is presented, which includes an improved version of Kohonen's learning vector quantization (LVQ with training count) and feed-forward neural networks with back-propagation training.

Journal ArticleDOI
TL;DR: A new criterion for peak selection based on minimizing the classification error is presented and its application to image quantization is studied.

Proceedings ArticleDOI
20 May 1996
TL;DR: The authors propose a method for using the covariance matrix of the individual clusters as the basis for grouping, using the Mahalanobis distance as a measure of similarity in each cluster.
Abstract: Vector quantization (VQ) is widely used in many applications, ranging from image and speech coding to pattern recognition. The authors propose a method for using the covariance matrix of the individual clusters as the basis for grouping. In this algorithm, the Mahalanobis distance is used as a measure of similarity in each cluster. Properties of the new clustering method are presented by examining the clustering quality for codebooks designed with the proposed method and two competing methods on a variety of data sets. The competing methods are the Linde-Buzo-Gray (LBG) algorithm and the fuzzy c-means (FCM) algorithm using Euclidean distance. The new method provides better results than the competing methods for several data sets. Thus, this method becomes another useful tool for use in codebook design.

Book ChapterDOI
16 Jul 1996
TL;DR: In this work, multidimensional histograms were reduced to two dimensions using the Tree-Structured Self-Organizing Map, here called the Co-occurrence Map, and the highest classification accuracies were obtained using variance-equalized principal components of the co-occurring vectors.
Abstract: Textures can be described by multidimensional co-occurrence histograms of several pixel gray levels and then classified, e.g., with nearest-neighbor rules. In this work, multidimensional histograms were reduced to two dimensions using the Tree-Structured Self-Organizing Map, here called the Co-occurrence Map. The best components of the co-occurrence vectors, i.e., the spatial displacements minimizing the classification error, were selected by exhaustive search. The fast search in the tree-structured maps made it possible to train about 14 000 maps during the feature selection. The highest classification accuracies were obtained using variance-equalized principal components of the co-occurrence vectors. Texture classification with our reduced multidimensional histograms was compared with classification using either channel histograms or standard co-occurrence matrices, which were also selected to minimize the classification error. In all comparisons, the multidimensional histograms performed better than the two other methods.

Journal ArticleDOI
TL;DR: The LVQ1 and OLVQ1 using TR1 produce better classification results in terms of average accuracy than the other methods, and the method produces excellent classification images which are more realistic and less noisy than those of the conventional methods.
Abstract: In this paper, we propose a method to apply a competitive neural network to land cover mapping of remote sensing data. This neural network is trained by the Learning Vector Quantization (LVQ) method. In the network, several weight vectors of neurons in the competitive layer represent each category. We employ LVQ1 and OLVQ1 as the learning algorithms of LVQ; OLVQ1 is introduced in order to obtain faster convergence than LVQ1. Three types of pattern distance functions and several numbers of neurons are considered to find an optimal LVQ method. To evaluate the classification accuracy, the LVQ1 and OLVQ1, the Maximum Likelihood Method, the Back Propagation Method (a three-layered neural network), and the Nearest Neighbor method are compared. We classify LANDSAT TM data using TR1- or TR2-type training data, where TR1 includes the same number of training data for each category and TR2 is produced by picking up pixels on a grid in a certain processing area. As a result of this experiment, the OLVQ1 using Mahalanobis' generalized distance and TR2 outperforms the other methods with respect to overall accuracy despite a very small number of neurons, for example 24. The LVQ1 and OLVQ1 using TR1 produce better classification results in terms of average accuracy than the other methods. This method produces excellent classification images which are more realistic and less noisy than those of the conventional methods.
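One LVQ1 update under the Mahalanobis pattern distance might look like the sketch below; the learning rate and the use of a single shared inverse covariance are assumptions for illustration.

```python
import numpy as np

def lvq1_step(weights, labels, x, y, inv_cov, lr=0.05):
    """One LVQ1 update with Mahalanobis' generalized distance.
    'weights' holds one row per competitive-layer neuron, 'labels' the
    category each neuron represents, (x, y) a training sample."""
    diffs = weights - x
    d2 = np.einsum('ij,ij->i', diffs @ inv_cov, diffs)
    w = d2.argmin()  # winning neuron
    if labels[w] == y:
        weights[w] += lr * (x - weights[w])  # pull toward the sample
    else:
        weights[w] -= lr * (x - weights[w])  # push away
    return weights
```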

Proceedings ArticleDOI
25 Mar 1996
TL;DR: The successive feature elimination process and optimal feature selection method are successfully applied to the classification of UV-visible synchronous fluorescence spectra, and the approach is equally useful for large data set problems as it always partitions the problem into a set of two-class problems.
Abstract: Pattern classification of UV-visible synchronous fluorescence of petroleum oils is performed using a composite system developed by the authors. The system consists of three phases, namely, feature extraction, feature selection and pattern classification. Each of these phases is briefly reviewed, focusing particularly on the feature selection method. Without assuming any particular classification algorithm, the method extracts as much information (features) from spectra as conveniently possible and then applies the proposed successive feature elimination process to remove the redundant features. From the remaining features a significantly smaller, yet optimal, feature subset is selected that enhances the recognition performance of the classifier. The successive feature elimination process and optimal feature selection method are formally described and successfully applied to the classification of UV-visible synchronous fluorescence spectra. The features selected by the algorithm are used to classify twenty different sets of petroleum oils (the design set). A proximity index classifier using the Mahalanobis distance as the proximity criterion is developed using the smaller feature subset. The system was trained on the design set, and the recognition performance on the design set was 100%. The recognition performance on the testing set was over 93%, successfully identifying 28 out of 30 samples in six classes. This performance is very encouraging. In addition, the method is computationally inexpensive and is equally useful for large data set problems, as it always partitions the problem into a set of two-class problems. The method further reduces the burden of careful feature determination that a system designer usually encounters during the initial design phase of a pattern classifier.


Book ChapterDOI
01 Jan 1996
TL;DR: A semi-fuzzy partition algorithm is introduced in order to combine the advantages of fuzzy and hard classification methods; it keeps the information of mixed objects without losing the sharpness of the pure objects.
Abstract: The aim of the present paper is to introduce a semi-fuzzy partition algorithm in order to take into account the advantages of both fuzzy and hard classification methods. It keeps the information of mixed objects without losing the sharpness of the pure objects. The assignment rule of the objects to the classes, in a fuzzy or a hard way, is based on the empirical distributions of the squared Mahalanobis distances of the objects from the barycenters (or prototypes) of each fuzzy class.
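A possible form of such an assignment rule, sketched below: an object inside exactly one class's empirical distance threshold is assigned crisply, otherwise it receives fuzzy memberships. The thresholds (for example, a quantile of the training distances per class) and the inverse-distance memberships are assumptions for illustration.

```python
import numpy as np

def semi_fuzzy_assign(d2, thresholds, eps=1e-12):
    """Assign one object given its squared Mahalanobis distances 'd2'
    to each class barycenter and per-class empirical thresholds.
    Returns a membership vector that is crisp for pure objects and
    fuzzy for mixed ones."""
    inside = d2 <= thresholds
    if inside.sum() == 1:  # pure object: hard assignment
        u = np.zeros_like(d2, dtype=float)
        u[int(np.flatnonzero(inside)[0])] = 1.0
        return u
    recip = 1.0 / (d2 + eps)  # mixed object: fuzzy memberships
    return recip / recip.sum()
```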

Book ChapterDOI
01 Jan 1996
TL;DR: In this article, the uncertainty involved in the estimation of the Mahalanobis distances at unsampled locations is characterized using stochastic simulation, a geostatistical technique that regards a surface as an outcome of a random function.
Abstract: Mathematically modeled thematic maps can relate productivity to geology and display the probability of occurrence of mineral deposits. Stochastic simulation, a geostatistical technique that regards a surface as an outcome of a random function, is used for nonparametric characterization of the uncertainty involved in the estimation of the Mahalanobis’ distances at unsampled locations. This avoids overly smooth maps and precludes the need for distributional assumptions to assess misclassification probabilities, problems encountered in earlier studies that described the relations between multivariate observations and rock type by distances and treated them as univariate variables.