
Showing papers on "Mahalanobis distance published in 1998"


Proceedings Article
01 Dec 1998
TL;DR: A new algorithm is presented, based on the multiresolution kd-trees of [5], which dramatically reduces the cost of EM-based clustering, with savings rising linearly with the number of datapoints.
Abstract: Clustering is important in many fields including manufacturing, biology, finance, and astronomy. Mixture models are a popular approach due to their statistical foundations, and EM is a very popular method for finding mixture models. EM, however, requires many accesses of the data, and thus has been dismissed as impractical (e.g. [9]) for data mining of enormous datasets. We present a new algorithm, based on the multiresolution kd-trees of [5], which dramatically reduces the cost of EM-based clustering, with savings rising linearly with the number of datapoints. Although presented here for maximum likelihood estimation of Gaussian mixture models, it is also applicable to non-Gaussian models (provided class densities are monotonic in Mahalanobis distance), mixed categorical/numeric clusters, and Bayesian methods such as AutoClass [1].

199 citations
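The property the abstract relies on, namely that a Gaussian class density depends on a datapoint only through its Mahalanobis distance to the component mean, can be illustrated with a short hedged sketch (all values here are invented):

```python
import numpy as np

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of x from a Gaussian component."""
    diff = x - mean
    return float(diff @ np.linalg.solve(cov, diff))

def gaussian_density(x, mean, cov):
    """Multivariate normal density, written so that its dependence on x
    is visibly a decreasing function of the Mahalanobis distance."""
    d = len(mean)
    d2 = mahalanobis_sq(x, mean, cov)
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return norm * np.exp(-0.5 * d2)

mean = np.array([0.0, 0.0])
cov = np.array([[2.0, 0.3], [0.3, 1.0]])
near = gaussian_density(np.array([0.1, 0.1]), mean, cov)
far = gaussian_density(np.array([3.0, 3.0]), mean, cov)
```

Any class density written this way, Gaussian or not, satisfies the paper's monotonicity requirement.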


Book ChapterDOI
01 Jan 1998
TL;DR: Multivariate statistical methods based on Mahalanobis D² enable robust rejection of erroneous source assignments and lead to sourcing of artifacts with a very high degree of confidence.
Abstract: A systematic approach to sample collection, chemical analysis, and statistical evaluation of obsidian sources is recommended before significant numbers of artifacts should be analyzed. Multivariate statistical methods based on Mahalanobis D² enable robust rejection of erroneous source assignments and lead to sourcing of artifacts with a very high degree of confidence. These multivariate procedures can also assist in identifying and evaluating abbreviated analytical methods that are more rapid and less expensive. Test cases based on obsidian data from sources in Mesoamerica are presented.

195 citations


Proceedings ArticleDOI
25 Jan 1998
TL;DR: In this article, a method for extracting environment features from 1D range data and their interpretation is presented, where the segmentation process is considered to include a subsequent matching step where segments which belong to the same landmark are merged while keeping track of those which originate from distinct features.
Abstract: A scheme for extracting environment features from 1D range data and their interpretation is presented. Segmentation is done by deciding on a measure of model fidelity which is applied to adjacent groups of measurements. The extraction process is considered to include a subsequent matching step where segments which belong to the same landmark are to be merged while keeping track of those which originate from distinct features. This is done by an agglomerative hierarchical clustering algorithm with a Mahalanobis distance matrix. The method is discussed with straight line segments which are found in a generalized least squares sense using polar coordinates including their first-order covariance estimates. As a consequence, extraction is no longer a real time problem on the level of single range readings, but must be treated on the level of whole scans. Experimental results with three commercially available laser scanners are presented. The implementation on a mobile robot which performs map-based localization demonstrates the accuracy and applicability of the method under real-time conditions. The collection of line segments and associated covariance matrices obtained from the extraction process contains more information about the scene than is required for map-based localization. In a subsequent reasoning step this information is made explicit. By successive abstraction and consequent propagation of uncertainties, a compact scene model is finally obtained in the form of a weighted symbolic description preserving topology information and reflecting the main characteristics of a local observation.

144 citations


Journal ArticleDOI
TL;DR: Both the proposed resampling by the half-means method and the smallest half-volume method are simple to use, are conceptually clear, and provide results superior to MVT and the current best-performing technique, MCD.
Abstract: The unreliability of multivariate outlier detection techniques such as Mahalanobis distance and hat matrix leverage has been known in the statistical community for well over a decade. However, only within the past few years has a serious effort been made to introduce robust methods for the detection of multivariate outliers into the chemical literature. Techniques such as the minimum volume ellipsoid (MVE), multivariate trimming (MVT), and M-estimators (e.g., PROP), and others similar to them, such as the minimum covariance determinant (MCD), rely upon algorithms that are difficult to program and may require significant processing times. While MCD and MVE have been shown to be statistically sound, we found MVT unreliable due to the method's use of the Mahalanobis distance measure in its initial step. We examined the performance of MCD and MVT on selected data sets and in simulations and compared the results with two methods of our own devising. Both the proposed resampling by the half-means method and the smallest half-volume method are simple to use, are conceptually clear, and provide results superior to MVT and the current best-performing technique, MCD. Either proposed method is recommended for the detection of multiple outliers in multivariate data.

139 citations
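For contrast with the robust estimators this abstract advocates, here is a minimal sketch of the classical, non-robust Mahalanobis screening it criticizes, assuming a chi-square cutoff; the data and cutoff are invented:

```python
import numpy as np

def mahalanobis_outliers(X, cutoff):
    """Classical (non-robust) screening: flag rows whose squared
    Mahalanobis distance from the sample mean exceeds a cutoff.
    Because the mean and covariance are estimated from all rows,
    clustered outliers can mask one another, which is the weakness
    the robust estimators discussed above are designed to avoid."""
    mean = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diffs = X - mean
    d2 = np.einsum('ij,jk,ik->i', diffs, cov_inv, diffs)
    return d2 > cutoff

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
X[0] = [8.0, 8.0]  # plant one gross outlier
# 7.38 is roughly the 97.5% quantile of chi-square with 2 df
flags = mahalanobis_outliers(X, cutoff=7.38)
```

A single gross outlier is still caught; the failure mode appears when several outliers jointly inflate the covariance estimate.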


01 Mar 1998
TL;DR: A thorough generalization of the three criteria above to homogeneous Riemannian manifolds that relies only on intrinsic characteristics of the manifold is presented, and an intrinsic gradient descent algorithm is proposed to obtain the minimum of the criteria.
Abstract: The question we investigate in this article is: what is the mean value of a set of geometric features and how can we compute it? We use as a guiding example one of the most studied types of features in computer vision and robotics: 3D rotations. The usual techniques on points consist of minimizing the least-squares criterion, which gives the barycenter, the weighted least-squares criterion, or the sum of (squared) Mahalanobis distances. Unfortunately, these techniques rely on the vector space structure of points, and generalizing them directly to other types of features can lead to paradoxes [Pennec, JMIV 1997]. For instance, computing the barycenter of rotations using rotation matrices, unit quaternions or rotation vectors gives three different results. We present in this article a thorough generalization of the three criteria above to homogeneous Riemannian manifolds that relies only on intrinsic characteristics of the manifold. The necessary condition for the mean rotation, independently derived in [Denney 1996], is obtained here as a particular case of a general formula. We also propose an intrinsic gradient descent algorithm to obtain the minimum of the criteria and show how to estimate the uncertainty of the resulting estimation. These algorithms prove to be not only accurate but also efficient: computations are only 3 to 4 times longer for rotations than for points. The accuracy prediction of the results is within 1%, which is quite remarkable. The striking similarity of the algorithms' behavior for general features and for points stresses the validity of our approach using Riemannian geometry and lets us anticipate that other statistical results and algorithms could be generalized to manifolds in this framework.

72 citations


Journal ArticleDOI
TL;DR: In this paper, the potential of a GIS mapping technique, using a resource selection model developed for black-tailed jackrabbits (Lepus californicus) and based on the Mahalanobis distance statistic, to track changes in shrubsteppe habitats in southwestern Idaho was tested.
Abstract: We tested the potential of a GIS mapping technique, using a resource selection model developed for black-tailed jackrabbits (Lepus californicus) and based on the Mahalanobis distance statistic, to track changes in shrubsteppe habitats in southwestern Idaho. If successful, the technique could be used to predict animal use areas, or those undergoing change, in different regions from the same selection function and variables without additional sampling. We determined the multivariate mean vector of 7 GIS variables that described habitats used by jackrabbits. We then ranked the similarity of all cells in the GIS coverage from their Mahalanobis distance to the mean habitat vector. The resulting map accurately depicted areas where we sighted jackrabbits on verification surveys. We then simulated an increase in shrublands (which are important habitats). Contrary to expectation, the new configurations were classified as lower similarity relative to the original mean habitat vector. Because the selection function is based on a unimodal mean, any deviation, even if biologically positive, creates larger Mahalanobis distances and lower similarity values. We recommend the Mahalanobis distance technique for mapping animal use areas when animals are distributed optimally, the landscape is well-sampled to determine the mean habitat vector, and the distributions of the habitat variables do not change.

71 citations
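The core ranking step described above, the Mahalanobis distance of each raster cell to the mean habitat vector, might be sketched as follows; the habitat variables and values here are invented stand-ins for the paper's 7 GIS variables:

```python
import numpy as np

# Hypothetical stand-ins for GIS habitat variables: each row is one
# location where the animal was observed, each column one variable
# (names, scales, and values are invented).
rng = np.random.default_rng(1)
used_sites = rng.normal(loc=[5.0, 2.0, 0.5], scale=0.5, size=(50, 3))

mu = used_sites.mean(axis=0)  # the mean habitat vector
cov_inv = np.linalg.inv(np.cov(used_sites, rowvar=False))

def habitat_distance(cells):
    """Squared Mahalanobis distance of each raster cell to the mean
    habitat vector; low values mean high similarity to used habitat."""
    diffs = cells - mu
    return np.einsum('ij,jk,ik->i', diffs, cov_inv, diffs)

# Note the unimodality caveat from the abstract: any deviation from
# the mean vector, even a biologically favorable one, raises the
# distance and therefore lowers the similarity ranking.
cells = np.array([[5.0, 2.0, 0.5],   # close to the mean habitat
                  [9.0, 0.0, 3.0]])  # dissimilar cell
d2 = habitat_distance(cells)
```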


Journal ArticleDOI
TL;DR: This study compares the performance of LWR that uses PCR and PLS regression, the Euclidean and Mahalanobis distance as a distance measure, and the uniform and cubic weighting of calibration objects in local models.
Abstract: The application of locally weighted regression (LWR) to nonlinear calibration problems and strongly clustered calibration data often yields more reliable predictions than global linear calibration models. This study compares the performance of LWR that uses PCR and PLS regression, the Euclidean and Mahalanobis distance as a distance measure, and the uniform and cubic weighting of calibration objects in local models. Recommendations are given on how to apply LWR to near-infrared data sets without spending too much time in the optimization phase.

66 citations


Journal ArticleDOI
TL;DR: A modification of the least Mahalanobis distance method is proposed for prediction of protein classes based on their secondary structures that is a generalization of a quadratic discriminant function to the case of degenerate covariance matrices.
Abstract: We first discuss quantitative rules for determining the protein structural classes based on their secondary structures. Then we propose a modification of the least Mahalanobis distance method for prediction of protein classes. It is a generalization of a quadratic discriminant function to the case of degenerate covariance matrices. The resubstitution tests and leave-one-out tests are carried out to compare several methods. When the class sample sizes or the covariance matrices of different classes are significantly different, the modified method should be used to replace the least Mahalanobis distance method. Two lemmas for the derivation of our new algorithm are proved in an appendix.

57 citations
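One standard way to handle the degenerate covariance matrices mentioned above is the Moore-Penrose pseudoinverse; the sketch below is an assumption about the general idea, not the paper's exact formulation:

```python
import numpy as np

def discriminant_score(x, mean, cov):
    """Quadratic-discriminant-style distance using the Moore-Penrose
    pseudoinverse, so a rank-deficient (degenerate) covariance matrix
    does not break the computation. A sketch of the general idea, not
    the paper's exact algorithm."""
    diff = x - mean
    return float(diff @ np.linalg.pinv(cov) @ diff)

# A degenerate covariance: the two features are perfectly correlated,
# so the matrix is singular and np.linalg.inv would raise an error.
cov = np.array([[1.0, 1.0],
                [1.0, 1.0]])
mean = np.zeros(2)
score = discriminant_score(np.array([1.0, 1.0]), mean, cov)
```

The pseudoinverse measures distance only within the subspace where the covariance is nondegenerate, which is why the score stays finite here.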


Book ChapterDOI
11 Oct 1998
TL;DR: A generic framework for pose estimation from geometric features is provided and more particularly two algorithms: a gradient descent on the Riemannian least squares distance and on the Mahalanobis distance are proposed.
Abstract: We provide in this article a generic framework for pose estimation from geometric features. We propose more particularly two algorithms: a gradient descent on the Riemannian least squares distance and on the Mahalanobis distance. For each method, we provide a way to compute the uncertainty of the resulting transformation. The analysis and comparison of the algorithms show their advantages and drawbacks and point out the very good prediction on the transformation accuracy. An application in medical image analysis validates the uncertainty estimation on real data and demonstrates that, using adapted and rigorous tools, we can detect very small modifications in medical images. We believe that these algorithms could be easily embedded in many applications and provide a thorough basis for computing many image statistics.

44 citations


Proceedings ArticleDOI
05 Apr 1998
TL;DR: A new defect detection algorithm for textured images is presented, based on the subband decomposition of gray level images through wavelet filters and extraction of the co-occurrence features from the sub band images.
Abstract: In this paper, a new defect detection algorithm for textured images is presented. The algorithm is based on the subband decomposition of gray level images through wavelet filters and extraction of the co-occurrence features from the subband images. Detection of defects within the inspected texture is performed by partitioning the textured image into non-overlapping subwindows and classifying each subwindow as defective or non-defective with a Mahalanobis distance classifier trained a priori on defect-free samples. Experimental results demonstrating the use of this algorithm for the visual inspection of textile products from a real factory environment are also presented.

36 citations
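The classification step, training a Mahalanobis distance classifier on defect-free samples only and thresholding new subwindows, could be sketched as below; the wavelet/co-occurrence feature extraction is assumed to happen upstream, and the threshold and data are invented:

```python
import numpy as np

class MahalanobisDefectClassifier:
    """Sketch of the detection step only: fit on feature vectors from
    defect-free subwindows, then flag any subwindow whose Mahalanobis
    distance to the 'good' distribution exceeds a threshold."""

    def fit(self, good_features, threshold):
        self.mean = good_features.mean(axis=0)
        self.cov_inv = np.linalg.inv(np.cov(good_features, rowvar=False))
        self.threshold = threshold
        return self

    def is_defective(self, feature_vec):
        diff = feature_vec - self.mean
        return float(diff @ self.cov_inv @ diff) > self.threshold

rng = np.random.default_rng(2)
good = rng.normal(size=(200, 4))     # synthetic defect-free features
clf = MahalanobisDefectClassifier().fit(good, threshold=20.0)
normal_window = np.zeros(4)          # resembles the training data
defect_window = np.full(4, 6.0)      # far from the 'good' distribution
```

Because only defect-free samples are needed for training, this is a one-class scheme: no labeled defects are required.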


Journal ArticleDOI
TL;DR: In this article, data collected from a paper mill using a WIC-100 process analyzer was divided into six classes, each representing a different kind of paper grade or quality.

01 Jan 1998
TL;DR: An example shows that converging to a (local) minimum from the initial estimates from the elemental subsets is an effective way of determining the overall minimum.
Abstract: An S-estimator of multivariate location and scale minimizes the determinant of the covariance matrix, subject to a constraint on the magnitudes of the corresponding Mahalanobis distances. The relationship between S-estimators and w-estimators of multivariate location and scale can be used to calculate robust estimates of covariance matrices. Elemental subsets of observations are generated to derive initial estimates of means and covariances, and the w-estimator equations are then iterated until convergence to obtain the S-estimates. An example shows that converging to a (local) minimum from the initial estimates from the elemental subsets is an effective way of determining the overall minimum. None of the estimates gained from the elemental samples is close to the final solution. © 1998 John Wiley & Sons, Ltd.

Proceedings ArticleDOI
09 Dec 1998
TL;DR: Alternative designs of a radial basis function network acting as classifier in a face recognition system are investigated, and the Gaussian mixture model RBF achieves the best performance while requiring fewer neurons in the hidden layer.
Abstract: In this paper we investigate alternative designs of a radial basis function network acting as classifier in a face recognition system. The inputs to the RBF network are the projections of a face image over the principal components. A database of 250 facial images of 25 persons is used for training and evaluation. Two RBF designs are studied: the forward selection and the Gaussian mixture model. Both designs are also compared to the conventional Euclidean and Mahalanobis classifiers. A set of experiments evaluates the recognition rate of each method as a function of the number of principal components used to characterize the image samples. The results of the experiments indicate that the Gaussian mixture model RBF achieves the best performance while requiring fewer neurons in the hidden layer. The Gaussian mixture model approach also proves less sensitive to the choice of the training set.

Journal ArticleDOI
TL;DR: In this article, the S-estimator of multivariate location and scale minimizes the determinant of the covariance matrix, subject to a constraint on the magnitudes of the corresponding Mahalanobis distances.
Abstract: An S-estimator of multivariate location and scale minimizes the determinant of the covariance matrix, subject to a constraint on the magnitudes of the corresponding Mahalanobis distances. The relationship between S-estimators and w-estimators of multivariate location and scale can be used to calculate robust estimates of covariance matrices. Elemental subsets of observations are generated to derive initial estimates of means and covariances, and the w-estimator equations are then iterated until convergence to obtain the S-estimates. An example shows that converging to a (local) minimum from the initial estimates from the elemental subsets is an effective way of determining the overall minimum. None of the estimates gained from the elemental samples is close to the final solution.

Journal ArticleDOI
TL;DR: In this article, a fuzzy c-means algorithm was applied to a set of 627 alcohols and the results showed that the Mahalanobis distance and a fuzziness coefficient of 1.2 should be used for an optimal clustering.

Journal ArticleDOI
TL;DR: A new algorithm called fuzzy-minimals is obtained, which detects the possible prototypes of the groups of a sample and applies the theoretical results using the Euclidean distance.

Journal ArticleDOI
TL;DR: Three 100-compound spectra libraries have been used to evaluate artificial neural network classifications of functional groups, and the neural network using a radial basis function algorithm was able to correctly classify all aromatic and nonaromatic samples in a test set of 40 samples from the 100-compound library.
Abstract: Three 100-compound spectra libraries have been used to evaluate artificial neural network classifications of functional groups. A near-IR gas-phase library was used to compare neural network classifications with those obtained by two-dimensional principal component analysis (PCA) score plots and by the use of the Mahalanobis distance metric based on multidimensional (score) vectors. The neural network using a radial basis function algorithm was able to correctly classify all aromatic and nonaromatic samples in a test set of 40 samples from the 100-compound library; PCA score plots were successful in separating ∼92% of the 100-compound library into aromatic and nonaromatic classes, whereas the Mahalanobis distance metric could not separate the in-class vs out-of-class aromatics in the library. Using principal component scores as input to the neural network training with 40 randomly selected samples, validating with 20 randomly selected samples, and testing with 40 randomly selected samples were performed i...

Journal ArticleDOI
TL;DR: A learning strategy based on the Self-Organizing feature-mapping method to get the best cluster center is introduced and a comparative analysis among methods without learning is illustrated.

10 Mar 1998
TL;DR: A new algorithm that incorporates the best features of PNN and LVQ was developed that achieved excellent classification performance and met many of the qualitative criteria for an ideal algorithm.
Abstract: For application to chemical sensor arrays, the ideal pattern recognition algorithm is accurate, fast, simple to train, robust to outliers, has low memory requirements, and can produce a measure of classification certainty. In this work, four data sets representing typical chemical sensor array data were used to compare seven pattern recognition algorithms for their ability to meet these criteria: nearest neighbor, Mahalanobis linear discriminant analysis, Bayesian linear discriminant analysis, SIMCA, back-propagation neural networks, probabilistic neural networks (PNN), and learning vector quantization (LVQ). LVQ and PNN exhibited high classification accuracy and met many of the qualitative criteria for an ideal algorithm. Based on these results, a new algorithm (LVQ-PNN) that incorporates the best features of PNN and LVQ was developed. The LVQ-PNN algorithm was further improved by the addition of a faster training procedure. It was then compared with the other seven algorithms. The LVQ-PNN method achieved excellent classification performance. A general procedure for selecting the optimal rejection threshold for a PNN-based algorithm using Monte Carlo simulations was also demonstrated. This outlier rejection strategy was implemented for an LVQ-PNN classifier and found to consistently reject ambiguous patterns.

Proceedings ArticleDOI
13 Jul 1998
TL;DR: A new clustering algorithm that uses a weighted Mahalanobis distance as a distance metric to perform partitional clustering is proposed, and application of this algorithm to the problem of detecting suspicious regions in a mammogram is discussed.
Abstract: A new clustering algorithm that uses a weighted Mahalanobis distance as a distance metric to perform partitional clustering is proposed. The covariance matrices of the generated clusters are used to determine cluster similarity and closeness so that clusters which are similar in shape and close in Mahalanobis distance can be merged together, serving the ultimate goal of automatically determining the optimal number of classes present in the data. Properties of the new algorithm are presented by examining the clustering quality for codebooks designed with the proposed method and another common method that uses Euclidean distance. The new algorithm provides better results than the competing method on a variety of data sets. Application of this algorithm to the problem of detecting suspicious regions in a mammogram is discussed.

Proceedings ArticleDOI
T. Kaniel, M. Mizoguchi
23 Jun 1998
TL;DR: A new distance measure for an identification problem is proposed and experiments on fingerprint preselection using eigenfeatures of ridge direction patterns and "quality indexes" of feature vectors are described to make the distance adaptive to the quality indexes.
Abstract: In this paper we propose a new distance measure for an identification problem and describe experiments on fingerprint preselection using eigenfeatures of ridge direction patterns. The distance is defined by likelihood ratio of error distribution of feature vectors to the whole distribution of feature vector differences. In addition, we introduce "quality indexes" of feature vectors and make the distance adaptive to the quality indexes. Experiments on fingerprint preselection for ten-print cards revealed that our proposed distance is much more effective than the Mahalanobis distance. By combining the eigenfeatures and traditional classification features, 0.06% false acceptance rate at 2.0% false rejection rate and one million cards/sec preselection speed on a standard workstation have been achieved. This makes it possible to construct high performance fingerprint identification systems.

Book ChapterDOI
01 Jan 1998
TL;DR: In this article, the authors discuss the qualitative analysis of samples using spectroscopy and compare the spectrum of an unknown sample against many different spectra in a library of known compounds.
Abstract: This chapter discusses the qualitative analysis of samples using spectroscopy. Discriminant analysis determines the identity or quality of an unknown sample. There are two basic applications for spectroscopic discriminant analysis: sample purity/quality and sample identification/screening. This can replace many quantitative methods. Many different methods have been developed for performing discriminant analysis on spectra. One class of algorithm that is already familiar to many spectroscopists is library spectral search. In these algorithms, the spectrum of an unknown sample is compared against many different spectra in a library of known compounds. By comparing the responses at all wavelengths in the unknown spectrum to the corresponding responses in a series of known (or “library”) spectra, a list of the closest matches can be identified by ranking the known spectra by a calculated “hit quality index.” The Mahalanobis distance method has been applied successfully for spectral discrimination in a number of cases. Principal component analysis (PCA) is a very effective data-reduction technique for spectroscopic data. Replacing the selected wavelengths in the Mahalanobis distance equation with PCA scores gives this discriminant analysis method the same advantages. Using PCA as a data-reduction technique allows full spectral coverage of all samples and alleviates the need for wavelength optimization.
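Replacing selected wavelengths with PCA scores in the Mahalanobis distance, as the chapter suggests, can be roughly sketched as follows; the spectra here are synthetic stand-ins, and the number of components is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic stand-in library: 30 spectra of one class, 50 wavelengths,
# with decreasing variance across wavelengths.
library = rng.normal(size=(30, 50)) @ np.diag(np.linspace(2.0, 0.1, 50))

# Data reduction: project the centered spectra onto the top PCs.
col_mean = library.mean(axis=0)
centered = library - col_mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
n_pcs = 5
scores = centered @ vt[:n_pcs].T

# Mahalanobis distance computed on PCA scores, not raw wavelengths.
mu = scores.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(scores, rowvar=False))

def score_distance(spectrum):
    """Distance of a new spectrum to the library class, in score space."""
    s = (spectrum - col_mean) @ vt[:n_pcs].T
    diff = s - mu
    return float(diff @ cov_inv @ diff)

in_class = score_distance(library[0])
# A spectrum pushed far along the first principal component.
out_of_class = score_distance(library[0] + 100.0 * vt[0])
```

Working in score space keeps full spectral coverage while sidestepping the wavelength-selection step, which is the advantage the chapter highlights.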

Patent
21 Apr 1998
TL;DR: In this paper, an area model of a face was fitted to the picture, which was subjected to normalization of the local area to be collated and the picture processing, to extract the feature variable of each discrimination element acquisition area in the area model.
Abstract: PROBLEM TO BE SOLVED: To accurately detect an object to be discriminated by relatively easy processing. SOLUTION: With respect to an input picture including a face picture taken in from a picture input part, the position of a local area to be collated, to which an area model is to be fitted, is designated by a collation area position designating part 25. Luminance normalization is performed for each designated local area to be collated by a luminance normalizing part 26, and picture processing such as edge detection is performed by a picture processing part 27. The area model of a face is fitted to the picture that underwent normalization of the local area to be collated and the picture processing, and the feature variable of each discrimination element acquisition area in the area model is extracted by a discrimination element acquisition part 28; the Mahalanobis distance is then calculated for each local area to be collated, based on the extracted feature variable, by a Mahalanobis distance discrimination part 29, and a face is detected from this calculation result.

Patent
30 Sep 1998
TL;DR: In this paper, a plurality of data are sampled about at least one parameter, under a condition of normal operation of semiconductor manufacturing equipment 11, and Mahalanobis space is formed.
Abstract: PROBLEM TO BE SOLVED: To provide a control method which can accurately discriminate abnormal operation of semiconductor manufacturing equipment. SOLUTION: A plurality of data are sampled for at least one parameter under a condition of normal operation of the semiconductor manufacturing equipment 11. On the basis of the sampled data group, a Mahalanobis space is formed. On the basis of the Mahalanobis space, the Mahalanobis distance D² is calculated from the measured value of the parameter obtained during operation of the semiconductor manufacturing equipment 11. When the value of the Mahalanobis distance D² exceeds a specified value, it is decided that the semiconductor manufacturing equipment 11 has entered abnormal operation. COPYRIGHT: (C)2000,JPO
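The monitoring scheme, building a Mahalanobis reference space from normal-operation samples and flagging measurements whose D² exceeds a specified value, might look like the sketch below; the parameter names, values, and limit are invented:

```python
import numpy as np

class MahalanobisMonitor:
    """Sketch of the patent's scheme: build a Mahalanobis reference
    space from parameter samples taken during known-normal operation,
    then declare abnormal operation when the distance D2 of a new
    measurement exceeds a specified value."""

    def __init__(self, normal_samples, limit):
        self.mean = normal_samples.mean(axis=0)
        self.cov_inv = np.linalg.inv(np.cov(normal_samples, rowvar=False))
        self.limit = limit

    def d2(self, measurement):
        diff = measurement - self.mean
        return float(diff @ self.cov_inv @ diff)

    def is_abnormal(self, measurement):
        return self.d2(measurement) > self.limit

rng = np.random.default_rng(4)
# e.g. chamber pressure and RF power logged during normal runs
normal = rng.normal(loc=[100.0, 50.0], scale=[2.0, 1.0], size=(500, 2))
monitor = MahalanobisMonitor(normal, limit=15.0)
```

Because the covariance is learned from normal runs, a jointly unusual combination of parameters is flagged even when each parameter is individually within range.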

Journal Article
01 Jan 1998-Ursus
TL;DR: In this article, the authors used the Mahalanobis distance statistic and a raster geographic information system (GIS) to model potential black bear denning habitat in the Ouachita Mountains of Arkansas.
Abstract: We used the Mahalanobis distance statistic and a raster geographic information system (GIS) to model potential black bear (Ursus americanus) denning habitat in the Ouachita Mountains of Arkansas. The Mahalanobis distance statistic was used to represent the standard squared distance between sample variates in the GIS database (forest cover type, elevation, slope, aspect, distance to streams, distance to roads, and forest cover richness) and variates at known bear dens. Two models were developed: a generalized model for all den locations and another specific to dens in rock cavities. Differences between habitat at den sites and habitat across the study area were represented in 2 new GIS themes as Mahalanobis distance values. Cells similar to the mean vector derived from the known dens had low Mahalanobis distance values, and dissimilar cells had high values. The reliability of the predictive model was tested by overlaying den locations collected subsequent to original model development on the resultant den habitat themes. Although the generalized model demonstrated poor reliability, the model specific to rock dens had good reliability. Bears were more likely to choose rock den locations with low Mahalanobis distance values and less likely to choose those with high values. The model can be used to plan the timing and extent of management actions (e.g., road building, prescribed fire, timber harvest) most appropriate for those sites with high or low denning potential.

DOI
01 Dec 1998
TL;DR: In this case, a new approach for appearance defect inspection using Mahalanobis-Taguchi System was applied, where images from a group of "good" products were transformed into wave patterns and differential and integral characteristics were picked up from the patterns.
Abstract: In this case, a new approach for appearance defect inspection using the Mahalanobis-Taguchi System was applied. Images from a group of "good" (normal) products were transformed into wave patterns. From these normal patterns, "differential characteristic" and "integral characteristic" data were picked up. A Mahalanobis space was constructed using these data. Then, the image of a product to be inspected was transformed into a group of wave patterns, and the differential and integral characteristics were picked up from the patterns. The data from these characteristics were evaluated against the said Mahalanobis space.

Proceedings ArticleDOI
16 Aug 1998
TL;DR: A statistical model, describing noise-disturbed invariants extracted from a surface patch of a range image, has been developed and applied to region based pose estimation and classification of 3D quadrics.
Abstract: A statistical model, describing noise-disturbed invariants extracted from a surface patch of a range image, has been developed and applied to region based pose estimation and classification of 3D quadrics. The Mahalanobis distance, which yields the same results as a Bayesian classifier, is used for the classification of the surface patches. The results, compared with the Euclidean distance, appear to be much more reliable.

Proceedings ArticleDOI
TL;DR: An integration method for radar information in multispectral images without disturbing the spectral content is proposed, and a new fusion rule, performed in the redundant wavelet domain and based on the Mahalanobis distance applied to the wavelet coefficients, is defined.
Abstract: We propose in this paper an integration method for radar information in multispectral images without disturbing the spectral content. The main problem is to define a fusion rule that takes into account the characteristics of these images. The main purpose of this paper thus lies in defining a new fusion rule performed in the redundant wavelet domain. This rule is based on the Mahalanobis distance applied to the wavelet coefficients. Instead of a coefficient-to-coefficient comparison, a distance-to-distance comparison is performed; the selected coefficient in the fused image is then the one that presents the larger distance. This approach is applied to fusing the infrared band of SPOT with, respectively, RADARSAT and ERS images. The results show that the spectral information is well preserved and that the texture and area roughness are better represented. © (1998) COPYRIGHT SPIE--The International Society for Optical Engineering

Journal ArticleDOI
TL;DR: A lower bound for the Mahalanobis distance is proposed that has two advantages: it can be progressively computed, and it is greater than the classical trace lower bound.

Book ChapterDOI
04 Nov 1998
TL;DR: A method to classify forms by a statistical approach is presented; block instability is solved by introducing a block penalty coefficient, which modifies the classical expression of the Mahalanobis distance.
Abstract: In this paper, we present a method to classify forms by a statistical approach; the physical structure may vary from one writer to another. An automatic form segmentation is performed to extract the physical structure which is described by the main rectangular block set. During the form learning phase, a block matching is made inside each class; the number of occurrences of each block is counted, and statistical block attributes are computed. During the phase of identification, we solve the block instability by introducing a block penalty coefficient, which modifies the classical expression of Mahalanobis distance. A block penalty coefficient depends on the block occurrence probability. Experimental results, using the different form types, are given.