Topic

Mahalanobis distance

About: Mahalanobis distance is a research topic. Over its lifetime, 4,616 publications have been published on this topic, receiving 95,294 citations.


Papers
Journal Article
TL;DR: In this article, Mahalanobis' generalized distance, K nearest neighbors (KNN), and soft independent modeling of class analogy (SIMCA) were evaluated to determine the best analytical procedure.
Abstract: This study examines discriminant techniques suitable for wood-based materials by means of near-infrared spectroscopy (NIRS) and several chemometric analyses. Mahalanobis' generalized distance, K nearest neighbors (KNN), and soft independent modeling of class analogy (SIMCA) were evaluated to determine the best analytical procedure. Differences in classification accuracy across spectrophotometers, wavelength ranges used as explanatory variables, and light-exposure conditions of the sample were examined in detail. It was difficult to apply Mahalanobis' generalized distances to the classification of wood-based materials where NIR spectra varied widely within the sample category. KNN in the NIR region (800–2500 nm), measured with the laboratory device, achieved a high rate of correct classification in validation (>98%) independent of the light-exposure conditions of the sample. With the field device, both KNN and SIMCA achieved correct classification in validation (>88%) at wavelengths of 550–1010 nm. These results suggest that NIRS is applicable to a reasonable classification of used wood at the factory and at job sites.

50 citations

Journal Article
TL;DR: In this article, a generalization of Wilks's single-outlier test for detecting from 1 to k outliers in a multivariate data set is proposed and appropriate critical values determined.
Abstract: A generalization of Wilks's single‐outlier test suitable for application to the many‐outlier problem of detecting from 1 to k outliers in a multivariate data set is proposed and appropriate critical values determined. The method used follows that suggested by Rosner employing sequential application of the generalized extreme Studentized deviate to univariate samples of reducing size, in which the type I error is controlled both under the hypothesis of no outliers and under the alternative hypothesis of 1, 2, ..., k outliers. It is shown that critical values for the sequential application of Wilks's test to detect many outliers depend only on those for a single outlier test which may be approximated by percentage points from the F‐distributions as tabulated by Wilks. Relationships between Wilks's test statistic, the Mahalanobis distance between the ‘outlier’ and the mean vector, and Hotelling's T²‐test between the outlier and the rest of the data, are used to reduce the amount of computation involved in applying the sequential procedure. Simulations are used to show that the method behaves well in detecting multiple outliers in samples larger than about 25. Finally, an example with three dimensions is used to illustrate how the method is applied.

50 citations
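
As a rough illustration of the quantity the abstract above builds on, the following sketch computes the squared Mahalanobis distance of each observation from the sample mean and reports the most extreme one. It is not the sequential Wilks procedure the paper proposes and it applies no critical values. The function names (`mahalanobis_sq`, `most_outlying`) and the toy data are assumptions made for this example.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix (divisor n - 1)."""
    n = len(rows)
    mu = mean_vector(rows)
    p = len(mu)
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [x - m for x, m in zip(r, mu)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j]
    return [[v / (n - 1) for v in row] for row in cov]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu)."""
    d = [a - b for a, b in zip(x, mu)]
    z = solve(cov, d)  # z = cov^{-1} d without forming the inverse
    return sum(a * b for a, b in zip(d, z))


def most_outlying(rows):
    """Return (index of the largest squared distance, all squared distances)."""
    mu = mean_vector(rows)
    cov = covariance(rows)
    d2 = [mahalanobis_sq(r, mu, cov) for r in rows]
    idx = max(range(len(rows)), key=d2.__getitem__)
    return idx, d2


# Toy data (hypothetical): four points on a unit square plus one distant point.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
idx, d2 = most_outlying(data)
print(idx, [round(v, 3) for v in d2])
```

Because the mean and covariance are estimated from the same sample that contains the candidate outlier, the squared distances are bounded above by (n - 1)²/n, which is one reason a formal test with proper critical values is needed rather than a fixed cutoff.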

Journal Article
TL;DR: In this paper, the performance of the analog method for downscaling daily precipitation was evaluated for a number of similarity measures for searching analogs, various ways to include the past atmospheric evolution, and different truncations in EOF space.
Abstract: This study examines the performance of the analog method for downscaling daily precipitation. The evaluation is performed for (1) a number of similarity measures for searching analogs, (2) various ways to include the past atmospheric evolution, and (3) different truncations in EOF space. It is carried out for two regions with complex topographic structures, and with distinct climatic characteristics, namely, California’s Central Valley (together with the Sierra Nevada) and the European Alps. NCEP/NCAR reanalysis data are used to represent the large scale state of the atmosphere over the regions. The assessment is based on simulating daily precipitation for 103 stations for the month of January, for the years 1950–2004 in the California region, and for 70 stations in the European Alps (January 1948–2004). Generally, simulated precipitation is in better agreement with observations in the California region than in the European Alps. Similarity measures such as the Euclidean norm, the sum of absolute differences and the angle between two atmospheric states perform better than measures which introduce additional weightings to principal components (e.g., the Mahalanobis distance). The best choice seems dependent upon the target variable. Lengths of wet spells, for instance, are best simulated by using the angular similarity measure. Overall, the Euclidean norm performs satisfactorily in most cases and hence is a reasonable first choice, whereas the use of Mahalanobis distance is less advisable. The performance of the analog method improves by including large-scale information for bygone days, particularly, for the simulation of wet and dry spells. Optimal performance is obtained when about 85–90% of the total predictor variability is retained.

50 citations
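
The similarity measures named in the abstract above can be sketched as follows, assuming each atmospheric state is already expressed as a short vector of principal-component (EOF) scores. If those components are uncorrelated, the Mahalanobis distance reduces to a Euclidean distance on scores divided by their standard deviations, which is the "additional weighting" the abstract refers to. The function names and the tiny archive are hypothetical, and this is not the authors' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean norm of the difference between two states."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sum_abs_diff(a, b):
    """Sum of absolute differences between two states."""
    return sum(abs(x - y) for x, y in zip(a, b))


def angle(a, b):
    """Angle (radians) between two state vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def mahalanobis_pc(a, b, variances):
    """Mahalanobis distance for uncorrelated components with given variances."""
    return math.sqrt(sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances)))


def best_analog(target, archive, measure):
    """Index of the archived state most similar to the target under `measure`."""
    return min(range(len(archive)), key=lambda i: measure(target, archive[i]))


# Hypothetical archive of past states (two retained components each).
archive = [[5.0, 5.0], [1.1, 0.9], [-1.0, -1.0]]
target = [1.0, 1.0]

print(best_analog(target, archive, euclidean))  # closest in magnitude and direction
print(best_analog(target, archive, angle))      # closest in direction only
```

In this toy archive the Euclidean norm and the angular measure pick different analogs, which illustrates why the best choice can depend on the target variable.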

Proceedings Article
11 Nov 2005
TL;DR: A novel technique for clustering and classification of object trajectory-based video motion clips using spatiotemporal functional approximations, which leads to efficiency gains over existing approaches that use discrete point-based flow vectors to represent the whole trajectory.
Abstract: This paper proposes a novel technique for clustering and classification of object trajectory-based video motion clips using spatiotemporal functional approximations. A Mahalanobis classifier is then used for the detection of anomalous trajectories. Motion trajectories are considered as time series and modeled using the leading Fourier coefficients obtained by a Discrete Fourier Transform. Trajectory clustering is then carried out in the Fourier coefficient feature space to discover patterns of similar object motions. The coefficients of the basis functions are used as input feature vectors to a Self-Organising Map which can learn similarities between object trajectories in an unsupervised manner. Encoding trajectories in this way leads to efficiency gains over existing approaches that use discrete point-based flow vectors to represent the whole trajectory. Experiments are performed on two different datasets (synthetic and pedestrian object tracking) to demonstrate the effectiveness of our approach. Applications to motion data mining in video surveillance databases are envisaged.

50 citations
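
The feature-encoding step described in the abstract above (leading Fourier coefficients of a trajectory treated as a time series) might look roughly like the sketch below. The normalization by series length, the choice of four coefficients per coordinate, and the names `dft` and `trajectory_features` are assumptions for illustration. The Self-Organising Map and the Mahalanobis classifier from the paper are not reproduced here.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) Discrete Fourier Transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def trajectory_features(xs, ys, n_coeffs=4):
    """Encode an (x, y) trajectory as a fixed-length feature vector.

    For each coordinate, keep the first `n_coeffs` DFT coefficients,
    scaled by the series length and split into real and imaginary parts.
    """
    feats = []
    for series in (xs, ys):
        for c in dft(series)[:n_coeffs]:
            feats.extend([c.real / len(series), c.imag / len(series)])
    return feats


# Hypothetical trajectory: x oscillates once over 8 frames, y stays constant.
xs = [math.cos(2 * math.pi * t / 8) for t in range(8)]
ys = [2.0] * 8
features = trajectory_features(xs, ys)
print(len(features), [round(v, 3) for v in features])
```

The resulting vector has a fixed length of 4 * `n_coeffs` regardless of how many points the trajectory contains, which is what makes it convenient as input to a clustering or distance-based classification step.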

Journal Article
TL;DR: It is found that using only two features, the OCC has comparable performance to that of the original classifier using 20 macro features, and heterogeneous combination of classifiers is more promising than the homogeneous combination.
Abstract: In this paper, we have investigated the problem of gender classification using frontal facial images. Four different classifiers, namely K-means, k-nearest neighbors, Linear Discriminant Analysis and Mahalanobis Distance Based classifiers are compared. Receiver operating characteristics (ROC) curve along with the area under the convex hull (AUCH) have been utilized as the performance measures of the classifiers at different feature subsets. To measure the overall performance of a classifier with a single scalar value, the new scheme of finding the area under the convex hull of AUCH of ROC curves (AUCH of AUCHS) is proposed. It has been observed that, when the number of macro features is increased beyond 5, the AUCH saturates and even decreases for some classifiers, illustrating the curse of dimensionality. We then used genetic programming to combine classifiers and thus evolved an optimum combined classifier (OCC), producing better performance than the individual classifiers. We found that using only two features, the OCC has comparable performance to that of the original classifier using 20 macro features. It produces true positive rate values as high as 0.94 corresponding to a false positive rate as low as 0.15 for a 1:3 training-to-testing ratio. We also observed that heterogeneous combination of classifiers is more promising than the homogeneous combination.

49 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
79% related
Artificial neural network
207K papers, 4.5M citations
79% related
Feature extraction
111.8K papers, 2.1M citations
77% related
Convolutional neural network
74.7K papers, 2M citations
77% related
Image processing
229.9K papers, 3.5M citations
76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    208
2022    452
2021    232
2020    239
2019    249