Topic

Mahalanobis distance

About: Mahalanobis distance is a research topic. Over its lifetime, 4,616 publications have been published on this topic, receiving 95,294 citations.
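For reference, the Mahalanobis distance of a point x from a distribution with mean \mu and positive-definite covariance matrix \Sigma is defined as

D_M(x) = \sqrt{(x - \mu)^{\top} \Sigma^{-1} (x - \mu)},

which reduces to the ordinary Euclidean distance when \Sigma is the identity matrix.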


Papers
Journal ArticleDOI
TL;DR: Recommendations are given for pre-processing excipient NIR data and for choosing an appropriate classification method, namely the wavelength distance method combined with de-trending, a simple baseline-correction method.
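As a rough sketch of de-trending in the baseline-correction sense used above (the paper's exact pre-processing chain is not reproduced here; the function and data names are hypothetical):

import numpy as np

def detrend_spectrum(absorbance, degree=2):
    """Subtract a fitted low-order polynomial baseline from one NIR spectrum.

    A common form of de-trending fits a polynomial over the wavelength index
    and removes it; the exact variant used in the cited study is not specified here.
    """
    idx = np.arange(len(absorbance))
    baseline = np.polyval(np.polyfit(idx, absorbance, degree), idx)
    return absorbance - baseline

# Hypothetical usage: a synthetic spectrum with a sloping baseline
spectrum = np.linspace(0.1, 0.4, 700) + 0.05 * np.random.rand(700)
corrected = detrend_spectrum(spectrum)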

211 citations

Journal ArticleDOI
TL;DR: In this paper, the effect of nonnormality on multivariate regression tests, on the one-way multivariate analysis of variance and on tests of equality of covariance matrices is studied following the approach of Box & Watson (1962).
Abstract: The effect of nonnormality on multivariate regression tests, on the one-way multivariate analysis of variance and on tests of equality of covariance matrices is studied following the approach of Box & Watson (1962). In the nonnormal case, an approximation to the distribution of a generalized Mahalanobis distance type of statistic for the multivariate regression problem is derived. It is shown that sensitivity to nonnormality in the multivariate observations is determined by the extent of nonnormality of the regressors. The randomization distribution of the generalized Mahalanobis distance is deduced. The multivariate analysis of variance is found to be robust to nonnormality, whereas the tests for equality of covariance matrices are found to be sensitive to nonnormality. An explanation for this varying degree of sensitivity to nonnormality is given.
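For orientation, the classical normal-theory benchmark behind such statistics is that, for multivariate normal data with known mean and covariance, the squared Mahalanobis distance follows a chi-squared distribution with p degrees of freedom. A minimal Python sketch of that baseline (not the paper's generalized regression statistic; the values below are illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p, n = 3, 5000
mean = np.zeros(p)
cov = np.array([[1.0, 0.3, 0.1],
                [0.3, 1.0, 0.2],
                [0.1, 0.2, 1.0]])

# Simulate multivariate normal data and compute squared Mahalanobis distances
x = rng.multivariate_normal(mean, cov, size=n)
diff = x - mean
d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)

# Under normality with known mean and covariance, d2 ~ chi-squared with p degrees of freedom
print(stats.kstest(d2, 'chi2', args=(p,)))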

208 citations

Book
21 Mar 2013
TL;DR: In this book, the authors present the forward search for the analysis of multivariate and regression data, including finding multivariate transformations to normality with the forward search.
Abstract: Contents: Preface; Notation.
1 Examples of Multivariate Data: 1.1 Influence, Outliers and Distances; 1.2 A Sketch of the Forward Search; 1.3 Multivariate Normality and our Examples; 1.4 Swiss Heads; 1.5 National Track Records for Women; 1.6 Municipalities in Emilia-Romagna; 1.7 Swiss Bank Notes; 1.8 Plan of the Book.
2 Multivariate Data and the Forward Search: 2.1 The Univariate Normal Distribution (2.1.1 Estimation; 2.1.2 Distribution of Estimators); 2.2 Estimation and the Multivariate Normal Distribution (2.2.1 The Multivariate Normal Distribution; 2.2.2 The Wishart Distribution; 2.2.3 Estimation of O); 2.3 Hypothesis Testing (2.3.1 Hypotheses About the Mean; 2.3.2 Hypotheses About the Variance); 2.4 The Mahalanobis Distance; 2.5 Some Deletion Results (2.5.1 The Deletion Mahalanobis Distance; 2.5.2 The (Bartlett)-Sherman-Morrison-Woodbury Formula; 2.5.3 Deletion Relationships Among Distances); 2.6 Distribution of the Squared Mahalanobis Distance; 2.7 Determinants of Dispersion Matrices and the Squared Mahalanobis Distance; 2.8 Regression; 2.9 Added Variables in Regression; 2.10 The Mean Shift Outlier Model; 2.11 Seemingly Unrelated Regression; 2.12 The Forward Search; 2.13 Starting the Search (2.13.1 The Babyfood Data; 2.13.2 Robust Bivariate Boxplots from Peeling; 2.13.3 Bivariate Boxplots from Ellipses; 2.13.4 The Initial Subset); 2.14 Monitoring the Search; 2.15 The Forward Search for Regression Data (2.15.1 Univariate Regression; 2.15.2 Multivariate Regression); 2.16 Further Reading; 2.17 Exercises; 2.18 Solutions.
3 Data from One Multivariate Distribution: 3.1 Swiss Heads; 3.2 National Track Records for Women; 3.3 Municipalities in Emilia-Romagna; 3.4 Swiss Bank Notes; 3.5 What Have We Seen?; 3.6 Exercises; 3.7 Solutions.
4 Multivariate Transformations to Normality: 4.1 Background; 4.2 An Introductory Example: the Babyfood Data; 4.3 Power Transformations to Approximate Normality (4.3.1 Transformation of the Response in Regression; 4.3.2 Multivariate Transformations to Normality); 4.4 Score Tests for Transformations; 4.5 Graphics for Transformations; 4.6 Finding a Multivariate Transformation with the Forward Search; 4.7 Babyfood Data; 4.8 Swiss Heads; 4.9 Horse Mussels; 4.10 Municipalities in Emilia-Romagna (4.10.1 Demographic Variables; 4.10.2 Wealth Variables; 4.10.3 Work Variables; 4.10.4 A Combined Analysis); 4.11 National Track Records for Women; 4.12 Dyestuff Data; 4.13 Babyfood Data and Variable Selection; 4.14 Suggestions for Further Reading; 4.15 Exercises; 4.16 Solutions.
5 Principal Components Analysis: 5.1 Background; 5.2 Principal Components and Eigenvectors (5.2.1 Linear Transformations and Principal Components; 5.2.2 Lack of Scale Invariance and Standardized Variables; 5.2.3 The Number of Components); 5.3 Monitoring the Forward Search (5.3.1 Principal Components and Variances; 5.3.2 Principal Component Scores; 5.3.3 Correlations Between Variables and Principal Components; 5.3.4 Elements of the Eigenvectors); 5.4 The Biplot and the Singular Value Decomposition; 5.5 Swiss Heads; 5.6 Milk Data; 5.7 Quality of Life; 5.8 Swiss Bank Notes (5.8.1 Forgeries and Genuine Notes; 5.8.2 Forgeries Alone); 5.9 Municipalities in Emilia-Romagna; 5.10 Further Reading; 5.11 Exercises; 5.12 Solutions.
6 Discriminant Analysis: 6.1 Background; 6.2 An Outline of Discriminant Analysis (6.2.1 Bayesian Discrimination; 6.2.2 Quadratic Discriminant Analysis; 6.2.3 Linear Discriminant Analysis; 6.2.4 Estimation of Means and Variances; 6.2.5 Canonical Variates; 6.2.6 Assessment of Discriminant Rules); 6.3 The Forward Search (6.3.1 Step 1: Choice of the Initial Subset; 6.3.2 Step 2: Adding
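A much-simplified sketch of the forward search idea summarized in the contents above: grow a presumed-clean subset one observation at a time, refitting the mean and covariance on the subset and monitoring Mahalanobis distances as units enter. This is an illustrative Python variant, not the authors' algorithm or code:

import numpy as np

def forward_search(X, m0=None):
    """Simplified forward search: grow a clean subset one observation at a time,
    refitting the mean/covariance on the subset at each step and recording the
    smallest squared Mahalanobis distance among observations still outside it."""
    n, p = X.shape
    m = m0 if m0 is not None else p + 1
    # Naive initial subset: the m points closest to the coordinatewise median
    start = np.argsort(np.sum((X - np.median(X, axis=0)) ** 2, axis=1))[:m]
    subset = set(int(i) for i in start)
    trace = []
    while len(subset) < n:
        idx = np.fromiter(subset, dtype=int)
        mu = X[idx].mean(axis=0)
        Sinv = np.linalg.pinv(np.cov(X[idx], rowvar=False))
        diff = X - mu
        d2 = np.einsum('ij,jk,ik->i', diff, Sinv, diff)
        outside = np.array([i for i in range(n) if i not in subset])
        nxt = int(outside[np.argmin(d2[outside])])
        trace.append((len(subset), float(d2[nxt])))
        subset.add(nxt)
    return trace

In such a search, outliers tend to enter only in the final steps, which shows up as a jump in the monitored distances.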

202 citations

Journal ArticleDOI
01 Oct 2010-Forestry
TL;DR: In this article, in a mixed temperate forest landscape in southwestern Germany, multiple remote sensing variables from aerial orthoimages, Thematic Mapper data and small-footprint light detection and ranging (LiDAR) were used for plot-level nonparametric predictions of total volume and biomass, using three distance measures (Euclidean, Mahalanobis and Most Similar Neighbour) as well as a regression tree-based classifier (Random Forest).
Abstract: In a mixed temperate forest landscape in southwestern Germany, multiple remote sensing variables from aerial orthoimages, Thematic Mapper data and small-footprint light detection and ranging (LiDAR) were used for plot-level nonparametric predictions of total volume and biomass using three distance measures (Euclidean, Mahalanobis and Most Similar Neighbour) as well as a regression tree-based classifier (Random Forest). The performances of the nearest neighbour (NN) approaches were examined by means of relative bias and root mean squared error. The original high-dimensional dataset was pruned using an evolutionary genetic algorithm (GA) search with an NN classification scenario, as well as by stepwise selection. The GA-selected variables showed improved performance when applying Euclidean and Mahalanobis distances for predictions, whereas Most Similar Neighbour and Random Forest worked more precisely with the full dataset. The GA search proved unstable in multiple runs because of intercorrelations among the high-dimensional predictors. The selected datasets are dominated by LiDAR height metrics; furthermore, the LiDAR-based metrics showed major relevance in predicting both response variables examined here. The Random Forest proved superior to the other NN methods examined and was eventually used for a wall-to-wall mapping of predictions on a grid of 20 × 20 m spatial resolution.
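A minimal sketch of distance-based nearest-neighbour prediction with the Mahalanobis metric, one of the approaches compared above (k, the feature set and all names are illustrative, not taken from the study):

import numpy as np

def knn_predict_mahalanobis(X_train, y_train, X_query, k=5):
    """Predict a plot-level response (e.g. volume or biomass) as the mean of the
    k nearest training plots, with nearness measured by the Mahalanobis distance
    computed from the training covariance of the predictor variables."""
    Sinv = np.linalg.pinv(np.cov(X_train, rowvar=False))
    preds = []
    for q in X_query:
        diff = X_train - q
        d2 = np.einsum('ij,jk,ik->i', diff, Sinv, diff)
        nearest = np.argsort(d2)[:k]
        preds.append(y_train[nearest].mean())
    return np.array(preds)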

202 citations

Journal ArticleDOI
TL;DR: This paper addresses the problem of characterizing ensemble similarity from sample similarity in a principled manner by using a reproducing kernel as a characterization of sample similarity, and suggests a probabilistic distance measure in the reproducing kernel Hilbert space (RKHS) as the ensemble similarity.
Abstract: This paper addresses the problem of characterizing ensemble similarity from sample similarity in a principled manner. Using a reproducing kernel as a characterization of sample similarity, we suggest a probabilistic distance measure in the reproducing kernel Hilbert space (RKHS) as the ensemble similarity. Assuming normality in the RKHS, we derive analytic expressions for probabilistic distance measures that are commonly used in many applications, such as the Chernoff distance (or the Bhattacharyya distance as its special case), the Kullback-Leibler divergence, etc. Since the reproducing kernel implicitly embeds a nonlinear mapping, our approach presents a new way to study these distances, whose feasibility and efficiency are demonstrated using experiments with synthetic and real examples. Further, we extend the ensemble similarity to the reproducing kernel for ensembles and study the ensemble similarity for more general data representations.
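For two Gaussian densities N(\mu_1, \Sigma_1) and N(\mu_2, \Sigma_2), the Bhattacharyya distance mentioned above has the well-known closed form

D_B = \frac{1}{8}(\mu_1 - \mu_2)^{\top} \bar{\Sigma}^{-1} (\mu_1 - \mu_2) + \frac{1}{2}\ln\!\left(\frac{\det \bar{\Sigma}}{\sqrt{\det \Sigma_1 \, \det \Sigma_2}}\right), \qquad \bar{\Sigma} = \frac{\Sigma_1 + \Sigma_2}{2},

whose first term is a squared Mahalanobis-type distance between the means; the paper evaluates such expressions for Gaussian models assumed in the RKHS induced by the kernel.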

201 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations (79% related)
Artificial neural network: 207K papers, 4.5M citations (79% related)
Feature extraction: 111.8K papers, 2.1M citations (77% related)
Convolutional neural network: 74.7K papers, 2M citations (77% related)
Image processing: 229.9K papers, 3.5M citations (76% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2024    1
2023    208
2022    452
2021    232
2020    239
2019    249