Topic

Mahalanobis distance

About: Mahalanobis distance is a research topic. Over the lifetime of the topic, 4616 publications have been published, receiving 95294 citations.
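For reference, the Mahalanobis distance of a point x from a distribution with mean vector mu and covariance matrix Sigma is d(x) = sqrt((x - mu)^T Sigma^{-1} (x - mu)). A minimal sketch in Python with NumPy (the function name and toy numbers below are illustrative):

    import numpy as np

    def mahalanobis(x, mean, cov):
        """Mahalanobis distance of point x from a distribution
        with the given mean vector and covariance matrix."""
        diff = x - mean
        # Solve Sigma * z = diff rather than explicitly inverting Sigma
        z = np.linalg.solve(cov, diff)
        return np.sqrt(diff @ z)

    # Toy example: distance of a point from a 2-D Gaussian
    mean = np.array([0.0, 0.0])
    cov = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
    x = np.array([1.0, 1.0])
    print(mahalanobis(x, mean, cov))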


Papers
01 Jan 2004
TL;DR: The problem of choosing between the two methods of linear discriminant analysis and logistic regression is considered, and some guidelines for proper choice are set.
Abstract: Two of the most widely used statistical methods for analyzing categorical outcome variables are linear discriminant analysis and logistic regression. While both are appropriate for the development of linear classification models, linear discriminant analysis makes more assumptions about the underlying data. Hence, it is assumed that logistic regression is the more flexible and more robust method in case of violations of these assumptions. In this paper we consider the problem of choosing between the two methods and set some guidelines for proper choice. The comparison between the methods is based on several measures of predictive accuracy. The performance of the methods is studied by simulations. We start with an example where all the assumptions of linear discriminant analysis are satisfied and observe the impact of changes regarding the sample size, covariance matrix, Mahalanobis distance, and direction of distance between group means. Next, we compare the robustness of the methods towards categorisation and non-normality of explanatory variables in a closely controlled way. We show that the results of LDA and LR are close whenever the normality assumptions are not too badly violated, and set some guidelines for recognizing these situations. We discuss the inappropriateness of LDA in all other cases.

279 citations
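The comparison the paper above describes is easy to reproduce in outline. A hypothetical sketch using scikit-learn: two Gaussian classes with a shared covariance matrix satisfy the LDA assumptions, and the Mahalanobis distance between the group means controls difficulty. The specific means, covariance, and sample sizes here are illustrative, not the paper's simulation design:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Two Gaussian classes sharing one covariance matrix
    cov = np.array([[1.0, 0.3], [0.3, 1.0]])
    X0 = rng.multivariate_normal([0.0, 0.0], cov, size=500)
    X1 = rng.multivariate_normal([1.5, 1.5], cov, size=500)
    X = np.vstack([X0, X1])
    y = np.r_[np.zeros(500), np.ones(500)]

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Under these assumptions the two methods should score similarly
    for model in (LinearDiscriminantAnalysis(), LogisticRegression()):
        acc = model.fit(X_tr, y_tr).score(X_te, y_te)
        print(type(model).__name__, acc)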

Journal ArticleDOI
01 Jul 2001-Proteins
TL;DR: In this communication, a simple, rigorous derivation is provided to prove that the Bayes decision rule introduced recently for protein structural class prediction is completely the same as the earlier component‐coupled algorithm.
Abstract: It has been quite clear that the success rate for predicting protein structural class can be improved significantly by using the algorithms that incorporate the coupling effect among different amino acid components of a protein. However, there is still a lot of confusion in understanding the relationship of these advanced algorithms, such as the least Mahalanobis distance algorithm, the component-coupled algorithm, and the Bayes decision rule. In this communication, a simple, rigorous derivation is provided to prove that the Bayes decision rule introduced recently for protein structural class prediction is completely the same as the earlier component-coupled algorithm. Meanwhile, it is also very clear from the derived equations that the least Mahalanobis distance algorithm is an approximation of the component-coupled algorithm, also named the covariant-discriminant algorithm introduced by Chou and Elrod in protein subcellular location prediction (Protein Engineering, 1999; 12:107-118). Clarification of the confusion will help use these powerful algorithms effectively and correctly interpret the results obtained by them, so as to contribute to further development not only in the structural prediction area, but in some other relevant areas in protein science as well.

275 citations
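The relationship established in the paper above can be outlined with a standard Gaussian-discriminant argument (a sketch of the idea, not the paper's exact derivation). Under a Gaussian model for class k with mean \mu_k and covariance \Sigma_k, the Bayes rule with equal priors assigns a query x to the class minimizing

    F_k(x) = (x - \mu_k)^{\mathsf T} \Sigma_k^{-1} (x - \mu_k) + \ln \det \Sigma_k

which is the component-coupled (covariant-discriminant) rule. Dropping the \ln \det \Sigma_k term leaves only the squared Mahalanobis distance

    D_k^2(x) = (x - \mu_k)^{\mathsf T} \Sigma_k^{-1} (x - \mu_k)

so the least Mahalanobis distance rule is an approximation of the component-coupled rule, and the two coincide when all classes share the same covariance determinant.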

Journal ArticleDOI
TL;DR: Theoretical analyses and empirical observations across a wide spectrum of multi-class imbalanced benchmarks indicate that MDO is the method of choice, offering statistically superior MAUC and precision compared to popular over-sampling techniques.
Abstract: The class imbalance problem is quite pervasive in practice. It refers to skewness in the underlying data distribution which, in turn, imposes many difficulties on typical machine learning algorithms. To deal with the issues arising from multi-class skewed distributions, existing efforts are mainly divided into two categories: model-oriented solutions and data-oriented techniques. Focusing on the latter, this paper presents a new over-sampling technique inspired by the Mahalanobis distance. The presented technique, called MDO (Mahalanobis Distance-based Over-sampling), generates synthetic samples that have the same Mahalanobis distance from the considered class mean as other minority class examples. By preserving the covariance structure of the minority class instances and generating synthetic samples along the probability contours, new minority class instances are modelled better for learning algorithms. Moreover, MDO can reduce the risk of overlap between different class regions, which is considered a serious challenge in multi-class problems. Our theoretical analyses and empirical observations across a wide spectrum of multi-class imbalanced benchmarks indicate that MDO is the method of choice, offering statistically superior MAUC and precision compared to popular over-sampling techniques.

272 citations
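A simplified sketch of the core MDO idea described above (not the authors' exact algorithm): whiten a minority-class sample with respect to the class covariance, draw a random direction with the same whitened norm, and map it back, so the synthetic point keeps the same Mahalanobis distance from the class mean and lies on the same probability contour.

    import numpy as np

    rng = np.random.default_rng(0)

    def mdo_like_sample(X_min, rng):
        """Generate one synthetic minority sample on the same
        Mahalanobis-distance contour as a randomly chosen real sample."""
        mean = X_min.mean(axis=0)
        cov = np.cov(X_min, rowvar=False)
        L = np.linalg.cholesky(cov)          # cov = L @ L.T
        x = X_min[rng.integers(len(X_min))]
        z = np.linalg.solve(L, x - mean)     # whitened coordinates
        r = np.linalg.norm(z)                # Mahalanobis distance of x
        d = rng.normal(size=z.shape)
        d *= r / np.linalg.norm(d)           # random point on the same sphere
        return mean + L @ d                  # back to the original space

    # Toy minority class
    X_min = rng.multivariate_normal([0, 0], [[2.0, 0.8], [0.8, 1.0]], size=40)
    print(mdo_like_sample(X_min, rng))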

Journal ArticleDOI
TL;DR: A discriminative deep multi-metric learning method to jointly learn multiple neural networks, under which the correlation of different features of each sample is maximized, and the distance of each positive pair is reduced and that of each negative pair is enlarged.
Abstract: This paper presents a new discriminative deep metric learning (DDML) method for face and kinship verification in wild conditions. While metric learning has achieved reasonably good performance in face and kinship verification, most existing metric learning methods aim to learn a single Mahalanobis distance metric that maximizes the inter-class variations and minimizes the intra-class variations, which cannot capture the nonlinear manifold on which face images usually lie. To address this, we propose a DDML method that trains a deep neural network to learn a set of hierarchical nonlinear transformations projecting face pairs into the same latent feature space, under which the distance of each positive pair is reduced and that of each negative pair is enlarged. To better exploit the commonality of multiple feature descriptors and make all the features more robust for face and kinship verification, we develop a discriminative deep multi-metric learning method to jointly learn multiple neural networks, under which the correlation of different features of each sample is maximized, and the distance of each positive pair is reduced and that of each negative pair is enlarged. Extensive experimental results show that our proposed methods achieve acceptable results in both face and kinship verification.

264 citations
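The pull/push idea in the paper above can be sketched with a generic margin-based pairwise loss in PyTorch. This is a minimal illustration of deep metric learning of this kind; the network architecture and loss form below are assumptions, not the paper's exact DDML objective:

    import torch
    import torch.nn as nn

    # Small nonlinear projection network standing in for the paper's
    # hierarchy of learned transformations (architecture is illustrative)
    net = nn.Sequential(nn.Linear(128, 64), nn.Tanh(), nn.Linear(64, 32))

    def pairwise_margin_loss(f1, f2, y, margin=1.0):
        """y = +1 for a positive (same-identity) pair, -1 for a negative
        pair. Shrinks distances of positive pairs and pushes negative
        pairs beyond the margin."""
        d2 = ((f1 - f2) ** 2).sum(dim=1)  # squared feature-space distance
        pos = y.clamp(min=0) * d2
        neg = (-y).clamp(min=0) * torch.relu(margin - d2)
        return (pos + neg).mean()

    x1, x2 = torch.randn(8, 128), torch.randn(8, 128)
    y = torch.tensor([1., -1., 1., -1., 1., -1., 1., -1.])
    loss = pairwise_margin_loss(net(x1), net(x2), y)
    loss.backward()  # gradients flow into the network parameters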

01 Jan 2003
TL;DR: Combinations of traditional distance measures in Eigenspace are used to improve performance in the matching stage of face recognition, and variations in performance due to different distance measures and numbers of Eigenvectors are compared.
Abstract: This study examines the role of Eigenvector selection and Eigenspace distance measures in PCA-based face recognition systems. In particular, it builds on earlier results from the FERET face recognition evaluation studies, which created a large face database (1,196 subjects) and a baseline face recognition system for comparative evaluations. This study looks at using combinations of traditional distance measures (City-block, Euclidean, Angle, Mahalanobis) in Eigenspace to improve performance in the matching stage of face recognition. A statistically significant improvement is observed for the Mahalanobis distance alone when compared to the other three alone. However, no combination of these measures appears to perform better than Mahalanobis alone. This study also examines how many Eigenvectors to select and according to what ordering criterion. It compares variations in performance due to different distance measures and numbers of Eigenvectors. Ordering Eigenvectors according to a like-image difference value rather than their Eigenvalues is also considered.

263 citations
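The four eigenspace measures compared in the study above are easy to state side by side. A hypothetical NumPy sketch (the function name and toy vectors are illustrative; eigvals are the PCA eigenvalues of the retained eigenvectors):

    import numpy as np

    def eigenspace_distances(u, v, eigvals):
        """Distances between two PCA-projected face vectors u and v.
        eigvals: variances (eigenvalues) along the retained eigenvectors."""
        diff = u - v
        return {
            "city-block": np.abs(diff).sum(),
            "euclidean": np.sqrt((diff ** 2).sum()),
            # Angle measure: one minus cosine similarity
            "angle": 1 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v)),
            # Mahalanobis in eigenspace: the axes are uncorrelated, so the
            # covariance is diagonal and its inverse rescales by 1/eigval
            "mahalanobis": np.sqrt(((diff ** 2) / eigvals).sum()),
        }

    u, v = np.array([1.0, 2.0, 0.5]), np.array([0.5, 1.0, 1.0])
    print(eigenspace_distances(u, v, eigvals=np.array([4.0, 2.0, 1.0])))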


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 79% related
Artificial neural network: 207K papers, 4.5M citations, 79% related
Feature extraction: 111.8K papers, 2.1M citations, 77% related
Convolutional neural network: 74.7K papers, 2M citations, 77% related
Image processing: 229.9K papers, 3.5M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    208
2022    452
2021    232
2020    239
2019    249