Topic

Mahalanobis distance

About: Mahalanobis distance is a research topic. Over the lifetime of the topic, 4,616 publications have been published, receiving 95,294 citations.


Papers
Journal ArticleDOI
TL;DR: The Mahalanobis distance, in the original and in the principal component (PC) space, is examined and interpreted in relation to the Euclidean distance (ED).

1,802 citations
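The ED/Mahalanobis relation in the PC space is easy to verify numerically: the Mahalanobis distance equals the Euclidean distance of PC scores standardized by the square roots of the eigenvalues of the covariance matrix. A minimal NumPy sketch, with illustrative data and variable names not taken from the paper:

```python
# A minimal sketch of Mahalanobis vs. Euclidean distance; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data: the Mahalanobis distance accounts for this covariance.
S_true = np.array([[2.0, 1.5],
                   [1.5, 2.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=S_true, size=500)

mean = X.mean(axis=0)
S = np.cov(X, rowvar=False)
diff = np.array([2.0, -2.0]) - mean   # offset of a query point from the mean

euclidean = np.sqrt(diff @ diff)
mahalanobis = np.sqrt(diff @ np.linalg.inv(S) @ diff)

# In PC space, the Mahalanobis distance equals the Euclidean distance of
# PC scores standardized by the square roots of the eigenvalues of S.
eigvals, eigvecs = np.linalg.eigh(S)
z = (eigvecs.T @ diff) / np.sqrt(eigvals)

print(f"ED:          {euclidean:.3f}")
print(f"Mahalanobis: {mahalanobis:.3f}")
print(f"Via PCs:     {np.sqrt(z @ z):.3f}")  # matches the Mahalanobis value
```

The query point lies against the direction of correlation, so its Mahalanobis distance exceeds its Euclidean distance.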

Proceedings ArticleDOI
01 Dec 2005
TL;DR: This paper proposes and analyzes parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences, and shows that there is a bijection between regular exponential families and a large class of Bregman divergences, called regular Bregman divergences.
Abstract: A wide variety of distortion functions, such as squared Euclidean distance, Mahalanobis distance, Itakura-Saito distance and relative entropy, have been used for clustering. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical k-means, the Linde-Buzo-Gray (LBG) algorithm and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical k-means algorithm, while generalizing the method to a large class of clustering loss functions. This is achieved by first posing the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate distortion theory, and then deriving an iterative algorithm that monotonically decreases this loss. In addition, we show that there is a bijection between regular exponential families and a large class of Bregman divergences, that we call regular Bregman divergences. This result enables the development of an alternative interpretation of an efficient EM scheme for learning mixtures of exponential family distributions, and leads to a simple soft clustering algorithm for regular Bregman divergences. Finally, we discuss the connection between rate distortion theory and Bregman clustering and present an information theoretic analysis of Bregman clustering algorithms in terms of a trade-off between compression and loss in Bregman information.

1,723 citations
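One property from this abstract is worth making concrete: for every Bregman divergence the arithmetic mean is the optimal cluster representative, so the Lloyd-style k-means loop generalizes unchanged. The sketch below is an illustrative reimplementation under that assumption, not the authors' code; the Itakura-Saito divergence requires strictly positive data, and empty-cluster handling is kept minimal:

```python
# A minimal sketch of Bregman hard clustering: Lloyd-style iterations with
# assignments under an arbitrary Bregman divergence; for every Bregman
# divergence, the arithmetic mean is the optimal centroid.
import numpy as np

def itakura_saito(X, c):
    # Bregman divergence generated by -sum(log); requires positive data.
    r = X / c
    return np.sum(r - np.log(r) - 1.0, axis=-1)

def bregman_kmeans(X, k, divergence, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the chosen divergence.
        d = np.stack([divergence(X, c) for c in centroids], axis=1)
        labels = d.argmin(axis=1)
        # Update step: cluster means (keep old centroid if a cluster empties).
        centroids = np.stack([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

rng = np.random.default_rng(1)
X = np.abs(rng.normal(loc=5.0, size=(300, 2))) + 0.1   # strictly positive
labels, centroids = bregman_kmeans(X, k=3, divergence=itakura_saito)
```

Swapping in squared Euclidean distance recovers classical k-means; relative entropy on probability vectors gives information-theoretic clustering.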

Journal ArticleDOI
TL;DR: To avoid the masking effect, this work proposes computing distances based on very robust estimates of location and covariance, which are better suited to expose the outliers in a multivariate point cloud.
Abstract: Detecting outliers in a multivariate point cloud is not trivial, especially when there are several outliers. The classical identification method does not always find them, because it is based on the sample mean and covariance matrix, which are themselves affected by the outliers. That is how the outliers get masked. To avoid the masking effect, we propose to compute distances based on very robust estimates of location and covariance. These robust distances are better suited to expose the outliers. In the case of regression data, the classical least squares approach masks outliers in a similar way. Also here, the outliers may be unmasked by using a highly robust regression method. Finally, a new display is proposed in which the robust regression residuals are plotted versus the robust distances. This plot classifies the data into regular observations, vertical outliers, good leverage points, and bad leverage points. Several examples are discussed.

1,419 citations
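The robust-distance idea is available off the shelf: scikit-learn's MinCovDet fits the Minimum Covariance Determinant, one standard highly robust estimate of location and scatter in the spirit of the estimators the paper uses. A sketch on synthetic data:

```python
# A minimal sketch contrasting classical and robust Mahalanobis distances;
# the data are synthetic and the chi-square cutoff is the usual convention.
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import EmpiricalCovariance, MinCovDet

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=200)
X[:10] += 6.0                        # plant a small cluster of outliers

d2_classical = EmpiricalCovariance().fit(X).mahalanobis(X)  # squared distances
d2_robust = MinCovDet(random_state=0).fit(X).mahalanobis(X)

cutoff = chi2.ppf(0.975, df=2)       # 97.5% quantile for 2-D Gaussian data
print("flagged by classical distances:", int(np.sum(d2_classical > cutoff)))
print("flagged by robust distances:   ", int(np.sum(d2_robust > cutoff)))
```

Because the planted outliers inflate the sample mean and covariance, the classical distances tend to understate how far out they are (the masking effect), while the robust distances expose them.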

Proceedings Article
01 Jan 2018
TL;DR: This paper proposes a simple yet effective method for detecting any abnormal samples, applicable to any pre-trained softmax neural classifier, and obtains the class-conditional Gaussian distributions with respect to (low- and upper-level) features of the deep models under Gaussian discriminant analysis.
Abstract: Detecting test samples drawn sufficiently far away from the training distribution, statistically or adversarially, is a fundamental requirement for deploying a good classifier in many real-world machine learning applications. However, deep neural networks with the softmax classifier are known to produce highly overconfident posterior distributions even for such abnormal samples. In this paper, we propose a simple yet effective method for detecting any abnormal samples, which is applicable to any pre-trained softmax neural classifier. We obtain the class-conditional Gaussian distributions with respect to (low- and upper-level) features of the deep models under Gaussian discriminant analysis, which result in a confidence score based on the Mahalanobis distance. While most prior methods have been evaluated for detecting either out-of-distribution or adversarial samples, but not both, the proposed method achieves state-of-the-art performance for both cases in our experiments. Moreover, we found that our proposed method is more robust in harsh cases, e.g., when the training dataset has noisy labels or a small number of samples. Finally, we show that the proposed method enjoys broader usage by applying it to class-incremental learning: whenever out-of-distribution samples are detected, our classification rule can incorporate new classes well without further training deep models.

1,022 citations
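The confidence score itself is simple to sketch: fit class-conditional Gaussians with a tied covariance to features extracted from a trained network, then score a sample by its Mahalanobis distance to the nearest class mean. The sketch below uses synthetic stand-in features; feature extraction from a real network, and the paper's input preprocessing and multi-layer feature ensembling, are omitted:

```python
# A minimal sketch of the Mahalanobis confidence score under Gaussian
# discriminant analysis with a shared (tied) covariance. `features` here
# are synthetic stand-ins for penultimate-layer activations.
import numpy as np

def fit_tied_gaussians(features, labels, n_classes):
    means = np.stack([features[labels == c].mean(axis=0)
                      for c in range(n_classes)])
    centered = features - means[labels]
    precision = np.linalg.inv(centered.T @ centered / len(features))
    return means, precision

def mahalanobis_confidence(f, means, precision):
    diffs = means - f
    d2 = np.einsum('cd,de,ce->c', diffs, precision, diffs)
    return -d2.min()   # higher score = closer to some class mean

rng = np.random.default_rng(0)
n_classes, dim = 3, 8
labels = rng.integers(0, n_classes, size=600)
features = rng.normal(size=(600, dim)) + 3.0 * labels[:, None]

means, precision = fit_tied_gaussians(features, labels, n_classes)
print(mahalanobis_confidence(features[0], means, precision))       # in-dist
print(mahalanobis_confidence(rng.normal(size=dim) + 50.0,
                             means, precision))                    # far OOD
```

Thresholding this score separates in-distribution samples (high score) from out-of-distribution or adversarial ones (low score).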

Journal ArticleDOI
TL;DR: This article proposes a set of multidimensional distance measures, including economic, financial, political, administrative, cultural, demographic, knowledge, and global connectedness, as well as geographic distance.
Abstract: Cross-national distance is a key concept in the field of management. Previous research has conceptualized and measured cross-national differences mostly in terms of dyadic cultural distance, and has used the Euclidean approach to measuring it. In contrast, our goal is to disaggregate the construct of distance by proposing a set of multidimensional measures, including economic, financial, political, administrative, cultural, demographic, knowledge, and global connectedness as well as geographic distance. We ground our analysis and choice of empirical dimensions on institutional theories of national business, governance, and innovation systems. In order to overcome the methodological limitations of the Euclidean approach, we calculate dyadic distances using the Mahalanobis method, which is scale-invariant and takes into consideration the variance–covariance matrix. We empirically analyze four different foreign expansion choices of US companies to illustrate the importance of disaggregating the distance construct and the usefulness of our distance calculations, which we make freely available to managers and scholars.

981 citations
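A dyadic Mahalanobis distance of the kind the article advocates can be computed directly with SciPy; the country vectors below are fabricated placeholders, not the authors' dataset:

```python
# A minimal sketch of dyadic Mahalanobis distances between countries.
# Rows are countries, columns are distance dimensions (e.g., economic,
# political, demographic); the numbers are illustrative only.
import numpy as np
from scipy.spatial.distance import mahalanobis

X = np.array([
    [ 1.2,  0.4, -0.3],
    [ 0.9,  0.1, -0.1],
    [-1.1,  1.5,  0.8],
    [-0.8, -1.2,  1.0],
    [ 0.3, -0.6,  0.5],
    [-0.5, -0.2, -1.9],
])

# Inverse variance-covariance matrix: this is what makes the measure
# scale-invariant and correlation-aware, unlike the Euclidean approach.
VI = np.linalg.inv(np.cov(X, rowvar=False))

for i, j in [(0, 1), (0, 2), (2, 3)]:
    print(f"d({i},{j}) = {mahalanobis(X[i], X[j], VI):.3f}")
```

Unlike the Euclidean distance, rescaling any one dimension (say, measuring GDP in dollars rather than thousands of dollars) leaves these dyadic distances unchanged.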


Network Information

Related Topics (5)

Topic                          Papers    Citations   Relatedness
Cluster analysis               146.5K    2.9M        79%
Artificial neural network      207K      4.5M        79%
Feature extraction             111.8K    2.1M        77%
Convolutional neural network   74.7K     2M          77%
Image processing               229.9K    3.5M        76%
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2024    1
2023    208
2022    452
2021    232
2020    239
2019    249