Topic

Mahalanobis distance

About: Mahalanobis distance is a research topic. Over its lifetime, 4,616 publications have been published on this topic, receiving 95,294 citations.


Papers
01 Jun 2011
TL;DR: The results show that Mahalanobis distance is a useful technique for identifying both single-hour outliers and contiguous-time clusters whose component members are not, in themselves, highly deviant.
Abstract: Modeling the behavior of interacting humans in routine but complex activities has many challenges, not the least of which is that humans can be both purposive and negligent, and further can encounter unexpected environmental hazards requiring fast action. The challenge is to characterize and model the humdrum routine while at the same time capturing the deviations and anomalies which arise from time to time. Because of the disruptive impact that anomalies (such as accidents) can have and the importance of incorporating them in our models, this report focuses on one technique for identifying anomalies in complex behavior patterns, especially when there is no sharp demarcation between routine and unusual activity. The technique we evaluate is that of Mahalanobis distance, which is known to be useful for identifying outliers when data is multivariate normal. But the data we use for evaluation is deliberately markedly non-multivariate normal, since that is what we confront in complex human systems. Specifically, we use one year's (2008) hourly traffic-volume data on a major multi-lane road (I-95) in one location in a major city (New York) with a dense population and several alternate routes. The traffic data is rich, large, incomplete, and reflects the effects of bad weather, accidents, routine fluctuations (rush hours versus dead of night), and one-time social events. The results show that Mahalanobis distance is a useful technique for identifying both single-hour outliers and contiguous-time clusters whose component members are not, in themselves, highly deviant.

34 citations
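
The outlier-flagging idea described in the abstract above can be illustrated with a minimal sketch. It assumes NumPy and synthetic three-variable data (not the traffic data from the report); the function name `mahalanobis_sq` and the chi-square cutoff are illustrative choices, and the cutoff is only meaningful when the data are roughly multivariate normal, which the report explicitly does not assume.

```python
import numpy as np

def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distance of each row of X from `mean` under `cov`."""
    diff = X - mean
    # Solve cov @ z = diff.T instead of forming the inverse explicitly.
    z = np.linalg.solve(cov, diff.T).T
    return np.einsum("ij,ij->i", diff, z)

# Synthetic stand-in data (an assumption; not the traffic data from the report).
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.8, 0.5],
                     [0.8, 1.0, 0.3],
                     [0.5, 0.3, 1.0]])
X = rng.multivariate_normal(np.zeros(3), true_cov, size=500)
X[0] = [4.0, -4.0, 0.0]  # planted anomaly that breaks the correlation pattern

mean = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
d2 = mahalanobis_sq(X, mean, cov)

# Approximate 99.9th percentile of chi-square with 3 degrees of freedom;
# only meaningful if the data are roughly multivariate normal.
THRESHOLD = 16.27
outliers = np.flatnonzero(d2 > THRESHOLD)
```

Here `outliers` holds the row indices whose squared distance exceeds the cutoff. In practice the threshold would be tuned to the data rather than taken from the chi-square table.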

Proceedings Article
10 Jun 2001
TL;DR: In this paper, a Mahalanobis space was constructed from the wafers that had high yields, and the Mahalanobis distances of various wafers were calculated to study the relationship between yield and distance.
Abstract: The distribution of yield from the production lines is concentrated at a high-yield area and tapers down to the lower-yield area. Production management would find it useful if the yield of individual wafers could be forecast. The yield is determined by the variability of electrical characteristics and dust. In this study, only the variability of electrical characteristics was discussed. One product was selected for study, and a Mahalanobis space was constructed from the wafers that had high yields. The Mahalanobis distances of various wafers were calculated in order to study the relationship between yield and distance.

34 citations
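
As a rough illustration of building a Mahalanobis space from a reference group and scoring other units against it, the sketch below assumes NumPy and entirely hypothetical data. The abstract does not list the electrical characteristics used, so the arrays here are random placeholders. Dividing the squared distance by the number of features is a common convention assumed here, not something the abstract states; with it, the reference group itself averages close to 1.

```python
import numpy as np

def fit_reference_space(reference):
    """Summarize the reference group: per-feature mean, std, and correlation."""
    mean = reference.mean(axis=0)
    std = reference.std(axis=0, ddof=1)
    corr = np.corrcoef(reference, rowvar=False)
    return mean, std, corr

def scaled_md(samples, mean, std, corr):
    """Squared Mahalanobis distance divided by the number of features."""
    z = (samples - mean) / std          # standardize against the reference group
    sol = np.linalg.solve(corr, z.T).T  # apply the inverse correlation matrix
    return np.einsum("ij,ij->i", z, sol) / z.shape[1]

# Hypothetical reference group: 200 units, 4 measured characteristics.
rng = np.random.default_rng(1)
reference = rng.normal(size=(200, 4))
reference[:, 1] += 0.7 * reference[:, 0]  # make two features correlated

mean, std, corr = fit_reference_space(reference)
ref_md = scaled_md(reference, mean, std, corr)

# A hypothetical unit whose first characteristic sits 6 standard deviations off.
candidate = mean + np.array([6.0, 0.0, 0.0, 0.0]) * std
cand_md = scaled_md(candidate[None, :], mean, std, corr)[0]
```

Units far from the reference space get a scaled distance well above 1, which is the quantity one would then compare against yield.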

Journal Article
TL;DR: In this article, a remaining useful life (RUL) prediction model for rolling bearings is established around a health indicator that fuses improved independent component analysis (IICA) with Mahalanobis distance (MD), addressing the poor prediction performance obtained with a single performance degradation indicator.
Abstract: To address the poor performance of rolling bearing remaining useful life (RUL) prediction with a single performance degradation indicator, a novel RUL prediction model based on a performance degradation indicator is established. First, the vibration signal of the rolling bearing is decomposed into intrinsic scale components (ISCs) by piecewise cubic Hermite interpolating polynomial-local characteristic-scale decomposition (PCHIP-LCD), and the effective ISCs are selected to reconstruct the signal based on kurtosis-correlation coefficient (K-C) criteria. Second, the multi-dimensional degradation feature set of the reconstructed signal is extracted, and the sensitive degradation indicator IICAMD is calculated by fusing improved independent component analysis (IICA) and Mahalanobis distance (MD). Third, the false fluctuation of the IICAMD is repaired using the gray regression model (GM) to obtain the health indicator (HI) of the rolling bearing, and the start prediction time (SPT) of the rolling bearing is determined from the time mutation point of the HI. Finally, a generalized regression neural network (GRNN) model based on the HI is constructed to predict the RUL of the rolling bearing. Experimental results on two different rolling bearing data sets show that the proposed method achieves better prediction accuracy and reliability.

34 citations

Journal Article
TL;DR: In this article, the authors used multivariate methods such as the Mahalanobis distance, the Jackknife distance, p-values, or Hadi's method to identify potential outliers and avoid eliminating valid data.
Abstract: Evaluating water quality data for outliers is a good quality control/quality assessment procedure whether the data are used for monitoring or for modeling. Often water quality data are correlated, e.g., carbonaceous biochemical oxygen demand (CBOD) has some correlation with NH3. Univariate methods for identifying outliers do not consider the correlation between variables and may identify too many data points as outliers or miss observations which have extreme ratios between variables, e.g., a raw wastewater sample with relatively low CBOD but high NH3. Testing for outliers using multivariate methods such as the Mahalanobis distance, Jackknife distance, p-values, or Hadi’s method automatically incorporates the correlation or covariance between variables and is fundamentally more correct. Such multivariate methods can better identify potential outliers and avoid eliminating valid data.

34 citations
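
The point about univariate checks missing observations with extreme ratios between correlated variables can be shown with a small worked example. It assumes NumPy and a made-up two-variable population with correlation 0.9. The means, covariance, and both sample points are hypothetical and do not come from the water quality data.

```python
import numpy as np

# Hypothetical standardized population: two variables with correlation 0.9.
mean = np.zeros(2)
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])

def z_scores(x):
    """Per-variable z-scores, ignoring the correlation between variables."""
    return (x - mean) / np.sqrt(np.diag(cov))

def mahalanobis(x):
    """Mahalanobis distance, which accounts for the correlation."""
    d = x - mean
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

typical = np.array([1.5, 1.5])   # both variables high together, as expected
unusual = np.array([-1.5, 1.5])  # one low and one high: an extreme ratio

z_typical, z_unusual = z_scores(typical), z_scores(unusual)
md_typical, md_unusual = mahalanobis(typical), mahalanobis(unusual)
```

Both points have z-scores of magnitude 1.5 on every variable, so a univariate screen treats them alike. The Mahalanobis distance is about 1.54 for `typical` and about 6.71 for `unusual`, because only the second point contradicts the correlation.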

Journal Article
TL;DR: In this article, a contaminated multivariate normal distribution with two parameters indicating the percentage of outliers and the degree of contamination is used to identify the multivariate outliers, which can then be eliminated to obtain approximately normal data.
Abstract: Multivariate outliers may be modeled using the contaminated multivariate normal distribution with two parameters indicating the percentage of outliers and the degree of contamination. Recent developments in elliptical distribution theory are used to determine estimators of these parameters. These estimators can be used with an index of Mahalanobis distance to identify the multivariate outliers, which can then be eliminated to obtain approximately normal data. The performance of the proposed estimators and outlier rejection procedures is evaluated in a small simulation study.

34 citations
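
To make the two-parameter contamination model concrete, the sketch below simulates it with NumPy under one common parameterization, assumed here rather than taken from the article: each point is an outlier with probability `eps` and, if so, is drawn with covariance `kappa * cov` around the same mean. The function name and all parameter values are hypothetical, and the article's estimators are not reproduced; distances are computed with the known true parameters.

```python
import numpy as np

def sample_contaminated_normal(n, mean, cov, eps, kappa, rng):
    """Draw n points; each is an outlier with probability eps (covariance kappa * cov)."""
    is_outlier = rng.random(n) < eps
    clean = rng.multivariate_normal(mean, cov, size=n)
    wide = rng.multivariate_normal(mean, kappa * cov, size=n)
    # Pick the inflated-covariance draw wherever the outlier flag is set.
    return np.where(is_outlier[:, None], wide, clean), is_outlier

rng = np.random.default_rng(42)
mean = np.zeros(2)
cov = np.array([[1.0, 0.6],
                [0.6, 1.0]])
X, is_outlier = sample_contaminated_normal(2000, mean, cov, eps=0.1, kappa=9.0, rng=rng)

# Squared Mahalanobis distances under the true (uncontaminated) parameters.
diff = X - mean
d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
mean_d2_clean = d2[~is_outlier].mean()
mean_d2_outlier = d2[is_outlier].mean()
```

Under this parameterization the contaminated points have squared distances about `kappa` times larger on average than the clean ones, which is what makes a Mahalanobis-based index usable for rejecting them.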


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 79% related
Artificial neural network: 207K papers, 4.5M citations, 79% related
Feature extraction: 111.8K papers, 2.1M citations, 77% related
Convolutional neural network: 74.7K papers, 2M citations, 77% related
Image processing: 229.9K papers, 3.5M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    208
2022    452
2021    232
2020    239
2019    249