Topic
Mahalanobis distance
About: Mahalanobis distance is a research topic. Over its lifetime, 4616 publications have been published within this topic, receiving 95294 citations.
Papers
01 Jun 2011
TL;DR: The results show that Mahalanobis distance is a useful technique for identifying both single-hour outliers and contiguous-time clusters whose component members are not, in themselves, highly deviant.
Abstract: Modeling the behavior of interacting humans in routine but complex activities has many challenges, not the least of which is that humans can be both purposive and negligent, and further can encounter unexpected environmental hazards requiring fast action. The challenge is to characterize and model the humdrum routine while at the same time capturing the deviations and anomalies which arise from time to time. Because of the disruptive impact that anomalies (such as accidents) can have and the importance of incorporating them in our models, this report focuses on one technique for identifying anomalies in complex behavior patterns, especially when there is no sharp demarcation between routine and unusual activity. The technique we evaluate is Mahalanobis distance, which is known to be useful for identifying outliers when data is multivariate normal. But the data we use for evaluation is deliberately and markedly non-multivariate normal, since that is what we confront in complex human systems. Specifically, we use one year's (2008) hourly traffic-volume data on a major multi-lane road (I-95) in one location in a major city (New York) with a dense population and several alternate routes. The traffic data is rich, large, incomplete, and reflects the effects of bad weather, accidents, routine fluctuations (rush hours versus dead of night), and one-time social events. The results show that Mahalanobis distance is a useful technique for identifying both single-hour outliers and contiguous-time clusters whose component members are not, in themselves, highly deviant.
34 citations
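For reference, the distance this entry evaluates is usually defined as below. The abstract does not spell out the formula, so the symbols are the standard textbook ones and are assumptions here: x is an observation vector of p variables, μ the mean vector, and Σ the covariance matrix.

```latex
D_M(\mathbf{x}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})}
```

When the data are multivariate normal with known μ and Σ, the squared distance follows a chi-square distribution with p degrees of freedom, which is the usual basis for an outlier cutoff. Since the traffic data in this report are described as deliberately non-normal, any such cutoff would be approximate at best.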
10 Jun 2001
TL;DR: In this paper, a Mahalanobis space was constructed from the wafers that had high yields, and the Mahalanobis distances of various wafers were calculated in order to study the relationship between yield and distance.
Abstract: The distribution of yield from the production lines is concentrated at a high-yield area and tapers down to the lower-yield area. Production management would find it useful if the yield of individual wafers could be forecast. The yield is determined by the variability of electrical characteristics and dust. In this study, only the variability of electrical characteristics was discussed. One product was selected for study, and a Mahalanobis space was constructed from the wafers that had high yields. The Mahalanobis distances of various wafers were calculated in order to study the relationship between yield and distance.
34 citations
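The idea of building a Mahalanobis space from a reference group and then measuring other items against it can be sketched roughly as follows. This is a minimal illustration and not the authors' procedure: the three "electrical characteristics", their values, and the function names `build_reference_space` and `mahalanobis_distances` are all hypothetical.

```python
import numpy as np

def build_reference_space(reference):
    """Return the mean vector and inverse covariance of the reference group.

    `reference` has one row per sample and one column per variable.
    """
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)
    return mean, np.linalg.inv(cov)

def mahalanobis_distances(samples, mean, inv_cov):
    """Mahalanobis distance of each row of `samples` from the reference mean."""
    diff = samples - mean
    # Quadratic form diff_i^T * inv_cov * diff_i for every row i
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(d2)

rng = np.random.default_rng(0)
# Hypothetical reference group: 200 high-yield wafers, 3 made-up characteristics
reference = rng.normal(loc=[1.0, 5.0, 0.2], scale=[0.05, 0.3, 0.01], size=(200, 3))
mean, inv_cov = build_reference_space(reference)

# Two hypothetical wafers: one typical, one far from the reference group
candidates = np.array([[1.0, 5.0, 0.2], [1.3, 6.5, 0.25]])
distances = mahalanobis_distances(candidates, mean, inv_cov)
print(distances)
```

A wafer close to the reference mean gets a small distance and one far from it gets a large distance; relating that distance to measured yield is the step the study itself performs.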
TL;DR: In this article, a remaining useful life (RUL) prediction model for rolling bearings is established to address the poor prediction performance obtained with a single performance degradation indicator; it fuses improved independent component analysis (IICA) and Mahalanobis distance (MD) into a degradation indicator.
Abstract: Aiming at the problem of poor prediction performance of rolling bearing remaining useful life (RUL) with a single performance degradation indicator, a novel RUL prediction model based on a performance degradation indicator is established. Firstly, the vibration signal of the rolling bearing is decomposed into intrinsic scale components (ISCs) by piecewise cubic Hermite interpolating polynomial-local characteristic-scale decomposition (PCHIP-LCD), and the effective ISCs are selected to reconstruct signals based on kurtosis-correlation coefficient (K-C) criteria. Secondly, the multi-dimensional degradation feature set of the reconstructed signals is extracted, and then the sensitive degradation indicator IICAMD is calculated by fusing the improved independent component analysis (IICA) and Mahalanobis distance (MD). Thirdly, the false fluctuation of the IICAMD is repaired by using the gray regression model (GM) to obtain the health indicator (HI) of the rolling bearing, and the start prediction time (SPT) of the rolling bearing is determined according to the time mutation point of HI. Finally, a generalized regression neural network (GRNN) model based on HI is constructed to predict the RUL of the rolling bearing. The experimental results on two groups of different rolling bearing datasets show that the proposed method achieves better performance in prediction accuracy and reliability.
34 citations
TL;DR: In this article, the authors argue that multivariate methods such as the Mahalanobis distance, the Jackknife distance, p-values, or Hadi's method can better identify potential outliers in correlated water quality data and avoid eliminating valid data.
Abstract: Evaluating water quality data for outliers is a good quality control/quality assessment procedure whether the data are used for monitoring or for modeling. Often water quality data are correlated, e.g., carbonaceous biochemical oxygen demand (CBOD) has some correlation with NH3. Univariate methods for identifying outliers do not consider the correlation between variables and may identify too many data points as outliers or miss observations which have extreme ratios between variables, e.g., a raw wastewater sample with relatively low CBOD but high NH3. Testing for outliers using multivariate methods such as the Mahalanobis distance, Jackknife distance, p-values, or Hadi's method automatically incorporates the correlation or covariance between variables and is fundamentally more correct. Such multivariate methods can better identify potential outliers and avoid eliminating valid data.
34 citations
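The point about extreme ratios between correlated variables can be illustrated with a small sketch. The numbers below are made up and are not water quality data from the article: two strongly correlated columns stand in for measurements such as CBOD and NH3, and a hypothetical suspect sample has a low first value paired with a high second value.

```python
import numpy as np

# Hypothetical, strongly correlated pair of measurements (made-up values)
x = np.linspace(10.0, 50.0, 41)
noise = np.where(np.arange(41) % 2 == 0, 1.0, -1.0)  # small deterministic scatter
y = 0.5 * x + noise
data = np.column_stack([x, y])

# Suspect sample: each value is unremarkable alone, but the ratio is unusual
suspect = np.array([15.0, 24.0])

mean = data.mean(axis=0)
std = data.std(axis=0, ddof=1)

# Univariate view: absolute z-score of each variable separately
z_scores = np.abs((suspect - mean) / std)

# Multivariate view: Mahalanobis distance using the full covariance matrix
inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
diff = suspect - mean
md = float(np.sqrt(diff @ inv_cov @ diff))

print("per-variable z-scores:", z_scores)
print("Mahalanobis distance:", md)
```

Here both z-scores stay below 2, so a per-variable screen would pass the sample, while the Mahalanobis distance is large because the sample breaks the relationship between the two variables.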
TL;DR: In this article, multivariate outliers are modeled with a contaminated multivariate normal distribution whose two parameters indicate the percentage of outliers and the degree of contamination; estimators of these parameters are used with an index of Mahalanobis distance to identify the outliers, which can then be eliminated to obtain approximately normal data.
Abstract: Multivariate outliers may be modeled using the contaminated multivariate normal distribution with two parameters indicating the percentage of outliers and the degree of contamination. Recent developments in elliptical distribution theory are used to determine estimators of these parameters. These estimators can be used with an index of Mahalanobis distance to identify the multivariate outliers, which can then be eliminated to obtain approximately normal data. The performance of the proposed estimators and outlier rejection procedures is evaluated in a small simulation study.
34 citations
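One common way to write a contaminated multivariate normal density is shown below. The abstract does not give the exact parameterization, so this form and its symbols are assumptions: ε is the proportion of outliers, κ > 1 is the degree of contamination (a variance inflation factor), and φ_p is the p-variate normal density.

```latex
f(\mathbf{x}) = (1 - \varepsilon)\, \phi_p(\mathbf{x};\, \boldsymbol{\mu}, \boldsymbol{\Sigma})
              + \varepsilon\, \phi_p(\mathbf{x};\, \boldsymbol{\mu}, \kappa \boldsymbol{\Sigma}),
\qquad 0 \le \varepsilon \le 1,\ \kappa > 1
```

Under this form the outlying component shares the mean and the shape of the main component, so observations drawn from it tend to have larger Mahalanobis distances, which is what makes a distance-based rejection rule plausible.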