
Showing papers on "Mahalanobis distance published in 1986"


Journal ArticleDOI
TL;DR: In this article, the term standard distance is proposed for the standardized mean difference of univariate analysis and is generalized to the multivariate situation, where it coincides with the square root of the Mahalanobis distance between two samples.
Abstract: We propose to use the term standard distance for the standardized difference between two means in univariate analysis and show that it can be easily generalized to the multivariate situation, where it coincides with the square root of the Mahalanobis distance between two samples.

245 citations
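Spelling the relation out (the notation here is mine, not the authors'): with sample means $\bar x_1, \bar x_2$ and pooled standard deviation $s$, the univariate standard distance is

\[ D = \frac{|\bar x_1 - \bar x_2|}{s}, \]

and the multivariate version replaces the variance $s^2$ by the pooled covariance matrix $S$:

\[ D = \sqrt{(\bar x_1 - \bar x_2)^\top S^{-1} (\bar x_1 - \bar x_2)}, \]

the square root of the squared Mahalanobis distance $D^2$ between the two samples.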


Journal ArticleDOI
TL;DR: An adaptive rule which selects k by iteratively maximizing the local Mahalanobis distance is shown to be efficient, obviating the need to know the underlying population variance-covariance structure.
Abstract: A simulation study was performed to investigate the sensitivity of the k-nearest neighbor (NNk) rule of classification to the choice of k. The optimal choice of k was found to be a function of the dimension of the sample space, the size of the space, the covariance structure and the sample proportions. The nearest neighbor rules chosen using the k suggested by the simulations had correct classification rates at least as high as those for the linear discriminant function and the logistic regression method. In particular, the rule became more efficient as the difference in the covariance matrices increased, and also when the difference in sample proportions was large. An adaptive rule which selects k by iteratively maximizing the local Mahalanobis distance is shown to be efficient, obviating the need to know the underlying population variance-covariance structure.

83 citations
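The adaptive choice of k is the paper's contribution; as a point of reference, here is a minimal NumPy sketch of the fixed-k Mahalanobis nearest-neighbor vote such a rule builds on (the function name, the covariance estimate pooled over all training data, and the toy data are my illustration, not the authors' code):

```python
import numpy as np

def mahalanobis_knn_predict(X_train, y_train, x, k, VI):
    """Majority vote among the k training points nearest to x,
    with nearness measured by Mahalanobis distance (VI = inverse covariance)."""
    diffs = X_train - x
    d2 = np.einsum('ij,jk,ik->i', diffs, VI, diffs)  # squared Mahalanobis distances
    nearest = np.argsort(d2)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(1.5, 1.0, (50, 2))])
y = np.repeat([0, 1], 50)
VI = np.linalg.inv(np.cov(X, rowvar=False))  # here: one covariance pooled over all data
print(mahalanobis_knn_predict(X, y, np.array([1.2, 1.4]), k=5, VI=VI))
```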


Journal ArticleDOI
TL;DR: In this article, the univariate weak convergence theorem of Murota and Takeuchi (1981) is extended for the Mahalanobis transform of the empirical characteristic function, and a maximal deviation statistic is proposed for testing the composite hypothesis of $d$-variate normality.
Abstract: The univariate weak convergence theorem of Murota and Takeuchi (1981) is extended for the Mahalanobis transform of the $d$-variate empirical characteristic function, $d \geq 1$. Then a maximal deviation statistic is proposed for testing the composite hypothesis of $d$-variate normality. Fernique's inequality is used in conjunction with a combination of analytic, numerical analytic, and computer techniques to derive exact upper bounds for the asymptotic percentage points of the statistic. The resulting conservative large sample test is shown to be consistent against every alternative with components having a finite variance. (If $d = 1$ it is consistent against every alternative.) Monte Carlo experiments and the performance of the test on some well-known data sets are also discussed.

70 citations
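For orientation, the objects involved can be written down as follows; the norming and the truncation radius $T$ below are illustrative, not necessarily the paper's exact choices. With the Mahalanobis transform $Y_j = S_n^{-1/2}(X_j - \bar X_n)$, the empirical characteristic function is

\[ \hat\varphi_n(t) = \frac{1}{n} \sum_{j=1}^{n} e^{\,i\,t^\top Y_j}, \qquad t \in \mathbb{R}^d, \]

and a maximal deviation statistic compares it with the standard normal characteristic function,

\[ M_n = \sqrt{n}\, \sup_{\|t\| \le T} \bigl| \hat\varphi_n(t) - e^{-\|t\|^2/2} \bigr|, \]

rejecting $d$-variate normality when $M_n$ exceeds the conservative asymptotic percentage point.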


Journal ArticleDOI
TL;DR: In this article, furthest-neighbor cluster analysis is used to select experimental design points from existing series of candidates when the design variables are too interrelated to be manipulated independently.
Abstract: This article concerns the selection of experimental design points from existing series of candidates when the design variables are too interrelated to be manipulated independently. Designs with an even spread of points are shown to estimate the parameters of an assumed linear or polynomial model reasonably efficiently while providing good tests of lack of fit. Furthest-neighbor cluster analysis can be used to select the points of such a design under either the Euclidean or the Mahalanobis measure of distance. The technique is used to select the base fuels in actual series of experiments to measure the effect of blending a particular alcohol into gasolines. A new blending model parameterization is proposed, which relates the blending octane number of this alcohol to both its concentration and the properties of the base fuel. An analogous generalized least squares model is discussed, which gives a simple expression for the expected mean squares in different error strata.

32 citations
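A rough Python rendering of the selection step (SciPy's complete linkage is the furthest-neighbor criterion; the one-representative-per-cluster rule and all names are my simplification, not the authors' procedure):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

def spread_design(candidates, n_points, use_mahalanobis=True):
    """Pick n_points candidates with an even spread: cut a furthest-neighbor
    (complete-linkage) cluster tree into n_points clusters, keep one member of each."""
    if use_mahalanobis:
        VI = np.linalg.inv(np.cov(candidates, rowvar=False))
        d = pdist(candidates, metric='mahalanobis', VI=VI)
    else:
        d = pdist(candidates)  # Euclidean
    labels = fcluster(linkage(d, method='complete'), t=n_points, criterion='maxclust')
    # representative per cluster: here simply its first member
    return np.array([np.where(labels == c)[0][0] for c in np.unique(labels)])

rng = np.random.default_rng(1)
candidates = rng.normal(size=(40, 3))
print(spread_design(candidates, 8))
```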


Journal ArticleDOI
TL;DR: It is shown that neither Z nor W is uniformly superior to the other; their relative performance depends on the extent of correlation among the training observations and on the size of the separation between the two populations, as measured by the Mahalanobis distance.

12 citations


Journal ArticleDOI
TL;DR: In this article, Monte Carlo estimates have been obtained for the unconditional probability of misclassification incurred by the "estimative" optimum allocation rule in discriminant analysis involving mixtures of binary and continuous variables.
Abstract: Monte Carlo estimates have been obtained for the unconditional probability of misclassification incurred by the “estimative” optimum allocation rule in discriminant analysis involving mixtures of binary and continuous variables. The rule is based on the location model and leads effectively to a different linear discriminant function for each of the multinomial locations defined by the binary variables. A comparison is made between the Monte Carlo estimates and an approximation based on an asymptotic expansion of the distribution of the location “estimative” linear discriminant function derived by Vlachonikolis. Results are presented for various combinations involving equal sample sizes of 50, 100 and 200; two and three binary variables; one, three and five continuous variables; three different settings of location Mahalanobis distances and several choices of location probabilities.

7 citations
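In outline, the location-model rule fits a separate linear discriminant for each binary pattern. The sketch below is a heavily simplified reconstruction (it pools the covariance only within each cell and assumes every cell contains observations from both groups, whereas the location model proper pools across cells; all names are mine):

```python
import numpy as np

def fit_location_lda(B, X, y):
    """One linear discriminant per 'location' (binary pattern in B).
    B: (n, q) 0/1 array; X: (n, p) continuous; y: (n,) labels in {0, 1}.
    Returns {pattern: (w, c)}; at that location, assign x to group 1 if w @ x > c."""
    rules = {}
    for pattern in {tuple(row) for row in B}:
        m = np.all(B == np.array(pattern), axis=1)
        X0, X1 = X[m & (y == 0)], X[m & (y == 1)]  # assumes both are non-empty
        S = np.cov(np.vstack([X0 - X0.mean(0), X1 - X1.mean(0)]), rowvar=False)
        w = np.linalg.solve(np.atleast_2d(S), X1.mean(0) - X0.mean(0))
        c = float(w @ (X0.mean(0) + X1.mean(0)) / 2)
        rules[pattern] = (w, c)
    return rules
```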


Book ChapterDOI
01 Jan 1986
TL;DR: In this article, the authors discuss three multivariate techniques, namely discriminant analysis, cluster analysis and canonical correlation analysis; for each of these three techniques, examples are given in the literature which use PCA as a dimension-reducing technique.
Abstract: Principal component analysis is often used as a dimension-reducing technique within some other type of analysis. For example, Chapter 8 described the use of PCs as regressor variables in a multiple regression analysis. The present chapter discusses three multivariate techniques, namely discriminant analysis, cluster analysis and canonical correlation analysis; for each of these three techniques, examples are given in the literature which use PCA as a dimension-reducing technique.

4 citations
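A compact modern rendering of the PCA-then-discriminant-analysis pattern the chapter discusses, using scikit-learn (the dataset and the choice of two components are arbitrary illustrations):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
# Reduce to two principal components, then discriminate in the reduced space.
model = make_pipeline(PCA(n_components=2), LinearDiscriminantAnalysis())
print(cross_val_score(model, X, y, cv=5).mean())
```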


Journal ArticleDOI
TL;DR: In this paper, the classification performance of forward subset selection procedures designed for the two-group, P-variate normal classification problem was examined in Monte Carlo studies of 54 cases in which the P measurements were statistically independent and provided an optimal probability of correct classification of 90%.
Abstract: The classification performance of forward subset selection procedures designed for use in the two-group, P-variate normal classification problem was examined in Monte Carlo studies of 54 cases where the P measurements were statistically independent and provided an optimal probability of correct classification of 90%. The cases were characterized by differing reference sample sizes, sample size ratios and different rates at which the Mahalanobis distances would increase if the forward selection algorithm were applied to the population parameters. Classification performance appears to depend on these underlying rates, which would be unknown in practice. Therefore, a uniform specification of optimal "significance levels" for the standard F tests cannot be made. A two-stage subset selection procedure which involves determining this rate before applying the F tests is suggested.

4 citations
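The core of such a procedure is a greedy loop that scores each candidate variable by the increase in the sample Mahalanobis distance $D^2$ it buys. The NumPy sketch below shows only that loop; the F-to-enter tests and the paper's two-stage rate estimation are omitted, and all names are mine:

```python
import numpy as np

def mahalanobis_d2(X1, X2, cols):
    """Sample D^2 between two groups, using only the variables in cols."""
    A, B = X1[:, cols], X2[:, cols]
    diff = A.mean(0) - B.mean(0)
    Sp = ((len(A) - 1) * np.cov(A, rowvar=False)
          + (len(B) - 1) * np.cov(B, rowvar=False)) / (len(A) + len(B) - 2)
    return float(diff @ np.linalg.solve(np.atleast_2d(Sp), diff))

def forward_select(X1, X2, n_vars):
    """Greedily add the variable that most increases D^2 at each step."""
    chosen, remaining = [], list(range(X1.shape[1]))
    for _ in range(n_vars):
        best = max(remaining, key=lambda j: mahalanobis_d2(X1, X2, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```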


Journal ArticleDOI
TL;DR: In this paper, the authors consider the multiple decision problem of subset selection, restricting attention to procedures which control the probability that the best population is selected, and derive necessary and sufficient conditions for both pointwise and uniform (on compact sets) consistency.

2 citations


Proceedings ArticleDOI
D. Friedman
01 Apr 1986
TL;DR: Vowel classification is considered from the viewpoint of cluster separation in a vector space, with Mahalanobis distance as the criterion and the number of significant axes of variation needed to characterize each speaker is found to be on the order of four.
Abstract: Vowel classification is considered from the viewpoint of cluster separation in a vector space, with Mahalanobis distance as the criterion. The number of significant axes of variation needed to characterize each speaker, weighted with respect to cluster separation, is found from actual formant data to be on the order of four, and the potential improvement in separation attributable to structure in the data is estimated at about 3 dB by comparison with results for the same procedure applied to random data.

2 citations
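One plausible reconstruction of the computation, reading "axes of variation weighted with respect to cluster separation" as an eigen-analysis of the between-cluster scatter after whitening by the pooled within-cluster covariance (this reading, and the code, are mine, not the paper's):

```python
import numpy as np

def separation_axes(X, labels):
    """Eigenvalues of the between-cluster scatter in the space whitened by the
    pooled within-cluster covariance; large values mark axes that contribute
    to Mahalanobis cluster separation. Assumes each cluster has >= 2 points
    and the pooled covariance is positive definite."""
    classes = np.unique(labels)
    means = np.array([X[labels == c].mean(0) for c in classes])
    Sw = sum((np.sum(labels == c) - 1) * np.cov(X[labels == c], rowvar=False)
             for c in classes) / (len(X) - len(classes))
    W = np.linalg.inv(np.linalg.cholesky(Sw))      # whitening transform
    Sb = np.cov(means @ W.T, rowvar=False)         # between-cluster scatter, whitened
    return np.sort(np.linalg.eigvalsh(np.atleast_2d(Sb)))[::-1]
```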


Journal ArticleDOI
TL;DR: In this paper, a random sample of eleventh-grade students were asked about certain work values, as were the fathers of a subsample of these students; the question raised was whether the students were closer to their fathers or to their age group with regard to those work values.
Abstract: A random sample of eleventh-grade students were asked about certain work values. For a subsample of these students, their fathers were also asked about the same work values. A question raised in the study was whether eleventh-grade students were closer to their fathers or to their age group with regard to those work values. To answer this question, we suggest a method based on Mahalanobis distances; two aspects of the problem are discussed. First, how the distances are constructed; second, which statistical procedures apply for assessing significant differences among the distances. Formal statistical tests are employed, as well as graphical methods for summarizing the results. Unlike classical methods, the procedure suggested does not require distributional assumptions.

Journal ArticleDOI
TL;DR: The conclusions are that the combination of the principal component analysis method and the method by Mahalanobis's distance is useful in saving manpower for routine judgment of the same data produced in the same line for several measured items.
Abstract: Research and development of LSIs require rapid evaluation of the characteristics of fabricated devices. For this purpose, automatic measurement systems have been developed in various laboratories. Some outlying data unavoidably exist in the data collected by the automatic data acquisition system, and it is necessary to judge the outliers in data processing. In this paper two algorithms to judge outliers are examined and one new algorithm is presented. They are applied to Si wafer inspection data collected automatically. The three algorithms are the outlier judgment method proposed by Grubbs, the judgment method using Mahalanobis's distance, and the method combining the principal component analysis method with the latter. The conclusions are as follows: (1) Grubbs's method is easy to use for one kind of measured item. (2) Mahalanobis's method is useful for several kinds of measured items and is especially effective when they are correlated. (3) The combination of the principal component analysis method and the method by Mahalanobis's distance is useful in saving manpower for routine judgment of the same data produced in the same line for several measured items.
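For concreteness, a small sketch of the second and third methods, i.e. Mahalanobis distance alone and PCA followed by Mahalanobis distance (the chi-square cutoff and all names are my choices, not necessarily the paper's):

```python
import numpy as np
from scipy import stats

def mahalanobis_outliers(X, alpha=0.01, n_components=None):
    """Flag rows whose squared Mahalanobis distance from the mean exceeds the
    chi-square(1 - alpha) quantile; optionally project onto the leading
    n_components principal components first (the combined method)."""
    Z = X - X.mean(0)
    if n_components is not None:
        _, _, Vt = np.linalg.svd(Z, full_matrices=False)
        Z = Z @ Vt[:n_components].T                # scores on leading components
    VI = np.linalg.inv(np.atleast_2d(np.cov(Z, rowvar=False)))
    d2 = np.einsum('ij,jk,ik->i', Z, VI, Z)        # squared Mahalanobis distances
    return d2 > stats.chi2.ppf(1 - alpha, df=Z.shape[1])
```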

Journal ArticleDOI
TL;DR: In this paper, the authors generalize the notion of classification of an observation (sample) into one of the given n populations to the case where some or all of the populations into which the new observation is to be classified may be new but related in a simple way to the given n populations.
Abstract: In this paper, we generalize the notion of classification of an observation (sample) into one of the given n populations to the case where some or all of the populations into which the new observation is to be classified may be new but related in a simple way to the given n populations. The discussion is in the framework of the given set of observations obeying the usual multivariate general linear hypothesis model. The set of populations into which the new observation may be classified could be linear manifolds of the parameter space, their closed subsets, closed convex subsets, a combination of these, or simply t subsets of the parameter space each of which has a finite number of elements. In the last case a likelihood ratio procedure can be obtained easily. Classification procedures given here are based on Mahalanobis distance. A Bonferroni lower bound estimate of the probability of correctly classifying an observation is given for the case when the covariance matrix is known or is estimated from a l...
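The paper treats far more general target sets (linear manifolds, closed convex subsets); the elementary building block, assigning an observation to the nearest of n known population means in Mahalanobis distance, is simply (an illustrative sketch with my own names):

```python
import numpy as np

def classify(x, means, VI):
    """Index of the population whose mean is nearest to x in Mahalanobis
    distance; VI is the inverse of the known common covariance matrix."""
    d2 = [(x - m) @ VI @ (x - m) for m in means]
    return int(np.argmin(d2))

means = [np.zeros(2), np.array([3.0, 1.0])]
print(classify(np.array([2.5, 0.5]), means, np.eye(2)))  # -> 1
```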

Journal ArticleDOI
TL;DR: In this article, the authors describe two BASIC computer programs that calculate Hotelling's T2 statistic either for one sample or for two samples, and output of the program includes the Mahalanobis distance D2, the F ratio associated with T2, and its probability level.
Abstract: This paper describes two BASIC computer programs that calculate Hotelling's T2 statistic either for one sample or for two samples. Output of the program includes the Mahalanobis distance D2, the F ratio associated with T2, and its probability level.
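The reported quantities are related by standard formulas; a Python equivalent of the two-sample case might look as follows (a sketch, not a transcription of the BASIC programs):

```python
import numpy as np
from scipy import stats

def hotelling_two_sample(X1, X2):
    """Two-sample Hotelling T^2: returns Mahalanobis D^2, T^2, F ratio, p-value."""
    n1, n2, p = len(X1), len(X2), X1.shape[1]
    d = X1.mean(0) - X2.mean(0)
    Sp = ((n1 - 1) * np.cov(X1, rowvar=False)
          + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)  # pooled covariance
    D2 = float(d @ np.linalg.solve(np.atleast_2d(Sp), d))
    T2 = n1 * n2 / (n1 + n2) * D2
    F = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
    return D2, T2, F, stats.f.sf(F, p, n1 + n2 - p - 1)
```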