Showing papers on "Outlier" published in 1975


Journal ArticleDOI
TL;DR: In this article, two graphical methods are proposed for identifying bivariate observations that may unduly influence the sample correlation coefficient. Robust estimators of correlation are also developed, and a Monte Carlo study compares them with other well-known estimators.
Abstract: Two graphical methods are proposed for identifying bivariate observations that may unduly influence the sample correlation coefficient. Secondly, robust estimators of correlation are developed and a Monte Carlo comparative study is made of these and other well-known estimators. Also considered are methods for developing positive-definite estimates of correlation matrices, and extensions of robustness to other problems such as regression are mentioned.

355 citations
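
As a rough illustration of what it means for an observation to unduly influence the sample correlation coefficient, the sketch below measures how much r changes when each observation is deleted in turn. This leave-one-out check is a simple stand-in chosen for illustration, not the graphical methods of the article; the function names `pearson` and `leave_one_out_influence` and the data are hypothetical.

```python
import math


def pearson(x, y):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def leave_one_out_influence(x, y):
    """Return r(full sample) - r(sample without point i) for every i."""
    x, y = list(x), list(y)
    r_full = pearson(x, y)
    return [
        r_full - pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        for i in range(len(x))
    ]


# Hypothetical data: five points near a line plus one discordant point.
x = [1, 2, 3, 4, 5, 10]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 0.0]
changes = leave_one_out_influence(x, y)
most_influential = max(range(len(x)), key=lambda i: abs(changes[i]))
print(most_influential, round(changes[most_influential], 3))
```

A large absolute change flags a point worth inspecting; it does not by itself establish that the point is an outlier.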


Journal ArticleDOI
TL;DR: In this article, the authors propose several "many outlier" procedures for detecting more than one outlier in a sample, compare them with one another, and give percentage points for the best of them for various sample sizes.
Abstract: This article is concerned with "many outlier" procedures, i.e., procedures that can detect more than one outlier in a sample. Several many outlier procedures are proposed in Section 2 and via power comparisons in Section 3 are found to be much superior to one outlier procedures in detecting many outliers. We then compare several different many outlier procedures in Section 4 and find that the procedure based on the extreme studentized deviate (ESD) is slightly the best. Finally, 5%, 1% and 0.5% points are given for the ESD procedure for various sample sizes in Section 5.

257 citations
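
The ESD idea can be sketched as follows, assuming the usual definition of the extreme studentized deviate as max |x - mean| / s, recomputed after each most extreme point is removed. The function name `esd_statistics` and the data are hypothetical, and the percentage points needed to turn these statistics into a test are tabulated in the article and are not reproduced here.

```python
import statistics


def esd_statistics(sample, k):
    """Return [(R_1, removed_1), ..., (R_k, removed_k)].

    R_i is max |x - mean| / s on the sample that remains after the
    i - 1 previously most extreme points have been removed.
    """
    remaining = [float(v) for v in sample]
    if k < 1 or len(remaining) - k < 2:
        raise ValueError("need k >= 1 and at least k + 2 observations")
    results = []
    for _ in range(k):
        mean = statistics.fmean(remaining)
        sd = statistics.stdev(remaining)
        if sd == 0:
            raise ValueError("remaining observations are all equal")
        # Point farthest from the current mean (first one on ties).
        extreme = max(remaining, key=lambda v: abs(v - mean))
        results.append((abs(extreme - mean) / sd, extreme))
        remaining.remove(extreme)
    return results


# Hypothetical sample with two suspicious values.
for r_i, value in esd_statistics([2.1, 2.4, 1.9, 2.2, 2.0, 9.5, 2.3, 7.8], k=2):
    print(round(r_i, 3), value)
```

Each R_i would then be compared with the tabulated point for the corresponding sample size and significance level.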


Journal ArticleDOI
TL;DR: In this article, a test for the largest residual being an outlier is implemented through table development, and tables of critical values for tests at levels (α ≤ 0.10, 0.05, and 0.01) are included.
Abstract: Residuals from fit are often examined in regression analysis. A test suggested by Ellenberg [5] and Prescott [7] for the largest residual being an outlier is implemented through table development. Tables of critical values for tests at levels (α ≤ 0.10, 0.05, and 0.01) are included.

234 citations


Journal ArticleDOI
TL;DR: In this article, a test statistic for detecting outliers in linear models involving residuals standardized by their individual standard deviations is considered and it is suggested that its critical values are adequately approximated by upper bounds for the critical values of a similar test statistic involving residual values standardized by a constant standard deviation.
Abstract: A test statistic for detecting outliers in linear models involving residuals standardized by their individual standard deviations is considered and it is suggested that its critical values are adequately approximated by upper bounds for the critical values of a similar test statistic involving residuals standardized by a constant standard deviation. The test procedure is applicable to any linear model and does not require a re-analysis with the suspected outlier omitted or treated as a missing value. Two regression analyses are given to illustrate the procedure.

96 citations
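
A minimal sketch of residuals standardized by their individual standard deviations, assuming a simple linear regression with one predictor: each residual e_i is divided by s * sqrt(1 - h_ii), where h_ii is the leverage of point i. The function name `max_standardized_residual` and the data are hypothetical, and the critical values for the maximum are not reproduced here.

```python
import math


def max_standardized_residual(x, y):
    """Fit y = a + b*x by least squares and return (index, value) of the
    residual that is largest in absolute value after dividing each
    residual by its own estimated standard deviation."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - 2)  # residual mean square
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    if s2 == 0 or any(h >= 1.0 for h in leverage):
        raise ValueError("standardized residuals are undefined for this data")
    standardized = [
        e / math.sqrt(s2 * (1.0 - h)) for e, h in zip(residuals, leverage)
    ]
    index = max(range(n), key=lambda i: abs(standardized[i]))
    return index, standardized[index]


# Hypothetical data with one discordant response at x = 5.
print(max_standardized_residual([1, 2, 3, 4, 5, 6],
                                [2.1, 3.9, 6.2, 7.8, 20.0, 12.1]))
```

The whole computation uses a single fit, which matches the abstract's point that no re-analysis with the suspected outlier omitted is required.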


Journal ArticleDOI
TL;DR: In this paper, the statistical properties of conventional residuals from additivity in two-way tables are investigated. The results are based mainly on empirical sampling and involve average values, correlation properties, and the use of the W-statistic (Shapiro and Wilk [24]) and of probability plotting methods.
Abstract: This paper deals with some of the statistical properties of, and methods of analysis for, conventional residuals from additivity in two-way tables. Attention is given to three cases: (i) normal fluctuations superimposed on an additive model; (ii) one outlier added to the condition described in (i); and (iii) two outliers superimposed on the condition in (i). The results are based mainly on empirical sampling and involve average values, correlation properties, the use of the W-statistic (Shapiro and Wilk [24]) and of probability plotting methods. Generally speaking, in the null case of no outliers, the residuals do behave much like a normal sample. When one outlier is present, the direct statistical treatment of residuals provides a complete basis for data-analytic judgments, especially through judicious use of probability plots. When two outliers are present, however, the resulting residuals will often not have any noticeable statistical peculiarities.

48 citations


Posted Content
TL;DR: In this paper, a modification to the w-estimator is proposed that is robust to outlier contamination even in small samples, given a sufficiently good preliminary estimator. A candidate for a preliminary slope estimator based on the data is proposed and its performance under simulation examined.
Abstract: The estimator holding the central place in the theory of the multivariate "errors-in-the-variables" (EV) model results from performing orthogonal regression on variables rescaled according to the covariance matrix of the errors [7]. Our first principal finding, via Monte Carlo on the univariate model, essentially relegates this estimator to use only in large samples on very well-behaved data, i.e., with no trace of outlier contamination. A modification, requiring a robust preliminary slope, is proposed that essentially sets out the generalization to EV of the w-estimator in regression. It is demonstrated that the modification is robust to outlier contamination even in small samples, given a sufficiently good preliminary estimator. A candidate for a preliminary slope estimator based on the data is proposed and its performance under simulation examined. Least-absolute residuals estimation in EV is cited as an alternative candidate.

46 citations
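
For orientation, the classical estimator that the abstract starts from can be sketched for the univariate case as orthogonal regression after rescaling by the error variances (often called Deming regression). The sketch assumes the ratio of the y-error variance to the x-error variance is known; the function name `deming_fit` and the parameter `error_variance_ratio` are hypothetical, and the robust modification proposed in the paper is not reproduced here.

```python
import math


def deming_fit(x, y, error_variance_ratio=1.0):
    """Slope and intercept of the errors-in-variables line.

    error_variance_ratio is var(errors in y) / var(errors in x);
    a value of 1.0 gives ordinary orthogonal regression.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least 2 paired observations")
    if error_variance_ratio <= 0:
        raise ValueError("error_variance_ratio must be positive")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("slope is undefined when the sample covariance is 0")
    lam = error_variance_ratio
    diff = syy - lam * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4.0 * lam * sxy ** 2)) / (2.0 * sxy)
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data scattered around y = 1 + 2x.
print(deming_fit([0.0, 1.1, 1.9, 3.2, 4.0], [1.2, 2.9, 5.1, 7.0, 9.1]))
```

Because the sums of squares and cross-products enter directly, a single contaminated point can move this slope substantially, which is the weakness the abstract reports.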



Journal ArticleDOI
TL;DR: A procedure is developed, using a Bayesian cost criterion, to detect and eliminate outliers from a data base and at the same time provide estimates of the state of a dynamical system.
Abstract: An outlier is a data point that contains no information about the system to be estimated. A procedure is developed, using a Bayesian cost criterion, to detect and eliminate outliers from a data base and at the same time provide estimates of the state of a dynamical system. The approach is applied to a Gauss-Markov discrete-time system and to a parameter estimation problem. For the latter case, exact solutions of estimator bias and covariance are obtained and conditions for filter divergence are discussed. The approach in this paper differs from others in that a maximum a posteriori estimate is obtained over long block lengths of data so that clustering schemes can be employed.

10 citations


Journal ArticleDOI
TL;DR: The joint distribution of two weighted residuals for a normal theory regression model is derived and some of its properties are studied in this article, where a useful bound depending on the residual variances for the correlation coefficient between any two residuals is obtained.
Abstract: The joint distribution of two weighted residuals for a normal theory regression model is derived and some of its properties are studied. A useful bound depending on the residual variances for the correlation coefficient between any two residuals is obtained. An application of this bound in the detection of a single outlier is also considered.

3 citations


01 Jan 1975
TL;DR: A novel system for pattern recognition in unsupervised environments which combines the conceptual elegance of clustering schemes based on inter-sample distance measures with the computational simplicity of histogram approaches, is presented in this study.
Abstract: A novel system for pattern recognition in unsupervised environments, which combines the conceptual elegance of clustering schemes based on inter-sample distance measures with the computational simplicity of histogram approaches, is presented in this study. The multi-dimensional histogram of the entire data set is first derived and, by scanning this histogram space, the significant hills therein are identified. The centroids of these hills are deemed to be representative of the given input sample set. This representative pseudo-sample set is then input to the CURRY system (International Journal of System Science, Vol. 6, No. 1, January 1975, pp 23-32), which has the innovative capability of self learning the number of clusters inherent in the environment, to derive the nuclei of these inherent clusters. The total input data set is then clustered with these cluster nuclei as prototypes. The two major advantages of this new approach are: (i) the conceptual satisfaction of lessening the sensitivity of clustering approaches based on inter-sample distance measures to individual outliers of the sample distributions, through selection of a representative pseudo-sample set; and (ii) the computational economy achieved in processing large data sets, such as those arising in remote sensing environments, through the choice of a significantly smaller but representative subset of pseudo-samples.

2 citations


01 Jul 1975
TL;DR: In this article, a sequential test of a simple hypothesis of the distribution of a random variable against a simple alternate hypothesis is proposed, which terminates as soon as one of a sequence of sequentially observed sample medians falls outside a 'continuation region'.
Abstract: A sequential test of a simple hypothesis of the distribution of a random variable against a simple alternate hypothesis is proposed. The test terminates as soon as one of a sequence of sequentially observed sample medians falls outside a 'continuation region'. The test can also be used for hypotheses concerning the median of the sampled population, and is especially useful when hypothesized distributions may provide poor fit in the tails, in which case 'outliers' may seriously degrade the performance of traditional procedures such as the Sequential Probability Ratio Test. Applications to testing hypotheses about the circular error probable of weapon systems are discussed, and tables of stopping bounds for such tests are presented.