
Showing papers on "Outlier published in 1978"


Journal ArticleDOI
TL;DR: In this paper, the influence function is used to develop criteria for detecting outliers in discriminant analysis; for Mahalanobis' D2 it is a quadratic function of the deviation of the discriminant score for the perturbed observation from the discriminant score for the mean of the corresponding group.
Abstract: The influence function is used to develop criteria for detecting outliers in discriminant analysis. For Mahalanobis' D2, the influence function is a quadratic function of the deviation of the discriminant score for the perturbed observation from the discriminant score for the mean of the corresponding group. A chi-squared approximation to the null distribution of the influence function values appears to be suitable for graphical representation.
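As a rough illustration of the quantities involved (a minimal sketch, not the paper's influence-function formula), the following Python snippet computes pooled-covariance discriminant coefficients, Mahalanobis' D2 between two simulated groups, and the squared deviation of each observation's discriminant score from its group-mean score as an outlier diagnostic; the data and the planted outlier are invented.

    import numpy as np

    rng = np.random.default_rng(0)

    # Two simulated groups with a common covariance; one planted outlier.
    g1 = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=50)
    g2 = rng.multivariate_normal([2, 1], [[1, 0.5], [0.5, 1]], size=50)
    g1[0] = [6.0, -4.0]                                   # planted outlier in group 1

    m1, m2 = g1.mean(axis=0), g2.mean(axis=0)
    S = ((len(g1) - 1) * np.cov(g1.T) + (len(g2) - 1) * np.cov(g2.T)) \
        / (len(g1) + len(g2) - 2)                         # pooled covariance matrix
    a = np.linalg.solve(S, m1 - m2)                       # discriminant coefficients
    D2 = (m1 - m2) @ a                                    # Mahalanobis D2 between groups

    # Diagnostic: squared deviation of each score from its group-mean score.
    scores = g1 @ a
    dev2 = (scores - m1 @ a) ** 2
    print("D2 =", round(float(D2), 3), "; most extreme in group 1:", int(np.argmax(dev2)))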

94 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigated cases in which disturbances are normally distributed with constant variance except for one or more outliers whose disturbances are taken from a normal distribution with a much larger variance.
Abstract: Previous research has indicated that minimum absolute deviations (MAD) estimators tend to be more efficient than ordinary least squares (OLS) estimators in the presence of large disturbances. Via Monte Carlo sampling this study investigates cases in which disturbances are normally distributed with constant variance except for one or more outliers whose disturbances are taken from a normal distribution with a much larger variance. It is found that MAD estimation retains its advantage over OLS through a wide range of conditions, including variations in outlier variance, number of regressors, number of observations, design matrix configuration, and number of outliers. When no outliers are present, the efficiency of MAD estimators relative to OLS exhibits remarkably slight variation.
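The design of this comparison is easy to reproduce in outline. The sketch below is a minimal version that assumes statsmodels is available and uses median regression (QuantReg at q = 0.5) as the minimum-absolute-deviations fit; the sample size, outlier variance, and replication count are arbitrary choices, not the paper's.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n, reps, true_slope = 50, 500, 2.0
    ols_err, mad_err = [], []

    for _ in range(reps):
        x = rng.uniform(0, 10, n)
        X = sm.add_constant(x)
        e = rng.normal(0, 1, n)
        e[:2] = rng.normal(0, 10, 2)                      # two outliers drawn from a wider normal
        y = 1.0 + true_slope * x + e
        ols_err.append((sm.OLS(y, X).fit().params[1] - true_slope) ** 2)
        mad_err.append((sm.QuantReg(y, X).fit(q=0.5).params[1] - true_slope) ** 2)

    print("slope MSE, OLS:", round(np.mean(ols_err), 4))
    print("slope MSE, MAD:", round(np.mean(mad_err), 4))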

52 citations


Journal ArticleDOI
TL;DR: The authors propose a two-stage test for the presence of one or two outliers in two-way tables; percentage points for the test statistics are estimated by Monte Carlo sampling, and approximations to the percentage points are suggested.
Abstract: Previous work by Gentleman and Wilk on outliers in two-way tables is summarized, and their statistic QK is discussed. Instead of a plot of QK values, we propose a two-stage test for the presence of two outliers or one outlier. Percentage points for the test statistics are estimated by Monte Carlo generations, and approximations to the percentage points are suggested. The 8 × 12 case is examined in detail and extensions to other cases are briefly discussed.
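A simplified analogue of the procedure is sketched below: an additive (row plus column effects) least-squares fit, the largest absolute residual as the test statistic, and its percentage points estimated by Monte Carlo, here for the 8 × 12 layout. This is not Gentleman and Wilk's QK statistic itself; it only illustrates the mechanics described in the abstract.

    import numpy as np

    rng = np.random.default_rng(2)
    r, c = 8, 12                                          # the 8 x 12 case

    def max_abs_residual(table):
        # Residuals from the additive (row + column effects) least-squares fit.
        fit = (table.mean(axis=1, keepdims=True)
               + table.mean(axis=0, keepdims=True) - table.mean())
        return np.max(np.abs(table - fit))

    # Monte Carlo percentage point under the null (no outliers, unit variance).
    null = np.array([max_abs_residual(rng.normal(size=(r, c))) for _ in range(2000)])
    crit95 = np.quantile(null, 0.95)

    y = rng.normal(size=(r, c))
    y[3, 7] += 6.0                                        # one planted outlier
    print("observed:", round(max_abs_residual(y), 2), " 95% point:", round(crit95, 2))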

39 citations


Journal ArticleDOI
TL;DR: In this article, it is proposed that bivariate data should be trimmed of those points which define the convex hull, for robust estimation of the product-moment correlation coefficient.
Abstract: Summary: It is proposed that bivariate data should be trimmed of those points which define the convex hull, for robust estimation of the product-moment correlation coefficient. Properties of this method are examined by a Monte Carlo investigation. Other applications are mentioned. The product-moment correlation coefficient, like many other parametric estimators, is sensitive to outliers and disturbances in the tails of the bivariate distribution of quantitative variables. For this reason there is much to recommend the routine application of a trimming procedure before this statistic is calculated. Previous authors, e.g. Nath (1971), have investigated what may be termed a rectangular trimming procedure: that is, each distribution independently is truncated in each tail. While this method has the virtue of simplicity, there may be certain objections to its use. Firstly, it takes no account of the multivariate structure of the data. This may well be important, particularly if the truncated sample will be used in more complex multivariate procedures. Secondly, the rectangular trimmed product-moment correlation coefficient is almost certain to be biased toward zero as an estimate of the population correlation. Nath (1971) gives an example of a correlation of 0.79 which reduced to 0.65 on single truncation of either distribution, resulting in 23 per cent of the sample being eliminated. A bias correction may be calculated assuming a bivariate normal distribution, but is computationally tedious, even though Dyer (1973) has proposed a method avoiding complex iteration.
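The trimming rule itself is simple to sketch with scipy's ConvexHull; the simulated data, correlation, and planted outlier below are illustrative only.

    import numpy as np
    from scipy.spatial import ConvexHull

    rng = np.random.default_rng(3)
    n = 100
    x = rng.normal(size=n)
    y = 0.8 * x + 0.6 * rng.normal(size=n)
    x[0], y[0] = 6.0, -6.0                                # one gross outlier

    xy = np.column_stack([x, y])
    hull = ConvexHull(xy)                                 # points defining the convex hull
    keep = np.setdiff1d(np.arange(n), hull.vertices)      # trim the hull points

    print("r, all points:  ", round(np.corrcoef(x, y)[0, 1], 3))
    print("r, hull trimmed:", round(np.corrcoef(x[keep], y[keep])[0, 1], 3))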

34 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examined the effect of various correlation structures of observations on rules for estimating a mean which are designed to guard against the possibility of spurious observations (that is, observations generated in a manner not intended).
Abstract: This paper examines the effect of various correlation structures of observations on rules for estimating a mean which are designed to guard against the possibility of spurious observations (that is, observations generated in a manner not intended). The premium and protection of these rules are evaluated and discussed for the equi-correlation case and for the case of an autoregressive process of first order. It is shown that the premium and protection of a given rule, which is designed for the estimator of a general mean mu when spuriosity may exist and when the observations are independent, lack robustness to departures from independence. It is also shown that in moderate-sized samples a spurious observation could seriously bias the usual estimator of the autoregressive coefficient alpha. One application of these results is in the case of a first order autoregressive model, which can be used to represent many time series data encountered in business and economics.
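The last point is easy to illustrate by simulation; the sketch below uses an assumed alpha of 0.6, a sample of 30 observations, and a single spurious observation shifted by 8, none of which come from the paper.

    import numpy as np

    rng = np.random.default_rng(4)
    n, alpha, reps = 30, 0.6, 2000

    def ar1(n, alpha, rng):
        x = np.zeros(n)
        for t in range(1, n):
            x[t] = alpha * x[t - 1] + rng.normal()
        return x

    def alpha_hat(x):
        # Usual least-squares estimator of the first-order autoregressive coefficient.
        return np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)

    clean, contaminated = [], []
    for _ in range(reps):
        x = ar1(n, alpha, rng)
        clean.append(alpha_hat(x))
        x[n // 2] += 8.0                                  # one spurious observation
        contaminated.append(alpha_hat(x))

    print("mean estimate, clean:       ", round(np.mean(clean), 3))
    print("mean estimate, contaminated:", round(np.mean(contaminated), 3))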

32 citations


Book ChapterDOI
01 Jan 1978
Abstract: The bias and mean square error of various location estimators, expressible as linear functions of order statistics, are studied when an unidentified single outlier is present in a sample of size n. Specific attention is paid to the cases when the outlier comes from a population differing from the target population either in location or scale. When, in addition, the target population is normal, exact numerical results have been obtained for n = 5, 10, 20 and are presented here for n = 10. The estimators included are the mean, median, trimmed means, Winsorized means, linearly weighted means, and Gastwirth mean.
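A Monte Carlo version of this comparison, for n = 10 and a single outlier shifted in location, is sketched below. It covers only some of the estimators listed (mean, median, 10% trimmed mean, 10% Winsorized mean, Gastwirth mean), uses an assumed shift of four standard deviations, and approximates bias and mean square error by simulation rather than exactly.

    import numpy as np
    from scipy import stats
    from scipy.stats.mstats import winsorize

    rng = np.random.default_rng(5)
    n, shift, reps = 10, 4.0, 5000

    def gastwirth(x):
        # Gastwirth estimator: 0.3*(33rd pct) + 0.4*(median) + 0.3*(67th pct).
        q33, q50, q67 = np.percentile(x, [100 / 3, 50, 200 / 3])
        return 0.3 * q33 + 0.4 * q50 + 0.3 * q67

    estimators = {
        "mean":        np.mean,
        "median":      np.median,
        "10% trimmed": lambda x: stats.trim_mean(x, 0.1),
        "winsorized":  lambda x: winsorize(x, limits=(0.1, 0.1)).mean(),
        "gastwirth":   gastwirth,
    }

    results = {name: [] for name in estimators}
    for _ in range(reps):
        x = rng.normal(0.0, 1.0, n)                       # target population N(0, 1)
        x[0] += shift                                     # single location-shifted outlier
        for name, est in estimators.items():
            results[name].append(est(x))

    for name, vals in results.items():
        print(f"{name:12s} bias {np.mean(vals):+.3f}   MSE {np.mean(np.square(vals)):.3f}")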

31 citations


Journal ArticleDOI
TL;DR: A graphical method is described for the examination of the behaviour of tests for outliers when one or two outliers of varying magnitude are present in the data.
Abstract: A graphical method is described for the examination of the behaviour of tests for outliers when one or two outliers of varying magnitude are present in the data. The method involves computing a sensitivity surface for the test statistic and, from this surface, determining contours corresponding to selected critical values. The contour diagrams so formed give an indication of the behaviour of the test statistic for a variety of data configurations. The tests for the single sample case include one‐outlier tests, many‐outlier tests and sequentially applied tests for a specified maximum number of outliers.
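The kind of surface described can be computed directly for a simple test statistic. The sketch below uses a Grubbs-type maximum studentized deviation on a fixed simulated sample of 20 with two contaminants of varying magnitude; it stands in for, rather than reproduces, the tests studied in the paper.

    import numpy as np

    rng = np.random.default_rng(6)
    base = rng.normal(size=20)                            # fixed clean sample
    grid = np.linspace(0, 6, 61)                          # magnitudes of the two contaminants

    def test_stat(x):
        # Grubbs-type statistic: max |x - xbar| / s.
        return np.max(np.abs(x - x.mean())) / x.std(ddof=1)

    surface = np.empty((grid.size, grid.size))
    for i, d1 in enumerate(grid):
        for j, d2 in enumerate(grid):
            x = base.copy()
            x[0] += d1                                    # first outlier magnitude
            x[1] += d2                                    # second outlier magnitude
            surface[i, j] = test_stat(x)

    # Contouring `surface` at the test's critical values (e.g. with
    # matplotlib.pyplot.contour) gives the kind of diagram described above.
    print("surface range:", round(float(surface.min()), 2), "to", round(float(surface.max()), 2))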

16 citations


DissertationDOI
01 Jan 1978

14 citations


Journal ArticleDOI
TL;DR: In this article, three well-known test statistics for the presence of two outliers are studied from the points of view of exact null distribution and power, and a recursive version of the Pearson-Chandrasekar test is shown to perform well.
Abstract: Summary Three well-known test statistics for the presence of two outliers are studied from the points of view of exact null distribution and power. In a sample case, it is shown that the Murphy two-outlier statistic may suffer from masking. A recursive version of the Pearson-Chandrasekar test is shown to perform well.
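The masking effect mentioned here is easy to demonstrate with a generic single-outlier statistic (not the Murphy or Pearson-Chandrasekar statistics themselves): adding a second, similar outlier inflates the sample standard deviation and depresses the statistic. The numbers below are invented.

    import numpy as np

    rng = np.random.default_rng(7)
    clean = rng.normal(size=18)

    def one_outlier_stat(x):
        # Grubbs-type statistic for a single outlier: max |x - xbar| / s.
        return np.max(np.abs(x - x.mean())) / x.std(ddof=1)

    one = np.append(clean, 5.5)                           # one large value
    two = np.append(clean, [5.5, 5.8])                    # plus a second, similar large value

    print("statistic, one outlier: ", round(float(one_outlier_stat(one)), 2))
    print("statistic, two outliers:", round(float(one_outlier_stat(two)), 2))
    # The second outlier inflates the mean and s, so the single-outlier statistic
    # drops even though the sample is more contaminated: masking.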

8 citations


ReportDOI
01 May 1978
TL;DR: In this article, the general track smoothing program (MASM3DRJ) used at NUWES is examined; it uses linear, parabolic, and logarithmic functions to fit 3-D data files on torpedo paths by the method of least squares.
Abstract: The general track smoothing program (MASM3DRJ) in use at NUWES uses linear, parabolic, and logarithmic functions to fit 3-D data files on torpedo paths by the method of least squares. Polynomial functions of the first (linear), second (parabolic), third, and fourth orders were fitted to data for a variety of path segments of a torpedo run at NUWES using the method of least squares. Results suggest that expanding the program to include higher-order polynomials and fitting shorter path segments would provide a substantial reduction in residual errors. The method of sequential differences was tried on the data and can be incorporated in the smoothing program as a means of identifying outlier data points and of selecting the appropriate polynomial order for fitting the data.
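Neither the NUWES data nor MASM3DRJ is available here, but the two ideas, comparing least-squares polynomial fits of increasing order and using sequential differences to flag bad points, can be sketched on invented data:

    import numpy as np

    rng = np.random.default_rng(8)
    t = np.linspace(0, 60, 120)                           # time along a path segment
    depth = 50 + 0.8 * t - 0.01 * t**2 + rng.normal(0, 0.5, t.size)
    depth[40] += 8.0                                      # one bad data point

    # Least-squares polynomial fits of orders 1 (linear) through 4.
    for order in (1, 2, 3, 4):
        resid = depth - np.polyval(np.polyfit(t, depth, order), t)
        print(f"order {order}: RMS residual {np.sqrt(np.mean(resid ** 2)):.3f}")

    # Sequential (first) differences flag the outlier as two adjacent large jumps.
    d = np.diff(depth)
    print("largest |difference| at position:", int(np.argmax(np.abs(d))))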

5 citations


Journal ArticleDOI
TL;DR: In this paper, a modification of the rule suggested by Tiao and Guttman (1967) is considered for the estimation of the mean of a normal population using data with an unspecified number of outliers.
Abstract: A modification of the rule suggested by Tiao and Guttman (1967) is considered for the estimation of the mean of a normal population using data with an unspecified number of outliers. The properties of the modified rule are investigated by considering its premium and protection. For the case where the number of outliers, i, is specified, it is shown that the Tiao-Guttman tables for m = 2 are adequate for all values of m.
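Premium and protection lend themselves to simulation. The sketch below evaluates them for a simple, illustrative rejection rule, taking premium as the proportional increase in mean square error over the sample mean when no spurious observation is present and protection as the proportional reduction when one is present; the rule, the normalization, and all constants are assumptions here, not the paper's.

    import numpy as np

    rng = np.random.default_rng(9)
    n, reps, shift = 10, 20000, 4.0

    def reject_rule(x, c=2.0):
        # Illustrative rule: discard observations more than c sample SDs from the mean.
        keep = np.abs(x - x.mean()) <= c * x.std(ddof=1)
        return x[keep].mean()

    def mse(estimator, contaminated):
        errs = []
        for _ in range(reps):
            x = rng.normal(0.0, 1.0, n)
            if contaminated:
                x[0] += shift                             # one spurious observation
            errs.append(estimator(x) ** 2)                # true mean is 0
        return np.mean(errs)

    mse_mean_null = mse(np.mean, False)
    mse_mean_spur = mse(np.mean, True)
    premium = (mse(reject_rule, False) - mse_mean_null) / mse_mean_null
    protection = (mse_mean_spur - mse(reject_rule, True)) / mse_mean_spur
    print(f"premium    {premium:.3f}")
    print(f"protection {protection:.3f}")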

Book ChapterDOI
01 Jan 1978
TL;DR: In this article, the problem of outliers in the regression model is considered and the use of the maximum absolute studentized residual, Rn, for identification of the outlier has been suggested by a number of authors.
Abstract: The problem of outliers in the regression model is considered. For the case of one outlier at most, the use of the maximum absolute studentized residual, Rn, for identification of the outlier has been suggested by a number of authors. Simulation studies of the power of a conservative test based on Rn for identifying single outliers in regression models with one, two, and three independent variables are reported. The case of multiple outliers is also considered and techniques for their identification are discussed. A simulation study of a sequential procedure for handling two outliers is reported.
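Rn is straightforward to compute directly from the hat matrix. A minimal sketch on simulated data with two independent variables (plus a constant) and one planted outlier, using internally studentized residuals:

    import numpy as np

    rng = np.random.default_rng(10)
    n = 30
    X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0, 1, n)
    y[7] += 6.0                                           # one planted outlier

    # Hat matrix and internally studentized residuals.
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    e = y - H @ y
    s2 = e @ e / (n - X.shape[1])
    r = e / np.sqrt(s2 * (1 - np.diag(H)))

    Rn = np.max(np.abs(r))                                # maximum absolute studentized residual
    print("Rn =", round(float(Rn), 3), "at observation", int(np.argmax(np.abs(r))))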

Journal ArticleDOI
TL;DR: In this paper, the presence of an outlier score (that is, an extreme score for an independent t test or an extreme difference score for a correlated t test) makes it difficult to get a t large enough for a t table to show significance.
Abstract: Summary The presence of an outlier score (that is, an extreme score for an independent t test or an extreme difference score for a correlated t test) makes it difficult to get a t large enough for a t table to show significance. When there are outlier scores, the use of a randomization test to determine significance is more likely to reveal a treatment effect than the use of t tables.
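The contrast is easy to demonstrate on invented data: one extreme score inflates the pooled standard deviation in the t denominator, while a randomization test that permutes group labels and compares differences in means is not affected in the same way.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(11)
    treated = np.array([3.1, 2.7, 3.4, 2.9, 3.2, 3.0, 2.8, 14.0])   # one outlier score
    control = np.array([2.1, 1.9, 2.3, 2.0, 2.2, 1.8, 2.4, 2.1])

    # Ordinary independent-samples t test.
    t_stat, p_t = stats.ttest_ind(treated, control)

    # Randomization test: permute group labels, use the difference in means.
    pooled = np.concatenate([treated, control])
    obs_diff = treated.mean() - control.mean()
    exceed, n_perm = 0, 10000
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diff = perm[:treated.size].mean() - perm[treated.size:].mean()
        if abs(diff) >= abs(obs_diff):
            exceed += 1
    p_rand = exceed / n_perm

    print("t test p-value:       ", round(float(p_t), 4))
    print("randomization p-value:", round(p_rand, 4))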


Book ChapterDOI
TL;DR: In this article, the authors discuss the real-time validation of air quality data and propose to use robust statistical parameters to minimize the role of high and low data in the validation process.
Abstract: Publisher Summary This chapter discusses the real-time validation of air quality data. Collecting air quality data consists of the following steps: sampling, analysis, data processing, and data storage. The first two steps may entail many mistakes, which greatly influence the value of the collected data. Therefore, complementary methods for validation of air quality data are necessary. The need for validation is strong when the reliability of the chosen techniques is low, and it is especially strong when dealing with data outliers. One way to solve the problem of air quality data validation is to minimize the role of high and low data by using robust statistical parameters. Instead of an arithmetic mean, the median should be preferred in the case of homogeneous distributions. When data from two networks are compared, quartiles rather than extreme percentiles should be used. This solution of the outlier problem, however, has only limited value. From the point of view of community health, it is no solution at all. Another way to solve the problem of air quality data validation is to check a number of essential steps in the process of data acquisition and data transmission and to check a number of parameters in the monitors.
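A small numerical illustration of the recommendation, on invented hourly readings with two faulty spikes: the median and quartiles are barely moved, while the arithmetic mean and an extreme percentile are dominated by the spikes.

    import numpy as np

    # Hypothetical hourly concentration readings with two faulty spikes.
    readings = np.array([18, 22, 19, 25, 21, 23, 20, 24, 19, 410, 22, 20, 385, 21])

    print("arithmetic mean:", round(float(np.mean(readings)), 1))   # pulled up by the spikes
    print("median:         ", float(np.median(readings)))
    print("quartiles:      ", np.percentile(readings, [25, 75]))
    print("98th percentile:", round(float(np.percentile(readings, 98)), 1))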