
Showing papers on "Outlier" published in 1972


Journal Article
TL;DR: An overview of concepts and techniques pertaining to (i) the robust estimation of multivariate location and dispersion; (ii) the analysis of two types of multidimensional residuals; and (iii) the detection of multiresponse outliers.
Abstract: SUMMARY The paper gives an overview of concepts and techniques pertaining to (i) the robust estimation of multivariate location and dispersion; (ii) the analysis of two types of multidimensional residuals, namely those that occur in the context of principal components analysis as well as the more familiar residuals associated with least squares fitting; and (iii) the detection of multiresponse outliers. The emphasis is on methods for informal exploratory analysis and the coverage is both a survey of existing techniques and an attempt to propose, tentatively, some new methodology which needs further investigation and development. Some examples of use of the methods are included.

793 citations


Journal Article
TL;DR: Two types of outlier that may occur in a time series are considered: Type I, in which a gross error of observation or recording affects a single observation, and Type II, in which a single "innovation" is extreme and so affects subsequent observations as well.
Abstract: The detection of outliers has mainly been considered for single random samples, although some recent work deals also with standard linear models; see, for example, Anscombe (1960) and Kruskal (1960). Essentially similar problems arise in time series (Burman, 1965) but there seems no published work taking into account correlations between successive observations. In the past, the search for outliers in time series has been based on the assumption that the observations are independently and identically normally distributed. This assumption leads to analyses which will be called random sample procedures. Two types of outlier that may occur in a time series are considered in this paper. A Type I outlier corresponds to the situation in which a gross error of observation or recording error affects a single observation. A Type II outlier corresponds to the situation in which a single "innovation" is extreme. This will affect not only the particular observation but also subsequent observations. For the development of tests and the interpretation of outliers, it is necessary to distinguish among the types of outlier likely to be contained in the process. The present approach is based on four possible formulations of the problem: the outliers are all of Type I; the outliers are all of Type II; the outliers are all of the same type but whether they are of Type I or of Type II is not known; and the outliers are a mixture of the two types. Since more practical solutions than those given by likelihood ratio methods are often obtained from simplifications of likelihood ratio criteria, some simpler criteria are derived. These criteria are of the form $\hat{\Delta}/\hat{\sigma}_{\hat{\Delta}}$, where $\hat{\Delta}$ is the estimated error in the observation tested and $\hat{\sigma}_{\hat{\Delta}}$ is the estimated standard error of $\hat{\Delta}$. Throughout this paper, trend and seasonal components are assumed either negligible or to have been eliminated. The method adopted to remove these components might affect the results in some way.

751 citations
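
The distinction between the two outlier types in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the abstract does not specify a model, so a first-order autoregressive process x_t = phi * x_{t-1} + e_t is assumed here, and the names `ar1`, `type_one`, `type_two`, `phi`, and `delta` are hypothetical. A Type I outlier is added to one observation only; a Type II outlier is added to one innovation and therefore propagates.

```python
import random


def ar1(innovations, phi):
    """Build an AR(1) series x_t = phi * x_{t-1} + e_t, with x_0 = e_0."""
    series = []
    prev = 0.0
    for e in innovations:
        prev = phi * prev + e
        series.append(prev)
    return series


def type_one(innovations, phi, t, delta):
    """Type I: a gross error of size delta hits observation t only."""
    series = ar1(innovations, phi)
    series[t] += delta
    return series


def type_two(innovations, phi, t, delta):
    """Type II: innovation t is extreme, so later observations shift too."""
    shocked = list(innovations)
    shocked[t] += delta
    return ar1(shocked, phi)


# Assumed illustration values, not taken from the paper.
rng = random.Random(0)
phi, t, delta = 0.5, 5, 8.0
e = [rng.gauss(0.0, 1.0) for _ in range(10)]
clean = ar1(e, phi)

# Difference between each contaminated series and the clean one.
effect_one = [a - b for a, b in zip(type_one(e, phi, t, delta), clean)]
effect_two = [a - b for a, b in zip(type_two(e, phi, t, delta), clean)]
```

Under this assumed model, `effect_one` is nonzero only at index `t`, while `effect_two` equals `delta * phi**j` at index `t + j`, which is why the two types call for different tests.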


Journal Article
TL;DR: Problems of repeated application and masking in outlier tests are discussed, and two new statistics are proposed to overcome them: Lk, based on the k largest (observed) values, and Ek, based on the k largest (in absolute value) residuals.
Abstract: Several widely used tests for outlying observations are reviewed. Problems of repeated application and “masking” are described. Suggested as appropriate to overcome these problems are two new statistics: Lk which is based on the k largest (observed) values and Ek which is based on the k largest (in absolute value) residuals. Tables of approximate critical values for these statistics are given for 0.01, 0.025, 0.05, and 0.10 levels of significance and for sample size n = 3 (1) 20 (5) 50.

253 citations
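
The abstract above does not give formulas for Lk and Ek. The sketch below assumes one common form: a ratio of the sum of squared deviations after dropping k suspect points to the sum of squared deviations of the full sample, so that small values suggest outliers. The function names `l_k` and `e_k` and the sample data are hypothetical, and no critical values are reproduced here.

```python
def _sum_sq_dev(values):
    """Sum of squared deviations of values about their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def l_k(values, k):
    """Assumed form of Lk: drop the k largest observed values."""
    n = len(values)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    ordered = sorted(values)
    return _sum_sq_dev(ordered[: n - k]) / _sum_sq_dev(ordered)


def e_k(values, k):
    """Assumed form of Ek: drop the k values with largest |x - mean|."""
    n = len(values)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    mean = sum(values) / n
    ordered = sorted(values, key=lambda v: abs(v - mean))
    return _sum_sq_dev(ordered[: n - k]) / _sum_sq_dev(ordered)


# Two large values close together: testing one at a time hides them.
upper_pair = [1, 2, 3, 4, 5, 100, 101]
masked = l_k(upper_pair, 1)    # stays large, the second outlier masks the first
unmasked = l_k(upper_pair, 2)  # becomes very small once both are dropped

# One outlier in each tail: residual-based ordering handles both.
two_tailed = [-50, 1, 2, 3, 4, 5, 60]
both_tails = e_k(two_tailed, 2)
```

With the assumed data, `masked` is far from zero while `unmasked` is close to zero, which is the masking effect the abstract describes for repeated single-outlier tests.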


Journal Article
TL;DR: In this article, the problem of selecting an efficient estimator of the expected value in the presence of an outlying observation with higher expected value is discussed, and an iterative procedure for the estimation of the mean is provided and the method is illustrated by considering an example.
Abstract: This paper extends some of the results obtained in a recent paper by Kale and Sinha [3] for the exponential distribution. The problem of selecting an efficient estimator of the expected value in the presence of an outlying observation with higher expected value is discussed. An iterative procedure for the estimation of the mean is provided and the method is illustrated by considering an example.

36 citations


Journal Article
TL;DR: One-sided and two-sided test statistics for detecting a single outlier are proposed in this paper, where Bonferroni and other inequalities are used to obtain upper and lower limits for the true upper percentage points.
Abstract: SUMMARY The problem of detecting a single outlier in a linear regression model is considered. Results of Srikantan (1961), David & Paulson (1965) and Doornbos (1966) are unified and extended. The various cases considered are: (i) known variance, (ii) external studentization, and (iii) pooled studentization. One-sided and two-sided test statistics for detecting a single outlier are proposed. These statistics are maxima of suitably standardized or studentized weighted residuals. Bonferroni and other inequalities are used to obtain upper and lower limits for the true upper percentage points of the proposed statistics. Some appropriate measures of performance are introduced and studied.

19 citations
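
A minimal sketch of the known-variance case from the abstract above, assuming a simple straight-line regression y = a + b x (the paper treats a general linear model, and its exact weighting is not given here). Each residual is divided by its standard deviation sigma * sqrt(1 - h_i), where h_i is the leverage, and the two-sided statistic is the largest absolute value. The Bonferroni inequality then gives an upper limit for the critical value. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def max_standardized_residual(x, y, sigma):
    """Return (index, value) of the largest |standardized residual|.

    Assumes a straight-line least squares fit and a known error
    standard deviation sigma.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar

    best_index, best_value = 0, 0.0
    for i, (xi, yi) in enumerate(zip(x, y)):
        leverage = 1.0 / n + (xi - xbar) ** 2 / sxx
        residual = yi - (intercept + slope * xi)
        z = residual / (sigma * sqrt(1.0 - leverage))
        if abs(z) > abs(best_value):
            best_index, best_value = i, z
    return best_index, best_value


def bonferroni_critical_value(n, alpha):
    """Bonferroni upper limit for the two-sided level-alpha critical value."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n))


# Assumed data: an exact line with one observation shifted by 10.
x = list(range(10))
y = [2.0 + 3.0 * xi for xi in x]
y[4] += 10.0

index, z = max_standardized_residual(x, y, sigma=1.0)
critical = bonferroni_critical_value(len(x), alpha=0.05)
flagged = abs(z) > critical
```

Because the Bonferroni value is an upper limit on the true percentage point, a test based on it is conservative; the studentized cases in the abstract would replace sigma with an estimate and the normal quantile with a suitable one.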


Journal Article
TL;DR: The behavior of Rt, the uniformly minimum variance unbiased estimator of the reliability function for the one-parameter exponential family, is discussed when an outlier observation is present and a new estimator Rt* is proposed, where α reflects the “outlier effect on the scale” parameter.
Abstract: The behavior of Rt, the uniformly minimum variance unbiased estimator (UMVUE) of the reliability function for the one-parameter exponential family, is discussed when an outlier observation is present. Bounds of E(Rt∣α) and MSE(Rt∣α) have been obtained and a new estimator Rt* is proposed, where α reflects the “outlier effect on the scale” parameter. A “semi-Bayesian” approach is discussed when α is treated as a random variable with a Beta-type prior. A similar solution, when δ reflects the “outlier effect of the location” has been obtained.

12 citations


Journal Article
TL;DR: Two kinds of confidence intervals and significance tests for the mean of a symmetrical population are developed that remain usable, with the same confidence coefficient or significance level, whether there is no outlier, the smallest observation is an outlier, or the largest observation is an outlier.
Abstract: The (continuous) data are n observations that are believed to be a random sample from a symmetrical population. Confidence intervals and significance tests for the population mean are desired. There is, however, the possibility that either the smallest observation or the largest observation is an outlier. That is, the population providing this observation differs from the symmetrical population providing the other n - 1 observations. If this occurs, intervals and tests are desired for the mean of the population providing the other n - 1 observations. Some investigation difficulties can be overcome if intervals and tests can be developed that are simultaneously usable for all of these three situations (a confidence coefficient, or significance level, has the same value for all three situations). Two kinds of intervals and tests with this property are developed. These results always involve both the next to smallest and the next to largest observations and should have at least moderately high efficiencies. Also, some extensions are considered, such as allowing each observation to be from a different population.

2 citations


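Journal Article
TL;DR: Distribution-free two-sided confidence intervals and significance tests for the population median are considered for the case where the smallest or the largest observation may be an outlier; they are based on two symmetrically located order statistics (not the largest and smallest) of the n observations.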
Abstract: Available are independent observations (continuous data) that are believed to be a random sample. Desired are distribution-free confidence intervals and significance tests for the population median. However, there is the possibility that either the smallest or the largest observation is an outlier. Then, use of a procedure for rejection of an outlying observation might seem appropriate. Such a procedure would consider that two alternative situations are possible and would select one of them. Either (1) the n observations are truly a random sample, or (2) an outlier exists and its removal leaves a random sample of size n-1. For either situation, confidence intervals and tests are desired for the median of the population yielding the random sample. Unfortunately, satisfactory rejection procedures of a distribution-free nature do not seem to be available. Moreover, all rejection procedures impose undesirable conditional effects on the observations, and also, can select the wrong one of the two above situations. It is found that two-sided intervals and tests based on two symmetrically located order statistics (not the largest and smallest) of the n observations have this property.

1 citation
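
The interval described in the abstract above uses two symmetrically located order statistics. As a sketch of the standard no-outlier calculation only (it does not reproduce the paper's argument for the outlier case): for a continuous random sample of size n, the interval from the r-th smallest to the r-th largest observation covers the median with probability 1 - 2 * P(B <= r - 1), where B is binomial(n, 1/2). The function names and sample values are hypothetical.

```python
from math import comb


def coverage(n, r):
    """Coverage of (X_(r), X_(n+1-r)) for the median of a continuous sample."""
    if not 1 <= r <= n // 2:
        raise ValueError("r must satisfy 1 <= r <= n // 2")
    lower_tail = sum(comb(n, i) for i in range(r)) / 2 ** n
    return 1.0 - 2.0 * lower_tail


def median_interval(sample, r):
    """Return the r-th smallest and r-th largest observations."""
    ordered = sorted(sample)
    return ordered[r - 1], ordered[len(ordered) - r]


# Assumed sample whose largest value looks like an outlier.
sample = [3, 1, 4, 1.5, 9, 2.6, 5, 3.5, 8, 1000]

# r = 2 skips the smallest and largest observations entirely.
interval = median_interval(sample, 2)
level = coverage(len(sample), 2)
```

Choosing r of at least 2 keeps the extreme observations out of the interval endpoints, so a single suspect value such as 1000 in the assumed sample never becomes an endpoint.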


01 Jan 1972
TL;DR: An unpublished manuscript that was written to be part of the author's dissertation but was left out of it.
Abstract: Unpublished manuscript. (This was written to be part of my dissertation, but it was left out.)