
Showing papers on "Outlier published in 1979"


Journal ArticleDOI
TL;DR: This work compared the methods of Deming, Mandel, and Bartlett in estimating the known slope of a regression line when the independent variable is measured with imprecision, and found the method of Deming to be the most useful.
Abstract: The least-squares method is frequently used to calculate the slope and intercept of the best line through a set of data points. However, least-squares regression slopes and intercepts may be incorrect if the underlying assumptions of the least-squares model are not met. Two factors in particular that may result in incorrect least-squares regression coefficients are: (a) imprecision in the measurement of the independent (x-axis) variable and (b) inclusion of outliers in the data analysis. We compared the methods of Deming, Mandel, and Bartlett in estimating the known slope of a regression line when the independent variable is measured with imprecision, and found the method of Deming to be the most useful. Significant error in the least-squares slope estimation occurs when the ratio of the standard deviation of measurement of a single x value to the standard deviation of the x-data set exceeds 0.2. Errors in the least-squares coefficients attributable to outliers can be avoided by eliminating data points whose vertical distance from the regression line exceeds four times the standard error of the estimate.
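
Note: a minimal Python sketch (not taken from the paper) of the two ideas above: a Deming-style slope estimate with an assumed-known error-variance ratio, and the rule of dropping points whose vertical distance from the least-squares line exceeds four times the standard error of the estimate. Function names, the variance-ratio convention, and the example data are illustrative assumptions.

    import numpy as np

    def deming_fit(x, y, delta=1.0):
        # delta = (error variance in y) / (error variance in x); assumed known
        # here (conventions for this ratio vary between texts).
        x, y = np.asarray(x, float), np.asarray(y, float)
        sxx, syy = np.var(x, ddof=1), np.var(y, ddof=1)
        sxy = np.cov(x, y, ddof=1)[0, 1]
        slope = (syy - delta * sxx +
                 np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
        return slope, y.mean() - slope * x.mean()

    def drop_outliers(x, y, k=4.0):
        # Keep only points whose vertical distance from the ordinary
        # least-squares line is at most k times the standard error of the estimate.
        x, y = np.asarray(x, float), np.asarray(y, float)
        b, a = np.polyfit(x, y, 1)
        resid = y - (a + b * x)
        see = np.sqrt(np.sum(resid ** 2) / (len(x) - 2))
        return np.abs(resid) <= k * see

    x = np.array([1, 2, 3, 4, 5, 6], float)
    y = np.array([1.1, 2.0, 2.9, 4.2, 5.0, 9.0])   # last point is suspect
    keep = drop_outliers(x, y)
    print(deming_fit(x[keep], y[keep], delta=1.0))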

598 citations


Journal ArticleDOI
TL;DR: In this paper, the authors discuss the conditions under which conventional tools, such as financial ratios and measures of industry central tendency, achieve the intended objectives of analysis (e.g., size control).

265 citations


Journal ArticleDOI
TL;DR: It is shown that an outlier can so inflate the estimated SD that its presence is not detected by this method, and alternative estimators that are less influenced by outliers are described and their application to quality-control data is discussed.
Abstract: Results submitted to large-scale quality-control schemes are commonly judged against the mean and standard deviation (SD) of the results from other laboratories. It is desirable to ignore outlying values in estimating this mean and standard deviation, and results more than 2.5 or 3 SD from the mean are commonly rejected. I show that an outlier can so inflate the estimated SD that its presence is not detected by this method. Alternative estimators that are less influenced by outliers are described, and their application to quality-control data is discussed.
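
Note: the paper's own alternative estimators are not reproduced here, but the median paired with a scaled median absolute deviation (MAD) is one standard outlier-resistant choice and illustrates the point. In this invented example a single wild result inflates the classical SD enough to hide itself inside a 3 SD band, while the MAD-based estimate is barely moved.

    import numpy as np

    def robust_center_and_spread(results):
        # Median and MAD-based SD estimate; 1.4826 makes the MAD consistent
        # with the SD for normal data.
        r = np.asarray(results, float)
        center = np.median(r)
        spread = 1.4826 * np.median(np.abs(r - center))
        return center, spread

    vals = np.array([4.9, 5.0, 5.1, 5.0, 4.8, 5.2, 9.0])   # one gross outlier
    print(vals.mean(), vals.std(ddof=1))                    # SD inflated by the outlier
    print(robust_center_and_spread(vals))                   # hardly affected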

181 citations


Journal ArticleDOI
TL;DR: In this article, the aberrant innovation model and aberrant observation model are considered to characterize outliers in time series, allowing for a small probability that any given observation is "bad" and in this set-up the inference about the parameters of an autoregressive model is considered.
Abstract: SUMMARY Two models, the aberrant innovation model and the aberrant observation model, are considered to characterize outliers in time series. The approach adopted here allows for a small probability α that any given observation is 'bad' and in this set-up the inference about the parameters of an autoregressive model is considered.
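
Note: a small simulation, written for this listing rather than taken from the paper, illustrates the distinction between the two models in an AR(1): an aberrant innovation enters the shock sequence and propagates through later observations, while an aberrant observation corrupts only the single recorded value. The coefficient, outlier size, and naive estimator are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    n, phi = 200, 0.7
    eps = rng.normal(size=n)

    # Aberrant innovation: the shock itself is contaminated, so its effect
    # carries forward into subsequent observations.
    eps_io = eps.copy()
    eps_io[100] += 8.0
    x_io = np.zeros(n)
    for t in range(1, n):
        x_io[t] = phi * x_io[t - 1] + eps_io[t]

    # Aberrant observation: the underlying series is clean; only one recorded
    # value is bad.
    x_clean = np.zeros(n)
    for t in range(1, n):
        x_clean[t] = phi * x_clean[t - 1] + eps[t]
    x_ao = x_clean.copy()
    x_ao[100] += 8.0

    # Naive least-squares estimate of phi from each contaminated series.
    for name, series in (("innovation", x_io), ("observation", x_ao)):
        phi_hat = np.sum(series[1:] * series[:-1]) / np.sum(series[:-1] ** 2)
        print(name, round(phi_hat, 3))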

172 citations


Journal ArticleDOI
TL;DR: In this paper, it is shown that any admissible inference procedure applied to a t sample will effectively ignore extreme outlying observations regardless of prior information, while the normal distribution is outlier-resistant.
Abstract: SUMMARY Inference is considered for a location parameter given a random sample. Outliers are not explicitly modelled, but rejection of extreme observations occurs naturally in any Bayesian analysis of data from distributions with suitably thick tails. For other distributions outlier rejection behaviour can never occur. These phenomena motivate new definitions of outlier-proneness and outlier-resistance. The definitions and methodology are Bayesian but the conclusions also have meaning for non-Bayesians because they are proved for arbitrary prior distributions. Thus, for example, the t distribution is said to be outlier-prone because it is shown that any admissible inference procedure applied to a t sample will effectively ignore extreme outlying observations regardless of prior information. On the other hand, the normal distribution, for example, is said to be outlier-resistant because it never allows outlier rejection, regardless of prior information.
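
Note: the flavour of the result can be seen numerically. The sketch below (not from the paper) computes the posterior mean of a location parameter on a grid under a flat prior: with a heavy-tailed t likelihood the extreme observation is effectively ignored, while with a normal likelihood it drags the estimate toward itself. The data, grid, and degrees of freedom are invented for illustration.

    import numpy as np
    from scipy import stats

    data = np.array([0.1, -0.3, 0.2, 0.0, 25.0])   # last value is an extreme outlier
    theta = np.linspace(-5.0, 30.0, 4001)

    def posterior_mean(logpdf):
        # Flat prior on the grid, so the posterior is proportional to the likelihood.
        loglik = np.array([logpdf(data, t).sum() for t in theta])
        w = np.exp(loglik - loglik.max())
        return np.sum(theta * w) / np.sum(w)

    # Heavy-tailed t likelihood: the outlier is effectively ignored.
    print(posterior_mean(lambda x, t: stats.t.logpdf(x, df=3, loc=t)))
    # Normal likelihood: the estimate is pulled toward the outlier.
    print(posterior_mean(lambda x, t: stats.norm.logpdf(x, loc=t)))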

164 citations


Journal ArticleDOI
TL;DR: All of the algorithms were significantly more accurate than a random linkage algorithm, and accuracy was inversely related to coverage; a subset of high-accuracy algorithms was identified, including single, average, and centroid linkage using correlation, and Ward's minimum variance technique.
Abstract: Due to the effects of outliers, mixture model tests that require all objects to be classified can severely underestimate the accuracy of hierarchical clustering algorithms. More valid and relevant comparisons between algorithms can be made by calculating accuracy at several levels in the hierarchical tree and considering accuracy as a function of the coverage of the classification. Using this procedure, several algorithms were compared on their ability to resolve ten multivariate normal mixtures. All of the algorithms were significantly more accurate than a random linkage algorithm, and accuracy was inversely related to coverage. Algorithms using correlation as the similarity measure were significantly more accurate than those using Euclidean distance (p < .001). A subset of high accuracy algorithms, including single, average, and centroid linkage using correlation, and Ward's minimum variance technique, was identified.
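
Note: the accuracy-versus-coverage idea can be sketched as follows (illustrative code, not the authors'; the adjusted Rand index stands in for their accuracy measure, singleton clusters at each cut are treated as unclassified, and the mixture is invented).

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist
    from sklearn.metrics import adjusted_rand_score

    rng = np.random.default_rng(1)
    # Two well-separated multivariate normal clusters plus a few gross outliers.
    X = np.vstack([rng.normal(0.0, 1.0, (30, 5)),
                   rng.normal(6.0, 1.0, (30, 5)),
                   rng.normal(0.0, 15.0, (4, 5))])
    truth = np.array([0] * 30 + [1] * 30 + [2] * 4)

    Z = linkage(pdist(X), method='average')

    # Accuracy as a function of coverage: at each cut of the tree, singletons
    # are left unclassified, so not every object has to be forced into a cluster.
    for k in (2, 4, 8):
        labels = fcluster(Z, t=k, criterion='maxclust')
        covered = np.bincount(labels)[labels] > 1
        coverage = covered.mean()
        accuracy = adjusted_rand_score(truth[covered], labels[covered])
        print(k, round(coverage, 2), round(accuracy, 2))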

164 citations


Journal ArticleDOI
TL;DR: In this article, a few robust procedures are mentioned; one, motivated by maximum likelihood estimation so as to seem more natural, is emphasized. Its use in regression problems is considered in some detail, and an approximate error structure is stated for the robust estimates of the regression coefficients.
Abstract: Users of statistical packages need to be aware of the influence that outlying data points can have on their statistical analyses. Robust procedures provide formal methods to spot these outliers and reduce their influence. Although a few robust procedures are mentioned in this article, one is emphasized; it is motivated by maximum likelihood estimation to make it seem more natural. Use of this procedure in regression problems is considered in some detail, and an approximate error structure is stated for the robust estimates of the regression coefficients. A few examples are given. A suggestion of how these techniques should be implemented in practice is included.
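
Note: the emphasized procedure is an M-estimator; a common concrete instance is Huber's proposal computed by iteratively reweighted least squares. The sketch below is a generic illustration of that idea, not the article's exact algorithm; the tuning constant 1.345 and the MAD scale are conventional choices, and the example data are invented.

    import numpy as np

    def huber_fit(x, y, k=1.345, n_iter=50):
        # Iteratively reweighted least squares with Huber weights; the MAD of
        # the residuals supplies a robust scale at each iteration.
        X = np.column_stack([np.ones(len(y)), np.asarray(x, float)])
        y = np.asarray(y, float)
        beta = np.linalg.lstsq(X, y, rcond=None)[0]        # start from ordinary LS
        for _ in range(n_iter):
            r = y - X @ beta
            s = 1.4826 * np.median(np.abs(r - np.median(r)))
            if s == 0:
                s = 1.0
            u = r / s
            w = np.where(np.abs(u) <= k, 1.0, k / np.abs(u))   # Huber weights
            sw = np.sqrt(w)
            beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        return beta

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 10.0, 50)
    y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, 50)
    y[::10] += 8.0                                         # a few gross outliers
    print(huber_fit(x, y))                                 # close to (2.0, 0.5)
    print(np.linalg.lstsq(np.column_stack([np.ones(50), x]), y, rcond=None)[0])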

141 citations


Book ChapterDOI
01 Jan 1979
TL;DR: In this article, the authors provide an overview of robust estimation, including estimators that modify least-squares schemes so that outliers have much less influence on the final estimates, as well as adaptive procedures.
Abstract: Publisher Summary This chapter provides an overview of robust estimation. It is recognized that outliers, which arise from heavy tailed distributions or are simply bad data points because of errors, have an unusually large influence on the least squares estimators. That is, the outliers pull the least squares fit toward them too much, so that a subsequent examination of the residuals is misleading because the residuals then look more nearly normal. Accordingly, robust methods have been created to modify least squares schemes so that the outliers have much less influence on the final estimates. One of the most satisfying robust procedures is that given by a modification of the principle of maximum likelihood. Robust methods have consequently been used successfully in many applications. There has been some evidence that adaptive procedures are of value. The basic idea of adapting is the selection of the estimation procedure after observing the data.

138 citations


Book ChapterDOI
01 Jan 1979
TL;DR: In this paper, the authors present some theory and methodology of robust estimation for time series having two distinctive types of outliers, noting that a workable definition of qualitative robustness in the time series context remains an open problem.
Abstract: Publisher Summary This chapter presents some theory and methodology of robust estimation for time series having two distinctive types of outliers. Research on robust estimation in the time series context has lagged behind, perhaps understandably so in view of the increased difficulties imposed by dependency and the considerable diversity in qualitative features of time series data sets. For time series parameter estimation problems, efficiency robustness and min–max robustness are concepts that apply directly. Influence curves for parameter estimates may also be defined without special difficulty. Greater care is needed in defining breakdown points, as the detailed nature of the failure mechanism may be quite important. A major problem that remains is that of providing an appropriate and workable definition of qualitative robustness in the time series context. For time series, the desire for a complete probabilistic description of either a nearly Gaussian process with outliers, or the corresponding asymptotic distribution of parameter estimates, will often dictate that one specify more than a single finite-dimensional distribution of the process. It is only in special circumstances that the asymptotic distribution of the estimate will depend only upon a single univariate distribution or a single multivariate distribution.

56 citations


Journal ArticleDOI
TL;DR: An approach to the analysis of RIA data which incorporates robust estimation methods is described and an algorithm is presented for obtaining the M-estimates of nonlinear calibration curves.
Abstract: The minute concentrations of many biochemically and clinically important substances are currently estimated by radioimmunoassay (RIA). Traditionally, the most popular approaches to the statistical analysis of RIA data have been to linearize the data through transformation and fit the calibration curve using least squares or to directly fit a nonlinear calibration curve using least squares. Estimates of the hormone concentration in patients are then obtained using this curve. Unfortunately, the transformation is frequently unsuccessful in linearizing the data. Furthermore, the least squares fit can lead to erroneous results in both approaches since the many sources of error which exist in the RIA process often result in outlier observations. In this paper, an approach to the analysis of RIA data which incorporates robust estimation methods is described. An algorithm is presented for obtaining the M-estimates of nonlinear calibration curves. The curves to be fitted are modified hyperbolae based on 12 to 16 observations. A procedure, based on the application of the Bonferroni Inequality, is presented for obtaining tolerance-like interval estimates of the concentration of the hormone of interest in the patients. Results of simulations are cited to support the method of construction of confidence bands for the fitted calibration curve. Data obtained from the Veteran's Hospital, Buffalo, New York are used to illustrate the application of the algorithm which is presented.
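
Note: a hedged sketch of the same general idea: fit a simple hyperbolic calibration model with a robust loss so that a single bad count does not dominate the curve. The model form, the soft_l1 loss, and all the numbers here are illustrative stand-ins, not the paper's modified hyperbola or its M-estimation algorithm.

    import numpy as np
    from scipy.optimize import least_squares

    def model(p, x):
        # Illustrative hyperbolic calibration curve y = a + b / (c + x).
        a, b, c = p
        return a + b / (c + x)

    def fit_calibration(x, y, p0=(0.1, 1.0, 1.0)):
        # soft_l1 plays the role of an M-estimator: large residuals, e.g. from
        # a bad count, are down-weighted instead of dominating the fit.
        res = least_squares(lambda p: model(p, x) - y, p0, loss='soft_l1', f_scale=0.1)
        return res.x

    rng = np.random.default_rng(2)
    dose = np.array([0.5, 1, 2, 4, 8, 16, 32, 64], float)
    counts = 0.05 + 0.9 / (1.0 + dose) + rng.normal(0, 0.01, dose.size)
    counts[3] += 0.3                  # a gross outlier, as often seen in RIA data
    print(fit_calibration(dose, counts))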

42 citations


Book ChapterDOI
H.A. David
01 Jan 1979
TL;DR: In this article, the bias and mean square error of various estimators of location and scale in the presence of an unidentified outlier is discussed, and it is pointed out that the more extreme observations in the sample are typically given little or no weight in the robust estimator.
Abstract: Publisher Summary This chapter discusses the bias and mean square error of various estimators of location and scale in the presence of an unidentified outlier. Publications on robust estimation sometimes convey the impression that tests for outliers are irrelevant. The argument seems to be that robust estimators are constructed to perform reasonably well as long as the number of outliers is not too large; the more extreme observations in the sample are typically given little or no weight in the robust estimator. Moreover, the argument proceeds, observations rejected by some standard test for outliers may not be outliers at all; rather, they may simply come from a long-tailed distribution. It is important, therefore, to reiterate another aim of outlier tests, crucial in the proper treatment of data, namely, the identification of observations deserving closer scrutiny. It is sometimes suggested that outlier tests be performed at the 10% or even 20% level. Raising the significance level will certainly improve the robustness of estimators based on the surviving observations.

Journal ArticleDOI
TL;DR: In this paper, it is shown that the computations involve only the residuals in the analysis of the complete data so that reanalysis of samples of reducing size with the suspect observations removed is unnecessary.
Abstract: SUMMARY Tables of critical values are provided for a sequential test for detecting up to three outliers in normal samples. The test procedure is based on the joint distribution of a series of Grubbs'-type statistics applied to reducing samples. It is shown that the computations involve only the residuals in the analysis of the complete data so that reanalysis of samples of reducing size with the suspect observations removed is unnecessary.

The problem of detecting outliers in normal samples has been extensively researched in recent years and a number of test statistics are available for both the single outlier case and the many outlier case for testing a specified number k of outliers. In particular the maximum normed residual, or, equivalently, the Grubbs'-type statistics and adaptations and extensions of them have received considerable attention. These involve an examination of the relative size of the largest residual or the reduction in the residual sum of squares due to the elimination of one or more suspect observations. Details of these tests may be found in Grubbs (1950, 1969), Grubbs and Beck (1972), Tietjen and Moore (1972) and many others. It is well known that although these procedures have certain optimal properties when the number of outliers present is either zero or the specified number k, they may produce misleading results when there are fewer or more than k outliers present in the sample. The usual difficulty in practice is deciding the number of outliers for which to test. One approach to solving this problem is to use repeated applications of single outlier procedures, deleting the "outlier" detected at each step and applying the test again to the reduced sample until an insignificant result is obtained. This is not recommended as the presence of two or more outliers may produce an insignificant result in the initial single outlier test.

Recently Rosner (1975) examined a number of test statistics applied sequentially to reducing samples similar to the way described above except that the test statistics are calculated for the reducing sample a predetermined number of times, k, to produce k test statistics. These are then compared, in reverse order, with critical values based on their joint distribution under the assumption of no outliers present. The procedure is designed to detect from 1 to k outliers and Rosner's investigations showed that the sequential method using the series of maximum normed residuals from samples of reducing size appears to work very well. Critical values were tabulated by Rosner for k = 2. One possible disadvantage of this approach is the need to re-estimate the mean and variance after each deletion of a suspect observation. To overcome this difficulty Rosner (1977) recently proposed a modified procedure using the trimmed mean and variance obtained by
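
Note: for readers who want to experiment with the idea of testing sequentially for up to k outliers, the following sketch implements the closely related generalized ESD (Rosner-style) procedure; it is not the paper's test or its tabulated critical values, and the sample data are invented.

    import numpy as np
    from scipy import stats

    def generalized_esd(x, k=3, alpha=0.05):
        # Compute the maximum studentized residual k times, removing the most
        # extreme point each time, then compare the statistics with their
        # critical values; the largest significant index gives the outlier count.
        x = np.asarray(x, float).tolist()
        n = len(x)
        R, removed = [], []
        for _ in range(k):
            a = np.asarray(x)
            r = np.abs(a - a.mean()) / a.std(ddof=1)
            j = int(r.argmax())
            R.append(r[j])
            removed.append(a[j])
            del x[j]
        n_out = 0
        for i in range(1, k + 1):
            m = n - i + 1                              # sample size for the i-th statistic
            p = 1 - alpha / (2 * m)
            t = stats.t.ppf(p, m - 2)
            lam = (m - 1) * t / np.sqrt((m - 2 + t ** 2) * m)
            if R[i - 1] > lam:
                n_out = i
        return removed[:n_out]

    data = [4.9, 5.1, 5.0, 4.8, 5.2, 5.0, 4.9, 5.1, 5.3, 4.7, 9.5, 0.2]
    print(generalized_esd(data, k=3))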

Journal ArticleDOI
TL;DR: In this article, a systematic approach is presented for extending the application of the maximum likelihood criterion to obtain optimum correlations of experimental data without requiring the usual, very restrictive assumptions of Gaussian response-error dispersion and pre-known or specially structured error relationships.

Journal ArticleDOI
TL;DR: The widely used Tietjen-Moore multiple outlier statistic has a defect as originally proposed in that it may test the wrong observations as outliers, which is corrected by redefinition; the statistic is also extended to make use of possible additional information on underlying variance.
Abstract: The widely used Tietjen-Moore multiple outlier statistic has a defect as originally proposed in that it may test the wrong observations as outliers. The defect is corrected by redefinition and the statistic extended to make use of possible additional information on underlying variance. Results of simulation of the revised statistic are presented.
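
Note: a sketch of one common textbook form of the Tietjen-Moore statistic E_k, with a simulated critical value; exactly which observations are ranked as the suspects is the point on which the redefinition above turns, and that correction is not reproduced here. Function names, the sample, and the simulation size are illustrative.

    import numpy as np

    def tietjen_moore_Ek(x, k):
        # E_k: residual sum of squares after removing the k observations farthest
        # from the full-sample mean, divided by the residual sum of squares of the
        # full sample; small values point to outliers.
        x = np.asarray(x, float)
        order = np.argsort(np.abs(x - x.mean()))
        kept = x[order[: x.size - k]]
        return np.sum((kept - kept.mean()) ** 2) / np.sum((x - x.mean()) ** 2)

    def simulated_critical_value(n, k, alpha=0.05, n_sim=5000, seed=0):
        # Critical value under the null of a clean normal sample, by simulation.
        rng = np.random.default_rng(seed)
        sims = [tietjen_moore_Ek(rng.normal(size=n), k) for _ in range(n_sim)]
        return np.quantile(sims, alpha)

    sample = [-1.40, -0.44, -0.30, -0.24, -0.22, -0.13, -0.05,
              0.06, 0.10, 0.18, 0.20, 0.39, 0.48, 0.63, 1.01]
    ek = tietjen_moore_Ek(sample, k=2)
    print(ek, ek < simulated_critical_value(len(sample), k=2))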

Book ChapterDOI
01 Jan 1979
TL;DR: This chapter discusses the application of robust regression to trajectory data reduction, in particular the development of methods for linear and nonlinear regression that are insensitive to a large percentage of outlying observations.
Abstract: Publisher Summary This chapter discusses the application of robust regression to trajectory data reduction. A robust statistical procedure performs well under a variety of underlying distribution functions or in the presence of observations from contaminating distributions. Robust statistics provides a new approach to data editing in trajectory data reduction and has been seen to be highly successful in dealing with the same. There are several applications of robust statistics to data editing in trajectory data reduction: (1) data preprocessing, (2) instrument calibration, (3) N-station cine solution, (4) N-station radar solution, and (5) filtering. In data reduction, the development of methods for linear and nonlinear regression, which are insensitive to a large percentage of outlying observations, is the primary concern. Many sources of outliers are present in trajectory measuring systems. These sources can be broadly categorized as equipment malfunction, outside interference, and human error.

Journal ArticleDOI
TL;DR: In this paper, three tests for a single outlier were proposed and applied to gamma samples with unknown parameters; one based on the principle of maximum likelihood, the others using a transformation of the variables to approximate normality.
Abstract: Three tests for a single outlier are proposed and applied to gamma samples with unknown parameters; one based on the principle of maximum likelihood, the others using a transformation of the variables to approximate normality. Percentage points are tabulated where necessary. The performance of the tests in the presence of a single outlier is investigated in detail and their behaviour when two outliers are present is also considered.
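
Note: the three tests themselves are not reproduced here. As a loose illustration of the transformation-based route, the sketch below applies a cube-root (Wilson-Hilferty-type) transformation to bring a gamma sample close to normality and then uses a Grubbs-type statistic for the largest value; this is an assumed construction in the same spirit, not the paper's tests or percentage points.

    import numpy as np
    from scipy import stats

    def gamma_upper_outlier_test(x, alpha=0.05):
        # Cube-root transformation toward normality, then a Grubbs statistic
        # for the largest transformed value with a one-sided critical value.
        z = np.asarray(x, float) ** (1.0 / 3.0)
        n = z.size
        g = (z.max() - z.mean()) / z.std(ddof=1)
        t = stats.t.ppf(1 - alpha / n, n - 2)
        g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t ** 2 / (n - 2 + t ** 2))
        return g, g_crit, g > g_crit

    rng = np.random.default_rng(3)
    sample = rng.gamma(shape=2.0, scale=1.5, size=20)
    sample[-1] *= 6.0                          # plant a suspect value
    print(gamma_upper_outlier_test(sample))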


Journal ArticleDOI
TL;DR: A procedure is presented for revising the standard analysis when spurious observations occur in experiments, and properties of this procedure are discussed and numerical results provided.
Abstract: Data collected in experiments often include spurious observations, that is, observations not from the population of interest, that need to be taken into account during data analysis. This article considers a dental experiment the objective of which is to estimate a person's discriminating pressure. Because of the monotony of repetition in this experiment, some data points result in inflated values. A procedure is presented for revising the standard analysis when these spurious observations occur. Properties of this procedure are discussed and numerical results provided. The dental experiment is used to illustrate the procedure. A comparison of this procedure with competing rules is also presented.

Journal ArticleDOI
TL;DR: A procedure has been implemented that enables a biomedical researcher to view both the estimated probability and the numerical values of a data point's coordinates, circumventing the problem of interpreting a normal range in two or more dimensions.

Journal ArticleDOI
TL;DR: The concept of the characteristic functional was first published by A. N. Kolmogorov but went unnoticed for twelve years; it was reintroduced in 1947 by L. M. LeCam, inspired by studies of meteorology.
Abstract: Invariably, a persistent study of a category of natural phenomena generates novel mathematical developments, occasionally including sophisticated mathematical concepts. Of the mathematical developments generated by meteorology and efforts to modify weather, the most sophisticated seems to be the concept of the characteristic functional. It was first published by A. N. Kolmogorov, only to go unnoticed for twelve years. It was reintroduced in 1947 by L. M. LeCam, inspired by studies of meteorology. Substantially less sophisticated mathematical concepts stem from cloud seeding experiments. They include (i) "outlier prone" and "outlier resistant" distributions, (ii) two mechanisms of response to cloud seeding, and (iii) the concept of variability of response to cloud seeding.

Journal ArticleDOI
TL;DR: In this paper, a general method of obtaining the estimate of spuriosity is suggested, and the estimation of parameters in the exact distributions Normal, Gamma and Weibull is studied through the Bayesian approach by the use of generalized hypergeometric series.
Abstract: The outlier problem is studied in relation to the Pearson system of frequency curves, represented by (y'/y) = (d - x)/(a + bx + cx^2), when the spuriosity affects d. Particular cases of approximation to the Normal and Gamma are treated. A general method of obtaining the estimate of spuriosity is suggested. Also, the estimation of parameters in the exact distributions Normal, Gamma and Weibull is studied through the Bayesian approach by the use of generalized hypergeometric series.