Showing papers on "Outlier published in 1984"


Book
01 Jan 1984
TL;DR: This book covers basic statistical concepts: the binomial and normal probability distributions, choosing samples, statistical inference (estimation and hypothesis testing), sample size and power, linear regression and correlation, analysis of variance, and factorial designs.
Abstract: Basic definitions and concepts; data graphics; introduction to probability: the binomial and normal probability distributions; choosing samples; statistical inference: estimation and hypothesis testing; sample size and power; linear regression and correlation; analysis of variance; factorial designs; transformations and outliers; experimental design in clinical trials; quality control; validation; computer-intensive methods; nonparametric methods; optimization techniques and screening designs. Appendices: some properties of the variance; comparison of slopes and testing of linearity: determination of relative potency; multiple regression; tables; outlier tests and chemical assays: should a single unexplained failing assay be reason to reject a batch; answers to exercises.

481 citations


Journal ArticleDOI
TL;DR: In this paper, it is shown that the two-component extreme value (TCEV) distribution, which belongs to the family of distributions of the annual maximum of a compound Poisson process and so rests on a solid theoretical basis, can be assumed as a parent flood distribution, i.e., one closely representative of the real flood experience.
Abstract: Theoretical considerations, supported by statistical analysis of 39 annual flood series (AFS) of Italian basins, suggest that the two-component extreme value (TCEV) distribution can be assumed as a parent flood distribution, i.e., one closely representative of the real flood experience. This distribution belongs to the family of distributions of the annual maximum of a compound Poisson process, which is a solid theoretical basis for AFS analysis. However, the two-parameter distribution of this family, obtained on the assumption of identically distributed floods, does not account for the high variability of both observed skewness and largest order statistics, so that a significant number of observed floods qualify as outliers under this distribution. The more general TCEV distribution assumes individual floods to arise from a mixture of two exponential components. Its four parameters can be estimated by the maximum likelihood method. A regionalized TCEV distribution, with parameters representative of a set of 39 Italian AFS's, was shown to closely reproduce the observed distribution of skewness and that of the largest order statistic.
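
As a worked illustration of the compound-Poisson genesis described above, the sketch below evaluates the TCEV distribution function F(x) = exp(−Λ₁e^(−x/θ₁) − Λ₂e^(−x/θ₂)) and checks it against simulated annual maxima; the parameter values are hypothetical, not the regionalized Italian estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

def tcev_cdf(x, lam1, theta1, lam2, theta2):
    """TCEV distribution function: exp(-lam1*exp(-x/theta1) - lam2*exp(-x/theta2))."""
    x = np.asarray(x, dtype=float)
    return np.exp(-lam1 * np.exp(-x / theta1) - lam2 * np.exp(-x / theta2))

def sample_annual_max(n_years, lam1, theta1, lam2, theta2):
    """Annual maxima of a mixture of two Poisson processes with exponential floods."""
    maxima = np.zeros(n_years)
    for i in range(n_years):
        floods = np.concatenate([
            rng.exponential(theta1, rng.poisson(lam1)),  # ordinary component
            rng.exponential(theta2, rng.poisson(lam2)),  # rare, outlying component
        ])
        maxima[i] = floods.max() if floods.size else 0.0
    return maxima

sim = sample_annual_max(10_000, lam1=20.0, theta1=1.0, lam2=0.5, theta2=4.0)
print("empirical P(X <= 10):", (sim <= 10).mean())
print("TCEV      P(X <= 10):", tcev_cdf(10.0, 20.0, 1.0, 0.5, 4.0))
```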

386 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose two summary statistics of a case's elemental residuals: an unweighted median, which is of bounded influence, and a weighted median, which is more efficient but less robust.
Abstract: The outlying tendency of any case in a multiple regression of p predictors may be estimated by drawing all subsets of size p from the remaining cases and fitting the model. Each such subset yields an elemental residual for the case in question, and a suitable summary statistic of them can be used as an estimate of the case's outlying tendency. We propose two such summary statistics: an unweighted median, which is of bounded influence, and a weighted median, which is more efficient but less robust. The computational load of the procedure is reduced by using random samples in place of the full set of subsets of size p. As a byproduct the method yields useful information on the influence (or leverage) of cases and the mutual masking of high leverage points.
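
A minimal sketch of the elemental-subset procedure described above, assuming a design matrix X whose p columns include any intercept term; the subset count and the gross-error example are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def outlying_tendency(X, y, n_subsets=500):
    """Unweighted-median summary of a case's elemental residuals."""
    n, p = X.shape
    scores = np.empty(n)
    for i in range(n):
        others = np.delete(np.arange(n), i)
        res = []
        for _ in range(n_subsets):
            J = rng.choice(others, size=p, replace=False)
            XJ = X[J]
            if abs(np.linalg.det(XJ)) < 1e-10:   # skip near-singular elemental sets
                continue
            beta = np.linalg.solve(XJ, y[J])     # exact fit through the p chosen cases
            res.append(y[i] - X[i] @ beta)       # elemental residual for case i
        scores[i] = np.median(res)
    return scores

# One gross outlier in a simple linear model; its score should dominate.
X = np.column_stack([np.ones(30), np.linspace(0.0, 1.0, 30)])
y = 2 + 3 * X[:, 1] + rng.normal(0, 0.1, 30)
y[7] += 5.0
print(np.argmax(np.abs(outlying_tendency(X, y))))  # expected: 7
```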

297 citations


Journal ArticleDOI
Mike West1
TL;DR: In this article, the authors consider a special class of heavy-tailed, unimodal and symmetric error distributions for which the analyses, though apparently intractable, can be examined in some depth by exploiting certain properties of the assumed error form.
Abstract: Bayesian inference in regression models is considered using heavy-tailed error distributions to accommodate outliers. The particular class of distributions that can be constructed as scale mixtures of normal distributions is examined, and use is made of them as both error models and prior distributions in Bayesian linear modelling, including simple regression and more complex hierarchical models with structured priors depending on unknown hyperprior parameters. The modelling of outliers in nominally normal linear regression models using alternative error distributions which are heavy-tailed relative to the normal provides an automatic means of both detecting and accommodating possibly aberrant observations. Such realistic models do, however, often lead to analytically intractable analyses with complex posterior distributions in several dimensions that are difficult to summarize and understand. In this paper we consider a special yet rather wide class of heavy-tailed, unimodal and symmetric error distributions for which the analyses, though apparently intractable, can be examined in some depth by exploiting certain properties of the assumed error form. The distributions concerned are those that can be constructed as scale mixtures of normal distributions. In his paper concerning location parameters, de Finetti (1961) discusses such distributions and suggests the hypothetical interpretation that "each observation is taken using an instrument with normal error, but each time chosen at random from a collection of instruments of different precisions, the distribution of the …"
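
The scale-mixture construction is easy to exhibit numerically: drawing a normal error whose precision is itself random reproduces a heavy-tailed law exactly, e.g. the Student-t. A minimal sketch (degrees of freedom chosen for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
nu, n = 4.0, 200_000

# Random precision per observation ("instrument chosen at random"), then normal error
w = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)  # mixing variable, mean 1
e = rng.normal(0.0, 1.0, size=n) / np.sqrt(w)      # scale-mixture-of-normals draw

# The draws are distributionally Student-t with nu degrees of freedom
print(stats.kstest(e, stats.t(df=nu).cdf))
```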

246 citations


Journal ArticleDOI
TL;DR: A robust kriging method is proposed for geological data with a heavy-tailed normal-in-the-middle distribution, which gives rise to grade distributions that appear to be normal except for the occurrence of a few outliers; used in conjunction with a robust estimator of the variogram, it provides good protection against the effects of data outliers.
Abstract: Geological data frequently have a heavy-tailed normal-in-the-middle distribution, which gives rise to grade distributions that appear to be normal except for the occurrence of a few outliers. This same situation also applies to log-transformed data to which lognormal kriging is to be applied. For such data, linear kriging is nonrobust in that (1) kriged estimates tend to infinity as the outliers do, and (2) it is also not minimum mean squared error. The more general nonlinear method of disjunctive kriging is even more nonrobust, computationally more laborious, and in the end need not produce better practical answers. We propose a robust kriging method for such nearly normal data based on linear kriging of an editing of the data. It is little more laborious than conventional linear kriging and, used in conjunction with a robust estimator of the variogram, provides good protection against the effects of data outliers. The method is also applicable to time series analysis.
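
The paper's editing rules are not reproduced here, but the robust-variogram ingredient can be sketched with the Cressie-Hawkins fourth-root estimator on a 1-D transect; the random-walk data and the planted outlier are ours.

```python
import numpy as np

def robust_variogram(z, max_lag):
    """Robust semivariogram for lags 1..max_lag (Cressie-Hawkins estimator)."""
    gamma = np.empty(max_lag)
    for h in range(1, max_lag + 1):
        d = np.sqrt(np.abs(z[h:] - z[:-h]))          # |increment|^(1/2) resists outliers
        n = d.size
        gamma[h - 1] = 0.5 * d.mean() ** 4 / (0.457 + 0.494 / n)
    return gamma

rng = np.random.default_rng(3)
z = np.cumsum(rng.normal(size=500))                  # a random-walk "grade" profile
z[100] += 50.0                                       # one gross data outlier
print(robust_variogram(z, 5))                        # barely perturbed by the spike
```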

115 citations


Journal ArticleDOI
TL;DR: It is demonstrated that an additive outlier affects the accuracy of forecasts derived from extrapolative methods not only at the time of occurrence but also in subsequent periods.
Abstract: The effect of an additive outlier upon the accuracy of forecasts derived from extrapolative methods is investigated. It is demonstrated that an outlier affects not only the accuracy of the forecasts at the time of occurrence but also subsequent forecasts. Methods to adjust for additive outliers are discussed. The results of the paper are illustrated with two examples.
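
The persistence effect is easy to reproduce with any extrapolative method that carries a smoothed level. A minimal sketch using simple exponential smoothing (our choice of method and parameters, not the paper's):

```python
import numpy as np

def ses_forecasts(y, alpha=0.3):
    """One-step-ahead simple exponential smoothing forecasts."""
    f = np.empty(len(y))
    f[0] = y[0]
    for t in range(1, len(y)):
        f[t] = alpha * y[t - 1] + (1 - alpha) * f[t - 1]
    return f

rng = np.random.default_rng(4)
y = 10 + rng.normal(0, 1, 60)
y_out = y.copy()
y_out[30] += 15.0                                    # additive outlier at t = 30

# Compare post-outlier forecast errors against the uncontaminated future values
err_clean = y[31:] - ses_forecasts(y)[31:]
err_cont = y[31:] - ses_forecasts(y_out)[31:]
print("post-outlier RMSE, clean series:       ", np.sqrt((err_clean ** 2).mean()))
print("post-outlier RMSE, contaminated series:", np.sqrt((err_cont ** 2).mean()))
```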

59 citations


Journal ArticleDOI
TL;DR: This work proposes the use of the normal probability plot and the cumulative sum plots of the recursive residuals to check the model assumptions of normality and homoscedasticity, and other aspects of model misfits such as change of regime, outliers, and omitted predictors, in place of plots based on ordinary residuals.
Abstract: Recursive residuals are independently and identically distributed and, unlike ordinary residuals, do not have the problem of deficiencies in one part of the data being smeared over all the residuals. In addition, recursive residuals may be interpreted as showing the effect of successively deleting observations from the data set. We propose the use of the normal probability plot and the cumulative sum plots of the recursive residuals, and of the square roots of the absolute values of the recursive residuals to check the model assumptions of normality and homoscedasticity, and other aspects of model misfits such as change of regime, outliers, and omitted predictors, in place of plots based on ordinary residuals. A further advantage of recursive residuals is that they are open to formal statistical testing, so that these plots can be automated and in fact produced only when a model misfit has been detected.
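
A minimal sketch of the recursive residuals themselves on simulated data (the plots and formal tests the paper proposes are omitted): w_t = (y_t − x_t′b_{t−1}) / √(1 + x_t′(X_{t−1}′X_{t−1})⁻¹x_t), where b_{t−1} is the OLS fit to the first t−1 cases.

```python
import numpy as np

def recursive_residuals(X, y):
    """Recursive residuals w_t for t = p+1..n; iid N(0, sigma^2) under the model."""
    n, p = X.shape
    w = []
    for t in range(p, n):
        Xt, yt = X[:t], y[:t]
        XtX_inv = np.linalg.inv(Xt.T @ Xt)
        b = XtX_inv @ Xt.T @ yt                  # OLS on the first t observations
        denom = np.sqrt(1.0 + X[t] @ XtX_inv @ X[t])
        w.append((y[t] - X[t] @ b) / denom)
    return np.array(w)

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(80), rng.normal(size=80)])
y = 1 + 2 * X[:, 1] + rng.normal(size=80)
w = recursive_residuals(X, y)
print("standardized CUSUM endpoint:", w.cumsum()[-1] / (w.std(ddof=1) * np.sqrt(len(w))))
```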

55 citations


Journal ArticleDOI
TL;DR: The authors summarize the findings from research on effective reading programs as they relate to this basic question, pose questions about this literature, in particular the applications being made of the findings, and suggest directions for future research.
Abstract: WHAT DO we know about effective reading programs? This report summarizes the findings from research on effective schools as it relates to this basic question. The studies included in this review have all employed to one degree or another an "outlier" paradigm. This paradigm typically involves the identification and study of a school or set of schools which have been highly successful in terms of their effects on pupil achievement where extra-institutional factors would predict patterns of failure. Common characteristics of effective schools as revealed by the studies are organized around the dimensions of: program characteristics, leadership behaviors, and psychological conditions. The report poses questions about this literature and in particular the applications being made of the findings. Directions for future research are suggested.

34 citations


Journal ArticleDOI
TL;DR: In this paper, an outlier-insensitive, robust smoothing method for spectral data is proposed which rejects the influence of huge noise spikes and can be tuned by two parameters: the first corresponds to the signal-to-noise ratio, the second to the halfwidths of the spectral bands.
Abstract: There are several smoothing procedures for spectral data which are affected by occasionally occurring outliers. Most of the known methods are based on local averages (or fits) of the spectral data. We introduce here an outlier-insensitive, robust smoothing method which rejects the influence of huge noise spikes. The proposed smoothing algorithm can be tuned by two parameters. The first corresponds to the signal-to-noise ratio, the second to the halfwidths of the spectral bands. We apply this new technique to several spectra and demonstrate the advantages of our method in identifying peaks and baselines in Raman spectroscopy.
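
A minimal sketch in the spirit of the method, not the authors' algorithm: a running-median despiking step controlled by a noise threshold k (standing in for the signal-to-noise parameter), followed by a window smooth whose width m stands in for the band-halfwidth parameter.

```python
import numpy as np

def robust_smooth(y, k=4.0, m=7):
    y = np.asarray(y, dtype=float)
    med = np.array([np.median(y[max(0, i - m):i + m + 1]) for i in range(len(y))])
    resid = y - med
    mad = 1.4826 * np.median(np.abs(resid))            # robust noise scale
    clean = np.where(np.abs(resid) > k * mad, med, y)  # replace spikes by local medians
    return np.convolve(clean, np.ones(m) / m, mode="same")  # final linear smooth

rng = np.random.default_rng(6)
x = np.linspace(0, 10, 400)
y = np.exp(-((x - 5) ** 2)) + rng.normal(0, 0.02, x.size)
y[50] += 3.0                                           # a huge noise spike
print(abs(robust_smooth(y)[50]) < 0.1)                 # spike rejected, baseline kept
```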

33 citations


Journal ArticleDOI
TL;DR: The estimation accuracy of the sample mean and 27 robust estimation and outlier detection techniques are compared by computer simulation; it is shown that the proper class of estimates depends on the degree of contamination, whether the contamination is symmetric or asymmetric, and the sample size.
Abstract: Although the poor performance of the mean as a location estimate when outliers are present in the data is well known, there has been no clear consensus as to whether robust estimation or outlier detection is the appropriate corrective procedure. In this paper, the estimation accuracy of the sample mean and 27 robust estimation and outlier detection techniques are compared by computer simulation. Both symmetric and asymmetric contamination are considered. It is shown that the proper class of estimates depends on the degree of contamination, whether the contamination is symmetric or asymmetric, and the sample size. Several data sets considered previously by Rocke et al. (1982) are also examined.
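
A minimal sketch of this kind of simulation, comparing the mean, the median, a trimmed mean, and a crude detect-then-delete rule under asymmetric contamination; the contamination scheme and cutoffs are illustrative, not the paper's 27 procedures.

```python
import numpy as np

rng = np.random.default_rng(7)

def trimmed_mean(x, prop=0.1):
    k = int(len(x) * prop)
    return np.sort(x)[k:len(x) - k].mean()

def one_trial(n=20, eps=0.1, shift=5.0):
    x = rng.normal(0, 1, n)
    x[rng.random(n) < eps] += shift                  # asymmetric contamination
    z = np.abs(x - x.mean()) / x.std(ddof=1)
    return x.mean(), np.median(x), trimmed_mean(x), x[z < 2.5].mean()

est = np.array([one_trial() for _ in range(5000)])   # true location is 0
mse = (est ** 2).mean(axis=0).round(4)
print(dict(zip(["mean", "median", "10% trim", "detect-delete"], mse)))
```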

Journal ArticleDOI
TL;DR: In this article, a particular form of multiple outlier detection is adopted, based on goodness of fit, kurtosis, skewness and optimal statistics from the ratios of variances and Studentized deviates.

Journal ArticleDOI
TL;DR: In this paper, the authors demonstrate the potential usefulness of robust regression analysis in treating influential response values in marketing data and show that robust regression can be used to detect aberrant response values or outliers in data.
Abstract: In marketing models, the presence of aberrant response values or outliers in data can distort the parameter estimates or regression coefficients obtained by means of ordinary least squares. The authors demonstrate the potential usefulness of robust regression analysis in treating influential response values in marketing data.

Book ChapterDOI
01 Jan 1984
TL;DR: The Direct Quadratic Spectrum Estimation method is versatile in handling data that have irregular spacing or missing values; the method is computationally stable, is robust to isolated outlier observations in irregularly spaced data, is capable of fine frequency resolution, makes maximum use of all available data, and is easy to implement on a computer.
Abstract: The Direct Quadratic Spectrum Estimation (DQSE) method was defined by Marquardt and Acuff (1982). Some of the theoretical properties of DQSE were explored. The method was illustrated with several numerical examples. The DQSE method is versatile in handling data that have irregular spacing or missing values; the method is computationally stable, is robust to isolated outlier observations in irregularly spaced data, is capable of fine frequency resolution, makes maximum use of all available data, and is easy to implement on a computer. Moreover, DQSE, coupled with irregularly spaced data, can provide a powerful diagnostic tool because irregularly spaced data are inherently resistant to aliasing problems that often are a limitation with equally spaced data.

Journal ArticleDOI
TL;DR: In this paper, the common distribution for the parameters at the second stage of the prior model is estimated empirically from the data, permitting the data to determine the nature of the shrinkages.
Abstract: The usual Bayes-Stein shrinkages of maximum likelihood estimates towards a common value may be refined by taking fuller account of the locations of the individual observations. Under a Bayesian formulation, the types of shrinkages depend critically upon the nature of the common distribution assumed for the parameters at the second stage of the prior model. In the present paper this distribution is estimated empirically from the data, permitting the data to determine the nature of the shrinkages. For example, when the observations are located in two or more clearly distinct groups, the maximum likelihood estimates are, roughly speaking, constrained towards common values within each group. The method also detects outliers; an extreme observation will either be regarded as an outlier and not substantially adjusted towards the other observations, or it will be rejected as an outlier, in which case a more radical adjustment takes place. The method is appropriate for a wide range of sampling distributions and may also be viewed as an alternative to standard multiple comparisons, cluster analysis, and nonparametric kernel methods.
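
For orientation, a minimal sketch of the baseline this paper refines: ordinary Bayes-Stein shrinkage of k normal means toward a single grand mean (known unit variance assumed). The paper's point is precisely that, with an empirically estimated second stage, a clearly separated group would instead be shrunk toward its own group value.

```python
import numpy as np

def bayes_stein(x, sigma2=1.0):
    """Positive-part shrinkage of x_i ~ N(theta_i, sigma2) toward the grand mean."""
    k = len(x)
    xbar = x.mean()
    s = ((x - xbar) ** 2).sum()
    shrink = min(1.0, (k - 3) * sigma2 / s)
    return xbar + (1 - shrink) * (x - xbar)

rng = np.random.default_rng(8)
theta = np.concatenate([np.zeros(8), np.full(4, 6.0)])  # two clearly distinct groups
x = theta + rng.normal(size=theta.size)
print(bayes_stein(x).round(2))  # everything is pulled toward one common value
```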

Journal ArticleDOI
TL;DR: In this article, a set of independent observations is assumed to come from one or more normal populations having the same unknown variance and different unknown means, but priors are chosen for these quantities which make it very likely that one population is dominant.
Abstract: A set of independent observations is assumed to come from one or more normal populations having the same unknown variance and different unknown means. Ignorance priors are associated with these parameters. The number of populations is also unknown as is the number of observations from each, but priors are chosen for these quantities which make it very likely that one population is dominant. Observations from the rest are considered outliers. Using these priors in conjunction with Akaike's predictive likelihood, which is derived for the class of models considered, one can obtain a quasi-Bayesian posterior probability for each possible model. A “robust” estimate of the mean value of the dominant population and “corrected” values for the outliers can be calculated from the posterior probabilities, once the outliers have been designated. Darwin's data and Herndon's data are analyzed to illustrate the procedure.

Journal ArticleDOI
TL;DR: In this article, sensitivity analysis and robust regression are applied to the problem of measuring the performance of investment portfolios and the sensitivity of systematic risk estimates is gauged via the deletion of outliers from the data set.
Abstract: The novel statistical techniques of sensitivity analysis and robust regression are applied to the problem of measuring the performance of investment portfolios. The sensitivity of systematic risk estimates is gauged via the deletion of outliers from the data set. The robust regression procedure of least absolute residuals is employed to derive new estimates of Jensen's measure of investment performance.
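
A minimal sketch of the least-absolute-residuals fit of the market model r_p = α + β·r_m + e, in which α is Jensen's measure; the returns are simulated with heavy-tailed noise, and the optimizer choice is ours.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(9)
rm = rng.normal(0.01, 0.04, 120)                        # market excess returns
rp = 0.002 + 1.2 * rm + 0.01 * rng.standard_t(3, 120)   # portfolio excess returns

def lad_loss(b):
    """Sum of absolute residuals for intercept b[0] (alpha) and slope b[1] (beta)."""
    return np.abs(rp - b[0] - b[1] * rm).sum()

beta_ols, alpha_ols = np.polyfit(rm, rp, 1)
alpha_lad, beta_lad = minimize(lad_loss, x0=[0.0, 1.0], method="Nelder-Mead").x
print("OLS alpha, beta:", round(alpha_ols, 4), round(beta_ols, 3))
print("LAD alpha, beta:", round(alpha_lad, 4), round(beta_lad, 3))
```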

Journal ArticleDOI
TL;DR: In this article, the authors present a routine that calculates four outlier detection statistics, which can be used in an iterative procedure to detect multiple outliers in a set of points that are identified as possible outliers.
Abstract: This paper presents a routine that calculates four outlier detection statistics. The routine determines a series of points that are identified as possible outliers, and calculates the values that can be used to test them. These values can be used in an iterative procedure to detect multiple outliers.
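
The paper's four statistics are not listed in the abstract, so the sketch below illustrates the iterative idea with one classical choice, the extreme Studentized deviate: flag the most extreme point, delete it, and retest. The cutoff is illustrative, not from the paper's tables.

```python
import numpy as np

def iterative_esd(x, cutoff=2.5, max_outliers=5):
    """Iteratively flag points whose extreme Studentized deviate exceeds the cutoff."""
    x = list(x)
    flagged = []
    for _ in range(max_outliers):
        arr = np.array(x)
        z = np.abs(arr - arr.mean()) / arr.std(ddof=1)
        i = int(z.argmax())
        if z[i] < cutoff:
            break
        flagged.append(arr[i])   # flag the most extreme point ...
        del x[i]                 # ... delete it, then retest the remainder
    return flagged

rng = np.random.default_rng(10)
data = np.concatenate([rng.normal(0, 1, 40), [8.0, 9.5]])
print(iterative_esd(data))      # expected: the two planted outliers
```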

Journal ArticleDOI
TL;DR: The main emphasis is on parameter estimation (i.e. approximation of the posterior expectations of the parameter) while Goldstein (1976) uses similar tools to concentrate on predictions and de Vylder (1982) pays particular attention to restricted covariance matrices in the context of credibility theory.
Abstract: Regression models are introduced in the framework of Bayesian cuts (see, for example, Florens and Mouchart, 1977). It is well known that once the prior distribution is not natural-conjugate or the sampling process is non-normal, the computation of the posterior distribution may rapidly become intractable. This creates the temptation to specify simplistic models in order to retain tractability. Examples of these complications are well known when introducing fat-tailed distributions in order to treat outliers or when facing asymmetrically distributed residuals. When the main interest lies in the computation of the posterior expectations of the parameters, we show that Least-Squares (L.S.) approximations allow one to render those nonstandard models tractable. This suggests that approximate solutions to reasonable models may be attractive alternatives to exact solutions to simplified models. The Least-Squares approximations may be interpreted in the framework of a normal approximation to the joint distribution of parameters and observations. This suggests that their practical relevance will crucially depend on the choice of co-ordinates. This note handles that choice in the field of regression models. In this paper the main emphasis is on parameter estimation (i.e. approximation of the posterior expectations of the parameters), while Goldstein (1976) uses similar tools to concentrate on predictions and de Vylder (1982) pays particular attention to restricted covariance matrices in the context of credibility theory. In Section 2 we introduce the basic model. Then in Section 3 we present our L.S. approximations, trying to take advantage of the use of unbiased estimators, while in Section 4 we consider the particular case of singular covariance matrices.
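
For reference, the least-squares approximation to a posterior expectation is the best predictor of the parameter that is linear in the observations; a standard statement (our notation, not the paper's coordinates) is:

```latex
\hat{E}(\theta \mid y)
  = E(\theta) + \operatorname{Cov}(\theta, y)\,\operatorname{Var}(y)^{-1}\bigl(y - E(y)\bigr),
\qquad
\operatorname{Var}\!\bigl(\theta - \hat{E}(\theta \mid y)\bigr)
  = \operatorname{Var}(\theta)
  - \operatorname{Cov}(\theta, y)\,\operatorname{Var}(y)^{-1}\operatorname{Cov}(y, \theta).
```

Only the first and second moments of the joint distribution of parameters and observations enter, which is what keeps the approximation tractable for non-conjugate, heavy-tailed models.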

ReportDOI
01 Jun 1984
TL;DR: In this paper, it is shown that sampling plans based on prior distributions and costs are only efficient in an outlier model, i.e., if almost all lots are of good quality and only a small number of lots, denoted as outlier lots, have very poor quality.
Abstract: This paper deals with some pitfalls linked with the sampling model based on prior distributions and costs. First, a model is designed which encompasses most of the existing Bayesian cost models. The efficiency of sampling plans is investigated in a numerical study. It is shown that under realistic assumptions, described by Dodge (1969) and Schilling (1982), sampling plans based on prior distributions and costs are only efficient in an outlier model, i.e., if almost all lots are of good quality and only a small number of lots, denoted as outlier lots, have very poor quality. Furthermore, it is demonstrated that for the Polya distribution a gain from sampling is linked with a high percentage of rejections; i.e., when the relationship between prior distribution and costs is such that less than 5% of the lots should be rejected, sampling becomes inefficient.


Proceedings ArticleDOI
01 Mar 1984
TL;DR: This work presents an algorithm which yields 2-D spectral estimates robust to additive outliers; the algorithm is iterative in nature and involves fitting a 2-D noncausal spatial autoregressive (SAR) model to the given data.
Abstract: Most parametric spectrum estimation techniques assume a specific distribution for the observations. Even a small number of outliers violating the distribution assumption can yield poor spectral estimates. We present an algorithm which yields 2-D spectral estimates robust to additive outliers. The algorithm is iterative in nature and involves fitting a 2-D noncausal spatial autoregressive (SAR) model to the given data.

01 Jan 1984
TL;DR: A combination of Tukey's test, consequent transformation, and graphical analysis for outlier elimination showed that nonadditivity in the data was caused by the presence of outliers, the absence of a suitable transformation, or both.
Abstract: To bring out the relative efficiency of various types of fishing gears in the analysis of catch data, a combination of Tukey's test, consequent transformation, and graphical analysis for outlier elimination has been introduced, which can be advantageously used for applying ANOVA techniques. Application of these procedures to actual sets of data showed that nonadditivity in the data was caused by either the presence of outliers, or the absence of a suitable transformation, or both. As a corollary, the concurrent model X_ij = μ + α_i + β_j + λ·α_i·β_j was considered.
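
A minimal sketch of Tukey's one-degree-of-freedom test for nonadditivity on a two-way gear-by-occasion table, matching the concurrent model quoted above; the data are made up, not the paper's catch records.

```python
import numpy as np
from scipy import stats

def tukey_nonadditivity(X):
    """F statistic and p-value of Tukey's one-degree-of-freedom test."""
    r, c = X.shape
    a = X.mean(axis=1) - X.mean()                      # row (gear) effects
    b = X.mean(axis=0) - X.mean()                      # column (occasion) effects
    ss_nonadd = (a @ X @ b) ** 2 / ((a ** 2).sum() * (b ** 2).sum())
    resid = X - X.mean(axis=1, keepdims=True) - X.mean(axis=0) + X.mean()
    df = (r - 1) * (c - 1) - 1
    F = ss_nonadd / (((resid ** 2).sum() - ss_nonadd) / df)
    return F, stats.f.sf(F, 1, df)

rng = np.random.default_rng(11)
catches = rng.poisson(20, size=(4, 6)).astype(float)   # 4 gears x 6 occasions
print(tukey_nonadditivity(catches))
```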

Book ChapterDOI
01 Jan 1984
TL;DR: A simplified version of the AM estimate introduced by Martin in connection with robust estimation for autoregressive moving average models with additive outliers is investigated, and it is seen that the estimate is not consistent under the ideal model.
Abstract: A simplified version of the AM estimate introduced by Martin in connection with robust estimation for autoregressive moving average models with additive outliers is investigated. It is seen that the estimate is not consistent under the ideal model, so a similar procedure is introduced and studied. Simulation results in the simple case of the first-order autoregressive model show that the estimates are very robust against the additive outlier model, but not quite robust against the innovation outlier model.
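
A minimal sketch of the setting studied, contrasting the least-squares AR(1) estimate with a simple Huber-weighted alternative under additive outliers; this illustrates the failure mode, not Martin's AM estimate or the authors' modified procedure.

```python
import numpy as np

rng = np.random.default_rng(12)

def ls_ar1(y):
    return (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])

def huber_ar1(y, c=1.5, iters=20):
    phi = ls_ar1(y)
    for _ in range(iters):
        r = y[1:] - phi * y[:-1]
        s = 1.4826 * np.median(np.abs(r))                  # robust residual scale
        w = np.minimum(1.0, c * s / (np.abs(r) + 1e-12))   # Huber weights on residuals
        phi = (w * y[1:] @ y[:-1]) / (w * y[:-1] @ y[:-1])
    return phi

x = np.zeros(1000)
for t in range(1, 1000):                                   # core AR(1) process, phi = 0.8
    x[t] = 0.8 * x[t - 1] + rng.normal()
y = x + np.where(rng.random(1000) < 0.05, rng.normal(0, 10, 1000), 0.0)  # additive outliers

print("LS:", round(ls_ar1(y), 3), " Huber-weighted:", round(huber_ar1(y), 3), " true: 0.8")
```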