
Showing papers on "Outlier published in 1992"


Journal ArticleDOI
TL;DR: In this article, the authors evaluated measures for making comparisons of errors across time series and found that the root mean square error is not reliable and therefore inappropriate for comparing accuracy across series; they recommend relative absolute errors, which compare the absolute error of a given method with that of the random walk forecast.

1,063 citations


DatasetDOI
01 Jan 1992
TL;DR: In this paper, the effect of outliers on reaction time analyses is evaluated, and the power of different methods of minimizing their effect on the analysis of variance (ANOVA) is discussed.
Abstract: The effect of outliers on reaction time analyses is evaluated. The first section assesses the power of different methods of minimizing the effect of outliers on analysis of variance (ANOVA) and makes recommendations about the use of transformations and cutoffs. The second section examines the effect of outliers and cutoffs on different measures of location, spread, and shape and concludes using quantitative examples that robust measures are much less affected by outliers and cutoffs than measures based on moments. The third section examines fitting explicit distribution functions as a way of recovering means and standard deviations and concludes that unless fitting the distribution function is used as a model of distribution shape, the method is probably not worth routine use.

460 citations
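The paper's contrast between cutoffs, transformations, and robust measures is easy to see numerically. A minimal sketch, not from the paper: the ex-Gaussian-style simulation, the 1500 ms cutoff, and the sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Ex-Gaussian-style reaction times: normal core plus exponential tail (ms)
rt = rng.normal(500, 50, 200) + rng.exponential(60, 200)
rt[:5] += 2000  # contaminate with a few extreme outliers

for label, x in [("raw", rt),
                 ("cutoff at 1500 ms", rt[rt < 1500]),
                 ("log transformed", np.log(rt))]:
    print(f"{label:>18}: mean = {np.mean(x):8.2f}   median = {np.median(x):8.2f}")
```

The median barely moves across treatments, while the mean is dominated by the handful of contaminated trials; this is the pattern the paper's second section quantifies for measures of location, spread, and shape.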


Journal ArticleDOI
TL;DR: A linear neural unit with a modified anti-Hebbian learning rule is shown to be able to optimally fit curves, surfaces, and hypersurfaces by adaptively extracting the minor component of the input data set.

252 citations


Journal ArticleDOI
TL;DR: The authors describe and compare six procedures that can be used in a regression model to adjust for outliers in the data and nonlinearities in the relationship between the dependent and independent variables.
Abstract: This paper describes and compares six procedures that can be used in a regression model to adjust for outliers in the data and nonlinearities in the relationship between the dependent and independent variables.

68 citations


Journal ArticleDOI
TL;DR: In this article, the authors compared the accuracy of several formulas for the standard error of the mean uncorrected correlation in meta-analytic and validity generalization studies, and concluded that the common formula for the sampling variance of the mean correlation, V(r̄) = V(r)/K, where K is the number of studies in the meta-analysis, gives reasonably accurate results.
Abstract: This article compares the accuracy of several formulas for the standard error of the mean uncorrected correlation in meta-analytic and validity generalization studies. The effect of computing the mean correlation by weighting the correlation in each study by its sample size is also studied. On the basis of formal analysis and simulation studies, it is concluded that the common formula for the sampling variance of the mean correlation, V(r̄) = V(r)/K, where K is the number of studies in the meta-analysis, gives reasonably accurate results. This formula gives accurate results even when sample sizes and ρs are unequal and regardless of whether or not the statistical artifacts vary from study to study. It is also shown that using sample-size weighting may result in underestimation of the standard error of the mean uncorrected correlation when there are outlier sample sizes.

57 citations
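A worked example of the formula above, V(r̄) = V(r)/K, using made-up study correlations for K = 6 studies (values are purely illustrative):

```python
import numpy as np

r = np.array([0.31, 0.25, 0.40, 0.18, 0.36, 0.28])  # illustrative study correlations
K = len(r)

r_bar = r.mean()          # unweighted mean correlation
V_r = r.var(ddof=1)       # observed variance of the correlations across studies
V_rbar = V_r / K          # sampling variance of the mean correlation
print(f"mean r = {r_bar:.3f}, SE = {np.sqrt(V_rbar):.3f}")
```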


ReportDOI
03 Jan 1992
TL;DR: The graphical methods focus on two- and three-dimensional data and common tasks such as finding outliers and tail structure, assessing central structure, and comparing central structures; cognostics are used to prioritize views of computational fluid dynamics solution sets on the fly.
Abstract: This report addresses the monumental challenge of developing exploratory analysis methods for large data sets. The goals of the report are to increase awareness of large data set problems and to contribute simple graphical methods that address some of the problems. The graphical methods focus on two- and three-dimensional data and common tasks such as finding outliers and tail structure, assessing central structure, and comparing central structures. The methods handle large sample size problems through binning, incorporate information from statistical models, and adapt image processing algorithms. Examples demonstrate the application of the methods to a variety of publicly available large data sets. The most novel application addresses the "too many plots to examine" problem by using cognostics, computer guiding diagnostics, to prioritize plots. The particular application prioritizes views of computational fluid dynamics solution sets on the fly. That is, as each time step of a solution set is generated on a parallel processor, the cognostics algorithms assess virtual plots based on the previous time step. Work in such areas is in its infancy and the examples suggest numerous challenges that remain. 35 refs., 15 figs.

52 citations
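Of the methods listed, binning is the simplest to restate in code: replace an unreadable large scatterplot with a matrix of bin counts. A minimal NumPy sketch with simulated data; the sample size and bin count are illustrative assumptions, not values from the report.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x = rng.standard_normal(n)
y = 0.5 * x + rng.standard_normal(n)

# 64x64 count matrix; rendering the counts (e.g., as a gray-scale image)
# replaces a million-point scatterplot, and sparsely populated bins far
# from the central structure point at outliers and tail structure.
counts, xedges, yedges = np.histogram2d(x, y, bins=64)
print(counts.shape, int(counts.sum()))   # (64, 64) 1000000
```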


Journal ArticleDOI
TL;DR: In this article, a robust regression estimator involving the least median of squares (LMS) is applied to the estimation of paleostress tensors from fault plane data; not only can the parameters of the tensor be estimated, but the quality of the data set can also be assessed.

50 citations


Journal ArticleDOI
TL;DR: In this article, a generalization of Wilks's single-outlier test for detecting from 1 to k outliers in a multivariate data set is proposed and appropriate critical values determined.
Abstract: A generalization of Wilks's single-outlier test suitable for application to the many-outlier problem of detecting from 1 to k outliers in a multivariate data set is proposed and appropriate critical values determined. The method used follows that suggested by Rosner employing sequential application of the generalized extreme Studentized deviate to univariate samples of reducing size, in which the type I error is controlled both under the hypothesis of no outliers and under the alternative hypothesis of 1, 2, ..., k outliers. It is shown that critical values for the sequential application of Wilks's test to detect many outliers depend only on those for a single-outlier test, which may be approximated by percentage points from the F-distributions as tabulated by Wilks. Relationships between Wilks's test statistic, the Mahalanobis distance between the 'outlier' and the mean vector, and Hotelling's T2-test between the outlier and the rest of the data, are used to reduce the amount of computation involved in applying the sequential procedure. Simulations are used to show that the method behaves well in detecting multiple outliers in samples larger than about 25. Finally, an example with three dimensions is used to illustrate how the method is applied.

50 citations
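The sequential logic is straightforward to sketch: repeatedly remove the observation with the largest Mahalanobis distance, up to k times. In the fragment below a chi-square quantile stands in for the Wilks/F critical values tabulated in the paper, so it does not reproduce the paper's type I error control; it only illustrates the reducing-sample mechanics.

```python
import numpy as np
from scipy.stats import chi2

def sequential_outliers(X, k=3, alpha=0.01):
    """Flag up to k multivariate outliers, most extreme first."""
    X = np.asarray(X, dtype=float)
    idx = np.arange(len(X))
    flagged = []
    for _ in range(k):
        mu = X.mean(axis=0)
        S_inv = np.linalg.inv(np.cov(X, rowvar=False))
        d2 = np.einsum('ij,jk,ik->i', X - mu, S_inv, X - mu)  # Mahalanobis^2
        j = int(d2.argmax())
        if d2[j] < chi2.ppf(1 - alpha, df=X.shape[1]):
            break                      # nothing extreme enough remains
        flagged.append(int(idx[j]))
        X = np.delete(X, j, axis=0)    # reduce the sample and repeat
        idx = np.delete(idx, j)
    return flagged
```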


Journal Article
TL;DR: In this article, a generalized extreme studentized residual (GESR) procedure was proposed to detect multiple y outliers in linear regression; its performance was compared with others by Monte Carlo techniques and found to be superior.
Abstract: This article is concerned with procedures for detecting multiple y outliers in linear regression. A generalized extreme studentized residual (GESR) procedure, which controls the type I error rate, is developed. An approximate formula to calculate the percentiles is given for large samples, and more accurate percentiles for n ≤ 25 are tabulated. The performance of this procedure is compared with others by Monte Carlo techniques and found to be superior. The procedure, however, fails in detecting y outliers that are high-leverage cases. For this, a two-phase procedure is suggested. In phase 1, a set of suspect observations is identified by GESR and one of the diagnostics applied sequentially. In phase 2, a backward testing is conducted using the GESR procedure to see which of the suspect cases are outliers. Several examples are analyzed.

44 citations
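The core statistic is the externally studentized residual, whose maximum GESR applies sequentially with controlled critical values. A minimal NumPy sketch of one step (the critical values, and hence the type I error control, are omitted here):

```python
import numpy as np

def max_studentized_residual(X, y):
    """Index and value of the largest externally studentized residual."""
    X = np.column_stack([np.ones(len(y)), X])      # add intercept
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix
    h = np.diag(H)
    e = y - H @ y                                  # ordinary residuals
    s2 = e @ e / (n - p)
    # leave-one-out (external) variance estimate for each case
    s2_i = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_i * (1 - h))
    j = int(np.abs(t).argmax())
    return j, t[j]
```

Applied repeatedly to the reduced data set this yields the GESR procedure; the high-leverage failure mode noted in the abstract is what motivates the two-phase variant.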


Journal ArticleDOI
TL;DR: The results indicate that the specification power of these statistics could be significantly jeopardized by an additive outlier, while an innovational outlier seems to cause no harm to them.
Abstract: We investigate the usefulness of sample autocorrelations and partial autocorrelations as model specification tools when the observed time series is contaminated by an outlier. The results indicate that the specification power of these statistics could be significantly jeopardized by an additive outlier. On the other hand, an innovational outlier seems to cause no harm to them.

36 citations
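This effect is easy to reproduce. A minimal simulation, not from the paper: an AR(1) series with φ = 0.8 and a single large additive outlier (all values illustrative).

```python
import numpy as np

def acf(x, nlags=5):
    x = x - x.mean()
    return np.array([x[:-k] @ x[k:] / (x @ x) for k in range(1, nlags + 1)])

rng = np.random.default_rng(2)
e = rng.standard_normal(500)
x = np.zeros(500)
for t in range(1, 500):           # AR(1) with phi = 0.8
    x[t] = 0.8 * x[t - 1] + e[t]

y = x.copy()
y[250] += 15 * x.std()            # one additive outlier

print('clean ACF       :', acf(x).round(2))
print('contaminated ACF:', acf(y).round(2))
```

The contaminated autocorrelations shrink toward zero, which can suggest a lower-order or even white-noise model; an innovational outlier, which propagates through the AR dynamics, leaves the pattern largely intact.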


Journal ArticleDOI
TL;DR: In this article, the authors apply the ideas of Spiegelhalter and Smith to make inferences about outliers when improper priors are used for the contaminating parameters, circumventing the problem that a vague improper prior puts most posterior weight on the model allowing for the largest number of outliers.
Abstract: Suppose we think that most observations in a sample have been generated from a distribution with density f(x) but we fear that a few outliers from a distribution with density g(x) may have contaminated our sample. In many situations, we might assume that f(x) is a density depending on a parameter θ and that g(x) is of the same form as f but with parameter θ + δ or θδ. A number of Bayesian models for this problem when f is normal have been discussed by Freeman. He points out that with a vague improper prior for contaminating parameters, most posterior weight is put on the model allowing for the largest number of outliers. He therefore confines attention to proper priors when trying to answer the question of “how many outliers?” However, in many situations we do not have very certain information on the contaminating parameters and would like to make inferences about outliers when using improper priors for the parameters of the model. In this article, we apply the ideas of Spiegelhalter and Smith to...

Journal ArticleDOI
TL;DR: In this paper, the authors identify two fundamental statistics in discriminant analysis on which many influence measures depend, propose and discuss several such measures, and show that approximate contours of a measure can be plotted to reveal influence information.

Journal ArticleDOI
TL;DR: The averaging of data is rarely done properly, and the intent of this article is to clarify the issues and provide a tool that allows researchers to improve their averaging techniques.
Abstract: An Excel macro is presented for averaging spreadsheet data. The macro has several special features: (1) The data are weighted by the inverse variance of each datum to decrease the contribution of noisy outliers. (2) There is a provision for a power or a log transform of the data before averaging; the rationale for transforming the data before averaging is discussed. (3) The output includes the average value, its standard error, and the reduced chi-square that measures the goodness of fit. (4) The standard error is corrected by a heterogeneity factor based on the reduced chi-square. The averaging of data is rarely done properly, and the intent of this article is to clarify the issues and provide a tool that allows researchers to improve their averaging techniques.
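The macro's arithmetic is compact enough to restate. A minimal Python sketch of the same recipe; the function name and the convention of applying the heterogeneity factor only when the reduced chi-square exceeds 1 are assumptions, not details from the paper.

```python
import numpy as np

def weighted_average(x, var):
    """Inverse-variance weighted mean with SE and reduced chi-square."""
    x, var = np.asarray(x, float), np.asarray(var, float)
    w = 1.0 / var                                   # inverse-variance weights
    mean = np.sum(w * x) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    chi2_red = np.sum(w * (x - mean) ** 2) / (len(x) - 1)   # goodness of fit
    if chi2_red > 1:                                # heterogeneity correction
        se *= np.sqrt(chi2_red)
    return mean, se, chi2_red

# The noisy fourth datum gets a tiny weight and barely shifts the average.
print(weighted_average([10.1, 9.8, 10.4, 14.0], [0.04, 0.04, 0.09, 4.0]))
```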

Proceedings ArticleDOI
09 Aug 1992
TL;DR: In this article, the authors derived a general expression of the optimal v for which the breakdown point of the LMS attains the highest possible fraction of outliers that any regression equivariant estimator can handle.
Abstract: The least median of squares (LMS) estimator minimizes the vth ordered squared residual. The authors derived a general expression of the optimal v for which the breakdown point of the LMS attains the highest possible fraction of outliers that any regression equivariant estimator can handle. This fraction is equal to half of the minimum surplus divided by the number of measurements in the network. The surplus of a fundamental set is defined as the smallest number of measurements whose removal from that fundamental set turns at least one measurement in the network into a critical one. Based on the surplus concept, a system decomposition scheme that significantly increases the number of outliers that can be identified by the LMS is developed. In addition, it dramatically reduces the computing time of the LMS, opening the door to real-time applications of that estimator to large-scale systems. Finally, outlier diagnostics based on robust Mahalanobis distances are proposed.
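For a simple two-parameter regression, the LMS objective can be approximated by random elemental subsets. The sketch below minimizes the median of squared residuals (the classical choice) rather than the paper's optimal vth ordered residual; the trial count is an illustrative assumption, and x, y are NumPy arrays.

```python
import numpy as np

def lms_line(x, y, n_trials=500, seed=0):
    """Approximate LMS fit of y = a + b*x via random two-point subsets."""
    rng = np.random.default_rng(seed)
    best, best_med = (0.0, 0.0), np.inf
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue                                # skip a vertical pair
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        med = np.median((y - a - b * x) ** 2)       # median of squared residuals
        if med < best_med:
            best, best_med = (a, b), med
    return best
```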

Journal ArticleDOI
TL;DR: Two less time-consuming alternatives to high-breakdown regression estimators are proposed that can often withstand contamination of up to 1/3 of the data, even when there are several predictor variables.

Journal ArticleDOI
TL;DR: In this paper, an approach based on the one-step M-estimator of location was proposed to control the probability of a Type I error even when distributions are skewed, have different shapes, and the variances are unequal.
Abstract: Experience with real data indicates that psychometric measures often have heavy-tailed distributions. This is known to be a serious problem when comparing the means of two independent groups because heavy-tailed distributions can have a serious effect on power. Another problem that is common in some areas is outliers. This paper suggests an approach to these problems based on the one-step M-estimator of location. Simulations indicate that the new procedure provides very good control over the probability of a Type I error even when distributions are skewed, have different shapes, and the variances are unequal. Moreover, the new procedure has considerably more power than Welch's method when distributions have heavy tails, and it compares well to Yuen's method for comparing trimmed means. Wilcox's median procedure has about the same power as the proposed procedure, but Wilcox's method is based on a statistic that has a finite sample breakdown point of only 1/n, where n is the sample size. Comments on other methods for comparing groups are also included.
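A generic one-step Huber M-estimator of location captures the spirit, though not necessarily the exact variant, of the statistic studied in the paper; c = 1.28 is a tuning constant commonly used with this estimator, assumed here rather than taken from the abstract.

```python
import numpy as np

def one_step_m(x, c=1.28):
    """One Newton step of a Huber M-estimator, starting from the median."""
    x = np.asarray(x, dtype=float)
    m = np.median(x)
    s = np.median(np.abs(x - m)) / 0.6745        # MAD-based scale (MADN)
    u = (x - m) / s
    psi = np.clip(u, -c, c)                      # Huber psi function
    psi_deriv = (np.abs(u) <= c).astype(float)   # its derivative
    return m + s * psi.sum() / psi_deriv.sum()
```

Because the psi function bounds the influence of any single observation, the estimator's finite-sample breakdown point is far better than the 1/n of the median-based test statistic criticized in the abstract.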

Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors estimate the parameters of a robust AR model where the driving noise is a mixture of a Gaussian and an outlier process and propose an iterative procedure that involves parameter estimation for uncorrupted speech and data cleaning based on the robust Kalman filter, which is used to enhance speech corrupted by white noise.
Abstract: There are two major problems in estimating vocal tract characteristics by conventional linear prediction: estimation accuracy being subject to the characteristics of the excitation source, and the output quality and the estimation accuracy deteriorating with additive background noise. The authors solved these problems as follows: first, estimate the parameters of a robust AR model where the driving noise is a mixture of a Gaussian and an outlier process; then, propose an iterative procedure that involves parameter estimation for uncorrupted speech and data cleaning based on the robust Kalman filter; lastly, the above results are used to enhance speech corrupted by white noise. The results are more efficient and less biased for uncorrupted speech, and superior at low SNR for noisy speech.

Posted Content
TL;DR: In this article, it is shown how estimation of a missing observation is analogous to the removal of an outlier effect; both problems are closely related with the signal plus noise decomposition of the series.
Abstract: The paper deals with estimation of missing observations in possibly nonstationary ARIMA models. First, the model is assumed known, and the structure of the interpolation filter is analyzed. Using the inverse or dual autocorrelation function, it is seen how estimation of a missing observation is analogous to the removal of an outlier effect; both problems are closely related with the signal plus noise decomposition of the series. The results are extended to cover, first, the case of a missing observation near the two extremes of the series; then the case of a sequence of missing observations; and finally the general case of any number of sequences of any length of missing observations. The optimal estimator can always be expressed, in a compact way, in terms of the dual autocorrelation function or a truncation thereof; its mean squared error is equal to the inverse of the (appropriately chosen) dual autocovariance matrix. The last part of the paper illustrates a point of applied interest: when the model is unknown, the additive outlier approach may provide a convenient and efficient alternative to the standard Kalman filter-fixed point smoother approach for missing observations estimation.

Journal ArticleDOI
TL;DR: A method that takes advantage of a reparametrization in order to recursively obtain the minimax estimates and associated bounds for the error is described and extended to output-error models.

Journal ArticleDOI
TL;DR: In this article, optimal linear estimators are used to obtain unbiased, minimum-variance results based on the temporal and spatial correlation of the data and estimates of sample uncertainty for data collected in ambient air monitoring networks.

Journal ArticleDOI
TL;DR: A new application of a Kalman filter implementation of exponential smoothing with monitoring for outliers and level shifts is presented, including the four model-selection criteria and the estimation of the required parameters by maximum likelihood.
Abstract: This paper presents a new application of a Kalman filter implementation of exponential smoothing with monitoring for outliers and level shifts. The assumption is that each observation comes from one of three models: steady, outlier, or level shift. This concept was introduced as a multiprocess model by Harrison and Stevens (1976). However, their handling of the models is different. In this paper four different model-selection criteria are introduced and compared by applying them to data. The new features of the application include the four model-selection criteria and the estimation of the required parameters by maximum likelihood.
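A much-simplified sketch of the monitoring idea: classify each observation as steady, outlier, or level shift from the size and persistence of the innovation. The thresholds and the one-step-lookahead rule are illustrative stand-ins for the paper's likelihood-based model-selection criteria, not its actual method.

```python
import numpy as np

def smooth_with_monitoring(y, alpha=0.3, sigma=1.0, k=3.0):
    """Exponential smoothing with a crude steady/outlier/shift monitor."""
    y = np.asarray(y, dtype=float)
    level, labels = y[0], ['steady']
    for i in range(1, len(y)):
        innov = y[i] - level
        if abs(innov) <= k * sigma:
            level += alpha * innov               # steady: usual update
            labels.append('steady')
        elif (i + 1 < len(y) and abs(y[i + 1] - level) > k * sigma
              and np.sign(y[i + 1] - level) == np.sign(innov)):
            level = y[i]                         # level shift: re-initialize
            labels.append('shift')
        else:
            labels.append('outlier')             # outlier: leave level alone
    return level, labels
```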

Book ChapterDOI
TL;DR: This work presents two algorithms to estimate missing values in time series using the Kalman filter and the additive outlier approach, the latter developed by Peña, Ljung and Maravall.
Abstract: This work presents two algorithms to estimate missing values in time series. The first is the Kalman filter, as developed by Kohn and Ansley (1986) and others. The second is the additive outlier approach, developed by Peña, Ljung and Maravall. Both are exact and lead to the same results. However, the first is, in general, faster and the second more flexible.

Proceedings ArticleDOI
TL;DR: This work evaluates the performance of several well known M-estimators under different noise conditions and highlights the effects of tuning constants and the necessity of simultaneous scale and parameter estimation.
Abstract: Depth maps are frequently analyzed as if, to an adequate approximation, the errors are normally, identically, and independently distributed. This noise model does not consider at least two types of anomalies encountered in sampling: A few large deviations in the data, often thought of as outliers; and a uniformly distributed error component arising from rounding and quantization. The theory of robust statistics formally addresses these problems and is efficiently used in a robust sequential estimator (RSE) of our own design. The specific implementation was based on a t-distribution error model, and this work extends this concept to several well known M-estimators. We evaluate the performance of these estimators under different noise conditions and highlight the effects of tuning constants and the necessity of simultaneous scale and parameter estimation.
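The estimators discussed can all be run through iteratively reweighted least squares. The sketch below uses Huber's weight function with the scale re-estimated from the MAD of the residuals at every iteration, illustrating the simultaneous scale and parameter estimation the abstract emphasizes; c = 1.345 is the conventional Huber tuning constant, assumed here rather than taken from the paper.

```python
import numpy as np

def irls_huber(X, y, c=1.345, n_iter=20):
    """Huber M-estimation of a linear model by IRLS with MAD scale."""
    X = np.column_stack([np.ones(len(y)), X])       # add intercept
    beta = np.linalg.lstsq(X, y, rcond=None)[0]     # least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745   # robust scale
        u = r / s
        w = np.where(np.abs(u) <= c, 1.0, c / np.abs(u))   # Huber weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
    return beta
```

Changing the weight function and tuning constant changes how aggressively large deviations are downweighted, which is exactly the trade-off the paper's evaluation explores.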

Journal ArticleDOI
TL;DR: Ridder and Norgaard used a flow injection analysis system with photodiode array detection for the simultaneous determination of Co(II) and Ni(II) in the concentration range 0-1 ppm.

Proceedings ArticleDOI
15 Jun 1992
TL;DR: The AIC is extended to a t-distribution noise model, which more realistically represents anomalies in the data such as outliers and quantization errors, and its performance is compared with that of AIC and Consistent AIC.
Abstract: Modeling of the unknown surface, a key first step in the perception of surfaces in range images using the function approximation approach, is considered. Akaike's entropy-based information criterion (AIC) is a simple but powerful tool for choosing the best fitting model among several competing models. However, the AIC presupposes a fixed data set and a normality assumption on the error's distribution. The AIC is extended to a t-distribution noise model, which more realistically represents anomalies in the data such as outliers and quantization errors. This criterion is modified to be used with a robust sequential algorithm to accommodate the variable data size resulting from fitting different models. The modified criterion is applied to real range data, and its performance is compared with that of AIC and Consistent AIC.

Proceedings Article
30 Nov 1992
TL;DR: Using statistical physics techniques including the Gibbs distribution, binary decision fields and effective energies, self-organizing PCA rules are proposed which are capable of resisting outliers while fulfilling various PCA-related tasks such as obtaining the first principal component vector, the first k principal component vectors, and directly finding the subspace spanned by the first k principal component vectors.
Abstract: In the presence of outliers, the existing self-organizing rules for Principal Component Analysis (PCA) perform poorly. Using statistical physics techniques including the Gibbs distribution, binary decision fields and effective energies, we propose self-organizing PCA rules which are capable of resisting outliers while fulfilling various PCA-related tasks such as obtaining the first principal component vector, the first k principal component vectors, and directly finding the subspace spanned by the first k principal component vectors without solving for each vector individually. Comparative experiments have shown that the proposed robust rules improve the performances of the existing PCA algorithms significantly when outliers are present.
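The flavor of the approach can be sketched with Gibbs-style soft weights: observations with large reconstruction error are smoothly downweighted before the principal direction is re-estimated. This mimics the spirit of the paper's effective-energy formulation, not its actual learning rule; beta and the error normalization are illustrative assumptions.

```python
import numpy as np

def robust_first_pc(X, beta=5.0, n_iter=50):
    """First principal component with outliers softly downweighted."""
    X = X - X.mean(axis=0)
    v = np.linalg.svd(X, full_matrices=False)[2][0]   # ordinary PCA start
    for _ in range(n_iter):
        proj = X @ v
        err = np.sum((X - np.outer(proj, v)) ** 2, axis=1)  # residual energy
        w = np.exp(-beta * err / err.mean())          # Gibbs-style weights
        C = (w[:, None] * X).T @ X / w.sum()          # weighted covariance
        v = np.linalg.eigh(C)[1][:, -1]               # top eigenvector
    return v
```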

Journal ArticleDOI
01 Mar 1992-Empirica
TL;DR: In this article, the authors apply intervention analysis to three Austrian economic time series, namely retail sales, purchases of durables, and car purchases, to evaluate the reliability of these methods in practical situations.
Abstract: Time series are often subject to the influence of non-repetitive events. Economic variables are no exception. For example, the announcement and implementation of new regulations, major changes in economic policy or in the tax legislation, and similar events may cause substantial disturbances in economic time series. The presence of outliers may lead to wrongly identified models and inappropriately estimated model parameters, giving rise to poor forecasts and erroneous conclusions. In the past, these problems mostly had to be ignored, because simple yet efficient techniques for the treatment of outliers did not exist. The situation improved slightly when Box-Tiao (1975) proposed intervention analysis. However, the fact that a detailed knowledge of the structure of the series to be analysed is required for a successful application of this technique is a severe restriction for its use in practical work. In the meantime, however, techniques have emerged which solve the outlier problem more or less automatically. For a detailed discussion of these techniques and their computer implementation see Chen-Liu-Hudak (1990). It is the aim of this paper to gain information on the reliability of these methods in practical situations. For this purpose, we apply them in the analysis of three Austrian economic time series, namely retail sales, purchases of durables, and car purchases. We believe that these series are well suited for our objective. They are strongly contaminated by outliers and, additionally, there already exist sophisticated intervention models which can serve as benchmarks in the comparison.

Journal ArticleDOI
TL;DR: In this paper, an information theoretic criterion proposed by Zhao, Krishnaiah and Bai (1986) is used to detect the number of outliers in a data set, considering univariable mean-slippage and dispersion-slippage outlier structures of the observations.
Abstract: We use an information theoretic criterion proposed by Zhao, Krishnaiah and Bai (1986) to detect the number of outliers in a data set. We consider univariable mean-slippage and dispersion-slippage outlier structure of the observations. Multivariate generalizations and the consistency of the estimates are also considered. Numerical examples are presented in tables.

Journal ArticleDOI
TL;DR: The model reference technique and Huber's minimax principle have been successfully used to develop an offline output error method for robust identification of systems and it is concluded that this method is much superior to the other methods and therefore can be widely used in many real-time applications.
Abstract: The model reference technique and Huber's minimax principle have been successfully used to develop an offline output error method for robust identification of systems. This method is named the robust iterative output error method with modified residuals. A convergence analysis of the proposed method has been included as well as some simulation results. In the presence of a small number of large errors (called outliers) in the input-output data, the presented method has demonstrated its distinctive advantages over not only the nonrobust methods but also previously developed robust methods. The main advantages are a fast convergence speed and satisfactory robustness. It is concluded that the method developed here is much superior to the other methods and therefore can be widely used in many real-time applications.

Book ChapterDOI
01 Jan 1992
TL;DR: In this article, the authors discuss several nonlinear regression methods for estimating contaminated radioimmunoassay data, such as Lp-norm estimators or nonlinear generalizations of Huber's M-estimator.
Abstract: The paper discusses several nonlinear regression methods for estimating contaminated radioimmunoassay data. The underlying model is an overdispersed Poisson process with four regression line parameters and one parameter related to the overdispersion of the variance. A generalized least-squares (GLS) algorithm can be used for parameter estimation of noncontaminated data. In the presence of outliers, different methods are discussed, such as Lp-norm estimators or nonlinear generalizations of Huber's M-estimator. The best estimation results are obtained by a winsorized version of the GLS algorithm.
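As a generic illustration of the winsorizing idea (not the paper's full GLS for the overdispersed Poisson model with four regression line parameters), the sketch below repeatedly fits by least squares after pulling extreme residuals back to a bound of c robust standard deviations; the function name and c = 2.0 are assumptions.

```python
import numpy as np

def winsorized_ls(X, y, c=2.0, n_iter=10):
    """Linear fit with residuals winsorized at +/- c robust SDs."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]
    for _ in range(n_iter):
        fit = X1 @ beta
        r = y - fit
        s = np.median(np.abs(r - np.median(r))) / 0.6745   # MAD scale
        y_w = fit + np.clip(r, -c * s, c * s)    # winsorize the residuals
        beta = np.linalg.lstsq(X1, y_w, rcond=None)[0]
    return beta
```

Unlike trimming, winsorizing keeps every observation in the fit but caps how far any one of them can pull the estimate, which is why it suits the contaminated assay data described in the abstract.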