Showing papers on "Outlier" published in 1986


Journal ArticleDOI
TL;DR: An iterative procedure is proposed to identify outliers, remove their effects, and specify a tentative model for the underlying process. It is essentially based on the iterative estimation procedure of Chang and Tiao (1983) and the extended sample autocorrelation function (ESACF) model identification method of Tsay and Tiao (1984).
Abstract: Outliers are commonplace in data analysis. Time series analysis is no exception. Noting that the effect of outliers on model identification statistics could be serious, this article is concerned with the problem of time series model specification in the presence of outliers. An iterative procedure is proposed to identify the outliers, to remove their effects, and to specify a tentative model for the underlying process. The procedure is essentially based on the iterative estimation procedure of Chang and Tiao (1983) and the extended sample autocorrelation function (ESACF) model identification method of Tsay and Tiao (1984). An example is given. Properties of the proposed procedure are discussed.

350 citations


Journal ArticleDOI
TL;DR: In this paper, a procedure for limiting the influence of these outliers on the estimates of the model parameters is described, where the model effects are estimated by augmenting the original observations with auxiliary observations that contain the prior information represented by the variances.
Abstract: Outliers may occur with respect to any of the random components in the mixed linear model. A procedure for limiting the influence of these outliers on the estimates of the model parameters is described. Given the variances or estimates of them, the model effects are estimated by augmenting the original observations with auxiliary observations that contain the prior information represented by the variances. Large residuals among either the original or the auxiliary observations are interpreted as outlying random errors or outlying random effects, as appropriate, and Winsorized. The robust estimation of the variances is obtained by modifying the defining equations for the restricted maximum likelihood estimates under normality along the lines of Huber's proposal 2. A numerical example illustrates the use of the methodology, both as a diagnostic and as an estimation tool.

160 citations
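
The Winsorizing step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a known scale `sigma` and a Huber-style cutoff `k = 1.345`; both the function name and the default cutoff are illustrative assumptions, not taken from the paper. The sketch only clips residuals to the interval [-k*sigma, k*sigma] and does not reproduce the paper's mixed-model estimation.

```python
def winsorize_residuals(residuals, sigma, k=1.345):
    """Clip each residual to the interval [-k*sigma, k*sigma].

    sigma and k are assumed inputs (hypothetical defaults); residuals
    inside the interval are returned unchanged.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    bound = k * sigma
    return [max(-bound, min(bound, r)) for r in residuals]


if __name__ == "__main__":
    residuals = [0.2, -0.5, 4.0, -3.1, 0.9]
    # The two large residuals are pulled back to the bounds; the rest are kept.
    print(winsorize_residuals(residuals, sigma=1.0))
```

Clipping rather than deleting keeps every observation in the fit while bounding how far any single one can pull the estimates.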


Journal ArticleDOI
TL;DR: In this paper, the authors distinguish two basic types of sample outliers: representative outliers, whose values are correctly recorded and cannot be assumed to be unique, and nonrepresentative outliers, whose data values are incorrect or unique in some sense. The paper is concerned with the first type.
Abstract: Outliers in sample data are a perennial problem for applied survey statisticians. Moreover, it is a problem for which traditional sample survey theory offers no real solution, beyond the sensible advice that such sample elements should not be weighted to their fullest extent in estimation. Sample outliers can be identified as of two basic types. Here we are concerned with the first type, which may conveniently be termed representative outliers. These are sample elements with values that have been correctly recorded and that cannot be assumed to be unique. That is, there is no good reason to assume there are no more similar outliers in the nonsampled part of the target population. The remaining sample outliers, which by default are termed nonrepresentative, are sample elements whose data values are incorrect or unique in some sense. Methods for dealing with these nonrepresentative outliers lie basically within the scope of survey editing and imputation theory and are, therefore, not considered in ...

158 citations


Journal ArticleDOI
TL;DR: The UNEQ method can be very useful for classification purposes but requires the populations to be homogeneous as is the case for other techniques.

138 citations


Journal ArticleDOI
Mike West
TL;DR: In this article, a Bayesian approach is presented, based on comparing predictions from the standard model with those from a simple alternative model.
Abstract: A Bayesian approach is presented, based on comparing predictions from the standard model with those from a simple alternative model.

89 citations


Journal ArticleDOI
TL;DR: In this paper, a method of testing for the presence of an outlier of unknown type is proposed and the properties of a rule based on the likelihood ratio which attempts to distinguish the two types of outlier are examined and compared with those of corresponding Bayes rules.
Abstract: Distinguishing an outlier in a time series arising through measurement error from one arising through a perturbation of the underlying system can be of use in data validation. In this paper a method of testing for the presence of an outlier of unknown type is proposed. Then the properties of a rule based on the likelihood ratio which attempts to distinguish the two types of outlier are examined and compared with those of the corresponding Bayes rules. An example involving data from an industrial production process is studied.

49 citations


Journal ArticleDOI
TL;DR: In this article, the authors developed a plotting position based on expressions for the non-exceedance probability of the form F_r = (r − α)/(n − β), where α and β are allowed to vary with sample size and the shape parameter of the general extreme value distribution.

39 citations


Proceedings ArticleDOI
01 Dec 1986
TL;DR: In this paper, membership set theory is used to characterize the set of all the parameter vectors that are consistent with the assumed knowledge of bounds on the acceptable errors between the data and the model outputs.
Abstract: When modeling a system, it is important to assess the uncertainty in the estimated values of the parameters. This is usually done by taking advantage of the asymptotic properties of maximum likelihood estimators. However, the significance of the results obtained can be questioned when little information is available on the statistical properties of the noise and when the number of data points is limited. Membership set theory then appears as a promising tool to overcome some of these difficulties. Its purpose is to characterize the set of all the parameter vectors that are consistent with the assumed knowledge of bounds on the acceptable errors between the data and the model outputs. However, the membership set estimators presented in the literature so far are restricted to models that are linear in the parameters. The approach described here has been designed for handling nonlinear models as well as linear ones. It involves the maximisation of the number of data points that do not have to be considered as outliers and the characterization of the boundary of the domain of the parametric space where this number is maximum. The method is applied to two examples. It is shown to be extremely robust to outliers and to be able to handle even models that are not uniquely identifiable.

36 citations


Journal ArticleDOI
TL;DR: In this paper, the properties of several jackknife-based estimators are investigated in the context of the nonlinear regression model, and it is shown that a reweighted estimator combined with a modified variance estimate provides a technique that is reasonably robust with respect to outliers, leverage points, and curvature effects.
Abstract: The properties of several jackknife-based estimators are investigated in the context of the nonlinear regression model. These estimators are suggested to overcome lack of balance of the design and the nonlinearity of the model. It is shown that a reweighted estimator combined with a modified variance estimate provides a technique that is reasonably robust with respect to outliers, leverage points, and curvature effects. Several examples are presented to illustrate these properties.

23 citations


Journal ArticleDOI
Ursula Gather
TL;DR: In this article, the authors consider situations where exactly one contaminated observation from a distribution with possibly larger mean θ/b, with b ∈ (0, 1) unknown, may be present, and propose and compare some classes of estimators, namely trimmed and Winsorized estimators based on preceding outlier tests.
Abstract: Some estimators of the mean θ of an exponential distribution are studied under the assumption of different outlier-generating distribution models. We consider situations where exactly one contaminated observation from a distribution with possibly larger mean θ/b, with b ∈ (0, 1) unknown, may be present. Since the estimator then no longer has uniformly minimal mean squared error, we propose and compare some classes of estimators, namely trimmed and Winsorized estimators based on preceding outlier tests, providing robust estimation of θ in this situation. Some results of Rauhut (1982) are improved.

19 citations


Journal ArticleDOI
TL;DR: A new procedure is proposed for estimating normal ranges of clinical laboratory tests, which can be applied to data from healthy subjects containing possibly more than one outlier; the optimal model is selected by the Akaike information criterion (AIC).
Abstract: This paper proposes a new procedure for estimating normal ranges of clinical laboratory tests, which can be applied to data with possibly more than one outlier from healthy subjects. The proposed procedure determines the optimal model among a class of models in which it is assumed that an observed distribution of 'normal values' can be transformed to the Gaussian form by one of several specified transformations, and if there exist outliers among the data, then each of the transformed outliers also follows a Gaussian distribution with different mean from, but the same variance as, the transformed distribution of normal values. The optimal model is defined as the best combination of the transformation to normality and the number of outliers identified, and is selected by the Akaike information criterion (AIC). Our procedure is illustrated with data from 200 healthy male subjects on 25 laboratory tests.


Journal ArticleDOI
Abstract: In this paper we consider the multiple outlier problem in time series analysis. The underlying undisturbed time series is assumed to be an autoregressive process. The location of the suspicious values is supposed to be known. We introduce conditional least squares estimators for the parameters. The estimates are shown to be strongly consistent. Using similar arguments as in the theory of linear models, we get a test statistic for the general linear hypothesis. Its asymptotic distribution is derived.


Journal ArticleDOI
TL;DR: Sequential uniform residuals are proposed for screening outliers in process control data; their exact distribution theory allows precise control of the number of observations incorrectly rejected when the process is in control.
Abstract: The use of sequential uniform residuals is proposed to screen outliers in process control data. The exact distribution theory of these statistics allows a precise control of the number of observations incorrectly rejected when the process is in control ...

Journal ArticleDOI
TL;DR: In this article, five widely used test statistics for detecting outliers and influential observations were studied using the Monte Carlo method. The test statistic based on Studentized residuals, with critical values given by Tietjen, Moore and Beckman (1973), appears to be the best procedure for detecting a single outlier in simple linear regression.
Abstract: Five widely used test statistics for detecting outliers and influential observations were studied using the Monte Carlo method. The test statistic based on Studentized residuals, with critical values given by Tietjen, Moore and Beckman (1973), appears to be the best procedure for detecting a single outlier in simple linear regression.
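
As a rough illustration of a Studentized-residual statistic for simple linear regression, the sketch below computes internally Studentized residuals and reports the largest one in absolute value. It assumes that this maximum is the statistic of interest; the function names are hypothetical, and the critical values of Tietjen, Moore and Beckman (1973) are not reproduced here.

```python
import math


def studentized_residuals(x, y):
    """Internally Studentized residuals for the least squares fit y = a + b*x."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - 2)  # residual variance estimate
    result = []
    for xi, e in zip(x, residuals):
        leverage = 1.0 / n + (xi - mean_x) ** 2 / sxx
        denom = s2 * (1.0 - leverage)
        if denom <= 0:
            raise ValueError("degenerate fit: zero variance or leverage of 1")
        result.append(e / math.sqrt(denom))
    return result


def max_abs_studentized(x, y):
    """Return (index, value) of the largest absolute Studentized residual."""
    r = studentized_residuals(x, y)
    index = max(range(len(r)), key=lambda i: abs(r[i]))
    return index, abs(r[index])


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [1.1, 1.9, 3.2, 3.9, 5.1, 9.0, 7.0, 8.1]
    print(max_abs_studentized(x, y))
```

The returned value would then be compared with a tabulated critical value for the given sample size; that comparison is left out because the table is not part of this listing.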

Journal ArticleDOI
TL;DR: Andrews (1972) introduced a method of plotting high-dimensional data in two dimensions that is exploited as a graphical technique for the detection of the period and outliers in time series data.
Abstract: Andrews (1972) introduced a method of plotting high-dimensional data in two dimensions. This method is exploited as a graphical technique for the detection of the period and outliers in time series data. Some examples are given.
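
For readers unfamiliar with Andrews' plots, a minimal sketch of the underlying function follows. It maps an observation x = (x1, x2, x3, ...) to the curve f_x(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ... on [-π, π]. How the paper forms the vectors from a time series is not stated in the abstract, so the helper names and the sampling grid here are assumptions made only for illustration.

```python
import math


def andrews_curve(x, t):
    """Evaluate Andrews' function f_x(t) for one observation x at angle t."""
    value = x[0] / math.sqrt(2.0)
    for j, xj in enumerate(x[1:], start=1):
        harmonic = (j + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if j % 2 == 1:
            value += xj * math.sin(harmonic * t)  # x2, x4, ... pair with sine
        else:
            value += xj * math.cos(harmonic * t)  # x3, x5, ... pair with cosine
    return value


def andrews_curves(rows, n_points=101):
    """Sample each row's curve on an even grid over [-pi, pi]."""
    ts = [-math.pi + 2.0 * math.pi * i / (n_points - 1) for i in range(n_points)]
    return ts, [[andrews_curve(x, t) for t in ts] for x in rows]


if __name__ == "__main__":
    rows = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [5.0, -1.0, 0.5]]
    ts, curves = andrews_curves(rows, n_points=5)
    for curve in curves:
        print([round(v, 3) for v in curve])
```

Observations that are close in the original space give curves that stay close over the whole interval, so a curve that separates from the bundle is a candidate outlier.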


Journal ArticleDOI
TL;DR: In this article, an approximation for an arbitrary order Taylor expansion in a multivariate setting is presented that can be used to gain insight into the characteristics of a wide class of estimators, including maximum likelihood and least squares.
Abstract: An approximation is presented that can be used to gain insight into the characteristics – such as outlier sensitivity, bias, and variability – of a wide class of estimators, including maximum likelihood and least squares. The approximation relies on a convenient form for an arbitrary order Taylor expansion in a multivariate setting. The implicit function theorem can be used to construct the expansion when the estimator is not defined in closed form. We present several finite-sample and asymptotic properties of such Taylor expansions, which are useful in characterizing the difference between the estimator and the expansion.

Journal ArticleDOI
TL;DR: In this article, recursive residuals are extended to the case that observations may occur in groups, which are ordered in time, but within which there is no time ordering, and analogous quantities are considered for two-dimensional "time", for the balanced two-way table.
Abstract: Recursive residuals are extended to the case that observations may occur in groups, which are ordered in time, but within which there is no time ordering. Analogous quantities are considered for two-dimensional "time", for the balanced two-way table. Examples of their use are given.

Journal ArticleDOI
TL;DR: This article considers procedures for detecting heteroscedasticity that occurs at unknown points. The number of points is first assumed known a priori, and this assumption is then relaxed so that both the position and the number of possible aberrant observations are unknown. Monte Carlo evidence on both tests shows that they perform reasonably.
Abstract: This paper considers procedures for the detection of heteroscedasticity when it occurs at unknown points. First the number of points is assumed known a priori, and then this assumption is relaxed so that both the position and the number of possible aberrant observations are unknown. Monte Carlo evidence is provided on the performance of both tests and they are found to perform reasonably.

Journal ArticleDOI
TL;DR: In this article, methods for simple estimation of the exponential mean parameter in small samples in the presence of outliers are studied, and the proposed Huber-type estimator is found to perform quite well in all situations of censoring and contamination.
Abstract: This paper studies methods for simple estimation of the exponential mean parameter in small samples in the presence of outliers. Existing estimation methods are discussed. Adaptation of these methods to allow for Type I censoring is investigated. New robust procedures are proposed. A series of simulation experiments indicate that trimming provides significant protection against outliers, while the premium is usually small when trimming uncontaminated samples. A linearly weighted mean is recommended for uncontaminated samples, both censored and complete. In larger samples, (n - 20), the proposed Huber-type estimator performs quite well in all situations of censoring and contamination.

Journal ArticleDOI
TL;DR: In this article, a new measure of outlier resistance based on Huber's maximum bias was proposed, which is essentially independent of the sample size and depends on the fraction of contamination.
Abstract: This paper presents a new measure of outlier resistance based on Huber's maximum bias. After standardizing by the maximum bias of the median for the same situation, outlier resistance seems to be essentially independent of n and depends on the fraction of contamination. The advantages over alternative measures of resistance involve this relative stability with the sample size, and the ability to measure resistance to a large fraction of outliers. As an example, this method of measuring outlier resistance is applied to a comparison of a 20% trimmed mean with a Huber M-estimator.
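
The two estimators compared in the abstract's example can be sketched as follows. This is only an illustration of the estimators themselves, not of the paper's resistance measure. The sketch assumes that "20% trimmed" means 20% removed from each tail, that the Huber tuning constant is k = 1.345, and that the scale is fixed at the normalized median absolute deviation; none of these choices is confirmed by the abstract.

```python
import statistics


def trimmed_mean(data, proportion=0.2):
    """Mean after dropping int(proportion * n) values from each tail."""
    xs = sorted(data)
    g = int(proportion * len(xs))
    if 2 * g >= len(xs):
        raise ValueError("trimming proportion leaves no observations")
    kept = xs[g:len(xs) - g]
    return sum(kept) / len(kept)


def huber_location(data, k=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location by iterative reweighting.

    The scale is held fixed at MAD / 0.6745 (an assumed choice).
    """
    data = list(data)
    mu = statistics.median(data)
    mad = statistics.median([abs(x - mu) for x in data])
    scale = mad / 0.6745
    if scale == 0:
        return mu
    cutoff = k * scale
    for _ in range(max_iter):
        # Weight 1 inside the cutoff, cutoff/|residual| outside it.
        weights = [1.0 if abs(x - mu) <= cutoff else cutoff / abs(x - mu)
                   for x in data]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


if __name__ == "__main__":
    sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 50.0]
    print(sum(sample) / len(sample), trimmed_mean(sample), huber_location(sample))
```

On the sample above, the single gross value pulls the ordinary mean well away from 10, while both robust estimates stay near the bulk of the data.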

Journal ArticleDOI
TL;DR: In this paper, an approach for evaluating analytical data on reference samples is proposed: the analytical methods used are critically reviewed, outliers are eliminated, and the normality of the results is tested. If normality can be accepted, the arithmetic mean, its confidence interval and the coefficient of the precision are computed.
Abstract: For evaluating the analytical data on reference samples, the following approach is proposed: 1. Analytical methods used are critically reviewed. 2. Outliers are eliminated. 3. Normality of the results is tested. 4. If normality of the results can be accepted, arithmetic mean, its confidence interval and coefficient of the precision are computed. 5. If the results are not normally distributed, median and its statistical characteristics are given. 6. For sets of data with anomalous skewness, logarithmic and lambda transformations are useful approaches.

01 Mar 1986
TL;DR: In this paper, the problem of outliers in circular data is studied from a Bayesian point of view, where surprising observations are identified by means of a predictive measure, and the mean-shift model and some aspects of the contamination of the concentration parameter for a von Mises distribution are analyzed.
Abstract: The problem of outliers in circular data is studied from a Bayesian point of view. Surprising observations are identified by means of a predictive measure. On the basis of Box-Tiao methodology, the mean-shift model and some aspects of the contamination of the concentration parameter for a von Mises distribution are analyzed. Intuitive aspects of the resultant weights and their applications in some classical examples are included.

Book ChapterDOI
01 Jan 1986
TL;DR: A problem of detecting a shift in regression when it is masked by outliers is considered, and results of a simulation study comparing several tests and estimates of the change point are summarized.
Abstract: Robust recursive estimation provides considerable computational advantage over iterative robust regression estimation, especially for large and ordered (e.g., with time) data sets. The robust recursive estimates are less sensitive than recursive least squares to the outliers and structural shifts, and produce residuals which are more effective in constructing tests for detecting a shift. In this paper we consider a problem of detecting a shift in regression when it is masked by outliers, and summarize results of a simulation study comparing several tests and estimates of the change point.

23 Sep 1986
TL;DR: This work considers a stationary Gaussian information process transmitted through an additive noise channel, and develops a theory for outlier resistant filtering and smoothing operations, which combine excellent performance at the nominal model with strong resistance to outliers.
Abstract: We consider a stationary Gaussian information process transmitted through an additive noise channel. We assume that the noise and information processes are mutually independent, and we model the noise process as nominally Gaussian with additive outliers. For the above system model, we first develop a theory for outlier resistant filtering and smoothing operations. We then design specific such nonlinear operations, and we study their performance. The performance criteria are the asymptotic mean squared error at the Gaussian nominal model, the breakdown point, and the influence function. We find that our operations combine excellent performance at the nominal model with strong resistance to outliers.

Journal ArticleDOI
TL;DR: The impact of including outliers in the analyses and the use of the aforementioned regression techniques to detect the atypical values are illustrated using three examples derived from actual experimental data.


Journal ArticleDOI
TL;DR: The conclusions are that the combination of the principal component analysis method and the method by Mahalanobis's distance is useful in saving manpower for routine judgment of the same data produced in the same line for several measured items.
Abstract: Research and development of LSIs require rapid evaluation of the characteristics of fabricated devices. For this purpose, automatic measurement systems have been developed in various laboratories. Some outlying data unavoidably exist in the data collected by the automatic data acquisition system, and it is necessary to judge the outliers in data processing. In this paper two algorithms to judge outliers are examined and one new algorithm is presented. They are applied to Si wafer inspection data collected automatically. The three algorithms are the outlier judgment method proposed by Grubbs, the judgment method using Mahalanobis's distance, and the method combining the principal component analysis method with the latter method. The conclusions are as follows: (1) Grubbs's method is easy to use for one kind of measured item. (2) Mahalanobis's method is useful for several kinds of measured items and is especially effective when they are correlated. (3) The combination of the principal component analysis method and the method by Mahalanobis's distance is useful in saving manpower for routine judgment of the same data produced in the same line for several measured items.
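
Two of the quantities named in the abstract above can be sketched briefly. The first function computes the Grubbs statistic for a single measured item (largest absolute deviation from the mean, divided by the sample standard deviation). The second computes Mahalanobis distances for two correlated items using the sample covariance. Function names are hypothetical, critical values and the principal component step are omitted, and the two-variable restriction is a simplification chosen to avoid a matrix library.

```python
import math
import statistics


def grubbs_statistic(values):
    """Return (index, G) with G = max |x_i - mean| / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("all values are identical")
    index = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    return index, abs(values[index] - mean) / sd


def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(max(d2, 0.0)))
    return distances


if __name__ == "__main__":
    print(grubbs_statistic([1.0, 2.0, 3.0, 4.0, 100.0]))
    pts = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 4.2), (5.0, 4.8), (3.0, 6.0)]
    print([round(d, 3) for d in mahalanobis_2d(pts)])
```

The Mahalanobis distance accounts for the correlation between the two items, which is why a point can be flagged even when each coordinate on its own looks unremarkable.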