
Showing papers on "Outlier" published in 1999


Journal ArticleDOI
TL;DR: Robust estimators have been used in a variety of computer vision applications; three are considered here: analysis of range images, where they have been used successfully to estimate surface model parameters in small image regions; stereo and motion analysis, where they estimate the fundamental matrix characterizing the relative imaging geometry of two cameras imaging the same scene; and construction of mosaic images of the human retina.
Abstract: Estimation techniques in computer vision applications must estimate accurate model parameters despite small-scale noise in the data, occasional large-scale measurement errors (outliers), and measurements from multiple populations in the same data set. Increasingly, robust estimation techniques, some borrowed from the statistics literature and others described in the computer vision literature, have been used in solving these parameter estimation problems. Ideally, these techniques should effectively ignore the outliers and measurements from other populations, treating them as outliers, when estimating the parameters of a single population. Two frequently used techniques are least-median of squares (LMS) [P. J. Rousseeuw, J. Amer. Statist. Assoc., 79 (1984), pp. 871--880] and M-estimators [Robust Statistics: The Approach Based on Influence Functions, F. R. Hampel et al., John Wiley, 1986; Robust Statistics, P. J. Huber, John Wiley, 1981]. LMS handles large fractions of outliers, up to the theoretical limit of 50% for estimators invariant to affine changes to the data, but has low statistical efficiency. M-estimators have higher statistical efficiency but tolerate much lower percentages of outliers unless properly initialized. While robust estimators have been used in a variety of computer vision applications, three are considered here. In analysis of range images---images containing depth or X, Y, Z measurements at each pixel instead of intensity measurements---robust estimators have been used successfully to estimate surface model parameters in small image regions. In stereo and motion analysis, they have been used to estimate parameters of what is called the "fundamental matrix," which characterizes the relative imaging geometry of two cameras imaging the same scene. Recently, robust estimators have been applied to estimating a quadratic image-to-image transformation model necessary to create a composite "mosaic image" from a series of images of the human retina. In each case, a straightforward application of standard robust estimators is insufficient, and carefully developed extensions are used to solve the problem.
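
The random-subset search commonly used to approximate the LMS estimator can be sketched in a few lines. The following is an illustrative sketch on synthetic data with a simple linear model, not the paper's implementation: exact fits to random minimal subsets are scored by the median squared residual, and the best candidate is kept.

```python
import numpy as np

rng = np.random.default_rng(0)

def lms_fit(X, y, n_trials=500):
    """Approximate least-median-of-squares regression: fit exact models to
    random minimal subsets and keep the candidate with the smallest median
    squared residual."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])                # add intercept
    best_beta, best_med = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p + 1, replace=False)   # minimal subset
        try:
            beta = np.linalg.solve(Xd[idx], y[idx])
        except np.linalg.LinAlgError:
            continue                                     # degenerate subset
        med = np.median((y - Xd @ beta) ** 2)
        if med < best_med:
            best_med, best_beta = med, beta
    return best_beta

# Synthetic line with 30% gross outliers: LMS recovers the slope,
# ordinary least squares is pulled away by the contamination.
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 0.1, 100)
y[:30] += 20                                             # contaminate 30% of the data
X = x.reshape(-1, 1)
print("LMS estimate:", lms_fit(X, y))
print("LS  estimate:", np.linalg.lstsq(np.column_stack([np.ones(100), X]), y, rcond=None)[0])
```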

504 citations


Journal ArticleDOI
TL;DR: The authors compared empirical type I error and power of different permutation techniques for the test of significance of a single partial regression coefficient in a multiple regression model, using simulations, and found that two methods that had been identified as equivalent formulations of permutation under the reduced model were actually quite different.
Abstract: This study compared empirical type I error and power of different permutation techniques for the test of significance of a single partial regression coefficient in a multiple regression model, using simulations. The methods compared were permutation of raw data values, two alternative methods proposed for permutation of residuals under the reduced model, and permutation of residuals under the full model. The normal-theory t-test was also included in simulations. We investigated effects of (1) the sample size, (2) the degree of collinearity between the predictor variables, (3) the size of the covariable’s parameter, (4) the distribution of the added random error and (5) the presence of an outlier in the covariable on these methods. We found that two methods that had been identified as equivalent formulations of permutation under the reduced model were actually quite different. One of these methods resulted in consistently inflated type I error. In addition, when the covariable contained an extreme outlier,...
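
As a concrete illustration of one of the compared schemes, the sketch below runs a permutation test of a single partial regression coefficient by permuting residuals under the reduced model. The synthetic data, the number of permutations, and the use of the raw coefficient (rather than a studentized statistic) are simplifying assumptions; this is a generic sketch, not the study's simulation code.

```python
import numpy as np

rng = np.random.default_rng(1)

def beta_hat(X, y):
    """Ordinary least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def perm_test_reduced(y, x, z, n_perm=999):
    """Permutation test for the coefficient of x in y ~ 1 + z + x, permuting
    residuals of the reduced model y ~ 1 + z (one of several residual-permutation
    variants compared in the paper)."""
    n = len(y)
    Z = np.column_stack([np.ones(n), z])            # reduced-model design
    X = np.column_stack([Z, x])                     # full-model design
    b_red = beta_hat(Z, y)
    fitted, resid = Z @ b_red, y - Z @ b_red
    t_obs = abs(beta_hat(X, y)[-1])                 # observed statistic
    count = 1
    for _ in range(n_perm):
        y_star = fitted + rng.permutation(resid)    # permute reduced-model residuals
        if abs(beta_hat(X, y_star)[-1]) >= t_obs:
            count += 1
    return count / (n_perm + 1)

z = rng.normal(size=50)
x = 0.6 * z + rng.normal(size=50)                   # collinear with the covariable
y = 1.0 + 0.8 * z + 0.0 * x + rng.normal(size=50)   # x has no true effect
print("p-value:", perm_test_reduced(y, x, z))
```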

403 citations


Book ChapterDOI
15 Sep 1999
TL;DR: This paper formally introduces a new notion of outlier which bases outlier detection on the same theoretical foundation as density-based cluster analysis, and demonstrates that this notion of an outlier is more appropriate for detecting different types of outliers than previous approaches.
Abstract: For many KDD applications finding the outliers, i.e. the rare events, is more interesting and useful than finding the common cases, e.g. detecting criminal activities in E-commerce. Being an outlier, however, is not just a binary property. Instead, it is a property that applies to a certain degree to each object in a data set, depending on how ‘isolated’ this object is, with respect to the surrounding clustering structure. In this paper, we formally introduce a new notion of outliers which bases outlier detection on the same theoretical foundation as density-based cluster analysis. Our notion of an outlier is ‘local’ in the sense that the outlier-degree of an object is determined by taking into account the clustering structure in a bounded neighborhood of the object. We demonstrate that this notion of an outlier is more appropriate for detecting different types of outliers than previous approaches, and we also present an algorithm for finding them. Furthermore, we show that by combining the outlier detection with a density-based method to analyze the clustering structure, we can get the outliers almost for free if we already want to perform a cluster analysis on a data set.
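
To make the idea of a degree-valued, local outlier notion concrete, here is a simplified k-nearest-neighbor density-ratio score on synthetic data. It is only in the spirit of the paper's density-based outlier degree, not its exact definition; the choice of k and the data are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_outlier_degree(X, k=10):
    """Simplified local outlier score: a point's average k-NN distance divided
    by the average k-NN distance of its neighbors. Values well above 1 mark
    points that are sparse relative to their local surroundings (a rough
    stand-in for the density-based 'outlier degree', not the paper's formula)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)                  # first column is the point itself
    avg_knn = dist[:, 1:].mean(axis=1)            # local sparseness of each point
    neighbor_avg = avg_knn[idx[:, 1:]].mean(axis=1)
    return avg_knn / neighbor_avg

rng = np.random.default_rng(2)
cluster = rng.normal(0, 1, size=(200, 2))
outliers = rng.uniform(-8, 8, size=(5, 2))        # scattered points far from the cluster
X = np.vstack([cluster, outliers])
scores = local_outlier_degree(X)
print("highest scores at indices:", np.argsort(scores)[-5:])
```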

183 citations


Journal ArticleDOI
01 Jun 1999
TL;DR: It is shown that a more principled approach may be taken using extreme value statistics in the area of novelty detection, and points that lie outside of the range of expected extreme values may be flagged as outliers.
Abstract: Extreme value theory is a branch of statistics that concerns the distribution of data of unusually low or high value, i.e. in the tails of some distribution. These extremal points are important in many applications as they represent the outlying regions of normal events against which we may wish to define abnormal events. In the context of density modelling, novelty detection or radial-basis function systems, points that lie outside of the range of expected extreme values may be flagged as outliers. There has been interest in the area of novelty detection, but decisions as to whether a point is an outlier or not tend to be made on the basis of exceeding some (heuristic) threshold. It is shown that a more principled approach may be taken using extreme value statistics.
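
A minimal sketch of the block-maxima route to such a principled threshold, on synthetic data (the paper's own derivation and novelty-detection model are not reproduced): the maxima of blocks of novelty scores are fitted with a Gumbel distribution, and new points whose score exceeds a high quantile of that fitted extreme-value law are flagged. The block size and quantile are illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# "Normal" data and the novelty score we monitor (here, distance from the mean).
train = rng.normal(0, 1, size=5000)
scores = np.abs(train - train.mean())

# Classical EVT: maxima of blocks of i.i.d. scores approach a Gumbel law.
block = 50
maxima = scores[: (len(scores) // block) * block].reshape(-1, block).max(axis=1)
loc, scale = stats.gumbel_r.fit(maxima)

# Flag a point as an outlier if its score exceeds a high quantile of the fitted
# extreme-value distribution -- a principled threshold rather than an ad hoc one.
threshold = stats.gumbel_r.ppf(0.99, loc, scale)
new_points = np.array([0.5, 2.0, 6.0])
print("outlier flags:", np.abs(new_points - train.mean()) > threshold)
```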

180 citations


Journal ArticleDOI
TL;DR: In this article, the robust estimation of geodetic datum transformation is discussed, where the robust initial estimates of the transformation parameters should have a high breakdown point in order to provide reliable residuals for the following estimation.
Abstract: The robust estimation of geodetic datum transformation is discussed. The basic principle of robust estimation is introduced. The error influence functions of the robust estimators, together with those of least-squares estimators, are given. Particular attention is given to the robust initial estimates of the transformation parameters, which should have a high breakdown point in order to provide reliable residuals for the following estimation. The median method is applied to solve for robust initial estimates of transformation parameters since it has the highest breakdown point. A smooth weight function is then used to improve the efficiency of the parameter estimates in successive iterative computations. A numerical example is given on a datum transformation between a global positioning system network and the corresponding geodetic network in China. The results show that when the coordinates are contaminated by outliers, the proposed method can still give reasonable results.
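
The two-stage strategy described above can be illustrated with a toy sketch: a high-breakdown initial estimate taken as the coordinatewise median of solutions from random minimal point subsets, refined by iterations with a smooth weight function. The 2-D four-parameter transform, the Huber-type weight, and the subset-median initialization below are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)

def helmert_design(src):
    """Design matrix of a 2-D four-parameter (Helmert-type) transform, linear in
    (a, b, tx, ty): x' = a*x - b*y + tx,  y' = b*x + a*y + ty."""
    rows = []
    for xi, yi in src:
        rows.append([xi, -yi, 1, 0])
        rows.append([yi,  xi, 0, 1])
    return np.array(rows)

def robust_transform(src, dst, n_subsets=200, n_iter=10, c=1.5):
    """Toy robust datum-transformation fit: median of minimal-subset solutions as
    the initial estimate, then iterative reweighting with a smooth (Huber-type)
    weight function."""
    A, l = helmert_design(src), dst.reshape(-1)
    cands = []
    for _ in range(n_subsets):
        idx = rng.choice(len(src), size=2, replace=False)   # 2 points fix 4 parameters
        rows = np.r_[2 * idx, 2 * idx + 1]
        try:
            cands.append(np.linalg.solve(A[rows], l[rows]))
        except np.linalg.LinAlgError:
            continue
    beta = np.median(np.array(cands), axis=0)               # high-breakdown initial estimate
    for _ in range(n_iter):
        r = l - A @ beta
        s = 1.4826 * np.median(np.abs(r)) + 1e-12            # robust scale (MAD)
        u = np.abs(r) / s
        w = np.where(u <= c, 1.0, c / u)                      # smooth Huber-type weights
        W = np.sqrt(w)
        beta = np.linalg.lstsq(A * W[:, None], l * W, rcond=None)[0]
    return beta

src = rng.uniform(0, 100, size=(20, 2))
true = np.array([1.0001, 0.002, 5.0, -3.0])
dst = (helmert_design(src) @ true).reshape(-1, 2)
dst[0] += 10.0                                               # one grossly contaminated point
print("estimated (a, b, tx, ty):", robust_transform(src, dst))
```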

149 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider simultaneous outlier identification rules for multivariate data, generalizing the concept of α outlier identifiers, as presented by Davies and Gather for the case of univariate samples, and investigate how the finite-sample breakdown points of estimators used in these identification rules influence the masking behavior of the rules.
Abstract: In this article, we consider simultaneous outlier identification rules for multivariate data, generalizing the concept of so-called α outlier identifiers, as presented by Davies and Gather for the case of univariate samples. Such multivariate outlier identifiers are based on estimators of location and covariance. Therefore, it seems reasonable that characteristics of the estimators influence the behavior of outlier identifiers. Several authors mentioned that using estimators with low finite-sample breakdown point is not recommended for identifying outliers. To give a formal explanation, we investigate how the finite-sample breakdown points of estimators used in these identification rules influence the masking behavior of the rules.
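
The kind of identification rule under study can be illustrated with a simple Mahalanobis-distance identifier: points whose squared distance from an estimated location, in the metric of an estimated scatter matrix, exceeds a chi-square-based cutoff are flagged. The sketch below (the cutoff level and data are illustrative assumptions) contrasts classical estimates, which a tight cluster of outliers can mask, with the high-breakdown MCD estimate.

```python
import numpy as np
from scipy import stats
from sklearn.covariance import EmpiricalCovariance, MinCovDet

rng = np.random.default_rng(5)

# Bivariate sample in which 15 points are replaced by a tight cluster of outliers
# that can mask each other when non-robust estimates are used.
X = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=200)
X[:15] = rng.multivariate_normal([6, 6], 0.1 * np.eye(2), size=15)

# Flag points whose squared Mahalanobis distance exceeds a chi-square cutoff.
cutoff = stats.chi2.ppf(0.999, df=2)
d2_classic = EmpiricalCovariance().fit(X).mahalanobis(X)
d2_robust = MinCovDet(random_state=0).fit(X).mahalanobis(X)

print("flagged with classical estimates  :", np.sum(d2_classic > cutoff))
print("flagged with high-breakdown MCD   :", np.sum(d2_robust > cutoff))
```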

140 citations


Journal ArticleDOI
TL;DR: A class of weighted bootstrap techniques, called biased bootstrap or b-bootstrap methods, is introduced in this article, which is motivated by the need to adjust empirical methods, such as the uniform bootstrap, in a surgical way to alter some of their features while leaving others unchanged.
Abstract: Summary. A class of weighted bootstrap techniques, called biased bootstrap or b-bootstrap methods, is introduced. It is motivated by the need to adjust empirical methods, such as the ‘uniform’ bootstrap, in a surgical way to alter some of their features while leaving others unchanged. Depending on the nature of the adjustment, the b-bootstrap can be used to reduce bias, or to reduce variance or to render some characteristic equal to a predetermined quantity. Examples of the last application include a b-bootstrap approach to hypothesis testing in nonparametric contexts, where the b-bootstrap enables simulation ‘under the null hypothesis’, even when the hypothesis is false, and a b-bootstrap competitor to Tibshirani’s variance stabilization method. An example of the bias reduction application is adjustment of Nadaraya‐Watson kernel estimators to make them competitive with local linear smoothing. Other applications include density estimation under constraints, outlier trimming, sensitivity analysis, skewness or kurtosis reduction and shrinkage.

140 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose outlier robust tests for smooth transition autoregressive (STAR) nonlinearity, which are designed such that they have a better level and power behavior than standard nonrobust tests in situations with outliers.
Abstract: Regime-switching models, like the smooth transition autoregressive [STAR] model, are typically applied to time series of moderate length. Hence, the nonlinear features that these models intend to describe may be reflected in only a few observations. Conversely, neglected outliers in a linear time series of moderate length may incorrectly suggest STAR (or other) type(s of) nonlinearity. In this article we propose outlier robust tests for STAR-type nonlinearity. These tests are designed such that they have a better level and power behavior than standard nonrobust tests in situations with outliers. We formally derive local and global robustness properties of the new tests. Extensive Monte Carlo simulations show the practical usefulness of the robust tests. An application to several quarterly industrial production indexes illustrates that apparent nonlinearity in time series sometimes seems due to only a few outliers.

119 citations


Journal ArticleDOI
TL;DR: In this paper, two procedures for unit root testing in the presence of additive outliers are proposed. The first procedure is to ignore the possibility of additive outlier and use modified Phillips-Perron statistics, and the second procedure uses a simple outlier detection statistic to identify outliers and then properly adjust standard Dickey-Fuller unit root tests.
Abstract: This paper presents some results on testing for a unit root in the presence of additive outliers. Two procedures are proposed. The first procedure is to ignore the possibility of additive outliers and use modified Phillips–Perron statistics. The second procedure uses a new and very simple outlier detection statistic to identify outliers and then properly adjust standard Dickey–Fuller unit root tests. Simulations show that these procedures are robust to additive outliers in terms of size and power.

107 citations


01 Jun 1999
TL;DR: The findings presented in this report establish the traffic flow prediction superiority of seasonal time series methods, especially seasonal ARIMA modeling, over the recently developed methods.
Abstract: Extensive data collection is now commonplace for urban freeway and street network systems. Research efforts are underway to unleash the system management potential inherent in this unprecedented access to traffic condition data. A key element in this research area is traffic condition forecasting. Reliable and accurate condition forecasts will enable transportation management systems to dynamically anticipate the future state of the system rather than merely respond to the current situation. Recent traffic flow prediction efforts have focused on application of neural networks, nonparametric regression using nearest neighbor algorithms, multiple class linear regression based on automatic clustering, time series analysis techniques, and hybrid models combining two or more of these approaches. Seasonal time series methods, such as seasonal ARIMA models and Holt-Winters smoothing, have not been among the investigated forecasting techniques. The need to explore these techniques is motivated by a strong theoretical expectation that they will provide accurate and parsimonious traffic condition models. The findings presented in this report establish the traffic flow prediction superiority of seasonal time series methods, especially seasonal ARIMA modeling, over the recently developed methods. The research also contributes a specific application of time series outlier modeling theory to vehicular traffic flow data. This outlier detection and modeling procedure uncovered a common ARIMA model form among the seasonally stationary series used in this research. This common model form is ARIMA(1,0,1)(0,1,1)_S, where S is the length of the series' seasonal cycle. A glossary of terms is included.
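
A model of the reported form can be fitted with standard seasonal ARIMA tooling. The sketch below uses statsmodels' SARIMAX on a synthetic series with an hourly cycle (S = 24) purely for illustration; the report's data, seasonal period, and outlier-modeling step are not reproduced here.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(6)

# Synthetic "traffic flow" series with a daily cycle of S = 24 hourly observations.
S = 24
t = np.arange(S * 21)
flow = 400 + 150 * np.sin(2 * np.pi * t / S) + rng.normal(0, 20, len(t))

# Fit ARIMA(1,0,1)(0,1,1)_S, the common model form identified in the report.
model = SARIMAX(flow, order=(1, 0, 1), seasonal_order=(0, 1, 1, S))
fit = model.fit(disp=False)

# One-day-ahead forecast of the traffic condition.
print(fit.forecast(steps=S)[:5])
```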

103 citations


Journal ArticleDOI
TL;DR: The proper likelihood for the AO approach in the general non-stationary case is developed, the equivalence of this and the skipping method is shown, and the two methods are compared through simulation to assess their relative advantages.

Journal ArticleDOI
TL;DR: In this paper, the root mean square error of prediction (RMSEP) is used as a criterion for judging the performance of a multivariate calibration model; often it is even the sole criterion.

Journal ArticleDOI
TL;DR: A robust wavelet thresholding technique based on the minimax description length (MMDL) principle is derived and a novel approach to selecting an adapted or best basis (BB) that results in optimal signal reconstruction is proposed.
Abstract: Approaches to wavelet-based denoising (or signal enhancement) have generally relied on the assumption of normally distributed perturbations. To relax this assumption, which is often violated in practice, we derive a robust wavelet thresholding technique based on the minimax description length (MMDL) principle. We first determine the least favorable distribution in the ε-contaminated normal family as the member that maximizes the entropy. We show that this distribution, and the best estimate based upon it, namely the maximum-likelihood estimate, together constitute a saddle point. The MMDL approach results in a thresholding scheme that is resistant to heavy tailed noise. We further extend this framework and propose a novel approach to selecting an adapted or best basis (BB) that results in optimal signal reconstruction. Finally, we address the practical case where the underlying signal is known to be bounded, and derive a two-sided thresholding technique that is resistant to outliers and has bounded error.
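
The practical concern addressed here — thresholds blown up by heavy-tailed noise — can be illustrated with a generic wavelet-denoising sketch that estimates the noise scale robustly via the MAD of the finest-scale coefficients. This is a standard baseline, not the paper's MMDL thresholding rule or best-basis selection; the wavelet, level, and signal are assumptions.

```python
import numpy as np
import pywt

rng = np.random.default_rng(7)

# Piecewise-smooth signal corrupted by heavy-tailed (Student-t) noise.
t = np.linspace(0, 1, 1024)
signal = np.sin(4 * np.pi * t) + (t > 0.5)
noisy = signal + 0.3 * rng.standard_t(df=2, size=t.size)

# Robust noise-scale estimate from the finest-scale detail coefficients (MAD),
# then soft thresholding with the universal threshold.
coeffs = pywt.wavedec(noisy, "db4", level=5)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745
thresh = sigma * np.sqrt(2 * np.log(noisy.size))
den_coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(den_coeffs, "db4")

print("RMSE noisy   :", np.sqrt(np.mean((noisy - signal) ** 2)))
print("RMSE denoised:", np.sqrt(np.mean((denoised[:signal.size] - signal) ** 2)))
```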

Journal ArticleDOI
TL;DR: In this article, the authors proposed a computationally cheaper feasibility condition for LTS, MVE and MCD, and showed how the combination of the criteria leads to improved performance on large data sets.

Journal ArticleDOI
TL;DR: Comparison of the results from the least-squares method with those of the robust method shows that the results of the station systematic errors from the robust estimator are more reliable.
Abstract: Methods for analyzing laser-ranging residuals to estimate station-dependent systematic errors and to eliminate outliers in satellite laser ranges are discussed. A robust estimator based on an M-estimation principle is introduced. A practical calculation procedure which provides a robust criterion with high breakdown point and produces robust initial residuals for following iterative robust estimation is presented. Comparison of the results from the least-squares method with those of the robust method shows that the results of the station systematic errors from the robust estimator are more reliable.

Journal ArticleDOI
TL;DR: An appearance-based object representation, the parametric eigenspace, is used, but the planning algorithm is independent of the details of the specific object recognition environment; as long as the rate of wrong object-pose classifications stays low, the probabilistic implementation outperforms the other approaches.
Abstract: One major goal of active object recognition systems is to extract useful information from multiple measurements. We compare three frameworks for information fusion and view-planning using different uncertainty calculi: probability theory, possibility theory and Dempster-Shafer theory of evidence. The system dynamically repositions the camera to capture additional views in order to improve the classification result obtained from a single view. The active recognition problem can be tackled successfully by all the considered approaches with sometimes only slight differences in performance. Extensive experiments confirm that recognition rates can be improved considerably by performing active steps. Random selection of the next action is much less efficient than planning, both in recognition rate and in the average number of steps required for recognition. As long as the rate of wrong object-pose classifications stays low the probabilistic implementation always outperforms the other approaches. If the outlier rate increases averaging fusion schemes outperform conjunctive approaches for information integration. We use an appearance based object representation, namely the parametric eigenspace, but the planning algorithm is actually independent of the details of the specific object recognition environment.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: An effective and novel approach to infer sign and direction of principal curvatures at each input site from noisy 3D data that is robust to considerable amounts of outlier noise as its effect is reduced by collecting a large number of tensor votes.
Abstract: We describe an effective and novel approach to infer sign and direction of principal curvatures at each input site from noisy 3D data. Unlike most previous approaches, no local surface fitting, partial derivative computation of any kind, nor oriented normal vector recovery is performed in our method. These approaches are noise-sensitive since accurate, local, partial derivative information is often required, which is usually unavailable from real data because of the unavoidable outlier noise inherent in many measurement phases. Also, we can handle points with zero Gaussian curvature uniformly (i.e., without the need to localize and handle them first as a separate process). Our approach is based on Tensor Voting, a unified, salient structure inference process. Both the sign and the direction of principal curvatures are inferred directly from the input. Each input is first transformed into a synthetic tensor. A novel and robust approach based on tensor voting is proposed for curvature information estimation. With faithfully inferred curvature information, each input ellipsoid is aligned with curvature-based dense tensor kernels to produce a dense tensor field. Surfaces and crease curves are extracted from this dense field by using an extremal feature extraction process. The computation is non-iterative, does not require initialization, and is robust to considerable amounts of outlier noise, as its effect is reduced by collecting a large number of tensor votes. Qualitative and quantitative results on synthetic as well as real and complex data are presented.

Journal ArticleDOI
TL;DR: A procedure for computing a fast approximation to regression estimates based on the minimization of a robust scale that allows identification of multiple outliers, avoiding masking effects.
Abstract: We propose a procedure for computing a fast approximation to regression estimates based on the minimization of a robust scale. The procedure can be applied with a large number of independent variables where the usual algorithms require an unfeasible or extremely costly computer time. Also, it can be incorporated in any high-breakdown estimation method and may improve it with just little additional computer time. The procedure minimizes the robust scale over a set of tentative parameter vectors estimated by least squares after eliminating a set of possible outliers, which are obtained as follows. We represent each observation by the vector of changes of the least squares forecasts of the observation when each of the data points is deleted. Then we obtain the sets of possible outliers as the extreme points in the principal components of these vectors, or as the set of points with large residuals. The good performance of the procedure allows identification of multiple outliers, avoiding masking effects.

Journal ArticleDOI
TL;DR: This work develops and proposes two new types of residuals: the suggested log-odds and normal deviate residuals are simple and intuitively appealing and their theoretical properties and empirical performance make them very suitable for outlier identification.
Abstract: Summary. The identification of individuals who ‘died far too early’ or ‘lived far too long’ as compared to their survival probabilities from a Cox regression can lead to the detection of new prognostic factors. Methods to identify outliers are generally based on residuals. For Cox regression, only deviance residuals have been considered for this purpose, but we show that these residuals are not very suitable. Instead, we develop and propose two new types of residuals: the suggested log-odds and normal deviate residuals are simple and intuitively appealing and their theoretical properties and empirical performance make them very suitable for outlier identification. Finally, various practical aspects of screening for individuals with outlying survival times are discussed by means of a cancer study example.
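
The transformation behind such residuals can be sketched directly. Assuming each uncensored individual's model-based survival probability at his or her own death time has already been obtained from a fitted Cox regression (the hypothetical values below stand in for that step), the probability-scale values are mapped to approximately normal or logistic residuals and screened for extremes; the exact residual definitions in the paper may differ in detail.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical model-based survival probabilities of uncensored individuals,
# each evaluated at that individual's own death time. Under a well-specified
# model these are roughly uniform on (0, 1); extreme transformed values flag
# people who died far too early (S near 1) or lived far too long (S near 0).
S = np.array([0.998, 0.72, 0.55, 0.31, 0.10, 0.004])

normal_deviate = norm.ppf(S)            # approximately N(0, 1) under the model
log_odds = np.log(S / (1 - S))          # approximately standard logistic under the model

for s, nd, lo in zip(S, normal_deviate, log_odds):
    flag = "<-- screen" if abs(nd) > 2.5 else ""
    print(f"S={s:5.3f}  normal deviate={nd:+.2f}  log-odds={lo:+.2f} {flag}")
```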

Proceedings ArticleDOI
15 Mar 1999
TL;DR: A new approach is proposed that, by introducing additional variables, models the outliers and detects their presence, and that allows for extensions that cannot be handled by the usual robust regression methods.
Abstract: When recording data, large errors may occur occasionally. The corresponding abnormal data points, called outliers, can have drastic effects on the estimates. There are several ways to cope with outliers: detect and delete or adjust the erroneous data, or use a modified cost function. We propose a new approach that, by introducing additional variables, makes it possible to model the outliers and to detect their presence. In the standard linear regression model this leads to a linear inverse problem that, associated with a criterion that ensures sparseness, is solved by a quadratic programming algorithm. The new approach (model + criterion) allows for extensions that cannot be handled by the usual robust regression methods.
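
A sketch of the "additional variables" idea in the standard linear regression setting, using a simple alternating solver with soft thresholding in place of the paper's quadratic programming formulation; the data and the penalty constant are assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)

def soft(v, t):
    """Soft thresholding (the proximal operator of the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def regression_with_outlier_variables(X, y, lam=2.0, n_iter=100):
    """Fit y = X b + o + e, where o holds one additional variable per observation
    and an L1 penalty keeps o sparse, so nonzero entries of o mark observations
    modeled as outliers. Solved by simple alternation: least squares in b,
    soft thresholding in o (a sketch, not the paper's QP solver)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    o = np.zeros(n)
    for _ in range(n_iter):
        b = np.linalg.lstsq(Xd, y - o, rcond=None)[0]
        o = soft(y - Xd @ b, lam)
    return b, o

x = rng.uniform(0, 10, 60)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 60)
y[[5, 17, 40]] += np.array([15.0, -12.0, 20.0])      # gross recording errors
b, o = regression_with_outlier_variables(x.reshape(-1, 1), y)
print("coefficients:", b)
print("detected outliers at indices:", np.flatnonzero(o))
```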

Journal ArticleDOI
01 Dec 1999-Test
TL;DR: In this article, the authors define a new notion of order statistics and ranks for multivariate data based on density estimation and define a class of multivariate estimators of location that can be regarded as multivariate L-estimators.
Abstract: In one dimension, order statistics and ranks are widely used because they form a basis for distribution free tests and some robust estimation procedures. In more than one dimension, the concept of order statistics and ranks is not clear and several definitions have been proposed in the last years. The proposed definitions are based on different concepts of depth. In this paper, we define a new notion of order statistics and ranks for multivariate data based on density estimation. The resulting ranks are invariant under affine transformations and asymptotically distribution free. We use the corresponding order statistics to define a class of multivariate estimators of location that can be regarded as multivariate L-estimators. Under mild assumptions on the underlying distribution, we show the asymptotic normality of the estimators. A modification of the proposed estimates results in a high breakdown point procedure that can deal with patches of outliers. The main idea is to order the observations according to their likelihoods f(X_1), ..., f(X_n). If the density f happens to be ellipsoidal, the above ranking is similar to the rankings that are derived from the various notions of depth. We propose to define a ranking based on a kernel estimate of the density f. One advantage of estimating the likelihoods is that the underlying distribution does not need to have a density. In addition, because the approximate likelihoods are only used to rank the observations, they can be derived from a density estimate using a fixed bandwidth. This fixed bandwidth overcomes the curse of dimensionality that typically plagues density estimation in high dimension.
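
A minimal sketch of the likelihood-based ordering with a fixed-bandwidth kernel density estimate on synthetic data; the bandwidth and the crude trimmed location estimate at the end are illustrative choices, not the paper's estimator.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(9)

# Bivariate sample with a small patch of outliers.
X = rng.multivariate_normal([0, 0], np.eye(2), size=300)
X[:10] += np.array([7.0, 7.0])

# Rank the observations by an estimate of their own likelihood, using a kernel
# density estimate with a fixed bandwidth (the estimated likelihoods are used
# only to order the points, so a fixed bandwidth suffices for ranking).
kde = KernelDensity(bandwidth=0.5).fit(X)
log_density = kde.score_samples(X)
order = np.argsort(log_density)          # most "outlying" observations first

print("lowest-likelihood observations:", order[:10])

# A crude multivariate L-type location estimate: the average of the half of
# the sample with the highest estimated likelihood.
central_half = X[order[len(X) // 2:]]
print("trimmed location estimate:", central_half.mean(axis=0))
```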

Journal ArticleDOI
TL;DR: Examples show that cluster analysis, the technique most widely used for this purpose, can fail to reveal outliers clearly identified by other methods.
Abstract: Multivariate statistical analysis of artefact compositional data, usually undertaken to investigate structure in the data, often incidentally reveals the presence of multivariate outliers. Much statistical methodology dealing with the detection of such outliers is not well suited to archaeometric data that, in the event, consist of two or more groups. The paper provides examples to illustrate the importance of detecting and dealing with outliers, and critically examines a range of different approaches to outlier detection. The examples show that cluster analysis, the technique most widely used for this purpose, can fail to reveal outliers clearly identified by other methods.

Journal ArticleDOI
TL;DR: The effectiveness of the suggested method in detecting masked multiple outliers, and more generally in ordering spatial data, is shown by means of a number of simulated datasets that reveal the power of the method in getting inside the data in a way which is more simple and powerful than it would be using standard diagnostic procedures.
Abstract: In this article we suggest a unified approach to the exploratory analysis of spatial data. Our technique is based on a forward search algorithm that orders the observations from those most in agreement with a specified autocorrelation model to those least in agreement with it. This leads to the identification of spatial outliers—that is, extreme observations with respect to their neighboring values—and of nonstationary pockets. In particular, the focus of our analysis is on spatial prediction models. We show that standard deletion diagnostics for prediction are affected by masking and swamping problems when multiple outliers are present. The effectiveness of the suggested method in detecting masked multiple outliers, and more generally in ordering spatial data, is shown by means of a number of simulated datasets. These examples clearly reveal the power of our method in getting inside the data in a way which is more simple and powerful than it would be using standard diagnostic procedures. Further...

Journal ArticleDOI
TL;DR: A new selective training method is proposed that controls the influence of outliers in the training data on the generated models, and the resulting models are shown to possess feature statistics which are more clearly separated for confusable patterns.
Abstract: Traditional maximum likelihood estimation of hidden Markov model parameters aims at maximizing the overall probability across the training tokens of a given speech unit. As such, it disregards any interaction or biases across the models in the training procedure. Often, the resulting model parameters do not result in minimum error classification in the training set. A new selective training method is proposed that controls the influence of outliers in the training data on the generated models. The resulting models are shown to possess feature statistics which are more clearly separated for confusable patterns. The proposed selective training procedure is used for hidden Markov model training, with application to foreign accent classification, language identification, and speech recognition using the E-set alphabet. The resulting error rates are measurably improved over traditional forward-backward training under open test conditions. The proposed method is similar in terms of its goal to maximum mutual information estimation training, however it requires less computation, and the convergence properties of maximum likelihood estimation are retained in the new formulation.

16 Sep 1999
TL;DR: This thesis considers three extensions of the basic smooth transition model and the influence of atypical observations on testing procedures for smooth transition non-linearity and on the estimation of smooth transition models.
Abstract: The dynamic properties of many economic time series variables can be characterised as state-dependent or regime-switching. A popular model to describe this type of non-linear behaviour is the smooth transition model, which accommodates two regimes facilitating a gradual transition from one regime to the other. The first part of this thesis considers three extensions of the basic smooth transition model. Models are developed which allow for more than two regimes, for time-varying properties in conjunction with regime-switching behaviour, and for modeling several time series jointly. Particular emphasis is placed on the inter-related issues of specification and inference in such models. The second part of the thesis concerns the influence of atypical observations on testing procedures for smooth transition non-linearity and on the estimation of smooth transition models. Traditional methods that are used for these purposes are found to be very sensitive to such outliers. Therefore, outlier robust testing procedures and estimation methods are developed.

Posted Content
TL;DR: In this paper, the authors introduce a family of classifiers based on order statistics for robust handling of such cases, and derive expressions for the reductions in error expected when such combiners are used.
Abstract: Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In the typical setting investigated till now, each classifier is trained on data taken or resampled from a common data set, or (almost) randomly selected subsets thereof, and thus experiences similar quality of training data. However, in certain situations where data is acquired and analyzed on-line at several geographically distributed locations, the quality of data may vary substantially, leading to large discrepancies in performance of individual classifiers. In this article we introduce and investigate a family of classifiers based on order statistics, for robust handling of such cases. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when such combiners are used. We show analytically that the selection of the median, the maximum and in general, the $i^{th}$ order statistic improves classification performance. Furthermore, we introduce the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that they are quite beneficial in presence of outliers or uneven classifier performance. Experimental results on several public domain data sets corroborate these findings.
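
A small sketch of order-statistic combining of classifier outputs — the median and a trim-style combiner alongside plain averaging — on synthetic class-posterior estimates with one deliberately unreliable classifier. The data, shapes, and combiner details are assumptions, not the article's experiments.

```python
import numpy as np

rng = np.random.default_rng(10)

def order_statistic_combiner(outputs, kind="median", trim=1):
    """Combine per-class scores from several classifiers via order statistics.
    `outputs` has shape (n_classifiers, n_samples, n_classes). 'median' takes
    the middle ordered output; 'trim' averages the ordered outputs after
    dropping the `trim` lowest and highest, a linear combination of order
    statistics in the spirit of the trim combiner."""
    if kind == "median":
        combined = np.median(outputs, axis=0)
    elif kind == "trim":
        ordered = np.sort(outputs, axis=0)
        combined = ordered[trim: outputs.shape[0] - trim].mean(axis=0)
    else:                                   # plain averaging for comparison
        combined = outputs.mean(axis=0)
    return combined.argmax(axis=1)

# Five classifiers emit two-class posterior estimates; one is unreliable,
# mimicking a site with poor-quality training data.
n_samples = 1000
true_labels = rng.integers(0, 2, n_samples)
good = np.stack([
    np.column_stack([1 - p, p])
    for p in [np.clip(true_labels + rng.normal(0, 0.4, n_samples), 0, 1) for _ in range(4)]
])
bad = rng.uniform(0, 1, (1, n_samples, 1))
bad = np.concatenate([1 - bad, bad], axis=2)
outputs = np.concatenate([good, bad], axis=0)

for kind in ["mean", "median", "trim"]:
    acc = (order_statistic_combiner(outputs, kind) == true_labels).mean()
    print(f"{kind:>6} combiner accuracy: {acc:.3f}")
```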

Journal ArticleDOI
18 Oct 1999-Metrika
TL;DR: In this paper, the authors consider penalized likelihood smoothers, i.e. estimators that maximize penalized likelihoods or, equivalently, posterior densities, for additive and innovations outlier models, computed via Fisher scoring steps or iterative Kalman-type filters.
Abstract: In likelihood-based approaches to robustify state space models, Gaussian error distributions are replaced by non-normal alternatives with heavier tails. Robustified observation models are appropriate for time series with additive outliers, while state or transition equations with heavy-tailed error distributions lead to filters and smoothers that can cope with structural changes in trend or slope caused by innovations outliers. As a consequence, however, conditional filtering and smoothing densities become analytically intractable. Various attempts have been made to deal with this problem, reaching from approximate conditional mean type estimation to fully Bayesian analysis using MCMC simulation. In this article we consider penalized likelihood smoothers, that is, estimators which maximize penalized likelihoods or, equivalently, posterior densities. Filtering and smoothing for additive and innovations outlier models can be carried out by computationally efficient Fisher scoring steps or iterative Kalman-type filters. Special emphasis is on the Student family, for which EM-type algorithms to estimate unknown hyperparameters are developed. Operational behaviour is illustrated by simulation experiments and by real data applications.

Journal ArticleDOI
TL;DR: The outlier issue is discussed for experiments that are carefully planned for the purpose of response surface exploration, in which a second-order polynomial is fitted to the measurements, and two possible approaches are exhibited.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a general approach to alleviate the problem of contamination of a sampled distribution by a heavy-tailed distribution, which can degrade the performance of a statistical estimator.
Abstract: Contamination of a sampled distribution, for example by a heavy-tailed distribution, can degrade the performance of a statistical estimator. We suggest a general approach to alleviating this problem, using a version of the weighted bootstrap. The idea is to 'tilt' away from the contaminated distribution by a given (but arbitrary) amount, in a direction that minimizes a measure of the new distribution's dispersion. This theoretical proposal has a simple empirical version, which results in each data value being assigned a weight according to an assessment of its influence on dispersion. Importantly, distance can be measured directly in terms of the likely level of contamination, without reference to an empirical measure of scale. This makes the procedure particularly attractive for use in multivariate problems. It has several forms, depending on the definitions taken for dispersion and for distance between distributions. Examples of dispersion measures include variance and generalizations based on high order moments. Practicable measures of the distance between distributions may be based on power divergence, which includes Hellinger and Kullback-Leibler distances. The resulting location estimator has a smooth, redescending influence curve and appears to avoid computational difficulties that are typically associated with redescending estimators. Its breakdown point can be located at any desired value ε ∈ (0, 1/2) simply by 'trimming' to a known distance (depending only on ε and the choice of distance measure) from the empirical distribution. The estimator has an affine equivariant multivariate form. Further, the general method is applicable to a range of statistical problems, including regression.

Journal ArticleDOI
TL;DR: This report proposes the use of a robust estimator that is more tolerant of outliers in the reference population data and does not require as large a sample size as the nonparametric calculation method, nor does it require the reference data to be transformed to a gaussian distribution, which is not always possible.
Abstract: The clinical chemist is faced with the problem of defining reference intervals for many analytes. Problems that hinder such a determination are the presence of outliers in the data set and the inability to accumulate the recommended sample size (1). We previously have demonstrated the theoretical basis for the application of robust methods to resolve these problems (2). In particular, we consider the problem of establishing that the reference population is “healthy” to be a nearly impossible task because many disease processes may be missed in the examination process. Our own experience shows that diabetics were initially classified as healthy in our test population, and only after a thorough review of all of the data was it possible to elicit the presence of this disease. In this report, we propose the use of a robust estimator we have described previously (2). The advantage of this approach is that it is more tolerant of outliers in the reference population data and does not require as large a sample size as the nonparametric calculation method, nor does it require the reference data to be transformed to a gaussian distribution, which is not always possible. We then apply and compare this robust estimator with both the traditional nonparametric and parametric analysis in determining reference intervals for a well-studied population. The Fernald Medical Monitoring Program provided us with a documented healthy sample (T) to test the three methods: parametric, nonparametric, and robust. Our computer-generated sample (W) offered the possibility to test our estimates of reference intervals in a population with a greater potential incidence of diseases (3). The robust approach offered the opportunity to look at …
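
To make the contrast concrete, the sketch below compares a nonparametric (percentile) reference interval with a simple robust alternative built from the median and scaled MAD on a contaminated synthetic sample. The robust interval shown is illustrative only and is not the estimator of reference (2); the data and interval coverage are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

# Hypothetical analyte values from a "reference" population in which a few
# unrecognized disease cases act as outliers.
healthy = rng.normal(5.0, 0.6, 110)
undetected_disease = rng.normal(9.0, 1.0, 10)
values = np.concatenate([healthy, undetected_disease])

# Nonparametric reference interval: central 95% of the observed data.
nonparametric = np.percentile(values, [2.5, 97.5])

# Simple robust alternative: location and spread from the median and the
# normal-consistent MAD, which tolerate a modest fraction of outliers,
# followed by a normal-theory 95% interval.
center = np.median(values)
spread = stats.median_abs_deviation(values, scale="normal")
robust = (center - 1.96 * spread, center + 1.96 * spread)

print("nonparametric interval:", nonparametric)
print("robust interval       :", robust)
```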