
Showing papers on "Outlier published in 1997"


Journal ArticleDOI
TL;DR: For outlier discrimination or down-weighting, sample median values have the advantage of being much less outlier-biased than sample mean values would be.
Abstract: Experience with a variety of diffraction data-reduction problems has led to several strategies for dealing with mismeasured outliers in multiply measured data sets. Key features of the schemes employed currently include outlier identification based on the values $y_{\mathrm{median}} = \mathrm{median}(|F_i|^2)$, $\sigma_{\mathrm{median}} = \mathrm{median}[\sigma(|F_i|^2)]$, and $|\Delta|_{\mathrm{median}} = \mathrm{median}(|\Delta_i|) = \mathrm{median}\big[\,\big| |F_i|^2 - \mathrm{median}(|F_i|^2) \big|\,\big]$ in samples with $i = 1, 2, \ldots, n$ and $n \ge 2$ measurements; and robust/resistant averaging weights based on values of $|z_i| = |\Delta_i| / \max\{\sigma_{\mathrm{median}},\ |\Delta|_{\mathrm{median}}\,[n/(n-1)]^{1/2}\}$. For outlier discrimination or down-weighting, sample median values have the advantage of being much less outlier-biased than sample mean values would be.
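
The quantities in the abstract translate directly into code. Below is a minimal sketch, assuming the repeated measurements and their standard uncertainties are supplied as arrays; the sample values and any rejection cutoff applied to the resulting |z_i| are illustrative only, not values from the paper.

```python
# Minimal sketch of the median-based outlier scores defined in the abstract.
# `intensities` holds the n >= 2 repeated measurements |F_i|^2 and `sigmas`
# their estimated standard uncertainties sigma(|F_i|^2).
import numpy as np

def robust_z_scores(intensities, sigmas):
    intensities = np.asarray(intensities, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    n = intensities.size
    y_median = np.median(intensities)                 # median(|F_i|^2)
    sigma_median = np.median(sigmas)                  # median[sigma(|F_i|^2)]
    deltas = np.abs(intensities - y_median)           # |Delta_i|
    delta_median = np.median(deltas)                  # median(|Delta_i|)
    scale = max(sigma_median, delta_median * np.sqrt(n / (n - 1)))
    return deltas / scale                             # |z_i|

print(robust_z_scores([102.0, 98.5, 101.3, 250.0], [3.0, 2.8, 3.1, 3.0]))
# the fourth measurement receives a very large |z_i| and would be down-weighted
```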

1,263 citations


Journal ArticleDOI
TL;DR: A variety of robust methods for the computation of the Fundamental Matrix, the calibration-free representation of camera motion, are developed from the principal categories of robust estimators, viz. case deletion diagnostics, M-estimators and random sampling, and the theory required to apply them to non-linear orthogonal regression problems is developed.
Abstract: This paper has two goals. The first is to develop a variety of robust methods for the computation of the Fundamental Matrix, the calibration-free representation of camera motion. The methods are drawn from the principal categories of robust estimators, viz. case deletion diagnostics, M-estimators and random sampling, and the paper develops the theory required to apply them to non-linear orthogonal regression problems. Although a considerable amount of interest has focussed on the application of robust estimation in computer vision, the relative merits of the many individual methods are unknown, leaving the potential practitioner to guess at their value. The second goal is therefore to compare and judge the methods. Comparative tests are carried out using correspondences generated both synthetically in a statistically controlled fashion and from feature matching in real imagery. In contrast with previously reported methods the goodness of fit to the synthetic observations is judged not in terms of the fit to the observations per se but in terms of fit to the ground truth. A variety of error measures are examined. The experiments allow a statistically satisfying and quasi-optimal method to be synthesized, which is shown to be stable with up to 50 percent outlier contamination, and may still be used if there are more than 50 percent outliers. Performance bounds are established for the method, and a variety of robust methods to estimate the standard deviation of the error and covariance matrix of the parameters are examined. The results of the comparison have broad applicability to vision algorithms where the input data are corrupted not only by noise but also by gross outliers.

844 citations


Proceedings Article
14 Aug 1997
TL;DR: A unified outlier detection system can replace a whole spectrum of statistical discordancy tests with a single module detecting only the kinds of outliers proposed.
Abstract: As said in signal processing, "One person's noise is another person's signal." For many applications, such as the exploration of satellite or medical images, and the monitoring of criminal activities in electronic commerce, identifying exceptions can often lead to the discovery of truly unexpected knowledge. In this paper, we study an intuitive notion of outliers. A key contribution of this paper is to show how the proposed notion of outliers unifies or generalizes many existing notions of outliers provided by discordancy tests for standard statistical distributions. Thus, a unified outlier detection system can replace a whole spectrum of statistical discordancy tests with a single module detecting only the kinds of outliers proposed. A second contribution of this paper is the development of an approach to find all outliers in a dataset. The structure underlying this approach resembles a data cube, which has the advantage of facilitating integration with the many OLAP and data mining systems using data cubes.
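
The abstract does not spell out the formal definition, but the notion studied here is distance based. As a hedged illustration only (the naive O(n^2) loop, parameter values and toy data are not taken from the paper, which uses a far more efficient, cube-like structure), a point can be flagged when at least a fraction p of the other points lie farther than a distance d from it:

```python
# Hedged sketch of a distance-based outlier rule: flag a point if at least a
# fraction `p` of the other points lie farther than distance `d` from it.
import numpy as np

def distance_based_outliers(points, d, p):
    points = np.asarray(points, dtype=float)
    n = len(points)
    flags = np.zeros(n, dtype=bool)
    for i in range(n):
        dists = np.linalg.norm(points - points[i], axis=1)
        far = np.sum(dists > d) / (n - 1)     # fraction of *other* points beyond d
        flags[i] = far >= p
    return flags

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(size=(200, 2)), [[8.0, 8.0]]])
print(np.flatnonzero(distance_based_outliers(data, d=3.0, p=0.95)))
# the appended point at (8, 8) should be among the flagged indices
```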

319 citations


Journal ArticleDOI
TL;DR: This method of outlier handling combined with the classifier is applied to the well-known problem of automatic, constrained classification of chromosomes into their biological classes and it is shown that it decreases the error rate relative to the classical, normal, model by more than 50%.

164 citations


Book
17 Sep 1997
TL;DR: In this book, the authors cover choosing the right statistics, descriptive statistics, probability distributions, significance and outlier tests, the analysis of variance, and linear regression, including confidence intervals for the slope, intercept and predicted value.
Abstract: Introduction - choosing the right statistics. Part 1, Descriptive statistics: cumulative frequency histogram frequency polygon cumulative distribution frequency curve random sample. Part 2, Distribution descriptives: measures of location; measures of dispersion; skewness; kurtosis. Part 3, Probability distributions: the normal distribution; skew distributions. Part 4, Confidence limits. Part 5, Accuracy and precision: accuracy; precision. Part 6, Significance testing: F-test; the Student t-test. Part 7, Outlier tests: the Dixon test; the Grubbs tests; the Cochran test; the Bartlett test; robust statistics. Part 8, The analysis of variance: nature of variation; one-way analysis of variance; two-way analysis of variance. Part 9, Linear regression: correlation coefficient; variation in a result estimated from a regression line; confidence intervals for the slope, intercept and predicted value. Part 10, Polynomial regression. Part 11, Repeatability standard deviation. Part 12, Reproducibility standard deviation: repeatability (r); reproducibility (R). Part 13, Analytical quality control: control charts; application of control charts. Part 14, Statistical sampling. Appendices.
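
As a concrete illustration of the single-outlier tests listed under Part 7, the sketch below implements a two-sided Grubbs test for one suspected outlier in approximately normal data; the significance level and the sample values are made up, and scipy is assumed to be available.

```python
# Two-sided Grubbs test for a single suspected outlier (illustrative sketch).
import numpy as np
from scipy import stats

def grubbs_test(x, alpha=0.05):
    x = np.asarray(x, dtype=float)
    n = x.size
    g = np.max(np.abs(x - x.mean())) / x.std(ddof=1)       # Grubbs statistic
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)             # t critical value
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return g, g_crit, g > g_crit

print(grubbs_test([9.8, 10.1, 10.0, 9.9, 10.2, 14.7]))      # flags 14.7
```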

160 citations


Journal ArticleDOI
TL;DR: A robust algorithm for model selection in regression models using Shao's cross-validation methods for choice of variables as a starting point is provided, demonstrating a substantial improvement in choosing the correct model in the presence of outliers with little loss of efficiency at the normal model.
Abstract: This article gives a robust technique for model selection in regression models, an important aspect of any data analysis involving regression. There is a danger that outliers will have an undue influence on the model chosen and distort any subsequent analysis. We provide a robust algorithm for model selection using Shao's cross-validation methods for choice of variables as a starting point. Because Shao's techniques are based on least squares, they are sensitive to outliers. We develop our robust procedure using the same ideas of cross-validation as Shao but using estimators that are optimal bounded influence for prediction. We demonstrate the effectiveness of our robust procedure in providing protection against outliers both in a simulation study and in a real example. We contrast the results with those obtained by Shao's method, demonstrating a substantial improvement in choosing the correct model in the presence of outliers with little loss of efficiency at the normal model.
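
The flavour of the approach can be sketched with off-the-shelf tools: score each candidate variable subset by a cross-validated robust prediction error computed from a bounded-influence (Huber) fit instead of least squares. This is only an illustration of the idea on synthetic data, assuming scikit-learn is available; it is not the paper's estimator or Shao's exact splitting scheme.

```python
# Hedged sketch: cross-validated robust model choice among variable subsets.
import numpy as np
from sklearn.linear_model import HuberRegressor
from sklearn.model_selection import KFold

def robust_cv_score(X, y, cols, n_splits=5):
    errs = []
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        fit = HuberRegressor().fit(X[train][:, cols], y[train])
        errs.extend(np.abs(y[test] - fit.predict(X[test][:, cols])))
    return np.median(errs)                    # robust out-of-sample error summary

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=100)
y[:5] += 20.0                                 # a few gross outliers
for cols in ([0], [0, 1], [0, 1, 2]):
    print(cols, round(robust_cv_score(X, y, cols), 3))
# the correct subset [0, 1] should score about as well as the full model
```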

137 citations


Journal ArticleDOI
TL;DR: A methodology for fusing multiple instances of biometric data to improve the performance of a personal identity verification system is developed and it is shown that the fusion based on rank order statistic, i.e., the median, is robust to outliers.
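
A toy illustration of why a rank-order statistic such as the median is robust when fusing matcher scores; the scores below are invented, not data from the paper.

```python
# Mean vs. median fusion of scores from several (hypothetical) biometric matchers.
import numpy as np

scores = np.array([0.82, 0.79, 0.85, 0.05])   # the fourth matcher has failed
print("mean fusion:  ", scores.mean())        # dragged down by the outlier
print("median fusion:", np.median(scores))    # barely affected by the outlier
```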

133 citations


Journal ArticleDOI
TL;DR: The "pseudo outlier bias" metric is developed using techniques from the robust statistics literature, and it is used to study the error in robust fits caused by distributions modeling various types of discontinuities.
Abstract: When fitting models to data containing multiple structures, such as when fitting surface patches to data taken from a neighborhood that includes a range discontinuity, robust estimators must tolerate both gross outliers and pseudo outliers. Pseudo outliers are outliers to the structure of interest, but inliers to a different structure. They differ from gross outliers because of their coherence. Such data occurs frequently in computer vision problems, including motion estimation, model fitting, and range data analysis. The focus in this paper is the problem of fitting surfaces near discontinuities in range data. To characterize the performance of least median of squares, least trimmed squares, M-estimators, Hough transforms, RANSAC, and MINPRAN on this type of data, the "pseudo outlier bias" metric is developed using techniques from the robust statistics literature, and it is used to study the error in robust fits caused by distributions modeling various types of discontinuities. The results show each robust estimator to be biased at small, but substantial, discontinuities. They also show the circumstances under which different estimators are most effective. Most importantly, the results imply present estimators should be used with care, and new estimators should be developed.

126 citations


Journal ArticleDOI
TL;DR: In this article, a high-breakdown criterion for linear discriminant analysis is proposed, which is intended to supplement rather than replace the usual sample-moment methodology of discriminant analysis.
Abstract: The classification rules of linear discriminant analysis are defined by the true mean vectors and the common covariance matrix of the populations from which the data come. Because these true parameters are generally unknown, they are commonly estimated by the sample mean vector and covariance matrix of the data in a training sample randomly drawn from each population. However, these sample statistics are notoriously susceptible to contamination by outliers, a problem compounded by the fact that the outliers may be invisible to conventional diagnostics. High-breakdown estimation is a procedure designed to remove this cause for concern by producing estimates that are immune to serious distortion by a minority of outliers, regardless of their severity. In this article we motivate and develop a high-breakdown criterion for linear discriminant analysis and give an algorithm for its implementation. The procedure is intended to supplement rather than replace the usual sample-moment methodology of discriminant analysis.

123 citations


Journal ArticleDOI
TL;DR: In this paper, the nonparametric version of the classical mixed model is considered and the common hypotheses of (parametric) main effects and interactions are reformulated in a non-parametric setup.

118 citations


Journal ArticleDOI
TL;DR: In this article, a Bayes invariant optimal multi-decision procedure is provided for detecting at most k (k > 1) such perturbations, which does not depend on the loss function nor on the prior distribution of the shifts under fairly mild assumptions.
Abstract: The problem of determining a normal linear model with possible perturbations, viz. change-points and outliers, is formulated as a problem of testing multiple hypotheses, and a Bayes invariant optimal multi-decision procedure is provided for detecting at most k (k > 1) such perturbations. The asymptotic form of the procedure is a penalized log-likelihood procedure which does not depend on the loss function nor on the prior distribution of the shifts under fairly mild assumptions. The term which penalizes too large a number of changes (or outliers) arises mainly from realistic assumptions about their occurrence. It is different from the term which appears in Akaike's or Schwarz' criteria, although it is of the same order as the latter. Some concrete numerical examples are analyzed.

Journal Article
TL;DR: This paper discusses model selection from the point of view of robustness and points out the extreme sensitivity of many classical model selection procedures to outliers and other departures from the distributional assumptions of the model.
Abstract: Model selection is a key component in any statistical analysis. In this paper we discuss this issue from the point of view of robustness and we point out the extreme sensitivity of many classical model selection procedures to outliers and other departures from the distributional assumptions of the model. First, we focus on regression and review a robust version of Mallows's Cp as well as some related approaches. We then go beyond the regression model and discuss a robust version of the Akaike Information Criterion for general parametric models.

Journal ArticleDOI
TL;DR: A Bayesian procedure to analyze performance data from production functions is developed that treats each parameter of the production function as a different trait; the full set of posterior conditional distributions needed for the Gibbs sampling algorithm can be easily obtained.

Book ChapterDOI
TL;DR: The concept of restricted maximum likelihood estimation (REML), robust REML estimation, and Fellner's algorithmic approach are described in the chapter, which summarizes estimation based on maximising the Gaussian likelihood and discusses estimation based on maximising a Student t likelihood and other modifications to the Gaussian likelihood.
Abstract: Publisher Summary This chapter discusses various approaches for the robust estimation of mixed models. It summarizes estimation based on maximising the Gaussian likelihood and discusses estimation based on maximising a Student t likelihood and other modifications to the Gaussian likelihood. The concept of restricted maximum likelihood estimation (REML), robust REML estimation, and Fellner's algorithmic approach are described in the chapter. The classical mixed linear model is obtained by assuming that the error components have Gaussian distributions. In this case, estimation is relatively straightforward. However, in practice, any of the observed error components can contain outliers, which result in non-Gaussian distributions for the error components and hence the response. Outlier contamination is often usefully represented by a mixture model in which a Gaussian kernel or core model is mixed with a contaminating distribution. Within this framework, various objectives can be entertained. Depending on the context, the following objectives can be considered: (1) estimating the parameters of the distributions of the error components, (2) estimating the variances of the distributions of the error components, whatever they happen to be, and (3) estimating the parameters of the core Gaussian distributions.
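
The contamination mixture described in the last part of the abstract is easy to simulate, and doing so shows why robust summaries are preferred; the 10% contamination level and the Gaussian components below are arbitrary choices for illustration.

```python
# Illustrative sketch of an epsilon-contamination model: a Gaussian core
# N(5, 1) mixed with 10% contamination from a shifted, wider Gaussian N(25, 5).
import numpy as np

rng = np.random.default_rng(42)
n, eps = 10_000, 0.10
core = rng.normal(5.0, 1.0, size=n)
contamination = rng.normal(25.0, 5.0, size=n)
x = np.where(rng.random(n) < eps, contamination, core)

mad_scale = 1.4826 * np.median(np.abs(x - np.median(x)))
print("mean / std        :", x.mean(), x.std(ddof=1))   # pulled toward the contamination
print("median / MAD scale:", np.median(x), mad_scale)   # stay close to the core N(5, 1)
```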

Journal ArticleDOI
TL;DR: Six data sets recording fetal control mortality in mouse litters are presented and it is shown that the beta-binomial model provides a reasonable description but that the fit can be significantly improved by using a mixture of a beta-binomial model with a binomial distribution.
Abstract: SUMMARY Six data sets recording fetal control mortality in mouse litters are presented. The data are clearly overdispersed, and a standard approach would be to describe the data by means of a beta-binomial model or to use quasi-likelihood methods. For five of the examples, we show that the beta-binomial model provides a reasonable description but that the fit can be significantly improved by using a mixture of a beta-binomial model with a binomial distribution. This mixture provides two alternative solutions, in one of which the binomial component indicates a high probability of death but is selected infrequently; this accounts for outlying litters with high mortality. The influence of the outliers on the beta-binomial fits is also demonstrated. The location and nature of the two main maxima to the likelihood are investigated through profile log-likelihoods. Comparisons are made with the performance of finite mixtures of binomial distributions. Data describing fetal deaths in litters are frequently overdispersed relative to the binomial distribution due to differential litter effects. In some cases, sample sizes may be too small for a significant effect of overdispersion to be detected. In others, however, data sets are certainly large enough, and six illustrations of such data sets are given in Tables 1-6. The data of Tables 3-6 have been published before, while those of Tables 1 and 2 have been formed by pooling smaller sets of data that have previously been analyzed by James and Smith (1982). We shall refer to the data sets, in order, as E1, E2, HS1, HS2, HS3, AVSS. Different data sets highlight different aspects of the work of this paper. Statisticians analyzing such data might adopt a parametric approach, for example, by fitting a beta-binomial or a correlated-binomial distribution. The more robust approach provided by quasilikelihood is an alternative (see Liang and Hanfelt, 1994). The various options are described in Morgan (1992, Chapter 6) and model details are given in Section 2. Also shown in Tables 1-6 are the expected values from fitting the beta-binomial distribution. The fits appear to be reasonably good except for the small numbers of litters with high mortality. Because of small cell frequencies, we use simulation to test goodness-of-fit. Taking the E1 data set as an example, we can simulate from the fitted beta-binomial model a data set that matches E1 with regard to numbers of litters of the various different sizes observed and that uses as parameter values the maximum-likelihood estimates resulting from fitting the model to the real data. The model can then be fitted to the simulated data and the maximum log-likelihood recorded. This process may then be repeated many times to provide a goodness-of-fit reference for the maximum
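
The litter-to-litter variation in death probability that drives the overdispersion can be mimicked in a few lines: drawing a litter-specific probability from a Beta distribution before the binomial draw inflates the variance of the counts relative to a plain binomial with the same mean. The litter size and Beta parameters below are arbitrary, not values fitted to the tables discussed above.

```python
# Beta-binomial vs. plain binomial counts: a small overdispersion demo.
import numpy as np

rng = np.random.default_rng(7)
litters, size = 2000, 12
a, b = 1.0, 9.0                                  # mean death probability 0.1
p = rng.beta(a, b, size=litters)                 # litter-specific probabilities
beta_binom = rng.binomial(size, p)               # beta-binomial death counts
plain_binom = rng.binomial(size, a / (a + b), size=litters)

print("beta-binomial variance:", round(beta_binom.var(), 2))    # roughly doubled
print("plain binomial variance:", round(plain_binom.var(), 2))
```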

Journal ArticleDOI
TL;DR: In this article, the authors examined the effect on the estimated parameters of moving various kinds of intervention along the series, such as seasonal adjustment and detrending of series, and provided insights into the fragility of inferences to specific shocks.

Journal ArticleDOI
TL;DR: In this paper, a robust approach based on an M-estimator is proposed, which provides better results than conventional indirect methods in the presence of outliers, dealing with both uncorrelated and correlated errors.

Book
20 Jun 1997
TL;DR: This book treats efficient and robust inference for planned experiments, covering outlier-robustness concepts, bias and breakdown points, and estimators, tests and designs that combine high robustness with high efficiency in linear and nonlinear models.
Abstract: I: Efficient Inference for Planned Experiments. 1 Planned Experiments: 1.1 Deterministic and Random Designs; 1.2 Linear and Nonlinear Models; 1.3 Identifiability of Aspects. 2 Efficiency Concepts for Outlier-Free Observations: 2.1 Assumptions on the Error Distribution; 2.2 Optimal Inference for Linear Problems; 2.3 Efficient Inference for Nonlinear Problems. II: Robust Inference for Planned Experiments. 3 Smoothness Concepts of Outlier Robustness: 3.1 Distributions Modelling Outliers; 3.2 Smoothness of Estimators and Functionals; 3.3 Frechet Differentiability of M-Functionals. 4 Robustness Measures: Bias and Breakdown Points: 4.1 Asymptotic Bias and Breakdown Points; 4.2 Bias and Breakdown Points for Finite Samples; 4.3 Breakdown Points in Linear Models; 4.4 Breakdown Points for Nonlinear Problems. 5 Asymptotic Robustness for Shrinking Contamination: 5.1 Asymptotic Behaviour of Estimators in Shrinking Neighbourhoods; 5.2 Robust Estimation in Contaminated Linear Models; 5.3 Robust Estimation of Nonlinear Aspects; 5.4 Robust Estimation in Contaminated Nonlinear Models. 6 Robustness of Tests: 6.1 Bias and Breakdown Points; 6.2 Asymptotic Robustness for Shrinking Contamination. III: High Robustness and High Efficiency. 7 High Robustness and High Efficiency of Estimation: 7.1 Estimators and Designs with Minimum Asymptotic Bias; 7.2 Optimal Estimators and Designs for a Bias Bound; 7.3 Robust and Efficient Estimation of Nonlinear Aspects; 7.4 Robust and Efficient Estimation in Nonlinear Models. 8 High Robustness and High Efficiency of Tests: 8.1 Tests and Designs with Minimum Asymptotic Bias; 8.2 Optimal Tests and Designs for a Bias Bound. 9 High Breakdown Point and High Efficiency: 9.1 Breakdown Point Maximizing Estimators and Designs; 9.2 Combining High Breakdown Point and High Efficiency. Outlook. A.1 Asymptotic Linearity of Frechet Differentiable Functionals; A.2 Properties of Special Matrices and Functions. References. List of Symbols.

Journal ArticleDOI
TL;DR: In this paper, a Bayesian approach is presented for modeling a time series by an autoregressive moving-average model, which is robust to innovation and additive outliers and identifies such outliers.
Abstract: A Bayesian approach is presented for modeling a time series by an autoregressive moving-average model. The treatment is robust to innovation and additive outliers and identifies such outliers. It enforces stationarity on the autoregressive parameters and invertibility on the moving-average parameters, and takes account of uncertainty about the correct model by averaging the parameter estimates and forecasts of future observations over the set of permissible models. Posterior moments and densities of unknown parameters and observations are obtained by Markov chain Monte Carlo in O(n) operations, where n is the sample size. The methodology is illustrated by applying it to a data set previously analyzed by Martin, Samarov and Vandaele (Robust methods for ARIMA models. Applied Time Series Analysis of Economic Data, ASA-Census-NBER Proceedings of the Conference on Applied Time Series Analysis of Economic Data (ed. A. Zellner), 1983, pp. 153-69) and to a simulated example.

Journal ArticleDOI
TL;DR: In this article, the finite sample breakdown points of the conventional Baarda-, Pope- and t-testing outlier detection procedures, which are based on least-squares estimation (LSE), are studied for coordinate transformation with simulated data from the viewpoint of robust statistics.
Abstract: The conventional iterative outlier detection procedures (CIODP), such as the Baarda-, Pope-, or t-testing procedure, based on the least-squares estimation (LSE) are used to detect the outliers in geodesy. Since the finite sample breakdown point (FSBP) of LSE is about 1/n, the FSBPs of the CIODP are also expected to be the same, about 1/n. In this paper, this problem is studied in view of the robust statistics for coordinate transformation with simulated data. Outliers have been examined in two groups: “random” and “jointly influential.” Random outliers are divided again into two subgroups: “random scattered” and “adjacent.” The single point displacements can be thought of as jointly influential outliers. These are modeled as the shifts along either the x- and y-axis or parallel to any given direction. In addition, each group is divided into two subgroups according to the magnitude of outliers: “small” and “large.” The FSBPs of either the Baarda-, Pope-, or t-testing procedure are the same and about 1/n. I...

Journal ArticleDOI
TL;DR: Rocke and Woodruff as discussed by the authors described an overall strategy for robust estimation of multivariate location and shape, and the consequent identification of outliers and leverage points, which can enable reliable, fast, robust estimation with heavily contaminated multivariate data in high (>20) dimension.

Book ChapterDOI
TL;DR: This chapter describes the behavior of an outlier identification procedure in the “average” and concludes that one-step and inward and outward testing procedures using robust location and scale estimators show a better behavior than their “classical” competitors.
Abstract: Publisher Summary This chapter discusses the general principle of outlier generating models. It describes four main types of outlier identification rules, namely block procedures, inward testing methods, outward testing methods, and one-step identification procedures. The chapter describes the behavior of an outlier identification procedure in the “average”. It focuses on the average proportion of correctly detected outliers. The results for labeled, Ferguson type and α outlier models are highlighted in the chapter. Summarizing such results, also based on further simulations with other outlier generating models, and in addition taking into account the findings in Davies and Gather, a conclusion is drawn that states that one-step and inward and outward testing procedures using robust location and scale estimators show a better behavior than their “classical” competitors.
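
As a concrete instance of the "one-step" identification rules favoured in the conclusion, one can flag every observation whose distance from the sample median exceeds a multiple of the MAD-based scale; the cutoff c = 3 below is a common illustrative choice, not one taken from the chapter.

```python
# One-step outlier identification with robust location (median) and scale (MAD).
import numpy as np

def one_step_identify(x, c=3.0):
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    scale = 1.4826 * np.median(np.abs(x - med))  # MAD, scaled for a Gaussian core
    return np.abs(x - med) > c * scale

print(one_step_identify([10.2, 9.9, 10.4, 10.1, 9.8, 10.0, 17.5]))
# only the last observation is flagged
```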

Journal Article
TL;DR: This study uses Monte Carlo simulation to determine, in the absence of case mix variation, if random variation (noise) could obscure the signal of differences in underlying rates of quality of care problems.
Abstract: This study examines the relationship between outlier status based on adjusted mortality rates and theoretical underlying quality of care in hospitals. We use Monte Carlo simulation to determine, in the absence of case mix variation, if random variation (noise) could obscure the signal of differences in underlying rates of quality of care problems. Classification of hospitals as "outliers" is compared with "true" hospital quality, based on underlying rates for quality of care problems in mortality cases. Predictive error rates with respect to "quality" for both "outlier" and "non-outlier" hospitals are substantial under a variety of patient load and cutoff point choices for determining outlier status. Using overall death rates as an indicator of underlying quality of care problems may lead to substantial predictive error rates, even when adjustment for case mix is excellent. Outlier status should only be used as a screening tool and not as the information provided to the public to make informed choices about hospitals.
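
A stripped-down version of such a Monte Carlo experiment, with invented mortality rates, patient loads and cutoff, already shows how binomial noise alone blurs the distinction between "outlier" and "non-outlier" hospitals even when case mix plays no role.

```python
# Monte Carlo sketch: binomial noise vs. underlying quality, no case mix at all.
import numpy as np

rng = np.random.default_rng(3)
n_hospitals, patients = 500, 200
poor = rng.random(n_hospitals) < 0.2            # 20% have true quality problems
mortality = np.where(poor, 0.08, 0.05)          # underlying death rates (invented)
deaths = rng.binomial(patients, mortality)
flagged = deaths / patients > 0.07              # crude outlier cutoff

sensitivity = flagged[poor].mean()              # poor hospitals actually flagged
ppv = poor[flagged].mean()                      # flagged hospitals actually poor
print(f"sensitivity = {sensitivity:.2f}, positive predictive value = {ppv:.2f}")
```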

Journal ArticleDOI
TL;DR: In this article, the authors developed Bayesian estimators for the location parameter of a family of symmetric location-scale distributions, which represents a very wide class ranging from Cauchy to normal distributions.
Abstract: Motivated by the attractive features of robust priors and the MML estimators, we develop Bayesian estimators for the location parameter of a family which represents a very wide class of symmetric location-scale distributions ranging from Cauchy to normal distributions. We show that the new estimators are clearly superior to those obtained earlier by other authors. The proposed method can also be extended to asymmetric location-scale distributions. That will form Part II of this work.

Journal ArticleDOI
TL;DR: A simple modified likelihood ratio test is proposed that overcomes the difficulties in the current problem of testing an outlier from a multivariate mixture distribution of several populations.
Abstract: The problem of testing an outlier from a multivariate mixture distribution of several populations has many important applications in practice. One particular example is in monitoring worldwide nuclear testing, where we wish to detect whether an observed seismic event is possibly a nuclear explosion (an outlier) by comparing it with the training samples from mining blasts and earthquakes. The combined population of seismic events from mining blasts and earthquakes can be viewed as a mixture distribution. The classical likelihood ratio test appears to not be applicable in our problem, and in spite of the importance of this problem, little progress has been made in the literature. This article proposes a simple modified likelihood ratio test that overcomes the difficulties in the current problem. Bootstrap techniques are used to approximate the distribution of the test statistic. The advantages of the new test are demonstrated via simulation studies. Some new computational findings are also reported.

Journal ArticleDOI
01 Oct 1997-Tellus A
TL;DR: In this article, it is shown analytically that the choice of weighting metric, used in defining the anomaly correlation between spatial maps, can change the resulting probability distribution of the correlation coefficient.
Abstract: The skill in predicting spatially varying weather/climate maps depends on the definition of the measure of similarity between the maps. Under the justifiable approximation that the anomaly maps are distributed multinormally, it is shown analytically that the choice of weighting metric, used in defining the anomaly correlation between spatial maps, can change the resulting probability distribution of the correlation coefficient. The estimate of the numbers of degrees of freedom based on the variance of the correlation distribution can vary from unity up to the number of grid points depending on the choice of weighting metric. The (pseudo-) inverse of the sample covariance matrix acts as a special choice for the metric in that it gives a correlation distribution which has minimal kurtosis and maximum dimension. Minimal kurtosis suggests that the average predictive skill might be improved due to the rarer occurrence of troublesome outlier patterns far from the mean state. Maximum dimension has a disadvantage for analogue prediction schemes in that it gives the minimum number of analogue states. This metric also has an advantage in that it allows one to powerfully test the null hypothesis of multinormality by examining the second and third moments of the correlation coefficient which were introduced by Mardia as invariant measures of multivariate kurtosis and skewness. For these reasons, it is suggested that this metric could be usefully employed in the prediction of weather/climate and in fingerprinting anthropogenic climate change. The ideas are illustrated using the bivariate example of the observed monthly mean sea-level pressures at Darwin and Tahiti from 1866–1995. DOI: 10.1034/j.1600-0870.1997.t01-4-00001.x
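
In notation not taken from the paper, the metric-dependent anomaly correlation between two anomaly maps $\mathbf{x}$ and $\mathbf{y}$ can be written as

\[ r_W(\mathbf{x},\mathbf{y}) \;=\; \frac{\mathbf{x}^{\mathsf T} W \,\mathbf{y}}{\sqrt{(\mathbf{x}^{\mathsf T} W \mathbf{x})\,(\mathbf{y}^{\mathsf T} W \mathbf{y})}}, \]

where $W$ is the chosen weighting metric: $W = I$ recovers the usual unweighted anomaly correlation, while taking $W$ to be the (pseudo-)inverse of the sample covariance matrix gives the special choice singled out in the abstract as having minimal kurtosis and maximum dimension.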

Journal ArticleDOI
TL;DR: In this article, the problem of bad data identification is investigated both as a post estimation problem and as an outlier rejection problem when using the least absolute value estimation method, and modifications to the existing bad data processing methods are proposed in order to account for the current magnitude measurements.
Abstract: Earlier papers have shown that the use of power system line current magnitude measurements may lead to nonuniquely observable systems. This paper studies the bad data identification problem under these conditions. The definition of measurement criticality is revised in order to account for the nonuniquely observable cases. The problem of bad data identification is investigated both as a post estimation problem when using the least squares estimation method and as an outlier rejection problem when using the least absolute value estimation method. Modifications to the existing bad data processing methods are proposed in order to account for the current magnitude measurements.

Book ChapterDOI
01 Jan 1997
TL;DR: This chapter traces the development of the breakdown value, heuristically the largest percentage of ill-fitting data that a method can cope with, from Hampel's general asymptotic definition to the finite-sample version advocated by Donoho and Huber.
Abstract: Following seminal papers by Box (1953) and Tukey (1960), which demonstrated the need for robust statistical procedures, the theory of robust statistics blossomed in the 1960s and 1970s. An early milestone was Huber’s (1964) paper which introduced univariate M-estimators and the minimax asymptotic variance criterion. Hampel (1974) proposed the influence function of an estimator as a way to describe the effect of a single outlier. In order to measure the effect of several outliers, he introduced the breakdown value (Hampel, 1971) in a general and asymptotic setting. Donoho and Huber (1983) advocated a finite-sample version of the breakdown value, in line with Hodges’s (1967) study in the univariate framework. Heuristically, the breakdown point is the largest percentage of ill-fitting data that a method can cope with. For a formal definition, see equation (2.1) of the reprinted Rousseeuw (1984).
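
For reference, one standard way to write the finite-sample (replacement) breakdown value of an estimator $T$ at a sample $X$ of $n$ points, in the spirit of the Donoho and Huber version mentioned above, is

\[ \varepsilon_n^{*}(T, X) \;=\; \min\left\{ \frac{m}{n} \;:\; \sup_{X_m} \lVert T(X_m) - T(X) \rVert = \infty \right\}, \]

where the supremum runs over all samples $X_m$ obtained from $X$ by replacing $m$ of its points with arbitrary values; the formal definition used in the chapter itself is the one in equation (2.1) of the reprinted Rousseeuw (1984).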

Proceedings ArticleDOI
17 Jun 1997
TL;DR: A method for calculating optic flow, using robust statistics, is developed that generally out-performs all competing methods in terms of accuracy and is applicable in a wide range of other computer vision problems.
Abstract: A method for calculating optic flow, using robust statistics, is developed. The method generally out-performs all competing methods in terms of accuracy. One of the key features in the success of this method, is that we use least median of squares, which is known to be robust to outliers. The computational cost is kept very low by using an approximate solution to the least median of squares only in a first stage that detects outliers. The essential ingredients of our method should be applicable in a wide range of other computer vision problems.
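
The least-median-of-squares criterion credited here is easy to sketch by random sampling; the toy line-fitting example below (trial count, noise level and 30% contamination all invented) illustrates the criterion only, not the paper's optic-flow algorithm.

```python
# Random-sampling sketch of least-median-of-squares (LMedS) line fitting:
# fit many 2-point candidate lines and keep the one minimising the median
# squared residual.
import numpy as np

def lmeds_line(x, y, n_trials=500, seed=0):
    rng = np.random.default_rng(seed)
    best, best_med = (0.0, 0.0), np.inf
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue                                  # degenerate sample
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        med = np.median((y - (slope * x + intercept)) ** 2)
        if med < best_med:
            best, best_med = (slope, intercept), med
    return best

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=100)
y[:30] += rng.uniform(5, 15, size=30)                 # 30% gross outliers
print(lmeds_line(x, y))                               # close to (2.0, 1.0)
```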

Journal ArticleDOI
TL;DR: This work considers a 'forward' procedure in which very robust methods are used to select a small, outlier-free subset of the data, which is increased in size using a search which avoids the inclusion of outliers.
Abstract: Outliers can have a large influence on the model fitted to data. The models we consider are the transformation of data to approximate normality and also discriminant analysis, perhaps on transformed observations. If there are only one or a few outliers, they may often be detected by the deletion methods associated with regression diagnostics. These can be thought of as 'backwards' methods, as they start from a model fitted to all the data. However such methods become cumbersome, and may fail, in the presence of multiple outliers. We instead consider a 'forward' procedure in which very robust methods, such as least median of squares, are used to select a small, outlier free, subset of the data. This subset is increased in size using a search which avoids the inclusion of outliers. During the forward search we monitor quantities of interest, such as score statistics for transformation or, in discriminant analysis, misclassification probabilities. Examples demonstrate how the method very clearly reveals structure in the data and finds influential observations, which appear towards the end of the search. In our examples these influential observations can readily be related to patterns in the original data, perhaps after transformation.