Comprehensive definitions of breakdown points for independent and dependent observations

Abstract
Summary. We provide a new definition of breakdown in finite samples, with an extension to asymptotic breakdown. Previous definitions centre on defining a critical region for either the parameter or the objective function. If for a particular outlier configuration the critical region is entered, breakdown is said to occur. In contrast with the traditional approach, we leave the definition of the critical region implicit. Our proposal encompasses previous definitions of breakdown in linear and non-linear regression settings. In some cases, it leads to a different and more intuitive notion of breakdown than other procedures that are available. An important advantage of our new definition is that it also applies to models for dependent observations where current definitions of breakdown typically fail. We illustrate our suggestion by using examples from linear and non-linear regression, and time series.


TI 2000-40/2
Tinbergen Institute Discussion Paper
Comprehensive Definitions of
Breakdown-Points for
Independent and Dependent
Observations
Marc G. Genton
André Lucas

Tinbergen Institute
The Tinbergen Institute is the institute for economic research of the
Erasmus Universiteit Rotterdam, Universiteit van Amsterdam and
Vrije Universiteit Amsterdam.
Tinbergen Institute Amsterdam
Keizersgracht 482
1017 EG Amsterdam
The Netherlands
Tel.: +31.(0)20.5513500
Fax: +31.(0)20.5513555
Tinbergen Institute Rotterdam
Burg. Oudlaan 50
3062 PA Rotterdam
The Netherlands
Tel.: +31.(0)10.4088900
Fax: +31.(0)10.4089031
Most TI discussion papers can be downloaded at
http://www.tinbergen.nl

Comprehensive Denitions of
Breakdown-Points for
Indep endent and Dep endent Observations
Marc G. Genton and Andre Lucas
May 3, 2000
Abstract
We provide a new denition of breakdown in nite samples with
an extension to asymptotic breakdown. Previous denitions center
around dening a critical region for either the parameter or the ob-
jective function. If for a particular outlier constellation the critical
region is entered, breakdown is said to o ccur. In contrast to the tradi-
tional approach, we leave the denition of the critical region implicit.
Our denition encompasses all previous denitions of breakdown in
b oth linear and non-linear regression settings. In some cases, it leads
to a dierent notion of breakdown than other pro cedures available.
An advantage is that our new denition also applies to mo dels for
dep endent observations (time-series, spatial statistics) where current
breakdown denitions typically fail. We illustrate our p oints using
examples from linear and non-linear regression as well as time-series
and spatial statistics.
Key words:
Bias curve; Linear regression; Non-linear regression;
Outliers; Spatial statistics; Statistical robustness; Time series.
Marc G. Genton is Lecturer, Department of Mathematics, 2-390, Massachusetts
Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139-4307,
genton@math.mit.edu. André Lucas is Associate Professor, Department of Finance,
ECO/FIN, Vrije Universiteit, De Boelelaan 1105, 1081HV Amsterdam, the Netherlands,
alucas@econ.vu.nl. André Lucas thanks the Dutch Organization for Scientific Research
(N.W.O.) for financial support.

1 Introduction
Research on qualitative robustness, and especially on the definition of
breakdown, has made considerable progress over the last three decades. Hampel
(1971) defined breakdown as the fraction of contamination (or outliers) that
suffices to drive the estimator beyond all bounds. Since the original introduction
of the concepts of breakdown and the breakdown-point by Hampel (1971),
the breakdown-point has been extended to finite samples (Donoho and Huber,
1983), bounded parameter spaces, dependent observations (Martin and
De Jong, 1977; Martin, 1980), test statistics (He et al., 1990; He, 1991),
and non-linear regression models (Stromberg and Ruppert, 1992; Sakata and
White, 1995, 1998). In particular, Stromberg and Ruppert (1992) and Sakata
and White (1995) convincingly argue that the bias in the parameter estimates
is not always a good criterion to assess breakdown of an estimator.
Instead, Stromberg and Ruppert propose to consider the fraction of contamination
that drives at least one of the fitted values to its supremum or
infimum. Sakata and White argue that the fitted value may sometimes not
be a satisfactory criterion either, and therefore propose several alternative
criterion functions to assess breakdown.
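To make the finite-sample notion of breakdown concrete, the following minimal sketch replaces m of n observations by a gross outlier and reports the smallest fraction m/n at which an estimate leaves a large bound. It is an illustration only: the helper name `smallest_breaking_fraction`, the outlier size `huge` and the threshold `bound` are assumptions made here as numerical stand-ins for "beyond all bounds", and only outliers sent to plus infinity are tried.

```python
import numpy as np


def smallest_breaking_fraction(estimator, sample, huge=1e12, bound=1e6):
    """Smallest m/n for which replacing m points by `huge` pushes the
    estimate beyond `bound` (hypothetical stand-in for divergence)."""
    n = len(sample)
    for m in range(1, n + 1):
        contaminated = np.array(sample, dtype=float)
        contaminated[:m] = huge  # replace m observations by gross outliers
        if abs(estimator(contaminated)) > bound:
            return m / n
    return 1.0


rng = np.random.default_rng(0)
clean = rng.normal(size=11)  # 11 well-behaved observations

mean_bp = smallest_breaking_fraction(np.mean, clean)
median_bp = smallest_breaking_fraction(np.median, clean)
print(f"mean breaks at m/n = {mean_bp:.3f}")
print(f"median breaks at m/n = {median_bp:.3f}")
```

With n = 11 the sample mean is driven away by a single replaced observation, whereas the median only follows the outliers once they form a majority of the sample.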
Though these alternative denitions cover a wide range of mo dels and
estimators, one can easily construct examples that are not covered by the
available denitions. A very simple example is given by the autoregressive
time-series mo del of order 1,
Y
i
=
Y
i
1
+
e
i
;
(1)
with
2
(
1
;
1) and
e
i
an i.i.d. innovation. Supp ose
Y
i
is observed with error
as
~
Y
i
=
Y
i
+
Z
i
, where
Z
i
=
when
i
=
i
0
for a single
i
0
2f
1
;:::;n
1
g
, and
Z
i
=0 otherwise. Then the OLS estimator of
based on the contaminated
sample
~
Y
1
;:::;
~
Y
n
, is given by
^
=
P
n
i
=2
~
Y
i
~
Y
i
1
P
n
i
=2
~
Y
2
i
1
=
(
Y
i
0
1
+
Y
i
0
+1
)+
P
n
i
=2
Y
i
Y
i
1
2
+2
Y
i
0
+
P
n
i
=2
Y
2
i
1
:
(2)
Clearly, as
!1
,
^
!
0. So the OLS estimator in this simple time-series
mo del breaks with one outlier to zero, which is at the center of the parameter
space. This form of breakdown typically rules out the classical denition of
Hamp el, b ecause the estimator does not diverge. Moreover, it also violates
the straightforward extension of Hamp el's denition to compact parameter
spaces. In that denition, breakdown o ccurs if the estimator is pushed to the
edge of the parameter space. Here, however, the estimator do es not go to the
edge, but rather to the center of the parameter space. Also note that this
simple example do es not t the more recent denitions of breakdown either.
In particular, following the denition of He and Simpson (1992, 1993), break-
down o ccurs if the supremum bias is reached. This, however, need not b e the
case if
is negative or p ositive, in which case the sup bias is reached upon
breakdown to plus one or minus one instead of zero, respectively. Alterna-
tively, Stromb erg and Rupp ert and also Sakata and White dene breakdown
as the p oint where the mo del's t (
^
Y
i
1
) or some other criterion function
tends to either its supremum or its inmum for some observation in the sam-
ple. Clearly, this would again induce breakdown to either plus or minus one
given the restricted parameter space, and
not
breakdown to zero.
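The collapse of the OLS estimator in the autoregressive example can be checked numerically. The sketch below is only an illustration under assumed settings: the coefficient value 0.8, the sample size, the seed, the outlier position and the names `theta`, `zeta`, `ols_ar1` and `closed_form` are chosen here and are not taken from the paper. It compares the direct OLS estimate on the contaminated series with the right-hand side of equation (2) for increasing outlier sizes.

```python
import numpy as np


def simulate_ar1(theta, n, rng):
    """Simulate Y_i = theta * Y_{i-1} + e_i with standard normal innovations."""
    y = np.zeros(n)
    for i in range(1, n):
        y[i] = theta * y[i - 1] + rng.normal()
    return y


def ols_ar1(y):
    """OLS estimate of the AR(1) coefficient (no intercept)."""
    return np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)


def closed_form(y, i0, zeta):
    """Right-hand side of equation (2) for one additive outlier at index i0."""
    num = zeta * (y[i0 - 1] + y[i0 + 1]) + np.sum(y[1:] * y[:-1])
    den = zeta ** 2 + 2 * zeta * y[i0] + np.sum(y[:-1] ** 2)
    return num / den


rng = np.random.default_rng(1)
y = simulate_ar1(theta=0.8, n=200, rng=rng)
i0 = 100  # interior position (0-based) of the single additive outlier

estimates = {}
for zeta in [0.0, 1e1, 1e2, 1e4, 1e6]:
    y_tilde = y.copy()
    y_tilde[i0] += zeta  # contaminate one observation only
    estimates[zeta] = ols_ar1(y_tilde)
    # The direct estimate and formula (2) should coincide.
    assert np.isclose(estimates[zeta], closed_form(y, i0, zeta))
    print(f"zeta = {zeta:>9.0e}   theta_hat = {estimates[zeta]: .6f}")
```

As the outlier size grows, the printed estimates move from a value near the true coefficient towards zero, the center of the interval (-1, 1), rather than towards either edge.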
Given the drawbacks of the previous definitions available, we introduce
a new concept of breakdown. All previous definitions make explicit use of
a criterion function combined with a critical region. For example, Hampel's
original definition uses the absolute bias as the criterion function and infinity
as the critical region. If the criterion function enters the critical region
for a certain fraction of outliers/contamination, breakdown is said to have
occurred. Following Sakata and White (1995), we consider a specific model
badness measure as our criterion function. This encompasses the definitions
of Hampel (badness is bias) as well as Stromberg and Ruppert (badness is
model fit). In contrast to previous work, however, we leave the definition of
the critical region implicit. In particular, we look for the fraction of contamination
such that the set of possible badness values under extreme outlier
configurations does not expand any more if additional outliers are added. In
this way, we are able to accommodate most of the earlier definitions of breakdown.
In addition, we also cover situations of breakdown that are not covered
by the earlier definitions. We illustrate the main issues with examples from
linear and non-linear regression as well as time-series and spatial statistics.
In some cases, our denition of breakdown gives a dierent breakdown
p oint than available denitions. We provide a typical example in the non-
linear regression context, confronting our breakdown p oint with that of Stromb erg
and Rupp ert. The new notion of breakdown checks whether the non-contaminated
sample information still has some inuence on the estimator. If this is no
longer the case, the estimator is said to have broken down. This may happ en
even in case the mo del's t over a pre-specied domain of interest remains
b ounded.
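The idea that the uncontaminated sample loses its influence on the estimator can be illustrated with the autoregressive example of equation (1). The following sketch is a hedged illustration, not the paper's formal definition: the two coefficient values, the seeds, the outlier size `zeta` and the function names are assumptions made here. Two clean series with very different dynamics are each hit by one extreme additive outlier, and the OLS estimates before and after contamination are compared.

```python
import numpy as np


def ar1_path(theta, n, seed):
    """Clean AR(1) series with standard normal innovations (illustrative)."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n)
    for i in range(1, n):
        y[i] = theta * y[i - 1] + rng.normal()
    return y


def ols_ar1(y):
    """OLS estimate of the AR(1) coefficient (no intercept)."""
    return np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)


zeta = 1e8  # size of the single extreme outlier (assumed value)
results = []
for theta, seed in [(0.8, 10), (-0.5, 11)]:
    y = ar1_path(theta, n=300, seed=seed)
    clean_estimate = ols_ar1(y)
    y_tilde = y.copy()
    y_tilde[150] += zeta  # one additive outlier in the interior
    results.append((clean_estimate, ols_ar1(y_tilde)))

for clean_estimate, contaminated_estimate in results:
    print(f"clean: {clean_estimate: .4f}   contaminated: {contaminated_estimate: .2e}")
```

The clean estimates differ markedly between the two series, while the contaminated estimates are both numerically indistinguishable from zero. In this sense the information in the clean observations no longer affects the outcome, although the estimate itself stays bounded.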
The remainder of the paper is set up as follows. In Section 2 we introduce
the basic notation and our new definition of breakdown for finite
samples. The definition is related to alternative ones in Section 3. Some
illustrative examples are given in Section 4. Section 5 extends the definition of
the breakdown-point to the asymptotic case and provides some illustrations.
Section 6 concludes.

References
Book

Robust Regression and Outlier Detection

TL;DR: This paper presents the results of a two-year study of the statistical treatment of outliers in the context of one-Dimensional Location and its applications to discrete-time reinforcement learning.
Book

Robust statistics: the approach based on influence functions

TL;DR: This paper presents a meta-modelling framework for estimating the values of Covariance Matrices and Multivariate Location using one-Dimensional and Multidimensional Estimators.
Journal ArticleDOI

A General Qualitative Definition of Robustness

TL;DR: In this paper, two very closely related definitions of robustness of a sequence of estimators are given which take into account the types of deviations from parametric models that occur in practice.
Journal ArticleDOI

High breakdown point conditional dispersion estimation with application to s&p 500 daily returns volatility

Shinichi Sakata, +1 more
- 01 May 1998 - 
TL;DR: The authors showed that quasi-maximum likelihood (QML) estimators for conditional dispersion models can be severely affected by a small number of outliers such as market crashes and rallies.
Journal ArticleDOI

Highly Robust Estimation of the Autocovariance Function

TL;DR: Genton et al. as discussed by the authors proposed a new autocovariance estimator, based on a highly robust estimator of scale, and its robustness properties are studied by means of the influence function, and a new concept of temporal breakdown point.
Frequently Asked Questions (9)
Q1. What have the authors contributed in "Comprehensive definitions of breakdown-points for independent and dependent observations" ?

The authors provide a new definition of breakdown in finite samples with an extension to asymptotic breakdown.

The main point of Stromberg and Ruppert (1992) to discuss this model is that if outliers are such that the estimator for K diverges while that for remains constant, the estimator is broken in the Donoho-Huber sense. 

For a simulated data set, the authors contaminate the 3 observations most to the right by moving them in parallel to the ray X. Using (16), the authors are looking for a (or ) such that the squared vertical discrepancies between the observations and the pictured line segments are minimal. 

If the authors take the badness function to be the bias, the only way to get a constant boundary set is to let the estimator diverge to plus or minus infinity.

Using their definition of breakdown, it is clear that the breakdown-point of the (highly robust) LMS estimator in a time-series context is far below 0.5, and even far below 0.5/(p + 1), with p the order of the autoregression.

De nition 1 The breakdown-point of the estimator ̂ of is given by" lim !0 minm 1nlim !1 Rn( Y n ;Z n;m) \\lim !1 Rn( Y n ;Z n;m+1) 6= ; 8 Y n :The de nition looks for the smallest fraction of extreme outliers for which the boundary of the set of possible badness values does not expand any morein all directions if an additional outlier is added to the sample. 

So with m outliers, the boundary badness set for extreme outliers and given X 2 [0; 3] is given by either f X; ̂mXg or f Xg, where ̂m can still vary for increasing m. 

Now consider a highly robust variogram estimator ̂HR(h; Yn) = S 2(Yi+h Yi), (e.g. Genton, 1998a), where S2 is a highly robust estimator or the variance of the process Yi+h Yi. Typically, S2 has breakdown-point b(n h)=2 1c=(n h), where b c denotes the integer part. 

The breakdown-point "(̂; Y ; Z ) of the estimator ̂ at the (uncontaminated) process Y for the set of allowable outlier con gurations Z , isgiven by"(̂; Y ; Z ) = inf9 > 0 : lim !1 R( Y ;Z ) \\lim !1 R( Y ;Z + )