
Statistics, Vol. 39, No. 6, December 2005, 503–518
Kernel density estimation for heavy-tailed distributions using the Champernowne transformation
TINE BUCH-LARSEN†, JENS PERCH NIELSEN‡, MONTSERRAT GUILLÉN*§ and CATALINA BOLANCÉ¶
†Department of Research, Codan, 60 Gammel Kongevej, DK-1790 Copenhagen V, Denmark
‡Royal & SunAlliance, 60 Gammel Kongevej, DK-1790 Copenhagen V, Denmark
§Department of Econometrics, RFA-IREA, University of Barcelona, Diagonal,
690, 08034 Barcelona, Spain
¶Department of Econometrics, RFA-IREA, University of Barcelona, 08034 Barcelona, Spain
(Received 17 January 2005; in final form 17 October 2005)
When estimating loss distributions in insurance, large and small losses are usually split because it is
difficult to find a simple parametric model that fits all claim sizes. This approach involves determining
the threshold level between large and small losses. In this article, a unified approach to the estimation
of loss distributions is presented. We propose an estimator obtained by transforming the data set with
a modification of the Champernowne cdf and then estimating the density of the transformed data by
use of the classical kernel density estimator. We investigate the asymptotic bias and variance of the
proposed estimator. In a simulation study, the proposed method shows a good performance. We also
present two applications dealing with claims costs in insurance.
Keywords: Actuarial loss models; Transformation; Skewness; Champernowne distribution
2000 Mathematics Subject Classifications: 62G07; 62-07; 91B30
1. Introduction
In finance and non-life insurance, estimation of loss distributions is a fundamental part
of the business. In most situations, losses are small, and extreme losses are rarely observed,
but the number and the size of extreme losses can have a substantial influence on the profit of
the company. Standard statistical methodology, such as integrated error and likelihood, does
not weigh small and big losses differently in the evaluation of an estimator. These evaluation
methods do not, therefore, emphasize an important part of the error: the error in the tail.
Practitioners often decide to analyse large and small losses separately, because no single, classical parametric model fits all claim sizes. This approach leaves some important challenges: choosing the appropriate parametric model, identifying the best way of estimating the parameters and determining the threshold level between large and small losses.
*Corresponding author. Email: mguillen@ub.edu
Statistics
ISSN 0233-1888 print/ISSN 1029-4910 online © 2005 Taylor & Francis
http://www.tandf.co.uk/journals
DOI: 10.1080/02331880500439782

This work presents a systematic approach to the estimation of loss distributions which is suitable for heavy-tailed situations. The proposed estimator is obtained by transforming the data set with a parametric estimator and afterwards estimating the density of the transformed data set using the classical kernel density estimator [1, 2]

\hat{f}(y) = \frac{1}{Nb} \sum_{i=1}^{N} K\left( \frac{y - Y_i}{b} \right),

where K is the kernel function, b is the bandwidth and Y_i, i = 1, ..., N, is the transformed data set. The estimator of the original density is obtained by back-transformation of \hat{f}(y).
We will call this method a semiparametric estimation procedure because a parametrized transformation family is used. We propose to use a transformation based on the little-known Champernowne cdf, because it produces good results in all the studied situations and it is straightforward to apply.
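As a concrete illustration of the classical kernel density step, here is a minimal Python sketch (not the authors' code); the Gaussian kernel, the function name and the toy normal sample are illustrative choices:

```python
import numpy as np

def kernel_density(y, data, b):
    """Classical kernel density estimator
    f_hat(y) = 1/(N*b) * sum_i K((y - Y_i)/b), here with a Gaussian kernel K."""
    u = (y - data[:, None]) / b                        # scaled distances, shape (N, len(y))
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel K(u)
    return k.sum(axis=0) / (len(data) * b)

# toy check on a standard normal sample: the estimate behaves like a density
rng = np.random.default_rng(0)
sample = rng.normal(size=500)
grid = np.linspace(-5.0, 5.0, 1001)
dens = kernel_density(grid, sample, b=0.3)
mass = dens.sum() * (grid[1] - grid[0])                # Riemann-sum approximation of its integral
```

Any symmetric density can play the role of K; the bandwidth b governs the usual bias-variance trade-off.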
The semiparametric estimator with shifted power transformation was introduced by Wand
et al. [3] in 1991. They showed that the classical kernel density estimator was improved
substantially by applying a transformation and suggested the shifted power transformation
family. Bolancé et al. [4] improved the shifted power transformation for highly skewed data
by proposing an alternative parameter selection algorithm. The semiparametric estimator with
the Johnson family transformation function was studied by Yang and Marron [5]. Hjort and
Glad [6] advocated a semiparametric estimator with a parametric start, which is closely related
to the bias reduction method described by Jones et al. [7]. The Möbius-like transformation was introduced by Clements et al. [8]. In contrast to the shifted power transformation, which transforms (0, ∞) into (−∞, ∞), the Möbius-like transformation transforms (0, ∞) into (−1, 1) and the parameter estimation method is designed to avoid boundary problems. Scaillet [9] has recently studied non-parametric estimators for probability density functions which have support on the non-negative real line using alternative kernels.
The original Champernowne distribution has density [10]

f(x) = \frac{c}{x \left( \tfrac{1}{2}(x/M)^{-\alpha} + \lambda + \tfrac{1}{2}(x/M)^{\alpha} \right)}, \quad x \ge 0, \qquad (1)

where c is a normalizing constant and α, λ and M are parameters. The distribution was first mentioned in 1936 by D. G. Champernowne, when he spoke on 'The Theory of Income Distribution' at the Oxford Meeting of the Econometric Society [11, 12]. Later, he gave more details about the distribution [13] and its application to economics. When λ equals 1 and the normalizing constant c equals 1/2, the density of the original distribution is simply called the Champernowne distribution

f(x) = \frac{\alpha M^{\alpha} x^{\alpha - 1}}{\left( x^{\alpha} + M^{\alpha} \right)^{2}}

with cdf

F(x) = \frac{x^{\alpha}}{x^{\alpha} + M^{\alpha}}. \qquad (2)

The Champernowne distribution converges to a Pareto distribution in the tail, while looking more like a lognormal distribution near 0 when α > 1. Its density is either 0 or infinity at 0 (unless α = 1).
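Because the cdf (2) is available in closed form, its two defining features — F(M) = 0.5 for every α, and the Pareto-like tail — are easy to check numerically. A small sketch (the function name and parameter values are my own illustrations):

```python
import numpy as np

def champernowne_cdf(x, alpha, M):
    """Champernowne cdf of eq. (2): F(x) = x^alpha / (x^alpha + M^alpha), x >= 0."""
    xa = np.power(x, alpha)
    return xa / (xa + M ** alpha)

# F(M) = 0.5 for every alpha, so M is always the median of the distribution
vals = [champernowne_cdf(3.0, a, 3.0) for a in (0.5, 1.0, 2.0)]

# Pareto-like tail: 1 - F(x) is approximately (M/x)^alpha for large x
x_big = 1e6
tail = 1.0 - champernowne_cdf(x_big, 2.0, 3.0)
```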
In the transformation kernel density estimation method, if we transform the data with the Champernowne cdf, the inflexible shape near 0 results in boundary problems. We argue that a modification of the Champernowne distribution with an additional parameter can solve this inconvenience.

We did not choose to work with classical extensions of the Pareto distribution such as the generalized Pareto distribution (GPD) [14]. The reason is that the GPD often estimates distributions of infinite support to have finite support and hence it cannot be used as a transformation. We carried out a small simulation study of a standard lognormal distribution; more than half the time the GPD suggested a distribution with finite support. Furthermore, the GPD needs a (hard to pick) threshold from where the distribution starts, so the transformation methodology also meets problems at the beginning of the distribution.
In this paper, we study the transformation kernel density estimation method. The conclusion of the simulation study is that the new approach based on the modified Champernowne distribution is the preferable method, because it is the only estimator which has a good performance in most of the investigated situations. Section 2 describes the transformation family and explains the parameter estimation procedure. Section 3 presents the semiparametric kernel density estimator and its properties. In section 4, the simulation study is presented and section 5 shows two applications. Finally, section 6 outlines the main conclusions.
2. The modified Champernowne distribution function
We generalize the Champernowne distribution with a new parameter c. This parameter ensures
the possibility of a positive finite value of the density at 0 for all α.
DEFINITION 2.1  The modified Champernowne cdf is defined for x ≥ 0 and has the form

T_{\alpha,M,c}(x) = \frac{(x + c)^{\alpha} - c^{\alpha}}{(x + c)^{\alpha} + (M + c)^{\alpha} - 2c^{\alpha}}, \quad x \in \mathbb{R}_{+}, \qquad (3)

with parameters α > 0, M > 0 and c ≥ 0, and density

t_{\alpha,M,c}(x) = \frac{\alpha (x + c)^{\alpha - 1} \left( (M + c)^{\alpha} - c^{\alpha} \right)}{\left( (x + c)^{\alpha} + (M + c)^{\alpha} - 2c^{\alpha} \right)^{2}}, \quad x \in \mathbb{R}_{+}.
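Definition 2.1 translates directly into code. The sketch below (my own naming, with arbitrary parameter values) implements T and t and checks that T(0) = 0, that T(M) = 0.5, and that t is indeed the derivative of T:

```python
import numpy as np

def T(x, alpha, M, c):
    """Modified Champernowne cdf T_{alpha,M,c}(x) of eq. (3)."""
    ca = c ** alpha
    return ((x + c) ** alpha - ca) / ((x + c) ** alpha + (M + c) ** alpha - 2.0 * ca)

def t(x, alpha, M, c):
    """Its density t_{alpha,M,c}(x), the derivative of T."""
    ca = c ** alpha
    num = alpha * (x + c) ** (alpha - 1.0) * ((M + c) ** alpha - ca)
    den = ((x + c) ** alpha + (M + c) ** alpha - 2.0 * ca) ** 2
    return num / den

alpha, M, c = 1.5, 3.0, 2.0
h = 1e-6
num_deriv = (T(1.0 + h, alpha, M, c) - T(1.0 - h, alpha, M, c)) / (2.0 * h)
t_at_zero = t(0.0, alpha, M, c)   # finite and positive because c > 0
```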
Corresponding to the Champernowne distribution, the modified Champernowne distribution converges to a Pareto distribution in the tail:

t_{\alpha,M,c}(x) \sim \alpha \left( (M + c)^{\alpha} - c^{\alpha} \right) \frac{1}{x^{\alpha + 1}} \quad \text{as } x \longrightarrow \infty.
The effect of the additional parameter c is different for α > 1 and for α < 1. The parameter c has some 'scale parameter properties': when α < 1, the derivative of the cdf becomes larger for increasing c, and conversely, when α > 1, the derivative of the cdf becomes smaller for increasing c. When α ≠ 1, the choice of c affects the density in three ways. First, c changes the density in the tail. When α < 1, positive values of c result in lighter tails, and the opposite when α > 1. Secondly, c changes the density at 0. A positive c provides a positive finite density at 0:

0 < t_{\alpha,M,c}(0) = \frac{\alpha c^{\alpha - 1}}{(M + c)^{\alpha} - c^{\alpha}} < \infty \quad \text{when } c > 0.

Thirdly, c moves the mode. When α > 1, the density has a mode, and positive values of c shift the mode to the left. We therefore see that the parameter c also has a shift parameter effect. When α = 1, the choice of c has no effect.
Figure 1. Different shapes of the modified Champernowne distribution with different choices of α, as well as the effect of the parameter c. In all plots, c = 0 (dashed line) and c = 2 (solid line).

Figure 1 illustrates the role of c: the two graphs on the top show the cdfs and the densities for the modified Champernowne distribution for fixed α < 1 and M = 3. In the cdf plot, we see that increasing c results in lower values of the cdf in the interval [0, M) and higher values of the cdf in the interval [M, ∞). In the density plot, we see that increasing c results in a lighter tail and a finite density at 0. In the two graphs in the middle, we have fixed α = 1 and M = 3. We see that changing c has no effect. The two graphs at the bottom illustrate the effect of increasing c when α > 1, for M = 3. Notice that the values of the cdf become higher in the interval [0, M) and lower in the interval [M, ∞). The density plot shows that positive values of c move the mode to the left and produce a heavier tail.
From a computational point of view, it is simpler to estimate M and then proceed to the
other parameters.
In the Champernowne distribution, we notice that T_{\alpha,M,0}(M) = 0.5. The same holds for the modified Champernowne distribution: T_{\alpha,M,c}(M) = 0.5. This suggests that M can be estimated
as the empirical median of the data set. The empirical median is a robust estimator, especially
for heavy-tailed distributions, as shown by Lehmann [15]. He studied the properties of the
median and the mean as an estimator of location for the normal distribution and the Cauchy distribution, and showed that whereas the mean works well as an estimator of location for the
normal distribution, it works poorly for the Cauchy distribution due to its heavy tail. Tukey [16]
reached the same conclusion when he studied the efficiency of the median and the mean. He
showed that the median efficiency increases as the tail becomes heavier. Corresponding models
have also been studied for heavy-tailed distributions [17–19]. A similar type of discussion for
the variance estimation was done by Hubert [20]. As we are especially concerned about heavy
tails, we consider the robustness of the median to be important.
After parameter M has been estimated as described earlier, the next step is to estimate the pair (α, c) which maximizes the log likelihood function:

l(\alpha, c) = N \log \alpha + N \log\left( (M + c)^{\alpha} - c^{\alpha} \right) + (\alpha - 1) \sum_{i=1}^{N} \log(X_i + c) - 2 \sum_{i=1}^{N} \log\left( (X_i + c)^{\alpha} + (M + c)^{\alpha} - 2c^{\alpha} \right). \qquad (4)

For a fixed M, this likelihood function is concave and has a maximum.
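A rough implementation of the whole estimation scheme — M̂ as the empirical median, then (α, c) maximizing (4) — might look as follows. A coarse grid search stands in for a proper optimizer, and the inversion sampler and all names are my own; the synthetic data come from the ordinary Champernowne cdf (2) with α = 2, M = 3 (i.e. c = 0):

```python
import numpy as np

def loglik(alpha, c, M, data):
    """Log likelihood l(alpha, c) of eq. (4), with M held fixed."""
    N = len(data)
    ca = c ** alpha
    return (N * np.log(alpha)
            + N * np.log((M + c) ** alpha - ca)
            + (alpha - 1.0) * np.log(data + c).sum()
            - 2.0 * np.log((data + c) ** alpha + (M + c) ** alpha - 2.0 * ca).sum())

def fit(data):
    """M from the empirical median, then (alpha, c) by a coarse grid search on l."""
    M = np.median(data)
    grid = [(a, c) for a in np.linspace(0.2, 5.0, 49) for c in np.linspace(0.0, 3.0, 16)]
    alpha, c = max(grid, key=lambda p: loglik(p[0], p[1], M, data))
    return alpha, M, c

# draw from the ordinary Champernowne cdf (2) by inversion: x = M * (u/(1-u))**(1/alpha)
rng = np.random.default_rng(42)
u = rng.uniform(size=2000)
X = 3.0 * (u / (1.0 - u)) ** 0.5
alpha_hat, M_hat, c_hat = fit(X)
```

On a sample of this size, the fitted values should land near the true (α, M) = (2, 3); in practice the concavity in (α, c) noted above allows a gradient-based optimizer instead of a grid.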
3. The semiparametric transformation kernel density estimator
In this section, we will make a detailed derivation of the estimator based on the modified
Champernowne distribution, which we will call KMCE. The resulting estimator is obtained
by computing a non-parametric classical kernel density estimator for the transformed data set
and, finally, the result is back-transformed.
3.1 Transformation with the modified Champernowne distribution

Let X_i, i = 1, ..., N, be positive stochastic variables with an unknown cdf F and density f. The following describes in detail the transformation kernel density estimator of f, and figure 2 illustrates the four steps of the estimation procedure for a data set with 1000 observations generated from a Weibull distribution. The resulting transformation kernel density estimator of f based on the Champernowne distribution is denoted by KMCE.

(i) Calculate the parameters (\hat{\alpha}, \hat{M}, \hat{c}) of the modified Champernowne distribution as described in section 2 to obtain the transformation function. In the first plot in figure 2, we see the estimated transformation function and the true Weibull distribution. Notice that the modified Champernowne density has a larger mode and that the tail is too heavy.

(ii) Transform the data set X_i, i = 1, ..., N, with the transformation function T:

Y_i = T_{\hat{\alpha}, \hat{M}, \hat{c}}(X_i), \quad i = 1, ..., N.

The transformation function transforms data into the interval (0, 1), and the parameter estimation is designed to make the transformed data as close to a uniform distribution as possible. The transformed data are illustrated in the second plot in figure 2.
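Steps (i) and (ii) can be sketched as follows. The Weibull sample mirrors the setting of figure 2, but alpha_hat and c_hat are illustrative stand-ins rather than the output of the section 2 likelihood step; only M_hat uses the median rule:

```python
import numpy as np

def T(x, alpha, M, c):
    """Modified Champernowne cdf of eq. (3), used as the transformation function."""
    ca = c ** alpha
    return ((x + c) ** alpha - ca) / ((x + c) ** alpha + (M + c) ** alpha - 2.0 * ca)

rng = np.random.default_rng(7)
X = 2.0 * rng.weibull(1.5, size=1000)   # positive, claims-like sample

# step (i): M from the empirical median; alpha_hat and c_hat are illustrative stand-ins
M_hat = np.median(X)
alpha_hat, c_hat = 1.5, 0.2

# step (ii): transform the data into (0, 1); T(M_hat) = 0.5 keeps the sample median at 1/2
Y = T(X, alpha_hat, M_hat, c_hat)
```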
