
A priori ratemaking using bivariate Poisson regression models

01 Feb 2009-Insurance Mathematics & Economics (Xarxa de Referència en Economia Aplicada (XREAP))-Vol. 44, Iss: 1, pp 135-141
TL;DR: In this article, the authors examined an a priori ratemaking procedure when including two different types of claim, and the consequences for pure and loaded premiums when the independence assumption is relaxed by using a bivariate Poisson regression model are analyzed.
Abstract: In automobile insurance, it is useful to achieve a priori ratemaking by resorting to generalized linear models, and here the Poisson regression model constitutes the most widely accepted basis. However, insurance companies distinguish between claims with or without bodily injuries, or claims with full or partial liability of the insured driver. This paper examines an a priori ratemaking procedure when including two different types of claim. When assuming independence between claim types, the premium can be obtained by summing the premiums for each type of guarantee and is dependent on the rating factors chosen. If the independence assumption is relaxed, then it is unclear as to how the tariff system might be affected. In order to answer this question, bivariate Poisson regression models, suitable for paired count data exhibiting correlation, are introduced. It is shown that the usual independence assumption is unrealistic here. These models are applied to an automobile insurance claims database containing 80,994 contracts belonging to a Spanish insurance company. Finally, the consequences for pure and loaded premiums when the independence assumption is relaxed by using a bivariate Poisson regression model are analysed.

Summary (2 min read)

1 Introduction

  • Designing a tariff structure for insurance is one of the main tasks for actuaries.
  • When this assumption is relaxed, it is interesting to see how the tariff system might be affected.
  • In the next section, the model used here is defined.

2 Bivariate Poisson regression models

  • The usual methodology to obtain the a priori premium under the assumption of independence between types of claims can be described as follows.
  • This principle builds on the net premium by including a risk loading that is proportional to the variance of the risk.
  • This is the so-called trivariate reduction method that leads to the bivariate Poisson distribution.
  • Here the authors follow the zero-inflated bivariate Poisson model proposed by Karlis and Ntzoufras (2005).
  • Standard errors for the parameter estimates are calculated using standard bootstrap methods (boot package in R).

3 The database

  • The original sample comprised a ten percent sample of the automobile portfolio of a major insurance company operating in Spain in 1995.
  • Only cars categorised as being for private use were considered.
  • The sample is not representative of the actual portfolio as it was drawn from a larger panel of policyholders who had been customers of the company for at least seven years; however, it will be helpful for illustrative purposes.
  • The meaning of those variables referring to the policyholders’ coverage should also be clarified.
  • The simplest policy only includes third-party liability (claimed and counted as N1 type) and a set of basic guarantees such as emergency roadside assistance, legal assistance or insurance covering medical costs (claimed and counted as N2 type).

4.1 Fitting bivariate Poisson models

  • First, parameters related to the type of coverage (v10 and v11) were always significant and their presence increased the expected number of claims markedly.
  • In order to model the covariance term (λ3), the covariates were introduced in the bivariate Poisson model with the result that only the parameter for v10 was significant.
  • A profile with a mean lying very close to this average was chosen for the third profile.
  • In Table 7, it can be observed that the zero-inflated bivariate models did not present any noticeable differences with the non zero-inflated models in terms of the mean scores, but they were present in the case of the variance.

5 Conclusions

  • This paper has tested the independence assumption between claim types given a set of known risk factors and it has shown that independence should be rejected.
  • The interpretation of a number of bivariate Poisson models has been illustrated in the context of automobile insurance claims and the conclusion is that using a bivariate Poisson model leads to an a priori ratemaking that presents larger variances and, hence, larger loadings than those obtained under the independence assumption.
  • For the five models analysed here there seems to be a relationship between the goodness of fit and the level of overdispersion considered in each model.
  • In short, the main finding is that the independence assumption that is implicitly used when pricing automobile insurance by adding the pure premiums for each guarantee (which are obtained using count data regression models) is insufficient because correlations (conditional on the covariates) are ignored.
  • In Frees and Valdez (2008), a hierarchical model captures possible dependencies of claims among the various types through a t-copula specification.


A priori ratemaking using bivariate Poisson regression models
Lluís Bermúdez i Morata
Departament de Matemàtica Econòmica, Financera i Actuarial.
Risc en Finances i Assegurances-IREA. Universitat de Barcelona.
September 22, 2008
Abstract
In automobile insurance, it is useful to achieve a priori ratemaking by resorting to generalized
linear models, and here the Poisson regression model constitutes the most widely
accepted basis. However, insurance companies distinguish between claims with or without
bodily injuries, or claims with full or partial liability of the insured driver. This paper examines
an a priori ratemaking procedure when including two different types of claim. When
assuming independence between claim types, the premium can be obtained by summing the
premiums for each type of guarantee and is dependent on the rating factors chosen. If the
independence assumption is relaxed, then it is unclear as to how the tariff system might
be affected. In order to answer this question, bivariate Poisson regression models, suitable
for paired count data exhibiting correlation, are introduced. It is shown that the usual
independence assumption is unrealistic here. These models are applied to an automobile
insurance claims database containing 80,994 contracts belonging to a Spanish insurance
company. Finally, the consequences for pure and loaded premiums when the independence
assumption is relaxed by using a bivariate Poisson regression model are analysed.
JEL classification: C51; IM classification: IM11; IB classification: IB40.
Keywords: Bivariate Poisson regression models, Zero-inflated models, Automobile insurance,
Bootstrap methods, A priori ratemaking.
Acknowledgements. The author wishes to acknowledge the support of the Spanish Ministry of Education and
FEDER grant SEJ 2007-63298. The author is grateful for the valuable suggestions from the participants in the
12th International Congress on Insurance: Mathematics and Economics in Dalian on July 16-18, 2008.
Corresponding Author. Departament de Matemàtica Econòmica, Financera i Actuarial, Universitat
de Barcelona, Diagonal 690, 08034-Barcelona, Spain. Tel.: +34-93-4034853; fax: +34-93-4034892; e-mail:
lbermudez@ub.edu

1 Introduction
Designing a tariff structure for insurance is one of the main tasks for actuaries. Such pricing
is particularly complex in the branch of automobile insurance because of highly heterogeneous
portfolios. A thorough review of ratemaking systems for automobile insurance, including the
most recent developments, can be found in Denuit et al. (2007).
One way to handle this problem of heterogeneity in a portfolio, referred to as tariff segmentation
or a priori ratemaking, involves segmenting the portfolio into homogenous classes so that all
insured parties belonging to a particular class pay the same premium. This procedure ensures
that the exact weight of each risk is fairly distributed within the portfolio. In the case of automobile
insurance, in order to group the policies into homogenous classes, a series of classification
variables are used (e.g., age, sex and place of residence of the driver, or horsepower, class and use of
the vehicle). These variables are called a priori ratemaking variables, since their values can be
determined before the insured party begins to drive.
If all the factors influencing a risk could be identified, measured and introduced in the tariff
system, then the classes defined would be homogenous. However, this is not the case as there
are important risk factors that are not considered in the a priori tariff. Some examples are
especially difficult to quantify, such as a driver’s reflexes, his or her aggressiveness, or knowledge
of the Highway Code, among others. As a result, tariff classes can be quite heterogeneous.
Hence, the idea has arisen of considering individual differences in policies within the same class
by using an a posteriori mechanism, i.e., fitting an individual premium based on the experience
of claims for each insured party. This concept has received the name of a posteriori tariff,
experience rating or the bonus-malus system.
Here, only the first step in pricing is studied, the a priori ratemaking. In short, the classification
or segmentation of risks involves establishing different classes of risk according to their
nature and probability of occurrence. For this purpose, factors are determined in order to classify
each risk, and it is statistically tested that the probability of a claim depends on these factors,
and hence, their influence can be measured. A priori classification based on generalized linear
models is the most widely accepted method; see e.g. Dionne and Vanasse (1989), Haberman
and Renshaw (1996), Pinquet (1999), Bermúdez et al. (2001) and Boucher and Denuit (2006)
for applications in the actuarial sciences, and McCullagh and Nelder (1989) or Dobson (1990)
for a general overview of the statistical theory.
The most commonly used generalized linear model for this tariff system is the Poisson regression
model and its generalizations (Denuit et al., 2007). Introduced by Dionne and Vanasse
(1989) in the context of automobile insurance, the model can be applied if the number of claims
for each individual policy observation is known. Although it is possible to use the total number
of claims as the response variable, the nature of automobile insurance policies (covering different
risks) is such that the response variable is the number of claims for each type of guarantee.
Therefore, a premium is obtained for each class of guarantee as a function of different factors.
Then, assuming independence between types of claim, the total premium is obtained from the
sum of the expected number of claims of each guarantee.
Here, two different types of guarantee are assumed: third-party liability automobile insurance
and the rest of guarantees. Following the usual methodology, assuming independence between
types, the premium paid by the policyholder is obtained by summing the premiums for each
type of guarantee and this depends on the rating factors. However, the question remains as
to whether the independence assumption is realistic. When this assumption is relaxed, it is
interesting to see how the tariff system might be affected.
In this study, a bivariate Poisson regression model is introduced. Holgate (1964) provided a
practical basis for the bivariate Poisson distribution but its use has been largely ignored, mainly
because of computational difficulties. Therefore, only a few applications can be found, for
example, Jung and Winkelmann (1993) used a bivariate Poisson regression in a labour mobility
study and Karlis and Ntzoufras (2003) modelled sports data. For a comprehensive review of the
bivariate Poisson distribution and its applications (especially multivariate regression), the reader
should see Kocherlakota and Kocherlakota (1992, 2001) and Johnson, Kotz and Balakrishnan
(1997).
One early application of the bivariate Poisson distribution in the actuarial literature is described
in Cummins and Wiltbank (1983). In ruin theory, some applications of this distribution
are also to be found, for example Partrat (1994), Ambagaspitiya (1999), Walhin and Paris (2000)
and Centeno (2005). Cameron and Trivedi (1998) studied the relationship between type of health
insurance and various responses that measure the demand for health care by using a bivariate
Poisson regression. In addition, two studies related to fitting purposes should also be quoted,
albeit that no factors are considered. First, Vernic (1997) carried out a comparative study
with the bivariate Poisson distribution based on data related to natural events insurance and
third-party liability automobile insurance. Second, Walhin (2003) compared bivariate Hofmann
and bivariate Poisson distributions by fitting a data set for accidents sustained by members of
a sample of 122 shunters in two consecutive 2-year periods. However, in a ratemaking context,
bivariate Poisson regression models have not been used to model claim counts that depend on
the usual rating factors.
In the next section, the model used here is defined. This model is based on the bivariate
Poisson regression model, which is appropriate for modelling paired count data that exhibit
correlation. In Section 3 the database obtained from a Spanish insurance company is described.
In Section 4 the results are summarised. Finally, some concluding remarks are given in Section
5.
2 Bivariate Poisson regression models
Let $N_1$ and $N_2$ be the number of claims for third-party liability and for the rest of guarantees,
respectively, and $N = N_1 + N_2$. The usual methodology to obtain the a priori premium under the
assumption of independence between types of claims can be described as follows. First, the model
assumed is $N_1 \sim \mathrm{Poisson}(\lambda_1)$ and $N_2 \sim \mathrm{Poisson}(\lambda_2)$ independently, and $\lambda_1$ and $\lambda_2$ depend on
a number of rating factors associated with the characteristics of the car, the driver and the use of
the car. Second, with $\lambda_1$ and $\lambda_2$ estimated for each policyholder and following the net premium
principle, the total net premium$^1$ ($\pi$) is obtained as $\pi = E[N] = E[N_1] + E[N_2] = \lambda_1 + \lambda_2$.
However, a loading is added to the net premium to ensure that the insurer will not, on average,
lose money. Many well-known premium principles can be applied for this purpose. Here the
variance premium principle is used. This principle builds on the net premium by including a
risk loading that is proportional to the variance of the risk. Under the above assumptions,
the variance is equal to the expected value, and the total loaded premium ($\pi^{*}$) is equal to
$\pi^{*} = E[N] + \alpha V[N] = (1 + \alpha)(E[N_1] + E[N_2])$, where $\alpha$ is the proportionality constant of the risk loading.
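The two premium formulas above can be sketched in a few lines of Python. This is only an illustration under the independence assumption: the function name `independent_premiums` and the numerical values of the rates and of the loading constant are hypothetical and do not come from the paper.

```python
def independent_premiums(lam1, lam2, alpha):
    """Net and variance-loaded premium for N = N1 + N2, with N1 and N2
    independent Poisson counts and a unit expected claim amount."""
    net = lam1 + lam2              # E[N] = E[N1] + E[N2]
    variance = lam1 + lam2         # V[N] = V[N1] + V[N2] under independence
    loaded = net + alpha * variance  # variance premium principle
    return net, loaded


# Hypothetical rates and loading constant, chosen only for illustration.
net, loaded = independent_premiums(lam1=0.08, lam2=0.12, alpha=0.25)
print(net, loaded)
```

Because mean and variance coincide for a Poisson count, the loaded premium here is simply the net premium scaled by one plus the loading constant.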
1 Assuming the amount of the expected claim equals one monetary unit.

In bivariate Poisson regression models, the independence assumption is relaxed. The model
can be defined as follows. Let us consider independent random variables $X_i$ ($i = 1, 2, 3$) to
be distributed as Poisson with parameters $\lambda_i$ respectively. Then the random variables
$N_1 = X_1 + X_3$ and $N_2 = X_2 + X_3$ follow jointly a bivariate Poisson distribution:
$$(N_1, N_2) \sim BP(\lambda_1, \lambda_2, \lambda_3).$$
This is the so-called trivariate reduction method that leads to the bivariate Poisson distribution.
Its joint probability function is given by:
$$P(N_1 = n_1, N_2 = n_2) = e^{-(\lambda_1 + \lambda_2 + \lambda_3)} \, \frac{\lambda_1^{n_1}}{n_1!} \, \frac{\lambda_2^{n_2}}{n_2!} \sum_{i=0}^{\min(n_1, n_2)} \binom{n_1}{i} \binom{n_2}{i} \, i! \left( \frac{\lambda_3}{\lambda_1 \lambda_2} \right)^{i}. \tag{1}$$
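As a minimal sketch, the joint probability function in equation (1) can be evaluated directly with the Python standard library. The function name `bivariate_poisson_pmf` and the parameter values are hypothetical; the check at the end sums the probabilities over a truncated grid (an assumption that is harmless for small rates) and recovers the covariance numerically.

```python
from math import comb, exp, factorial


def bivariate_poisson_pmf(n1, n2, lam1, lam2, lam3):
    """P(N1 = n1, N2 = n2) for (N1, N2) ~ BP(lam1, lam2, lam3), as in equation (1)."""
    if lam1 <= 0 or lam2 <= 0 or lam3 < 0:
        raise ValueError("lam1 and lam2 must be positive, lam3 non-negative")
    base = (exp(-(lam1 + lam2 + lam3))
            * lam1 ** n1 / factorial(n1)
            * lam2 ** n2 / factorial(n2))
    ratio = lam3 / (lam1 * lam2)
    series = sum(comb(n1, i) * comb(n2, i) * factorial(i) * ratio ** i
                 for i in range(min(n1, n2) + 1))
    return base * series


# Hypothetical parameter values, chosen only for illustration.
lam1, lam2, lam3 = 0.5, 0.8, 0.3
grid = range(40)  # truncation point is an assumption; the tail mass is negligible here
total = sum(bivariate_poisson_pmf(a, b, lam1, lam2, lam3) for a in grid for b in grid)
cross = sum(a * b * bivariate_poisson_pmf(a, b, lam1, lam2, lam3) for a in grid for b in grid)
covariance = cross - (lam1 + lam3) * (lam2 + lam3)
print(total, covariance)
```

Setting `lam3 = 0` leaves only the first term of the sum, so the function then returns the product of two independent Poisson probabilities.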
The bivariate Poisson distribution defined above presents several interesting and useful properties.
First, it allows for positive dependence between the random variables $N_1$ and $N_2$, which
is what we expect for these types of claims$^2$. Moreover, $\mathrm{Cov}(N_1, N_2) = \lambda_3$ and therefore $\lambda_3$ is
a measure of this dependence. Obviously, if $\lambda_3 = 0$ the two random variables are independent
and the bivariate Poisson distribution reduces to the product of two independent Poisson distributions,
referred to as a double Poisson distribution (Kocherlakota and Kocherlakota, 1992).
Second, the marginal distributions for $N_1$ and $N_2$ are Poisson with $E[N_1] = \lambda_1 + \lambda_3$ and
$E[N_2] = \lambda_2 + \lambda_3$.
Hence, the total net premium can be obtained with $\pi = E[N] = E[N_1] + E[N_2] = \lambda_1 + \lambda_2 + 2\lambda_3$.
The variance necessary to obtain the loaded premium is now $V[N] = \lambda_1 + \lambda_2 + 4\lambda_3$. Since
$\lambda_3$ is expected to be positive, the relaxation of the independence assumption leads to a variance
greater than the expected value. Overdispersion has often been observed when modelling claim
counts in automobile insurance data (Denuit et al., 2007).
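The variance quoted above follows from the trivariate reduction in a few steps, written out here as a worked check. The last line applies the variance premium principle to the bivariate case; spelling out this closed form is our own step rather than a formula quoted from the paper, and $\alpha$ denotes the loading constant of that principle.

```latex
\begin{aligned}
V[N_k] &= V[X_k] + V[X_3] = \lambda_k + \lambda_3, \qquad k = 1, 2, \\
\mathrm{Cov}(N_1, N_2) &= \mathrm{Cov}(X_1 + X_3,\; X_2 + X_3) = V[X_3] = \lambda_3, \\
V[N] &= V[N_1] + V[N_2] + 2\,\mathrm{Cov}(N_1, N_2) = \lambda_1 + \lambda_2 + 4\lambda_3, \\
E[N] + \alpha V[N] &= (1 + \alpha)(\lambda_1 + \lambda_2) + (2 + 4\alpha)\,\lambda_3 .
\end{aligned}
```

Relative to the independent case with the same $\lambda_1$ and $\lambda_2$, the loaded premium therefore grows by $(2 + 4\alpha)\lambda_3$.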
Let us assume that $N_{1j}$ and $N_{2j}$ denote the random variables indicating the number of
claims of each type of guarantee for the $j$th policyholder. If covariates are introduced to model
$\lambda_1$, $\lambda_2$ and $\lambda_3$, a bivariate Poisson regression model can be defined with the following scheme:
$$\begin{aligned}
(N_{1j}, N_{2j}) &\sim BP(\lambda_{1j}, \lambda_{2j}, \lambda_{3j}), \\
\log(\lambda_{1j}) &= x_{1j}' \beta_1, \\
\log(\lambda_{2j}) &= x_{2j}' \beta_2, \\
\log(\lambda_{3j}) &= x_{3j}' \beta_3,
\end{aligned} \tag{2}$$
2 In the case of negatively correlated claims (not considered here), a more general specification would be necessary.
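To illustrate how scheme (2) feeds into pricing, the sketch below maps covariate vectors to the three rates through the log link and then computes the net and variance-loaded premiums for one policyholder. It is not the paper's estimation procedure: the helper names `rate` and `bivariate_premiums`, the covariate coding and every coefficient value are hypothetical.

```python
from math import exp


def rate(x, beta):
    """lambda = exp(x' beta), i.e. the log link used in scheme (2)."""
    return exp(sum(xi * bi for xi, bi in zip(x, beta)))


def bivariate_premiums(x1, x2, x3, beta1, beta2, beta3, alpha):
    """Net and variance-loaded premium for one policyholder under BP(lam1, lam2, lam3)."""
    lam1, lam2, lam3 = rate(x1, beta1), rate(x2, beta2), rate(x3, beta3)
    net = lam1 + lam2 + 2 * lam3        # E[N1] + E[N2]
    variance = lam1 + lam2 + 4 * lam3   # V[N1] + V[N2] + 2 Cov(N1, N2)
    return net, net + alpha * variance


# Hypothetical policyholder: intercept plus two dummy rating factors.
x = [1.0, 1.0, 0.0]
# Hypothetical coefficients; real values would come from fitting the model.
beta1 = [-2.5, 0.3, 0.1]
beta2 = [-2.0, 0.2, 0.4]
beta3 = [-4.0, 0.5, 0.0]
net, loaded = bivariate_premiums(x, x, x, beta1, beta2, beta3, alpha=0.25)
print(net, loaded)
```

The three covariate vectors are passed separately because scheme (2) allows a different set of covariates for each rate.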

Citations
Journal ArticleDOI
TL;DR: In this article, two general classes of discrete bivariate distributions are developed, and general formulas for the joint distributions belonging to the classes are derived, which are very general in the sense that new families of distributions can be generated just by specifying the baseline seed distributions.
Abstract: In this article, we develop two general classes of discrete bivariate distributions. We derive general formulas for the joint distributions belonging to the classes. The obtained formulas for the joint distributions are very general in the sense that new families of distributions can be generated just by specifying the “baseline seed distributions.” The dependence structures of the bivariate distributions belonging to the proposed classes, along with basic statistical properties, are also discussed. New families of discrete bivariate distributions are generated from the classes. Furthermore, to assess the usefulness of the proposed classes, two discrete bivariate distributions generated from the classes are applied to analyze a real dataset and the results are compared with those obtained from conventional models.

23 citations

Journal ArticleDOI
TL;DR: This paper aims to supplement the standard actuarial approach by combining two guarantees and the policyholders from the household, which allows to refine the prediction on the claim frequencies and account for the common shocks on multiple guarantees.
Abstract: Actuarial risk classification is usually performed at a guarantee and policyholder level: for each policyholder, the claim frequencies corresponding to each guarantee are modelled in isolation, without accounting for the correlation between the different guarantees and the different policyholders from the same household. However, sometimes, a common event will trigger both guarantees at the same time. Moreover, the claim frequencies for policyholders from the same household appear to be correlated. This paper aims to supplement the standard actuarial approach by combining two guarantees and the policyholders from the household, which allows to refine the prediction on the claim frequencies and account for the common shocks on multiple guarantees. Some possible cross-selling opportunities can also be identified.

19 citations

Journal ArticleDOI
TL;DR: In this paper, a bivariate INAR(1) regression model is adapted to the ratemaking problem of pricing an automobile insurance contract with two types of coverage, taking into account both the correlation between claims from different coverage types and the serial correlation between the observations of the same policyholder observed over time.
Abstract: For purposes of ratemaking, time dependence and cross dependence have been treated as separate entities in the actuarial literature. Indeed, to date, little attention has been paid to the possibility of considering the two together. To discuss the effect of the simultaneous inclusion of different dependence assumptions in ratemaking models, a bivariate INAR(1) regression model is adapted to the ratemaking problem of pricing an automobile insurance contract with two types of coverage, taking into account both the correlation between claims from different coverage types and the serial correlation between the observations of the same policyholder observed over time. A numerical application using an automobile insurance claims database is conducted and the main finding is that the improvement obtained with a BINAR(1) regression model, compared to the outcomes of the simplest models, is marked, implying that we need to consider both time and cross correlations to fit the data at hand. In addition, the BINAR(1) specification shows a third source of dependence to be significant, namely, cross-time dependence.

18 citations

Posted ContentDOI
TL;DR: In this article, the authors proposed multivariate versions of the continuous Lindley mixture of Poisson distributions considered by Sankaran (1970), which can be used for modelling multivariate dependent count data when marginal overdispersion is present.
Abstract: This paper proposes multivariate versions of the continuous Lindley mixture of Poisson distributions considered by Sankaran (1970). This new class of distributions can be used for modelling multivariate dependent count data when marginal overdispersion is present. After discussing some of its properties, a general multivariate model with Poisson-Lindley marginals and with a flexible covariance structure is proposed. Several specific models as well as one that allows correlations of any sign are considered, and then some estimation methods are discussed. Finally, some illustrative examples are given for fitting and demonstrating the usefulness of these bivariate distributions.

17 citations

Journal ArticleDOI
TL;DR: In this article, a multivariate Poisson mixture, with random effects correlated using a hierarchical structure, is proposed to accommodate for the dependence that may exist between unobserved risk factors across Home and Motor insurance and between policyholders from the same household.
Abstract: Actuarial ratemaking is usually performed at product and guarantee level, meaning that each product and guarantee is considered in isolation. Moreover, independence between policyholders is generally assumed. In this paper, we propose a multivariate Poisson mixture, with random effects correlated using a hierarchical structure, to accommodate for the dependence that may exist between unobserved risk factors across Home and Motor insurance and between policyholders from the same household. The hierarchical structure accounts for the fact that Home insurance covers the whole household, whereas Motor insurance policies are subscribed by specific policyholders within the household. The model allows to periodically correct the a priori expected claim frequencies using the reported number of claims in any of the considered products. Applications show that the impact of the number of claims reported in Motor insurance on the number of claims expected in Home insurance is larger than the other way around. Moreover, an out-of-sample analysis validates an improved predictive power. Also, the model allows to identify more rapidly the riskiest households.

15 citations

References
Book
01 Jan 1983
TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components).

23,215 citations

Book
28 Sep 1998
TL;DR: The authors combine theory and practice to make sophisticated methods of analysis accessible to researchers and practitioners working with widely different types of data and software in areas such as applied statistics, econometrics, marketing, operations research, actuarial studies, demography, biostatistics and quantitative social sciences.
Abstract: Students in both social and natural sciences often seek regression methods to explain the frequency of events, such as visits to a doctor, auto accidents, or new patents awarded. This book, now in its second edition, provides the most comprehensive and up-to-date account of models and methods to interpret such data. The authors combine theory and practice to make sophisticated methods of analysis accessible to researchers and practitioners working with widely different types of data and software in areas such as applied statistics, econometrics, marketing, operations research, actuarial studies, demography, biostatistics and quantitative social sciences. The new material includes new theoretical topics, an updated and expanded treatment of cross-section models, coverage of bootstrap-based and simulation-based inference, expanded treatment of time series, multivariate and panel data, expanded treatment of endogenous regressors, coverage of quantile count regression, and a new chapter on Bayesian methods.

4,849 citations

Book
01 Jan 1990
TL;DR: In this paper, the authors propose a method of maximum likelihood estimation method of least squares estimation for generalized linear models for simple linear regression with Poisson responses GLIM, which is based on the MINITAB program.
Abstract: Part 1 Background scope notation distributions derived from normal distribution. Part 2 Model fitting: plant growth sample birthweight sample notation for linear models exercises. Part 3 Exponential family of distributions and generalized linear models: exponential family of distributions generalized linear models. Part 4 Estimation: method of maximum likelihood method of least squares estimation for generalized linear models example of simple linear regression for Poisson responses MINITAB program for simple linear regression with Poisson responses GLIM. Part 5 Inference: sampling introduction for scores sampling distribution for maximum likelihood estimators confidence intervals for the model parameters adequacy of a model sampling distribution for the log-likelihood statistic log-likelihood ratio statistic (deviance) assessing goodness of fit hypothesis testing residuals. Part 6 Multiple regression: maximum likelihood estimation least squares estimation log-likelihood ratio statistic multiple correlation coefficient and R numerical example residual plots orthogonality collinearity model selection non-linear regression. Part 7 Analysis of variance and covariance: basic results one-factor ANOVA two-factor ANOVA with replication crossed and nested factors more complicated models choice of constraint equations and dummy variables analysis of covariance. Part 8 Binary variables and logistic regression: probability distributions generalized linear models dose response models general logistic regression maximum likelihood estimation and the log-likelihood ratio statistic other criteria for goodness of fit least squares methods remarks. Part 9 Contingency tables and log-linear models: probability distributions log-linear models maximum likelihood estimation hypothesis testing and goodness of fit numerical examples remarks. 
Appendices: conventional parametrizations with sum-to-zero constraints corner-point parametrizations three response variables two response variables and one explanatory variable one response variable and two explanatory variables.

2,737 citations

Book
01 Jan 1997
TL;DR: In this article, the authors present a concise review of developments on discrete multivariate distributions, give some basic definitions and notations, and present several important discrete multivariate distributions with their significant properties and characteristics.
Abstract: In this article, we present a concise review of developments on discrete multivariate distributions. We first present some basic definitions and notations. Then, we present several important discrete multivariate distributions and list their significant properties and characteristics. Keywords: generating function; moments; stirling numbers; regression; inflated distributions; truncated forms; compound distributions; multinomial; negative multinomial; multivariate poisson; multivariate hypergeometric; multivariate Polya–Eggenberger; multivariate discrete exponential; multivariate power series; multivariate hermite; multivariate occupancy; multivariate weighted; dirichlet; multivariate run-related distributions

862 citations

Journal ArticleDOI
TL;DR: In this paper, a bivariate Poisson model and its extensions are proposed to model the number of goals of two competing teams in a football game, which is a plausible assumption in sports with two opposing teams competing against each other.
Abstract: Summary. Models based on the bivariate Poisson distribution are used for modelling sports data. Independent Poisson distributions are usually adopted to model the number of goals of two competing teams. We replace the independence assumption by considering a bivariate Poisson model and its extensions. The models proposed allow for correlation between the two scores, which is a plausible assumption in sports with two opposing teams competing against each other. The effect of introducing even slight correlation is discussed. Using just a bivariate Poisson distribution can improve model fit and prediction of the number of draws in football games. The model is extended by considering an inflation factor for diagonal terms in the bivariate joint distribution. This inflation improves in precision the estimation of draws and, at the same time, allows for overdispersed, relative to the simple Poisson distribution, marginal distributions. The properties of the models proposed as well as interpretation and estimation procedures are provided. An illustration of the models is presented by using data sets from football and water-polo.

412 citations