On the existence of the maximum likelihood estimates in Poisson regression

01 May 2010 - Economics Letters (Elsevier) - Vol. 107, Iss. 2, pp. 310-312

Abstract: We note that the existence of the maximum likelihood estimates in Poisson regression depends on the data configuration, and propose a strategy to identify the existence of the problem and to single out the regressors causing it.

CEP Discussion Paper No 932
May 2009
On the Existence of the Maximum Likelihood Estimates
for Poisson Regression
J. M. C. Santos-Silva and Silvana Tenreyro

Abstract
We note that the existence of the maximum likelihood estimates for Poisson regression
depends on the data configuration. Because standard software does not check for this
problem, the practitioner may be surprised to find that in some applications estimation of the
Poisson regression is unusually difficult or even impossible. More seriously, the estimation
algorithm may lead to spurious maximum likelihood estimates. We identify the signs of the
non-existence of the maximum likelihood estimates and propose a simple empirical strategy
to single out the regressors causing this type of identification failure.
Keywords: Poisson estimation, gravity equation
JEL Classifications: C13; C50; F10
This paper was produced as part of the Centre’s Globalisation Programme. The Centre for
Economic Performance is financed by the Economic and Social Research Council.
Acknowledgements
We are grateful to Ines Buono, Virginia Di Nino, Dave Donaldson, Doireann Fitzgerald,
Lissandra Flach and Randi Hjalmarsson for posing the problem and for providing examples
of data sets where it occurs. Special thanks are due to Markus Baldauf for help with the code
used to bypass the problem. Santos Silva also gratefully acknowledges partial financial
support from Fundação para a Ciência e Tecnologia (FEDER/POCI 2010).
João M. C. Santos-Silva is Professor of Economics at the University of Essex. Silvana
Tenreyro is an Associate at the Centre for Economic Performance and Reader in the
Economics Department, London School of Economics.
Published by
Centre for Economic Performance
London School of Economics and Political Science
Houghton Street
London WC2A 2AE
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system or transmitted in any form or by any means without the prior permission in writing of
the publisher nor be issued to the public or circulated in any form other than that in which it
is published.
Requests for permission to reproduce any article or part of the Working Paper should be sent
to the editor at the above address.
© J. M. C. Santos-Silva and S. Tenreyro, submitted 2009
ISBN 978-0-85328-385-0

1. INTRODUCTION
The Poisson regression model is defined by
$$\Pr(y_i = j \mid x_i) = \frac{\exp(-\lambda)\,\lambda^{j}}{j!}, \qquad j = 0, 1, 2, \ldots$$
where $\lambda$ is generally specified as $\lambda = \exp(x_i'\beta) = \exp(\beta_0 + \beta_1 x_{1i} + \ldots)$.¹ With this
formulation, $\beta$, the vector of parameters of interest, can be estimated by maximizing the
log-likelihood function given by
$$\ln L(\beta) = \sum_{i=1}^{n} \left[ -\exp(x_i'\beta) + (x_i'\beta)\, y_i - \ln(y_i!) \right]. \tag{1}$$
Poisson regression is not only the most widely used model for count data (see Winkelmann,
2008, and Cameron and Trivedi, 1997), but it is also becoming increasingly popular
to estimate multiplicative models for other kinds of data (see, among others, Manning and
Mullahy, 2001, and Santos Silva and Tenreyro, 2006).
The reasons that make this estimator popular can be clearly understood by inspecting
the corresponding score vector and Hessian matrix, given respectively by
$$s(\beta) = \sum_{i=1}^{n} \left[ y_i - \exp(x_i'\beta) \right] x_i, \tag{2}$$
and
$$H(\beta) = -\sum_{i=1}^{n} \exp(x_i'\beta)\, x_i x_i'.$$
The form of the score vector makes clear that $\beta$ will be consistently estimated as long
as $E(y_i \mid x_i) = \exp(x_i'\beta)$, i.e., the only condition required for consistency is the correct
specification of the conditional mean. This is the well known pseudo-maximum likelihood
result of Gourieroux, Monfort and Trognon (1984).
Besides this robustness property, the estimator also has the advantage of being very
well behaved. Indeed, it is easy to see that the Hessian is negative definite for all $x$ and $\beta$,
which facilitates the estimation and ensures the uniqueness of the maximum, if it exists.
Consequently, estimation of $\beta$ is relatively simple and generally the estimation algorithm
converges in a handful of iterations, even for relatively large problems.

¹ See Winkelmann (2008) and Cameron and Trivedi (1997) for further details and background on the
Poisson regression model and its properties.
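As an illustration of the formulas above, the following is a minimal NumPy sketch of the log-likelihood (1), the score (2), the Hessian, and a plain Newton iteration. The function names, the simulated data, and the stopping rule are assumptions made for this example; they are not taken from the paper or from any particular software package.

```python
import numpy as np
from math import lgamma


def poisson_loglik(beta, X, y):
    """Log-likelihood (1): sum of -exp(x_i'b) + (x_i'b) y_i - ln(y_i!)."""
    eta = X @ beta
    log_fact = np.array([lgamma(v + 1.0) for v in y])  # ln(y_i!)
    return float(np.sum(-np.exp(eta) + eta * y - log_fact))


def poisson_score(beta, X, y):
    """Score (2): sum of [y_i - exp(x_i'b)] x_i."""
    return X.T @ (y - np.exp(X @ beta))


def poisson_hessian(beta, X):
    """Hessian: minus the sum of exp(x_i'b) x_i x_i'."""
    mu = np.exp(X @ beta)
    return -(X * mu[:, None]).T @ X


def newton_poisson(X, y, tol=1e-10, max_iter=50):
    """Plain Newton iteration started at beta = 0 (illustrative only)."""
    beta = np.zeros(X.shape[1])
    for it in range(1, max_iter + 1):
        step = np.linalg.solve(poisson_hessian(beta, X), poisson_score(beta, X, y))
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            return beta, it
    return beta, max_iter


# Simulated, well-behaved data (hypothetical parameter values 0.5 and 0.3).
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3]))).astype(float)

beta_hat, n_iter = newton_poisson(X, y)
print("estimates:", beta_hat, "iterations:", n_iter)
print("score at estimates:", poisson_score(beta_hat, X, y))
print("Hessian eigenvalues:", np.linalg.eigvalsh(poisson_hessian(beta_hat, X)))
```

In a well-behaved sample like this one, the score at the reported estimates is numerically zero and all eigenvalues of the Hessian are negative, in line with the properties described above.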
In spite of this general result, for certain data configurations, some of the parameters in $\beta$
are not identified by the (pseudo) maximum likelihood estimator described above. That is,
for certain data configurations, the maximum likelihood estimates do not exist. Because
this type of identification failure has not been recognized as a problem in count data
models, standard software does not check for its presence and therefore the practitioner
may be surprised to find that estimation of the Poisson regression is unusually difficult,
even in some apparently simple problems. The next section provides details on when this
problem arises and on how it can be detected.
2. THE PROBLEM
To better see the nature of the problem, it is useful to start by considering the case where
a regressor, say $x_{2i}$, is zero when $y_i$ is positive, otherwise being non-negative with at least
one positive observation. The leading example of a regressor with these characteristics is
a dummy variable that is equal to zero for all observations with positive $y_i$, having some
positive values for $y_i = 0$. From equation (2), the first order condition for a maximum of
(1) corresponding to the parameter associated with $x_{2i}$ can be written as
$$s(\beta_2) = -\sum_{x_{2i} > 0} \exp(x_i'\hat{\beta})\, x_{2i} = 0,$$
which can never be satisfied because every term in the sum is strictly positive. Therefore,
when regressors such as $x_2$ are present, the (pseudo) maximum likelihood estimate of $\beta$
does not exist.
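The argument above can be checked numerically. The sketch below builds a hypothetical data set in which a dummy regressor equals one only for some observations with $y_i = 0$, and evaluates the $\beta_2$ component of the score (2) at many arbitrary parameter values. The data-generating values and variable names are assumptions made for illustration only.

```python
import numpy as np


def score(beta, X, y):
    """Score (2): sum of [y_i - exp(x_i'b)] x_i."""
    return X.T @ (y - np.exp(X @ beta))


rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
y = rng.poisson(np.exp(0.2 + 0.5 * x1)).astype(float)

# Hypothetical dummy: equal to 1 for half of the observations with y = 0,
# and equal to 0 for every observation with y > 0.
x2 = np.zeros(n)
zero_idx = np.flatnonzero(y == 0)
x2[zero_idx[: len(zero_idx) // 2]] = 1.0
X = np.column_stack([np.ones(n), x1, x2])

# Evaluate the beta_2 component of the score at many random parameter vectors.
draws = rng.normal(scale=3.0, size=(1000, 3))
s2 = np.array([score(b, X, y)[2] for b in draws])
all_negative = bool(np.all(s2 < 0))
print("beta_2 score negative at every draw:", all_negative)
```

Because this component of the score is minus a sum of strictly positive terms, it is negative wherever it is evaluated, so the log-likelihood keeps increasing as $\beta_2$ decreases and no finite maximizer exists.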
More generally, this problem can occur whenever two regressors are perfectly collinear
for the sub-sample with positive observations of $y_i$.² To see this, write (2) as
$$s(\beta) = \sum_{y_i > 0} \left[ y_i - \exp(x_i'\beta) \right] x_i - \sum_{y_i = 0} \exp(x_i'\beta)\, x_i,$$
and notice that the first order conditions for a maximum corresponding to $\beta_0$, $\beta_1$ and $\beta_2$
imply
$$\sum_{y_i > 0} \left[ y_i - \exp(x_i'\hat{\beta}) \right] = \sum_{y_i = 0} \exp(x_i'\hat{\beta}), \tag{3a}$$
$$\sum_{y_i > 0} \left[ y_i - \exp(x_i'\hat{\beta}) \right] x_{1i} = \sum_{y_i = 0} \exp(x_i'\hat{\beta})\, x_{1i}, \tag{3b}$$
$$\sum_{y_i > 0} \left[ y_i - \exp(x_i'\hat{\beta}) \right] x_{2i} = \sum_{y_i = 0} \exp(x_i'\hat{\beta})\, x_{2i}, \tag{3c}$$
where $\hat{\beta}$ denotes the maximum likelihood estimates of $\beta$.

² Notice that the problem identified here is very different from the one resulting from perfect collinearity
between regressors. Perfect collinearity leads to the existence of an infinite number of solutions to the
likelihood equations, whereas here we are concerned with the situation where the likelihood equations
have no solution.
Suppose now that $x_1$ and $x_2$ are perfectly collinear for the sub-sample with positive
observations of $y_i$. In particular, let $x_{2i} = \alpha_0 + \alpha_1 x_{1i}$ for $y_i > 0$. Then, writing $x_{2i}$ as a
function of $x_{1i}$ on the left hand side of (3c) and using equalities (3a) and (3b), it is possible
to obtain
$$\alpha_0 \sum_{y_i = 0} \exp(x_i'\hat{\beta}) + \alpha_1 \sum_{y_i = 0} \exp(x_i'\hat{\beta})\, x_{1i} = \sum_{y_i = 0} \exp(x_i'\hat{\beta})\, x_{2i}. \tag{4}$$
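The intermediate steps leading to (4) are not shown in the text. A reconstruction, using only (3a)-(3c) and the assumed relation $x_{2i} = \alpha_0 + \alpha_1 x_{1i}$ for $y_i > 0$, might read as follows.

```latex
\begin{align*}
\sum_{y_i > 0} \left[ y_i - \exp(x_i'\hat{\beta}) \right] x_{2i}
  &= \sum_{y_i > 0} \left[ y_i - \exp(x_i'\hat{\beta}) \right] (\alpha_0 + \alpha_1 x_{1i}) \\
  &= \alpha_0 \sum_{y_i > 0} \left[ y_i - \exp(x_i'\hat{\beta}) \right]
   + \alpha_1 \sum_{y_i > 0} \left[ y_i - \exp(x_i'\hat{\beta}) \right] x_{1i} \\
  &= \alpha_0 \sum_{y_i = 0} \exp(x_i'\hat{\beta})
   + \alpha_1 \sum_{y_i = 0} \exp(x_i'\hat{\beta})\, x_{1i}
   \qquad \text{by (3a) and (3b).}
\end{align*}
```

Setting this expression equal to the right hand side of (3c) gives (4), which involves only the observations with $y_i = 0$.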
Whether or not (4) has a solution depends on the values of $\alpha_0$ and $\alpha_1$, and on the
ranges of $x_1$ and $x_2$ for the observations with $y_i = 0$. For instance, in the illustrative
example presented before, $\alpha_0 = \alpha_1 = 0$ and for (4) to have a solution it is necessary, but
not sufficient, that $x_2$ has positive and negative values for $y_i = 0$. Heuristically, (4) will
have a solution when there is a reasonable overlap between the ranges of $x_{2i}$ for $y_i = 0$ and
$y_i > 0$. However, it is not possible to provide a sharp criterion determining the existence
of a $\hat{\beta}$ that solves (4). Therefore, the existence of this sort of identification problem has
to be investigated on a case-by-case basis.
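One way to screen a data set for the configuration discussed above is to compare the column rank of the regressor matrix in the full sample with its rank in the sub-sample with $y_i > 0$. The sketch below is only an assumed diagnostic written for this illustration (the function name and the simulated data are hypothetical), not the authors' procedure. A positive flag signals that the case needs investigating; as noted above, it does not by itself establish that the estimates fail to exist.

```python
import numpy as np


def collinear_on_positive_subsample(X, y):
    """True if X has full column rank overall but not when restricted to y > 0."""
    k = X.shape[1]
    full_rank = np.linalg.matrix_rank(X)
    positive_rank = np.linalg.matrix_rank(X[y > 0])
    return bool(full_rank == k and positive_rank < k)


rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
y = rng.poisson(np.exp(0.1 + 0.4 * x1)).astype(float)

# Hypothetical regressor: exactly 1 + 2*x1 when y > 0, unrestricted when y = 0.
x2_bad = np.where(y > 0, 1.0 + 2.0 * x1, rng.normal(size=n))
X_bad = np.column_stack([np.ones(n), x1, x2_bad])

# Regressor with no such restriction, for comparison.
X_ok = np.column_stack([np.ones(n), x1, rng.normal(size=n)])

flag_bad = collinear_on_positive_subsample(X_bad, y)
flag_ok = collinear_on_positive_subsample(X_ok, y)
print("suspect design flagged:", flag_bad)
print("ordinary design flagged:", flag_ok)
```

The dummy-variable example from the start of this section is a special case: the dummy is identically zero for $y_i > 0$, so the restricted matrix loses a column of rank.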
Of course, Newton-type algorithms used to maximize the likelihood function may achieve
convergence even when (3) has no solution, leading to spurious maximum likelihood estimates,
say $b$. It is easy to see that for $b$ to provide an approximate solution for (4) it has
to be such that $\exp(x_i'b)$ is close to zero for the observations with $y_i = 0$. Therefore, these
spurious solutions can be easily identified because they are characterized by a “perfect”
fit for the observations with $y_i = 0$.
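The behaviour described above can be reproduced on simulated data. The sketch below runs a fixed number of plain Newton steps on a hypothetical sample containing a dummy that equals one only for some observations with $y_i = 0$, then inspects the score and the fitted values. The data, the number of iterations, and the thresholds are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
y = rng.poisson(np.exp(0.2 + 0.5 * x1)).astype(float)

# Hypothetical dummy: 1 for half of the y = 0 observations, 0 otherwise.
x2 = np.zeros(n)
zero_idx = np.flatnonzero(y == 0)
x2[zero_idx[: len(zero_idx) // 2]] = 1.0
X = np.column_stack([np.ones(n), x1, x2])

# Plain Newton steps, started at zero, with no convergence check.
b = np.zeros(3)
for _ in range(25):
    mu = np.exp(X @ b)
    score = X.T @ (y - mu)
    hessian = -(X * mu[:, None]).T @ X
    b = b - np.linalg.solve(hessian, score)

mu = np.exp(X @ b)
final_score = X.T @ (y - mu)

# Observations with y = 0 that are fitted "perfectly" (fitted value near zero).
flagged = (y == 0) & (mu < 1e-6)

print("b:", b)
print("largest absolute score component:", np.max(np.abs(final_score)))
print("number of y = 0 observations with fitted value near zero:", int(flagged.sum()))
```

In this example the score becomes very small, so a gradient-based stopping rule could report convergence, while the coefficient on the dummy keeps drifting towards large negative values and the fitted values for the affected observations collapse towards zero.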


References
Book
A. Colin Cameron, Pravin K. Trivedi
28 Sep 1998
Abstract: Students in both social and natural sciences often seek regression methods to explain the frequency of events, such as visits to a doctor, auto accidents, or new patents awarded. This book, now in its second edition, provides the most comprehensive and up-to-date account of models and methods to interpret such data. The authors combine theory and practice to make sophisticated methods of analysis accessible to researchers and practitioners working with widely different types of data and software in areas such as applied statistics, econometrics, marketing, operations research, actuarial studies, demography, biostatistics and quantitative social sciences. The new material includes new theoretical topics, an updated and expanded treatment of cross-section models, coverage of bootstrap-based and simulation-based inference, expanded treatment of time series, multivariate and panel data, expanded treatment of endogenous regressors, coverage of quantile count regression, and a new chapter on Bayesian methods.

4,589 citations


Journal ArticleDOI
Abstract: Although economists have long been aware of Jensen's inequality, many econometric applications have neglected an important implication of it: the standard practice of interpreting the parameters of log-linearized models estimated by ordinary least squares as elasticities can be highly misleading in the presence of heteroskedasticity. This paper explains why this problem arises and proposes an appropriate estimator. Our criticism to conventional practices and the solution we propose extends to a broad range of economic applications where the equation under study is log-linearized. We develop the argument using one particular illustration, the gravity equation for trade, and apply the proposed technique to provide new estimates of this equation. We find significant differences between estimates obtained with the proposed estimator and those obtained with the traditional method. These discrepancies persist even when the gravity equation takes into account multilateral resistance terms or fixed effects

3,731 citations



Journal ArticleDOI
Willard G. Manning, John Mullahy
Abstract: Health economists often use log models to deal with skewed outcomes, such as health utilization or health expenditures. The literature provides a number of alternative estimation approaches for log models, including ordinary least-squares on ln(y) and generalized linear models. This study examines how well the alternative estimators behave econometrically in terms of bias and precision when the data are skewed or have other common data problems (heteroscedasticity, heavy tails, etc.). No single alternative is best under all conditions examined. The paper provides a straightforward algorithm for choosing among the alternative estimators. Even if the estimators considered are consistent, there can be major losses in precision from selecting a less appropriate estimator.

1,755 citations


Book
Rainer Winkelmann
01 Mar 1997
Abstract: The book provides graduate students and researchers with an up-to-date survey of statistical and econometric techniques for the analysis of count data, with a focus on conditional distribution models. Proper count data probability models allow for rich inferences, both with respect to the stochastic count process that generated the data, and with respect to predicting the distribution of outcomes. The book starts with a presentation of the benchmark Poisson regression model. Alternative models address unobserved heterogeneity, state dependence, selectivity, endogeneity, underreporting, and clustered sampling. Testing and estimation is discussed from frequentist and Bayesian perspectives. Finally, applications are reviewed in fields such as economics, marketing, sociology, demography, and health sciences.The fifth edition contains several new topics, including copula functions, Poisson regression for non-counts, additional semi-parametric methods, and discrete factor models. Other sections have been reorganized, rewritten, and extended.

976 citations


Journal ArticleDOI
Adelin Albert, J. A. Anderson
Abstract: SUMMARY The problems of existence, uniqueness and location of maximum likelihood estimates in log linear models have received special attention in the literature (Haberman, 1974, Chapter 2; Wedderburn, 1976; Silvapulle, 1981). For multinomial logistic regression models, we prove existence theorems by considering the possible patterns of data points, which fall into three mutually exclusive and exhaustive categories: complete separation, quasicomplete separation and overlap. Our results suggest general rules for identifying infinite parameter estimates in log linear models for frequency tables.

954 citations


"On the existence of the maximum lik..." refers methods in this paper

  • ...This situation is analogous to what happens in binary choice models when there is complete separation or quasi-complete separation, as described by Albert and Anderson (1984) and Santner and Duffy (1986). Moreover, it is clear that it can also occur in any other regression model where the conditional mean is specified in such a way that its image does not include all the points in the support of the dependent variable....


