
Showing papers on "Overdispersion" published in 2003


Journal ArticleDOI
TL;DR: In this paper, a zero-inflated negative binomial mixed regression model is presented to analyze a set of pancreas disorder length of stay (LOS) data that comprised mainly same-day separations.
Abstract: In many biometrical applications, the count data encountered often contain extra zeros relative to the Poisson distribution. Zero-inflated Poisson regression models are useful for analyzing such data, but parameter estimates may be seriously biased if the nonzero observations are over-dispersed and simultaneously correlated due to the sampling design or the data collection procedure. In this paper, a zero-inflated negative binomial mixed regression model is presented to analyze a set of pancreas disorder length of stay (LOS) data that comprised mainly same-day separations. Random effects are introduced to account for inter-hospital variations and the dependency of clustered LOS observations. Parameter estimation is achieved by maximizing an appropriate log-likelihood function using an EM algorithm. Alternative modeling strategies, namely the finite mixture of Poisson distributions and the non-parametric maximum likelihood approach, are also considered. The determination of pertinent covariates would assist hospital administrators and clinicians to manage LOS and expenditures efficiently.

286 citations
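
A mixed-model ZINB like the one above is not available off the shelf in common Python libraries, but the fixed-effects core of the model is. A minimal sketch with statsmodels on simulated data; the random effects and EM estimation of the paper are omitted, and all names and values are illustrative:

```python
# Sketch: fixed-effects zero-inflated negative binomial regression with
# statsmodels on simulated data (the paper's random-effects extension and
# EM algorithm are not implemented here).
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = sm.add_constant(x)

mu = np.exp(0.5 + 0.4 * x)        # NB mean for the count component
alpha = 0.8                       # NB overdispersion parameter
size = 1.0 / alpha
nb = rng.negative_binomial(size, size / (size + mu))
y = np.where(rng.random(n) < 0.3, 0, nb)   # 30% extra (structural) zeros

# Count component regression on X; zero-inflation component intercept-only.
model = ZeroInflatedNegativeBinomialP(y, X, exog_infl=np.ones((n, 1)), p=2)
res = model.fit(method="bfgs", maxiter=500, disp=False)
print(res.summary())
```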


Journal ArticleDOI
TL;DR: The negative binomial model provides an alternative approach for the analysis of discrete data where overdispersion is a problem, provided that the model is correctly specified and adequately fits the data.

180 citations
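
The point is easy to reproduce: on simulated overdispersed counts, the Poisson fit shows a Pearson dispersion well above 1, while a negative binomial fit estimates the extra-variation parameter directly. A sketch with statsmodels on illustrative data (not from the paper):

```python
# Sketch: Poisson versus negative binomial on overdispersed simulated counts.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
X = sm.add_constant(x)
mu = np.exp(1.0 + 0.3 * x)
r = 2.0                                       # small r => strong overdispersion
y = rng.negative_binomial(r, r / (r + mu))

poisson_res = sm.GLM(y, X, family=sm.families.Poisson()).fit()
negbin_res = sm.NegativeBinomial(y, X).fit(disp=False)  # also estimates alpha

# Pearson chi2/df well above 1 flags overdispersion under the Poisson fit.
print("Poisson Pearson chi2/df:", poisson_res.pearson_chi2 / poisson_res.df_resid)
print("Estimated NB alpha     :", negbin_res.params[-1])
```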


Journal ArticleDOI
TL;DR: In this paper, a fully parametric approach is taken and a marginal distribution for the counts is specified, where conditional on past observations the mean is autoregressive; a variety of models based on the double Poisson distribution of Efron (1986) is introduced, which in a first step introduce an additional dispersion parameter and in a second step make this dispersion parameter time-varying.
Abstract: This paper introduces and evaluates new models for time series count data. The Autoregressive Conditional Poisson model (ACP) makes it possible to deal with issues of discreteness, overdispersion (variance greater than the mean) and serial correlation. A fully parametric approach is taken and a marginal distribution for the counts is specified, where conditional on past observations the mean is autoregressive. This makes it possible to attain improved inference on coefficients of exogenous regressors relative to static Poisson regression, which is the main concern of the existing literature, while modelling the serial correlation in a flexible way. A variety of models based on the double Poisson distribution of Efron (1986) is introduced, which in a first step introduce an additional dispersion parameter and in a second step make this dispersion parameter time-varying. All models are estimated using maximum likelihood, which makes the usual tests available. In this framework autocorrelation can be tested with a straightforward likelihood ratio test, whose simplicity is in sharp contrast with test procedures in the latent variable time series count model of Zeger (1988). The models are applied to the time series of monthly polio cases in the U.S. between 1970 and 1983, as well as to the daily number of $0.75 price-change durations on the IBM stock. A $0.75 price-change duration is defined as the time it takes the stock price to move by at least $0.75. The variable of interest is the daily number of such durations, which is a measure of intradaily volatility: the more volatile the stock price is within a day, the larger the counts will be. The ACP models provide good density forecasts of this measure of volatility.

160 citations
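
The core ACP recursion is short enough to sketch directly. The code below assumes the simplest ACP(1,1) form, lam_t = omega + a*y_{t-1} + b*lam_{t-1} with a Poisson conditional distribution, fitted by maximum likelihood with scipy; the double Poisson and time-varying dispersion extensions of the paper are omitted and all values are simulated:

```python
# Sketch: ACP(1,1)-type model, simulated and fitted by maximum likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def negloglik(theta, y):
    omega, a, b = theta
    lam = np.empty(len(y))
    lam[0] = y.mean()                     # initialize at the sample mean
    for t in range(1, len(y)):
        lam[t] = omega + a * y[t - 1] + b * lam[t - 1]
    return -(y * np.log(lam) - lam - gammaln(y + 1)).sum()

# Simulate from the model itself (stationary since a + b < 1).
rng = np.random.default_rng(2)
T, omega, a, b = 500, 0.5, 0.3, 0.5
y, lam = np.zeros(T, dtype=int), np.zeros(T)
lam[0] = omega / (1 - a - b)
y[0] = rng.poisson(lam[0])
for t in range(1, T):
    lam[t] = omega + a * y[t - 1] + b * lam[t - 1]
    y[t] = rng.poisson(lam[t])

res = minimize(negloglik, x0=[1.0, 0.2, 0.2], args=(y,),
               bounds=[(1e-6, None), (0.0, 1.0), (0.0, 1.0)],
               method="L-BFGS-B")
print("MLE (omega, a, b):", res.x)
```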


Journal ArticleDOI
TL;DR: Morel et al. as discussed by the authors extended the correction, originally suggested for the generalized logistic link, to other link functions and distributions when parameters are estimated by GEE, and showed that the small sample correction was effective in reducing the Type I error rates when the number of clusters is relatively small.
Abstract: When clustered multinomial responses are fit using the generalized logistic link, Morel (1989) introduced a small sample correction in the Taylor series based estimator of the covariance matrix of the parameter estimates. The correction reduces the bias of the Type I error rates in small samples and guarantees positive definiteness of the estimated variance-covariance matrix. It is well known that small sample bias in the use of the Delta method persists in any application of the Generalized Estimating Equations (GEE) methodology. In this article, we extend the correction, originally suggested for the generalized logistic link, to other link functions and distributions when parameters are estimated by GEE. In a Monte Carlo study with correlated data generated under different sampling schemes, the small sample correction has been shown to be effective in reducing the Type I error rates when the number of clusters is relatively small.

156 citations
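
Morel's correction itself is not in statsmodels, but a related small-sample covariance correction for GEE (the Mancl and DeRouen bias-reduced estimator) is, which makes the practical point easy to demonstrate. A sketch on simulated clustered Poisson counts with a deliberately small number of clusters:

```python
# Sketch: GEE with robust versus bias-reduced (small-sample) covariance.
# 'bias_reduced' is Mancl-DeRouen, related in spirit to Morel's correction
# but not the same estimator.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n_clusters, m = 20, 5                    # few clusters: small-sample regime
groups = np.repeat(np.arange(n_clusters), m)
u = np.repeat(rng.normal(0, 0.4, n_clusters), m)   # shared cluster effect
x = rng.normal(size=n_clusters * m)
X = sm.add_constant(x)
y = rng.poisson(np.exp(0.2 + 0.3 * x + u))

gee = sm.GEE(y, X, groups=groups, family=sm.families.Poisson(),
             cov_struct=sm.cov_struct.Exchangeable())
print("robust SEs      :", gee.fit(cov_type="robust").bse)
print("bias-reduced SEs:", gee.fit(cov_type="bias_reduced").bse)
```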


Journal ArticleDOI
TL;DR: In this article, the authors extend the negative binomial loglinear model to the case of dependent counts, where dependence among the counts is handled by including linear combinations of random effects in the linear predictor.
Abstract: The Poisson loglinear model is a common choice for explaining variability in counts. However, in many practical circumstances the restriction that the mean and variance are equal is not realistic. Overdispersion with respect to the Poisson distribution can be modeled explicitly by integrating with respect to a mixture distribution, and use of the conjugate gamma mixing distribution leads to a negative binomial loglinear model. This paper extends the negative binomial loglinear model to the case of dependent counts, where dependence among the counts is handled by including linear combinations of random effects in the linear predictor. If we assume that the vector of random effects is multivariate normal, then complex forms of dependence can be modelled by appropriate specification of the covariance structure. Although the likelihood function for the resulting model is not tractable, maximum likelihood estimates (and standard errors) can be found using the NLMIXED procedure in SAS or, in more complicated ex...

121 citations
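
The gamma-mixing argument in the abstract can be checked numerically in a few lines: multiplying a Poisson rate by a Gamma frailty with mean 1 and variance alpha yields counts with the negative binomial variance mu + alpha*mu^2. Values below are illustrative:

```python
# Numerical check: Poisson mixed over a conjugate gamma = negative binomial.
import numpy as np

rng = np.random.default_rng(4)
mu, alpha, n = 3.0, 0.5, 200_000
frailty = rng.gamma(shape=1 / alpha, scale=alpha, size=n)  # mean 1, var alpha
y = rng.poisson(mu * frailty)

print("sample mean:", y.mean())      # ~ mu = 3.0
print("sample var :", y.var())       # ~ mu + alpha * mu**2 = 7.5
```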


Journal ArticleDOI
TL;DR: The Poisson regression model is frequently used to analyze count data, but such data are often overdispersed, or sometimes even underdispersed, relative to the standard Poisson model; Poisson R-squared measures can still be defined in these situations, provided the bias adjustments are adapted accordingly.

108 citations
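
For concreteness, one common deviance-based pseudo R-squared for a Poisson GLM is computed below; the paper's exact definitions and bias adjustments differ, so this is only a sketch on simulated data:

```python
# Sketch: deviance-based pseudo R-squared, 1 - D(model)/D(null).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=500)
X = sm.add_constant(x)
y = rng.poisson(np.exp(0.8 + 0.5 * x))

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
null = sm.GLM(y, np.ones((len(y), 1)), family=sm.families.Poisson()).fit()

print("pseudo R^2:", 1.0 - fit.deviance / null.deviance)
```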


Journal ArticleDOI
TL;DR: It is shown how random terms, describing both yearly variation and overdispersion, can easily be incorporated into models for mark-recovery data, through the use of Bayesian methods, and how the incorporation of the random terms greatly improves the goodness of fit.
Abstract: We show how random terms, describing both yearly variation and overdispersion, can easily be incorporated into models for mark-recovery data, through the use of Bayesian methods. For recovery data on lapwings, we show that the incorporation of the random terms greatly improves the goodness of fit. Omitting the random terms can lead to overestimation of the significance of weather on survival, and overoptimistic prediction intervals in simulations of future population behavior. Random effects models provide a natural way of modeling overdispersion, which is more satisfactory than the standard classical approach of scaling up all standard errors by a uniform inflation factor. We compare models by means of Bayesian p-values and the deviance information criterion (DIC).

72 citations
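
The "uniform inflation factor" approach that the abstract argues against is easy to state in code: estimate c-hat as Pearson chi-squared over residual degrees of freedom and scale every standard error by sqrt(c-hat). A sketch on simulated overdispersed counts (values illustrative):

```python
# Sketch: classical quasi-likelihood inflation of standard errors by sqrt(c-hat).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=400)
X = sm.add_constant(x)
mu = np.exp(0.5 + 0.3 * x)
r = 1.5
y = rng.negative_binomial(r, r / (r + mu))    # overdispersed counts

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
c_hat = fit.pearson_chi2 / fit.df_resid
print("c-hat       :", c_hat)
print("naive SEs   :", fit.bse)
print("inflated SEs:", fit.bse * np.sqrt(c_hat))
```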


Journal ArticleDOI
TL;DR: In this paper, a truncated Poisson regression model is used to arrive at point and interval estimates of the size of two offender populations, i.e. drunk drivers and persons who illegally possess firearms.
Abstract: The truncated Poisson regression model is used to arrive at point and interval estimates of the size of two offender populations, i.e. drunk drivers and persons who illegally possess firearms. The dependent capture‐recapture variables are constructed from Dutch police records and are counts of individual arrests for both violations. The population size estimates are derived assuming that each count is a realization of a Poisson distribution, and that the Poisson parameters are related to covariates through the truncated Poisson regression model. These assumptions are discussed in detail, and the tenability of the second assumption is assessed by evaluating the marginal residuals and performing tests on overdispersion. For the firearms example, the second assumption seems to hold well, but for the drunk drivers example there is some overdispersion. It is concluded that the method is useful, provided it is used with care.

70 citations
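
The estimation idea can be sketched as follows, assuming a zero-truncated Poisson likelihood fitted with scipy and a Horvitz-Thompson-type population size estimate, N-hat = sum over observed cases of 1/(1 - exp(-lam_i)). The data and covariate below are simulated, not the Dutch police records:

```python
# Sketch: zero-truncated Poisson regression and population size estimation.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(7)
N = 5000                                   # true (unknown) population size
x = rng.normal(size=N)
counts = rng.poisson(np.exp(-0.8 + 0.3 * x))
seen = counts > 0                          # only offenders with >= 1 arrest
y = counts[seen]
Xs = np.column_stack([np.ones(seen.sum()), x[seen]])

def negloglik(beta):
    lam = np.exp(Xs @ beta)
    # log pmf of a Poisson truncated at zero
    ll = y * np.log(lam) - lam - gammaln(y + 1) - np.log1p(-np.exp(-lam))
    return -ll.sum()

beta_hat = minimize(negloglik, x0=np.zeros(2), method="BFGS").x
lam_hat = np.exp(Xs @ beta_hat)
N_hat = np.sum(1.0 / (1.0 - np.exp(-lam_hat)))  # Horvitz-Thompson-type estimate
print("true N:", N, " estimated N:", round(N_hat))
```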


Journal ArticleDOI
TL;DR: In this paper, a bivariate zero-inflated negative binomial regression model for count data with excess zeros is proposed, and an estimation method based on the EM and quasi-Newton algorithms is given.

65 citations


Journal ArticleDOI
Abstract: This article proposes a computer-intensive methodology to build bonus-malus scales in automobile insurance. The claim frequency model is taken from Pinquet, Guillen, and Bolance (2001). It accounts for overdispersion, heteroskedasticity, and dependence among repeated observations. Explanatory variables are taken into account in the determination of the relativities, yielding an integrated automobile ratemaking scheme. In that respect, it complements the study of Taylor (1997).

54 citations


Book
04 Aug 2003
TL;DR: In this paper, a review of Ordinary Linear Regression and its Assumptions is presented, along with an analysis of Correlated Data by Ordinary Unweighted Least-Squares Estimation.
Abstract: Preface.Acknowledgments.Acronyms.Introduction.I.1 Newborn Lung Project.I.2 Wisconsin Diabetes Registry.I.3 Wisconsin Sleep Cohort Study.Suggested Reading.1 Review of Ordinary Linear Regression and Its Assumptions.1.1 The Ordinary Linear Regression Equation and Its Assumptions.1.1.1 Straight-Line Relationship.1.1.2 Equal Variance Assumption.1.1.3 Normality Assumption.1.1.4 Independence Assumption.1.2 A Note on How the Least-Squares Estimators are Obtained.Output Packet I: Examples of Ordinary Regression Analyses.2 The Maximum Likelihood Approach to Ordinary Regression.2.1 Maximum Likelihood Estimation.2.2 Example.2.3 Properties of Maximum Likelihood Estimators.2.4 How to Obtain a Residual Plot with PROC MIXED.Output Packet II: Using PROC MIXED and Comparisons to PROC REG.3 Reformulating Ordinary Regression Analysis in Matrix Notation.3.1 Writing the Ordinary Regression Equation in Matrix Notation.3.1.1 Example.3.2 Obtaining the Least-Squares Estimator beta in Matrix Notation.3.2.1 Example: Matrices in Regression Analysis.3.3 List of Matrix Operations to Know.4 Variance Matrices and Linear Transformations.4.1 Variance and Correlation Matrices.4.1.1 Example.4.2 How to Obtain the Variance of a Linear Transformation.4.2.1 Two Variables.4.2.2 Many Variables.5 Variance Matrices of Estimators of Regression Coefficients.5.1 Usual Standard Error of Least-Squares Estimator of Regression Slope in Nonmatrix Formulation.5.2 Standard Errors of Least-Squares Regression Estimators in Matrix Notation.5.2.1 Example.5.3 The Large Sample Variance Matrix of Maximum Likelihood Estimators.5.4 Tests and Confidence Intervals.5.4.1 Example-Comparing PROC REG and PROC MIXED.6 Dealing with Unequal Variance Around the Regression Line.6.1 Ordinary Least Squares with Unequal Variance.6.1.1 Examples.6.2 Analysis Taking Unequal Variance into Account.6.2.1 The Functional Transformation Approach.6.2.2 The Linear Transformation Approach.6.2.3 Standard Errors of Weighted Regression Estimators.Output Packet III: Applying the Empirical Option to Adjust Standard Errors.Output Packet IV: Analyses with Transformation of the Outcome Variable to Equalize Residual Variance.Output Packet V: Weighted Regression Analyses of GHb Data on Age.7 Application of Weighting with Probability Sampling and Nonresponse.7.1 Sample Surveys with Unequal Probability Sampling.7.1.1 Example.7.2 Examining the Impact of Nonresponse.7.2.1 Example (of Reweighting as Well as Some SAS Manipulations).7.2.2 A Few Comments on Weighting by a Variable Versus Including it in the Regression Model.Output Packet VI: Survey and Missing Data Weights.8 Principles in Dealing with Correlated Data.8.1 Analysis of Correlated Data by Ordinary Unweighted Least-Squares Estimation.8.1.1 Example.8.1.2 Deriving the Variance Estimator.8.1.3 Example.8.2 Specifying Correlation and Variance Matrices.8.3 The Least-Squares Equation Incorporating Correlation.8.3.1 Another Application of the Spectral Theorem.8.4 Applying the Spectral Theorem to the Regression Analysis of Correlated Data.8.5 Analysis of Correlated Data by Maximum Likelihood.8.5.1 Nonequal Variance.8.5.2 Correlated Errors.8.5.3 Example.Output Packet VII: Analysis of Longitudinal Data in Wisconsin Sleep Cohort.9 A Further Study of How the Transformation Works with Correlated Data.9.1 Why Would betaW and betaB Differ?9.2 How the Between- and Within-Individual Estimators are Combined.9.3 How to Proceed in Practice.9.3.1 Example.Output Packet VIII: Investigating and Fitting Within- and Between-Individual Effects.10 Random Effects.10.1 Random Intercept.10.1.1 Example.10.1.2 Example.10.2 Random Slopes.10.2.1 Example.10.3 Obtaining "The Best" Estimates of Individual Intercepts and Slopes.10.3.1 Example.Output Packet IX: Fitting Random Effects Models.11 The Normal Distribution and Likelihood Revisited.11.1 PROC GENMOD.11.1.1 Example.Output Packet X: Introducing PROC GENMOD.12 The Generalization to Non-normal Distributions.12.1 The Exponential Family.12.1.1 The Binomial Distribution.12.1.2 The Poisson Distribution.12.1.3 Example.12.2 Score Equations for the Exponential Family and the Canonical Link.12.3 Other Link Functions.12.3.1 Example.13 Modeling Binomial and Binary Outcomes.13.1 A Brief Review of Logistic Regression.13.1.1 Example: Review of the Output from PROC LOGIST.13.2 Analysis of Binomial Data in the Generalized Linear Models Framework.13.2.1 Example of Logistic Regression with Binary Outcome.13.2.2 Example with Binomial Outcome.13.2.3 Some More Examples of Goodness-of-Fit Tests.13.3 Other Links for Binary and Binomial Data.13.3.1 Example.Output Packet XI: Logistic Regression Analysis with PROC LOGIST and PROC GENMOD.Output Packet XII: Analysis of Grouped Binomial Data.Output Packet XIII: Some Goodness-of-Fit Tests for Binomial Outcome.Output Packet XIV: Three Link Functions for Binary Outcome.Output Packet XV: Poisson Regression.Output Packet XVI: Dealing with Overdispersion in Rates.14 Modeling Poisson Outcomes-The Analysis of Rates.14.1 Review of Rates.14.1.1 Relationship Between Rate and Risk.14.2 Regression Analysis.14.3 Example with Cancer Mortality Rates.14.3.1 Example with Hospitalization of Infants.14.4 Overdispersion.14.4.1 Fitting a Dispersion Parameter.14.4.2 Fitting a Different Distribution.14.4.3 Using Robust Standard Errors.14.4.4 Applying Adjustments for Overdispersion to the Examples.Output Packet XV: Poisson Regression.15 Modeling Correlated Outcomes with Generalized Estimating Equations.15.1 A Brief Review and Reformulation of the Normal Distribution, Least Squares and Likelihood.15.2 Further Developments for the Exponential Family.15.3 How are the Generalized Estimating Equations Justified?15.3.1 Analysis of Longitudinal Systolic Blood Pressure by PROC MIXED and GENMOD.15.3.2 Analysis of Longitudinal Hypertension Data by PROC GENMOD.15.3.3 Analysis of Hospitalizations Among VLBW Children Up to Age 5.15.4 Another Way to Deal with Correlated Binary Data.Output Packet XVII: Mixed Versus GENMOD for Longitudinal SBP and Hypertension Data.Output Packet XVIII: Longitudinal Analysis of Rates.Output Packet XIX: Conditional Logistic Regression of Hypertension Data.References.Appendix: Matrix Operations.A.1 Adding Matrices.A.2 Multiplying Matrices by a Number.A.3 Multiplying Matrices by Each Other.A.4 The Inverse of a Matrix.Index.

Posted Content
TL;DR: In this paper, a computer-intensive methodology to build bonus-malus scales in automobile insurance is proposed, based on Pinquet, Guillen and Bolance's claim frequency model.
Abstract: This article proposes a computer-intensive methodology to build bonus-malus scales in automobile insurance. The claim frequency model is taken from Pinquet, Guillen and Bolance (2001). It accounts for overdispersion and dependence over repeated observations. Explanatory variables are taken into account in the determination of the relativities, yielding an integrated automobile ratemaking scheme. In that respect, it complements the study of Taylor (1997).

Book
09 Jan 2003
TL;DR: The generalized linear model - a brief non-technical account of the application of generalized linear models in medical investigations, and some examples of the application of finite mixture densities in medical research.
Abstract: Preface. Prologue. 1. The Generalized Linear Model. 1.1 Introduction. 1.2 The generalized linear model - a brief non-technical account. 1.3 Examples of the application of generalized linear models. 1.4 Poisson regression. 1.5 Overdispersion. 1.6 Summary. 2. Generalized Linear Models for Longitudinal Data. 2.1 Introduction. 2.2 Marginal and conditional regression models. 2.3 Marginal and conditional regression models for continuous responses with Gaussian errors. 2.4 Marginal and conditional regression models for non-normal responses. 2.5 Summary. 3. Missing Values, Drop-outs, Compliance and Intention-to-Treat. 3.1 Introduction. 3.2 Missing values and drop-outs. 3.3 Modelling longitudinal data containing ignorable missing values. 3.4 Non-ignorable missing values. 3.5 Compliance and intention-to-treat. 3.6 Summary. 4. Generalized Additive Models. 4.1 Introduction. 4.2 Scatterplot smoothers. 4.3 Additive and generalized additive models. 4.4 Examples of the application of GAMs. 4.5 Summary. 5. Classification and Regression Trees. 5.1 Introduction. 5.2 Tree-based models. 5.3 Birthweight of babies. 5.4 Summary. 6. Survival Analysis I: Cox's Regression. 6.1 Introduction. 6.2 The survivor function. 6.3 The hazard function. 6.4 Cox's proportional hazards model. 6.5 Left truncation. 6.6 Extending Cox's model by stratification. 6.7 Checking the specification of a Cox model. 6.8 Summary. 7. Survival Analysis II: Time-dependent Covariates, Frailty and Tree Models. 7.1 Introduction. 7.2 Time-dependent covariates. 7.3 Random effects models for survival data. 7.4 Tree-structured survival analysis. 7.5 Summary. 8. Bayesian Methods and Meta-analysis. 8.1 Introduction. 8.2 Bayesian methods. 8.3 Meta-analysis. 8.4 Summary. 9. Exact Inference for Categorical Data. 9.1 Introduction. 9.2 Small expected values in contingency table, Yates' correction and Fisher's exact test. 9.3 Examples of the use of exact p-values. 9.4 Logistic regression and conditional logistic regression for sparse data. 9.5 Summary. 10. Finite Mixture Models. 10.1 Introduction. 10.2 Finite mixture distributions. 10.3 Estimating the parameters in finite mixture models. 10.4 Some examples of the application of finite mixture densities in medical research. 10.5 Latent class analysis - mixtures for binary data. 10.6 Summary. Glossary. Appendix A: Statistical Graphics in Medical Investigations. A.1 Introduction. A.2 Probability plots. A.3 Scatterplots and beyond. A.4 Scatterplot matrices. A.5 Coplots and trellis graphics. Appendix B: Answers to Selected Exercises. References. Index.

Journal ArticleDOI
TL;DR: A zero‐truncated negative binomial mixed regression model is presented to analyse overdispersed positive count data and assists hospital administrators and clinicians to estimate the number of subsequent readmissions based on characteristics of the patient at the index stroke.
Abstract: A zero-truncated negative binomial mixed regression model is presented to analyse overdispersed positive count data. The study is motivated by the determination of pertinent risk factors associated with ischaemic stroke hospitalizations. Random effects are incorporated in the linear predictor to adjust for inter-hospital variations and the dependency of clustered observations using the generalized linear mixed model approach. The method assists hospital administrators and clinicians to estimate the number of subsequent readmissions based on characteristics of the patient at the index stroke. The findings have important implications for resource usage, rehabilitation planning and management of acute stroke care.

Posted Content
TL;DR: The Multivariate Autoregressive Conditional Poisson model (MACP) as discussed by the authors is a multivariate model for time series count data, which makes it possible to deal with issues of discreteness, overdispersion and both auto- and cross-correlation.
Abstract: This paper introduces a new multivariate model for time series count data. The Multivariate Autoregressive Conditional Poisson model (MACP) makes it possible to deal with issues of discreteness, overdispersion (variance greater than the mean) and both auto- and cross-correlation. We model counts as Poisson or double Poisson and assume that conditionally on past observations the means follow a Vector Autoregression. We use a copula to introduce contemporaneous correlation between the series. An important advantage of this model is that it can accommodate both positive and negative correlation among variables. As a feasible alternative to multivariate duration models, the model is applied to the submission of market orders and quote revisions on IBM on the New York Stock Exchange. We show that a single factor cannot explain the dynamics of the market process, which confirms that time deformation, taken as meaning that all market events should accelerate or slow down proportionately, does not hold. We advocate the use of the Multivariate Autoregressive Conditional Poisson model for the study of multivariate point processes in finance, when the number of variables considered simultaneously exceeds two and looking at durations becomes too difficult.

Journal ArticleDOI
TL;DR: In this paper, a generalization of the simple Poisson process is discussed and illustrated with an analysis of some longitudinal count data on frequencies of epileptic fits, showing that some covariates have a more significant effect using this modelling than when using mixed Poisson models.
Abstract: Models based on a generalization of the simple Poisson process are discussed and illustrated with an analysis of some longitudinal count data on frequencies of epileptic fits. The models enable a broad class of discrete distributions to be constructed, which cover a variety of dispersion properties that can be characterized in an intuitive and appealing way by a simple parameterization. This class includes the Poisson and negative binomial distributions, as well as other distributions with greater dispersion than Poisson, and also distributions underdispersed relative to the Poisson distribution. Comparing a number of analyses of the data shows that some covariates have a more significant effect using this modelling than when using mixed Poisson models. It is argued that this could be due to the mixed Poisson models used in the other analyses not providing an appropriate description of the residual variation, with the greater flexibility of the generalized Poisson modelling generally enabling more critical...
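
The birth-process construction can be made concrete numerically. The sketch below is not the paper's exact parameterization: it takes the count to be the state of a pure birth process at time 1 with assumed rates lam_i = a*(i+1)^c, so that c = 0 recovers the Poisson distribution while c > 0 and c < 0 give over- and underdispersion respectively:

```python
# Sketch: count distributions from a truncated pure birth process generator.
import numpy as np
from scipy.linalg import expm

def birth_process_pmf(a, c, K=100):
    lam = a * (np.arange(K) + 1.0) ** c      # birth rate in state i
    Q = np.zeros((K + 1, K + 1))             # truncated generator matrix
    Q[np.arange(K), np.arange(K)] = -lam
    Q[np.arange(K), np.arange(K) + 1] = lam
    return expm(Q)[0]                        # distribution at t = 1, start at 0

for c in (-0.5, 0.0, 0.5):
    p = birth_process_pmf(a=3.0, c=c)
    k = np.arange(len(p))
    mean = (k * p).sum()
    var = (k ** 2 * p).sum() - mean ** 2
    print(f"c = {c:+.1f}: mean = {mean:.2f}, var = {var:.2f}")
```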

Journal ArticleDOI
TL;DR: In this paper, the authors present an integrated set of Bayesian tools one can use to model heterogeneous event counts with overdispersion and contextual heterogeneity, and compare the estimates from this model with traditional approaches with little start-up cost.
Abstract: This article presents an integrated set of Bayesian tools one can use to model heterogeneous event counts. While models for event count cross sections are now widely used, little has been written about how to model counts when contextual factors introduce heterogeneity. The author begins with a discussion of Bayesian cross-sectional count models and discusses an alternative model for counts with overdispersion. To illustrate the Bayesian framework, the author fits the model to the number of women’s rights cosponsorships for each member of the 83rd to 102nd House of Representatives. The model is generalized to allow for contextual heterogeneity. The hierarchical model allows one to explicitly model contextual factors and test alternative contextual explanations, even with a small number of contextual units. The author compares the estimates from this model with traditional approaches and discusses software one can use to easily implement these Bayesian models with little start-up cost.

Journal ArticleDOI
TL;DR: In this article, the authors discuss the application of local influence and residual analysis through deviance residuals in nonlinear negative binomial models and derive appropriate matrices for assessing the local influence on the parameter estimates by considering the likelihood displacement as the influence measure.
Abstract: Nonlinear negative binomial models represent a general class of nonlinear regression models that may be applied to fit growth curves for overdispersed count data. We discuss in this article the application of local influence and residual analysis through deviance residuals in nonlinear negative binomial models. We derive the appropriate matrices for assessing the local influence on the parameter estimates by considering the likelihood displacement as the influence measure. An example in which different growth curves for count data are compared is given for illustration.
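
For illustration, deviance residuals for a negative binomial fit can be computed as below. The paper treats nonlinear growth curves and local influence; this sketch uses a simple loglinear fit on simulated data:

```python
# Sketch: deviance residuals for an NB2 fit, d_i = sign(y - mu) *
# sqrt(2*[y*log(y/mu) - (y + 1/alpha)*log((1 + alpha*y)/(1 + alpha*mu))]).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.normal(size=300)
X = sm.add_constant(x)
mu_true = np.exp(1.0 + 0.4 * x)
r = 2.0
y = rng.negative_binomial(r, r / (r + mu_true))

res = sm.NegativeBinomial(y, X).fit(disp=False)
alpha = res.params[-1]
mu = np.exp(X @ res.params[:-1])

# y*log(y/mu) is taken as 0 when y = 0.
term1 = np.where(y > 0, y * np.log(np.where(y > 0, y, 1) / mu), 0.0)
term2 = (y + 1 / alpha) * np.log((1 + alpha * y) / (1 + alpha * mu))
dev_res = np.sign(y - mu) * np.sqrt(2 * np.maximum(term1 - term2, 0.0))
print("largest |deviance residual|:", np.abs(dev_res).max())
```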

DOI
01 Jan 2003
TL;DR: Semiparametric count data models which can deal with overdispersion in count data regression by incorporating corresponding components in structured additive form into the predictor are developed and studied.
Abstract: Overdispersion in count data regression is often caused by neglect of, or inappropriate modelling of, individual heterogeneity, temporal or spatial correlation, and nonlinear covariate effects. In this paper, we develop and study semiparametric count data models which can deal with these issues by incorporating corresponding components in structured additive form into the predictor. The models are fully Bayesian and inference is carried out by computationally efficient MCMC techniques. In a simulation study, we investigate how well the different components can be identified with the data at hand. The approach is applied to a large data set of claim frequencies from car insurance.

Journal ArticleDOI
TL;DR: The relationship between photochemical air pollutants and emergency room admissions for asthma in Madrid (Spain) for the period 1995–1998 was analysed using the statistical models commonly used to study the short-term effects of air pollution on health: linear and Cochrane-Orcutt regression, standard Poisson and Poisson corrected for overdispersion, Poisson autoregressive models, and generalised additive models.
Abstract: The relationship between photochemical air pollutants (nitrogen dioxide and ozone) and emergency room admissions for asthma in Madrid (Spain) for the period 1995–1998 was analysed using the statistical models commonly used to study the short-term effects of air pollution on health: linear and Cochrane-Orcutt regression, standard Poisson and Poisson corrected for overdispersion, Poisson autoregressive models, and generalised additive models. Linear regression models presented residual autocorrelation, Poisson regression models also showed overdispersion, and generalised additive models did not show residual autocorrelation while overdispersion was substantially reduced. Linear models provided biased estimates because our health outcome is non-normally distributed. Estimates from Poisson regression allowing for overdispersion and autocorrelation did not differ substantially from those reported by generalised additive models, which presented the best model fit in terms of the absence of autocorrelation and the reduction of overdispersion.
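
The "Poisson corrected for overdispersion" comparison can be reproduced in miniature with statsmodels' Pearson-based scale option (quasi-Poisson): coefficients are unchanged but standard errors widen. The pollutant series and admission counts below are simulated and purely illustrative:

```python
# Sketch: plain Poisson versus quasi-Poisson (Pearson-chi2 scale) on
# simulated overdispersed daily admission counts.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
no2 = rng.normal(50, 10, size=365)            # hypothetical pollutant series
X = sm.add_constant(no2)
mu = np.exp(0.5 + 0.01 * no2)
r = 3.0
y = rng.negative_binomial(r, r / (r + mu))    # overdispersed admissions

plain = sm.GLM(y, X, family=sm.families.Poisson()).fit()
quasi = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")
print("plain SEs:", plain.bse)                # same coefficients...
print("quasi SEs:", quasi.bse)                # ...wider uncertainty
```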

Posted Content
TL;DR: In this paper, a fully parametric approach is taken and a marginal distribution for the counts is specified, where conditional on past observations the mean is autoregressive; a variety of models based on the double Poisson distribution of Efron (1986) is introduced, which in a first step introduce an additional dispersion parameter and in a second step make this dispersion parameter time-varying.
Abstract: This paper introduces and evaluates new models for time series count data. The Autoregressive Conditional Poisson model (ACP) makes it possible to deal with issues of discreteness, overdispersion (variance greater than the mean) and serial correlation. A fully parametric approach is taken and a marginal distribution for the counts is specified, where conditional on past observations the mean is autoregressive. This makes it possible to attain improved inference on coefficients of exogenous regressors relative to static Poisson regression, which is the main concern of the existing literature, while modelling the serial correlation in a flexible way. A variety of models based on the double Poisson distribution of Efron (1986) is introduced, which in a first step introduce an additional dispersion parameter and in a second step make this dispersion parameter time-varying. All models are estimated using maximum likelihood, which makes the usual tests available. In this framework autocorrelation can be tested with a straightforward likelihood ratio test, whose simplicity is in sharp contrast with test procedures in the latent variable time series count model of Zeger (1988). The models are applied to the time series of monthly polio cases in the U.S. between 1970 and 1983, as well as to the daily number of $0.75 price-change durations on the IBM stock. A $0.75 price-change duration is defined as the time it takes the stock price to move by at least $0.75. The variable of interest is the daily number of such durations, which is a measure of intradaily volatility: the more volatile the stock price is within a day, the larger the counts will be. The ACP models provide good density forecasts of this measure of volatility.

Journal ArticleDOI
TL;DR: The main sources of overdispersion are identified according to the different levels of perception of mastitis risk, and the proposed solutions to control for overdispersion at each study level are discussed.
Abstract: Modelling case occurrence and risk factors for clinical mastitis, a key multifactorial disease in the dairy cow, requires statistical models. The type of model used depends on the chosen level of perception for the study: herd, lactation, animal, udder or quarter. The validity of the tests performed with these models is ensured in particular when hypotheses of independence between statistical units are respected, and when the fitted models do not exhibit overdispersion relative to the observed data. In this article, the main sources of overdispersion are identified according to the different levels of perception of mastitis risk. The solutions proposed to control for overdispersion at each study level are then discussed, and the difficulty of comparing study results, given the variety of methodological choices made by authors, is highlighted. Two main categories of models are used for modelling clinical mastitis: generalist exploratory models and explanatory designed models. The contribution of explanatory models to improving modelling accuracy and relevance is documented through the two main published methodological approaches, the first being based on a states model and the second on a survival model. The integration and optimisation of such explanatory modelling methods should be possible in the future, in order to develop a more global explanatory model including herd risk factors, which could pertinently predict udder infections (both clinical and subclinical) at the cow, lactation, or even udder and quarter levels.

01 Jan 2003
TL;DR: In this article, the authors investigate two sets of overdispersed models for when the Poisson distribution does not fit count data: a class of Poisson mixtures with Tweedie mixing distributions and a class of exponential dispersion models with a unit variance function of the form µ + µ^p, where p is a real number.
Abstract: We investigate two sets of overdispersed models for when the Poisson distribution does not fit count data: a class of Poisson mixtures with Tweedie mixing distributions and a class of exponential dispersion models which have a unit variance function of the form µ + µ^p, where p is a real number. These two classes generalize the negative binomial distribution, which is classically used in the framework of regression models for count data when overdispersion results in a lack of fit of the Poisson regression model. Some properties are then studied and discussed.
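
A quick numpy check of the unit variance function V(µ) = µ + µ^p: with p = 2 (and unit dispersion) it is the negative binomial case, generated here by gamma mixing:

```python
# Numerical check of V(mu) = mu + mu**p at p = 2 (negative binomial case).
import numpy as np

rng = np.random.default_rng(10)
mu, n = 4.0, 200_000

# Poisson mixed over a Gamma with mean 1 and variance 1:
y = rng.poisson(mu * rng.gamma(shape=1.0, scale=1.0, size=n))

p = 2
print("sample variance  :", y.var())
print("mu + mu**p, p = 2:", mu + mu ** p)    # 4 + 16 = 20
```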

Journal ArticleDOI
TL;DR: This work defines a class of generalized log-linear models with random effects, which contains most standard models currently used for categorical data analysis and suggests some new models that are useful for applications such as smoothing large contingency tables and modeling heterogeneity in odds ratios.
Abstract: We define a class of generalized log-linear models with random effects. For a vector μ of Poisson or multinomial means and matrices of constants C and A, the model has the form C log(Aμ) = Xβ + Zu, where β are fixed effects and u are random effects. The model contains most standard models currently used for categorical data analysis. We suggest some new models that are special cases of this model and are useful for applications such as smoothing large contingency tables and modeling heterogeneity in odds ratios. We present examples of its use for such applications. In many cases, maximum likelihood model fitting can be handled with existing methods and software. We outline extensions of model fitting methods for other cases. We also summarize several challenges for future research, such as fitting the model in its most general form and deriving properties of estimates used in smoothing contingency tables.

Journal ArticleDOI
TL;DR: Five methods, four of them based on bootstrap techniques, are compared empirically by their application to mortality data due to cardiovascular diseases in women from Navarra, Spain, during the period 1988-1994, finding that the BC method seems to provide better coverage probabilities in the case studied, according to a small scale simulation study.
Abstract: Several analyses of the geographic variation of mortality rates have been proposed in the literature. Poisson models allowing the incorporation of random effects to model extra-variability are widely used. The typical modelling approach uses normal random effects to accommodate local spatial autocorrelation. When spatial autocorrelation is absent but overdispersion persists, a discrete mixture model is an alternative approach. However, a technique for identifying regions with significantly high or low risk in a given area has not yet been developed for the discrete mixture model. Taking into account the importance of this information for epidemiologists formulating hypotheses about the potential risk factors affecting the population, different procedures for obtaining confidence intervals for relative risks are derived in this paper. These methods are the standard information-based method and four others, all based on bootstrap techniques, namely the asymptotic-bootstrap, the percentile-bootstrap, the BC-bootstrap and the modified information-based method. All of them are compared empirically by their application to mortality data due to cardiovascular diseases in women from Navarra, Spain, during the period 1988-1994. In the small area example considered here, we find that the information-based method is adequate for estimating standard errors of the component means in the discrete mixture model, but it is not appropriate for providing standard errors of the estimated relative risks and hence for constructing confidence intervals for the relative risk associated with each region. Therefore, the bootstrap-based methods are recommended for this purpose. More specifically, the BC method seems to provide better coverage probabilities in the case studied, according to a small scale simulation study carried out using a scenario as encountered in the analysis of the real data.
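
The percentile bootstrap, the simplest of the bootstrap methods compared, can be sketched as follows. The observed and expected counts are made up, and the paper's actual procedure resamples from the fitted discrete mixture rather than a single Poisson:

```python
# Sketch: parametric percentile-bootstrap interval for a relative risk.
import numpy as np

rng = np.random.default_rng(11)
observed, expected = 34, 25.0                 # hypothetical region
rr_hat = observed / expected

B = 5000
boot_rr = rng.poisson(observed, size=B) / expected
lo, hi = np.percentile(boot_rr, [2.5, 97.5])
print(f"RR = {rr_hat:.2f}, 95% percentile CI = ({lo:.2f}, {hi:.2f})")
```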

Posted ContentDOI
01 Jan 2003
TL;DR: In this article, the authors investigate the issues related to proper modelling of the fast decay process and the associated long tails in recreation demand analysis and introduce two categories of alternative count data models.
Abstract: Since the early 1990s, researchers have routinely used count data models (such as the Poisson and negative binomial) to estimate the demand for recreational activities. Along with the success and popularity of count data models in recreational demand analysis during the last decade, a number of shortcomings of standard count data models became obvious to researchers. This has led to the development of new and more sophisticated model specifications. Furthermore, semi-parametric and non-parametric approaches have also made their way into count data models. Despite these advances, however, one interesting issue has received little research attention in this area. This is related to the fast decay process of the dependent variable and the associated long tail. This phenomenon is observed quite frequently in recreational demand studies; most recreationists make one or two trips, while a few of them make an exceedingly large number of trips. This introduces an extreme form of overdispersion that is difficult to address in popular count data models. The major objective of this paper is to investigate the issues related to proper modelling of the fast decay process and the associated long tails in recreation demand analysis. For this purpose, we introduce two categories of alternative count data models. The first group includes four alternative count data models, each characterised by a single parameter, while the second group includes one count data model characterised by two parameters. This paper demonstrates how these alternative models can be used to properly model the fast decay process and the associated long tail commonly observed in recreation demand analysis. The first four alternative count data models are based on an adaptation of the geometric, Borel, logarithmic and Yule probability distributions to count data models, while the second group of models relies on the use of the generalised Poisson probability distribution. All these alternative count data models are empirically implemented using the maximum likelihood estimation procedure and applied to study the demand for moose hunting in Northern Ontario. Econometric results indicate that most of the alternative count data models proposed in this paper are able to capture the fast decay process characterising the number of moose hunting trips. Overall they seem to perform as well as the conventional negative binomial model and better than the Poisson specification. However, further investigation of the econometric results reveals that the geometric and generalised Poisson model specifications fare better than the modified Borel and Yule regression models.
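
Of the models above, the two-parameter generalised Poisson regression is available directly in statsmodels. A sketch on simulated long-tailed "trip counts" (not the Ontario data):

```python
# Sketch: generalised Poisson regression on simulated heavy-tailed counts.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.discrete_model import GeneralizedPoisson

rng = np.random.default_rng(12)
n = 1500
x = rng.normal(size=n)
X = sm.add_constant(x)
mu = np.exp(0.2 + 0.4 * x)
r = 0.7                                        # heavy right tail
y = rng.negative_binomial(r, r / (r + mu))     # stand-in for trip counts

gp = GeneralizedPoisson(y, X, p=1).fit(disp=False)
print(gp.summary())
```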

01 Jan 2003
TL;DR: The test results show that the negative binomial regression model, rather than the Poisson regression model, better reflects the traffic accident frequencies of taxi drivers in this study.
Abstract: This study aims to establish a model that accounts for the overdispersion frequently observed in count data, and applies it to traffic accidents by taxi drivers. The model usually used in current research on count data is based on linear regression analysis because of its simple concept and easy usage. But the linear regression model has the following shortcomings: it does not reflect the non-negative integer nature of count data, its forecasting results are unreliable, and it does not provide discrete probabilities for traffic accident frequencies. A model appropriate for analyzing count data is the Poisson regression model. However, the Poisson regression model is based on the assumption that the mean of the distribution is equal to its variance. Most count data have variance greater than the mean, which results in underestimated standard errors. In this study, an overdispersion test is used to estimate the degree of statistical difference between mean and variance. The test result shows that the negative binomial regression model, rather than the Poisson regression model, better reflects the traffic accident frequencies of taxi drivers in this study. A likelihood ratio test and Theil's inequality coefficient test are applied to check the validity of the variables and the accuracy of the estimated model.
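
A regression-based overdispersion test of this kind can be sketched with the Cameron-Trivedi auxiliary regression (which may differ in detail from the test actually used in the study): under the Poisson null the slope is zero, and a significantly positive slope indicates variance growing faster than the mean:

```python
# Sketch: auxiliary-regression overdispersion test for Var = mu + a*mu^2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
x = rng.normal(size=800)
X = sm.add_constant(x)
mu_true = np.exp(0.3 + 0.2 * x)
r = 2.0
y = rng.negative_binomial(r, r / (r + mu_true))  # overdispersed 'accidents'

mu = sm.GLM(y, X, family=sm.families.Poisson()).fit().mu
aux_dep = ((y - mu) ** 2 - y) / mu               # auxiliary response
aux = sm.OLS(aux_dep, mu).fit()                  # regress on mu, no constant
print("slope t-statistic:", aux.tvalues[0])      # >> 0 => overdispersion
```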

Journal Article
TL;DR: In this article, the so-called indirect method of inference is applied to survival analyses of two data sets with repeated events, and the methodology is applied in both parametric and semi-parametric regression analyses to accommodate random effects and covariate measurement error.
Abstract: In this paper we describe the so-called “indirect” method of inference, originally developed in the econometrics literature, and apply it to survival analyses of two data sets with repeated events. This method is often more convenient computationally than maximum likelihood estimation when handling such model complexities as random effects and measurement error, for example; and it can also serve as a basis for robust inference with less stringent assumptions on the data generating mechanism. The first data set concerns recurrence times of mammary tumors in rats and is modeled using a Poisson process model with covariates and frailties. The second data set involves times of recurrences of skin tumors in individual patients in a clinical trial. The methodology is applied in both parametric and semi-parametric regression analyses to accommodate random effects and covariate measurement error. MSC: 62N01, 62N02, 62P10

Posted Content
TL;DR: In this paper, the authors investigate the issues related to proper modelling of the fast decay process and the associated long tails in recreation demand analysis and introduce two categories of alternative count data models.
Abstract: Since the early 1990s, researchers have routinely used count data models (such as the Poisson and negative binomial) to estimate the demand for recreational activities. Along with the success and popularity of count data models in recreational demand analysis during the last decade, a number of shortcomings of standard count data models became obvious to researchers. This has led to the development of new and more sophisticated model specifications. Furthermore, semi-parametric and non-parametric approaches have also made their way into count data models. Despite these advances, however, one interesting issue has received little research attention in this area. This is related to the fast decay process of the dependent variable and the associated long tail. This phenomenon is observed quite frequently in recreational demand studies; most recreationists make one or two trips, while a few of them make an exceedingly large number of trips. This introduces an extreme form of overdispersion that is difficult to address in popular count data models. The major objective of this paper is to investigate the issues related to proper modelling of the fast decay process and the associated long tails in recreation demand analysis. For this purpose, we introduce two categories of alternative count data models. The first group includes four alternative count data models, each characterised by a single parameter, while the second group includes one count data model characterised by two parameters. This paper demonstrates how these alternative models can be used to properly model the fast decay process and the associated long tail commonly observed in recreation demand analysis. The first four alternative count data models are based on an adaptation of the geometric, Borel, logarithmic and Yule probability distributions to count data models, while the second group of models relies on the use of the generalised Poisson probability distribution. All these alternative count data models are empirically implemented using the maximum likelihood estimation procedure and applied to study the demand for moose hunting in Northern Ontario. Econometric results indicate that most of the alternative count data models proposed in this paper are able to capture the fast decay process characterising the number of moose hunting trips. Overall they seem to perform as well as the conventional negative binomial model and better than the Poisson specification. However, further investigation of the econometric results reveals that the geometric and generalised Poisson model specifications fare better than the modified Borel and Yule regression models.

Journal ArticleDOI
TL;DR: In this paper, a generalised estimating equations procedure is discussed for the estimation of the parameters of the Poisson-log-normal mixed model under a longitudinal set-up, which is useful for accommodating the overdispersion and correlations often observed among count data.
Abstract: Poisson mixed models are useful for accommodating the overdispersion and correlations often observed among count data. These models are generated from the well-known independent Poisson model by adding normally distributed random effects to the linear predictor, and they are known as Poisson-log-normal mixed models. Unfortunately, a full likelihood analysis for such Poisson mixed models is hampered by the need for numerical integration. In this paper, a generalised estimating equations procedure is discussed for the estimation of the parameters of the Poisson-log-normal mixed model under a longitudinal set-up. Keywords: clustered count data, higher order structural moments, mixed effects, multivariate repeated measures, normal based higher order longitudinal moments.
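
A simulation check of the Poisson-log-normal moment structure that such estimating-equations approaches exploit: with a normal random effect with variance s2 in the log link, the marginal mean is exp(eta + s2/2) and the marginal variance is mean + mean^2*(exp(s2) - 1). Values are illustrative:

```python
# Numerical check of Poisson-log-normal marginal moments.
import numpy as np

rng = np.random.default_rng(14)
eta, s2, n = 1.0, 0.3, 500_000
u = rng.normal(0.0, np.sqrt(s2), size=n)
y = rng.poisson(np.exp(eta + u))

m = np.exp(eta + s2 / 2)
print("mean:", y.mean(), "vs", m)
print("var :", y.var(), "vs", m + m ** 2 * np.expm1(s2))
```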