
Showing papers on "Poisson regression" published in 2005


Book
15 Aug 2005
TL;DR: In this book, the authors develop multilevel and longitudinal models, beginning with a linear variance-components model for expiratory flow (Mini Wright) measurements and progressing through random-intercept and random-coefficient models to random-intercept Poisson regression and a three-level logistic random-intercept model.
Abstract: Preface
LINEAR VARIANCE-COMPONENTS MODELS: Introduction; How reliable are expiratory flow measurements?; The variance-components model; Modeling the Mini Wright measurements; Estimation methods; Assigning values to the random intercepts; Summary and further reading; Exercises
LINEAR RANDOM-INTERCEPT MODELS: Introduction; Are tax preparers useful?; The longitudinal data structure; Panel data and correlated residuals; The random-intercept model; Different kinds of effects in panel models; Endogeneity and between-taxpayer effects; Residual diagnostics; Summary and further reading; Exercises
LINEAR RANDOM-COEFFICIENT AND GROWTH-CURVE MODELS: Introduction; How effective are different schools?; Separate linear regressions for each school; The random-coefficient model; How do children grow?; Growth-curve modeling; Two-stage model formulation; Prediction of trajectories for individual children; Complex level-1 variation or heteroskedasticity; Summary and further reading; Exercises
DICHOTOMOUS OR BINARY RESPONSES: Models for dichotomous responses; Which treatment is best for toenail infection?; The longitudinal data structure; Population-averaged or marginal probabilities; Random-intercept logistic regression; Subject-specific vs. population-averaged relationships; Maximum likelihood estimation using adaptive quadrature; Empirical Bayes (EB) predictions; Other approaches to clustered dichotomous data; Summary and further reading; Exercises
ORDINAL RESPONSES: Introduction; Cumulative models for ordinal responses; Are antipsychotic drugs effective for patients with schizophrenia?; Longitudinal data structure and graphs; A proportional-odds model; A random-intercept proportional-odds model; A random-coefficient proportional-odds model; Marginal and patient-specific probabilities; Do experts differ in their grading of student essays?; A random-intercept model with grader bias; Including grader-specific measurement error variances; Including grader-specific thresholds; Summary and further reading; Exercises
COUNTS: Introduction; Types of counts; Poisson model for counts; Did the German health-care reform reduce the number of doctor visits?; Longitudinal data structure; Poisson regression ignoring overdispersion and clustering; Poisson regression with overdispersion but ignoring clustering; Random-intercept Poisson regression; Random-coefficient Poisson regression; Other approaches to clustered counts; Which Scottish counties have a high risk of lip cancer?; Standardized mortality ratios; Random-intercept Poisson regression; Nonparametric maximum likelihood estimation; Summary and further reading; Exercises
HIGHER LEVEL MODELS AND NESTED RANDOM EFFECTS: Introduction; Which method is best for measuring expiratory flow?; Two-level variance-components models; Three-level variance-components models; Did the Guatemalan immunization campaign work?; A three-level logistic random-intercept model; Summary and further reading; Exercises
CROSSED RANDOM EFFECTS: Introduction; How does investment depend on expected profit and capital stock?; A two-way error-components model; How much do primary and secondary schools affect attainment at age 16?; An additive crossed random-effects model; Including a random interaction; A trick requiring fewer random effects; Summary and further reading; Exercises
APPENDIX A: Syntax for gllamm, eq, and gllapred
APPENDIX B: Syntax for gllamm
APPENDIX C: Syntax for gllapred
APPENDIX D: Syntax for gllasim
References; Author Index; Subject Index

4,086 citations


Journal ArticleDOI
TL;DR: Geographically weighted Poisson regression (GWPR) and its semi-parametric variant are described as new statistical tools for analysing disease maps arising from spatially non-stationary processes, giving disease analysts an important new set of methods.
Abstract: This paper describes geographically weighted Poisson regression (GWPR) and its semi-parametric variant as a new statistical tool for analysing disease maps arising from spatially non-stationary processes. The method is a type of conditional kernel regression which uses a spatial weighting function to estimate spatial variations in Poisson regression parameters. It enables us to draw surfaces of local parameter estimates which depict spatial variations in the relationships between disease rates and socio-economic characteristics. The method therefore can be used to test the general assumption made, often without question, in the global modelling of spatial data that the processes being modelled are stationary over space. Equally, it can be used to identify parts of the study region in which 'interesting' relationships might be occurring and where further investigation might be warranted. Such exceptions can easily be missed in traditional global modelling and therefore GWPR provides disease analysts with an important new set of statistical tools. We demonstrate the GWPR approach applied to a data set of working-age deaths in the Tokyo metropolitan area, Japan. The results indicate that there are significant spatial variations (that is, variation beyond that expected from random sampling) in the relationships between working-age mortality and occupational segregation and between working-age mortality and unemployment throughout the Tokyo metropolitan area and that, consequently, the application of traditional 'global' models would yield misleading results.

440 citations
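A minimal sketch of the core GWPR computation, on simulated data rather than the Tokyo data set: at each target location a Gaussian kernel downweights distant observations and a Poisson regression is fitted by iteratively reweighted least squares, so the estimated slope varies over space. The data, bandwidth h, and the function local_poisson_fit are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the GWPR idea: kernel-weighted Poisson IRLS at a
# target location yields local coefficient estimates that form a surface.
import numpy as np

rng = np.random.default_rng(0)

# hypothetical data: n areas with coordinates, one covariate, counts
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
slope_true = 0.5 + 0.1 * coords[:, 0]        # coefficient drifts over space
y = rng.poisson(np.exp(0.2 + slope_true * x))
X = np.column_stack([np.ones(n), x])

def local_poisson_fit(target, coords, X, y, h=2.0, n_iter=25):
    """Kernel-weighted Poisson IRLS at one target location (Gaussian kernel)."""
    d2 = ((coords - target) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * h ** 2))           # spatial weights
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        z = X @ beta + (y - mu) / mu          # IRLS working response
        W = w * mu                            # kernel weight times IRLS weight
        XtW = X.T * W
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# local estimates at two target locations show the spatial variation
for target in [(2.0, 5.0), (8.0, 5.0)]:
    b = local_poisson_fit(np.array(target), coords, X, y)
    print(target, "local slope:", round(b[1], 3))
```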


Journal ArticleDOI
01 Dec 2005-Chest
TL;DR: The BODE staging system, which includes physiologic and clinical variables in addition to FEV1, helps to better predict hospitalization for COPD.

241 citations


Journal ArticleDOI
TL;DR: Inferences from studies of weather and mortality using the ambidirectional or time-stratified case-crossover approaches and the time-series analyses are comparable and provide consistent findings in this study.
Abstract: Background: Time-series analyses have been used for decades to investigate time-varying environmental exposures. Recently, the case-crossover design has been applied to assess acute effects of air pollution. Our objective was to compare time-series and case-crossover analyses using varying referent periods (i.e., unidirectional, ambidirectional, and time-stratified). Methods: We examined the association between temperature and cardiorespiratory mortality among the elderly population in the 20 largest metropolitan areas of the United States. Risks were estimated by season and geographic region in 1992. We obtained weather data from the National Climatic Data Center and mortality data from the Division of Vital Statistics. Conditional logistic regression (case-crossover) and Poisson regression (time-series) were used to estimate the increased risk of cardiorespiratory mortality associated with a 10°F increase in daily temperature, accounting for dew-point temperature and other potential confounding factors. Results: In the time-stratified case-crossover analysis, the strongest associations were found in the summer; in the Southwest, Southeast, Northwest, Northeast, and Midwest, the odds ratios were 1.15 (95% confidence interval 1.07‐1.24), 1.10 (0.96‐1.27), 1.08 (0.92‐1.26), 1.08 (1.02‐1.15), and 1.01 (0.92‐1.11), respectively. Mostly null or negative associations were found in the winter, spring, and fall. The ambidirectional case-crossover and the time-series analyses produced quantitatively similar results to those from the time-stratified analysis. The unidirectional analysis produced conflicting results. Conclusions: Inferences from studies of weather and mortality using the ambidirectional or time-stratified case-crossover approaches and the time-series analyses are comparable and provide consistent findings in this study. (Epidemiology 2005;16:58‐66)

178 citations
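A sketch of the time-series arm of such a comparison, assuming simulated daily data and statsmodels' GLM; month indicator variables stand in for the richer seasonal smoothers used in practice.

```python
# Poisson regression of daily deaths on temperature and dew point with
# simple seasonal controls; the exponentiated coefficient gives the risk
# ratio per 10-degree F increase, as in the comparison described above.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
days = pd.date_range("1992-01-01", "1992-12-31", freq="D")
temp = (55 + 25 * np.sin(2 * np.pi * (np.arange(len(days)) - 90) / 365)
        + rng.normal(0, 5, len(days)))
dewpt = temp - 10 + rng.normal(0, 3, len(days))
deaths = rng.poisson(np.exp(3.0 + 0.005 * temp))      # true RR(10F) = e^0.05

df = pd.DataFrame({"deaths": deaths, "temp": temp, "dewpt": dewpt,
                   "month": days.month.astype(str)})
X = pd.get_dummies(df[["temp", "dewpt", "month"]], drop_first=True, dtype=float)
X = sm.add_constant(X)

fit = sm.GLM(df["deaths"], X, family=sm.families.Poisson()).fit()
print("RR per 10F increase:", round(np.exp(10 * fit.params["temp"]), 3))
```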


Journal ArticleDOI
TL;DR: In this article, a Poisson log-bilinear projection model is applied to the forecasting of the gender- and age-specific mortality rates for Belgium on the basis of mortality statistics relating to the period 1950-2000.
Abstract: This paper proposes bootstrap procedures for expected remaining lifetimes and life annuity single premiums in a dynamic mortality environment. Assuming a further continuation of the stable pace of mortality decline, a Poisson log-bilinear projection model is applied to the forecasting of the gender- and age-specific mortality rates for Belgium on the basis of mortality statistics relating to the period 1950-2000. Bootstrap procedures are then used to obtain confidence intervals on various actuarial quantities.

166 citations
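A compact sketch of a Poisson log-bilinear (Lee-Carter type) fit plus the parametric bootstrap idea, on simulated data rather than the Belgian statistics; the SVD fit and the drift quantity are illustrative simplifications of the actuarial targets in the paper.

```python
# Fit log(m[x,t]) = a[x] + b[x]*k[t] by SVD, then bootstrap death counts
# from the fitted Poisson means to get a confidence interval on the drift
# of the time index k[t].
import numpy as np

rng = np.random.default_rng(2)
n_age, n_year = 40, 30
expo = np.full((n_age, n_year), 10_000.0)            # exposures to risk
a = np.linspace(-7, -3, n_age)                       # age pattern
b = np.full(n_age, 1.0 / n_age)
k = -1.5 * np.arange(n_year)                         # declining mortality
deaths = rng.poisson(expo * np.exp(a[:, None] + np.outer(b, k)))

def fit_log_bilinear(deaths, expo):
    logm = np.log(np.maximum(deaths, 0.5) / expo)    # guard against log(0)
    a_hat = logm.mean(axis=1)
    U, s, Vt = np.linalg.svd(logm - a_hat[:, None], full_matrices=False)
    b_hat = U[:, 0] / U[:, 0].sum()                  # usual normalisation
    k_hat = s[0] * Vt[0] * U[:, 0].sum()
    return a_hat, b_hat, k_hat

a_hat, b_hat, k_hat = fit_log_bilinear(deaths, expo)
fitted = expo * np.exp(a_hat[:, None] + np.outer(b_hat, k_hat))

# parametric bootstrap: resample deaths from the fitted Poisson means
drifts = []
for _ in range(500):
    d_star = rng.poisson(fitted)
    _, _, k_star = fit_log_bilinear(d_star, expo)
    drifts.append((k_star[-1] - k_star[0]) / (n_year - 1))
lo, hi = np.percentile(drifts, [2.5, 97.5])
print(f"drift of k[t]: 95% bootstrap CI ({lo:.3f}, {hi:.3f})")
```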


Journal ArticleDOI
TL;DR: In this paper, several parametric zero-inflated count distributions, including the ZIP, ZINB, ZIGP and ZIDP, were presented to accommodate the excess zeros for insurance claim count data.
Abstract: On some occasions, claim frequency data in general insurance may not follow the traditional Poisson distribution; in particular, they may be zero-inflated. Extra dispersion appears as the number of observed zeros exceeding the number of expected zeros under the Poisson or even the negative binomial distribution assumption. This paper presents several parametric zero-inflated count distributions, including the ZIP, ZINB, ZIGP and ZIDP, to accommodate the excess zeros in insurance claim count data. Different count distributions in the second component are considered to allow flexibility in controlling the distribution shape. The generalized Pearson χ2 statistic, Akaike's information criterion (AIC) and the Bayesian information criterion (BIC) are used as goodness-of-fit and model selection measures. With the presence of extra zeros in a data set of automobile insurance claims, our results show that zero-inflated count data models, and in particular the zero-inflated double Poisson regression model, provide a good fit to the data.

160 citations
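A minimal sketch of the model-selection step, assuming statsmodels' ZeroInflatedPoisson and simulated claim-like counts; AIC/BIC choose between a plain Poisson fit and a ZIP fit, as in the paper's comparison of count distributions.

```python
# Compare Poisson and zero-inflated Poisson fits on counts with excess zeros.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
X = sm.add_constant(x)

# simulate zero-inflated claim counts: structural zeros with probability 0.4
counts = rng.poisson(np.exp(0.3 + 0.5 * x))
counts[rng.uniform(size=n) < 0.4] = 0

pois = sm.Poisson(counts, X).fit(disp=0)
zip_ = ZeroInflatedPoisson(counts, X, exog_infl=np.ones((n, 1))).fit(disp=0)

print("Poisson AIC/BIC:", round(pois.aic, 1), round(pois.bic, 1))
print("ZIP     AIC/BIC:", round(zip_.aic, 1), round(zip_.bic, 1))
```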


Journal ArticleDOI
TL;DR: The purpose of this article is to compare and contrast the use of three methods (OLS, Poisson, and negative binomial regression) for the analysis of infrequently occurring count data; the strengths, limitations, and special considerations of each approach are discussed.
Abstract: Nurses and other health researchers are often concerned with infrequently occurring, repeatable, health-related events such as number of hospitalizations, pregnancies, or visits to a health care provider. Reports on the occurrence of such discrete events take the form of non-negative integer or count data. Because the counts of infrequently occurring events tend to be non-normally distributed and highly positively skewed, the use of ordinary least squares (OLS) regression with non-transformed data has several shortcomings. Techniques such as Poisson regression and negative binomial regression may provide more appropriate alternatives for analyzing these data. The purpose of this article is to compare and contrast the use of these three methods for the analysis of infrequently occurring count data. The strengths, limitations, and special considerations of each approach are discussed. Data from the National Longitudinal Survey of Adolescent Health (AddHealth) are used for illustrative purposes.

156 citations
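A sketch of the three-way comparison on simulated, positively skewed counts, assuming statsmodels; the negative binomial model also estimates its dispersion parameter alpha.

```python
# OLS on raw counts versus Poisson and negative binomial regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 1000
x = rng.normal(size=n)
X = sm.add_constant(x)

# overdispersed counts: gamma-mixed Poisson, i.e. negative binomial
mu = np.exp(0.2 + 0.6 * x)
y = rng.poisson(mu * rng.gamma(shape=1.0, scale=1.0, size=n))

ols = sm.OLS(y, X).fit()
pois = sm.Poisson(y, X).fit(disp=0)
nb = sm.NegativeBinomial(y, X).fit(disp=0)

print("OLS slope:      ", round(ols.params[1], 3))
print("Poisson slope:  ", round(pois.params[1], 3))
print("NB slope, alpha:", round(nb.params[1], 3), round(nb.params[-1], 3))
```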


Journal ArticleDOI
Joel Schwartz
TL;DR: The association between ozone and mortality risk is unlikely to be caused by confounding by temperature; the case-crossover approach converts the problem into a case-control study in which the control for each person is the same person on a nearby day, when he or she did not die.
Abstract: Rationale: Air pollution has been associated with changes in daily mortality. Objectives: Generally, studies use Poisson regression, with complicated modeling strategies, to control for season and weather, raising concerns that the results may be sensitive to these modeling protocols. For studies of ozone, weather control is a particular problem because high ozone days are generally quite hot. Methods: The case-crossover approach converts this problem into a case-control study, where the control for each person is the same person on a day near in time, when he or she did not die. This method controls for season and individual risk factors by matching. One can also choose the control day to have the same temperature as the event day. Measurements: I have applied this approach to a study of more than 1 million deaths in 14 U.S. cities. Main results: I found that, with matching on temperature, a 10-ppb increase in maximum hourly ozone concentrations was associated with a 0.23% (95% confidence interval [CI] 0...

133 citations
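A sketch of a time-stratified case-crossover analysis with the matching handled by conditional logistic regression, assuming statsmodels' ConditionalLogit and simulated data; the matching here is on month and weekday, and matching control days on temperature, as in the paper, would be handled analogously.

```python
# Time-stratified case-crossover: each death day is compared with same-weekday
# control days in the same month, one stratum per death.
import numpy as np
import pandas as pd
from statsmodels.discrete.conditional_models import ConditionalLogit

rng = np.random.default_rng(5)
days = pd.date_range("2000-06-01", "2000-08-31", freq="D")
ozone = 40 + 15 * rng.random(len(days))          # daily maximum ozone (ppb)
risk = np.exp(0.01 * ozone)
deaths = rng.poisson(5 * risk / risk.mean(), len(days))

rows = []
for i, day in enumerate(days):
    # referent days: same weekday within the same calendar month
    ref = days[(days.month == day.month) & (days.weekday == day.weekday())]
    for j in range(deaths[i]):                   # one stratum per death
        for r in ref:
            rows.append({"case": int(r == day),
                         "ozone": ozone[days.get_loc(r)],
                         "stratum": f"{i}-{j}"})
df = pd.DataFrame(rows)

fit = ConditionalLogit(df["case"], df[["ozone"]], groups=df["stratum"]).fit()
print("log-odds per ppb ozone:", round(float(np.asarray(fit.params)[0]), 4))
```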


Journal ArticleDOI
TL;DR: Coccidioidomycosis in Arizona has increased, driven by seasonal outbreaks associated with environmental and climatic changes, and this study may allow public-health officials to predict seasonal outbreaks in Arizona and to alert the public and physicians early, so that appropriate preventive measures can be implemented.
Abstract: Background. Reports of coccidioidomycosis cases in Arizona have increased substantially. We investigated factors associated with the increase. Methods. We analyzed the National Electronic Telecommunications System for Surveillance (NETSS) data from 1998 to 2001 and used Geographic Information Systems (GIS) to map high-incidence areas in Maricopa County. Poisson regression analysis was performed to assess the effect of climatic and environmental factors on the number of monthly cases; a model was developed and tested to predict outbreaks. Results. The overall incidence in 2001 was 43 cases/100,000 population, a significant (P<.01, test for trend) increase from 1998 (33 cases/100,000 population); the highest age-specific rate was in persons ≥65 years old (79 cases/100,000 population in 2001). Analysis of NETSS data by season indicated high-incidence periods during the winter (November-February). GIS analysis showed that the highest-incidence areas were in the periphery of Phoenix. Multivariable Poisson regression modeling revealed that a combination of certain climatic and environmental factors was highly correlated with seasonal outbreaks (R² = 0.75). Conclusions. Coccidioidomycosis in Arizona has increased. Its incidence is driven by seasonal outbreaks associated with environmental and climatic changes. Our study may allow public-health officials to predict seasonal outbreaks in Arizona and to alert the public and physicians early, so that appropriate preventive measures can be implemented.

130 citations


Journal ArticleDOI
TL;DR: In this article, an R package called bivpois is presented for maximum likelihood estimation of the parameters of bivariate and diagonal inflated bivariate Poisson regression models, and an Expectation-Maximization (EM) algorithm is implemented.
Abstract: In this paper we present an R package called bivpois for maximum likelihood estimation of the parameters of bivariate and diagonal inflated bivariate Poisson regression models. An Expectation-Maximization (EM) algorithm is implemented. Inflated models allow for modelling both over-dispersion (or under-dispersion) and negative correlation and are thus appropriate for a wide range of applications. Extensions of the algorithms to several other models are also discussed. Detailed guidance and an implementation on simulated and real data sets using the bivpois package are provided.

99 citations
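The bivpois package itself is R software; purely to illustrate the EM idea it implements, here is a stripped-down Python sketch for the bivariate Poisson without covariates or diagonal inflation: (X, Y) = (X1 + X0, X2 + X0) with independent Poisson components, where the shared component X0 induces the positive correlation.

```python
# EM for the bivariate Poisson: the E-step computes E[X0 | X, Y] via the
# identity E[X0 | x, y] = l0 * f(x-1, y-1) / f(x, y); the M-step has
# closed-form updates for the three rates.
import math
import numpy as np

def bp_pmf(x, y, l1, l2, l0):
    """Bivariate Poisson probability mass at (x, y)."""
    if x < 0 or y < 0:
        return 0.0
    s = sum(math.comb(x, k) * math.comb(y, k) * math.factorial(k)
            * (l0 / (l1 * l2)) ** k for k in range(min(x, y) + 1))
    return (math.exp(-(l1 + l2 + l0)) * l1 ** x / math.factorial(x)
            * l2 ** y / math.factorial(y) * s)

def em_bivpois(x, y, n_iter=200):
    l1, l2, l0 = x.mean(), y.mean(), 0.1
    for _ in range(n_iter):
        # E-step: conditional expectation of the shared component X0
        s = np.array([l0 * bp_pmf(xi - 1, yi - 1, l1, l2, l0)
                      / bp_pmf(xi, yi, l1, l2, l0) for xi, yi in zip(x, y)])
        # M-step: closed-form updates
        l0 = s.mean()
        l1 = x.mean() - l0
        l2 = y.mean() - l0
    return l1, l2, l0

rng = np.random.default_rng(6)
n = 1000
x0 = rng.poisson(0.5, n)                      # shared component
x = rng.poisson(1.0, n) + x0
y = rng.poisson(1.5, n) + x0
print([round(v, 3) for v in em_bivpois(x, y)])   # roughly (1.0, 1.5, 0.5)
```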


Journal ArticleDOI
TL;DR: A quasi-likelihood method of moments technique is proposed in which the Bernoulli outcome is naively treated as Poisson, with the mean (success probability) following a log-linear model; the Poisson maximum likelihood equations are used to estimate the regression coefficients without constraints.
Abstract: For a prospective randomized clinical trial with two groups, the relative risk can be used as a measure of treatment effect and is directly interpretable as the ratio of success probabilities in the new treatment group versus the placebo group. For a prospective study with many covariates and a binary outcome (success or failure), relative risk regression may be of interest. If we model the log of the success probability as a linear function of covariates, the regression coefficients are log-relative risks. However, using such a log-linear model with a Bernoulli likelihood can lead to convergence problems in the Newton–Raphson algorithm. This is likely to occur when the success probabilities are close to one. A constrained likelihood method proposed by Wacholder (1986, American Journal of Epidemiology 123, 174–184) also has convergence problems. We propose a quasi-likelihood method of moments technique in which we naively assume the Bernoulli outcome is Poisson, with the mean (success probability) following a log-linear model. We use the Poisson maximum likelihood equations to estimate the regression coefficients without constraints. Using method-of-moments ideas, one can show that the estimates using the Poisson likelihood will be consistent and asymptotically normal. We apply these methods to a double-blinded randomized trial in primary biliary cirrhosis of the liver (Markus et al., 1989, New England Journal of Medicine 320, 1709–1713).
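A sketch of the proposal on simulated trial data, assuming statsmodels: the Poisson score equations are solved without constraints, and a sandwich (robust) covariance, in the spirit of the method-of-moments argument, is used for inference; exp(slope) then estimates the relative risk.

```python
# Treat a binary outcome as Poisson with a log link; robust standard errors
# give valid inference, and exponentiated coefficients are relative risks.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 5000
treat = rng.integers(0, 2, n)
p = np.minimum(0.2 * np.exp(0.4 * treat), 1.0)    # true RR = exp(0.4)
y = rng.binomial(1, p)
X = sm.add_constant(treat)

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC0")
rr = np.exp(fit.params[1])
lo, hi = np.exp(fit.conf_int()[1])
print(f"RR = {rr:.2f} (95% CI {lo:.2f}, {hi:.2f}); true RR = {np.exp(0.4):.2f}")
```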

Journal ArticleDOI
TL;DR: In this article, a zero adjusted generalized Poisson distribution is studied and a score test is developed, with and without covariates, to determine whether such an adjustment is necessary or not.
Abstract: In certain applications involving count data, it is sometimes found that zeros are observed with a frequency significantly higher (lower) than predicted by the assumed model. Examples of such applications are cited in the literature from engineering, manufacturing, economics, public health, epidemiology, psychology, sociology, political science, agriculture, road safety, species abundance, use of recreational facilities, horticulture and criminology. In this article, a zero adjusted generalized Poisson distribution is studied and a score test is developed, with and without covariates, to determine whether such an adjustment is necessary. Examples, with and without covariates, are provided to illustrate the results.
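The paper's test concerns the generalized Poisson model; as a simpler illustration of the same idea, here is a sketch of the analogous score test against zero inflation in the ordinary Poisson model without covariates. The statistic follows van den Broek (1995) as recalled from memory, so treat the formula as an assumption to verify before use.

```python
# Score test of Poisson against zero-inflated Poisson (no covariates):
# compare the observed number of zeros with the Poisson-expected number,
# with a variance correction for estimating lambda.
import numpy as np
from scipy.stats import chi2

def zip_score_test(y):
    y = np.asarray(y)
    n, lam = len(y), y.mean()
    p0 = np.exp(-lam)                       # Poisson probability of a zero
    n0 = np.sum(y == 0)
    stat = (n0 - n * p0) ** 2 / (n * p0 * (1 - p0) - n * lam * p0 ** 2)
    return stat, chi2.sf(stat, df=1)        # chi-square(1) reference

rng = np.random.default_rng(8)
y = rng.poisson(1.2, 500)
y[rng.uniform(size=500) < 0.15] = 0         # inject excess zeros
stat, pval = zip_score_test(y)
print(f"score statistic = {stat:.2f}, p = {pval:.4g}")
```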

Journal ArticleDOI
TL;DR: Poisson regression analysis of ungrouped person-time data is a useful tool that can avoid bias associated with categorising exposure data and assigning exposure scores, and facilitate direct assessment of the consequences of exposure categorisation and score assignment on regression results.
Abstract: Background: Poisson regression is routinely used for analysis of epidemiological data from studies of large occupational cohorts. It is typically implemented as a grouped method of data analysis in which all exposure and covariate information is categorised and person-time and events are tabulated. Aims: To describe an alternative approach to Poisson regression analysis using single units of person-time without grouping. Methods: Data for simulated and empirical cohorts were analysed by Poisson regression. In analyses of simulated data, effect estimates derived via Poisson regression without grouping were compared to those obtained under proportional hazards regression. Analyses of empirical data for a cohort of 138 900 electrical workers were used to illustrate how the ungrouped approach may be applied in analyses of actual occupational cohorts. Results: Using simulated data, Poisson regression analyses of ungrouped person-time data yield results equivalent to those obtained via proportional hazards regression: the results of both methods gave unbiased estimates of the "true" association specified for the simulation. Analyses of empirical data confirm that grouped and ungrouped analyses provide identical results when the same models are specified. However, bias may arise when exposure-response trends are estimated via Poisson regression analyses in which exposure scores, such as category means or midpoints, are assigned to grouped data. Conclusions: Poisson regression analysis of ungrouped person-time data is a useful tool that can avoid bias associated with categorising exposure data and assigning exposure scores, and facilitate direct assessment of the consequences of exposure categorisation and score assignment on regression results.
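A minimal sketch of the ungrouped approach, assuming simulated records and statsmodels: each record carries its own person-time as a log offset, and the continuous exposure enters uncategorised, so no exposure scores need to be assigned.

```python
# Poisson regression on ungrouped person-time records with a log offset.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 5000
dose = rng.gamma(2.0, 1.0, n)                    # continuous, uncategorised
persontime = rng.uniform(0.5, 30.0, n)           # years at risk per record
rate = 0.01 * np.exp(0.08 * dose)
events = rng.poisson(rate * persontime)

X = sm.add_constant(dose)
fit = sm.GLM(events, X, family=sm.families.Poisson(),
             offset=np.log(persontime)).fit()
print("rate ratio per unit dose:", round(np.exp(fit.params[1]), 3))
```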

Journal ArticleDOI
TL;DR: In this paper, a negative binomial model and modified count data models are established to account for overdispersion and heterogeneity, improving on the reliability of the Poisson model, which assumes equidispersion.
Abstract: Count data models are established to overcome the shortcomings of the linear regression model used for trip generation in conventional four-step travel demand forecasting. Count data should be checked for overdispersion and excess zero responses before trip generation is forecast, and the forecasted values should be non-negative. The study applies the models to non-home-based trips at the household level to perform an efficient analysis of count data. The Poisson model, with its assumption of equidispersion, has frequently been used to analyze count data. However, if the variance of the data is greater than the mean, the Poisson model tends to underestimate errors, resulting in reliability problems. Excess zeros in the data result in heterogeneity, leading to biased coefficient estimates. The negative binomial model and the modified count data models are established to account for overdispersion and heterogeneity and so improve reliability. The optimal model is chosen through the Vuong test. Model reliability is also checked by a likelihood test, and the accuracy of model estimates by the Theil inequality coefficient. Finally, sensitivity analysis is performed to examine how non-home-based trips change with socio-economic characteristics.
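A sketch of the Vuong test used for the model choice above, comparing a zero-inflated Poisson against a plain Poisson on simulated household trip counts; it assumes statsmodels' per-observation log-likelihoods and shows the simple (uncorrected) form of the statistic.

```python
# Vuong statistic: standardised mean difference of pointwise log-likelihoods
# of two non-nested count models fitted to the same data.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(10)
n = 1500
hh_size = rng.integers(1, 7, n).astype(float)
X = sm.add_constant(hh_size)
trips = rng.poisson(np.exp(-0.5 + 0.25 * hh_size))
trips[rng.uniform(size=n) < 0.3] = 0              # households making no trips

zip_fit = ZeroInflatedPoisson(trips, X, exog_infl=np.ones((n, 1))).fit(disp=0)
pois_fit = sm.Poisson(trips, X).fit(disp=0)

m = (zip_fit.model.loglikeobs(zip_fit.params)
     - pois_fit.model.loglikeobs(pois_fit.params))
v = np.sqrt(n) * m.mean() / m.std(ddof=1)
print(f"Vuong z = {v:.2f}, p = {2 * norm.sf(abs(v)):.4g}")
# a large positive z favours the ZIP model; near zero means no preference
```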

Journal ArticleDOI
TL;DR: In this paper, a doubly periodic Poisson model with short- and long-term trends is studied, and the likelihood function and the maximum likelihood estimates of the model parameters are derived.

Journal ArticleDOI
TL;DR: In this paper, the authors discuss the estimation of an unknown population size n by using the counting distribution to estimate p0 when repeated counts identifying the same case are available, and show that the Horvitz-Thompson estimator of n under a mixed Poisson model is always at least as large as the estimator under a homogeneous Poisson model.
Abstract: Summary. The paper discusses the estimation of an unknown population size n. Suppose that an identification mechanism can identify n_obs cases. The Horvitz–Thompson estimator of n adjusts this number by the inverse of 1−p0, where the latter is the probability of not identifying a case. When repeated counts of identifying the same case are available, we can use the counting distribution for estimating p0 to solve the problem. Frequently, the Poisson distribution is used and, more recently, mixtures of Poisson distributions. Maximum likelihood estimation is discussed by means of the EM algorithm. For truncated Poisson mixtures, a nested EM algorithm is suggested and illustrated for several application cases. The algorithmic principles are used to show an inequality, stating that the Horvitz–Thompson estimator of n by using the mixed Poisson model is always at least as large as the estimator by using a homogeneous Poisson model. In turn, if the homogeneous Poisson model is misspecified it will, potentially strongly, underestimate the true population size. Examples from various areas illustrate this finding.
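A worked sketch of the homogeneous Poisson case, assuming simulated capture counts and scipy: the zero-truncated Poisson MLE gives p0, and the Horvitz-Thompson estimator inflates the number of observed cases. (The paper's point is that a mixed Poisson model yields an estimate at least this large.)

```python
# Estimate a population size from the counts of cases identified at least once.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(11)
n_true, lam_true = 1000, 1.0
counts = rng.poisson(lam_true, n_true)
observed = counts[counts > 0]                 # cases identified at least once
n_obs, ybar = len(observed), observed.mean()

# zero-truncated Poisson MLE: solve lambda / (1 - exp(-lambda)) = sample mean
lam_hat = brentq(lambda l: l / (1 - np.exp(-l)) - ybar, 1e-6, 50.0)
p0_hat = np.exp(-lam_hat)
n_hat = n_obs / (1 - p0_hat)                  # Horvitz-Thompson estimator
print(f"n_obs = {n_obs}, estimated n = {n_hat:.0f} (true n = {n_true})")
```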

Journal ArticleDOI
TL;DR: This work proposes to extend the classical proportional hazard model by allowing the regression coefficients to vary in a smooth way with time, and shows how the regression parameters and the penalty weights can be estimated efficiently using Bayesian inference tools based on the Metropolis-adjusted Langevin algorithm.
Abstract: One can fruitfully approach survival problems without covariates in an actuarial way. In narrow time bins, the number of people at risk is counted together with the number of events. The relationship between time and probability of an event can then be estimated with a parametric or semi-parametric model. The number of events observed in each bin is described using a Poisson distribution with the log mean specified using a flexible penalized B-splines model with a large number of equidistant knots. Regression on pertinent covariates can easily be performed using the same log-linear model, leading to the classical proportional hazard model. We propose to extend that model by allowing the regression coefficients to vary in a smooth way with time. Penalized B-splines models will be proposed for each of these coefficients. We show how the regression parameters and the penalty weights can be estimated efficiently using Bayesian inference tools based on the Metropolis-adjusted Langevin algorithm.

Journal ArticleDOI
TL;DR: The density of maternal-fetal medicine specialists is significantly and inversely associated with maternal mortality ratios, even after controlling for state-level measures of maternal poverty, education, race, age, and their significant interactions.

Journal ArticleDOI
TL;DR: In this article, the authors deal with various mixed Poisson distributions in order to analyze count data characterized by their long tails and overdispersion when the Poisson distribution and negative binomial distribution are found to be inadequate.
Abstract: This article deals with various mixed Poisson distributions in order to analyze count data characterized by their long tails and overdispersion when the Poisson distribution and negative binomial distribution are found to be inadequate. Several mixed Poisson distributions are presented and their structural properties are investigated. Three well-known data sets, having long tails, are analyzed and the results of fitting by various models are provided.

Journal ArticleDOI
TL;DR: This paper presents a statistical framework based on the modified loss causation model (MLCM) and proposes that the PDF can be represented by the Poisson distribution, which has been used in various industries to model random failures or incidents.
Abstract: Construction incidents are essentially random events because they have a probabilistic component that makes their occurrence nondeterministic. Thus, as with most random events, one of the best ways to understand and analyze construction incidents is to apply statistical methods and tools. Consequently, this paper presents a statistical framework based on the modified loss causation model (MLCM). Even though the MLCM has been used for the framework, the approach can be readily adapted for other incident causation models. The MLCM is separated into two basic components: random and systematic. The random component is represented by a probability density function (PDF), whose parameters are influenced by the systematic component of the MLCM, while the systematic component is represented by the situational variables and the quality of the safety management system. In particular, this paper proposes that the PDF can be represented by the Poisson distribution. Besides being a convenient and simple distribution that can be easily used in applications, the Poisson distribution has been used in various industries to model random failures or incidents. The differences in contexts and the undesirable effects of adopting an unrepresentative distribution will require formal analysis to determine the suitability of the Poisson distribution in modeling the random component of construction incident occurrence. Incident records for 14 major projects were used in the analysis. Hypothesis testing using the chi-square goodness-of-fit and dispersion tests shows that the incident occurrences can be modeled as a Poisson process characterized by some mean arrival rate. The paper also presents some applications of the proposed Poisson model to improve construction safety management, focusing on two specific concepts: the Bayesian approach and the partitioned Poisson.
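A sketch of the two checks named above on simulated monthly incident counts, using scipy: a binned chi-square goodness-of-fit test against the fitted Poisson distribution and a variance-to-mean dispersion test.

```python
# Chi-square goodness-of-fit and dispersion tests for Poisson incident counts.
import numpy as np
from scipy.stats import poisson, chisquare, chi2

rng = np.random.default_rng(12)
incidents = rng.poisson(2.3, 48)                # monthly counts for a project
n, lam = len(incidents), incidents.mean()

# goodness of fit: bin the support as 0, 1, 2, 3, 4+ and compare frequencies
edges = [0, 1, 2, 3, 4]
obs = np.array([np.sum(incidents == k) for k in edges[:-1]]
               + [np.sum(incidents >= edges[-1])])
exp = n * np.append(poisson.pmf(edges[:-1], lam), poisson.sf(edges[-1] - 1, lam))
stat, p_gof = chisquare(obs, exp, ddof=1)       # one df lost estimating lambda
print(f"GOF chi2 = {stat:.2f}, p = {p_gof:.3f}")

# dispersion test: (n-1) * s^2 / mean ~ chi2(n-1) under the Poisson model
D = (n - 1) * incidents.var(ddof=1) / lam
p_disp = 2 * min(chi2.sf(D, n - 1), chi2.cdf(D, n - 1))
print(f"dispersion D = {D:.1f}, two-sided p = {p_disp:.3f}")
```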

Journal ArticleDOI
TL;DR: Routine data recorded in the Hospital Information System on all admissions to the Regional Public Hospital of Betim, Minas Gerais State, Brazil, from July 1996 to June 2000 were analyzed to compare different modeling strategies to identify individual and admissions characteristics associated with readmission to a general hospital.
Abstract: The objective of this study was to compare different modeling strategies to identify individual and admission characteristics associated with readmission to a general hospital. Routine data recorded in the Hospital Information System on all admissions to the Regional Public Hospital of Betim, Minas Gerais State, Brazil, from July 1996 to June 2000 were analyzed. The Cox proportional hazards model and variants designed to deal with multiple-event data, such as the Andersen-Gill (AG), the Prentice-Williams-Peterson (PWP), and random-effects models, were fitted to the time between hospital admissions or censoring. For comparison purposes, a Poisson model was fitted to the total number of readmissions, using the same covariates. We analyzed 31,648 admissions of 26,198 patients, including 17,096 adults and 9,102 children. Estimates for the PWP and frailty models were very similar, and both approaches should be fitted and compared. If clinical characteristics are available, the PWP model should be used. Otherwise, the random-effects model can account for unmeasured differences, particularly some related to severity of the disease. These methodologies can help focus on various related readmission aspects such as diagnostic groups or medical specialties.

Journal ArticleDOI
TL;DR: A rich family of generalized Poisson regression (GPR) models is reviewed in detail, which has a wide range of applications in various disciplines including agriculture, econometrics, patent applications, species abundance, medicine, and use of recreational facilities.

Journal ArticleDOI
TL;DR: Power analysis was used to determine sample size adequacy when varying the number of visits, count stations, and years for examining trends in abundance, and to suggest potentially useful focal species for monitoring, such as keystone species like the Acorn Woodpecker.
Abstract: We used data from two oak-woodland sites in California to develop guidelines for the design of bird monitoring programs using point counts. We used power analysis to determine sample size adequacy when varying the number of visits, count stations, and years for examining trends in abundance. We assumed an overdispersed Poisson distribution for count data, with overdispersion attributed to observer variability, and used Poisson regression for analysis of population trends. Overdispersion had a large, negative effect on power. The number of sampling years also had an especially large effect on power. In all cases, 10 years of sampling were insufficient to detect a decline in abundance of 30% over 10 years. Increasing the sampling period to 20 years provided adequate power for 56% of breeding species at one site. The number of count stations needed for detecting trends for a given species depended primarily on observer variability. If observer variability was high, increasing the number of years and...
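A simulation sketch of this style of power analysis, assuming statsmodels and gamma-mixed (negative binomial) counts standing in for observer-driven overdispersion; power is the fraction of simulated surveys in which a quasi-Poisson regression on year detects the decline.

```python
# Power to detect a 30% decline over the study period with overdispersed
# point counts, analysed by Poisson regression with a Pearson-scaled variance.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(13)

def power(n_years=10, n_stations=30, mean0=2.0, decline=0.30,
          overdisp=1.0, n_sim=200, alpha=0.05):
    trend = np.log(1 - decline) / (n_years - 1)     # log-linear yearly trend
    years = np.repeat(np.arange(n_years), n_stations).astype(float)
    X = sm.add_constant(years)
    mu = mean0 * np.exp(trend * years)
    hits = 0
    for _ in range(n_sim):
        # gamma mixing adds extra-Poisson (observer) variation
        y = rng.poisson(mu * rng.gamma(1 / overdisp, overdisp, len(mu)))
        fit = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")
        z = fit.params[1] / fit.bse[1]              # quasi-Poisson z test
        if z < 0 and 2 * norm.sf(abs(z)) < alpha:
            hits += 1
    return hits / n_sim

print("power, 10 years:", power(n_years=10))
print("power, 20 years:", power(n_years=20))
```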

Journal ArticleDOI
TL;DR: The results provide information about individual and contextual social inequalities in injury morbidity; the highest risks of injury occur in individuals of lower educational level who reside in the more deprived neighbourhoods.

Journal ArticleDOI
01 Jun 2005-Metrika
TL;DR: In this paper, a closed-form expression for the median of the Poisson distribution is provided, and the Central Limit Theorem is used to improve the known estimates of the difference between the median and the mean of the distribution.
Abstract: The purpose of this paper is twofold: first, to provide a closed form expression for the median of the Poisson distribution and, second, to improve the known estimates of the difference between the median and the mean of the Poisson distribution. We use elementary techniques based on the monotonicity of certain sequences involving tail probabilities of the Poisson distribution and the Central Limit Theorem.
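For orientation, a quick numerical check (scipy) of the classical bounds that results of this kind refine: for every mean mu > 0, the Poisson median m satisfies mu - ln 2 <= m < mu + 1/3.

```python
# Verify the classical Poisson median bounds for a range of means.
import numpy as np
from scipy.stats import poisson

for mu in [0.5, 1.0, 2.7, 10.0, 100.0]:
    m = poisson.median(mu)
    assert mu - np.log(2) <= m < mu + 1 / 3, (mu, m)
    print(f"mu = {mu:6.1f}  median = {m:5.0f}  mu - median = {mu - m:+.3f}")
```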

Journal ArticleDOI
TL;DR: The compound Poisson distribution is used as the random factor in the frailty model; this allows some individuals to be non-susceptible, which can be useful in many settings.
Abstract: Frailty models are often used to model heterogeneity in survival analysis. The most common frailty model has an individual intensity which is a product of a random factor and a basic intensity common to all individuals. This paper uses the compound Poisson distribution as the random factor. It allows some individuals to be non-susceptible, which can be useful in many settings. In some diseases, one may suppose that a number of families have an increased susceptibility due to genetic circumstances. Then, it is logical to use a frailty model where the individuals within each family share some common factor, while individuals between families have different factors. This can be attained by randomizing the Poisson parameter in the compound Poisson distribution. To our knowledge, this is a new distribution. The power variance function distributions are used for the Poisson parameter. The resulting distributions are studied in some detail, regarding both their appearance and various statistical properties. An application to infant mortality data from the Medical Birth Registry of Norway is included, where the model is compared to more traditional shared frailty models.
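A small simulation sketch of the non-susceptibility property: with N ~ Poisson(rho) gamma-distributed summands, a fraction exp(-rho) of individuals receives frailty exactly zero and so never experiences the event. Parameter values are illustrative.

```python
# Compound Poisson frailty: Z = G1 + ... + GN, N ~ Poisson(rho), Gj ~ Gamma.
import numpy as np

rng = np.random.default_rng(14)
n, rho = 50_000, 0.8
N = rng.poisson(rho, n)                              # number of gamma summands
Z = np.array([rng.gamma(1.5, 1.0, k).sum() for k in N])

print("simulated non-susceptible fraction:", round((Z == 0).mean(), 4))
print("theoretical exp(-rho):             ", round(np.exp(-rho), 4))
```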

Journal ArticleDOI
TL;DR: A comprehensive empirical comparative study presented in this paper showed that the zero-inflated Poisson (ZIP) model yielded better predictions than the PRM and also demonstrated better robustness in prediction accuracy across the 50 data splits.
Abstract: Predicting software quality prior to system tests and operations has proven useful for achieving effective reliability improvements. Poisson (pure) regression modelling is the most commonly used count modelling technique for predicting the expected number of faults in software modules. It is best suited to cases where the fault data (the dependent variable) are equidispersed, that is, where the mean equals the variance. However, in software fault data we often observe a large portion of zeros (no faults), especially in high-assurance systems. In such cases a pure Poisson regression model (PRM) may yield inaccurate fault predictions. A zero-inflated Poisson (ZIP) model changes the mean structure of a PRM, resulting in improved predictive quality. To illustrate this, we examined software data collected from a full-scale industrial software system. Fault prediction models were calibrated using both pure Poisson and ZIP regression techniques. To prevent claims based on ...

Journal ArticleDOI
TL;DR: The objective of this study is to challenge the interpretability of the corresponding Poisson pseudo R-squared measure whenever the approximate Poisson outcome is generated by counting the number of events within covariate patterns formed by cross-tabulating categorical covariates.
Abstract: Many epidemiological research problems deal with large numbers of exposed subjects of whom only a small number actually suffers the adverse event of interest. Such rare-events data can be analysed by employing an approximate Poisson model. The objective of this study is to challenge the interpretability of the corresponding Poisson pseudo-R-squared measure. It will lack sensible interpretation whenever the approximate Poisson outcome is generated by counting the number of events within covariate patterns formed by cross-tabulating categorical covariates. The failure is caused by the immanent arbitrariness in the definition of the covariate patterns; that is, independent Bernoulli events, B(1, π), are arbitrarily combined into binomially distributed ones, B(n, π), which are then approximated by the Poisson model.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed the use of the cluster distribution, derived from a negative binomial probability model, to estimate the probability of high-order events in terms of the number of lines outaged within a short time, useful in long-term planning and also in short-term operational defense against such events.
Abstract: We propose the use of the cluster distribution, derived from a negative binomial probability model, to estimate the probability of high-order events in terms of number of lines outaged within a short time, useful in long-term planning and also in short-term operational defense to such events. We use this model to fit statistical data gathered for a 30-year period for North America. The model is compared to the commonly used Poisson model and the power-law model. Results indicate that the Poisson model underestimates the probability of higher-order events, whereas the power-law model overestimates it. We use the strict chi-square fitness test to compare the fitness of these three models and find that the cluster model is superior to the other two models for the data used in the study.

Journal ArticleDOI
TL;DR: It is shown that the bootstrap method keeps the significance level close to the nominal one and has uniformly greater power than the existing normal approximation for testing the hypothesis.
Abstract: Ridout, Hinde, and Demetrio (2001, Biometrics 57, 219–223) derived a score test for testing a zero-inflated Poisson (ZIP) regression model against zero-inflated negative binomial (ZINB) alternatives. They mentioned that the score test using the normal approximation might underestimate the nominal significance level, possibly in small-sample cases. To remedy this problem, a parametric bootstrap method is proposed. It is shown that the bootstrap method keeps the significance level close to the nominal one and has uniformly greater power than the existing normal approximation for testing the hypothesis.
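A sketch of the parametric bootstrap idea, assuming statsmodels' zero-inflated models and simulated data; for brevity it bootstraps a likelihood-ratio statistic rather than the score statistic used in the paper, but the resampling logic is the same: simulate from the fitted ZIP null, refit, and recompute the statistic.

```python
# Parametric bootstrap reference distribution for a ZIP-vs-ZINB test.
import numpy as np
from statsmodels.discrete.count_model import (ZeroInflatedPoisson,
                                              ZeroInflatedNegativeBinomialP)

rng = np.random.default_rng(15)
n = 500
X = np.ones((n, 1))
y = rng.poisson(2.0, n)
y[rng.uniform(size=n) < 0.25] = 0                 # zero-inflated Poisson data

def lr_stat(y):
    zip_fit = ZeroInflatedPoisson(y, X, exog_infl=X).fit(disp=0)
    zinb_fit = ZeroInflatedNegativeBinomialP(y, X, exog_infl=X).fit(disp=0)
    return 2 * (zinb_fit.llf - zip_fit.llf), zip_fit

t_obs, zip_fit = lr_stat(y)

# simulate the null: draw from the fitted ZIP, refit, recompute the statistic
w = 1 / (1 + np.exp(-zip_fit.params[0]))          # fitted inflation probability
lam = np.exp(zip_fit.params[1])                   # fitted Poisson mean
boot = []
for _ in range(99):                               # small B; a sketch only
    y_star = rng.poisson(lam, n)
    y_star[rng.uniform(size=n) < w] = 0
    boot.append(lr_stat(y_star)[0])
pval = (1 + sum(b >= t_obs for b in boot)) / (1 + len(boot))
print(f"LR statistic = {t_obs:.2f}, bootstrap p = {pval:.3f}")
```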