Posted Content

Prediction Regions for Poisson and Over-Dispersed Poisson Regression Models with Applications to Forecasting Number of Deaths during the COVID-19 Pandemic

TL;DR: In this paper, the authors examined the construction of prediction regions or intervals under the Poisson regression model and for an over-dispersed Poisson regression model for predicting daily and cumulative deaths in the United States due to the SARS-CoV-2 virus.
Abstract: Motivated by the current Coronavirus Disease (COVID-19) pandemic, which is due to the SARS-CoV-2 virus, and the important problem of forecasting daily deaths and cumulative deaths, this paper examines the construction of prediction regions or intervals under the Poisson regression model and for an over-dispersed Poisson regression model. For the Poisson regression model, several prediction regions are developed and their performance is compared through simulation studies. The methods are applied to the problem of forecasting daily and cumulative deaths in the United States (US) due to COVID-19. To examine their performance relative to what actually happened, daily deaths data until May 15th were used to forecast cumulative deaths by June 1st. It was observed that there is over-dispersion in the observed data relative to the Poisson regression model. An over-dispersed Poisson regression model is therefore proposed. This new model builds on frailty ideas in Survival Analysis, and over-dispersion is quantified through an additional parameter. The Poisson regression model is a hidden model in this over-dispersed Poisson regression model and is obtained as a limiting case when the over-dispersion parameter increases to infinity. A prediction region for the cumulative number of US deaths due to COVID-19 by July 16th, given the data until July 2nd, is presented. Finally, the paper discusses limitations of the proposed procedures, mentions open research problems, and highlights the dangers and pitfalls of forecasting on a long horizon, with focus on this pandemic where events, both foreseen and unforeseen, could have huge impacts on point predictions and prediction regions.
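To make the basic construction concrete, here is a minimal Python sketch of a plug-in prediction interval for a future count: estimate the rate from past i.i.d. counts, then take central quantiles of the fitted Poisson distribution. This is only an illustration under simplified assumptions (no covariates, and it ignores the parameter-estimation uncertainty that the paper's regions such as Γ̃0 and Γ̌1 are designed to handle); the counts below are made up.

```python
import math

def poisson_pmf(k, lam):
    # P(X = k) for X ~ Poisson(lam)
    return math.exp(-lam) * lam ** k / math.factorial(k)

def plug_in_prediction_interval(counts, level=0.95):
    """Plug-in prediction interval for the next count: estimate lambda by
    the sample mean, then take central quantiles of Poisson(lambda_hat)."""
    lam = sum(counts) / len(counts)
    alpha = 1 - level
    cdf, k = 0.0, 0
    lower = upper = None
    while upper is None:
        cdf += poisson_pmf(k, lam)
        if lower is None and cdf >= alpha / 2:
            lower = k  # smallest k with CDF >= alpha/2
        if cdf >= 1 - alpha / 2:
            upper = k  # smallest k with CDF >= 1 - alpha/2
        k += 1
    return lower, upper

daily_deaths = [23, 31, 28, 25, 30, 27, 26, 29]  # made-up daily counts
print(plug_in_prediction_interval(daily_deaths))
```

Because the estimated rate is treated as if it were known, plug-in intervals like this tend to under-cover; quantifying exactly such coverage differences across competing regions is what the paper's simulation studies do.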
Citations
Journal ArticleDOI
TL;DR: The book provides a brief introduction to SAS, SPSS, and BMDP, along with their use in performing ANOVA, and is indeed an excellent source of reference for the ANOVA based on fixed, random, and mixed-effects models.
Abstract: The book provides a brief introduction to SAS, SPSS, and BMDP, along with their use in performing ANOVA. The book also has a chapter devoted to experimental designs and the corresponding ANOVA. In terms of coverage, a nice feature of the book is the inclusion of a chapter on finite population models—typically not found in books on experimental designs and ANOVA. Several appendixes are given at the end of the book discussing some of the standard distributions, the Satterthwaite approximation, rules for computing the sums of squares, degrees of freedom, expected mean squares, and so forth. The exercises at the end of each chapter contain a number of numerical problems. Some of my quibbles about the book are the following. At times, it simply gives expressions without adequate motivation or examples. A reader who is not already familiar with ANOVA techniques will wonder as to the relevance of some of the expressions. Just to give an example, the quantity “sum of squares due to a contrast” is defined on page 65. The algebraic property that the sums of squares due to a set of a − 1 orthogonal contrasts will add up to the sum of squares due to an effect having a − 1 df is then stated. Given the level of the book, discussion of such a property appears to be irrelevant. I did not see this property used anywhere in the book; neither did I see the sum of squares due to a contrast explicitly used or mentioned later in the book. Examples in which the one-way model is adequate are mentioned only after introducing the model and the assumptions, and the examples are buried inside the remarks (in small print) following the model. This is also the case with the two-way model with interaction (Chap. 4). The authors indicate in the preface that the remarks are mostly meant to include results to be kept out of the main body of the text. I believe that good examples should be the starting point for introducing ANOVA models.
The authors present the analysis of fixed, random, and mixed models simultaneously. Motivating examples that distinguish between these scenarios should have been made the highlight of the presentation in each chapter rather than deferred to the later part of the chapter under “worked out examples” or buried within the remarks. The authors discuss transformations to correct lack of normality and lack of homoscedasticity (Sec. 2.22). However, these are not illustrated with any real examples. Regarding tests concerning the departure from the model assumptions, formal tests are presented in some detail; however, graphical procedures are only very briefly mentioned under a remark. I consider this to be a glaring omission. Consequently, I would be somewhat hesitant to recommend this book to anyone interested in actual data analysis using ANOVA unless the application is such that one of the standard models (along with the standard assumptions) is known to be adequate and diagnostic checks are not called for. Obviously, this is an unlikely scenario in most applications. The preceding criticisms aside, I can see myself consulting this book to refer to an ANOVA table, to look up an expected value or test statistic under a random or mixed-effects model, or to refer to the use of SAS, SPSS, or BMDP for performing ANOVA. The book is indeed an excellent source of reference for the ANOVA based on fixed, random, and mixed-effects models.

248 citations

Proceedings Article
01 Jan 1976
TL;DR: Hirsch and Mathews as mentioned in this paper describe the true story of Ulam's pivotal role in the making of the "Super," in their historical introduction to this behind-the-scenes look at the minds and ideas that ushered in the nuclear age.
Abstract: This autobiography of mathematician Stanislaw Ulam, one of the great scientific minds of the twentieth century, tells a story rich with amazingly prophetic speculations and peppered with lively anecdotes. As a member of the Los Alamos National Laboratory from 1944 on, Ulam helped to precipitate some of the most dramatic changes of the postwar world. He was among the first to use and advocate computers for scientific research, originated ideas for the nuclear propulsion of space vehicles, and made fundamental contributions to many of today's most challenging mathematical projects. With his wide-ranging interests, Ulam never emphasized the importance of his contributions to the research that resulted in the hydrogen bomb. Now Daniel Hirsch and William Mathews reveal the true story of Ulam's pivotal role in the making of the 'Super,' in their historical introduction to this behind-the-scenes look at the minds and ideas that ushered in the nuclear age. It includes an epilogue by Francoise Ulam and Jan Mycielski that sheds new light on Ulam's character and mathematical originality.

14 citations

Graham Wood
01 Jan 2004
TL;DR: In this article, the authors describe how confidence intervals (for example, for the true accident rate at given flows) and prediction intervals can be produced using spreadsheet technology, which can be used for estimating the number of accidents at a new site with given flows.
Abstract: Generalised linear models, with "log" link and either Poisson or negative binomial errors, are commonly used for relating accident rates to explanatory variables. This paper adds to the toolkit for such models. It describes how confidence intervals (for example, for the true accident rate at given flows) and prediction intervals (for example, for the number of accidents at a new site with given flows) can be produced using spreadsheet technology.
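The key step behind such spreadsheet formulas is standard GLM arithmetic: with a log link, a normal-theory interval on the linear-predictor scale exponentiates into a confidence interval for the true accident rate. A minimal Python sketch, using an illustrative fitted linear predictor and standard error (the numbers are assumptions, not values from the paper):

```python
import math

def mean_rate_ci(eta_hat, se_eta, level=0.95):
    """CI for the true mean rate under a log link: form a normal interval
    on the linear-predictor scale, then exponentiate both endpoints."""
    z = 1.959964  # standard normal quantile for a 95% interval
    return math.exp(eta_hat - z * se_eta), math.exp(eta_hat + z * se_eta)

# hypothetical fit at given flows: rate 4.2 accidents/year, se(eta) = 0.15
lo, hi = mean_rate_ci(math.log(4.2), 0.15)
print(round(lo, 2), round(hi, 2))
```

A prediction interval for the count at a new site must additionally fold in Poisson (or negative binomial) sampling variation around the mean rate; that extra step is what distinguishes the paper's prediction intervals from the confidence interval above.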

8 citations

Journal ArticleDOI
17 May 2021-PeerJ
TL;DR: In this paper, the authors adopt a one-parameter model of infections per cluster, dividing any daily count n_i into n_i/ϕ_i 'clusters' for 'country' i. They find that most of the daily infection count sequences are inconsistent with a Poissonian model.
Abstract: The noise in daily infection counts of an epidemic should be super-Poissonian due to intrinsic epidemiological and administrative clustering. Here, we use this clustering to classify the official national SARS-CoV-2 daily infection counts and check for infection counts that are unusually anti-clustered. We adopt a one-parameter model of ϕ_i′ infections per cluster, dividing any daily count n_i into n_i/ϕ_i′ 'clusters', for 'country' i. We assume that n_i/ϕ_i′ on a given day j is drawn from a Poisson distribution whose mean is robustly estimated from the four neighbouring days, and calculate the inferred Poisson probability P_ij′ of the observation. The P_ij′ values should be uniformly distributed. We find the value ϕ_i that minimises the Kolmogorov-Smirnov distance from a uniform distribution. We investigate the (ϕ_i, N_i) distribution, for total infection count N_i. We consider consecutive count sequences above a threshold of 50 daily infections. We find that most of the daily infection count sequences are inconsistent with a Poissonian model. Most are found to be consistent with the ϕ_i model. The 28-, 14- and 7-day least noisy sequences for several countries are best modelled as sub-Poissonian, suggesting a distinct epidemiological family. The 28-day least noisy sequence of Algeria has a preferred model that is strongly sub-Poissonian, with ϕ_i(28) ∼ 0.1. Tajikistan, Turkey, Russia, Belarus, Albania, United Arab Emirates and Nicaragua have preferred models that are also sub-Poissonian, with ϕ_i(28) ≲ 0.5. A statistically significant (P_τ < 0.05) correlation was found between the lack of media freedom in a country, as represented by a high Reporters sans frontières Press Freedom Index (PFI2020), and the lack of statistical noise in the country's daily counts. The ϕ_i model appears to be an effective detector of suspiciously low statistical noise in the national SARS-CoV-2 daily infection counts.
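The heart of the ϕ_i procedure can be imitated in a few lines of Python: rescale counts by a candidate ϕ, score each day against a Poisson mean taken from its four neighbouring days, and keep the ϕ whose per-day probabilities look most uniform. This is a toy sketch with made-up counts; the published method differs in detail (for instance, it estimates the neighbouring-day mean robustly rather than by a plain average).

```python
import math

def poisson_cdf(k, lam):
    # P(X <= floor(k)) for X ~ Poisson(lam), summed directly
    term, total = math.exp(-lam), math.exp(-lam)
    for i in range(1, int(k) + 1):
        term *= lam / i
        total += term
    return total

def ks_uniform(ps):
    # Kolmogorov-Smirnov distance of the sample ps from Uniform(0, 1)
    ps = sorted(ps)
    n = len(ps)
    return max(max(abs((i + 1) / n - p), abs(i / n - p))
               for i, p in enumerate(ps))

def best_phi(counts, grid):
    """Toy version of the one-parameter clustering model: rescale counts
    to n_i/phi 'clusters', score each day against a Poisson whose mean is
    the average of the four neighbouring days, and return the phi in the
    grid whose per-day probabilities are closest to uniform."""
    best = None
    for phi in grid:
        ps = []
        for j in range(2, len(counts) - 2):
            neigh = counts[j - 2:j] + counts[j + 1:j + 3]
            lam = sum(neigh) / (4 * phi)
            ps.append(poisson_cdf(counts[j] / phi, lam))
        d = ks_uniform(ps)
        if best is None or d < best[1]:
            best = (phi, d)
    return best[0]

# counts that jump in steps of 10 look over-dispersed at phi = 1,
# so a larger infections-per-cluster value should be preferred
counts = [50, 30, 60, 40, 70, 50, 40, 60, 30, 50, 60, 40, 50, 70, 40]
print(best_phi(counts, [1, 2, 5, 10, 20]))
```

On genuinely Poissonian data the same scan should settle near ϕ = 1, which is what makes the statistic usable as an anti-clustering detector.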

6 citations


Cites background from "Prediction Regions for Poisson and ..."

  • ...Detailed modelling is usually restricted to a small number of countries (e.g. Chowdhury et al. 2020; Kim et al. 2020; Molina-Cuevas 2020; Jiang, Zhao & Shao 2021; Afshordi et al. 2020)....


  • ..., 2020) and for COVID-19 death rate counts in the United States (Kim et al., 2020)....


Proceedings ArticleDOI
03 Aug 2021
TL;DR: In this paper, the authors employ an autoregression model using a Poisson distribution to predict future COVID-19 cases, namely the positive and recovery numbers, and compare the Poisson Autoregression with several well-known forecasting methods, namely ARIMA, Exponential Smoothing, BATS, and Prophet.
Abstract: COVID-19 has become a global problem, including in Jakarta, Indonesia. There have been many approaches to predicting COVID-19 occurrence, including the forecasting approach. However, traditional forecasting methods, particularly machine learning, often do not consider the nature of the data, even though it takes the form of counts, such as the number of cases. This study employs an autoregression model using a Poisson distribution to predict future COVID-19 cases, namely the positive and recovery numbers. We compare the Poisson Autoregression with several well-known forecasting methods, namely ARIMA, Exponential Smoothing, BATS, and Prophet. This study found that Poisson Autoregression produces accurate predictions, with MAPE below 20%, and tends to follow the actual data for the next 8 to 14 days. Thus, this approach can forecast future cases of COVID-19 and other count data in Jakarta, such as the number of citizen complaints or data in the transportation context.
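The accuracy yardstick used above, MAPE, is simply the mean of |actual − forecast| / actual expressed in percent, with values below 20% counted as accurate in this study. A minimal computation with made-up numbers:

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - f) / a
                     for a, f in zip(actual, forecast)) / len(actual)

actual = [120, 135, 150, 160]    # made-up observed case counts
forecast = [110, 140, 145, 170]  # made-up model forecasts
print(round(mape(actual, forecast), 2))  # → 5.41
```

Note that MAPE is undefined when an actual count is zero, which matters for epidemic data with quiet days; that caveat is one reason count-aware models and error measures are paired in practice.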

4 citations

References
Journal Article
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Abstract: Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

272,030 citations


"Prediction Regions for Poisson and ..." refers methods in this paper

  • ...This could be done by using the glm object function in R [20] with the Poisson family and logarithm link....


  • ...The resulting prediction regions were Γ̃0 = [21, 42], Γ̌1 = [20, 42], Γ̌2 = [22, 43]....


  • ...To compare performance of the prediction regions Γ̃0 (randomized version), Γ̌1, Γ̌2, Γ̌3, Γ̌4, and Γ̌5 with M = 50, S = 100, and for n ∈ {5, 10, 15, 20, 30, 50, 70, 100} and λ ∈ {1, 5, 15, 30, 50, 100, 200}, we performed simulation studies, with program codes in the R [20] environment, to determine the coverage probabilities and the lengths of the regions (recall that length is an equivalent surrogate for the cardinality of the regions since we took the ceiling and the floor of the lower and upper...


  • ...This realized value of Y101 was contained in the realized prediction regions Γ̃0 = [21, 42], Γ̌1 = [20, 42], and Γ̌2 = [22, 43]....


Journal ArticleDOI
TL;DR: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences, and it is thoroughly enjoyable to read.
Abstract: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences. Subtitled “With Applications in Engineering and the Sciences,” this book’s authors all specialize primarily in engineering statistics. The first author has produced several recent editions of Walpole, Myers, and Myers (1998), the last reported by Ziegel (1999). The second author has had several editions of Montgomery and Runger (1999), recently reported by Ziegel (2002). All of the authors are renowned experts in modeling. The first two authors collaborated on a seminal volume in applied modeling (Myers and Montgomery 2002), which had its recent revised edition reported by Ziegel (2002). The last two authors collaborated on the most recent edition of a book on regression analysis (Montgomery, Peck, and Vining (2001), reported by Gray (2002)), and the first author has had multiple editions of his own regression analysis book (Myers 1990), the latest of which was reported by Ziegel (1991). A comparable book with similar objectives and a more specific focus on logistic regression, Hosmer and Lemeshow (2000), reported by Conklin (2002), presumed a background in regression analysis and began with generalized linear models. The Preface here (p. xi) indicates an identical requirement but nonetheless begins with 100 pages of material on linear and nonlinear regression. Most of this will probably be a review for the readers of the book. Chapter 2, “Linear Regression Model,” begins with 50 pages of familiar material on estimation, inference, and diagnostic checking for multiple regression. The approach is very traditional, including the use of formal hypothesis tests. In industrial settings, use of p values as part of a risk-weighted decision is generally more appropriate. The pedagogic approach includes formulas and demonstrations for computations, although computing by Minitab is eventually illustrated.
Less-familiar material on maximum likelihood estimation, scaled residuals, and weighted least squares provides more specific background for subsequent estimation methods for generalized linear models. This review is not meant to be disparaging. The authors have packed a wealth of useful nuggets for any practitioner in this chapter. It is thoroughly enjoyable to read. Chapter 3, “Nonlinear Regression Models,” is arguably less of a review, because regression analysis courses often give short shrift to nonlinear models. The chapter begins with a great example on the pitfalls of linearizing a nonlinear model for parameter estimation. It continues with the effective balancing of explicit statements concerning the theoretical basis for computations versus the application and demonstration of their use. The details of maximum likelihood estimation are again provided, and weighted and generalized regression estimation are discussed. Chapter 4 is titled “Logistic and Poisson Regression Models.” Logistic regression provides the basic model for generalized linear models. The prior development for weighted regression is used to motivate maximum likelihood estimation for the parameters in the logistic model. The algebraic details are provided. As in the development for linear models, some of the details are pushed into an appendix. In addition to connecting to the foregoing material on regression on several occasions, the authors link their development forward to their following chapter on the entire family of generalized linear models. They discuss score functions, the variance-covariance matrix, Wald inference, likelihood inference, deviance, and overdispersion. Careful explanations are given for the values provided in standard computer software, here PROC LOGISTIC in SAS.
The value in having the book begin with familiar regression concepts is clearly realized when the analogies are drawn between overdispersion and nonhomogeneous variance, or analysis of deviance and analysis of variance. The authors rely on the similarity of Poisson regression methods to logistic regression methods and mostly present illustrations for Poisson regression. These use PROC GENMOD in SAS. The book does not give any of the SAS code that produces the results. Two of the examples illustrate designed experiments and modeling. They include discussion of subset selection and adjustment for overdispersion. The mathematical level of the presentation is elevated in Chapter 5, “The Family of Generalized Linear Models.” First, the authors unify the two preceding chapters under the exponential distribution. The material on the formal structure for generalized linear models (GLMs), likelihood equations, quasilikelihood, the gamma distribution family, and power functions as links is some of the most advanced material in the book. Most of the computational details are relegated to appendixes. A discussion of residuals returns one to a more practical perspective, and two long examples on gamma distribution applications provide excellent guidance on how to put this material into practice. One example is a contrast to the use of linear regression with a log transformation of the response, and the other is a comparison to the use of a different link function in the previous chapter. Chapter 6 considers generalized estimating equations (GEEs) for longitudinal and analogous studies. The first half of the chapter presents the methodology, and the second half demonstrates its application through five different examples. The basis for the general situation is first established using the case with a normal distribution for the response and an identity link.
The importance of the correlation structure is explained, the iterative estimation procedure is shown, and estimation for the scale parameters and the standard errors of the coefficients is discussed. The procedures are then generalized for the exponential family of distributions and quasi-likelihood estimation. Two of the examples are standard repeated-measures illustrations from biostatistical applications, but the last three illustrations are all interesting reworkings of industrial applications. The GEE computations in PROC GENMOD are applied to account for correlations that occur with multiple measurements on the subjects or restrictions to randomizations. The examples show that accounting for correlation structure can result in different conclusions. Chapter 7, “Further Advances and Applications in GLM,” discusses several additional topics. These are experimental designs for GLMs, asymptotic results, analysis of screening experiments, data transformation, modeling for both a process mean and variance, and generalized additive models. The material on experimental designs is more discursive than prescriptive and as a result is also somewhat theoretical. Similar comments apply for the discussion on the quality of the asymptotic results, which wallows a little too much in reports on various simulation studies. The examples on screening and data-transformation experiments are again reworkings of analyses of familiar industrial examples and another obvious motivation for the enthusiasm that the authors have developed for using the GLM toolkit. One can hope that subsequent editions will similarly contain new examples that will have caused the authors to expand the material on generalized additive models and other topics in this chapter. Designating myself to review a book that I know I will love to read is one of the rewards of being editor. I read both of the editions of McCullagh and Nelder (1989), which was reviewed by Schuenemeyer (1992).
That book was not fun to read. The obvious enthusiasm of Myers, Montgomery, and Vining and their reliance on their many examples as a major focus of their pedagogy make Generalized Linear Models a joy to read. Every statistician working in any area of applied science should buy it and experience the excitement of these new approaches to familiar activities.

10,520 citations

Book
16 Jul 1993
TL;DR: Statistical Models Based on Counting Processes, as discussed by the authors, is a research monograph for mathematical statisticians and biostatisticians, although almost all methods are given in sufficient detail to be used in practice by other mathematically oriented researchers studying event histories.
Abstract: Modern survival analysis and more general event history analysis may be effectively handled in the mathematical framework of counting processes, stochastic integration, martingale central limit theory and product integration. This book presents this theory, which has been the subject of an intense research activity during the past one-and-a-half decades. The exposition of the theory is integrated with the careful presentation of many practical examples, based almost exclusively on the authors' experience, with detailed numerical and graphical illustrations. "Statistical Models Based on Counting Processes" may be viewed as a research monograph for mathematical statisticians and biostatisticians, although almost all methods are given in sufficient detail to be used in practice by other mathematically oriented researchers studying event histories (demographers, econometricians, epidemiologists, actuarial mathematicians, reliability engineers, biologists). Much of the material has so far only been available in the journal literature (if at all), and a wide variety of researchers will find this an invaluable survey of the subject.

3,012 citations

Journal ArticleDOI
TL;DR: "Statistical Models Based on Counting Processes" may be viewed as a research monograph for mathematical statisticians and biostatisticians, although almost all methods are given in sufficient detail to be used in practice by other mathematically oriented researchers studying event histories.
Abstract: Modern survival analysis and more general event history analysis may be effectively handled in the mathematical framework of counting processes, stochastic integration, martingale central limit theory and product integration. This book presents this theory, which has been the subject of an intense research activity during the past one-and-a-half decades. The exposition of the theory is integrated with the careful presentation of many practical examples, based almost exclusively on the authors' experience, with detailed numerical and graphical illustrations. "Statistical Models Based on Counting Processes" may be viewed as a research monograph for mathematical statisticians and biostatisticians, although almost all methods are given in sufficient detail to be used in practice by other mathematically oriented researchers studying event histories (demographers, econometricians, epidemiologists, actuarial mathematicians, reliability engineers, biologists). Much of the material has so far only been available in the journal literature (if at all), and a wide variety of researchers will find this an invaluable survey of the subject.

2,852 citations


"Prediction Regions for Poisson and ..." refers methods in this paper

  • ...We mention that our Occam’s Razor-type solution is motivated by frailty modeling in Survival Analysis (see, for instance, [5])....


Journal ArticleDOI
TL;DR: In this paper, the development of statistical methods at Rothamsted Experimental Station by Sir Ronald Fisher is used to illustrate these themes, and the author discusses the importance of flexibility to profit from such confrontations and to devise parsimonious but effective models.
Abstract: Aspects of scientific method are discussed: In particular, its representation as a motivated iteration in which, in succession, practice confronts theory, and theory, practice. Rapid progress requires sufficient flexibility to profit from such confrontations, and the ability to devise parsimonious but effective models, to worry selectively about model inadequacies and to employ mathematics skillfully but appropriately. The development of statistical methods at Rothamsted Experimental Station by Sir Ronald Fisher is used to illustrate these themes.

1,726 citations


"Prediction Regions for Poisson and ..." refers background in this paper

  • ...Box [7] that all models are wrong, but some are useful – there are other factors, some beyond our control, that could impact the number of reported deaths at a future time, such as premature easing of social distancing and re-opening of business establishments, virus mutations, better diagnostic tools, changing hotspots, overburdened health care facilities, introduction of effective treatments, beneficial or detrimental actions by local, state, and/or federal entities, changing definition deaths...
