Author

Taeho Kim

Bio: Taeho Kim is an academic researcher from the University of Haifa. The author has contributed to research in topics: Poisson distribution & Mathematics. The author has an h-index of 2, co-authored 5 publications receiving 11 citations.

Papers
Posted Content
TL;DR: In this paper, the authors examined the construction of prediction regions or intervals under the Poisson regression model and for an over-dispersed Poisson regression model, for predicting daily and cumulative deaths in the United States due to the SARS-CoV-2 virus.
Abstract: Motivated by the current Coronavirus Disease (COVID-19) pandemic, which is due to the SARS-CoV-2 virus, and the important problem of forecasting daily deaths and cumulative deaths, this paper examines the construction of prediction regions or intervals under the Poisson regression model and for an over-dispersed Poisson regression model. For the Poisson regression model, several prediction regions are developed and their performance is compared through simulation studies. The methods are applied to the problem of forecasting daily and cumulative deaths in the United States (US) due to COVID-19. To examine their performance relative to what actually happened, daily deaths data until May 15th were used to forecast cumulative deaths by June 1st. It was observed that there is over-dispersion in the observed data relative to the Poisson regression model. An over-dispersed Poisson regression model is therefore proposed. This new model builds on frailty ideas in Survival Analysis, and over-dispersion is quantified through an additional parameter. The Poisson regression model is hidden within this over-dispersed Poisson regression model and is obtained as a limiting case when the over-dispersion parameter increases to infinity. A prediction region for the cumulative number of US deaths due to COVID-19 by July 16th, given the data until July 2nd, is presented. Finally, the paper discusses limitations of the proposed procedures and mentions open research problems, as well as the dangers and pitfalls when forecasting on a long horizon, with a focus on this pandemic where events, both foreseen and unforeseen, could have huge impacts on point predictions and prediction regions.
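The abstract describes these constructions only in prose. As a rough generic illustration (not the authors' specific procedures), the sketch below fits a Poisson regression to synthetic daily counts with statsmodels and builds a prediction interval for a future count by drawing coefficients from their asymptotic normal approximation and then adding Poisson variation; all data and settings are invented.

```python
# Sketch: prediction interval for a future count under Poisson regression.
# Synthetic data and settings; not the paper's data or exact procedure.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical daily counts following a log-linear trend.
t = np.arange(60)
y = rng.poisson(np.exp(3.0 + 0.02 * t))

X = sm.add_constant(t)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# Future design point (day 70): draw coefficients from their asymptotic
# normal approximation, then add Poisson variation around each mean.
x_new = np.array([1.0, 70.0])
beta_draws = rng.multivariate_normal(fit.params, fit.cov_params(), size=20000)
y_draws = rng.poisson(np.exp(beta_draws @ x_new))

lo, hi = np.quantile(y_draws, [0.025, 0.975])
print(f"approximate 95% prediction interval for day 70: [{lo:.0f}, {hi:.0f}]")
```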

9 citations

Journal ArticleDOI
TL;DR: In this paper, the least-squares predictor is characterized as the element of this class for which the expectations of Y and $\hat{Y}$ are equal and the variances of $\hat{Y}$ and $E[Y \mid \mathbf{X}]$ are also equal.
Abstract: This note examines, at the population level, the approach of obtaining predictors $\hat{Y}$ of a random variable Y, given the joint distribution of $(Y, \mathbf{X})$, by maximizing the mapping $g \mapsto \rho(Y, g(\mathbf{X}))$ for a given correlation function $\rho(\cdot,\cdot)$. Commencing with Pearson's correlation function, the class of such predictors is uncountably infinite. The least-squares predictor is an element of this class obtained by requiring the expectations of Y and $\hat{Y}$ to be equal and the variances of $\hat{Y}$ and $E[Y \mid \mathbf{X}]$ to also be equal. On the other hand, replacing the second condition by the equality of the variances of Y and $\hat{Y}$, a natural requirement for some calibration problems, the unique predictor that is obtained has the maximum value of Lin's (1989) concordance correlation coefficient (CCC) with Y among all predictors. Since the CCC measures the degree of agreement, the new predictor is called the maximal agreement predictor. These predictors are illustrated for three special distributions: the multivariate normal distribution; the exponential distribution, conditional on covariates; and the Dirichlet distribution. The exponential distribution is relevant in survival analysis or in reliability settings, while the Dirichlet distribution is relevant for compositional data.
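For the multivariate normal case mentioned in the abstract, both predictors have closed forms: the least-squares predictor is the conditional mean, and matching Var(Ŷ) to Var(Y) rescales its slope by 1/|ρ|. The sketch below simulates bivariate normal data with invented parameters and compares mean squared error and Lin's CCC for the two; the formulas are standard for this case, but the code is an illustrative reconstruction, not taken from the paper.

```python
# Sketch: least-squares vs maximal agreement predictor, bivariate normal case.
import numpy as np

rng = np.random.default_rng(1)
mu_x, mu_y, sx, sy, rho = 0.0, 2.0, 1.0, 3.0, 0.6   # hypothetical values

n = 100_000
x = rng.normal(mu_x, sx, n)
y = mu_y + rho * sy / sx * (x - mu_x) + rng.normal(0, sy * np.sqrt(1 - rho**2), n)

# Least-squares predictor: the conditional mean E[Y | X].
yhat_ls = mu_y + rho * sy / sx * (x - mu_x)
# Maximal agreement predictor: same location, slope rescaled so Var(yhat) = Var(Y).
yhat_ma = mu_y + np.sign(rho) * sy / sx * (x - mu_x)

def ccc(a, b):
    """Lin's concordance correlation coefficient (population form)."""
    cov_ab = np.mean((a - a.mean()) * (b - b.mean()))
    return 2 * cov_ab / (a.var() + b.var() + (a.mean() - b.mean()) ** 2)

for name, yh in [("least squares", yhat_ls), ("maximal agreement", yhat_ma)]:
    print(f"{name:>18}: MSE = {np.mean((y - yh)**2):.3f}, CCC = {ccc(y, yh):.3f}")
```

Running this shows the trade-off the abstract describes: the least-squares predictor attains the smaller MSE, while the maximal agreement predictor attains the larger CCC.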

5 citations

Journal ArticleDOI
TL;DR: In this article, a 'bottom-to-top' approach for constructing confidence regions (CRs) under the nonparametric measurement error model (NMEM) is proposed, where the objective is to construct a confidence region for the median of a continuous distribution.
Abstract: The nonparametric measurement error model (NMEM) postulates that $X_{i}=\Delta +\epsilon _{i},i=1,2,\ldots ,n;\Delta \in \Re $ with $\epsilon _{i},i=1,2,\ldots ,n$, IID from $F(\cdot )\in\mathfrak{F}_{c,0}$, where $\mathfrak{F}_{c,0}$ is the class of all continuous distributions with median $0$, so $\Delta $ is the median parameter of $X$. This paper deals with the problem of constructing a confidence region (CR) for $\Delta $ under the NMEM. Aside from the NMEM, the problem setting also arises in a variety of situations, including inference about the median lifetime of a complex system arising in engineering, reliability, biomedical, and public health settings, as well as in the economic arena such as when dealing with household income. Current methods of constructing CRs for $\Delta $ are discussed, including the $T$-statistic based CR and the Wilcoxon signed-rank statistic based CR, arguably the two default methods in applied work when a confidence interval about the center of a distribution is desired. A ‘bottom-to-top’ approach for constructing CRs is implemented, which starts by imposing reasonable invariance or equivariance conditions on the desired CRs, and then optimizing with respect to their mean contents on subclasses of $\mathfrak{F}_{c,0}$. This contrasts with the usual approach of using a pivotal quantity constructed from test statistics and/or estimators and then ‘pivoting’ to obtain the CR. Applications to a real car mileage data set and to Proschan’s famous air-conditioning data set are illustrated. Simulation studies to compare performances of the different CR methods were performed. Results of these studies indicate that the sign-statistic based CR and the optimal CR focused on symmetric distributions satisfy the confidence level requirement, though they tended to have higher contents; while three of the bootstrap-based CR procedures and one of the newly-developed adaptive CR tended to be a tad more liberal, but with smaller contents. A critical recommendation for practitioners is that, under the NMEM, the $T$-statistic based and Wilcoxon signed-rank statistic based CRs should not be used since they either have very degraded coverage probabilities or inflated contents under some of the allowable error distributions under the NMEM.
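In the interval case, the sign-statistic based CR that the abstract recommends corresponds to the classical distribution-free confidence interval built from order statistics, with coverage governed by the Binomial(n, 1/2) distribution. The sketch below shows that textbook construction on invented data; it is not the paper's optimized region.

```python
# Sketch: distribution-free CI for a median from order statistics.
import numpy as np
from scipy.stats import binom

def median_ci(x, alpha=0.05):
    """CI (x_(k), x_(n-k+1)) whose coverage is set by the Binomial(n, 1/2) law."""
    x = np.sort(np.asarray(x))
    n = len(x)
    # Smallest k with P(Bin(n, 1/2) <= k) >= alpha/2, so P(Bin <= k-1) < alpha/2.
    k = int(binom.ppf(alpha / 2, n, 0.5))
    if k < 1:
        raise ValueError("sample too small for this confidence level")
    coverage = 1 - 2 * binom.cdf(k - 1, n, 0.5)   # exact, distribution-free
    return x[k - 1], x[n - k], coverage           # 0-based indexing

rng = np.random.default_rng(2)
sample = rng.exponential(scale=10.0, size=25)      # hypothetical lifetimes
lo, hi, cov = median_ci(sample)
print(f"CI for the median: ({lo:.2f}, {hi:.2f}), exact coverage {cov:.3f}")
```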

3 citations

Journal ArticleDOI
TL;DR: This paper provides a review of the literature regarding methods for constructing prediction intervals for counting variables, with particular focus on those whose distributions are Poisson or derived from Poisson and with an over‐dispersion property.
Abstract: This paper provides a review of the literature regarding methods for constructing prediction intervals for counting variables, with particular focus on those whose distributions are Poisson or derived from Poisson and with an over-dispersion property. Independent and identically distributed models and regression models are both considered. The motivating problem for this review is that of predicting the number of daily and cumulative cases or deaths attributable to COVID-19 at a future date. This article is categorized under: Applications of Computational Statistics > Clinical Trials; Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods; Statistical Models > Generalized Linear Models.
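As a small self-contained illustration of why such refinements matter, the sketch below implements the naive plug-in interval for one future count in the IID Poisson setting and estimates its coverage by simulation; how far the empirical coverage drifts from the nominal level once the rate must be estimated is one motivation for the methods the review surveys. All settings are invented.

```python
# Sketch: empirical coverage of the naive plug-in prediction interval,
# IID Poisson case with the rate estimated from n observations.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(3)
lam_true, n, alpha, reps = 4.0, 10, 0.05, 20000    # hypothetical settings

hits = 0
for _ in range(reps):
    lam_hat = rng.poisson(lam_true, n).mean()      # MLE of the rate
    # Plug-in interval: Poisson quantiles at the estimated rate, which
    # ignores the sampling error in lam_hat.
    lo = poisson.ppf(alpha / 2, lam_hat)
    hi = poisson.ppf(1 - alpha / 2, lam_hat)
    hits += lo <= rng.poisson(lam_true) <= hi
print(f"empirical coverage: {hits / reps:.3f} (nominal {1 - alpha:.2f})")
```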

2 citations

DOI
01 Jan 2021
TL;DR: Duplicate record of the posted content listed above, examining prediction regions under Poisson and over-dispersed Poisson regression models for forecasting daily and cumulative COVID-19 deaths in the United States.

1 citation


Cited by
Journal ArticleDOI
TL;DR: The book provides a brief introduction to SAS, SPSS, and BMDP, along with their use in performing ANOVA, and is indeed an excellent source of reference for the ANOVA based on fixed, random, and mixed-effects models.
Abstract: The book provides a brief introduction to SAS, SPSS, and BMDP, along with their use in performing ANOVA. The book also has a chapter devoted to experimental designs and the corresponding ANOVA. In terms of coverage, a nice feature of the book is the inclusion of a chapter on finite population models—typically not found in books on experimental designs and ANOVA. Several appendixes are given at the end of the book discussing some of the standard distributions, the Satterthwaite approximation, rules for computing the sums of squares, degrees of freedom, expected mean squares, and so forth. The exercises at the end of each chapter contain a number of numerical problems. Some of my quibbles about the book are the following. At times, it simply gives expressions without adequate motivation or examples. A reader who is not already familiar with ANOVA techniques will wonder as to the relevance of some of the expressions. Just to give an example, the quantity "sum of squares due to a contrast" is defined on page 65. The algebraic property that the sums of squares due to a set of a − 1 orthogonal contrasts will add up to the sum of squares due to an effect having a − 1 df is then stated. Given the level of the book, discussion of such a property appears to be irrelevant. I did not see this property used anywhere in the book; neither did I see the sum of squares due to a contrast explicitly used or mentioned later in the book. Examples in which the one-way model is adequate are mentioned only after introducing the model and the assumptions, and the examples are buried inside the remarks (in small print) following the model. This is also the case with the two-way model with interaction (Chap. 4). The authors indicate in the preface that the remarks are mostly meant to include results to be kept out of the main body of the text. I believe that good examples should be the starting point for introducing ANOVA models. The authors present the analysis of fixed, random, and mixed models simultaneously. Motivating examples that distinguish between these scenarios should have been made the highlight of the presentation in each chapter rather than deferred to the later part of the chapter under "worked out examples" or buried within the remarks. The authors discuss transformations to correct lack of normality and lack of homoscedasticity (Sec. 2.22). However, these are not illustrated with any real examples. Regarding tests concerning the departure from the model assumptions, formal tests are presented in some detail; however, graphical procedures are only very briefly mentioned under a remark. I consider this to be a glaring omission. Consequently, I would be somewhat hesitant to recommend this book to anyone interested in actual data analysis using ANOVA unless the application is such that one of the standard models (along with the standard assumptions) is known to be adequate and diagnostic checks are not called for. Obviously, this is an unlikely scenario in most applications. The preceding criticisms aside, I can see myself consulting this book to refer to an ANOVA table, to look up an expected value or test statistic under a random or mixed-effects model, or to refer to the use of SAS, SPSS, or BMDP for performing ANOVA. The book is indeed an excellent source of reference for the ANOVA based on fixed, random, and mixed-effects models.
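The property quoted above, that the sums of squares of a − 1 orthogonal contrasts add up to the sum of squares of an effect with a − 1 df, is easy to verify numerically; a minimal sketch with invented data for a = 3 equally sized groups:

```python
# Sketch: SS of a-1 orthogonal contrasts decompose the treatment SS (a = 3).
import numpy as np

rng = np.random.default_rng(4)
a, n = 3, 8                                   # groups, replicates per group
y = rng.normal(loc=[0.0, 1.0, 3.0], scale=1.0, size=(n, a))  # invented data

means = y.mean(axis=0)
grand = y.mean()
ss_treatment = n * np.sum((means - grand) ** 2)

# Two orthogonal contrasts (c1 . c2 = 0, valid for equal group sizes).
contrasts = [np.array([1.0, -1.0, 0.0]), np.array([1.0, 1.0, -2.0])]
ss_contrasts = [n * (c @ means) ** 2 / (c @ c) for c in contrasts]

print(f"SS(treatment)        = {ss_treatment:.4f}")
print(f"sum of contrast SSes = {sum(ss_contrasts):.4f}")  # matches SS(treatment)
```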

248 citations

Proceedings Article
01 Jan 1976
TL;DR: Hirsch and Mathews, as mentioned in this paper, reveal the true story of Ulam's pivotal role in the making of the 'Super' in their historical introduction to this behind-the-scenes look at the minds and ideas that ushered in the nuclear age.
Abstract: This autobiography of mathematician Stanislaw Ulam, one of the great scientific minds of the twentieth century, tells a story rich with amazingly prophetic speculations and peppered with lively anecdotes. As a member of the Los Alamos National Laboratory from 1944 on, Ulam helped to precipitate some of the most dramatic changes of the postwar world. He was among the first to use and advocate computers for scientific research, originated ideas for the nuclear propulsion of space vehicles, and made fundamental contributions to many of today's most challenging mathematical projects. With his wide-ranging interests, Ulam never emphasized the importance of his contributions to the research that resulted in the hydrogen bomb. Now Daniel Hirsch and William Mathews reveal the true story of Ulam's pivotal role in the making of the 'Super,' in their historical introduction to this behind-the-scenes look at the minds and ideas that ushered in the nuclear age. It includes an epilogue by Francoise Ulam and Jan Mycielski that sheds new light on Ulam's character and mathematical originality.

14 citations

Graham Wood
01 Jan 2004
TL;DR: In this article, the authors describe how confidence intervals (for example, for the true accident rate at given flows) and prediction intervals (for example, for the number of accidents at a new site with given flows) can be produced using spreadsheet technology.
Abstract: Generalised linear models, with "log" link and either Poisson or negative binomial errors, are commonly used for relating accident rates to explanatory variables. This paper adds to the toolkit for such models. It describes how confidence intervals (for example, for the true accident rate at given flows) and prediction intervals (for example, for the number of accidents at a new site with given flows) can be produced using spreadsheet technology.
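The closed-form interval for the true rate is simple enough to reproduce outside a spreadsheet: form a confidence interval on the linear-predictor (log) scale and exponentiate. The sketch below does this for a synthetic Poisson model of accidents against log flow; the model and numbers are invented, and the paper's prediction-interval refinements are not reproduced.

```python
# Sketch: CI for the true accident rate at given flows, log-link Poisson GLM.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# Synthetic site data: accident counts vs log traffic flow (invented).
log_flow = rng.uniform(6, 10, 50)
counts = rng.poisson(np.exp(-4.0 + 0.6 * log_flow))

X = sm.add_constant(log_flow)
fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()

# New site: interval on the link scale, then exponentiate.
x_new = np.array([1.0, 8.5])                      # intercept, log flow at new site
eta = x_new @ fit.params
se = np.sqrt(x_new @ fit.cov_params() @ x_new)
lo, hi = np.exp(eta - 1.96 * se), np.exp(eta + 1.96 * se)
print(f"95% CI for the true rate at log-flow 8.5: ({lo:.2f}, {hi:.2f})")
```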

8 citations

Posted Content
TL;DR: In this paper, second-order Chebyshev–Edgeworth and Cornish–Fisher expansions based on Student's $t$ and Laplace distributions and their quantiles are derived for the sample median with a random sample size of a special kind.
Abstract: In practice, we often encounter situations where a sample size is not defined in advance and can be a random value. The randomness of the sample size crucially changes the asymptotic properties of the underlying statistic. In the present paper, second-order Chebyshev–Edgeworth and Cornish–Fisher expansions based on Student's $t$ and Laplace distributions and their quantiles are derived for the sample median with a random sample size of a special kind.
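The effect described above can be seen in a quick simulation: normalized by the square root of the expected sample size, the sample median acquires visibly heavier, Student- or Laplace-like tails when the sample size is geometric rather than fixed. The sketch below only illustrates the phenomenon under assumed distributions; it does not reproduce the paper's expansions.

```python
# Sketch: sample median with fixed vs random (geometric) sample size,
# both normalized by the square root of the EXPECTED size.
import numpy as np

rng = np.random.default_rng(6)
mean_n, reps = 50, 20000

def norm_median(n):
    """Median of n standard normals, scaled by sqrt(mean_n)."""
    return np.sqrt(mean_n) * np.median(rng.standard_normal(n))

fixed = np.array([norm_median(mean_n) for _ in range(reps)])
sizes = rng.geometric(1.0 / mean_n, reps)          # random sizes, mean ~ mean_n
random_n = np.array([norm_median(n) for n in sizes])

for name, z in [("fixed n", fixed), ("geometric n", random_n)]:
    kurt = np.mean(z**4) / np.mean(z**2) ** 2      # 3 for a normal limit
    print(f"{name:>12}: 99% |quantile| = {np.quantile(np.abs(z), 0.99):.2f}, "
          f"kurtosis = {kurt:.2f}")
```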

7 citations