
Showing papers on "Poisson distribution" published in 2012


Book
26 Nov 2012
TL;DR: This book presents visualization tools and eigenvector-based spatial filtering methods for analyzing spatial autocorrelation in georeferenced data, with applications to interval/ratio, counts, and percentage datasets under auto-normal, auto-logistic/binomial, and auto-Poisson models.
Abstract: 1 Introduction.- 1.1 Scientific Visualization.- 1.2 What Is Spatial Autocorrelation?.- 1.3 Selected Visualization Tools: An Overview.- 1.3.1 Graphical Portrayals of Spatial Autocorrelation.- 1.4 The Sample Georeferenced Datasets.- 1.4.1 Selected Interval/Ratio Datasets.- 1.4.2 Selected Counts Datasets.- 1.4.3 Selected Binomial Datasets.- 2 Salient Properties of Geographic Connectivity Underlying Spatial Autocorrelation.- 2.1 Eigenfunctions Associated with Geographic Connectivity Matrices.- 2.1.1 Eigenvalue Decompositions.- 2.1.2 Eigenvectors Associated with Geographic Connectivity Matrices.- 2.1.3 The Maximum MC Value (MCmax).- 2.1.4 Moments of Eigenvalue Distributions.- 2.2 Generalized Eigenvalue Frequency Distributions.- 2.2.1 The Extreme Eigenvalues of Matrices C and W.- 2.2.2 Spectrum Results for Matrices C and W.- 2.2.3 Spectrum Results for Matrix (I - 11T/n)C(I - 11T/n).- 2.3 The Auto-Gaussian Jacobian Term Normalizing Factor.- 2.3.1 Simplification of the Auto-Gaussian Jacobian Term Based upon Matrix W for a Regular Square Tessellation and the Rook's Definition of Connectivity.- 2.4 Eigenfunctions Associated with the GR.- 2.5 Remarks and Discussion.- 3 Sampling Distributions Associated with Spatial Autocorrelation.- 3.1 Samples as Random Permutations of Values across Locations on a Map: Randomization.- 3.2 Simple Random Samples at Each Location on a Map: Unconstrained Selection.- 3.3 Samples as Ordered Random Drawings from a Parent Frequency Distribution: Extending the Permutation Perspective.- 3.3.1 The Sampling Distribution for MC.- 3.3.2 The Distribution of ρ for an Auto-normal SAR Model.- 3.4 Samples as Outcomes of a Multivariate Drawing: Extending the Simple Random Sampling Perspective.- 3.4.1 The Auto-normal Model: ML Estimation.- 3.4.2 The Auto-logistic/binomial Model.- 3.4.3 Embedding Spatial Autocorrelation through the Mean Response.- 3.5 Effective Sample Size.- 3.5.1 Estimates Based upon a Single Mean Response.- 3.5.2 Estimates Based upon Multiple Mean Responses.- 3.5.3 Estimates Based upon a Difference of Means for Correlated (Paired) Samples.- 3.5.4 Relationships between Effective Sample Size and the Configuration of Sample Points.- 3.6 Remarks and Discussion.- 4 Spatial Filtering.- 4.1 Eigenvector-based Spatial Filtering.- 4.1.1 Map Patterns Depicted by Eigenvectors of Matrix (I - ρC)T(I - ρC).- 4.1.2 Similarities with Conventional PCA.- 4.1.3 Orthogonality and Uncorrelatedness of the Eigenvectors.- 4.1.4 Linear Combinations of Eigenvectors of Matrix (I - 11T/n)C(I - 11T/n).- 4.2 Coefficients for Single and Linear Combinations of Distinct Map Patterns.- 4.2.1 Decomposition of Regressor and Regressand Attribute Variables.- 4.2.2 The Sampling Distributions of ŷ and r.- 4.3 Eigenvector Selection Criteria.- 4.3.1 The Auto-normal Model.- 4.3.2 The Auto-logistic/binomial Model.- 4.3.3 The Auto-Poisson Model.- 4.3.4 The Case of Negative Spatial Autocorrelation.- 4.4 Regression Analysis: Standard Errors Based upon Simulation Experiments and Resampling.- 4.4.1 Simulating Error for Georeferenced Data.- 4.4.2 Bootstrapping Georeferenced Data.- 4.5 The MC Local Statistic and Illuminating Diagnostics.- 4.5.1 The MCis.- 4.5.2 Diagnostics Based upon Eigenvectors of Matrix (I-11T/n)C(I-11T/n).- 4.6 Remarks and Discussion.- 5 Spatial Filtering Applications: Selected Interval/Ratio Datasets.- 5.1 Geographic Distributions of Settlement Size in Peru.- 5.2 The Geographic Distribution of Lyme Disease in Georgia.- 5.3 The Geographic Distribution of Biomass in the High Peak District.- 5.4 The Geographic Distribution of Agricultural and Topographic Variables in Puerto Rico.- 5.5 Remarks and Discussion.- 5.5.1 Relationship between the SAR and Eigenvector Spatial Filtering Specifications.- 5.5.2 Computing Back-transformations.- 6 Spatial Filtering Applications: Selected Counts Datasets.- 6.1 Geographic Distributions of Settlement Counts in Pennsylvania.- 6.2 The Geographic Distribution of Farms in Loiza, Puerto Rico.- 6.3 The Geographic Distribution of Volcanoes in Uganda.- 6.4 The Geographic Distribution of Cholera Deaths in London.- 6.5 The Geographic Distribution of Drumlins in Ireland.- 6.6 Remarks and Discussion.- 7 Spatial Filtering Applications: Selected Percentage Datasets.- 7.1 The Geographic Distribution of the Presence/Absence of Plant Disease in an Agricultural Field.- 7.2 The Geographic Distribution of Plant Disease in an Agricultural Field.- 7.3 The Geographic Distribution of Blood Group A in Eire.- 7.4 The Geographic Distribution of Urbanization across the Island of Puerto Rico.- 7.5 Remarks and Discussion.- 8 Concluding Comments.- 8.1 Spatial Filtering versus Spatial Autoregression.- 8.2 Some Numerical Issues in Spatial Filtering.- 8.2.1 Covariation of Spatial Filter and SAR Spatial Autocorrelation Measures.- 8.2.2 Exploding Georeferenced Data with a Spatial Filter When Maps Have Holes or Gaps: Estimating Missing Data Values.- 8.2.3 Rotation and Theoretical Eigenvectors Given by Theorem 2.5 for Regular Square Tessellations Forming Rectangular Regions.- 8.2.4 Effective Sample Size Revisited.- 8.3 Stepwise Selection of Eigenvectors for an Auto-Poisson Model.- 8.4 Binomial and Poisson Overdispersion.- 8.5 Future Research: What Next?.- List of Symbols.- List of Tables.- List of Figures.- References.- Author Index.- Place Index.

467 citations


Journal ArticleDOI
TL;DR: Bounds for the interference and the outage probability are derived, and it is shown how to use a Poisson cluster process to model the interference in this kind of network.
Abstract: Consider a cognitive radio network with two types of users: primary users (PUs) and cognitive users (CUs), whose locations follow two independent Poisson point processes. The cognitive users follow the policy that a cognitive transmitter is active only when it is outside the primary user exclusion regions. We found that under this setup the active cognitive users form a point process called the Poisson hole process. Due to the interaction between the primary users and the cognitive users through exclusion regions, an exact calculation of the interference and the outage probability seems unfeasible. Instead, two different approaches are taken to tackle this problem. First, bounds for the interference (in the form of Laplace transforms) and the outage probability are derived, and second, it is shown how to use a Poisson cluster process to model the interference in this kind of network. Furthermore, the bipolar network model with different exclusion region settings is analyzed.
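A minimal simulation sketch of the setup just described; the unit observation window, densities, and exclusion radius are illustrative assumptions, not values from the paper:

```python
# Sketch: simulate the Poisson hole process described above.
import numpy as np

rng = np.random.default_rng(0)
lam_p, lam_c, r_e = 20, 200, 0.08   # PU density, CU density, exclusion radius

def ppp(lam, rng):
    """Homogeneous Poisson point process on the unit square."""
    return rng.uniform(size=(rng.poisson(lam), 2))

pus, cus = ppp(lam_p, rng), ppp(lam_c, rng)

# A cognitive transmitter is active only outside every PU exclusion region;
# removing CUs inside the discs "punches holes" in the CU point process.
dists = np.linalg.norm(cus[:, None, :] - pus[None, :, :], axis=2)
active = cus[dists.min(axis=1) > r_e]
print(f"{len(active)} of {len(cus)} cognitive users remain active")
```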

310 citations


Journal ArticleDOI
TL;DR: In this article, a penalized negative Poisson log-likelihood objective function with nonnegativity constraints is proposed to estimate a spatially or temporally distributed phenomenon from Poisson data.
Abstract: Observations in many applications consist of counts of discrete events, such as photons hitting a detector, which cannot be effectively modeled using an additive bounded or Gaussian noise model, and instead require a Poisson noise model. As a result, accurate reconstruction of a spatially or temporally distributed phenomenon (f*) from Poisson data (y) cannot be effectively accomplished by minimizing a conventional penalized least-squares objective function. The problem addressed in this paper is the estimation of f* from y in an inverse problem setting, where the number of unknowns may potentially be larger than the number of observations and f* admits sparse approximation. The optimization formulation considered in this paper uses a penalized negative Poisson log-likelihood objective function with nonnegativity constraints (since Poisson intensities are naturally nonnegative). In particular, the proposed approach incorporates key ideas of using separable quadratic approximations to the objective function at each iteration and penalization terms related to l1 norms of coefficient vectors, total variation seminorms, and partition-based multiscale estimation methods.
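To make the optimization formulation concrete, here is a small proximal-gradient sketch of the penalized negative Poisson log-likelihood with an l1 penalty and nonnegativity constraints. The plain gradient step stands in for the paper's separable quadratic approximations, and the data, step size, and penalty weight are illustrative assumptions:

```python
# Minimize  sum(Ax) - y.log(Ax) + tau*||x||_1  subject to x >= 0.
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 200                         # more unknowns than observations
A = rng.uniform(size=(n, p))
f_true = np.zeros(p)
f_true[rng.choice(p, 5, replace=False)] = 3.0   # sparse truth
y = rng.poisson(A @ f_true)

tau, step, eps = 0.5, 1e-3, 1e-10
x = np.full(p, 0.1)
for _ in range(2000):
    mu = A @ x + eps
    grad = A.sum(axis=0) - A.T @ (y / mu)       # gradient of the Poisson NLL
    x = np.maximum(x - step * (grad + tau), 0)  # prox step: l1 + nonnegativity
print("large coefficients at:", np.flatnonzero(x > 0.5))
```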

266 citations


Proceedings Article
01 Jan 2012
TL;DR: A beta-negative binomial (BNB) process is proposed, leading to a beta-gamma-Poisson process, which may be viewed as a "multiscoop" generalization of the beta-Bernoulli process.
Abstract: A beta-negative binomial (BNB) process is proposed, leading to a beta-gamma-Poisson process, which may be viewed as a "multiscoop" generalization of the beta-Bernoulli process. The BNB process is augmented into a beta-gamma-gamma-Poisson hierarchical structure, and applied as a nonparametric Bayesian prior for an infinite Poisson factor analysis model. A finite approximation for the beta process Lévy random measure is constructed for convenient implementation. Efficient MCMC computations are performed with data augmentation and marginalization techniques. Encouraging results are shown on document count matrix factorization.

264 citations


DOI
21 May 2012
TL;DR: A new Stata command, traj, is demonstrated for fitting to longitudinal data finite (discrete) mixture models designed to identify clusters of individuals following similar progressions of some behavior or outcome over age or time.
Abstract: Group-based trajectory models are used to investigate population differences in the developmental courses of behaviors or outcomes. This article demonstrates a new Stata command, traj, for fitting to longitudinal data finite (discrete) mixture models designed to identify clusters of individuals following similar progressions of some behavior or outcome over age or time. Censored normal, Poisson, zero-inflated Poisson, and Bernoulli distributions are supported. Applications to psychometric scale data, count data, and a dichotomous prevalence measure are illustrated.

254 citations


Journal ArticleDOI
TL;DR: This paper proposes that the random variation is best described via a Poisson distribution, which better describes the zeros observed in the data as compared to the typical assumption of a Gaussian distribution, and presents a new algorithm for Poisson tensor factorization called CANDECOMP–PARAFAC alternating Poisson regression (CP-APR), based on a majorization-minimization approach.
Abstract: Tensors have found application in a variety of fields, ranging from chemometrics to signal processing and beyond. In this paper, we consider the problem of multilinear modeling of sparse count data. Our goal is to develop a descriptive tensor factorization model of such data, along with appropriate algorithms and theory. To do so, we propose that the random variation is best described via a Poisson distribution, which better describes the zeros observed in the data as compared to the typical assumption of a Gaussian distribution. Under a Poisson assumption, we fit a model to observed data using the negative log-likelihood score. We present a new algorithm for Poisson tensor factorization called CANDECOMP–PARAFAC alternating Poisson regression (CP-APR) that is based on a majorization-minimization approach. It can be shown that CP-APR is a generalization of the Lee–Seung multiplicative updates. We show how to prevent the algorithm from converging to non-KKT points and prove convergence of CP-APR under mild conditions.
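The abstract notes that CP-APR generalizes the Lee–Seung multiplicative updates; here is a minimal sketch of those updates in the matrix (order-2 tensor) special case, on illustrative random count data:

```python
# Lee-Seung multiplicative updates for the Poisson/KL objective.
import numpy as np

rng = np.random.default_rng(2)
X = rng.poisson(5, size=(30, 20)).astype(float)   # count data matrix
r = 4                                             # assumed rank
W, H = rng.uniform(0.1, 1, (30, r)), rng.uniform(0.1, 1, (r, 20))

for _ in range(200):
    WH = W @ H + 1e-12
    W *= (X / WH) @ H.T / H.sum(axis=1)           # these updates monotonically
    WH = W @ H + 1e-12                            # decrease the negative
    H *= W.T @ (X / WH) / W.sum(axis=0)[:, None]  # Poisson log-likelihood
nll = (W @ H - X * np.log(W @ H + 1e-12)).sum()
print(f"final negative Poisson log-likelihood: {nll:.1f}")
```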

250 citations


Journal ArticleDOI
TL;DR: In this article, the authors discuss the connection between spatial point, count, and presence-absence methods and how their parameter estimates and predictions should be interpreted and illustrate that under certain assumptions, each method can be motivated by the same underlying spatial inhomogeneous Poisson point process (IPP) model in which the intensity function is modelled as a log-linear function of covariates.
Abstract: 1. The need to understand the processes shaping population distributions has resulted in a vast increase in the diversity of spatial wildlife data, leading to the development of many novel analytical techniques that are fit-for-purpose. One may aggregate location data into spatial units (e.g. grid cells) and model the resulting counts or presence–absences as a function of environmental covariates. Alternatively, the point data may be modelled directly, by combining the individual observations with a set of random or regular points reflecting habitat availability, a method known as a use-availability design (or, alternatively, a presence-pseudo-absence or case-control design). 2. Although these spatial point, count and presence–absence methods are widely used, the ecological literature is not explicit about their connections and how their parameter estimates and predictions should be interpreted. The objective of this study is to recapitulate some recent statistical results and illustrate that under certain assumptions, each method can be motivated by the same underlying spatial inhomogeneous Poisson point process (IPP) model in which the intensity function is modelled as a log-linear function of covariates. 3. The Poisson likelihood used for count data is a discrete approximation of the IPP likelihood. Similarly, the presence–absence design will approximate the IPP likelihood, but only when spatial units (i.e. pixels) are extremely small (Electronic Journal of Statistics, 2010, 4, 1151–1201). For larger pixel sizes, presence–absence designs do not differentiate between one or multiple observations within each pixel, hence leading to information loss. 4. Logistic regression is often used to estimate the parameters of the IPP model using point data. Although the response variable is defined as 0 for the availability points, these zeros do not serve as true absences as is often assumed; rather, their role is to approximate the integral of the denominator in the IPP likelihood (The Annals of Applied Statistics, 2010, 4, 1383–1402). Because of this common misconception, the estimated exponential function of the linear predictor (i.e. the resource selection function) is often assumed to be proportional to occupancy. Like IPP and count models, this function is proportional to the expected density of observations. 5. Understanding these (dis-)similarities between different species distribution modelling techniques should improve biological interpretation of spatial models and therefore advance ecological and methodological cross-fertilization.
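A small sketch of the shared model under discussion: an IPP with log-linear intensity, simulated by thinning and then recovered through the gridded-count Poisson approximation described in point 3. The window, grid size, and coefficients are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
b0, b1 = 4.0, 2.0                           # true log-linear coefficients

# Simulate the IPP on the unit square by thinning a dominating PPP;
# the covariate is simply the x-coordinate.
lam_max = np.exp(b0 + b1)
n = rng.poisson(lam_max)
pts = rng.uniform(size=(n, 2))
keep = rng.uniform(size=n) < np.exp(b0 + b1 * pts[:, 0]) / lam_max
pts = pts[keep]

# Count-data approximation: bin points, fit Poisson regression on cell counts.
k = 20
counts, edges = np.histogram(pts[:, 0], bins=k, range=(0, 1))
z = 0.5 * (edges[:-1] + edges[1:])          # cell-center covariate
area = 1.0 / k                              # cell area (unit-height strips)

def nll(beta):
    mu = area * np.exp(beta[0] + beta[1] * z)
    return np.sum(mu - counts * np.log(mu))

print(np.round(minimize(nll, x0=[0.0, 0.0]).x, 2))   # approaches (b0, b1)
```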

238 citations


Book
01 Jan 2012
TL;DR: This work presents a basic introduction to Bayesian statistics and Markov Chain Monte Carlo and introduces hurdle models using GAM, and demonstrates that an excessive number of zeros does not necessarily mean zero inflation.
Abstract: Chapter 1 provides a basic introduction to Bayesian statistics and Markov Chain Monte Carlo (MCMC), as we will need this for most analyses. If you are familiar with these techniques we suggest quickly skimming through it. In Chapter 2 we analyse nested zero inflated data of sibling negotiation of barn owl chicks. We explain application of a Poisson GLMM for 1-way nested data and discuss the observation-level random intercept to allow for overdispersion. We show that the data are zero inflated and introduce zero inflated GLMM. We recommend reading this chapter in detail, as we will refer often to it. Data of sandeel otolith presence in seal scat is analysed in Chapter 3. We present a flowchart of steps in selecting the appropriate technique: Poisson GLM, negative binomial GLM, Poisson or negative binomial GAM, or GLMs with zero inflated distribution. Chapter 4 is relevant for readers interested in the analysis of (zero inflated) 2-way nested data. The chapter takes us to marmot colonies: multiple colonies with multiple animals sampled repeatedly over time. Chapters 5 - 7 address GLMs with spatial correlation. Chapter 5 presents an analysis of Common Murre density data and introduces hurdle models using GAM. Random effects are used to model spatial correlation. In Chapter 6 we analyse zero inflated skate abundance recorded at approximately 250 sites along the coastal and continental shelf waters of Argentina. Chapter 7 also involves spatial correlation (parrotfish abundance) with data collected around islands, which increases the complexity of the analysis. GLMs with residual conditional auto-regressive correlation structures are used. In Chapter 8 we apply zero inflated models to click beetle data. Chapter 9 is relevant for readers interested in GAM, zero inflation, and temporal auto-correlation. We analyse a time series of zero inflated whale strandings. In Chapter 10 we demonstrate that an excessive number of zeros does not necessarily mean zero inflation. We also discuss whether the application of mixture models requires that the data include false zeros and whether the algorithm can indicate which zeros are false.
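Since the zero-inflated Poisson model recurs throughout the book, here is a minimal maximum-likelihood sketch of it on simulated data; the parameter values are illustrative:

```python
# ZIP: mixture of a point mass at zero (weight pi) and a Poisson(lam).
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(4)
pi_true, lam_true, n = 0.3, 2.5, 1000
y = rng.poisson(lam_true, n) * (rng.uniform(size=n) > pi_true)

def zip_nll(theta):
    pi, lam = 1 / (1 + np.exp(-theta[0])), np.exp(theta[1])  # keep in range
    logp_pois = -lam + y * np.log(lam) - gammaln(y + 1)
    ll = np.where(y == 0,
                  np.log(pi + (1 - pi) * np.exp(-lam)),
                  np.log(1 - pi) + logp_pois)
    return -ll.sum()

res = minimize(zip_nll, x0=[0.0, 0.0])
print("pi, lambda:", 1 / (1 + np.exp(-res.x[0])), np.exp(res.x[1]))
```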

216 citations


Posted Content
TL;DR: This paper presents a Poisson-convergence result for a broad range of stationary (including lattice) networks subject to log-normal shadowing of increasing variance and proves the invariance of the Poisson limit with respect to the distribution of the additional shadowing or fading.
Abstract: An almost ubiquitous assumption made in the stochastic-analytic study of the quality of service in cellular networks is Poisson distribution of base stations. It is usually justified by various irregularities in the real placement of base stations, which ideally should form the hexagonal pattern. We provide a different and rigorous argument justifying the Poisson assumption under sufficiently strong log-normal shadowing observed in the network, in the evaluation of a natural class of the typical-user service characteristics, including its SINR. Namely, we present a Poisson-convergence result for a broad range of stationary (including lattice) networks subject to log-normal shadowing of increasing variance. We also show for the Poisson model that the distribution of all these characteristics does not depend on the particular form of the additional fading distribution. Our approach involves mapping the 2D network model to a 1D image of it "perceived" by the typical user. For this image we prove our convergence result and the invariance of the Poisson limit with respect to the distribution of the additional shadowing or fading. Moreover, we present some new results for the Poisson model allowing one to calculate the distribution function of the SINR in its whole domain. We use them to study and optimize the mean energy efficiency in cellular networks.

214 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used Geostationary Operational Environmental Satellite X-ray flares and McIntosh group classifications from solar cycles 21 and 22 to calculate average flare rates for each McIntosh class and use these to determine Poisson probabilities for different flare magnitudes.
Abstract: Solar flares occur in complex sunspot groups, but it remains unclear how the probability of producing a flare of a given magnitude relates to the characteristics of the sunspot group. Here, we use Geostationary Operational Environmental Satellite X-ray flares and McIntosh group classifications from solar cycles 21 and 22 to calculate average flare rates for each McIntosh class and use these to determine Poisson probabilities for different flare magnitudes. Forecast verification measures are studied to find optimum thresholds to convert Poisson flare probabilities into yes/no predictions of cycle 23 flares. A case is presented to adopt the true skill statistic (TSS) as a standard for forecast comparison over the commonly used Heidke skill score (HSS). In predicting flares over 24 hr, the maximum values of TSS achieved are 0.44 (C-class), 0.53 (M-class), 0.74 (X-class), 0.54 (≥M1.0), and 0.46 (≥C1.0). The maximum values of HSS are 0.38 (C-class), 0.27 (M-class), 0.14 (X-class), 0.28 (≥M1.0), and 0.41 (≥C1.0). These show that Poisson probabilities perform comparably to some more complex prediction systems, but the overall inaccuracy highlights the problem with using average values to represent flaring rate distributions.
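A small sketch of the forecasting recipe in the abstract: an average flare rate converted into a Poisson probability of at least one flare in 24 hr, thresholded into a yes/no forecast, and scored with the TSS. The rate, threshold, and contingency counts below are illustrative, not values from the paper:

```python
import numpy as np

rate = 0.8                                   # assumed mean flares per 24 hr
p_flare = 1 - np.exp(-rate)                  # P(N >= 1) for N ~ Poisson(rate)
forecast = p_flare >= 0.5                    # probability -> yes/no prediction

def tss(hits, misses, false_alarms, correct_nulls):
    """True skill statistic = hit rate - false alarm rate."""
    return hits / (hits + misses) - false_alarms / (false_alarms + correct_nulls)

print(f"P(flare) = {p_flare:.2f}, forecast = {forecast}")
print("TSS example:", tss(40, 10, 20, 130))
```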

214 citations


Journal ArticleDOI
TL;DR: This paper surveys the different COM-Poisson models that have been published thus far and their applications in areas including marketing, transportation, and biology, among others.
Abstract: The Poisson distribution is a popular distribution for modeling count data, yet it is constrained by its equidispersion assumption, making it less than ideal for modeling real data that often exhibit over-dispersion or under-dispersion. The COM-Poisson distribution is a two-parameter generalization of the Poisson distribution that allows for a wide range of over-dispersion and under-dispersion. It not only generalizes the Poisson distribution but also contains the Bernoulli and geometric distributions as special cases. This distribution's flexibility and special properties have prompted a fast growth of methodological and applied research in various fields. This paper surveys the different COM-Poisson models that have been published thus far and their applications in areas including marketing, transportation, and biology, among others.
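A minimal sketch of the COM-Poisson pmf described above, with the normalizing constant Z(λ, ν) truncated at a large K (a common implementation convenience, not part of the definition):

```python
# P(X = k) = lambda^k / (k!)^nu / Z(lambda, nu)
import numpy as np
from scipy.special import gammaln

def com_poisson_pmf(k, lam, nu, K=200):
    j = np.arange(K + 1)
    logw = j * np.log(lam) - nu * gammaln(j + 1)   # log of lambda^j / (j!)^nu
    logZ = np.logaddexp.reduce(logw)               # truncated normalizer
    return np.exp(k * np.log(lam) - nu * gammaln(k + 1) - logZ)

# nu = 1 recovers the Poisson; nu < 1 is over-dispersed, nu > 1 under-dispersed.
print(com_poisson_pmf(np.arange(5), lam=2.0, nu=1.0))   # matches Poisson(2)
```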

Journal ArticleDOI
TL;DR: In this tutorial, both classes of two-component models for infrequent counts, zero-inflated Poisson and hurdle models, are revisited, and model comparisons and the interpretation of their parameters are discussed.
Abstract: Infrequent count data in psychological research are commonly modelled using zero-inflated Poisson regression. This model can be viewed as a latent mixture of an "always-zero" component and a Poisson component. Hurdle models are an alternative class of two-component models that are seldom used in psychological research, but clearly separate the zero counts and the non-zero counts by using a left-truncated count model for the latter. In this tutorial we revisit both classes of models, and discuss model comparisons and the interpretation of their parameters. As illustrated with an example from relational psychology, both types of models can easily be fitted using the R-package pscl.
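A small numpy sketch contrasting the two model classes discussed in the tutorial, with illustrative parameter values (the pscl package mentioned above is R, so this only illustrates the pmfs):

```python
# ZIP puts pi0 + (1 - pi0)*exp(-lam) mass at zero; the hurdle model pins
# P(0) at exactly pi0 and uses a zero-truncated Poisson for positive counts.
import numpy as np
from scipy.stats import poisson

lam, pi0 = 2.0, 0.4
k = np.arange(6)

zip_pmf = np.where(k == 0,
                   pi0 + (1 - pi0) * poisson.pmf(0, lam),
                   (1 - pi0) * poisson.pmf(k, lam))
hurdle_pmf = np.where(k == 0,
                      pi0,
                      (1 - pi0) * poisson.pmf(k, lam) / (1 - poisson.pmf(0, lam)))
print(zip_pmf.round(3))
print(hurdle_pmf.round(3))
```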

Journal ArticleDOI
TL;DR: Using a Poisson log linear model, an analog of diagonal linear discriminant analysis that is appropriate for sequencing data is developed and an approach for clustering sequencing data using a new dissimilarity measure that is based upon the Poisson model is proposed.
Abstract: In recent years, advances in high throughput sequencing technology have led to a need for specialized methods for the analysis of digital gene expression data. While gene expression data measured on a microarray take on continuous values and can be modeled using the normal distribution, RNA sequencing data involve nonnegative counts and are more appropriately modeled using a discrete count distribution, such as the Poisson or the negative binomial. Consequently, analytic tools that assume a Gaussian distribution (such as classification methods based on linear discriminant analysis and clustering methods that use Euclidean distance) may not perform as well for sequencing data as methods that are based upon a more appropriate distribution. Here, we propose new approaches for performing classification and clustering of observations on the basis of sequencing data. Using a Poisson log linear model, we develop an analog of diagonal linear discriminant analysis that is appropriate for sequencing data. We also propose an approach for clustering sequencing data using a new dissimilarity measure that is based upon the Poisson model. We demonstrate the performances of these approaches in a simulation study, on three publicly available RNA sequencing data sets, and on a publicly available chromatin immunoprecipitation sequencing data set.
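A minimal sketch of the Poisson analog of diagonal LDA described above: per-class, per-feature rates are estimated from training counts, and a new observation is assigned by Poisson log-likelihood. The paper's size factors and shrinkage are omitted, and the data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
rates = np.array([[2.0, 8.0, 1.0], [6.0, 2.0, 3.0]])     # 2 classes, 3 genes
X_train = np.vstack([rng.poisson(r, size=(50, 3)) for r in rates])
y_train = np.repeat([0, 1], 50)

# Per-class rate estimates (small offset avoids log(0)).
lam = np.vstack([X_train[y_train == c].mean(axis=0) for c in (0, 1)]) + 1e-6

def classify(x):
    scores = (x * np.log(lam) - lam).sum(axis=1)   # Poisson log-lik per class
    return scores.argmax()

x_new = rng.poisson(rates[1])
print("predicted class:", classify(x_new))
```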

Journal ArticleDOI
TL;DR: In this article, the authors consider a class of observation-driven Poisson count processes where the current value of the accompanying intensity process depends on previous values of both processes and show that the bivariate process has a unique stationary distribution and that a stationary version of the count process is absolutely regular.
Abstract: We consider a class of observation-driven Poisson count processes where the current value of the accompanying intensity process depends on previous values of both processes. We show under a contractive condition that the bivariate process has a unique stationary distribution and that a stationary version of the count process is absolutely regular. Moreover, since the intensities can be written as measurable functionals of the count variables, we conclude that the bivariate process is ergodic. As an important application of these results, we show how a test method previously used in the case of independent Poisson data can be used in the case of Poisson count processes.
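A minimal simulation sketch of an observation-driven Poisson count process of the kind studied here, using a linear INGARCH(1,1) recursion; the coefficients are illustrative and chosen so the usual contraction condition holds:

```python
import numpy as np

rng = np.random.default_rng(6)
omega, alpha, beta = 1.0, 0.3, 0.5      # alpha + beta < 1 -> stationarity
T = 500
lam = np.empty(T)
x = np.empty(T, dtype=int)
lam[0] = omega / (1 - alpha - beta)     # start at the stationary mean
x[0] = rng.poisson(lam[0])
for t in range(1, T):
    # intensity depends on previous values of BOTH processes
    lam[t] = omega + alpha * x[t - 1] + beta * lam[t - 1]
    x[t] = rng.poisson(lam[t])
print("sample mean:", x.mean(), "theoretical mean:", omega / (1 - alpha - beta))
```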

Journal ArticleDOI
TL;DR: The MortalitySmooth package provides a framework for smoothing count data in both one- and two-dimensional settings and is specifically tailored to demographers, actuaries, epidemiologists, and geneticists who may be interested in using a practical tool for smoothed mortality data over ages and/or years.
Abstract: The MortalitySmooth package provides a framework for smoothing count data in both one- and two-dimensional settings. Although general in its purposes, the package is specifically tailored to demographers, actuaries, epidemiologists, and geneticists who may be interested in using a practical tool for smoothing mortality data over ages and/or years. The total number of deaths over a specified age- and year-interval is assumed to be Poisson-distributed, and P-splines and generalized linear array models are employed as a suitable regression methodology. Extra-Poisson variation can also be accommodated.
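A simplified sketch of the underlying idea, assuming an identity basis rather than B-splines (a Whittaker-type special case of the P-spline approach): Poisson counts are smoothed by penalized IRLS with a second-order difference penalty on the log-rates. All data and the smoothing parameter are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
ages = np.arange(60)
true_log_rate = -1 + 0.05 * ages - 0.0005 * ages**2
y = rng.poisson(np.exp(true_log_rate))

D = np.diff(np.eye(len(ages)), n=2, axis=0)   # second-order differences
P = 10.0 * D.T @ D                            # smoothing parameter * penalty
eta = np.log(y + 0.5)                         # initial log-rates
for _ in range(50):
    mu = np.exp(eta)
    # Newton/IRLS step for the penalized Poisson log-likelihood
    eta = np.linalg.solve(np.diag(mu) + P, mu * eta + (y - mu))
print(np.round(eta[:5], 2))                   # smoothed log-rates
```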

Journal ArticleDOI
TL;DR: A structural improvement of the classical Berry–Esseen inequality is proved for the uniform distance ρ(Fn, Φ) between the standard normal distribution function Φ and the distribution function Fn of the normalized sum of n≥1 independent identically distributed random variables with zero mean, unit variance, and finite third absolute moment β3; it is used to lower the upper estimate of the absolute constant in the analog of the Berry–Esseen inequality for Poisson random sums to 0.3041, strictly less than the least possible value 0.4097... of the constant in the classical inequality.
Abstract: By a modification of the method applied in the study of Korolev & Shevtsova (2009), the inequalities ρ(Fn, Φ) ≤ 0.33477(β3 + 0.429)/√n and ρ(Fn, Φ) ≤ 0.3041(β3 + 1)/√n are proved for the uniform distance ρ(Fn, Φ) between the standard normal distribution function Φ and the distribution function Fn of the normalized sum of an arbitrary number n≥1 of independent identically distributed random variables with zero mean, unit variance, and finite third absolute moment β3. The first of these two inequalities is a structural improvement of the classical Berry–Esseen inequality and also sharpens the best known upper estimate of the absolute constant in the classical Berry–Esseen inequality, since 0.33477(β3 + 0.429) ≤ 0.33477(1 + 0.429)β3 < 0.4784β3 by virtue of the condition β3 ≥ 1. The latter inequality is applied to lowering the upper estimate of the absolute constant in the analog of the Berry–Esseen inequality for Poisson random sums to 0.3041, which is strictly less than the least possible value 0.4097... of the absolute constant in the classical Berry–Esseen inequality.
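A quick numeric check of the inequality chain quoted above (β3 ≥ 1 holds by Lyapunov's inequality, since the variance is 1):

```python
# 0.33477*(b3 + 0.429) <= 0.33477*1.429*b3 = 0.47838...*b3 < 0.4784*b3
for b3 in (1.0, 1.5, 3.0):
    print(b3, 0.33477 * (b3 + 0.429), 0.4784 * b3)
```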

Journal ArticleDOI
Fukang Zhu
TL;DR: In this article, the authors propose zero-inflated Poisson and zero-inflated negative binomial INGARCH models, which are useful and flexible generalizations of the Poisson and negative binomial INGARCH models, respectively.

Journal ArticleDOI
TL;DR: In this paper, generalized linear models for regression modeling of count time series are considered and conditions for obtaining weak dependence for such models are given. But these conditions are not applicable to the case of counting time series.

Journal ArticleDOI
Fukang Zhu
TL;DR: In this paper, a generalized Poisson INGARCH model is proposed to account for both overdispersion and underdispersion in time series of counts; the autocorrelation structure is analyzed and expressions for the first two moments are derived.

Journal ArticleDOI
TL;DR: In this article, a stationary first-order nonnegative integer valued autoregressive process with zero inflated Poisson innovations is proposed to model the counts of events in consecutive points of time.
Abstract: The first-order nonnegative integer valued autoregressive process has been applied to model the counts of events in consecutive points of time. It is known that, if the innovations are assumed to follow a Poisson distribution then the marginal model is also Poisson. This model may however not be suitable for overdispersed count data. One frequent manifestation of overdispersion is that the incidence of zero counts is greater than expected from a Poisson model. In this paper, we introduce a new stationary first-order integer valued autoregressive process with zero inflated Poisson innovations. We derive some structural properties such as the mean, variance, marginal and joint distribution functions of the process. We consider estimation of the unknown parameters by conditional or approximate full maximum likelihood. We use simulation to study the limiting marginal distribution of the process and the performance of our fitting algorithms. Finally, we demonstrate the usefulness of the proposed model by analyzing some real time series on animal health laboratory submissions.
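A minimal simulation sketch of the proposed process: binomial thinning of the previous count plus zero-inflated Poisson innovations. The parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(8)
alpha, lam, pi0, T = 0.5, 3.0, 0.3, 1000    # thinning prob., ZIP(lam, pi0)
x = np.empty(T, dtype=int)
x[0] = 0
for t in range(1, T):
    survivors = rng.binomial(x[t - 1], alpha)              # alpha o X_{t-1}
    innovation = rng.poisson(lam) * (rng.uniform() > pi0)  # ZIP innovation
    x[t] = survivors + innovation
# Stationary mean of an INAR(1): E[innovation] / (1 - alpha)
print("sample mean:", x.mean(), "theory:", (1 - pi0) * lam / (1 - alpha))
```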

Journal ArticleDOI
TL;DR: In this paper, the authors introduce the space-fractional Poisson process whose state probabilities p_k^α(t), t > 0, α ∈ (0, 1], are governed by the equations (d/dt)p_k^α(t) = -λ^α(1 - B)^α p_k^α(t), where (1 - B)^α is the fractional difference operator found in the study of time series analysis.

Proceedings Article
26 Jun 2012
TL;DR: In this article, a lognormal and gamma mixed negative binomial (NB) regression model for counts is proposed; it has two free parameters to include two different kinds of random effects, and allows the incorporation of prior information, such as sparsity in the regression coefficients.
Abstract: In regression analysis of counts, a lack of simple and efficient algorithms for posterior computation has made Bayesian approaches appear unattractive and thus underdeveloped. We propose a lognormal and gamma mixed negative binomial (NB) regression model for counts, and present efficient closed-form Bayesian inference; unlike conventional Poisson models, the proposed approach has two free parameters to include two different kinds of random effects, and allows the incorporation of prior information, such as sparsity in the regression coefficients. By placing a gamma distribution prior on the NB dispersion parameter r, and connecting a lognormal distribution prior with the logit of the NB probability parameter p, efficient Gibbs sampling and variational Bayes inference are both developed. The closed-form updates are obtained by exploiting conditional conjugacy via both a compound Poisson representation and a Polya-Gamma distribution based data augmentation approach. The proposed Bayesian inference can be implemented routinely, while being easily generalizable to more complex settings involving multivariate dependence structures. The algorithms are illustrated using real examples.

Journal ArticleDOI
TL;DR: In this paper, the authors examine various approaches to demand forecasting for intermittent-demand products, paying particular attention to the need for inventory planning over a multi-period lead-time when the underlying process may be non-stationary.

Journal ArticleDOI
TL;DR: This work presents the first detailed algorithmic analysis of how Google Flu Trends can be used as a basis for building a fully automated system for early warning of epidemics in advance of methods used by the CDC, and develops FluBreaks, an early warning system for flu epidemics using Google Flu Trends data.
Abstract: Background: The Google Flu Trends service was launched in 2008 to track changes in the volume of online search queries related to flu-like symptoms. Over the last few years, the trend data produced by this service has shown a consistent relationship with the actual number of flu reports collected by the US Centers for Disease Control and Prevention (CDC), often identifying increases in flu cases weeks in advance of CDC records. However, contrary to popular belief, Google Flu Trends is not an early epidemic detection system. Instead, it is designed as a baseline indicator of the trend, or changes, in the number of disease cases. Objective: To evaluate whether these trends can be used as a basis for an early warning system for epidemics. Methods: We present the first detailed algorithmic analysis of how Google Flu Trends can be used as a basis for building a fully automated system for early warning of epidemics in advance of methods used by the CDC. Based on our work, we present a novel early epidemic detection system, called FluBreaks (dritte.org/flubreaks), based on Google Flu Trends data. We compared the accuracy and practicality of three types of algorithms: normal distribution algorithms, Poisson distribution algorithms, and negative binomial distribution algorithms. We explored the relative merits of these methods, and related our findings to changes in Internet penetration and population size for the regions in Google Flu Trends providing data. Results: Across our performance metrics of percentage true-positives (RTP), percentage false-positives (RFP), percentage overlap (OT), and percentage early alarms (EA), Poisson- and negative binomial-based algorithms performed better in all except RFP. Poisson-based algorithms had average values of 99%, 28%, 71%, and 76% for RTP, RFP, OT, and EA, respectively, whereas negative binomial-based algorithms had average values of 97.8%, 17.8%, 60%, and 55% for RTP, RFP, OT, and EA, respectively. Moreover, the EA was also affected by the region’s population size. Regions with larger populations (regions 4 and 6) had higher values of EA than region 10 (which had the smallest population) for negative binomial- and Poisson-based algorithms. The difference was 12.5% and 13.5% on average in negative binomial- and Poisson-based algorithms, respectively. Conclusions: We present the first detailed comparative analysis of popular early epidemic detection algorithms on Google Flu Trends data. We note that realizing this opportunity requires moving beyond the cumulative sum and historical limits method-based normal distribution approaches, traditionally employed by the CDC, to negative binomial- and Poisson-based algorithms to deal with potentially noisy search query data from regions with varying population and Internet penetrations. Based on our work, we have developed FluBreaks, an early warning system for flu epidemics using Google Flu Trends.
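A small sketch of a Poisson-based alarm of the kind the study evaluates: flag a week whose count exceeds a high Poisson quantile implied by the baseline mean. The window length and quantile are illustrative choices, not the paper's exact algorithm:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(9)
baseline = rng.poisson(100, size=52)          # a year of weekly query counts
current = 135                                 # this week's count

lam_hat = baseline.mean()                     # baseline Poisson mean
threshold = poisson.ppf(0.99, lam_hat)        # 99th-percentile alarm line
print("alarm!" if current > threshold else "no alarm", "| threshold:", threshold)
```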

Proceedings ArticleDOI
25 Mar 2012
TL;DR: It is demonstrated that for the task of image denoising, nearly state-of-the-art results can be achieved using small dictionaries only, provided that they are learned directly from the noisy image.
Abstract: Photon limitations arise in spectral imaging, nuclear medicine, astronomy and night vision. The Poisson distribution used to model this noise has variance equal to its mean, so blind application of standard noise removal methods yields significant artifacts. Recently, overcomplete dictionaries combined with sparse learning techniques have become extremely popular in image reconstruction. The aim of the present work is to demonstrate that for the task of image denoising, nearly state-of-the-art results can be achieved using small dictionaries only, provided that they are learned directly from the noisy image. To this end, we introduce patch-based denoising algorithms which perform an adaptation of PCA (Principal Component Analysis) for Poisson noise. We carry out a comprehensive empirical evaluation of the performance of our algorithms in terms of accuracy when the photon count is really low. The results reveal that, despite its simplicity, PCA-flavored denoising appears to be competitive with other state-of-the-art denoising algorithms.

Journal ArticleDOI
TL;DR: It is shown that when appropriate covariates that affect both detection and abundance are available, conditional likelihood can be used to estimate the regression parameters of a binomial–zero-inflated Poisson (ZIP) mixture model and correct for detection error.
Abstract: Current methods to correct for detection error require multiple visits to the same survey location. Many historical datasets exist that were collected using only a single visit, and logistical/cost considerations prevent many current research programs from collecting multiple visit data. In this paper, we explore what can be done with single visit count data when there is detection error. We show that when appropriate covariates that affect both detection and abundance are available, conditional likelihood can be used to estimate the regression parameters of a binomial–zero-inflated Poisson (ZIP) mixture model and correct for detection error. We use observed counts of Ovenbirds (Seiurus aurocapilla) to illustrate the estimation of the parameters for the binomial–zero-inflated Poisson mixture model using a subset of data from one of the largest and longest ecological time series datasets that only has single visits. Our single visit method has the following characteristics: (i) it does not require the assumptions of a closed population or adjustments caused by movement or migration; (ii) it is cost effective, enabling ecologists to cover a larger geographical region than possible when having to return to sites; and (iii) its resultant estimators appear to be statistically and computationally highly efficient.

Journal ArticleDOI
TL;DR: The approach incorporates stochastic variation due to the evolutionary process and can be fit using standard statistical software, and universally outperforms existing methods for detecting genes subject to selection using polymorphism and divergence data.
Abstract: We present an approach for identifying genes under natural selection using polymorphism and divergence data from synonymous and non-synonymous sites within genes. A generalized linear mixed model is used to model the genome-wide variability among categories of mutations and estimate its functional consequence. We demonstrate how the model's estimated fixed and random effects can be used to identify genes under selection. The parameter estimates from our generalized linear model can be transformed to yield population genetic parameter estimates for quantities including the average selection coefficient for new mutations at a locus, the synonymous and non-synynomous mutation rates, and species divergence times. Furthermore, our approach incorporates stochastic variation due to the evolutionary process and can be fit using standard statistical software. The model is fit in both the empirical Bayes and Bayesian settings using the lme4 package in R, and Markov chain Monte Carlo methods in WinBUGS. Using simulated data we compare our method to existing approaches for detecting genes under selection: the McDonald-Kreitman test, and two versions of the Poisson random field based method MKprf. Overall, we find our method universally outperforms existing methods for detecting genes subject to selection using polymorphism and divergence data.

Journal ArticleDOI
TL;DR: This paper focuses on deconvolution in confocal microscopy, a very popular technique for 3-D imaging of biological living specimens that gives images with very good resolution (several hundred nanometers) but degraded by both blur and Poisson noise; two estimators of the regularizing parameter are improved and a new constrained formulation of the deconvolution problem is proposed.
Abstract: Deblurring noisy Poisson images has recently been a subject of an increasing amount of works in many areas such as astronomy and biological imaging. In this paper, we focus on confocal microscopy, which is a very popular technique for 3-D imaging of biological living specimens that gives images with a very good resolution (several hundreds of nanometers), although degraded by both blur and Poisson noise. Deconvolution methods have been proposed to reduce these degradations, and in this paper, we focus on techniques that promote the introduction of an explicit prior on the solution. One difficulty of these techniques is to set the value of the parameter, which weights the tradeoff between the data term and the regularizing term. Only few works have been devoted to the research of an automatic selection of this regularizing parameter when considering Poisson noise; therefore, it is often set manually such that it gives the best visual results. We present here two recent methods to estimate this regularizing parameter, and we first propose an improvement of these estimators, which takes advantage of confocal images. Following these estimators, we secondly propose to express the problem of the deconvolution of Poisson noisy images as the minimization of a new constrained problem. The proposed constrained formulation is well suited to this application domain since it is directly expressed using the antilog likelihood of the Poisson distribution and therefore does not require any approximation. We show how to solve the unconstrained and constrained problems using the recent alternating-direction technique, and we present results on synthetic and real data using well-known priors, such as total variation and wavelet transforms. Among these wavelet transforms, we specially focus on the dual-tree complex wavelet transform and on the dictionary composed of curvelets and an undecimated wavelet transform.
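Not the authors' constrained method, but the classical Richardson–Lucy baseline for Poisson deconvolution that such approaches build on, sketched for a 1-D signal with a known blur kernel (all data illustrative):

```python
import numpy as np

rng = np.random.default_rng(10)
psf = np.array([0.25, 0.5, 0.25])                    # known blur kernel
f_true = np.zeros(64)
f_true[[20, 40]] = 50.0                              # two bright point sources
y = rng.poisson(np.convolve(f_true, psf, mode="same"))

x = np.full_like(f_true, y.mean())                   # flat initial guess
for _ in range(100):
    blurred = np.convolve(x, psf, mode="same") + 1e-12
    # Richardson-Lucy update: multiply by the back-projected data ratio
    x *= np.convolve(y / blurred, psf[::-1], mode="same")
print("peaks recovered near:", np.sort(np.argsort(x)[-2:]))
```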

Journal ArticleDOI
TL;DR: In this paper, the authors present motivation and new Stata commands for modeling count data with underdispersion, and the new command for fitting generalized Poisson regre is presented.
Abstract: We present motivation and new Stata commands for modeling count data While the focus of this article is on modeling data with underdispersion, the new command for fitting generalized Poisson regre

Journal ArticleDOI
TL;DR: In this article, a new 3D structure constructed from rigid cuboids which also deform through relative rotation of the units is proposed, and analytical models for the mechanical properties, namely the Poisson's ratio and the Young's moduli, are derived and it is shown that for loading on-axis, these systems have the potential to exhibit negative values for all the six onaxis Poisson ratios.
Abstract: Materials exhibiting auxetic behaviour get fatter when stretched (i.e. possess a negative Poisson's ratio). This property has been closely related to particular geometrical features of a system and how it deforms. One of the mechanisms which is known to have a potential to generate such behaviour is that of rotating rigid units. Several models based on this concept have been developed, including two-dimensional as well as three-dimensional (3D) models. In this work, we propose a new 3D structure constructed from rigid cuboids which also deform through relative rotation of the units. In particular, analytical models for the mechanical properties, namely the Poisson's ratio and the Young's moduli, are derived and it is shown that for loading on-axis, these systems have the potential to exhibit negative values for all the six on-axis Poisson's ratios.