Journal ArticleDOI

Finite mixtures of multivariate Poisson distributions with application

TL;DR: This article examines finite mixtures of multivariate Poisson distributions as an alternative class of models for multivariate count data. These models allow for both overdispersion in the marginal distributions and negative correlation, while remaining computationally tractable using standard ideas from finite mixture modelling.
About: This article was published in the Journal of Statistical Planning and Inference on 2007-06-01. It has received 116 citations to date. The article focuses on the topics: Count data & Multivariate statistics.
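The overdispersion and negative correlation mentioned in the TL;DR can be checked with a small moment calculation. The sketch below is an illustration only, not the authors' code: it assumes the simplest case, in which each mixture component has independent Poisson margins, and the function name `mixture_moments`, the weights, and the rates are all made up for the example.

```python
from itertools import product


def mixture_moments(weights, rates):
    """Exact mean vector and covariance matrix of a finite mixture whose
    components have independent Poisson margins (an assumed simplification).

    weights: mixing proportions, summing to 1
    rates:   one list of Poisson means per component, one mean per dimension
    """
    dim = len(rates[0])
    mean = [sum(w * r[j] for w, r in zip(weights, rates)) for j in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for j, k in product(range(dim), repeat=2):
        # Within a component, E[X_j X_k] = r_j * r_k for j != k (independence),
        # and E[X_j^2] = r_j^2 + r_j (Poisson variance equals its mean).
        second = sum(
            w * (r[j] * r[k] + (r[j] if j == k else 0.0))
            for w, r in zip(weights, rates)
        )
        cov[j][k] = second - mean[j] * mean[k]
    return mean, cov


# Hypothetical two-component bivariate example: the components swap
# which coordinate has the large mean.
weights = [0.5, 0.5]
rates = [[1.0, 5.0], [5.0, 1.0]]
mean, cov = mixture_moments(weights, rates)

print("mean:", mean)
print("variance of X1:", cov[0][0])   # exceeds the mean of X1 (overdispersion)
print("cov(X1, X2):", cov[0][1])      # negative, despite independence within components
```

In this toy setting each margin has variance larger than its mean, and the covariance is negative because the component means move in opposite directions across components. The general model in the article allows richer within-component structure than this sketch assumes.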
Citations
Journal ArticleDOI
TL;DR: This work marks an important step in the non-Gaussian model-based clustering and classification direction, and a variant of the EM algorithm is developed for parameter estimation by exploiting the relationship with the generalized inverse Gaussian distribution.
Abstract: A mixture of shifted asymmetric Laplace distributions is introduced and used for clustering and classification. A variant of the EM algorithm is developed for parameter estimation by exploiting the relationship with the generalized inverse Gaussian distribution. This approach is mathematically elegant and relatively computationally straightforward. Our novel mixture modelling approach is demonstrated on both simulated and real data to illustrate clustering and classification applications. In these analyses, our mixture of shifted asymmetric Laplace distributions performs favourably when compared to the popular Gaussian approach. This work, which marks an important step in the non-Gaussian model-based clustering and classification direction, concludes with discussion as well as suggestions for future work.

151 citations


Cites background from "Finite mixtures of multivariate Poi..."


  • ...…also burgeoned on skew-normal distributions (e.g., Lin, 2009), skew-t distributions (e.g., Lin, 2010; Lee and McLachlan, 2011; Vrbik and McNicholas, 2012), and other non-elliptically contoured distributions (e.g., Karlis and Meligkotsidou, 2007; Karlis and Santourian, 2009; Browne et al., 2012)....


Journal ArticleDOI
TL;DR: In this article, a bivariate integer-valued autoregressive process of order 1 (BINAR(1)) is introduced for counting data and a method of conditional maximum likelihood for the estimation of its unknown parameters is proposed.
Abstract: The study of time series models for count data has become a topic of special interest in recent years. However, while research on univariate time series for counts now flourishes, the literature on multivariate time series models for count data is notably more limited. In the present paper, a bivariate integer-valued autoregressive process of order 1 (BINAR(1)) is introduced. Emphasis is placed on models with bivariate Poisson and bivariate negative binomial innovations. We discuss properties of the BINAR(1) model and propose the method of conditional maximum likelihood for the estimation of its unknown parameters. Issues of diagnostics and forecasting are considered and predictions are produced by means of the conditional forecast distribution. Estimation uncertainty is accommodated by taking advantage of the asymptotic normality of maximum likelihood estimators and constructing appropriate confidence intervals for the h-step-ahead conditional probability mass function. The proposed model is appli...

126 citations


Cites methods from "Finite mixtures of multivariate Poi..."

  • ...…literature, as, e.g., the bivariate Poisson lognormal model of Aitchison and Ho (1989) (see also Chib and Winkelmann, 2001), the finite mixture model developed by Karlis and Meligkotsidou (2007) and models based on copulas see, e.g., Nikoloulopoulos and Karlis (2009) and the references therein....


Journal ArticleDOI
TL;DR: The authors introduce a mixture of generalized hyperbolic distributions as an alternative to the ubiquitous mixture of Gaussian distributions as well as their near relatives within which the mixture of multivariate t-distributions and the mixtures of skew-t distributions predominate.
Abstract: We introduce a mixture of generalized hyperbolic distributions as an alternative to the ubiquitous mixture of Gaussian distributions as well as their near relatives within which the mixture of multivariate t-distributions and the mixture of skew-t distributions predominate. The mathematical development of our mixture of generalized hyperbolic distributions model relies on its relationship with the generalized inverse Gaussian distribution. The latter is reviewed before our mixture models are presented along with details of the aforesaid reliance. Parameter estimation is outlined within the expectation–maximization framework before the clustering performance of our mixture models is illustrated via applications on simulated and real data. In particular, the ability of our models to recover parameters for data from underlying Gaussian and skew-t distributions is demonstrated. Finally, the role of generalized hyperbolic mixtures within the wider model-based clustering, classification, and density estimation literature is discussed. The Canadian Journal of Statistics 43: 176–198; 2015 © 2015 Statistical Society of Canada

121 citations

Journal ArticleDOI
TL;DR: Mixtures of skew-t factor analyzers are very well-suited for model-based clustering of high-dimensional data, giving superior clustering results when compared to a well-established family of Gaussian mixture models.

112 citations


Cites background from "Finite mixtures of multivariate Poi..."

  • ...McLachlan et al., 2007; Karlis and Meligkotsidou, 2007; Lin, 2009; Browne et al., 2012; Lee and McLachlan, 2012; Franczak et al., 2012; Vrbik and McNicholas, 2012; Morris and McNicholas, 2013; Morris et al., 2013)....


Journal ArticleDOI
TL;DR: This work examines the state of the art in terms of estimating the total number of taxa in a microbial population from a sample of sequences, and discusses the full range of statistical techniques, parametric and nonparametric as well as frequentist and Bayesian, and specific implications of their use in microbial diversity studies.
Abstract: For decades, statisticians have studied the species problem: how to estimate the total number of species, observed plus unobserved, in a population. This problem dates at least as far back as 1943, to a paper by R.A. Fisher. These methods have found many applications in general ecology, but their importance has grown considerably in recent years, driven by the introduction of high-throughput DNA sequencing into microbial ecology. We examine the state of the art in terms of estimating the total number of taxa in a microbial population from a sample of sequences. We focus mainly on estimating the number of species within a single population (α-diversity), but we also briefly consider statistical inference for comparing the numbers of species across populations (β-diversity). We discuss the full range of statistical techniques, parametric and nonparametric as well as frequentist and Bayesian, and specific implications of their use in microbial diversity studies. We conclude with some recommendations for theo...

89 citations

References
BookDOI
28 Jan 2005
TL;DR: The important role of finite mixture models in the statistical analysis of data is underscored by the ever-increasing rate at which articles on mixture applications appear in the statistical and general scientific literature.
Abstract: The important role of finite mixture models in the statistical analysis of data is underscored by the ever-increasing rate at which articles on mixture applications appear in the statistical and ge...

8,258 citations

Book
02 Oct 2000
TL;DR: The important role of finite mixture models in the statistical analysis of data is underscored by the ever-increasing rate at which articles on mixture applications appear in the mathematical and statistical literature.
Abstract: The important role of finite mixture models in the statistical analysis of data is underscored by the ever-increasing rate at which articles on mixture applications appear in the statistical and ge...

8,095 citations

Journal ArticleDOI
TL;DR: This work discusses the formulation and theoretical and practical properties of the EM algorithm, a specialization to the mixture density context of a general algorithm used to approximate maximum-likelihood estimates for incomplete data problems.
Abstract: The problem of estimating the parameters which determine a mixture density has been the subject of a large, diverse body of literature spanning nearly ninety years. During the last two decades, the...

2,836 citations

Journal ArticleDOI
TL;DR: The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares criterion and on the criterion of Friedman and Rubin (1967), but it is restricted to Gaussian distributions and it does not allow for noise.
Abstract: The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares criterion and on the criterion of Friedman and Rubin (1967). However, as currently implemented, it does not allow the specification of which features (orientation, size and shape) are to be common to all clusters and which may differ between clusters. Also, it is restricted to Gaussian distributions and it does not allow for noise. We propose ways of overcoming these limitations. A reparameterization of the covariance matrix allows us to specify that some features, but not all, be the same for all clusters. A practical framework for non-Gaussian clustering is outlined, and a means of incorporating noise in the form of a Poisson process is described. An approximate Bayesian method for choosing the number of clusters is given. The performance of the proposed methods is studied by simulation, with encouraging results. The methods are applied to the analysis of a data set arising in the study of diabetes, and the results seem better than those of previous analyses.

2,336 citations

Journal ArticleDOI
TL;DR: This paper develops a fully Bayesian mixture analysis based on reversible jump Markov chain Monte Carlo, using a hierarchical prior model to deal with weak prior information while avoiding the mathematical pitfalls of improper priors in the mixture context. The resulting sample from the joint distribution can be used as a basis for a thorough presentation of many aspects of the posterior distribution.
Abstract: New methodology for fully Bayesian mixture analysis is developed, making use of reversible jump Markov chain Monte Carlo methods that are capable of jumping between the parameter subspaces corresponding to different numbers of components in the mixture. A sample from the full joint distribution of all unknown variables is thereby generated, and this can be used as a basis for a thorough presentation of many aspects of the posterior distribution. The methodology is applied here to the analysis of univariate normal mixtures, using a hierarchical prior model that offers an approach to dealing with weak prior information while avoiding the mathematical pitfalls of using improper priors in the mixture context.

2,018 citations