A Product Partition Model With Regression on Covariates.
TLDR
A model-based clustering algorithm that exploits available covariates is developed, suitable for any combination of continuous, categorical, count, and ordinal covariates; posterior predictive inference in this model formalizes the desired prediction.
Abstract:
We propose a probability model for random partitions in the presence of covariates. In other words, we develop a model-based clustering algorithm that exploits available covariates. The motivating application is predicting time to progression for patients in a breast cancer trial. We proceed by reporting a weighted average of the responses of clusters of earlier patients. The weights should be determined by the similarity of the new patient's covariate with the covariates of patients in each cluster. We achieve the desired inference by defining a random partition model that includes a regression on covariates. Patients with similar covariates are a priori more likely to be clustered together. Posterior predictive inference in this model formalizes the desired prediction. We build on product partition models (PPM). We define an extension of the PPM to include a regression on covariates by including in the cohesion function a new factor that increases the probability of experimental units with similar covariates to be included in the same cluster. We discuss implementations suitable for any combination of continuous, categorical, count, and ordinal covariates. An implementation of the proposed model as an R package is available for download.
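The covariate-dependent partition prior described above factors over clusters as a cohesion term times a similarity term. The following is a minimal sketch of that idea, not the paper's R package: it assumes a Dirichlet-process-style cohesion c(S) = M·(|S|−1)! and, for a single continuous covariate, a Gaussian marginal-likelihood similarity; all names and the choice of similarity are illustrative assumptions.

```python
import math

M = 1.0  # assumed DP-style total-mass parameter

def cohesion(cluster):
    # DP-style cohesion: c(S) = M * (|S| - 1)!
    return M * math.factorial(len(cluster) - 1)

def similarity(xs):
    # Similarity g(x*): marginal likelihood of the cluster's covariates
    # under N(x_i | mu, 1) with a N(0, 1) prior on mu. Clusters whose
    # covariates sit close together get a larger g.
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    log_g = (-0.5 * n * math.log(2 * math.pi)
             - 0.5 * math.log(n + 1)
             - 0.5 * ss
             + 0.5 * s * s / (n + 1))
    return math.exp(log_g)

def ppmx_weight(partition, x):
    # Unnormalized prior probability of a partition (a list of index
    # lists): the product over clusters of cohesion times similarity.
    w = 1.0
    for cluster in partition:
        w *= cohesion(cluster) * similarity([x[i] for i in cluster])
    return w

x = [0.1, 0.2, 3.0]  # one continuous covariate per experimental unit
together = ppmx_weight([[0, 1], [2]], x)  # similar units co-clustered
apart = ppmx_weight([[0, 2], [1]], x)     # dissimilar units co-clustered
# Units with similar covariates are a priori more likely to co-cluster:
assert together > apart
```

Mixed covariate types would be handled, as the abstract notes, by choosing an appropriate similarity factor per covariate (e.g. a multinomial-Dirichlet marginal for categorical covariates) and multiplying the factors within each cluster.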
Citations
Journal ArticleDOI
How to find an appropriate clustering for mixed-type variables with application to socio-economic stratification
Christian Hennig,Tim Futing Liao +1 more
TL;DR: The application of a philosophy of cluster analysis to economic data from the 2007 US Survey of Consumer Finances demonstrates techniques and decisions required to obtain an interpretable clustering, and the clustering is shown to be significantly more structured than a suitable null model.
Journal ArticleDOI
Bayesian Nonparametric Inference – Why and How
Peter Müller,Riten Mitra +1 more
TL;DR: Inference under models with nonparametric Bayesian (BNP) priors is reviewed for density estimation, clustering, regression and for mixed effects models with random effects distributions.
Journal ArticleDOI
Mixture Models With a Prior on the Number of Components
TL;DR: The most commonly used method of inference for MFMs is reversible jump Markov chain Monte Carlo, but it can be nontrivial to design good reversible jump moves, especially in high-dimensional spaces.
Posted Content
Mixture models with a prior on the number of components
TL;DR: It turns out that many of the essential properties of DPMs are also exhibited by MFMs, and the MFM analogues are simple enough that they can be used much like the corresponding DPM properties; this simplifies the implementation of MFMs and can substantially improve mixing.
Journal ArticleDOI
PReMiuM: An R Package for Profile Regression Mixture Models Using Dirichlet Processes
TL;DR: PReMiuM is a recently developed R package for Bayesian clustering using a Dirichlet process mixture model, which allows binary, categorical, count, and continuous responses, as well as continuous and discrete covariates.
References
Journal ArticleDOI
A Bayesian Analysis of Some Nonparametric Problems
TL;DR: A class of prior distributions, called Dirichlet process priors, is proposed, under which many nonparametric statistical problems can be treated, yielding results comparable to the classical theory.
Journal ArticleDOI
Model-Based Clustering, Discriminant Analysis, and Density Estimation
Chris Fraley,Adrian E. Raftery +1 more
TL;DR: This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.
Journal ArticleDOI
Hierarchical mixtures of experts and the EM algorithm
TL;DR: An Expectation-Maximization (EM) algorithm is presented for adjusting the parameters of the tree-structured architecture for supervised learning, along with an on-line learning algorithm in which the parameters are updated incrementally.
Journal ArticleDOI
Model-based Gaussian and non-Gaussian clustering
TL;DR: The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares criterion and on the criterion of Friedman and Rubin (1967), but it is restricted to Gaussian distributions and it does not allow for noise.
Journal ArticleDOI
On Bayesian Analysis of Mixtures with an Unknown Number of Components (with discussion)
TL;DR: A hierarchical prior model is proposed that deals with weak prior information while avoiding the mathematical pitfalls of using improper priors in the mixture context, and that serves as a basis for a thorough presentation of many aspects of the posterior distribution.