Open Access · Journal Article · DOI

A Product Partition Model With Regression on Covariates.

TLDR
A model-based clustering algorithm that exploits available covariates is developed. The approach is suitable for any combination of continuous, categorical, count, and ordinal covariates, and posterior predictive inference in the model formalizes the desired prediction.
Abstract
We propose a probability model for random partitions in the presence of covariates. In other words, we develop a model-based clustering algorithm that exploits available covariates. The motivating application is predicting time to progression for patients in a breast cancer trial. We proceed by reporting a weighted average of the responses of clusters of earlier patients, with the weights determined by the similarity of the new patient's covariates to the covariates of the patients in each cluster. We achieve the desired inference by defining a random partition model that includes a regression on covariates: patients with similar covariates are a priori more likely to be clustered together. Posterior predictive inference in this model formalizes the desired prediction.

We build on product partition models (PPMs). We extend the PPM to include a regression on covariates by adding to the cohesion function a factor that increases the probability that experimental units with similar covariates are assigned to the same cluster. We discuss implementations suitable for any combination of continuous, categorical, count, and ordinal covariates.

An implementation of the proposed model is available for download as an R package.
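For concreteness, the extended prior can be sketched in standard PPM notation (a sketch following the paper's general construction; the particular similarity function shown is one common default, not the only choice):

p(\rho_n = \{S_1, \ldots, S_k\} \mid x) \propto \prod_{j=1}^{k} c(S_j) \, g(x^{*}_{j}),

where \rho_n is the partition of the n experimental units, c(S_j) is the usual PPM cohesion function, and x^{*}_{j} = (x_i : i \in S_j) collects the covariates of cluster S_j. The similarity function g(\cdot) is large for sets of similar covariates; a convenient default marginalizes an auxiliary probability model q over a cluster-specific parameter \xi_j:

g(x^{*}_{j}) = \int \prod_{i \in S_j} q(x_i \mid \xi_j) \, q(\xi_j) \, d\xi_j.

Choosing q by covariate type (e.g., normal for continuous, multinomial for categorical covariates) is what makes the construction work for arbitrary mixes of covariates.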


Citations
Journal Article · DOI

How to find an appropriate clustering for mixed-type variables with application to socio-economic stratification

TL;DR: The application of a philosophy of cluster analysis to economic data from the 2007 US Survey of Consumer Finances demonstrates the techniques and decisions required to obtain an interpretable clustering; the resulting clustering is shown to be significantly more structured than a suitable null model.
Journal Article · DOI

Bayesian Nonparametric Inference – Why and How

TL;DR: Inference under models with Bayesian nonparametric (BNP) priors is reviewed for density estimation, clustering, regression, and mixed-effects models with random effects distributions.
Journal Article · DOI

Mixture Models With a Prior on the Number of Components

TL;DR: The most commonly used method of inference for mixtures of finite mixtures (MFMs) is reversible jump Markov chain Monte Carlo, but it can be nontrivial to design good reversible jump moves, especially in high-dimensional spaces.
Posted Content

Mixture models with a prior on the number of components

TL;DR: Many of the essential properties of Dirichlet process mixtures (DPMs) are also exhibited by MFMs, and the MFM analogues are simple enough that they can be used much like the corresponding DPM properties; this simplifies the implementation of MFMs and can substantially improve mixing.
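For reference, the induced partition distribution has a closed product form (a sketch in Miller and Harrison's notation, with p_K the prior on the number of components and \gamma the symmetric Dirichlet parameter):

p(\rho) = V_n(t) \prod_{c \in \rho} \gamma^{(|c|)}, \qquad V_n(t) = \sum_{k=1}^{\infty} \frac{k_{(t)}}{(\gamma k)^{(n)}} \, p_K(k),

where t = |\rho| is the number of clusters, k_{(t)} = k(k-1)\cdots(k-t+1), and \gamma^{(m)} = \gamma(\gamma+1)\cdots(\gamma+m-1). The product-over-clusters form mirrors the Chinese restaurant process of a DPM, which is why DPM-style samplers carry over to MFMs.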
Journal Article · DOI

PReMiuM: An R Package for Profile Regression Mixture Models Using Dirichlet Processes

TL;DR: PReMiuM is an R package for Bayesian clustering using a Dirichlet process mixture model; it allows binary, categorical, count, and continuous responses, as well as continuous and discrete covariates.
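A minimal usage sketch in R, following the package's documented workflow (the toy data generator and MCMC settings below are illustrative, not a recommendation):

library(PReMiuM)

# Simulate a small example: Bernoulli response with discrete covariates
inputs <- generateSampleDataFile(clusSummaryBernoulliDiscrete())

# Fit the profile regression mixture model
runInfoObj <- profRegr(yModel = inputs$yModel, xModel = inputs$xModel,
                       nSweeps = 1000, nBurn = 1000,
                       data = inputs$inputData, output = "output",
                       covNames = inputs$covNames)

# Post-process: pairwise dissimilarity and a representative clustering
dissimObj <- calcDissimilarityMatrix(runInfoObj)
clusObj <- calcOptimalClustering(dissimObj)

# Summarize and plot cluster-specific risk profiles
riskProfileObj <- calcAvgRiskAndProfile(clusObj)
plotRiskProfile(riskProfileObj, "summary.png")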
References
Journal Article · DOI

A Bayesian Analysis of Some Nonparametric Problems

TL;DR: A class of prior distributions, called Dirichlet process priors, is proposed under which treatment of many nonparametric statistical problems can be carried out, yielding results comparable to classical theory.
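Ferguson's defining property, for reference: a random distribution G follows a Dirichlet process, G \sim \mathrm{DP}(\alpha, G_0), if for every finite measurable partition (A_1, \ldots, A_k) of the sample space

(G(A_1), \ldots, G(A_k)) \sim \mathrm{Dirichlet}\big(\alpha G_0(A_1), \ldots, \alpha G_0(A_k)\big),

where G_0 is the centering (base) distribution and \alpha > 0 is the total mass parameter.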
Journal Article · DOI

Model-Based Clustering, Discriminant Analysis, and Density Estimation

TL;DR: This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters there are, which clustering method should be used, and how outliers should be handled.
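The methodology rests on the finite mixture model; a sketch in the usual notation, with the number of clusters chosen by BIC (one common sign convention shown):

f(y_i) = \sum_{k=1}^{K} \pi_k \, \phi(y_i \mid \mu_k, \Sigma_k), \qquad \mathrm{BIC} = 2 \log \hat{L} - d \log n,

where \phi is the multivariate normal density, \hat{L} the maximized likelihood, and d the number of free parameters; candidate values of K (and covariance structures) are compared by BIC.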
Journal Article · DOI

Hierarchical mixtures of experts and the EM algorithm

TL;DR: An Expectation-Maximization (EM) algorithm is presented for adjusting the parameters of the tree-structured architecture for supervised learning, along with an on-line learning algorithm in which the parameters are updated incrementally.
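The architecture combines expert predictions through covariate-dependent gating functions; a two-level sketch:

P(y \mid x) = \sum_{j} g_j(x) \sum_{k} g_{k \mid j}(x) \, P(y \mid x, \theta_{jk}),

where the gates g_j and g_{k \mid j} are softmax functions of x, and EM treats the unobserved gating decisions as missing data.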
Journal Article · DOI

Model-based Gaussian and non-Gaussian clustering

TL;DR: The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum-of-squares criterion and on the criterion of Friedman and Rubin (1967), but it is restricted to Gaussian distributions and does not allow for noise.
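The device underlying this line of work is an eigendecomposition of each cluster's covariance matrix (a sketch):

\Sigma_k = \lambda_k D_k A_k D_k^{\top},

where \lambda_k controls volume, D_k orientation, and A_k shape; constraining which factors vary across clusters recovers familiar criteria as special cases (e.g., \Sigma_k = \lambda I gives the sum-of-squares criterion).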
Journal Article · DOI

On Bayesian Analysis of Mixtures with an Unknown Number of Components (with discussion)

TL;DR: In this paper, a hierarchical prior model is proposed that deals with weak prior information while avoiding the mathematical pitfalls of improper priors in the mixture context; it serves as a basis for a thorough presentation of many aspects of the posterior distribution.
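The hierarchical prior can be sketched, in the paper's univariate normal setting, as a hyperprior on the scale of the component precisions (the fixed hyperparameters are set from the data range in the paper):

\mu_j \sim N(\xi, \kappa^{-1}), \qquad \sigma_j^{-2} \sim \mathrm{Gamma}(\alpha, \beta), \qquad \beta \sim \mathrm{Gamma}(g, h),

so that weak prior information about the component variances is expressed through the random \beta rather than through an improper prior.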