Model-Based Clustering, Discriminant Analysis, and Density Estimation

doi:10.1198/016214502760047131

Journal Article

Chris Fraley, +1 more

- 01 Jun 2002 -

Journal of the American Statistical Asso...

- Vol. 97, Iss: 458, pp 611-631

TLDR

This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.

Abstract:

Cluster analysis is the automated search for groups of related observations in a dataset. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures, and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled. We review a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, minefield detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology and discuss recent development...
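The abstract's central idea, treating "how many clusters are there" as a statistical model-selection question, can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a one-dimensional Gaussian mixture with unequal variances, fits it by EM from a deterministic start, and compares component counts with BIC under the sign convention where larger is better. The names `fit_mixture`, `bic`, and `bic_table`, the synthetic data, and the parameter count `3g - 1` are all assumptions made for this illustration.

```python
import math
import random

TINY = 1e-300  # guard against floating-point underflow in divisions and logs


def normal_pdf(x, mean, var):
    """Density of a univariate normal with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_mixture(data, g, n_iter=200, var_floor=1e-6):
    """Fit a g-component 1-D Gaussian mixture by EM (requires len(data) >= g).

    Returns (weights, means, variances, log-likelihood).
    """
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: means of g contiguous chunks of the sorted data.
    bounds = [k * n // g for k in range(g + 1)]
    means = [sum(ordered[bounds[k]:bounds[k + 1]]) / (bounds[k + 1] - bounds[k])
             for k in range(g)]
    grand_mean = sum(data) / n
    pooled_var = max(sum((x - grand_mean) ** 2 for x in data) / n, var_floor)
    variances = [pooled_var] * g
    weights = [1.0 / g] * g

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(g)]
            total = max(sum(dens), TINY)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of mixing proportions, means, and variances.
        for k in range(g):
            nk = max(sum(r[k] for r in resp), TINY)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                var_floor)

    loglik = sum(
        math.log(max(sum(weights[k] * normal_pdf(x, means[k], variances[k])
                         for k in range(g)), TINY))
        for x in data)
    return weights, means, variances, loglik


def bic(loglik, g, n):
    """BIC = 2 log L - (number of parameters) log n; larger is better here."""
    n_params = 3 * g - 1  # g means + g variances + (g - 1) free mixing proportions
    return 2.0 * loglik - n_params * math.log(n)


def bic_table(data, max_g):
    """BIC for every component count from 1 to max_g."""
    return {g: bic(fit_mixture(data, g)[3], g, len(data)) for g in range(1, max_g + 1)}


# Synthetic data (an assumption for the demo): two well-separated groups.
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
        + [rng.gauss(10.0, 1.0) for _ in range(200)])

scores = bic_table(data, 3)
for g, score in scores.items():
    print(g, round(score, 1))
print("selected number of components:", max(scores, key=scores.get))
```

Some texts define BIC with the opposite sign, in which case the smallest value is preferred. The variance floor is a crude safeguard against a component collapsing onto a single point, a known failure mode of unconstrained mixture likelihoods.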

Citations

Book

Data Mining: Concepts and Techniques

Jiawei Han, +2 more

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

Journal Article

A tutorial on spectral clustering

Ulrike von Luxburg

- 01 Dec 2007 -

Statistics and Computing

TL;DR: This tutorial presents the most common spectral clustering algorithms, derives them from scratch by several different approaches, and discusses their advantages and disadvantages.

Book

Machine Learning : A Probabilistic Perspective

Kevin P. Murphy

TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

Journal Article

Survey of clustering algorithms

Rui Xu, +1 more

- 01 May 2005 -

IEEE Transactions on Neural Networks

TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications to some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts, are illustrated.

Journal Article

The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups

Christina Curtis, +35 more

- 21 Jun 2012 -

Nature

TL;DR: The results provide a novel molecular stratification of the breast cancer population, derived from the impact of somatic CNAs on the transcriptome, and identify novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort.

References

Journal Article

Maximum likelihood from incomplete data via the EM algorithm

Arthur P. Dempster, +2 more

- 01 Sep 1977 -

Journal of the royal statistical society...

Journal Article

Estimating the Dimension of a Model

Gideon Schwarz

- 01 Mar 1978 -

Annals of Statistics

TL;DR: In this paper, the problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion.

Some methods for classification and analysis of multivariate observations

James B. MacQueen

TL;DR: This paper describes k-means, a process for partitioning an N-dimensional population into k sets on the basis of a sample; the procedure generalizes the ordinary sample mean and is shown to give partitions that are reasonably efficient in the sense of within-class variance.

Book

Structural Equations with Latent Variables

Kenneth A. Bollen

TL;DR: The General Model, Part I: Latent Variable and Measurement Models Combined, Part II: Extensions, Part III: Extensions and Part IV: Confirmatory Factor Analysis as discussed by the authors.

Related Papers (5)

Maximum likelihood from incomplete data via the EM algorithm

Estimating the Dimension of a Model

Finite Mixture Models

Finite mixture models: McLachlan/finite mixture models

Some methods for classification and analysis of multivariate observations