Journal ArticleDOI
Model-Based Clustering, Discriminant Analysis, and Density Estimation
Chris Fraley,Adrian E. Raftery +1 more
TLDR
This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.Abstract:
Cluster analysis is the automated search for groups of related observations in a dataset. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures, and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled. We review a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, minefield detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology and discuss recent development...read more
Citations
More filters
Book
Data Mining: Concepts and Techniques
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Journal ArticleDOI
A tutorial on spectral clustering
TL;DR: In this article, the authors present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches, and discuss the advantages and disadvantages of these algorithms.
Book
Machine Learning : A Probabilistic Perspective
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Journal ArticleDOI
Survey of clustering algorithms
Rui Xu,Donald C. Wunsch +1 more
TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts are illustrated.
Journal ArticleDOI
The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups
Christina Curtis,Christina Curtis,Sohrab P. Shah,Suet-Feung Chin,Gulisa Turashvili,Oscar M. Rueda,Mark J Dunning,Doug Speed,Doug Speed,Andy G. Lynch,Shamith A. Samarajiwa,Yinyin Yuan,Stefan Gräf,Gavin Ha,Gholamreza Haffari,Ali Bashashati,Roslin Russell,Steven McKinney,Anita Langerød,Andrew R. Green,Elena Provenzano,Gordon C. Wishart,Sarah E Pinder,Peter H. Watson,Peter H. Watson,Florian Markowetz,Leigh C. Murphy,Ian O. Ellis,Arnie Purushotham,Arnie Purushotham,Anne Lise Børresen-Dale,Anne Lise Børresen-Dale,James D. Brenton,Simon Tavaré,Carlos Caldas,Samuel Aparicio +35 more
TL;DR: The results provide a novel molecular stratification of the breast cancer population, derived from the impact of somatic CNAs on the transcriptome, and identify novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort.
References
More filters
Journal ArticleDOI
Maximum likelihood from incomplete data via the EM algorithm
Journal ArticleDOI
Estimating the Dimension of a Model
TL;DR: In this paper, the problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion.
Estimating the dimension of a model
TL;DR: In this paper, the problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion.
Some methods for classification and analysis of multivariate observations
TL;DR: The k-means algorithm as mentioned in this paper partitions an N-dimensional population into k sets on the basis of a sample, which is a generalization of the ordinary sample mean, and it is shown to give partitions which are reasonably efficient in the sense of within-class variance.
Book
Structural Equations with Latent Variables
TL;DR: The General Model, Part I: Latent Variable and Measurement Models Combined, Part II: Extensions, Part III: Extensions and Part IV: Confirmatory Factor Analysis as discussed by the authors.