Topic

Dendrogram

About: Dendrogram is a research topic. Over its lifetime, 877 publications have appeared within this topic, receiving 15,547 citations.


Papers
Journal Article
TL;DR: The Dynamic Tree Cut R package implements novel dynamic branch cutting methods that detect clusters in a dendrogram based on their shape and can optionally combine the advantages of hierarchical clustering and partitioning around medoids, giving better detection of outliers.
Abstract: Summary: Hierarchical clustering is a widely used method for detecting clusters in genomic data. Clusters are defined by cutting branches off the dendrogram. A common but inflexible method uses a constant height cutoff value; this method exhibits suboptimal performance on complicated dendrograms. We present the Dynamic Tree Cut R package that implements novel dynamic branch cutting methods for detecting clusters in a dendrogram depending on their shape. Compared to the constant height cutoff method, our techniques offer the following advantages: (1) they are capable of identifying nested clusters; (2) they are flexible: cluster shape parameters can be tuned to suit the application at hand; (3) they are suitable for automation; and (4) they can optionally combine the advantages of hierarchical clustering and partitioning around medoids, giving better detection of outliers. We illustrate the use of these methods by applying them to protein–protein interaction network data and to a simulated gene expression data set.
Availability: The Dynamic Tree Cut method is implemented in an R package available at http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/BranchCutting
Contact: stevitihit@yahoo.com
Supplementary information: Supplementary data are available at Bioinformatics online.

1,661 citations
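
The constant height cutoff that the abstract above calls common but inflexible can be sketched in a few lines. This is a minimal illustration in plain Python, not the Dynamic Tree Cut method and not the R package's API: the function names `single_linkage_merges` and `cut_at_height`, the choice of single linkage, and the toy 1-D points are all assumptions made for the example.

```python
from itertools import combinations


def single_linkage_merges(points):
    """Agglomerative single-linkage clustering on 1-D points.

    Returns the dendrogram as a merge history: a list of
    (height, members_a, members_b) tuples in merge order.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        best = None
        for a, b in combinations(clusters, 2):
            d = min(abs(points[i] - points[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((d, a, b))
    return merges


def cut_at_height(n_points, merges, height):
    """Constant-height cut: keep only merges at or below `height`."""
    label = list(range(n_points))  # every point starts as its own cluster
    for d, a, b in merges:
        if d <= height:
            members = a | b
            target = min(label[i] for i in members)
            for i in members:
                label[i] = target
    groups = {}
    for i, lab in enumerate(label):
        groups.setdefault(lab, []).append(i)
    return sorted(groups.values())


# Hypothetical toy data: two tight groups and one distant point.
points = [0, 1, 2, 10, 11, 30]
merges = single_linkage_merges(points)
print(cut_at_height(len(points), merges, height=3))
# prints [[0, 1, 2], [3, 4], [5]]
```

A single threshold applies to every branch, which is why a cut that separates the two tight groups here cannot also adapt to branches of a different shape elsewhere in the tree.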

01 Apr 2008
TL;DR: A recent proof of NP-hardness of Euclidean sum-of-squares clustering is shown to be invalid, and an alternate short proof is provided.
Abstract: A recent proof of NP-hardness of Euclidean sum-of-squares clustering, due to Drineas et al. (Mach. Learn. 56:9–33, 2004), is not valid. An alternate short proof is provided.

774 citations
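
For readers unfamiliar with the objective named in the entry above, here is a small sketch of the quantity that Euclidean sum-of-squares clustering seeks to minimize: the total squared Euclidean distance from each point to the centroid of its cluster. The function name `sum_of_squares` and the toy points are assumptions for illustration; the code only evaluates the objective for a given partition and says nothing about the hardness proof itself.

```python
def sum_of_squares(points, labels):
    """Total squared Euclidean distance of each point to its cluster centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster members.
        centroid = [sum(p[k] for p in members) / len(members) for k in range(dim)]
        total += sum((p[k] - centroid[k]) ** 2 for p in members for k in range(dim))
    return total


# Hypothetical toy data: four points in the plane, two candidate partitions.
pts = [(0, 0), (2, 0), (10, 0), (10, 4)]
print(sum_of_squares(pts, [0, 0, 1, 1]))  # prints 10.0
print(sum_of_squares(pts, [0, 1, 0, 1]))  # prints 90.0
```

The clustering problem asks for the partition with the smallest such value, and it is this search over partitions that is NP-hard.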

Journal Article
TL;DR: Six clustering algorithms are considered and it is shown that the group means produced by Diana are the closest and those produced by UPGMA are the farthest from a model profile based on a set of hand-picked genes.
Abstract: Motivation: With the advent of microarray chip technology, large data sets are emerging containing the simultaneous expression levels of thousands of genes at various time points during a biological process. Biologists are attempting to group genes based on the temporal pattern of their expression levels. While the use of hierarchical clustering (UPGMA) with correlation ‘distance’ has been the most common in the microarray studies, there are many more choices of clustering algorithms in pattern recognition and statistics literature. At the moment there do not seem to be any clear-cut guidelines regarding the choice of a clustering algorithm to be used for grouping genes based on their expression profiles. Results: In this paper, we consider six clustering algorithms (of various flavors!) and evaluate their performances on a well-known publicly available microarray data set on sporulation of budding yeast and on two simulated data sets. Among other things, we formulate three reasonable validation strategies that can be used with any clustering algorithm when temporal observations or replications are present. We evaluate each of these six clustering methods with these validation measures. While the ‘best’ method is dependent on the exact validation strategy and the number of clusters to be used, overall Diana appears to be a solid performer. Interestingly, the performance of correlation-based hierarchical clustering and model-based clustering (another method that has been advocated by a number of researchers) appear to be on opposite extremes, depending on what validation measure one employs. Next it is shown that the group means produced by Diana are the closest and those produced by UPGMA are the farthest from a model profile based on a set of hand-picked genes. Availability: S+ codes for the partial least squares based clustering are available from the authors upon request. All other clustering methods considered have S+ implementation in the library MASS. S+ codes for calculating the validation measures are available from the authors upon request. The sporulation data set is publicly available at

393 citations
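
The correlation ‘distance’ mentioned in the abstract above is not defined there. A common convention, assumed here rather than taken from the paper, is one minus the Pearson correlation between two expression profiles. The sketch below uses that convention; the function name `correlation_distance` and the toy profiles are hypothetical.

```python
import math


def correlation_distance(x, y):
    """One minus the Pearson correlation of two equal-length profiles.

    Assumed convention: 0 for perfectly correlated profiles, 2 for
    perfectly anti-correlated ones. A constant profile has zero variance
    and is not handled (division by zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)


# Hypothetical temporal profiles for three genes.
rising = [1, 2, 3]
rising_scaled = [2, 4, 6]
falling = [3, 2, 1]
print(correlation_distance(rising, rising_scaled))
print(correlation_distance(rising, falling))
```

Under this convention two genes with the same temporal shape are close even when their absolute expression levels differ, which is the usual reason for preferring it over Euclidean distance when grouping expression profiles.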

Journal Article
TL;DR: A self-organizing map (SOM) neural network clustering methodology is used and shown to be superior to hierarchical clustering methods.

353 citations


Network Information
Related Topics (5)
Genetic diversity: 42.8K papers, 873.4K citations, 86% related
Shoot: 32.1K papers, 693.3K citations, 84% related
Germination: 51.9K papers, 877.9K citations, 83% related
Chlorophyll: 18.2K papers, 587.4K citations, 79% related
Hordeum vulgare: 20.3K papers, 717.5K citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    139
2022    294
2021    21
2020    31
2019    31