
Showing papers on "Dendrogram" published in 2003


Journal ArticleDOI
TL;DR: Six clustering algorithms are considered and it is shown that the group means produced by Diana are the closest and those produced by UPGMA are the farthest from a model profile based on a set of hand-picked genes.
Abstract: Motivation: With the advent of microarray chip technology, large data sets are emerging containing the simultaneous expression levels of thousands of genes at various time points during a biological process. Biologists are attempting to group genes based on the temporal pattern of their expression levels. While the use of hierarchical clustering (UPGMA) with correlation ‘distance’ has been the most common in the microarray studies, there are many more choices of clustering algorithms in pattern recognition and statistics literature. At the moment there do not seem to be any clear-cut guidelines regarding the choice of a clustering algorithm to be used for grouping genes based on their expression profiles. Results: In this paper, we consider six clustering algorithms (of various flavors!) and evaluate their performances on a well-known publicly available microarray data set on sporulation of budding yeast and on two simulated data sets. Among other things, we formulate three reasonable validation strategies that can be used with any clustering algorithm when temporal observations or replications are present. We evaluate each of these six clustering methods with these validation measures. While the ‘best’ method is dependent on the exact validation strategy and the number of clusters to be used, overall Diana appears to be a solid performer. Interestingly, the performance of correlation-based hierarchical clustering and model-based clustering (another method that has been advocated by a number of researchers) appear to be on opposite extremes, depending on what validation measure one employs. Next it is shown that the group means produced by Diana are the closest and those produced by UPGMA are the farthest from a model profile based on a set of hand-picked genes. Availability: S+ codes for the partial least squares based clustering are available from the authors upon request. All other clustering methods considered have S+ implementation in the library MASS. S+ codes for calculating the validation measures are available from the authors upon request. The sporulation data set is publicly available at
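The correlation 'distance' mentioned in this abstract is commonly taken to be one minus the Pearson correlation between two expression profiles. The following is a minimal sketch under that assumption, not the authors' S+ code; the gene names and expression values are made up for illustration.

```python
import math

def correlation_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles.

    Assumes neither profile is constant (a constant profile has zero
    variance and would cause a division by zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)

# Hypothetical expression levels at four time points.
gene_a = [0.1, 0.5, 1.2, 2.0]
gene_b = [0.2, 1.0, 2.4, 4.0]   # same temporal shape, twice the amplitude
gene_c = [2.0, 1.2, 0.5, 0.1]   # gene_a reversed in time

d_ab = correlation_distance(gene_a, gene_b)   # close to 0: same pattern
d_ac = correlation_distance(gene_a, gene_c)   # close to 2: opposite pattern
```

Because the measure ignores amplitude, genes that rise and fall together are grouped even when their absolute expression levels differ, which is why it suits temporal patterns.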

393 citations


Book ChapterDOI
01 Jan 2003
TL;DR: The Hierarchical Clustering Explorer provides a dendrogram and color mosaic linked to two-dimensional scattergrams, a variety of visualization options, and dynamic query controls for use in genomic microarray data analysis.
Abstract: The Hierarchical Clustering Explorer provides a dendrogram and color mosaic linked to two-dimensional scattergrams, a variety of visualization options, and dynamic query controls for use in genomic microarray data analysis.

170 citations


Journal ArticleDOI
TL;DR: In this article, the authors used tree inventory data from 28 lowland tropical dipterocarp rain forest locations throughout Borneo to identify floristic regions in the lowland (below 500 m a.s.l.) tropical dipterocarp rain forest based on tree genera, determine the characteristic taxa of these regions, study tree diversity patterns within Borneo, and relate the floristic and diversity patterns to abiotic factors such as mean annual rainfall and geographical distance between plots.
Abstract: Aim To (1) identify floristic regions in the lowland (below 500 m a.s.l.) tropical dipterocarp rain forest of Borneo based on tree genera, (2) determine the characteristic taxa of these regions, (3) study tree diversity patterns within Borneo, and (4) relate the floristic and diversity patterns to abiotic factors such as mean annual rainfall and geographical distance between plots. Location Lowland tropical dipterocarp rain forest of Borneo. Methods We used tree (diameter at breast height ≥ 9.8 cm) inventory data from 28 lowland dipterocarp rain forest locations throughout Borneo. From each location six samples of 640 individuals were drawn randomly. With these data we calculated a Sorensen and Steinhaus similarity matrix for the locations. These matrices were then used in a UPGMA clustering algorithm to determine the floristic relations between the locations (dendrogram). Principal coordinate analysis was used to ordinate the locations. Characteristic taxa for the identified floristic clusters were determined with the use of the INDVAL method of Dufrene & Legendre (1997). Finally, Mantel analysis was applied to determine the influence of mean annual rainfall and geographical distance between plots on floristic composition.
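The two similarity measures named in this abstract can be sketched as follows, assuming their usual textbook definitions: Sørensen on presence/absence, 2a / (2a + b + c), and Steinhaus on abundances, 2W / (A + B), where W is the sum of the smaller of the two counts for each taxon. The plot contents are invented counts, not data from the study.

```python
def sorensen(taxa_a, taxa_b):
    """Sørensen similarity on presence/absence: 2 * shared / (n_a + n_b)."""
    set_a, set_b = set(taxa_a), set(taxa_b)
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))

def steinhaus(counts_a, counts_b):
    """Steinhaus similarity on abundances: 2 * W / (A + B)."""
    taxa = set(counts_a) | set(counts_b)
    # W sums, taxon by taxon, the smaller of the two abundances.
    w = sum(min(counts_a.get(t, 0), counts_b.get(t, 0)) for t in taxa)
    return 2 * w / (sum(counts_a.values()) + sum(counts_b.values()))

# Hypothetical genus counts for two plots (illustrative numbers only).
plot_1 = {"Shorea": 30, "Dipterocarpus": 10, "Syzygium": 5}
plot_2 = {"Shorea": 20, "Syzygium": 15, "Diospyros": 10}

s_sorensen = sorensen(plot_1, plot_2)     # 2 shared genera of 3 + 3
s_steinhaus = steinhaus(plot_1, plot_2)   # W = 20 + 5, A = B = 45
```

Sørensen only asks whether a genus is present, while Steinhaus also weighs how many individuals the plots have in common, so the two matrices can lead to different dendrograms.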

152 citations


Proceedings ArticleDOI
Jörg Sander, Xuejie Qin, Zhiyong Lu, Nan Niu, Alex Kovarsky
30 Apr 2003
TL;DR: This paper investigates the relation between dendrograms and reachability plots and introduces methods to convert them into each other showing that they essentially contain the same information, and introduces a technique that automatically determines the significant clusters in a hierarchical cluster representation.
Abstract: Hierarchical clustering algorithms are typically more effective in detecting the true clustering structure of a data set than partitioning algorithms. However, hierarchical clustering algorithms do not actually create clusters, but compute only a hierarchical representation of the data set. This makes them unsuitable as an automatic pre-processing step for other algorithms that operate on detected clusters. This is true for both dendrograms and reachability plots, which have been proposed as hierarchical clustering representations, and which have different advantages and disadvantages. In this paper we first investigate the relation between dendrograms and reachability plots and introduce methods to convert them into each other showing that they essentially contain the same information. Based on reachability plots, we then introduce a technique that automatically determines the significant clusters in a hierarchical cluster representation. This makes it for the first time possible to use hierarchical clustering as an automatic pre-processing step that requires no user interaction to select clusters from a hierarchical cluster representation.

138 citations


Journal ArticleDOI
TL;DR: To the authors' knowledge, this is the first test of a species abundance model based on nontrivial predictions as to the origins and causes of abundance patterns, and not simply on the goodness-of-fit of distributions.
Abstract: We examine a hypothesized relationship between two descriptions of community structure: the niche-overlap dendrogram that describes the ecological similarities of species and the pattern of relative abundances. Specifically, we examine the way in which this relationship follows from the niche hierarchy model, whose fundamental assumption is a direct connection between abundances and underlying hierarchical community organization. We test three important, although correlated, predictions of the niche hierarchy model and show that they are upheld in a set of 11 communities (encompassing fishes, amphibians, lizards, and birds) where both abundances and dendrograms were reported. First, species that are highly nested in the dendrogram are on average less abundant than species from branches less subdivided. Second, and more significantly, more equitable community abundances are associated with more evenly branched dendrogram structures, whereas less equitable abundances are associated with less even dendrograms. This relationship shows that abundance patterns can give insight into less visible aspects of community organization. Third, one can recover the distribution of proportional abundances seen in assemblages containing two species by treating each branch point in the dendrogram as a two-species case. This reconstruction cannot be achieved if abundances and the dendrogram are unrelated and suggests a method for hierarchically decomposing systems. To our knowledge, this is the first test of a species abundance model based on nontrivial predictions as to the origins and causes of abundance patterns, and not simply on the goodness-of-fit of distributions.

116 citations


Journal ArticleDOI
TL;DR: Analysis of molecular variance revealed that 94.7% of the genetic diversity in mango existed within regions, however, differences among regions were significant; northern and eastern regions formed one zone and western and southern regions formed another zone of mango diversity in India.
Abstract: Summary: Random amplified polymorphic DNA analysis was carried out in 29 Indian mango cultivars comprising popular landraces and some advanced cultivars. PCR amplification with 24 primers generated 314 bands, 91.4% of which were polymorphic. Jaccard’s similarity between pairs of cultivars ranged between 0.318 and 0.75 with a mean of 0.565. A UPGMA dendrogram showed the majority of the cultivars from northern and eastern regions of India clustering together and separate from southern and western cultivars. Analysis of molecular variance revealed that 94.7% of the genetic diversity in mango existed within regions. However, differences among regions were significant; northern and eastern regions formed one zone and western and southern regions formed another zone of mango diversity in India.
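The pipeline described here (band presence/absence, Jaccard similarity, UPGMA dendrogram) can be sketched in a few lines. This is an illustrative sketch, not the software used in the study: the cultivar labels and band profiles are hypothetical, and `upgma` is a plain average-linkage implementation written for readability rather than speed.

```python
from itertools import combinations

def jaccard(x, y):
    """Jaccard similarity of two 0/1 band profiles: shared / present in either."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either

def upgma(names, dist):
    """Average-linkage clustering; returns merges as (left, right, height)."""
    clusters = {name: [name] for name in names}   # label -> member leaves

    def average(p, q):
        pairs = [frozenset((i, j)) for i in clusters[p] for j in clusters[q]]
        return sum(dist[pair] for pair in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the smallest mean leaf-to-leaf distance.
        p, q = min(combinations(clusters, 2), key=lambda pq: average(*pq))
        height = average(p, q)
        clusters[f"({p},{q})"] = clusters.pop(p) + clusters.pop(q)
        merges.append((p, q, height))
    return merges

# Hypothetical band profiles (1 = band present) for four cultivars.
bands = {
    "N1": [1, 1, 0, 1, 0, 1],
    "N2": [1, 1, 0, 1, 1, 1],
    "S1": [0, 1, 1, 0, 1, 0],
    "S2": [0, 1, 1, 0, 1, 1],
}
dist = {frozenset((a, b)): 1 - jaccard(bands[a], bands[b])
        for a, b in combinations(bands, 2)}
merges = upgma(list(bands), dist)
```

With these made-up profiles the two "N" cultivars join first, the two "S" cultivars second, and the two groups merge last, which is the kind of regional separation a dendrogram makes visible.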

84 citations


Journal ArticleDOI
TL;DR: The dendrogram sharpening method, combined with a hierarchical clustering algorithm, is used in this work to identify modality regions, which are, in essence, areas of activation in the human brain during an fMRI experiment.
Abstract: The major disadvantage of hierarchical clustering in fMRI data analysis is that an appropriate clustering threshold needs to be specified. Upon grouping data into a hierarchical tree, clusters are identified either by specifying their number or by choosing an appropriate inconsistency coefficient. Since the number of clusters present in the data is not known beforehand, even a slight variation of the inconsistency coefficient can significantly affect the results. To address these limitations, the dendrogram sharpening method, combined with a hierarchical clustering algorithm, is used in this work to identify modality regions, which are, in essence, areas of activation in the human brain during an fMRI experiment. The objective of the algorithm is to remove data from the low-density regions in order to obtain a clearer representation of the data structure. Once cluster cores are identified, the classification algorithm is run on voxels, set aside during sharpening, attempting to reassign them to the detected groups. When applied to a paced motor paradigm, task-related activations in the motor cortex are detected. In order to evaluate the performance of the algorithm, the obtained clusters are compared to standard activation maps where the expected hemodynamic response function is specified as a regressor. The obtained patterns of both methods have a high concordance (correlation coefficient = 0.91). Furthermore, the dependence of the clustering results on the sharpening parameters is investigated and recommendations on the appropriate choice of these variables are offered. Hum. Brain Mapping 20:201–219, 2003. © 2003 Wiley-Liss, Inc.

61 citations


Journal ArticleDOI
TL;DR: The results of the present study provide evidence of the high discriminatory power of AFLP analysis, suggesting the possible applicability of this method to the molecular characterization of Fusarium.
Abstract: The high-resolution genotyping method of amplified fragment length polymorphism (AFLP) analysis was used to study the genetic relationships within and between natural populations of five Fusarium spp. AFLP templates were prepared by the digestion of Fusarium DNA with EcoRI and MseI restriction endonucleases and subsequent ligation of corresponding site-specific adapters. An average of 44 loci was assayed simultaneously with each primer pair and DNA markers in the range 100 to 500 bp were considered for analysis. A total of 80 AFLP polymorphic markers were obtained using four primer combinations, with an average of 20 polymorphic markers observed per primer pair. UPGMA analyses indicated 5 distinct clusters at the phenon line of 30% on the genetic similarity scale corresponding to the 5 taxa. The similarity percent of each group oscillated between 87 and 97%. The phenetic dendrogram generated by UPGMA as well as principal coordinate analysis (PCA) grouped all of the Fusarium spp. isolates into five major clusters. With a few exceptions, no clear trend was detected between clustering in the AFLP dendrogram and the geographic origin or host genotype of the tested isolates. The results of the present study provide evidence of the high discriminatory power of AFLP analysis, suggesting the possible applicability of this method to the molecular characterization of Fusarium. (African Journal of Biotechnology: 2003 2(3): 51-55)

43 citations


Journal ArticleDOI
TL;DR: The results show that RAPD analysis is an efficient marker technology for estimating genetic diversity and relatedness, thereby enabling the formulation of appropriate strategies for conservation, germplasm management, and selection of diverse parents for sandalwood improvement programmes.
Abstract: Summary: Sandalwood is an economically important aromatic tree belonging to the family Santalaceae. The trees are used mainly for their fragrant heartwood and oil that have immense potential for foreign exchange. Very little information is available on the genetic diversity in this species. Hence studies were initiated and genetic diversity estimated using RAPD markers in 51 genotypes of Santalum album procured from different geographical regions of India and three exotic lines of S. spicatum from Australia. Eleven selected Operon primers (10-mer) generated a total of 156 consistent and unambiguous amplification products ranging from 200 bp to 4 kb. Rare and genotype-specific bands were identified which could be effectively used to distinguish the genotypes. Genetic relationships within the genotypes were evaluated by generating a dissimilarity matrix based on Ward’s method (Squared Euclidean distance). The phenetic dendrogram and the Principal Component Analysis generated separated the 51 Indian genotypes fr...

35 citations


Journal ArticleDOI
TL;DR: DNA from Coffea arabica leaves was used for RAPD analysis and a total of 144 leaf samples collected from 16 provenances in five regions of Tanzania were analysed, implying a narrow genetic base in the cultivated Arabica coffee.
Abstract: DNA from Coffea arabica leaves was used for RAPD analysis, and a total of 144 leaf samples collected from 16 provenances in five regions of Tanzania were analysed. Ten arbitrary 10-mer primers were employed in the analysis and they produced a total of 86 fragments. Fragment sizes ranged from 100–1400 bp. The resulting dissimilarity matrix revealed values ranging from 0.11 to 1, while the average was 0.66. The cophenetic matrix and the original dissimilarity matrix showed a significant correlation of 78%. Mean dissimilarity values within provenances showed a fairly uniform trend despite the large range from 0.31 to 0.65. The dendrogram based on genetic distances showed two clusters, with a grouping of provenances similar to the dendrogram generated by Jaccard's coefficient. Bootstrap analysis showed low values; despite this, the resulting dendrogram grouped all provenances according to their geographical origin. The standard genetic distances were fairly uniform, implying a narrow genetic base in the cultivated Arabica coffee.
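The correlation between the cophenetic matrix and the original dissimilarity matrix reported in this abstract measures how faithfully a dendrogram preserves the pairwise dissimilarities. A minimal sketch of that calculation follows, assuming the usual definition (the cophenetic distance of two leaves is the height at which they are first joined, and the two matrices are compared with a Pearson correlation). The four labels, the dissimilarities and the merge heights are invented; the heights are what average linkage gives for these numbers.

```python
import math
from itertools import combinations

def cophenetic(merges):
    """Map each leaf pair to the height of the merge that first joins it."""
    members, coph = {}, {}
    for left, right, height in merges:
        a = members.get(left, [left])     # a bare label is a single leaf
        b = members.get(right, [right])
        for i in a:
            for j in b:
                coph[frozenset((i, j))] = height
        members[left + right] = a + b     # merged cluster label, e.g. "AB"
    return coph

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) *
                           sum((y - my) ** 2 for y in ys))

# Hypothetical dissimilarities among four provenances.
original = {
    frozenset("AB"): 0.2, frozenset("CD"): 0.3,
    frozenset("AC"): 0.7, frozenset("AD"): 0.8,
    frozenset("BC"): 0.6, frozenset("BD"): 0.9,
}
# Average-linkage merges for the values above: (left, right, height).
merges = [("A", "B", 0.2), ("C", "D", 0.3), ("AB", "CD", 0.75)]

coph = cophenetic(merges)
pairs = [frozenset(p) for p in combinations("ABCD", 2)]
r = pearson([original[p] for p in pairs], [coph[p] for p in pairs])
```

A value of `r` near 1 means the tree distorts the original dissimilarities very little; lower values indicate that the dendrogram should be read with more caution.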

28 citations


Journal ArticleDOI
TL;DR: The apparent genetic robustness of E. sieberi to native forest regeneration practices is attributed to its local abundance combined with the favourable properties of its reproductive biology, as well as to the limitation that only a single rotation was examined.
Abstract: Potential impacts of regeneration practices on genetic diversity in the Australian native forest species Eucalyptus sieberi L.A.S. Johnson (silvertop ash) were assessed using DNA markers. Three different silvicultural treatments were examined: clear-felling with aerial re-sowing, and the seed tree system with site preparation by either burning or mechanical disturbance. In addition, two unharvested stands were chosen as controls. A total sample of 825 trees were genotyped at 35 Mendelian markers: 26 single-copy nuclear RFLPs and 9 microsatellites. No significant differences were found among the treatments in any of four population genetic statistics: allelic richness, effective number of alleles, expected heterozygosity and the panmictic index (f). Rare alleles were prevalent, and a Monte Carlo simulation showed that the apparent loss of four rare alleles from the sapling regenerants was highly statistically significant. There was no evidence for recent bottlenecks from analyses of either the levels of expected heterozygosity relative to that expected under mutation drift equilibrium, or the allele frequency profiles. A dendrogram of the relationships between the sampled populations suggested that the seed tree system may result in the promotion of genetic drift (slight expansion of the dendrogram) while aerial re-sowing of clear falls with the same seedlot will lead to genetic homogenisation (contraction of the dendrogram). The apparent genetic robustness of E. sieberi to native forest regeneration practices is attributed to its local abundance combined with the favourable properties of its reproductive biology, as well as to the limitation that only a single rotation was examined.

Journal ArticleDOI
TL;DR: The existence of wide genetic diversity as revealed in the present study is supported by earlier reports of extensive inter- and intrapopulation morphological variability in pepper cultivars from South India.
Abstract: RAPD analysis was conducted in 22 cultivars of P. nigrum (black pepper) from South India and one accession each of P. longum and P. colubrinum. Twenty-four primers generated 372 RAPD markers of which 367 were polymorphic. Jaccard's similarity between pairs of accessions ranged between 0.11 and 0.66 with a mean of 0.38. Among P. nigrum cultivars, the similarity ranged between 0.20 and 0.66 and the mean was 0.42. The existence of wide genetic diversity as revealed in the present study is supported by earlier reports of extensive inter- and intrapopulation morphological variability in pepper cultivars from South India. UPGMA dendrogram and PCO plot revealed P. colubrinum to be most distant of the three species. Genetic proximity among P. nigrum cultivars could be related to their phenotypic similarities or geographical distribution. Greater divergence was observed among landraces than among advanced cultivars. Landraces grown in southern parts of coastal India and those grown in more northern parts were grouped in separate clusters of the dendrogram.

Journal ArticleDOI
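TL;DR: Genetic variability of five cultivars of Plantago ovata Forsk. was estimated using RAPD markers, and a dendrogram was obtained from Jaccard's coefficient.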
Abstract: Five cultivars of Plantago ovata Forsk. (medicinal plant) have been developed by different agricultural universities in India. Genetic variability of these cultivars was estimated using RAPD markers. The data were correlated to morphological characters and a dendrogram was obtained from Jaccard's coefficient.

Journal ArticleDOI
TL;DR: The seriation method presented here uses simulated annealing to find an approximately optimal dendrogram ordering by minimizing a penalty function and employs a ‘similarity weighted distance’ penalty function that tends to avoid artifacts introduced by the traveling salesman problem algorithms commonly used for dendrogram seriation.
Abstract: Seriation is the ordering of the leaves of a dendrogram, such that leaves representing similar items are placed near each other according to some metric, within the constraints of the cluster tree. Such ordering greatly aids the interpretation of the relations represented by the dendrogram and reduces visual misinterpretation caused by unrelated items from different subtrees being placed near each other during random ordering. The seriation method presented here uses simulated annealing to find an approximately optimal dendrogram ordering by minimizing a penalty function. The method employs a 'similarity weighted distance' penalty function that tends to avoid artifacts introduced by the traveling salesman problem algorithms commonly used for dendrogram seriation. Examples are given showing the effectiveness of the method in presenting dendrograms of the structure of a social network, and additional examples show an application for interpreting the structure of a network of journal papers covering the subject of anthrax research.
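The idea of seriation within the constraints of the cluster tree can be sketched as follows: the only allowed move is swapping the two children of an internal node, and simulated annealing searches over those swaps. This is a toy sketch, not the paper's method. In particular, the penalty used here is simply the sum of distances between adjacent leaves, an assumption standing in for the 'similarity weighted distance' penalty, whose definition is not given in the abstract. The tree, the leaf positions and all parameter values are hypothetical.

```python
import math
import random

def ordered(tree, flips, path=()):
    """Leaf order of a nested-tuple tree; nodes whose path is in flips are swapped."""
    if isinstance(tree, str):
        return [tree]
    left = ordered(tree[0], flips, path + (0,))
    right = ordered(tree[1], flips, path + (1,))
    return right + left if path in flips else left + right

def internal_paths(tree, path=()):
    """Paths (tuples of 0/1) identifying every internal node."""
    if isinstance(tree, str):
        return []
    return ([path] + internal_paths(tree[0], path + (0,))
            + internal_paths(tree[1], path + (1,)))

def penalty(order, dist):
    """Stand-in penalty: total distance between neighbouring leaves."""
    return sum(dist[frozenset(pair)] for pair in zip(order, order[1:]))

def anneal(tree, dist, steps=2000, t0=1.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    nodes = internal_paths(tree)
    flips = set()
    cost = penalty(ordered(tree, flips), dist)
    best_flips, best_cost = set(flips), cost
    temp = t0
    for _ in range(steps):
        candidate = flips ^ {rng.choice(nodes)}       # toggle one node
        new_cost = penalty(ordered(tree, candidate), dist)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            flips, cost = candidate, new_cost
            if cost < best_cost:
                best_flips, best_cost = set(flips), cost
        temp *= cooling
    return ordered(tree, best_flips), best_cost

# Hypothetical items on a line; the tree starts with a poor drawing order.
position = {"a": 0, "b": 1, "c": 2, "d": 3}
dist = {frozenset((i, j)): abs(position[i] - position[j])
        for i in position for j in position if i < j}
tree = (("b", "a"), ("d", "c"))

order, cost = anneal(tree, dist)
```

Every ordering the search visits is a valid drawing of the same dendrogram, because swapping children never changes which leaves share a cluster; only the left-to-right layout changes.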

01 Jan 2003
TL;DR: In this paper, genetic relationships within a collection of 64 velvetbean accessions from different eco-geographical regions were evaluated using amplified fragment length polymorphism (AFLP) by generating a similarity matrix based on Nei and Li's (1979) similarity coefficient, which ranges from 0 (no similarity) to 1 (perfect similarity).
Abstract: Genetic diversity was estimated in a collection of 64 velvetbean accessions from different eco-geographical regions using amplified fragment length polymorphism (AFLP). Genetic relationships within the accessions were evaluated by generating a similarity matrix based on Nei and Li’s (1979) similarity coefficient which ranges from 0 (no similarity) to 1 (perfect similarity). In the first study, the phenetic dendrogram (a system of classification of organisms based on overall or observable similarities) generated by UPGMA separated 40 velvetbean accessions into two main clusters. This grouping confirmed the existing phenological difference with regard to maturity. Similarity coefficients ranged from 0.87 to 0.97. In the second study, the phenetic dendrogram separated the 24 accessions into four main clusters. The cluster analysis indicated that velvetbean germplasm within the collection constitutes a broad genetic base with the values of genetic similarity ranging from 0.68 to 1.00. The grouping of a subcluster indicated differences with regard to growth habit. The level of genetic variability within the velvetbean accessions with AFLP analysis suggests that it is a reliable, efficient, and effective marker technology for delineating genetic relationships among genotypes and estimating genetic diversity, thereby enabling the formulation of appropriate strategies for velvetbean improvement programs.

21 Jan 2003
TL;DR: Four general techniques that could be used in interactive explorations of clustering algorithms are described, including overview of the entire dataset, coupled with a detail view so that high-level patterns and hot spots can be easily found and examined.
Abstract: Hierarchical clustering is widely used to find patterns in multi-dimensional datasets, especially for genomic microarray data. Finding groups of genes with similar expression patterns can lead to better understanding of the functions of genes. Early software tools produced only printed results, while newer ones enabled some online exploration. We describe four general techniques that could be used in interactive explorations of clustering algorithms: (1) overview of the entire dataset, coupled with a detail view so that high-level patterns and hot spots can be easily found and examined, (2) dynamic query controls so that users can restrict the number of clusters they view at a time and show those clusters more clearly, (3) coordinated displays: the overview mosaic has a bi-directional link to 2-dimensional scattergrams, (4) cluster comparisons to allow researchers to see how different clustering algorithms group the genes.

Journal ArticleDOI
TL;DR: A clustering method based on recursive bisection is introduced for analyzing microarray gene expression data with the advantage of much improved computational efficiency while retaining effective separation of data clusters under a distance metric, a straightforward parallel implementation, and useful extraction and presentation of biological information.

Journal Article
TL;DR: The genetic affinity between 12 cassava clones was studied using the isozyme systems peroxidase, phenoloxidase and carbonic anhydrase; the dendrogram showed the formation of four groups, according to the phylogenetic relations.
Abstract: The genetic affinity between 12 cassava clones (Manihot esculenta Crantz) was studied using the isozyme systems peroxidase, phenoloxidase and carbonic anhydrase. The MAT-GEN program was used for analysing the results. All the isozyme systems were polymorphic. The dendrogram showed the formation of four groups, according to the phylogenetic relations. Key words: cassava, dendrogram, isozyme system

Journal Article
TL;DR: The results show that Paris has high genetic diversity, and the 11 species were divided into 6 clusters according to the dendrogram constructed by the UPGMA method.
Abstract: Genetic diversity of 11 species of Paris was analyzed by RAPD markers with 12 primers in this research. Of 86 bands, 82 (95.3%) were polymorphic. This shows that Paris has high genetic diversity. The 11 species were divided into 6 clusters according to the dendrogram constructed by the UPGMA method.
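The polymorphism percentage in this abstract follows directly from the two band counts it reports. A small arithmetic check, with a hypothetical helper name:

```python
def percent_polymorphic(polymorphic, total):
    """Share of scored bands that are polymorphic, as a percentage."""
    return round(100 * polymorphic / total, 1)

# Band counts as reported in the abstract: 82 polymorphic bands out of 86.
paris_percent = percent_polymorphic(82, 86)   # 95.3
```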

01 Jan 2003
TL;DR: This thesis presents an application which visualizes yeast clustering results as a dendrogram along with color-coded gene keyword annotations, which will help the biologists to integrate the different type of visual information quickly and provide an intuitive way of correlating the gene expression results with gene function.
Abstract: Functional Annotation and Dendrogram Representation of Gene Expression Clustering Results, by Antoaneta Petkova Vladimirova (Master of Science thesis, Department of Computer Science, New Jersey Institute of Technology, January 2003). The advances in genomic sciences have created vast amounts of gene expression data. To make sense of the expression information, various techniques have been applied. Clustering is among the unsupervised methods used to group the results according to gene expression level. Dendrogram visualization allows graphical representation of the clustering. The aim of this thesis is to enhance these techniques by adding another layer of functionality, namely, annotating the dendrogram with gene functional information. Presented is an application which visualizes yeast clustering results as a dendrogram along with color-coded gene keyword annotations. Gene keyword information was extracted from a major biological database and was used to create a database which was queried by the program according to the user preferences. Functional annotation with keyword information will help biologists to integrate the different types of visual information quickly and provide an intuitive way of correlating the gene expression results with gene function.

01 Jan 2003
TL;DR: This paper presents a novel variant of the hierarchical clustering from (2), and proposes an algorithm to build a similarity tree as a taxonomy that respects the hierarchical clusters determined above.
Abstract: This paper presents a novel variant of the hierarchical clustering from (2). We tried to solve the problem of repetitive similarity values that appears on distributional similarity values. Also we propose an algorithm to build a similarity tree as a taxonomy that respects the hierarchical clusters determined above.