scispace - formally typeset

Showing papers on "Dendrogram" published in 2004


Journal Article
TL;DR: A new tree-structure self-organizing neural network, called the dynamically growing self-organizing tree (DGSOT) algorithm, for hierarchical clustering, which extracts gene expression patterns at different levels and proposes a new cluster validation criterion based on the geometric property of the Voronoi partition of the dataset.
Abstract: Motivation: The increasing use of microarray technologies is generating large amounts of data that must be processed in order to extract useful and rational fundamental patterns of gene expression. Hierarchical clustering technology is one method used to analyze gene expression data, but traditional hierarchical clustering algorithms suffer from several drawbacks (e.g. fixed topology structure; mis-clustered data which cannot be reevaluated). In this paper, we introduce a new hierarchical clustering algorithm that overcomes some of these drawbacks. Result: We propose a new tree-structure self-organizing neural network, called dynamically growing self-organizing tree (DGSOT) algorithm for hierarchical clustering. The DGSOT constructs a hierarchy from top to bottom by division. At each hierarchical level, the DGSOT optimizes the number of clusters, from which the proper hierarchical structure of the underlying dataset can be found. In addition, we propose a new cluster validation criterion based on the geometric property of the Voronoi partition of the dataset in order to find the proper number of clusters at each hierarchical level. This criterion uses the Minimum Spanning Tree (MST) concept of graph theory and is computationally inexpensive for large datasets. A K-level up distribution (KLD) mechanism, which increases the scope of data distribution in the hierarchy construction, was used to improve the clustering accuracy. The KLD mechanism allows the data misclustered in the early stages to be reevaluated at a later stage and increases the accuracy of the final clustering result. The clustering result of the DGSOT is easily displayed as a dendrogram for visualization. Based on a yeast cell cycle microarray expression dataset, we found that our algorithm extracts gene expression patterns at different levels. 
Furthermore, the biological functionality enrichment in the clusters is considerably high and the hierarchical structure of the clusters is more reasonable. Availability: DGSOT is available upon request from the authors.
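The abstract says the DGSOT validation criterion "uses the Minimum Spanning Tree (MST) concept". It does not give the criterion itself, which is based on the geometry of the Voronoi partition, so that criterion is not reproduced here. The sketch below only shows how an MST over a set of points can be computed, using Prim's algorithm on the complete Euclidean graph. A common MST-based heuristic, assumed here purely for illustration, treats unusually long MST edges as boundaries between clusters.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph, O(n^2).
    Returns a list of (parent, child, weight) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight connecting each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # add the closest vertex that is not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # relax the edges from u to the remaining vertices
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges

pts = [(0, 0), (1, 0), (1, 1), (5, 5)]
edges = mst_edges(pts)
longest = max(w for _, _, w in edges)  # the (1,1)-(5,5) edge separates the outlying point
```

In this toy example, the single long edge isolates the point (5, 5) from the tight group of three.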

95 citations


Journal Article
TL;DR: The results indicate that differences in cell-wall composition and structure can provide the basis for chemotaxonomy of flowering plants.
Abstract: Fourier transform infrared spectroscopy (FTIR) provides biochemical profiles containing overlapping signals from a majority of the compounds that are present when whole cells are analyzed. Leaf samples of seven higher plant species and varieties were subjected to FTIR to determine whether plants can be discriminated phylogenetically on the basis of biochemical profiles. A hierarchical dendrogram based on principal component analysis (PCA) of FTIR data showed relationships between plants that were in agreement with known plant taxonomy. Genetic programming (GP) analysis determined the top three to five biomarkers from FTIR data that discriminated plants at each hierarchical level of the dendrogram. Most biomarkers determined by GP analysis at each hierarchical level were specific to the carbohydrate fingerprint region (1,200-800 cm⁻¹) of the FTIR spectrum. Our results indicate that differences in cell-wall composition and structure can provide the basis for chemotaxonomy of flowering plants.

76 citations


Journal Article
TL;DR: Twenty-six landraces of black gram collected from Orissa, India were analysed for genetic diversity using amplified fragment length polymorphism (AFLP) markers; the influence of soil pattern and topography on the genetic make-up of the landraces was visible and seemed to contribute to their genetic distinctness.
Abstract: Twenty-six landraces of black gram collected from Orissa, India were analysed for genetic diversity using amplified fragment length polymorphism (AFLP) markers. Seven primer combinations were used for AFLP analysis. The percentage polymorphism across the samples varied from 74.5 to 93%. The level of rare and common alleles contributing to the diversity in the sample was analysed using the Shannon-Wiener (SW) diversity index. The SW index revealed that three of the twenty-six samples contributed significantly to the overall diversity of the sample set. A dendrogram constructed with the UPGMA clustering method revealed three major clusters. A principal component analysis of the same dataset gave results similar to those of the dendrogram, with the first principal component accounting for 58% of the total variation. The analysed samples formed five significant groups. Three samples were distinct in their clustering and remained separate from the other samples. The influence of soil pattern and topography on the genetic make-up of the landraces was visible and seemed to contribute to their genetic distinctness. This genetic diversity could well be utilized in future breeding work.
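The Shannon-Wiener index is H' = -Σ pᵢ ln pᵢ over the observed categories. The abstract does not say how the authors applied it to AFLP bands, for example which log base they used or how they averaged over loci. The minimal sketch below therefore assumes the simplest case: a single band scored as present or absent across samples, using natural logarithms.

```python
import math

def shannon_wiener(counts):
    """Shannon-Wiener diversity H' = -sum(p_i * ln p_i) from category counts.
    Zero counts are skipped, since p ln p -> 0 as p -> 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# one band scored across 26 samples: present in 13, absent in 13 (hypothetical)
h_balanced = shannon_wiener([13, 13])   # maximal for two categories: ln 2
h_fixed = shannon_wiener([26, 0])       # monomorphic band: no diversity
```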

53 citations


Journal Article
TL;DR: This article presents a simple method for searching for clusters with the strongest enrichment of gene classes from a cluster tree and indicates that the clusters found could not have been obtained by simply cutting the cluster tree.
Abstract: A common clustering method in the analysis of gene expression data has been hierarchical clustering. Usually the analysis involves selecting clusters by cutting the tree at a suitable level and/or analyzing a sorted gene list obtained from the tree. Cutting the hierarchical tree requires the selection of a suitable level, and it results in the loss of information at the other levels. Sorted gene lists depend on the method used to sort the joined clusters. The author proposes that clusters should instead be selected using the gene classifications. This article presents a simple method for searching a cluster tree for the clusters with the strongest enrichment of gene classes. The clusters found are presented in their estimated order of importance. The method is demonstrated with a yeast gene expression data set and two database classifications. The obtained clusters showed a very strong enrichment of functional classes. They also recovered gene groups similar to those observed in the original analysis of the data set, as well as many gene groups that were not reported there. Visualizing the results on top of the cluster tree shows that the method finds informative clusters from several levels of the tree, indicating that these clusters could not have been obtained simply by cutting the tree. The results were also used to compare cluster trees from different clustering methods. The presented method should facilitate the exploratory analysis of large data sets when associated categorical data are available.
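The abstract does not name the enrichment statistic. A hypergeometric (one-sided Fisher) test is a common choice, so the sketch below assumes it. The idea is to score every node of the tree, not just the nodes at one cut height, and then rank the nodes by p-value. The cluster names and gene sets are hypothetical.

```python
from math import comb

def enrichment_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n drawn).
    Here n is the cluster size and k is the number of annotated genes in the cluster."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def rank_clusters(clusters, annotated, population_size):
    """Score every tree node (name -> set of genes) against one gene class.
    Returns (p-value, name) pairs, most enriched first."""
    K = len(annotated)
    scored = []
    for name, genes in clusters.items():
        k = len(genes & annotated)
        scored.append((enrichment_pvalue(population_size, K, len(genes), k), name))
    return sorted(scored)

# two nested nodes of a hypothetical tree over 10 genes
clusters = {"A": {1, 2, 3}, "B": {1, 2, 3, 4, 5, 6}}
ranking = rank_clusters(clusters, annotated={1, 2, 3}, population_size=10)
```

In this example the smaller node A is ranked first: it contains the annotated genes and nothing else.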

47 citations


Journal Article
TL;DR: A graph named “clustergram” is discussed to examine how cluster members are assigned to clusters as the number of clusters increases; it is also useful in distinguishing between random and deterministic implementations of the Kmeans algorithm.
Abstract: In hierarchical cluster analysis, dendrogram graphs are used to visualize how clusters are formed. Because each observation is displayed, dendrograms are impractical when the data set is large. For non-hierarchical cluster algorithms (e.g. Kmeans), a graph like the dendrogram does not exist. This paper discusses a graph named “clustergram” to examine how cluster members are assigned to clusters as the number of clusters increases. The clustergram can also give insight into algorithms. For example, it can easily be seen that the “single linkage” algorithm tends to form clusters that consist of just one observation. It is also useful in distinguishing between random and deterministic implementations of the Kmeans algorithm. A data set related to asbestos claims and the Thailand Landmine Data are used throughout to illustrate the clustergram.

40 citations


Journal Article
TL;DR: A one-way analysis of variance (ANOVA) test confirmed statistically significant differences in the average thousand seed mass (TSM) between eight subclusters of flax accessions from an ISSR-PCR-based UPGMA dendrogram, indicating a statistical correlation between flax ISSR polymorphism and TSM.
Abstract: Inter-simple sequence repeat (ISSR)-polymerase chain reaction (PCR) polymorphism was generated to provide useful markers for assessing genetic diversity within flax germplasm collections. We used nine previously selected anchored ISSR primers to fingerprint 53 flax cultivars or genotypes and obtained 62 scorable bands, of which 45 (72.6%) were polymorphic. An efficient separation of the 53 flax accessions into four groups and eight subgroups was achieved using the unweighted pair group method with arithmetic means (UPGMA) clustering procedure, based on genetic similarity expressed by the Jaccard similarity coefficient (JSC). Clustering within both groups and subgroups successfully produced smaller homogeneous clusters, whereas clustering between the four main groups of flax accessions displayed only a continuous decrease of similarity with a weak clustering effect. The statistical significance of grouping and subgrouping within the dendrogram was estimated by calculating the error flag and cophenetic correlation parameter for each branch. Principal coordinates (PCO) analysis mostly confirmed the separation obtained by UPGMA clustering. We observed a statistically significant correlation between the number of total and polymorphic bands in ISSR patterns. A one-way analysis of variance (ANOVA) test confirmed statistically significant differences in the average thousand seed mass (TSM) between the eight subclusters of flax accessions from the ISSR-PCR-based UPGMA dendrogram, indicating a statistical correlation between flax ISSR polymorphism (the structure of the ISSR-based clustering) and TSM.
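For dominant markers such as ISSR bands, the Jaccard coefficient counts only shared presences: J = a / (a + b + c). Shared absences are ignored. The minimal sketch below computes J for two band profiles and the percentage of polymorphic bands; the profiles are toy data, not from the study. A UPGMA run would then typically cluster on the distance 1 - J. The last line simply checks the 45-of-62 figure quoted above.

```python
def jaccard(a, b):
    """Jaccard similarity between two binary band profiles (1 = band present).
    Bands absent in both profiles do not count."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0

def percent_polymorphic(profiles):
    """Percentage of band positions that vary across the profiles."""
    n_bands = len(profiles[0])
    poly = sum(1 for j in range(n_bands) if len({p[j] for p in profiles}) > 1)
    return 100.0 * poly / n_bands

s = jaccard([1, 1, 0, 1], [1, 0, 0, 1])                  # 2 shared / 3 present in either
pct = percent_polymorphic([[1, 1, 0, 1], [1, 0, 0, 1]])  # only band 2 varies
reported = round(100 * 45 / 62, 1)                       # 45 of 62 bands -> 72.6
```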

38 citations



Journal Article
TL;DR: Analysis of banding patterns confirmed that two strongly aromatic cultivars, IC1 and IC4, were closely linked, whereas another aromatic cultivar, B1, formed a separate cluster; the high-yielding cultivars were closely related to B1.
Abstract: DNA was isolated from 14 cultivars of Vigna radiata (L.) Wilczek and subjected to RAPD analysis using 14 random decamer primers. These cultivars revealed polymorphism with respect to RAPD markers and were subjected to hierarchical cluster analysis, from which a dendrogram was prepared. Analysis of banding patterns confirmed that two strongly aromatic cultivars, IC1 and IC4, were closely linked, whereas another aromatic cultivar, B1, formed a separate cluster. The high-yielding cultivars were closely related to B1. The phylogenetic tree constructed by the neighbour-joining method showed that the RAPD results were correlated with morphological characters such as plant height, leaf and seed size, and seed colour.
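Neighbour joining builds its tree by repeatedly joining the pair of taxa that minimises Q(i, j) = (n - 2)·d(i, j) - Σₖ d(i, k) - Σₖ d(j, k). The sketch below shows only the first join, together with the two new branch lengths. It uses a small made-up five-taxon distance matrix, not data from the study. A full implementation would replace the joined pair with a new node, recompute distances, and repeat until the tree is resolved.

```python
def nj_first_join(labels, d):
    """One neighbour-joining step on a symmetric distance dict-of-dicts.
    Returns (i, j, Q, branch_i, branch_j) for the pair that minimises Q."""
    n = len(labels)
    r = {i: sum(d[i][k] for k in labels if k != i) for i in labels}  # row sums
    best = None
    for a in range(n):
        for b in range(a + 1, n):
            i, j = labels[a], labels[b]
            q = (n - 2) * d[i][j] - r[i] - r[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    # distances from i and j to the new internal node joining them
    li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
    lj = d[i][j] - li
    return i, j, q, li, lj

labels = ["a", "b", "c", "d", "e"]
pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8, ("b", "c"): 10,
         ("b", "d"): 10, ("b", "e"): 9, ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
d = {x: {} for x in labels}
for (x, y), v in pairs.items():
    d[x][y] = d[y][x] = v

i, j, q, li, lj = nj_first_join(labels, d)
```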

31 citations


Journal Article
TL;DR: An algorithm for automatically detecting clusters of samples that are discernible only in a subset of genes; the partitions it discovers can be assessed for biological significance by comparison with clinical correlates and gene annotations.
Abstract: Background: Clustering is one of the most commonly used methods for discovering hidden structure in microarray gene expression data. Most current methods for clustering samples are based on distance metrics that use all genes. This can obscure clusters of samples that are evident only in a subset of genes, because noise from irrelevant genes dominates the signal from the relevant genes in the distance calculation.

31 citations


Journal Article
TL;DR: The dendrograms produced by EVA consistently outperformed those from UNITY 2D in reproducing the experimental odour classifications of these 47 molecules.
Abstract: Structure–odour relationship analyses using hierarchical clustering were carried out on a diverse dataset of 47 molecules. These molecules were divided into seven odour categories: ambergris, bitter almond, camphoraceous, rose, jasmine, muguet, and musk. The alignment-independent descriptor EVA (EigenVAlue) was used as the molecular descriptor. The results were compared with those of another kind of descriptor, the UNITY 2D fingerprint. The dendrograms obtained with these descriptors were compared with the seven odour categories using the adjusted Rand index. The dendrograms produced by EVA consistently outperformed those from UNITY 2D in reproducing the experimental odour classifications of these 47 molecules.
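The adjusted Rand index (ARI) compares two partitions of the same items by counting pairs of items that are placed together, and corrects that count for chance agreement. It equals 1 for identical partitions, up to relabelling, and is near 0 for random ones. To compare a dendrogram with the seven odour categories, the tree must first be cut into a partition; the abstract does not say how the authors chose that cut. The following is a minimal pure-Python sketch.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Hubert-Arabie adjusted Rand index between two flat labelings."""
    n = len(labels_true)
    contingency = Counter(zip(labels_true, labels_pred))
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])     # same grouping, new labels
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # worse than chance
```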

25 citations


Journal Article
TL;DR: As expected in a cross-pollinated crop, genetic diversity was high and variation was larger within than among populations; somewhat unexpectedly, the currently used varieties showed the same high level of heterozygosity as the landraces, although the two groups were separated in the dendrogram.
Abstract: The genetic interpretation and diversity of 9 isozyme loci were estimated in 7 improved varieties and 19 landraces from Sweden by means of starch gel electrophoresis. The isozyme systems were AGO, DIA, GPI, MDH, PGD and PGM. For the statistical analysis we used the following measures: average number of alleles per locus, percentage of polymorphic loci, average heterozygosity (direct count) and average heterozygosity (Hardy-Weinberg expected, unbiased estimate). The measures were calculated at the species and population levels. The distribution of the total genetic diversity among populations was also calculated. To illustrate the genetic relationships among populations, genetic distances were measured and a principal component analysis was performed. As expected in a cross-pollinated crop, we found high genetic diversity and larger variation within than among populations. Somewhat unexpectedly, however, we found that the currently used varieties have the same high level of heterozygosity as the landraces, but in the dendrogram the two groups are separated. The dendrogram showed three main clusters. The large cluster included 21 populations, and the two small clusters were clearly distinguishable from the rest. Spring-type landraces could not be separated from winter-type landraces, but we did detect a difference between different spring types. A few populations had unique alleles for certain loci.
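The two heterozygosity measures named above can be computed per locus. The "direct count" is the observed fraction of heterozygous individuals. For the "Hardy-Weinberg expected, unbiased estimate" the sketch below assumes Nei's (1978) estimator, Hₑ = 2n/(2n - 1)·(1 - Σ pᵢ²), since the abstract does not state the formula. The genotypes are hypothetical. Averaging each measure over the 9 loci would give the population-level averages.

```python
from collections import Counter

def heterozygosity(genotypes):
    """genotypes: list of (allele1, allele2) tuples for one locus in one population.
    Returns (observed H by direct count, Nei's unbiased expected H)."""
    n = len(genotypes)
    observed = sum(1 for a, b in genotypes if a != b) / n
    alleles = Counter(allele for g in genotypes for allele in g)
    total = 2 * n
    sum_p2 = sum((c / total) ** 2 for c in alleles.values())
    # small-sample correction 2n / (2n - 1) applied to 1 - sum(p_i^2)
    expected_unbiased = (2 * n / (2 * n - 1)) * (1 - sum_p2)
    return observed, expected_unbiased

ho, he = heterozygosity([("A", "A"), ("A", "B"), ("B", "B"), ("A", "B")])
```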

Journal Article
TL;DR: It appears that the host plant and its pathogen did not cospeciate, and the strict adaptation of the bacterium to olive would represent a case of association by colonization.
Abstract: A total of 360 Pseudomonas savastanoi pv. savastanoi isolates obtained from 11 Italian olive (Olea europaea) cultivars grown in different provinces were assessed with repetitive PCR using short interspersed elements of the bacterial genome as primers (ERIC, BOX and REP primer sets). The population structure of the isolates was determined by using three different hierarchical clustering algorithms: UPGMA, single-link and complete-link methods. REP primers were the most discriminatory. The various fingerprints obtained from the same cultivar and locality persisted over 2 years of knot sampling. Repetitive PCR and UPGMA analysis, using the three data sets combined, revealed 20 patterns with an overall similarity of 81%, with no grouping of the isolates. The resulting dendrogram shows a bush-like topology. Similar results were obtained with the other two clustering methods. In contrast, data obtained from the literature showed that the genetic structure of olive is characterized by bifurcated dendrograms and clear grouping of cultivars. Therefore it appears that the host plant and its pathogen did not cospeciate. The strict adaptation of the bacterium to olive would represent a case of association by colonization.

Journal Article
LI Shijun
TL;DR: A 3-phase algorithm based on overlapping partitions reduces the time and memory requirements of hierarchical clustering on large data sets.
Abstract: Hierarchical clustering is a prominent clustering algorithm. However, the time and space complexity of traditional hierarchical clustering are high, which limits its use on large data sets. This paper proposes a 3-phase algorithm based on overlapping partitions to reduce the time and memory requirements.

Book Chapter
02 Oct 2004
TL;DR: Hierarchical clustering is a clustering method in which each point initially forms its own cluster and the algorithm then repeatedly merges the two nearest clusters until only one cluster remains.
Abstract: Hierarchical clustering is a clustering method in which each point is initially regarded as a single cluster, and the algorithm then repeatedly merges the two nearest clusters until only one cluster remains. Because the result is presented as a dendrogram, one can easily see the distances and inclusion relations between clusters.
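The merge loop described above can be sketched directly. Which two clusters count as "nearest" depends on the linkage criterion: single (minimum pairwise distance), complete (maximum) or average, the last of which is UPGMA. These are the three criteria compared in the Pseudomonas study listed above. This naive version recomputes every cluster distance at each step, which is fine for illustration but far slower than production implementations. The merge heights it records are the dendrogram's branch heights.

```python
def agglomerative(dist, linkage="average"):
    """Naive agglomerative clustering on a full distance matrix (list of lists).
    Returns merges as (members_a, members_b, height) in merge order."""
    clusters = {i: [i] for i in range(len(dist))}
    merges = []

    def cluster_distance(a, b):
        ds = [dist[i][j] for i in clusters[a] for j in clusters[b]]
        if linkage == "single":
            return min(ds)
        if linkage == "complete":
            return max(ds)
        if linkage == "average":  # UPGMA
            return sum(ds) / len(ds)
        raise ValueError(f"unknown linkage: {linkage}")

    next_id = len(dist)
    while len(clusters) > 1:
        keys = sorted(clusters)
        # find the nearest pair of current clusters under the chosen linkage
        h, a, b = min((cluster_distance(a, b), a, b)
                      for i, a in enumerate(keys) for b in keys[i + 1:])
        merges.append((tuple(clusters[a]), tuple(clusters[b]), h))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges

# four points on a line at positions 0, 1, 4 and 10
D = [[0, 1, 4, 10], [1, 0, 3, 9], [4, 3, 0, 6], [10, 9, 6, 0]]
single_heights = [m[2] for m in agglomerative(D, "single")]
complete_heights = [m[2] for m in agglomerative(D, "complete")]
average_heights = [m[2] for m in agglomerative(D, "average")]
```

On the same data, the three criteria merge in the same order but at different heights, so the dendrograms differ in shape.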

01 Jan 2004
TL;DR: The dendrogram of the Chinese Polyura inferred by the maximum likelihood method showed that there are 2 distinct clusters in the Chinese Polyura, one of which includes P. narcaea, P. eudamippus and P. nepenthes, and the topological structure of the dendrogram is consistent with the morphological result.
Abstract: Phylogenetic analysis of 5 Chinese species of the genus Polyura was conducted based on mitochondrial COII sequences. The results showed 11.4% polymorphic sites in a 405 bp partial COII sequence from 12 specimens of the 5 Polyura species, and most of these differences were transitions. The difference among individuals within species (0.5%-1.5%) was almost always lower than the difference among species (4%), except that the difference between 2 individuals of P. athamas was higher than that between P. eudamippus and P. nepenthes. The dendrogram of the Chinese Polyura inferred by the maximum likelihood method showed 2 distinct clusters, one including P. narcaea, P. eudamippus and P. nepenthes, and the other including P. schreiber and P. athamas. The topological structure of the dendrogram is consistent with the morphological results. Therefore, the molecular phylogenetic analysis supports the morphological results in the genus Polyura.

Journal Article
TL;DR: It may be reasonable to suggest that rainfall and temperature were the major factors that affected genetic differentiation of Stipa grandis on a larger scale (about 240km), and that on a smaller scale (below 50km) where the variation of rainfall and temperatures was not significant, a combination of several environmental factors was responsible for the genetic differentiation.

Journal Article
TL;DR: The figure showed that the tea trees in Yunnan possess very high genetic diversity at the DNA level, and the clustering was largely consistent with morphological classification.
Abstract: The Random Amplified Polymorphic DNA (RAPD) technique was used to analyse the genetic diversity of 48 materials, including wild-type old tea trees, intermediate-type old tea trees, cultivated-type species, wild species, local varieties and the related plant Camellia fascicularis. The classification and genetic relationships among the 48 germplasm accessions were explored at the DNA level. No monomorphic band was found among the total of 112 DNA bands, so the proportion of polymorphic bands was 100%. The figure showed that the tea trees in Yunnan possess very high genetic diversity at the DNA molecular level. Genetic distances ranged from 0.116 to 0.527, with an average of 0.202. All RAPD amplified bands were clustered by the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) based on Euclidean distances. The UPGMA dendrogram showed that the 48 materials could be classified into 5 groups, comprising 3 complex groups and 2 simple groups, which was largely consistent with morphological classification.

Journal Article
01 Dec 2004
TL;DR: A review of the χ² distance, which allows non-exact hypothesis testing to determine statistical equality between pairs of operational taxonomic units using the χ² distribution with one degree of freedom; the χ² distance was found to be the best option.
Abstract: The most commonly used dissimilarity coefficients in numerical taxonomy are the various taxonomic distances. In the present paper we review the χ² distance. This distance has advantages over the average taxonomic distance and the Euclidean distance. First, it allows non-exact hypothesis testing to determine statistical equality between pairs of operational taxonomic units, based on the probability distribution of χ² with one degree of freedom. Second, it uses the significance level (α) as a similarity measure. The cutting height for a dendrogram obtained in a cluster analysis based on this matrix can therefore be determined: when choosing the desired α level for placing groups together, the researcher does not need to calculate other dendrogram partitioning tests such as the cubic clustering criterion or Hotelling's pseudo-t² statistic. To illustrate the advantages of the χ² distance compared with the Euclidean and Manhattan distances, we show an example with hawthorn genotypes (Crataegus spp.) using cluster analysis validated with canonical discriminant analysis. We found that the χ² distance was the best option.
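The abstract does not say how the χ² statistic is computed for a pair of operational taxonomic units, so the sketch below assumes that the statistic is already available. It shows only the step the abstract relies on: turning a χ² value with one degree of freedom into a significance level. For one degree of freedom, the upper-tail probability is erfc(√(x/2)). The helper `same_group` is a hypothetical illustration of how a chosen α could decide whether two units are placed together.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail probability P(X >= stat) for a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))

def same_group(stat, alpha=0.05):
    """Hypothetical rule: treat two units as statistically equal when the test
    is not significant at level alpha."""
    return chi2_1df_pvalue(stat) > alpha

p_crit = chi2_1df_pvalue(3.841458820694124)  # the 5% critical value of chi-square(1)
```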

Journal Article
TL;DR: On the basis of Euclidean distance, the obtained dendrogram supports the recognition of A. submitis subsp.


Journal Article
01 Jun 2004
TL;DR: The dendrogram obtained from cluster analysis showed high similarity among three species that some authors report as synonymous and that appeared very similar in previous phenotypic observations, suggesting that the species can be considered synonymous.
Abstract: The genus Limonium (fam. Plumbaginaceae) consists of about 300 species of mostly herbaceous perennials, some low shrubs, and annuals. Most species are endemic to the Mediterranean region, but many have their centre of origin in the Caucasus, Turkestan, the Caspian Sea region, Russia, Iran, China, and South Africa. Limonium is grown in several regions of the world as a cut flower for both fresh and dry-flower arrangements. In this work, RAPD analyses were used to study genetic relationships in Limonium. Thirteen wild species were tested with 10 primers. A total of 244 bands were scored and used for the analysis of genetic distances. The dendrogram obtained from cluster analysis showed high similarity among three species that some authors report as synonymous and that appeared very similar in our previous phenotypic observations (L. caspia, L. bellidifolium and L. otolepis). To clarify the genetic relationships, further analyses were carried out on several genotypes belonging to these species. The new dendrogram, obtained by scoring 151 RAPD bands, showed that the genotypes did not group into clear clusters. Analysis of molecular variance (AMOVA) confirmed this trend: most of the genetic variation was found among genotypes, and only 6.58% of the total variation was found among species. These results suggest that the species can be considered synonymous. RAPD markers were thus useful in clarifying the highly probable identity of the three Limonium species, in a genus that is notoriously difficult to interpret.

Journal Article
TL;DR: The genetic diversity among 89 accessions of Brassica campestris from Hubei and Zhejiang provinces in China was analyzed with random amplified polymorphic DNA markers, and the results indicated abundant polymorphism in the tested oilseed rape cultivars from the two provinces.
Abstract: The genetic diversity among 89 accessions of Brassica campestris from Hubei and Zhejiang provinces in China was analyzed with random amplified polymorphic DNA markers (RAPDs). Twenty primers selected from two hundred primers amplified 184 fragments, of which 139 were polymorphic. The polymorphic rate was 75%. Dendrograms were constructed with SPSS software. The dendrogram of agronomic traits showed that the oilseed rape from the two provinces was differentiated, whereas the RAPD dendrogram showed that the cultivars from the two provinces were related. These results indicate abundant polymorphism in the tested oilseed rape cultivars from the two provinces.

01 Jan 2004
TL;DR: In this article, RAPD analyses were used for the study of genetic relationships in Limonium, and the results indicated that the species can be considered synonymous and that the genotypes did not group in clear clusters.

Journal Article
TL;DR: A comparative study of methods for clustering long-term temporal data suggested that the complete-linkage (CL) criterion outperformed the average-linkage (AL) criterion in terms of the interpretability of the dendrogram and the clustering results.
Abstract: This paper presents a comparative study of methods for clustering long-term temporal data. We split a clustering procedure into two processes: similarity computation and grouping. As similarity computation methods, we employed dynamic time warping (DTW) and multiscale matching. As grouping methods, we employed conventional agglomerative hierarchical clustering (AHC) and rough sets-based clustering (RC). Using various combinations of these methods, we performed clustering experiments on the hepatitis data set and evaluated the validity of the results. The results suggested that (1) the complete-linkage (CL) criterion outperformed the average-linkage (AL) criterion in terms of the interpretability of the dendrogram and the clustering results, (2) the combination of DTW and CL-AHC consistently produced interpretable results, (3) the combination of DTW and RC could be used to find the core sequences of the clusters, and (4) multiscale matching may suffer from the treatment of 'no-match' pairs, although the problem may be avoided by using RC as a subsequent grouping method.
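Dynamic time warping (DTW) aligns two sequences of possibly different lengths by stretching their time axes. It then reports the cost of the cheapest alignment. The minimal sketch below uses the classic dynamic-programming recurrence with an absolute-difference local cost and no warping window. The paper's exact cost function and constraints are not given in the abstract, so these choices are assumptions. A matrix of pairwise DTW distances could then be passed to a complete-linkage agglomerative step.

```python
def dtw_distance(x, y):
    """Classic DTW: D[i][j] = cost(i, j) + min(insertion, deletion, match)."""
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

warped = dtw_distance([1, 2, 3], [1, 2, 2, 3])  # a repeated value is absorbed at no cost
shifted = dtw_distance([0, 0], [1, 1])          # each aligned pair costs 1
```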

Journal Article
TL;DR: The dendrogram showed a tendency to cluster along a north-south gradient, and the results implied that the genetic distances among five populations of Oxya chinensis correlated with geographical distance to some degree.

Journal Article
TL;DR: Genetic distances among 55 varieties showed that closely related germplasm was also geographically close, and the Shannon-Weaver and Simpson indices of B. campestris from Xinhuang county were the highest among 14 counties; the two indices gave consistent results except for Hanshou county.
Abstract: Fifty-five samples were amplified with 20 random primers, yielding 322 RAPD alleles, of which 220 were polymorphic (a polymorphic rate of 68.3%). Using the 322 marker alleles and the UPGMA method, a dendrogram of 55 B. campestris L. accessions from Hunan province was constructed. Genetic distances among the 55 varieties showed that closely related germplasm was near in geographic distribution. The genetic distance between Luxi sweat rapeseed and Xupu rapeseed was the smallest (0.0398), indicating the closest relationship. The RAPD dendrogram consisted of three groups, I, II and III, made up of 49, 2 and 4 varieties respectively. The dendrogram of agronomic traits for the 55 B. campestris L. accessions consisted of two groups, A and B, made up of 27 and 28 varieties respectively. The Shannon-Weaver and Simpson indices of B. campestris from Xinhuang county were the highest among the 14 counties, and those from Dongkou county were the lowest. The two indices gave consistent results except for Hanshou county.

Proceedings Article
14 Mar 2004
TL;DR: In this article, a new type of aggregate called frequency aggregate is defined, which has a vector data type and can be used to record not only the observed values but also the distribution of the values of an attribute.
Abstract: This paper proposes a propositional method for hierarchical model-based clustering of relational data. We define a new type of aggregate, the frequency aggregate, which has a vector data type and can record not only the observed values but also the distribution of the values of an attribute. A hierarchical agglomerative clustering algorithm with a log-likelihood distance is then applied to cluster the aggregated data tentatively. A mixture model-based method with the EM algorithm is developed to perform a further relocation clustering, in which the Bayesian Information Criterion is used to determine the optimal number of clusters.
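The abstract says the Bayesian Information Criterion (BIC) picks the number of clusters after EM fitting. The minimal sketch below assumes the common convention BIC = -2·ln L + p·ln n, where lower is better. Some packages use the opposite sign, where higher is better. The log-likelihoods and parameter counts are made-up numbers standing in for mixture models fitted elsewhere; they are not results from the paper.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """BIC under the lower-is-better convention."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def choose_k(fits, n_obs):
    """fits: {k: (log_likelihood, n_params)}; returns the k with the lowest BIC."""
    return min(fits, key=lambda k: bic(*fits[k], n_obs))

# hypothetical fits for 1, 2 and 3 mixture components on 100 observations
fits = {1: (-520.0, 2), 2: (-480.0, 5), 3: (-476.0, 8)}
best_k = choose_k(fits, 100)  # k = 3 fits slightly better but pays a larger penalty
```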

Journal Article
TL;DR: Low correlation was found between the physiologically based dendrogram and a phylogenetic tree constructed from an alignment of rDNA sequences, and the influence of genotype, physiological variability, environmental location and habitat on metabolite production is discussed.
Abstract: Sixteen isolates of Claviceps spp. were analyzed for the production of polysaccharides and oligosaccharides and for sucrose metabolism under conditions of submerged fermentation. Physiological markers calculated by the Verhulst-Pearl law were used for hierarchical cluster analysis. Low correlation was found between the physiologically based dendrogram and a phylogenetic tree constructed from an alignment of rDNA sequences. To confirm the intraspecific uniformity of physiological markers, three isolates of C. africana from different hosts and locations were included. The influence of genotype, physiological variability, environmental location and habitat on metabolite production is discussed.

Proceedings Article
12 Apr 2004
TL;DR: Using various combinations of comparison methods and grouping methods, clustering experiments on the hepatitis data set suggested that the complete-linkage criterion in agglomerative hierarchical clustering (AHC) outperformed the average-linkage (AL) criterion in terms of the interpretability of the dendrogram and the clustering results.
Abstract: This paper presents a comparative study of the characteristics of clustering methods for inhomogeneous time-series medical datasets. Using various combinations of comparison methods and grouping methods, we performed clustering experiments on the hepatitis data set and evaluated the validity of the results. The results suggested that (1) the complete-linkage (CL) criterion in agglomerative hierarchical clustering (AHC) outperformed the average-linkage (AL) criterion in terms of the interpretability of the dendrogram and the clustering results, (2) the combination of dynamic time warping (DTW) and CL-AHC consistently produced interpretable results, (3) the combination of DTW and rough clustering (RC) could be used to find the core sequences of the clusters, and (4) multiscale matching may suffer from the treatment of 'no-match' pairs, although the problem may be avoided by using RC as a subsequent grouping method.

Dissertation
01 Jan 2004
TL;DR: Assessment of diversity among 47 chilli germplasm accessions belonging to the species C. annuum revealed absence of significant geographical associations and the use of a larger number of SSRs with greater genome coverage could help to reveal genetic diversity more accurately.
Abstract: Progress in plant breeding rests on the availability of genetic diversity. Assessment of genetic variability has an impact both on crop improvement and on the efficient conservation and management of genetic resources. Information on genetic diversity can be obtained through DNA fingerprinting approaches, which are capable of analyzing a large number of loci with extensive variability. The present study was initiated to assess diversity among 47 chilli germplasm accessions belonging to the species C. annuum, collected from different sources, along with four released varieties. Twenty-five selected SSR primers were used. All primers were initially evaluated for their annealing temperature using a gradient PCR technique. A total of 59 alleles were detected, with an average of 2.5 alleles per locus. Two primers detected the maximum of five alleles, and 10 primers detected only one allele. Among the 25 primers screened, 11 exhibited a large number of null alleles, and hence only 14 informative primer pairs were selected for cluster analysis. At these 14 loci, a total of 31 alleles were scored, including 8 monomorphic loci. The pair-wise similarity based on the Dice coefficient for all 47 accessions ranged from 0.73 to 1. The cophenetic correlation coefficient (r) computed between the observed distances and the dendrogram was 0.83, which indicates a good fit between the observed distances and the dendrogram. The clustering revealed no significant geographical associations. Further, the AFLP diversity of the 47 chilli accessions was studied using seven primer combinations, which yielded a total of 5458 scorable bands at 237 loci, of which 12% were monomorphic. The average number of bands per primer combination ranged from 7.57 (ES2/MS4) to 45.2 (ES1/MS2). The pair-wise similarity coefficient among the 47 accessions ranged from 0.68 to 0.85. The cophenetic correlation coefficient (r) computed between the observed distances and the dendrogram was 0.69. The cophenetic correlation coefficient (r) between the observed distances and the PCoA plot was 0.54. Geographic associations were better indicated by the AFLPs than by the SSRs. The high polymorphism between the accessions obtained with AFLPs may reflect the highly divergent group of paprika chillies included in this study. The use of a larger number of SSRs with greater genome coverage could help to reveal genetic diversity more accurately.
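The cophenetic correlation coefficient measures how faithfully a dendrogram preserves the input distances. It is the Pearson correlation between each pair's original distance and its cophenetic distance, which is the height at which the pair first joins in the tree. The sketch below builds a UPGMA tree naively and computes r. It also includes a Dice similarity, 2a / (2a + b + c), for binary allele profiles; such similarities would be converted to distances as 1 - S before clustering. The data are toy values, and the software the study used is not stated.

```python
import math
from itertools import combinations

def dice(a, b):
    """Dice similarity between two binary allele-presence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 2 * both / total if total else 1.0

def upgma_cophenetic(dist):
    """Naive UPGMA; returns the matrix of cophenetic (merge-height) distances."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}
    coph = [[0.0] * n for _ in range(n)]
    next_id = n
    while len(clusters) > 1:
        keys = sorted(clusters)
        # average linkage: mean distance over all cross-cluster pairs
        h, a, b = min(
            (sum(dist[i][j] for i in clusters[a] for j in clusters[b])
             / (len(clusters[a]) * len(clusters[b])), a, b)
            for a, b in combinations(keys, 2))
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = h
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return coph

def cophenetic_correlation(dist):
    """Pearson r between original and cophenetic distances over all pairs i < j."""
    coph = upgma_cophenetic(dist)
    pairs = list(combinations(range(len(dist)), 2))
    x = [dist[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

ultrametric = [[0, 2, 6, 6], [2, 0, 6, 6], [6, 6, 0, 4], [6, 6, 4, 0]]
line = [[0, 1, 4, 10], [1, 0, 3, 9], [4, 3, 0, 6], [10, 9, 6, 0]]
r_perfect = cophenetic_correlation(ultrametric)  # a tree can represent these exactly
r_line = cophenetic_correlation(line)            # some distortion, so r < 1
```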