scispace - formally typeset

Showing papers on "Dendrogram" published in 2021


Journal Article
TL;DR: This research proposes a general criterion to perform the cut of a dendrogram automatically, by comparing six original criteria based on the Calinski-Harabasz index, which allows applying hierarchical clustering in an automatic mode in real-life industrial applications and allows the continuous monitoring of real 3D printers in production.
Abstract: Hierarchical clustering is one of the most suitable tools to discover the underlying true structure of a dataset in unsupervised learning, where the ground truth is unknown and classical machine learning classifiers are not suitable. In many real applications, it provides a perspective on inner data structure and is preferred to partitional methods. However, determining the resulting number of clusters in hierarchical clustering requires human expertise to deduce it from the dendrogram, and this is a major challenge in building a fully automatic system such as those required for decision support in Industry 4.0. This research proposes a general criterion to perform the cut of a dendrogram automatically, by comparing six original criteria based on the Calinski-Harabasz index. The performance of each criterion on 95 real-life dendrograms of different topologies is evaluated against the number of classes proposed by the experts, and a winning criterion is determined. This research is part of a larger project to build an Intelligent Decision Support System to assess the performance of 3D printers based on real-time sensor data, although the proposed criteria can be used in other real applications of hierarchical clustering. The methodology is applied to a real-life dataset from the 3D printers, and the large reduction in CPU time is shown by comparing the CPU time of the entire clustering method before and after this modification. It also reduces the dependence on human experts to provide the number of clusters by inspecting the dendrogram. Further, such a process allows hierarchical clustering to be applied in an automatic mode in real-life industrial applications, allows the continuous monitoring of real 3D printers in production, and helps in building an Intelligent Decision Support System to detect operational modes, anomalies, and other behavioral patterns.

16 citations
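
The following is a minimal Python sketch of the underlying idea. It assumes a plain "maximize the Calinski-Harabasz (CH) index" rule. The sketch builds an average-linkage dendrogram and scores every cut level k with CH = (B/(k-1)) / (W/(n-k)), where B and W are the between- and within-cluster sums of squares. It then keeps the k with the highest score. This is a generic baseline, not any of the six criteria compared in the paper. The function names and toy points are illustrative assumptions.

```python
import math
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return [sum(p[d] for p in pts) / len(pts) for d in range(len(pts[0]))]


def average_linkage_partitions(points):
    """Naive average-linkage clustering.

    Returns {k: clusters} for every level k of the dendrogram, where each
    cluster is a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    levels = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Mean pairwise distance between the two clusters.
            d = sum(math.dist(points[p], points[q])
                    for p in clusters[i] for q in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
        levels[len(clusters)] = [c[:] for c in clusters]
    return levels


def calinski_harabasz(points, clusters):
    """CH = (B / (k - 1)) / (W / (n - k)) for one partition."""
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = within = 0.0
    for c in clusters:
        members = [points[i] for i in c]
        cc = centroid(members)
        between += len(members) * sq_dist(cc, overall)
        within += sum(sq_dist(p, cc) for p in members)
    return math.inf if within == 0 else (between / (k - 1)) / (within / (n - k))


def best_cut(points):
    """Cut the dendrogram at the level k (2 <= k < n) with the largest CH index."""
    levels = average_linkage_partitions(points)
    scores = {k: calinski_harabasz(points, levels[k]) for k in range(2, len(points))}
    return max(scores, key=scores.get), scores


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
k, scores = best_cut(points)
print(k)  # → 3 (three well-separated groups)
```

The naive linkage loop is cubic in the number of points. That is fine for a toy example, but a library implementation would be preferable for real sensor data.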


Journal Article
TL;DR: SSR analyses indicate that Yozgat province has an important genetic diversity pool and rich genetic variance of walnuts; UPGMA and Structure clustering identified five major clusters, supporting the PCoA results.
Abstract: Meeting the food needs of a growing population, climate change, urbanization, and industrialization, along with the destruction of forests, are the main challenges of modern life. It is therefore very important to evaluate plant genetic resources in order to cope with these problems. In this study, a set of ninety-one walnut (Juglans regia L.) accessions, composed of seventy-four accessions from the Central Anatolia region, eight commercial cultivars from Turkey, and nine international reference cultivars, was analyzed using 45 SSR (Simple Sequence Repeat) markers to reveal the genetic diversity. SSR analysis identified 390 alleles across the 91 accessions. The number of alleles per locus ranged from 3 to 19, with a mean of 9 alleles per locus. Genetic dissimilarity coefficients ranged from 0.03 to 0.68. The highest number of alleles was obtained at the CUJRA212 locus (Na = 19). Polymorphism information content (PIC) values ranged from 0.42 (JRHR222528) to 0.86 (CUJRA212), with a mean PIC value of 0.68. Genetic relationships were estimated using UPGMA (Unweighted Pair Group Method with Arithmetic Mean), Principal Coordinates Analysis (PCoA), and Structure-based clustering. The UPGMA and Structure clustering of the accessions revealed five major clusters, supporting the PCoA results. The dendrogram revealed the similarities and dissimilarities among the accessions by identifying five major clusters. Based on this study, SSR analyses indicate that Yozgat province has an important genetic diversity pool and rich genetic variance of walnuts.

14 citations
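
For readers unfamiliar with UPGMA, here is a minimal Python sketch of the algorithm as it is usually defined. It repeatedly merges the closest pair of clusters. It then sets the distance from the merged cluster to every other cluster to the size-weighted mean of the old distances. The three-sample distance matrix is hypothetical, and this is not the software used in the study.

```python
def upgma(dist, labels):
    """UPGMA (average linkage) on a symmetric distance matrix.

    Internal nodes are (left, right, height) tuples; height is half the
    average distance between the two merged clusters (ultrametric tree).
    """
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Closest pair of active clusters (keys are always (smaller, larger)).
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del d[(i, j)]
        # Size-weighted mean distance from the new cluster to every other cluster.
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = ((ti, tj, dij / 2), ni + nj)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


# Hypothetical genetic distances between three accessions.
dist = [[0, 2, 6],
        [2, 0, 8],
        [6, 8, 0]]
print(upgma(dist, ["A", "B", "C"]))  # → ('C', ('A', 'B', 1.0), 3.5)
```

A and B merge first at distance 2 (height 1.0). The distance from {A, B} to C then becomes the mean (6 + 8) / 2 = 7, which places the root at height 3.5.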


Journal Article
TL;DR: The results of this study show that the proposed clustering methods can intuitively provide reasonable and consistent results for the authors' example data, making the results of clustering with interval-valued dissimilarities fully interpretable via the arrow-dendrogram.

7 citations


Journal Article
30 Mar 2021
TL;DR: In this paper, a selection of sesame landraces of different eco-geographical origin and breeding history has been characterized using 28 qualitative morpho-physiological descriptors and seven expressed sequence tag-simple sequence repeat (EST-SSR) markers coupled with high-resolution melting (HRM) analysis.
Abstract: A selection of sesame (Sesamum indicum L.) landraces of different eco-geographical origin and breeding history has been characterized using 28 qualitative morpho-physiological descriptors and seven expressed sequence tag-simple sequence repeat (EST-SSR) markers coupled with high-resolution melting (HRM) analysis. As revealed by the correlation analyses, the most variable qualitative traits that could efficiently discriminate landraces were plant growth type and position of the branches, leaf blade width, stem pubescence, flowering initiation, capsule traits, and seed coat texture. Agglomerative hierarchical clustering based on a dissimilarity matrix highlighted three main groups among the sesame landraces. The EST-SSR marker analysis revealed an average polymorphism information content (PIC) value of 0.82, indicating that the selected markers were highly polymorphic. A principal coordinate analysis and dendrogram reconstruction based on the molecular data classified the sesame genotypes into four major clades. Both the morpho-physiological and molecular analyses showed that landraces from the same geographical origin were not always grouped in the same cluster, forming heterotic groups; however, clustering patterns were observed for the Greek landraces. Selective breeding for such traits could be employed to unlock the bottleneck of local phenotypic diversity and create new cultivars with desirable traits.

7 citations


Posted Content
07 Feb 2021-bioRxiv
TL;DR: The authors propose a fast hierarchical graph-based clustering (HGC) method for single-cell data, which combines the advantages of graph-based and hierarchical clustering by building the hierarchy on the shared nearest neighbor graph of cells.
Abstract: Clustering is a key step in revealing heterogeneity in single-cell data. Cell heterogeneity can be explored at different resolutions, and the resulting cell states are inherently nested. However, most existing single-cell clustering methods output a fixed number of clusters without hierarchical information. Classical hierarchical clustering provides a dendrogram of cells but cannot scale to large datasets because of its high computational complexity. We present HGC, a fast Hierarchical Graph-based Clustering method that addresses both problems. It combines the advantages of graph-based clustering and hierarchical clustering: on the shared nearest neighbor graph of cells, HGC constructs the hierarchical tree with linear time complexity. Experiments showed that HGC enables multiresolution exploration of the biological hierarchy underlying the data, achieves state-of-the-art accuracy on benchmark data, and can scale to large datasets. HGC is freely available for academic use at https://www.github.com/XuegongLab/HGC. Contact: zhangxg@tsinghua.edu.cn, stevenhuakui@gmail.com

7 citations
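
HGC builds its hierarchy on a shared nearest neighbor (SNN) graph of cells. The Python sketch below only illustrates how such a graph can be constructed. It assumes Euclidean distances and Jaccard-weighted edges, a common convention that the abstract does not confirm. It does not reproduce HGC's linear-time tree construction, and the toy "cells" are hypothetical 2D points.

```python
import math


def knn_sets(points, k):
    """Indices of the k nearest neighbours of each point (Euclidean, self excluded)."""
    sets = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        sets.append({j for _, j in ranked[:k]})
    return sets


def snn_graph(points, k):
    """Shared-nearest-neighbour graph as {(i, j): weight}.

    Each point's neighbourhood includes itself; the edge weight is the Jaccard
    overlap of the two neighbourhoods, and pairs sharing nothing get no edge.
    """
    hoods = [s | {i} for i, s in enumerate(knn_sets(points, k))]
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(hoods[i] & hoods[j])
            if shared:
                edges[(i, j)] = shared / len(hoods[i] | hoods[j])
    return edges


# Two hypothetical groups of cells in a 2D embedding.
cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(snn_graph(cells, k=2))
```

With k = 2, each group forms a fully connected triangle of weight-1.0 edges, and no edges cross between the groups. A hierarchical method would then operate on this sparse graph instead of on all pairwise distances.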


Journal Article
TL;DR: In this paper, 10 randomly amplified polymorphic DNA (RAPD) and 6 inter-simple sequence repeat (ISSR) markers were employed to assess genetic divergence among micropropagated, wild, and field-cultivated plants of Gloriosa superba collected from different parts of India.
Abstract: Gloriosa superba L. is an endangered medicinal plant of global interest owing to the presence of colchicine, an important alkaloid used in Indian and traditional medicine formulations. The plant has become endangered because of its unscientific exploitation and high medicinal value. In the present study, 10 randomly amplified polymorphic DNA (RAPD) and 6 ISSR markers were employed to assess genetic divergence among micropropagated, wild, and field-cultivated plants of Gloriosa superba collected from different parts of India. In the RAPD analysis, the 10 RAPD primers amplified 466 fragments across all 10 accessions, with 96.43% polymorphism and an average of 46.6 bands per primer. Amplicon sizes ranged from 100 to 1656 bp. The ISSR primers produced 328 fragments, of which 298 were polymorphic, with an average of 49.7 bands per primer and 91.83% polymorphism. ISSR amplicon sizes ranged from 181 to 2395 bp. The RAPD and ISSR markers were also assessed by calculating the polymorphic information content (PIC) to discriminate the genotypes; the average PIC value for RAPD, ISSR, and combined RAPD + ISSR markers was ≤ 0.50, suggesting the informativeness of the markers. Jaccard's coefficient ranged from 0.18 to 0.75 (RAPD), from 0.17 to 0.61 (ISSR), and from 0.21 to 0.52 for the pooled ISSR and RAPD markers. In the combined UPGMA analysis, most genotypes clustered as in the ISSR dendrogram, while the RAPD-based dendrogram showed some variation in the clustering of genotypes. The PCA scatter plot agreed with the UPGMA dendrogram, further confirming the genetic relationships explained by the cluster analysis. The results confirmed that the genotypes studied have good genetic diversity and can be used for identification, conservation, and future breeding programs of Gloriosa species, and consequently for the benefit of the pharmaceutical industry.

6 citations
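
Band-based studies like this one score each primer's amplicons as present (1) or absent (0). They then compare genotypes with Jaccard's coefficient. A minimal Python sketch with hypothetical band profiles follows. Here 1 - J could serve as the distance matrix for UPGMA clustering.

```python
def jaccard(a, b):
    """Jaccard similarity of two 0/1 band profiles.

    Shared bands divided by bands present in either profile; joint absences
    are ignored, which suits dominant markers such as RAPD and ISSR.
    """
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def jaccard_matrix(profiles):
    """Pairwise Jaccard similarities between all genotypes."""
    return [[jaccard(p, q) for q in profiles] for p in profiles]


# Hypothetical presence (1) / absence (0) scores for three genotypes.
profiles = [[1, 1, 0, 1, 0],
            [1, 0, 0, 1, 1],
            [0, 1, 1, 0, 0]]
for row in jaccard_matrix(profiles):
    print([round(s, 2) for s in row])
```

The first two genotypes share 2 of the 4 bands present in either profile, so J = 0.5. The second and third genotypes share no bands, so J = 0.0.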


Journal Article
07 Jun 2021-Forests
TL;DR: This is the first investigation employing molecular markers in capirona in Peru considering its natural distribution, and it is hoped that this helps to pave the way towards its genetic improvement and the urgent sustainable management of forests in Peru.
Abstract: Capirona (Calycophyllum spruceanum Benth.) is a commercially important tree species widely distributed in South American forests and traditionally used for its medicinal properties and wood quality. Studies on this species have focused mainly on wood properties, propagation, and growth, and genetic studies on capirona have been very limited to date. Molecular markers now make it possible to explore genetic diversity and population structure quickly and reliably. Here we used 10 random amplified polymorphic DNA (RAPD) markers to analyze the genetic diversity and population structure of 59 capirona samples collected from four provinces in the eastern region of the Peruvian Amazon. A total of 186 bands were manually scored, generating a 59 × 186 presence/absence matrix. A dendrogram was generated using the UPGMA clustering algorithm and, like the principal coordinate analysis (PCoA), it showed four groups corresponding to the geographic origin of the capirona samples (LBS, Irazola, Masisea, Inapari). Similarly, discriminant analysis of principal components (DAPC) and STRUCTURE analysis confirmed that capirona is grouped into four clusters. However, we also noticed that a few samples were intermingled. Genetic diversity was estimated considering the four groups (populations) identified by the STRUCTURE software. AMOVA attributed most of the variation to differences within populations (71.56%), with 28.44% among populations. Population divergence (Fst) was highest between clusters 1 and 4 (0.269) and lowest between clusters 3 and 4 (0.123). RAPD markers were successful and effective; however, more studies employing other molecular tools are needed.
To the best of our knowledge, this is the first investigation employing molecular markers in capirona in Peru considering its natural distribution, and as such it is hoped that this helps to pave the way towards its genetic improvement and the urgent sustainable management of forests in Peru.

6 citations


Journal Article
TL;DR: A dendrogram-based method for 3D visualization of hierarchical clustering of multidimensional data, which can be collected from IoT devices and open databases, is developed; it significantly improves the quality of visualization and evaluation of cluster analysis results.

5 citations


Journal Article
31 Mar 2021
TL;DR: In this paper, a PCR-based genetic platform was established, using specifically designed oligonucleotide primer sets, to examine the hierarchical polar dendrogram of Euclidean genetic distances of a tailfin anchovy population, especially Coilia nasus.
Abstract: The author established a PCR-based genetic platform, using specifically designed oligonucleotide primer sets, to examine the hierarchical polar dendrogram of Euclidean genetic distances of a tailfin anchovy population, especially Coilia nasus, and related it to other fish populations. Five oligonucleotide primers generated a total of 260 and 211 scorable fragments in Coilia populations I and II, respectively. The DNA fragments ranged from approximately 100 to more than 2,000 bp. The average bandsharing (BS) value among individuals of anchovy population I (0.693) was higher than that of population II (0.675). The genetic distances between individuals indicated a close relationship within group II. Comparatively, individuals of one anchovy population were fairly related to other fish populations, as shown in the polar hierarchical dendrogram of Euclidean genetic distances. The notable genetic distance between the two Coilia nasus populations demonstrates that this PCR technique can be applied as one of several tools for individual- and population-level DNA research undertaken to safeguard species and support the production of anchovies in the littoral area of Korea.

5 citations


Journal Article
01 Apr 2021-Agronomy
TL;DR: Based on the results, SSRs can be applied as a trustworthy tool for evaluating genetic diversity in terebinth genotypes, and the work will promote germplasm collection and the selection of populations in future studies on terebinths for genetic mapping, genetic diversity, germplasm characterization, and rootstock breeding.
Abstract: Genetic diversity and relationships of 54 wild-grown terebinths (Pistacia terebinthus L.) were determined using 40 SSR (simple sequence repeat) markers (38 in silico polymorphic SSR markers and 2 SSR markers). The in silico polymorphic SSR analysis identified 430 alleles. The number of alleles per locus ranged from 3 to 25, with a mean of 11 alleles per locus. Polymorphism information content (PIC) values ranged from 0.34 (CUPOhBa4344) to 0.91 (CUPSiBa4072), with a mean PIC value of 0.68. Genetic relationships were estimated using UPGMA (Unweighted Pair Group Method with Arithmetic Mean), Structure, and Principal Coordinates Analysis (PCoA) based clustering. The Structure analysis and UPGMA clustering of the genotypes depicted two major clusters, and the PCoA results supported the cluster analysis. The dendrogram revealed two major clusters. Forty-two samples were obtained from the Kazankaya canyon and 12 samples from the Karanlikdere region. The two regions are 130 km apart, but the dendrogram showed no geographical isolation. The results proved the efficiency of SSRs for genetic diversity analysis in the terebinth. Based on the results, SSRs can be applied as a trustworthy tool for evaluating genetic diversity in terebinth genotypes. Molecular analysis of the terebinth genotypes in this study will promote germplasm collection and the selection of populations in future studies on terebinths for genetic mapping, genetic diversity, germplasm characterization, and rootstock breeding.

4 citations
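
The abstract reports PIC values without giving the formula. Below is a minimal Python sketch that assumes the widely used Botstein et al. (1980) definition. Some studies use the simpler 1 - Σp² instead. The allele calls are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(alleles):
    """Relative frequency of each allele observed at one locus."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def pic(freqs):
    """Botstein et al. (1980) PIC: 1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2."""
    heterozygosity = 1 - sum(p * p for p in freqs)
    correction = sum(2 * pi * pi * pj * pj for pi, pj in combinations(freqs, 2))
    return heterozygosity - correction


# Hypothetical allele calls (alleles named by size in bp) at one SSR locus.
calls = ["150", "150", "154", "154", "158", "158", "162", "162"]
print(pic(allele_frequencies(calls)))  # → 0.703125
```

With four equally frequent alleles, the sketch gives 0.703125. This is slightly below the 0.75 expected heterozygosity because of the correction term.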


Journal Article
TL;DR: The results show that lettuce landrace ‘Ljubljanska ledenka’ is not genetically uniform and is represented by a variety of genotypes, supported by the results obtained in morphological and molecular studies.
Abstract: A set of accessions of lettuce landrace ‘Ljubljanska ledenka’ (Lactuca sativa L.) was characterized by morphological and molecular markers and for resistance to Bremia lactucae, with the aim of assessing the variability of the collection, exploring the genetic structure and potentially identifying the characters responsible for differentiation of the accessions. Wide phenotypic variation was observed among 51 accessions screened for 26 morphological and phenological traits. UPGMA cluster analysis and principal component analysis based on phenotypic data enabled the studied accessions to be divided into four clusters. The most important character by which the largest two clusters were differentiated was anthocyanin coloration. The clustering pattern of the AFLP dendrogram where two major clusters were identified was similar but not identical to the pattern of the phenotypic dendrogram. The Mantel test showed a high correlation between the phenotypic and the molecular data obtained (r = 0.67). In spite of the weak genetic differentiation between the accessions, STRUCTURE analysis based on AFLP data provided a clear indication for the existence of sub-groups within the two clusters. Thirty-four accessions were screened for resistance to 12 races of B. lactucae. The results show that the accessions very frequently express various reaction patterns of race specificity. Expression of race-specificity was not uniform across the set of accessions and at least 11 different reaction patterns were recorded, indicating that different race-specific resistance factors (R-factors) or genes (Dm genes) could be expected. This conclusion is supported by the results obtained in morphological and molecular studies, showing that lettuce landrace ‘Ljubljanska ledenka’ is not genetically uniform and is represented by a variety of genotypes.
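
The Mantel test reported above (r = 0.67) compares two distance matrices over the same accessions. Below is a minimal permutation-based Python sketch. It assumes Pearson correlation of the upper triangles and a one-tailed test. Both matrices are hypothetical, and the result is not the study's value.

```python
import math
import random


def upper_triangle(m, order=None):
    """Off-diagonal upper-triangle entries, optionally after relabelling objects."""
    n = len(m)
    idx = list(range(n)) if order is None else order
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """One-tailed Mantel test: Pearson r between two distance matrices.

    The null distribution permutes the rows and columns of d2 together.
    """
    rng = random.Random(seed)
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    hits = 1  # the observed arrangement counts as one permutation
    order = list(range(len(d2)))
    for _ in range(permutations):
        rng.shuffle(order)
        if pearson(x, upper_triangle(d2, order)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical phenotypic (d1) and molecular (d2) distances for 4 accessions.
d1 = [[0, 1, 4, 5], [1, 0, 3, 4], [4, 3, 0, 2], [5, 4, 2, 0]]
d2 = [[0, 2, 7, 9], [2, 0, 6, 8], [7, 6, 0, 3], [9, 8, 3, 0]]
r, p = mantel(d1, d2)
print(round(r, 3), p)
```

With only four objects, there are just 24 distinct relabellings, so the p-value is coarse. Real data sets give a much finer null distribution.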

Journal Article
09 Mar 2021
TL;DR: This study determined the phenomic diversity of locally available, affordable, and climate-resilient cultivated and wild Crotalaria species for breeding purposes in Kenya. It forms the basis for exploring novel breeding strategies among cultivated species and between wild and cultivated Crotalaria species, as well as for identifying molecular markers linked to phenomic traits.
Abstract: Diversification of global food systems through exploration of traditional varieties and wild edible plant species is a focal mitigation strategy for food security worldwide. The present study determined the phenomic diversity of locally available, affordable, and climate-resilient cultivated and wild Crotalaria species for breeding purposes. Seed samples were collected from different administrative counties in Kenya spanning different climatic zones. Other seeds were provided by the Genetic Resources Research Institute of Kenya. A randomized complete block design with three replications was used for agro-morphological evaluation of the 83 accessions used in this study. Data on quantitative and qualitative traits were collected. Cluster analysis in R and RStudio was used to generate a dendrogram based on Euclidean genetic distance and dissimilarity indices, while non-metric multidimensional scaling (NMDS) was used to determine the spatial interrelationships between the accessions. Pearson's correlation coefficients were used to determine the relationships between qualitative and quantitative traits, while principal component analysis was used to discriminate the accessions. Three edible species (C. brevidens Benth., C. ochroleuca G.Don, C. trichotoma Bojer.) were found to be cultivated by Kenyan farmers, and a significant variation (p < 0.0001) was recorded for all parameters under study. Agglomerative hierarchical clustering grouped the accessions into 8 major clusters. The NMDS ordination formed 15 and 6 groups based on counties and regions, respectively. This study forms the basis for exploring novel breeding strategies among cultivated species and between wild and cultivated Crotalaria species, as well as for identifying molecular markers linked to phenomic traits.

Journal Article
15 Sep 2021
TL;DR: In this paper, the authors used Simple Sequence Repeat (SSR) markers developed in M. truncatula to detect polymorphism in six M. tunetana accessions.
Abstract: In order to characterize and conserve the endemic pastoral species Medicago tunetana, many prospecting missions were carried out in mountainous regions of the Tunisian ridge. Twenty-seven eco-geographical and morphological traits were studied for six M. tunetana accessions, followed by molecular analysis using seven Simple Sequence Repeat (SSR) markers. Only five markers were polymorphic and reproducible in the six M. tunetana populations. A total of 54 alleles were observed, with an average of 10.8 bands/primer/genotype. Mean polymorphism information content (PIC), Nei's gene diversity (h), and Shannon's information index (I) indicated a high level of polymorphism. The dendrogram generated by hierarchical UPGMA cluster analysis grouped the accessions into two main groups with various degrees of subclustering. All the studied accessions shared 57% genetic similarity. Analysis of variance showed highly significant differences in morphological traits among M. tunetana populations, with MT3 from Kesra showing distinct morphological patterns in leaf, pod, and seed traits. Canonical correspondence analysis (CCA) distinguished two principal groups of M. tunetana populations based on the potassium, total lime, and active lime contents of the soil. Our results suggest that SSR markers developed in M. truncatula could be a valuable tool to detect polymorphism in M. tunetana. Furthermore, the studied morphological markers showed a large genetic diversity among M. tunetana populations. This approach may be applicable to the analysis of intraspecific variability in M. tunetana accessions. Our study could help in implementing an effective and integrated conservation program for perennial endemic Medicago.
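
The diversity indices named in the abstract have standard per-locus forms. Below is a minimal Python sketch. It assumes the uncorrected form of Nei's gene diversity, h = 1 - Σp², without the n/(n-1) sample-size correction. It also uses Shannon's information index, I = -Σp ln p. The allele frequencies are hypothetical.

```python
import math


def nei_gene_diversity(freqs):
    """Nei's gene diversity at one locus, h = 1 - sum(p_i^2) (no small-sample correction)."""
    return 1 - sum(p * p for p in freqs)


def shannon_index(freqs):
    """Shannon's information index at one locus, I = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


# Hypothetical allele frequencies at one SSR locus.
freqs = [0.5, 0.3, 0.2]
print(round(nei_gene_diversity(freqs), 3), round(shannon_index(freqs), 3))
```

Both indices grow with the number of alleles and with how evenly their frequencies are spread. Multi-locus values are usually averages over loci.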

Journal Article
TL;DR: All 27 landraces had absent or very weak pubescence of the penultimate leaf blade, broad decorticated seed width, and non-waxy kernels; however, they were highly polymorphic for the other 26 characters studied, and a significant level of polymorphism was also observed at the molecular level.
Abstract: This study was carried out in 2016 and 2017 to evaluate Turkish rice landraces. Twenty-nine morphological traits were used for morphological evaluation and 10 SSR markers for molecular evaluation of 27 varieties. Based on the morphological dendrogram, the landraces were divided into 11 groups at 5 level differences. All 27 landraces had absent or very weak pubescence of the penultimate leaf blade, broad decorticated seed width, and non-waxy kernels; however, they were highly polymorphic for the other 26 characters studied. In total, 51 alleles were produced by screening with SSRs; among the markers, RM552 and RM287 were highly polymorphic for Turkish rice landraces, with 11 and 8 alleles, respectively. The average allele number was 5.1, and PIC ranged from 0.36 to 0.84. The UPGMA dendrogram generated from the SSR data grouped the 27 landraces into 2 major clusters. A significant level of polymorphism was observed at the molecular level. The study shows that some landraces with the same local name were very distant from each other, while some local varieties with different names were the same landrace.

Journal Article
TL;DR: The genetic diversity of 85 genotypes belonging to 20 species from the native flora of Hatay province, in the south-central part of Turkey, was analyzed using six microsatellite markers derived from S. officinalis to assess their cross-species amplification in different Salvia species.

Proceedings Article
Sayan Roy, Binoy Sasmal
19 May 2021
TL;DR: In this paper, the authors propose using two datasets to help analyze football, the FIFA football video game dataset and StatsBomb Open Data. They combine all clustering approaches for football player positions to visualize the relationships between player positions using Principal Component Analysis (PCA).
Abstract: Over the last century, football has been the world's most popular sport, but very little is understood about its structure. Each season brings many incidents that neither experts nor fans can reliably classify or forecast, even though the sport is continuously discussed and reported worldwide. In this paper we propose the use of two datasets to help analyze football: the FIFA football video game dataset and StatsBomb Open Data. We use the clustering capability of two unsupervised learning methods: hierarchical clustering and Expectation Maximization (EM) clustering. To visualize the relationships between player positions, we combine all clustering approaches for football player positions using Principal Component Analysis (PCA). We use 4 and 11 clusters in these visualizations, corresponding to player positions on the field. We use purity and the silhouette score to evaluate the hierarchical and EM clusterings. The results show that hierarchical clustering classifies the data better than Expectation Maximization.
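
Purity and the silhouette score, used above to compare the two clustering methods, are simple to compute. Below is a minimal Python sketch with Euclidean distances. The four points and position labels are hypothetical. The silhouette function assumes at least two clusters.

```python
import math
from collections import Counter


def purity(labels_true, labels_pred):
    """Fraction of points that belong to the majority true class of their cluster."""
    clusters = {}
    for t, c in zip(labels_true, labels_pred):
        clusters.setdefault(c, []).append(t)
    majority = sum(Counter(ts).most_common(1)[0][1] for ts in clusters.values())
    return majority / len(labels_true)


def silhouette(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


true_positions = ["DEF", "DEF", "FWD", "DEF"]
cluster_ids = [0, 0, 1, 1]
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(purity(true_positions, cluster_ids))        # → 0.75
print(round(silhouette(points, cluster_ids), 3))  # → 0.9
```

Purity needs ground-truth labels, here the player positions. The silhouette score uses only the geometry, which is why the two metrics can disagree.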

Proceedings Article
27 Jan 2021
TL;DR: In this paper, the authors show that specific higher dimensional shape information of point cloud data can be recovered by observing lower dimensional hierarchical clustering dynamics, and they compare differences between these diagrams using the bottleneck metric, and examine the resulting distribution.
Abstract: We show that specific higher dimensional shape information of point cloud data can be recovered by observing lower dimensional hierarchical clustering dynamics. We generate multiple point samples from point clouds and perform hierarchical clustering within each sample to produce dendrograms. From these dendrograms, we take cluster evolution and merging data that capture clustering behavior to construct simplified diagrams that record the lifetime of clusters akin to what zero dimensional persistence diagrams do in topological data analysis. We compare differences between these diagrams using the bottleneck metric, and examine the resulting distribution. Finally, we show that statistical features drawn from these bottleneck distance distributions detect artefacts of, and can be tapped to recover higher dimensional shape characteristics.
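
The diagrams described above are akin to zero-dimensional persistence diagrams. For reference, the standard zero-dimensional persistence of a point cloud can be read off single-linkage merging. Every point is born at scale 0, and each merge kills one cluster at the merge distance. The minimal Python sketch below computes these pairs with Kruskal's algorithm and a union-find structure. It is not the paper's sampling-based construction, and it does not implement the bottleneck distance comparison.

```python
import math


def zero_dim_persistence(points):
    """(birth, death) pairs of clusters under single-linkage merging.

    Every point is born at 0; each merge kills one cluster at the merge
    distance. The last surviving cluster never dies and is omitted.
    """
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # All pairwise edges, shortest first (Kruskal's algorithm).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(len(points)) for j in range(i + 1, len(points))
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two different clusters: one of them dies
            parent[ri] = rj
            deaths.append(d)
    return [(0.0, d) for d in deaths]


print(zero_dim_persistence([(0, 0), (1, 0), (5, 0)]))  # → [(0.0, 1.0), (0.0, 4.0)]
```

The death times are exactly the merge heights of a single-linkage dendrogram, which is why dendrograms carry zero-dimensional persistence information.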

Journal Article
23 Aug 2021-Vegetos
TL;DR: Within the narrow genetic base of the winged bean genotypes studied, four main groups were observed. EC142666 was the most divergent, followed by two diverse groups of 3 and 4 genotypes, respectively, while the remaining genotypes were very close to each other, indicating a genetic bottleneck.
Abstract: Winged bean (Psophocarpus tetragonolobus (L.) DC) is a nutritious tropical legume known for the high protein content of its seeds. Twenty winged bean genotypes were studied for genetic diversity using inter-simple sequence repeat (ISSR) markers. The ISSR analysis generated 124 bands with 11 informative primers, of which 76 bands were polymorphic and 3 were unique. PIC values of the ISSR primers ranged from 0.06 to 0.39, resolving power (Rp) values ranged from 0.9 to 4.7, and the mean number of bands per locus ranged from 9.6 to 18.2. The mean number of bands per accession ranged from 3.9 to 13.7. A narrow genetic base was revealed, with similarity values among genotypes ranging from 0.72 to 0.98; 64 genotype combinations had a similarity value above 0.90, and another 105 had a similarity index between 0.80 and 0.90. The dendrogram showed two major clusters with 4 groups. Cluster I was mono-genotypic (EC142666), while two slightly divergent groups could be clearly distinguished from a highly homogeneous core of twelve genotypes. The pattern derived from principal component analysis (PCA) largely agreed with the dendrogram's clustering pattern. Within the narrow base of the winged bean genotypes studied, four main groups were observed. EC142666 was the most divergent, followed by two diverse groups comprising 3 (EC-178268/EC-178269/EC-178311) and 4 (EC-121921/EC-178291/EC-178310/EC-251020) genotypes, respectively, while the other genotypes were very close to each other, indicating a genetic bottleneck. This underscores the need to identify divergent germplasm for breeding to broaden the genetic base.

Posted Content
TL;DR: In this article, the authors propose a loss for choosing between clustering methods, a feature importance score and a graphical tool for visualizing the segmentation of features in a dendrogram.
Abstract: We propose methods for the analysis of hierarchical clustering that fully use the multi-resolution structure provided by a dendrogram. Specifically, we propose a loss for choosing between clustering methods, a feature importance score and a graphical tool for visualizing the segmentation of features in a dendrogram. Current approaches to these tasks lead to loss of information since they require the user to generate a single partition of the instances by cutting the dendrogram at a specified level. Our proposed methods, instead, use the full structure of the dendrogram. The key insight behind the proposed methods is to view a dendrogram as a phylogeny. This analogy permits the assignment of a feature value to each internal node of a tree through ancestral state reconstruction. Real and simulated datasets provide evidence that our proposed framework has desirable outcomes. We provide an R package that implements our methods.

Posted Content
TL;DR: The authors propose a hierarchical clustering algorithm that, while building the clustering dendrogram, detects representative points by scoring the reciprocal nearest data points in each sub-minimum spanning tree.
Abstract: One of the main challenges for hierarchical clustering is how to appropriately identify the representative points in the lower level of the cluster tree, which are used as the roots in the higher level of the cluster tree for further aggregation. However, conventional hierarchical clustering approaches adopt simple tricks to select the "representative" points, which might not be representative enough. As a result, the constructed cluster tree suffers from poor robustness and weak reliability. To address this issue, we propose a novel hierarchical clustering algorithm in which, while building the clustering dendrogram, we effectively detect the representative point by scoring the reciprocal nearest data points in each sub-minimum-spanning-tree. Extensive experiments on UCI datasets show that the proposed algorithm is more accurate than other benchmarks. Moreover, our analysis shows that the proposed algorithm has O(n log n) time complexity and O(log n) space complexity, indicating that it scales to massive data with lower time and storage consumption.
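
The algorithm above scores reciprocal nearest data points. As background, here is a minimal Python sketch of how reciprocal nearest-neighbour pairs can be found with a naive O(n²) search. Such a pair is two points that are each other's nearest neighbour. The sketch does not reproduce the paper's scoring rule, its sub-minimum-spanning-tree structure, or the O(n log n) bound. The points are hypothetical.

```python
import math


def nearest_neighbour(points, i):
    """Index of the point closest to points[i] (Euclidean; ties go to the lower index)."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))


def reciprocal_nearest_pairs(points):
    """Pairs (i, j), i < j, in which each point is the other's nearest neighbour."""
    nn = [nearest_neighbour(points, i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i]


pts = [(0, 0), (1, 0), (5, 0), (5.5, 0), (20, 0)]
print(reciprocal_nearest_pairs(pts))  # → [(0, 1), (2, 3)]
```

The isolated point (20, 0) points to (5.5, 0), but that point prefers (5, 0). So the relationship is not reciprocal and the isolated point is left out.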

Journal Article
TL;DR: This study identified highly diverse markers that can be used to distinguish genetically homogeneous and heterogeneous landraces, and suggests that PV-ctt001, PV-ag001, and PV-at003 could be beneficial in future breeding, conservation, and marker-assisted selection studies.
Abstract: Morpho-agronomic and genetic studies recorded variation among South African Phaseolus vulgaris landraces in vegetative and reproductive traits, and in molecular information through population structure and clustering approaches. Phaseolus vulgaris L., commonly known as common bean, is widely used for its edible leaves, immature pods, and dry seeds. Studies on morphological and genetic variation among P. vulgaris landraces are limited in South Africa. The current study therefore aimed to determine the morpho-agronomic and genetic variation among P. vulgaris landraces. Thirty-eight landraces from different agro-ecological origins were planted in a randomized complete block design, and their variation in vegetative and reproductive traits was determined. The landraces were also studied for genetic diversity using simple sequence repeat (SSR) markers. In a biplot and dendrogram, the landraces clustered according to seed coat, seed shape, similar morpho-agronomic traits, and area of origin. A total of 57 alleles were produced, with a mean of 3.64 per SSR locus. The polymorphism information content ranged from 0.00 to 0.58. The population structure analysis gave the highest delta K at K = 2, so the 38 landraces were divided into two subpopulations using the Bayesian approach. The population structure showed overlap among the landraces, as several from the Mesoamerican gene pool carried some seed traits or genes from the Andean gene pool, and showed a high level of admixture. The principal coordinate analysis and the dendrogram showed a clustering pattern similar to that of the population structure. This study identified potential markers with high diversity that can be used to distinguish genetically homogeneous and heterogeneous landraces. The use of PV-ctt001, PV-ag001, and PV-at003 could therefore be beneficial in future breeding, conservation, and marker-assisted selection studies.