Journal Article

Machine Learning Applications and Optimization of Clustering Methods Improve the Selection of Descriptors in Blackberry Germplasm Banks

TL;DR: In this paper, random forest (RF) achieved the highest accuracy (0.768) of the methods evaluated, selecting 11 descriptors based on purity (Gini index), importance, number of connected trees and significance (p value < 0.05).
Abstract: Machine learning (ML) and its many applications offer comparative advantages for improving the interpretation of knowledge about agricultural processes. However, challenges still impede its proper use, as can be seen in the phenotypic characterization of germplasm banks. The objective of this research was to test and optimize different ML-based analysis methods for prioritizing and selecting morphological descriptors of Rubus spp. Fifty-five descriptors were evaluated in 26 genotypes, and the weight and discriminating capacity of each one were determined. ML methods, namely random forest (RF), support vector machines in their linear and radial forms, and neural networks, were optimized and compared. The results were then validated with two clustering methods and their variants: hierarchical agglomerative clustering and K-means. RF presented the highest accuracy (0.768) of the methods evaluated, selecting 11 descriptors based on purity (Gini index), importance, number of connected trees and significance (p value < 0.05). Additionally, the K-means method applied to the RF-optimized descriptors had greater discriminating power among Rubus spp. accessions according to the evaluated statistics. This study presents one application of ML to optimize the specific morphological variables used to characterize plant germplasm banks.
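The purity criterion mentioned above can be illustrated with a short sketch. The Python below is written for illustration and is not taken from the authors' analysis: it computes the Gini impurity of a set of accession labels and the largest impurity decrease a single descriptor can achieve with one threshold split. Random forest's Gini importance (mean decrease in impurity) sums such decreases, weighted by the samples reaching each node, over every split on a descriptor across all trees; scoring a descriptor by one best split, and the descriptor names and values used here, are simplifying assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(values, labels, threshold):
    """Weighted drop in Gini impurity when splitting at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - children


def best_split_gain(values, labels):
    """Largest impurity decrease one descriptor achieves with a single threshold."""
    # Splitting at the maximum value would leave the right side empty, so skip it.
    thresholds = sorted(set(values))[:-1]
    return max((impurity_decrease(values, labels, t) for t in thresholds), default=0.0)


if __name__ == "__main__":
    # Hypothetical accession labels and two hypothetical descriptors.
    labels = ["A", "A", "B", "B"]
    descriptors = {"descriptor_1": [1, 2, 3, 4], "descriptor_2": [1, 3, 2, 4]}
    ranking = sorted(descriptors, key=lambda d: best_split_gain(descriptors[d], labels), reverse=True)
    for name in ranking:
        print(name, round(best_split_gain(descriptors[name], labels), 3))
```

A descriptor that separates the two groups cleanly removes all impurity, while a noisier one removes only part of it, so ranking by this score favors the more discriminating descriptor.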
Citations
Journal Article
TL;DR: The objective of this research was to evaluate the use of multispectral cameras on UAVs to discriminate vascular wilt caused by Verticillium spp.
Abstract: The rapid and precise detection of diseases and plant disorders is the basis for the adequate and timely design of management strategies. Currently, several non-destructive alternatives allow early detection, notably spectral cameras attached to unmanned aerial vehicles (UAVs). The objective of this research was to evaluate the use of multispectral cameras on UAVs to discriminate vascular wilt caused by Verticillium spp. (VW), waterlogging stress (WL) and an unknown alteration (UA) in commercial crops of the potato (Solanum tuberosum) variety “Diacol Capiro”. Plots were monitored during the crop cycle, and the diseases and disorders present were characterized visually. Five spectral band images were acquired with a MicaSense RedEdge spectral camera attached to a Map-T680 hexacopter drone; the bands were extracted, and vegetation indices were calculated, calibrated and evaluated for their ability to discriminate between diseased and healthy plants using a generalized linear model (GLM) and the kappa index. Additionally, the supervised random forest classification method was implemented, optimized and evaluated using accuracy, the area under the receiver operating characteristic curve (ROC-AUC), the kappa index and the inference error based on k-fold cross-validation. After algorithm optimization, the classifier's accuracy, kappa and ROC-AUC values for VW, WL and UA were 73.5–82.5%, 0.56–0.71 and 0.97–0.98 for plot 1, and 37.5–51.9%, 0.07–0.06 and 0.88–0.94 for plot 2. This study reports an approach that uses multispectral cameras attached to UAVs as a tool with potential for detecting diseases and physiological disorders in commercial potato crops.
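The accuracy and kappa values reported above are both derived from a confusion matrix. As a minimal sketch (the matrix counts are hypothetical, the k-fold cross-validation loop is omitted, and this is not the authors' code), Cohen's kappa rescales the observed agreement by the agreement expected by chance from the row and column totals:

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    (overall accuracy) and p_e is the agreement expected by chance.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return p_o, (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical counts: rows = observed class (diseased, healthy), columns = predicted class.
    matrix = [[20, 5], [10, 15]]
    acc, kappa = accuracy_and_kappa(matrix)
    print(f"accuracy={acc:.2f} kappa={kappa:.2f}")  # accuracy=0.70 kappa=0.40
```

The example shows why kappa can be much lower than accuracy: with balanced totals, half of the agreement would be expected by chance alone.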

12 citations

Journal Article
27 Feb 2023, Foods
TL;DR: In this paper, the physicochemical and antioxidant properties of honey produced by two bee species (Melipona eburnea and Apis mellifera) in two seasons were evaluated.
Abstract: Honey is a functional food used worldwide and recognized for its multiple health benefits. In the present study, the physicochemical and antioxidant properties of honey produced by two bee species (Melipona eburnea and Apis mellifera) in two seasons were evaluated. In addition, the antimicrobial activity of the honey against three bacterial strains was studied. Linear discriminant analysis (LDA) of honey quality yielded four clusters from a multivariate discriminant function, determined by the bee species, the collection season and their interaction. The physicochemical properties of A. mellifera honey met the requirements of the Codex Alimentarius, whereas M. eburnea honey had moisture values outside the ranges established by the Codex. Antioxidant activity was higher in A. mellifera honey, and both honeys showed inhibitory activity against S. typhimurium ATCC 14028 and L. monocytogenes ATCC 9118. E. coli ATCC 25922 was resistant to the analyzed honeys.

1 citation

Book Chapter
01 Jan 2023
TL;DR: Mora de Castilla is an important breeding material because of its large fruit size, low chilling requirements, excellent fruit quality, aroma, small drupelets and seeds, everbearing production habit and resistance to root diseases, as discussed by the authors.
Abstract: The germplasm of Rubus is highly diverse and has important features for crop improvement, such as resistance to diseases and pests. Hybrids of Mora de Castilla have been observed in the wild. Rubus glaucus Benth. is an allopolyploid that arose from hybridization between a black raspberry and a South American blackberry. Mora de Castilla is an important breeding material because of its large fruit size, low chilling requirements, excellent fruit quality, aroma, small drupelets and seeds, everbearing production habit and resistance to root diseases. In particular, Mora de Castilla has great potential as a parent for improving the size and quality of cultivated Rubus species. Mora de Castilla cultivars have been selected for breeding and improvement in Colombia and Ecuador, where breeding programs have identified and selected high-yielding plant materials. Wild-type Mora de Castilla has been crossed with commercial cultivars to improve their quality traits. Germplasm repositories are important for maintaining cultivar diversity.
Journal Article
TL;DR: In this article, accessions of P. vulgaris, P. acutifolius and P. parvifolius, together with a line derived from their crosses, were sown in a mesh house according to CIAT seed regeneration procedures.
Abstract:
Introduction: Evaluations of interspecific hybrids are limited because classical genebank accession descriptors are semi-subjective, rely on qualitative traits and are difficult to apply to intermediate accessions. However, descriptors can be quantified using recognized phenomic traits. This digitalization can identify phenomic traits that correspond to the percentage of parental descriptors that remain expressed, visible or measurable in a particular interspecific hybrid. In this study, accessions of P. vulgaris, P. acutifolius and P. parvifolius, together with a line derived from their crosses, were sown in a mesh house according to CIAT seed regeneration procedures.
Methodology: Three accessions and one breeding line derived from their interspecific crosses were characterized and classified by selected phenomic descriptors using multivariate and machine learning techniques. The phenomic proportions of the interspecific hybrid (line INB 47) with respect to its three parent accessions were determined using a random forest and the corresponding confusion matrix.
Results: Seed and pod morphometric traits, physiological behavior and yield performance were evaluated. In the classification of the accessions, the phenomic descriptors with the highest predictive power were Fm’, Fo’, Fs’, LTD, Chl, seed area, seed height, seed Major, seed MinFeret, seed Minor, pod AR, pod Feret, pod round, pod solidity, pod area, pod major, pod seed weight and pod weight. Physiological traits measured in the interspecific hybrid showed 2.2% similarity with the P. acutifolius accession and 1% with the P. parvifolius accession. In seed morphometric characteristics, the hybrid showed 4.5% similarity with the P. acutifolius accession.
Conclusions: Here we were able to determine the phenomic proportions of individual parents in their interspecific hybrid accession. With careful generalization, the methodology can be used to: i) verify the transfer of traits of interest from P. acutifolius and P. parvifolius accessions into their hybrids; ii) confirm selected traits as “phenomic markers” that would allow desired physiological traits of exotic parental accessions to be conserved without losing key seed characteristics of elite common bean accessions; and iii) provide a quantitative tool that helps genebank curators and breeders make better-informed decisions.
Journal Article
TL;DR: In this paper, the authors determine the spatial distribution of Aeneolamia varia in commercial sugarcane fields and validate machine learning tools for indirect injury detection and for estimating the impact on yield (damage) using satellite images.
Abstract: The spittlebug (Aeneolamia varia) is one of the most important sugarcane pests in Colombia, where a recent increase in its population and distribution, especially in southwestern Colombia, has created a need for new integrated pest management technologies. The objectives of this study were to determine the spatial distribution of this pest in commercial sugarcane fields and to validate machine learning (ML) tools for indirect injury detection and for estimating the impact on yield (damage) using satellite images. The study was carried out in fields planted with the CC 01-1940 variety in El Cerrito, Valle del Cauca, Colombia, where the populations were sampled systematically (number of adults and nymphs per stem). Spatial aggregation and distribution were determined using Moran’s index and point patterns, sequence observations, and spatial analysis by distance indices (SADIE). Indirect injury detection and quantification of the impact on production were carried out with an ML approach using Sentinel-2 satellite image products with 10 m spatial and five-day temporal resolution, obtained through Google Earth Engine. The results indicated that spittlebug populations had an aggregated spatial behavior and high spatial dependence. In addition, the ML algorithms predicted spittlebug injury, and the effect on production was estimated at 26.4 tons of cane per hectare, a 17% reduction in the expected yield. Spatial analysis and remote sensing tools are an alternative for the indirect detection of injury and for understanding the population dynamics of the pest in sugarcane, so they can become instrumental for decision-making in an integrated pest management program.
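Moran’s index, used above to test spatial aggregation, can be sketched as follows. This assumes a binary distance-band weight matrix (sampling points within max_dist of each other are neighbors), which is only one of several possible weighting schemes and not necessarily the one used in the study; the coordinates and counts are hypothetical, and no permutation test for significance is included. Values well above the null expectation of -1/(n-1) indicate aggregation, and values below it indicate dispersion.

```python
import math


def distance_band_weights(coords, max_dist):
    """Binary spatial weights: w_ij = 1 if points i and j are within max_dist, else 0."""
    n = len(coords)
    return [
        [1.0 if i != j and math.dist(coords[i], coords[j]) <= max_dist else 0.0 for j in range(n)]
        for i in range(n)
    ]


def morans_i(values, weights):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    numerator = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    denominator = sum(zi * zi for zi in z)
    return (n / s0) * numerator / denominator


if __name__ == "__main__":
    # Hypothetical: four sampling points 10 m apart along a transect.
    coords = [(0, 0), (10, 0), (20, 0), (30, 0)]
    w = distance_band_weights(coords, max_dist=10)
    print(morans_i([1, 1, 5, 5], w))  # aggregated counts -> positive I
    print(morans_i([1, 5, 1, 5], w))  # alternating counts -> negative I
    print(-1 / (len(coords) - 1))     # expected value under spatial randomness
```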
References
Journal Article
TL;DR: Compared with overall accuracy, AUC exhibits several desirable properties: increased sensitivity in Analysis of Variance (ANOVA) tests; a standard error that decreases as both AUC and the number of test samples increase; independence from the decision threshold; and invariance to a priori class probabilities.
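The threshold independence noted above follows from AUC being a rank statistic: it equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (the Mann-Whitney formulation). The sketch below, with hypothetical scores, contrasts it with overall accuracy, which changes with the chosen cutoff.

```python
def roc_auc(scores, labels):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    labels are 1 (positive) or 0 (negative); tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy_at(scores, labels, threshold):
    """Overall accuracy when scores >= threshold are predicted positive."""
    correct = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Hypothetical classifier scores for four samples.
    scores = [0.9, 0.8, 0.4, 0.3]
    labels = [1, 0, 1, 0]
    print(roc_auc(scores, labels))                    # 0.75
    print(roc_auc([s ** 2 for s in scores], labels))  # 0.75: unchanged by a monotonic rescaling
    print(accuracy_at(scores, labels, 0.5))           # 0.5
    print(accuracy_at(scores, labels, 0.35))          # 0.75
```

The pairwise loop is quadratic in the sample size; it is kept here because it maps directly onto the probabilistic definition rather than for efficiency.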

5,359 citations

Journal Article
17 Jul 2015, Science
TL;DR: The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing.
Abstract: Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today’s most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing.

4,545 citations

Journal Article
TL;DR: In this paper, the authors proposed a method called the "gap statistic" for estimating the number of clusters (groups) in a set of data, which uses the output of any clustering algorithm (e.g. K-means or hierarchical), comparing the change in within-cluster dispersion with that expected under an appropriate reference null distribution.
Abstract: We propose a method (the ‘gap statistic’) for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g. K-means or hierarchical), comparing the change in within-cluster dispersion with that expected under an appropriate reference null distribution. Some theory is developed for the proposal and a simulation study shows that the gap statistic usually outperforms other methods that have been proposed in the literature.
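The following is a minimal, unoptimized sketch of the gap statistic under stated assumptions. K-means (Lloyd's algorithm with k-means++ seeding and a few restarts) supplies W_k, the within-cluster sum of squared distances to the centroids. Reference data are drawn uniformly over the bounding box of the observations, which is the simpler of the two reference distributions discussed for the gap statistic. With B reference sets, Gap(k) = mean_b log(W*_kb) - log(W_k), s_k = sd_k * sqrt(1 + 1/B), and the chosen k is the smallest one with Gap(k) >= Gap(k+1) - s_(k+1). Function names, defaults and the synthetic data are illustrative.

```python
import math
import random


def _kmeanspp_init(points, k, rng):
    """k-means++ seeding: each new center is drawn with probability proportional to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0:
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans_wk(points, k, rng, n_init=5, max_iter=100):
    """Lowest within-cluster sum of squares W_k over n_init Lloyd runs."""
    best = math.inf
    for _ in range(n_init):
        centers = _kmeanspp_init(points, k, rng)
        for _ in range(max_iter):
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
                clusters[nearest].append(p)
            # Move each center to its cluster mean; keep the old center if a cluster is empty.
            new_centers = [
                tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centers[j]
                for j, cl in enumerate(clusters)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        wk = sum(math.dist(p, centers[j]) ** 2 for j, cl in enumerate(clusters) for p in cl)
        best = min(best, wk)
    return best


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (chosen k, [Gap(1), ..., Gap(k_max)]).

    Assumes k_max is smaller than the number of distinct points, so every W_k > 0.
    """
    rng = random.Random(seed)
    lows = [min(axis) for axis in zip(*points)]
    highs = [max(axis) for axis in zip(*points)]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_wk = math.log(kmeans_wk(points, k, rng))
        ref_logs = []
        for _ in range(n_refs):
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]
            ref_logs.append(math.log(kmeans_wk(ref, k, rng)))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((r - mean_ref) ** 2 for r in ref_logs) / n_refs)
        gaps.append(mean_ref - log_wk)
        s.append(sd * math.sqrt(1 + 1 / n_refs))
    for k in range(1, k_max):
        # gaps[k - 1] is Gap(k); gaps[k] and s[k] belong to k + 1.
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps


if __name__ == "__main__":
    # Hypothetical data: three tight, well-separated 2-D blobs of 10 points each.
    rng = random.Random(1)
    blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    points = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)) for cx, cy in blobs for _ in range(10)]
    k, gaps = gap_statistic(points, k_max=4)
    print("chosen k:", k)
    print("Gap(k):", [round(g, 2) for g in gaps])
```

On data like the synthetic blobs, Gap(k) should jump sharply at k = 3. How the stopping rule behaves beyond that depends on the spread of the reference samples, so it is worth inspecting the whole list of gaps rather than only the returned k.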

4,283 citations

Journal Article
TL;DR: The survey work and case studies will be useful for all those involved in developing software for data analysis using Ward’s hierarchical clustering method.
Abstract: The Ward error sum of squares hierarchical clustering method has been very widely used since its first description by Ward in a 1963 publication. It has also been generalized in various ways. Two algorithms are found in the literature and software, both announcing that they implement the Ward clustering method. When applied to the same distance matrix, they produce different results. One algorithm preserves Ward's criterion, the other does not. Our survey work and case studies will be useful for all those involved in developing software for data analysis using Ward's hierarchical clustering method.
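To illustrate the distinction drawn above, the sketch below implements agglomerative clustering with the Lance-Williams recurrence for Ward's method. When the input dissimilarities are squared Euclidean distances, each merge height is twice the increase in the error sum of squares, so half the sum of all merge heights equals the total error sum of squares about the overall centroid. When the same recurrence is fed unsquared distances, that identity breaks. This is consistent with, though not a reproduction of, the two algorithms compared in the survey; the toy 1-D data and function names are hypothetical.

```python
def _key(x, y):
    """Dictionary key for an unordered pair of cluster ids."""
    return (x, y) if x < y else (y, x)


def ward_merge_heights(dist):
    """Agglomerative clustering with the Lance-Williams update for Ward's method.

    dist is a symmetric matrix of input dissimilarities. Returns the merges in order
    as (members, height) pairs, where members is a frozenset of point indices.
    """
    n = len(dist)
    clusters = {i: frozenset([i]) for i in range(n)}
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    merges = []
    while len(clusters) > 1:
        (a, b), h = min(d.items(), key=lambda item: item[1])
        na, nb = len(clusters[a]), len(clusters[b])
        new = next_id
        next_id += 1
        # Lance-Williams recurrence with Ward coefficients.
        updated = {}
        for c, members in clusters.items():
            if c in (a, b):
                continue
            nc = len(members)
            updated[(c, new)] = (
                (na + nc) * d[_key(a, c)] + (nb + nc) * d[_key(b, c)] - nc * h
            ) / (na + nb + nc)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(updated)
        clusters[new] = clusters.pop(a) | clusters.pop(b)
        merges.append((clusters[new], h))
    return merges


def error_sum_of_squares(points):
    """Sum of squared deviations of the points from their centroid."""
    dim = len(points[0])
    centroid = [sum(p[t] for p in points) / len(points) for t in range(dim)]
    return sum((p[t] - centroid[t]) ** 2 for p in points for t in range(dim))


if __name__ == "__main__":
    points = [(0.0,), (1.0,), (3.0,), (7.0,)]  # hypothetical 1-D data
    squared = [[sum((x - y) ** 2 for x, y in zip(p, q)) for q in points] for p in points]
    plain = [[abs(p[0] - q[0]) for q in points] for p in points]
    print(error_sum_of_squares(points))
    print(sum(h for _, h in ward_merge_heights(squared)) / 2)
    print(sum(h for _, h in ward_merge_heights(plain)) / 2)
```

For the points 0, 1, 3 and 7, the error sum of squares is 28.75; the squared-distance run reproduces it (up to floating-point rounding), while the unsquared run gives 5.75.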

2,331 citations

Journal Article
Mahesh Pal
TL;DR: It is suggested that the random forest classifier performs as well as SVMs in terms of classification accuracy and training time, and that random forest classifiers require fewer user-defined parameters than SVMs, parameters that are also easier to define.
Abstract: Growing an ensemble of decision trees and allowing them to vote for the most popular class produced a significant increase in classification accuracy for land cover classification. The objective of this study is to present results obtained with the random forest classifier and to compare its performance with that of support vector machines (SVMs) in terms of classification accuracy, training time and user-defined parameters. Landsat Enhanced Thematic Mapper Plus (ETM+) data of an area in the UK with seven different land covers were used. Results from this study suggest that the random forest classifier performs as well as SVMs in terms of classification accuracy and training time. This study also concludes that random forest classifiers require fewer user-defined parameters than SVMs and that these parameters are easier to define.

2,255 citations