scispace - formally typeset

Showing papers on "Selection (genetic algorithm)" published in 2013


Journal Article
TL;DR: In this paper, the authors investigate mathematically and empirically which of the existing threshold selection methods can be used confidently with presence-only data, and show that Max SSS is a promising method for threshold selection when only presence data are available.
Abstract: Aim: Species distribution models have been widely used to tackle ecological, evolutionary and conservation problems. Most species distribution modelling techniques produce continuous suitability predictions, but many real applications (e.g. reserve design, species invasion and climate change impact assessment) and model evaluations require binary outputs, and thresholds are needed for these transformations. Although there are many threshold selection methods for presence/absence data, it is unclear whether these are suitable for presence-only data. In this paper, we investigate mathematically and empirically which of the existing threshold selection methods can be used confidently with presence-only data.
Location: We used real spatially explicit environmental data derived from the western part of the state of Victoria, south-eastern Australia, and simulated species distributions within this area.
Methods: Thirteen existing threshold selection methods were investigated mathematically to see whether the same threshold can be produced using either presence/absence data or presence-only data. We further adopted a simulation approach, created many virtual species with differing prevalences in a real landscape in south-eastern Australia, generated data sets with different proportions of pseudo-absences, built eight types of models with four modelling techniques, and investigated the behaviours of four threshold selection methods in these situations.
Results: Three threshold selection methods were not affected by pseudo-absences: max SSS (which is based on maximizing the sum of sensitivity and specificity), the prevalence of model training data, and the mean predicted value of a set of random points. Max SSS produced higher sensitivity in most cases and higher true skill statistic and kappa in many cases than the other methods. The other methods produced thresholds from presence-only data that differed from those determined from presence/absence data.
Main conclusions: Max SSS is a promising method for threshold selection when only presence data are available.

947 citations
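The Max SSS rule above chooses the cut-off that maximizes sensitivity plus specificity. The Python sketch below is a minimal illustration, not the authors' code: it assumes predictions and labels are plain lists, that label 0 marks an absence (or, with presence-only data, a pseudo-absence or background point), and that a site counts as predicted present when its score is at or above the threshold. The function name `max_sss_threshold` is hypothetical, and the candidate thresholds are simply the observed scores.

```python
def max_sss_threshold(scores, labels):
    """Return (threshold, sensitivity + specificity) maximizing the sum.

    scores: predicted suitability values
    labels: 1 = presence, 0 = absence or pseudo-absence/background
    A site is predicted present when its score >= threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both presence and (pseudo-)absence points")
    best_t, best_sss = None, float("-inf")
    for t in sorted(set(scores)):          # candidate thresholds: observed scores
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sss = tp / n_pos + tn / n_neg      # sensitivity + specificity
        if sss > best_sss:                 # strict '>' keeps the lowest tied threshold
            best_t, best_sss = t, sss
    return best_t, best_sss


threshold, sss = max_sss_threshold([0.9, 0.8, 0.7, 0.4, 0.3, 0.2],
                                   [1, 1, 0, 1, 0, 0])  # threshold == 0.4
```

Ties are broken in favour of the lowest threshold; that is a choice made for this sketch, not a rule from the paper.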


Journal Article
TL;DR: This work presents an approximate hierarchical Bayesian method using a Markov chain Monte Carlo (MCMC) routine that ensures robustness against model misspecification by averaging over a large number of predefined site classes, and leaves the distribution of selection parameters essentially unconstrained.
Abstract: Model-based analyses of natural selection often categorize sites into a relatively small number of site classes. Forcing each site to belong to one of these classes places unrealistic constraints on the distribution of selection parameters, which can result in misleading inference due to model misspecification. We present an approximate hierarchical Bayesian method using a Markov chain Monte Carlo (MCMC) routine that ensures robustness against model misspecification by averaging over a large number of predefined site classes. This leaves the distribution of selection parameters essentially unconstrained, and also allows sites experiencing positive and purifying selection to be identified orders of magnitude faster than by existing methods. We demonstrate that popular random effects likelihood methods can produce misleading results when sites assigned to the same site class experience different levels of positive or purifying selection—an unavoidable scenario when using a small number of site classes. Our Fast Unconstrained Bayesian AppRoximation (FUBAR) is unaffected by this problem, while achieving higher power than existing unconstrained (fixed effects likelihood) methods. The speed advantage of FUBAR allows us to analyze larger data sets than other methods: We illustrate this on a large influenza hemagglutinin data set (3,142 sequences). FUBAR is available as a batch file within the latest HyPhy distribution (http://www.hyphy.org), as well as on the Datamonkey web server (http://www.datamonkey.org/).

939 citations


Journal Article
TL;DR: It is shown that selection likely acts on distinct targets or multiple functionally equivalent alleles in different portions of the geographic range of wheat, suggesting either weak selection pressure or temporal variation in the targets of directional selection during breeding probably associated with changing agricultural practices or environmental conditions.
Abstract: Domesticated crops experience strong human-mediated selection aimed at developing high-yielding varieties adapted to local conditions. To detect regions of the wheat genome subject to selection during improvement, we developed a high-throughput array to interrogate 9,000 gene-associated single-nucleotide polymorphisms (SNP) in a worldwide sample of 2,994 accessions of hexaploid wheat including landraces and modern cultivars. Using a SNP-based diversity map we characterized the impact of crop improvement on genomic and geographic patterns of genetic diversity. We found evidence of a small population bottleneck and extensive use of ancestral variation often traceable to founders of cultivars from diverse geographic regions. Analyzing genetic differentiation among populations and the extent of haplotype sharing, we identified allelic variants subjected to selection during improvement. Selective sweeps were found around genes involved in the regulation of flowering time and phenology. An introgression of a wild relative-derived gene conferring resistance to a fungal pathogen was detected by haplotype-based analysis. Comparing selective sweeps identified in different populations, we show that selection likely acts on distinct targets or multiple functionally equivalent alleles in different portions of the geographic range of wheat. The majority of the selected alleles were present at low frequency in local populations, suggesting either weak selection pressure or temporal variation in the targets of directional selection during breeding probably associated with changing agricultural practices or environmental conditions. The developed SNP chip and map of genetic variation provide a resource for advancing wheat breeding and supporting future population genomic and genome-wide association studies in wheat.

871 citations


Journal Article
TL;DR: This paper provides a systematic literature review of articles published from 2008 to 2012 on the application of DM techniques for supplier selection, using a methodological decision analysis in four aspects: decision problems, decision makers, decision environments, and decision approaches.
Abstract: Despite the importance of decision-making (DM) techniques for construction of effective decision models for supplier selection, there is no systematic literature review of their use. This paper provides a systematic literature review on articles published from 2008 to 2012 on the application of DM techniques for supplier selection. By using a methodological decision analysis in four aspects including decision problems, decision makers, decision environments, and decision approaches, we finally selected and reviewed 123 journal articles. To examine the research trend on uncertain supplier selection, these articles are roughly classified into seven categories according to different uncertainties. Under such classification framework, 26 DM techniques are identified from three perspectives: (1) Multicriteria decision making (MCDM) techniques, (2) Mathematical programming (MP) techniques, and (3) Artificial intelligence (AI) techniques. We reviewed each of the 26 techniques and analyzed the means of integrating these techniques for supplier selection. Our survey provides recommendations for future research and facilitates knowledge accumulation and creation concerning the application of DM techniques in supplier selection.

825 citations
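To make the idea of a decision model for supplier selection concrete, here is a minimal Python sketch of simple additive weighting (SAW), one of the most basic multicriteria decision making (MCDM) techniques. It is an illustration only: the review does not prescribe this particular method, and the helper name `rank_suppliers` as well as the suppliers, criteria, and weights below are hypothetical. Each criterion is min-max normalized, cost-type criteria are flipped so that larger is always better, and the weighted sum ranks the alternatives.

```python
def rank_suppliers(scores, weights, benefit):
    """Rank alternatives by simple additive weighting (SAW).

    scores:  dict mapping supplier name -> list of raw criterion values
    weights: criterion weights (assumed to sum to 1)
    benefit: per criterion, True if larger is better, False if smaller is better
    """
    names = list(scores)
    totals = {name: 0.0 for name in names}
    for j, w in enumerate(weights):
        column = [scores[name][j] for name in names]
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1.0  # all values equal: criterion cannot discriminate
        for name in names:
            v = scores[name][j]
            # Min-max normalize to [0, 1]; flip cost-type criteria.
            norm = (v - lo) / span if benefit[j] else (hi - v) / span
            totals[name] += w * norm
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: unit cost, quality rate, delivery time in days.
suppliers = {"A": [100, 0.95, 5], "B": [80, 0.90, 7], "C": [120, 0.99, 3]}
ranking = rank_suppliers(suppliers, weights=[0.4, 0.4, 0.2],
                         benefit=[False, True, False])
```

Other MCDM techniques differ mainly in how they normalize criteria and combine them, so the ranking can change with the method as well as with the weights.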


05 Mar 2013
TL;DR: For many applications, a randomized algorithm is either the simplest or the fastest algorithm available, and sometimes both. This book introduces the basic concepts in the design and analysis of randomized algorithms and, for each of several important application areas, provides a comprehensive and representative selection of the algorithms that might be used.
Abstract: For many applications, a randomized algorithm is either the simplest or the fastest algorithm available, and sometimes both. This book introduces the basic concepts in the design and analysis of randomized algorithms. The first part of the text presents basic tools such as probability theory and probabilistic analysis that are frequently used in algorithmic applications. Algorithmic examples are also given to illustrate the use of each tool in a concrete setting. In the second part of the book, each chapter focuses on an important area to which randomized algorithms can be applied, providing a comprehensive and representative selection of the algorithms that might be used in each of these areas. Although written primarily as a text for advanced undergraduates and graduate students, this book should also prove invaluable as a reference for professionals and researchers.

785 citations
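A classic example of a randomized algorithm that is both simple and fast is randomized selection (quickselect), which finds the k-th smallest element in expected linear time by partitioning around a uniformly random pivot. The sketch below is a standard textbook version written in Python for illustration; it is not taken from this book, and the function name `quickselect` is our own.

```python
import random


def quickselect(items, k, rng=random):
    """Return the k-th smallest element of items (k is 0-based).

    A uniformly random pivot splits the data; only the side containing the
    answer is kept, giving O(n) expected time on every input.
    """
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    data = list(items)
    while True:
        pivot = rng.choice(data)
        lows = [x for x in data if x < pivot]
        equal = [x for x in data if x == pivot]
        if k < len(lows):
            data = lows
        elif k < len(lows) + len(equal):
            return pivot
        else:
            k -= len(lows) + len(equal)
            data = [x for x in data if x > pivot]


median = quickselect([7, 2, 9, 4, 1], 2)  # 4
```

The randomness is only in the pivot choice, so no input ordering can force the quadratic worst case in expectation.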


Journal Article
TL;DR: This paper describes the R package GA, a collection of general-purpose functions that provide a flexible set of tools for applying a wide range of genetic algorithm methods; examples range from mathematical functions in one and two dimensions known to be hard to optimize with standard derivative-based methods to selected statistical problems that require the optimization of user-defined objective functions.
Abstract: Genetic algorithms (GAs) are stochastic search algorithms inspired by the basic principles of biological evolution and natural selection. GAs simulate the evolution of living organisms, where the fittest individuals dominate over the weaker ones, by mimicking the biological mechanisms of evolution, such as selection, crossover and mutation. GAs have been successfully applied to solve optimization problems, both for continuous (whether differentiable or not) and discrete functions. This paper describes the R package GA, a collection of general purpose functions that provide a flexible set of tools for applying a wide range of genetic algorithm methods. Several examples are discussed, ranging from mathematical functions in one and two dimensions known to be hard to optimize with standard derivative-based methods, to some selected statistical problems which require the optimization of user defined objective functions. (This paper contains animations that can be viewed using the Adobe Acrobat PDF viewer.)

599 citations
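The selection, crossover, and mutation loop described above can be sketched in a few lines. This is a minimal real-valued GA in Python for a one-dimensional function, written for illustration only; it does not reproduce the interface of the R package GA, and the operator choices (tournament selection of size 3, blend crossover, Gaussian mutation, single-individual elitism) and all parameter defaults are assumptions of this sketch.

```python
import random


def genetic_algorithm(fitness, lower, upper, pop_size=40, generations=100,
                      mutation_rate=0.2, mutation_sd=0.1, seed=0):
    """Maximize a 1-D fitness function on [lower, upper] with a minimal GA."""
    rng = random.Random(seed)
    pop = [rng.uniform(lower, upper) for _ in range(pop_size)]

    def tournament(k=3):
        # Selection: the fittest of k individuals drawn at random.
        return max(rng.sample(pop, k), key=fitness)

    for _ in range(generations):
        children = [max(pop, key=fitness)]          # elitism: keep the current best
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            a = rng.random()
            child = a * p1 + (1 - a) * p2           # blend (arithmetic) crossover
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, mutation_sd)  # Gaussian mutation
            children.append(min(max(child, lower), upper))  # clip to bounds
        pop = children
    return max(pop, key=fitness)


# Example: the maximum of -(x - 2)^2 on [-10, 10] is at x = 2.
best = genetic_algorithm(lambda x: -(x - 2) ** 2, -10.0, 10.0)
```

No derivatives are used anywhere, which is why the same loop applies to non-differentiable objectives.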


Journal Article
TL;DR: The authors characterize the dominant features that could drive differences in linked selection among species, including the roles of 'hard' versus 'soft' selective sweeps, as well as the concealing effects of demography and confounding genomic variables.
Abstract: Population genetics theory supplies powerful predictions about how natural selection interacts with genetic linkage to sculpt the genomic landscape of nucleotide polymorphism. Both the spread of beneficial mutations and the removal of deleterious mutations act to depress polymorphism levels, especially in low-recombination regions. However, empiricists have documented extreme disparities among species. Here we characterize the dominant features that could drive differences in linked selection among species (including roles for selective sweeps being 'hard' or 'soft') and the concealing effects of demography and confounding genomic variables. We advocate targeted studies of closely related species to unify our understanding of how selection and linkage interact to shape genome evolution.

444 citations


Journal Article
TL;DR: The data demonstrate that even under static conditions, attentional priorities are reweighted from moment to moment based on object properties, as revealed by rhythmic patterns of visual-target detection both within objects (at 8 Hz) and between objects (at 4 Hz).

342 citations


Journal Article
TL;DR: It is shown that breaking with current trends in pre-processing is essential, as all selection approaches have serious drawbacks and cannot be properly used.
Abstract: Data pre-processing is an essential part of chemometric data analysis, which aims to remove unwanted variation (such as instrumental artifacts) and thereby focus on the variation of interest. The choice of an optimal pre-processing method or combination of methods may strongly influence the analysis results, but is far from straightforward, since it depends on the characteristics of the data set and the goal of data analysis. This first critical review is devoted to the selection procedure for appropriate pre-processing strategies. We show that breaking with current trends in pre-processing is essential, as all selection approaches have serious drawbacks and cannot be properly used.

338 citations


Journal Article
21 Aug 2013-Nature
TL;DR: It is found that an allele conferring larger horns, Ho+, is associated with higher reproductive success, whereas a smaller horn allele, HoP, confers increased survival, resulting in a net effect of overdominance for fitness at RXFP2.
Abstract: Sexual selection, through intra-male competition or female choice, is assumed to be a source of strong and sustained directional selection in the wild. In the presence of such strong directional selection, alleles enhancing a particular trait are predicted to become fixed within a population, leading to a decrease in the underlying genetic variation. However, there is often considerable genetic variation underlying sexually selected traits in wild populations, and consequently, this phenomenon has become a long-discussed issue in the field of evolutionary biology. In wild Soay sheep, large horns confer an advantage in strong intra-sexual competition, yet males show an inherited polymorphism for horn type and have substantial genetic variation in their horn size. Here we show that most genetic variation in this trait is maintained by a trade-off between natural and sexual selection at a single gene, relaxin-like receptor 2 (RXFP2). We found that an allele conferring larger horns, Ho(+), is associated with higher reproductive success, whereas a smaller horn allele, Ho(P), confers increased survival, resulting in a net effect of overdominance (that is, heterozygote advantage) for fitness at RXFP2. The nature of this trade-off is simple relative to commonly proposed explanations for the maintenance of sexually selected traits, such as genic capture ('good genes') and sexually antagonistic selection. Our results demonstrate that by identifying the genetic architecture of trait variation, we can determine the principal mechanisms maintaining genetic variation in traits under strong selection and explain apparently counter-evolutionary observations.

300 citations
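Overdominance can be illustrated with the textbook one-locus model in which the heterozygote has the highest fitness. Under random mating with genotype fitnesses 1 - s, 1 and 1 - t, both alleles are maintained at the stable equilibrium p* = t / (s + t). The Python sketch below iterates the standard recursion and checks it against that formula; the values of s and t are hypothetical, not estimates from the Soay sheep study, and collapsing survival and reproductive success into a single fitness per genotype is a simplifying assumption.

```python
def next_freq(p, s, t):
    """One generation of selection at a biallelic locus under random mating.

    Genotype fitnesses: AA = 1 - s, Aa = 1, aa = 1 - t (overdominance when s, t > 0).
    p is the frequency of allele A.
    """
    q = 1.0 - p
    w_bar = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)  # mean fitness
    return (p * p * (1 - s) + p * q) / w_bar


def equilibrium(s, t):
    """Stable polymorphic equilibrium frequency of A: p* = t / (s + t)."""
    return t / (s + t)


# Hypothetical selection coefficients.
s, t = 0.2, 0.3
p = 0.1
for _ in range(2000):
    p = next_freq(p, s, t)
```

Starting from a rare allele, the frequency converges to the interior equilibrium rather than to fixation, which is how heterozygote advantage maintains genetic variation.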


Journal Article
TL;DR: It is shown, through theoretical analysis, that the proposed framework not only encompasses many widely used feature selection criteria, but also naturally overcomes their common weakness in handling feature redundancy.
Abstract: In the literature of feature selection, different criteria have been proposed to evaluate the goodness of features. In our investigation, we notice that a number of existing selection criteria implicitly select features that preserve sample similarity, and can be unified under a common framework. We further point out that no feature selection criterion covered by this framework can handle redundant features, a common drawback of these criteria. Motivated by these observations, we propose a new "Similarity Preserving Feature Selection" framework in an explicit and rigorous way. We show, through theoretical analysis, that the proposed framework not only encompasses many widely used feature selection criteria, but also naturally overcomes their common weakness in handling feature redundancy. In developing this new framework, we begin with a conventional combinatorial optimization formulation for similarity preserving feature selection, then extend it with a sparse multiple-output regression formulation to improve its efficiency and effectiveness. A set of three algorithms is devised to efficiently solve the proposed formulations, each of which has its own advantages in terms of computational complexity and selection performance. As exhibited by our extensive experimental study, the proposed framework achieves superior feature selection performance and attractive properties.

Journal Article
TL;DR: This paper reviews 60 articles on green supplier selection, all published in peer-reviewed journals between 1991 and 2011, and presents a conceptual model aimed at integrating the different dimensions of green supplier selection and identifying directions for future research.

Journal Article
TL;DR: Three breakthroughs underlie the current widespread use of DNA information: the genomic selection methodology, a form of marker-assisted selection on a genome-wide scale; the discovery of large numbers of single-nucleotide markers; and cost-effective methods to genotype them.
Abstract: Three recent breakthroughs have resulted in the current widespread use of DNA information: the genomic selection (GS) methodology, which is a form of marker-assisted selection on a genome-wide scale; the discovery of large numbers of single-nucleotide markers; and cost-effective methods to genotype them. GS estimates the effect of thousands of DNA markers simultaneously. Nonlinear estimation methods yield higher accuracy, especially for traits with major genes. The marker effects are estimated in a genotyped and phenotyped training population and are used for the estimation of breeding values of selection candidates by combining their genotypes with the estimated marker effects. The benefits of GS are greatest when selection is for traits that are not themselves recorded on the selection candidates before they can be selected. In the future, genome sequence data may replace SNP genotypes as markers. This could increase GS accuracy because the causative mutations should be included in the data.
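The training-then-prediction workflow described above can be sketched with a linear model that estimates all marker effects simultaneously. The Python/NumPy example below uses ridge regression (an RR-BLUP-style shrinkage estimator) on simulated data; the function names, the simulation settings, and the shrinkage value are assumptions made for illustration. It is a linear method, so it does not capture the extra accuracy the abstract attributes to nonlinear estimation for traits with major genes.

```python
import numpy as np


def fit_marker_effects(X_train, y_train, lam):
    """Ridge-regression (RR-BLUP-style) estimate of all marker effects at once.

    X_train: (n, m) genotypes coded as 0/1/2 allele counts
    y_train: (n,) phenotypes of the training population
    lam:     shrinkage parameter (residual variance / marker-effect variance)
    """
    col_means = X_train.mean(axis=0)
    Xc = X_train - col_means                      # centre each marker
    m = X_train.shape[1]
    lhs = Xc.T @ Xc + lam * np.eye(m)
    rhs = Xc.T @ (y_train - y_train.mean())
    beta = np.linalg.solve(lhs, rhs)
    return beta, col_means


def gebv(X_cand, beta, col_means):
    """Genomic estimated breeding values: candidate genotypes times marker effects."""
    return (X_cand - col_means) @ beta


# Hypothetical simulation: 200 markers, 500 training animals, 100 candidates.
rng = np.random.default_rng(1)
m = 200
true_effects = rng.normal(0.0, 0.1, m)
X_train = rng.binomial(2, 0.5, (500, m)).astype(float)
y_train = X_train @ true_effects + rng.normal(0.0, 1.0, 500)
X_cand = rng.binomial(2, 0.5, (100, m)).astype(float)  # genotyped, not phenotyped

beta, col_means = fit_marker_effects(X_train, y_train, lam=100.0)
pred = gebv(X_cand, beta, col_means)
accuracy = np.corrcoef(pred, X_cand @ true_effects)[0, 1]
```

The candidates never contribute phenotypes, which mirrors the case where GS helps most: the trait is not recorded on the candidates before selection.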

Journal Article
TL;DR: It is suggested that classifications of selection based on distinction between the form of competition or the components of fitness that are involved introduce unnecessary complexities and that the most useful approach in understanding the evolution and distribution of differences and similarities between the sexes is to compare the operation of selection in males and females in different reproductive systems.
Abstract: During the latter half of the last century, evidence of reproductive competition between males and male selection by females led to the development of a stereotypical view of sex differences that characterized males as competitive and aggressive, and females as passive and choosy, which is currently being revised. Here, we compare social competition and its consequences for selection in males and females and argue that similar selection processes operate in both sexes and that contrasts between the sexes are quantitative rather than qualitative. We suggest that classifications of selection based on distinction between the form of competition or the components of fitness that are involved introduce unnecessary complexities and that the most useful approach in understanding the evolution and distribution of differences and similarities between the sexes is to compare the operation of selection in males and females in different reproductive systems.

Journal Article
TL;DR: The paper proposes a new method to aggregate the opinions of experts or decision makers on different criteria regarding a set of alternatives, where each expert's opinion is represented by hesitant fuzzy linguistic term sets.
Abstract: We propose a new method to aggregate the opinion of experts or decision makers on different criteria, regarding a set of alternatives, where the opinion of the experts is represented by hesitant fuzzy linguistic term sets. An illustrative example is provided to elaborate the proposed method for selection of the best alternative.

Journal Article
TL;DR: In this article, a variant of stability selection, called complementary pairs stability selection (CPSS), is introduced, and bounds are derived on the expected number of variables included by CPSS that have low selection probability under the original procedure.
Abstract: Summary. Stability selection was recently introduced by Meinshausen and Bühlmann as a very general technique designed to improve the performance of a variable selection algorithm. It is based on aggregating the results of applying a selection procedure to subsamples of the data. We introduce a variant, called complementary pairs stability selection, and derive bounds both on the expected number of variables included by complementary pairs stability selection that have low selection probability under the original procedure, and on the expected number of high selection probability variables that are excluded. These results require no (e.g. exchangeability) assumptions on the underlying model or on the quality of the original selection procedure. Under reasonable shape restrictions, the bounds can be further tightened, yielding improved error control, and therefore increasing the applicability of the methodology.
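The complementary-pairs idea can be sketched as follows: repeatedly split the sample into two disjoint halves, run the base selection procedure on each half, and record how often each variable is chosen. In the Python/NumPy sketch below, the base procedure (keep the k variables most correlated with the response), the number of pairs, the simulated data, and the 0.6 frequency cut-off are all hypothetical choices; the sketch does not compute the error bounds derived in the paper.

```python
import numpy as np


def cpss(X, y, select, n_pairs=50, seed=0):
    """Selection frequencies over 2 * n_pairs complementary half-samples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    half = n // 2
    counts = np.zeros(p)
    for _ in range(n_pairs):
        perm = rng.permutation(n)
        # Each pair: two disjoint halves of the data (complementary subsamples).
        for idx in (perm[:half], perm[half:2 * half]):
            for j in select(X[idx], y[idx]):
                counts[j] += 1
    return counts / (2 * n_pairs)


def top_k_by_correlation(k):
    """Hypothetical base selector: the k columns most correlated with y."""
    def select(X_sub, y_sub):
        Xc = X_sub - X_sub.mean(axis=0)
        yc = y_sub - y_sub.mean()
        score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
        return np.argsort(score)[-k:]
    return select


# Simulated data: only variables 0 and 1 affect the response.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 20))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)
freqs = cpss(X, y, top_k_by_correlation(3))
stable = np.flatnonzero(freqs >= 0.6)
```

The base selector always picks three variables, but the noise variable it adds varies from half to half, so only the two true signals survive the frequency cut-off.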

Journal Article
Graham Bell
TL;DR: Neither the standing genetic variation of small populations nor the mutation supply of large populations, however, may be sufficient to provide evolutionary rescue for most populations.
Abstract: Populations subject to severe stress may be rescued by natural selection, but its operation is restricted by ecological and genetic constraints. The cost of natural selection expresses the limited capacity of a population to sustain the load of mortality or sterility required for effective selection. Genostasis expresses the lack of variation that prevents many populations from adapting to stress. While the role of relative fitness in adaptation is well understood, evolutionary rescue emphasizes the need to recognize explicitly the importance of absolute fitness. Permanent adaptation requires a range of genetic variation in absolute fitness that is broad enough to provide a few extreme types capable of sustained growth under a stress that would cause extinction if they were not present. This principle implies that population size is an important determinant of rescue. The overall number of individuals exposed to selection will be greater when the population declines gradually under a constant stress, or is progressively challenged by gradually increasing stress. In gradually deteriorating environments, survival at lethal stress may be procured by prior adaptation to sublethal stress through genetic correlation. Neither the standing genetic variation of small populations nor the mutation supply of large populations, however, may be sufficient to provide evolutionary rescue for most populations.

Journal Article
TL;DR: It is suggested that Cudeck and Henly's (1991) framework can be applied to guide the selection process for exploratory factor analysis, and it is recommended that researchers more thoroughly consider what they mean by “the right number of factors” before they choose fit indices.
Abstract: A central problem in the application of exploratory factor analysis is deciding how many factors to retain (m). Although this is inherently a model selection problem, a model selection perspective is rarely adopted for this task. We suggest that Cudeck and Henly's (1991) framework can be applied to guide the selection process. Researchers must first identify the analytic goal: identifying the (approximately) correct m or identifying the most replicable m. Second, researchers must choose fit indices that are most congruent with their goal. Consistent with theory, a simulation study showed that different fit indices are best suited to different goals. Moreover, model selection with one goal in mind (e.g., identifying the approximately correct m) will not necessarily lead to the same number of factors as model selection with the other goal in mind (e.g., identifying the most replicable m). We recommend that researchers more thoroughly consider what they mean by "the right number of factors" before they choose fit indices.

Journal Article
01 Jan 2013-Genetics
TL;DR: This paper introduces a new statistic, hapFLK, which focuses on differences in haplotype frequencies rather than allele frequencies between populations; it is robust to bottlenecks and migration and improves over existing approaches in many situations.
Abstract: The detection of molecular signatures of selection is one of the major concerns of modern population genetics. A widely used strategy in this context is to compare samples from several populations and to look for genomic regions with outstanding genetic differentiation between these populations. Genetic differentiation is generally based on allele frequency differences between populations, which are measured by FST or related statistics. Here we introduce a new statistic, denoted hapFLK, which focuses instead on the differences of haplotype frequencies between populations. In contrast to most existing statistics, hapFLK accounts for the hierarchical structure of the sampled populations. Using computer simulations, we show that each of these two features (the use of haplotype information and of the hierarchical structure of populations) significantly improves the detection power of selected loci and that combining them in the hapFLK statistic provides even greater power. We also show that hapFLK is robust with respect to bottlenecks and migration and improves over existing approaches in many situations. Finally, we apply hapFLK to a set of six sheep breeds from Northern Europe and identify seven regions under selection, which include already reported regions but also several new ones. We propose a method to help identify the population(s) under selection in a detected region, which reveals that in many of these regions selection most likely occurred in more than one population. Furthermore, several of the detected regions correspond to incomplete sweeps, where the favorable haplotype is only at intermediate frequency in the population(s) under selection.
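For contrast with hapFLK, the single-SNP, allele-frequency approach mentioned above can be as simple as Nei's F_ST, (H_T - H_S) / H_T, where H_S is the mean within-population expected heterozygosity and H_T the expected heterozygosity of the pooled frequencies. The Python sketch below weights populations equally and is illustrative only; it does not implement hapFLK, which additionally uses haplotype information and the hierarchical structure of populations.

```python
def nei_fst(freqs):
    """Nei's F_ST (G_ST) at one biallelic SNP.

    freqs: frequency of one allele in each population (populations weighted equally).
    """
    k = len(freqs)
    h_s = sum(2 * p * (1 - p) for p in freqs) / k  # mean within-population heterozygosity
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)                  # heterozygosity of pooled frequencies
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical allele frequencies in two populations.
fst = nei_fst([0.2, 0.8])  # about 0.36
```

A genome scan computes such a statistic per SNP and flags outliers; hapFLK replaces both the per-allele frequencies and the equal weighting of populations.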

Journal Article
TL;DR: Analytical and numerical results show that buffer-aided relaying with adaptive link selection achieves significant throughput gains compared to conventional relaying protocols with and without buffers where the relay employs a fixed schedule for reception and transmission.
Abstract: In this paper, we consider a simple network consisting of a source, a half-duplex decode-and-forward relay, and a destination. We propose a new relaying protocol employing adaptive link selection, i.e., in any given time slot, based on the channel state information of the source-relay and the relay-destination links, a decision is made whether the source or the relay transmits. In order to avoid data loss at the relay, adaptive link selection requires the relay to be equipped with a buffer such that data can be queued until the relay-destination link is selected for transmission. We study both delay-constrained and delay-unconstrained transmission. For the delay-unconstrained case, we characterize the optimal link selection policy, derive the corresponding throughput, and develop an optimal power allocation scheme. For the delay-constrained case, we propose to starve the buffer of the relay by choosing the decision threshold of the link selection policy smaller than the optimal one and derive a corresponding upper bound on the average delay. Furthermore, we propose a modified link selection protocol which avoids buffer overflow by limiting the queue size. Our analytical and numerical results show that buffer-aided relaying with adaptive link selection achieves significant throughput gains compared to conventional relaying protocols with and without buffers where the relay employs a fixed schedule for reception and transmission.
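The benefit of buffering can be illustrated with a small Monte Carlo simulation. In the Python/NumPy sketch below, both links experience Rayleigh fading; the adaptive scheme uses a simple rule (let the relay transmit whenever its buffer is non-empty and the relay-destination link is at least as strong as the source-relay link, otherwise let the source transmit), which is a heuristic chosen for illustration rather than the optimal threshold policy derived in the paper. The baseline is a fixed schedule without a buffer, where each packet's rate is limited by the weaker of the two hops. The average SNRs and slot counts are hypothetical.

```python
import numpy as np


def simulate(n_slots=100_000, snr_sr=10.0, snr_rd=10.0, seed=0):
    """Return (adaptive, fixed) average throughput in bits/s/Hz per time slot."""
    rng = np.random.default_rng(seed)
    # Rayleigh fading: instantaneous SNRs are exponentially distributed.
    c_sr = np.log2(1.0 + rng.exponential(snr_sr, n_slots))
    c_rd = np.log2(1.0 + rng.exponential(snr_rd, n_slots))

    # Adaptive link selection with an unbounded relay buffer (heuristic rule).
    buf, delivered = 0.0, 0.0
    for i in range(n_slots):
        if buf > 0.0 and c_rd[i] >= c_sr[i]:
            sent = min(c_rd[i], buf)  # relay cannot forward more than it holds
            buf -= sent
            delivered += sent
        else:
            buf += c_sr[i]            # source transmits; relay decodes and queues
    adaptive = delivered / n_slots

    # Fixed schedule, no buffer: slot 2k is S->R, slot 2k+1 is R->D,
    # so each packet's size is limited by the weaker hop.
    pairs = n_slots // 2
    fixed = np.minimum(c_sr[0:2 * pairs:2], c_rd[1:2 * pairs:2]).sum() / n_slots
    return adaptive, fixed


adaptive, fixed = simulate(20_000)
```

The gain comes from always using whichever hop is currently stronger, which is only possible because the buffer decouples reception from forwarding.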

Journal Article
TL;DR: This paper attempts to clarify the concepts and terminology used in animal resource studies by illustrating the relationships among these various concepts and providing their statistical underpinnings.
Abstract: 1. During the last decade, there has been a proliferation of statistical methods for studying resource selection by animals. While statistical techniques are advancing at a fast pace, there is confusion in the conceptual understanding of the meaning of various quantities that these statistical techniques provide. 2. Terms such as selection, choice, use, occupancy and preference often are employed as if they are synonymous. Many practitioners are unclear about the distinctions between different concepts such as 'probability of selection,' 'probability of use,' 'choice probabilities' and 'probability of occupancy'. 3. Similarly, practitioners are not always clear about the differences between and relevance of 'relative probability of selection' vs. 'probability of selection' to effective management. 4. Practitioners also are unaware that they are using only a single statistical model for modelling resource selection, namely the exponential probability of selection, when other models might be more appropriate. Currently, such multimodel inference is lacking in the resource selection literature. 5. In this paper, we attempt to clarify the concepts and terminology used in animal resource studies by illustrating the relationships among these various concepts and providing their statistical underpinnings.
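To make the distinction between "relative probability of selection" and "probability of selection" concrete, the exponential model mentioned in point 4 can be written as w(x) = exp(β·x). Because w(x) is only proportional to the probability of selection, it supports comparisons between resource units (ratios) rather than absolute probabilities. The Python sketch below is illustrative; the covariates and coefficients are hypothetical.

```python
import math


def rsf(x, beta):
    """Exponential resource selection function w(x) = exp(beta . x).

    Proportional to the probability of selection only up to an unknown constant.
    """
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))


def relative_selection(x1, x2, beta):
    """Ratio of selection probabilities of unit x1 to unit x2 under the model."""
    return rsf(x1, beta) / rsf(x2, beta)


# Hypothetical covariates: (vegetation cover score, distance to road in km).
beta = [0.5, -1.0]
ratio = relative_selection([2.0, 1.0], [1.0, 1.0], beta)  # equals exp(0.5)
```

The unknown constant cancels in the ratio, which is exactly why an exponential RSF yields relative, not absolute, probabilities of selection.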

Journal Article
TL;DR: It is shown that the proposed antenna selection based SM systems are capable of attaining a significant gain in signal-to-noise ratio (SNR) compared to conventional SM systems, and also outperform the conventional MIMO systems employing antenna selection at both low and medium SNRs.
Abstract: Novel transmit antenna selection techniques are conceived for Spatial Modulation (SM) systems and their symbol error rate (SER) performance is investigated. Specifically, low-complexity Euclidean Distance optimized Antenna Selection (EDAS) and Capacity Optimized Antenna Selection (COAS) are studied. It is observed that the COAS scheme gives a better SER performance than the EDAS scheme. We show that the proposed antenna selection based SM systems are capable of attaining a significant gain in signal-to-noise ratio (SNR) compared to conventional SM systems, and also outperform the conventional MIMO systems employing antenna selection at both low and medium SNRs.
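A common way to describe capacity-optimized antenna selection is to keep the transmit antennas whose channel columns have the largest norms. The Python/NumPy sketch below assumes that column-norm criterion; it is not taken from the paper, it does not cover EDAS, and the channel matrix is hypothetical. In SM the number of active antennas is typically a power of two so that the antenna index can carry whole bits.

```python
import numpy as np


def coas(H, n_select):
    """Indices of the n_select transmit antennas with the largest channel-column norms.

    H: (n_rx, n_tx) channel matrix (real or complex).
    """
    col_norms = np.linalg.norm(H, axis=0)
    strongest = np.argsort(col_norms)[::-1][:n_select]
    return np.sort(strongest)


# Hypothetical 2x4 channel; select 2 of 4 antennas (1 bit carried by the antenna index).
H = np.array([[1.0, 3.0, 0.5, 2.0],
              [0.0, 1.0, 0.5, 2.0]])
chosen = coas(H, 2)
```

Selection needs only one norm per column, which keeps its complexity low compared with searching over all antenna subsets.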

Journal Article
TL;DR: Polyandry can reduce a male's ability to monopolize females and thus weaken male-focused sexual selection; moreover, because estimates of sexual selection intensity rely heavily on measures of male mating success, polyandry raises serious questions over the validity of such approaches.
Abstract: The Darwin–Bateman paradigm recognizes competition among males for access to multiple mates as the main driver of sexual selection. Increasingly, however, females are also being found to benefit from multiple mating so that polyandry can generate competition among females for access to multiple males, and impose sexual selection on female traits that influence their mating success. Polyandry can reduce a male's ability to monopolize females, and thus weaken male-focused sexual selection. Perhaps the most important effect of polyandry on males arises because of sperm competition and cryptic female choice. Polyandry favours increased male ejaculate expenditure that can affect sexual selection on males by reducing their potential reproductive rate. Moreover, sexual selection after mating can ameliorate or exaggerate sexual selection before mating. Currently, estimates of sexual selection intensity rely heavily on measures of male mating success, but polyandry now raises serious questions over the validity of such approaches. Future work must take into account both pre- and post-copulatory episodes of selection. A change in focus from the products of sexual selection expected in males, to less obvious traits in females, such as sensory perception, is likely to reveal a greater role of sexual selection in female evolution.

Journal Article
TL;DR: A large body of literature on linear mixed model selection is reviewed, covering four major approaches: information criteria such as AIC or BIC, shrinkage methods based on penalized loss functions such as LASSO, the Fence procedure, and Bayesian techniques.
Abstract: Linear mixed effects models are highly flexible in handling a broad range of data types and are therefore widely used in applications. A key part of the analysis of data is model selection, which often aims to choose a parsimonious model with other desirable properties from a possibly very large set of candidate statistical models. Over the last 5–10 years the literature on model selection in linear mixed models has grown extremely rapidly. The problem is much more complicated than in linear regression because selection on the covariance structure is not straightforward due to computational issues and boundary problems arising from positive semidefinite constraints on covariance matrices. To obtain a better understanding of the available methods, their properties and the relationships between them, we review a large body of literature on linear mixed model selection. We arrange, implement, discuss and compare model selection methods based on four major approaches: information criteria such as AIC or BIC, shrinkage methods based on penalized loss functions such as LASSO, the Fence procedure and Bayesian techniques.
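As a reminder of how the information-criterion approach ranks candidates, the Python sketch below applies the standard definitions AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L and keeps the model with the smallest score. The log-likelihoods and parameter counts are hypothetical numbers, assumed to come from two already-fitted mixed models (a random-intercept model and a random-intercept-plus-slope model), and the function names are illustrative.

```python
import math


def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * log_likelihood


def select(candidates, n_obs, criterion="bic"):
    """candidates maps model name -> (log_likelihood, n_params).

    Returns the name of the model with the smallest AIC or BIC.
    """
    def score(item):
        ll, k = item[1]
        return aic(ll, k) if criterion == "aic" else bic(ll, k, n_obs)

    return min(candidates.items(), key=score)[0]


# Hypothetical fitted values: the slope model adds a slope variance and a covariance.
candidates = {
    "random_intercept": (-520.0, 4),
    "random_slope": (-516.5, 6),
}
```

With n = 200 observations, AIC prefers "random_slope" (1045 vs 1048) while BIC prefers "random_intercept" (about 1061.2 vs 1064.8). In mixed models, even this simple recipe raises the issues the review discusses: whether n should count observations or clusters, and how to count variance parameters that lie on the boundary of the parameter space.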

Journal Article
TL;DR: The continuous application of traditional breeding methods in a given species could lead to the narrowing of the gene pool from which cultivars are drawn, rendering crops vulnerable to biotic and abiotic stresses and hampering future progress.
Abstract: Plant breeding can be broadly defined as alterations caused in plants as a result of their use by humans, ranging from unintentional changes resulting from the advent of agriculture to the application of molecular tools for precision breeding. The vast diversity of breeding methods can be simplified into three categories: (i) plant breeding based on observed variation by selection of plants based on natural variants appearing in nature or within traditional varieties; (ii) plant breeding based on controlled mating by selection of plants presenting recombination of desirable genes from different parents; and (iii) plant breeding based on monitored recombination by selection of specific genes or marker profiles, using molecular tools for tracking within-genome variation. The continuous application of traditional breeding methods in a given species could lead to the narrowing of the gene pool from which cultivars are drawn, rendering crops vulnerable to biotic and abiotic stresses and hampering future progre...

Journal Article
Robert P. Sheridan
TL;DR: Time-split selection gives an R² that is closer to that of true prospective prediction than the R² from random selection or from an analog of leave-class-out selection, and it should be used in addition to random selection as a standard for cross-validation in QSAR model building.
Abstract: Cross-validation is a common method to validate a QSAR model. In cross-validation, some compounds are held out as a test set, while the remaining compounds form a training set. A model is built from the training set, and the test set compounds are predicted on that model. The agreement of the predicted and observed activity values of the test set (measured by, say, R²) is an estimate of the self-consistency of the model and is sometimes taken as an indication of the predictivity of the model. This estimate of predictivity can be optimistic or pessimistic compared to true prospective prediction, depending on how compounds in the test set are selected. Here, we show that time-split selection gives an R² that is more like that of true prospective prediction than the R² from random selection (too optimistic) or from our analog of leave-class-out selection (too pessimistic). Time-split selection should be used in addition to random selection as a standard for cross-validation in QSAR model building.
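The difference between the two splitting schemes is easy to state in code. In this minimal Python sketch, the time split sorts compounds by the date their activity was measured and holds out the most recent fraction, so every test compound is "newer" than every training compound, while the random split ignores dates. The Compound record, its date field, the 25% test fraction and the function names are all assumptions for illustration. The R² shown is the coefficient of determination, and the paper may use a different definition, such as the squared correlation.

```python
import random
from dataclasses import dataclass


@dataclass
class Compound:
    compound_id: str
    date: str  # assumed ISO date (YYYY-MM-DD) when the activity was measured
    activity: float


def time_split(compounds, test_fraction=0.25):
    """Train on the earliest compounds, test on the most recent ones."""
    ordered = sorted(compounds, key=lambda c: c.date)
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]


def random_split(compounds, test_fraction=0.25, seed=0):
    """Conventional random hold-out split, ignoring measurement dates."""
    shuffled = list(compounds)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

Because a time split never lets the model see compounds made after the test compounds, it mimics the prospective setting in which a project's later chemistry drifts away from the training data.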

Journal Article
TL;DR: It is found that selection tends to vary mainly in strength and less in direction among populations, which may limit the potential for ongoing adaptive population divergence.
Abstract: Local adaptation, adaptive population divergence and speciation are often expected to result from populations evolving in response to spatial variation in selection. Yet, we lack a comprehensive understanding of the major features that characterise the spatial patterns of selection, namely the extent of variation among populations in the strength and direction of selection. Here, we analyse a data set of spatially replicated studies of directional phenotypic selection from natural populations. The data set includes 60 studies, consisting of 3937 estimates of selection across an average of five populations. We performed meta-analyses to explore features characterising spatial variation in directional selection. We found that selection tends to vary mainly in strength and less in direction among populations. Although differences in the direction of selection occur among populations, they do so where selection is often weakest, which may limit the potential for ongoing adaptive population divergence. Overall, we also found that spatial variation in selection appears comparable to temporal (annual) variation in selection within populations; however, several deficiencies in available data currently complicate this comparison. We discuss future research needs to further advance our understanding of spatial variation in selection.

Journal Article
01 Jan 2013
TL;DR: A neighborhood-based collaborative filtering approach is proposed to predict unknown QoS values for QoS-based service selection. It has three new features: an adjusted-cosine-based similarity calculation to remove the impact of different QoS scales; a data smoothing process to improve prediction accuracy; and a similarity fusion approach to handle the data sparsity problem.
Abstract: Quality-of-service (QoS)-based service selection is an important issue in service-oriented computing. A common premise of previous research is that the QoS values of services to target users are all known. However, many QoS values are unknown in reality. This paper presents a neighborhood-based collaborative filtering approach to predict such unknown values for QoS-based selection. Compared with existing methods, the proposed method has three new features: 1) an adjusted-cosine-based similarity calculation to remove the impact of different QoS scales; 2) a data smoothing process to improve prediction accuracy; and 3) a similarity fusion approach to handle the data sparsity problem. In addition, a two-phase neighbor selection strategy is proposed to improve its scalability. An extensive performance study based on a public data set demonstrates its effectiveness.
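The core idea of the first feature, removing each user's QoS scale before comparing users, can be sketched in Python as a user-based neighborhood predictor. This sketch makes several assumptions. Each user's QoS observations are a dict from service name to a value such as response time. Similarity is the adjusted cosine over co-invoked services, centring each value on that user's mean over all services they invoked. A missing value is predicted as the target user's mean plus a similarity-weighted average of the top-k neighbours' deviations. The paper's data smoothing, similarity fusion and two-phase neighbour selection are not reproduced, and all names here are hypothetical.

```python
from math import sqrt


def user_mean(qos):
    """Mean of all QoS values this user has observed."""
    return sum(qos.values()) / len(qos)


def adjusted_cosine(qos_u, qos_v):
    """Similarity of two users over co-invoked services, centred on each user's mean."""
    common = qos_u.keys() & qos_v.keys()
    if not common:
        return 0.0
    mu, mv = user_mean(qos_u), user_mean(qos_v)
    num = sum((qos_u[s] - mu) * (qos_v[s] - mv) for s in common)
    den_u = sqrt(sum((qos_u[s] - mu) ** 2 for s in common))
    den_v = sqrt(sum((qos_v[s] - mv) ** 2 for s in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)


def predict(target, service, others, k=2):
    """Predict target's unknown QoS value for `service` from the top-k similar users."""
    candidates = []
    for qos_v in others:
        if service in qos_v:
            sim = adjusted_cosine(target, qos_v)
            if sim > 0:  # keep only positively correlated neighbours
                candidates.append((sim, qos_v))
    neighbours = sorted(candidates, key=lambda t: t[0], reverse=True)[:k]
    mu = user_mean(target)
    if not neighbours:
        return mu  # fall back to the user's own average
    num = sum(sim * (qos_v[service] - user_mean(qos_v)) for sim, qos_v in neighbours)
    den = sum(sim for sim, _ in neighbours)
    return mu + num / den
```

For instance, a user who saw response times {"s1": 1.0, "s2": 3.0} and a neighbour who saw {"s1": 2.0, "s2": 4.0, "s3": 5.0} give a prediction of about 3.33 for s3: the neighbour's value for s3 sits about 1.33 above that neighbour's own average, and that deviation is transferred onto the target user's average of 2.0.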

Journal Article
TL;DR: It is found that artificial selection during domestication led to more pronounced reduction in the genetic diversity of soybean than the switch from landraces to elite cultivars.
Abstract: Artificial selection played an important role in the origin of modern Glycine max cultivars from the wild soybean Glycine soja. To elucidate the consequences of artificial selection accompanying the domestication and modern improvement of soybean, 25 new and 30 published whole-genome re-sequencing accessions, which represent wild, domesticated landrace, and Chinese elite soybean populations, were analyzed. A total of 5,102,244 single nucleotide polymorphisms (SNPs) and 707,969 insertion/deletions were identified. Among the SNPs detected, 25.5% had not been described previously. We found that artificial selection during domestication led to a more pronounced reduction in the genetic diversity of soybean than the switch from landraces to elite cultivars. Only a small proportion (2.99%) of the whole genome appears to be affected by artificial selection for preferred agricultural traits. The selection regions were not distributed randomly or uniformly throughout the genome. Instead, clusters of selection hotspots in certain genomic regions were observed. Moreover, we identified a set of candidate genes (4.38% of the total annotated genes) significantly affected by selection during soybean domestication and genetic improvement. Given the uniqueness of the soybean germplasm sequenced, this study provides a clear picture of human-mediated evolution of the soybean genomes. The genomic resources and information provided by this study should also facilitate the discovery of genes/loci underlying agronomically important traits.