Showing papers on "Selection (genetic algorithm)" published in 2012

Open Access

Journal Article

PartitionFinder: Combined Selection of Partitioning Schemes and Substitution Models for Phylogenetic Analyses

Robert Lanfear¹, Brett Calcott¹, Simon Y. W. Ho², Stéphane Guindon³

Australian National University¹, University of Sydney², University of Auckland³

01 Jun 2012-Molecular Biology and Evolution

TL;DR: Two new objective methods for the combined selection of best-fit partitioning schemes and nucleotide substitution models are described and implemented in an open-source program, PartitionFinder, which it is hoped will encourage the objective selection of partitions and thus lead to improvements in phylogenetic analyses.

Abstract: In phylogenetic analyses of molecular sequence data, partitioning involves estimating independent models of molecular evolution for different sets of sites in a sequence alignment. Choosing an appropriate partitioning scheme is an important step in most analyses because it can affect the accuracy of phylogenetic reconstruction. Despite this, partitioning schemes are often chosen without explicit statistical justification. Here, we describe two new objective methods for the combined selection of best-fit partitioning schemes and nucleotide substitution models. These methods allow millions of partitioning schemes to be compared in realistic time frames and so permit the objective selection of partitioning schemes even for large multilocus DNA data sets. We demonstrate that these methods significantly outperform previous approaches, including both the ad hoc selection of partitioning schemes (e.g., partitioning by gene or codon position) and a recently proposed hierarchical clustering method. We have implemented these methods in an open-source program, PartitionFinder. This program allows users to select partitioning schemes and substitution models using a range of information-theoretic metrics (e.g., the Bayesian information criterion, Akaike information criterion [AIC], and corrected AIC). We hope that PartitionFinder will encourage the objective selection of partitioning schemes and thus lead to improvements in phylogenetic analyses. PartitionFinder is written in Python and runs under Mac OSX 10.4 and above. The program, source code, and a detailed manual are freely available from www.robertlanfear.com/partitionfinder.

4,877 citations
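
PartitionFinder ranks candidate schemes with information-theoretic metrics. The sketch below shows how such a comparison works; it is not PartitionFinder's code. It applies the textbook formulas AIC = 2k − 2 ln L, AICc = AIC + 2k(k + 1)/(n − k − 1) and BIC = k ln n − 2 ln L to two hypothetical schemes. The log-likelihoods, parameter counts and sample size n are made-up numbers, and treating n as the number of alignment sites is an assumption.

```python
import math

def aic(lnL: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * lnL

def aicc(lnL: float, k: int, n: int) -> float:
    """AIC with the small-sample correction; requires n > k + 1."""
    if n <= k + 1:
        raise ValueError("AICc needs n > k + 1")
    return aic(lnL, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(lnL: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * lnL

def best_scheme(schemes: dict, n: int, criterion: str = "bic") -> str:
    """Name of the scheme with the lowest score.

    schemes maps a name to (lnL, k): the log-likelihood summed over the
    scheme's subsets and its total number of free parameters.
    """
    scorers = {
        "aic": lambda lnL, k: aic(lnL, k),
        "aicc": lambda lnL, k: aicc(lnL, k, n),
        "bic": lambda lnL, k: bic(lnL, k, n),
    }
    score = scorers[criterion]
    return min(schemes, key=lambda name: score(*schemes[name]))

# Hypothetical numbers: the partitioned scheme fits better but costs 20 extra parameters.
schemes = {"unpartitioned": (-10500.0, 10), "by_codon_position": (-10450.0, 30)}
print(best_scheme(schemes, n=1500, criterion="aic"))  # by_codon_position
print(best_scheme(schemes, n=1500, criterion="bic"))  # unpartitioned
```

With these numbers the criteria disagree. The partitioned scheme improves −2 ln L by 100. AIC charges 2 per extra parameter (40 in total), so it prefers the partitioned scheme. BIC charges ln 1500 ≈ 7.3 per parameter (about 146 in total), so it keeps the unpartitioned scheme.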

Journal Article

Detecting individual sites subject to episodic diversifying selection.

Ben Murrell¹,², Joel O. Wertheim³, Sasha Moola², Thomas Weighill², Konrad Scheffler²,³, Sergei L. Kosakovsky Pond³

Medical Research Council¹, Stellenbosch University², University of California, San Diego³

12 Jul 2012-PLOS Genetics

TL;DR: It is found that episodic selection is widespread and it is concluded that the number of sites experiencing positive selection may have been vastly underestimated.

Abstract: The imprint of natural selection on protein coding genes is often difficult to identify because selection is frequently transient or episodic, i.e. it affects only a subset of lineages. Existing computational techniques, which are designed to identify sites subject to pervasive selection, may fail to recognize sites where selection is episodic: a large proportion of positively selected sites. We present a mixed effects model of evolution (MEME) that is capable of identifying instances of both episodic and pervasive positive selection at the level of an individual site. Using empirical and simulated data, we demonstrate the superior performance of MEME over older models under a broad range of scenarios. We find that episodic selection is widespread and conclude that the number of sites experiencing positive selection may have been vastly underestimated.

1,327 citations

Journal Article

Regression testing minimization, selection and prioritization: a survey

Shin Yoo¹, Mark Harman¹

King's College London¹

01 Mar 2012-Software Testing, Verification & Reliability

TL;DR: This paper surveys minimization, selection and prioritization techniques for regression testing and discusses open problems and potential directions for future research.

Abstract: Regression testing is a testing activity that is performed to provide confidence that changes do not harm the existing behaviour of the software. Test suites tend to grow in size as software evolves, often making it too costly to execute entire test suites. A number of different approaches have been studied to maximize the value of the accrued test suite: minimization, selection and prioritization. Test suite minimization seeks to eliminate redundant test cases in order to reduce the number of tests to run. Test case selection seeks to identify the test cases that are relevant to some set of recent changes. Test case prioritization seeks to order test cases in such a way that early fault detection is maximized. This paper surveys each area of minimization, selection and prioritization technique and discusses open problems and potential directions for future research. Copyright © 2010 John Wiley & Sons, Ltd.

1,276 citations
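
Prioritization is often illustrated with the greedy "additional coverage" heuristic, one of the classic techniques in this area. The sketch below is a generic version of that idea, not a method taken from the survey. It assumes per-test coverage data (here, hypothetical program-element IDs) is already available.

```python
def prioritize_additional(coverage: dict[str, set[str]]) -> list[str]:
    """Order tests by "additional" greedy coverage.

    coverage maps each test name to the set of program elements it executes.
    At each step, the test that adds the most not-yet-covered elements goes next
    (ties broken by name). Once no remaining test adds anything, the covered
    set is reset so the remaining tests are ordered the same way.
    """
    remaining = dict(coverage)
    covered: set[str] = set()
    order: list[str] = []
    while remaining:
        # max() keeps the first maximum, so sorting first breaks ties by name.
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        if not remaining[best] - covered:
            if covered:
                covered = set()  # full coverage reached: start a new round
                continue
            order.extend(sorted(remaining))  # leftover tests cover nothing
            break
        order.append(best)
        covered |= remaining.pop(best)
    return order

tests = {"t1": {"a", "b"}, "t2": {"b", "c", "d"}, "t3": {"a"}, "t4": {"d"}}
print(prioritize_additional(tests))  # ['t2', 't1', 't3', 't4']
```

The reset step matters. Without it, t3 and t4 would never be ranked, because t2 and t1 already cover every element those tests execute.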

Journal Article

Selection models in accounting research

Clive S. Lennox¹, Jere R. Francis², Zi-Tian Wang¹

Nanyang Technological University¹, University of Missouri²

01 Mar 2012-The Accounting Review

TL;DR: In this paper, the authors explain the challenges associated with the Heckman (1979) procedure to control for selection bias, assess the quality of its application in accounting research, and offer guidance for better implementation of selection models.

Abstract: This study explains the challenges associated with the Heckman (1979) procedure to control for selection bias, assesses the quality of its application in accounting research, and offers guidance for better implementation of selection models. A survey of 75 recent accounting articles in leading journals reveals that many researchers implement the technique in a mechanical way with relatively little appreciation of important econometric issues and problems surrounding its use. Using empirical examples motivated by prior research, we illustrate that selection models are fragile and can yield quite literally any possible outcome in response to fairly minor changes in model specification. We conclude with guidance on how researchers can better implement selection models that will provide more convincing evidence on potential selection bias, including the need to justify model specifications and careful sensitivity analyses with respect to robustness and multicollinearity. Data Availability: Data used...

1,171 citations

Journal Article

The productivity advantages of large cities: distinguishing agglomeration from firm selection

Pierre-Philippe Combes¹, Gilles Duranton², Laurent Gobillon³, Diego Puga⁴, Sébastien Roux⁵

Aix-Marseille University¹, University of Pennsylvania², Institut national d'études démographiques³, CEMFI⁴, INSEE⁵

01 Nov 2012-Econometrica

TL;DR: In this paper, a generalised version of a tractable firm selection model and a standard model of agglomeration were used to show that firm selection cannot explain spatial productivity differences.

Abstract: Firms are more productive on average in larger cities. Two main explanations have been offered: firm selection (larger cities toughen competition, allowing only the most productive to survive) and agglomeration economies (larger cities promote interactions that increase productivity), possibly reinforced by localised natural advantage. To distinguish between them, we nest a generalised version of a tractable firm selection model and a standard model of agglomeration. Stronger selection in larger cities left-truncates the productivity distribution whereas stronger agglomeration right-shifts and dilates the distribution. Using this prediction, French establishment level data, and a new quantile approach, we show that firm selection cannot explain spatial productivity differences. This result holds across sectors, city size thresholds, establishment samples, and area definitions.

753 citations
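
The abstract's key prediction is that selection left-truncates the productivity distribution, while agglomeration right-shifts and dilates it. Comparing quantiles makes the difference visible. The sketch below assumes small-city log productivity is standard normal (an arbitrary choice) and applies each distortion separately. The parameters s (share of firms eliminated), a (shift) and d (dilation) are illustrative names. This is not the paper's quantile estimator, which nests both effects and is fitted to establishment data.

```python
from statistics import NormalDist

SMALL = NormalDist(mu=0.0, sigma=1.0)  # assumed small-city log-productivity distribution

def truncated_quantile(u: float, s: float) -> float:
    """Quantile u after the least productive share s of firms exits (selection)."""
    return SMALL.inv_cdf(s + (1.0 - s) * u)

def shifted_dilated_quantile(u: float, a: float, d: float) -> float:
    """Quantile u after shifting by a and dilating by d (agglomeration)."""
    return a + d * SMALL.inv_cdf(u)

for u in (0.1, 0.5, 0.9):
    base = SMALL.inv_cdf(u)
    sel_gap = truncated_quantile(u, s=0.2) - base
    agg_gap = shifted_dilated_quantile(u, a=0.1, d=1.2) - base
    print(f"u={u}: selection gap {sel_gap:.3f}, agglomeration gap {agg_gap:.3f}")
```

The two distortions leave opposite fingerprints. The selection gap is largest at low quantiles and shrinks towards the top. With d > 1, the agglomeration gap grows with u.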

Journal Article

Genome-wide analysis of the world's sheep breeds reveals high levels of historic mixture and strong recent selection.

James W. Kijas¹, Johannes A. Lenstra², Ben J. Hayes, Simon Boitard³, Laercio R. Porto Neto¹, Magali San Cristobal³, Bertrand Servin³, Russell McCulloch¹, Vicki Whan¹, Kimberly Gietzen⁴, Samuel Rezende Paiva⁵, William Barendse¹, Elena Ciani⁶, Herman W. Raadsma⁷, John C. McEwan⁸, Brian P. Dalrymple¹

Commonwealth Scientific and Industrial Research Organisation¹, Utrecht University², Institut national de la recherche agronomique³, Illumina⁴, Empresa Brasileira de Pesquisa Agropecuária⁵, University of Bari⁶, University of Sydney⁷, AgResearch⁸

07 Feb 2012-PLOS Biology

TL;DR: Genomic structure in a global collection of domesticated sheep reveals a history of artificial selection for horn loss and traits relating to pigmentation, reproduction, and body size.

Abstract: Through their domestication and subsequent selection, sheep have been adapted to thrive in a diverse range of environments. To characterise the genetic consequence of both domestication and selection, we genotyped 49,034 SNPs in 2,819 animals from a diverse collection of 74 sheep breeds. We find the majority of sheep populations contain high SNP diversity and have retained an effective population size much higher than most cattle or dog breeds, suggesting domestication occurred from a broad genetic base. Extensive haplotype sharing and generally low divergence time between breeds reveal frequent genetic exchange has occurred during the development of modern breeds. A scan of the genome for selection signals revealed 31 regions containing genes for coat pigmentation, skeletal morphology, body size, growth, and reproduction. We demonstrate the strongest selection signal has occurred in response to breeding for the absence of horns. The high density map of genetic variability provides an in-depth view of the genetic history for this important livestock species.

684 citations

Journal Article

Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study

Salvador García¹, Joaquín Derrac², José Ramón Cano, Francisco Herrera²

University of Jaén¹, University of Granada²

01 Mar 2012-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A taxonomy based on the main characteristics presented in prototype selection is proposed and an experimental study involving different sizes of data sets is conducted for measuring their performance in terms of accuracy, reduction capabilities, and runtime.

Abstract: The nearest neighbor classifier is one of the most used and well-known techniques for performing recognition tasks. It has also demonstrated itself to be one of the most useful algorithms in data mining in spite of its simplicity. However, the nearest neighbor classifier suffers from several drawbacks such as high storage requirements, low efficiency in classification response, and low noise tolerance. These weaknesses have been the subject of study for many researchers and many solutions have been proposed. Among them, one of the most promising solutions consists of reducing the data used for establishing a classification rule (training data) by means of selecting relevant prototypes. Many prototype selection methods exist in the literature and the research in this area is still advancing. Different properties could be observed in the definition of them, but no formal categorization has been established yet. This paper provides a survey of the prototype selection methods proposed in the literature from a theoretical and empirical point of view. Considering a theoretical point of view, we propose a taxonomy based on the main characteristics presented in prototype selection and we analyze their advantages and drawbacks. Empirically, we conduct an experimental study involving different sizes of data sets for measuring their performance in terms of accuracy, reduction capabilities, and runtime. The results obtained by all the methods studied have been verified by nonparametric statistical tests. Several remarks, guidelines, and recommendations are made for the use of prototype selection for nearest neighbor classification.

654 citations
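
Condensation methods are one family of prototype selection methods. The sketch below implements Hart's classic condensed nearest neighbour (CNN) rule as a generic example of the idea; it is not one of the paper's specific recommendations. Euclidean distance and the toy two-cluster data are assumptions.

```python
import math

def nearest_label(x, store):
    """Label of the stored prototype closest to x (Euclidean distance)."""
    return min(store, key=lambda p: math.dist(x, p[0]))[1]

def condensed_nn(data):
    """Hart's condensed nearest neighbour rule.

    data: list of (point, label) pairs, with points given as tuples of floats.
    Points that the current store misclassifies are added to it; passes over
    the data repeat until a full pass adds nothing.
    """
    store = [data[0]]
    changed = True
    while changed:
        changed = False
        for point, label in data:
            # The membership check avoids looping forever on conflicting duplicates.
            if nearest_label(point, store) != label and (point, label) not in store:
                store.append((point, label))
                changed = True
    return store

data = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
        ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.8, 5.2), "b")]
store = condensed_nn(data)
print(len(store), "of", len(data), "points kept")  # 2 of 6 points kept
```

CNN's result depends on the order in which the data are visited, so a different ordering can keep a different subset. Sensitivity of this kind is one of the trade-offs that empirical comparisons of prototype selection methods measure.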

Journal Article

Roulette-wheel selection via stochastic acceptance

Adam Lipowski¹, Dorota Lipowska¹

Adam Mickiewicz University in Poznań¹

15 Mar 2012-Physica A-statistical Mechanics and Its Applications

TL;DR: A simple roulette-wheel selection algorithm is presented, which typically has O(1) complexity and is based on stochastic acceptance instead of searching; a hybrid version, which might be suitable for highly heterogeneous weight distributions found, for example, in some models of complex networks, is also discussed.

Abstract: Roulette-wheel selection is a frequently used method in genetic and evolutionary algorithms or in modeling of complex networks. Existing routines select one of N individuals using search algorithms of O(N) or O(log N) complexity. We present a simple roulette-wheel selection algorithm, which typically has O(1) complexity and is based on stochastic acceptance instead of searching. We also discuss a hybrid version, which might be suitable for highly heterogeneous weight distributions, found, for example, in some models of complex networks. With minor modifications, the algorithm might also be used for sampling with fitness cut-off at a certain value or for sampling without replacement.

579 citations
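
The stochastic-acceptance idea fits in a few lines. Draw an index uniformly, accept it with probability w_i / w_max, and retry otherwise. A single trial selects i with probability w_i / (N·w_max). Conditioned on acceptance, the choice is therefore proportional to w_i, and the expected number of trials is w_max / mean(w). The sketch assumes non-negative weights whose maximum is known; it does not show the hybrid variant mentioned in the abstract.

```python
import random

def roulette_stochastic_acceptance(weights, rng=random):
    """Return index i with probability weights[i] / sum(weights).

    Assumes non-negative weights with at least one positive entry.
    """
    w_max = max(weights)
    if w_max <= 0:
        raise ValueError("at least one weight must be positive")
    n = len(weights)
    while True:
        i = rng.randrange(n)                   # uniform candidate
        if rng.random() < weights[i] / w_max:  # accept with prob w_i / w_max
            return i

rng = random.Random(42)
weights = [1.0, 2.0, 3.0]
counts = [0, 0, 0]
for _ in range(60000):
    counts[roulette_stochastic_acceptance(weights, rng)] += 1
print([c / 60000 for c in counts])  # roughly [0.167, 0.333, 0.5]
```

No cumulative-sum array or search is needed, which is where the O(1) typical cost comes from. If a single weight dominates, the acceptance rate falls and the expected number of retries grows.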

Journal Article

An Adaptive Differential Evolution Algorithm With Novel Mutation and Crossover Strategies for Global Numerical Optimization

Sk. Minhazul Islam¹, Swagatam Das, Saurav Ghosh², Subhrajit Roy¹, Ponnuthurai Nagaratnam Suganthan²

Jadavpur University¹, Nanyang Technological University²

01 Apr 2012

TL;DR: A new mutation strategy, a fitness-induced parent selection scheme for the binomial crossover of DE, and a simple but effective scheme of adapting two of its most important control parameters with an objective of achieving improved performance are proposed.

Abstract: Differential evolution (DE) is one of the most powerful stochastic real-parameter optimizers of current interest. In this paper, we propose a new mutation strategy, a fitness-induced parent selection scheme for the binomial crossover of DE, and a simple but effective scheme of adapting two of its most important control parameters with an objective of achieving improved performance. The new mutation operator, which we call DE/current-to-gr_best/1, is a variant of the classical DE/current-to-best/1 scheme. It uses the best of a group (whose size is q% of the population size) of randomly selected solutions from the current generation to perturb the parent (target) vector, unlike DE/current-to-best/1, which always picks the best vector of the entire population to perturb the target vector. In our modified framework of recombination, a biased parent selection scheme has been incorporated by letting each mutant undergo the usual binomial crossover with one of the p top-ranked individuals from the current population and not with the target vector with the same index as used in all variants of DE. A DE variant obtained by integrating the proposed mutation, crossover, and parameter adaptation strategies with the classical DE framework (developed in 1995) is compared with two classical and four state-of-the-art adaptive DE variants over 25 standard numerical benchmarks taken from the IEEE Congress on Evolutionary Computation 2005 competition and special session on real parameter optimization. Our comparative study indicates that the proposed schemes improve the performance of DE by a large magnitude such that it becomes capable of enjoying statistical superiority over the state-of-the-art DE variants for a wide variety of test problems. Finally, we experimentally demonstrate that, if one or more of our proposed strategies are integrated with existing powerful DE variants such as jDE and JADE, their performances can also be enhanced.

566 citations
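
Following the abstract's description, the DE/current-to-gr_best/1 mutant for a target vector x_i can be written as v_i = x_i + F·(x_grbest − x_i) + F·(x_r1 − x_r2). Here x_grbest is the best member of a random group containing q% of the population, and r1, r2 are distinct random indices different from i. The sketch makes several assumptions: minimization, a list-of-lists population, a single scale factor F for both difference terms, and the illustrative values F = 0.5 and q = 0.15. It omits the paper's parameter adaptation and its crossover with one of the p top-ranked individuals.

```python
import random

def current_to_gr_best_1(pop, fitness, i, F=0.5, q=0.15, rng=random):
    """Mutant vector for target index i under DE/current-to-gr_best/1.

    pop: list of equal-length lists of floats; fitness: objective value of
    each member (lower is better). Requires at least three members.
    """
    size = len(pop)
    group_size = max(1, round(q * size))
    group = rng.sample(range(size), group_size)      # random q% group
    gr_best = min(group, key=lambda j: fitness[j])   # best member of that group
    r1, r2 = rng.sample([j for j in range(size) if j != i], 2)
    x = pop[i]
    return [x[d] + F * (pop[gr_best][d] - x[d]) + F * (pop[r1][d] - pop[r2][d])
            for d in range(len(x))]

def sphere(v):
    """Simple test objective: sum of squares."""
    return sum(c * c for c in v)

rng = random.Random(0)
pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(20)]
fitness = [sphere(p) for p in pop]
mutant = current_to_gr_best_1(pop, fitness, i=0, rng=rng)
print(len(mutant))  # 3
```

Setting q = 1.0 makes the group the whole population, which recovers ordinary DE/current-to-best/1. Smaller q values pull the target towards a less elite member, which plausibly reduces the risk of premature convergence.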

Journal Article

Modeling stabilizing selection: expanding the Ornstein-Uhlenbeck model of adaptive evolution.

Jeremy M. Beaulieu¹, Dwueng-Chwuan Jhwueng²,³, Carl Boettiger⁴, Brian C. O'Meara⁵

Yale University¹, Feng Chia University², National Institute for Mathematical and Biological Synthesis³, University of California, Davis⁴, University of Tennessee⁵

01 Aug 2012-Evolution

TL;DR: The OU model of adaptive evolution is expanded to include models that variously relax the assumption of a constant rate and strength of selection, and can assign each selective regime a separate trait optimum, a rate of stochastic motion parameter, and a parameter for the strength of selection.

Abstract: Comparative methods used to study patterns of evolutionary change in a continuous trait on a phylogeny range from Brownian motion processes to models where the trait is assumed to evolve according to an Ornstein-Uhlenbeck (OU) process. Although these models have proved useful in a variety of contexts, they still do not cover all the scenarios biologists want to examine. For models based on the OU process, model complexity is restricted in current implementations by assuming that the rate of stochastic motion and the strength of selection do not vary among selective regimes. Here, we expand the OU model of adaptive evolution to include models that variously relax the assumption of a constant rate and strength of selection. In its most general form, the methods described here can assign each selective regime a separate trait optimum, a rate of stochastic motion parameter, and a parameter for the strength of selection. We use simulations to show that our models can detect meaningful differences in the evolutionary process, especially with larger sample sizes. We also illustrate our method using an empirical example of genome size evolution within a large flowering plant clade.

546 citations
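
To make the regime-specific parameters concrete, the sketch below simulates the OU process dX = α_r(θ_r − X)dt + σ_r dW along a single lineage using the Euler–Maruyama scheme. Each regime r carries its own optimum θ, strength of selection α and rate σ, as in the most general model described above. This is only a forward simulation under assumed parameter values. Fitting these parameters to trait data on a phylogeny, which the paper's methods do, is not shown.

```python
import math
import random

def simulate_ou(x0, regimes, schedule, dt=0.01, rng=random):
    """Euler-Maruyama path of dX = alpha_r (theta_r - X) dt + sigma_r dW.

    regimes: dict mapping a regime name to (theta, alpha, sigma).
    schedule: list of (regime name, duration) segments visited in order.
    Returns the list of simulated values, starting with x0.
    """
    x = x0
    path = [x]
    for name, duration in schedule:
        theta, alpha, sigma = regimes[name]
        for _ in range(round(duration / dt)):
            drift = alpha * (theta - x) * dt                     # pull toward the optimum
            noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)  # stochastic motion
            x += drift + noise
            path.append(x)
    return path

# Hypothetical regimes: "A" has a weak pull to 1.0; "B" a strong pull to -2.0.
regimes = {"A": (1.0, 0.5, 0.3), "B": (-2.0, 3.0, 0.1)}
path = simulate_ou(0.0, regimes, [("A", 5.0), ("B", 5.0)], rng=random.Random(1))
print(len(path), round(path[-1], 2))
```

Giving each regime its own α and σ is what separates these models from the constant-rate OU model. With α = 0 in every regime, the process reduces to Brownian motion.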

Journal Article

A sensorimotor paradigm for Bayesian model selection

Tim Genewein¹, Daniel Braun¹

Max Planck Society¹

19 Oct 2012-Frontiers in Human Neuroscience

TL;DR: This work designs a sensorimotor task that requires subjects to compensate visuomotor shifts in a three-dimensional virtual reality setup and finds that model selection procedures based on Bayesian statistics provided a better explanation for subjects' choice behavior than simple non-probabilistic heuristics.

Abstract: Sensorimotor control is thought to rely on predictive internal models in order to cope efficiently with uncertain environments. Recently, it has been shown that humans not only learn different internal models for different tasks, but that they also extract common structure between tasks. This raises the question of how the motor system selects between different structures or models, when each model can be associated with a range of different task-specific parameters. Here we design a sensorimotor task that requires subjects to compensate visuomotor shifts in a three-dimensional virtual reality setup, where one of the dimensions can be mapped to a model variable and the other dimension to the parameter variable. By introducing probe trials that are neutral in the parameter dimension, we can directly test for model selection. We found that model selection procedures based on Bayesian statistics provided a better explanation for subjects’ choice behavior than simple non-probabilistic heuristics. Our experimental design lends itself to the general study of model selection in a sensorimotor context as it allows model and parameter variables to be queried separately from subjects.

Journal Article

The evolution of female ornaments and weaponry: social selection, sexual selection and ecological competition.

Joe Tobias¹, Robert Montgomerie², Bruce E. Lyon³

University of Oxford¹, Queen's University², University of California, Santa Cruz³

19 Aug 2012-Philosophical Transactions of the Royal Society B

TL;DR: It is shown that selection often falls outside the limits of traditional sexual selection theory, particularly in females, and it is concluded that the evolution of these traits in both sexes is best understood within the unifying framework of social selection.

Abstract: Ornaments, weapons and aggressive behaviours may evolve in female animals by mate choice and intrasexual competition for mating opportunities—the standard forms of sexual selection in males. However, a growing body of evidence suggests that selection tends to operate in different ways in males and females, with female traits more often mediating competition for ecological resources, rather than mate acquisition. Two main solutions have been proposed to accommodate this disparity. One is to expand the concept of sexual selection to include all mechanisms related to fecundity; another is to adopt an alternative conceptual framework—the theory of social selection—in which sexual selection is one component of a more general form of selection resulting from all social interactions. In this study, we summarize the history of the debate about female ornaments and weapons, and discuss potential resolutions. We review the components of fitness driving ornamentation in a wide range of systems, and show that selection often falls outside the limits of traditional sexual selection theory, particularly in females. We conclude that the evolution of these traits in both sexes is best understood within the unifying framework of social selection.

Journal Article

Buffer-Aided Relay Selection for Cooperative Diversity Systems without Delay Constraints

Ioannis Krikidis¹, Themistoklis Charalambous, John Thompson²

University of Patras¹, University of Edinburgh²

05 Apr 2012-IEEE Transactions on Wireless Communications

TL;DR: It is shown that the proposed relay selection scheme significantly outperforms conventional relay selection policies for all cases and ensures a diversity gain equal to two times the number of relays for large buffer sizes.

Abstract: In this paper, we study the relay selection problem for a finite buffer-aided decode-and-forward cooperative wireless network. A relay selection policy that fully exploits the flexibility offered by the buffering ability of the relay nodes in order to maximize the achieved diversity gain is investigated. This new scheme incorporates the instantaneous strength of the wireless links as well as the status of the finite relay buffers and adapts the relay selection decision on the strongest available link by dynamically switching between relay reception and transmission. In order to analyse the new relay selection policy in terms of outage probability and diversity gain, a theoretical framework that models the evolution of the relay buffers as a Markov chain (MC) is introduced. The construction of the state transition matrix and the related steady state of the MC are studied and their impact on the derivation of the outage probability is investigated. We show that the proposed relay selection scheme significantly outperforms conventional relay selection policies for all cases and ensures a diversity gain equal to two times the number of relays for large buffer sizes.
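
The abstract's selection rule picks the strongest available link while respecting the relays' buffer states. A source-to-relay link is eligible only if that relay's buffer is not full, and a relay-to-destination link only if that relay's buffer is not empty. The sketch below illustrates the rule. The gain values, the one-packet-per-slot assumption and the function names are hypothetical. The paper's Markov-chain outage analysis is not reproduced.

```python
def select_link(h_sr, h_rd, buffers, capacity):
    """Pick the strongest eligible link.

    h_sr[k], h_rd[k]: instantaneous gains of source->relay k and relay k->destination.
    buffers[k]: packets queued at relay k (0..capacity).
    Returns ("receive", k), ("transmit", k), or None if no link is eligible.
    """
    candidates = []
    for k, (g_sr, g_rd) in enumerate(zip(h_sr, h_rd)):
        if buffers[k] < capacity:   # relay k can still accept a packet
            candidates.append((g_sr, "receive", k))
        if buffers[k] > 0:          # relay k has a packet to forward
            candidates.append((g_rd, "transmit", k))
    if not candidates:
        return None
    _, action, k = max(candidates)
    return action, k

def update_buffers(choice, buffers):
    """Update buffer occupancy after one slot (one packet per slot assumed)."""
    action, k = choice
    buffers[k] += 1 if action == "receive" else -1

buffers = [0, 3]
choice = select_link([0.2, 1.5], [2.0, 0.1], buffers, capacity=3)
print(choice)   # ('receive', 0)
update_buffers(choice, buffers)
print(buffers)  # [1, 3]
```

In this example the strongest gain (2.0, relay 0 to destination) is unusable because relay 0's buffer is empty. Relay 1's strong source link is also unusable because its buffer is full. The buffers therefore decide which links compete.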

Journal Article

TIGRESS: Trustful Inference of Gene REgulation using Stability Selection

Anne-Claire Haury¹, Fantine Mordelet², Paola Vera-Licona¹,³,⁴, Jean-Philippe Vert¹,³,⁴

Mines ParisTech¹, Duke University², Curie Institute³, French Institute of Health and Medical Research⁴

22 Nov 2012-BMC Systems Biology

TL;DR: A novel, robust and accurate scoring technique for stability selection, which improves the performance of feature selection with LARS, is introduced; the resulting method was ranked among the top GRN inference methods in the DREAM5 gene network inference challenge and was evaluated to be the best linear regression-based method in the challenge.

Abstract: Background Inferring the structure of gene regulatory networks (GRN) from a collection of gene expression data has many potential applications, from the elucidation of complex biological processes to the identification of potential drug targets. It is however a notoriously difficult problem, for which the many existing methods reach limited accuracy.

Journal Article

Accuracy of Genomic Selection Methods in a Standard Data Set of Loblolly Pine (Pinus taeda L.)

Marcio F. R. Resende¹, Patricio R. Munoz¹, Marcos Deon Vilela de Resende²,³, Dorian J. Garrick⁴, Rohan L. Fernando⁴, John M. Davis¹, Eric J. Jokela¹, Timothy A. Martin¹, Gary F. Peter¹, Matias Kirst¹

University of Florida¹, Universidade Federal de Viçosa², Empresa Brasileira de Pesquisa Agropecuária³, Iowa State University⁴

01 Apr 2012-Genetics

TL;DR: Four different original methods of genomic selection that differ with respect to assumptions regarding the distribution of marker effects are presented: (i) ridge regression–best linear unbiased prediction (RR–BLUP), (ii) Bayes A, (iii) Bayes Cπ, and (iv) Bayesian LASSO; the results suggest that alternative approaches to genomic selection prediction models may perform differently for traits with distinct genetic properties.

Abstract: Genomic selection can increase genetic gain per generation through early selection. Genomic selection is expected to be particularly valuable for traits that are costly to phenotype and expressed late in the life cycle of long-lived species. Alternative approaches to genomic selection prediction models may perform differently for traits with distinct genetic properties. Here the performance of four different original methods of genomic selection that differ with respect to assumptions regarding distribution of marker effects, including (i) ridge regression–best linear unbiased prediction (RR–BLUP), (ii) Bayes A, (iii) Bayes Cπ, and (iv) Bayesian LASSO, is presented. In addition, a modified RR–BLUP (RR–BLUP B) that utilizes a selected subset of markers was evaluated. The accuracy of these methods was compared across 17 traits with distinct heritabilities and genetic architectures, including growth, development, and disease-resistance properties, measured in a Pinus taeda (loblolly pine) training population of 951 individuals genotyped with 4853 SNPs. The predictive ability of the methods was evaluated using a 10-fold cross-validation approach, and differed only marginally for most method/trait combinations. Interestingly, for fusiform rust disease-resistance traits, Bayes Cπ, Bayes A, and RR–BLUP B had higher predictive ability than RR–BLUP and Bayesian LASSO. Fusiform rust is controlled by few genes of large effect. A limitation of RR–BLUP is the assumption of equal contribution of all markers to the observed variation. However, RR–BLUP B performed as well as the Bayesian approaches. The genotypic and phenotypic data used in this study are publicly available for comparative analysis of genomic selection prediction models.
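
RR–BLUP can be viewed as ridge regression that shrinks every marker effect with the same penalty. That is why it implicitly assumes all markers contribute comparably, the limitation noted above for few-gene traits such as fusiform rust resistance. The sketch below follows that view, using NumPy with a user-supplied shrinkage λ (commonly σ²_e / σ²_β; here simply assumed) and a hypothetical −1/0/1 genotype coding. It does not show variance-component estimation, the Bayesian methods or RR–BLUP B's marker subset.

```python
import numpy as np

def rr_blup(X, y, lam):
    """Ridge estimates of marker effects with an unpenalized intercept.

    Solves (Xc' Xc + lam I) b = Xc' (y - mean(y)), where Xc is X with its
    column means removed; returns (intercept, b).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y.mean()))
    intercept = y.mean() - x_mean @ b
    return intercept, b

def predict(X_new, intercept, b):
    """Genomic estimated values for new genotypes."""
    return intercept + np.asarray(X_new, dtype=float) @ b

# Toy data: 5 individuals, 2 markers, phenotype = 0.5 + 2*m1 - 1*m2 (no noise).
X = [[1, 0], [0, 1], [1, 1], [-1, 0], [0, -1]]
y = [0.5 + 2 * m1 - m2 for m1, m2 in X]
intercept, b = rr_blup(X, y, lam=1e-8)
print(np.round(b, 3), round(float(intercept), 3))
```

With λ near zero, the estimates recover the noiseless effects. Increasing λ shrinks all marker effects towards zero together, which is the equal-contribution assumption in action.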

Journal Article

Full-Duplex Relay Selection for Amplify-and-Forward Cooperative Networks

Ioannis Krikidis¹, Himal A. Suraweera², Peter J. Smith³, Chau Yuen²

University of Cyprus¹, Singapore University of Technology and Design², University of Canterbury³

25 Oct 2012-IEEE Transactions on Wireless Communications

TL;DR: An optimal relay selection procedure that incorporates a hybrid relaying strategy, which dynamically switches between FD and half-duplex relaying according to the instantaneous CSI, is investigated.

Abstract: This paper focuses on the relay selection problem in amplify-and-forward (AF) cooperative communication with full-duplex (FD) operation. Different relay selection schemes assuming the availability of different instantaneous information are studied. We consider optimal relay selection that maximizes the instantaneous FD channel capacity and requires global channel state information (CSI), as well as several sub-optimal relay selection policies that utilize partial CSI knowledge, such as (a) source-relay and relay-destination links, (b) loop interference, and (c) source-relay links and loop interference. To facilitate comparison, exact outage probability expressions and asymptotic approximations of these policies that show a zero diversity order are derived. In addition, an optimal relay selection procedure that incorporates a hybrid relaying strategy, which dynamically switches between FD and half-duplex relaying according to the instantaneous CSI, is also investigated.

Journal Article

An STEEP-fuzzy AHP-TOPSIS framework for evaluation and selection of thermal power plant location: A case study from India

Devendra Kumar Choudhary¹, Ravi Shankar¹

Indian Institute of Technology Delhi¹

01 Jun 2012-Energy

TL;DR: In this paper, an STEEP-fuzzy AHP-TOPSIS based framework for evaluation and selection of optimal locations for thermal power plants (TPPs) is proposed, where potential feasible locations are identified based on social, technical, economical, environmental, and political (STEEP) considerations.
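
The TOPSIS step ranks alternatives by their relative closeness to an ideal solution and distance from an anti-ideal one. The sketch below is plain (crisp) TOPSIS with vector normalization. It assumes the criterion weights have already been obtained, for example from a fuzzy AHP step that is not shown. The candidate sites, criteria and numbers are all hypothetical.

```python
import math

def topsis(matrix, weights, benefit):
    """Relative closeness of each alternative (row) to the ideal; higher is better.

    matrix[i][j]: value of alternative i on criterion j (no all-zero columns).
    weights[j]: criterion weight; benefit[j]: True if larger values are better.
    """
    m = len(weights)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(m)]
    v = [[weights[j] * row[j] / norms[j] for j in range(m)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_plus, d_minus = math.dist(row, ideal), math.dist(row, anti)
        total = d_plus + d_minus
        scores.append(d_minus / total if total else 0.5)  # identical rows: neutral score
    return scores

# Hypothetical sites scored on cost (lower better), water availability
# (higher better) and distance to coal supply (lower better).
sites = [[100, 8, 20], [120, 6, 35], [90, 9, 15]]
scores = topsis(sites, weights=[0.5, 0.3, 0.2], benefit=[False, True, False])
print([round(s, 3) for s in scores])  # third site scores 1.0, second 0.0
```

The third site is best on every criterion, so it coincides with the ideal point and scores 1.0. The second site coincides with the anti-ideal point and scores 0.0. In realistic data, trade-offs between criteria place most sites in between.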

Journal Article

Evidence of widespread selection on standing variation in Europe at height-associated SNPs

Michael C. Turchin¹, Charleston W. K. Chiang, Cameron D. Palmer, Sriram Sankararaman²,³, David Reich²,³, Joel N. Hirschhorn

Boston Children's Hospital¹, Broad Institute², Harvard University³

01 Sep 2012-Nature Genetics

TL;DR: By studying height, a classic polygenic trait, this work demonstrates the first human signature of widespread selection on standing variation, and shows that frequencies of alleles associated with increased height are systematically elevated in Northern Europeans compared with Southern Europeans.

Abstract: Strong signatures of positive selection at newly arising genetic variants are well documented in humans [1–8], but this form of selection may not be widespread in recent human evolution [9]. Because many human traits are highly polygenic and partly determined by common, ancient genetic variation, an alternative model for rapid genetic adaptation has been proposed: weak selection acting on many pre-existing (standing) genetic variants, or polygenic adaptation [10–12]. By studying height, a classic polygenic trait, we demonstrate the first human signature of widespread selection on standing variation. We show that frequencies of alleles associated with increased height, both at known loci and genome wide, are systematically elevated in Northern Europeans compared with Southern Europeans (P < 4.3 × 10⁻⁴). This pattern mirrors intra-European height differences and is not confounded by ancestry or other ascertainment biases. The systematic frequency differences are consistent with the presence of widespread weak selection (selection coefficients ~10⁻³ to 10⁻⁵ per allele) rather than genetic drift alone (P < 10⁻¹⁵).

Proceedings Article•

PAC Subset Selection in Stochastic Multi-armed Bandits

Shivaram Kalyanakrishnan¹, Ambuj Tewari², Peter Auer³, Peter Stone²•Institutions (3)

Yahoo!¹, University of Texas at Austin², Information Technology University³

26 Jun 2012

TL;DR: The expected sample complexity bound for LUCB is novel even for single-arm selection, and a lower bound on the worst case sample complexity of PAC algorithms for Explore-m is given.

Abstract: We consider the problem of selecting, from among the arms of a stochastic n-armed bandit, a subset of size m of those arms with the highest expected rewards, based on efficiently sampling the arms. This "subset selection" problem finds application in a variety of areas. In the authors' previous work (Kalyanakrishnan & Stone, 2010), this problem is framed under a PAC setting (denoted "Explore-m"), and corresponding sampling algorithms are analyzed. Whereas the formal analysis therein is restricted to the worst case sample complexity of algorithms, in this paper, we design and analyze an algorithm ("LUCB") with improved expected sample complexity. Interestingly, LUCB bears a close resemblance to the well-known UCB algorithm for regret minimization. The expected sample complexity bound we show for LUCB is novel even for single-arm selection (Explore-1). We also give a lower bound on the worst case sample complexity of PAC algorithms for Explore-m.
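
LUCB's core loop can be sketched as follows. The sketch assumes Bernoulli arms, the LUCB1-style confidence radius β(u, t) = sqrt(ln(k₁·n·t⁴/δ) / (2u)) with k₁ = 5/4, and the stopping rule that the weakest-looking arm in the empirical top-m set and the strongest-looking arm outside it are separated by less than ε. The names `lucb` and `max_rounds` are hypothetical; treat this as an illustration rather than the authors' reference code.

```python
import math
import random


def lucb(means, m, eps=0.1, delta=0.1, seed=0, max_rounds=100_000):
    """Return an (eps, delta)-PAC guess of the m best Bernoulli arms and the samples used."""
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n
    sums = [0.0] * n

    def pull(a):
        counts[a] += 1
        sums[a] += 1.0 if rng.random() < means[a] else 0.0

    for a in range(n):  # sample every arm once to initialize
        pull(a)
    k1 = 5 / 4
    for t in range(1, max_rounds + 1):
        mu = [sums[a] / counts[a] for a in range(n)]
        # Assumed LUCB1 confidence radius; shrinks as an arm is sampled more.
        beta = [math.sqrt(math.log(k1 * n * t**4 / delta) / (2 * counts[a])) for a in range(n)]
        order = sorted(range(n), key=lambda a: mu[a], reverse=True)
        high, low = order[:m], order[m:]
        h = min(high, key=lambda a: mu[a] - beta[a])  # lowest lower bound inside top-m
        l = max(low, key=lambda a: mu[a] + beta[a])   # highest upper bound outside top-m
        if (mu[l] + beta[l]) - (mu[h] - beta[h]) < eps:
            return set(high), sum(counts)
        pull(h)  # sample only the two most ambiguous arms
        pull(l)
    raise RuntimeError("no decision within max_rounds")


best, used = lucb([0.9, 0.8, 0.2, 0.1], m=2)
print(best, used)
```

Sampling only the two boundary arms per round is what links LUCB to UCB: effort concentrates where the top-m/rest decision is still uncertain.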

Journal Article•DOI•

Likelihood-based selection and sharp parameter estimation.

Xiaotong Shen¹, Wei Pan¹, Yunzhang Zhu¹•Institutions (1)

University of Minnesota¹

31 Jan 2012-Journal of the American Statistical Association

TL;DR: Theoretically, it is shown that constrained L₀ likelihood and its computational surrogate are optimal in that they achieve feature selection consistency and sharp parameter estimation, under one necessary condition required for any method to be selection consistent and to achieve sharp parameter estimation.

Abstract: In high-dimensional data analysis, feature selection is an effective means of dimension reduction, which proceeds with parameter estimation. Concerning accuracy of selection and estimation, we study nonconvex constrained and regularized likelihoods in the presence of nuisance parameters. Theoretically, we show that constrained L₀ likelihood and its computational surrogate are optimal in that they achieve feature selection consistency and sharp parameter estimation, under one necessary condition required for any method to be selection consistent and to achieve sharp parameter estimation. It permits up to exponentially many candidate features. Computationally, we develop difference convex methods to implement the computational surrogate through primal and dual subproblems. These results establish a central role of L₀ constrained and regularized likelihoods in feature selection and parameter estimation involving selection. As applications of the general method and theory, we perform feature selection...
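
A minimal sketch of the difference-convex idea for a least-squares loss, assuming the computational surrogate of the L₀ constraint is the truncated L1 penalty λ·min(|β|/τ, 1) (the abstract does not name it). Writing min(|β|/τ, 1) = |β|/τ − max(|β|/τ − 1, 0) and linearizing the concave part at the current iterate turns each DC step into a lasso-type problem, solved here by coordinate descent. The function `tlp_dc` and its parameters are illustrative, not the authors' implementation.

```python
import numpy as np


def tlp_dc(X, y, lam, tau, n_dc=10, n_cd=100):
    """Least squares + lam * sum(min(|b_j|/tau, 1)) via difference-convex iterations."""
    n, p = X.shape
    a = (X ** 2).sum(axis=0) / n   # per-coordinate curvature of the loss
    w = lam / tau                  # L1 weight of the convex part |b|/tau
    b = np.zeros(p)
    for _ in range(n_dc):
        # Linearize the concave part -max(|b|/tau - 1, 0) at the current iterate:
        # coefficients already above tau get a linear term that offsets their
        # L1 penalty (exactly, as long as their sign does not change).
        lin = np.where(np.abs(b) > tau, w * np.sign(b), 0.0)
        for _ in range(n_cd):      # coordinate descent on the convex subproblem
            for j in range(p):
                r_j = y - X @ b + X[:, j] * b[j]                 # partial residual
                c = X[:, j] @ r_j / n + lin[j]
                b[j] = np.sign(c) * max(abs(c) - w, 0.0) / a[j]  # soft-threshold
    return b


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
true_b = np.array([3.0, -2.0] + [0.0] * 8)
y = X @ true_b + 0.1 * rng.standard_normal(100)
b_hat = tlp_dc(X, y, lam=0.1, tau=0.5)
print(np.round(b_hat, 2))
```

After the first DC step the large coefficients exceed τ and are effectively left unpenalized, which is the intuition behind "sharp" estimation: no lasso-style shrinkage bias on the selected features.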

Journal Article•DOI•

Maximizing the Reliability of Genomic Selection by Optimizing the Calibration Set of Reference Individuals: Comparison of Methods in Two Diverse Groups of Maize Inbreds ( Zea mays L.)

Renaud Rincent, Denis Laloë¹, Stéphane Nicolas¹, Thomas Altmann², Dominique Brunel¹, Pedro Revilla³, Víctor M. Rodríguez³, Jesús Moreno-González, Albrecht E. Melchinger⁴, Eva Bauer⁵, C-C. Schoen⁵, Nina Meyer, Catherine Giauffret¹, Cyril Bauland¹, Philippe Jamin¹, Jacques Laborde¹, Hervé Monod¹, Pascal Flament⁶, Alain Charcosset¹, Laurence Moreau¹•Institutions (6)

Institut national de la recherche agronomique¹, Max Planck Society², Spanish National Research Council³, University of Hohenheim⁴, Technische Universität München⁵, Groupe Limagrain⁶

01 Oct 2012-Genetics

TL;DR: In this article, different criteria based on the diversity or on the prediction error variance (PEV) derived from the realized additive relationship matrix (RA-BLUP) were used to select the reference individuals.

Abstract: Genomic selection refers to the use of genotypic information for predicting breeding values of selection candidates. A prediction formula is calibrated with the genotypes and phenotypes of reference individuals constituting the calibration set. The size and the composition of this set are essential parameters affecting the prediction reliabilities. The objective of this study was to maximize reliabilities by optimizing the calibration set. Different criteria based on the diversity or on the prediction error variance (PEV) derived from the realized additive relationship matrix–best linear unbiased predictions model (RA–BLUP) were used to select the reference individuals. For the latter, we considered the mean of the PEV of the contrasts between each selection candidate and the mean of the population (PEVmean) and the mean of the expected reliabilities of the same contrasts (CDmean). These criteria were tested with phenotypic data collected on two diversity panels of maize (Zea mays L.) genotyped with a 50k SNP array. In both panels, samples chosen based on CDmean gave higher reliabilities than random samples for various calibration set sizes. CDmean also appeared superior to PEVmean, which can be explained by the fact that it takes into account the reduction of variance due to the relatedness between individuals. Selected samples were close to optimality for a wide range of trait heritabilities, which suggests that the strategy presented here can efficiently sample subsets in panels of inbred lines. A script to optimize reference samples based on CDmean is available on request.
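
The CDmean criterion can be sketched with NumPy, assuming an intercept as the only fixed effect, a known variance ratio λ = σ²_e/σ²_a, and a positive-definite relationship matrix G. For a contrast c, CD(c) = c′(G − λC₂₂)c / c′Gc, where C₂₂ = (Z′MZ + λG⁻¹)⁻¹ is the genetic block of the inverted mixed-model equations and M = I − X(X′X)⁻¹X′. Each contrast compares one candidate with the population mean. The function names `cd_values` and `cd_mean` are hypothetical, and this is not the authors' script.

```python
import numpy as np


def cd_values(G, calib, targets, lam):
    """CD of each contrast (target individual minus population mean).

    G: (N, N) positive-definite relationship matrix; calib: indices of phenotyped
    (calibration) individuals; targets: indices whose contrasts are scored;
    lam: sigma_e^2 / sigma_a^2. Fixed effects: intercept only (assumption).
    """
    N = G.shape[0]
    n = len(calib)
    Z = np.zeros((n, N))
    Z[np.arange(n), calib] = 1.0                        # one record per calibration individual
    X = np.ones((n, 1))
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # absorbs the intercept
    C22 = np.linalg.inv(Z.T @ M @ Z + lam * np.linalg.inv(G))
    C = np.full((N, len(targets)), -1.0 / N)            # column k: e_target - mean weights
    C[targets, np.arange(len(targets))] += 1.0
    num = np.einsum("ik,ij,jk->k", C, G - lam * C22, C)  # c' (G - lam*C22) c per column
    den = np.einsum("ik,ij,jk->k", C, G, C)              # c' G c per column
    return num / den


def cd_mean(G, calib, targets, lam):
    return float(cd_values(G, calib, targets, lam).mean())


rng = np.random.default_rng(1)
W = rng.integers(0, 3, size=(30, 200)).astype(float)  # hypothetical 0/1/2 genotypes
W -= W.mean(axis=0)
G = W @ W.T / W.shape[1] + 0.01 * np.eye(30)           # realized relationships, made invertible
targets = list(range(15, 30))
small = cd_mean(G, list(range(10)), targets, lam=1.0)
large = cd_mean(G, list(range(15)), targets, lam=1.0)
print(round(small, 3), round(large, 3))
```

An optimizer in the spirit of the paper would then swap individuals in and out of `calib` and keep changes that raise CDmean; that search loop is omitted here.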

Posted Content•

TIGRESS: Trustful Inference of Gene REgulation using Stability Selection

Mines ParisTech¹, Duke University², Curie Institute³, French Institute of Health and Medical Research⁴

06 May 2012-arXiv: Machine Learning

TL;DR: TIGRESS (Trustful Inference of Gene REgulation using Stability Selection) is a gene regulatory network inference method that combines least angle regression (LARS) with stability selection; it was ranked among the top methods in the DREAM5 challenge and reaches state-of-the-art performance on benchmark data.

Abstract: Inferring the structure of gene regulatory networks (GRN) from gene expression data has many applications, from the elucidation of complex biological processes to the identification of potential drug targets. It is however a notoriously difficult problem, for which the many existing methods reach limited accuracy. In this paper, we formulate GRN inference as a sparse regression problem and investigate the performance of a popular feature selection method, least angle regression (LARS) combined with stability selection. We introduce a novel, robust and accurate scoring technique for stability selection, which improves the performance of feature selection with LARS. The resulting method, which we call TIGRESS (Trustful Inference of Gene REgulation using Stability Selection), was ranked among the top methods in the DREAM5 gene network reconstruction challenge. We investigate in depth the influence of the various parameters of the method and show that a fine parameter tuning can lead to significant improvements and state-of-the-art performance for GRN inference. TIGRESS reaches state-of-the-art performance on benchmark data. This study confirms the potential of feature selection techniques for GRN inference. Code and data are available on this http URL. Running TIGRESS online is possible on GenePattern: this http URL.
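
The stability-selection scoring can be sketched as below. To keep the example dependency-free, greedy forward selection (orthogonal matching pursuit) stands in for LARS, so this is not TIGRESS itself. The resampling scheme (random half-splits with random column reweighting in [α, 1]) and the area-style score (selection frequency averaged over the first L steps) are assumptions modelled on the stability-selection idea, and all function names are illustrative.

```python
import numpy as np


def forward_path(X, y, L):
    """First L features picked by orthogonal matching pursuit (a stand-in for LARS)."""
    active = []
    r = y.copy()
    for _ in range(L):
        scores = np.abs(X.T @ r)
        if active:
            scores[active] = -np.inf          # never re-pick an active feature
        active.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        r = y - X[:, active] @ coef           # refit on the active set, update residual
    return active


def stability_scores(X, y, L=3, R=50, alpha=0.2, seed=0):
    """Area-style stability score per candidate regulator, in [0, 1]."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    hits = np.zeros((L, p))  # hits[k, j]: runs where j entered within the first k+1 steps
    runs = 0
    for _ in range(R):
        perm = rng.permutation(n)
        for half in (perm[: n // 2], perm[n // 2:]):
            weights = rng.uniform(alpha, 1.0, size=p)   # random feature reweighting
            order = forward_path(X[half] * weights, y[half], L)
            for k in range(L):
                hits[k, order[: k + 1]] += 1
            runs += 1
    return (hits / runs).mean(axis=0)  # average selection frequency over steps 1..L


rng = np.random.default_rng(0)
X = rng.standard_normal((60, 20))      # expression of 20 hypothetical regulators
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * rng.standard_normal(60)  # one target gene
scores = stability_scores(X, y)
print(np.argsort(scores)[::-1][:2])
```

In a network setting this would be repeated with each gene as the target, and the regulator-target scores ranked to produce candidate edges.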

Journal Article•DOI•

Opportunistic Spectrum Access in Unknown Dynamic Environment: A Game-Theoretic Stochastic Learning Solution

Yuhua Xu, Jinlong Wang, Qihui Wu, Alagan Anpalagan¹, Yu-Dong Yao²•Institutions (2)

Ryerson University¹, Stevens Institute of Technology²

13 Feb 2012-IEEE Transactions on Wireless Communications

TL;DR: This work proposes a stochastic learning automata (SLA) based channel selection algorithm, with which the secondary users learn from their individual action-reward history and adjust their behaviors towards a NE point, and investigates the achievable performance of the game in terms of system throughput and fairness.

Abstract: We investigate the problem of distributed channel selection using a game-theoretic stochastic learning solution in an opportunistic spectrum access (OSA) system where the channel availability statistics and the number of the secondary users are a priori unknown. We formulate the channel selection problem as a game which is proved to be an exact potential game. However, due to the lack of information about other users and the restriction that the spectrum is time-varying with unknown availability statistics, the task of achieving Nash equilibrium (NE) points of the game is challenging. Firstly, we propose a genie-aided algorithm to achieve the NE points under the assumption of perfect environment knowledge. Based on this, we investigate the achievable performance of the game in terms of system throughput and fairness. Then, we propose a stochastic learning automata (SLA) based channel selection algorithm, with which the secondary users learn from their individual action-reward history and adjust their behaviors towards a NE point. The proposed learning algorithm neither requires information exchange, nor needs prior information about the channel availability statistics and the number of secondary users. Simulation results show that the SLA based learning algorithm achieves high system throughput with good fairness.
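
The stochastic learning automaton idea can be illustrated with the classic linear reward-inaction update p ← p + b·r·(e_a − p), where a is the chosen channel, e_a its unit vector, and r ∈ [0, 1] the normalized reward. Each user updates only from its own action and reward, so no information exchange is needed. The reward model here (an idle channel's unit payoff shared equally among the users that chose it) and the function `lri_channel_selection` are assumptions for illustration, not the paper's exact utility.

```python
import random


def lri_channel_selection(n_users, idle_prob, b=0.05, slots=2000, seed=0):
    """Each user runs an independent linear reward-inaction automaton over channels."""
    rng = random.Random(seed)
    m = len(idle_prob)
    probs = [[1.0 / m] * m for _ in range(n_users)]      # mixed strategies
    for _ in range(slots):
        actions = [rng.choices(range(m), weights=p)[0] for p in probs]
        idle = [rng.random() < q for q in idle_prob]      # availability unknown to users
        load = [actions.count(c) for c in range(m)]
        for user, a in enumerate(actions):
            # Assumed reward: an idle channel's unit payoff split among its users.
            r = 1.0 / load[a] if idle[a] else 0.0
            p = probs[user]
            for c in range(m):
                p[c] += b * r * ((1.0 if c == a else 0.0) - p[c])
    return probs


# Three hypothetical secondary users, three channels with unknown idle probabilities.
final = lri_channel_selection(3, [0.3, 0.8, 0.5])
for p in final:
    print([round(x, 2) for x in p])
```

"Reward-inaction" means a zero reward leaves the strategy unchanged; the step size b trades convergence speed against the risk of locking onto a poor channel.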

Journal Article•DOI•

Bayesian Model Selection in High-Dimensional Settings

Valen E. Johnson¹, David Rossell•Institutions (1)

University of Texas MD Anderson Cancer Center¹

14 May 2012-Journal of the American Statistical Association

TL;DR: Modifications of Bayesian model selection methods by imposing nonlocal prior densities on model parameters are proposed and it is demonstrated that these model selection procedures perform as well as or better than commonly used penalized likelihood methods in a range of simulation settings.

Abstract: Standard assumptions incorporated into Bayesian model selection procedures result in procedures that are not competitive with commonly used penalized likelihood methods. We propose modifications of these methods by imposing nonlocal prior densities on model parameters. We show that the resulting model selection procedures are consistent in linear model settings when the number of possible covariates p is bounded by the number of observations n, a property that has not been extended to other model selection procedures. In addition to consistently identifying the true model, the proposed procedures provide accurate estimates of the posterior probability that each identified model is correct. Through simulation studies, we demonstrate that these model selection procedures perform as well as or better than commonly used penalized likelihood methods in a range of simulation settings. Proofs of the primary theorems are provided in the Supplementary Material that is available online.
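
A nonlocal prior is one whose density vanishes at the null value of the parameter, so it places little mass on coefficients that are effectively zero. One example from this line of work is the first-order product moment (pMOM) density, which in one dimension is π(θ) = θ²/(τσ²) · N(θ; 0, τσ²). The snippet evaluates it and checks numerically that it integrates to one. Which nonlocal prior the paper uses in each setting, and its choice of τ, are not stated in the abstract, so τ = σ² = 1 here are placeholders.

```python
import math


def normal_pdf(x, var):
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def pmom_pdf(theta, tau=1.0, sigma2=1.0):
    """First-order pMOM density: theta^2 / (tau*sigma2) times a N(0, tau*sigma2) density."""
    v = tau * sigma2
    return theta ** 2 / v * normal_pdf(theta, v)


# Trapezoid-rule check that the density integrates to 1 over [-20, 20].
step = 0.001
grid = [-20.0 + i * step for i in range(40001)]
vals = [pmom_pdf(x) for x in grid]
total = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
print(round(total, 6), pmom_pdf(0.0))
```

The θ² factor normalizes because E[θ²] = τσ² under the N(0, τσ²) density. Compared with that local normal prior, the pMOM density is much smaller near zero (for example at θ = 0.1), which is what sharpens the evidence against overfitted models.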

Journal Article•DOI•

Accelerating the domestication of trees using genomic selection: accuracy of prediction models across ages and environments.

Marcio Fr Resende¹, Patricio R. Munoz¹, Juan J. Acosta¹, Gary F. Peter¹, John M. Davis¹, Dario Grattapaglia²,³, Marcos Deon Vilela de Resende³,⁴, Matias Kirst¹•Institutions (4)

University of Florida¹, Universidade Católica de Brasília², Empresa Brasileira de Pesquisa Agropecuária³, Universidade Federal de Viçosa⁴

01 Feb 2012-New Phytologist

TL;DR: The results demonstrate the feasibility and remarkable gain that can be achieved by incorporating genomic selection in breeding programs, as long as models are used at the relevant selection age and within the breeding zone in which they were estimated.

Abstract: Summary
• Genomic selection is increasingly considered vital to accelerate genetic improvement. However, it is unknown how accurate genomic selection prediction models remain when used across environments and ages. This knowledge is critical for breeders to apply this strategy in genetic improvement.
• Here, we evaluated the utility of genomic selection in a Pinus taeda population of c. 800 individuals clonally replicated and grown on four sites, and genotyped for 4825 single-nucleotide polymorphism (SNP) markers. Prediction models were estimated for diameter and height at multiple ages using genomic random regression best linear unbiased predictor (BLUP).
• Accuracies of prediction models ranged from 0.65 to 0.75 for diameter, and 0.63 to 0.74 for height. The selection efficiency per unit time was estimated as 53–112% higher using genomic selection compared with phenotypic selection, assuming a reduction of 50% in the breeding cycle. Accuracies remained high across environments as long as they were used within the same breeding zone. However, models generated at early ages did not perform well to predict phenotypes at age 6 yr.
• These results demonstrate the feasibility and remarkable gain that can be achieved by incorporating genomic selection in breeding programs, as long as models are used at the relevant selection age and within the breeding zone in which they were estimated.
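
The effect of shortening the breeding cycle on selection efficiency per unit time follows from the breeder's equation, gain per year = i·r·σ_A / L, where i is selection intensity, r selection accuracy, σ_A the additive genetic standard deviation and L the cycle length. Assuming the same i and σ_A for both strategies, the ratio reduces to (r_GS / L_GS) / (r_PS / L_PS). The sketch only illustrates that arithmetic; the phenotypic-selection accuracies behind the paper's 53–112% range are not given in the abstract, so the inputs here are placeholders.

```python
def relative_gain_per_year(r_gs, r_ps, cycle_fraction):
    """Genetic gain per unit time of genomic vs phenotypic selection.

    From the breeder's equation, gain per year = i * r * sigma_A / L. With equal
    intensity i and additive SD sigma_A (assumption), these cancel, leaving
    (r_gs / L_gs) / (r_ps / L_ps), where cycle_fraction = L_gs / L_ps.
    """
    return (r_gs / cycle_fraction) / r_ps


# Placeholder accuracies, with the breeding cycle halved as in the abstract.
ratio = relative_gain_per_year(r_gs=0.6, r_ps=0.8, cycle_fraction=0.5)
print(f"{(ratio - 1) * 100:.0f}% higher gain per year")
```

With a halved cycle, genomic selection breaks even whenever its accuracy is at least half that of phenotypic selection, which is why moderate genomic accuracies can still translate into large gains per year.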

Journal Article•DOI•

Serial Founder Effects During Range Expansion: A Spatial Analog of Genetic Drift

Montgomery Slatkin¹, Laurent Excoffier²,³•Institutions (3)

University of California, Berkeley¹, Swiss Institute of Bioinformatics², University of Bern³

01 May 2012-Genetics

TL;DR: The population genetic consequences of surfing can be predicted approximately by the effective number of founders and the effective selection coefficients, even in the presence of migration among populations.

Abstract: Range expansions cause a series of founder events. We show that, in a one-dimensional habitat, these founder events are the spatial analog of genetic drift in a randomly mating population. The spatial series of allele frequencies created by successive founder events is equivalent to the time series of allele frequencies in a population of effective size kₑ, the effective number of founders. We derive an expression for kₑ in a discrete-population model that allows for local population growth and migration among established populations. If there is selection, the net effect is determined approximately by the product of the selection coefficients and the number of generations between successive founding events. We use the model of a single population to compute analytically several quantities for an allele present in the source population: (i) the probability that it survives the series of colonization events, (ii) the probability that it reaches a specified threshold frequency in the last population, and (iii) the mean and variance of the frequencies in each population. We show that the analytic theory provides a good approximation to simulation results. A consequence of our approximation is that the average heterozygosity of neutral alleles decreases by a factor of 1 – 1/(2kₑ) in each new population. Therefore, the population genetic consequences of surfing can be predicted approximately by the effective number of founders and the effective selection coefficients, even in the presence of migration among populations. We also show that our analytic results are applicable to a model of range expansion in a continuously distributed population.
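
The per-founding heterozygosity loss can be checked with a small simulation, assuming each new population is founded by binomially sampling 2kₑ gene copies from the previous one, with integer kₑ and no further drift, selection or migration (the simplest case of the model described). Under that assumption the expected heterozygosity after n founder events is H₀(1 − 1/(2kₑ))ⁿ. Function names are illustrative.

```python
import random


def expected_heterozygosity(h0, k_e, n):
    """Expected heterozygosity after n founder events with k_e effective founders."""
    return h0 * (1.0 - 1.0 / (2 * k_e)) ** n


def simulate_heterozygosity(p0, k_e, n, reps=4000, seed=0):
    """Mean 2p(1-p) after n serial founder events (binomial sampling of 2*k_e copies)."""
    rng = random.Random(seed)
    copies = 2 * k_e
    total = 0.0
    for _ in range(reps):
        p = p0
        for _ in range(n):
            # Each founding draws 2*k_e gene copies from the previous deme's frequency.
            p = sum(rng.random() < p for _ in range(copies)) / copies
        total += 2.0 * p * (1.0 - p)
    return total / reps


analytic = expected_heterozygosity(0.5, 10, 5)   # H0 = 2 * 0.5 * 0.5 = 0.5
simulated = simulate_heterozygosity(0.5, 10, 5)
print(round(analytic, 4), round(simulated, 4))
```

The simulated mean should sit close to the analytic value; alleles lost or fixed along the way contribute zero heterozygosity, exactly as in temporal genetic drift.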

Journal Article•DOI•

A hybrid approach for efficient Web service composition with end-to-end QoS constraints

Mohammad Alrifai¹, Thomas Risse¹, Wolfgang Nejdl¹•Institutions (1)

Leibniz University of Hanover¹

04 Jun 2012-ACM Transactions on The Web

TL;DR: This article proposes a hybrid solution that combines global optimization with local selection techniques to benefit from the advantages of both worlds and significantly outperforms existing solutions in terms of computation time while achieving close-to-optimal results.

Abstract: Dynamic selection of Web services at runtime is important for building flexible and loosely coupled service-oriented applications. An abstract description of the required services is provided at design time, and matching service offers are located at runtime. With the growing number of Web services that provide the same functionality but differ in quality parameters (e.g., availability, response time), a decision needs to be made on which services should be selected such that the user's end-to-end QoS requirements are satisfied. Although very efficient, local selection strategies fall short in handling global QoS requirements. Solutions based on global optimization, on the other hand, can handle global constraints, but their poor performance renders them inappropriate for applications with dynamic and real-time requirements. In this article, we address this problem and propose a hybrid solution that combines global optimization with local selection techniques to benefit from the advantages of both worlds. The proposed solution consists of two steps: first, we use mixed integer programming (MIP) to find the optimal decomposition of global QoS constraints into local constraints. Second, we use distributed local selection to find the best Web services that satisfy these local constraints. The results of experimental evaluation indicate that our approach significantly outperforms existing solutions in terms of computation time while achieving close-to-optimal results.
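
The two-step idea can be sketched for a single additive constraint (total response time of a sequential composition). Exhaustive search over candidate local bounds stands in for the paper's MIP here, which only works for tiny instances. Each task's quality levels are assumed to be its candidates' own response times, and a decomposition is scored by the best utility achievable under each local bound. The data layout and the name `hybrid_select` are hypothetical.

```python
from itertools import product


def hybrid_select(candidates, max_total_rt):
    """candidates[t] = list of (response_time, utility) offers for abstract task t."""
    # Candidate local bounds per task: the distinct response times on offer.
    levels = [sorted({rt for rt, _ in cands}) for cands in candidates]

    def best_under(t, bound):
        # Local selection: highest-utility service of task t meeting its local bound.
        ok = [i for i, (rt, _) in enumerate(candidates[t]) if rt <= bound]
        return max(ok, key=lambda i: candidates[t][i][1])

    # Step 1 (stand-in for MIP): decompose the global bound into local bounds.
    best_combo, best_value = None, float("-inf")
    for combo in product(*levels):
        if sum(combo) > max_total_rt:
            continue  # local bounds must jointly respect the global constraint
        value = sum(candidates[t][best_under(t, b)][1] for t, b in enumerate(combo))
        if value > best_value:
            best_combo, best_value = combo, value
    if best_combo is None:
        return None  # no decomposition satisfies the global constraint
    # Step 2: independent (distributable) local selection per task.
    return [best_under(t, b) for t, b in enumerate(best_combo)]


tasks = [
    [(100, 0.9), (50, 0.6)],   # task 0: (response time in ms, utility)
    [(80, 0.85), (30, 0.5)],   # task 1
]
print(hybrid_select(tasks, max_total_rt=150))  # [1, 0]
```

Once the local bounds are fixed, step 2 needs no coordination between tasks, which is what allows local selection to be distributed.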

Journal Article•DOI•

Synthetic analyses of phenotypic selection in natural populations: lessons, limitations and future directions

Joel G. Kingsolver¹, Sarah E. Diamond², Adam M. Siepielski³, Stephanie M. Carlson⁴•Institutions (4)

University of North Carolina at Chapel Hill¹, North Carolina State University², University of San Diego³, University of California⁴

24 Feb 2012-Evolutionary Ecology

TL;DR: Three promising areas for expanding the understanding of selection in the wild are highlighted: field studies of stabilizing selection, selection on physiological and behavioral traits, and the ecological causes of selection; new statistical models and methods that connect phenotypic variation to population demography and selection; and availability of the underlying individual-level data sets from past and future selection studies, which will allow comprehensive modeling of selection and fitness variation within and across systems.

Abstract: There are now thousands of estimates of phenotypic selection in natural populations, resulting in multiple synthetic reviews of these data. Here we consider several major lessons and limitations emerging from these syntheses, and how they may guide future studies of selection in the wild. First, we review past analyses of the patterns of directional selection. We present new meta-analyses that confirm differences in the direction and magnitude of selection for different types of traits and fitness components. Second, we describe patterns of temporal and spatial variation in directional selection, and their implications for cumulative selection and directional evolution. Meta-analyses suggest that sampling error contributes importantly to observed temporal variation in selection, and indicate that evidence for frequent temporal changes in the direction of selection in natural populations is limited. Third, we review the apparent lack of evidence for widespread stabilizing selection, and discuss biological and methodological explanations for this pattern. Finally, we describe how sampling error, statistical biases, choice of traits, fitness measures and selection metrics, environmental covariance and other factors may limit the inferences we can draw from analyses of selection coefficients. Current standardized selection metrics based on simple parametric statistical models may be inadequate for understanding patterns of non-linear selection and complex fitness surfaces. 
We highlight three promising areas for expanding our understanding of selection in the wild: (1) field studies of stabilizing selection, selection on physiological and behavioral traits, and the ecological causes of selection; (2) new statistical models and methods that connect phenotypic variation to population demography and selection; and (3) availability of the underlying individual-level data sets from past and future selection studies, which will allow comprehensive modeling of selection and fitness variation within and across systems, rather than meta-analyses of standardized selection metrics.
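
The standardized selection metrics pooled in these syntheses are largely Lande-Arnold estimates: selection differentials S = cov(w, z) and linear selection gradients β from a multiple regression of relative fitness w on standardized traits z. A minimal NumPy sketch of the linear (directional) part follows, assuming traits are standardized to unit variance and fitness is divided by its mean. Quadratic terms for stabilizing or disruptive selection are omitted, and the name `selection_estimates` is illustrative.

```python
import numpy as np


def selection_estimates(z, w):
    """Lande-Arnold linear selection differentials S and gradients beta.

    z: (n, k) trait values; w: (n,) absolute fitness.
    """
    zs = (z - z.mean(axis=0)) / z.std(axis=0)     # standardize traits (unit variance)
    w_rel = w / w.mean()                          # relative fitness
    S = zs.T @ (w_rel - w_rel.mean()) / len(w)    # S_j = cov(w_rel, z_j)
    design = np.column_stack([np.ones(len(w)), zs])
    coef, *_ = np.linalg.lstsq(design, w_rel, rcond=None)
    return S, coef[1:]                            # drop the intercept


rng = np.random.default_rng(0)
z = rng.standard_normal((500, 2))
z[:, 1] += 0.5 * z[:, 0]                          # two correlated traits
z1 = (z[:, 0] - z[:, 0].mean()) / z[:, 0].std()
w = 1.0 + 0.2 * z1                                # fitness depends on trait 1 only
S, beta = selection_estimates(z, w)
print(np.round(S, 3), np.round(beta, 3))
```

The example shows why differentials and gradients can disagree: trait 2 has a nonzero differential purely through its correlation with trait 1, while its gradient is zero.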

Journal Article•DOI•

Accuracy of genomic selection in European maize elite breeding populations

Yusheng Zhao¹, Manje Gowda¹, Wenxin Liu¹, Tobias Würschum¹, Hans Peter Maurer¹, Friedrich Longin¹, Nicolas Ranc², Jochen C. Reif¹•Institutions (2)

University of Hohenheim¹, Syngenta²

01 Mar 2012-Theoretical and Applied Genetics

TL;DR: The accuracy of genomic selection realized for grain yield corresponds to the precision of phenotyping at unreplicated field trials in 3–4 locations and, consequently, genomic selection holds great promise for maize breeding programs.

Abstract: Genomic selection is a promising breeding strategy for rapid improvement of complex traits. The objective of our study was to investigate the prediction accuracy of genomic breeding values through cross validation. The study was based on experimental data of six segregating populations from a half-diallel mating design with 788 testcross progenies from an elite maize breeding program. The plants were intensively phenotyped in multi-location field trials and fingerprinted with 960 SNP markers. We used random regression best linear unbiased prediction in combination with fivefold cross validation. The prediction accuracy across populations was higher for grain moisture (0.90) than for grain yield (0.58). The accuracy of genomic selection realized for grain yield corresponds to the precision of phenotyping at unreplicated field trials in 3–4 locations. As for maize up to three generations are feasible per year, selection gain per unit time is high and, consequently, genomic selection holds great promise for maize breeding programs.
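
Ridge-regression BLUP with k-fold cross validation can be sketched as follows. The ridge parameter λ is fixed here (in practice it is derived from variance components, λ = σ²_e/σ²_β), markers are coded 0/1/2, and accuracy is reported as the plain correlation between predicted and observed values in each held-out fold, averaged over folds. Whether the paper additionally scales this by the square root of heritability is not stated in the abstract. The function name and the simulated data are illustrative.

```python
import numpy as np


def rr_blup_cv(X, y, lam=1.0, k=5, seed=0):
    """Mean predictive correlation of ridge-regression BLUP under k-fold cross validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for test in folds:
        train = np.setdiff1d(idx, test)
        mu_x = X[train].mean(axis=0)
        mu_y = y[train].mean()
        Xt = X[train] - mu_x                        # center markers on the training set
        # Ridge solution: marker effects shrunk equally toward zero.
        beta = np.linalg.solve(Xt.T @ Xt + lam * np.eye(X.shape[1]), Xt.T @ (y[train] - mu_y))
        pred = (X[test] - mu_x) @ beta + mu_y
        accs.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(accs))


rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 100)).astype(float)   # hypothetical 0/1/2 SNP codes
effects = rng.standard_normal(100)
y = X @ effects + 2.0 * rng.standard_normal(200)        # simulated phenotype
acc = rr_blup_cv(X, y)
print(round(acc, 3))
```

Because every marker gets the same shrinkage, RR-BLUP suits polygenic traits such as grain yield; with real data the folds would also need to respect population structure.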

Journal Article•DOI•

Amplify-and-Forward Relay Selection with Outdated Channel Estimates

Diomidis S. Michalopoulos¹, Himal A. Suraweera², George K. Karagiannidis³, Robert Schober¹•Institutions (3)

University of British Columbia¹, Singapore University of Technology and Design², Aristotle University of Thessaloniki³

10 May 2012-IEEE Transactions on Communications

TL;DR: It is shown that it may be preferable, in terms of outage and symbol error probability, not to include links in the relay selection process that experience a sufficiently high maximum Doppler shift, since in those cases partial relay selection outperforms best relay selection.

Abstract: We study the effect of outdated channel state information on the outage and error rate performance of amplify-and-forward (AF) relay selection, where only one out of the set of available relays is activated. We consider two variations of AF relay selection, namely best relay selection and partial relay selection, when the selection is based upon outdated channel estimates. For both these variations, closed-form expressions for the outage probability are obtained, along with approximate expressions for the symbol error rate in the medium to high signal-to-noise-ratio (SNR) regime. The diversity gain and coding gain of the above schemes are also explicitly derived. Numerical results manifest that the outage performance of AF relay selection is highly dependent on the level of correlation between the actual channel conditions and their corresponding (outdated) estimates. This result has a significant impact on the deployment of relay selection in practical applications, implying that a high feedback rate may be required in practice in order to attain the full benefits of relay selection. It is further shown that it may be preferable, in terms of outage and symbol error probability, not to include links in the relay selection process that experience a sufficiently high maximum Doppler shift, since in those cases partial relay selection outperforms best relay selection.
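
The dependence of AF relay selection on estimate freshness can be reproduced with a Monte Carlo sketch. It assumes Rayleigh fading on every hop, a correlation ρ between each actual channel and its outdated estimate (h = ρĥ + √(1 − ρ²)·w, with ρ often tied to Doppler through Jakes' model, ρ = J₀(2πf_D·T_d)), the CSI-assisted AF end-to-end SNR γ₁γ₂/(γ₁ + γ₂ + 1), and best relay selection on the estimated end-to-end SNR. The function and parameter names are hypothetical.

```python
import numpy as np


def af_relay_outage(n_relays, snr_db, rho, threshold_db=0.0, trials=20000, seed=0):
    """Outage probability of best AF relay selection driven by outdated estimates."""
    rng = np.random.default_rng(seed)
    snr = 10 ** (snr_db / 10)
    thr = 10 ** (threshold_db / 10)

    def cn(shape):  # unit-variance circularly symmetric complex Gaussian samples
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    # Last axis: hop 0 = source-relay, hop 1 = relay-destination.
    h_est = cn((trials, n_relays, 2))
    h_act = rho * h_est + np.sqrt(1 - rho**2) * cn((trials, n_relays, 2))

    def e2e(h):
        g = snr * np.abs(h) ** 2
        return g[..., 0] * g[..., 1] / (g[..., 0] + g[..., 1] + 1)

    chosen = np.argmax(e2e(h_est), axis=1)             # selection uses stale estimates
    actual = e2e(h_act)[np.arange(trials), chosen]     # transmission sees true channels
    return float(np.mean(actual < thr))


fresh = af_relay_outage(3, 10.0, 1.0)
stale = af_relay_outage(3, 10.0, 0.0)
for rho in (1.0, 0.9, 0.5, 0.0):
    print(rho, af_relay_outage(3, 10.0, rho))
```

As ρ falls, the selected relay's actual SNR behaves increasingly like that of a randomly chosen relay, so the diversity benefit of selection is lost; this is the motivation for high feedback rates noted in the abstract.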
