Journal Article

Understanding the origin of species with genome-scale data: modelling gene flow

01 Jun 2013-Nature Reviews Genetics (Nature Publishing Group)-Vol. 14, Iss: 6, pp 404-414
TL;DR: This review considers current data, models and methods for using population genomic data to study gene flow during divergence, along with the potential pitfalls in using them, especially the difficulty of including recombination in genetic models of the divergence process.
Abstract: As it becomes easier to sequence multiple genomes from closely related species, evolutionary biologists working on speciation are struggling to get the most out of very large population genomic data sets. Such data hold the potential to resolve long-standing questions in evolutionary biology about the role of gene exchange in species formation. In principle, the new population genomic data can be used to disentangle the conflicting roles of natural selection and gene flow during the divergence process. However, there are great challenges in taking full advantage of such data, especially with regard to including recombination in genetic models of the divergence process. Current data, models, methods and the potential pitfalls in using them will be considered here.


Citations
Journal Article
TL;DR: This paper introduces a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets, and shows that it allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods.
Abstract: We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with ∂a∂i, the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets.
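
As a rough illustration of the summary statistic this abstract relies on, the sketch below tabulates a site frequency spectrum from a small, made-up matrix of derived-allele indicators. It is not the authors' inference framework, which is simulation-based and uses a composite likelihood; the function names (`unfolded_sfs`, `fold_sfs`) and the input layout are assumptions chosen for illustration only.

```python
def unfolded_sfs(genotypes, n_chromosomes):
    """Tabulate the unfolded SFS.

    genotypes: list of sites; each site is a list of 0/1 derived-allele
    indicators, one per sampled chromosome (hypothetical input layout).
    Entry i of the result counts sites whose derived allele is carried
    by exactly i of the sampled chromosomes.
    """
    sfs = [0] * (n_chromosomes + 1)
    for site in genotypes:
        if len(site) != n_chromosomes:
            raise ValueError("every site needs one entry per chromosome")
        sfs[sum(site)] += 1
    return sfs


def fold_sfs(sfs):
    """Fold an unfolded SFS by minor-allele count.

    Folding is the usual fallback when the ancestral state is unknown:
    derived counts i and n - i are pooled into the same class.
    """
    n = len(sfs) - 1
    folded = [0] * (n // 2 + 1)
    for i, count in enumerate(sfs):
        folded[min(i, n - i)] += count
    return folded


# Toy data: 5 sites sampled on 4 chromosomes (invented values).
sites = [
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
sfs = unfolded_sfs(sites, 4)
print(sfs)            # counts for derived-allele frequency classes 0..4
print(fold_sfs(sfs))  # counts for minor-allele classes 0..2
```

Because the SFS reduces each site to a single count, its size depends on the number of sampled chromosomes rather than on the number of sites, which is one reason SFS-based methods scale to large genomic data sets.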

1,199 citations


Cites methods from "Understanding the origin of species..."

  • ...One advantage of SFS-based inference methods is that they can handle large next generation sequencing (NGS) data sets [28–30]....

    [...]

Journal Article
TL;DR: Emergent trends and gaps in understanding are identified, new approaches to more fully integrate genomics into speciation research are proposed, and an integrative definition of the field of speciation genomics is provided.
Abstract: Speciation is a fundamental evolutionary process, the knowledge of which is crucial for understanding the origins of biodiversity. Genomic approaches are an increasingly important aspect of this research field. We review current understanding of genome-wide effects of accumulating reproductive isolation and of genomic properties that influence the process of speciation. Building on this work, we identify emergent trends and gaps in our understanding, propose new approaches to more fully integrate genomics into speciation research, translate speciation theory into hypotheses that are testable using genomic tools and provide an integrative definition of the field of speciation genomics.

875 citations

Journal Article
TL;DR: A global timetree of life, synthesized from 2,274 studies representing 50,632 species, is used to examine the pattern and rate of diversification as well as the timing of speciation; the results suggest that speciation and diversification are processes dominated by random events and that adaptive change is largely a separate process.
Abstract: Genomic data are rapidly resolving the tree of living species calibrated to time, the timetree of life, which will provide a framework for research in diverse fields of science. Previous analyses of taxonomically restricted timetrees have found a decline in the rate of diversification in many groups of organisms, often attributed to ecological interactions among species. Here, we have synthesized a global timetree of life from 2,274 studies representing 50,632 species and examined the pattern and rate of diversification as well as the timing of speciation. We found that species diversity has been mostly expanding overall and in many smaller groups of species, and that the rate of diversification in eukaryotes has been mostly constant. We also identified, and avoided, potential biases that may have influenced previous analyses of diversification including low levels of taxon sampling, small clade size, and the inclusion of stem branches in clade analyses. We found consistency in time-to-speciation among plants and animals, ∼2 My, as measured by intervals of crown and stem species times. Together, this clock-like change at different levels suggests that speciation and diversification are processes dominated by random events and that adaptive change is largely a separate process.

809 citations


Cites methods from "Understanding the origin of species..."

  • ...Considering a standard model of speciation based on geographic isolation, in the absence of gene flow (Sousa and Hey 2013), we estimated the time required for speciation to occur....

    [...]

Posted Content
TL;DR: This paper synthesized a global timetree of life from 2,274 studies representing 50,632 species and examined the pattern and rate of diversification as well as the timing of speciation.
Abstract: Genomic data are rapidly resolving the tree of living species calibrated to time, the timetree of life, which will provide a framework for research in diverse fields of science. Previous analyses of taxonomically restricted timetrees have found a decline in the rate of diversification in many groups of organisms, often attributed to ecological interactions among species. Here we have synthesized a global timetree of life from 2,274 studies representing 50,632 species and examined the pattern and rate of diversification as well as the timing of speciation. We found that species diversity has been mostly expanding overall and in many smaller groups of species, and that the rate of diversification in eukaryotes has been mostly constant. We also identified, and avoided, potential biases that may have influenced previous analyses of diversification including low levels of taxon sampling, small clade size, and the inclusion of stem branches in clade analyses. We found consistency in time-to-speciation among plants and animals, approximately two million years, as measured by intervals of crown and stem species times. Together, this clock-like change at different levels suggests that speciation and diversification are processes dominated by random events and that adaptive change is largely a separate process.

643 citations

Journal Article
Hans Ellegren
TL;DR: High-throughput sequencing technologies are revolutionizing the life sciences, and the past 12 months have seen a burst of genome sequences from non-model organisms, in each case representing a fundamental source of data of significant importance to biological research.
Abstract: High-throughput sequencing technologies are revolutionizing the life sciences. The past 12 months have seen a burst of genome sequences from non-model organisms, in each case representing a fundamental source of data of significant importance to biological research. This has bearing on several aspects of evolutionary biology, and we are now beginning to see patterns emerging from these studies. These include significant heterogeneity in the rate of recombination that affects adaptive evolution and base composition, the role of population size in adaptive evolution, and the importance of expansion of gene families in lineage-specific adaptation. Moreover, resequencing of population samples (population genomics) has enabled the identification of the genetic basis of critical phenotypes and cast light on the landscape of genomic divergence during speciation.

607 citations

References
Book
01 Jan 1963

7,870 citations

Journal Article
01 Mar 1931-Genetics
TL;DR: Page 108, last line of text, for "P/P″" read "P′/ P″."
Abstract: Page 108, last line of text, for "P/P″" read "P′/P″." Page 120, last line, for "δ v " read "δ y ." Page 123, line 10, for "4Nn" read "4Nu." Page 125, line 1, for "q" read "q." Page 126, line 12, for "q" read "q." Page 135, line 5 from bottom, for "y4Nsq" read "e4Nsq." Page 141, lines 8

7,850 citations

Journal Article
28 Oct 2010-Nature
TL;DR: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. This paper presents the results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms.
Abstract: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

7,538 citations

Journal Article
TL;DR: A technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments is presented.
Abstract: Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate genome information. This challenge has catalysed the development of next-generation sequencing (NGS) technologies. The inexpensive production of large volumes of sequence data is the primary advantage over conventional methods. Here, I present a technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments. I also outline the broad range of applications for NGS technologies, in addition to providing guidelines for platform selection to address biological questions of interest.

7,023 citations


"Understanding the origin of species..." refers to background in this paper

  • ...The diagram shows the divergence of two sister populations (1 and 2), a third population (potential source of introgressed genes; 3) and an outgroup population (4) over time....

    [...]

Journal Article
26 Mar 1964-Copeia

5,857 citations