Journal ArticleDOI

Unbroken: RADseq remains a powerful tool for understanding the genetics of adaptation in natural populations

TL;DR: It is strongly argued that RADseq remains a powerful and efficient approach that provides sufficient marker density for studying selection in many natural populations and that researchers should consider a wide range of trade‐offs among genomic techniques.
Abstract: Recently, Lowry et al. addressed the ability of RADseq approaches to detect loci under selection in genome scans. While the authors raise important considerations, such as accounting for the extent of linkage disequilibrium in a study system, we strongly disagree with their overall view of the ability of RADseq to inform our understanding of the genetic basis of adaptation. The family of RADseq protocols has radically improved the field of population genomics, expanding by several orders of magnitude the number of markers available while substantially reducing the cost per marker. Researchers whose goal is to identify regions of the genome under selection must consider the LD of the experimental system; however, there is no magical LD cutoff below which researchers should refuse to use RADseq. Lowry et al. further made two major arguments: a theoretical argument that modeled the likelihood of detecting selective sweeps with RAD markers, and gross summaries based on an anecdotal collection of RAD studies. Unfortunately, their simulations were off by two orders of magnitude in the worst case, while their anecdotes merely showed that it is possible to get widely divergent densities of RAD tags for any particular experiment, either by design or due to experimental efficacy. We strongly argue that RADseq remains a powerful and efficient approach that provides sufficient marker density for studying selection in many natural populations. Given limited resources, we argue that researchers should consider a wide range of trade-offs among genomic techniques, in light of their study question and the power of different techniques to answer it.
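The trade-off between marker density and the extent of LD discussed in the abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes equal, independent base frequencies (so a recognition site of length n occurs on average once every 4^n bp), assumes a single-digest protocol that yields one RAD tag on each side of a cut site, and uses a hypothetical 500 Mb genome and a hypothetical 50 kb LD block. Real cut-site counts depend on base composition, methylation, and library efficiency, so these numbers are only a rough guide.

```python
def expected_cut_sites(genome_size_bp: float, site_length_bp: int) -> float:
    """Expected number of recognition sites in a genome.

    Assumes equal, independent base frequencies, so a site of length n
    is expected once every 4**n base pairs.
    """
    return genome_size_bp / 4 ** site_length_bp


def expected_tags_per_ld_block(
    ld_block_bp: float, site_length_bp: int, tags_per_site: int = 2
) -> float:
    """Expected number of RAD tags falling inside one LD block.

    tags_per_site=2 reflects the assumption that a single-digest
    protocol sequences outward from both sides of each cut site.
    """
    mean_spacing_bp = 4 ** site_length_bp
    return tags_per_site * ld_block_bp / mean_spacing_bp


if __name__ == "__main__":
    genome_size_bp = 500e6  # hypothetical genome size, for illustration only
    ld_block_bp = 50e3      # hypothetical LD block size, for illustration only
    for label, site_length in [("8 bp cutter", 8), ("6 bp cutter", 6)]:
        sites = expected_cut_sites(genome_size_bp, site_length)
        tags = expected_tags_per_ld_block(ld_block_bp, site_length)
        print(f"{label}: ~{sites:,.0f} cut sites, ~{tags:.1f} tags per LD block")
```

Under these assumptions, an expected value well above one tag per LD block suggests most blocks are sampled, while a value well below one suggests many blocks are missed, which is the consideration the abstract asks researchers to weigh for their own system.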
Citations
Journal ArticleDOI
TL;DR: This paper describes the first software natively capable of using paired-end sequencing to derive short contigs from de novo RAD data, and shows that the latest version of Stacks is highly accurate and outperforms other software in assembling and genotyping paired-end de novo data sets.
Abstract: For half a century population genetics studies have put type II restriction endonucleases to work. Now, coupled with massively-parallel, short-read sequencing, the family of RAD protocols that wields these enzymes has generated vast genetic knowledge from the natural world. Here, we describe the first software natively capable of using paired-end sequencing to derive short contigs from de novo RAD data. Stacks version 2 employs a de Bruijn graph assembler to build and connect contigs from forward and reverse reads for each de novo RAD locus, which it then uses as a reference for read alignments. The new architecture allows all the individuals in a metapopulation to be considered at the same time as each RAD locus is processed. This enables a Bayesian genotype caller to provide precise SNPs, and a robust algorithm to phase those SNPs into long haplotypes, generating RAD loci that are 400-800 bp in length. To prove its recall and precision, we tested the software with simulated data and compared reference-aligned and de novo analyses of three empirical data sets. Our study shows that the latest version of Stacks is highly accurate and outperforms other software in assembling and genotyping paired-end de novo data sets.

479 citations


Cites background from "Unbroken: RADseq remains a powerful..."

  • ...Often, the differences between these studies generated substantial discussion in the community (Catchen et al., 2017; Lowry et al., 2017; McKinney, Larson, Seeb, & Seeb, 2017) and a lot of speculation as to the inherent limitations of reduced representation sequencing....

    [...]

Journal ArticleDOI
TL;DR: The diversity patterns of bivalves are uneven across the globe with hotspots in the interior basin in the United States of America (USA), Central America, Indian subcontinent and Southeast Asia.
Abstract: Bivalves are ubiquitous members of freshwater ecosystems and responsible for important functions and services. The present paper revises freshwater bivalve diversity, conservation status and threats at the global scale and discusses future research needs and management actions. The diversity patterns are uneven across the globe with hotspots in the interior basin in the United States of America (USA), Central America, Indian subcontinent and Southeast Asia. Freshwater bivalves are affected by multiple threats that vary across the globe; however, pollution and natural system (habitat) modifications are consistently found to have the greatest impact. Freshwater bivalves are among the most threatened groups in the world with 40% of the species being near threatened, threatened or extinct, and among them the order Unionida is the most endangered. We suggest that global cooperation between scientists, managers, politicians and general public, and application of new technologies (new generation sequencing and remote sensing, among others) will strengthen the quality of studies on the natural history and conservation of freshwater bivalves. Finally, we introduce the articles published in this special issue of Hydrobiologia under the scope of the Second International Meeting on Biology and Conservation of Freshwater Bivalves held in 2015 in Buffalo, New York, USA.

223 citations


Cites methods from "Unbroken: RADseq remains a powerful..."

  • ...Furthermore, using reduced genome representations or SNP analyses, it is now possible to get more information on the phylogeographic patterns of species and on the definition of conservation units (Catchen et al., 2017; Desalle & Amato, 2017)....

    [...]

Journal ArticleDOI
TL;DR: No single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias; proposed ways to minimize such biases are discussed.
Abstract: Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: the sequencing of individuals to a high depth of coverage with either unresolved or resolved haplotypes, the sequencing of population genomes to a high depth by mixing equimolar amounts of unlabelled-individual DNA (Pool-seq) and the sequencing of multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This, along with the still high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in nonmodel species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons and potential applications in conservation biology. WGR offers an unprecedented marker density and surveys a wide diversity of genetic variations not limited to single nucleotide polymorphisms (e.g., structural variants and mutations in regulatory elements), increasing their power for the detection of signatures of selection and local adaptation as well as for the identification of the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a not distant future where the analysis of whole genomes becomes a routine task in many nonmodel species and fields including conservation biology.

191 citations


Cites background or methods from "Unbroken: RADseq remains a powerful..."

  • ...In addition, Catchen et al. (2017) argue that even with short LD blocks, RAD-seq has been successful at detecting adaptive loci (e.g., the Eda locus in three-spine sticklebacks), and that endangered species usually exhibit small effective population sizes for which large LD blocks should be…...

    [...]

  • ...A higher marker density (and thus genome coverage) can be achieved with the use of frequent cutter enzymes (McKinney, Larson, Seeb, & Seeb, 2017; Catchen et al., 2017); however, the target marker density should be informed by the genetic architecture of phenotypic traits of interest (i.e., the…...

    [...]

  • ...Although previous knowledge on the extent of LD decay or recombination rate is generally lacking for nonmodel species, LD block size can be estimated from a dense genetic map or from RAD-seq data with a reference genome (Catchen et al., 2017)....

    [...]

  • ...In the absence of reference genome, RAD-seq is an affordable alternative for the screening of neutral and putatively adaptive variation in a fraction of the genome (McKinney et al., 2017; Catchen et al., 2017) with some limitations (Hoban et al., 2016; Lowry et al., 2017a,b)....

    [...]

Posted ContentDOI
22 Apr 2019-bioRxiv
TL;DR: This paper describes the first software capable of using paired-end sequencing to derive short contigs from de novo RAD data natively, and shows that the latest version of Stacks is highly accurate and outperforms other software in assembling and genotyping paired-end de novo datasets.
Abstract: For half a century population genetics studies have put type II restriction endonucleases to work. Now, coupled with massively-parallel, short-read sequencing, the family of RAD protocols that wields these enzymes has generated vast genetic knowledge from the natural world. Here we describe the first software capable of using paired-end sequencing to derive short contigs from de novo RAD data natively. Stacks version 2 employs a de Bruijn graph assembler to build contigs from paired-end reads and overlap those contigs with the corresponding single-end loci. The new architecture allows all the individuals in a metapopulation to be considered at the same time as each RAD locus is processed. This enables a Bayesian genotype caller to provide precise SNPs, and a robust algorithm to phase those SNPs into long haplotypes, generating RAD loci that are 400-800 bp in length. To prove its recall and precision, we test the software with simulated data and compare reference-aligned and de novo analyses of three empirical datasets. We show that the latest version of Stacks is highly accurate and outperforms other software in assembling and genotyping paired-end de novo datasets.

160 citations


Cites methods from "Unbroken: RADseq remains a powerful..."

  • ...For de novo analyses, the core of the Stacks clustering algorithm (ustacks-cstacks-sstacks), which builds loci out of the single-end reads, remains as previously described [Catchen2011, Catchen2013], but has received a number of improvements and optimizations. Stacks has been capable of gapped assemblies since version 1.38 (2016), when Needleman-Wunsch comparisons between stacks sharing many k-mers were added, and in Stacks v2 this capability has been refined and has become the default....

    [...]

  • ...Individuals could then be matched to the metapopulation data contained in the catalog with sstacks [Catchen2011]. Availability of computational resources undergirded this design decision as the pipeline needed to process potentially thousands of individual samples, each with millions of raw reads. This architecture prevented population-level information, such as presence of a polymorphic site, from being incorporated into individual genotype calls. To incorporate this population-level information, version 1.10 of Stacks (2013) incorporated the rxstacks program to make population-level corrections retrospectively, after the core pipeline had executed....

    [...]

Journal ArticleDOI
TL;DR: It is found that genomic sampling effort had a greater impact than biological sampling effort on the proportion of identified SNPs under selection and it is recommended that future studies consistently report geographical coordinates, environmental data, model parameters, linkage disequilibrium, and measures of genetic structure.
Abstract: Detecting genetic variants under selection using FST outlier analysis (OA) and environmental association analyses (EAAs) are popular approaches that provide insight into the genetic basis of local adaptation. Despite the frequent use of OA and EAA approaches and their increasing attractiveness for detecting signatures of selection, their application to field-based empirical data has not been synthesized. Here, we review 66 empirical studies that use Single Nucleotide Polymorphisms (SNPs) in OA and EAA. We report trends and biases across biological systems, sequencing methods, approaches, parameters, environmental variables and their influence on detecting signatures of selection. We found striking variability in both the use and reporting of environmental data and statistical parameters. For example, linkage disequilibrium among SNPs and numbers of unique SNP associations identified with EAA were rarely reported. The proportion of putatively adaptive SNPs detected varied widely among studies, and decreased with the number of SNPs analysed. We found that genomic sampling effort had a greater impact than biological sampling effort on the proportion of identified SNPs under selection. OA identified a higher proportion of outliers when more individuals were sampled, but this was not the case for EAA. To facilitate repeatability, interpretation and synthesis of studies detecting selection, we recommend that future studies consistently report geographical coordinates, environmental data, model parameters, linkage disequilibrium, and measures of genetic structure. Identifying standards for how OA and EAA studies are designed and reported will aid future transparency and comparability of SNP-based selection studies and help to progress landscape and evolutionary genomics.
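As a rough illustration of what an FST outlier analysis computes, the sketch below calculates a per-SNP FST from population allele frequencies and flags the most differentiated SNPs. It is a simplification and not the method of any study reviewed here: it assumes biallelic SNPs, equally weighted populations, a Nei-style estimator FST = (HT - HS) / HT, and a plain empirical quantile cutoff, whereas published analyses typically use model-based tools that account for sampling error and demography. The function names are hypothetical.

```python
from typing import List, Sequence


def fst_per_snp(pop_freqs: Sequence[float]) -> float:
    """Nei-style F_ST for one biallelic SNP.

    pop_freqs holds the frequency of one allele in each population;
    populations are weighted equally (an assumption of this sketch).
    """
    k = len(pop_freqs)
    p_bar = sum(pop_freqs) / k
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # SNP is monomorphic across all populations
    # Mean expected heterozygosity within populations
    h_within = sum(2 * p * (1 - p) for p in pop_freqs) / k
    return (h_total - h_within) / h_total


def flag_outliers(fst_values: Sequence[float], quantile: float = 0.95) -> List[int]:
    """Indices of SNPs whose F_ST is at or above an empirical quantile."""
    if not fst_values:
        return []
    ranked = sorted(fst_values)
    cutoff = ranked[min(len(ranked) - 1, int(quantile * len(ranked)))]
    return [i for i, value in enumerate(fst_values) if value >= cutoff]


if __name__ == "__main__":
    # Hypothetical allele frequencies for four SNPs in two populations
    snps = [(0.50, 0.50), (0.40, 0.60), (0.30, 0.35), (0.05, 0.95)]
    fst = [fst_per_snp(freqs) for freqs in snps]
    print([round(value, 3) for value in fst])
    print(flag_outliers(fst, quantile=0.75))
```

Because the cutoff is relative, this sketch always flags some SNPs, which is one reason the reporting standards recommended in the abstract (model parameters, linkage disequilibrium, genetic structure) matter when interpreting outliers.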

157 citations


Cites background from "Unbroken: RADseq remains a powerful..."

  • ...Alternate genomic data sets have different advantages and disadvantages (e.g., Catchen et al., 2017; Josephs, Stinchcombe, & Wright, 2017; Lowry et al., 2017; McKinney, Larson, Seeb, & Seeb, 2017)....

    [...]

References
Journal ArticleDOI
05 Apr 2012-Nature
TL;DR: A high-quality reference genome assembly for threespine stickleback fish is developed and it is indicated that reuse of globally shared standing genetic variation has an important role in repeated evolution of distinct marine and freshwater sticklebacks, and in the maintenance of divergent ecotypes during early stages of reproductive isolation.
Abstract: Marine stickleback fish have colonized and adapted to thousands of streams and lakes formed since the last ice age, providing an exceptional opportunity to characterize genomic mechanisms underlying repeated ecological adaptation in nature. Here we develop a high-quality reference genome assembly for threespine sticklebacks. By sequencing the genomes of twenty additional individuals from a global set of marine and freshwater populations, we identify a genome-wide set of loci that are consistently associated with marine-freshwater divergence. Our results indicate that reuse of globally shared standing genetic variation, including chromosomal inversions, has an important role in repeated evolution of distinct marine and freshwater sticklebacks, and in the maintenance of divergent ecotypes during early stages of reproductive isolation. Both coding and regulatory changes occur in the set of loci underlying marine-freshwater evolution, but regulatory changes appear to predominate in this well known example of repeated adaptive evolution in nature.

1,557 citations


"Unbroken: RADseq remains a powerful..." refers background in this paper

  • ...Whole-genome resequencing samples a much smaller number of individuals for a given total sequencing effort, so while it can sample all LD blocks, it relies heavily on assumptions that the individuals sampled are representative of the populations under study (e.g. Jones et al. (2012))....

    [...]

Journal ArticleDOI
TL;DR: Overall estimates of genetic diversity and differentiation among populations confirm the biogeographic hypothesis that large panmictic oceanic populations have repeatedly given rise to phenotypically divergent freshwater populations and identify several novel regions showing parallel differentiation across independent populations.
Abstract: Next-generation sequencing technology provides novel opportunities for gathering genome-scale sequence data in natural populations, laying the empirical foundation for the evolving field of population genomics. Here we conducted a genome scan of nucleotide diversity and differentiation in natural populations of threespine stickleback (Gasterosteus aculeatus). We used Illumina-sequenced RAD tags to identify and type over 45,000 single nucleotide polymorphisms (SNPs) in each of 100 individuals from two oceanic and three freshwater populations. Overall estimates of genetic diversity and differentiation among populations confirm the biogeographic hypothesis that large panmictic oceanic populations have repeatedly given rise to phenotypically divergent freshwater populations. Genomic regions exhibiting signatures of both balancing and divergent selection were remarkably consistent across multiple, independently derived populations, indicating that replicate parallel phenotypic evolution in stickleback may be occurring through extensive, parallel genetic evolution at a genome-wide scale. Some of these genomic regions co-localize with previously identified QTL for stickleback phenotypic variation identified using laboratory mapping crosses. In addition, we have identified several novel regions showing parallel differentiation across independent populations. Annotation of these regions revealed numerous genes that are candidates for stickleback phenotypic evolution and will form the basis of future genetic analyses in this and other organisms. This study represents the first high-density SNP–based genome scan of genetic diversity and differentiation for populations of threespine stickleback in the wild. 
These data illustrate the complementary nature of laboratory crosses and population genomic scans by confirming the adaptive significance of previously identified genomic regions, elucidating the particular evolutionary and demographic history of such regions in natural populations, and identifying new genomic regions and candidate genes of evolutionary significance.

1,406 citations


"Unbroken: RADseq remains a powerful..." refers background in this paper

  • ...…the Eda locus in threespine stickleback has been repeatedly identified as a strong target of divergent selection in many independent RADSeq studies (Hohenlohe et al. 2010; Roesti et al. 2012; Ferchaud & Hansen 2016) – using less frequent cutters (SbfI, 8 bp) than Lowry et al. deem workable....

    [...]

  • ...For example, the Eda locus in threespine stickleback has been repeatedly identified as a strong target of divergent selection in many independent RADSeq studies (Hohenlohe et al. 2010; Roesti et al. 2012; Ferchaud & Hansen 2016) – using less frequent cutters (SbfI, 8 bp) than Lowry et al....

    [...]

Journal ArticleDOI
TL;DR: This Review provides a comprehensive discussion of RADseq methods to aid researchers in choosing among the many different approaches and avoiding erroneous scientific conclusions from RADseq data, a problem that has plagued other genetic marker types in the past.
Abstract: High-throughput techniques based on restriction site-associated DNA sequencing (RADseq) are enabling the low-cost discovery and genotyping of thousands of genetic markers for any species, including non-model organisms, which is revolutionizing ecological, evolutionary and conservation genetics. Technical differences among these methods lead to important considerations for all steps of genomics studies, from the specific scientific questions that can be addressed, and the costs of library preparation and sequencing, to the types of bias and error inherent in the resulting data. In this Review, we provide a comprehensive discussion of RADseq methods to aid researchers in choosing among the many different approaches and avoiding erroneous scientific conclusions from RADseq data, a problem that has plagued other genetic marker types in the past.

1,102 citations


"Unbroken: RADseq remains a powerful..." refers background in this paper

  • ...In particular, RADseq protocols have a large degree of flexibility for tailoring sampling and study design for particular systems (Andrews et al. 2016), and accounting for factors such as LD, and they have demonstrated their potential to identify genetic signatures of selection in nature....

    [...]

  • ...Pooled sequencing of various library types, while cost-efficient, carries inherent risks and limitations, particularly in the absence of a well-characterized genome (Schlotterer et al. 2014; Andrews et al. 2016)....

    [...]

Journal ArticleDOI
TL;DR: There is no theoretical or empirical basis for the evo devo contention that adaptations involving morphology evolve by genetic mechanisms different from those involving physiology and other traits, and substantial data on the genetic basis of adaptation from both genome-wide surveys and single-locus studies are examined.
Abstract: An important tenet of evolutionary developmental biology (”evo devo”) is that adaptive mutations affecting morphology are more likely to occur in the cis-regulatory regions than in the protein-coding regions of genes. This argument rests on two claims: (1) the modular nature of cis-regulatory elements largely frees them from deleterious pleiotropic effects, and (2) a growing body of empirical evidence appears to support the predominant role of gene regulatory change in adaptation, especially morphological adaptation. Here we discuss and critique these assertions. We first show that there is no theoretical or empirical basis for the evo devo contention that adaptations involving morphology evolve by genetic mechanisms different from those involving physiology and other traits. In addition, some forms of protein evolution can avoid the negative consequences of pleiotropy, most notably via gene duplication. In light of evo devo claims, we then examine the substantial data on the genetic basis of adaptation from both genome-wide surveys and single-locus studies. Genomic studies lend little support to the cis-regulatory theory: many of these have detected adaptation in protein-coding regions, including transcription factors, whereas few have examined regulatory regions. Turning to single-locus studies, we note that the most widely cited examples of adaptive cis-regulatory mutations focus on trait loss rather than gain, and none have yet pinpointed an evolved regulatory site. In contrast, there are many studies that have both identified structural mutations and functionally verified their contribution to adaptation and speciation. Neither the theoretical arguments nor the data from nature, then, support the claim for a predominance of cis-regulatory mutations in evolution. Although this claim may be true, it is at best premature. 
Adaptation and speciation probably proceed through a combination of cis-regulatory and structural mutations, with a substantial contribution of the latter.

953 citations


"Unbroken: RADseq remains a powerful..." refers background in this paper

  • ...While the relative contribution of coding vs. regulatory regions still remains an open question (Hoekstra & Coyne 2007), biasing the genomic sampling a priori simply cannot address this question and may actually provide a biased view of the genomic determinants of evolutionary change....

    [...]