
Showing papers in "Molecular Ecology Resources in 2016"


Journal ArticleDOI
TL;DR: The obitools package satisfies this requirement thanks to a set of programs specifically designed for analysing NGS data in a DNA metabarcoding context, helping to set up tailor‐made analysis pipelines for a broad range of DNA metabarcoding applications, including biodiversity surveys or diet analyses.
Abstract: DNA metabarcoding offers new perspectives in biodiversity research. This recently developed approach to ecosystem study relies heavily on the use of next-generation sequencing (NGS) and thus calls upon the ability to deal with huge sequence data sets. The obitools package satisfies this requirement thanks to a set of programs specifically designed for analysing NGS data in a DNA metabarcoding context. Their capacity to filter and edit sequences while taking into account taxonomic annotation helps to set up tailor-made analysis pipelines for a broad range of DNA metabarcoding applications, including biodiversity surveys or diet analyses. The obitools package is distributed as an open source software available on the following website: http://metabarcoding.org/obitools. A Galaxy wrapper is available on the GenOuest core facility toolshed: http://toolshed.genouest.org.

711 citations


Journal ArticleDOI
TL;DR: Four new supervised methods to detect the number of clusters were developed and tested, and were found to outperform the existing methods on both evenly and unevenly sampled data sets; a subsampling strategy aiming to reduce sampling unevenness between subpopulations is also presented and tested.
Abstract: Inferences of population structure and more precisely the identification of genetically homogeneous groups of individuals are essential to the fields of ecology, evolutionary biology and conservation biology. Such population structure inferences are routinely investigated via the program structure implementing a Bayesian algorithm to identify groups of individuals at Hardy-Weinberg and linkage equilibrium. While the method is performing relatively well under various population models with even sampling between subpopulations, the robustness of the method to uneven sample size between subpopulations and/or hierarchical levels of population structure has not yet been tested despite being commonly encountered in empirical data sets. In this study, I used simulated and empirical microsatellite data sets to investigate the impact of uneven sample size between subpopulations and/or hierarchical levels of population structure on the detected population structure. The results demonstrated that uneven sampling often leads to wrong inferences on hierarchical structure and downward-biased estimates of the true number of subpopulations. Distinct subpopulations with reduced sampling tended to be merged together, while at the same time, individuals from extensively sampled subpopulations were generally split, despite belonging to the same panmictic population. Four new supervised methods to detect the number of clusters were developed and tested as part of this study and were found to outperform the existing methods using both evenly and unevenly sampled data sets. Additionally, a subsampling strategy aiming to reduce sampling unevenness between subpopulations is presented and tested. These results altogether demonstrate that when sampling evenness is accounted for, the detection of the correct population structure is greatly improved.
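A minimal sketch of the subsampling strategy described above: each putative subpopulation is randomly drawn down to the size of the smallest one before clustering, so sampling unevenness cannot drive spurious merging or splitting of clusters. The function name and toy data are illustrative, not from the paper.

```python
import random

def even_subsample(samples_by_pop, seed=0):
    """Randomly subsample each putative subpopulation down to the size
    of the smallest one, so every group contributes equally."""
    rng = random.Random(seed)
    n_min = min(len(s) for s in samples_by_pop.values())
    return {pop: rng.sample(s, n_min) for pop, s in samples_by_pop.items()}

# Toy data set: three subpopulations sampled unevenly (40, 12 and 7
# individuals); after subsampling, each contributes 7 individuals.
pops = {"A": list(range(40)), "B": list(range(12)), "C": list(range(7))}
balanced = even_subsample(pops)
```

The balanced data set, rather than the raw one, would then be passed to the clustering program.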

631 citations


Journal ArticleDOI
TL;DR: It is concluded that genome scans based on RADseq data alone, while useful for studies of neutral genetic variation and genetic population structure, will likely miss many loci under selection in studies of local adaptation.
Abstract: Understanding how and why populations evolve is of fundamental importance to molecular ecology. Restriction site-associated DNA sequencing (RADseq), a popular reduced representation method, has ushered in a new era of genome-scale research for assessing population structure, hybridization, demographic history, phylogeography and migration. RADseq has also been widely used to conduct genome scans to detect loci involved in adaptive divergence among natural populations. Here, we examine the capacity of those RADseq-based genome scan studies to detect loci involved in local adaptation. To understand what proportion of the genome is missed by RADseq studies, we developed a simple model using different numbers of RAD-tags, genome sizes and extents of linkage disequilibrium (length of haplotype blocks). Under the best-case modelling scenario, we found that RADseq using six- or eight-base pair cutting restriction enzymes would fail to sample many regions of the genome, especially for species with short linkage disequilibrium. We then surveyed recent studies that have used RADseq for genome scans and found that the median density of markers across these studies was 4.08 RAD-tag markers per megabase (one marker per 245 kb). The length of linkage disequilibrium for many species is one to three orders of magnitude shorter than the typical distance between markers in these studies. Thus, we conclude that genome scans based on RADseq data alone, while useful for studies of neutral genetic variation and genetic population structure, will likely miss many loci under selection in studies of local adaptation.
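The arithmetic behind this conclusion can be made explicit. If RAD-tags land roughly at random, the fraction of the genome lying within one haplotype-block length L of some marker is approximately 1 - exp(-2*L*d), with d the marker density per base pair (a Poisson approximation). This is a sketch of that reasoning under stated assumptions, not the authors' exact model.

```python
import math

def genome_fraction_tagged(markers_per_mb, ld_length_bp):
    """Approximate fraction of the genome within one LD length of a
    randomly placed RAD-tag (Poisson approximation)."""
    density_per_bp = markers_per_mb / 1e6
    return 1.0 - math.exp(-2.0 * ld_length_bp * density_per_bp)

# At the surveyed median density of 4.08 markers/Mb (1 per ~245 kb),
# a species with short LD (say, 10 kb haplotype blocks) has most of
# its genome out of reach of any marker.
frac = genome_fraction_tagged(4.08, 10_000)  # roughly 8%
```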

327 citations


Journal ArticleDOI
TL;DR: The results illustrate the potential for eDNA sampling and metabarcoding approaches to improve quantification of aquatic species diversity in natural environments and point the way towards using eDNA metabarcoding as an index of macrofaunal species abundance.
Abstract: Freshwater fauna are particularly sensitive to environmental change and disturbance. Management agencies frequently use fish and amphibian biodiversity as indicators of ecosystem health and a way to prioritize and assess management strategies. Traditional aquatic bioassessment that relies on capture of organisms via nets, traps and electrofishing gear typically has low detection probabilities for rare species and can injure individuals of protected species. Our objective was to determine whether environmental DNA (eDNA) sampling and metabarcoding analysis can be used to accurately measure species diversity in aquatic assemblages with differing structures. We manipulated the density and relative abundance of eight fish and one amphibian species in replicated 206-L mesocosms. Environmental DNA was filtered from water samples, and six mitochondrial gene fragments were Illumina-sequenced to measure species diversity in each mesocosm. Metabarcoding detected all nine species in all treatment replicates. Additionally, we found a modest, but positive relationship between species abundance and sequencing read abundance. Our results illustrate the potential for eDNA sampling and metabarcoding approaches to improve quantification of aquatic species diversity in natural environments and point the way towards using eDNA metabarcoding as an index of macrofaunal species abundance.

312 citations


Journal ArticleDOI
TL;DR: The new universal ITS primers will find wide application in both plant and fungal biology, and the new plant‐specific ITS primers will significantly improve the quality of ITS sequence information collections in plant molecular systematics and DNA barcoding.
Abstract: The internal transcribed spacer (ITS) of nuclear ribosomal DNA is one of the most commonly used DNA markers in plant phylogenetic and DNA barcoding analyses, and it has been recommended as a core plant DNA barcode. Despite this popularity, the universality and specificity of PCR primers for the ITS region are not satisfactory, resulting in amplification and sequencing difficulties. By thoroughly surveying and analysing the 18S, 5.8S and 26S sequences of Plantae and Fungi from GenBank, we designed new universal and plant-specific PCR primers for amplifying the whole ITS region and a part of it (ITS1 or ITS2) of plants. In silico analyses of the new and the existing ITS primers based on these highly representative data sets indicated that (i) the newly designed universal primers are suitable for over 95% of plants in most groups; and (ii) the plant-specific primers are suitable for over 85% of plants in most groups without amplification of fungi. A total of 335 samples from 219 angiosperm families, 11 gymnosperm families, 24 fern and lycophyte families, 16 moss families and 17 fungus families were used to test the performance of these primers. In vitro PCR produced similar results to those from the in silico analyses. Our new primer pairs gave PCR improvements up to 30% compared with commonly used ones. The new universal ITS primers will find wide application in both plant and fungal biology, and the new plant-specific ITS primers will, by eliminating PCR amplification of nonplant templates, significantly improve the quality of ITS sequence information collections in plant molecular systematics and DNA barcoding.
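The in silico screen described above boils down to counting, for each candidate primer, how many reference sequences contain a binding site within some mismatch tolerance. A simplified sketch (made-up primer and sequences; a real screen would also handle degenerate bases, reverse complements and position-specific mismatch weighting):

```python
def matches(primer, seq, max_mismatch=1):
    """True if the primer aligns to some position of seq with at most
    max_mismatch mismatches (ungapped comparison)."""
    n = len(primer)
    for i in range(len(seq) - n + 1):
        mm = sum(1 for a, b in zip(primer, seq[i:i + n]) if a != b)
        if mm <= max_mismatch:
            return True
    return False

def universality(primer, references, max_mismatch=1):
    """Fraction of reference sequences the primer is predicted to bind."""
    hits = sum(matches(primer, s, max_mismatch) for s in references)
    return hits / len(references)

# Toy references: two bind the primer (one perfectly, one with a single
# mismatch), the third does not.
refs = ["ACGTACGTGG", "ACGAACGTGG", "TTTTTTTTTT"]
rate = universality("ACGTACGT", refs)
```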

248 citations


Journal ArticleDOI
TL;DR: Results show that fish released more eDNA in warm water than in cold water and that eDNA concentration better reflects fish abundance/biomass at high temperature, which supports the importance of including water temperature in fish abundance and biomass prediction models based on eDNA.
Abstract: Environmental DNA (eDNA) promises to ease noninvasive quantification of fish biomass or abundance, but its integration within conservation and fisheries management is currently limited by a lack of understanding of the influence of eDNA collection method and environmental conditions on eDNA concentrations in water samples. Water temperature is known to influence the metabolism of fish and consequently could strongly affect eDNA release rate. As water temperature varies in temperate regions (both seasonally and geographically), the unknown effect of water temperature on eDNA concentrations poses practical limitations on quantifying fish populations using eDNA from water samples. This study aimed to clarify how water temperature and the eDNA capture method alter the relationships between eDNA concentration and fish abundance/biomass. Water samples (1 L) were collected from 30 aquaria including triplicates of 0, 5, 10, 15 and 20 Brook Charr specimens at two different temperatures (7 °C and 14 °C). Water samples were filtered with five different types of filters. The eDNA concentration obtained by quantitative PCR (qPCR) varied significantly with fish abundance, biomass and filter type (mixed-design ANOVA, P < 0.001). Results also show that fish released more eDNA in warm water than in cold water and that eDNA concentration better reflects fish abundance/biomass at high temperature. From a technical standpoint, higher levels of eDNA were captured with glass fibre (GF) filters than with mixed cellulose ester (MCE) filters, supporting the importance of choosing adequate filters to quantify fish abundance based on the eDNA method. This study supports the importance of including water temperature in fish abundance/biomass prediction models based on eDNA.
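One way to act on this recommendation is to fit separate (or temperature-interacting) slopes for the eDNA-to-abundance relationship in each temperature condition. The numbers below are fabricated solely to illustrate the fitting step; only the qualitative pattern (a steeper slope in warm water) follows the study.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one group."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Fabricated eDNA concentrations for 0-20 fish at the two temperatures;
# warm water yields more eDNA per fish, as the study reports.
abundance = [0, 5, 10, 15, 20]
edna_7c = [0.0, 1.1, 1.9, 3.2, 4.1]
edna_14c = [0.0, 2.3, 4.2, 6.1, 8.4]

slope_cold, _ = fit_line(abundance, edna_7c)
slope_warm, _ = fit_line(abundance, edna_14c)
```

Prediction models would then use the slope matching the water temperature at sampling time.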

214 citations


Journal ArticleDOI
TL;DR: This study uses one type of NGS approach, sequence capture of ultraconserved elements (UCEs), to collect data from bird museum specimens as old as 120 years, and detects contamination in some samples, which was more prevalent in older samples subject to less sequencing.
Abstract: New DNA sequencing technologies are allowing researchers to explore the genomes of the millions of natural history specimens collected prior to the molecular era. Yet, we know little about how well specific next-generation sequencing (NGS) techniques work with the degraded DNA typically extracted from museum specimens. Here, we use one type of NGS approach, sequence capture of ultraconserved elements (UCEs), to collect data from bird museum specimens as old as 120 years. We targeted 5060 UCE loci in 27 western scrub-jays (Aphelocoma californica) representing three evolutionary lineages that could be species, and we collected an average of 3749 UCE loci containing 4460 single nucleotide polymorphisms (SNPs). Despite older specimens producing fewer and shorter loci in general, we collected thousands of markers from even the oldest specimens. More sequencing reads per individual helped to boost the number of UCE loci we recovered from older specimens, but more sequencing was not as successful at increasing the length of loci. We detected contamination in some samples and determined that contamination was more prevalent in older samples that were subject to less sequencing. For the phylogeny generated from concatenated UCE loci, contamination led to incorrect placement of some individuals. In contrast, a species tree constructed from SNPs called within UCE loci correctly placed individuals into three monophyletic groups, perhaps because of the stricter analytical procedures used for SNP calling. This study and other recent studies on the genomics of museum specimens have profound implications for natural history collections, where millions of older specimens should now be considered genomic resources.

192 citations


Journal ArticleDOI
TL;DR: This study presents TESS3, a major update of the spatial ancestry estimation program TESS, which provides estimates of ancestry coefficients with accuracy comparable to TESS and with run‐times much faster than the Bayesian version.
Abstract: Geography and landscape are important determinants of genetic variation in natural populations, and several ancestry estimation methods have been proposed to investigate population structure using genetic and geographic data simultaneously. Those approaches are often based on computer-intensive stochastic simulations and do not scale with the dimensions of the data sets generated by high-throughput sequencing technologies. There is a growing demand for faster algorithms able to analyse genomewide patterns of population genetic variation in their geographic context. In this study, we present TESS3, a major update of the spatial ancestry estimation program TESS. By combining matrix factorization and spatial statistical methods, TESS3 provides estimates of ancestry coefficients with accuracy comparable to TESS and with run-times much faster than the Bayesian version. In addition, the TESS3 program can be used to perform genome scans for selection, and separate adaptive from nonadaptive genetic variation using ancestral allele frequency differentiation tests. The main features of TESS3 are illustrated using simulated data and analysing genomic data from European lines of the plant species Arabidopsis thaliana.

180 citations


Journal ArticleDOI
TL;DR: The results indicate that the 50/50 RCF approach is an effective tool for evaluating and correcting biases in DNA metabarcoding studies and is applicable to field studies.
Abstract: DNA metabarcoding is a powerful new tool allowing characterization of species assemblages using high-throughput amplicon sequencing. The utility of DNA metabarcoding for quantifying relative species abundances is currently limited by both biological and technical biases which influence sequence read counts. We tested the idea of sequencing 50/50 mixtures of target species and a control species in order to generate relative correction factors (RCFs) that account for multiple sources of bias and are applicable to field studies. RCFs will be most effective if they are not affected by input mass ratio or co-occurring species. In a model experiment involving three target fish species and a fixed control, we found RCFs did vary with input ratio but in a consistent fashion, and that 50/50 RCFs applied to DNA sequence counts from various mixtures of the target species still greatly improved relative abundance estimates (e.g. average per species error of 19 ± 8% for uncorrected vs. 3 ± 1% for corrected estimates). To demonstrate the use of correction factors in a field setting, we calculated 50/50 RCFs for 18 harbour seal (Phoca vitulina) prey species (RCFs ranging from 0.68 to 3.68). Applying these corrections to field-collected seal scats affected species percentages from individual samples (Δ 6.7 ± 6.6%) more than population-level species estimates (Δ 1.7 ± 1.2%). Our results indicate that the 50/50 RCF approach is an effective tool for evaluating and correcting biases in DNA metabarcoding studies. The decision to apply correction factors will be influenced by the feasibility of creating tissue mixtures for the target species, and the level of accuracy needed to meet research objectives.
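The correction itself is simple arithmetic once the 50/50 mixtures are sequenced: each species' RCF is its read yield relative to the control species in the mixture, and field read counts are divided by the RCF before renormalizing. The counts below are invented for illustration (the paper's seal-diet RCFs ranged from 0.68 to 3.68).

```python
def rcf(target_reads, control_reads):
    """Relative correction factor from a 50/50 tissue mixture: reads
    produced by the target species per read of the control species."""
    return target_reads / control_reads

def corrected_proportions(field_counts, rcfs):
    """Divide each species' field read count by its RCF, then
    renormalize to proportions."""
    adjusted = {sp: field_counts[sp] / rcfs[sp] for sp in field_counts}
    total = sum(adjusted.values())
    return {sp: v / total for sp, v in adjusted.items()}

# 50/50 mixes: herring over-amplifies threefold relative to the control.
rcfs = {"herring": rcf(300, 100), "salmon": rcf(100, 100)}
# Raw field counts of 600 vs. 200 correct back to a true 50/50 diet.
props = corrected_proportions({"herring": 600, "salmon": 200}, rcfs)
```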

173 citations


Journal ArticleDOI
TL;DR: Comparisons showed that the MP Biomedicals FastDNA SPIN Kit yielded the most carp eDNA and was the most sensitive for detection purposes, despite minor inhibition, and the MoBio PowerSoil DNA Isolation Kit had the lowest coefficient of variation in extraction efficiency between lake and well water and had no detectable inhibition.
Abstract: Few studies have examined capture and extraction methods for environmental DNA (eDNA) to identify techniques optimal for detection and quantification. In this study, precipitation, centrifugation and filtration eDNA capture methods and six commercially available DNA extraction kits were evaluated for their ability to detect and quantify common carp (Cyprinus carpio) mitochondrial DNA using quantitative PCR in a series of laboratory experiments. Filtration methods yielded the most carp eDNA, and a glass fibre (GF) filter performed better than a similar pore size polycarbonate (PC) filter. Smaller pore sized filters had higher regression slopes of biomass to eDNA, indicating that they were potentially more sensitive to changes in biomass. Comparison of DNA extraction kits showed that the MP Biomedicals FastDNA SPIN Kit yielded the most carp eDNA and was the most sensitive for detection purposes, despite minor inhibition. The MoBio PowerSoil DNA Isolation Kit had the lowest coefficient of variation in extraction efficiency between lake and well water and had no detectable inhibition, making it most suitable for comparisons across aquatic environments. Of the methods tested, we recommend using a 1.5 μm GF filter, followed by extraction with the MP Biomedicals FastDNA SPIN Kit for detection. For quantification of eDNA, filtration through a 0.2-0.6 μm pore size PC filter, followed by extraction with MoBio PowerSoil DNA Isolation Kit was optimal. These results are broadly applicable for laboratory studies on carps and potentially other cyprinids. The recommendations can also be used to inform choice of methodology for field studies.

173 citations


Journal ArticleDOI
TL;DR: This commentary discusses the importance of controlling for false detections from the early steps of eDNA analyses (laboratory, bioinformatics) to improve the quality of results and to allow efficient use of the site occupancy‐detection modelling (SODM) framework for limiting false presences in eDNA analysis.
Abstract: Environmental DNA (eDNA) and metabarcoding are boosting our ability to acquire data on species distribution in a variety of ecosystems. Nevertheless, like most sampling approaches, eDNA is not perfect. It can fail to detect species that are actually present, and even false positives are possible: a species may be apparently detected in areas where it is actually absent. Controlling false positives remains a main challenge for eDNA analyses: in this issue of Molecular Ecology Resources, Lahoz-Monfort et al. test the performance of multiple statistical modelling approaches to estimate the rate of detection and false positives from eDNA data. Here, we discuss the importance of controlling for false detection from early steps of eDNA analyses (laboratory, bioinformatics), to improve the quality of results and allow an efficient use of the site occupancy-detection modelling (SODM) framework for limiting false presences in eDNA analysis.

Journal ArticleDOI
TL;DR: A model to estimate target DNA concentration and dispersion at survey sites and to estimate the sensitivity of an eDNA survey method is presented and it is shown how these data can be used to compare sampling schemes that differ in the number of field samples collected per site and number of PCR replicates per sample to achieve ≥95% sensitivity at a given targetDNA concentration.
Abstract: Imperfect sensitivity, or imperfect detection, is a feature of all survey methods that needs to be accounted for when interpreting survey results. Detection of environmental DNA (eDNA) is increasingly being used to infer species distributions, yet the sensitivity of the technique has not been fully evaluated. Sensitivity, or the probability of detecting target DNA given it is present at a site, will depend on both the survey method and the concentration and dispersion of target DNA molecules at a site. We present a model to estimate target DNA concentration and dispersion at survey sites and to estimate the sensitivity of an eDNA survey method. We fitted this model to data from a species-specific eDNA survey for Oriental weatherloach, Misgurnus anguillicaudatus, at three sites sampled in both autumn and spring. The concentration of target DNA molecules was similar at all three sites in autumn but much higher at two sites in spring. Our analysis showed the survey method had ≥95% sensitivity at sites where target DNA concentrations were ≥11 molecules per litre. We show how these data can be used to compare sampling schemes that differ in the number of field samples collected per site and number of PCR replicates per sample to achieve ≥95% sensitivity at a given target DNA concentration. These models allow researchers to quantify the sensitivity of eDNA survey methods to optimize the probability of detecting target species, and to compare DNA concentrations spatially and temporarily.
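A stripped-down version of such a sensitivity calculation assumes target molecules are Poisson-dispersed, so a single PCR replicate detects with probability 1 - exp(-lambda), where lambda is the expected number of target molecules in the template aliquot; replicates and field samples are treated as independent. The paper's model also estimates dispersion from data, which this sketch omits, and the sample volume and aliquot fraction below are invented.

```python
import math

def survey_sensitivity(conc_per_l, n_samples, n_pcr,
                       vol_l=1.0, frac_per_pcr=0.1):
    """P(detect target DNA at a site) for n_samples 1-L field samples,
    each tested with n_pcr PCR replicates, assuming Poisson dispersion
    and that each PCR receives frac_per_pcr of a sample's extract."""
    lam = conc_per_l * vol_l * frac_per_pcr   # molecules per PCR
    p_pcr = 1.0 - math.exp(-lam)              # one PCR replicate
    p_sample = 1.0 - (1.0 - p_pcr) ** n_pcr   # one field sample
    return 1.0 - (1.0 - p_sample) ** n_samples

# At the paper's threshold of 11 molecules per litre, even a modest
# design clears 95% sensitivity under these (invented) assumptions.
s_low = survey_sensitivity(11, n_samples=2, n_pcr=3)
s_high = survey_sensitivity(11, n_samples=6, n_pcr=6)
```

Candidate sampling schemes can then be compared by how cheaply each reaches a target sensitivity at a given DNA concentration.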

Journal ArticleDOI
TL;DR: This work advocates alternative approaches to account for false‐positive errors that rely on prior information, or the collection of ancillary detection data at a subset of sites using a sampling method that is not prone to false‐ positive errors.
Abstract: Environmental DNA (eDNA) sampling is prone to both false-positive and false-negative errors. We review statistical methods to account for such errors in the analysis of eDNA data and use simulations to compare the performance of different modelling approaches. Our simulations illustrate that even low false-positive rates can produce biased estimates of occupancy and detectability. We further show that removing or classifying single PCR detections in an ad hoc manner under the suspicion that such records represent false positives, as sometimes advocated in the eDNA literature, also results in biased estimation of occupancy, detectability and false-positive rates. We advocate alternative approaches to account for false-positive errors that rely on prior information, or the collection of ancillary detection data at a subset of sites using a sampling method that is not prone to false-positive errors. We illustrate the advantages of these approaches over ad hoc classifications of detections and provide practical advice and code for fitting these models in maximum likelihood and Bayesian frameworks. Given the severe bias induced by false-negative and false-positive errors, the methods presented here should be more routinely adopted in eDNA studies.
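The bias that motivates these recommendations is easy to reproduce by simulation: with even a small per-visit false-positive probability, the naive occupancy estimate (the fraction of sites with at least one detection) overshoots the truth. A self-contained Monte Carlo sketch with invented parameter values:

```python
import random

def naive_occupancy(psi, p_detect, p_false,
                    n_sites=10_000, n_visits=3, seed=1):
    """Fraction of simulated sites with at least one detection, i.e.
    the occupancy estimate of an analysis ignoring false positives."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_sites):
        occupied = rng.random() < psi
        p = p_detect if occupied else p_false
        if any(rng.random() < p for _ in range(n_visits)):
            detected += 1
    return detected / n_sites

# True occupancy is 0.3; a 5% false-positive rate per visit inflates
# the naive estimate to roughly 0.4.
est_clean = naive_occupancy(0.3, 0.8, 0.00)
est_fp = naive_occupancy(0.3, 0.8, 0.05)
```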

Journal ArticleDOI
TL;DR: Custom exon capture provides a complement to existing, more generic target capture methods and is a practical and robust option across low‐moderate levels of phylogenetic divergence.
Abstract: The evolutionary histories of species are not measured directly, but estimated using genealogies inferred for particular loci. Individual loci can have discordant histories, but in general we expect to infer evolutionary histories more accurately as more of the genome is sampled. High Throughput Sequencing (HTS) is now providing opportunities to incorporate thousands of loci in ‘phylogenomic’ studies. Here, we used target enrichment to sequence c.3000 protein-coding exons in a group of Australian skink lizards (crown group age c.80 Ma). This method uses synthetic probes to ‘capture’ target exons that were identified in the transcriptomes of selected probe design (PD) samples. The target exons are then enriched in sample DNA libraries prior to performing HTS. Our main goal was to study the efficacy of enrichment of targeted loci at different levels of phylogenetic divergence from the PD species. In taxa sharing a common ancestor with PD samples up to c.20 Ma, we detected little reduction in efficacy, measured here as sequencing depth of coverage. However, at around 80 Myr divergence from the PD species, we observed an approximately two-fold reduction in efficacy. A secondary goal was to develop a workflow for analysing exon capture studies of phylogenetically diverse samples, while minimizing potential bias. Our approach assembles each exon in each sample separately, by first recruiting short sequencing reads having homology to the corresponding protein sequence. In sum, custom exon capture provides a complement to existing, more generic target capture methods and is a practical and robust option across low-moderate levels of phylogenetic divergence.

Journal ArticleDOI
TL;DR: It is shown that methods based on the purification of DNA fragments using silica columns are more advantageous than in-solution methods and provide a cost‐effective solution for downstream applications, including DNA sequencing on HTS platforms.
Abstract: The DNA molecules that can be extracted from archaeological and palaeontological remains are often degraded and massively contaminated with environmental microbial material. This reduces the efficacy of shotgun approaches for sequencing ancient genomes, despite the decreasing sequencing costs of high-throughput sequencing (HTS). Improving the recovery of endogenous molecules from the DNA extraction and purification steps could, thus, help advance the characterization of ancient genomes. Here, we apply the three most commonly used DNA extraction methods to five ancient bone samples spanning a ~30 thousand year temporal range and originating from a diversity of environments, from South America to Alaska. We show that methods based on the purification of DNA fragments using silica columns are more advantageous than in-solution methods and increase not only the total amount of DNA molecules retrieved but also the relative importance of endogenous DNA fragments and their molecular diversity. Therefore, these methods provide a cost-effective solution for downstream applications, including DNA sequencing on HTS platforms.

Journal ArticleDOI
TL;DR: RADcap is introduced, an approach that combines the major benefits of RADseq (low cost with specific start positions) with those of sequence capture (repeatable sequencing of specific loci) to significantly increase efficiency and reduce costs relative to current approaches.
Abstract: Molecular ecologists seek to genotype hundreds to thousands of loci from hundreds to thousands of individuals at minimal cost per sample. Current methods, such as restriction-site-associated DNA sequencing (RADseq) and sequence capture, are constrained by costs associated with inefficient use of sequencing data and sample preparation. Here, we introduce RADcap, an approach that combines the major benefits of RADseq (low cost with specific start positions) with those of sequence capture (repeatable sequencing of specific loci) to significantly increase efficiency and reduce costs relative to current approaches. RADcap uses a new version of dual-digest RADseq (3RAD) to identify candidate SNP loci for capture bait design and subsequently uses custom sequence capture baits to consistently enrich candidate SNP loci across many individuals. We combined this approach with a new library preparation method for identifying and removing PCR duplicates from 3RAD libraries, which allows researchers to process RADseq data using traditional pipelines, and we tested the RADcap method by genotyping sets of 96-384 Wisteria plants. Our results demonstrate that our RADcap method: (i) methodologically reduces PCR duplicates, (ii) recovers targeted loci in over 90% of individuals at up to 99.8% of the targeted loci, (iii) produces consistently high occupancy matrices of genotypes across hundreds of individuals and (iv) costs significantly less than current approaches.

Journal ArticleDOI
TL;DR: It is found that TagSeq measured the control RNA distribution more accurately than NEBNext®, for a fraction of the cost per sample, an advantage that was particularly apparent for transcripts of moderate to low abundance.
Abstract: RNAseq is a relatively new tool for ecological genetics that offers researchers insight into changes in gene expression in response to a myriad of natural or experimental conditions. However, standard RNAseq methods (e.g., Illumina TruSeq® or NEBNext® ) can be cost prohibitive, especially when study designs require large sample sizes. Consequently, RNAseq is often underused as a method, or is applied to small sample sizes that confer poor statistical power. Low cost RNAseq methods could therefore enable far greater and more powerful applications of transcriptomics in ecological genetics and beyond. Standard mRNAseq is costly partly because one sequences portions of the full length of all transcripts. Such whole-mRNA data are redundant for estimates of relative gene expression. TagSeq is an alternative method that focuses sequencing effort on mRNAs' 3' end, reducing the necessary sequencing depth per sample, and thus cost. We present a revised TagSeq library construction procedure, and compare its performance against NEBNext® , the 'gold-standard' whole mRNAseq method. We built both TagSeq and NEBNext® libraries from the same biological samples, each spiked with control RNAs. We found that TagSeq measured the control RNA distribution more accurately than NEBNext® , for a fraction of the cost per sample (~10%). The higher accuracy of TagSeq was particularly apparent for transcripts of moderate to low abundance. Technical replicates of TagSeq libraries are highly correlated, and were correlated with NEBNext® results. Overall, we show that our modified TagSeq protocol is an efficient alternative to traditional whole mRNAseq, offering researchers comparable data at greatly reduced cost.

Journal ArticleDOI
TL;DR: This work developed a method to quantify the relative proportion of native and non‐native DNA based on a single‐nucleotide polymorphism using cycling probe technology in real‐time PCR and revealed a promising method for risk assessment and management in biodiversity conservation.
Abstract: The invasion of non-native species that are closely related to native species can lead to competitive elimination of the native species and/or genomic extinction through hybridization. Such invasions often become serious before they are detected, posing unprecedented threats to biodiversity. A Japanese native strain of common carp (Cyprinus carpio) has become endangered owing to the invasion of non-native strains introduced from the Eurasian continent. Here, we propose a rapid environmental DNA-based approach to quantitatively monitor the invasion of non-native genotypes. Using this system, we developed a method to quantify the relative proportion of native and non-native DNA based on a single-nucleotide polymorphism using cycling probe technology in real-time PCR. The efficiency of this method was confirmed in aquarium experiments, where the quantified proportion of native and non-native DNA in the water was well correlated to the biomass ratio of native and non-native genotypes. This method provided quantitative estimates for the proportion of native and non-native DNA in natural rivers and reservoirs, which allowed us to estimate the degree of invasion of non-native genotypes without catching and analysing individual fish. Our approach would dramatically facilitate the process of quantitatively monitoring the invasion of non-native conspecifics in aquatic ecosystems, thus revealing a promising method for risk assessment and management in biodiversity conservation.

Journal ArticleDOI
TL;DR: The oak genome consortium describes its principal lines of work and future directions for analyses of the nature, function and evolution of the oak genome.
Abstract: The 1.5 Gbp/2C genome of pedunculate oak (Quercus robur) has been sequenced. A strategy was established for dealing with the challenges imposed by the sequencing of such a large, complex and highly heterozygous genome by a whole-genome shotgun (WGS) approach, without the use of costly and time-consuming methods, such as fosmid or BAC clone-based hierarchical sequencing methods. The sequencing strategy combined short and long reads. Over 49 million reads provided by Roche 454 GS-FLX technology were assembled into contigs and combined with shorter Illumina sequence reads from paired-end and mate-pair libraries of different insert sizes, to build scaffolds. Errors were corrected and gaps filled with Illumina paired-end reads and contaminants detected, resulting in a total of 17,910 scaffolds (>2 kb) corresponding to 1.34 Gb. Fifty per cent of the assembly was accounted for by 1468 scaffolds (N50 of 260 kb). Initial comparison with the phylogenetically related Prunus persica gene model indicated that genes for 84.6% of the proteins present in peach (mean protein coverage of 90.5%) were present in our assembly. The second and third steps in this project are genome annotation and the assignment of scaffolds to the oak genetic linkage map. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement, the oak genome data have been released into public sequence repositories in advance of publication. In this presubmission paper, the oak genome consortium describes its principal lines of work and future directions for analyses of the nature, function and evolution of the oak genome.
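The oak assembly above is summarized with an N50 of 260 kb, i.e. the scaffold length such that scaffolds of that size or longer together contain at least half of the assembly. A minimal sketch of how N50 is computed (illustrative lengths, not the actual oak scaffolds):

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together account for at least half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total / 2:
            return length

# Illustrative scaffold lengths in kb (hypothetical, not the oak data)
scaffolds = [260, 150, 90, 60, 40, 30, 20, 10]
print(n50(scaffolds))  # 150: the 260 + 150 kb scaffolds reach half of 660 kb
```

The same function applied to the ~17,910 oak scaffolds would reproduce the reported 260 kb figure.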

Journal ArticleDOI
TL;DR: This novel high‐density SNP panel will be very useful for the dissection of economically and ecologically relevant traits, enhancing breeding programmes through genomic selection as well as supporting genetic studies in both wild and farmed populations of Atlantic salmon using high‐resolution genomewide information.
Abstract: A considerable number of single nucleotide polymorphisms (SNPs) are required to elucidate genotype-phenotype associations and determine the molecular basis of important traits. In this work, we carried out de novo SNP discovery accounting for both genome duplication and genetic variation from American and European salmon populations. A total of 9 736 473 nonredundant SNPs were identified across a set of 20 fish by whole-genome sequencing. After applying six bioinformatic filtering steps, 200K SNPs were selected to develop an Affymetrix Axiom® myDesign Custom Array. This array was used to genotype 480 fish representing wild and farmed salmon from Europe, North America and Chile. A total of 159 099 (79.6%) SNPs were validated as high quality based on clustering properties. A total of 151 509 validated SNPs showed a unique position in the genome. When comparing these SNPs against 238 572 markers currently available in two other Atlantic salmon arrays, only 4.6% of the SNPs overlapped with the panel developed in this study. This novel high-density SNP panel will be very useful for the dissection of economically and ecologically relevant traits, enhancing breeding programmes through genomic selection as well as supporting genetic studies in both wild and farmed populations of Atlantic salmon using high-resolution genomewide information.

Journal ArticleDOI
TL;DR: It is shown that eDNA technology can be effectively used in tropical ecosystems to detect invasive fish species and established a minimum detection limit for tilapia, and high water temperatures did not affect eDNA degradation rates.
Abstract: Invasive species pose a major threat to aquatic ecosystems. Their impact can be particularly severe in tropical regions, like those in northern Australia, where >20 invasive fish species are recorded. In temperate regions, environmental DNA (eDNA) technology is gaining momentum as a tool to detect aquatic pests, but the technology's effectiveness has not been fully explored in tropical systems with their unique climatic challenges (i.e. high turbidity, temperatures and ultraviolet light). In this study, we modified conventional eDNA protocols for use in tropical environments using the invasive fish, Mozambique tilapia (Oreochromis mossambicus), as a detection model. We evaluated the effects of high water temperatures and fish density on the detection of tilapia eDNA, using filters with larger pores to facilitate filtration. Large-pore filters (20 μm) were effective in filtering turbid waters and retaining sufficient eDNA, whilst achieving filtration times of 2-3 min per 2-L sample. High water temperatures, often experienced in the tropics (23, 29, 35 °C), did not affect eDNA degradation rates, although high temperatures (35 °C) did significantly increase fish eDNA shedding rates. We established a minimum detection limit for tilapia (1 fish per 0.4 megalitres, after 4 days) and found that low water flow (3.17 L/s) into ponds with high fish density (>16 fish/0.4 megalitres) did not affect eDNA detection. These results demonstrate that eDNA technology can be effectively used in tropical ecosystems to detect invasive fish species.
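eDNA degradation of the kind compared across temperature treatments above is conventionally modelled as first-order exponential decay, C(t) = C0·e^(−kt); "no effect of temperature" means the fitted decay constant k does not differ between treatments. A minimal sketch of estimating k by least squares on log-concentrations (hypothetical values, not the study's data):

```python
import math

def decay_constant(times, concs):
    """Fit first-order decay C(t) = C0 * exp(-k * t) by least-squares
    regression of ln(concentration) on time; return k (per unit time)."""
    logs = [math.log(c) for c in concs]
    n = len(times)
    mt, ml = sum(times) / n, sum(logs) / n
    slope = (sum((t - mt) * (l - ml) for t, l in zip(times, logs))
             / sum((t - mt) ** 2 for t in times))
    return -slope  # decay constant is the negated slope

# Hypothetical time series: days elapsed and eDNA copies per litre
times = [0, 1, 2, 3, 4]
concs = [1000, 500, 250, 125, 62.5]  # halving each day
k = decay_constant(times, concs)     # ln(2) per day for this series
```

Fitting k separately for each temperature treatment and comparing the estimates is one simple way to frame the degradation comparison.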

Journal ArticleDOI
TL;DR: Compared to traditional field sampling, eDNA provided improved occupancy parameter estimates and can be applied to increase management efficiency across a broad spatial range and within a diversity of habitats.
Abstract: Environmental DNA (eDNA) monitoring approaches promise to greatly improve detection of rare, endangered and invasive species in comparison with traditional field approaches. Herein, eDNA approaches and traditional seining methods were applied at 29 research locations to compare method-specific estimates of detection and occupancy probabilities for endangered tidewater goby (Eucyclogobius newberryi). At each location, multiple paired seine hauls and water samples for eDNA analysis were taken, ranging from two to 23 samples per site, depending upon habitat size. Analysis using a multimethod occupancy modelling framework indicated that the probability of detection using eDNA was nearly double (0.74) the rate of detection for seining (0.39). The higher detection rates afforded by eDNA allowed determination of tidewater goby occupancy at two locations where they have not been previously detected and at one location considered to be locally extirpated. Additionally, eDNA concentration was positively related to tidewater goby catch per unit effort, suggesting eDNA could potentially be used as a proxy for local tidewater goby abundance. Compared to traditional field sampling, eDNA provided improved occupancy parameter estimates and can be applied to increase management efficiency across a broad spatial range and within a diversity of habitats.
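The practical value of eDNA's higher per-sample detection probability (0.74 vs. 0.39) can be illustrated with a simple independence assumption, which is a simplification of the paper's multimethod occupancy model: at an occupied site, the chance of at least one detection in k samples is 1 − (1 − p)^k.

```python
def cumulative_detection(p, k):
    """Probability of at least one detection in k independent samples,
    given per-sample detection probability p at an occupied site."""
    return 1 - (1 - p) ** k

def samples_for_confidence(p, target=0.95):
    """Smallest number of samples needed for the cumulative detection
    probability to reach the target level."""
    k = 1
    while cumulative_detection(p, k) < target:
        k += 1
    return k

# Per-sample detection probabilities reported in the study
P_EDNA, P_SEINE = 0.74, 0.39

print(samples_for_confidence(P_EDNA))   # 3 eDNA samples for 95% confidence
print(samples_for_confidence(P_SEINE))  # 7 seine hauls for 95% confidence
```

Under this simplified view, three water samples give roughly the assurance of seven seine hauls, which is the kind of efficiency gain the abstract describes.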

Journal ArticleDOI
TL;DR: Amplisas is a tool that performs analysis of AS results in a simple and efficient way, while offering customization options for advanced users and is successfully benchmarked against previously published genotyped MHC data sets obtained with various NGS technologies.
Abstract: Next-generation sequencing (NGS) technologies are revolutionizing the fields of biology and medicine as powerful tools for amplicon sequencing (AS). Using combinations of primers and barcodes, it is possible to sequence targeted genomic regions with deep coverage for hundreds, even thousands, of individuals in a single experiment. This is extremely valuable for the genotyping of gene families in which locus-specific primers are often difficult to design, such as the major histocompatibility complex (MHC). The utility of AS is, however, limited by the high intrinsic sequencing error rates of NGS technologies and other sources of error such as polymerase amplification or chimera formation. Correcting these errors requires extensive bioinformatic post-processing of NGS data. Amplicon Sequence Assignment (AmpliSAS) is a tool that performs analysis of AS results in a simple and efficient way, while offering customization options for advanced users. AmpliSAS is designed as a three-step pipeline consisting of (i) read demultiplexing, (ii) unique sequence clustering and (iii) erroneous sequence filtering. Allele sequences and frequencies are retrieved in Excel spreadsheet format, making them easy to interpret. AmpliSAS performance has been successfully benchmarked against previously published genotyped MHC data sets obtained with various NGS technologies.
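The pipeline's first step, read demultiplexing, amounts to assigning each read to a sample by its barcode prefix. A minimal sketch of the idea (hypothetical barcodes and reads; production demultiplexers additionally tolerate barcode mismatches and trim primers):

```python
def demultiplex(reads, barcodes):
    """Group reads by sample using exact barcode prefixes,
    stripping the barcode from each assigned read."""
    by_sample = {sample: [] for sample in barcodes}
    unassigned = []
    for read in reads:
        for sample, bc in barcodes.items():
            if read.startswith(bc):
                by_sample[sample].append(read[len(bc):])
                break
        else:  # no barcode matched
            unassigned.append(read)
    return by_sample, unassigned

# Hypothetical sample barcodes and raw reads
barcodes = {"sampleA": "ACGT", "sampleB": "TGCA"}
reads = ["ACGTGGGTTT", "TGCAAACCC", "NNNNACGT"]
assigned, unassigned = demultiplex(reads, barcodes)
```

Steps (ii) and (iii), clustering and error filtering, are where the tool's MHC-specific heuristics actually live; demultiplexing is the generic front end.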

Journal ArticleDOI
TL;DR: Next‐generation sequencing is employed to recover sequences for the barcode region of the cytochrome c oxidase 1 gene from small amounts of template DNA in type specimens of Lepidoptera to anticipate a future where barcode sequences are available from most type specimens.
Abstract: Type specimens have high scientific importance because they provide the only certain connection between the application of a Linnean name and a physical specimen. Many other individuals may have been identified as a particular species, but their linkage to the taxon concept is inferential. Because type specimens are often more than a century old and have experienced conditions unfavourable for DNA preservation, success in sequence recovery has been uncertain. This study addresses this challenge by employing next-generation sequencing (NGS) to recover sequences for the barcode region of the cytochrome c oxidase 1 gene from small amounts of template DNA. DNA quality was first screened in more than 1800 century-old type specimens of Lepidoptera by attempting to recover 164-bp and 94-bp reads via Sanger sequencing. This analysis permitted the assignment of each specimen to one of three DNA quality categories – high (164-bp sequence), medium (94-bp sequence) or low (no sequence). Ten specimens from each category were subsequently analysed via a PCR-based NGS protocol requiring very little template DNA. It recovered sequence information from all specimens with average read lengths ranging from 458 bp to 610 bp for the three DNA categories. By sequencing ten specimens in each NGS run, costs were similar to Sanger analysis. Future increases in the number of specimens processed in each run promise substantial reductions in cost, making it possible to anticipate a future where barcode sequences are available from most type specimens.

Journal ArticleDOI
TL;DR: Two high‐throughput sequencing (HTS)‐based methods were evaluated: amplicon metabarcoding of the cytochrome C oxidase subunit I (COI) mitochondrial gene and gene enrichment using MYbaits (targeting nine different genes including COI).
Abstract: Recent studies have advocated biomonitoring using DNA techniques. In this study, two high-throughput sequencing (HTS)-based methods were evaluated: amplicon metabarcoding of the cytochrome c oxidase subunit I (COI) mitochondrial gene and gene enrichment using MYbaits (targeting nine different genes including COI). The gene-enrichment method does not require PCR amplification and thus avoids biases associated with universal primers. Macroinvertebrate samples were collected from 12 New Zealand rivers. Macroinvertebrates were morphologically identified and enumerated, and their biomass determined. DNA was extracted from all macroinvertebrate samples and HTS undertaken using the Illumina MiSeq platform. Macroinvertebrate communities were characterized from sequence data using either six genes (three of the original nine were not used) or just the COI gene in isolation. The gene-enrichment method (all genes) detected the highest number of taxa and obtained the strongest Spearman rank correlations between the number of sequence reads, abundance and biomass in 67% of the samples. Median detection rates across rare (<5%) and abundant (>5%) taxa were highest using the gene-enrichment method (all genes). Our data indicated primer biases occurred during amplicon metabarcoding with greater than 80% of sequence reads originating from one taxon in several samples. The accuracy and sensitivity of both HTS methods would be improved with more comprehensive reference sequence databases. The data from this study illustrate the challenges of using PCR amplification-based methods for biomonitoring and highlight the potential benefits of using approaches, such as gene enrichment, which circumvent the need for an initial PCR step.
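The read-count-versus-abundance comparisons above use Spearman rank correlation, which is simply the Pearson correlation of the ranks of the two variables. A minimal pure-Python sketch with average ranks for ties (illustrative, not the study's analysis code; the numbers below are hypothetical):

```python
def ranks(values):
    """1-based average ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the tie group
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            rank[order[k]] = mean_rank
        i = j + 1
    return rank

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-taxon sequence reads vs. measured biomass (g)
reads = [1500, 900, 400, 120, 30]
biomass = [12.0, 8.5, 5.1, 1.2, 0.4]
rho = spearman(reads, biomass)  # 1.0: the orderings agree perfectly
```

Because only ranks matter, the statistic is insensitive to how nonlinearly read counts scale with biomass, which is why it suits sequencing data.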

Journal ArticleDOI
TL;DR: If most BIN splits detected in this study reflect cryptic taxa, the true species count for Canadian spiders could be 30–50% higher than currently recognized, and DNA barcodes discriminated 98% of the 1018 species.
Abstract: Approximately 1460 species of spiders have been reported from Canada, 3% of the global fauna. This study provides a DNA barcode reference library for 1018 of these species based upon the analysis of more than 30,000 specimens. The sequence results show a clear barcode gap in most cases with a mean intraspecific divergence of 0.78% vs. a minimum nearest-neighbour (NN) distance averaging 7.85%. The sequences were assigned to 1359 Barcode index numbers (BINs) with 1344 of these BINs composed of specimens belonging to a single currently recognized species. There was a perfect correspondence between BIN membership and a known species in 795 cases, while another 197 species were assigned to two or more BINs (556 in total). A few other species (26) were involved in BIN merges or in a combination of merges and splits. There was only a weak relationship between the number of specimens analysed for a species and its BIN count. However, three species were clear outliers with their specimens being placed in 11-22 BINs. Although all BIN splits need further study to clarify the taxonomic status of the entities involved, DNA barcodes discriminated 98% of the 1018 species. The present survey conservatively revealed 16 species new to science, 52 species new to Canada and major range extensions for 426 species. However, if most BIN splits detected in this study reflect cryptic taxa, the true species count for Canadian spiders could be 30-50% higher than currently recognized.
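A "barcode gap" in the sense used above means that, for a given species, the deepest within-species divergence is still smaller than the distance to the nearest neighbouring species. A minimal sketch of checking this per species, assuming precomputed divergence values (hypothetical percentages, not the spider data):

```python
def has_barcode_gap(max_intraspecific, nn_distance):
    """True when the deepest within-species divergence is smaller
    than the distance to the nearest other species."""
    return max_intraspecific < nn_distance

# Hypothetical per-species values (%): (max intraspecific, nearest-neighbour)
species = {
    "sp1": (0.8, 7.9),
    "sp2": (1.2, 6.5),
    "sp3": (4.1, 3.0),  # no gap: deep split, a candidate for cryptic taxa
}
gapped = [name for name, (intra, nn) in species.items()
          if has_barcode_gap(intra, nn)]
```

Species failing the check (like "sp3" here) are the ones whose specimens tend to split across multiple BINs and flag possible cryptic diversity.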

Journal ArticleDOI
TL;DR: The effects of phylogenetic distance on capture sensitivity, specificity, and missing data are tested, and a baseline estimate of expectations for these metrics is provided based on a priori knowledge of nuclear pairwise differences among samples.
Abstract: Custom sequence capture experiments are becoming an efficient approach for gathering large sets of orthologous markers in nonmodel organisms. Transcriptome-based exon capture utilizes transcript sequences to design capture probes, typically using a reference genome to identify intron-exon boundaries in order to exclude shorter exons (<200 bp). Coverage was relatively uniform across the targeted exons (>200 bp), with no major reduction in coverage towards the ends of exons. We observed significant differences in the performance of blocking oligos for target enrichment and nontarget depletion during captures, and differences in PCR duplication rates resulting from the number of individuals pooled for capture reactions. We explicitly tested the effects of phylogenetic distance on capture sensitivity, specificity, and missing data, and provide a baseline estimate of expectations for these metrics based on a priori knowledge of nuclear pairwise differences among samples. We provide recommendations for transcriptome-based exon capture design based on our results, cost estimates and offer multiple pipelines for data assembly and analysis.

Journal ArticleDOI
TL;DR: NetView P combines data quality control with the construction of population networks through mutual k-nearest-neighbour thresholds applied to genome-wide SNPs, effectively visualizing large- and fine-scale genetic structure within and between populations, including family-level structure and relationships.
Abstract: Network-based approaches are emerging as valuable tools for the analysis of complex genetic structure in wild and captive populations. NetView P combines data quality control with the construction of population networks through mutual k-nearest-neighbour thresholds applied to genome-wide SNPs. The program is cross-platform compatible, open-source and efficiently operates on data ranging from hundreds to hundreds of thousands of SNPs. The pipeline was used for the analysis of pedigree data from simulated (n = 750, SNPs = 1279) and captive silver-lipped pearl oysters (n = 415, SNPs = 1107), wild populations of the European hake from the Atlantic and Mediterranean (n = 834, SNPs = 380) and grey wolves from North America (n = 239, SNPs = 78 255). The population networks effectively visualize large- and fine-scale genetic structure within and between populations, including family-level structure and relationships. NetView P comprises a network-based addition to other population analysis tools and provides user-friendly access to a complex network analysis pipeline through implementation in Python.
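The core construction in NetView P, a mutual k-nearest-neighbour graph, connects two individuals only when each is among the other's k nearest neighbours in genetic distance. A minimal sketch from a precomputed distance matrix (illustrative only; the actual program layers quality control and visualization on top of this):

```python
def mutual_knn_edges(dist, k):
    """Return edges (i, j), i < j, such that j is among i's k nearest
    neighbours AND i is among j's k nearest neighbours."""
    n = len(dist)
    neighbours = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist[i][j])
        neighbours.append(set(others[:k]))
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if j in neighbours[i] and i in neighbours[j]]

# Toy symmetric genetic-distance matrix: two tight pairs, far apart
dist = [
    [0.0, 0.1, 0.9, 1.0],
    [0.1, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.1],
    [1.0, 0.9, 0.1, 0.0],
]
edges = mutual_knn_edges(dist, k=1)  # the two pairs form two components
```

The mutuality requirement is what makes the graph sparse and lets family-level clusters separate cleanly at small k.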

Journal ArticleDOI
TL;DR: It is suggested that the mitogenome capture approach coupled with PCR-free shotgun sequencing could provide ecological researchers with an efficient NGS method to deliver reliable biodiversity assessment.
Abstract: Biodiversity analyses based on next-generation sequencing (NGS) platforms have developed by leaps and bounds in recent years. A PCR-free strategy, which can alleviate taxonomic bias, is considered a promising approach to delivering reliable species compositions of targeted environments. The major impediment to such a method has been the lack of appropriate ways to enrich mitochondrial DNA. Because mitochondrial genomes (mitogenomes) make up only a small proportion of total DNA, PCR-free methods will inevitably result in a huge excess of data (>99%). Furthermore, the massive volume of sequence data is highly demanding on computing resources. Here, we present a mitogenome enrichment pipeline via a gene capture chip that was designed by virtue of the mitogenome sequences of the 1000 Insect Transcriptome Evolution project (1KITE, www.1kite.org). A mock sample containing 49 species was used to evaluate the efficiency of the mitogenome capture method. We demonstrate that the proportion of mitochondrial DNA can be increased by approximately 100-fold (from the original 0.47% to 42.52%). Variation in phylogenetic distances of target taxa to the probe set could in principle result in bias in abundance. However, the frequencies of input taxa were largely maintained after capture (R² = 0.81). We suggest that our mitogenome capture approach coupled with PCR-free shotgun sequencing could provide ecological researchers with an efficient NGS method to deliver reliable biodiversity assessment.
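The two headline numbers above, the roughly 100-fold enrichment and the R² between input and post-capture taxon frequencies, are straightforward to reproduce from count data. A minimal sketch (the fold change uses the abstract's figures; the frequency vectors are hypothetical):

```python
def fold_enrichment(before, after):
    """Fold change in the mitochondrial fraction of reads."""
    return after / before

def r_squared(x, y):
    """Squared Pearson correlation between input and recovered frequencies."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov ** 2 / (vx * vy)

# Figures from the abstract: 0.47% mitochondrial DNA before capture, 42.52% after
print(round(fold_enrichment(0.0047, 0.4252)))  # about 90-fold

# Hypothetical taxon frequencies before and after capture
before = [0.30, 0.25, 0.20, 0.15, 0.10]
after = [0.33, 0.22, 0.21, 0.12, 0.12]
print(r_squared(before, after))
```

An R² near 1 here indicates that capture preserved relative abundances, which is the property that makes the method usable for quantitative biodiversity assessment.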

Journal ArticleDOI
TL;DR: An automated and interactive script is created to select hundreds of orthologous low‐copy nuclear (LCN) loci for phylogenetics in nonmodel organisms by a comparison between transcriptome and genome skim data, which indicates that organellar phylogenies alone are unlikely to represent the species tree and stresses the utility of Hyb‐Seq in phylogenetics.
Abstract: Phylogenetics benefits from using a large number of putatively independent nuclear loci and their combination with other sources of information, such as the plastid and mitochondrial genomes. To facilitate the selection of orthologous low-copy nuclear (LCN) loci for phylogenetics in nonmodel organisms, we created an automated and interactive script to select hundreds of LCN loci by a comparison between transcriptome and genome skim data. We used our script to obtain LCN genes for southern African Oxalis (Oxalidaceae), a speciose plant lineage in the Greater Cape Floristic Region. This resulted in 1164 LCN genes greater than 600 bp. Using target enrichment combined with genome skimming (Hyb-Seq), we obtained on average 1141 LCN loci, nearly the whole plastid genome and the nrDNA cistron from 23 southern African Oxalis species. Despite a wide range of gene trees, the phylogeny based on the LCN genes was very robust, as retrieved through various gene and species tree reconstruction methods as well as concatenation. Cytonuclear discordance was strong. This indicates that organellar phylogenies alone are unlikely to represent the species tree and stresses the utility of Hyb-Seq in phylogenetics.