Showing papers in "Molecular Ecology Resources in 2020"


Journal ArticleDOI
TL;DR: PhyloSuite is designed for both beginners and experienced researchers, allowing the former to quick‐start their way into phylogenetic analysis, and the latter to conduct, store and manage their work in a streamlined way, and spend more time investigating scientific questions instead of wasting it on transferring files from one software program to another.
Abstract: Multigene and genomic data sets have become commonplace in the field of phylogenetics, but many existing tools are not designed for such data sets, which often makes the analysis time-consuming and tedious. Here, we present PhyloSuite, a (cross-platform, open-source, stand-alone Python graphical user interface) user-friendly workflow desktop platform dedicated to streamlining molecular sequence data management and evolutionary phylogenetics studies. It uses a plugin-based system that integrates several phylogenetic and bioinformatic tools, thereby streamlining the entire procedure, from data acquisition to phylogenetic tree annotation (in combination with iTOL). It has the following features: (a) point-and-click and drag-and-drop graphical user interface; (b) a workplace to manage and organize molecular sequence data and results of analyses; (c) GenBank entry extraction and comparative statistics; and (d) a phylogenetic workflow with batch processing capability, comprising sequence alignment (mafft and macse), alignment optimization (trimAl, HmmCleaner and Gblocks), data set concatenation, best partitioning scheme and best evolutionary model selection (PartitionFinder and modelfinder), and phylogenetic inference (MrBayes and iq-tree). PhyloSuite is designed for both beginners and experienced researchers, allowing the former to quick-start their way into phylogenetic analysis, and the latter to conduct, store and manage their work in a streamlined way, and spend more time investigating scientific questions instead of wasting it on transferring files from one software program to another.

1,144 citations


Journal ArticleDOI
TL;DR: MitoFinder, a user‐friendly bioinformatic pipeline, to efficiently assemble and annotate mitogenomic data from hundreds of UCE libraries, and shows that metagenomic assemblers, in particular MetaSPAdes, are well suited to assemble both UCEs and mtDNA.
Abstract: Thanks to the development of high-throughput sequencing technologies, target enrichment sequencing of nuclear ultraconserved DNA elements (UCEs) now allows routine inference of phylogenetic relationships from thousands of genomic markers. Recently, it has been shown that mitochondrial DNA (mtDNA) is frequently sequenced alongside the targeted loci in such capture experiments. Despite its broad evolutionary interest, mtDNA is rarely assembled and used in conjunction with nuclear markers in capture-based studies. Here, we developed MitoFinder, a user-friendly bioinformatic pipeline, to efficiently assemble and annotate mitogenomic data from hundreds of UCE libraries. As a case study, we used ants (Formicidae) for which 501 UCE libraries have been sequenced whereas only 29 mitogenomes are available. We compared the efficiency of four different assemblers (IDBA-UD, MEGAHIT, MetaSPAdes, and Trinity) for assembling both UCE and mtDNA loci. Using MitoFinder, we show that metagenomic assemblers, in particular MetaSPAdes, are well suited to assemble both UCEs and mtDNA. Mitogenomic signal was successfully extracted from all 501 UCE libraries, allowing us to confirm species identification using CO1 barcoding. Moreover, our automated procedure retrieved 296 cases in which the mitochondrial genome was assembled in a single contig, thus increasing the number of available ant mitogenomes by an order of magnitude. By utilizing the power of metagenomic assemblers, MitoFinder provides an efficient tool to extract complementary mitogenomic data from UCE libraries, allowing testing for potential mitonuclear discordance. Our approach is potentially applicable to other sequence capture methods, transcriptomic data and whole genome shotgun sequencing in diverse taxa. The MitoFinder software is available from GitHub (https://github.com/RemiAllio/MitoFinder).

427 citations


Journal ArticleDOI
TL;DR: This version presents a major update from the previous version and now offers a wide spectrum of different types of analyses, including multiple statistics for estimating population differentiation, analysis of molecular variance‐based K‐means clustering, Hardy–Weinberg equilibrium, hybrid index, population assignment, clone assignment, Mantel test, Spatial Autocorrelation.
Abstract: genodive version 3.0 is a user-friendly program for the analysis of population genetic data. This version presents a major update from the previous version and now offers a wide spectrum of different types of analyses. genodive has an intuitive graphical user interface that allows direct manipulation of the data through transformation, imputation of missing data, and exclusion and inclusion of individuals, population and/or loci. Furthermore, genodive seamlessly supports 15 different file formats for importing or exporting data from or to other programs. One major feature of genodive is that it supports both diploid and polyploid data, up to octaploidy (2n = 8x) for some analyses, but up to hexadecaploidy (2n = 16x) for other analyses. The different types of analyses offered by genodive include multiple statistics for estimating population differentiation (φST , FST , F'ST , GST , G'ST , G''ST , Dest , RST , ρ), analysis of molecular variance-based K-means clustering, Hardy-Weinberg equilibrium, hybrid index, population assignment, clone assignment, Mantel test, Spatial Autocorrelation, 23 ways of calculating genetic distances, and both principal components and principal coordinates analyses. A unique feature of genodive is that it can also open data sets with nongenetic variables, for example environmental data or geographical coordinates that can be included in the analysis. In addition, genodive makes it possible to run several external programs (lfmm, structure, instruct and vegan) directly from its own user interface, avoiding the need for data reformatting and use of the command line. genodive is available for computers running Mac OS X 10.7 or higher and can be downloaded freely from: http://www.patrickmeirmans.com/software.

181 citations


Journal ArticleDOI
TL;DR: The assembly of a 390.38‐Mb chromosome‐level genome of fall armyworm derived from south‐central Africa using Pacific Bioscience and Hi‐C sequencing technologies is reported, containing 22,260 annotated protein‐coding genes.
Abstract: The rapid wide-scale spread of fall armyworm (Spodoptera frugiperda) has caused serious crop losses globally. However, differences in the genetic background of subpopulations and the mechanisms of rapid adaptation behind the invasion are still not well understood. Here we report the assembly of a 390.38-Mb chromosome-level genome of fall armyworm using Pacific Bioscience (PacBio) and Hi-C sequencing technologies, with a scaffold N50 of 12.7 Mb and containing 22,260 annotated protein-coding genes. Genome-wide resequencing of 103 samples from 16 provinces in China revealed that the fall armyworm population comprises a complex inter-strain hybrid, mainly with the corn-strain genetic background and less of the rice-strain, which highlights the inaccuracy of strain identification using mitochondrial or Triosephosphate isomerase (Tpi) genes. Analysis of genes related to pesticide- and Bt-resistance showed that the risk of fall armyworm developing resistance to conventional pesticides is very high. Laboratory bioassay results showed that insects invading China carry resistance to organophosphate and pyrethroid pesticides, but are sensitive to genetically modified maize expressing Bacillus thuringiensis (Bt) toxins Cry1Ab in field experiments. Additionally, two mitochondrial fragments are inserted into the nuclear genome, and the insertion event occurred after the differentiation of the two strains. This study represents a valuable advance toward improving management strategies for fall armyworm.

96 citations


Journal ArticleDOI
TL;DR: Barcodes, short sequences that are ligated to both ends of the DNA insert, are used to directly quantify the rate of index hopping in 100‐year old museum‐preserved gorilla (Gorilla beringei) samples, and it is shown that sample‐specific quantity of misassigned reads depends on the number of reads that any given sample contributes to the total sequencing pool, so that samples with few sequenced reads receive the greatest proportion of misassigned reads.
Abstract: The high-throughput capacities of the Illumina sequencing platforms and the possibility to label samples individually have encouraged wide use of sample multiplexing. However, this practice results in read misassignment (usually <1%) across samples sequenced on the same lane. Alarmingly high rates of read misassignment of up to 10% were reported for Illumina sequencing machines with exclusion amplification chemistry. This may make use of these platforms prohibitive, particularly in studies that rely on low-quantity and low-quality samples, such as historical and archaeological specimens. Here, we use barcodes, short sequences that are ligated to both ends of the DNA insert, to directly quantify the rate of index hopping in 100-year old museum-preserved gorilla (Gorilla beringei) samples. Correcting for multiple sources of noise, we identify on average 0.470% of reads containing a hopped index. We show that sample-specific quantity of misassigned reads depends on the number of reads that any given sample contributes to the total sequencing pool, so that samples with few sequenced reads receive the greatest proportion of misassigned reads. This particularly affects ancient DNA samples, as these frequently differ in their DNA quantity and endogenous content. Through simulations we show that even low rates of index hopping, as reported here, can lead to biases in ancient DNA studies when multiplexing samples with vastly different quantities of endogenous material.

81 citations
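
The index-hopping abstract above reports that samples contributing few reads to a pool receive the greatest proportion of misassigned reads. The following is a minimal sketch of that effect, not the authors' simulation. It assumes a single uniform hop rate and that hopped reads are spread evenly over the other samples in the pool; the function name and the example read counts are hypothetical.

```python
def misassigned_fraction(read_counts, hop_rate):
    """Expected fraction of each sample's final reads that hopped in from other samples.

    Assumptions (illustrative only): every read hops with probability
    `hop_rate`, and hopped reads are divided evenly among the other samples.
    """
    k = len(read_counts)
    if k < 2:
        raise ValueError("need at least two samples in the pool")
    total = sum(read_counts)
    fractions = []
    for n in read_counts:
        retained = n * (1 - hop_rate)                  # reads that keep the correct index
        incoming = hop_rate * (total - n) / (k - 1)    # this sample's share of hopped reads
        fractions.append(incoming / (retained + incoming))
    return fractions


if __name__ == "__main__":
    # Two deeply sequenced samples and one shallow sample, using the
    # average hop rate reported in the abstract (0.470%).
    for count, frac in zip([1_000_000, 1_000_000, 10_000],
                           misassigned_fraction([1_000_000, 1_000_000, 10_000], 0.0047)):
        print(f"{count:>9} reads: {frac:.4f} misassigned")
```

Under these assumptions, a pool of equally sized samples gives every sample a misassigned fraction equal to the hop rate, while a sample that contributes far fewer reads than its neighbours ends up with a much larger fraction.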


Journal ArticleDOI
TL;DR: The high‐quality of the fall armyworm genome provides an important genomic resource for further explorations of the mechanisms of polyphagia and insecticide resistance, as well as for pest management of fall armyworms.
Abstract: The fall armyworm (Spodoptera frugiperda) is a lepidopteran insect pest that causes huge economic losses. This notorious insect pest has rapidly spread over the world in the past few years. However, the mechanisms of rapid dispersal are not well understood. Here, we report a chromosome-level assembled genome of the fall armyworm, named the ZJ-version, using PacBio and Hi-C technology. The sequenced individual was a female collected from the Zhejiang province of China and had high heterozygosity. The assembled genome size of ZJ-version was 486 Mb, containing 361 contigs with an N50 of 1.13 Mb. Hi-C scaffolding further assembled the genome into 31 chromosomes and a portion of W chromosome, representing 97.4% of all contigs and resulted in a chromosome-level genome with scaffold N50 of 16.3 Mb. The sex chromosomes were identified by genome resequencing of a single male pupa and a single female pupa. About 28% of the genome was annotated as repeat sequences, and 22,623 protein-coding genes were identified. Comparative genomics revealed the expansion of the detoxification-associated gene families, chemoreception-associated gene families, nutrition metabolism and transport system gene families in the fall armyworm. Transcriptomic and phylogenetic analyses focused on these gene families revealed the potential roles of the genes in polyphagia and invasion of fall armyworm. The high-quality of the fall armyworm genome provides an important genomic resource for further explorations of the mechanisms of polyphagia and insecticide resistance, as well as for pest management of fall armyworm.

77 citations


Journal ArticleDOI
TL;DR: EnTAP (Eukaryotic Non‐Model Transcriptome Annotation Pipeline) was designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non‐model eukaryotes.
Abstract: EnTAP (Eukaryotic Non-Model Transcriptome Annotation Pipeline) was designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non-model eukaryotes. This software package addresses the fragmentation and related assembly issues that result in inflated transcript estimates and poor annotation rates of protein-coding transcripts. Following filters applied through assessment of true expression and frame selection, open-source tools are leveraged to functionally annotate the reduced set of translated proteins. Downstream features include fast similarity search across five repositories, protein domain assignment, orthologous gene family assessment, and Gene Ontology (GO) term assignment. The final annotation integrates across multiple databases and selects an optimal assignment from a combination of weighted metrics describing similarity search score, taxonomic relationship, and informativeness. Researchers have the option to include additional filters to identify and remove contaminants, identify associated pathways, and prepare the transcripts for enrichment analysis. This fully featured pipeline is easy to install, configure, and runs significantly faster than comparable annotation packages. EnTAP is optimized to generate extensive functional information for the gene space of organisms with limited or poorly characterized genomic resources.

76 citations


Journal ArticleDOI
TL;DR: This work generated a high‐continuity chromosome‐scale yellow perch genome assembly that contains a male‐specific duplicate of the anti‐Mullerian hormone type II receptor gene inserted at the proximal end of the Y chromosome, and developed a simple PCR genotyping assay which accurately differentiates XY genetic males from XX genetic females.
Abstract: Yellow perch, Perca flavescens, is an ecologically and economically important species native to a large portion of the northern United States and southern Canada and is also a promising candidate species for aquaculture. However, no yellow perch reference genome has been available to facilitate improvements in both fisheries and aquaculture management practices. By combining Oxford Nanopore Technologies long-reads, 10X Genomics Illumina short linked reads and a chromosome contact map produced with Hi-C, we generated a high-continuity chromosome-scale yellow perch genome assembly of 877.4 Mb. It contains, in agreement with the known diploid chromosome yellow perch count, 24 chromosome-size scaffolds covering 98.8% of the complete assembly (N50 = 37.4 Mb, L50 = 11). We also provide a first characterization of the yellow perch sex determination locus that contains a male-specific duplicate of the anti-Mullerian hormone type II receptor gene (amhr2by) inserted at the proximal end of the Y chromosome (chromosome 9). Using this sex-specific information, we developed a simple PCR genotyping assay which accurately differentiates XY genetic males (amhr2by+ ) from XX genetic females (amhr2by- ). Our high-quality genome assembly is an important genomic resource for future studies on yellow perch ecology, toxicology, fisheries and aquaculture research. In addition, characterization of the amhr2by gene as a candidate sex-determining gene in yellow perch provides a new example of the recurrent implication of the transforming growth factor beta pathway in fish sex determination, and highlights gene duplication as an important genomic mechanism for the emergence of new master sex determination genes.

69 citations
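
Several of the genome abstracts above quote N50 and L50 values (for example, N50 = 37.4 Mb and L50 = 11 for yellow perch). As a reminder of how these statistics are conventionally defined, here is a minimal sketch; the function name and the toy lengths are hypothetical and are not taken from any of the listed assemblies.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the assembly;
    L50 is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, rank
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    # Toy assembly of eight sequences totalling 32 units.
    print(n50_l50([2, 2, 2, 3, 3, 4, 8, 8]))
```

For the toy input, the two longest sequences (8 + 8) already reach half of the total length of 32, so N50 is 8 and L50 is 2.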


Journal ArticleDOI
TL;DR: A nested probe set strategy increases confidence in taxonomic identification because targets are confirmed with two or more probes, reducing false positives, and this research provides a valuable resource to investigate biofilm development, succession and associations between specific microscopic taxa at micrometre scales.
Abstract: Plastic marine debris (PMD) affects spatial scales of life from microbes to whales. However, understanding interactions between plastic and microbes in the "Plastisphere," the thin layer of life on the surface of PMD, has been technology-limited. Research into microbe-microbe and microbe-substrate interactions requires knowledge of community phylogenetic composition but also tools to visualize spatial distributions of intact microbial biofilm communities. We developed a CLASI-FISH (combinatorial labelling and spectral imaging - fluorescence in situ hybridization) method using confocal microscopy to study Plastisphere communities. We created a probe set consisting of three existing phylogenetic probes (targeting all Bacteria, Alpha-, and Gammaproteobacteria) and four newly designed probes (targeting Bacteroidetes, Vibrionaceae, Rhodobacteraceae and Alteromonadaceae) labelled with a total of seven fluorophores and validated this probe set using pure cultures. Our nested probe set strategy increases confidence in taxonomic identification because targets are confirmed with two or more probes, reducing false positives. We simultaneously identified and visualized these taxa and their spatial distribution within the microbial biofilms on polyethylene samples in colonization time series experiments in coastal environments from three different biogeographical regions. Comparing the relative abundance of 16S rRNA gene amplicon sequencing data with cell-count abundance data retrieved from the microscope images of the same samples showed a good agreement in bacterial composition. Microbial communities were heterogeneous, with direct spatial relationships between bacteria, cyanobacteria and eukaryotes such as diatoms but also micro-metazoa. Our research provides a valuable resource to investigate biofilm development, succession and associations between specific microscopic taxa at micrometre scales.

68 citations


Journal ArticleDOI
TL;DR: This work amplified soil DNA and used PacBio Circular Consensus Sequencing to obtain an ~4500‐bp region spanning most of the eukaryotic small subunit (18S) and large subunit (28S) ribosomal DNA genes, allowing it to accurately derive the evolutionary origin of environmental diversity.
Abstract: High-throughput DNA metabarcoding of amplicon sizes below 500 bp has revolutionized the analysis of environmental microbial diversity. However, these short regions contain limited phylogenetic signal, which makes it impractical to use environmental DNA in full phylogenetic inferences. This lesser phylogenetic resolution of short amplicons may be overcome by new long-read sequencing technologies. To test this idea, we amplified soil DNA and used PacBio Circular Consensus Sequencing (CCS) to obtain an ~4500-bp region spanning most of the eukaryotic small subunit (18S) and large subunit (28S) ribosomal DNA genes. We first treated the CCS reads with a novel curation workflow, generating 650 high-quality operational taxonomic units (OTUs) containing the physically linked 18S and 28S regions. To assign taxonomy to these OTUs, we developed a phylogeny-aware approach based on the 18S region that showed greater accuracy and sensitivity than similarity-based methods. The taxonomically annotated OTUs were then combined with available 18S and 28S reference sequences to infer a well-resolved phylogeny spanning all major groups of eukaryotes, allowing us to accurately derive the evolutionary origin of environmental diversity. A total of 1,019 sequences were included, of which a majority (58%) corresponded to the new long environmental OTUs. The long reads also allowed us to directly investigate the relationships among environmental sequences themselves, which represents a key advantage over the placement of short reads on a reference phylogeny. Together, our results show that long amplicons can be treated in a full phylogenetic framework to provide greater taxonomic resolution and a robust evolutionary perspective to environmental DNA.

63 citations


Journal ArticleDOI
TL;DR: Practical guidelines for the standardized development of reduced single nucleotide polymorphism (SNP) panels applicable for microfluidic genotyping of degraded DNA samples, such as faeces or hairs are provided.
Abstract: The genomic era has led to an unprecedented increase in the availability of genome-wide data for a broad range of taxa. Wildlife management strives to make use of these vast resources to enable refined genetic assessments that enhance biodiversity conservation. However, as new genomic platforms emerge, problems remain in adapting the usually complex approaches for genotyping of noninvasively collected wildlife samples. Here, we provide practical guidelines for the standardized development of reduced single nucleotide polymorphism (SNP) panels applicable for microfluidic genotyping of degraded DNA samples, such as faeces or hairs. We demonstrate how microfluidic SNP panels can be optimized to efficiently monitor European wildcat (Felis silvestris S.) populations. We show how panels can be set up in a modular fashion to accommodate informative markers for relevant population genetics questions, such as individual identification, hybridization assessment and the detection of population structure. We discuss various aspects regarding the implementation of reduced SNP panels and provide a framework that will allow both molecular ecologists and practitioners to help bridge the gap between genomics and applied wildlife conservation.

Journal ArticleDOI
TL;DR: Two new eDNA aggregation approaches are developed that overcome the challenges of above‐ground terrestrial sampling and eliminate the dependency on creating or utilizing pre‐existing water bodies to conduct eDNA sampling and represent a novel strategy for surveying terrestrial biodiversity.
Abstract: The use of environmental DNA (eDNA) surveys to monitor terrestrial species has been relatively limited, with successful implementations still confined to sampling DNA from natural or artificial water bodies and soil. Sampling water for eDNA depends on proximity to or availability of water, whereas eDNA from soil is limited in its spatial scale due to the large quantities necessary for processing and difficulty in doing so. These challenges limit the widespread use of eDNA in several systems, such as surveying forests for invasive insects. We developed two new eDNA aggregation approaches that overcome the challenges of above-ground terrestrial sampling and eliminate the dependency on creating or utilizing pre-existing water bodies to conduct eDNA sampling. The first, "spray aggregation," uses spray action to remove eDNA from surface substrates and was developed for shrubs and other understorey vegetation, while the second, "tree rolling," uses physical transfer via a roller to remove eDNA from the surface of tree trunks and large branches. We tested these approaches by surveying for spotted lanternfly, Lycorma delicatula, a recent invasive pest of northeastern USA that is considered a significant ecological and economic threat to forests and agriculture. We found that our terrestrial eDNA surveys matched visual surveys, but also detected L. delicatula presence ahead of visual surveys, indicating increased sensitivity of terrestrial eDNA surveys over currently used methodology. The terrestrial eDNA approaches we describe can be adapted for use in surveying a variety of forest insects and represent a novel strategy for surveying terrestrial biodiversity.

Journal ArticleDOI
TL;DR: The overall diversity reported by metagenomics was similar to that obtained by amplicon sequencing of the V4 and V9 regions of the 18S rRNA gene, although either one or both of these amplicon surveys performed poorly for groups like Excavata, Amoebozoa, Fungi and Haptophyta.
Abstract: Surveying microbial diversity and function is accomplished by combining complementary molecular tools. Among them, metagenomics is a PCR free approach that contains all genetic information from microbial assemblages and is today performed at a relatively large scale and reasonable cost, mostly based on very short reads. Here, we investigated the potential of metagenomics to provide taxonomic reports of marine microbial eukaryotes. We prepared a curated database with reference sequences of the V4 region of 18S rDNA clustered at 97% similarity and used this database to extract and classify metagenomic reads. More than half of them were unambiguously affiliated to a unique reference whilst the rest could be assigned to a given taxonomic group. The overall diversity reported by metagenomics was similar to that obtained by amplicon sequencing of the V4 and V9 regions of the 18S rRNA gene, although either one or both of these amplicon surveys performed poorly for groups like Excavata, Amoebozoa, Fungi and Haptophyta. We then studied the diversity of picoeukaryotes and nanoeukaryotes using 91 metagenomes from surface down to bathypelagic layers in different oceans, unveiling a clear taxonomic separation between size fractions and depth layers. Finally, we retrieved long rDNA sequences from assembled metagenomes that improved phylogenetic reconstructions of particular groups. Overall, this study shows metagenomics as an excellent resource for taxonomic exploration of marine microbial eukaryotes.

Journal ArticleDOI
TL;DR: SPIKEPIPE provides cost‐efficient and reliable quantification of eukaryotic communities and achieves a strikingly high accuracy of intraspecific abundance estimates from samples of known composition and a high repeatability across environmental‐sample replicates.
Abstract: The accurate quantification of eukaryotic species abundances from bulk samples remains a key challenge for community ecology and environmental biomonitoring. We resolve this challenge by combining shotgun sequencing, mapping to reference DNA barcodes or to mitogenomes, and three correction factors: (a) a percent-coverage threshold to filter out false positives, (b) an internal-standard DNA spike-in to correct for stochasticity during sequencing, and (c) technical replicates to correct for stochasticity across sequencing runs. The SPIKEPIPE pipeline achieves a strikingly high accuracy of intraspecific abundance estimates (in terms of DNA mass) from samples of known composition (mapping to barcodes R² = .93, mitogenomes R² = .95) and a high repeatability across environmental-sample replicates (barcodes R² = .94, mitogenomes R² = .93). As proof of concept, we sequence arthropod samples from the High Arctic, systematically collected over 17 years, detecting changes in species richness, species-specific abundances, and phenology. SPIKEPIPE provides cost-efficient and reliable quantification of eukaryotic communities.
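
Two of the correction factors named in the abstract above, the percent-coverage threshold and the internal-standard spike-in, can be illustrated with a short sketch. This is a simplified illustration and not the SPIKEPIPE implementation: the function name, the dictionary inputs and the default threshold of 0.5 are all hypothetical choices made for the example.

```python
def spike_corrected(mapped_reads, pct_coverage, spike_reads, min_coverage=0.5):
    """Filter species by reference coverage, then scale read counts by the spike-in.

    mapped_reads: species -> reads mapped to that species' reference
    pct_coverage: species -> fraction of the reference covered by reads (0 to 1)
    spike_reads:  reads mapped to the internal-standard spike-in in the same sample
    min_coverage: hypothetical threshold; species below it are treated as false positives
    """
    if spike_reads <= 0:
        raise ValueError("spike-in reads must be positive to normalise")
    return {
        species: reads / spike_reads
        for species, reads in mapped_reads.items()
        if pct_coverage[species] >= min_coverage
    }


if __name__ == "__main__":
    coverage = {"sp1": 0.9, "sp2": 0.1}
    # The same sample sequenced at two depths: raw counts differ,
    # spike-corrected values for the retained species agree.
    print(spike_corrected({"sp1": 200, "sp2": 50}, coverage, spike_reads=100))
    print(spike_corrected({"sp1": 400, "sp2": 90}, coverage, spike_reads=200))
```

Dividing by the spike-in count removes the effect of sequencing depth within a sample, which is why the two runs in the example agree for the retained species even though their raw read counts differ.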

Journal ArticleDOI
TL;DR: The most common techniques for the collection, preservation and extraction of metazoan eDNA from water samples are reviewed, focusing on experimental studies that compare various methods and outline the numerous challenges associated with eDNA.
Abstract: Environmental DNA (eDNA) is rapidly growing in popularity as a tool for community assessments and species detection. While eDNA approaches are now widely applied, there is not yet agreement on best practices for sample collection and processing. Investigators looking to integrate eDNA approaches into their research programme are required to examine a growing collection of disparate studies to make an often uncertain decision about which protocols best fit their needs. To promote the application of eDNA approaches and to encourage the generation of high-quality data, here we review the most common techniques for the collection, preservation and extraction of metazoan eDNA from water samples. Specifically, we focus on experimental studies that compare various methods and outline the numerous challenges associated with eDNA. While the diverse applications of eDNA do not lend themselves to a one-size-fits-all recommendation, in most cases, capture/concentration of eDNA on cellulose nitrate filters (with pore size determined by water turbidity), followed by storage of filters in Longmire's buffer and extraction with a DNeasy Blood & Tissue Kit (or similar) has been shown to provide sufficient, high-quality DNA. However, we also emphasize the importance of testing and optimizing protocols for the system of interest.

Journal ArticleDOI
TL;DR: This study tests multiple sample substrates (soil, scat, plant material and bulk arthropods) to determine what organisms can be detected from each and where they overlap, and the results demonstrate the importance of selecting appropriate metabarcoding substrates when undertaking terrestrial surveys.
Abstract: Biological surveys based on visual identification of the biota are challenging, expensive and time consuming, yet crucial for effective biomonitoring. DNA metabarcoding is a rapidly developing technology that can also facilitate biological surveys. This method involves the use of next generation sequencing technology to determine the community composition of a sample. However, it is uncertain as to what biological substrate should be the primary focus of metabarcoding surveys. This study aims to test multiple sample substrates (soil, scat, plant material and bulk arthropods) to determine what organisms can be detected from each and where they overlap. Samples (n = 200) were collected in the Pilbara (hot desert climate) and Swan Coastal Plain (hot Mediterranean climate) regions of Western Australia. Soil samples yielded little plant or animal DNA, especially in the Pilbara, probably due to conditions not conducive to long-term preservation. In contrast, scat samples contained the highest overall diversity with 131 plant, vertebrate and invertebrate families detected. Invertebrate and plant sequences were detected in the plant (86 families), pitfall (127 families) and vane trap (126 families) samples. In total, 278 families were recovered from the survey, 217 in the Swan Coastal Plain and 156 in the Pilbara. Aside from soil, 22%-43% of the families detected were unique to the particular substrate, and community composition varied significantly between substrates. These results demonstrate the importance of selecting appropriate metabarcoding substrates when undertaking terrestrial surveys. If the aim is to broadly capture all biota then multiple substrates will be required.

Journal ArticleDOI
TL;DR: It was demonstrated that the three primer sets could not reach a consensus on fungal community composition or diversity, and that primer selection, not experimental treatment, determines observed soil fungal community diversity and composition.
Abstract: With the continual improvement in high-throughput sequencing technology and constant updates to fungal reference databases, the use of amplicon-based DNA markers as a tool to reveal fungal diversity and composition in various ecosystems has become feasible. However, both primer selection and the experimental procedure require meticulous verification. Here, we computationally and experimentally evaluated the accuracy and specificity of three widely used or newly designed internal transcribed spacer (ITS) primer sets (ITS1F/ITS2, gITS7/ITS4 and 5.8S-Fun/ITS4-Fun). In silico evaluation revealed that primer coverage varied at different taxonomic levels due to differences in degeneracy and the location of primer sets. Using even and staggered mock community standards, we identified different proportions of chimeric and mismatch reads generated by different primer sets, as well as great variation in species abundances, suggesting that primer selection would affect the results of amplicon-based metabarcoding studies. Choosing proofreading and high-fidelity polymerase (KAPA HiFi) could significantly reduce the percentage of chimeric and mismatch sequences, further reducing inflation of operational taxonomic units. Moreover, for two types of environmental fungal communities, plant endophytic and soil fungi, it was demonstrated that the three primer sets could not reach a consensus on fungal community composition or diversity, and that primer selection, not experimental treatment, determines observed soil fungal community diversity and composition. Future DNA marker surveys should pay greater attention to potential primer effects and improve the experimental scheme to increase credibility and accuracy.

Journal ArticleDOI
TL;DR: The findings enable conservation managers to have confidence in RRS data while understanding its limitations, and provide avenues for further investigation into which processes underlie variation in breeding success in captive Tasmanian devils.
Abstract: As species extinction rates increase, genomics provides a powerful tool to support intensive management of threatened species. We use the Tasmanian devil (Sarcophilus harrisii) to demonstrate how conservation genomics can be implemented in threatened species management. We conducted whole genome sequencing (WGS) of 25 individuals from the captive breeding programme and reduced-representation sequencing (RRS) of 98 founders of the same programme. A subset of the WGS samples was also sequenced by RRS, allowing us to directly compare genome-wide heterozygosity with estimates from RRS data. We found good congruence in interindividual variation and gene-ontology classifications between the two data sets, indicating that our RRS data reflect the genome well. We also attempted genome-wide association studies with both data sets (regarding breeding success), but the genomic data suffered from small sample size, while the RRS data suffered from lack of precision, highlighting a key trade-off in the design of conservation genomic research. Nevertheless, we identified a number of candidate genes that may be associated with variation in breeding success. Individual heterozygosity, as measured by WGS or RRS, was not associated with breeding success in captivity but was negatively associated with litter sizes of breeding females in the RRS data set. Our findings enable conservation managers to have confidence in RRS data while understanding its limitations, and provide avenues for further investigation into which processes underlie variation in breeding success in captive Tasmanian devils. We caution, however, that deep functional insights using RRS may be impaired by a lack of precision, especially when marker density is low.

Journal ArticleDOI
TL;DR: A database of functional traits is presented for two widespread and ecologically important groups of protists, Cercozoa and Endomyxa (Rhizaria); it is intended as a common reference for the molecular ecology community and will boost the understanding of ecosystem functions, especially those driven by biological interactions.
Abstract: We have compiled a database of functional traits for two widespread and ecologically important groups of protists, Cercozoa and Endomyxa (Rhizaria). The functional traits of microorganisms are crucially important for interpreting results from environmental sequencing surveys. Linking morphological and ecological traits to environmental factors is common practice in studies involving micro- and macroorganisms, but is rarely applied to protists. Our database provides functional and ecologically significant traits linked to morphology, nutrition, locomotion and habitats. We discuss how the use of functional traits may help to unveil underlying ecosystem processes. This database is intended as a common reference for the molecular ecology community and will boost the understanding of ecosystem functions, especially those driven by biological interactions.

Journal ArticleDOI
TL;DR: The results support the validity of a shoreline sampling strategy for eDNA‐based fish community surveys in lentic systems but also suggest that a spatially comprehensive sampling design can reveal finer distribution patterns of individual species.
Abstract: Freshwater fish biodiversity is quickly decreasing and requires effective monitoring and conservation. Environmental DNA (eDNA)-based methods have been shown to be highly sensitive and cost-efficient for aquatic biodiversity surveys, but few studies have systematically investigated how spatial sampling design affects eDNA-detected fish communities across lentic systems of different sizes. We compared the spatial patterns of fish diversity determined using eDNA in three lakes of small (SL; 3 ha), medium (ML; 122 ha) and large (LL; 4,343 ha) size using a spatially explicit grid sampling method. A total of 100 water samples (including nine, 17 and 18 shoreline samples and six, 14 and 36 interior samples from SL, ML and LL, respectively) were collected, and fish communities were analysed using eDNA metabarcoding of the mitochondrial 12S region. Together, 30, 35 and 41 fish taxa were detected in samples from SL, ML, and LL, respectively. We observed that eDNA from shoreline samples effectively captured the majority of the fish diversity of entire waterbodies, and pooled samples recovered fewer species than individually processed samples. Significant spatial autocorrelations between fish communities within 250 m and 2 km of each other were detected in ML and LL, respectively. Additionally, the relative sequence abundances of many fish species exhibited spatial distribution patterns that correlated with their typical habitat occupation. Overall, our results support the validity of a shoreline sampling strategy for eDNA-based fish community surveys in lentic systems but also suggest that a spatially comprehensive sampling design can reveal finer distribution patterns of individual species.

Journal ArticleDOI
TL;DR: This high‐quality genome of P. euphratica represents a valuable resource for poplar breeding and genetic improvement in the future, as well as comparative genomic analysis with other Salicaceae species.
Abstract: Populus euphratica is well adapted to extreme desert environments and is an important model species for elucidating the mechanisms of abiotic stress resistance in trees. The current assembly of P. euphratica genome is highly fragmented with many gaps and errors, thereby impeding downstream applications. Here, we report an improved chromosome-level reference genome of P. euphratica (v2.0) using single-molecule sequencing and chromosome conformation capture (Hi-C) technologies. Relative to the previous reference genome, our assembly represents a nearly 60-fold improvement in contiguity, with a scaffold N50 size of 28.59 Mb. Using this genome, we have found that extensive expansion of Gypsy elements in P. euphratica led to its rapid increase in genome size compared to any other Salicaceae species studied to date, and potentially contributed to adaptive divergence driven by insertions near genes involved in stress tolerance. We also detected a wide range of unique structural rearrangements in P. euphratica, including 2,549 translocations, 454 inversions, 121 tandem and 14 segmental duplications. Several key genes likely to be involved in tolerance to abiotic stress were identified within these regions. This high-quality genome represents a valuable resource for poplar breeding and genetic improvement in the future, as well as comparative genomic analysis with other Salicaceae species.
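
The contiguity statistic quoted in this abstract, scaffold N50, can be illustrated with a short sketch. N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The length lists below are hypothetical toy values, not data from the P. euphratica assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical toy assemblies of the same total size (100 units).
fragmented = [10] * 10   # ten equally short scaffolds
contiguous = [60, 40]    # the same bases joined into two scaffolds

print(n50(fragmented), n50(contiguous))
print(n50(contiguous) / n50(fragmented))  # fold improvement in contiguity
```

A fold improvement in contiguity, as reported in the abstract, is the ratio of the new N50 to the old one; the total assembly size need not change for N50 to increase.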

Journal ArticleDOI
TL;DR: Overall, it is found that historical bird museum specimens contain substantial amounts of DNA for genomic studies under most extraction scenarios, but that a phenol–chloroform protocol consistently provides the high quantities of DNA required for most current genomic protocols.
Abstract: Next-generation sequencing has greatly expanded the utility and value of museum collections by revealing specimens as genomic resources. As the field of museum genomics grows, so does the need for extraction methods that maximize DNA yields. For avian museum specimens, the established method of extracting DNA from toe pads works well for most specimens. However, for some specimens, especially those of birds that are very small or very large, toe pads can be a poor source of DNA. In this study, we apply two DNA extraction methods (phenol-chloroform and silica column) to three different sources of DNA (toe pad, skin punch and bone) from 10 historical avian museum specimens. We show that a modified phenol-chloroform protocol yielded significantly more DNA than a silica column protocol (e.g., Qiagen DNeasy Blood & Tissue Kit) across all tissue types. However, extractions using the silica column protocol contained longer fragments on average than those using the phenol-chloroform protocol, probably as a result of loss of small fragments through the silica column. While toe pads yielded more DNA than skin punches and bone fragments, skin punches proved to be a reliable alternative source of DNA and might be especially appealing when toe pad extractions are impractical. Overall, we found that historical bird museum specimens contain substantial amounts of DNA for genomic studies under most extraction scenarios, but that a phenol-chloroform protocol consistently provides the high quantities of DNA required for most current genomic protocols.

Journal ArticleDOI
TL;DR: The orthologous analysis on those gene families associated with animal cold tolerance provided the first genomic evidence revealing specific cold‐tolerant strategies in C. suppressalis, including those involved in glucose‐originated glycerol biosynthesis, triacylglycerol‐originated glycerol biosynthesis, fatty acid synthesis and trehalose transport‐intermediate cold tolerance.
Abstract: The rice stem borer, Chilo suppressalis, is one of the most damaging insect pests to rice production worldwide. Although C. suppressalis has been the focus of numerous studies examining cold tolerance and diapause, plant-insect interactions, pesticide targets and resistance, and the development of RNAi-mediated pest management, the absence of a high-quality genome has limited deeper insights. To address this limitation, we generated a draft C. suppressalis genome constructed from both Illumina and PacBio sequences. The assembled genome size was 824.35 Mb with a contig N50 of 307 kb and a scaffold N50 of 1.75 Mb. Hi-C scaffolding assigned 99.2% of the bases to one of 29 chromosomes. Based on universal single-copy orthologues (BUSCO), the draft genome assembly was estimated to be 97% complete and is predicted to encompass 15,653 protein-coding genes. Cold tolerance is an extreme survival strategy found in animals. However, little is known regarding the genetic basis of the winter ecology of C. suppressalis. Here, we focused our orthologous analysis on those gene families associated with animal cold tolerance. Our finding provided the first genomic evidence revealing specific cold-tolerant strategies in C. suppressalis, including those involved in glucose-originated glycerol biosynthesis, triacylglycerol-originated glycerol biosynthesis, fatty acid synthesis and trehalose transport-intermediate cold tolerance. The high-quality C. suppressalis genome provides a valuable resource for research into a broad range of areas in molecular ecology, and subsequently benefits developing modern pest control strategies.

Journal ArticleDOI
TL;DR: A standardized approach is necessary to analyse microbiomes within and between source compartments along food chains in the context of the One Health framework, and comparing microbiomes using 515F‐806R revealed that soil and root samples have the highest estimates of species richness, while lowest richness was observed in human faeces.
Abstract: The 'One Health' framework emphasizes the ecological relationships between soil, plant, animal and human health. Microbiomes play important roles in these relationships, as they modify the health and performance of the different compartments and influence the transfer of energy, matter and chemicals between them. Standardized methods to characterize microbiomes along food chains are, however, currently lacking. To address this methodological gap, we evaluated the performance of DNA extraction kits and commonly recommended primer pairs targeting different hypervariable regions (V3-V4, V4, V5-V6, V5-V6-V7) of the 16S rRNA gene, on microbiome samples along a model food chain, including soils, maize roots, cattle rumen, and cattle and human faeces. We also included faeces from gnotobiotic mice colonized with defined bacterial taxa and mock communities to confirm the robustness of our molecular and bioinformatic approaches on these defined low microbial diversity samples. Based on Amplicon Sequence Variants, the primer pair 515F-806R led to the highest estimates of species richness and diversity in all sample types and offered maximum diversity coverage of reference databases in in silico primer analysis. The influence of the DNA extraction kits was negligible compared to the influence of the choice of primer pairs. Comparing microbiomes using 515F-806R revealed that soil and root samples have the highest estimates of species richness, while lowest richness was observed in human faeces. Primer pair choice directly influenced the estimation of community changes within and across compartments and may give rise to preferential detection of specific taxa. This work demonstrates why a standardized approach is necessary to analyse microbiomes within and between source compartments along food chains in the context of the One Health framework.

Journal ArticleDOI
TL;DR: A specialized target‐capture probe set for spiders that contains over 2,000 ultraconserved elements (UCEs) is developed, and the resulting data suggest monophyly of the ‘symphytognathoids’ (the miniature orb weavers), which in previous studies has only been supported by a combination of morphological and behavioural characters.
Abstract: Phylogenomic methods have proven useful for resolving deep nodes and recalcitrant groups in the spider tree of life. Across arachnids, transcriptomic approaches may generate thousands of loci, and target-capture methods, using the previously designed arachnid-specific probe set, can target a maximum of about 1,000 loci. Here, we develop a specialized target-capture probe set for spiders that contains over 2,000 ultraconserved elements (UCEs) and then demonstrate the utility of this probe set through sequencing and phylogenetic analysis. We designed the 'spider-specific' probe set using three spider genomes (Loxosceles, Parasteatoda and Stegodyphus) and ensured that the newly designed probe set includes UCEs from the previously designed Arachnida probe set. The new 'spider-specific' probes were used to sequence UCE loci in 51 specimens. The remaining samples included five spider genomes and taxa that were enriched using the Arachnida probe set. The 'spider-specific' probes were also used to gather loci from a total of 84 representative taxa across Araneae. On mapping these 84 taxa to the Arachnida probe set, we captured at most 710 UCE loci, while the spider-specific probe set captured up to 1,547 UCE loci from the same taxon sample. Phylogenetic analyses using maximum likelihood and coalescent methods corroborate most nodes resolved by recent transcriptomic analyses, but not all (e.g. UCE data suggest monophyly of 'symphytognathoids'). Our preferred hypothesis, based on topology tests, suggests monophyly of the 'symphytognathoids' (the miniature orb weavers), which in previous studies has only been supported by a combination of morphological and behavioural characters.

Journal ArticleDOI
TL;DR: The phylogenetic analysis showed that P. japonica diverged from the ancestor of Anoplophora glabripennis and Tribolium castaneum ~236.21 million years ago, and some important gene families involved in detoxification of pesticides and tolerance to heat stress were expanded in P. japonica.
Abstract: The ladybird beetle Propylea japonica is an important natural enemy in agro-ecological systems. Studies on the strong tolerance of P. japonica to high temperatures and insecticides, and its population and phenotype diversity have recently increased. However, abundant genome resources for obtaining insights into stress-resistance mechanisms and genetic intra-species diversity for P. japonica are lacking. Here, we constructed the P. japonica genome maps using Pacific Biosciences (PacBio) and Illumina sequencing technologies. The genome size was 850.90 Mb with a contig N50 of 813.13 kb. The Hi-C sequence data were used to upgrade draft genome assemblies; 4,777 contigs were assembled to 10 chromosomes; and the final draft genome assembly was 803.93 Mb with a contig N50 of 813.98 kb and a scaffold N50 of 100.34 Mb. Approximately 495.38 Mb of repeated sequences was annotated. A total of 18,018 protein-coding genes were predicted, of which 95.78% were functionally annotated, and 1,407 genes were species-specific. The phylogenetic analysis showed that P. japonica diverged from the ancestor of Anoplophora glabripennis and Tribolium castaneum ~236.21 million years ago. We detected that some important gene families involved in detoxification of pesticides and tolerance to heat stress were expanded in P. japonica, especially cytochrome P450 and Hsp70 genes. Overall, the high-quality draft genome sequence of P. japonica will provide an invaluable resource for understanding the molecular mechanisms of stress resistance and will facilitate the research on population genetics, evolution and phylogeny of Coccinellidae. This genome will also provide new avenues for conserving the diversity of predator insects.

Journal ArticleDOI
TL;DR: Comparability of findings from qPCR‐based telomere studies is hampered by such measurement results being assay‐specific, precluding direct quantitative comparisons of observed differences and/or slopes of associations between studies.
Abstract: Comparability of findings from qPCR-based telomere studies is hampered by such measurement results being assay-specific, precluding direct quantitative comparisons of observed differences and/or slopes of associations between studies. It is proposed that this can be partially alleviated by expressing qPCR-based telomere data as Z-scores.
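
The Z-score transformation proposed here can be sketched as follows. Each measurement is expressed as its distance from the study mean in units of the study's standard deviation, z = (x - mean) / SD, which removes the assay-specific scale. The values below are hypothetical relative telomere lengths (T/S ratios) invented for illustration. Using the sample standard deviation is an assumption of this sketch, not a detail given in the abstract.

```python
from statistics import mean, stdev


def to_z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    m = mean(values)
    sd = stdev(values)  # sample SD (n - 1 denominator); an assumption of this sketch
    if sd == 0:
        raise ValueError("Z-scores are undefined when all values are identical")
    return [(v - m) / sd for v in values]


# Hypothetical T/S ratios for the same five samples measured with two assays
# whose scales differ by a constant factor.
assay_a = [0.8, 1.0, 1.2, 1.4, 1.6]
assay_b = [8.0, 10.0, 12.0, 14.0, 16.0]

z_a = to_z_scores(assay_a)
z_b = to_z_scores(assay_b)

print([round(z, 3) for z in z_a])
print([round(z, 3) for z in z_b])
```

The raw values differ tenfold between the two hypothetical assays, yet the Z-scores coincide, so differences and slopes expressed in SD units can be compared across studies. This only partially resolves comparability, as the abstract notes, because Z-scores still depend on the variance of each study population.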

Journal ArticleDOI
TL;DR: The development of the first piscine epigenetic clock is reported, paving the way for similar studies in other species, and indicates that the clock is able to predict the chronological age independently of environmentally‐driven perturbations.
Abstract: Age-related changes in DNA methylation do occur. Taking advantage of this, mammalian and avian epigenetic clocks have been constructed to predict age. In fish, studies on age-related DNA methylation changes are scarce and no epigenetic clocks have been constructed. However, in fisheries and population dynamics studies there is a need for accurate estimation of age, something that is often impossible for some economically important species with the currently available methods. Here, we used the European sea bass, a marine fish the age of which can be determined with accuracy, to construct a piscine epigenetic clock, the first one in a cold-blooded vertebrate. We used targeted bisulfite sequencing to amplify 48 CpGs from four genes in muscle samples and applied penalized regressions to predict age. We thus developed an age predictor in fish that is highly accurate (0.824) and precise (2.149 years). In juvenile fish, accelerated growth due to elevated temperatures had no effect on age prediction, indicating that the clock is able to predict the chronological age independently of environmentally-driven perturbations. An epigenetic clock developed using muscle samples accurately predicted age in samples of testis but not ovaries, possibly reflecting the reproductive biology of fish. In conclusion, we report the development of the first piscine epigenetic clock, paving the way for similar studies in other species. Piscine epigenetic clocks should be of great utility for fisheries management and conservation purposes, where age determination is of crucial importance.

Journal ArticleDOI
TL;DR: The genome and transcriptome of C. hongkongensis provide valuable resources for future molecular studies, genetic improvement and genome‐assisted breeding of oysters, and potential genes and pathways related to sex determination and gonad development were identified.
Abstract: Crassostrea hongkongensis is a popular and important native oyster species that is cultured mainly along the coast of the South China Sea. However, the absence of a reference genome has restricted genetic studies and the development of molecular breeding schemes for this species. Here, we combined PacBio and 10x Genomics technologies to create a C. hongkongensis genome assembly, which has a size of 610 Mb, and is close to that estimated by flow cytometry (~650 Mb). Contig and scaffold N50 are 2.57 and 4.99 Mb, respectively, and BUSCO analysis indicates that 95.8% of metazoan conserved genes are completely represented. Using a high-density linkage map of its closest related species, C. gigas, a total of 521 Mb (85.4%) was anchored to 10 haploid chromosomes. Comparative genomic analyses with other molluscs reveal that several immune- or stress response-related genes extensively expanded in bivalves by tandem duplication, including C1q, Toll-like receptors and Hsp70, which are associated with their adaptation to filter-feeding and sessile lifestyles in shallow sea and/or deep-sea ecosystems. Through transcriptome sequencing, potential genes and pathways related to sex determination and gonad development were identified. The genome and transcriptome of C. hongkongensis provide valuable resources for future molecular studies, genetic improvement and genome-assisted breeding of oysters.

Journal ArticleDOI
TL;DR: A DNA sequence‐capture method was tested in parallel with the metabarcoding approach to reveal possible advantages of one method over the other, suggesting that clustering reads into OTUs could bias diversity richness especially using 18S with relaxed thresholds.
Abstract: Environmental DNA studies targeting multiple taxa using metabarcoding provide remarkable insights into levels of species diversity in any habitat. The main drawbacks are the presence of primer bias and difficulty in identifying rare species. We tested a DNA sequence-capture method in parallel with the metabarcoding approach to reveal possible advantages of one method over the other. Both approaches were performed using the same eDNA samples and the same 18S and COI regions, followed by high throughput sequencing. Metabarcoded eDNA libraries were PCR amplified with one primer pair from 18S and COI genes. DNA sequence-capture libraries were enriched with 3,639 baits targeting the same gene regions. We tested amplicon sequence variants (ASVs) and operational taxonomic units (OTUs) in silico approaches for both markers and methods, using for this purpose the metabarcoding data set. ASV methods uncovered more species for the COI gene, whereas the opposite occurred for the 18S gene, suggesting that clustering reads into OTUs could bias diversity richness especially using 18S with relaxed thresholds. Additionally, metabarcoding and DNA sequence-capture recovered 80%-90% of the control sample species. DNA sequence-capture was 8x more expensive; nonetheless, it identified 1.5x more species for COI and 13x more genera for 18S than metabarcoding. Both approaches offer reliable results, sharing ca. 40% species and 72% families and retrieve more taxa when nuclear and mitochondrial markers are combined. eDNA metabarcoding is quite well established and low-cost, whereas DNA sequence-capture for biodiversity assessment is still in its infancy and is more time-consuming, but provides more taxonomic assignments.