
Showing papers on "Hybrid genome assembly" published in 2019


Journal ArticleDOI
TL;DR: This study evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences data sets from different taxonomic categories with considerable differences in genome size to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects.
Abstract: Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms.

77 citations


Journal ArticleDOI
TL;DR: P. alba shows a marked expansion of gene families related to histones and the hormone auxin, but a loss of disease resistance genes, compared with the closely related P. trichocarpa.
Abstract: Populus alba is widely distributed and cultivated in Europe and Asia. This species has been used for diverse studies. In this study, we assembled a de novo genome sequence of P. alba var. pyramidalis (= P. bolleana) and experimentally confirmed its high transformation efficiency and short transformation time. Through a process of hybrid genome assembly, a total of 464 M of the genome was assembled. Annotation analyses predicted 37 901 protein-coding genes. This genome is highly collinear to that of P. trichocarpa, with most genes having orthologs in the two species. We found a marked expansion of gene families related to histones and the hormone auxin, but a loss of disease resistance genes, in P. alba compared with the closely related P. trichocarpa. The genome sequence presented here represents a valuable resource for further molecular functional analyses of this species as a new tree model, poplar breeding practices and comparative genomic analyses across different poplars.

55 citations


Journal ArticleDOI
TL;DR: BrownieCorrector, an error correction tool for Illumina sequencing data that focuses on the correction of only those reads that overlap short DNA patterns that are highly repetitive in the genome, leads to the best assembly results in most cases even though less than 2% of the reads within a dataset are corrected.
Abstract: Several standalone error correction tools have been proposed to correct sequencing errors in Illumina data in order to facilitate de novo genome assembly. However, in a recent survey, we showed that state-of-the-art assemblers often did not benefit from this pre-correction step. We found that many error correction tools introduce new errors in reads that overlap highly repetitive DNA regions such as low-complexity patterns or short homopolymers, ultimately leading to a more fragmented assembly. We propose BrownieCorrector, an error correction tool for Illumina sequencing data that focuses on the correction of only those reads that overlap short DNA patterns that are highly repetitive in the genome. BrownieCorrector extracts all reads that contain such a pattern and clusters them into different groups using a community detection algorithm that takes into account both the sequence similarity between overlapping reads and their respective paired-end reads. Each cluster holds reads that originate from the same genomic region and hence each cluster can be corrected individually, thus providing a consistent correction for all reads within that cluster. BrownieCorrector is benchmarked using six real Illumina datasets for different eukaryotic genomes. The prior use of BrownieCorrector improves assembly results over the use of uncorrected reads in all cases. In comparison with other error correction tools, BrownieCorrector leads to the best assembly results in most cases even though less than 2% of the reads within a dataset are corrected. Additionally, we investigate the impact of error correction on hybrid assembly where the corrected Illumina reads are supplemented with PacBio data. Our results confirm that BrownieCorrector improves the quality of hybrid genome assembly as well. BrownieCorrector is written in standard C++11 and released under GPL license. BrownieCorrector relies on multithreading to take advantage of multi-core/multi-CPU systems. 
The source code is available at https://github.com/biointec/browniecorrector.
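
The read-extraction step described in this abstract can be illustrated with a minimal sketch. It is not BrownieCorrector's implementation (the tool itself is written in C++); the function names and the example pattern are hypothetical, and the clustering and correction stages are omitted. The sketch only selects reads that contain a given short pattern on either strand.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def select_reads_with_pattern(reads, pattern):
    """Keep reads that contain `pattern` or its reverse complement.

    Hypothetical helper: a simplified stand-in for the extraction step,
    using plain substring matching rather than the tool's own method.
    """
    pattern = pattern.upper()
    rc_pattern = reverse_complement(pattern)
    selected = []
    for read in reads:
        upper = read.upper()
        if pattern in upper or rc_pattern in upper:
            selected.append(read)
    return selected


if __name__ == "__main__":
    # Toy reads; "AAAAAA" stands in for a short, highly repetitive pattern.
    example_reads = ["ACGTAAAAAAGT", "ACGTACGT", "CCTTTTTTGG"]
    print(select_reads_with_pattern(example_reads, "AAAAAA"))
```

Only the selected subset would then be passed on to clustering and correction, which is consistent with the abstract's observation that a small fraction of reads is corrected.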

21 citations


Journal ArticleDOI
TL;DR: The D. silvatica assembly is the first representative of the superfamily Dysderoidea, and just the second available genome of Synspermiata, one of the major evolutionary lineages of the “true spiders” (Araneomorphae).
Abstract: Background: We present the draft genome sequence of Dysdera silvatica, a nocturnal ground-dwelling spider from a genus that has undergone a remarkable adaptive radiation in the Canary Islands. Results: The draft assembly was obtained using short (Illumina) and long (PacBio and Nanopore) sequencing reads. Our de novo assembly (1.36 Gb), which represents 80% of the genome size estimated by flow cytometry (1.7 Gb), contains a high fraction of interspersed repetitive elements (53.8%). The assembly completeness, using BUSCO and core eukaryotic genes, ranges from 90% to 96%. Functional annotations based on both ab initio and evidence-based information (including D. silvatica RNA sequencing) yielded a total of 48,619 protein-coding sequences, of which 36,398 (74.9%) have the molecular hallmark of known protein domains, or sequence similarity with Swiss-Prot sequences. The D. silvatica assembly is the first representative of the superfamily Dysderoidea, and just the second available genome of Synspermiata, one of the major evolutionary lineages of the "true spiders" (Araneomorphae). Conclusions: Dysderoids, which are known for their numerous instances of adaptation to underground environments, include some of the few examples of trophic specialization within spiders and are excellent models for the study of cryptic female choice. This resource will therefore be useful as a starting point to study fundamental evolutionary and functional questions, including the molecular bases of the adaptation to extreme environments and ecological shifts, as well as the origin and evolution of relevant spider traits, such as venom and silk.

20 citations


Journal ArticleDOI
01 Nov 2019
TL;DR: Complete IS elements could be identified upstream of AMR genes; however, there was not complete correlation between the absence of IS elements and antimicrobial susceptibility, and further research is needed before implementing AMR prediction for B. fragilis from whole-genome sequencing.
Abstract: Bacteroides fragilis constitutes a significant part of the normal human gut microbiota and can also act as an opportunistic pathogen. Antimicrobial resistance (AMR) and the prevalence of AMR genes are increasing, and prediction of antimicrobial susceptibility based on sequence information could support targeted antimicrobial therapy in a clinical setting. Complete identification of insertion sequence (IS) elements carrying promoter sequences upstream of resistance genes is necessary for prediction of AMR. However, de novo assemblies from short reads alone are often fractured due to repeat regions and the presence of multiple copies of identical IS elements. Identification of plasmids in clinical isolates can aid in the surveillance of the dissemination of AMR, and comprehensive sequence databases support microbiome and metagenomic studies. We tested several short-read, hybrid and long-read assembly pipelines by assembling the type strain B. fragilis CCUG4856T (=ATCC25285=NCTC9343) with Illumina short reads and long reads generated by Oxford Nanopore Technologies (ONT) MinION sequencing. Hybrid assembly with Unicycler, using quality-filtered Illumina reads and Filtlong-filtered and Canu-corrected ONT reads, produced the assembly of highest quality. This approach was then applied to six clinical multidrug-resistant B. fragilis isolates and, with minimal manual finishing of chromosomal assemblies of three isolates, complete, circular assemblies of all isolates were produced. Eleven circular, putative plasmids were identified in the six assemblies, of which only three corresponded to a known cultured Bacteroides plasmid. Complete IS elements could be identified upstream of AMR genes; however, there was not complete correlation between the absence of IS elements and antimicrobial susceptibility. As our knowledge on factors that increase expression of resistance genes in the absence of IS elements is limited, further research is needed prior to implementing AMR prediction for B. fragilis from whole-genome sequencing.

20 citations


Journal ArticleDOI
TL;DR: This dataset will pave the way for molecular research and targeted genetic manipulation of Danionella translucida, a novel model organism for investigating neural network interactions in adult individuals.
Abstract: Studying neuronal circuits at cellular resolution is very challenging in vertebrates due to the size and optical turbidity of their brains. Danionella translucida, a close relative of zebrafish, was recently introduced as a model organism for investigating neural network interactions in adult individuals. Danionella remains transparent throughout its life, has the smallest known vertebrate brain and possesses a rich repertoire of complex behaviours. Here we sequenced, assembled and annotated the Danionella translucida genome employing a hybrid Illumina/Nanopore read library as well as RNA-seq of embryonic, larval and adult mRNA. We achieved high assembly continuity using low-coverage long-read data and annotated a large fraction of the transcriptome. This dataset will pave the way for molecular research and targeted genetic manipulation of this novel model organism.

20 citations


Journal ArticleDOI
TL;DR: Even low coverage of long reads can add significantly to overall genome contiguity, and the large repertoire of desaturase and elongase genes may provide a mechanistic explanation for the high diversity observed in C. levior CHC profiles.
Abstract: The success of social insects is largely intertwined with their highly advanced chemical communication system that facilitates recognition and discrimination of species and nest-mates, recruitment, and division of labor. Hydrocarbons, which cover the cuticle of insects, not only serve as waterproofing agents but also constitute a major component of this communication system. Two cryptic Crematogaster species, which share their nest with Camponotus ants, show striking diversity in their cuticular hydrocarbon (CHC) profile. This mutualistic system therefore offers a great opportunity to study the genetic basis of CHC divergence between sister species. High-quality genomes are needed as a basis for further genome-wide studies. Here, we present the annotated draft genome for Crematogaster levior A. By combining the three most commonly used sequencing techniques (Illumina, PacBio, and Oxford Nanopore), we constructed a high-quality de novo ant genome. We show that even low coverage of long reads can add significantly to overall genome contiguity. Annotation of desaturase and elongase genes, which play a role in CHC biosynthesis, revealed one of the largest repertoires in ants and a higher number of desaturases in general than in other Hymenoptera. This may provide a mechanistic explanation for the high diversity observed in C. levior CHC profiles.

11 citations


Posted ContentDOI
03 Feb 2019-bioRxiv
TL;DR: The Danionella translucida genome is sequenced, assembled and annotated employing a hybrid Illumina/Nanopore read library as well as RNA-seq of embryonic, larval and adult mRNA and a large fraction of the transcriptome is annotated.
Abstract: Studying the activity of distributed neuronal circuits at a cellular resolution in vertebrates is very challenging due to the size and optical turbidity of their brains. We recently presented Danionella translucida, a close relative of zebrafish, as a model organism suited for studying large-scale neural network interactions in adult individuals. Danionella remains transparent throughout its life, has the smallest known vertebrate brain and possesses a rich repertoire of complex behaviours. Here we sequenced, assembled and annotated the Danionella translucida genome employing a hybrid Illumina/Nanopore read library as well as RNA-seq of embryonic, larval and adult mRNA. We achieved high assembly continuity using low-coverage long-read data and annotated a large fraction of the transcriptome. This dataset will pave the way for molecular research and targeted genetic manipulation of the smallest known vertebrate brain.

10 citations


Posted ContentDOI
28 Aug 2019-bioRxiv
TL;DR: This study reviews and tests the most recent pipelines for hybrid assemblies, comparing the model organism Drosophila melanogaster to a non-model cactophilic Drosophila, D. mojavensis, and shows that it is possible to achieve excellent contiguity on this non-model organism using the DBG2OLC pipeline.
Abstract: The emergence of third generation sequencing (3GS; long-reads) is bringing the goal of chromosome-size fragments in de novo genome assemblies closer. This allows the exploration of new and broader questions on genome evolution for a number of non-model organisms. However, long-read technologies result in higher sequencing error rates and therefore impose an elevated cost of sufficient coverage to achieve high enough quality. In this context, hybrid assemblies, combining short-reads and long-reads, provide an alternative efficient and cost-effective approach to generate de novo, chromosome-level genome assemblies. The array of available software programs for hybrid genome assembly, sequence correction and manipulation is constantly being expanded and improved. This makes it difficult for non-experts to find efficient, fast and tractable computational solutions for genome assembly, especially in the case of non-model organisms lacking a reference genome or one from a closely related species. In this study, we review and test the most recent pipelines for hybrid assemblies, comparing the model organism Drosophila melanogaster to a non-model cactophilic Drosophila, D. mojavensis. We show that it is possible to achieve excellent contiguity on this non-model organism using the DBG2OLC pipeline.

5 citations


Posted ContentDOI
11 Jun 2019-bioRxiv
TL;DR: A hybrid genome assembly from short and long sequence reads is presented that makes the C. riparius genome one of the most contiguous Dipteran genomes published, together with the first complete mitochondrial genome of the species and one of the first recombination rate estimates for any insect.
Abstract: Background: Chironomus riparius is of great importance as a study species in various fields like ecotoxicology, molecular genetics, developmental biology and ecology. However, only a fragmented draft genome exists to date, hindering the recent rush of population genomic studies in this species. Findings: Making use of 50 NGS datasets, we present a hybrid genome assembly from short and long sequence reads that makes the C. riparius genome one of the most contiguous Dipteran genomes published, the first complete mitochondrial genome of the species, and the respective recombination rate, one of the first recombination rate estimates for any insect. Conclusions: The genome and associated resources will be highly valuable to the broad community working with dipterans in general and chironomids in particular. The estimated recombination rate will help evolutionary biologists gain a better understanding of commonalities and differences of genomic patterns in insects.

5 citations


Journal ArticleDOI
TL;DR: Multiple IS family transposases specific for all non-fermenting Gram-negative bacteria (NFGNBs)—especially IS3 and IS5, which facilitate mobilization of extended-spectrum β-lactamase (ESBL) and carbapenemase genes—were carried in these genomes, which adds to the complexity of gene transmission.
Abstract: Melioidosis caused by Burkholderia pseudomallei has become an important clinical threat, especially in Northern Australia and Southeast Asia. However, the genome information on this pathogen is limited. B. pseudomallei isolates identified from bloodstream infections from inpatients were subjected to whole-genome sequencing by IonTorrent PGM and MinION Oxford Nanopore sequencing technologies. Highly accurate complete genomes of two strains, VB3253 and VB2514, were obtained by a hybrid genome assembly method using both short and long DNA reads. Both isolates carried blaPenI and carbapenemase-encoding blaOXA-57 genes, although the isolates were susceptible to imipenem by E-test method with MIC 1 μg/mL. Multiple IS family transposases specific for all non-fermenting Gram-negative bacteria (NFGNBs)—especially IS3 and IS5, which facilitate mobilization of extended-spectrum β-lactamase (ESBL) and carbapenemase genes—were carried in these genomes. This further adds to the complexity of gene transmission. These IS families were identified only upon hybrid genome assembly and would otherwise be missed.

Posted ContentDOI
18 Mar 2019-bioRxiv
TL;DR: In this paper, the authors sequenced, assembled and annotated the Danionella translucida genome employing a hybrid Illumina/Nanopore read library as well as RNA-seq of embryonic, larval and adult mRNA.
Abstract: Studying neuronal circuits at cellular resolution is very challenging in vertebrates due to the size and optical turbidity of their brains. Danionella translucida, a close relative of zebrafish, was recently introduced as a model organism for investigating neural network interactions in adult individuals. Danionella remains transparent throughout its life, has the smallest known vertebrate brain and possesses a rich repertoire of complex behaviours. Here we sequenced, assembled and annotated the Danionella translucida genome employing a hybrid Illumina/Nanopore read library as well as RNA-seq of embryonic, larval and adult mRNA. We achieved high assembly continuity using low-coverage long-read data and annotated a large fraction of the transcriptome. This dataset will pave the way for molecular research and targeted genetic manipulation of this novel model organism.

Posted ContentDOI
24 Jun 2019-bioRxiv
TL;DR: Ultraplexing requires the availability of Illumina data and uses inter-sample genetic variability to assign reads to isolates, which obviates the need for molecular barcoding, and can enable significant sequencing and labor cost reductions in large-scale bacterial genome projects.
Abstract: Hybrid genome assembly has emerged as an important technique in bacterial genomics, but cost and labor requirements limit large-scale application. We present Ultraplexing, a method that improves the per-sample sequencing cost and hands-on time of Nanopore sequencing for hybrid assembly by at least 50% compared to molecular barcoding, while maintaining high assembly quality (Quality Value; QV ≥ 42). Ultraplexing requires the availability of Illumina data and uses inter-sample genetic variability to assign reads to isolates, which obviates the need for molecular barcoding. Thus, Ultraplexing can enable significant sequencing and labor cost reductions in large-scale bacterial genome projects.
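
As a rough aid to reading the quality threshold in this abstract, the following sketch converts between a quality value and a per-base error rate. It assumes that the reported QV follows the standard Phred scale, QV = -10 * log10(error rate); the function names are illustrative and are not taken from Ultraplexing.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Per-base error probability for a Phred-scaled quality value (assumed scale)."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Phred-scaled quality value for a per-base error probability."""
    if error_rate <= 0:
        raise ValueError("error_rate must be positive")
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    rate = qv_to_error_rate(42)
    print(f"QV 42 -> error rate {rate:.2e}, about 1 error per {1 / rate:,.0f} bases")
```

Under that assumption, QV 42 corresponds to roughly one error per 15,800 assembled bases.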