Showing papers on "Hybrid genome assembly" published in 2020


Journal ArticleDOI
TL;DR: A hybrid genome assembly from short and long sequence reads is presented that makes the C. riparius genome one of the most contiguous Dipteran genomes published, together with the first complete mitochondrial genome of the species and a recombination rate estimate that is among the first for any insect.
Abstract: Chironomus riparius is of great importance as a study species in various fields like ecotoxicology, molecular genetics, developmental biology and ecology. However, only a fragmented draft genome exists to date, hindering the recent rush of population genomic studies in this species. Making use of 50 NGS datasets, we present a hybrid genome assembly from short and long sequence reads that makes the C. riparius genome one of the most contiguous Dipteran genomes published, together with the first complete mitochondrial genome of the species and a recombination rate estimate that is among the first for any insect. The genome assembly and associated resources will be highly valuable to the broad community working with dipterans in general and chironomids in particular. The estimated recombination rate will help evolutionary biologists gain a better understanding of commonalities and differences of genomic patterns in insects.

19 citations


Journal ArticleDOI
TL;DR: The complete genome sequence is reported for K. pneumoniae KP58, a pandrug-resistant Klebsiella pneumoniae strain from China that exhibits high levels of resistance to colistin and tigecycline.
Abstract: Background: Multidrug-resistant Klebsiella pneumoniae is increasingly implicated worldwide in a variety of infections with high mortality. Here, we report the complete genome sequence of K. pneumoniae strain KP58, a pandrug-resistant K. pneumoniae strain from China that exhibits high levels of resistance to colistin and tigecycline. Methods: The K. pneumoniae strain KP58 was recovered from a urine sample of a female patient hospitalized in a tertiary hospital in Hangzhou, China. Antimicrobial susceptibility testing was performed and the minimum inhibitory concentrations (MICs) were determined. Whole-genome sequencing was performed using Illumina and Oxford Nanopore sequencing technologies. Genomic features, antimicrobial resistance genes and virulence genes were comprehensively analysed by various bioinformatics approaches. In addition, genomic epidemiological and phylogenetic analyses of K. pneumoniae KP58 and closely related isolates were performed using the core genome multilocus sequence typing (cgMLST) analysis in BacWGSTdb, an online bacterial whole-genome sequence typing and source tracking database. Results: K. pneumoniae KP58 was resistant to all antimicrobial agents tested, including tigecycline and colistin. Combining the two sequencing technologies allowed a high-quality complete genome sequence of K. pneumoniae KP58 comprising one circular chromosome and five circular plasmids to be obtained. This strain harbours a variety of acquired antimicrobial resistance and virulence determinants. It also carried an ISKpn26-like insertion in the disrupted mgrB gene, which confers colistin resistance. The tigecycline resistance was associated with overexpression of the AcrAB efflux system. The closest relative of K. pneumoniae KP58 was another clinical isolate recovered from Hangzhou that differed by only 10 cgMLST loci. Conclusion: The dataset presented in this study provides essential insights into the evolution of antimicrobial-resistant K. pneumoniae in hospital settings and assists in the development of effective control strategies. Appropriate surveillance and control measures are essential to prevent its further dissemination.

16 citations


Posted ContentDOI
23 Jun 2020-bioRxiv
TL;DR: The genome sequence of blackgram will facilitate identification of agronomically important genes and accelerate the genetic improvement of blackgram, an important Asiatic legume crop.
Abstract: Blackgram [Vigna mungo (L.) Hepper] (2n = 2x = 22), an important Asiatic legume crop, is a major source of dietary protein for the predominantly vegetarian population. Here we construct a draft genome sequence of blackgram, for the first time, by employing hybrid genome assembly with Illumina reads and third-generation Oxford Nanopore sequencing technology. The final de novo whole genome of blackgram is ~475 Mb (82% of the genome) and has a maximum scaffold length of 6.3 Mb with a scaffold N50 of 1.42 Mb. Genome analysis identified 18655 genes with a mean coding sequence length of 970 bp. Around 96.7% of predicted genes were annotated. Nearly half of the assembled sequence is composed of repetitive elements, with retrotransposons as the major transposable elements (47.3% of the genome), whereas DNA transposons made up only 2.29% of the genome. A total of 166014 SSRs, including 65180 compound SSRs, were identified and primer pairs for 34816 SSRs were designed. Out of the 18665 proteins, 678 proteins showed the presence of R-gene related domains. The KIN class was found in the majority of these proteins (372), followed by RLK (79) and N (79). The genome sequence of blackgram will facilitate identification of agronomically important genes and accelerate the genetic improvement of blackgram.

15 citations


Journal ArticleDOI
TL;DR: The genome and transcriptome presented here provide an essential resource for comparative genomics of the commercially relevant genus Trichogramma, but also for research into molecular evolution, ecology, and breeding of T. brassicae.
Abstract: Trichogramma brassicae (Bezdenko) are egg parasitoids that are used throughout the world as biological control agents and in laboratories as model species. Despite this ubiquity, few genetic resources exist beyond COI, ITS2, and RAPD markers. Aided by a Wolbachia infection, a wild-caught strain from Germany was reared for low heterozygosity and sequenced in a hybrid de novo strategy, after which several assembly strategies were evaluated. The best assembly, derived from a DBG2OLC-based pipeline, yielded a genome of 235 Mbp made up of 1,572 contigs with an N50 of 556,663 bp. Following a rigorous ab initio-, homology-, and evidence-based annotation, 16,905 genes were annotated and functionally described. As an example of the utility of the genome, a simple ortholog cluster analysis was performed with sister species T. pretiosum, revealing over 6000 shared clusters and under 400 clusters unique to each species. The genome and transcriptome presented here provide an essential resource for comparative genomics of the commercially relevant genus Trichogramma, but also for research into molecular evolution, ecology, and breeding of T. brassicae.

15 citations


Journal ArticleDOI
TL;DR: This study reviews and tests the most recent pipelines for hybrid assemblies, comparing the model organism Drosophila melanogaster to a nonmodel cactophilic Drosophila, and shows that it is possible to achieve excellent contiguity on this nonmodel organism using the dbg2olc pipeline.
Abstract: The emergence of third-generation sequencing (3GS; long-reads) is bringing closer the goal of chromosome-size fragments in de novo genome assemblies. This allows the exploration of new and broader questions on genome evolution for a number of nonmodel organisms. However, long-read technologies result in higher sequencing error rates and therefore impose an elevated cost of sufficient coverage to achieve high enough quality. In this context, hybrid assemblies, combining short-reads and long-reads, provide an alternative efficient and cost-effective approach to generate de novo, chromosome-level genome assemblies. The array of available software programs for hybrid genome assembly, sequence correction and manipulation is constantly being expanded and improved. This makes it difficult for nonexperts to find efficient, fast and tractable computational solutions for genome assembly, especially in the case of nonmodel organisms lacking a reference genome or one from a closely related species. In this study, we review and test the most recent pipelines for hybrid assemblies, comparing the model organism Drosophila melanogaster to a nonmodel cactophilic Drosophila, D. mojavensis. We show that it is possible to achieve excellent contiguity on this nonmodel organism using the dbg2olc pipeline.

14 citations


Journal ArticleDOI
TL;DR: The genomes of Citrus unshiu and Poncirus trifoliata are assembled, using hybrid de novo assembly of Illumina and PacBio sequence data, and the Mikan Genome Database (MiGD) is developed, an integrated database of genome annotation, genetic diversity, and Cleaved Amplified Polymorphic Sequence (CAPS) marker information.
Abstract: Citrus species are some of the most valuable and widely consumed fruits globally. The genome sequences of representative citrus species (e.g., Citrus clementina, C. sinensis, C. grandis) have been released, but the research base for mandarin molecular breeding is still poor. We assembled the genomes of Citrus unshiu and Poncirus trifoliata, two important species for the citrus industry in Japan, using hybrid de novo assembly of Illumina and PacBio sequence data, and developed the Mikan Genome Database (MiGD). The assembled genome sizes of C. unshiu and P. trifoliata are 346 and 292 Mb, respectively, similar to those of citrus species in public databases; they are predicted to possess 41,489 and 34,333 protein-coding genes in their draft genome sequences, with 9,642 and 8,377 specific genes when compared to C. clementina, respectively. MiGD is an integrated database of genome annotation, genetic diversity, and Cleaved Amplified Polymorphic Sequence (CAPS) marker information, with these contents being mutually linked by genes. MiGD facilitates access to genome sequences of interest from previously reported linkage maps through CAPS markers and obtains polymorphism information through the multiple genome browser TASUKE. The genomic resources in MiGD (https://mikan.dna.affrc.go.jp) could provide valuable information for mandarin molecular breeding in Japan.

11 citations


Journal ArticleDOI
TL;DR: This new genome draft improves current genomic resources available for M. laxa and represents a useful tool for further research into its interactions with host plants and into evolution in the Monilinia genus.
Abstract: Monilinia laxa is the causal agent of brown rot on stone fruit, and it can cause heavy yield losses during field production and postharvest storage. This article reports the draft genome assembly of the M. laxa Mlax316 strain, obtained using a hybrid genome assembly with both Illumina short-reads and PacBio long-reads sequencing technologies. The complete draft genome consists of 49 scaffolds with total size of 42.81 Mb, and scaffold N50 of 2,449.4 kb. Annotation of the M. laxa assembly identified 11,163 genes and 12,424 proteins which were functionally annotated. This new genome draft improves current genomic resources available for M. laxa and represents a useful tool for further research into its interactions with host plants and into evolution in the Monilinia genus.

11 citations


Posted ContentDOI
06 Apr 2020-bioRxiv
TL;DR: The genome and transcriptome presented here provide an essential resource for comparative genomics of the commercially relevant genus Trichogramma, but also for research into molecular evolution, ecology, and breeding of T. brassicae.
Abstract: Trichogramma brassicae (Bezdenko) are egg parasitoids that are used throughout the world as biological control agents and in laboratories as model species. Despite this ubiquity, few genetic resources exist beyond COI, ITS2, and RAPD markers. Aided by a Wolbachia infection, a wild-caught strain from Germany was reared for low heterozygosity and sequenced in a hybrid de novo strategy, after which several assembly strategies were evaluated. The best assembly, derived from a DBG2OLC-based pipeline, yielded a genome of 235 Mbp made up of 1,572 contigs with an N50 of 556,663 bp. Following a rigorous ab initio-, homology-, and evidence-based annotation, 16,905 genes were annotated and functionally described. As an example of the utility of the genome, a simple ortholog cluster analysis was performed with sister species T. pretiosum, revealing over 6000 shared clusters and under 400 clusters unique to each species. The genome and transcriptome presented here provide an essential resource for comparative genomics of the commercially relevant genus Trichogramma, but also for research into molecular evolution, ecology, and breeding of T. brassicae.

8 citations


Journal ArticleDOI
TL;DR: Ultraplexing requires the availability of Illumina data and uses inter-sample genetic variability to assign reads to isolates, which obviates the need for molecular barcoding, and can enable significant sequencing and labor cost reductions in large-scale bacterial genome projects.
Abstract: Hybrid genome assembly has emerged as an important technique in bacterial genomics, but cost and labor requirements limit large-scale application. We present Ultraplexing, a method to improve per-sample sequencing cost and hands-on time of Nanopore sequencing for hybrid assembly by at least 50% compared to molecular barcoding while maintaining high assembly quality. Ultraplexing requires the availability of Illumina data and uses inter-sample genetic variability to assign reads to isolates, which obviates the need for molecular barcoding. Thus, Ultraplexing can enable significant sequencing and labor cost reductions in large-scale bacterial genome projects.

6 citations


Posted ContentDOI
28 Jun 2020
TL;DR: This comparative review is unique in providing a detailed comparison of a broad spectrum of cutting-edge algorithms and identifies both the strengths and weaknesses of each method.
Abstract: Despite advances in algorithms and computational platforms, de novo genome assembly remains a challenging process. Due to the constant innovation in sequencing technologies (Sanger, SOLiD, Illumina, 454, PacBio and Oxford Nanopore), genome assembly has evolved to respond to the changes in input data type. This paper includes a broad and comparative review of the most recent short-read, long-read and hybrid assembly techniques. In this review, we provide (1) an algorithmic description of the important processes in the workflow that introduces fundamental concepts and improvements; (2) a review of existing software that explains possible options for genome assembly; and (3) a comparison of the accuracy and the performance of existing methods executed on the same computer using the same processing capabilities and using the same set of real and synthetic datasets. Such evaluation allows a fair and precise comparison of accuracy in all aspects. As a result, this paper identifies both the strengths and weaknesses of each method. This comparative review is unique in providing a detailed comparison of a broad spectrum of cutting-edge algorithms.

Journal ArticleDOI
04 Feb 2020
TL;DR: This Hi-C genome assembly provides a detailed, accurate reference genome which could be utilized to improve Jatropha and other economically important Euphorbiaceae family members.
Abstract: Jatropha curcas is one of the major sources of renewable energy due to the potential use of its oil as a biofuel. The genome of this crop has a high content of repetitive elements. We employed the Hi-C proximity ligation technique to re-scaffold our existing hybrid genome assembly of an elite genotype (RJC1) developed using Illumina and PacBio technologies. We assembled 99.81% of non-truncated reads to achieve 266.80 Mbp of the genome with an N50 value of 1.58 Mb. Furthermore, we compared the efficiency of Hi-C-augmented genome assembly with the hybrid genome assembly and observed a ~50% reduction in scaffolds and a tenfold increase in the N50 value. Gene ontology analysis identified terms for molecular function (45.52%), cellular component (33.47%), and biological function (20.99%). Comparative genomic analysis of 13 plant species showed the conservation of 414 lipid metabolizing genes identified in the KEGG pathway analysis. Differential gene expression (DGE) studies were conducted on healthy and Jatropha mosaic virus (JMV)-infected leaves via RNA-seq analysis, revealing gene expression changes for 2185 genes. Of these, 546 genes showed a more than two-fold change in transcript level, of which 259 were down-regulated and 287 were up-regulated. To validate the RNA-seq data, two DEGs were selected for gene expression analysis using qRT-PCR, and the results correlated with the in silico results. RNA-seq analysis further identified some candidate genes that may be useful for developing JMV-resistant plants after functional validation. This Hi-C genome assembly provides a detailed, accurate reference genome which could be utilized to improve Jatropha and other economically important Euphorbiaceae family members.

Journal ArticleDOI
TL;DR: The results confirm that a lower depth of sequencing is enough to obtain a valuable genome sequence, using secondary scaffolding approaches and demonstrate the benefits of the scaff2link application.
Abstract: The Indian flying fox Pteropus medius, a fruit bat, was identified as an asymptomatic natural host of the recently emerged Nipah virus, which is known to induce a severe infectious disease in humans. The absence of a P. medius genome sequence presents an important obstacle for further studies of virus-host interactions and better understanding of mechanisms of zoonotic viral emergence. Generation of a high-quality genome sequence is often linked to considerable effort and elevated costs. Although secondary scaffolding methods have reduced sequencing expenses, they require the development of new tools for the integration of different data sources to achieve more reliable sequencing results. We initially sequenced the P. medius genome using the combination of Illumina paired-end and Nanopore sequencing, with a depth of 57.4x and 6.1x, respectively. Then, we introduced the novel scaff2link software to integrate multiple sources of information for secondary scaffolding, allowing associations with discordant information between two sources to be removed. Different quality metrics were next produced to validate the benefits of secondary scaffolding. The P. medius genome, assembled by this method, has a length of 1,985 Mb and consists of 33,613 contigs and 16,113 scaffolds with an NG50 of 19 Mb. At least 22.5% of the assembled sequence is covered by interspersed repeats already described in other species, and 19,823 coding genes are annotated. Phylogenetic analysis demonstrated the clustering of the P. medius genome with two other Pteropus bat species, P. alecto and P. vampyrus, for which genome sequences are currently available. The SARS-CoV entry receptor ACE2 sequence of P. medius was 82.7% identical to ACE2 of Rhinolophus sinicus bats, thought to be the natural host of SARS-CoV. Altogether, our results confirm that a lower depth of sequencing is enough to obtain a valuable genome sequence, using secondary scaffolding approaches and demonstrate the benefits of the scaff2link application. The genome sequence is now available to the scientific community to (i) proceed with further genomic analysis of P. medius, (ii) to characterize the underlying mechanism allowing Nipah virus maintenance and perpetuation in its bat host, and (iii) to monitor their evolutionary pathways toward a better understanding of bats' ability to control viral infections.
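
Several of the abstracts listed above summarize assembly contiguity with N50 or NG50 values. As a reading aid, the following is a minimal sketch of how these metrics are conventionally computed from a list of contig or scaffold lengths. It is not taken from any of the listed papers; the function name `n50` and its `genome_size` parameter are illustrative assumptions, and individual tools may differ in edge-case handling.

```python
from typing import Iterable, Optional


def n50(lengths: Iterable[int], genome_size: Optional[int] = None) -> Optional[int]:
    """Return the N50 of the given sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length. If genome_size
    (a hypothetical parameter name) is supplied, half of that estimated
    genome size is used as the target instead, which gives NG50.
    Returns None when the target cannot be reached, for example when
    the assembly covers less than half of the estimated genome.
    """
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    if not ordered:
        raise ValueError("at least one sequence length is required")

    total = genome_size if genome_size is not None else sum(ordered)
    target = total / 2

    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return None


# Toy example with made-up lengths (not data from any listed paper).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))                   # N50 of the toy assembly
print(n50(toy_lengths, genome_size=100))  # NG50 against a larger genome estimate
```

Because NG50 is measured against an estimated genome size rather than the assembly's own total length, it is never larger than N50 when the assembly is smaller than the genome estimate, which makes values from different assemblies of the same species easier to compare.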