Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.

doi:10.1101/GR.215087.116

Open Access Journal Article

Sergey Koren, +5 more

- 15 Mar 2017 -

Genome Research

- Vol. 27, Iss: 5, pp 722-736

TL;DR

This paper presents Canu, a successor of Celera Assembler designed specifically for noisy single-molecule sequences, and demonstrates that it can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences or Oxford Nanopore technologies.

Abstract:

Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences (PacBio) or Oxford Nanopore technologies and achieves a contig NG50 of >21 Mbp on both human and Drosophila melanogaster PacBio data sets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.
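
The abstract reports continuity as contig NG50. As a minimal sketch of how that statistic is commonly computed (this is not code from Canu; the function name `ng50` and the toy contig lengths are illustrative assumptions), NG50 is the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the estimated genome size:

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    NG50 is the length L such that contigs of length >= L together span
    at least half of the (estimated) genome size. Returns 0 when the
    whole assembly spans less than half of the genome.
    """
    half = genome_size / 2
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    # Toy contig lengths in bp, made up for illustration only.
    contigs = [80, 70, 50, 40, 30, 20]
    print(ng50(contigs, genome_size=400))
    # Passing the assembly's own total length gives the N50 instead.
    print(ng50(contigs, genome_size=sum(contigs)))
```

Because NG50 is normalized by genome size and not by total assembly length, it allows assemblies of the same genome to be compared even when their total sizes differ.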

Citations

Journal Article

Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads.

Ryan R. Wick, +3 more

- 08 Jun 2017 -

PLOS Computational Biology

TL;DR: Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low.

Journal Article

High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries.

Chirag Jain, +4 more

- 30 Nov 2018 -

Nature Communications

TL;DR: FastANI, a method to compute ANI using alignment-free approximate sequence mapping, is developed, and an analysis of about 90,000 prokaryotic genomes shows that 95% ANI is an accurate threshold for demarcating prokaryotic species.

Journal Article

Assembly of long, error-prone reads using repeat graphs

Mikhail Kolmogorov, +3 more

- 01 Apr 2019 -

Nature Biotechnology

TL;DR: Flye generates arbitrary paths in an unknown repeat graph, called disjointigs, and constructs an accurate repeat graph from these error-riddled disjointigs, which can then be used for genome assembly.

Posted Content

Unicycler: resolving bacterial genome assemblies from short and long sequencing reads

Ryan R. Wick, +3 more

- 22 Dec 2016 -

bioRxiv

TL;DR: Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low.

Journal Article

Fast and accurate de novo genome assembly from long uncorrected reads

Robert Vaser, +3 more

- 18 Jan 2017 -

Genome Research

TL;DR: It is shown that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment-based, stand-alone consensus module called Racon.

References

Journal Article

Velvet: Algorithms for de novo short read assembly using de Bruijn graphs

Daniel R. Zerbino, +1 more

- 01 May 2008 -

Genome Research

TL;DR: Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies and is in close agreement with simulated results without read-pair information.

Journal Article

Base-calling of automated sequencer traces using Phred. I. accuracy assessment

Brent Ewing, +3 more

- 01 Mar 1998 -

Genome Research

TL;DR: This article describes phred, a base-calling program for automated sequencer traces with improved accuracy. Phred achieved a lower error rate than the ABI software, averaging 40%-50% fewer errors in the data sets examined, independent of position in read, machine running conditions, or sequencing chemistry.

Journal Article

Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement

Bruce J. Walker, +10 more

- 19 Nov 2014 -

PLOS ONE

TL;DR: Pilon is a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions, which is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains.

Journal Article

Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities

Brent Ewing, +1 more

- 01 Mar 1998 -

Genome Research

TL;DR: The ability to estimate a probability of error for each base-call, as a function of certain parameters computed from the trace data, is developed and implemented in the base-calling program.

Journal Article

Versatile and open software for comparing large genomes

Stefan Kurtz, +6 more

- 30 Jan 2004 -

Genome Biology

TL;DR: The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes.
