Versatile genome assembly evaluation with QUAST-LG.
TLDR
This manuscript demonstrates the performance of state-of-the-art genome assembly software on six eukaryotic datasets sequenced using different technologies, introduces the concept of an upper bound assembly for a given genome and set of reads, and computes theoretical limits on assembly correctness and completeness.
Abstract:
Motivation: The emergence of high-throughput sequencing technologies revolutionized genomics in the early 2000s. The next revolution came with the era of long-read sequencing. These technological advances, along with novel computational approaches, became the next step towards automatic pipelines capable of assembling nearly complete mammalian-size genomes.
Results: In this manuscript, we demonstrate the performance of state-of-the-art genome assembly software on six eukaryotic datasets sequenced using different technologies. To evaluate the results, we developed QUAST-LG, a tool that compares large genomic de novo assemblies against reference sequences and computes relevant quality metrics. Since genomes generally cannot be reconstructed completely due to complex repeat patterns and low-coverage regions, we introduce the concept of an upper bound assembly for a given genome and set of reads, and compute theoretical limits on assembly correctness and completeness. Using QUAST-LG, we show how close the assemblies are to the theoretical optimum, and how far this optimum is from the finished reference.
Availability and implementation: http://cab.spbu.ru/software/quast-lg
Supplementary information: Supplementary data are available at Bioinformatics online.
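The abstract does not list the quality metrics it refers to. As a rough illustration only, the sketch below computes two widely used contiguity statistics, N50 (relative to the assembly size) and NG50 (relative to a reference length). The function name `nx`, the contig lengths, and the reference length are hypothetical and are not taken from QUAST-LG itself.

```python
def nx(lengths, fraction=0.5, total=None):
    """Return the largest length L such that contigs of length >= L
    together cover at least `fraction` of `total`.

    With total=None the assembly size is used (N50-style statistic);
    passing a reference genome length gives the NG50-style statistic.
    """
    if total is None:
        total = sum(lengths)
    target = fraction * total
    covered = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return 0  # the contigs never reach the target coverage


# Hypothetical contig lengths and reference length, for illustration only.
contigs = [100, 80, 50, 30, 20]
n50 = nx(contigs)              # target is half of the assembly size (280)
ng50 = nx(contigs, total=400)  # target is half of the reference length (400)
print(n50, ng50)
```

Using the reference length as the denominator makes the statistic comparable across assemblies of different total size, which is one reason reference-based evaluation can be more informative than assembly-only statistics.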
Citations
SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)
TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly; the authors demonstrate that it improves on the recently released E+V-SC assembler and on the popular assemblers Velvet and SoapDeNovo (for multicell data).
Journal Article
Assembly of long, error-prone reads using repeat graphs
TL;DR: Flye generates error-riddled disjointigs as arbitrary paths in an unknown repeat graph and constructs an accurate repeat graph from them, which can then be used for genome assembly.
Journal Article
Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies
TL;DR: This work presents Merqury, a novel tool for reference-free assembly evaluation based on efficient k-mer set operations, and demonstrates on both human and plant genomes that it is a fast and robust method for assembly validation.
Journal Article
metaFlye: scalable long-read metagenome assembly using repeat graphs
Mikhail Kolmogorov, Derek M. Bickhart, Bahar Behsaz, Alexey Gurevich, Mikhail Rayko, Sung Bong Shin, Kristen L. Kuhn, Jeffrey Yuan, Evgeny Polevikov, Timothy P. L. Smith, Pavel A. Pevzner
TL;DR: metaFlye addresses important long-read metagenomic assembly challenges, such as uneven bacterial composition and intra-species heterogeneity. Benchmarks on simulated and mock bacterial communities show that it consistently produces assemblies with better completeness and contiguity than state-of-the-art long-read assemblers.
Journal Article
Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes.
Kishwar Shafin, Trevor Pesout, Ryan Lorig-Roach, Marina Haukness, Hugh E. Olsen, Colleen M. Bosworth, Joel Armstrong, Kristof Tigyi, Nicholas Maurer, Sergey Koren, Fritz J. Sedlazeck, Tobias Marschall, Simon Mayes, Vania Costa, Justin M. Zook, Kelvin J. Liu, Duncan Kilburn, Melanie Sorensen, Katherine M. Munson, Mitchell R. Vollger, Jean Monlong, Erik Garrison, Evan E. Eichler, Sofie R. Salama, David Haussler, Richard E. Green, Mark Akeson, Adam M. Phillippy, Karen H. Miga, Paolo Carnevali, Miten Jain, Benedict Paten
TL;DR: High-contiguity human genomes can be assembled de novo in 6 h using nanopore long-read sequences and the Shasta toolkit. The authors compare assembly performance to existing methods for diploid, haploid and trio-binned human samples and report superior accuracy and speed.
References
Journal Article
Cutadapt removes adapter sequences from high-throughput sequencing reads
TL;DR: The command-line tool cutadapt is developed, which supports 454, Illumina and SOLiD (color space) data, offers two adapter trimming algorithms, and has other useful features.
Journal Article
SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing
Anton Bankevich, Sergey Nurk, Dmitry Antipov, Alexey Gurevich, Mikhail Dvorkin, Alexander S. Kulikov, Valery M. Lesin, Sergey I. Nikolenko, Son Pham, Andrey D. Prjibelski, Alexey V. Pyshkin, Alexander Sirotkin, Nikolay Vyahhi, Glenn Tesler, Max A. Alekseyev, Pavel A. Pevzner
TL;DR: SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies.
SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)
TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly; the authors demonstrate that it improves on the recently released E+V-SC assembler and on the popular assemblers Velvet and SoapDeNovo (for multicell data).
Journal Article
Circos: An information aesthetic for comparative genomics
Martin Krzywinski, Jacqueline E. Schein, Inanc Birol, Joseph M. Connors, Randy D. Gascoyne, Doug Horsman, Steven J.M. Jones, Marco A. Marra
TL;DR: Circos uses a circular ideogram layout to facilitate the display of relationships between pairs of positions by the use of ribbons, which encode the position, size, and orientation of related genomic elements.
Posted Content
Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM
TL;DR: BWA-MEM automatically chooses between local and end-to-end alignments, supports paired-end reads and performs chimeric alignment, which is robust to sequencing errors and applicable to a wide range of sequence lengths from 70 bp to a few megabases.