FLASH: Fast Length Adjustment of Short Reads to Improve Genome Assemblies
Tanja Magoc, Steven L. Salzberg
TLDR
FLASH is a fast computational tool that extends short reads by overlapping and merging paired-end reads from fragment libraries that are sufficiently short. When FLASH was used to extend reads before assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds.
Abstract:
Motivation: Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome.
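The overlap condition can be made concrete with a little arithmetic: two reads of length r taken from opposite ends of a fragment of length f share 2r - f bases whenever f < 2r. The Python sketch below is a simplified illustration of merging one read pair on that basis, not FLASH's actual implementation. The function names, the parameters `min_overlap` and `max_mismatch_ratio`, and their default values are hypothetical choices for this example, and the sketch ignores base quality scores entirely.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(read1, read2, min_overlap=10, max_mismatch_ratio=0.25):
    """Merge a read pair into one longer read, or return None if no overlap fits.

    read2 is assumed to come from the opposite strand, so it is
    reverse-complemented before being compared with the end of read1.
    Parameter names and defaults are illustrative assumptions.
    """
    mate = reverse_complement(read2)
    best = None  # (mismatch ratio, overlap length) of the best candidate so far
    longest = min(len(read1), len(mate))
    # Try every candidate overlap length, longest first.
    for overlap in range(longest, min_overlap - 1, -1):
        tail = read1[-overlap:]
        head = mate[:overlap]
        mismatches = sum(1 for a, b in zip(tail, head) if a != b)
        ratio = mismatches / overlap
        # Keep the candidate with the lowest mismatch ratio;
        # on ties the longer overlap wins because it was seen first.
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[0]):
            best = (ratio, overlap)
    if best is None:
        return None
    # Simplification: in the overlap, the bases of read1 are kept as-is.
    return read1 + mate[best[1]:]


# A 30-base fragment sequenced with 20-base reads: 2 * 20 - 30 = 10 bases overlap.
fragment = "ACGTTGCAAGGCTTACCGATAGCTAGGATC"
read1 = fragment[:20]
read2 = reverse_complement(fragment[-20:])
merged = merge_pair(read1, read2)
```

In this toy case the merged read reconstructs the original 30-base fragment. A pair whose mates share no acceptable overlap yields `None` and would be passed to the assembler unmerged.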
Results: We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of <1%. With adequately set parameters, FLASH correctly merged reads over 90% of the time even when the reads contained up to 5% errors. When FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds.
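The abstract reports N50 without defining it. Under the commonly used definition, N50 is the length L of the contig (or scaffold) at which contigs of length L or greater first account for at least half of the total assembled bases. The following Python sketch computes it under that assumption; the function name `n50` is chosen for this example and does not come from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Uses the common definition: walk the lengths from longest to shortest
    and return the length at which the running total first reaches at
    least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Four contigs totalling 100 bases: 40 + 30 = 70 is the first running
# total that reaches half of 100, so the N50 is 30.
example = n50([10, 20, 30, 40])
```

Because the statistic rewards a few long sequences over many short ones, a larger N50 after read merging indicates a more contiguous assembly.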
Availability and Implementation: The FLASH system is implemented in C and is freely available as open-source code at http://www.cbcb.umd.edu/software/flash.
Contact: t.magoc@gmail.com