Minimap2: pairwise alignment for nucleotide sequences
TL;DR: Minimap2 is a general-purpose alignment program that maps DNA or long mRNA sequences against a large reference database. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy and ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment.
Abstract:
Motivation: Recent advances in sequencing technologies promise ultra-long reads of ∼100 kb on average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 Mb in length. Existing alignment programs either cannot process such data at scale or do so inefficiently, which calls for the development of new alignment algorithms.
Results: Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at an error rate of ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs a concave gap cost for long insertions and deletions, and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment.
Availability and implementation: https://github.com/lh3/minimap2
Supplementary information: Supplementary data are available at Bioinformatics online.
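The abstract mentions a concave gap cost for long insertions and deletions without giving its form. As a minimal sketch, assuming a two-piece affine model (the minimum of two affine functions, which is concave in the gap length), the idea can be illustrated as follows. The function names and the parameter values `q`, `e`, `q2`, `e2` are illustrative assumptions, not values stated in the text above.

```python
def affine_gap_cost(length, q=4, e=2):
    """Ordinary affine gap cost: open penalty q plus e per gapped base.

    The default values are assumptions chosen for illustration only.
    """
    if length <= 0:
        return 0
    return q + length * e


def two_piece_gap_cost(length, q=4, e=2, q2=24, e2=1):
    """Concave gap cost built as the minimum of two affine pieces.

    Short gaps are charged by the first piece (cheap to open, costly to
    extend); long gaps switch to the second piece (costly to open, cheap
    to extend). All default values are assumptions for illustration.
    """
    if length <= 0:
        return 0
    return min(q + length * e, q2 + length * e2)


# Compare the two models over a range of gap lengths.
for gap_length in (1, 10, 20, 100, 1000):
    print(gap_length, affine_gap_cost(gap_length), two_piece_gap_cost(gap_length))
```

With these assumed parameters the two models agree for gaps up to 20 bases, after which the concave cost grows more slowly: a 100-base gap costs 124 instead of 204. This is why a long insertion or deletion can be kept inside a single alignment rather than being broken up.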
Citations
Journal Article
Improved reference genome of Aedes aegypti informs arbovirus vector control
Benjamin J. Matthews, Olga Dudchenko, Sarah B. Kingan, Sergey Koren, Igor Antoshechkin, Jacob E. Crawford, William J. Glassford, Margaret Herre, Seth Redmond, Noah H. Rose, Gareth D. Weedall, Yang Wu, Sanjit S. Batra, Carlos A Brito-Sierra, Steven D. Buckingham, Corey L. Campbell, Saki Chan, Eric Cox, Benjamin R. Evans, Thanyalak Fansiri, Igor Filipović, Albin Fontaine, Andrea Gloria-Soria, Richard Hall, Vinita Joardar, Andrew K. Jones, Raissa G.G. Kay, Vamsi K. Kodali, Joyce Lee, Gareth J Lycett, Sara N. Mitchell, Jill Muehling, Michael R. Murphy, Arina D. Omer, Frederick A. Partridge, Paul Peluso, Aviva Presser Aiden, Vidya Ramasamy, Gordana Rašić, Sourav Roy, Karla Saavedra-Rodriguez, Shruti Sharan, Atashi Sharma, Melissa Smith, Joe Turner, Allison M Weakley, Zhilei Zhao, Omar S. Akbari, William C. Black, Han Cao, Alistair C. Darby, Catherine A. Hill, J. Spencer Johnston, Terence Murphy, Alexander S. Raikhel, David B. Sattelle, Igor V. Sharakhov, Bradley J. White, Li Zhao, Erez Lieberman Aiden, Richard S. Mann, Louis Lambrechts, Jeffrey R. Powell, Maria V. Sharakhova, Zhijian Tu, Hugh M. Robertson, Carolyn S. McBride, Alex Hastie, Jonas Korlach, Daniel E. Neafsey, Adam M. Phillippy, Leslie B. Vosshall
TL;DR: An improved, fully re-annotated Aedes aegypti genome assembly (AaegL5) provides insights into the sex-determining M locus, chemosensory systems that help mosquitoes to hunt humans and loci involved in insecticide resistance and will help to generate intervention strategies to fight this deadly disease vector.
Posted Content
Transcriptome assembly from long-read RNA-seq alignments with StringTie2
TL;DR: StringTie2 is a reference-guided transcriptome assembler that works with both short and long reads and includes new computational methods to handle the high error rate of long-read sequencing technology, which previous assemblers could not tolerate.
Journal Article
NextPolish: a fast and efficient genome polishing tool for long-read assembly.
TL;DR: NextPolish is a tool that efficiently corrects sequence errors in genomes assembled with long reads. It consists of two interlinked modules, one designed to score and count k-mers from high-quality short reads and the other to polish genome assemblies containing large numbers of base errors.
Journal Article
Piercing the dark matter: bioinformatics of long-range sequencing and mapping
TL;DR: This Review discusses bioinformatics tools that have been devised to handle the numerous characteristic features of these long-range data types, with applications in genome assembly, genetic variant detection, haplotype phasing, transcriptomics and epigenomics.
Journal Article
RaGOO: fast and accurate reference-guided scaffolding of draft genomes
Michael Alonge, Sebastian Soyk, Srividya Ramakrishnan, Xingang Wang, Sara Goodwin, Fritz J. Sedlazeck, Zachary B. Lippman, Michael C. Schatz
TL;DR: This work presents RaGOO, a reference-guided contig ordering and orienting tool that leverages the speed and sensitivity of Minimap2 to accurately achieve chromosome-scale assemblies in minutes and demonstrates the scalability and utility of the tool.
References
Journal Article
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, David J. Lipman
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Journal Article
The Sequence Alignment/Map format and SAMtools
Heng Li, Bob Handsaker, Alec Wysoker, T. J. Fennell, Jue Ruan, Nils Homer, Gabor T. Marth, Gonçalo R. Abecasis, Richard Durbin
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller and an alignment viewer, and thus provides universal tools for processing read alignments.
Journal Article
Fast and accurate short read alignment with Burrows–Wheeler transform
Heng Li, Richard Durbin
TL;DR: The Burrows–Wheeler Alignment tool (BWA) is a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT) that efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Journal Article
Fast gapped-read alignment with Bowtie 2
TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Journal Article
STAR: ultrafast universal RNA-seq aligner
Alexander Dobin, Carrie A. Davis, Felix Schlesinger, Jorg Drenkow, Chris Zaleski, Sonali Jha, Philippe Batut, Mark Chaisson, Thomas R. Gingeras
TL;DR: The Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure outperforms other aligners by a factor of >50 in mapping speed.