Minimap2: pairwise alignment for nucleotide sequences
Abstract:
Motivation: Recent advances in sequencing technologies promise ultra-long reads of ∼100 kb on average, full-length mRNA or cDNA reads in high throughput, and genomic contigs over 100 Mb in length. Existing alignment programs either cannot process such data or do so inefficiently at scale, which calls for new alignment algorithms.

Results: Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, genomic reads of ≥1 kb at an error rate of ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes hundreds of megabases in length. Minimap2 performs split-read alignment, employs a concave gap cost for long insertions and deletions, and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy and ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment.

Availability and implementation: https://github.com/lh3/minimap2

Supplementary information: Supplementary data are available at Bioinformatics online.
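The concave gap cost mentioned in the abstract can be illustrated with a two-piece affine model, in which a gap is charged the cheaper of two affine penalties. The sketch below is illustrative only: the function name is hypothetical, and the parameter values are assumed to match the gap-open and gap-extension defaults documented for the minimap2 command line (`-O 4,24 -E 2,1`); the abstract itself does not state them.

```python
def two_piece_gap_cost(length, open1=4, ext1=2, open2=24, ext2=1):
    """Cost of a gap of `length` bases under a two-piece affine model.

    The first piece (low open, high extension) is cheaper for short gaps;
    the second piece (high open, low extension) is cheaper for long gaps.
    Taking the minimum of the two makes the overall cost concave in length.
    Parameter defaults are assumptions, not values taken from the abstract.
    """
    if length <= 0:
        return 0
    return min(open1 + length * ext1, open2 + length * ext2)


if __name__ == "__main__":
    for k in (1, 20, 100):
        single_affine = 4 + k * 2  # what the first piece alone would charge
        print(k, two_piece_gap_cost(k), single_affine)
```

Under these assumed parameters the two pieces cost the same at a gap length of 20; beyond that, the second piece dominates, so a long insertion or deletion is penalized much less than it would be under a single affine cost.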