Minimap2: pairwise alignment for nucleotide sequences
TLDR
Minimap2 is a general-purpose alignment program that maps DNA or long mRNA sequences against a large reference database. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy and ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment.

Abstract
Motivation: Recent advances in sequencing technologies promise ultra-long reads of ∼100 kb on average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 Mb in length. Existing alignment programs either cannot process such data or do so inefficiently at scale, which calls for the development of new alignment algorithms.

Results: Minimap2 is a general-purpose alignment program that maps DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, genomic reads of ≥1 kb at an error rate of ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes hundreds of megabases in length. Minimap2 performs split-read alignment, employs a concave gap cost for long insertions and deletions, and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy and ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment.

Availability and implementation: https://github.com/lh3/minimap2

Supplementary information: Supplementary data are available at Bioinformatics online.
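The concave gap cost mentioned in the Results can be illustrated with a small sketch. It assumes the two-piece affine form, in which a gap of length l costs the smaller of two affine penalties, so long gaps are charged at a lower per-base rate than short ones. The function name `gap_cost` and the parameter values (open penalties 4 and 24, extension penalties 2 and 1) are illustrative assumptions, not details stated in the abstract.

```python
def gap_cost(length, q=4, e=2, q2=24, e2=1):
    """Two-piece affine gap cost: the cheaper of two affine penalties.

    q, e   -- open/extension penalties of the steep piece (assumed values)
    q2, e2 -- open/extension penalties of the shallow piece (assumed values)
    """
    if length <= 0:
        return 0  # no gap, no penalty
    return min(q + length * e, q2 + length * e2)


if __name__ == "__main__":
    # Short gaps follow the steep piece; long gaps switch to the shallow one.
    for n in (1, 10, 20, 100):
        print(n, gap_cost(n))
```

With these assumed parameters the two pieces cross at a gap length of 20; beyond that, each additional gap base costs 1 rather than 2, which is what makes long insertions and deletions affordable within a single alignment.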