Minimap2: pairwise alignment for nucleotide sequences

doi:10.1093/BIOINFORMATICS/BTY191

Open Access Journal Article

Heng Li

- 15 Sep 2018 -

Bioinformatics

- Vol. 34, Iss: 18, pp 3094-3100

TLDR

Minimap2 is a general-purpose alignment program that maps DNA or long mRNA sequences against a large reference database. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy and ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment.

Abstract:

Motivation: Recent advances in sequencing technologies promise ultra-long reads of ∼100 kb on average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 Mb in length. Existing alignment programs either cannot process such data at scale or do so inefficiently, which calls for the development of new alignment algorithms.

Results: Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at error rate ∼15%, full-length noisy Direct RNA or cDNA reads and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment.

Availability and implementation: https://github.com/lh3/minimap2

Supplementary information: Supplementary data are available at Bioinformatics online.
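
The abstract mentions a concave gap cost for long insertions and deletions without defining it. One common way to obtain such a cost is a two-piece affine model, where a gap is charged the cheaper of two affine penalties. The sketch below illustrates that idea only; the function names and the numeric parameters are illustrative assumptions, not values taken from the abstract.

```python
def affine_gap_cost(length, gap_open, gap_extend):
    """Cost of a gap of `length` bases under a single affine model."""
    return gap_open + length * gap_extend


def concave_gap_cost(length, open1=4, ext1=2, open2=24, ext2=1):
    """Two-piece affine gap cost (parameter values are assumptions).

    The first piece (cheap to open, costly to extend) wins for short gaps;
    the second piece (costly to open, cheap to extend) wins for long gaps.
    Taking the minimum of the two yields a concave cost function.
    """
    if length <= 0:
        return 0
    return min(
        affine_gap_cost(length, open1, ext1),
        affine_gap_cost(length, open2, ext2),
    )


if __name__ == "__main__":
    # Compare the single affine model with the two-piece model.
    for n in (1, 10, 20, 100, 1000):
        print(n, affine_gap_cost(n, 4, 2), concave_gap_cost(n))
```

With these assumed parameters the two pieces cross at a gap length of 20 bases. Beyond that point each extra base costs 1 instead of 2, so a long insertion or deletion is penalized less severely than under a single affine model.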

Citations

Journal Article

Improved metagenomic analysis with Kraken 2.

Derrick E. Wood, +2 more

- 28 Nov 2019 -

Genome Biology

TL;DR: Kraken 2 improves upon Kraken 1 by reducing memory usage by 85%, allowing greater amounts of reference genomic data to be used, while maintaining high accuracy and increasing speed fivefold.

Integrative Genomics Viewer

James T. Robinson, +7 more

TL;DR: The sheer volume and scope of these data pose a significant challenge to the development of efficient and intuitive visualization tools that can scale to very large data sets and flexibly integrate multiple data types, including clinical data.

Journal Article

The Architecture of SARS-CoV-2 Transcriptome.

Dong Wan Kim, +5 more

- 14 May 2020 -

Cell

TL;DR: Functional investigation of the unknown transcripts and RNA modifications discovered in this study will open new directions to the understanding of the life cycle and pathogenicity of SARS-CoV-2.

Journal Article

Performance of neural network basecalling tools for Oxford Nanopore sequencing.

Ryan R. Wick, +3 more

- 24 Jun 2019 -

Genome Biology

TL;DR: The current version of ONT’s Guppy basecaller performs well overall, with good accuracy and fast performance, and users should consider producing a custom model using a larger neural network and/or training data from the same species.

References

Journal Article

Nanopore long-read RNAseq reveals widespread transcriptional variation among the surface receptors of individual B cells

Ashley Byrne, +9 more

- 19 Jul 2017 -

Nature Communications

TL;DR: This work investigates whether RNAseq using the long-read single-molecule Oxford Nanopore MinION sequencer is able to identify and quantify complex isoforms without sacrificing accurate gene expression quantification, and shows that it can identify and quantify complex isoforms at the single cell level.

Posted Content

Nanopore sequencing and assembly of a human genome with ultra-long reads

Miten Jain, +25 more

- 20 Apr 2017 -

bioRxiv

TL;DR: Modelling the repeat structure of the human genome predicts extraordinarily contiguous assemblies may be possible using nanopore reads alone, and it is found that adding an additional 5×-coverage of ‘ultra-long’ reads more than doubled the assembly contiguity.

Posted Content

Faster and More Accurate Sequence Alignment with SNAP

Matei Zaharia, +8 more

- 23 Nov 2011 -

arXiv: Data Structures and Algorithms

TL;DR: The Scalable Nucleotide Alignment Program is presented, a new short and long read aligner that is both more accurate and faster than state-of-the-art tools such as BWA and provides a rich error model that can match classes of mutations that today's fast aligners ignore.

Journal Article

Optimal sequence alignment using affine gap costs

Stephen F. Altschul, +2 more

- 01 Jan 1986 -

Bulletin of Mathematical Biology

TL;DR: This paper provides an example for which this part of Gotoh's algorithm fails and describes an algorithm that finds all and only the optimal alignments, which still requires order MN steps.

Mason – A Read Simulator for Second Generation Sequencing Data

Manuel Holtgrewe

TL;DR: A read simulator software for Illumina, 454 and Sanger reads that has been written with performance in mind and can sample reads from large genomes.
