Minimap2: pairwise alignment for nucleotide sequences

doi:10.1093/BIOINFORMATICS/BTY191

Heng Li

- 15 Sep 2018 -

Bioinformatics

- Vol. 34, Iss: 18, pp 3094-3100

TLDR

Minimap2 is a general-purpose alignment program that maps DNA or long mRNA sequences against a large reference database. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy and ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment.

Abstract:

Motivation: Recent advances in sequencing technologies promise ultra-long reads of ∼100 kb on average, full-length mRNA or cDNA reads in high throughput, and genomic contigs over 100 Mb in length. Existing alignment programs either cannot process such data at scale or do so inefficiently, which calls for the development of new alignment algorithms.

Results: Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at error rate ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions, and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment.

Availability and implementation: https://github.com/lh3/minimap2

Supplementary information: Supplementary data are available at Bioinformatics online.
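The concave gap cost mentioned in the abstract can be illustrated with a two-piece affine function: the cost of a gap is the cheaper of two affine models, one with a low opening penalty and steep extension, the other with a high opening penalty and shallow extension. The sketch below is a minimal illustration, not the program's implementation. The function names are hypothetical, and the parameter values (4, 2, 24, 1) are assumptions chosen for illustration; they are not stated in the abstract.

```python
def affine_gap_cost(length, gap_open, gap_extend):
    """Cost of a gap of `length` bases under a single affine model."""
    return gap_open + length * gap_extend


def two_piece_gap_cost(length, q1=4, e1=2, q2=24, e2=1):
    """Concave gap cost: the minimum of two affine models.

    q1/e1 model short gaps (cheap to open, costly to extend);
    q2/e2 model long gaps (costly to open, cheap to extend).
    All four values are illustrative assumptions.
    """
    if length <= 0:
        return 0  # no gap, no penalty
    short_model = affine_gap_cost(length, q1, e1)
    long_model = affine_gap_cost(length, q2, e2)
    return min(short_model, long_model)


if __name__ == "__main__":
    # Compare a single affine model against the two-piece model.
    for length in (1, 10, 20, 100, 1000):
        single = affine_gap_cost(length, 4, 2)
        concave = two_piece_gap_cost(length)
        print(f"gap={length:5d}  single-affine={single:5d}  two-piece={concave:5d}")
```

With these assumed values the two models cross at a gap length of 20. Shorter gaps are scored by the first model, while longer insertions and deletions are penalized less steeply by the second, which is why a concave cost helps alignments extend through long indels.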

Citations

Improved metagenomic analysis with Kraken 2.

Derrick E. Wood, +2 more

- 28 Nov 2019 -

Genome Biology

TL;DR: Kraken 2 improves upon Kraken 1 by reducing memory usage by 85%, allowing greater amounts of reference genomic data to be used, while maintaining high accuracy and increasing speed fivefold.

Integrative Genomics Viewer

James T. Robinson, +7 more

TL;DR: The sheer volume and scope of these data pose a significant challenge to the development of efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data.

The Architecture of SARS-CoV-2 Transcriptome.

Dong Wan Kim, +5 more

- 14 May 2020 -

Cell

TL;DR: Functional investigation of the unknown transcripts and RNA modifications discovered in this study will open new directions to the understanding of the life cycle and pathogenicity of SARS-CoV-2.

Performance of neural network basecalling tools for Oxford Nanopore sequencing.

Ryan R. Wick, +3 more

- 24 Jun 2019 -

Genome Biology

TL;DR: The current version of ONT’s Guppy basecaller performs well overall, with good accuracy and fast performance, and users should consider producing a custom model using a larger neural network and/or training data from the same species.

References

Nanopore sequencing and assembly of a human genome with ultra-long reads

Miten Jain, +26 more

- 29 Jan 2018 -

Nature Biotechnology

TL;DR: Ultra-long reads enabled assembly and phasing of the 4-Mb major histocompatibility complex (MHC) locus in its entirety, measurement of telomere repeat length, and closure of gaps in the reference human genome assembly GRCh38.

MUMmer4: A fast and versatile genome alignment system.

Guillaume Marçais, +6 more

- 26 Jan 2018 -

PLOS Computational Biology

TL;DR: MUMmer4 is described, a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved speed through parallel processing of input query sequences.

Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory

Mark Chaisson, +1 more

- 19 Sep 2012 -

BMC Bioinformatics

TL;DR: The results indicate that it is possible to map SMS reads with high accuracy and speed, and the inferences made on the mapability of SMS reads using the combinatorial model of sequencing error are in agreement with the mapping accuracy demonstrated on simulated reads.

Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences

Heng Li

- 15 Jul 2016 -

Bioinformatics

TL;DR: A new mapper, minimap, and a de novo assembler, miniasm, are presented for efficiently mapping and assembling SMRT and ONT reads without an error correction stage.

Accurate detection of complex structural variations using single-molecule sequencing.

Fritz J. Sedlazeck, +8 more

- 30 Apr 2018 -

Nature Methods

TL;DR: NGMLR and Sniffles perform highly accurate alignment and structural variation detection from long-read sequencing data and can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings.
