Open Access journal article

Minimap2: pairwise alignment for nucleotide sequences

Heng Li
15 Sep 2018 · Vol. 34, Iss. 18, pp. 3094-3100
Abstract
Motivation: Recent advances in sequencing technologies promise ultra-long reads of ∼100 kb on average, full-length mRNA or cDNA reads at high throughput, and genomic contigs over 100 Mb in length. Existing alignment programs either cannot process such data at scale or do so inefficiently, which creates a pressing need for new alignment algorithms.

Results: Minimap2 is a general-purpose alignment program that maps DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp, ≥1 kb genomic reads at an error rate of ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes hundreds of megabases in length. Minimap2 performs split-read alignment, employs a concave gap cost for long insertions and deletions, and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy and ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment.

Availability and implementation: https://github.com/lh3/minimap2

Supplementary information: Supplementary data are available at Bioinformatics online.
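The abstract mentions a concave gap cost without defining it. A common way to make a gap cost concave, and the form the minimap2 manual page documents for its `-O`/`-E` options, is to take the minimum of two affine lines, so a gap of length k costs min(O1 + k·E1, O2 + k·E2). The sketch below shows only that cost function, not the alignment dynamic programming that uses it. The function names `gap_cost` and `affine_cost` and the parameter names are hypothetical; the defaults (O = 4,24 and E = 2,1) are assumed from minimap2's documented defaults and may differ between versions or presets.

```python
def gap_cost(length: int, o1: int = 4, e1: int = 2, o2: int = 24, e2: int = 1) -> int:
    """Cost of a gap of `length` bases under a two-piece affine model (sketch).

    The minimum of two affine lines is concave: short gaps follow the steep
    line (o1 + length*e1), long gaps the shallow one (o2 + length*e2), so
    each additional base of a long indel is penalized less.
    Defaults are assumed from minimap2's documented -O 4,24 / -E 2,1.
    """
    if length <= 0:
        return 0  # no gap, no penalty
    return min(o1 + length * e1, o2 + length * e2)


def affine_cost(length: int, o: int = 4, e: int = 2) -> int:
    """Ordinary single-piece affine gap cost, used here only for comparison."""
    return 0 if length <= 0 else o + length * e


if __name__ == "__main__":
    # Compare the two cost models for a few gap lengths.
    for k in (1, 10, 20, 50, 100):
        print(k, affine_cost(k), gap_cost(k))
```

With these assumed parameters the two models agree for gaps up to 20 bases; beyond that the shallow line takes over, so a 100-base deletion costs 124 instead of 204. This is why a concave cost makes long insertions and deletions affordable without making short ones cheaper.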

