scispace - formally typeset
Open AccessJournal ArticleDOI

Fast and accurate short read alignment with Burrows–Wheeler transform

Heng Li, +1 more
- 01 Jul 2009 - 
- Vol. 25, Iss: 14, pp 1754-1760
TLDR
Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract
Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: [email protected]

read more

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI

Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype

TL;DR: This work presents a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina Manzini index, and uses it to represent and search an expanded model of the human reference genome.
Journal ArticleDOI

Comprehensive molecular characterization of gastric adenocarcinoma

Adam J. Bass, +257 more
- 11 Sep 2014 - 
TL;DR: A comprehensive molecular evaluation of 295 primary gastric adenocarcinomas as part of The Cancer Genome Atlas (TCGA) project is described and a molecular classification dividing gastric cancer into four subtypes is proposed.

Integrative analysis of 111 reference human epigenomes

TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.
Posted ContentDOI

fastp: an ultra-fast all-in-one FASTQ preprocessor

TL;DR: Fastp is developed as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features that can perform quality control, adapter trimming, quality filtering, per-read quality cutting, and many other operations with a single scan of the FastQ data.
Journal ArticleDOI

A hexanucleotide repeat expansion in C9ORF72 is the cause of chromosome 9p21-linked ALS-FTD

Alan E. Renton, +85 more
- 20 Oct 2011 - 
TL;DR: The chromosome 9p21 amyotrophic lateral sclerosis-frontotemporal dementia (ALS-FTD) locus contains one of the last major unidentified autosomal-dominant genes underlying these common neurodegenerative diseases, and a large hexanucleotide repeat expansion in the first intron of C9ORF72 is shown.
References
More filters
Journal ArticleDOI

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Journal ArticleDOI

The Sequence Alignment/Map format and SAMtools

TL;DR: SAMtools as discussed by the authors implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
Journal ArticleDOI

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches and can be used simultaneously to achieve even greater alignment speeds.
Journal ArticleDOI

Improved tools for biological sequence comparison.

TL;DR: Three computer programs for comparisons of protein and DNA sequences can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity.
Journal ArticleDOI

BLAT—The BLAST-Like Alignment Tool

TL;DR: How BLAT was optimized is described, which is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences.
Related Papers (5)