Fast and accurate short read alignment with Burrows–Wheeler transform
Heng Li, Richard Durbin
TL;DR: BWA (Burrows-Wheeler Alignment tool), a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT), efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract:
Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for aligning longer reads, where indels may occur frequently. The speed of MAQ is also a concern when alignment is scaled up to the resequencing of hundreds of individuals.
Results: We implemented the Burrows-Wheeler Alignment tool (BWA), a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ while achieving similar accuracy. In addition, BWA outputs alignments in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after alignment can be performed with the open source SAMtools software package.
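The core idea named above, backward search over the BWT, can be illustrated with a toy FM index. The sketch below is a minimal illustration under stated assumptions, not BWA's implementation: it builds the suffix array naively, stores full occurrence tables, performs exact backward search, and adds a simple bounded backtracking search that tolerates mismatches only (no gaps, no base-quality weighting, no pruning). The class and method names (`FMIndex`, `exact`, `with_mismatches`) are hypothetical.

```python
"""Toy BWT/FM-index backward search (illustrative sketch, not BWA's code)."""


def suffix_array(text: str) -> list[int]:
    # Naive construction by sorting suffixes; only suitable for toy inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])


class FMIndex:
    def __init__(self, reference: str):
        text = reference + "$"  # '$' sentinel sorts before A, C, G, T
        self.sa = suffix_array(text)
        # BWT[i] is the character preceding the i-th smallest suffix
        # (text[-1] == '$' handles the suffix starting at position 0).
        self.bwt = "".join(text[i - 1] for i in self.sa)
        alphabet = sorted(set(text))
        # C[a]: number of characters in text that are strictly smaller than a.
        self.C: dict[str, int] = {}
        total = 0
        for a in alphabet:
            self.C[a] = total
            total += text.count(a)
        # occ[a][i]: number of occurrences of a in bwt[0:i].
        self.occ = {a: [0] * (len(text) + 1) for a in alphabet}
        for i, ch in enumerate(self.bwt):
            for a in alphabet:
                self.occ[a][i + 1] = self.occ[a][i] + (ch == a)

    def extend(self, a: str, lo: int, hi: int) -> tuple[int, int]:
        # One backward-search step: given the half-open suffix-array interval
        # [lo, hi) of a string W, return the interval of the string aW.
        if a not in self.C:
            return 0, 0
        return self.C[a] + self.occ[a][lo], self.C[a] + self.occ[a][hi]

    def exact(self, pattern: str) -> list[int]:
        # Process the pattern right to left, narrowing the interval each step.
        lo, hi = 0, len(self.bwt)
        for a in reversed(pattern):
            lo, hi = self.extend(a, lo, hi)
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])

    def with_mismatches(self, pattern: str, max_mm: int) -> list[int]:
        # Backtracking over substitutions: try every base at every position,
        # abandoning a branch once its interval is empty or too many
        # mismatches have been spent.
        hits: set[int] = set()

        def recurse(i: int, lo: int, hi: int, mm: int) -> None:
            if lo >= hi:
                return
            if i < 0:
                hits.update(self.sa[lo:hi])
                return
            for a in "ACGT":
                cost = mm + (a != pattern[i])
                if cost <= max_mm:
                    nlo, nhi = self.extend(a, lo, hi)
                    recurse(i - 1, nlo, nhi, cost)

        recurse(len(pattern) - 1, 0, len(self.bwt), 0)
        return sorted(hits)
```

For example, with the reference `GATTACAGATTACA`, `FMIndex(ref).exact("TTAC")` returns `[2, 9]`, and `with_mismatches("TTAG", 1)` returns the same two positions because each `TTAC` differs from `TTAG` by one substitution. The appeal of backward search is that each extension step costs a constant number of table lookups regardless of genome size; BWA's actual search additionally handles gaps and prunes the backtracking, which this sketch does not attempt.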
Availability: http://maq.sourceforge.net
Contact: [email protected]
References
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, David J. Lipman
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
The Sequence Alignment/Map format and SAMtools
Heng Li, Bob Handsaker, Alec Wysoker, T. J. Fennell, Jue Ruan, Nils Homer, Gabor T. Marth, Gonçalo R. Abecasis, Richard Durbin
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches; multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
Improved tools for biological sequence comparison.
TL;DR: Three computer programs for comparisons of protein and DNA sequences can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity.
BLAT—The BLAST-Like Alignment Tool
TL;DR: Describes how BLAT was optimized; BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments, and 50 times faster for protein alignments, at sensitivity settings typically used when comparing vertebrate sequences.