Topic

Hybrid genome assembly

About: Hybrid genome assembly is a research topic. Over its lifetime, 963 publications have been published within this topic, receiving 197,259 citations.
Papers

Journal Article
01 Jul 2009-Bioinformatics
TL;DR: The Burrows-Wheeler Alignment tool (BWA) is a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT). It efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net

35,234 citations
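
The "backward search with Burrows–Wheeler Transform" named in the BWA abstract above can be illustrated with a toy example. The following is a minimal sketch of exact-match backward search only; it does not handle the mismatches and gaps that BWA supports, it is not BWA's implementation, and the function names (`build_fm_index`, `backward_search`) and the naive suffix-array construction are assumptions chosen for readability.

```python
from collections import Counter


def build_fm_index(reference):
    """Build a toy index: suffix array, first-column offsets, and occurrence counts."""
    text = reference + "$"  # "$" is a sentinel that sorts before A, C, G, T
    # Naive suffix array construction; acceptable only for tiny examples.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (index -1 wraps to "$").
    bwt = "".join(text[i - 1] for i in sa)

    # first[c] = number of characters in text that are strictly smaller than c.
    counts = Counter(text)
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in counts:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, first, occ


def backward_search(pattern, sa, first, occ):
    """Return sorted start positions of exact matches of pattern in the reference."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # consume the pattern from its last character
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    sa, first, occ = build_fm_index("ACGTACGTTACG")
    print(backward_search("ACG", sa, first, occ))
```

Each step narrows the suffix-array interval using only the two lookup tables, so the search cost depends on the pattern length rather than on the size of the reference, which is what makes the approach attractive for genome-scale references.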


Journal Article
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, Michael C. Zody, Jennifer Baldwin, Keri Devon, Ken Dewar, Michael Doyle, William Fitzhugh, Roel Funke, Diane Gage, Katrina Harris, Andrew Heaford, John Howland, Lisa Kann, Jessica A. Lehoczky, Rosie Levine, Paul A. McEwan, Kevin McKernan, James Meldrim, Jill P. Mesirov, Cher Miranda, William Morris, Jerome Naylor, Christina Raymond, Mark Rosetti, Ralph Santos, Andrew Sheridan, Carrie Sougnez, Nicole Stange-Thomann, Nikola Stojanovic, Aravind Subramanian, Dudley Wyman, Jane Rogers, John Sulston, R Ainscough, Stephan Beck, David Bentley, John Burton, C M Clee, Nigel P. Carter, Alan Coulson, Rebecca Deadman, Panos Deloukas, Andrew Dunham, Ian Dunham, Richard Durbin, Lisa French, Darren Grafham, Simon G. Gregory, Tim Hubbard, Sean Humphray, Adrienne Hunt, Matthew Jones, Christine Lloyd, Amanda McMurray, Lucy Matthews, Simon Mercer, Sarah Milne, James C. Mullikin, Andrew J. Mungall, Robert W. Plumb, Mark T. Ross, Ratna Shownkeen, Sarah Sims, Robert H. Waterston, Richard K. Wilson, LaDeana W. Hillier, John Douglas Mcpherson, Marco A. Marra, Elaine R. Mardis, Lucinda Fulton, Asif T. Chinwalla, Kymberlie H. Pepin, Warren Gish, Stephanie L. Chissoe, Michael C. Wendl, Kim D. Delehaunty, Tracie L. Miner, Andrew Delehaunty, Jason B. Kramer, Lisa Cook, Robert S. Fulton, Douglas L. Johnson, Patrick Minx, Sandra W. Clifton, Trevor Hawkins, Elbert Branscomb, Paul Predki, Paul G. Richardson, Sarah Wenning, Tom Slezak, Norman A. Doggett, Jan Fang Cheng, Anne S. Olsen, Susan Lucas, Christopher J. Elkin, Edward Uberbacher, Marvin Frazier, Richard A. Gibbs, Donna M. Muzny, Steven E. Scherer, John Bouck, Erica Sodergren, Kim C. Worley, Catherine M. Rives, James H. Gorrell, Michael L. Metzker, Susan L. Naylor, Raju Kucherlapati, David L. Nelson, George M. Weinstock, Yoshiyuki Sakaki, Asao Fujiyama, Masahira Hattori, Tetsushi Yada, Atsushi Toyoda, Takehiko Itoh, Chiharu Kawagoe, Hidemi Watanabe, Yasushi Totoki, Todd D. Taylor, Jean Weissenbach, Roland Heilig, William Saurin, François Artiguenave, Philippe Brottier, Thomas Brüls, Eric Pelletier, Catherine Robert, Patrick Wincker, André Rosenthal, Matthias Platzer, Gerald Nyakatura, Stefan Taudien, Andreas Rump, Douglas R. Smith, Lynn Doucette-Stamm, Marc Rubenfield, Keith Weinstock, Mei Lee Hong, Joann Dubois, Huanming Yang, Jun Yu, Jian Wang, Guyang Huang, Jun Gu, Leroy Hood, Lee Rowen, Anup Madan, Shizen Qin, Ronald W. Davis, Nancy A. Federspiel, A. Pia Abola, Michael Proctor, Bruce A. Roe, Feng Chen, Huaqin Pan, Juliane Ramser, Hans Lehrach, Richard Reinhardt, W. Richard McCombie, Melissa De La Bastide, Neilay Dedhia, H. Blöcker, K. Hornischer, Gabriele Nordsiek, Richa Agarwala, L. Aravind, Jeffrey A. Bailey, Alex Bateman, Serafim Batzoglou, Ewan Birney, Peer Bork, Daniel G. Brown, Christopher B. Burge, Lorenzo Cerutti, Hsiu Chuan Chen, Deanna M. Church, Michele Clamp, Richard R. Copley, Tobias Doerks, Sean R. Eddy, Evan E. Eichler, Terrence S. Furey, James E. Galagan, James G. R. Gilbert, Cyrus L. Harmon, Yoshihide Hayashizaki, David Haussler, Henning Hermjakob, Karsten Hokamp, Wonhee Jang, L. Steven Johnson, Thomas A. Jones, Simon Kasif, Arek Kaspryzk, Scot Kennedy, W. James Kent, Paul Kitts, Eugene V. Koonin, Ian F Korf, David Kulp, Doron Lancet, Todd M. Lowe, Aoife McLysaght, Tarjei S. Mikkelsen, John V. Moran, Nicola Mulder, Victor J. Pollara, Chris P. Ponting, Greg Schuler, Jörg Schultz, Guy Slater, Arian F.A. Smit, Elia Stupka, Joseph Szustakowki, Danielle Thierry-Mieg, Jean Thierry-Mieg, Lukas Wagner, John W. Wallis, Raymond Wheeler, Alan Williams, Yuri I. Wolf, Kenneth H. Wolfe, Shiaw Pyng Yang, Ru Fang Yeh, Francis S. Collins, Mark S. Guyer, Jane Peterson, Adam Felsenfeld, Kris A. Wetterstrand, Richard M. Myers, Jeremy Schmutz, Mark Dickson, Jane Grimwood, David R. Cox, Maynard V. Olson, Rajinder Kaul, Christopher K. Raymond, Nobuyoshi Shimizu, Kazuhiko Kawasaki, Shinsei Minoshima, Glen A. Evans, Maria Athanasiou, Roger A. Schultz, Aristides Patrinos, Michael J. Morgan
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

21,023 citations


Journal Article
04 Mar 2009-Genome Biology
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches, and multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source: http://bowtie.cbcb.umd.edu.

18,079 citations


Journal Article
J. Craig Venter, Mark Raymond Adams, Eugene W. Myers, Peter W. Li, Richard J. Mural, Granger G. Sutton, Hamilton O. Smith, Mark Yandell, Cheryl A. Evans, Robert A. Holt, Jeannine D. Gocayne, Peter Amanatides, Richard M. Ballew, Daniel H. Huson, Jennifer R. Wortman, Qing Zhang, Chinnappa D. Kodira, Xiangqun H. Zheng, Lin Chen, Marian P. Skupski, Gangadharan Subramanian, Paul Thomas, Jinghui Zhang, George L. Gabor Miklos, Catherine R. Nelson, Samuel Broder, Andrew G. Clark, J. H. Nadeau, Victor A. McKusick, Norton D. Zinder, Arnold J. Levine, Richard J. Roberts, M. I. Simon, Carolyn W. Slayman, Michael W. Hunkapiller, Randall Bolanos, Arthur L. Delcher, Ian M. Dew, Daniel Fasulo, Michael Flanigan, Liliana Florea, Aaron L. Halpern, Sridhar Hannenhalli, Saul A. Kravitz, Samuel Levy, Clark M. Mobarry, Knut Reinert, Karin A. Remington, Jane Abu-Threideh, Ellen M. Beasley, Kendra Biddick, Vivien Bonazzi, Rhonda Brandon, Michele Cargill, Ishwar Chandramouliswaran, Rosane Charlab, Kabir Chaturvedi, Zuoming Deng, Valentina Di Francesco, Patrick Dunn, Karen Eilbeck, Carlos Evangelista, Andrei Gabrielian, Weiniu Gan, Wangmao Ge, Fangcheng Gong, Zhiping Gu, Ping Guan, Thomas J. Heiman, Maureen E. Higgins, Rui-Ru Ji, Zhaoxi Ke, Karen A. Ketchum, Zhongwu Lai, Yiding Lei, Zhenya Li, Jiayin Li, Yong Liang, Xiaoying Lin, Fu Lu, Gennady V. Merkulov, Natalia Milshina, Helen M. Moore, Ashwinikumar K Naik, Vaibhav A. Narayan, Beena Neelam, Deborah Nusskern, Douglas B. Rusch, Steven L. Salzberg, Wei Shao, Bixiong Chris Shue, Jingtao Sun, Zhen Yuan Wang, Aihui Wang, Xin Wang, Jian Wang, Ming-Hui Wei, Ron Wides, Chunlin Xiao, Chunhua Yan, Alison Yao, Jane Ye, Ming Zhan, Weiqing Zhang, Hongyu Zhang, Qi Zhao, Liansheng Zheng, Fei Zhong, Wenyan Zhong, Shiaoping C. Zhu, Shaying Zhao, Dennis A. Gilbert, Suzanna Baumhueter, Gene Spier, Christine Carter, Anibal Cravchik, Trevor Woodage, Feroze Ali, Huijin An, Aderonke Awe, Danita Baldwin, Holly Baden, Mary Barnstead, Ian Barrow, Karen Beeson, Dana A. Busam, Amy Carver, Ming Lai Cheng, Liz Curry, Steve Danaher, Lionel Davenport, Raymond Desilets, Susanne Dietz, Kristina Dodson, Lisa Doup, Steven Ferriera, Neha Garg, Andres Gluecksmann, Brit J. Hart, Jason Haynes, Charles Haynes, Cheryl Heiner, Suzanne Hladun, Damon Hostin, Jarrett Houck, Timothy Howland, Chinyere Ibegwam, Jeffery Johnson, Francis Kalush, Lesley Kline, Shashi Koduru, Amy Love, Felecia Mann, David May, Steven McCawley, Tina C. McIntosh, Ivy McMullen, Mee Moy, Linda Moy, Brian Murphy, Keith Nelson, Cynthia Pfannkoch, Eric Pratts, Vinita Puri, Hina Qureshi, Matthew Reardon, Robert Rodriguez, Yu-Hui Rogers, Deanna Romblad, Bob Ruhfel, Richard T. Scott, Cynthia Sitter, Michelle Smallwood, Erin Stewart, Renee Strong, Ellen Suh, Reginald Thomas, Ni Ni Tint, Sukyee Tse, Claire Vech, Gary Wang, Jeremy Wetter, Sherita Williams, Monica Williams, Sandra Windsor, Emily Winn-Deen, Keriellen Wolfe, Jayshree Zaveri, Karena Zaveri, Josep F. Abril, Roderic Guigó, Michael J. Campbell, Kimmen Sjölander, Brian Karlak, Anish Kejariwal, Huaiyu Mi, Betty Lazareva, Thomas Hatton, Apurva Narechania, Karen Diemer, Anushya Muruganujan, Nan Guo, Shinji Sato, Vineet Bafna, Sorin Istrail, Ross Lippert, Russell Schwartz, Brian P. Walenz, Shibu Yooseph, David Allen, Anand Basu, James Baxendale, Louis Blick, Marcelo Caminha, John Carnes-Stine, Parris Caulk, Yen-Hui Chiang, My Coyne, Carl Dahlke, Anne Deslattes Mays, Maria Dombroski, Michael Donnelly, Dale Ely, Shiva Esparham, Carl Fosler, Harold Gire, Stephen Glanowski, Kenneth Glasser, Anna Glodek, Mark Gorokhov, Ken Graham, Barry Gropman, Michael Harris, Jeremy Heil, Scott Henderson, Jeffrey Hoover, Donald Jennings, Catherine Jordan, James Jordan, John Kasha, Leonid Kagan, Cheryl L. Kraft, Alexander Levitsky, Mark Lewis, Xiangjun Liu, John Lopez, Daniel Ma, William H. Majoros, Joe McDaniel, Sean C. Murphy, Matthew Newman, Trung Hieu Nguyen, Ngoc Nguyen, Marc Nodell, Sue Pan, Jim Peck, Marshall Peterson, William Rowe, Robert Sanders, John Scott, Michael Simpson, Thomas J. Smith, Arlan Sprague, Timothy B. Stockwell, Russell Turner, Eli Venter, Mei Wang, Meiyuan Wen, David Wu, Mitchell Wu, Ashley Xia, Ali Zandieh, Xiaohong Zhu
16 Feb 2001-Science
TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.
Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

11,645 citations
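
The coverage figures quoted in the abstract above can be cross-checked with simple arithmetic. The following is a rough sketch that uses only the numbers stated in the abstract; the variable names are illustrative, and the derived values (mean read length, implied genome size, expected pairwise differences) are back-of-the-envelope estimates rather than figures reported by the authors.

```python
# Figures quoted in the abstract.
total_sequenced_bp = 14.8e9      # total bp of DNA sequence generated
num_reads = 27_271_853           # high-quality sequence reads
reported_coverage = 5.11         # fold coverage of the genome
consensus_bp = 2.91e9            # euchromatic consensus sequence length
snp_rate_bp = 1250               # one difference per 1250 bp between two haploid genomes

# Mean read length: total sequenced bases divided by the number of reads.
mean_read_length = total_sequenced_bp / num_reads

# Fold coverage = total sequenced bases / genome size, so the genome size
# implied by the reported coverage follows by rearranging.
implied_genome_bp = total_sequenced_bp / reported_coverage

# Expected number of differing sites between a random pair of haploid genomes
# over the consensus length (an estimate that ignores the reported heterogeneity).
expected_pairwise_differences = consensus_bp / snp_rate_bp

if __name__ == "__main__":
    print(f"mean read length: {mean_read_length:.0f} bp")
    print(f"implied genome size: {implied_genome_bp / 1e9:.2f} Gbp")
    print(f"expected pairwise differences: {expected_pairwise_differences / 1e6:.2f} million")
```

The mean read length comes out close to the 550-bp segment size into which the public data were shredded, which suggests the shredded public data were meant to resemble Celera's own reads.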


Journal Article
15 Sep 2005-Nature
TL;DR: A scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments is described. Its utility is shown by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.
Abstract: The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to reduce time and cost. Here we describe a scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments. The apparatus uses a novel fibre-optic slide of individual wells and is able to sequence 25 million bases, at 99% or better accuracy, in one four-hour run. To achieve an approximately 100-fold increase in throughput over current Sanger sequencing technology, we have developed an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picolitre-scale volumes. Here we show the utility, throughput, accuracy and robustness of this system by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.

8,233 citations


Network Information
Related Topics (5)
Reference genome: 5.5K papers, 367.5K citations, 90% related
Sequence assembly: 4.3K papers, 322.5K citations, 90% related
Genome project: 5.7K papers, 482.2K citations, 90% related
Genomics: 15.4K papers, 1M citations, 90% related
Contig: 3.1K papers, 146.4K citations, 89% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2021    10
2020    14
2019    13
2018    4
2017    80
2016    94

Top Attributes

Topic's top 5 most impactful authors

Steven L. Salzberg: 13 papers, 29.1K citations
Michael C. Schatz: 10 papers, 789 citations
Sergey Koren: 9 papers, 4.1K citations
Jay Shendure: 8 papers, 5.6K citations
Laurent Mouchard: 6 papers, 98 citations