Journal ArticleDOI

A whole-genome assembly of the domestic cow, Bos taurus

TL;DR: By using independent mapping data and conserved synteny between the cow and human genomes, this work was able to construct an assembly with excellent large-scale contiguity in which a large majority (approximately 91%) of the genome has been placed onto the 30 B. taurus chromosomes.
Abstract: Background: The genome of the domestic cow, Bos taurus, was sequenced using a mixture of hierarchical and whole-genome shotgun sequencing methods. Results: We have assembled the 35 million sequence reads and applied a variety of assembly improvement techniques, creating an assembly of 2.86 billion base pairs that has multiple improvements over previous assemblies: it is more complete, covering more of the genome; thousands of gaps have been closed; many erroneous inversions, deletions, and translocations have been corrected; and thousands of single-nucleotide errors have been corrected. Our evaluation using independent metrics demonstrates that the resulting assembly is substantially more accurate and complete than alternative versions. Conclusions: By using independent mapping data and conserved synteny between the cow and human genomes, we were able to construct an assembly with excellent large-scale contiguity in which a large majority (approximately 91%) of the genome has been placed onto the 30 B. taurus chromosomes. We constructed a new cow-human synteny map that expands upon previous maps. We also identified for the first time a portion of the B. taurus Y chromosome.


Citations
Journal ArticleDOI
Guojie Zhang1, Guojie Zhang2, Cai Li1, Qiye Li1, Bo Li1, Denis M. Larkin3, Chul Hee Lee4, Jay F. Storz5, Agostinho Antunes6, Matthew J. Greenwold7, Robert W. Meredith8, Anders Ödeen9, Jie Cui10, Qi Zhou11, Luohao Xu1, Hailin Pan1, Zongji Wang12, Lijun Jin1, Pei Zhang1, Haofu Hu1, Wei Yang1, Jiang Hu1, Jin Xiao1, Zhikai Yang1, Yang Liu1, Qiaolin Xie1, Hao Yu1, Jinmin Lian1, Ping Wen1, Fang Zhang1, Hui Li1, Yongli Zeng1, Zijun Xiong1, Shiping Liu12, Long Zhou1, Zhiyong Huang1, Na An1, Jie Wang13, Qiumei Zheng1, Yingqi Xiong1, Guangbiao Wang1, Bo Wang1, Jingjing Wang1, Yu Fan14, Rute R. da Fonseca2, Alonzo Alfaro-Núñez2, Mikkel Schubert2, Ludovic Orlando2, Tobias Mourier2, Jason T. Howard15, Ganeshkumar Ganapathy15, Andreas R. Pfenning15, Osceola Whitney15, Miriam V. Rivas15, Erina Hara15, Julia Smith15, Marta Farré3, Jitendra Narayan16, Gancho T. Slavov16, Michael N Romanov17, Rui Borges6, João Paulo Machado6, Imran Khan6, Mark S. Springer18, John Gatesy18, Federico G. Hoffmann19, Juan C. Opazo20, Olle Håstad21, Roger H. Sawyer7, Heebal Kim4, Kyu-Won Kim4, Hyeon Jeong Kim4, Seoae Cho4, Ning Li22, Yinhua Huang22, Michael William Bruford23, Xiangjiang Zhan13, Andrew Dixon, Mads F. Bertelsen24, Elizabeth P. Derryberry25, Wesley C. Warren26, Richard K. Wilson26, Shengbin Li27, David A. Ray19, Richard E. Green28, Stephen J. O'Brien29, Darren K. Griffin17, Warren E. Johnson30, David Haussler28, Oliver A. Ryder, Eske Willerslev2, Gary R. Graves31, Per Alström21, Jon Fjeldså32, David P. Mindell33, Scott V. Edwards34, Edward L. Braun35, Carsten Rahbek32, David W. Burt36, Peter Houde37, Yong Zhang1, Huanming Yang38, Jian Wang1, Erich D. Jarvis15, M. Thomas P. Gilbert39, M. Thomas P. Gilbert2, Jun Wang 
12 Dec 2014-Science
TL;DR: This work explored bird macroevolution using full genomes from 48 avian species representing all major extant clades to reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits.
Abstract: Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits.

872 citations


Cites background from "A whole-genome assembly of the dome..."

  • ...Species name Common name UCSC version Publication Homo sapiens Human hg18 (23) Pan troglodytes Chimpanzee panTro2 (190) Macaca mulatta Rhesus rheMac2 (186) Otolemur garnettii Bushbaby otoGar1 (46) Tupaia belangeri tree shrew tupBel1 (46) Mus musculus Mouse mm8 (188) Rattus norvegicus Rat rn4 (192) Cavia porcellus Guinea pig cavPor2 (46) Oryctolagus cuniculus Rabbit oryCun1 (46) Sorex araneus Shrew sorAra1 (46) Erinaceus europaeus Hedgehog eriEur1 (46) Canis lupus Dog canFam2 (46) Felis catus Cat felCat3 (118) Equus caballus Horse equCab1 (184) Bos taurus Cow bosTau3 (183) Dasypus novemcinctus Armadillo dasNov1 (46) Loxodonta africana Elephant loxAfr1 (46) Echinops telfairi Tenrec echTel1 (46)...

    [...]

  • ...7 (183) Callithrix jacchus marmoset Ensembl 69 29K/0....

    [...]

Journal ArticleDOI
TL;DR: This year the Genome Browser has introduced ‘track data hubs’, which allow the Genome Browser to provide access to remotely located sets of annotations, and several usability features including track search and a context-sensitive menu of options available with a right-click anywhere on the Browser's image.
Abstract: The University of California Santa Cruz Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analyzing and sharing both publicly available and user-generated genomic data sets. In the past year, the local database has been updated with four new species assemblies, and we anticipate another four will be released by the end of 2011. Further, a large number of annotation tracks have been either added, updated by contributors, or remapped to the latest human reference genome. Among these are new phenotype and disease annotations, UCSC genes, and a major dbSNP update, which required new visualization methods. Growing beyond the local database, this year we have introduced ‘track data hubs’, which allow the Genome Browser to provide access to remotely located sets of annotations. This feature is designed to significantly extend the number and variety of annotation tracks that are publicly available for visualization and analysis from within our site. We have also introduced several usability features including track search and a context-sensitive menu of options available with a right-click anywhere on the Browser's image.

710 citations


Cites background from "A whole-genome assembly of the dome..."

  • ...1/bosTau6) from the Center for Bioinformatics and Computational Biology, University of Maryland (17); the microbat (Myotis lucifugus) draft assembly (Broad Myoluc2....

    [...]

Journal ArticleDOI
TL;DR: The 1000 bull genomes project supports the goal of accelerating the rates of genetic gain in domestic cattle while at the same time considering animal health and welfare by providing the annotated sequence variants and genotypes of key ancestor bulls.
Abstract: The 1000 bull genomes project supports the goal of accelerating the rates of genetic gain in domestic cattle while at the same time considering animal health and welfare by providing the annotated sequence variants and genotypes of key ancestor bulls. In the first phase of the 1000 bull genomes project, we sequenced the whole genomes of 234 cattle to an average of 8.3-fold coverage. This sequencing includes data for 129 individuals from the global Holstein-Friesian population, 43 individuals from the Fleckvieh breed and 15 individuals from the Jersey breed. We identified a total of 28.3 million variants, with an average of 1.44 heterozygous sites per kilobase for each individual. We demonstrate the use of this database in identifying a recessive mutation underlying embryonic death and a dominant mutation underlying lethal chondrodysplasia. We also performed genome-wide association studies for milk production and curly coat, using imputed sequence variants, and identified variants associated with these traits in cattle.

690 citations

Journal ArticleDOI
TL;DR: This review addresses the important features of HT-NGS, including first-generation DNA sequencers, the birth of HT-NGS, second-generation HT-NGS platforms, third-generation HT-NGS platforms (including single molecule Heliscope™, SMRT™ and RNAP sequencers), and the applications, advances and future perspectives of sequencing technologies in human and animal genome research.
Abstract: High-throughput next-generation sequencing (HT-NGS) technologies are currently the hottest topic in human and animal genomics research; they can produce over 100 times more data than the most sophisticated capillary sequencers based on the Sanger method. With the ongoing development of high-throughput sequencing machines and the advancement of modern bioinformatics tools at an unprecedented pace, the goal of sequencing individual genomes of living organisms at a cost of $1,000 each seems realistically feasible in the near future. In the relatively short time frame since 2005, HT-NGS technologies have been revolutionizing human and animal genome research through analysis of chromatin immunoprecipitation coupled to DNA microarray (ChIP-chip) or sequencing (ChIP-seq), RNA sequencing (RNA-seq), whole genome genotyping, genome wide structural variation, de novo assembling and re-assembling of genomes, mutation detection and carrier screening, detection of inherited disorders and complex human diseases, DNA library preparation, paired ends and genomic captures, sequencing of mitochondrial genomes and personal genomics. In this review, we address the important features of HT-NGS: first generation DNA sequencers, the birth of HT-NGS, second generation HT-NGS platforms, third generation HT-NGS platforms (including single molecule Heliscope™, SMRT™ and RNAP sequencers, and Nanopore), the Archon Genomics X PRIZE foundation, a comparison of second and third generation HT-NGS platforms, and the applications, advances and future perspectives of sequencing technologies in human and animal genome research.

690 citations


Cites background from "A whole-genome assembly of the dome..."

  • ...Zimin et al. 2009 Identification of the genetic disorder: using combined an array-based sequence capture and massively parallel sequencing approach causative mutation of Bovine Arachnomelia was identified in bovine sulfite oxidase (SUOX) gene....

    [...]

  • ...…animals genome sequencing and assembling, a series of publications and studies were published (Table 3), for bovine (Liu 2009; Elsik et al. 2009; Zimin et al. 2009), porcine (Wiedmann et al. 2008; Amaral et al. 2009; Ramos et al. 2009; Isom et al. 2010; Leifer et al. 2010), sheep (Archibald et…...

    [...]

Journal ArticleDOI
TL;DR: This review covers the relevant concepts, compares the issues raised by current high-throughput DNA sequencing technologies, and analyzes how future developments may overcome these limitations.
Abstract: Recent advances in DNA sequencing have revolutionized the field of genomics, making it possible for even single research groups to generate large amounts of sequence data very rapidly and at a substantially lower cost. These high-throughput sequencing technologies make deep transcriptome sequencing and transcript quantification, whole genome sequencing and resequencing available to many more researchers and projects. However, while the cost and time have been greatly reduced, the error profiles and limitations of the new platforms differ significantly from those of previous sequencing technologies. The selection of an appropriate sequencing platform for particular types of experiments is an important consideration, and requires a detailed understanding of the technologies available; including sources of error, error rate, as well as the speed and cost of sequencing. We review the relevant concepts and compare the issues raised by the current high-throughput DNA sequencing technologies. We analyze how future developments may overcome these limitations and what challenges remain.

651 citations

References
Journal ArticleDOI
Authors: J. Craig Venter, Mark D. Adams, Eugene W. Myers, Peter W. Li, Richard J. Mural, Granger G. Sutton, Hamilton O. Smith, Mark Yandell, Cheryl A. Evans, Robert A. Holt, Jeannine D. Gocayne, Peter Amanatides, Richard M. Ballew, Daniel H. Huson, Jennifer Russo Wortman, Qing Zhang, Chinnappa D. Kodira, Xiangqun H. Zheng, Lin Chen, Marian Skupski, Gangadharan Subramanian, Paul D. Thomas, Jinghui Zhang, George L. Gabor Miklos, Catherine Nelson, Samuel Broder, Andrew G. Clark, Joe Nadeau, Victor A. McKusick, Norton Zinder, Arnold J. Levine, Richard J. Roberts, Mel Simon, Carolyn Slayman, Michael Hunkapiller, Randall Bolanos, Arthur Delcher, Ian Dew, Daniel Fasulo, Michael Flanigan, Liliana Florea, Aaron Halpern, Sridhar Hannenhalli, Saul Kravitz, Samuel Levy, Clark Mobarry, Knut Reinert, Karin Remington, Jane Abu-Threideh, Ellen Beasley, Kendra Biddick, Vivien Bonazzi, Rhonda Brandon, Michele Cargill, Ishwar Chandramouliswaran, Rosane Charlab, Kabir Chaturvedi, Zuoming Deng, Valentina Di Francesco, Patrick Dunn, Karen Eilbeck, Carlos Evangelista, Andrei E. Gabrielian, Weiniu Gan, Wangmao Ge, Fangcheng Gong, Zhiping Gu, Ping Guan, Thomas J. Heiman, Maureen E. Higgins, Rui-Ru Ji, Zhaoxi Ke, Karen A. Ketchum, Zhongwu Lai, Yiding Lei, Zhenya Li, Jiayin Li, Yong Liang, Xiaoying Lin, Fu Lu, Gennady V. Merkulov, Natalia Milshina, Helen M. Moore, Ashwinikumar K Naik, Vaibhav A. Narayan, Beena Neelam, Deborah Nusskern, Douglas B. Rusch, Steven Salzberg, Wei Shao, Bixiong Shue, Jingtao Sun, Zhen Yuan Wang, Aihui Wang, Xin Wang, Jian Wang, Ming-Hui Wei, Ron Wides, Chunlin Xiao, Chunhua Yan, Alison Yao, Jane Ye, Ming Zhan, Weiqing Zhang, Hongyu Zhang, Qi Zhao, Liansheng Zheng, Fei Zhong, Wenyan Zhong, Shiaoping C. Zhu, Shaying Zhao, Dennis Gilbert, Suzanna Baumhueter, Gene Spier, Christine Carter, Anibal Cravchik, Trevor Woodage, Feroze Ali, Huijin An, Aderonke Awe, Danita Baldwin, Holly Baden, Mary Barnstead, Ian Barrow, Karen Beeson, Dana Busam, Amy Carver, Angela Center, Ming Lai Cheng, Liz Curry, Steve Danaher, Lionel Davenport, Raymond Desilets, Susanne Dietz, Kristina Dodson, Lisa Doup, Steven Ferriera, Neha Garg, Andres Gluecksmann, Brit Hart, Jason Haynes, Charles Haynes, Cheryl Heiner, Suzanne Hladun, Damon Hostin, Jarrett Houck, Timothy Howland, Chinyere Ibegwam, Jeffery Johnson, Francis Kalush, Lesley Kline, Shashi Koduru, Amy Love, Felecia Mann, David May, Steven McCawley, Tina McIntosh, Ivy McMullen, Mee Moy, Linda Moy, Brian Murphy, Keith Nelson, Cynthia Pfannkoch, Eric Pratts, Vinita Puri, Hina Qureshi, Matthew Reardon, Robert Rodriguez, Yu-Hui Rogers, Deanna Romblad, Bob Ruhfel, Richard Scott, Cynthia Sitter, Michelle Smallwood, Erin Stewart, Renee Strong, Ellen Suh, Reginald Thomas, Ni Ni Tint, Sukyee Tse, Claire Vech, Gary Wang, Jeremy Wetter, Sherita Williams, Monica Williams, Sandra Windsor, Emily Winn-Deen, Keriellen Wolfe, Jayshree Zaveri, Karena Zaveri, Josep F. Abril, Roderic Guigó, Michael J. Campbell, Kimmen V. Sjolander, Brian Karlak, Anish Kejariwal, Huaiyu Mi, Betty Lazareva, Thomas Hatton, Apurva Narechania, Karen Diemer, Anushya Muruganujan, Nan Guo, Shinji Sato, Vineet Bafna, Sorin Istrail, Ross Lippert, Russell Schwartz, Brian Walenz, Shibu Yooseph, David Allen, Anand Basu, James Baxendale, Louis Blick, Marcelo Caminha, John Carnes-Stine, Parris Caulk, Yen-Hui Chiang, My Coyne, Carl Dahlke, Anne Deslattes Mays, Maria Dombroski, Michael Donnelly, Dale Ely, Shiva Esparham, Carl Fosler, Harold Gire, Stephen Glanowski, Kenneth Glasser, Anna Glodek, Mark Gorokhov, Ken Graham, Barry Gropman, Michael Harris, Jeremy Heil, Scott Henderson, Jeffrey Hoover, Donald Jennings, Catherine Jordan, James Jordan, John Kasha, Leonid Kagan, Cheryl Kraft, Alexander Levitsky, Mark Lewis, Xiangjun Liu, John Lopez, Daniel Ma, William Majoros, Joe McDaniel, Sean Murphy, Matthew Newman, Trung Nguyen, Ngoc Nguyen, Marc Nodell, Sue Pan, Jim Peck, Marshall Peterson, William Rowe, Robert Sanders, John Scott, Michael Simpson, Thomas Smith, Arlan Sprague, Timothy Stockwell, Russell Turner, Eli Venter, Mei Wang, Meiyuan Wen, David Wu, Mitchell Wu, Ashley Xia, Ali Zandieh, Xiaohong Zhu
THE HUMAN GENOME

5,205 citations


"A whole-genome assembly of the dome..." refers background in this paper

  • ...Background Seven years after the first whole-genome assembly of the human genome [1], sequencing and assembly of mammalian genomes has become almost routine....

    [...]

Journal ArticleDOI
TL;DR: The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes.
Abstract: The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical viewing tools provide alternative ways to analyze genome alignments. The new system is the first version of MUMmer to be released as open-source software. This allows other developers to contribute to the code base and freely redistribute the code. The MUMmer sources are available at http://www.tigr.org/software/mummer.

4,886 citations


"A whole-genome assembly of the dome..." refers methods in this paper

  • ...The MUMmer package [14] was used for these alignments and for the Cmap alignments....

    [...]

  • ...First, all cow scaffolds were aligned to the human genome using nucmer [14] with its maximal unique match (mum) option in order to avoid alignments of repetitive sequence....

    [...]
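
The alignment step quoted above (nucmer run with its maximal unique match option) could be set up roughly as in the following Python sketch. It only builds the command line; it does not reproduce the authors' actual invocation. The file names, the output prefix, and the choice of the human genome as the reference argument are illustrative assumptions, and `build_nucmer_command` is a hypothetical helper, not part of MUMmer.

```python
import shlex


def build_nucmer_command(reference_fasta, query_fasta, prefix="cow_vs_human"):
    """Return the argument list for a nucmer run anchored on maximal unique matches.

    All paths and the prefix are placeholders chosen for illustration.
    """
    return [
        "nucmer",
        "--mum",        # anchor only on matches unique in both reference and query
        "-p", prefix,   # nucmer writes its alignments to <prefix>.delta
        str(reference_fasta),
        str(query_fasta),
    ]


if __name__ == "__main__":
    # Assumed inputs: human genome as reference, cow scaffolds as query.
    cmd = build_nucmer_command("human_genome.fa", "cow_scaffolds.fa")
    print(shlex.join(cmd))
    # To execute, MUMmer must be installed and on PATH:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Restricting anchors to matches that are unique in both sequences is what keeps repeats from seeding alignments, which is the stated reason for using the option.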

Journal ArticleDOI
TL;DR: The National Center for Biotechnology Information Reference Sequence (RefSeq) database provides a non-redundant collection of sequences representing genomic data, transcripts and proteins that pragmatically includes sequence data that are currently publicly available in the archival databases.
Abstract: The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff.

4,229 citations


"A whole-genome assembly of the dome..." refers methods in this paper

  • ...Messenger RNA alignment Known full-length gene sequences were downloaded from the RefSeq project at NCBI (release date: November 10, 2008) [18]....

    [...]

Journal ArticleDOI
21 Oct 2004-Nature
TL;DR: The current human genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps and is accurate to an error rate of approximately 1 event per 100,000 bases.
Abstract: The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers approximately 99% of the euchromatic genome and is accurate to an error rate of approximately 1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

3,989 citations

Journal ArticleDOI
TL;DR: GMAP is a standalone program for mapping and aligning cDNA sequences to a genome with minimal startup time and memory requirements; it provides fast batch processing of large sequence sets and demonstrates a several-fold increase in speed over existing programs.
Abstract: Motivation: We introduce gmap, a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models. Methodology underlying the program includes a minimal sampling strategy for genomic mapping, oligomer chaining for approximate alignment, sandwich DP for splice site detection, and microexon identification with statistical significance testing. Results: On a set of human messenger RNAs with random mutations at a 1 and 3% rate, gmap identified all splice sites accurately in over 99.3% of the sequences, which was one-tenth the error rate of existing programs. On a large set of human expressed sequence tags, gmap provided higher-quality alignments more often than blat did. On a set of Arabidopsis cDNAs, gmap performed comparably with GeneSeqer. In these experiments, gmap demonstrated a several-fold increase in speed over existing programs. Availability: Source code for gmap and associated programs is available at http://www.gene.com/share/gmap Contact: [email protected] Supplementary information: http://www.gene.com/share/gmap

2,058 citations


"A whole-genome assembly of the dome..." refers methods in this paper

  • ...Alignments were also produced with an alternative mapping tool, GMAP [21], and used to confirm and classify the observed discrepancies in gene content between the two assemblies....

    [...]