Author

Kerstin Howe

Bio: Kerstin Howe is an academic researcher at the Wellcome Trust Sanger Institute. The author has contributed to research on genomes and reference genomes, has an h-index of 29, and has co-authored 81 publications that have received 8,163 citations. Previous affiliations of Kerstin Howe include Yale University.

Topics: Genome, Reference genome, Sequence assembly, Genomics, Biology ...

[Chart: papers published on a yearly basis, 2004 to 2023]

Papers

Journal Article • DOI

The zebrafish reference genome sequence and its relationship to the human genome.

Kerstin Howe, Matthew D. Clark, Carlos Torroja¹, Carlos Torroja², +171 more • Institutions (11)

25 Apr 2013-Nature

TL;DR: A high-quality sequence assembly of the zebrafish genome is generated, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution, high-density meiotic map. The assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4, and chromosomal regions that influence sex determination.

Abstract: Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.

3,573 citations

Journal Article • DOI

Analyses of pig genomes provide insight into porcine demography and evolution

Martien A. M. Groenen¹, Alan Archibald², Hirohide Uenishi, Christopher K. Tuggle³, Yasuhiro Takeuchi⁴, Max F. Rothschild³, Claire Rogel-Gaillard⁵, Chankyu Park⁶, Denis Milan⁷, Hendrik-Jan Megens¹, Shengting Li⁸, Denis M. Larkin⁹, Heebal Kim¹⁰, Laurent A. F. Frantz¹, Mario Caccamo¹¹, Hyeonju Ahn¹⁰, Bronwen Aken¹², Anna Anselmo¹³, Christian Anthon¹⁴, Loretta Auvil¹⁵, Bouabid Badaoui¹³, Craig W. Beattie¹⁶, Christian Bendixen⁸, Daniel Berman¹⁷, Frank Blecha¹⁸, Jonas Blomberg¹⁹, Lars Bolund⁸, Mirte Bosse¹, Sara Botti¹³, Zhan Bujie⁸, Megan Bystrom³, Boris Capitanu¹⁵, Denise Carvalho-Silva²⁰, Patrick Chardon⁵, Celine Chen²¹, Ryan Cheng³, Sang-Haeng Choi, William Chow¹², Richard Clark¹², C M Clee¹², Richard P. M. A. Crooijmans¹, Harry D. Dawson²¹, Patrice Dehais⁷, Fioravante De Sapio², Bert Dibbits¹, Nizar Drou¹¹, Zhi-Qiang Du³, Kellye Eversole, João Fadista²², João Fadista⁸, Susan Fairley¹², Thomas Faraut⁷, Geoffrey J. Faulkner²², Geoffrey J. Faulkner², Katie E. Fowler²³, Merete Fredholm¹⁴, Eric Fritz³, James G. R. Gilbert¹², Elisabetta Giuffra⁵, Elisabetta Giuffra¹³, Jan Gorodkin¹⁴, Darren K. Griffin²³, Jennifer Harrow¹², Alexander Hayward²⁴, Kerstin Howe¹², Zhi-Liang Hu³, Sean Humphray²², Sean Humphray¹², Toby Hunt¹², Henrik Hornshøj⁸, Jin-Tae Jeon²⁵, Patric Jern²⁴, Matthew Jones¹², Jerzy Jurka²⁶, Hiroyuki Kanamori, Ronan Kapetanovic², Jaebum Kim¹⁵, Jaebum Kim⁶, Jae-Hwan Kim, Kyu-Won Kim, Tae-Hun Kim, Greger Larson²⁷, Kyooyeol Lee⁶, Kyung-Tai Lee, Richard M. Leggett¹¹, Harris A. Lewin²⁸, Yingrui Li, Wan Sheng Liu²⁹, Jane E. Loveland¹², Yao Lu, Joan K. Lunney¹⁷, Jian Ma¹⁵, Ole Madsen¹, Katherine M. Mann²², Katherine M. Mann¹⁷, Lucy Matthews¹², Stuart McLaren¹², Takeya Morozumi, Michael P. Murtaugh³⁰, Jitendra Narayan⁹, Dinh Truong Nguyen⁶, Peixiang Ni, Song-Jung Oh³¹, Suneel Kumar Onteru³, Frank Panitz⁸, Eung-Woo Park, Hong-Seog Park, Géraldine Pascal³², Yogesh Paudel¹, Miguel Pérez-Enciso, Ricardo H. Ramirez-Gonzalez¹¹, James M. Reecy³, Sandra L. Rodriguez-Zas¹⁵, Gary A. Rohrer¹⁷, Lauretta A. Rund¹⁵, Yongming Sang¹⁸, Kyle M. Schachtschneider¹⁵, Joshua G. Schraiber³³, John C. Schwartz³⁰, Linda Scobie³⁴, Carol Scott¹², Stephen M. J. Searle¹², Bertrand Servin⁷, Bruce R. Southey¹⁵, Göran O. Sperber¹⁹, Peter F. Stadler³⁵, Jonathan V. Sweedler¹⁵, Hakim Tafer³⁵, Bo Thomsen⁸, Rashmi Wali³⁴, Jian Wang, Jun Wang¹⁴, Simon D. M. White¹², Xun Xu, Martine Yerle⁷, Guojie Zhang, Jianguo Zhang, Jie Zhang³⁶, Shuhong Zhao³⁶, Jane Rogers¹¹, Carol Churcher¹², Lawrence B. Schook¹⁵ • Institutions (36)

Wageningen University and Research Centre¹, University of Edinburgh², Iowa State University³, University College London⁴, Agro ParisTech⁵, Konkuk University⁶, Institut national de la recherche agronomique⁷, Aarhus University⁸, Aberystwyth University⁹, Seoul National University¹⁰, Norwich Research Park¹¹, Wellcome Trust Sanger Institute¹², Parco Tecnologico Padano¹³, University of Copenhagen¹⁴, University of Illinois at Urbana–Champaign¹⁵, University of Illinois at Chicago¹⁶, Agricultural Research Service¹⁷, Kansas State University¹⁸, Uppsala University¹⁹, European Bioinformatics Institute²⁰, United States Department of Agriculture²¹, Washington University in St. Louis²², University of Kent²³, Science for Life Laboratory²⁴, Gyeongsang National University²⁵, Genetic Information Research Institute²⁶, Durham University²⁷, University of California, Davis²⁸, Pennsylvania State University²⁹, University of Minnesota³⁰, Jeju National University³¹, François Rabelais University³², University of California, Berkeley³³, Glasgow Caledonian University³⁴, Leipzig University³⁵, Huazhong Agricultural University³⁶

15 Nov 2012-Nature

TL;DR: The assembly and analysis of the genome sequence of a female domestic Duroc pig and a comparison with the genomes of wild and domestic pigs from Europe and Asia reveal a deep phylogenetic split between European and Asian wild boars ∼1 million years ago.

Abstract: For 10,000 years pigs and humans have shared a close and complex relationship. From domestication to modern breeding practices, humans have shaped the genomes of domestic pigs. Here we present the assembly and analysis of the genome sequence of a female domestic Duroc pig (Sus scrofa) and a comparison with the genomes of wild and domestic pigs from Europe and Asia. Wild pigs emerged in South East Asia and subsequently spread across Eurasia. Our results reveal a deep phylogenetic split between European and Asian wild boars ∼1 million years ago, and a selective sweep analysis indicates selection on genes involved in RNA processing and regulation. Genes associated with immune response and olfaction exhibit fast evolution. Pigs have the largest repertoire of functional olfactory receptor genes, reflecting the importance of smell in this scavenging animal. The pig genome sequence provides an important resource for further improvements of this important livestock species, and our identification of many putative disease-causing variants extends the potential of the pig as a biomedical model.

1,189 citations

Journal Article • DOI

Identifying and removing haplotypic duplication in primary genome assemblies.

Dengfeng Guan¹, Dengfeng Guan², Shane A. McCarthy¹, Jonathan Wood³, Kerstin Howe³, Yadong Wang², Richard Durbin³, Richard Durbin¹ • Institutions (3)

University of Cambridge¹, Harbin Institute of Technology², Wellcome Trust Sanger Institute³

01 May 2020-Bioinformatics

TL;DR: A novel tool, purge_dups, is presented that uses sequence similarity and read depth to automatically identify and remove both haplotigs and heterozygous overlaps; it can reduce heterozygous duplication and increase assembly continuity while maintaining completeness of the primary assembly.

Abstract: Motivation: Rapid development in long-read sequencing and scaffolding technologies is accelerating the production of reference-quality assemblies for large eukaryotic genomes. However, haplotype divergence in regions of high heterozygosity often results in assemblers creating two copies rather than one copy of a region, leading to breaks in contiguity and compromising downstream steps such as gene annotation. Several tools have been developed to resolve this problem. However, they either focus only on removing contained duplicate regions, also known as haplotigs, or fail to use all the relevant information and hence make errors. Results: Here we present a novel tool, purge_dups, that uses sequence similarity and read depth to automatically identify and remove both haplotigs and heterozygous overlaps. In comparison with current tools, we demonstrate that purge_dups can reduce heterozygous duplication and increase assembly continuity while maintaining completeness of the primary assembly. Moreover, purge_dups is fully automatic and can easily be integrated into assembly pipelines. Availability and implementation: The source code is written in C and is available at https://github.com/dfguan/purge_dups. Supplementary information: Supplementary data are available at Bioinformatics online.

728 citations

Journal Article • DOI

Towards complete and error-free genome assemblies of all vertebrate species

Arang Rhie¹, Shane A. McCarthy², Shane A. McCarthy³, Olivier Fedrigo⁴, Joana Damas⁵, Giulio Formenti⁴, Sergey Koren¹, Marcela Uliano-Silva⁶, William Chow³, Arkarachai Fungtammasan, J. H. Kim⁷, Chul Hee Lee⁷, Byung June Ko⁷, Mark Chaisson⁸, Gregory Gedman⁴, Lindsey J. Cantin⁴, Françoise Thibaud-Nissen¹, Leanne Haggerty⁹, Iliana Bista³, Iliana Bista², Michelle Smith³, Bettina Haase⁴, Jacquelyn Mountcastle⁴, Sylke Winkler¹⁰, Sylke Winkler¹¹, Sadye Paez⁴, Jason T. Howard, Sonja C. Vernes¹⁰, Sonja C. Vernes¹², Sonja C. Vernes¹³, Tanya M. Lama¹⁴, Frank Grützner¹⁵, Wesley C. Warren¹⁶, Christopher N. Balakrishnan¹⁷, Dave W Burt¹⁸, Jimin George¹⁹, Matthew T. Biegler⁴, David Iorns, Andrew Digby, Daryl Eason, Bruce C. Robertson²⁰, Taylor Edwards²¹, Mark Wilkinson²², George F. Turner²³, Axel Meyer²⁴, Andreas F. Kautt²⁵, Andreas F. Kautt²⁴, Paolo Franchini²⁴, H. William Detrich²⁶, Hannes Svardal²⁷, Hannes Svardal²⁸, Maximilian Wagner²⁹, Gavin J. P. Naylor³⁰, Martin Pippel¹⁰, Milan Malinsky³, Milan Malinsky³¹, Mark Mooney, Maria Simbirsky, Brett T. Hannigan, Trevor Pesout³², Marlys L. Houck³³, Ann C Misuraca³³, Sarah B. Kingan³⁴, Richard Hall³⁴, Zev N. Kronenberg³⁴, Ivan Sović³⁴, Christopher Dunn³⁴, Zemin Ning³, Alex Hastie, Joyce V. Lee, Siddarth Selvaraj, Richard E. Green³², Nicholas H. Putnam, Ivo Gut³⁵, Jay Ghurye³⁶, Erik Garrison³², Ying Sims³, Joanna Collins³, Sarah Pelan³, James Torrance³, Alan Tracey³, Jonathan Wood³, Robel E. Dagnew⁸, Dengfeng Guan², Dengfeng Guan³⁷, Sarah E. London³⁸, David F. Clayton¹⁹, Claudio V. Mello³⁹, Samantha R. Friedrich³⁹, Peter V. Lovell³⁹, Ekaterina Osipova¹⁰, Farooq O. Al-Ajli⁴⁰, Farooq O. Al-Ajli⁴¹, Simona Secomandi⁴², Heebal Kim⁷, Constantina Theofanopoulou⁴, Michael Hiller⁴³, Yang Zhou, Robert S. Harris⁴⁴, Kateryna D. Makova⁴⁴, Paul Medvedev⁴⁴, Jinna Hoffman¹, Patrick Masterson¹, Karen Clark¹, Fergal J. Martin⁹, Kevin L. Howe⁹, Paul Flicek⁹, Brian P. Walenz¹, Woori Kwak, Hiram Clawson³², Mark Diekhans³², Luis R Nassar³², Benedict Paten³², Robert H. S. Kraus²⁴, Robert H. S. Kraus¹⁰, Andrew J. Crawford⁴⁵, M. Thomas P. Gilbert⁴⁶, M. Thomas P. Gilbert⁴⁷, Guojie Zhang, Byrappa Venkatesh⁴⁸, Robert W. Murphy⁴⁹, Klaus-Peter Koepfli⁵⁰, Beth Shapiro³², Beth Shapiro⁵¹, Warren E. Johnson⁵², Warren E. Johnson⁵⁰, Federica Di Palma⁵³, Tomas Marques-Bonet, Emma C. Teeling⁵⁴, Tandy Warnow⁵⁵, Jennifer A. Marshall Graves⁵⁶, Oliver A. Ryder⁵⁷, Oliver A. Ryder³³, David Haussler³², Stephen J. O'Brien⁵⁸, Jonas Korlach³⁴, Harris A. Lewin⁵, Kerstin Howe³, Eugene W. Myers¹⁰, Eugene W. Myers¹¹, Richard Durbin², Richard Durbin³, Adam M. Phillippy¹, Erich D. Jarvis⁵¹, Erich D. Jarvis⁴ • Institutions (58)

National Institutes of Health¹, University of Cambridge², Wellcome Trust Sanger Institute³, Rockefeller University⁴, University of California, Davis⁵, Leibniz Association⁶, Seoul National University⁷, University of Southern California⁸, European Bioinformatics Institute⁹, Max Planck Society¹⁰, Dresden University of Technology¹¹, University of St Andrews¹², Radboud University Nijmegen¹³, University of Massachusetts Amherst¹⁴, University of Adelaide¹⁵, University of Missouri¹⁶, East Carolina University¹⁷, University of Queensland¹⁸, Clemson University¹⁹, University of Otago²⁰, University of Arizona²¹, Natural History Museum²², Bangor University²³, University of Konstanz²⁴, Harvard University²⁵, Northeastern University²⁶, University of Antwerp²⁷, National Museum of Natural History²⁸, University of Graz²⁹, University of Florida³⁰, University of Basel³¹, University of California, Santa Cruz³², Zoological Society of San Diego³³, Pacific Biosciences³⁴, Pompeu Fabra University³⁵, University of Maryland, College Park³⁶, Harbin Institute of Technology³⁷, University of Chicago³⁸, Oregon Health & Science University³⁹, Qatar Airways⁴⁰, Monash University Malaysia Campus⁴¹, University of Milan⁴², Goethe University Frankfurt⁴³, Pennsylvania State University⁴⁴, University of Los Andes⁴⁵, University of Copenhagen⁴⁶, Norwegian University of Science and Technology⁴⁷, Agency for Science, Technology and Research⁴⁸, Royal Ontario Museum⁴⁹, Smithsonian Institution⁵⁰, Howard Hughes Medical Institute⁵¹, Walter Reed Army Institute of Research⁵², University of East Anglia⁵³, University College Dublin⁵⁴, University of Illinois at Urbana–Champaign⁵⁵, La Trobe University⁵⁶, University of California, San Diego⁵⁷, Nova Southeastern University⁵⁸

28 Apr 2021-Nature

TL;DR: The Vertebrate Genomes Project (VGP) is an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

Abstract: High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1-4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

647 citations

Journal Article • DOI

Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly

Valerie A. Schneider¹, Tina A. Graves-Lindsay², Kerstin Howe³, Nathan Bouk¹, Hsiu-Chuan Chen¹, Paul Kitts¹, Terence Murphy¹, Kim D. Pruitt¹, Françoise Thibaud-Nissen¹, Derek Albracht², Robert S. Fulton², Milinn Kremitzki², Vincent Magrini², Chris Markovic², Sean McGrath², Karyn Meltz Steinberg², Kate Auger³, William Chow³, Joanna Collins³, Glenn Harden³, Tim Hubbard³, Sarah Pelan³, Jared T. Simpson³, Glen Threadgold³, James Torrance³, Jonathan Wood³, Laura Clarke⁴, Sergey Koren¹, Matthew Boitano⁵, Paul Peluso⁵, Heng Li⁶, Chen-Shan Chin⁵, Adam M. Phillippy¹, Richard Durbin³, Richard K. Wilson², Paul Flicek⁴, Evan E. Eichler⁷, Deanna M. Church¹ • Institutions (7)

National Institutes of Health¹, Washington University in St. Louis², Wellcome Trust Sanger Institute³, European Bioinformatics Institute⁴, Pacific Biosciences⁵, Broad Institute⁶, University of Washington⁷

01 May 2017-Genome Research

TL;DR: The authors assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote the understanding of human biology and advance efforts to improve health.

Abstract: The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.

643 citations

Cited by

Journal Article • DOI

GENCODE: The reference human genome annotation for The ENCODE Project

Jennifer Harrow¹, Adam Frankish¹, José M. González¹, Electra Tapanari¹, Mark Diekhans², Felix Kokocinski¹, Bronwen Aken¹, Daniel Barrell¹, Amonida Zadissa¹, Stephen M. J. Searle¹, If H. A. Barnes¹, Alexandra Bignell¹, Veronika Boychenko¹, Toby Hunt¹, M. Kay¹, Gaurab Mukherjee¹, Jeena Rajan¹, Gloria Despacio-Reyes¹, Gary Saunders¹, Charles A. Steward¹, Rachel A. Harte², Michael F. Lin³, Cédric Howald⁴, Andrea Tanzer, Thomas Derrien⁴, Jacqueline Chrast⁴, Nathalie Walters⁴, Suganthi Balasubramanian⁵, Baikang Pei⁵, Michael L. Tress, Jose Manuel Rodriguez, Iakes Ezkurdia, Jeltje Van Baren, Michael R. Brent, David Haussler², Manolis Kellis³, Alfonso Valencia, Alexandre Reymond⁴, Mark Gerstein⁵, Roderic Guigó, Tim Hubbard¹ • Institutions (5)

Wellcome Trust Sanger Institute¹, University of California, Santa Cruz², Massachusetts Institute of Technology³, University of Lausanne⁴, Yale University⁵

01 Sep 2012-Genome Research

TL;DR: This work examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters, 62% of protein-coding genes have annotated polyA sites, and over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas.

Abstract: The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.

4,281 citations

Journal Article • DOI

Finishing the euchromatic sequence of the human genome

Chris P. Ponting, Daniel Barker

21 Oct 2004-Nature

TL;DR: The current human genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps and is accurate to an error rate of approximately 1 event per 100,000 bases.

Abstract: The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers approximately 99% of the euchromatic genome and is accurate to an error rate of approximately 1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

3,989 citations

Journal Article • DOI

An RNA-Sequencing Transcriptome and Splicing Database of Glia, Neurons, and Vascular Cells of the Cerebral Cortex

Ye Zhang¹, Kenian Chen², Steven A. Sloan¹, Mariko L. Bennett¹, Anja R. Scholze¹, Sean O'Keeffe³, Hemali Phatnani³, Paolo Guarnieri⁴, Christine Caneda¹, Nadine Ruderisch⁵, Shuyun Deng², Shane A. Liddelow¹, Chaolin Zhang³, Richard Daneman⁵, Tom Maniatis³, Ben A. Barres¹, Jian Qian Wu² • Institutions (5)

Stanford University¹, University of Texas at Austin², Columbia University Medical Center³, Columbia University⁴, University of California, San Francisco⁵

03 Sep 2014-The Journal of Neuroscience

TL;DR: The authors' data provide clues as to how neurons and astrocytes differ in their ability to dynamically regulate glycolytic flux and lactate generation, attributable to unique splicing of PKM2, the gene encoding the glycolytic enzyme pyruvate kinase.

Abstract: The major cell classes of the brain differ in their developmental processes, metabolism, signaling, and function. To better understand the functions and interactions of the cell types that comprise these classes, we acutely purified representative populations of neurons, astrocytes, oligodendrocyte precursor cells, newly formed oligodendrocytes, myelinating oligodendrocytes, microglia, endothelial cells, and pericytes from mouse cerebral cortex. We generated a transcriptome database for these eight cell types by RNA sequencing and used a sensitive algorithm to detect alternative splicing events in each cell type. Bioinformatic analyses identified thousands of new cell type-enriched genes and splicing isoforms that will provide novel markers for cell identification, tools for genetic manipulation, and insights into the biology of the brain. For example, our data provide clues as to how neurons and astrocytes differ in their ability to dynamically regulate glycolytic flux and lactate generation attributable to unique splicing of PKM2, the gene encoding the glycolytic enzyme pyruvate kinase. This dataset will provide a powerful new resource for understanding the development and function of the brain. To ensure the widespread distribution of these datasets, we have created a user-friendly website (http://web.stanford.edu/group/barres_lab/brain_rnaseq.html) that provides a platform for analyzing and comparing transcription and alternative splicing profiles for various cell classes in the brain.

3,891 citations

Journal Article • DOI

The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003

Brigitte Boeckmann¹, Amos Marc Bairoch, Rolf Apweiler, Marie-Claude Blatter, Anne Estreicher, Elisabeth Gasteiger, Maria Jesus Martin, Karine Michoud, Claire O'Donovan, Isabelle Phan, Sandrine Pilbout, Michel Schneider • Institutions (1)

Swiss Institute of Bioinformatics¹

01 Jan 2003-Nucleic Acids Research

TL;DR: The SWISS-PROT protein knowledgebase connects amino acid sequences with the current knowledge in the Life Sciences by providing an interdisciplinary overview of relevant information by bringing together experimental results, computed features and sometimes even contradictory conclusions.

Abstract: The SWISS-PROT protein knowledgebase (http://www.expasy.org/sprot/ and http://www.ebi.ac.uk/swissprot/) connects amino acid sequences with the current knowledge in the Life Sciences. Each protein entry provides an interdisciplinary overview of relevant information by bringing together experimental results, computed features and sometimes even contradictory conclusions. Detailed expertise that goes beyond the scope of SWISS-PROT is made available via direct links to specialised databases. SWISS-PROT provides annotated entries for all species, but concentrates on the annotation of entries from human (the HPI project) and other model organisms to ensure the presence of high quality annotation for representative members of all protein families. Part of the annotation can be transferred to other family members, as is already done for microbes by the High-quality Automated and Manual Annotation of microbial Proteomes (HAMAP) project. Protein families and groups of proteins are regularly reviewed to keep up with current scientific findings. Complementarily, TrEMBL strives to comprise all protein sequences that are not yet represented in SWISS-PROT, by incorporating a perpetually increasing level of mostly automated annotation. Researchers are welcome to contribute their knowledge to the scientific community by submitting relevant findings to SWISS-PROT at swiss-prot@expasy.org.

3,440 citations

Journal Article • DOI

Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses

Moran N. Cabili¹, Cole Trapnell², Cole Trapnell¹, Loyal A. Goff¹, Magdalena J. Koziol², Magdalena J. Koziol¹, Barbara Tazon-Vega¹, Barbara Tazon-Vega², Aviv Regev¹, John L. Rinn¹, John L. Rinn² • Institutions (2)

Massachusetts Institute of Technology¹, Harvard University²

15 Sep 2011-Genes & Development

TL;DR: It is found that lincRNA expression is strikingly tissue-specific compared with coding genes, and that lincRNAs are typically coexpressed with their neighboring genes, albeit to an extent similar to that of pairs of neighboring protein-coding genes.

Abstract: Large intergenic noncoding RNAs (lincRNAs) are emerging as key regulators of diverse cellular processes. Determining the function of individual lincRNAs remains a challenge. Recent advances in RNA sequencing (RNA-seq) and computational methods allow for an unprecedented analysis of such transcripts. Here, we present an integrative approach to define a reference catalog of >8000 human lincRNAs. Our catalog unifies previously existing annotation sources with transcripts we assembled from RNA-seq data collected from ~4 billion RNA-seq reads across 24 tissues and cell types. We characterize each lincRNA by a panorama of >30 properties, including sequence, structural, transcriptional, and orthology features. We found that lincRNA expression is strikingly tissue-specific compared with coding genes, and that lincRNAs are typically coexpressed with their neighboring genes, albeit to an extent similar to that of pairs of neighboring protein-coding genes. We distinguish an additional subset of transcripts that have high evolutionary conservation but may include short ORFs and may serve as either lincRNAs or small peptides. Our integrated, comprehensive, yet conservative reference catalog of human lincRNAs reveals the global properties of lincRNAs and will facilitate experimental studies and further functional classification of these genes.

3,114 citations
